Similar literature
Found 20 similar documents; search took 15 ms
1.
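Song Database Construction Based on Frequency-Domain Blind Source Separation of Convolutive Signals*   Total citations: 1, self-citations: 1, citations by others: 0
A frequency-domain blind source separation algorithm for convolutive signals is used to separate the lead vocal signal from MP3 song audio; melody features that characterize the song are then extracted from the lead vocal signal to build the song database of a query-by-humming system. Blind source separation requires that the number of observed signals be no smaller than the number of source signals, so an additional observation signal is first constructed by wavelet multiresolution analysis, and frequency-domain independent component analysis (FDICA) is then used to perform blind source separation (BSS) of the MP3 song audio. Experiments show that the melody features of the lead vocal signal separated from song MP3s by FDICA-based BSS are highly similar to those of the hummed query signal, so song MP3s can be used to build the song database of a query-by-humming system.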

2.
This paper reports the first results of an innovative approach to modelling music cognition based on the emergent behaviour of interacting autonomous systems. A group of interactive autonomous singing robots were programmed to develop a shared repertoire of songs from scratch, after a period of spontaneous creations, adjustments and memory reinforcements. The robots interact with each other by means of vocal-like sounds. They use real sounds as opposed to software simulation. They are furnished with a physical model of the vocal tract, which synthesises vocal singing-like intonations, and a listening mechanism, which extracts pitch sequences from audio signals. The robots learn to imitate each other by babbling heard intonation patterns in order to evolve vectors of motor control parameters to synthesise the imitations. Models of the basic mechanisms underlying the emergence of songs are of great interest for musicians looking for hitherto unexplored ways to create music with interactive machines.

3.
This paper describes a method of modeling the characteristics of a singing voice from polyphonic musical audio signals including sounds of various musical instruments. Because singing voices play an important role in musical pieces with vocals, such a representation is useful for music information retrieval (MIR) systems. The main problem in modeling the characteristics of a singing voice is the negative influence of accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former makes it possible to calculate feature vectors that represent a spectral envelope of a singing voice after reducing accompaniment sounds. It first extracts the harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by these components. The latter method then estimates the reliability of each frame of the obtained melody (i.e., the influence of accompaniment sound) by using two Gaussian mixture models (GMMs) for vocal and nonvocal frames to select the reliable vocal portions of musical pieces. Finally, each song is represented by its GMM consisting of the reliable frames. This new representation of the singing voice is demonstrated to improve the performance of an automatic singer identification system and to achieve an MIR system based on vocal timbre similarity.

4.
Recently, many online Karaoke (KTV) platforms have been released, where music lovers sing songs. Meanwhile, the system automatically evaluates user proficiency according to their singing behavior. Recommending appropriate songs to users can initiate singers' participation and improve users' loyalty to these platforms. However, this is not an easy task due to the unique characteristics of these platforms. First, since users may not achieve high system-evaluated scores on their favorite songs, how to balance user preferences with singing proficiency in song recommendation is still open. Second, the sparsity of the user-song interaction behavior may greatly impact the recommendation task. To solve these two challenges, in this paper we propose an information-fused song recommendation model that considers the unique characteristics of the singing data. Specifically, we first devise a pseudo-rating matrix by combining users' singing behavior and the system evaluations, so that both users' preferences and proficiency are leveraged. Then we mitigate the data sparsity problem by fusing users' and songs' rich information in the matrix factorization process of the pseudo-rating matrix. Finally, extensive experimental results on a real-world dataset show the effectiveness of our proposed model.

5.
Among passerines, Bengali finches are known to sing extremely complex courtship songs with three hierarchical structures: namely, the element, the chunk, and the syntax. In this work, we theoretically studied the mechanism of the song of Bengali finches with the aim of providing a dynamic view of the development of birdsong learning. We first constructed a model of the Elman network with chaotic neurons that successfully learned the supervisor signal defined by a simple finite-state syntax. Second, we focused on the process of individual-specific increases in the complexity of song syntax. We propose a new learning algorithm to produce the intrinsic diversification of song syntax without a supervisor on the basis of the itinerant dynamics of chaotic neural networks and the Hebbian learning rule. The emergence of novel syntax modifying the acquired syntax is demonstrated. This work was presented in part at the 11th International Symposium on Artificial Life and Robotics, Oita, Japan, January 23–25, 2006.

6.
Imitation is a powerful mechanism whereby knowledge may be transferred between agents (both biological and artificial). Key problems on the topic of imitation have emerged in various areas close to artificial intelligence, including the cognitive and social sciences, animal behavior, robotics, human-computer interaction, embodied intelligence, software engineering, programming by example and machine learning. Artificial systems used to study imitation can both test models of imitation derived from observational or neurobiological data on imitation in animals and then apply them to different kinds of nonbiological systems ranging from robots to software agents. A crucial problem in imitation is the correspondence problem, mapping action sequences of the demonstrator and the imitator agent. This problem becomes particularly obvious when the two agents do not share the same embodiment and affordances. This paper describes a new general imitation mechanism called ALICE (action learning for imitation via correspondence between embodiments) that specifically addresses the correspondence problem. The mechanism is implemented and its efficacy illustrated on the "chessworld" testbed that was created to study imitation from an agent-based perspective, i.e., by a particular agent in a particular environment.

7.
Vibrato is a slightly tremulous effect imparted to vocal or instrumental tone for added warmth and expressiveness through slight variation in pitch. It corresponds to a periodic fluctuation of the fundamental frequency. It is common for a singer to develop a vibrato function to personalize his/her singing style. In this paper, we explore the acoustic features that reflect vibrato information in order to identify singers of popular music. We start with an enhanced vocal detection method that allows us to select vocal segments with high confidence. From the selected vocal segments, the cepstral coefficients which reflect the vibrato characteristics are computed. These coefficients are derived using bandpass filters, such as parabolic and cascaded bandpass filters, spread according to the octave frequency scale. The strategy of our classifier formulation is to utilize the high-level musical knowledge of song structure in singer modeling. Singer identification is validated on a database containing 84 popular songs from commercially available CD recordings from 12 singers. We achieve an average error rate of 16.2% in segment-level identification.
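The description of vibrato as a periodic fluctuation of the fundamental frequency can be illustrated with a small sketch. It is not the cepstral, filter-bank method of the paper: it simply synthesizes an F0 contour with sinusoidal vibrato and estimates the vibrato rate from zero crossings of the pitch deviation. The function names `synth_f0` and `estimate_vibrato` and all parameter values are assumptions made for illustration.

```python
import math


def synth_f0(base_hz, rate_hz, extent_cents, frame_rate, n_frames):
    """Return an F0 contour (Hz) with sinusoidal vibrato around base_hz."""
    contour = []
    for i in range(n_frames):
        t = i / frame_rate
        cents = extent_cents * math.sin(2 * math.pi * rate_hz * t)
        contour.append(base_hz * 2 ** (cents / 1200))
    return contour


def estimate_vibrato(f0, frame_rate):
    """Estimate (rate in Hz, extent in cents) from a voiced F0 contour."""
    # Work in cents so the deviation is symmetric around the mean pitch.
    cents = [1200 * math.log2(f) for f in f0]
    mean = sum(cents) / len(cents)
    dev = [c - mean for c in cents]
    # Each vibrato cycle crosses the mean pitch twice.
    crossings = sum(1 for a, b in zip(dev, dev[1:]) if (a < 0) != (b < 0))
    duration = (len(f0) - 1) / frame_rate
    rate = crossings / (2 * duration)
    extent = (max(dev) - min(dev)) / 2
    return rate, extent


if __name__ == "__main__":
    f0 = synth_f0(base_hz=220.0, rate_hz=6.0, extent_cents=50.0,
                  frame_rate=100, n_frames=200)
    print(estimate_vibrato(f0, frame_rate=100))
```

The zero-crossing estimate is only reasonable for clean, voiced contours; a real singer-identification front end would need vocal detection and more robust features, as the abstract indicates.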

8.
Imitation is an important learning mechanism of widespread utility and common occurrence. This article presents a theory and working computational model of the detailed mechanisms of imitation. The model is in the restricted domain of the learning of pencil-and-paper procedures. The task that is modelled is that of a teacher demonstrating the steps of a procedure, such as long division, to a student by means of one or more examples. Such a task can be learned by an imitation-learning mechanism, but the mechanism has a much wider range of application. Imitation is treated as a four-stage process: the events performed by the teacher are segmented by the learner; the events are encoded and explained in terms of spatial relations between objects; repeated patterns in the events are recognized; and finally, different examples are merged together. This model is implemented as a computer program, learning algorithms from worked examples (LAWE).

9.
Advanced Robotics, 2013, 27(7): 647-661
Various ways of vocal sound production are being actively studied. We are constructing a phonetic machine with vocal cords and a vocal tract based on mechatronics technology. Mechanical construction of a human vocal system is considered to generate a natural voice, so it can be advantageously applied to singing voice production. In voice generation, analysis and mechanical realization of the behaviors of the vocal cords and vocal tract are required. Furthermore, the fluid mechanical system is less stable, thus making control more difficult. Several motors are employed to manipulate the mechanical vocal system. Mappings between motor positions and the produced vocal sounds are automatically established in the learning phase. In the singing performance, the system is able to sing while vocal pitches and phonemes are adaptively controlled by an auditory feedback process. This paper presents the latest mechanisms of our mechanical vocal system together with adaptive tuning algorithms of the physical mechanism with an auditory system.

10.
We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recognition. Our modeling approach is based on learning a set of elementary music sounds in a fully unsupervised manner. While the space of possible music sound sequences is very large, our method enables the construction of a compact and efficient representation for the song collection using finite-state transducers. This paper gives a novel and substantially faster algorithm for the construction of factor transducers, the key representation of song snippets supporting our music identification technique. The complexity of our algorithm is linear with respect to the size of the suffix automaton constructed. Our experiments further show that it helps speed up the construction of the weighted suffix automaton in our task by a factor of 17 with respect to our previous method using the intermediate steps of determinization and minimization. We show that, using these techniques, a large-scale music identification system can be constructed for a database of over 15 000 songs while achieving an identification accuracy of 99.4% on undistorted test data, and performing robustly in the presence of noise and distortions.

11.
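Imitation learning is one of the main topics in research on bionic mechanisms for robots: a robot acquires bionic characteristics by observing, understanding, learning, and imitating demonstrated behavior. Gaussian processes are used to represent both the demonstration trajectory, formed from sampled discrete demonstration signals, and the imitation trajectory, which depends on a policy with unknown parameters. On this basis an imitation learning framework is built that introduces probabilistic model matching into imitation learning: the KL divergence serves as the cost function for comparing the probability distributions of the two trajectories, and gradient descent is used to find the optimal imitation control policy that minimizes the KL divergence. The policy is then applied to the imitating robot to perform the same task as the demonstration. Simulations with the swinging motion of an articulated robot arm as the learning task show that imitation learning based on probabilistic trajectory matching can reproduce the arm-swing behavior, with a simpler learning process than traditional methods and good learning results.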
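The idea of matching trajectory distributions by minimizing a KL divergence with gradient descent can be illustrated with a deliberately simplified sketch. It does not use Gaussian processes: each time step is treated as an independent one-dimensional Gaussian with fixed standard deviation, and the imitation mean is assumed to be `theta * feature` for a single policy parameter `theta`. All names (`kl_gauss`, `trajectory_kl`, `fit_theta`) and values are hypothetical.

```python
import math


def kl_gauss(m_p, s_p, m_q, s_q):
    """KL(N(m_p, s_p^2) || N(m_q, s_q^2)) for one-dimensional Gaussians."""
    return (math.log(s_q / s_p)
            + (s_p ** 2 + (m_p - m_q) ** 2) / (2 * s_q ** 2) - 0.5)


def trajectory_kl(theta, features, demo_means, s_imit, s_demo):
    """Sum of per-step KL terms between imitation and demonstration."""
    return sum(kl_gauss(theta * f, s_imit, m, s_demo)
               for f, m in zip(features, demo_means))


def fit_theta(features, demo_means, s_demo, lr=0.02, steps=200):
    """Gradient descent on the summed KL with respect to theta."""
    theta = 0.0
    for _ in range(steps):
        # d/d(theta) of each KL term is (theta*f - m) * f / s_demo^2.
        grad = sum((theta * f - m) * f / s_demo ** 2
                   for f, m in zip(features, demo_means))
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    features = [math.sin(0.1 * t) for t in range(50)]
    demo_means = [1.5 * f for f in features]  # assumed demonstration
    theta = fit_theta(features, demo_means, s_demo=1.0)
    print(theta, trajectory_kl(theta, features, demo_means, 1.0, 1.0))
```

The step size is chosen for this particular toy problem; with other features or variances it would need retuning, and a faithful implementation would compare full Gaussian-process trajectory distributions rather than independent per-step Gaussians.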

12.
Music and songs are integral parts of Bollywood movies. Every movie of two to three hours contains three to ten songs, and each song is 3–10 min long. Music lovers like to listen to the music and songs of a movie; however, manually searching for all the songs in a movie is time-consuming and error-prone. Moreover, the task becomes much harder when songs are to be extracted from a huge archived movie database containing hundreds of movies. This paper presents an approach to automatically extract music and songs from archived musical movies. We used song grammar to construct a Markov chain model that differentiates song scenes from dialogue and action scenes in a movie. We tested our system on Bollywood, Hollywood, Pakistani, Bengali, and Tamil movies. A total of 20 movies from different industries were selected for the experiments. On Bollywood movies, we achieved 97.22% recall in song extraction, whereas the recall on Hollywood musical movies is 80%. The test result on Pakistani, Tamil and Bengali movies is 87.09%.

13.
This research aims to devise an anthropomorphic robotic head with a human-like face and a sheet of artificial skin that can read a randomly composed simplified musical notation and sing the corresponding content of the song once. The face robot is composed of an artificial facial skin that can express a number of facial expressions via motions driven by internal servo motors. Two cameras, one installed inside each eyeball of the face, provide vision capability for reading simplified musical notations. Computer vision techniques are subsequently used to interpret simplified musical notations and lyrics of their corresponding songs. Voice synthesis techniques are implemented to enable the face robot to sing songs by enunciating synthesized sounds. Mouth patterns of the face robot are automatically changed to match the emotions corresponding to the lyrics of the songs. The experiments show that the face robot can successfully read and then accurately sing a song which is assigned discriminately.

14.
In order to have a robotic system able to effectively learn by imitation and not merely reproduce the movements of a human teacher, the system should have the capability to deeply understand the perceived actions to be imitated. This paper deals with the development of a cognitive architecture for learning by imitation in which a rich conceptual representation of the observed actions is built. The purpose of the following discussion is to show how the same conceptual representation can be used both in a bottom-up approach, in order to learn sequences of actions by imitation learning paradigm, and in a top-down approach, in order to anchor the symbolical representations to the perceptual activities of the robotic system. Experiments concerned with the problem of teaching a humanoid robotic system simple manipulative tasks are reported.

15.
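Human-simulated intelligent control is a knowledge-based form of intelligent control that simulates human control experience and skills. Building on previous results [1][2][3], this paper discusses why online learning is necessary in human-simulated intelligent control and what should be learned, and proposes a novel online learning model for human-simulated intelligent reasoning and control, providing a theoretical basis for the design of human-simulated intelligent control systems.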

16.
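Imitation learning has long been an active research topic in artificial intelligence. It is a method that reconstructs a desired policy from expert demonstrations. In recent years, theoretical work combining it with methods such as reinforcement learning has produced important results; in practical applications, especially in complex environments for robots and other agents, imitation learning has performed well. This paper mainly discusses research on and applications of imitation learning in robotics. It introduces the theoretical background of imitation learning; examines its two main classes of methods, behavioral cloning and inverse reinforcement learning; summarizes successful applications of imitation learning; and finally presents current problems and challenges and looks ahead to future trends.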

17.
We propose an efficient automata-based approach to extract behavioral units and rules from continuous sequential data of animal behavior. By introducing novel extensions, we integrate two elemental methods, the N-gram model and Angluin's machine learning algorithm, into an ethological data mining framework. This allows us to obtain the minimized automaton representation of behavioral rules that accept (or generate) the smallest set of possible behavioral patterns from sequential data of animal behavior. With this method, we demonstrate how the ethological data mining works using real birdsong data; we use the Bengalese finch song and perform experimental evaluations of this method using artificial birdsong data generated by a computer program. These results suggest that our ethological data mining works effectively even for noisy behavioral data by appropriately setting the parameters that we introduce. In addition, we demonstrate a case study using the Bengalese finch song, showing that our method successfully grasps the core structure of the singing behavior such as loops and branches. Yasuki Kakishita and Kazutoshi Sasahara have contributed equally to this work.
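The N-gram step mentioned in this abstract can be sketched in a few lines. The example below only counts N-grams over a symbol sequence and derives first-order transition probabilities; it does not implement Angluin's algorithm or the authors' extensions. The song string and the helper names `ngram_counts` and `transition_probs` are invented for illustration.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every run of n consecutive symbols in the sequence."""
    return Counter(tuple(sequence[i:i + n])
                   for i in range(len(sequence) - n + 1))


def transition_probs(sequence):
    """Estimate P(next symbol | current symbol) from bigram counts."""
    bigrams = ngram_counts(sequence, 2)
    totals = Counter()
    for (current, _), count in bigrams.items():
        totals[current] += count
    probs = {}
    for (current, nxt), count in bigrams.items():
        probs.setdefault(current, {})[nxt] = count / totals[current]
    return probs


if __name__ == "__main__":
    song = list("abcabcabd")  # hypothetical sequence of song elements
    print(ngram_counts(song, 2))
    print(transition_probs(song))
```

Frequent N-grams are natural candidates for behavioral units (chunks), and the transition table exposes branches such as the one after `b` in the toy sequence.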

18.
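To address the problem that an amateur singer's timbre stays unchanged when imitating a professional singer, a morphing algorithm for Chinese songs based on Gaussian mixture models (GMMs) is proposed. GMMs are used to model the voice spectrum, and timbre conversion of the song is achieved by mixing the voice spectra of the amateur and the professional singer. The results show that when the mixing ratio factor k = 0 or 1, the ABX test accuracy is 100% in both cases, 0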

19.
Active identification methods for the temporal characteristics of objects in information and control systems are reviewed. A comparative analysis of the efficiency of binary signals, i.e., pseudo-random and regular sequences of rectangular pulses, in this identification problem is performed. The analysis covers constraints imposed on the amplitude of a reference signal and on the variance of the output's useful component. Finally, recommendations on efficient signal choice depending on the specific conditions of identification are given.
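As an illustration of the pseudo-random binary test signals mentioned above, the following sketch generates a sequence with a 4-bit linear feedback shift register (LFSR) and maps it to two amplitude levels. The register length, feedback taps, seed, and the names `lfsr_bits` and `to_levels` are assumptions chosen for a small example; the paper's own signal design is not reproduced here.

```python
def lfsr_bits(length, n_bits=4, shifts=(0, 1), seed=1):
    """Generate bits from a Fibonacci LFSR.

    The feedback bit is the XOR of the register bits at the given
    right-shift positions; the output is the lowest register bit.
    """
    state = seed
    out = []
    for _ in range(length):
        out.append(state & 1)
        feedback = 0
        for s in shifts:
            feedback ^= (state >> s) & 1
        state = (state >> 1) | (feedback << (n_bits - 1))
    return out


def to_levels(bits, amplitude):
    """Map bits to a two-level signal: 1 -> +amplitude, 0 -> -amplitude."""
    return [amplitude if b else -amplitude for b in bits]


if __name__ == "__main__":
    bits = lfsr_bits(30)
    print(bits[:15])
    print(to_levels(bits[:15], amplitude=1.0))
```

With these taps the register cycles through all 15 nonzero states, so the sequence repeats every 2^4 - 1 = 15 samples and each period contains eight ones and seven zeros; a regular rectangular pulse train of the same amplitude could be generated for comparison by alternating fixed-length runs of the two levels.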

20.
To structure a music education program in Canada based on the Hungarian model, Bartók's grammatical principle was adopted to define the musical characteristics of British-Canadian children's traditional songs. A computer-aided methodology analyzed a sample of singing games from a personal collection to identify 4-phrase variants, and extracted and tabulated by phrasal position their common phrase patterns of equal length. Future programs will complete the phrase analysis and analyze other musical characteristics, grouping variants successively to determine the styles of the entire collection. Ann Osborn has studied in Canada, Hungary and the United States. After six years at the University of Western Ontario, she was recently appointed associate professor at Lakehead University.
