Similar Articles (20 results)
1.
This paper describes a method of modeling the characteristics of a singing voice from polyphonic musical audio signals including sounds of various musical instruments. Because singing voices play an important role in musical pieces with vocals, such representation is useful for music information retrieval systems. The main problem in modeling the characteristics of a singing voice is the negative influence of accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former makes it possible to calculate feature vectors that represent a spectral envelope of a singing voice after reducing accompaniment sounds. It first extracts the harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by these components. The latter method then estimates the reliability of each frame of the obtained melody (i.e., the influence of accompaniment sound) by using two Gaussian mixture models (GMMs) for vocal and nonvocal frames to select the reliable vocal portions of musical pieces. Finally, each song is represented by its GMM consisting of the reliable frames. This new representation of the singing voice is demonstrated to improve the performance of an automatic singer identification system and to enable an MIR system based on vocal timbre similarity.
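As a rough illustration of the reliable-frame-selection step, the sketch below scores each frame under pre-trained vocal and non-vocal GMMs and fits a song-level GMM on the frames the vocal model wins. The feature matrix, the two pre-trained models, and the margin threshold are hypothetical stand-ins, not the paper's actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_reliable_frames(feats, gmm_vocal, gmm_nonvocal, margin=0.0):
    """Keep frames whose vocal log-likelihood beats the non-vocal one
    by at least `margin`, then model the song with its own GMM.
    feats: (frames x dims) spectral-envelope features of one song."""
    ll_vocal = gmm_vocal.score_samples(feats)       # per-frame log-likelihood
    ll_nonvocal = gmm_nonvocal.score_samples(feats)
    reliable = feats[ll_vocal - ll_nonvocal > margin]
    # song-level model over reliable frames (16 components is arbitrary)
    song_model = GaussianMixture(n_components=16).fit(reliable)
    return song_model, reliable
```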

2.
Song database construction based on blind source separation of frequency-domain convolutive signals
A blind source separation algorithm for frequency-domain convolutive signals is used to separate the lead-vocal signal from MP3 song audio; melody features that characterize each song are then extracted from the separated vocal to build the song database of a query-by-humming retrieval system. Blind source separation requires at least as many observed signals as source signals, so a second observation channel is first constructed with wavelet multiresolution analysis, and frequency-domain independent component analysis (FDICA) then performs the blind source separation (BSS) of the MP3 audio. Experiments show that the melody features of the lead-vocal signal separated from MP3 songs by FDICA-based BSS are highly similar to those of the hummed query signal, so MP3 songs can indeed be used to build the song database of a query-by-humming system.
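A heavily simplified sketch of this pipeline, assuming a mono signal x: the second observation channel is built from the wavelet approximation, and plain time-domain FastICA stands in for the per-frequency-bin FDICA the abstract describes.

```python
import numpy as np
import pywt
from sklearn.decomposition import FastICA

def separate_lead_vocal(x, wavelet="db4", level=3):
    """Build a second observation from the wavelet approximation of a
    mono signal, then unmix the two channels with ICA. Time-domain
    FastICA is a simplified stand-in for per-bin frequency-domain ICA."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]   # keep approximation only
    x2 = pywt.waverec(coeffs, wavelet)[: len(x)]
    X = np.stack([x, x2], axis=1)                         # samples x 2 channels
    sources = FastICA(n_components=2, random_state=0).fit_transform(X)
    return sources          # caller picks the vocal-like component
```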

3.
By studying fundamental-frequency extraction from hummed melodies and the associated retrieval algorithms, this paper presents a complete framework for a query-by-humming music retrieval system, focusing on melody feature extraction and approximate melody matching. Feature extraction estimates the fundamental frequency with a differential Mel-cepstrum method. For melody matching, after analyzing the principle of the classic dynamic time warping algorithm, the cosine similarity of note-duration difference sequences is introduced based on the characteristics of the voice, improving retrieval efficiency and accuracy. On a test set of 340 MIDI songs, top-3 recognition accuracy improves by 3.7% and retrieval time drops by 16%, a clear improvement in system performance.
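A minimal sketch of the two matching ingredients the abstract names: classic DTW over pitch sequences, plus the cosine similarity of note-duration difference sequences. How the two scores are combined is not specified in the abstract, so the weighting is left open.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two pitch sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def duration_cosine(dur_a, dur_b):
    """Cosine similarity of note-duration difference sequences."""
    da, db = np.diff(dur_a), np.diff(dur_b)
    k = min(len(da), len(db))
    da, db = da[:k], db[:k]
    return float(da @ db / (np.linalg.norm(da) * np.linalg.norm(db) + 1e-12))
```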

4.
Musical scores are traditionally retrieved by title, composer or subject classification. Just as multimedia computer systems increase the range of opportunities available for presenting musical information, so they also offer new ways of posing musically-oriented queries. This paper shows how scores can be retrieved from a database on the basis of a few notes sung or hummed into a microphone. The design of such a facility raises several interesting issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well known melodies. The performance of several string matching criteria is analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input and evaluate its performance.
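One common string matching criterion for ranked retrieval of transcribed melodies is edit distance over contour symbols; the sketch below is a generic Levenshtein implementation, not the specific criteria evaluated in the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two melody contour strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# rank database tunes by distance to the sung query's contour
print(edit_distance("UUDSU", "UDDSU"))   # -> 1
```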

5.
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
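A toy version of the annotation side, assuming a dict word_feats mapping each vocabulary word to the pooled features of its training tracks; plain per-word EM replaces the paper's weighted mixture-hierarchies algorithm.

```python
from sklearn.mixture import GaussianMixture

def train_word_models(word_feats, n_components=8):
    """Fit one GMM per vocabulary word over the audio feature space."""
    return {w: GaussianMixture(n_components=n_components).fit(f)
            for w, f in word_feats.items()}

def annotate(track_feats, models, top_k=5):
    """Score a novel track under every word GMM and return the
    highest-likelihood words as its annotation."""
    scores = {w: m.score(track_feats) for w, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```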

6.
Since adding background music and sound effects even to short animations is not simple, an automatic music generation system would help improve the total quality of computer generated animations. This paper describes a prototype system which automatically generates background music and sound effects for existing animations. The inputs to the system are music parameters (mood types and musical motifs) and motion parameters for individual scenes of an animation. Music is generated for each scene. The key for a scene is determined by considering the mood type and its degree, and the key of the previous scene. The melody for a scene is generated from the given motifs and the chord progression for the scene which is determined according to appropriate rules. The harmony accompaniment for a scene is selected based on the mood type. The rhythm accompaniment for a scene is selected based on the mood type and tempo. The sound effects for motions are determined according to the characteristics and intensity of the motions. Both the background music and sound effects are generated so that the transitions between scenes are smooth.

7.
We consider representing a short temporal fragment of musical audio as a dynamic texture, a model of both the timbral and rhythmical qualities of sound, two of the important aspects required for automatic music analysis. The dynamic texture model treats a sequence of audio feature vectors as a sample from a linear dynamical system. We apply this new representation to the task of automatic song segmentation. In particular, we cluster audio fragments, extracted from a song, as samples from a dynamic texture mixture (DTM) model. We show that the DTM model can both accurately cluster coherent segments in music and detect transition boundaries. Moreover, the generative character of the proposed model of music makes it amenable to a wide range of applications besides segmentation. As examples, we use DTM models of songs to suggest possible improvements in other music information retrieval applications such as music annotation and similarity.
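For intuition, a compact sketch of fitting a single dynamic texture (a linear dynamical system x_{t+1} = A x_t + v_t, y_t = C x_t + w_t) to a sequence of audio feature vectors via the standard SVD-based closed-form estimate; the mixture (DTM) fitting and the clustering itself are omitted.

```python
import numpy as np

def fit_dynamic_texture(Y, n_states=5):
    """Y: (T x d) feature sequence. Returns A (state transition),
    C (observation matrix), and the latent state sequence X (n x T)."""
    Yc = (Y - Y.mean(axis=0)).T                       # d x T, zero-mean
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    C = U[:, :n_states]                               # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]         # latent states
    M, *_ = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)
    A = M.T                                           # so that X[:,1:] ~= A @ X[:,:-1]
    return A, C, X
```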

8.
This paper proposes a novel cyclic interface for browsing through a song database. The method, which sums multiple audio streams on a server and broadcasts only a single summed stream, allows the user to hear different parts of each audio stream by cycling through all available streams. Songs are summed into a single stream based on a combination of spectral entropy and local power of each song's waveform. Perceptual parameters of the system are determined based on experiments conducted on 20 users, for three, four, and five songs. Results illustrate that the proposed methodology requires less listening time as compared to traditional list-based interfaces when the desired audio clip is among one of the audio streams. Applications of this methodology include any search system which returns multiple audio search results, including music query by example. The proposed methodology can be used for real-time searching with an ordinary internet browser.
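The two per-frame quantities driving the summation are easy to sketch, assuming plain FFT frames; window choices, weights, and the cycling logic itself are not reproduced here.

```python
import numpy as np

def spectral_entropy(frame, n_fft=1024):
    """Shannon entropy of the normalized power spectrum of one frame."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    p = spec / (spec.sum() + 1e-12)
    return float(-(p * np.log2(p + 1e-12)).sum())

def local_power(frame):
    """Mean-square amplitude of one waveform frame."""
    return float(np.mean(frame ** 2))
```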

9.
A new approximate melody matching method and its application in a query-by-humming retrieval system
A new approximate melody matching method, linear alignment matching, is proposed, and a query-by-humming system prototype is implemented on top of it. Unlike existing content-based music retrieval approaches, the algorithm is not based on approximate symbol-string matching, statistical models, or feature spaces; instead, it is a new design that exploits the geometric similarity of the pitch contours of similar melodies and considers pitch and rhythm features jointly. Experiments verify the algorithm's effectiveness: retrieving 62 hummed vocal queries in a search space of 3,864 songs, linear alignment matching achieved a 90.3% top-3 hit rate, more than 11% higher than traditional approximate symbol matching algorithms. This result strongly demonstrates the effectiveness of linear alignment matching and its feasibility for large-scale digital music search engines.
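A rough reading of linear alignment matching in code: the query pitch contour is linearly stretched by a few factors, transposition is removed with a median shift, and the best sliding-window match over the target contour is kept. The scale factors and the distance measure are illustrative guesses, not the paper's settings.

```python
import numpy as np

def linear_align_distance(query, target, scales=(0.8, 0.9, 1.0, 1.1, 1.25)):
    """Best (lowest) mean absolute pitch difference between linearly
    stretched copies of the query contour and windows of the target."""
    query = np.asarray(query, dtype=float)
    best = np.inf
    for s in scales:
        n = max(2, int(len(query) * s))
        idx = np.linspace(0, len(query) - 1, n)
        q = np.interp(idx, np.arange(len(query)), query)
        q = q - np.median(q)                       # transposition invariance
        for start in range(0, max(1, len(target) - n + 1)):
            t = np.asarray(target[start:start + n], dtype=float)
            if len(t) < n:
                break
            t = t - np.median(t)
            best = min(best, float(np.mean(np.abs(q - t))))
    return best
```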

10.
11.
A method for analyzing and categorizing the vowels of a sung query is described and analyzed. This query system uses a combination of spectral analysis and parametric clustering techniques to divide a single query into different vowel regions. The method is applied separately to each query, so no training or repeated measures are necessary. The vowel regions are then transformed into strings and string search methods are used to compare the results from various songs. We apply this method to a small pilot study consisting of 40 sung queries from each of 7 songs. Approximately 60% of the queries are correctly identified with their corresponding song, using only the vowel stream as the identifier.
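A small sketch of the per-query idea, assuming MFCCs as the spectral analysis and k-means as the parametric clustering: each query is clustered on its own (no training across queries) and collapsed into a vowel-label string for string matching.

```python
import librosa
from sklearn.cluster import KMeans

def vowel_string(path, n_vowels=4):
    """Cluster the MFCC frames of one sung query into vowel classes
    and return the query as a collapsed string of cluster labels."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x coeffs
    labels = KMeans(n_clusters=n_vowels, n_init=10).fit_predict(mfcc)
    out = [labels[0]]
    for l in labels[1:]:
        if l != out[-1]:        # collapse runs: one symbol per vowel region
            out.append(l)
    return "".join(chr(ord("A") + int(l)) for l in out)
```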

12.
Audio thumbnailing of popular music using chroma-based representations
With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
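A minimal sketch of structure-based thumbnailing with librosa: build a 12-bin chromagram, form a cosine self-similarity matrix, and pick the window whose chroma pattern recurs most strongly elsewhere in the song. The window length and hop are arbitrary placeholders, not the paper's parameters.

```python
import numpy as np
import librosa

def thumbnail_start(path, win_sec=10.0, hop=512):
    """Return the start time (s) of the window that best matches
    another, non-overlapping part of the song."""
    y, sr = librosa.load(path)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)
    S = chroma.T @ chroma                      # frame-level cosine similarity
    w = int(win_sec * sr / hop)                # window length in frames
    starts = range(0, max(0, S.shape[0] - w), max(1, w // 2))
    best, best_score = 0, -np.inf
    for i in starts:
        for j in starts:
            if abs(i - j) < w:                 # skip overlapping windows
                continue
            # diagonal of block (i, j) scores the two windows played in parallel
            score = np.trace(S[i:i + w, j:j + w]) / w
            if score > best_score:
                best, best_score = i, score
    return best * hop / sr
```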

13.
14.
This paper focuses on modeling musical melodies as networks. Notes of a melody can be treated as nodes of a network, and connections are created whenever notes are played in sequence. We analyze a set of tracks from different music genres, with melodies played on different musical instruments. We find that the resulting networks are, in general, scale-free and exhibit the small-world property. We measure the main metrics and assess whether these networks can be considered as formed by sub-communities. The outcomes confirm that distinctive features of the tracks can be extracted with this analysis methodology. This approach can have an impact on several multimedia applications such as music didactics, multimedia entertainment, and digital music generation.
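The construction is easy to reproduce; here is a sketch with networkx, where each distinct note is a node and each observed note-to-note transition adds an edge (the example melody is made up).

```python
import networkx as nx

def melody_network(notes):
    """Build a directed network from a melody: one node per distinct
    note, one weighted edge per observed note-to-note transition."""
    G = nx.DiGraph()
    for a, b in zip(notes, notes[1:]):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)
    return G

G = melody_network(["C4", "E4", "G4", "E4", "C4", "G4"])
# small-world / scale-free checks would look at clustering and degrees
print(nx.average_clustering(G.to_undirected()), dict(G.degree()))
```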

15.
We present an algorithm that predicts musical genre and artist from an audio waveform. Our method uses the ensemble learner ADABOOST to select from a set of audio features that have been extracted from segmented audio and then aggregated. Our classifier proved to be the most effective method for genre classification at the recent MIREX 2005 international contests in music information extraction, and the second-best method for recognizing artists. This paper describes our method in detail, from feature extraction to song classification, and presents an evaluation of our method on three genre databases and two artist-recognition databases. Furthermore, we present evidence collected from a variety of popular features and classifiers that the technique of classifying features aggregated over segments of audio is better than classifying either entire songs or individual short-timescale features. Editor: Gerhard Widmer
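A schematic of the segment-aggregation idea with scikit-learn's AdaBoost; the feature summary, segment length, and majority-vote song labeling are placeholders rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def aggregate_segments(frame_feats, seg_len=100):
    """Summarize short-timescale frames over fixed-length segments
    (mean and standard deviation per feature dimension)."""
    segs = [np.concatenate([frame_feats[i:i + seg_len].mean(axis=0),
                            frame_feats[i:i + seg_len].std(axis=0)])
            for i in range(0, len(frame_feats) - seg_len + 1, seg_len)]
    return np.array(segs)

# X: stacked segment features over all training songs; y: genre per segment
clf = AdaBoostClassifier(n_estimators=200)
# clf.fit(X, y); a song's genre is then the majority vote over its segments
```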

16.
A feature extraction algorithm based on melody contour
A feature extraction algorithm based on melody contour is proposed. The algorithm extracts the song's fundamental-pitch sequence from a hummed fragment; after normalization, merging, and segmentation, the sequence is converted into a melody contour sequence, which is then mapped to a melody contour feature using a standard-pitch difference table generated from standard pitches. Results show that the system is fairly robust to environmental noise; in a search space of 405 songs, the top-5 retrieval success rate exceeds 90%.
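A guess at the contour-quantization step: convert the extracted pitch track to semitones, difference it, and quantize each step to Up/Down/Same. The simple tolerance threshold here stands in for the paper's standard-pitch difference table.

```python
import numpy as np

def pitch_to_contour(f0_hz, tol_semitones=0.5):
    """Convert a fundamental-frequency track (Hz) into a coarse melody
    contour of Up/Down/Same steps."""
    f0 = np.asarray([f for f in f0_hz if f > 0], dtype=float)  # drop unvoiced
    semis = 12.0 * np.log2(f0 / 440.0)                         # semitones re A4
    steps = np.diff(semis)
    return "".join("U" if s > tol_semitones else
                   "D" if s < -tol_semitones else "S" for s in steps)
```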

17.
A hierarchical-clustering method for extracting the main melody from MIDI music
To accurately extract the main melody from multi-track MIDI music, while reducing the extraction error that arises when the main melody is spread across instrument tracks or weaker-pitched parts, a hierarchical-clustering method for multi-track MIDI main-melody extraction is proposed. The MIDI file is first parsed; then control notes are removed from each track, along with tracks that contain no melody information; the main melody is finally extracted by merging the note sets in the file that exhibit the pitch-histogram characteristics. Experimental comparison with manually labeled results demonstrates the accuracy of the main-melody extraction method.
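A crude stand-in for the track-filtering and pitch-histogram stages using pretty_midi: drum and empty tracks are discarded, and the remaining track whose pitch distribution looks most melody-like is picked. The scoring rule is an illustrative heuristic, not the paper's hierarchical clustering.

```python
import numpy as np
import pretty_midi

def main_melody_track(path):
    """Parse a MIDI file, drop drum/empty tracks, and pick the track
    whose pitch histogram is concentrated in a melody-like register."""
    pm = pretty_midi.PrettyMIDI(path)
    best, best_score = None, -np.inf
    for inst in pm.instruments:
        if inst.is_drum or not inst.notes:
            continue
        pitches = np.array([n.pitch for n in inst.notes])
        # melody tracks tend to sit high and move in a fairly narrow band
        score = pitches.mean() - pitches.std()
        if score > best_score:
            best, best_score = inst, score
    return best
```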

18.
We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
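The shingle-counting similarity is straightforward to sketch: stack consecutive feature frames into overlapping windows and count query/target pairs under the distance threshold. The brute-force distance matrix here is exactly what LSH would replace at scale; window and hop sizes are placeholders.

```python
import numpy as np

def shingles(feats, width=30, hop=5):
    """Stack consecutive feature frames into overlapping 'audio
    shingles' (flattened windows), analogous to text shingling."""
    return np.array([feats[i:i + width].ravel()
                     for i in range(0, len(feats) - width + 1, hop)])

def matched_shingle_count(query, target, threshold):
    """Count query/target shingle pairs whose Euclidean distance is
    below the threshold; higher counts mean more similar tracks."""
    d = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=2)
    return int((d < threshold).sum())
```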

19.
Music and songs are integral parts of Bollywood movies. Every two-to-three-hour movie contains three to ten songs, each 3–10 min long. Music lovers like to listen to the music and songs of a movie, but searching manually for all the songs in a movie is time consuming and error prone. Moreover, the task becomes much harder when songs are to be extracted from a huge archive containing hundreds of movies. This paper presents an approach to automatically extract music and songs from archived musical movies. We used a song grammar to construct a Markov chain model that differentiates song scenes from dialogue and action scenes in a movie. We tested our system on Bollywood, Hollywood, Pakistani, Bengali, and Tamil movies. A total of 20 movies from different industries were selected for the experiments. On Bollywood movies, we achieved 97.22% recall in song extraction, whereas the recall on Hollywood musical movies is 80%. The recall on Pakistani, Tamil, and Bengali movies is 87.09%.
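A toy rendering of the scene-classification idea, assuming each shot has been reduced to a discrete audio symbol (the symbols and the two-model comparison are assumptions, not the paper's song grammar): fit one Markov chain on song scenes and one on dialogue scenes, then label a new scene by whichever chain gives it higher likelihood.

```python
import numpy as np

def fit_markov_chain(sequences, n_symbols=3):
    """Estimate a transition matrix from labeled shot-symbol sequences,
    with Laplace smoothing so unseen transitions keep nonzero mass."""
    T = np.ones((n_symbols, n_symbols))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def log_likelihood(seq, T):
    return float(sum(np.log(T[a, b]) for a, b in zip(seq, seq[1:])))

# label a scene as a song scene if it scores higher under the
# song-scene chain than under the dialogue-scene chain
```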

20.
In this paper, a new rhythm-based game for tutored music learning is presented. The main differences from similar existing systems are: i) songs can be automatically extracted from any music file or printed score; ii) it works with multiple interfaces, ranging from any MIDI controller to the most popular game controllers; iii) note sequences are obtained from the melody itself rather than from time features alone. The whole system has been successfully tested on different songs using different combinations of music instances and game controllers.
