20 similar documents found; search took 46 ms
1.
Fujihara H. Goto M. Kitahara T. Okuno H. G. 《IEEE transactions on audio, speech, and language processing》2010,18(3):638-648
2.
Building a Song Database via Blind Source Separation of Frequency-Domain Convolutive Signals (Cited: 1 total; 1 self-citation, 0 by others)
The lead vocal signal is separated from MP3 song audio using a frequency-domain convolutive blind source separation algorithm, and melody features that characterize each song are then extracted from the vocal signal to build the song database of a query-by-humming retrieval system. Blind source separation requires that the number of observed signals be no smaller than the number of source signals, so wavelet multiresolution analysis is first used to construct an additional observation channel, after which frequency-domain independent component analysis (FDICA) performs blind source separation (BSS) on the MP3 audio. Experiments show that the melody features of the lead vocal separated from MP3 songs by FDICA-based BSS are highly similar to those of the hummed queries to be retrieved, so MP3 songs can be used to build the song database of a query-by-humming retrieval system.
3.
By studying fundamental-frequency extraction and retrieval algorithms for hummed melodies, a complete framework for a query-by-humming music retrieval system is presented. The system focuses on melody feature extraction and approximate melody matching. Feature extraction estimates the fundamental frequency with a differential Mel-cepstrum method; for matching, after analyzing the classic dynamic time warping algorithm, the cosine similarity of note-duration difference sequences is introduced based on the characteristics of the voice, improving retrieval efficiency and accuracy. On a test set of 340 MIDI songs, top-3 recognition accuracy improved by 3.7% and retrieval time dropped by 16%, a clear improvement in system performance.
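The matching step described above can be sketched as follows: classic DTW over pitch sequences, plus a cosine similarity over note-duration difference sequences. This is a minimal stdlib-only sketch; the function names and the exact cost function are our assumptions, not the paper's.

```python
import math

def dtw_distance(query, target):
    """Classic dynamic time warping over two pitch sequences (semitone values)."""
    n, m = len(query), len(target)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def duration_cosine(query_dur, target_dur):
    """Cosine similarity between note-duration difference sequences,
    truncated to the shorter of the two."""
    dq = [b - a for a, b in zip(query_dur, query_dur[1:])]
    dt = [b - a for a, b in zip(target_dur, target_dur[1:])]
    k = min(len(dq), len(dt))
    dq, dt = dq[:k], dt[:k]
    dot = sum(a * b for a, b in zip(dq, dt))
    nq = math.sqrt(sum(a * a for a in dq))
    nt = math.sqrt(sum(a * a for a in dt))
    return dot / (nq * nt) if nq and nt else 0.0
```

In a real system the DTW score and the duration-cosine score would be combined into one ranking criterion; how the paper weights the two is not stated in the abstract.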
4.
McNab Rodger J. Smith Lloyd A. Witten Ian H. Henderson Clare L. 《Multimedia Tools and Applications》2000,10(2-3):113-132
Musical scores are traditionally retrieved by title, composer or subject classification. Just as multimedia computer systems increase the range of opportunities available for presenting musical information, so they also offer new ways of posing musically-oriented queries. This paper shows how scores can be retrieved from a database on the basis of a few notes sung or hummed into a microphone. The design of such a facility raises several interesting issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well known melodies. The performance of several string matching criteria are analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input and evaluate its performance.
5.
Turnbull D. Barrington L. Torres D. Lanckriet G. 《IEEE transactions on audio, speech, and language processing》2008,16(2):467-476
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
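The annotation scheme above, one generative model per vocabulary word scored against a track's feature vectors, can be sketched with a single diagonal-covariance Gaussian standing in for the paper's GMM. This is a simplified stdlib-only sketch under that assumption; all names are ours.

```python
import math

def fit_diag_gaussian(vectors):
    """Fit a diagonal-covariance Gaussian to the feature vectors of tracks
    annotated with one vocabulary word."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    var = [max(sum((v[k] - mean[k]) ** 2 for v in vectors) / n, 1e-6)
           for k in range(d)]
    return mean, var

def log_likelihood(vector, model):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(vector, mean, var))

def annotate(track_features, word_models, top_k=2):
    """Rank vocabulary words by average log-likelihood over a track's
    feature vectors and return the top_k words as the annotation."""
    scores = {w: sum(log_likelihood(v, m) for v in track_features) / len(track_features)
              for w, m in word_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Retrieval ("query-by-text") would simply invert this: score every track under the query word's model and rank tracks instead of words.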
6.
Jun-Ichi Nakamura Tetsuya Kaku Kyungsil Hyun Tsukasa Noma Sho Yoshida 《Computer Animation and Virtual Worlds》1994,5(4):247-264
Since adding background music and sound effects even to short animations is not simple, an automatic music generation system would help improve the total quality of computer generated animations. This paper describes a prototype system which automatically generates background music and sound effects for existing animations. The inputs to the system are music parameters (mood types and musical motifs) and motion parameters for individual scenes of an animation. Music is generated for each scene. The key for a scene is determined by considering the mood type and its degree, and the key of the previous scene. The melody for a scene is generated from the given motifs and the chord progression for the scene which is determined according to appropriate rules. The harmony accompaniment for a scene is selected based on the mood type. The rhythm accompaniment for a scene is selected based on the mood type and tempo. The sound effects for motions are determined according to the characteristics and intensity of the motions. Both the background music and sound effects are generated so that the transitions between scenes are smooth.
7.
Barrington L. Chan A. B. Lanckriet G. 《IEEE transactions on audio, speech, and language processing》2010,18(3):602-612
8.
9.
A New Approximate Melody Matching Method and Its Application in a Query-by-Humming System (Cited: 20 total; 0 self-citations, 20 by others)
A new approximate melody matching method, linear alignment matching, is proposed, and a query-by-humming system prototype is built on top of it. Unlike existing content-based music retrieval approaches, the algorithm is not based on approximate string matching, statistical models, or feature spaces; it is a new design that exploits the geometric similarity of the pitch contours of similar melodies while considering pitch and rhythm features together. Experiments verify the algorithm's effectiveness: retrieving 62 hummed vocal queries against a search space of 3,864 songs, linear alignment matching achieved a 90.3% top-3 hit rate, more than 11% higher than traditional approximate symbol matching algorithms. This result strongly indicates the effectiveness of linear alignment matching and its feasibility for large-scale digital music search engines.
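The abstract does not spell the algorithm out, but the core idea, comparing pitch contours geometrically after a linear time alignment, can be illustrated as below. This sketch linearly stretches the query onto the target's time axis and removes the mean pitch offset for key invariance; it is our illustrative guess at the flavor of the method, not the paper's actual algorithm.

```python
def linear_align_distance(query, target):
    """Linearly stretch the query pitch contour (semitones) onto the
    target's time axis, remove the average pitch offset (key invariance),
    and sum the absolute residuals as a dissimilarity score."""
    n, m = len(query), len(target)
    stretched = [query[min(int(i * n / m), n - 1)] for i in range(m)]
    offset = sum(t - s for s, t in zip(stretched, target)) / m
    return sum(abs(t - (s + offset)) for s, t in zip(stretched, target))
```

A transposed copy of the same melody scores zero under this measure, which is the kind of invariance a humming front end needs, since singers rarely hum in the song's original key.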
10.
11.
Maureen Mellody Mark A. Bartsch Gregory H. Wakefield 《Journal of Intelligent Information Systems》2003,21(1):35-52
A method for analyzing and categorizing the vowels of a sung query is described and analyzed. This query system uses a combination of spectral analysis and parametric clustering techniques to divide a single query into different vowel regions. The method is applied separately to each query, so no training or repeated measures are necessary. The vowel regions are then transformed into strings and string search methods are used to compare the results from various songs. We apply this method to a small pilot study consisting of 40 sung queries from each of 7 songs. Approximately 60% of the queries are correctly identified with their corresponding song, using only the vowel stream as the identifier.
12.
With the growing prevalence of large databases of multimedia content, methods for facilitating rapid browsing of such databases or the results of a database search are becoming increasingly important. However, these methods are necessarily media dependent. We present a system for producing short, representative samples (or "audio thumbnails") of selections of popular music. The system searches for structural redundancy within a given song with the aim of identifying something like a chorus or refrain. To isolate a useful class of features for performing such structure-based pattern recognition, we present a development of the chromagram, a variation on traditional time-frequency distributions that seeks to represent the cyclic attribute of pitch perception, known as chroma. The pattern recognition system itself employs a quantized chromagram that represents the spectral energy at each of the 12 pitch classes. We evaluate the system on a database of popular music and score its performance against a set of "ideal" thumbnail locations. Overall performance is found to be quite good, with the majority of errors resulting from songs that do not meet our structural assumptions.
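The quantized chromagram described above folds spectral energy into 12 pitch classes. A minimal sketch of that folding for one magnitude-spectrum frame, assuming equal temperament with A4 = 440 Hz (the paper's exact binning and normalization are not given in the abstract):

```python
import math

def chroma_vector(magnitudes, sample_rate, fft_size):
    """Fold one magnitude spectrum frame into 12 pitch classes
    (C = 0 ... B = 11), summing the energy of each FFT bin into the
    pitch class of its nearest equal-tempered note."""
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # skip the DC bin
        freq = k * sample_rate / fft_size
        if freq < 27.5:  # below A0: treat as non-pitched energy
            continue
        midi = 69 + 12 * math.log2(freq / 440.0)  # MIDI note number
        chroma[int(round(midi)) % 12] += mag
    return chroma
```

Stacking these vectors over successive frames yields the chromagram; the thumbnailing system then looks for repeated chromagram segments to locate a chorus.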
13.
14.
Stefano Ferretti 《Multimedia Tools and Applications》2018,77(13):16003-16029
This paper focuses on the modeling of musical melodies as networks. Notes of a melody can be treated as nodes of a network. Connections are created whenever notes are played in sequence. We analyze some main tracks coming from different music genres, with melodies played using different musical instruments. We find out that the considered networks are, in general, scale free networks and exhibit the small world property. We measure the main metrics and assess whether these networks can be considered as formed by sub-communities. Outcomes confirm that peculiar features of the tracks can be extracted from this analysis methodology. This approach can have an impact in several multimedia applications such as music didactics, multimedia entertainment, and digital music generation.
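The construction above, notes as nodes, edges for consecutive notes, is straightforward to sketch. A minimal stdlib-only version that builds the directed edge multiset and the out-degree counts one would examine for the scale-free property (function names are ours):

```python
from collections import defaultdict

def melody_network(notes):
    """Build a directed network from a note sequence: nodes are pitches,
    and an edge (a, b) is counted each time note b follows note a."""
    edges = defaultdict(int)
    for a, b in zip(notes, notes[1:]):
        edges[(a, b)] += 1
    return edges

def out_degrees(edges):
    """Distinct successors per node: the raw material for checking
    whether the degree distribution follows a power law."""
    succ = defaultdict(set)
    for (a, b) in edges:
        succ[a].add(b)
    return {n: len(s) for n, s in succ.items()}
```

Running this over a full track and plotting the degree distribution on log-log axes is the usual first test for the scale-free claim.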
15.
We present an algorithm that predicts musical genre and artist from an audio waveform. Our method uses the ensemble learner AdaBoost to select from a set of audio features that have been extracted from segmented audio and then aggregated. Our classifier proved to be the most effective method for genre classification at the recent MIREX 2005 international contests in music information extraction, and the second-best method for recognizing artists. This paper describes our method in detail, from feature extraction to song classification, and presents an evaluation of our method on three genre databases and two artist-recognition databases. Furthermore, we present evidence collected from a variety of popular features and classifiers that the technique of classifying features aggregated over segments of audio is better than classifying either entire songs or individual short-timescale features. Editor: Gerhard Widmer
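The aggregation step emphasized above, pooling per-segment features into one song-level vector before classification, can be sketched as per-dimension means and standard deviations. This is a minimal stdlib-only sketch of that pooling (the boosting stage itself is omitted, and the exact statistics the paper aggregates are an assumption):

```python
import math

def aggregate_segments(segment_features):
    """Aggregate per-segment feature vectors into one song-level vector:
    the per-dimension means followed by the per-dimension standard
    deviations, ready to feed to a classifier such as AdaBoost."""
    n = len(segment_features)
    d = len(segment_features[0])
    means = [sum(seg[k] for seg in segment_features) / n for k in range(d)]
    stds = [math.sqrt(sum((seg[k] - means[k]) ** 2 for seg in segment_features) / n)
            for k in range(d)]
    return means + stds
```

The paper's finding is that classifying these aggregated vectors beats both whole-song features and raw short-timescale frames.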
16.
A Feature Extraction Algorithm Based on Melody Contour (Cited: 1 total; 0 self-citations, 1 by others)
A feature extraction algorithm based on melody contour is proposed. The algorithm extracts the base-pitch sequence of a song from a hummed fragment; after normalization, merging, and segmentation, the sequence is converted into a melody contour sequence, which is then turned into melody contour features using a difference table generated from standard tones. Results show that the system is fairly robust to environmental noise; in a search space of 405 songs, the top-5 retrieval success rate exceeds 90%.
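The contour conversion described above can be illustrated with the simplest possible quantization, three Parsons-style symbols per interval. This sketch stands in for the paper's standard-tone difference table, which the abstract does not detail; the threshold and symbol set are our assumptions.

```python
def contour_features(pitches, threshold=0.5):
    """Convert a base-pitch sequence (in semitones) into a coarse melody
    contour string: U (up), D (down), S (same), comparing each note
    with its predecessor."""
    symbols = []
    for a, b in zip(pitches, pitches[1:]):
        diff = b - a
        if diff > threshold:
            symbols.append("U")
        elif diff < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)
```

Contour strings like these are robust to off-key humming, which is one reason contour-based features tolerate noisy queries well.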
17.
To extract the main melody from multi-track MIDI accurately, while reducing the extraction errors that arise when the melody is spread across instrument tracks or weaker-pitched parts, a hierarchical-clustering-based main-melody extraction method for multi-track MIDI is proposed. The MIDI file is first parsed; control notes and tracks containing no melodic information are then removed, and the main melody is extracted by merging the note sets in the file characterized by their pitch-histogram features. Comparison with manually labeled results demonstrates the accuracy of the main-melody extraction method.
18.
Casey M. Rhodes C. Slaney M. 《IEEE transactions on audio, speech, and language processing》2008,16(5):1015-1028
We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
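The shingle-counting similarity above can be sketched directly: slide a fixed-width window over per-frame feature vectors, concatenate each window into one shingle, then count query/target shingle pairs under a distance threshold. A brute-force stdlib-only sketch (the paper replaces the pairwise loop with LSH; names and the Euclidean metric are our assumptions):

```python
def shingles(features, width=4):
    """Concatenate each fixed-width window of per-frame feature vectors
    into one high-dimensional 'audio shingle'."""
    return [sum(features[i:i + width], [])
            for i in range(len(features) - width + 1)]

def count_matches(query_shingles, target_shingles, threshold):
    """Similarity score: the number of query/target shingle pairs whose
    Euclidean distance falls below the threshold."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(1 for q in query_shingles for t in target_shingles
               if dist(q, t) < threshold)
```

The threshold is the quantity the paper estimates per database from the distribution of minimum between-shingle distances; here it is simply a parameter.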
19.
Sher Muhammad Doudpota 《Multimedia Tools and Applications》2014,69(2):359-382
Music and songs are integral parts of Bollywood movies. Every movie of two to three hours contains three to ten songs, each 3–10 min long. Music lovers like to listen to the music and songs of a movie, but manually searching for all the songs in a movie is time consuming and error prone. Moreover, the task becomes much harder when songs are to be extracted from a huge archive containing hundreds of movies. This paper presents an approach to automatically extract music and songs from archived musical movies. We used a song grammar to construct a Markov chain model that differentiates song scenes from dialogue and action scenes in a movie. We tested our system on Bollywood, Hollywood, Pakistani, Bengali, and Tamil movies. A total of 20 movies from different industries were selected for the experiments. On Bollywood movies, we achieved 97.22% recall in song extraction, whereas the recall on Hollywood musical movies is 80%. The test result on Pakistani, Tamil, and Bengali movies is 87.09%.
20.
In this paper, a new rhythm based game for tutored music learning is presented. The main differences with similar existing systems are: i) songs can be automatically extracted from any music file or printed score; ii) it works with multiple interfaces, ranging from any MIDI controller to most popular game controllers; iii) note sequences are obtained from the melody itself rather than from time features alone. The whole system has been successfully tested for different songs using different combinations of music instances and game controllers.