Similar literature
1.
Content-based audio classification and retrieval by support vector machines (total citations: 11; self-citations: 0; citations by others: 11)
Support vector machines (SVMs) have recently been proposed as a new learning algorithm for pattern recognition. In this paper, SVMs with a binary tree recognition strategy are used to tackle the audio classification problem. We illustrate the potential of SVMs on a common audio database, which consists of 409 sounds in 16 classes. We compare SVM-based classification with other popular approaches. For audio retrieval, we propose a new metric, called distance-from-boundary (DFB). When a query audio is given, the system first finds a boundary inside which the query pattern is located. Then, all the audio patterns in the database are sorted by their distances to this boundary. All boundaries are learned by the SVMs and stored together with the audio database. Experimental comparisons for audio retrieval are presented to show the superiority of this novel metric over other similarity measures.
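The DFB ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear boundary given by a hypothetical weight vector `w` and offset `b` (an SVM boundary may be nonlinear), and it assumes that patterns are ordered by signed distance, with those deepest inside the boundary first.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def rank_by_dfb(w, b, database):
    """Return item names sorted by signed distance to the boundary.

    Assumption: a larger positive distance means the pattern lies deeper
    inside the region that contains the query, so it is ranked first.
    """
    return sorted(database,
                  key=lambda name: signed_distance(w, b, database[name]),
                  reverse=True)


# Hypothetical two-dimensional feature vectors for three sounds.
sounds = {"a": (3.0, 0.0), "b": (1.5, 0.0), "c": (0.0, 0.0)}
ranking = rank_by_dfb((1.0, 0.0), -1.0, sounds)
```

Because the boundary is learned once and stored with the database, a query only needs to select the boundary that encloses it; the ranking itself is a sort over precomputable distances.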

2.
Querying polyphonic music from a large data collection is an interesting topic. Recently, researchers have attempted to provide efficient methods for content-based retrieval in polyphonic music databases where queries are polyphonic. However, most of them do not work well for similarity search, which is important to many applications. In this paper, we propose three polyphonic representations with the associated similarity measures and a novel method to retrieve k music works that contain segments most similar to the query. In general, most index-based methods for similarity search generate all the possible answers to the query and then perform exact matching on the index for each possible answer. Based on the edit distance, our method generates only a few possible answers by performing deletion and/or replacement operations on the query. Each possible answer is then used to perform exact matching on a list-based index, which allows insertion operations to be performed. For each possible answer, its edit distance to the query is regarded as a lower bound of the edit distances between the matched results and the query. Based on the kNN results that match a possible answer, the possible answers that cannot provide better results are skipped. Using this mechanism, we design a method for efficient kNN search in polyphonic music databases. The experimental results show that our method outperforms previous methods in efficiency. We also evaluate the effectiveness of our method by showing the search results to musician and nonmusician user groups. The experimental results provide useful guidelines for the design of a polyphonic music database.
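For reference, the edit distance that the method builds on can be sketched as below. This is a brute-force kNN baseline under stated assumptions, not the index-based method of the abstract: it compares the query with every stored sequence, and the pitch sequences and the name `k_nearest` are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # replace or match
        prev = curr
    return prev[-1]


def k_nearest(query, database, k):
    """Names of the k stored sequences closest to the query (brute force)."""
    return sorted(database,
                  key=lambda name: edit_distance(query, database[name]))[:k]


# Hypothetical monophonic pitch sequences (MIDI note numbers).
works = {
    "a": [60, 62, 64, 65],
    "b": [60, 62, 65],
    "c": [70, 71, 72, 73],
}
nearest = k_nearest([60, 62, 64, 65], works, 2)
```

The pruning idea in the abstract rests on the same quantity: a candidate generated from the query at edit distance d cannot yield a match closer than d, so candidates with d no better than the current k-th result can be skipped.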

3.
This paper presents a tunable content-based music retrieval (CBMR) system suitable for the retrieval of music audio clips. The audio clips are represented as extracted feature vectors. The CBMR system is expert-tunable by altering the feature space. The feature space is tuned according to expert-specified similarity criteria expressed in terms of clusters of similar audio clips. The main goal of tuning the feature space is to improve retrieval performance, since some features may have more impact on perceived similarity than others. The tuning process utilizes our genetic algorithm. The R-tree index for efficient retrieval of audio clips is based on the clustering of feature vectors. For each cluster a minimal bounding rectangle (MBR) is formed, thus providing objects for indexing. Inserting new nodes into the R-tree is performed efficiently because of the chosen Quadratic Split algorithm. Our CBMR system implements the point query and the n-nearest neighbors query with O(log n) time complexity. Different objective functions based on cluster similarity and dissimilarity measures are used for the genetic algorithm. We have found that all of them have a similar impact on retrieval performance in terms of precision and recall. The paper includes experimental results measuring retrieval performance, reporting significant improvement over the untuned feature space.
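The MBR construction mentioned above can be illustrated with a short sketch. It only shows how a bounding rectangle is formed for one cluster and how a point query tests containment; the R-tree itself, the Quadratic Split, and the genetic tuning are not reproduced, and the function names are hypothetical.

```python
def mbr(points):
    """Minimal bounding rectangle of a cluster as (lows, highs) per dimension."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs


def contains(rect, point):
    """True if the point lies inside the rectangle, borders included."""
    lows, highs = rect
    return all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs))


# Hypothetical cluster of two-dimensional feature vectors.
cluster = [(1, 5), (3, 2), (2, 4)]
rect = mbr(cluster)
```

In an R-tree, a point query descends only into nodes whose rectangle passes the `contains` test, which is what keeps the search from visiting every cluster.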

4.
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.

5.
6.
Given a large audio database of music recordings, the goal of classical audio identification is to identify a particular audio recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness towards noise, MP3 compression artifacts, and uniform temporal distortions, the notion of similarity is rather close to identity. In this paper, we address a higher-level retrieval problem, which we refer to as audio matching: given a short query audio clip, the goal is to automatically retrieve all excerpts from all recordings within the database that musically correspond to the query. In our matching scenario, as opposed to classical audio identification, we allow semantically motivated variations as they typically occur in different interpretations of a piece of music. To this end, this paper presents an efficient and robust audio matching procedure that works even in the presence of significant variations, such as nonlinear temporal, dynamical, and spectral deviations, where existing algorithms for audio identification would fail. Furthermore, the combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.

7.
This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively; and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
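The melody comparison step relies on dynamic time warping. The sketch below shows plain single-pass DTW over two note sequences; it is an assumption-laden baseline, since the multiple-pass scheme and the multiple-level data abstraction (MLDA) of the abstract are not reproduced, and the absolute pitch difference used as local cost is an illustrative choice.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Local cost is the absolute difference between aligned elements
    (an assumption; other local costs are possible).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A held note in the sung query does not add cost against the reference.
same_melody = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The warping is what makes the comparison tolerant of tempo differences between a sung query and the song, at the price of a quadratic table per comparison, which is why the abstract pairs DTW with onset detection and data abstraction to reduce the number and size of comparisons.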

8.
In this paper, we propose a unified approach to fast index-based music recognition. As an important area within the field of music information retrieval (MIR), the goal of music recognition is, given a database of musical pieces and a query document, to locate all occurrences of that document within the database, up to certain possible errors. In particular, the identification of the query with regard to the database becomes possible. The approach presented in this paper is based on a general algorithmic framework for searching complex patterns of objects in large databases. We describe how this approach may be applied to two important music recognition tasks: the polyphonic (musical score-based) search in polyphonic score data and the identification of pulse-code modulation audio material from a given acoustic waveform. We give an overview of the various aspects of our technology, including fault-tolerant search methods. Several areas of application are suggested. We describe several prototypic systems we have developed for those applications, including the notify! and the audentify! systems for score- and waveform-based music recognition, respectively.

9.
10.
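Similarity retrieval in multimedia databases based on long-term learning (total citations: 5; self-citations: 0; citations by others: 5)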
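Content-based similarity retrieval is one of the important topics in multimedia database research. In recent years, using user relevance feedback to improve retrieval performance has become a new research focus. In traditional relevance feedback methods, however, the feedback history accumulated by the system is not fully exploited. To further improve the performance of retrieval systems, a relevance feedback retrieval method is proposed that performs online collaborative-filtering analysis of the logs of relevance feedback sequences. The method uses the edit distance to measure the similarity of users' feedback sequences and, following the idea of collaborative filtering, predicts the semantic relevance between the media objects in the database and the current query, thereby improving retrieval results. A prototype image database retrieval system was implemented. Experiments on a database of 11 000 images show that, compared with traditional relevance feedback techniques, the method clearly improves retrieval performance.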

11.
We present a novel interface to (portable) music players that benefits from intelligently structured collections of audio files. For structuring, we calculate similarities between every pair of songs and model a travelling salesman problem (TSP) that is solved to obtain a playlist (i.e., the track ordering during playback) where the average distance between consecutive pieces of music is minimal according to the similarity measure. The similarities are determined using both audio signal analysis of the music tracks and Web-based artist profile comparison. Indeed, we show how to enhance the quality of the well-established methods based on audio signal processing with features derived from Web pages of music artists. Using TSP allows for creating circular playlists that can be easily browsed with a wheel as input device. We investigate the usefulness of four different TSP algorithms for this purpose. For evaluating the quality of the generated playlists, we apply a number of quality measures to two real-world music collections. It turns out that the proposed combination of audio and text-based similarity yields better results than the initial approach based on audio data only. We implemented an audio player as a Java applet to demonstrate the benefits of our approach. Furthermore, we present the results of a small user study conducted to evaluate the quality of the generated playlists.
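The playlist construction can be sketched with a simple TSP heuristic. The abstract does not name the four algorithms it investigates, so the nearest-neighbor heuristic below is an assumption chosen for brevity; the distance matrix is hypothetical and would in practice come from the combined audio and Web-based similarity.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy TSP heuristic: always move to the closest unvisited track."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        # Ties are broken by track index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Length of the circular playlist, including the step back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


# Hypothetical symmetric distances between four tracks.
distances = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
playlist = nearest_neighbor_tour(distances)
length = tour_length(distances, playlist)
```

Counting the closing edge in `tour_length` matches the circular playlist of the abstract, where the last track leads back to the first as the wheel is turned.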

12.
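In recent years, content-based music retrieval has received growing attention, and many retrieval methods have been proposed. Most of them concentrate on accurately characterizing one or two features of the music so as to reflect one or two of its prominent properties. This paper takes a completely different approach: it represents music by a feature matrix extracted from the spectrogram, so that the similarity between a query fragment and a candidate piece in the database becomes the similarity between two feature matrices. Experimental results show that the method is simple in both procedure and computation and achieves good retrieval results.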

13.
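Cognitive science indicates that face image retrieval based on manifold learning can accurately reflect the intrinsic similarity of face images and the nature of human visual perception. A relevance-feedback-based high-dimensional index for faces, NDL, is proposed to improve the performance of face image retrieval. On top of this index, a similarity query in manifold space, the virtual k-nearest-neighbor query (Vk-NN), is proposed; it is designed specifically for NDL-based face retrieval. First, the similarity between any two face images is computed under a threshold constraint to build a two-dimensional distance map called the neighbor distance list (NDL). The distance values are then indexed with a B+-tree. Finally, the Vk-NN query in the high-dimensional manifold space is transformed into a B+-tree-based query in one-dimensional space. Experiments show that the retrieval efficiency of the NDL index in manifold space is clearly better than that of sequential retrieval, and that the index is particularly suitable for retrieval over massive collections of face images.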

14.
A new scheme for learning a similarity measure is proposed for content-based image retrieval (CBIR). It learns a boundary that separates the images in the database into two clusters. Images inside the boundary are ranked by their Euclidean distances to the query. The scheme is called constrained similarity measure (CSM), which not only takes into consideration the perceptual similarity between images, but also significantly improves the retrieval performance of the Euclidean distance measure. Two machine learning techniques, support vector machine (SVM) and AdaBoost, are utilized to learn the boundary. They are compared to see their differences in boundary learning. The positive and negative examples used to learn the boundary are provided by the user through relevance feedback. The CSM metric is evaluated on a large database of 10009 natural images with an accurate ground truth. Experimental results demonstrate the usefulness and effectiveness of the proposed similarity measure for image retrieval.
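The two-stage ranking of CSM can be sketched as a filter followed by a sort. In this minimal illustration the learned boundary is replaced by a hypothetical predicate `inside`, which stands in for a trained SVM or AdaBoost classifier; the feature vectors are made up for the example.

```python
import math


def csm_rank(query, database, inside):
    """Keep only images inside the boundary, then rank them by
    Euclidean distance to the query (closest first)."""
    kept = [(name, feats) for name, feats in database.items() if inside(feats)]
    kept.sort(key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in kept]


# Hypothetical two-dimensional image features.
images = {"a": (1.0, 1.0), "b": (2.0, 2.0), "c": (0.5, 0.5), "d": (5.0, 5.0)}


def inside(feats):
    """Stand-in for the learned boundary (an assumption, not a trained model)."""
    return feats[0] + feats[1] > 1.5


ranking = csm_rank((2.0, 2.0), images, inside)
```

Note that image "c" is excluded even though it is closer to the query than "d": the boundary constrains the candidate set before the Euclidean distance is consulted, which is the point of the constrained measure.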

15.
Qi Xiaoqian (齐晓倩), Chen Hongchang (陈鸿昶), Huang Hai (黄海). Computer Engineering (《计算机工程》), 2011, 37(19): 160-162
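Because audio files are large and their data are correlated to some degree, a two-step fixed audio retrieval method based on the K-L distance is proposed. The method first uses a histogram retrieval method with a variable threshold to quickly select the speech files with high similarity, and then uses the K-L distance between feature matrices to compare the remaining speech precisely, with good results. Experimental results show that the method achieves a retrieval accuracy of about 90%.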
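The two-step idea can be sketched as a cheap histogram filter followed by a K-L comparison. Several simplifications are assumed here and are not taken from the paper: both steps operate on the same normalized histograms rather than on feature matrices, the filter uses histogram intersection with a fixed rather than variable threshold, and the K-L distance is taken in its symmetric form.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence between two discrete distributions (eps avoids log 0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric K-L distance: D(p||q) + D(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def histogram_intersection(p, q):
    """Overlap of two normalized histograms, between 0 and 1."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def two_step_search(query, database, threshold):
    """Step 1: keep files whose histogram overlap reaches the threshold.
    Step 2: rank the survivors by symmetric K-L distance to the query."""
    candidates = {name: hist for name, hist in database.items()
                  if histogram_intersection(query, hist) >= threshold}
    return sorted(candidates,
                  key=lambda name: symmetric_kl(query, candidates[name]))


# Hypothetical normalized histograms for three audio files.
files = {"x": [0.5, 0.3, 0.2], "y": [0.4, 0.4, 0.2], "z": [0.1, 0.1, 0.8]}
result = two_step_search([0.5, 0.3, 0.2], files, threshold=0.8)
```

The split keeps the more expensive K-L comparison off the bulk of the database, since only files that pass the coarse filter reach the second step.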

16.
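As an important form of music retrieval, query by humming has attracted wide attention because of its effectiveness and convenience. A new fast query-by-humming technique based on a score matrix is proposed. First, according to the characteristics of hummed music, the music database and the hummed fragment supplied by the user are divided into musical phrases at natural pauses; the K-means clustering algorithm is used to compute the pitch similarity of the phrase fragments, and a position-specific score matrix is extracted from the clustering result. In addition, based on the score matrix, an NA matching algorithm and two accelerated segment-scoring methods are proposed: the sequential look-ahead scoring algorithm (SLS) and the permutation-matrix look-ahead scoring algorithm (PLA). Experimental results show that the proposed score-matrix-based music retrieval technique returns query results quickly and effectively, and that the PLA algorithm yields more effective humming retrieval results.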

17.
18.
Retrieving similar images from large image databases is a challenging task for today’s content-based retrieval systems. Aiming at high retrieval performance, these systems frequently capture the user’s notion of similarity through expressive image models and adaptive similarity measures. On the query side, image models can significantly differ in quality compared to those stored on the database side. Thus, similarity measures have to be robust against these individual quality changes in order to maintain high retrieval performance. In this paper, we investigate the robustness of the family of signature-based similarity measures in the context of content-based image retrieval. To this end, we introduce the generic concept of average precision stability, which measures the stability of a similarity measure with respect to changes in quality between the query and database side. In addition to the mathematical definition of average precision stability, we include a performance evaluation of the major signature-based similarity measures focusing on their stability with respect to querying image databases by examples of varying quality. Our performance evaluation on recent benchmark image databases reveals that the highest retrieval performance does not necessarily coincide with the highest stability.
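The stability concept is built on average precision, which can be computed as sketched below. Only the standard average precision of a single ranked result list is shown; the paper's mathematical definition of average precision stability is not given in the abstract and is not reproduced here. The item names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items.

    Precision is sampled at each rank where a relevant item appears and
    averaged over the total number of relevant items.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, 1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2.
ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
```

Comparing this value for query examples of different quality against the same database would be one way to observe the effect the abstract describes, namely that a measure with the highest average precision need not be the one whose precision changes least.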

19.
20.