Similar literature
1.
In the age of digital information, audio data has become an important part of many modern computer applications, and audio classification and indexing have become a focus of research in audio processing and pattern recognition. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients (LPCC) and mel-frequency cepstral coefficients, are extracted to characterize the audio content. The autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors. The proposed method then uses a Gaussian mixture model (GMM)-based classifier in which the feature vectors from each class are used to train the GMM for that class. During testing, the likelihood of a test sample belonging to each model is computed, and the sample is assigned to the class whose model produces the highest likelihood. Audio clip extraction, feature extraction, index creation, and retrieval of the query clip are the major issues in automatic audio indexing and retrieval. A method for indexing the classified audio using LPCC features and the k-means clustering algorithm is proposed.
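The decision rule in the abstract above (score a test sample under each class model, pick the highest likelihood) can be sketched as follows. This is a minimal illustration only: it assumes the GMMs are already trained and have diagonal covariances, and the names `MODELS`, `classify` and the toy parameters are hypothetical, not taken from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(frames, models):
    """Return the class whose GMM gives the highest total log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)

# Hypothetical, hand-picked models over 2-D features (not trained on real audio).
MODELS = {
    "music": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "news": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}

print(classify([[0.2, 0.1], [0.9, 1.2]], MODELS))  # prints "music"
```

Summing per-frame log-likelihoods treats the frames of a clip as independent, which is a common simplifying assumption rather than something the abstract states.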

2.
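A New Approximate Melody Matching Method and Its Application in a Query-by-Humming System   Total citations: 20, self-citations: 0, citations by others: 20
This paper proposes a new approach to approximate melody matching, the linear alignment matching method, and builds a prototype query-by-humming system on top of it. Unlike existing content-based music retrieval methods, the algorithm is not based on approximate string matching, statistical models, or feature spaces. Instead, it is a new algorithm that exploits the geometric similarity of the pitch contours of similar melodies and considers pitch and rhythm features together. Experiments were conducted to test the algorithm's effectiveness: in a search space of 3864 pieces of music, 62 hummed vocal queries were retrieved, and linear alignment matching achieved a top-3 hit rate of 90.3%, more than 11% higher than traditional approximate string matching algorithms. This result strongly indicates that linear alignment matching is effective and that it is feasible to apply it to large-scale digital music retrieval engines.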

3.
The content‐based classification and retrieval of real‐world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied over the last two decades, most current retrieval systems cannot provide flexible querying of audio clips because audio information in the real world is of mixed type (e.g., speech over music and speech over environmental sound). We present here a complete, scalable, and extensible content‐based classification and retrieval system for mixed‐type audio clips. The system lets users query audio data semantically and flexibly in four alternative ways: querying by mixed‐type audio classes, querying by domain‐based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). To reduce retrieval time, a hash‐based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results demonstrate that the Audio Spectrum Flatness feature in the MPEG‐7 standard performs better on music audio samples than on other kinds of audio samples, and that the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.

4.
5.
In content-based image retrieval systems, the content of an image, such as colors, shapes and textures, is used to retrieve images that are similar to a query image. Most existing work focuses on retrieval effectiveness, i.e., on the accuracy (in terms of recall and precision) of different representations of content. In this paper, we address the issue of retrieval efficiency, i.e., the speed of retrieval, since a slow system is not useful for large image databases. In particular, we use the shape feature as the content of an image, and employ the centroid–radii model to represent the shape of objects in an image. This facilitates multi-resolution and similarity retrievals. Furthermore, using the model, the shape of an object can be transformed into a point in a high-dimensional data space. We can thus employ any existing high-dimensional point index to speed up the retrieval of images. We propose a multi-level R-tree index, called the Nested R-tree (NR-tree), and compare its performance with that of the R-tree. Our experimental study shows that NR-trees can reduce the retrieval time significantly compared to the R-tree, and facilitate similarity retrieval. We note that NR-trees can also be used to index the high-dimensional point data commonly found in many other applications.

6.
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.

7.
We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
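The similarity measure described in the abstract above (count query/target shingle pairs whose distance falls below a threshold) can be sketched as below. This is an exhaustive version written for clarity, which is exactly the cost that LSH is meant to avoid. The function names, the one-dimensional toy frames and the threshold value are hypothetical; the abstract does not specify the features or the distance, so Euclidean distance is assumed here.

```python
import math

def make_shingles(frames, width):
    """Concatenate each run of `width` consecutive frames into one vector."""
    return [
        [value for frame in frames[i:i + width] for value in frame]
        for i in range(len(frames) - width + 1)
    ]

def count_close_pairs(query, target, threshold):
    """Count (query, target) shingle pairs closer than `threshold`."""
    return sum(
        1
        for q in query
        for t in target
        if math.dist(q, t) < threshold
    )

# Toy 1-D feature frames for two tracks (illustrative values only).
track_a = [[0.0], [1.0], [2.0], [3.0]]
track_b = [[0.0], [1.0], [2.1], [9.0]]

shingles_a = make_shingles(track_a, width=2)
shingles_b = make_shingles(track_b, width=2)
print(count_close_pairs(shingles_a, shingles_b, threshold=0.5))  # prints 2
```

A higher count suggests that the two tracks share more near-identical temporal fragments; how the threshold is chosen is the subject of the analysis the abstract mentions and is not reproduced here.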

8.
We consider representing a short temporal fragment of musical audio as a dynamic texture, a model of both the timbral and rhythmical qualities of sound, two of the important aspects required for automatic music analysis. The dynamic texture model treats a sequence of audio feature vectors as a sample from a linear dynamical system. We apply this new representation to the task of automatic song segmentation. In particular, we cluster audio fragments, extracted from a song, as samples from a dynamic texture mixture (DTM) model. We show that the DTM model can both accurately cluster coherent segments in music and detect transition boundaries. Moreover, the generative character of the proposed model of music makes it amenable to a wide range of applications besides segmentation. As examples, we use DTM models of songs to suggest possible improvements in other music information retrieval applications such as music annotation and similarity.

9.
10.
11.
In image-based retrieval, global or local features sufficiently discriminative to summarize the image content are commonly extracted first. Traditional features that characterize image content, such as color, texture, shape or corners, are not reliable as a similarity measure. A good match in the feature domain does not necessarily correspond to a similar pair of images. Applying these features as search keys may retrieve dissimilar images (false positives) or leave similar ones behind (false negatives). Moreover, images are inherently ambiguous, since they contain a great amount of information that supports many different interpretations. Using a single image to query a database might employ features that do not match the user's expectation and retrieve results with low precision and recall. How to automatically extract reliable image features as a query key that matches the user's expectation in a content-based image retrieval (CBIR) system is an important topic. The objective of the present work is to propose a multiple-instance learning image retrieval system that incorporates an isometric embedded similarity measure. Multiple-instance learning is a way of modeling ambiguity in supervised learning given multiple examples. From a small collection of positive and negative example images, semantically relevant concepts can be derived automatically and employed to retrieve images from an image database. Each positive and negative example image is represented by a linear combination of fractal orthonormal basis vectors. The mapping coefficients of an image projected onto each orthonormal basis constitute a feature vector. The Euclidean-distance similarity measure is proved to remain consistent, i.e., isometrically embedded, between any image pair before and after the projection onto the orthonormal axes. Not only do similar images generate points close to each other in the feature space, but dissimilar ones also produce feature points far apart. Using an isometric-embedded fractal-based technique to extract reliable image features, combined with a multiple-instance learning paradigm to derive relevant concepts, can produce retrieval results that better match the user's expectation. To demonstrate the feasibility of the proposed approach, two sets of tests for querying an image database are performed: the fractal-based feature extraction algorithm vs. three other feature extractors, and single-instance vs. multiple-instance learning. The retrieval results, execution times and precision/recall curves all favor the proposed multiple-instance fractal-based approach.

12.
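This paper explores two important stages of music emotion classification: feature selection and the classifier. For feature selection, to address the problem that a single feature in traditional algorithms cannot fully express musical emotion, the paper proposes a multi-feature fusion method that combines timbre features with prosodic features as the symbolic representation of musical emotion. For the classifier, the paper adopts a deep belief network, which has performed well in audio retrieval, for music emotion training and classification. Experimental results show that the algorithm performs well on music emotion classification, outperforming both single-feature classification and SVM-based classification.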

13.
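Audio Retrieval Based on Weighted MFCC   Total citations: 1, self-citations: 0, citations by others: 1
By studying audio feature extraction and feature matching algorithms, this paper presents a complete framework for an audio data retrieval system. The framework focuses on audio feature extraction and feature matching. In the feature extraction part, the classical MFCC coefficients are analyzed and MFCC coefficients weighted by the entropy method are proposed, which improves the recognition rate of retrieval. In the matching part, because a matrix of feature parameters characterizes the audio information, a matrix-similarity matching method is introduced, which improves retrieval efficiency. Experimental results show that the system's recognition efficiency improves by 1.2% and its running time falls by 22%, a clear improvement in system performance.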
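As a rough illustration of entropy-method weighting, the sketch below computes one weight per coefficient from a frames-by-coefficients matrix using the textbook entropy weight method: min-max scale each column, turn it into a distribution, and weight each column by one minus its normalized entropy. The abstract does not give the paper's exact normalization, so this form, the function name `entropy_weights` and the toy matrix are all assumptions.

```python
import math

def entropy_weights(rows):
    """Entropy-method weights for the columns of `rows`.

    `rows` is a frames x coefficients matrix with at least two rows.
    Columns whose values vary more informatively get larger weights.
    """
    n = len(rows)
    divergences = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        if span == 0:
            # A constant column carries no information: zero divergence.
            divergences.append(0.0)
            continue
        # Min-max scale to [0, 1] so negative coefficients are handled.
        scaled = [(v - low) / span for v in column]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        # Every column is constant: fall back to uniform weights.
        return [1.0 / len(divergences)] * len(divergences)
    return [d / norm for d in divergences]

# Toy matrix: 3 frames, 2 coefficients (illustrative values, not real MFCCs).
frames = [[1.0, 0.0], [1.0, 1.0], [1.0, 5.0]]
weights = entropy_weights(frames)
weighted = [[w * v for w, v in zip(weights, frame)] for frame in frames]
print(weights)  # prints [0.0, 1.0]
```

In this toy case the first coefficient never changes, so it receives zero weight and the second receives all of it.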

14.
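This paper improves on retrieval methods based on the vector space model and proposes an information retrieval model based on ontology semantics. The WordNet lexicon is used as the reference ontology for computing semantic similarity between concepts. Based on the similarity between index terms in the query, the weights of the index terms in the query vector are adjusted, and the index terms are expanded with synonyms and with hypernyms and hyponyms by reference to the WordNet ontology; the similarity between a query and a document is then defined on this basis. Compared with traditional information retrieval methods based on word forms, the method can improve retrieval precision at the semantic level.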

15.
Widely used in data-driven computer animation, motion capture data is complex both spatially and temporally. The indexing and retrieval of motion data is a hard task that is not yet fully solved. In this paper, we present an efficient motion data indexing and retrieval method based on a self-organizing map and the Smith–Waterman string similarity metric. Existing motion clips are first used to train a self-organizing map and are then indexed by the nodes of the map to obtain motion strings. The Smith–Waterman algorithm, a local similarity measure for string comparison, is used to cluster the motion strings. The motion motif of each cluster is then extracted for the retrieval of example-based queries. As an unsupervised learning approach, our method can cluster motion clips automatically without needing to know their motion types. Experimental results on a dataset of various kinds of motion show that the proposed method not only clusters the motion data accurately but also retrieves appropriate motion data efficiently.
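For readers unfamiliar with the Smith–Waterman algorithm named in the abstract above, here is a minimal sketch of its scoring recurrence with a linear gap penalty. It returns only the best local-alignment score, not the alignment itself. The scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`) and the toy sequences are assumptions for illustration; the abstract does not state which values the authors use.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)  # DP row for the previous element of a
    best = 0
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if x == y else mismatch)
            # Scores are clamped at 0 so an alignment may start anywhere.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best

# Motion strings could be sequences of map-node indices; toy values here.
print(smith_waterman([4, 7, 7, 2], [9, 7, 7, 3]))  # prints 4
print(smith_waterman("ACGT", "AGT"))               # prints 5
```

Because the function only compares elements for equality, it works on strings and on lists of node indices alike.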

16.
Today, digital audio applications are part of our everyday lives. Audio classification can provide powerful tools for content management. If an audio clip can be classified automatically, it can be stored in an organised database, which can improve the management of audio dramatically. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients, are extracted to characterize the audio content. The autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors of a class, and the backpropagation learning algorithm is used to adjust the weights of the network to minimize the mean square error for each feature vector. The proposed method also compares the performance of the AANN with that of a Gaussian mixture model (GMM), in which the feature vectors from each class are used to train the GMM for that class. During testing, the likelihood of a test sample belonging to each model is computed, and the sample is assigned to the class whose model produces the highest likelihood.

17.
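Multimodal Image Retrieval Based on Semantic Learning   Total citations: 1, self-citations: 0, citations by others: 1
To address the semantic gap, a multimodal image retrieval system is designed on the basis of semantic learning. The system combines three query modes for image retrieval. Queries based on visual features rank results through feature extraction and similarity matching. Tag-based queries build on automatic image annotation, but generalize poorly outside the semantic space. Queries by semantic example can largely overcome this shortcoming: by querying in an explicit or implicit semantic space, they produce results that better match human perception. Experimental results show that, compared with image retrieval based on texture features, retrieval by semantic example achieves higher precision and recall.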

18.
This paper presents a study of the Multi-Type Reverse Nearest Neighbor (MTRNN) query problem. Traditionally, a reverse nearest neighbor (RNN) query finds all the objects that have the query point as their nearest neighbor. In contrast, an MTRNN query finds all the objects that have the query point in their multi-type nearest neighbors. Existing RNN queries find an influence set by considering only one feature type. However, the influence of multiple feature types is often critical for strategic decision making in many business scenarios, such as site selection for a new shopping center. To that end, we first formalize the notion of the MTRNN query by considering the influence of multiple feature types. We also propose R-tree based algorithms to find the influence set for a given query point and multiple feature types. Finally, experimental results are provided to show the strength of the proposed algorithms as well as design decisions related to performance tuning.
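To make the traditional RNN definition in the abstract above concrete, the sketch below finds, by brute force, every object whose nearest neighbor is the query point. It covers only the single-type case, not the multi-type MTRNN query or the R-tree based algorithms the paper proposes. The function name, the Euclidean distance and the strict tie-breaking rule are assumptions.

```python
import math

def reverse_nearest_neighbors(query, objects):
    """Return the objects whose nearest neighbor is `query`.

    Each object's candidates are the other objects plus the query point.
    Ties are resolved against the query (strict inequality).
    """
    result = []
    for i, obj in enumerate(objects):
        to_query = math.dist(obj, query)
        others = (
            math.dist(obj, other)
            for j, other in enumerate(objects)
            if j != i
        )
        if all(to_query < d for d in others):
            result.append(obj)
    return result

# Toy 2-D points (illustrative only).
points = [(0, 0), (1, 0), (10, 0)]
print(reverse_nearest_neighbors((9, 0), points))  # prints [(10, 0)]
```

This version compares every pair of objects, so its cost grows quadratically with the number of objects; spatial indexes such as the R-tree exist to avoid that.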

19.
As more and more information is captured and stored in digital form, many techniques and systems have been developed for indexing and retrieval of text documents, audio, images, and video. Retrieval is normally based on similarities between the extracted feature vectors of the query and those of the stored items. Feature vectors are usually multidimensional. When the number of stored objects and/or the number of dimensions of the feature vectors is large, it is too slow to search all stored feature vectors linearly to find those that satisfy the query criteria. Techniques and data structures are thus required to organize feature vectors and manage the search process so that objects relevant to the query can be located quickly. This paper provides a survey of these techniques and data structures.

20.
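Emotion is the most important semantic information in music, and music emotion classification is widely used in music retrieval, music recommendation, music therapy, and related fields. Traditional music emotion classification is mostly audio-based, but with current technology it is difficult to extract semantically relevant features from audio. Lyrics contain emotional information, and incorporating lyrics into music emotion classification can further improve classification performance. This paper studies Chinese lyrics. Because a sound music emotion lexicon is the prerequisite and foundation for lyric sentiment analysis, a Chinese emotion lexicon for the music domain is built with Word2Vec, and Chinese music emotion analysis is performed using emotion-word weighting and part of speech. The paper first builds an emotion word list based on the VA emotion model, then expands it using the word-similarity computation of Word2Vec to build a Chinese music emotion lexicon that records the emotion category and emotion weight of each word. Emotion-word weights are then obtained from the lexicon, feature vectors of the lyric text are built from TF-IDF (Term Frequency-Inverse Document Frequency) and part of speech, and music emotion classification is finally carried out. Experimental results show that the constructed lexicon is better suited to the music domain, and that taking part of speech into account when building feature vectors also improves accuracy.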
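The TF-IDF component mentioned in the abstract above can be sketched as follows. This shows plain TF-IDF only: term frequency is the count divided by document length, and inverse document frequency is the natural log of N over document frequency, which is one common variant among several. The emotion-word weights and part-of-speech factors that the paper adds are not specified in the abstract and are not modeled here; the function name and the toy tokenized lyrics are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Toy pre-tokenized lyrics (illustrative only).
lyrics = [["love", "rain", "love"], ["rain", "night"]]
vectors = tf_idf(lyrics)
print(vectors[0]["rain"])  # prints 0.0 because "rain" occurs in every lyric
```

A term that appears in every document gets weight zero under this variant, which is why smoothed IDF formulas are often preferred in practice.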
