We consider representing a short temporal fragment of musical audio as a dynamic texture, a model of both the timbral and rhythmical qualities of sound, two of the important aspects required for automatic music analysis. The dynamic texture model treats a sequence of audio feature vectors as a sample from a linear dynamical system. We apply this new representation to the task of automatic song segmentation. In particular, we cluster audio fragments, extracted from a song, as samples from a dynamic texture mixture (DTM) model. We show that the DTM model can both accurately cluster coherent segments in music and detect transition boundaries. Moreover, the generative character of the proposed model of music makes it amenable to a wide range of applications besides segmentation. As examples, we use DTM models of songs to suggest possible improvements in other music information retrieval applications such as music annotation and similarity.

Much research in music information retrieval has focused on query-by-humming systems, which search melodic databases using sung queries. The database retrieval aspect of such systems has received considerable attention, but query processing and the melodic representation have not been examined as carefully. Common methods for query processing are based on musical intuition and historical momentum rather than specific performance criteria; existing systems often employ rudimentary note segmentation or coarse quantization of note estimates. In this work, we examine several alternative query processing methods as well as quantized melodic representations. One common difficulty with designing query-by-humming systems is the coupling between system components. We address this issue by measuring the performance of the query processing system both in isolation and coupled with a retrieval system. We first measure the segmentation performance of several note estimators. We then compute the retrieval accuracy of an experimental query-by-humming system that uses the various note estimators along with varying degrees of pitch and duration quantization. The results show that more advanced query processing can improve both segmentation performance and retrieval performance, although the best segmentation performance does not necessarily yield the best retrieval performance. Further, coarsely quantizing the melodic representation generally degrades retrieval accuracy.

Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational multimedia database management system so that symbolic images can be stored and retrieved based on their content are described. Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented. Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of). Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion time, storage space, retrieval accuracy, and retrieval time. Received June 12, 1998 / Accepted October 13, 1998

In this paper, we introduce the variability of syntactic phrases and propose a new retrieval approach that reflects the variability of syntactic phrase representation. With a variability measure for a phrase, we can estimate how likely a phrase in a given query is to appear in relevant documents and control the impact of syntactic phrases in a retrieval model. Experimental results over different types of queries and document collections show that our retrieval model based on the variability of syntactic phrases is very effective in terms of retrieval performance, especially for long natural language queries.

It is foreseen that more and more music objects in symbolic format and multimedia objects, such as audio, video, or lyrics, integrated with symbolic music representation (SMR) will be published and broadcast via the Internet. The SMRs of the flowing songs or multimedia objects will form a music stream. Many interesting applications based on music streams, such as interactive music tutorials, distance music education, and similar theme searching, make research on content-based retrieval over music streams very important. We consider multiple queries with error tolerances over music streams and address the issue of approximate matching in this environment. We propose a novel approach to continuously process multiple queries over the music streams for finding all the music segments that are similar to the queries. Our approach is based on the concept of n-grams, and two mechanisms are designed to reduce the heavy computation of approximate matching. One mechanism uses the clustering of query n-grams to prune the query n-grams that are irrelevant to the incoming data n-gram. The other mechanism records the data n-gram that matches a query n-gram as a partial answer and incrementally merges the partial answers of the same query. We implement a prototype system for experiments in which songs in the MIDI format are continuously broadcast, and the user can specify musical segments as queries to monitor the music streams. Experimental results show the effectiveness and efficiency of the proposed approach.

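The arbitrariness of humming and the errors of music feature extraction algorithms both degrade the performance of query-by-humming music retrieval systems. To address these problems, vowel-frame detection is used to obtain fairly accurate note boundaries and thereby segment the notes; relative pitch and duration are then extracted from the segmented notes to form a symbolic description; finally, the symbolic descriptions around the pitch and duration extrema of the hummed fragment are used as features and matched against the data in the database to obtain the most similar candidate music. Experiments show that, for untrained hummers, the method achieves a top-1 matching accuracy above 70%, that matching is much faster than with traditional methods, and that retrieval performance largely meets the needs of practical applications.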
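
The relative pitch and duration description mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so the input format (a list of MIDI pitch and duration pairs), the function name `relative_description`, and the rounding are assumptions made only for illustration.

```python
def relative_description(notes):
    """Turn segmented notes into a relative symbolic description.

    `notes` is a list of (midi_pitch, duration_seconds) pairs (an assumed
    input format). Each output symbol is (pitch interval in semitones,
    duration ratio) between one note and the next, so the description does
    not change when the melody is transposed or sung at another tempo.
    """
    symbols = []
    for (p0, d0), (p1, d1) in zip(notes, notes[1:]):
        symbols.append((p1 - p0, round(d1 / d0, 2)))
    return symbols


# A hummed fragment and the same fragment sung higher and slower.
hummed = [(60, 0.5), (62, 0.5), (64, 1.0), (60, 0.5)]
shifted = [(p + 5, d * 1.2) for p, d in hummed]
```

Both fragments yield the same symbol sequence, which is the property that makes a relative description tolerant of untrained singers who start on an arbitrary note.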

Finding a piece of music based on its content is a key problem in music information retrieval. For example, a user may be interested in finding music based on knowledge of only a small fragment of the overall tune. In this paper, we consider the searching of musical audio using symbolic queries. We first propose a relative pitch approach for representing queries and pieces. Experiments show that this technique, while effective, works best when the whole tune is used as a query. We then present an algorithm for matching based on a pitch classes approach, using the longest common subsequence between a query and target. Experimental evaluation shows that our technique is highly effective, with a mean average precision of 0.77 on a collection of 1808 recordings. Significantly, our technique is robust for truncated queries, being able to maintain effectiveness and to retrieve correct answers whether the query fragment is taken from the beginning, middle, or end of a piece. This represents a significant reduction in the burden placed on users when formulating queries.
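
As a rough illustration of matching on pitch classes with the longest common subsequence (LCS), the sketch below reduces MIDI note numbers to pitch classes and scores a query against a target. The abstract does not specify how the LCS length is turned into a score, so normalising by the query length, along with the function names, is an assumption.

```python
def pitch_classes(midi_notes):
    """Map MIDI note numbers to pitch classes 0..11 (octave is discarded)."""
    return [n % 12 for n in midi_notes]


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def match_score(query, target):
    """Fraction of the query found, in order, in the target (assumed scoring)."""
    q, t = pitch_classes(query), pitch_classes(target)
    return lcs_length(q, t) / len(q) if q else 0.0
```

Because a subsequence may start anywhere in the target, a query taken from the middle or end of a piece can still score highly, which is consistent with the robustness to truncated queries reported above.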

Image database design based on 9D-SPA representation for spatial relations   Total citations: 2, self-citations: 0, citations by others: 2
Spatial relationships between objects are important features for designing a content-based image retrieval system. We propose a new scheme, called 9D-SPA representation, for encoding the spatial relations in an image. With this representation, important functions of intelligent image database systems such as visualization, browsing, spatial reasoning, iconic indexing, and similarity retrieval can be easily achieved. The capability of discriminating images based on 9D-SPA representation is much more powerful than any spatial representation method based on minimum bounding rectangles or centroids of objects. The similarity measures using 9D-SPA representation provide a wide range of fuzzy matching capability in similarity retrieval to meet different users' requirements. Experimental results showed that our system is very effective in terms of recall and precision. In addition, the 9D-SPA representation can be incorporated into a two-level index structure to help reduce the search space of each query. The experimental results also demonstrated that, on average, only 0.1254 percent to 1.6829 percent of symbolic pictures (depending on various degrees of similarity) were accessed per query in an image database containing 50,000 symbolic pictures.

We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
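
The counting step described in this abstract can be sketched as follows. This is a deliberately exhaustive version for illustration only: the abstract relies on LSH to avoid comparing every pair, and it does not state the feature type, shingle width, or distance function, so Euclidean distance, the function names, and all parameters here are assumptions.

```python
import math


def shingles(frames, width):
    """Concatenate `width` consecutive feature frames into one vector each."""
    return [
        [value for frame in frames[i:i + width] for value in frame]
        for i in range(len(frames) - width + 1)
    ]


def count_close_pairs(query_frames, target_frames, width, threshold):
    """Count (query shingle, target shingle) pairs closer than `threshold`.

    Brute force over all pairs; a real system would replace the inner loop
    with a near-neighbor index such as LSH.
    """
    q = shingles(query_frames, width)
    t = shingles(target_frames, width)
    return sum(1 for a in q for b in t if math.dist(a, b) < threshold)


# Toy one-dimensional "features": the target contains the query plus one frame.
query = [[0.0], [1.0], [2.0]]
target = [[0.0], [1.0], [2.0], [9.0]]
```

A higher count suggests that more of the query's temporal content recurs in the target; choosing the threshold is the part the abstract addresses through its analysis of minimum between-shingle distances.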

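廖祥文, 刘德元, 桂林, 程学旗, 陈国龙. Journal of Software (《软件学报》), 2018, 29(10): 2899-2914
Opinion retrieval is a hot research topic in natural language processing. During retrieval, existing opinion retrieval models often cannot abstract words to the knowledge and concept level according to their context, ignore the semantic relations between words at the semantic level, and lack the ability to generalize opinions at the opinion level. An opinion retrieval method that fuses text conceptualization and network embedding is therefore proposed. The method first uses a knowledge graph to conceptualize the user query and the text into the correct concept space, and uses network embedding to represent the word nodes of the knowledge graph as low-dimensional vectors. It then derives vectors for the query and the text from the word vectors and computes the relevance between the user query and the text with the cosine formula. Next, a classification method based on statistical machine learning is introduced to mine the opinions in the text. Finally, features are constructed from the concept space, the network embedding space, and the opinion analysis results, and are supplied to the opinion retrieval model. Experiments show that the proposed retrieval model effectively improves the opinion retrieval performance of several retrieval models. The opinion retrieval method based on the unified relevance model improves MAP over the baseline by 6.1% and 9.3% on the two experimental datasets, and the opinion retrieval method based on learning to rank improves MAP over the baseline by 2.3% and 14.6% on the two datasets.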

We present an approach to similarity‐based retrieval from knowledge bases that takes into account both the structure and semantics of knowledge base fragments. Those fragments, or analogues, are represented as sparse binary vectors that allow a computationally efficient estimation of structural and semantic similarity by the vector dot product. We present the representation scheme and experimental results for the knowledge base that was previously used for testing of leading analogical retrieval models MAC/FAC and ARCS. The experiments show that the proposed single‐stage approach provides results compatible with or better than the results of two‐stage models MAC/FAC and ARCS in terms of recall and precision. We argue that the proposed representation scheme is useful for large‐scale knowledge bases and free‐structured database applications.
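
To show why a dot product over sparse binary vectors is cheap, the sketch below stores each vector as the set of indices of its 1-bits, so the dot product becomes the size of a set intersection. How the abstract's scheme actually encodes structure and semantics into those vectors is not described here; the encodings, names, and toy data are hypothetical.

```python
def dot(a, b):
    """Dot product of two binary vectors stored as sets of active indices."""
    return len(a & b)


def rank_by_similarity(query, base):
    """Return fragment names ordered by decreasing dot product with `query`."""
    return sorted(base, key=lambda name: dot(query, base[name]), reverse=True)


# Hypothetical encoded fragments; real vectors would be far longer and sparser.
base = {
    "fragment_a": frozenset({1, 5, 9, 12}),
    "fragment_b": frozenset({2, 5, 40}),
    "fragment_c": frozenset({7, 8}),
}
query = frozenset({1, 5, 9, 33})
```

The cost of each comparison depends on the number of active bits rather than on the vector dimension, which is what makes a single-stage scan over a large knowledge base plausible.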

This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm, widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.

As an effective technique to manage and explore large-scale video collections, personalized video search has received great attention in recent years. One of the key problems in developing the related techniques is how to design and evaluate the similarity measures. Most of the existing approaches simply adopt traditional Euclidean distance or its variants. Consequently, they generally suffer from two main disadvantages: (1) low effectiveness: retrieval accuracy is poor, and one of the main reasons is that very little research has been carried out on designing an effective fusion scheme for integrating multimodal information (e.g., text, audio and visual) from video sequences; and (2) poor scalability: the development process of the video similarity metrics is largely disconnected from that of the relevant database access methods (indexing structures). This article reports a new distance metric called personalized video distance to effectively fuse information about individual preference and multimodal properties into a compact signature. Moreover, a novel hashing-based indexing structure has been designed to facilitate a fast retrieval process and better scalability. A set of comprehensive empirical studies has been carried out based on two large video test collections and carefully designed queries with different complexities. We observe significant improvements over the existing techniques on various aspects.

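于邓, 刘玉杰, 邢敏敏, 李宗民, 李华. Journal of Software (《软件学报》), 2019, 30(11): 3567-3577
In the field of sketch-based image retrieval (SBIR), a new retrieval model for hand-drawn sketches is introduced. Hand-drawn sketches differ greatly from natural images: compared with natural images, sketches are a highly abstract visual expression, and the feature descriptors produced by existing feature extraction methods cannot effectively fit the content of a sketch. For the same object, the way different people depict and express it in a sketch also varies greatly, which makes matching sketches to natural images even harder. Mapping sketches and natural images into the same visual domain is likewise a difficult task. Sketch-based retrieval is therefore widely regarded as a challenging task. A strategy is proposed that maps sketches and natural images into the same visual domain at multiple levels to solve the cross-domain problem. A multi-layer deep fusion convolutional neural network framework is also introduced to train and obtain multi-layer feature representations of sketches and natural color images. Retrieval experiments on the Flickr15k image database show that the features learned by the multi-layer deep fusion convolutional network achieve higher retrieval accuracy than existing hand-crafted features and than features from convolutional neural networks (CNNs) trained on natural images or on sketches.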
