Similar Documents
20 similar documents retrieved.
1.
We present the design of an algorithm for use in an interactive music system that automatically generates music playlists that fit the music preferences of a user. To this end, we introduce a formal model, define the problem of automatic playlist generation (APG), and prove its NP-hardness. We use a local search (LS) procedure employing a heuristic improvement to standard simulated annealing (SA) to solve the APG problem. In order to employ this LS procedure, we introduce an optimization variant of the APG problem, which includes the definition of penalty functions and a neighborhood structure. To improve upon the performance of the standard SA algorithm, we incorporated three heuristics referred to as song domain reduction, partial constraint voting, and a two-level neighborhood structure. We evaluate the developed algorithm by comparing it to a previously developed approach based on constraint satisfaction (CS), both in terms of run time performance and quality of the solutions. For the latter we not only considered the penalty of the resulting solutions, but we also performed a conclusive user evaluation to assess the subjective quality of the playlists generated by both algorithms. In all tests, the LS algorithm was shown to be a dramatic improvement over the CS algorithm.
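As a rough illustration of the local-search idea, the sketch below runs a plain simulated-annealing loop over playlists. The penalty function (adjacent tracks by the same artist) and the single-swap neighborhood are stand-ins chosen for illustration; they are not the paper's penalty functions, song domain reduction, partial constraint voting, or two-level neighborhood.

```python
import math
import random

def penalty(playlist):
    """Toy penalty: count adjacent tracks by the same (hypothetical) artist."""
    return sum(1 for a, b in zip(playlist, playlist[1:]) if a["artist"] == b["artist"])

def neighbor(playlist, pool):
    """Simple neighborhood move: replace one random position with a random pool song."""
    candidate = list(playlist)
    candidate[random.randrange(len(candidate))] = random.choice(pool)
    return candidate

def anneal(pool, length=10, t0=1.0, cooling=0.995, steps=5000):
    """Standard simulated annealing: accept worse candidates with probability exp(-delta/t)."""
    current = random.sample(pool, length)
    best = current
    t = t0
    for _ in range(steps):
        cand = neighbor(current, pool)
        delta = penalty(cand) - penalty(current)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = cand
            if penalty(current) < penalty(best):
                best = current
        t *= cooling
    return best

pool = [{"title": f"song{i}", "artist": f"artist{i % 5}"} for i in range(50)]
print(penalty(anneal(pool)))
```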

2.
3.
We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommendation systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval with a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
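The core shingle-counting step can be sketched as follows. The shingle width, the Euclidean distance, and the hand-picked threshold are illustrative assumptions; the paper's threshold estimation from the distance distribution and the LSH acceleration are omitted.

```python
import numpy as np

def shingles(features, width=4):
    """Stack `width` consecutive feature frames into one shingle vector."""
    return np.array([features[i:i + width].ravel()
                     for i in range(len(features) - width + 1)])

def matched_shingles(query_feats, target_feats, threshold):
    """Count query/target shingle pairs whose Euclidean distance falls below the threshold."""
    q, t = shingles(query_feats), shingles(target_feats)
    dists = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=-1)
    return int((dists < threshold).sum())

rng = np.random.default_rng(0)
query = rng.normal(size=(20, 12))    # e.g., 20 frames of 12-dim chroma-like features
target = np.vstack([query + 0.01 * rng.normal(size=query.shape),   # near-duplicate region
                    rng.normal(size=(30, 12))])                    # unrelated material
print(matched_shingles(query, target, threshold=1.0))
```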

4.
We introduce a novel compression paradigm that generalizes a class of Lempel-Ziv algorithms for lossy compression of multimedia. Based upon the fact that music, in particular electronically generated sound, has a substantial level of repetitiveness within a single clip, we generalize the basic Lempel-Ziv compression algorithm to support representing a single window of audio using a linear combination of filtered past windows. In this positioning paper, we present a detailed overview of the new lossy compression paradigm, identify basic challenges such as similarity search, and present preliminary experimental results on a benchmark of electronically generated musical pieces.
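A toy illustration of the central idea, approximating the current audio window as a least-squares combination of past windows. The paper additionally allows filtered versions of the past windows and wraps this in an LZ-style coder, both of which are omitted here.

```python
import numpy as np

def fit_window(past_windows, current):
    """Least-squares coefficients expressing `current` as a combination of past windows."""
    basis = np.stack(past_windows, axis=1)            # shape: (window_len, n_past)
    coeffs, *_ = np.linalg.lstsq(basis, current, rcond=None)
    residual = current - basis @ coeffs               # what the coder would still need to encode
    return coeffs, residual

rng = np.random.default_rng(1)
past = [rng.normal(size=256) for _ in range(8)]
# A "repetitive" current window: mostly a mix of two earlier windows plus small noise.
current = 0.7 * past[2] + 0.3 * past[5] + 0.01 * rng.normal(size=256)
coeffs, residual = fit_window(past, current)
print(np.round(coeffs, 2), float(np.linalg.norm(residual)))
```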

5.
We explore the use of objective audio signal features to model the individualized (subjective) perception of similarity between music files. We present MUSIPER, a content-based music retrieval system which constructs music similarity perception models of its users by associating different music similarity measures with different users. Specifically, a user-supplied relevance feedback procedure and related neural network-based incremental learning allow the system to determine which subset of a set of objective features more accurately approximates the subjective music similarity perception of a specific user. Our implementation and evaluation of MUSIPER verify the relation between subsets of objective features and individualized music similarity perception and exhibit significant improvement in individualized perceived similarity in subsequent music retrievals.

6.
Generation of Personalized Music Sports Video Using Multimodal Cues
In this paper, we propose a novel automatic approach for personalized music sports video generation. Two research challenges are addressed, specifically semantic sports video content extraction and automatic music video composition. For the first challenge, we propose to use multimodal (audio, video, and text) feature analysis and alignment to detect the semantics of events in broadcast sports video. For the second challenge, we introduce video-centric and music-centric music video composition schemes and propose a dynamic-programming-based algorithm to perform fully automatic or semi-automatic generation of personalized music sports video. The experimental results and user evaluations are promising and show that the music sports videos generated by our system are comparable to professionally generated ones. Our proposed system greatly facilitates the music sports video editing task for both professionals and amateurs.

7.
We propose and evaluate a system for content-based visualization and exploration of music collections. The system is based on a modification of Kohonen’s Self-Organizing Map algorithm and allows users to choose the locations of clusters containing acoustically similar tracks on the music space. A user study conducted to evaluate the system shows that the possibility of personalizing the music space was perceived as difficult. Conversely, the user study and objective metrics derived from users’ interactions with the interface demonstrate that the proposed system helped individuals create playlists faster and, under some circumstances, more effectively. We believe that personalized browsing interfaces are an important area of research in Multimedia Information Retrieval, and both the system and user study contribute to the growing work in this field.
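For reference, a plain Self-Organizing Map over track feature vectors can be sketched as below. The paper's modification that lets users choose cluster locations is not reproduced, and the grid size, learning rate, and random features are illustrative assumptions.

```python
import numpy as np

def train_som(tracks, grid=(6, 6), epochs=20, lr=0.3, sigma=1.5, seed=0):
    """Plain SOM: each grid cell holds a prototype vector in track-feature space."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h, w, tracks.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for _ in range(epochs):
        for x in tracks:
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), (h, w))       # best-matching unit
            grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
            influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
            weights += lr * influence * (x - weights)              # pull neighborhood toward x
    return weights

tracks = np.random.default_rng(2).normal(size=(100, 8))   # 100 tracks, 8 audio features
som = train_som(tracks)
print(som.shape)
```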

8.
Given a large audio database of music recordings, the goal of classical audio identification is to identify a particular audio recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness towards noise, MP3 compression artifacts, and uniform temporal distortions, the notion of similarity is rather close to the identity. In this paper, we address a higher-level retrieval problem, which we refer to as audio matching: given a short query audio clip, the goal is to automatically retrieve all excerpts from all recordings within the database that musically correspond to the query. In our matching scenario, as opposed to classical audio identification, we allow semantically motivated variations as they typically occur in different interpretations of a piece of music. To this end, this paper presents an efficient and robust audio matching procedure that works even in the presence of significant variations, such as nonlinear temporal, dynamical, and spectral deviations, where existing algorithms for audio identification would fail. Furthermore, the combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.

9.
Music Information Retrieval Using Social Tags and Audio
In this paper we describe a novel approach to applying text-based information retrieval techniques to music collections. We represent tracks with a joint vocabulary consisting of both conventional words, drawn from social tags, and audio muswords, representing characteristics of automatically identified regions of interest within the signal. We build vector space and latent aspect models indexing words and muswords for a collection of tracks, and show experimentally that retrieval with these models is extremely well-behaved. We find in particular that retrieval performance remains good for tracks by artists unseen by our models in training, even when tags for their tracks are extremely sparse.
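A toy sketch of retrieval over a joint vocabulary of tag words and audio muswords, assuming each track is already reduced to a bag of such tokens. The latent aspect model is omitted and the track names and musword tokens are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tracks: social tags mixed with audio-derived "musword" tokens.
tracks = {
    "track_a": "indie guitar mellow mw_017 mw_042 mw_042",
    "track_b": "electronic dance mw_101 mw_042",
    "track_c": "guitar acoustic folk mw_017 mw_233",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(tracks.values())

# A query can mix conventional words and muswords in the same vector space.
query = vectorizer.transform(["mellow guitar mw_017"])
scores = cosine_similarity(query, matrix).ravel()
for name, score in sorted(zip(tracks, scores), key=lambda p: -p[1]):
    print(name, round(float(score), 3))
```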

10.
This paper deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bit-rate coding purposes. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learned object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms other distortion measures and that our coder results in better sound quality than baseline transform and parametric coders at 8 and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.
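For context, a pitched sound object built from harmonic sinusoidal partials is commonly modeled as shown below; this is a generic formulation, not necessarily the exact parameterization used in the paper.

```latex
x(t) \;\approx\; \sum_{h=1}^{H} a_h \cos\!\bigl(2\pi\, h f_0\, t + \phi_h\bigr)
```

Here f_0 is the fundamental frequency of the note-like object, and a_h and phi_h are the amplitude and phase of the h-th of H harmonic partials; the coder then interpolates the frequency and amplitude parameters between analysis frames.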

11.
In this paper, we present a new method for the analysis of musical structure that captures local prediction and global repetition properties of audio signals in one information-processing framework. The method is motivated by recent work in music perception where machine features were shown to correspond to human judgments of familiarity and emotional force when listening to music. Using a notion of information rate in a model-based framework, we develop a measure of mutual information between past and present in a time signal and show that it consists of two factors: a prediction property related to data statistics within an individual block of signal features, and a repetition property based on differences in model likelihood across blocks. The first factor, when applied to a spectral representation of audio signals, is known as spectral anticipation, and the second factor is known as recurrence analysis. We present algorithms for estimating these measures and create a visualization that displays their temporal structure in musical recordings. Considering these features as a measure of the amount of information processing that a listening system performs on a signal, information rate is used to detect interest points in music. Several musical works with different performances are analyzed in this paper, and their structure and interest points are displayed and discussed. Extensions of this approach towards a general framework for characterizing the machine listening experience are suggested.
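The information-rate quantity referred to above is the mutual information between the past and the present of the signal, which can be written generically as below; the notation is a standard restatement, not the paper's model-based estimator.

```latex
\mathrm{IR}(x_{1},\dots,x_{n}) \;=\; I\!\left(x_{1},\dots,x_{n-1};\, x_{n}\right) \;=\; H(x_{n}) \;-\; H\!\left(x_{n} \mid x_{1},\dots,x_{n-1}\right)
```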

12.
An audio fingerprint is a compact yet very robust representation of the perceptually relevant parts of an audio signal. It can be used for content-based audio identification, even when the audio is severely distorted. Audio compression changes the fingerprint slightly. We show that these small fingerprint differences due to compression can be used to estimate the signal-to-noise ratio (SNR) of the compressed audio file relative to the original. This is a useful content-based distortion estimate when the original, uncompressed audio file is unavailable. The method uses the audio fingerprints only. For stochastic signals distorted by additive noise, an analytical expression is obtained for the average fingerprint difference as a function of the SNR level. This model is based on an analysis of the Philips robust hash (PRH) algorithm. We show that for uncorrelated signals, the bit error rate (BER) is approximately inversely proportional to the square root of the SNR of the signal. This model is extended to correlated signals and music. For an experimental verification of our proposed model, we divide the field of audio fingerprinting algorithms into three categories. From each category, we select an algorithm that is representative of that category. Experiments show that the behavior predicted by the stochastic model for the PRH also holds for the two other algorithms.
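The stated relationship for uncorrelated signals can be written as follows, with c a scheme-dependent constant; this is a paraphrase of the abstract's claim, not the paper's full derivation.

```latex
\mathrm{BER} \;\approx\; \frac{c}{\sqrt{\mathrm{SNR}}}
```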

13.
The audio stream in video data carries rich semantic information, and analyzing this audio is an indispensable part of content-based video retrieval. This paper focuses on content-based audio scene segmentation: it analyzes various audio features and their extraction methods and, on this basis, proposes a new audio stream segmentation method that partitions the audio stream of video data into audio scenes according to the features of six audio types (silence, music, environmental sound, pure speech, speech over a music background, and speech over an environmental-sound background). Experiments show that the method is effective: while maintaining a reasonable segmentation accuracy, it considerably improves both precision and recall.

14.
Algorithms for automatic playlist generation address the tedious and time-consuming manual selection of musical playlists. These algorithms generate playlists according to the user's music preferences of the moment. The user describes these preferences either by manually supplying a couple of example songs or by defining constraints on the choice of music. Approaches to automatic playlist generation up to now have been based on examining the metadata attached to the music pieces; some of them also took the listening history into account, but the emphasis has been placed on metadata, while the listening history, if used at all, played a minor role. Missing and erroneous metadata are common, especially when music is acquired from the Internet, and when the metadata is missing or wrong, the approaches proposed so far cannot work. Moreover, entering constraints for playlist generation can be difficult. In our approach we ignore the metadata and focus on examining listening habits. We developed two simple algorithms that track the listening habits and form a listener model, a profile of listening habits, which is then used for automatic playlist generation. We also developed a simple media player which tracks the listening habits and generates playlists according to the listener model. We evaluated the solution with a group of users. The experiment was not successful, but it shed new light on the relationship between listening habits and playlist generation.
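A toy listener model built purely from the listening history might look like the following sketch. Song-to-song transition counts and the greedy continuation are illustrative choices only, since the abstract does not specify the two algorithms.

```python
from collections import defaultdict

def build_listener_model(history):
    """Count how often one track followed another in the listening history."""
    transitions = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    return transitions

def generate_playlist(model, seed_track, length=5):
    """Greedily follow the most frequent observed transitions from the seed track."""
    playlist, current = [seed_track], seed_track
    while len(playlist) < length and model[current]:
        current = max(model[current], key=model[current].get)
        playlist.append(current)
    return playlist

history = ["a", "b", "c", "a", "b", "d", "a", "b", "c", "e"]   # hypothetical play log
model = build_listener_model(history)
print(generate_playlist(model, "a"))
```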

15.

Emotion is considered a physiological state that arises whenever an individual perceives a change in their environment or body. A review of the literature shows that combining the electrical activity of the brain with other physiological signals for accurate analysis of human emotions has yet to be explored in depth. On the basis of physiological signals, this work proposes a machine-learning model for calibrating music mood against human emotion. The proposed model consists of three phases: (a) prediction of the mood of a song based on audio signals; (b) prediction of human emotion based on physiological signals from EEG, GSR, ECG, and a pulse detector; and (c) mapping between the music mood and the human emotion, classifying them in real time. Extensive experiments have been conducted on different music mood and human emotion datasets for influential feature extraction, training, testing, and performance evaluation. An effort has been made to observe and measure human emotions with a reasonable degree of accuracy and efficiency by recording a person's bio-signals in response to music. Further, to test the applicability of the proposed work, playlists are generated based on the user's real-time emotion, determined from features of the different physiological sensors, and the mood depicted by musical excerpts. This work could prove helpful for improving mental and physical health through scientific analysis of physiological signals.

16.
In this paper we address the problem of music playlist generation based on the user-personalized specification of context information. We propose a generic semantic multicriteria ant colony algorithm capable of dealing with domain-specific problems through the use of ontologies. It also employs any associated metadata defined in the search space to feed its solution-building process and considers any restrictions the user may have specified. An example is given of the use of the algorithm for the problem of automatic generation of music playlists; some experimental results are presented, and the behavior of the approach in different situations is explained.

17.
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
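A toy version of the per-word Gaussian-mixture idea using scikit-learn. Standard EM is used here instead of the paper's weighted mixture-hierarchies EM, and the vocabulary words and random features are hypothetical stand-ins for real audio descriptors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Hypothetical training data: audio feature frames grouped by annotation word.
frames_by_word = {
    "mellow":    rng.normal(loc=0.0, scale=1.0, size=(300, 6)),
    "distorted": rng.normal(loc=3.0, scale=1.0, size=(300, 6)),
}

# One GMM over the audio feature space per vocabulary word.
models = {w: GaussianMixture(n_components=4, random_state=0).fit(X)
          for w, X in frames_by_word.items()}

def annotate(track_frames, models):
    """Score a new track against every word model via average log-likelihood."""
    scores = {w: float(m.score(track_frames)) for w, m in models.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

new_track = rng.normal(loc=2.8, scale=1.0, size=(50, 6))
print(annotate(new_track, models))
```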

18.
This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively; and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
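A minimal single-pass dynamic time warping distance between two note sequences; the multiple-pass DTW, multiple-level data abstraction, and accompaniment-based melody extraction described above are omitted, and the MIDI note values are hypothetical.

```python
import numpy as np

def dtw_distance(query, target):
    """Classic DTW over 1-D note sequences (e.g., semitone pitch values)."""
    n, m = len(query), len(target)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - target[j - 1])
            # Allow insertion, deletion, or match, as in standard DTW.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

sung_query = [60, 62, 64, 65, 64, 62, 60]            # hypothetical MIDI note numbers
song_melody = [60, 62, 62, 64, 65, 65, 64, 62, 60]
print(dtw_distance(sung_query, song_melody))
```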

19.
Finding a piece of music based on its content is a key problem in music information retrieval. For example, a user may be interested in finding music based on knowledge of only a small fragment of the overall tune. In this paper, we consider the searching of musical audio using symbolic queries. We first propose a relative-pitch approach for representing queries and pieces. Experiments show that this technique, while effective, works best when the whole tune is used as a query. We then present an algorithm for matching based on a pitch-class approach, using the longest common subsequence between a query and target. Experimental evaluation shows that our technique is highly effective, with a mean average precision of 0.77 on a collection of 1808 recordings. Significantly, our technique is robust to truncated queries, being able to maintain effectiveness and retrieve correct answers whether the query fragment is taken from the beginning, middle, or end of a piece. This represents a significant reduction in the burden placed on users when formulating queries.
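A minimal longest-common-subsequence match between a query and a target given as pitch-class sequences. The relative-pitch representation and collection-level ranking are omitted, and the example sequences are hypothetical.

```python
def lcs_length(query, target):
    """Length of the longest common subsequence of two pitch-class sequences."""
    m, n = len(query), len(target)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if query[i - 1] == target[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

query = [0, 2, 4, 5, 4, 2, 0]               # pitch classes of a hummed fragment
target = [7, 0, 2, 2, 4, 5, 5, 4, 2, 0, 7]  # pitch classes extracted from a recording
score = lcs_length(query, target) / len(query)   # normalize by query length
print(round(score, 2))
```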

20.