Found 10 similar documents (search time: 125 ms)
1.
This work addresses the soundtrack indexing of multimedia documents. Our purpose is to detect and locate sound units in order to structure the audio data flow in broadcast programs (reports). We present two audio classification tools that we have developed. The first, a speech/music classification tool, is based on three original features: entropy modulation, stationary segment duration (obtained with a Forward–Backward Divergence algorithm) and number of segments. These are merged with the classical 4 Hz modulation energy. The tool is divided into two classifications (speech/non-speech and music/non-music) and achieves more than 90% accuracy for speech detection and 89% for music detection. The other system, a jingle identification tool, uses a Euclidean distance in the spectral domain to index the audio data flow. Results show that it is efficient: of 132 jingles to recognize, 130 were detected. The systems are tested on TV and radio corpora (more than 10 h). They are simple, robust and can be improved on every corpus without training or adaptation.
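As a rough illustration of matching by Euclidean distance in the spectral domain, the sketch below slides a reference jingle over a stream of spectral frames and reports where the distance falls under a threshold. It is a minimal sketch, not the authors' tool: the function name `find_jingle`, the frame representation (lists of magnitudes) and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def find_jingle(
    stream: Sequence[Sequence[float]],
    jingle: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return start frames where the jingle matches the stream.

    Hypothetical helper: each element of `stream` and `jingle` is one
    spectral frame (a list of magnitudes of equal length). A window
    matches when its Euclidean distance to the jingle is <= threshold.
    """
    m = len(jingle)
    hits = []
    for start in range(len(stream) - m + 1):
        window = stream[start:start + m]
        # Sum of squared differences over every bin of every frame.
        squared = sum(
            (a - b) ** 2
            for stream_frame, jingle_frame in zip(window, jingle)
            for a, b in zip(stream_frame, jingle_frame)
        )
        if math.sqrt(squared) <= threshold:
            hits.append(start)
    return hits


stream = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
jingle = [[1.0, 2.0], [3.0, 4.0]]
matches = find_jingle(stream, jingle, threshold=0.5)
```

In this toy example only the window starting at frame 1 lies within the threshold; a real system would compute the frames from audio and tune the threshold on data.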
2.
Listening to music on personal, digital devices whilst mobile is an enjoyable, everyday activity. We explore a scheme for
exploiting this practice to immerse listeners in navigation cues. Our prototype, ONTRACK, continuously adapts audio, modifying
the spatial balance and volume to lead listeners to their target destination. First we report on an initial lab-based evaluation
that demonstrated the approach’s efficacy: users were able to complete tasks within a reasonable time and their subjective
feedback was positive. Encouraged by these results we constructed a handheld prototype. Here, we discuss this implementation
and the results of field-trials. These indicate that even with a low-fidelity realisation of the concept, users can quite
effectively navigate complicated routes.
3.
In the age of speech and voice recognition technologies, sign language recognition is an essential part of ensuring equal
access for deaf people. To date, sign language recognition research has mostly ignored facial expressions that arise as part
of a natural sign language discourse, even though they carry important grammatical and prosodic information. One reason is
that tracking the motion and dynamics of expressions in human faces from video is a hard task, especially with the high number
of occlusions from the signers’ hands. This paper presents a 3D deformable model tracking system to address this problem,
and applies it to sequences of native signers, taken from the National Center of Sign Language and Gesture Resources (NCSLGR),
with a special emphasis on outlier rejection methods to handle occlusions. The experiments conducted in this paper validate
the output of the face tracker against expert human annotations of the NCSLGR corpus, demonstrate the promise of the proposed
face tracking framework for sign language data, and reveal that the tracking framework picks up properties that ideally complement
human annotations for linguistic research.
4.
We present an enhancement towards adaptive video training for PhoneGuide, a digital museum guidance system for ordinary camera-equipped
mobile phones. It enables museum visitors to identify exhibits by capturing photos of them. In this article, a combined solution
of object recognition and pervasive tracking is extended to a client–server-system for improving data acquisition and for
supporting scale-invariant object recognition. A static as well as a dynamic training technique are presented that preprocess
the collected object data differently and apply two types of neural networks (NN) for classification. Furthermore, the system
enables a temporal adaptation for ensuring a continuous data acquisition to improve the recognition rate over time. A formal
field experiment reveals current recognition rates and indicates the practicability of both methods under realistic conditions
in a museum.
5.
This paper introduces the problem of discovering maximum-length repeating patterns in music objects. A novel algorithm is
presented for the extraction of this kind of pattern from a melody music object. The proposed algorithm discovers all maximum-length
repeating patterns using an “aggressive” accession during searching, by avoiding costly repetition frequency calculation and
by examining as few as possible repeating patterns in order to reach the maximum-length repeating pattern(s). Detailed experimental
results illustrate the significant performance gains due to the proposed algorithm, compared to an existing baseline algorithm.
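To make the problem statement concrete, here is a naive brute-force baseline that finds the longest substrings occurring at least twice in a melody string. It is not the algorithm proposed in the paper, which is described as avoiding exactly this kind of exhaustive counting; the function name `longest_repeats` and the encoding of a melody as a string of note letters are assumptions for illustration. Overlapping occurrences are counted as repeats.

```python
from typing import Set


def longest_repeats(melody: str) -> Set[str]:
    """Return all longest substrings that occur at least twice.

    Naive baseline: try lengths from longest to shortest and stop at
    the first length that has any repeated substring.
    """
    n = len(melody)
    for length in range(n - 1, 0, -1):
        seen: Set[str] = set()
        repeated: Set[str] = set()
        for i in range(n - length + 1):
            sub = melody[i:i + length]
            if sub in seen:
                repeated.add(sub)
            else:
                seen.add(sub)
        if repeated:
            return repeated
    return set()


patterns = longest_repeats("CDECDEG")
```

For the melody `CDECDEG` the only maximum-length repeating pattern is `CDE`. The running time of this baseline grows roughly cubically with melody length, which is the cost a dedicated algorithm aims to avoid.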
6.
In this paper, we propose an innovative architecture to segment a news video into so-called “stories” by using both the
included video and audio information. Segmentation of news into stories is one of the key issues for achieving efficient treatment
of news-based digital libraries. While the relevance of this research problem is widely recognized in the scientific community,
there are few established solutions in the field. In our approach, the segmentation is performed in two steps:
first, shots are classified by combining three different anchor shot detection algorithms using video information only. Then,
the shot classification is improved by using a novel anchor shot detection method based on features extracted from the audio
track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their
combination.
7.
This paper describes security and privacy issues for multimedia database management systems. Multimedia data includes text,
images, audio and video. It describes access control for multimedia database management systems and describes security policies
and security architectures for such systems. Privacy problems that result from multimedia data mining are also discussed.
8.
In this paper, we study the performance improvement that can be obtained by combining classifiers based on different
notions (each trained using a different physicochemical property of amino acids). This multi-classifier has been tested on
three problems: HIV-protease; recognition of T-cell epitopes; predictive vaccinology. We propose a multi-classifier that combines
a classifier that approaches the problem as a two-class pattern recognition problem and a method based on a one-class classifier.
Combining several classifiers with the “sum rule” enables us to obtain improved performance over the best results previously
published in the literature.
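The “sum rule” mentioned above is a standard fusion scheme: each classifier outputs one score per class, the scores are added across classifiers, and the class with the largest total wins. The sketch below is a minimal illustration under that definition; the function name `sum_rule` and the example scores are made up and do not come from the paper.

```python
from typing import List, Sequence


def sum_rule(scores_per_classifier: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the largest summed score.

    `scores_per_classifier[i][k]` is the score that classifier i
    assigns to class k (hypothetical input layout).
    """
    n_classes = len(scores_per_classifier[0])
    totals: List[float] = [0.0] * n_classes
    for scores in scores_per_classifier:
        if len(scores) != n_classes:
            raise ValueError("every classifier must score every class")
        for k, score in enumerate(scores):
            totals[k] += score
    # Ties resolve to the lowest class index.
    return max(range(n_classes), key=lambda k: totals[k])


# Three classifiers, two classes; the first classifier disagrees.
decision = sum_rule([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]])
```

Here the first classifier favours class 0, but the summed scores (1.35 versus 1.65) select class 1.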
9.
The map-seeking circuit algorithm (MSC) was developed by Arathorn to efficiently solve the combinatorial problem of correspondence
maximization, which arises in applications like computer vision, motion estimation, image matching, and automatic speech recognition
(Arathorn, D.W. in Map-Seeking Circuits in Visual Cognition: A Computational Mechanism for Biological and Machine Vision,
Stanford University Press, Stanford, 2002). Given an input image, a template image, and a discrete set of transformations,
the goal is to find a composition of transformations which gives the best fit between the transformed input and the template.
We imbed the associated combinatorial search problem within a continuous framework by using superposition, and we analyze
a resulting constrained optimization problem. We present several numerical schemes to compute local solutions, and we compare
their performance on a pair of test problems: an image matching problem and the challenging problem of automatically solving
a Rubik’s cube.
10.
Support vector machines (SVMs) have been promising methods for classification and regression analysis due to their solid mathematical
foundations, which include two desirable properties: margin maximization and nonlinear classification using kernels. However,
despite these prominent properties, SVMs are usually not chosen for large-scale data mining problems because their training
complexity is highly dependent on the data set size. Unlike traditional pattern recognition and machine learning, real-world
data mining applications often involve huge numbers of data records. Thus it is too expensive to perform multiple scans on
the entire data set, and it is also infeasible to put the data set in memory. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes the SVM performance for very large data sets given a limited amount of resource, e.g., memory. CB-SVM applies
a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples.
These samples carry statistical summaries of the data and maximize the benefit of learning. Our analyses show that the training
complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much less than that of
the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large
data sets and very accurate in terms of classification.
A preliminary version of the paper, “Classifying Large Data Sets Using SVM with Hierarchical Clusters”, by H. Yu, J. Yang, and J. Han, appeared in Proc. 2003 Int. Conf. on Knowledge Discovery in Databases (KDD'03), Washington, DC, August 2003. However, this submission substantially extends the previous paper and contains new,
major value-added technical contributions in comparison with the conference publication.