Similar literature
20 similar documents found
1.
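With the spread of mobile-phone recording devices and the emergence of powerful, easy-to-use digital media editing software, identifying the source phone of a speech recording has become an important and active problem in multimedia forensics. To address it, a mobile phone source identification algorithm based on fused spectral features is proposed. First, analysis of the spectrograms of the same speech recorded by different phones shows that the speech spectral characteristics differ from phone to phone. Then the spectral information quantity, log-spectrum and phase-spectrum features of speech are studied. Next, the three features are concatenated to form the original fused feature, and the original fused feature of each sample is used to build the sample feature space. Finally, the CfsSubsetEval evaluator of the WEKA platform is used with best-first search to perform feature selection on the constructed feature space, and LibSVM is used for model training and sample identification on the feature space after selection. The experiments report the classification results of the single spectral features and the fused spectral feature, after feature selection, on speech corpora from 23 mainstream phone models. The results show that the fused spectral feature effectively improves the average identification accuracy within phone brands, reaching 99.96% on the TIMIT re-recorded speech database and 99.91% on the self-built CKC-SD speech database. In addition, compared with Hanilci's recording device source identification algorithm based on Mel-frequency cepstral coefficient features, the average identification accuracy improves by 6.58 and 5.14 percentage points, respectively. The proposed features therefore effectively improve the average identification accuracy and reduce the misclassification rate among phones of the same brand.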

2.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an adaptive network-based fuzzy inference system (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called warped LPC-based spectral centroid (WLPC-SC), is also proposed. Comparison between WLPC-SC and the classical features proposed in the literature for audio classification is performed, aiming to assess the good discriminatory power of the proposed feature. The vector length used to describe the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). With the aim of increasing the classification accuracy percentage, the feature space is then transformed to a new feature space by LDA. The classification task is performed applying ANFIS to the features in the transformed space. To evaluate the performance of the ANFIS system for speech/music discrimination, comparison to other commonly used classifiers is reported. The classification results for different types of music and speech signals show the good discriminating power of the proposed approach.
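
The reduction of a feature trajectory to three statistics (mean, variance and skewness) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `summary_statistics`, the input values and the use of population (biased) estimators are all assumptions.

```python
def summary_statistics(values):
    """Return (mean, variance, skewness) of a sequence of per-frame values.

    Assumption: population (divide-by-n) estimators are used; the abstract
    does not say which estimators the authors chose.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        # A constant sequence has no asymmetry; avoid dividing by zero.
        skewness = 0.0
    else:
        third_moment = sum((x - mean) ** 3 for x in values) / n
        skewness = third_moment / variance ** 1.5
    return mean, variance, skewness


# Hypothetical spectral-centroid values for three frames.
stats = summary_statistics([1.0, 2.0, 3.0])
```

Whatever the number of frames, the descriptor passed on to the next stage then has a fixed length of three values.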

3.
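A new method of combined feature extraction   Cited by: 10 (self-citations: 0, citations by others: 10)
This paper proposes a new feature extraction method based on feature-level fusion. First, a reasonable feature fusion strategy is given: combined features are represented by complex vectors, which extends the feature space from a real vector space to a complex vector space. Then, the theory of discriminant analysis with statistical uncorrelation is developed and used to extract optimal discriminant features in the complex vector space. Finally, experiments on the CENPARMI handwritten Arabic numeral database of Concordia University and the NUST603HW handwritten Chinese character database of Nanjing University of Science and Technology show that the proposed combined feature extraction method not only has strong dimensionality reduction capability but also considerably improves the recognition rate.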
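
The complex-vector representation described above can be sketched as follows: two real feature vectors of the same sample become the real and imaginary parts of one complex vector. This is only an illustration; the function name `fuse_complex` and the zero-padding of the shorter vector are assumptions, and the discriminant analysis step is not shown.

```python
def fuse_complex(alpha, beta):
    """Combine two real feature vectors into one complex vector alpha + i*beta.

    Assumption: when the two vectors differ in length, the shorter one is
    padded with zeros so that both have the same dimension.
    """
    n = max(len(alpha), len(beta))
    real_part = list(alpha) + [0.0] * (n - len(alpha))
    imag_part = list(beta) + [0.0] * (n - len(beta))
    return [complex(a, b) for a, b in zip(real_part, imag_part)]


# Two hypothetical feature vectors extracted from the same sample.
fused = fuse_complex([1.0, 2.0, 3.0], [0.5, 0.25])
```

Unlike concatenation, which would give a vector of dimension 5 here, the fused vector keeps the dimension of the longer input (3).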

4.
5.
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes this domain unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture models (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensions of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are considered as attributes of the secondary feature vector of each frame. To evaluate the efficiency of the proposed approach, tests were conducted with the new feature vectors on the classification of phonemes within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector gave a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives compared with MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained by using WKM clustering in the classification of front vowels compared with MFCC features.

6.
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm, linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework.
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
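
The difference between the two distance metrics compared in this abstract can be illustrated with a short sketch. The vectors below are made-up stand-ins for supervectors, not data from the paper, and the function names are chosen here for illustration.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to the length of the vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle; depends on direction only."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical vectors pointing in the same direction with different lengths.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
d_euclidean = euclidean_distance(u, v)
d_cosine = cosine_distance(u, v)
```

For these two vectors the cosine distance is zero up to rounding while the Euclidean distance is not, which is the sense in which the cosine metric responds to directional scatter and ignores magnitude.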

7.
As multimedia becomes the dominant form of entertainment through an ever increasing range of digital formats, there has been a growing interest in obtaining information from entertainment media. Speech is one of the core resources in multimedia, providing a foundation for the extraction of semantic information. Thus, detecting speech is a critical first step for speech-based information retrieval systems. This work focuses on speech detection in one of the dominant forms of entertainment media: feature films. A novel approach for voice activity detection (VAD) in film audio is proposed. The approach uses correlation to analyze associations of Mel Frequency Cepstral Coefficient (MFCC) pairs in speech and non-speech data. This information then drives feature selection for the creation of MFCC cross-covariance feature vectors (MFCC-CCs) which are used to train a random forest classifier to solve a binary speech/non-speech classification problem on audio data from entertainment media. The classifier performance is evaluated on a number of test sets and achieves a classification accuracy of up to 94%. The approach is also compared with state of the art and contemporary VAD algorithms, and demonstrates competitive results.

8.
9.
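Data stream clustering method based on a dynamic sliding window   Cited by: 2 (self-citations: 0, citations by others: 2)
Data stream clustering is an important problem in cluster analysis. To handle data streams whose flow rate varies, a data stream clustering algorithm based on a dynamic sliding window is proposed on top of the two-phase clustering framework. In the online phase, micro-cluster features are introduced to store summary information about the data stream; the stored summaries are used to adjust the sliding window size dynamically, and the distance between each data point and the micro-cluster centers is computed to maintain the micro-cluster features. In the offline phase, the K-means algorithm performs macro-clustering on the results of the online phase to produce the final clusters. Experimental results show that the algorithm achieves high clustering quality and good scalability.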

10.
Similarity measure of contents plays an important role in TV personalization, e.g., TV content group recommendation and similar TV content retrieval, which essentially are content clustering and example-based retrieval. We define similar TV contents to be those with similar semantic information, e.g., plot, background, genre, etc. Several similarity measure methods, notably vector space model based and category hierarchy model based similarity measure schemes, have been proposed for the purpose of data clustering and example-based retrieval. Each method has advantages and shortcomings of its own in TV content similarity measure. In this paper, we propose a hybrid approach for TV content similarity measure, which combines both the vector space model and the category hierarchy model. The hybrid measure proposed here makes the most of TV metadata information and takes advantage of the two similarity measurements. It measures TV content similarity at the semantic level rather than the physical level. Furthermore, we propose an adaptive strategy for setting the combination parameters. The experimental results showed that the hybrid similarity measure proposed here is superior to either measure used alone for TV content clustering and example-based retrieval.
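
One way to picture a hybrid of the two models is a weighted sum of a vector-space score and a category-hierarchy score. The sketch below is an assumption-laden illustration, not the paper's formula: the shared-prefix hierarchy measure, the fixed `weight` parameter and all function names are hypothetical, and the adaptive setting of the combination parameters is not modeled.

```python
import math


def cosine_similarity(u, v):
    """Vector space model score between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hierarchy_similarity(path_a, path_b):
    """Hypothetical hierarchy score: shared path prefix over the longer path."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        return 0.0
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return common / longest


def hybrid_similarity(u, v, path_a, path_b, weight=0.5):
    """Weighted sum of the two scores; `weight` is an assumed fixed parameter."""
    return (weight * cosine_similarity(u, v)
            + (1.0 - weight) * hierarchy_similarity(path_a, path_b))


# Two hypothetical programs with identical term vectors but different sub-genres.
score = hybrid_similarity([1.0, 0.0], [1.0, 0.0],
                          ["drama", "crime"], ["drama", "romance"])
```

Here the term vectors agree fully and the category paths agree on one of two levels, so the combined score falls between the two individual scores.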

11.
Speaker variability is known to have an adverse impact on speech systems that process linguistic content, such as speech and language recognition. However, changes in an individual's speech production due to stress and emotions have a similarly detrimental effect on the task of speaker recognition, as they introduce mismatch with the speaker models typically trained on modal speech. The focus of this study is on the analysis of stress-induced variations in speech and the design of an automatic stress level assessment scheme that could be used in directing stress-dependent acoustic models or normalization strategies. Current stress detection methods typically employ a binary decision based on whether the speaker is or is not under stress. In reality, the amount of stress in individuals varies and can change gradually. Using speech and biometric data collected in a real-world, variable-stress level law enforcement training scenario, this study considers two methods for stress level assessment. The first approach uses a nearest neighbor clustering scheme at the vowel token and sentence levels to classify speech data into three levels of stress. The second approach employs Euclidean distance metrics within the multi-dimensional feature space to provide real-time stress level tracking capability. Evaluations on audio data confirmed by biometric readings show both methods to be effective in assessment of stress level within a speaker (average accuracy of 55.6% in a 3-way classification task). In addition, the impact of high-level stress on in-set speaker recognition is evaluated and shown to reduce the accuracy from 91.7% (low/mid stress) to 21.4% (high level stress).

12.
This paper proposes a new and reliable segmentation approach based on a fusion framework for combining multiple region-based segmentation maps (with any number of regions) to provide a final improved (i.e., accurate and consistent) segmentation result. The core of this new combination model is a consensus (cost) function derived from the recent information-theoretic variation of information criterion proposed by Meila, which quantifies the amount of information that is lost or gained in changing from one clustering to another. In this case, the resulting consensus energy-based segmentation fusion model can be efficiently optimized by exploiting an iterative steepest local energy descent strategy combined with a connectivity constraint. This new framework of segmentation combination, relying on the fusion of inaccurate, quickly and roughly calculated, spatial clustering results, emerges as an appealing alternative to the complex segmentation models in use today. Experiments on the Berkeley Segmentation Dataset show that the proposed fusion framework compares favorably to previous techniques in terms of reliability scores.
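
The variation of information between two clusterings of the same items can be computed from their label lists as the sum of the two conditional entropies, VI(A, B) = H(A|B) + H(B|A). The sketch below shows only this criterion, under the assumption of natural logarithms; the consensus energy and its optimization are not reproduced, and the function name is chosen here for illustration.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information between two labelings of the same items.

    Computed as H(A|B) + H(B|A), in nats (natural logarithm is an assumption).
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each joint cell contributes to both conditional entropies.
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi


# Hypothetical region labels for four pixels under different segmentations.
same_partition = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Relabeling the regions leaves the value at zero, while two unrelated partitions of the same four pixels give 2 ln 2, so the measure depends on the grouping and not on the label names.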

13.
In this paper, we propose a novel approach for automatic mine detection in SOund NAvigation and Ranging (SONAR) data. The proposed framework relies on a possibilistic fusion method to classify SONAR instances as mine or mine-like object. The proposed semisupervised algorithm minimizes an objective function that combines context identification, multi-algorithm fusion criteria, and a semisupervised learning term. The optimization aims to learn contexts as compact clusters in subspaces of the high-dimensional feature space via possibilistic semisupervised learning and feature discrimination. The semisupervised clustering component assigns a degree of typicality to each data sample to identify and reduce the influence of noise points and outliers. Then, the approach yields optimal fusion parameters for each context. The experiments on synthetic data sets and a standard SONAR data set show that our semisupervised local fusion outperforms individual classifiers and unsupervised local fusion.

14.
The role of multimedia resources in information warfare in Wikimedia is investigated. A new approach to uploading files in Wikimedia is proposed with the aim of enhancing the impact of multimedia resources used for information warfare in Wikimedia. The proposed approach is based on clustering of media files accumulated in Wikimedia Commons. Media file clustering is formalized as an optimization problem with control constraints. A PSO algorithm with adaptive parameters has been developed to solve the optimization problem.

15.
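To address the one-sidedness of predicting coal mine data from a single sensor, a prediction model for multi-sensor coal mine data is proposed that combines information fusion with phase space reconstruction. Fusion prediction is performed over several kinds of underground sensors, including gas concentration, wind speed and temperature sensors. Taking the time series of the multiple sensor types as the object of study, information fusion is first applied to each type of sensor data, with data-level fusion followed by feature-level fusion. The correlation integral method is then used to determine, for each sensor's data after the two levels of fusion, the two reconstruction parameters: the time delay τ and the embedding dimension m. Finally, using multivariate phase space reconstruction, the sensor data are fused into a reconstructed phase space, and a weighted first-order local method based on K-Means clustering is used to build the multi-sensor prediction model. The data come from the Yangquan coal mine in Shanxi Province, where nearly 20G of data were collected; experiments use gas concentration, wind speed and temperature sensor data. The results show that, for feature-level fusion, data fused over each 15-minute period effectively characterize that period. After the prediction model is applied, the error is smallest for this period, ESS = 0.003, compared with periods of 5, 10 and 20 minutes, which is far below the current minimum error of 0.05. The fusion prediction therefore performs well: it predicts sensor data 15 minutes ahead fairly accurately and leaves sufficient time to support decisions in underground safety assessment.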
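
For a single sensor, phase space reconstruction with time delay τ and embedding dimension m turns a scalar series x into vectors (x[i], x[i+τ], ..., x[i+(m-1)τ]). The sketch below shows only this delay-embedding step; the function name `delay_embed` and the sample values are assumptions, and choosing τ and m by the correlation integral method is not shown.

```python
def delay_embed(series, tau, m):
    """Build delay vectors [x[i], x[i+tau], ..., x[i+(m-1)*tau]] from a series."""
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the given tau and m")
    return [[series[i + k * tau] for k in range(m)] for i in range(n_vectors)]


# Hypothetical gas-concentration readings; tau=2 and m=3 are illustrative values.
vectors = delay_embed([1, 2, 3, 4, 5, 6], tau=2, m=3)
```

With six readings, τ = 2 and m = 3, the reconstruction yields two vectors, [1, 3, 5] and [2, 4, 6]; a multivariate reconstruction would join such vectors from each sensor.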

16.
The continuous evolution of smart devices has led to serious limitations in multimedia applications. Multimedia graphic design and animation, together with the increased use of rich and complex multimedia content on the web and other media, have created a need to diversify the accessibility of content. Several techniques are therefore used today to design universally accessible content. The main technique still used to maintain accessibility is to create two parallel streams of design and development for the same content, and the continuous evolution of devices will certainly lead to further streams. This calls for automatic reasoning on multimedia that allows a computer to modify the design according to different variables, device capabilities, user status and context in order to provide personalized, adapted content. In this paper, we propose an abstract document model called XMS, short for XML Multimedia Specification; it describes the composition of an original multimedia document and can be extended to any document type. We present how the spatial information present in this document may be used to adapt and modify the original document. We mainly focus on the spatial aspect of a web document: a combination of RCC8, qualitative distances, and directions is used to describe the layout of a set of objects. The level of granularity of the definition of the objects defines the level of detail that will be processed by our PROLOG-based inference system. Simplified versions of algorithms from the inference system, and how it works on the spatial dimension of the document, are shown. Finally, samples of how the spatial relation manipulation algorithms work are illustrated.

17.
Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques.

18.
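To address the incompleteness of the feature vectors extracted in face recognition and the large data volume of whole-image information, a face recognition method based on matrix-like neural kernel feature fusion is proposed. The method acts as the first-layer dimension-raising operation of a deep neural network. First, the face data are treated as a set of feature vectors, and random column sampling of the matrix is used to form a random feature matrix. Second, a deep neural kernel is designed to map the random feature matrix to new feature vectors in a high-dimensional space. Finally, a fast shrinkage algorithm is used to solve the indeterminate system of linear algebraic equations that arises in matching, so that convergence reaches second order. The method both avoids the high space complexity of using face image data directly and adds nonlinear structure to the features, improving the expressive power of the feature vectors. Experimental results show that the method has a high recognition rate, strong stability and good robustness, and is suited to processing large data sets.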

19.
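Chen Zhuoyi (陈卓夷). 《计算机科学》 (Computer Science), 2007, 34(4): 119-120
Key frame extraction is an important component of content-based video retrieval, and the effectiveness of the extracted key frames directly affects retrieval results. This paper proposes a key frame extraction method based on nonparametric density estimation clustering. The color and motion features of the images are first extracted, and mean shift clustering is then applied to the feature space that fuses color and motion information. The method determines the number of clusters automatically and has strict convergence, which greatly reduces computation and increases speed. Experiments show that the extraction results agree well with human subjective visual perception.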

20.
Video-based human recognition at a distance remains a challenging problem for the fusion of multimodal biometrics. As compared to the approach based on match score level fusion, in this paper, we present a new approach that utilizes and integrates information from side face and gait at the feature level. The features of face and gait are obtained separately using principal component analysis (PCA) from enhanced side face image (ESFI) and gait energy image (GEI), respectively. Multiple discriminant analysis (MDA) is employed on the concatenated features of face and gait to obtain discriminating synthetic features. This process allows the generation of better features and reduces the curse of dimensionality. The proposed scheme is tested using two comparative data sets to show the effect of changing clothes and face changing over time. Moreover, the proposed feature level fusion is compared with the match score level fusion and another feature level fusion scheme. The experimental results demonstrate that the synthetic features, encoding both side face and gait information, carry more discriminating power than the individual biometrics features, and the proposed feature level fusion scheme outperforms the match score level and another feature level fusion scheme. The performance of different fusion schemes is also shown as cumulative match characteristic (CMC) curves. They further demonstrate the strength of the proposed fusion scheme.
