In the age of digital information, audio data has become an important part of many modern computer applications. Audio classification and indexing have become a focus of research in audio processing and pattern recognition. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients (LPCC) and mel-frequency cepstral coefficients, are extracted to characterize the audio content. The autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors. The proposed method then uses a Gaussian mixture model (GMM)-based classifier in which the feature vectors from each class are used to train the GMM for that class. During testing, the likelihood of a test sample under each model is computed and the sample is assigned to the class whose model produces the highest likelihood. Audio clip extraction, feature extraction, index creation, and retrieval of the query clip are the major issues in automatic audio indexing and retrieval. A method for indexing the classified audio using LPCC features and the k-means clustering algorithm is proposed.
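The maximum-likelihood decision rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two toy models, their hand-set parameters, the diagonal covariances, and the names `gmm_log_likelihood` and `classify` are all assumptions made for the example, and no training step is shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a list of frames under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def classify(frames, models):
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical, hand-set models (not trained): 2 components, 2-D features.
MODELS = {
    "music": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "news": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

print(classify([[0.2, -0.1], [0.9, 1.2]], MODELS))  # music
print(classify([[5.5, 5.1], [6.2, 5.8]], MODELS))   # news
```

Summing per-frame log-likelihoods treats the frames of a clip as independent, which is the usual simplification for GMM classifiers; the abstract does not say whether the authors score clips this way.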

Research into acoustic recognition systems for animals has focused on call-dependent and species identification rather than call-independent and individual identification. Here we present a system for automatic call-independent individual recognition using mel-frequency cepstral coefficients and Gaussian mixture models across four passerine species. To our knowledge this is the first application of these techniques to the individual recognition of birds, and the results are promising. Accuracies of 89.1-92.5% were achieved and the acoustic feature and classifier method developed here have excellent potential for individual animal recognition and can be easily applied to other species.

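A design for a real-time speech recognition system is presented, using Texas Instruments' floating-point TMS320C6713 DSP as the core processor and an MSP430 microcontroller as the peripheral controller. The core algorithm uses mel-frequency cepstral coefficients as the feature parameters for feature extraction and the dynamic time warping (DTW) algorithm for pattern matching. Programming and debugging show that the system is flexible and runs in real time, with marked improvements in noise immunity, robustness, and recognition rate. The system can serve as a practical reference in many fields.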
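The DTW pattern-matching step named above can be sketched as follows. This is a minimal sketch on a host machine rather than DSP code: the Euclidean local distance, the unconstrained step pattern, the toy one-dimensional templates, and the names `dtw_distance` and `recognize` are assumptions for illustration, since the abstract does not give those details.

```python
def dtw_distance(a, b):
    """DTW cost between two sequences of equal-dimension feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(query, templates):
    """Return the template label with the smallest DTW cost to the query."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))

# Hypothetical 1-D "feature" templates; real input would be MFCC frames.
TEMPLATES = {
    "yes": [[0.0], [1.0], [2.0], [1.0], [0.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
# A time-stretched version of the "yes" template.
QUERY = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]

print(recognize(QUERY, TEMPLATES))  # yes
```

Because DTW allows one template frame to match several query frames, the stretched query aligns with the "yes" template at zero cost even though the two sequences differ in length.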

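A method for recognizing natural environmental sounds based on Gaussian mixture models (GMMs) is proposed. Mel-frequency cepstral coefficients (MFCCs) are extracted to analyze the sound signal; for each sound class, a GMM is built from the MFCC feature set using the expectation-maximization algorithm; recognition uses the minimum-error-rate decision rule together with voting. Using GMMs to recognize 36 kinds of natural environmental sounds achieves an accuracy of up to 95.83%, outperforming K-nearest neighbors (KNN).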

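Automatic audio classification, especially of speech and music, is one of the important means of extracting audio structure and content semantics, and it has significant application value in content-based audio retrieval, video retrieval and summarization, and spoken document retrieval. Because hidden Markov models (HMMs) capture the temporal statistical properties of audio signals well, an HMM-based audio classification algorithm is proposed for classifying speech, music, and their mixtures. Experimental results show that the HMM classifies audio well, with a best classification accuracy of 90.28%.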

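In conventional speaker recognition, commonly used feature parameters include linear prediction coefficients (LPC) and mel-frequency cepstral coefficients (MFCC), but a single feature parameter does not reflect speaker characteristics well. To address this, a method that introduces delta features and feature combination is proposed. Experiments, which use a GMM as the speaker recognition model, show that introducing delta features and feature combination clearly improves recognition performance.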
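Delta features and their combination with static features can be sketched as follows. The abstract does not state which delta formula was used; this sketch assumes the common regression form d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2) with edge frames repeated, and the names `delta` and `with_deltas` and the window width are illustrative choices.

```python
def delta(frames, width=2):
    """Regression-based delta of each coefficient track.

    frames is a list of equal-length lists; edges are padded by repetition.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

def with_deltas(frames, width=2):
    """Concatenate each static frame with its delta (feature combination)."""
    return [f + d for f, d in zip(frames, delta(frames, width))]

# A coefficient that rises by 1 per frame has a delta of 1 away from the edges.
RAMP = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta(RAMP)[2])        # [1.0]
print(with_deltas(RAMP)[2])  # [2.0, 1.0]
```

Applying `delta` to its own output would give acceleration (delta-delta) features, and the same concatenation could combine LPC-derived and MFCC vectors frame by frame.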

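Detecting the cough segments in recorded audio quickly and accurately is important for the clinical diagnosis of many respiratory diseases. Mel-frequency cepstral coefficients (MFCCs) are used as the feature parameters to analyze the sound signal, and multiple sets of training data are used to build two Gaussian mixture models (GMMs) for each class of sound in the recordings (coughs, speech, laughter, throat clearing, and so on). The two GMMs obtained for each class are linearly combined to give the final probability model for that class, which is then used to detect cough segments. On this basis, wavelet denoising is introduced to denoise each data segment and to perform endpoint detection. Simulation results show that the proposed method effectively improves the recognition performance of the system.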

A Chinese Sign Language Recognition System Based on DGMM
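Sign language is the language used by deaf people: a relatively stable system of expression made up of signs formed by hand shapes and movements, supplemented by facial expressions and postures, and a language communicated through movement and vision. The goal of sign language recognition research is to let machines "understand" the language of deaf people. Combining sign language recognition with sign language synthesis yields a "human-machine sign language translation system" that eases communication between deaf people and their surroundings. Sign language recognition is the problem of recognizing dynamic gesture signals, that is, sign language signals. Considering real-time performance and recognition efficiency, the system uses a CyberGlove data glove as the sign language input device and adopts DGMM (dynami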

Voice activity detection (VAD) is essential for processing with multiple microphone arrays, in which a massive number of potential devices, such as microphone devices for far-field voice-based interaction in smart home environments, are activated when sound sources appear. VAD can therefore save substantial computing resources in massive microphone array processing by exploiting the sparsity of sound source activity. However, it may not be feasible to obtain an accurate VAD in harsh environments, such as far-field conditions or time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to obtain labeled data for initializing a Gaussian mixture model (GMM), which is used to fit the log-energy distribution of noise and (noisy) speech. Finally, by combining the LTSI with the GMM parameters of the noise and speech distributions, this paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that our VAD method yields a remarkable improvement for a massive microphone network.

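To meet the unattended-operation requirements of smart substations and to address the shortcomings of existing fault diagnosis systems, an audio monitoring and fault diagnosis system for power equipment is proposed. Because audio signals from substation power equipment have a low signal-to-noise ratio, robust mel-frequency cepstral coefficients are used as the feature parameters for detecting abnormal audio. Multi-sample observation sequences are then formed from the audio features, and a hidden Markov model is used for fault diagnosis, with the fault type determined by comparing the output log-likelihood values. The method has good real-time performance and avoids the large sample sizes required by existing fault diagnosis methods. Experimental results show that the fault diagnosis system has a high recognition rate and good robustness.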
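Determining the fault type by comparing log-likelihoods across hidden Markov models can be sketched with the scaled forward algorithm. This is a simplified illustration: it assumes discrete observation symbols (the system described works on MFCC observation sequences, which would need continuous or quantized emissions), and the two models, their parameters, and the labels "normal" and "fault" are hypothetical.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) via the scaled forward algorithm.

    start[s], trans[r][s], emit[s][symbol] are probabilities.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)  # rescale each step to avoid underflow
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like

def diagnose(obs, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda k: forward_log_likelihood(obs, *models[k]))

# Hypothetical two-state models: (start, transition, emission).
TRANS = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "normal": ([0.5, 0.5], TRANS, [[0.9, 0.1], [0.8, 0.2]]),
    "fault": ([0.5, 0.5], TRANS, [[0.2, 0.8], [0.1, 0.9]]),
}

print(diagnose([1, 1, 0, 1, 1], MODELS))  # fault
```

Each candidate condition gets its own model, so adding a new fault type means training one more HMM rather than retraining a single joint classifier.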

In a recent study, we introduced the problem of identifying cell-phones from recorded speech and showed that speech signals convey information about the source device, making it possible to identify the source with some accuracy. In this paper, we consider recognizing source cell-phone microphones using non-speech segments of recorded speech. Taking an information-theoretic approach, we use a Gaussian mixture model (GMM) trained with maximum mutual information (MMI) to represent device-specific features. Experimental results using mel-frequency and linear frequency cepstral coefficients (MFCC and LFCC) show that features extracted from the non-speech segments of speech contain higher mutual information and yield higher recognition rates than those from speech portions or the whole utterance. The identification rate improves from 96.42% to 98.39% and the equal error rate (EER) falls from 1.20% to 0.47% when non-speech parts are used to extract features. Recognition results are provided for classical GMMs trained with both maximum likelihood (ML) and maximum mutual information (MMI) criteria, as well as for support vector machines (SVMs). Identification under additive noise is also considered, and it is shown that identification rates drop dramatically in that case.

Dysarthria is a neurological impairment of the control of the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because they are often physically incapacitated. Mel-Frequency Cepstral Coefficients (MFCCs) have been proven to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they generalise poorly as speaker-independent (SI) models. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters for representing dysarthric acoustic features in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals who suffer from dysarthria. The results show that the speech recognisers trained on the conventional 12-coefficient MFCC features, without delta and acceleration features, provided the best accuracy, and the proposed SI ASR recognised the speech of the unforeseen dysarthric evaluation subjects with a word recognition rate of 68.38%.

Feature Mapping Using Principal Component Analysis
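In text-independent speaker recognition, feature mapping can effectively reduce channel effects. This paper first uses principal component analysis to estimate, in the model domain, the space in which the channel factors lie, and then uses mapping to subtract the influence of the channel factors in the feature parameter domain. The method requires data labeled with channel information, but no channel decision is needed at feature mapping time. On the NIST 2006 SRE 1conv4w-1conv4w database, a system using the proposed method reduces the equal error rate by 19% relative to the baseline system.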

Blind source separation (BSS) has attracted much attention in the signal processing community due to its "blind" property and wide applications. However, there are still some open problems, such as underdetermined BSS and noisy BSS. In this paper, we propose a Bayesian approach to improve the separation performance for instantaneous mixtures of non-stationary sources by taking into account the internal organization of the sources. A Gaussian mixture model (GMM) is used to model the distribution of the source signals, and a continuous density hidden Markov model (CDHMM) is derived to track the non-stationarity inside the source signals. Source signals can switch between several states, so that the separation performance can be significantly improved. An expectation-maximization (EM) algorithm is derived to estimate the mixing coefficients, the CDHMM parameters and the noise covariance. The source signals are recovered via a maximum a posteriori (MAP) approach. To ensure the convergence of the proposed algorithm, proper prior densities (conjugate priors) are assigned to the estimated coefficients to incorporate prior information. The initialization scheme for the estimates is also discussed. Systematic simulations are used to illustrate the performance of the proposed algorithm. Simulation results show that, in the determined mixture case, the proposed algorithm separates sources more robustly in noisy environments, in terms of similarity score, than classical BSS algorithms. Additionally, since the mixing matrix and the sources are estimated jointly, the proposed EM algorithm also works well in the underdetermined case. Furthermore, the proposed algorithm converges quickly with proper initialization.

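A method for modeling texture images with a hidden Markov bundled tree (HMT-b) model is proposed. The method bundles the corresponding nodes in the three subbands (HH, HL, LH) of the wavelet decomposition and models them as a single composite tree, with an improved iterative algorithm, so that the model better describes the correlations among wavelet coefficients that actually exist across the three subbands. For the wavelet coefficient distribution at each scale, HMT-b uses a Gaussian mixture distribution for fitting. A statistical modeling method for the scaling coefficients based on a wavelet-domain Poisson distribution is also studied.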

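A method for predicting the time of coal piling on a belt conveyor based on a Gaussian mixture hidden Markov model is proposed. The method builds a Gaussian mixture hidden Markov model of the conveyor's operating state from power time-series data collected by sensors, and then predicts the coal piling time with two algorithms built on that model: a graph-based state sequence traversal algorithm and a probability transition algorithm based on the Chapman-Kolmogorov equation. The graph-based state sequence traversal algorithm determines the remaining time by finding a path from the current state to the coal piling state. The probability transition algorithm uses particle swarm optimization and cross-validation with the Chapman-Kolmogorov equation to obtain a probability threshold for the failure state on the training samples, and determines the remaining time by computing the number of transitions needed for the current state to reach a failure-state probability above that threshold. Experiments on a dataset of actual coal mine production data verify that the method can effectively predict when coal piling will occur on a belt conveyor.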
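The transition-counting idea behind the Chapman-Kolmogorov step can be sketched as follows: the n-step state distribution is obtained by repeatedly multiplying by the one-step transition matrix, and the remaining time is the first n at which the failure-state probability exceeds a threshold. The three-state matrix, the threshold of 0.5, and the name `steps_until_failure` are hypothetical; in the method described, the threshold comes from particle swarm optimization and cross-validation, which this sketch does not attempt.

```python
def steps_until_failure(trans, current, failure, threshold, max_steps=1000):
    """Smallest n with P(state_n = failure | state_0 = current) > threshold.

    Returns None if the threshold is not exceeded within max_steps.
    """
    size = len(trans)
    dist = [0.0] * size
    dist[current] = 1.0
    for n in range(1, max_steps + 1):
        # One Chapman-Kolmogorov step: dist_n = dist_{n-1} @ trans
        dist = [sum(dist[r] * trans[r][s] for r in range(size)) for s in range(size)]
        if dist[failure] > threshold:
            return n
    return None

# Hypothetical chain: state 0 (normal) -> 1 (overloaded) -> 2 (coal piling).
TRANS = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

print(steps_until_failure(TRANS, current=0, failure=2, threshold=0.5))  # 4
print(steps_until_failure(TRANS, current=1, failure=2, threshold=0.5))  # 2
```

Multiplying the returned step count by the sampling interval of the power time series would convert it into a remaining-time estimate.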

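The contourlet-domain hidden Markov tree model captures correlations between coefficients at different scales and in different directions. On this basis, an image fusion algorithm based on the contourlet-domain hidden Markov tree model is proposed. The source images are contourlet-transformed, and the high-frequency subband coefficients are modeled and trained to obtain the posterior probability of each coefficient. These posterior probabilities guide the fusion rule for the high-frequency coefficients, with edge and background regions fused differently so as to preserve the important features of the original images as far as possible. The inverse contourlet transform then yields the final fused result. Fusion experiments on multi-focus images, evaluated with joint entropy, entropy, correlation coefficient, and sharpness, show that the algorithm outperforms conventional contourlet-domain fusion algorithms and the wavelet-domain hidden Markov tree fusion algorithm.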

Obstructive sleep apnoea (OSA) is a highly prevalent disease, affecting an estimated 2–4% of the adult male population, that is difficult and very costly to diagnose because symptoms can remain unnoticed for years. The reference diagnostic method, polysomnography (PSG), requires the patient to spend a night at the hospital monitored by specialized equipment. Fast and less costly screening techniques are therefore normally applied to set priorities for proceeding to a polysomnography diagnosis. In this article the use of speech analysis is proposed as an alternative or complement to existing screening methods. A set of voice features that could be related to apnoea is defined, based on previous results from other authors and our own analysis. These features are analyzed first in isolation and then in combination to assess their discriminative power to classify voices as corresponding to apnoea patients or healthy subjects. This analysis is performed on a database containing three repetitions of four carefully designed sentences read by 40 healthy subjects and 42 subjects suffering from severe apnoea. As a result of the analysis, a linear discriminant model (LDA) was defined, including a subset of eight features (signal-to-disperiodicity ratio, a nasality measure, harmonic-to-noise ratio, jitter, difference between third and second formants on a specific vowel, duration of two of the sentences and the percentage of silence in one of the sentences). This model was tested on a separate database containing 20 healthy and 20 apnoea subjects, yielding a sensitivity of 85% and a specificity of 75%, with an F1-measure of 81%. These results indicate that the proposed method, which requires only a few minutes to record and analyze the patient's voice during the visit to the specialist, could help in the development of a non-intrusive, fast and convenient PSG-complementary screening technique for OSA.
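The reported test figures can be checked with a short calculation. The confusion-matrix counts below are not given in the abstract; they are inferred from the stated 20 apnoea and 20 healthy test subjects, 85% sensitivity and 75% specificity, under the assumption that apnoea is the positive class. The function name `screening_metrics` is illustrative.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, F1) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive (apnoea) class
    specificity = tn / (tn + fp)   # recall on the negative (healthy) class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Inferred counts: 85% of 20 apnoea subjects -> 17 TP, 3 FN;
# 75% of 20 healthy subjects -> 15 TN, 5 FP.
sens, spec, f1 = screening_metrics(tp=17, fn=3, tn=15, fp=5)
print(round(sens, 2), round(spec, 2), round(f1, 2))  # 0.85 0.75 0.81
```

With these counts the F1 value is 34/42, about 0.81, which is consistent with the 81% F1-measure quoted in the abstract.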

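To address the heavy computation, high equipment cost, and low robustness of current indoor gait recognition methods, a highly robust indoor gait recognition method based on channel state information (CSI), named WiKown, is proposed. An energy indicator built with the fast Fourier transform monitors walking activity; the collected CSI gait data are filtered and denoised, and feature values are extracted with a sliding window. Observation sequences are built from the resulting CSI gait information and, after fitting with a superposition of Gaussian distributions, a hidden Markov model is introduced to compute the observation sequence probabilities and generate the gait parameter model. In real multi-person environments (a corridor, a laboratory, and a hall), WiKown achieves an average recognition rate of 92.71% for single-person gait. Experimental results show that, compared with decision tree, dynamic time warping, and long short-term memory network methods, the method effectively recognizes gait information and improves recognition accuracy and robustness.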

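Driver assistance systems are regarded as an effective means of addressing traffic safety problems. Their development rests on accurate recognition of vehicle behavior, for use in vehicle safety warning, path planning, intelligent navigation, and similar applications. Existing behavior recognition methods based on support vector machines, hidden Markov models, and convolutional neural networks still struggle to balance computational cost against accuracy. This paper combines the hidden Markov model with the Gaussian mixture model to propose a Gaussian mixture hidden Markov model, and validates the method experimentally on the NGSIM dataset from the US Federal Highway Administration; the results show that the method recognizes discretionary lane-changing behavior with high accuracy. The experimental parameters of the Gaussian mixture hidden Markov model are also optimized for the best recognition performance, providing a reference for vehicle behavior recognition in future intelligent driving.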

