Similar Documents
20 similar documents retrieved.
1.
To improve the accuracy of speech endpoint detection in in-vehicle noise environments, a new complexity measure for time series, fuzzy entropy, is introduced and applied to feature extraction from speech signals. Features of the noisy speech are extracted with both sample entropy and fuzzy entropy, endpoints are detected with a double-threshold method, and the feature thresholds are determined with fuzzy C-means clustering and the Bayesian information criterion. Simulation results show that, compared with sample entropy under in-vehicle noise, fuzzy entropy distinguishes noise from speech more reliably and yields better endpoint detection; under the same conditions its error rate is more than 16% lower than that of the sample entropy algorithm.
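A minimal numpy sketch of the fuzzy entropy (FuzzyEn) measure described above, computed for one speech frame. The embedding dimension m, tolerance r and fuzzy exponent n are common defaults assumed here, not values taken from the paper.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D frame x (m: embedding dimension,
    r: tolerance scaled by the frame's std, n: fuzzy exponent)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r * np.std(x)

    def phi(dim):
        # Embed the frame and remove each template's own mean (the FuzzyEn step).
        templates = np.array([x[i:i + dim] for i in range(N - m)])
        templates = templates - templates.mean(axis=1, keepdims=True)
        # Chebyshev distance between every pair of templates.
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Fuzzy (exponential) similarity instead of a hard threshold.
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)              # exclude self-matches
        return sim.sum() / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

In a double-threshold detector this value would be computed per frame and compared against the two thresholds derived from the clustering / BIC step.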

2.
A speech recognition model based on the Bark wavelet transform and a probabilistic neural network (PNN) is proposed. A Bark filter bank, which matches the auditory characteristics of the human ear, is used to reconstruct the signal and extract speech features, which are then classified by a trained PNN. A recognition library is built by training on a large number of speech samples, and an integrated recognition system is constructed. Experimental results show that, compared with the traditional LPCC/DTW and MFCC/DWT methods, the recognition rate improves by 14.9% and 10.1% respectively, reaching 96.9%.

3.
In speaker-dependent speech recognition systems, noise severely degrades feature extraction and causes a marked drop in the recognition rate. To address the low recognition rate in noisy environments, spectral subtraction is used to remove noise from the speech signal, and, exploiting the visual nature of the spectrogram, a pulse-coupled neural network extracts an entropy sequence from the spectrogram as the feature parameters for recognition. Experimental results show that the method removes noise effectively and gives the speaker-dependent system good recognition performance in noisy environments.
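The spectral subtraction stage can be illustrated with a basic magnitude-domain implementation; frame length, overlap, the number of leading noise-only frames and the spectral floor beta are illustrative assumptions, not the paper's settings (the PCNN spectrogram-entropy stage is not shown).

```python
import numpy as np

def spectral_subtraction(x, frame_len=256, hop=128, noise_frames=10, beta=0.002):
    """Magnitude spectral subtraction: the noise spectrum is estimated from the
    first `noise_frames` frames, which are assumed to contain no speech."""
    win = np.hanning(frame_len)
    frames = [x[i:i + frame_len] * win
              for i in range(0, len(x) - frame_len, hop)]
    spectra = np.array([np.fft.rfft(f) for f in frames])
    mag, phase = np.abs(spectra), np.angle(spectra)

    noise_mag = mag[:noise_frames].mean(axis=0)                 # noise estimate
    clean_mag = np.maximum(mag - noise_mag, beta * noise_mag)   # subtract with a floor

    # Overlap-add the enhanced frames using the noisy phase.
    out = np.zeros(len(x))
    for k, (m, p) in enumerate(zip(clean_mag, phase)):
        frame = np.fft.irfft(m * np.exp(1j * p), n=frame_len)
        out[k * hop:k * hop + frame_len] += frame * win
    return out
```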

4.
Speech signals are analyzed with the short-time Fourier transform, yielding a cluster of time series formed by the energy sequences at the same frequency bins across the signals. Using time-series preprocessing and mathematical statistics, a linear regression equation is established between each column vector of the cluster and the sequence formed by the mean of each row vector of the cluster. The trend and fluctuation components of the cluster are separated, speaker-dependent feature parameters are extracted, and speaker identification is performed on the test speech with these parameters. Experiments on a set of 194 utterances from 8 speakers show that, with the distance parameter described in the paper as the recognition criterion and 3 characteristic frequency bins, the highest open-set average recognition rate is 97.94%.

5.
Speech produced under different emotions is markedly non-stationary; traditional MFCCs capture only the static characteristics of the signal, whereas empirical mode decomposition (EMD) can describe its non-stationary behavior in fine detail. To extract non-stationary features of emotional speech, EMD decomposes the signal into a series of intrinsic mode functions; their log energies are taken after a Mel filter bank and an inverse DCT is applied, yielding improved MFCCs as new features for emotion recognition. A support vector machine then classifies four emotions: happiness, anger, boredom and fear. Simulation results show that the improved MFCCs reach a recognition rate of 77.17%, and under different signal-to-noise ratios the rate improves by up to 3.26%.
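A rough sketch of the Mel-filterbank / log-energy / DCT stage applied to one intrinsic mode function frame. The EMD step that produces the IMFs is assumed to come from an external library (e.g. PyEMD) and is omitted; the sampling rate, FFT size and filter count below are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(sr, n_fft, n_mels=24, fmin=0.0, fmax=None):
    """Standard triangular Mel filterbank (not the paper's code)."""
    fmax = fmax or sr / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = imel(np.linspace(mel(fmin), mel(fmax), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def imf_mfcc(imf_frame, sr=16000, n_fft=512, n_mels=24, n_ceps=13):
    """Mel log-energy + DCT on one IMF frame ('improved MFCC' style feature)."""
    spec = np.abs(np.fft.rfft(imf_frame * np.hanning(len(imf_frame)), n=n_fft)) ** 2
    log_energy = np.log(mel_filterbank(sr, n_fft, n_mels) @ spec + 1e-10)
    return dct(log_energy, type=2, norm='ortho')[:n_ceps]
```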

6.
The hidden Markov model (HMM) is an effective method for modeling temporal signals and has been widely used in speech recognition and character recognition; more recently it has also been applied to human action recognition. A human action sequence is a special kind of temporal signal in which each action class usually contains several key-pose frames. Exploiting this property, an AdaBoost-EHMM (AdaBoost exemplar-based HMM) algorithm is proposed and applied to action recognition. The AdaBoost feature-selection mechanism picks out typical exemplars from the action sequence one by one to serve as the means of the HMM observation probability model, and multiple stages of classifiers are then fused for recognition. Experimental results show that AdaBoost-EHMM improves the recognition rate while guaranteeing convergence.

7.
To enable timely condition monitoring and fault diagnosis of key bogie components, vibration signals of typical high-speed-train bogie faults are first decomposed with the wavelet transform, and wavelet entropy features are extracted in each sub-band to reflect the complexity of the vibration signal at each scale. In the high-dimensional feature space spanned by these wavelet entropies, a support vector machine classifies four typical bogie fault conditions. Experimental results show that the recognition rate rises with running speed, exceeding 90% at 200 km/h, which validates the effectiveness of wavelet entropy features for analyzing high-speed-train fault signals.
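A hedged sketch of the wavelet entropy feature vector using PyWavelets, with an SVM from scikit-learn for the classification step; the wavelet family (db4) and decomposition level are assumptions, not the paper's exact choices.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_entropy_features(x, wavelet='db4', level=4):
    """One Shannon entropy per wavelet sub-band: decompose the vibration signal
    and measure how spread out the energy is inside each band."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    feats = []
    for c in coeffs:
        e = c ** 2
        p = e / (e.sum() + 1e-12)                       # energy distribution in the band
        feats.append(-np.sum(p * np.log(p + 1e-12)))    # Shannon entropy of the band
    return np.array(feats)

# Classify the four bogie fault conditions with an SVM on these features
# (X: one row of wavelet entropies per signal segment, y: fault labels):
# clf = SVC(kernel='rbf').fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```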

8.
To address the limited intelligence and poor usability of smart-terminal control in traditional smart-home systems, a speech-recognition-based smart-home control system is proposed. The system consists of smart terminals, a main control center and control nodes. After the hardware and software of the main control center and the control nodes are designed, the system's image acquisition module collects home data; speech is then enhanced with a two-stage denoising method combining an improved signal-subspace approach with Wiener filtering; 24-dimensional Mel-frequency cepstral coefficients are extracted as speech features; finally, a hidden Markov model (HMM) performs template training and pattern matching, achieving automatic voice control of the smart home. Experimental results show that 789 of 800 test samples are correctly recognized, an average recognition rate of 98.6%; under five different signal-to-noise ratios the recognition rate stays at or above 94%, peaking at 97.4%. The system therefore has good noise robustness, and the proposed speech recognition algorithm meets the system's requirements for voice automation and intelligence and is of practical value in real products.
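For the template-training and matching stage, a minimal sketch using hmmlearn's GaussianHMM is shown below; the toolkit, number of states and training settings are assumptions, since the paper does not specify its HMM implementation.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_word_models(features_per_word, n_states=5):
    """Train one GaussianHMM per command word.  `features_per_word` maps a word
    label to a list of MFCC matrices (one per training utterance, frames x 24)."""
    models = {}
    for word, utterances in features_per_word.items():
        X = np.vstack(utterances)
        lengths = [len(u) for u in utterances]
        models[word] = GaussianHMM(n_components=n_states,
                                   covariance_type='diag',
                                   n_iter=20).fit(X, lengths)
    return models

def recognize(models, mfcc):
    """Pick the command word whose model gives the highest log-likelihood."""
    return max(models, key=lambda w: models[w].score(mfcc))
```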

9.
周萍, 唐李珍. 《计算机工程》, 2011, 37(2): 169-171
For speaker recognition systems with short training speech, a recognition algorithm based on decision-level fusion is proposed. During recognition, the speech signal is processed with empirical mode decomposition; feature sequences are extracted from the resulting intrinsic mode function components and matched separately, and a decision-level fusion algorithm combines these matching results with the result of conventional stand-alone recognition to produce the final output. The signal decomposition enables repeated recognition of the test speech, while decision-level fusion refines the result, so the recognition rate is maintained even with short training speech. Experimental results show that the algorithm outperforms conventional methods in short-training-speech recognition systems.

10.
Voice modeling is studied using speaker recognition principles and digital speech signal processing, and a GMM-based baseline voice recognition system for the VDR environment is established. Starting from an analysis of the factors affecting the recognition rate, the shortcomings of traditional algorithms are identified and a speech endpoint detection algorithm based on approximate entropy is proposed. Theoretical analysis and experiments show that the new algorithm effectively suppresses high-dynamic impulsive noise, eliminates false speech detections, and improves the recognition rate by 66% at a low signal-to-noise ratio of 0 dB.
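The approximate entropy (ApEn) feature used by such an endpoint detector can be sketched as follows; the embedding dimension m and the tolerance scaling are common defaults assumed for illustration, not the paper's values.

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy of one frame; frames with higher ApEn are more
    irregular, which helps separate speech from impulsive noise."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r * np.std(x)

    def phi(dim):
        emb = np.array([x[i:i + dim] for i in range(N - dim + 1)])
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        # C_i: fraction of templates within tolerance r of template i
        # (self-match included, as in the standard ApEn definition).
        C = (d <= r).mean(axis=1)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```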

11.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a two-stage cascaded classification scheme. The cascaded classification scheme is composed of a statistical pattern recognition classifier followed by a genetic fuzzy system. For the first stage of the classification scheme, other widely used classifiers, such as neural networks and support vector machines, have also been considered in order to assess the robustness of the proposed classification scheme. Comparison with well-proven signal features is also performed. In this work, the most commonly used genetic learning algorithms (Michigan and Pittsburgh) have been evaluated in the proposed two-stage classification scheme. The genetic fuzzy system gives rise to an improvement of about 4% in the classification accuracy rate. Experimental results show the good performance of the proposed approach with a classification accuracy rate of about 97% for the best trial.

12.
This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transformation (FT), which is well adapted for stationary signals. In order to take into account the non-stationarity of music/speech signals, this work proposes to study wavelet-based signal decomposition instead of FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech, the other for music/non-music. Different experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. Our parameterization yielded a significant reduction of the error rate: more than 30% relative improvement was obtained for the envisaged tasks compared to MFCC parameterization.

13.
Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval and automatic speech recognition. This paper proposes a novel method for speech/music segmentation and classification: change points in the audio are first detected from the mean-square energy difference between adjacent frames to perform segmentation; an eight-dimensional feature set, including the low-band energy variance ratio, cepstral energy modulation and entropy modulation, is then extracted from each segment and classified with an artificial neural network. Experimental results show that the proposed algorithm and features achieve high segmentation and classification accuracy.
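A small sketch of the segmentation step: short-time mean-square energy per frame and a change-point rule based on the energy difference between adjacent frames. The frame size, hop, threshold rule and minimum gap between boundaries are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def frame_energy(x, frame_len=400, hop=160):
    """Short-time mean-square energy per frame."""
    return np.array([np.mean(x[i:i + frame_len] ** 2)
                     for i in range(0, len(x) - frame_len, hop)])

def detect_change_points(energy, threshold=None, min_gap=25):
    """Mark a segment boundary where the energy difference between neighbouring
    frames is large; boundaries closer than `min_gap` frames are suppressed."""
    diff = np.abs(np.diff(energy))
    if threshold is None:
        threshold = diff.mean() + 2 * diff.std()
    points, last = [], -min_gap
    for i, d in enumerate(diff):
        if d > threshold and i - last >= min_gap:
            points.append(i + 1)
            last = i
    return points
```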

14.
A Real-Time Speech Endpoint Detection Algorithm Based on Order-Statistics Filtering
An efficient real-time speech endpoint detection algorithm is proposed for embedded speech recognition systems. The algorithm uses sub-band spectral entropy as the feature for distinguishing speech from noise: the spectrum of each frame is divided into several sub-bands, the spectral entropy of each sub-band is computed, and the sub-band entropies of several consecutive frames are passed through a bank of order-statistics filters to obtain the spectral entropy of each frame, according to which the input is classified. Experimental results show that the algorithm effectively separates speech from noise, significantly improves the performance of the speech recognition system, and is robust across different noise environments and signal-to-noise ratios. Moreover, the algorithm has low computational cost and is simple to implement, making it well suited to real-time embedded speech recognition systems.
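A possible reading of this feature pipeline in numpy: per-frame sub-band spectral entropies followed by an order-statistics filter over consecutive frames. The number of sub-bands, the window length and the rank are assumptions for illustration.

```python
import numpy as np

def subband_spectral_entropy(frame, n_fft=256, n_subbands=4):
    """Spectral entropy of each sub-band of one frame's power spectrum."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    ent = []
    for band in np.array_split(spec, n_subbands):
        p = band / (band.sum() + 1e-12)
        ent.append(-np.sum(p * np.log(p + 1e-12)))
    return np.array(ent)                         # shape: (n_subbands,)

def order_statistics_filter(entropies, k=5, rank=2):
    """For every sub-band, take the `rank`-th smallest value over a window of
    `k` consecutive frames, then sum the sub-bands into one value per frame."""
    entropies = np.asarray(entropies)            # shape: (n_frames, n_subbands)
    n = len(entropies)
    out = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - k // 2), min(n, t + k // 2 + 1)
        window = np.sort(entropies[lo:hi], axis=0)
        out[t] = window[min(rank, window.shape[0] - 1)].sum()
    return out

# Frames whose filtered entropy falls below a threshold are labelled speech
# (speech spectra are less flat, hence lower spectral entropy).
```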

15.
Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed for speaker recognition using real speech/voice signals. The study combines a new feature extraction approach and a classification approach built on optimum wavelet packet entropy parameter values. These optimum wavelet packet entropy values are obtained from real English speech/voice signal waveforms measured with a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed. GWPNN includes three layers: a genetic algorithm, wavelet packets and a multi-layer perceptron. The genetic algorithm layer of GWPNN is used for selecting the feature extraction method and obtaining the optimum wavelet entropy parameter values. One of four feature extraction methods is selected by the genetic algorithm; the alternatives are wavelet packet decomposition, wavelet packet decomposition – short-time Fourier transform, wavelet packet decomposition – Born–Jordan time–frequency representation, and wavelet packet decomposition – Choi–Williams time–frequency representation. The wavelet packet layer is used for optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, a feed-forward neural network, is used for evaluating the fitness function of the genetic algorithm and for classifying speakers. The performance of the developed system has been evaluated using noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals, with a correct classification rate of about 85% for speaker classification.

16.
This paper proposes the use of speech-specific features for speech / music classification. Features representing the excitation source, vocal tract system and syllabic rate of speech are explored. The normalized autocorrelation peak strength of zero frequency filtered signal, and peak-to-sidelobe ratio of the Hilbert envelope of linear prediction residual are the two source features. The log mel energy feature represents the vocal tract information. The modulation spectrum represents the slowly-varying temporal envelope corresponding to the speech syllabic rate. The novelty of the present work is in analyzing the behavior of these features for the discrimination of speech and music regions. These features are non-linearly mapped and combined to perform the classification task using a threshold based approach. Further, the performance of speech-specific features is evaluated using classifiers such as Gaussian mixture models, and support vector machines. It is observed that the performance of the speech-specific features is better compared to existing features. Additional improvement for speech / music classification is achieved when speech-specific features are combined with the existing ones, indicating different aspects of information exploited by the former.

17.
An efficient real-time speech endpoint detection algorithm for complex environments is proposed, together with a method for estimating the noise power spectrum of each frame during filtering. The spectrum of each frame is first enhanced by iterative Wiener filtering, then divided into several sub-bands and the spectral entropy of each sub-band is computed; the sub-band entropies of several consecutive frames are passed through a bank of median filters to obtain the spectral entropy of each frame, according to which the input is classified. Experimental results show that the algorithm effectively separates speech from noise, significantly improves the performance of the speech recognition system, and is robust under different noise conditions. The algorithm has low computational cost, is simple to implement, and is suitable for real-time speech recognition systems.

18.
In in-vehicle command-word recognition systems, background music playback lowers the command recognition rate. Because the music signal has a larger eigenvalue spread of its autocorrelation matrix and lower spectral flatness, adaptive algorithms converge more slowly on it than on speech, so conventional adaptive cancellation cannot remove the music interference cleanly and the command recognition rate cannot be guaranteed. To solve this problem, a pre-whitening adaptive filter is introduced to reduce the eigenvalue spread and increase the spectral flatness, and it is combined with a dual-filter adaptive algorithm to cancel the background music in the cabin and improve the recognition rate of the in-vehicle command-word recognition system. Experimental results show that the command recognition rate improves markedly after background-music cancellation, and that pre-whitening further raises the recognition rate.
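The cancellation idea can be sketched with a first-order pre-whitening (pre-emphasis) filter and an NLMS adaptive canceller; the filter order, step size and pre-emphasis coefficient are assumptions, and the paper's exact dual-filter structure is only indicated in the comments.

```python
import numpy as np

def prewhiten(x, a=0.97):
    """First-order pre-whitening (pre-emphasis) used to reduce the eigenvalue
    spread of the music's autocorrelation matrix and speed up convergence.
    The coefficient 0.97 is a common default, not the paper's value."""
    return np.append(x[0], x[1:] - a * x[:-1])

def nlms_cancel(mic, ref, order=128, mu=0.5, eps=1e-6):
    """NLMS adaptive canceller: `ref` is the music fed to the cabin speaker,
    `mic` is music-plus-command at the microphone; the output is the error
    signal, i.e. the command with the music attenuated."""
    w = np.zeros(order)
    buf = np.zeros(order)
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = w @ buf                                # estimated music component
        e = mic[n] - y                             # residual: command + error
        w += mu * e * buf / (buf @ buf + eps)      # normalized LMS update
        out[n] = e
    return out

# Dual-filter idea (sketch): adapt the weights on prewhiten(mic) / prewhiten(ref),
# then apply the copied weights to the raw signals to recover the command speech.
```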

19.
This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification.

20.
This paper proposes an efficient perceptual hashing algorithm for speech authentication that combines linear prediction-minimum mean squared error (LP-MMSE) analysis with an improved spectral entropy. After preprocessing, framing and windowing, linear prediction analysis is performed on the speech signal to obtain the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated with the improved spectral entropy method, the final binary perceptual hash sequence is generated from the two matrices, and speech authentication is completed. Comparing experimental results against combinations of the Teager energy operator (TEO) with linear predictive coefficients (LPC), LP-MMSE and line spectrum pair (LSP) coefficients shows that the proposed algorithm achieves a good compromise between robustness, discrimination and authentication efficiency, and can meet the requirement of real-time speech authentication in speech communication. Experimental results also show that the proposed algorithm is more compact than existing methods.
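A much-simplified sketch of the hashing idea, using only the spectral-entropy branch (the LP-MMSE coefficient matrix of the paper is omitted): one entropy value per frame, binarized against the median, with hashes compared by bit error rate.

```python
import numpy as np

def spectral_entropy_hash(x, frame_len=320, hop=160, n_fft=512):
    """Binary perceptual hash: per-frame spectral entropy thresholded at the
    median of the sequence.  Frame and FFT sizes are illustrative assumptions."""
    ent = []
    for i in range(0, len(x) - frame_len, hop):
        spec = np.abs(np.fft.rfft(x[i:i + frame_len] * np.hamming(frame_len),
                                  n=n_fft)) ** 2
        p = spec / (spec.sum() + 1e-12)
        ent.append(-np.sum(p * np.log(p + 1e-12)))
    ent = np.array(ent)
    return (ent > np.median(ent)).astype(np.uint8)

def bit_error_rate(h1, h2):
    """Normalized Hamming distance: a low BER accepts the speech as authentic,
    a high BER rejects it."""
    n = min(len(h1), len(h2))
    return np.mean(h1[:n] != h2[:n])
```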
