20 similar documents found; search time: 578 ms
1.
To improve the accuracy of speech endpoint detection in vehicle-noise environments, a new time-series complexity measure, fuzzy entropy, is introduced and applied to speech feature extraction. Features of noisy speech are extracted with both sample entropy and fuzzy entropy, endpoint detection is performed with a double-threshold method, and the feature thresholds are determined with the fuzzy C-means clustering algorithm and the Bayesian information criterion. Simulation results show that, compared with sample entropy, fuzzy entropy better separates noise from speech in vehicle-noise environments and yields better endpoint detection performance; under the same conditions, the error rate of the fuzzy-entropy method is more than 16% lower than that of the sample-entropy method.
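The fuzzy-entropy measure used above can be sketched as follows. This is an illustrative NumPy implementation of the standard fuzzy-entropy definition, not the paper's code; the embedding dimension `m`, tolerance `r`, and fuzzy exponent `n` are hypothetical defaults:

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D series.

    Unlike sample entropy's hard 0/1 template match, similarity is graded
    by the fuzzy membership exp(-d**n / r), which makes the measure more
    stable on short, noisy frames.
    """
    x = np.asarray(x, dtype=float)
    r = r * x.std()
    N = len(x)

    def phi(m):
        # All length-m templates, with each template's own mean removed
        templ = np.array([x[i:i + m] for i in range(N - m)])
        templ = templ - templ.mean(axis=1, keepdims=True)
        # Chebyshev distance between every pair of templates
        d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)  # exclude self-matches
        return sim.sum() / (len(templ) * (len(templ) - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

A noisy frame yields higher fuzzy entropy than a quasi-periodic voiced frame, which is the contrast the double-threshold detector exploits.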
2.
A speech recognition model based on the Bark wavelet transform and a probabilistic neural network (PNN) is proposed. A Bark filter bank, which matches the auditory characteristics of the human ear, is used to reconstruct the signal and extract speech features, which are then recognized by a trained probabilistic neural network. A speech recognition library is built by training on a large number of speech samples, and an integrated recognition system is established. Experimental results show that, compared with the traditional LPCC/DTW and MFCC/DWT methods, the recognition rate is improved by 14.9% and 10.1% respectively, reaching 96.9%.
3.
In speaker-dependent speech recognition systems, noise severely degrades feature extraction and causes a marked drop in the recognition rate. To address low recognition rates in noisy environments, spectral subtraction is used to denoise the speech signal; then, exploiting the visual nature of the speech spectrogram, a pulse-coupled neural network extracts an entropy sequence from the spectrogram as the feature parameter for recognition. Experimental results show that the method removes noise from the speech signal effectively and gives a speaker-dependent recognition system good performance in noisy environments.
4.
5.
Speech produced under different emotions is markedly non-stationary; traditional MFCCs capture only the static characteristics of the signal, whereas empirical mode decomposition (EMD) can finely characterize its non-stationarity. To extract non-stationary features of emotional speech, EMD decomposes the signal into a series of intrinsic mode functions; these are passed through Mel filters, the logarithm of the energies is taken, and an inverse DCT yields an improved MFCC as a new feature for emotion recognition. A support vector machine then classifies four emotions: happiness, anger, boredom and fear. Simulation results show that the improved MFCC reaches a recognition rate of 77.17% and, across different signal-to-noise ratios, improves the recognition rate by up to 3.26%.
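The final cepstral steps of the improved MFCC above (log of the Mel filter-bank energies, then the DCT that decorrelates them into cepstral coefficients) can be sketched as below. The EMD stage and the Mel filter bank itself are omitted; the band energies are assumed given, and the function names are illustrative:

```python
import numpy as np

def dct2(x):
    """Plain DCT-II, the cepstral transform used in MFCC pipelines."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    return 2.0 * (x * np.cos(np.pi * k * (2 * n + 1) / (2 * N))).sum(axis=1)

def improved_mfcc(mel_band_energies, n_ceps=13):
    """Cepstral coefficients from (IMF-derived) Mel band energies."""
    log_e = np.log(np.asarray(mel_band_energies, dtype=float) + 1e-12)
    return dct2(log_e)[:n_ceps]
```

Since the DCT concentrates a flat log-energy profile into the 0-th coefficient, the higher coefficients carry the spectral-shape detail used for emotion classification.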
6.
The hidden Markov model (HMM) is an effective method for modeling time-sequential signals and has been widely used in speech recognition and character recognition; in recent years it has also been applied to human action recognition. An action sequence is a special kind of time series in which each action class typically contains several key postures. Exploiting this property, the AdaBoost-EHMM (AdaBoost Exemplar-based HMM) algorithm is proposed and applied to action recognition. AdaBoost feature selection picks representative exemplars out of the action sequence one by one to serve as the means of the HMM observation probability model, and multiple cascaded classifiers are then fused for recognition. Experimental results show that AdaBoost-EHMM improves the recognition rate while guaranteeing convergence.
7.
8.
To address the low level of intelligence and usability of smart-terminal control in traditional smart home systems, a smart home control system based on speech recognition is proposed. The system consists of smart terminals, a main control center and control nodes. After the hardware and software of the main control center and control nodes are designed, the system's acquisition module collects home data; speech is then enhanced by a two-stage denoising method combining an improved signal-subspace approach with Wiener filtering; 24-dimensional Mel-frequency cepstral coefficients are extracted as speech features; finally, a hidden Markov model (HMM) performs template training and pattern matching to realize automatic voice control of the home. Experimental results show that 789 of 800 test samples were correctly recognized, an average recognition rate of 98.6%; under five different signal-to-noise ratios, the recognition rate stayed at or above 94%, peaking at 97.4%. The system therefore has good noise robustness, and the proposed speech recognition algorithm meets the system's requirements for automated, intelligent voice control and is of practical significance for product applications.
9.
10.
Speaker recognition principles and digital speech signal processing techniques are used to study voice modeling, and a GMM-based baseline system for voice recognition in a VDR environment is built. Starting from an analysis of the factors affecting the recognition rate, the shortcomings of traditional algorithms are identified, and a speech endpoint detection algorithm based on approximate entropy is proposed. Theoretical analysis and experimental results show that the new algorithm effectively suppresses large, dynamic impulsive noise, eliminates false speech detections, and improves the recognition rate by 66% at a low SNR of 0 dB.
11.
N. Ruiz Reyes P. Vera Candeas S. García Galán J.E. Muñoz 《Engineering Applications of Artificial Intelligence》2010,23(2):151-159
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a two-stage cascaded classification scheme. The cascaded classification scheme is composed of a statistical pattern recognition classifier followed by a genetic fuzzy system. For the first stage of the classification scheme, other widely used classifiers, such as neural networks and support vector machines, have also been considered in order to assess the robustness of the proposed classification scheme. Comparison with well-proven signal features is also performed. In this work, the most commonly used genetic learning algorithms (Michigan and Pittsburgh) have been evaluated in the proposed two-stage classification scheme. The genetic fuzzy system gives rise to an improvement of about 4% in the classification accuracy rate. Experimental results show the good performance of the proposed approach with a classification accuracy rate of about 97% for the best trial.
12.
This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transform (FT), which is well adapted to stationary signals. In order to take into account the non-stationarity of music/speech signals, this work proposes to study wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech, one for music/non-music. Different experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. Our parameterization yielded a significant reduction of the error rate: more than 30% relative improvement was obtained for the envisaged tasks compared to the MFCC parameterization.
13.
Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval and automatic speech recognition. This paper proposes a novel method for speech/music segmentation and classification. First, audio change points are detected from the mean-square energy difference between adjacent frames, yielding the segmentation; then eight features, including the low-band energy variance ratio, cepstral energy modulation and entropy modulation, are extracted from each audio segment and classified with an artificial neural network. Experimental results show that the proposed algorithm and features achieve high segmentation and classification accuracy.
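The segmentation step above (change points from the mean-square energy difference of adjacent frames) can be sketched as follows. The frame length, hop size and threshold are illustrative choices, not values from the paper:

```python
import numpy as np

def frame_energy(signal, frame_len=512, hop=256):
    """Mean-square energy of each (rectangular-windowed) frame."""
    sig = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(sig) - frame_len) // hop
    return np.array([np.mean(sig[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def change_points(signal, threshold, frame_len=512, hop=256):
    """Frame indices where the adjacent-frame energy jump exceeds threshold."""
    e = frame_energy(signal, frame_len, hop)
    return np.flatnonzero(np.abs(np.diff(e)) > threshold) + 1
```

Each detected index marks a candidate boundary between homogeneous segments, which are then handed to the feature extractor and neural-network classifier.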
14.
A real-time speech endpoint detection algorithm based on order-statistics filtering (cited 1 time: 0 self-citations, 1 by others)
An efficient real-time speech endpoint detection algorithm is proposed for embedded speech recognition systems. The algorithm uses sub-band spectral entropy as the feature for distinguishing speech from noise. The spectrum of each speech frame is first divided into several sub-bands and the spectral entropy of each sub-band is computed; the sub-band spectral entropies of several consecutive frames are then passed through a bank of order-statistics filters to obtain each frame's spectral entropy, and the input speech is classified according to this value. Experimental results show that the algorithm distinguishes speech from noise effectively and significantly improves the performance of the speech recognition system, remaining robust across different noise environments and signal-to-noise ratios. Moreover, the algorithm is computationally cheap and simple to implement, making it suitable for real-time embedded speech recognition systems.
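A minimal sketch of the feature described above: a per-frame sub-band spectral entropy, smoothed across consecutive frames with a median filter (the median is the usual order-statistics filter; the sub-band count and window length are illustrative, not the paper's settings):

```python
import numpy as np

def subband_spectral_entropy(frame, n_subbands=8):
    """Spectral entropy over the sub-band energies of one frame."""
    spec = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2
    bands = np.array_split(spec, n_subbands)
    energy = np.array([b.sum() for b in bands])
    p = energy / (energy.sum() + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

def median_smoothed_entropy(frames, n_subbands=8, win=5):
    """Order-statistics (median) filtering of consecutive frame entropies."""
    ent = np.array([subband_spectral_entropy(f, n_subbands) for f in frames])
    half = win // 2
    padded = np.pad(ent, half, mode='edge')
    return np.array([np.median(padded[i:i + win]) for i in range(len(ent))])
```

White noise spreads its energy across all sub-bands and scores near log2(n_subbands), while a tonal or voiced frame concentrates energy in a few bands and scores much lower; the detector thresholds this contrast.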
15.
《Expert systems with applications》2007,32(2):485-498
Speech and speaker recognition is an important task to be performed by a computer system. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed for speaker recognition using real speech/voice signals. The study combines a new feature extraction approach and a classification approach, both using optimum wavelet packet entropy parameter values. These optimum wavelet packet entropy values are obtained from measured real English-language speech/voice signal waveforms using a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed. GWPNN comprises three layers: a genetic algorithm, a wavelet packet and a multi-layer perceptron. The genetic algorithm layer of GWPNN is used for selecting the feature extraction method and obtaining the optimum wavelet entropy parameter values. In this study, one of four different feature extraction methods is selected by the genetic algorithm; the alternatives are wavelet packet decomposition, wavelet packet decomposition with the short-time Fourier transform, wavelet packet decomposition with the Born–Jordan time–frequency representation, and wavelet packet decomposition with the Choi–Williams time–frequency representation. The wavelet packet layer is used for optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, a feed-forward neural network, is used for evaluating the fitness function of the genetic algorithm and for classifying speakers. The performance of the developed system has been evaluated using noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals, with a correct classification rate of about 85% for speaker classification.
16.
This paper proposes the use of speech-specific features for speech/music classification. Features representing the excitation source, vocal tract system and syllabic rate of speech are explored. The normalized autocorrelation peak strength of the zero frequency filtered signal, and the peak-to-sidelobe ratio of the Hilbert envelope of the linear prediction residual, are the two source features. The log mel energy feature represents the vocal tract information. The modulation spectrum represents the slowly-varying temporal envelope corresponding to the speech syllabic rate. The novelty of the present work is in analyzing the behavior of these features for the discrimination of speech and music regions. These features are non-linearly mapped and combined to perform the classification task using a threshold based approach. Further, the performance of speech-specific features is evaluated using classifiers such as Gaussian mixture models and support vector machines. It is observed that the performance of the speech-specific features is better compared to existing features. Additional improvement for speech/music classification is achieved when speech-specific features are combined with the existing ones, indicating different aspects of information exploited by the former.
17.
王景芳 《计算机工程与应用》2011,47(20):147-150
An efficient real-time speech endpoint detection algorithm adapted to complex environments is proposed, together with a method for estimating each frame's noise power spectrum during filtering. The spectrum of each speech frame is first processed by iterative Wiener filtering, then divided into several sub-bands whose spectral entropies are computed; the sub-band spectral entropies of consecutive frames are passed through a bank of median filters to obtain each frame's spectral entropy, which is used to classify the input speech. Experimental results show that the algorithm effectively distinguishes speech from noise, significantly improves the performance of the speech recognition system, and is robust across different noise environments. The algorithm is computationally cheap, simple to implement and suitable for real-time speech recognition systems.
18.
In in-vehicle command-word recognition systems, background music lowers the command-word recognition rate. Because a music signal has a larger eigenvalue spread of its autocorrelation matrix and lower spectral flatness than speech, adaptive algorithms converge more slowly on it, so traditional adaptive cancellation can hardly remove the music interference cleanly and cannot guarantee the recognition rate. To solve this problem, a pre-whitening adaptive filter is introduced to reduce the eigenvalue spread and raise the spectral flatness, and is combined with a dual-filter adaptive algorithm to cancel in-car background music and improve the recognition rate of the in-vehicle command-word system. Experimental results show that command-word recognition improves markedly after background-music cancellation, and that pre-whitening further raises the recognition rate.
19.
This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in the sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time-domain and frequency-domain features in audio classification.
20.
Qiu-yu Zhang Wen-jin Hu Yi-bo Huang Si-bin Qiao 《Multimedia Tools and Applications》2018,77(2):1555-1581
Combined with linear prediction-minimum mean squared error (LP-MMSE), an efficient perceptual hashing algorithm based on improved spectral entropy is proposed for speech authentication. Linear prediction analysis is performed on the speech signal after preprocessing, framing and windowing, yielding the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated with the improved spectral entropy method; the final binary perceptual hash sequence is generated from these two matrices, and authentication is completed. Comparing experimental results against the Teager energy operator (TEO) combined with linear predictive coefficients (LPC), LP-MMSE and line spectrum pair (LSP) coefficients, respectively, shows that the proposed algorithm achieves a good compromise between robustness, discrimination and authentication efficiency, and can meet the requirement of real-time speech authentication in speech communication. Experimental results also show that the proposed algorithm is more compact than other existing methods.
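One common way to binarize a per-frame spectral-entropy sequence into a perceptual hash, and to authenticate with the normalized Hamming distance (bit error rate), is sketched below. This is an illustrative scheme, not the exact construction in the paper:

```python
import numpy as np

def entropy_hash(frame_entropies):
    """Bit i is 1 when the entropy rises from frame i to frame i+1."""
    e = np.asarray(frame_entropies, dtype=float)
    return (np.diff(e) > 0).astype(np.uint8)

def bit_error_rate(h1, h2):
    """Normalized Hamming distance; a small BER means matching content."""
    h1, h2 = np.asarray(h1), np.asarray(h2)
    return float(np.mean(h1 != h2))
```

Authentication then reduces to comparing the BER against a threshold: content-preserving operations (re-encoding, mild noise) keep the BER small, while different speech content pushes it toward 0.5.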