Similar Documents
19 similar documents found (search time: 140 ms)
1.
俸云  景新幸 《计算机仿真》2009,26(10):327-329,343
Mel frequency cepstral coefficients (MFCC) model the auditory characteristics of the human ear and have achieved relatively high recognition rates in practical speech recognition. To further improve recognition accuracy, a method combining MFCC with residual phase is proposed for speech recognition. The performance of a conventional MFCC-based recognizer is compared with that of a recognizer combining MFCC and residual phase. Simulation experiments in MATLAB show that the combined MFCC and residual phase features yield a higher recognition rate than MFCC alone; the proposed improvement refines the recognition system and achieves a higher speech recognition rate.
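A minimal sketch of the two feature streams this abstract combines, MFCCs plus the residual phase of the LPC residual (librosa and scipy assumed; the file name, LPC order and MFCC dimension are illustrative, not the paper's settings):

```python
# Sketch: MFCC extraction plus LPC-residual phase. Parameters (n_mfcc,
# LPC order, input file) are illustrative only, not the paper's configuration.
import numpy as np
import librosa
from scipy.signal import hilbert

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical input file

# 13-dimensional MFCCs, one column per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# LPC residual: inverse-filter the signal with its LPC coefficients
a = librosa.lpc(y, order=12)
residual = np.convolve(y, a, mode="same")

# Residual phase: cosine of the analytic-signal phase of the LPC residual
analytic = hilbert(residual)
residual_phase = np.real(analytic) / (np.abs(analytic) + 1e-12)

print(mfcc.shape, residual_phase.shape)
```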

2.
To address the mismatch in whispered speech recognition between training speech and test whispered speech produced in different phonation modes, this paper proposes a mismatch compensation algorithm based on joint factor analysis (JFA) and feature mapping (FM). The algorithm first uses joint factor analysis to estimate phonation-mode information and optimizes the phonation-mode factors and the phonation-mode space parameters; it then applies feature mapping to the speech parameters using this information before training and recognition, reducing the influence of phonation mode on the system. Experimental results show that the method based on factor analysis and feature mapping can effectively extract speaker information from the training speech and improve the recognition rate of the whispered speech recognition system.

3.
A speech recognition model based on the Bark wavelet transform and a probabilistic neural network (PNN) is proposed. A Bark filter bank matching the auditory characteristics of the human ear is used to reconstruct the signal and extract speech features, which are then classified by a trained probabilistic neural network. A large number of speech samples are used to build the recognition library and an integrated recognition system. Experimental results show that, compared with the traditional LPCC/DTW and MFCC/DWT methods, the recognition rate is improved by 14.9% and 10.1% respectively, reaching 96.9%.

4.
In biometric recognition, auditory (speaker) recognition accuracy is important, and a key difficulty is selecting noise-robust feature parameters to improve the recognition rate. Traditional feature parameters treat speech as a stationary signal and therefore fail to capture its dynamic characteristics, limiting the achievable recognition rate. To improve noise robustness and recognition accuracy, a new feature parameter (DWP-MFCC) is proposed, which introduces multi-resolution wavelet packet analysis on top of Mel-cepstral analysis. By improving time-frequency resolution and enhancing dynamic speech information, it overcomes the limitation of purely linear analysis. Speaker recognition experiments based on a vector quantization (VQ) system show that, compared with LPCC and MFCC, the new method significantly improves the recognition rate.
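A sketch in the spirit of this wavelet-packet-plus-VQ setup, assuming pywt and scikit-learn; the wavelet, decomposition depth and codebook size are illustrative, and the full DWP-MFCC feature is not reproduced:

```python
# Sketch: wavelet packet sub-band energies per frame plus a per-speaker VQ codebook.
# Wavelet, depth and codebook size are illustrative assumptions.
import numpy as np
import pywt
from sklearn.cluster import KMeans

def wavelet_packet_energies(frame, wavelet="db4", level=3):
    """Energy of each terminal wavelet-packet node (frequency-ordered)."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    return np.array([np.sum(node.data ** 2) for node in wp.get_level(level, order="freq")])

def train_codebook(feature_frames, size=32):
    """VQ codebook for one speaker; test frames are scored by distortion to it."""
    return KMeans(n_clusters=size, n_init=5).fit(feature_frames)

frames = np.random.randn(100, 400)                      # stand-in speech frames
feats = np.vstack([wavelet_packet_energies(f) for f in frames])
codebook = train_codebook(feats)
print(feats.shape, codebook.cluster_centers_.shape)
```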

5.
To improve the robustness of speech recognition, a new feature combination method is proposed. The method weights and optimizes Mel frequency cepstral coefficients (MFCC) using the F-ratio, feeds different feature combinations into a hidden Markov model (HMM) for training to find the most noise-robust combination, applies principal component analysis (PCA) for dimensionality reduction, and adds a support vector machine (SVM) classifier as a post-processor. Experiments show that the combination of improved MFCC, short-time average energy and the Teager energy operator performs best, reaching a recognition rate of 90.48%. PCA dimensionality reduction lowers the recognition rate by 0.4% while speeding up computation. With the post-processor, the system recognition rate reaches 95.25%, improving recognition efficiency and classification decision ability; accuracy is higher than that of conventional recognition methods.
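The Teager energy operator mentioned here has a simple discrete form, psi[n] = x[n]^2 - x[n-1]·x[n+1]; a small numpy sketch of it alongside short-time average energy (framing parameters are illustrative):

```python
# Sketch: discrete Teager energy operator and short-time average energy.
# Frame length and hop are illustrative, not the paper's settings.
import numpy as np

def teager_energy(x):
    """psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def short_time_energy(x, frame_len=400, hop=160):
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    return np.array([np.mean(f ** 2) for f in frames])

x = np.random.randn(16000)          # stand-in for one second of 16 kHz speech
print(teager_energy(x).shape, short_time_energy(x).shape)
```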

6.
Chinese whispered speech recognition based on improved LPCC and MFCC
To improve the recognition rate of Chinese whispered speech, a method is proposed that effectively combines multiple features for whisper recognition: MFCC, LPCC and their respective dynamic parameters. Experimental results show that LPCC and MFCC combined with dynamic parameters can serve as feature parameters for Chinese whispered speech recognition, and that their combination improves the system recognition rate, reaching 94.5% on a small vocabulary.
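A sketch of appending first- and second-order dynamic (delta) parameters to a static MFCC feature, of the kind combined in this abstract (librosa assumed; dimensions and input file are illustrative):

```python
# Sketch: static MFCCs plus their dynamic (delta and delta-delta) parameters.
# Feature dimensions and input file are illustrative assumptions.
import numpy as np
import librosa

y, sr = librosa.load("whisper.wav", sr=16000)        # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)

delta1 = librosa.feature.delta(mfcc, order=1)         # first-order dynamics
delta2 = librosa.feature.delta(mfcc, order=2)         # second-order dynamics

features = np.vstack([mfcc, delta1, delta2])           # 36 x n_frames
print(features.shape)
```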

7.
Research on HMM-based speaker-independent isolated-word speech recognition for Amdo Tibetan
Using VC++6.0 as the development platform, a speaker-independent isolated-word speech recognition system for Amdo Tibetan based on the hidden Markov model (HMM) is implemented. MFCC parameters are extracted from the voiced segments, vector-quantized, and used to train HMM models that form a feature template library, which is then used for recognition. Based on the characteristics of Amdo Tibetan, the endpoint detection method is improved, increasing the accuracy of isolated-word speech detection and further improving the recognition rate.
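A minimal sketch of the train-and-score loop for isolated-word HMMs, using hmmlearn's GaussianHMM as a stand-in for the paper's VC++/discrete-HMM implementation; state counts and iterations are illustrative:

```python
# Sketch: one Gaussian HMM per word, recognition by maximum log-likelihood.
# hmmlearn is a stand-in; states and iterations are illustrative values.
import numpy as np
from hmmlearn import hmm

def train_word_model(feature_list, n_states=5):
    """feature_list: list of (n_frames, n_dim) MFCC arrays for one word."""
    X = np.vstack(feature_list)
    lengths = [f.shape[0] for f in feature_list]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(features, word_models):
    """Return the word whose HMM gives the highest log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(features))

# Usage (with hypothetical per-word training data):
# models = {word: train_word_model(utts) for word, utts in train_data.items()}
# print(recognize(test_features, models))
```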

8.
Current speaker recognition systems achieve recognition rates above 90% under ideal conditions, but performance drops rapidly in real communication environments. This paper studies robust speaker recognition under channel mismatch. A speaker recognition system based on Gaussian mixture models (GMM) is first built; then, based on measurements and analysis of real communication channels, two improvements are proposed. First, a generic channel model is built from measured data, and clean speech filtered through this model is used to train the speaker models. Second, by comparing the characteristics of the measured channel, an ideal low-pass channel, and the Mel-frequency cepstral coefficients (MFCC) of speech, the first and second MFCC dimensions are discarded. Experimental results show that these measures improve the recognition rate in the communication environment by about 20%, and by 9%-12% compared with the traditional cepstral mean subtraction (CMS) method.
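The two feature-level operations this abstract refers to, cepstral mean subtraction and discarding the first two MFCC dimensions, are simple array operations; a numpy sketch (the coefficients-by-frames layout is an assumption):

```python
# Sketch: cepstral mean subtraction (CMS) and dropping the first two MFCC
# dimensions. The coefficients-by-frames layout is an assumption.
import numpy as np

def cepstral_mean_subtraction(mfcc):
    """Subtract the per-utterance mean of each cepstral dimension."""
    return mfcc - mfcc.mean(axis=1, keepdims=True)

def drop_low_dims(mfcc, n_drop=2):
    """Discard the first n_drop cepstral dimensions, as the abstract proposes."""
    return mfcc[n_drop:, :]

mfcc = np.random.randn(13, 300)          # stand-in for a 13 x 300-frame utterance
print(cepstral_mean_subtraction(mfcc).shape, drop_low_dims(mfcc).shape)
```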

9.
To address the low recognition accuracy of Mel frequency cepstral coefficients (MFCC) for mid- and high-frequency speech content, and the fact that the contribution of individual feature dimensions to recognition is usually ignored, a feature extraction method is proposed that combines MFCC, inverted MFCC (IMFCC) and mid-frequency MFCC (MidMFCC) using the Fisher criterion. The three feature sets are first extracted from the speech signal; the Fisher ratio of each dimension of each feature set is then computed, and dimensions are selected by Fisher ratio to form a hybrid feature vector, improving the recognition accuracy of mid- and high-frequency speech information. Experimental results show that, under the same conditions, the new feature improves the recognition rate to some extent compared with MFCC alone.
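The Fisher ratio used for per-dimension selection is the between-class variance of a dimension divided by its within-class variance; a sketch of computing it and keeping the top-scoring dimensions (function names are hypothetical, and the paper's exact selection rule is not reproduced):

```python
# Sketch: per-dimension Fisher ratio and selection of the top-scoring dimensions.
# Function names and the number of kept dimensions are illustrative assumptions.
import numpy as np

def fisher_ratio(features, labels):
    """features: (n_samples, n_dims); labels: (n_samples,). Returns one ratio per dim."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        fc = features[labels == c]
        between += fc.shape[0] * (fc.mean(axis=0) - overall_mean) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_dims(features, labels, n_keep):
    idx = np.argsort(fisher_ratio(features, labels))[::-1][:n_keep]
    return np.sort(idx)          # indices of the retained dimensions

X = np.random.randn(200, 36)               # stand-in stacked MFCC/IMFCC/MidMFCC
y = np.random.randint(0, 10, size=200)     # stand-in class labels
print(select_dims(X, y, n_keep=24))
```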

10.
The key to voiceprint (speaker) recognition is extracting speech feature parameters that characterize the speaker. Based on the GMM-UBM model, a text-independent speaker recognition system is implemented in Matlab, and the mainstream static features MFCC, LPCC and LPC, as well as MFCC combined with dynamic parameters, are compared from the perspectives of both speaker verification and speaker identification. With different feature orders, different numbers of Gaussian mixtures, and training and test utterances of different durations, theoretical recognition performance, actual recognition performance, recognition time and the proportion of time spent on recognition are analyzed. The results show that, under the GMM-UBM framework, MFCC gives the best recognition performance of the three static features in most cases while also taking the longest recognition time; the recognition rate does not increase monotonically with feature order; combining static features with dynamic parameters of a suitable order improves recognition; and increasing the order of the dynamic parameters does not necessarily improve system performance.
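A compact sketch of the GMM-UBM idea underlying this comparison: a UBM trained on pooled data and a speaker model obtained by MAP-adapting the UBM means. scikit-learn's GaussianMixture stands in for the paper's Matlab implementation; the mixture count and relevance factor are illustrative:

```python
# Sketch: GMM-UBM with MAP adaptation of the means only.
# GaussianMixture is a stand-in; n_mix and the relevance factor are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features, n_mix=64):
    ubm = GaussianMixture(n_components=n_mix, covariance_type="diag", max_iter=100)
    ubm.fit(pooled_features)
    return ubm

def map_adapt_means(ubm, speaker_features, relevance=16.0):
    """Return MAP-adapted means for one speaker (means-only adaptation)."""
    post = ubm.predict_proba(speaker_features)        # (n_frames, n_mix)
    n_k = post.sum(axis=0)                            # soft counts per mixture
    ex_k = post.T @ speaker_features                  # weighted first-order stats
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * (ex_k / (n_k[:, None] + 1e-12)) + (1 - alpha) * ubm.means_

# Usage (with hypothetical feature matrices of shape (n_frames, n_dims)):
# ubm = train_ubm(np.vstack(all_training_features))
# adapted_means = map_adapt_means(ubm, one_speaker_features)
```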

11.
Whispered speech speaker identification is one of the most demanding tasks in automatic speaker recognition. Because of the profound differences in acoustic characteristics between neutral and whispered speech, the performance of conventional speaker identification systems degrades drastically on whispered speech compared with neutral speech. This work presents a novel speaker identification system for whispered speech based on an innovative learning algorithm, the extreme learning machine (ELM). The features used in the proposed system are instantaneous frequency with probability density models. Parametric and nonparametric probability density estimation with ELM is compared with hybrid parametric and nonparametric probability density estimation with ELM (HPNP-ELM) for instantaneous frequency modeling. The experimental results show a significant performance improvement for the proposed whispered speech speaker identification system.
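A bare-bones sketch of the extreme learning machine this abstract builds on: random hidden weights, sigmoid activation, and output weights solved by least squares. The hidden-layer size and the stand-in features are illustrative, not the paper's configuration:

```python
# Sketch of a basic extreme learning machine (ELM) classifier: random hidden
# layer, output weights by least squares. Sizes and features are illustrative.
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid units

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(y.max() + 1)[y]                  # one-hot targets
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), T, rcond=None)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

X = np.random.randn(300, 20)                # stand-in feature vectors
y = np.random.randint(0, 5, size=300)       # stand-in speaker labels
print((ELM().fit(X, y).predict(X) == y).mean())
```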

12.
For noisy whispered speech, this paper proposes a whispered speech enhancement algorithm that uses an ADALINE neural network to remove background noise. Conventional spectral subtraction is first applied to obtain a good spectral envelope, and an ADALINE (adaptive linear) network is then used for adaptive prediction to further improve whispered speech quality. Results show that even at low input SNR the signal-to-noise ratio can be improved by about 20 dB, with good perceptual quality.
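A sketch of an ADALINE-style adaptive linear predictor trained with the LMS rule, the kind of post-filter applied here after spectral subtraction; filter order and step size are illustrative, not the paper's settings:

```python
# Sketch: ADALINE-style adaptive linear prediction with the LMS update rule.
# Filter order and learning rate are illustrative assumptions.
import numpy as np

def adaline_predict(x, order=10, mu=0.01):
    """Predict x[n] from the previous `order` samples; return the prediction."""
    w = np.zeros(order)
    y = np.zeros_like(x)
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]        # most recent sample first
        y[n] = w @ past                    # linear prediction
        e = x[n] - y[n]                    # prediction error
        w += mu * e * past                 # LMS weight update
    return y

x = np.random.randn(4000) * 0.1            # stand-in for spectral-subtracted whisper
enhanced = adaline_predict(x)
print(enhanced.shape)
```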

13.
Speaker identification from whispered speech is of great importance in forensic science as well as in many other applications. Whispered speech differs from its neutral counterpart in many characteristics, which makes identification difficult. This paper uses only well-performing timbrel features selected by a hybrid selection method and studies the effect of the distance measure used in a KNN classifier on identification accuracy. The results using timbrel features are compared with MFCC features, and the accuracy with the former is observed to be higher. KNN classifiers with the distance functions most suitable for a whispered-speech database, such as Euclidean and city-block, are also compared. The combination of timbrel features and a KNN classifier with the city-block distance gives the highest identification accuracy.
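A sketch of the classifier comparison described here, using scikit-learn's KNeighborsClassifier with Euclidean versus city-block distance; the random feature vectors stand in for the paper's timbrel features:

```python
# Sketch: KNN with Euclidean vs. city-block distance, as compared in the abstract.
# The random features stand in for the paper's timbrel features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.randn(200, 12)             # stand-in timbrel feature vectors
y_train = np.random.randint(0, 10, size=200)   # stand-in speaker labels
X_test = np.random.randn(50, 12)

for metric in ("euclidean", "cityblock"):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    knn.fit(X_train, y_train)
    print(metric, knn.predict(X_test)[:5])
```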

14.
A small whispered speech corpus is built and the characteristics of whispered speech are analyzed. On this basis, an improved spectral subtraction method based on sub-band power spectral entropy is introduced to enhance whispered speech. The method analyzes the sub-band power spectral entropy of the whispered signal to detect noise segments and speech segments, and then applies improved spectral subtraction to each type of segment separately to achieve good denoising. Experiments show that the method effectively separates the noise and speech segments of whispered speech and yields a substantially higher SNR than traditional spectral subtraction.
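A sketch of the sub-band power spectral entropy used here to separate noise from speech segments (the improved spectral subtraction stage itself is not reproduced); band count, frame length and hop are illustrative:

```python
# Sketch: per-frame sub-band power spectral entropy as a noise/speech discriminator.
# Band layout and frame parameters are illustrative; spectral subtraction is omitted.
import numpy as np

def subband_spectral_entropy(x, frame_len=400, hop=160, n_bands=8):
    entropies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * np.hamming(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(power, n_bands)
        p = np.array([b.sum() for b in bands])
        p = p / (p.sum() + 1e-12)                      # sub-band probabilities
        entropies.append(-(p * np.log(p + 1e-12)).sum())
    return np.array(entropies)

x = np.random.randn(16000) * 0.01                      # stand-in noisy whisper
print(subband_spectral_entropy(x).shape)
```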

15.
To improve the recognition rate of Chinese whispered speech, a speech recognition system based on the probabilistic neural network (PNN) is proposed. Experimental results show that the method improves the recognition rate, greatly shortens recognition time and improves the real-time performance of the whole system, reaching a recognition rate of 94.7% on a small vocabulary.
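A probabilistic neural network is essentially a Gaussian-kernel (Parzen-window) classifier: pattern-layer kernels around each training vector, a summation layer per class, and a maximum decision. A numpy sketch (the smoothing parameter and stand-in features are illustrative):

```python
# Sketch of a probabilistic neural network (PNN): Gaussian kernels on the
# training patterns, averaged per class. Sigma and features are illustrative.
import numpy as np

class PNN:
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, y):
        self.X, self.y = X, y
        self.classes = np.unique(y)
        return self

    def predict(self, X_new):
        preds = []
        for x in X_new:
            d2 = ((self.X - x) ** 2).sum(axis=1)                   # squared distances
            k = np.exp(-d2 / (2 * self.sigma ** 2))                 # pattern layer
            scores = [k[self.y == c].mean() for c in self.classes]  # summation layer
            preds.append(self.classes[int(np.argmax(scores))])
        return np.array(preds)

X = np.random.randn(100, 16)                 # stand-in whisper feature vectors
y = np.random.randint(0, 5, size=100)        # stand-in word labels
print((PNN().fit(X, y).predict(X) == y).mean())
```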

16.
Because whispered speech has a low signal-to-noise ratio, traditional endpoint detection algorithms applied to it suffer from low accuracy and poor noise robustness. An endpoint detection algorithm for whispered speech based on the instantaneous energy-frequency value of the Hilbert-Huang transform is proposed. The Hilbert-Huang transform is used to obtain the instantaneous amplitude and frequency of whispered speech, from which a time-energy-frequency feature, the instantaneous energy-frequency value, is extracted to distinguish whispered speech from noise and detect endpoints. Simulation experiments on 700 whispered test samples with SNRs of 2-10 dB show that the start-point and end-point detection accuracy of this algorithm is higher than that of the zero-energy-product, entropy and fitting-feature methods. The experiments show that the algorithm adapts to a variety of non-stationary noise environments and detects the endpoints of whispered speech well.
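The instantaneous amplitude and frequency at the core of the energy-frequency feature can be obtained from the analytic signal; a sketch using scipy's Hilbert transform (the EMD stage of the full Hilbert-Huang transform is omitted, and the paper's exact combined feature is not reproduced):

```python
# Sketch: instantaneous amplitude and frequency from the analytic signal.
# The EMD stage of the full Hilbert-Huang transform is omitted, and the
# paper's exact energy-frequency feature is not reproduced.
import numpy as np
from scipy.signal import hilbert

def instantaneous_amp_freq(x, sr):
    analytic = hilbert(x)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    freq = np.diff(phase) * sr / (2.0 * np.pi)     # Hz, one sample shorter
    return amplitude, freq

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(sr)   # stand-in signal
amp, freq = instantaneous_amp_freq(x, sr)
print(amp.shape, freq.shape, freq.mean())
```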

17.
To exploit the dynamic nature of emotional speech, a speech emotion recognition system is implemented with a dynamic recurrent Elman neural network. The network state from the previous time step is fed back and combined with the current input, realizing the state feedback of the Elman model. A speech emotion recognition system is built on this basis; it allows the network type to be changed in the back end and supports both single-utterance and batch recognition modes. Speech emotion recognition experiments on the system show that, with equivalent model parameter settings, Elman-based recognition outperforms a BP neural network, and the BP network is more sensitive to parameter settings than the Elman network.
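An Elman network feeds the previous hidden state back alongside the current input, which is what torch.nn.RNN implements; a sketch of an utterance-level emotion classifier (PyTorch is used as a stand-in, and feature and class dimensions are illustrative):

```python
# Sketch: Elman-style recurrent classifier for utterance-level emotion labels.
# torch.nn.RNN is an Elman cell (previous hidden state fed back with the input);
# feature and class dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ElmanEmotionNet(nn.Module):
    def __init__(self, n_features=39, n_hidden=64, n_emotions=4):
        super().__init__()
        self.rnn = nn.RNN(n_features, n_hidden, batch_first=True)  # tanh Elman cell
        self.out = nn.Linear(n_hidden, n_emotions)

    def forward(self, x):                   # x: (batch, n_frames, n_features)
        _, h_last = self.rnn(x)             # final hidden state per utterance
        return self.out(h_last.squeeze(0))  # emotion logits

net = ElmanEmotionNet()
dummy = torch.randn(8, 120, 39)             # stand-in batch of feature sequences
print(net(dummy).shape)                     # torch.Size([8, 4])
```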

18.
This work attempts to convert a given neutral speech utterance to a target emotional style using signal processing techniques. Sadness and anger are considered in this study. For emotion conversion, we propose signal processing methods to process neutral speech in three ways: (i) modifying the energy spectra, (ii) modifying the source features, and (iii) modifying the prosodic features. Energy spectra of different emotions are analyzed, and a method is proposed to modify the energy spectra of neutral speech after dividing the speech into different frequency bands. For the source part, epoch strength and epoch sharpness are studied extensively, and a new method is proposed for modifying and incorporating these parameters using appropriate modification factors. Prosodic features such as the pitch contour and intensity are also modified. New pitch contours corresponding to the target emotions are derived from the pitch contours of neutral test utterances and incorporated into the neutral utterances. Intensity modification is done by dividing neutral utterances into three equal segments and modifying the intensities of these segments separately, according to the modification factors suitable for the target emotions. Subjective evaluation using mean opinion scores has been carried out to evaluate the quality of the converted emotional speech. Though the modified speech does not completely resemble the target emotion, the subjective tests demonstrate the potential of these methods to change the style of the speech.
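The intensity-modification step described above has a direct implementation: split the utterance into three equal segments and scale each with its own factor. A sketch with illustrative factor values (the paper's derived factors are not reproduced):

```python
# Sketch: intensity modification by splitting a neutral utterance into three
# equal segments and scaling each with its own factor. Factor values are
# illustrative, not those derived in the paper.
import numpy as np

def modify_intensity(x, factors=(1.2, 1.0, 1.4)):
    """Scale the three equal-length segments of x by the given factors."""
    segments = np.array_split(x, 3)
    return np.concatenate([seg * f for seg, f in zip(segments, factors)])

x = np.random.randn(16000) * 0.1             # stand-in neutral utterance
print(modify_intensity(x).shape)
```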

19.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of markers on the face of a human subject are captured while he or she recites a predesigned corpus with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: first, the learned coarticulation models are concatenated to synthesize neutral visual speech for novel speech input; then a texture-synthesis-based approach generates a novel dynamic expression signal from the PIEES model; finally, the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
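A minimal sketch of the eigenspace construction step, in the spirit of the PIEES description: PCA over aligned expression signals, with projection and reconstruction. Dimensions are illustrative, and the time-warping and texture-synthesis stages are not reproduced:

```python
# Sketch: PCA eigenspace over aligned expression signals, with projection and
# reconstruction. Dimensions are illustrative; time warping and texture
# synthesis from the paper are not reproduced here.
import numpy as np
from sklearn.decomposition import PCA

signals = np.random.randn(200, 90)           # stand-in: 200 aligned expression signals
pca = PCA(n_components=10)
coeffs = pca.fit_transform(signals)          # low-dimensional expression coefficients
reconstructed = pca.inverse_transform(coeffs)
print(coeffs.shape, reconstructed.shape)
```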
