Similar literature
1.
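A Study of Several Feature Parameters for Speech Recognition   Total citations: 2, self-citations: 0, citations by others: 2
Speech recognition is the discipline that studies how to enable machines to understand natural language spoken by humans, and it has broad application prospects. Extracting feature parameters from speech is one of the key problems in a speech recognition system. This paper first analyzes the principles and implementation of the commonly used linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and their first-order differences △LPCC and △MFCC, and extracts two combined parameter sets, LPCC+△LPCC and MFCC+△MFCC. It then discusses the dynamic time warping (DTW) recognition algorithm. Finally, simulation experiments on the Matlab platform combine DTW with LPCC, LPCC+△LPCC, MFCC, and MFCC+△MFCC as feature parameters. The results show that MFCC+△MFCC gives the highest recognition rate and LPCC the lowest.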
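The abstract above pairs cepstral features with DTW template matching but gives no implementation details. The following is a minimal sketch of the textbook DTW recurrence, not the authors' Matlab code; the function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the unconstrained step pattern are all assumptions made for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumes every frame has the same dimension and uses the Euclidean
    distance between frames (an illustrative choice).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed predecessors: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))
```

In this sketch, `templates` is a hypothetical dictionary mapping each word label to one reference feature sequence; a practical system would typically add endpoint detection and a warping-path constraint.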

2.
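Speaker Recognition Based on MFCC and LPCC   Total citations: 8, self-citations: 0, citations by others: 8
MFCC and LPCC are the two feature parameters most commonly used in speaker recognition. This paper studies the algorithmic principles of MFCC and LPCC extraction and the method for extracting differential cepstral parameters. MFCC, LPCC, and their first- and second-order differences are used as feature parameters, and speaker recognition is performed with the k-means algorithm and a three-layer BP neural network. Experimental results show that the method effectively improves the recognition rate and also confirm that MFCC is more robust than LPCC.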
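The abstract above relies on first- and second-order differential cepstra but does not state which difference formula it uses. The sketch below assumes the widely used regression form d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) with edge frames replicated; the function names and the window size of 2 are illustrative assumptions, not details taken from the paper.

```python
def delta(frames, window=2):
    """Regression-based difference of a sequence of cepstral frames.

    frames: list of equal-length lists (one list of coefficients per frame).
    Frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result


def stack_with_deltas(frames, window=2):
    """Concatenate static, first-order, and second-order coefficients per frame."""
    first = delta(frames, window)
    second = delta(first, window)  # second order = difference of the difference
    return [s + d1 + d2 for s, d1, d2 in zip(frames, first, second)]
```

Applying `delta` twice gives the second-order difference, so a 12-coefficient static frame would become a 36-dimensional vector after `stack_with_deltas`.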

3.
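Linear prediction cepstral coefficients (LPCC) capture the characteristics of the human vocal tract well, while Mel-frequency cepstral coefficients (MFCC) model the auditory response of the human ear well. To address the inconsistent recognition accuracy of MFCC across frequency bands and the inability of LPCC to model the human auditory system accurately, MFCC and IMFCC parameters are used as the feature parameters for different frequency bands of speech and combined with LPCC, balancing the filter distribution so that the whole frequency range is covered. The Mel cepstral and linear prediction parameters are thus combined as the feature extraction parameters for speech recognition. Experimental results show that the improved algorithm achieves gains of varying degree in both efficiency and recognition rate.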

4.
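To address the heavy computation of feature extraction and the incompleteness of the feature parameters, a speech feature extraction method using principal component analysis (PCA) and K-means clustering is proposed. Based on an in-depth study of the extraction principles of linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), the parameters most commonly used in speaker recognition systems, and of the algorithms for extracting difference parameters, the combination of LPCC, MFCC, and their first-order difference parameters is chosen as the final hybrid feature. PCA first reduces the order of the feature parameters of each speech frame, K-means clustering then reduces the number of frames, and vector quantization (VQ) finally performs speaker recognition. Experimental results show that the method reduces computational complexity while also improving recognition accuracy.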

5.
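Chinese Whispered Speech Recognition Based on Improved LPCC and MFCC   Total citations: 5, self-citations: 0, citations by others: 5
To improve the recognition rate of whispered Chinese, a method is proposed that effectively combines several features, including MFCC, LPCC, and their respective dynamic parameters, for whisper recognition. Experimental results show that LPCC and MFCC combined with dynamic parameters can serve as feature parameters for Chinese whispered speech recognition, and that their combination improves the system's recognition rate, reaching 94.5% on a small vocabulary.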

6.
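杜晓青, 于凤芹. 《计算机工程》 (Computer Engineering), 2013, (11): 197-199, 204
Fusion algorithms combining Mel-frequency cepstral coefficients (MFCC) with linear prediction cepstral coefficients (LPCC) reflect only the static characteristics of speech, and LPCC describes the local low-frequency characteristics of speech inadequately. To address this, Hilbert-Huang transform (HHT) cepstral coefficients are fused with relative spectral perceptual linear prediction cepstral coefficients (RASTA-PLPCC), yielding a speaker recognition algorithm that reflects both the speech production mechanism and the perceptual characteristics of the human ear. HHT cepstral coefficients embody the production mechanism, reflect the dynamic characteristics of speech, and describe the local low-frequency features of the signal better, which remedies the shortcomings of LPCC. PLPCC embodies the perceptual characteristics of the human ear and outperforms MFCC in recognition. The two are fused using three fusion algorithms, and the fused features are fed to a Gaussian mixture model for speaker recognition. Simulation results show that the fusion algorithm raises the recognition rate by 8.0% over the existing MFCC and LPCC fusion algorithm.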

7.
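This paper describes the hardware and software design of a small-vocabulary speech recognition system. The system uses the DSP5416 as its hardware platform, nonlinear Mel-scale cepstral coefficients (MFCC) as the feature extraction algorithm, and dynamic time warping (DTW) as the recognition algorithm. Experimental results show an average recognition rate of at least 90%, indicating good recognition performance.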

8.
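This paper describes the hardware and software design of a small-vocabulary speech recognition system. The system uses the DSP5416 as its hardware platform, the nonlinear Mel-scale cepstral coefficient (MFCC) feature extraction algorithm, and dynamic time warping (DTW) as the recognition algorithm. Experimental results show an average recognition rate of at least 90%, indicating good recognition performance.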

9.
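Combining the speech feature parameters LPCC and MFCC produces data of excessively high dimension, which degrades recognizer performance. To address this, a genetic algorithm is proposed for reducing the dimension of the initial feature parameters and thereby improving recognition performance. LPCC and MFCC are first extracted from the speech signal, a genetic algorithm then performs feature dimensionality reduction, and the resulting low-dimensional data are fed to a support vector machine for recognition. Simulation results show that, compared with conventional PCA dimensionality reduction, the genetic algorithm raises the recognition rate by 12.2%; compared with the initial features, the recognition rate falls by 1.23%, but the recognition time improves by a factor of 4.5.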

10.
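A Fisher-Ratio-Based Hybrid Feature Extraction Method for Mel Cepstral Coefficients   Total citations: 1, self-citations: 0, citations by others: 1
In speech recognition, Mel-frequency cepstral coefficients (MFCC) have low recognition accuracy for mid- and high-frequency signals and do not account for how each feature dimension affects the recognition result. To address this, a feature extraction method is proposed that is based on MFCC, inverted Mel-frequency cepstral coefficients (IMFCC), and mid-frequency Mel-frequency cepstral coefficients (MidMFCC), combined with the Fisher criterion. MFCC, IMFCC, and MidMFCC are first extracted from the speech signal, the Fisher ratio of each dimension of the three feature sets is computed, and the dimensions are selected by Fisher ratio to form a hybrid feature that improves recognition accuracy for mid- and high-frequency speech information. Experimental results show that, under the same conditions, the new feature achieves a somewhat higher recognition rate than MFCC.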
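The selection step in the abstract above can be illustrated with a small sketch. It assumes a common definition of the per-dimension Fisher ratio (variance of the class means divided by the summed within-class variances); the paper's exact formula is not given in the abstract, and the names `fisher_ratios` and `select_top` are hypothetical.

```python
def fisher_ratios(classes):
    """Per-dimension Fisher ratio for labeled feature vectors.

    classes: dict mapping a class label to a list of feature vectors.
    Ratio = between-class variance of the class means divided by the
    sum of within-class variances (an assumed, common definition).
    """
    dim = len(next(iter(classes.values()))[0])
    ratios = []
    for k in range(dim):
        class_means = []
        within = 0.0
        for vectors in classes.values():
            values = [v[k] for v in vectors]
            mean = sum(values) / len(values)
            class_means.append(mean)
            within += sum((x - mean) ** 2 for x in values) / len(values)
        overall = sum(class_means) / len(class_means)
        between = sum((m - overall) ** 2 for m in class_means) / len(class_means)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def select_top(ratios, count):
    """Indices of the `count` dimensions with the largest Fisher ratio."""
    order = sorted(range(len(ratios)), key=lambda k: ratios[k], reverse=True)
    return sorted(order[:count])
```

To mimic the hybrid feature described above, one would concatenate the MFCC, IMFCC, and MidMFCC vectors per frame, compute `fisher_ratios` on the concatenation, and keep only the dimensions returned by `select_top`.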

11.
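Application of an Improved MFCC Feature Algorithm to Speech Recognition   Total citations: 2, self-citations: 0, citations by others: 2
This paper presents an improved algorithm for Mel-frequency cepstral features. The algorithm extracts the residual phase from the speech signal by linear prediction, combines the residual phase with conventional MFCC, and applies the result in a speech recognition system. The improved algorithm achieves a better recognition rate than the conventional MFCC algorithm.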

12.
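To improve the speaker recognition rate in noise, a directly cepstrum-weighted GMM (Gaussian mixture model) is proposed, in which each dimension's component of the GMM is weighted directly during recognition according to the differing discriminative power of each cepstral coefficient dimension. A new method for measuring the discriminative power of each feature dimension under noise is also studied. The method is combined with MMSE (minimum mean square error), and experiments on white noise and subway noise yield the optimal weighting window functions for the baseline system and the MMSE-enhanced system under different noise conditions. Experimental results show that the directly cepstrum-weighted GMM significantly improves recognition accuracy.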

13.
MVA Processing of Speech Features   总被引:1,自引:0,他引:1  
In this paper, we investigate a technique consisting of mean subtraction, variance normalization and time sequence filtering. Unlike other techniques, it applies auto-regression moving-average (ARMA) filtering directly in the cepstral domain. We call this technique mean subtraction, variance normalization, and ARMA filtering (MVA) post-processing, and speech features with MVA post-processing are called MVA features. Overall, compared to raw features without post-processing, MVA features achieve an error rate reduction of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and an average 57% error reduction on the Aurora 3.0 database. These improvements are comparable to the results of much more complicated techniques even though MVA is relatively simple and requires practically no additional computational cost. In this paper, in addition to describing MVA processing, we also present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is extensively investigated with respect to several variations: the configurations used to extract and the type of raw features, the domains where MVA is applied, the filters that are used, the ARMA filter orders, and the causality of the normalization process. Specifically, it is argued and demonstrated that MVA works better when applied to the zeroth-order cepstral coefficient than to log energy, that MVA works better in the cepstral domain, that an ARMA filter is better than either a designed finite impulse response filter or a data-driven filter, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of settings. We also investigate and evaluate a multi-domain MVA generalization
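The three MVA stages named in the abstract above can be sketched for a single feature dimension as follows. The abstract does not give the filter equation, so the ARMA form used here (the average of the previous `order` outputs and the current plus next `order` inputs) is an assumption, as are the function name `mva`, the default order, and the choice to leave edge frames unfiltered.

```python
import math


def mva(trajectory, order=2):
    """Mean subtraction, variance normalization, then ARMA smoothing.

    trajectory: values of one feature dimension over time, for one utterance.
    The filter form is assumed:
    y[t] = (y[t-order] + ... + y[t-1] + x[t] + ... + x[t+order]) / (2*order + 1)
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [x - mean for x in trajectory]
    std = math.sqrt(sum(x * x for x in centered) / n)
    # Guard against a constant trajectory, whose variance is zero.
    normalized = [x / std for x in centered] if std > 0 else centered
    filtered = list(normalized)  # edge frames stay unfiltered
    for t in range(order, n - order):
        past_outputs = sum(filtered[t - order:t])
        future_inputs = sum(normalized[t:t + order + 1])
        filtered[t] = (past_outputs + future_inputs) / (2 * order + 1)
    return filtered
```

Because the filter looks `order` frames ahead, this sketch is non-causal; the abstract notes that the causality of the normalization is one of the variations the authors study.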

14.
We present a new method that improves the accuracy of text-dependent speaker verification systems. The new method exploits a set of novel speech features derived from a principal component analysis of pitch-synchronous voiced speech segments. We use the term principal pitch components (PPCs) or optimal pitch bases (OPBs) to denote the new feature set. Utterance distances computed from these new PPC features are only loosely correlated with utterance distances computed from cepstral features. A distance measure that combines both cepstral and PPC features provides a discriminative power that cannot be achieved with cepstral features alone. By augmenting the feature space of a cepstral baseline system with PPC features, we achieve a significant reduction of the equal error probability of incorrect customer rejection versus incorrect impostor acceptance. The proposed method delivers robust performance in various noise conditions.

15.
Data-driven temporal filtering approaches based on a specific optimization technique have been shown to be capable of enhancing the discrimination and robustness of speech features in speech recognition. The filters in these approaches are often obtained with the statistics of the features in the temporal domain. In this paper, we derive new data-driven temporal filters that employ the statistics of the modulation spectra of the speech features. Three new temporal filtering approaches are proposed, based on constrained versions of linear discriminant analysis (LDA), principal component analysis (PCA), and minimum class distance (MCD), respectively. It is shown that these proposed temporal filters can effectively improve the speech recognition accuracy in various noise-corrupted environments. In experiments conducted on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, together with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 40% and 27% when compared with baseline Mel frequency cepstral coefficient (MFCC) processing and CMVN alone, respectively.

16.
Playback speech contains information from the environment and from the playback and recording devices used. This work proposes a novel normalization scheme, low frequency frame-wise normalization (LFFN), as one of the modules in the feature extraction process; it is hypothesized to help capture the artifacts in playback speech. It is based on low frequency bin processing that is performed frame-wise, hence its name. Constant-Q transform (CQT) based features are found to provide the benchmark results for detection of spoofing attacks. In this work, LFFN is combined with CQT to extract two new features from octave and linear power spectra, respectively. The first is obtained by CQT, LFFN and octave segmentation and is referred to as constant-Q normalization segmentation coefficients (CQNSC). The second uses the conventional constant-Q cepstral coefficient (CQCC) and LFFN to obtain constant-Q normalization cepstral coefficients (CQNCC). The studies are performed on the ASVspoof 2017 version 2.0 corpus, which is designed for studying playback speech detection. The experimental results show the effectiveness of the proposed LFFN with CQT based features. We obtain equal error rates of 10.63% and 10.31% for CQNSC and CQNCC features, respectively, on the evaluation set of the ASVspoof 2017 version 2.0 corpus.

17.
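An Improved Cepstral-Feature-Based Endpoint Detection Method for Noisy Speech   Total citations: 6, self-citations: 0, citations by others: 6
The accuracy of endpoint detection is a key factor in speech recognition performance. In practical applications the signal-to-noise ratio is low, so some detection algorithms that perform well at high signal-to-noise ratios fail to work effectively, which lowers the system's recognition rate. This paper proposes three improvements to cepstral-feature-based endpoint detection for noisy speech: (1) the speech signal is filtered into high- and low-frequency subbands, which are analyzed separately; (2) the LPC Mel cepstral feature LPCCMCC replaces the conventional cepstral feature as the feature parameter; (3) the noise estimate is improved so that it is adaptive. Experimental results show that the method detects endpoints well at low signal-to-noise ratios.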

18.
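Speaker Recognition Based on FMFCC and HMM   Total citations: 2, self-citations: 0, citations by others: 2
张永亮, 张先庭, 鲁宇明. 《计算机仿真》 (Computer Simulation), 2010, 27(5): 352-354, 358
Mel-frequency cepstral coefficients (MFCC) are a feature commonly used in speaker recognition, but speech is a non-stationary signal and MFCC does not reflect its time-frequency characteristics well. To remedy this and improve the speaker recognition rate, the fractional Fourier transform (FRFT), a newer time-frequency analysis tool, is used to generalize MFCC to a fractional form, giving fractional Mel-frequency cepstral coefficients (FMFCC) as the representation of the speech signal. A separability measure is used to verify the effectiveness of the feature. An FMFCC feature library is built for 20 different speakers, and a hidden Markov model (HMM) is used to recognize the speakers in simulation. Simulation results show that, with a suitable transform order, the average speaker recognition rate exceeds 93%.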

19.
This paper presents a new fingerprint recognition method based on mel-frequency cepstral coefficients (MFCCs). In this method, cepstral features are extracted from a group of fingerprint images, which are transformed first to 1-D signals by lexicographic ordering. MFCCs and polynomial shape coefficients are extracted from these 1-D signals or their transforms to generate a database of features, which can be used to train a neural network. The fingerprint recognition can be performed by extracting features from any new fingerprint image with the same method used in the training phase. These features are tested with the neural network. The different domains are tested and compared for efficient feature extraction from the lexicographically ordered 1-D signals. Experimental results show the success of the proposed cepstral method for fingerprint recognition at low as well as high signal to noise ratios (SNRs). Results also show that the discrete cosine transform (DCT) is the most appropriate domain for feature extraction.

20.
Automatic speaker verification (ASV) systems are highly vulnerable to spoofing attacks. Anti-spoofing, which determines whether a speech signal is natural/genuine or spoofed, is very important for improving the reliability of ASV systems. Spoofing attacks that use speech signals generated by speech synthesis and voice conversion have recently received great interest due to the 2015 edition of the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015). In this paper, we propose to use linear prediction (LP) residual based features for anti-spoofing. Three different features extracted from the LP residual signal were compared using the ASVspoof 2015 database. Experimental results indicate that LP residual phase cepstral coefficients (LPRPC) and LP residual Hilbert envelope cepstral coefficients (LPRHEC) obtained from the analytic signal of the LP residual yield promising results for anti-spoofing. The proposed features are found to outperform standard Mel-frequency cepstral coefficients (MFCC) and Cosine Phase (CosPhase) features. LPRPC and LPRHEC features give the smallest equal error rates (EER) for eight of the ten spoofing attacks in comparison to MFCC and CosPhase features.
