Similar Documents (18 results)
1.
林朗  王让定  严迪群  李璨 《计算机应用》2018,38(6):1648-1652
With the development of speech technology, spoofed speech of various kinds, typified by replayed recordings, poses a serious challenge to speaker verification systems and audio forensics. To counter replay attacks on speaker verification systems, a detection algorithm based on modified cepstral features is proposed. First, the coefficient of variation is used to analyze the differences between genuine and replayed speech in the frequency domain. Then, the Mel filterbank used in Mel-Frequency Cepstral Coefficient (MFCC) extraction is deliberately replaced with a new filterbank combining linear filters and inverse-Mel filters, yielding modified cepstral features based on this new filterbank. Finally, a Gaussian Mixture Model (GMM) serves as the classifier. Experimental results show that the modified cepstral features detect replayed speech effectively, with an equal error rate of about 3.45%.
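If it helps to see the shape of such a front end, here is a minimal NumPy/SciPy sketch of a cepstral extractor over a combined linear (low band) plus inverse-Mel (high band) triangular filterbank, followed by log energies and a DCT, broadly in the spirit of the modified feature described above. The filter counts, the 4 kHz band split, the frame parameters and the helper names are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(edges_hz, n_fft, sr):
    """Build triangular filters from a list of band-edge frequencies (Hz)."""
    bins = np.floor((n_fft + 1) * np.array(edges_hz) / sr).astype(int)
    fbank = np.zeros((len(edges_hz) - 2, n_fft // 2 + 1))
    for i in range(1, len(edges_hz) - 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def modified_cepstrum(frame, sr=16000, n_fft=512, n_low=12, n_high=12,
                      split_hz=4000, n_ceps=20):
    """Cepstral features from a linear (low band) + inverse-Mel (high band) filterbank."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Low band: edges spaced linearly from 0 to split_hz.
    low_edges = np.linspace(0, split_hz, n_low + 2)
    # High band: edges on an inverted Mel scale (dense near sr/2, sparse near split_hz).
    m = np.linspace(hz_to_mel(split_hz), hz_to_mel(sr / 2), n_high + 2)
    high_edges = (split_hz + sr / 2 - mel_to_hz(m))[::-1]
    fbank = np.vstack([triangular_filterbank(low_edges, n_fft, sr),
                       triangular_filterbank(high_edges, n_fft, sr)])
    energies = np.log(fbank @ spec + 1e-10)
    return dct(energies, type=2, norm='ortho')[:n_ceps]
```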

2.
胡峰松  曹孝玉 《计算机工程》2012,38(21):168-170,174
Mainstream speaker feature parameters are currently not very robust in noisy environments. To address this, an auditory cepstral coefficient feature for speaker recognition is proposed. The working mechanism of the human auditory system is analyzed: a Gammatone filterbank replaces the traditional triangular filterbank to model the cochlea, and exponential compression replaces the fixed logarithmic compression to model the nonlinear way the auditory system processes signals. Simulation experiments with a recognition algorithm based on a Gaussian mixture model classifier show that this auditory feature is more noise-robust than Mel-frequency cepstral coefficients and linear prediction cepstral coefficients.
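A minimal sketch of a gammatone-filterbank cepstral front end with exponential (power-law) compression in place of the usual logarithm, along the lines described above. It assumes SciPy >= 1.6 for scipy.signal.gammatone; the ERB-spaced center frequencies, the compression exponent and the number of coefficients are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fftpack import dct

def erb_space(f_low, f_high, n):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    ear_q, min_bw = 9.26449, 24.7
    i = np.arange(1, n + 1)
    return -(ear_q * min_bw) + np.exp(
        i * (np.log(f_low + ear_q * min_bw) - np.log(f_high + ear_q * min_bw)) / n
    ) * (f_high + ear_q * min_bw)

def gammatone_cepstrum(x, sr=16000, n_filters=32, n_ceps=13, alpha=0.1):
    """Auditory cepstral coefficients: gammatone filterbank energies,
    power-law compression x**alpha instead of log, then a DCT."""
    centers = erb_space(50.0, sr / 2 * 0.9, n_filters)
    energies = np.empty(n_filters)
    for i, fc in enumerate(centers):
        # 4th-order IIR gammatone filter centered at fc (SciPy >= 1.6).
        b, a = gammatone(fc, 'iir', fs=sr)
        y = lfilter(b, a, x)
        energies[i] = np.mean(y ** 2)
    compressed = (energies + 1e-12) ** alpha   # exponential compression, not log
    return dct(compressed, type=2, norm='ortho')[:n_ceps]
```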

3.
杜晓青  于凤芹 《计算机工程》2013,(11):197-199,204
The fusion of Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) reflects only the static characteristics of speech, and LPCC describes local low-frequency characteristics poorly. To address this, a speaker recognition algorithm is proposed that fuses Hilbert-Huang Transform (HHT) cepstral coefficients with RASTA perceptual linear prediction cepstral coefficients (RASTA-PLPCC), thereby reflecting both the speech production mechanism and human auditory perception. The HHT cepstral coefficients capture the production mechanism, reflect the dynamic characteristics of speech and describe local low-frequency features better, remedying the shortcomings of LPCC. PLPCC reflects auditory perception and outperforms MFCC. Three fusion schemes are used to combine the two feature sets, and the fused features are fed to a Gaussian mixture model for speaker recognition. Simulation results show that the proposed fusion raises the recognition rate by 8.0% compared with the existing MFCC-LPCC fusion.

4.
Starting from the characteristics of human hearing, a Gammatone filterbank model based on the human cochlear auditory model is presented, and a new algorithm for extracting Gammatone frequency cepstral coefficients is proposed and evaluated by simulation in a Gaussian mixture model recognition system. The experimental results show that applying the Gammatone frequency cepstral coefficients extracted with this algorithm to a speaker recognition system clearly improves both the recognition rate and the robustness of the system.

5.
To improve speaker recognition accuracy, several feature parameters can be used together. Because the individual components of such a combined feature vector may affect the recognition result to different degrees, treating them equally is not necessarily optimal. A mixed feature extraction method based on the Fisher criterion is therefore proposed, combining Mel-frequency cepstral coefficients (MFCC), linear prediction Mel cepstral coefficients (LPMFCC) and Teager energy operator cepstral coefficients (TEOCC). First, the MFCC, LPMFCC and TEOCC parameters are extracted from the speech signal. Then the Fisher ratio of each component of the MFCC and LPMFCC parameters is computed, and the six components with the highest Fisher ratios from each are selected and combined with the TEOCC parameters to form the mixed feature vector. Finally, speaker recognition experiments are run on the TIMIT speech corpus and the NOISEX-92 noise database. The simulations show that, with a Gaussian mixture model (GMM) and a BP neural network, the average recognition rate of the proposed method on clean speech is 21.65, 18.39, 15.61, 15.01 and 22.70 percentage points higher than that of MFCC, LPMFCC, MFCC+LPMFCC, the Fisher-ratio-based mixed Mel cepstral feature extraction method and the principal component analysis (PCA) based feature extraction method respectively; under 30 dB noise the gains are 15.15, 10.81, 8.69, 7.64 and 17.76 percentage points respectively. The results show that the mixed feature parameters effectively raise the speaker recognition rate and are more robust.
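A short sketch of the Fisher-ratio selection step described above: compute, per feature dimension, the ratio of between-speaker to within-speaker variance and keep the highest-scoring components (six per feature set in the abstract). The MFCC/LPMFCC/TEOCC extractors themselves are outside the sketch, and the variable names are assumptions.

```python
import numpy as np

def fisher_ratio(features, labels):
    """Per-dimension Fisher ratio: between-class variance / within-class variance.

    features: (n_frames, n_dims) array; labels: (n_frames,) speaker ids.
    """
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        xc = features[labels == c]
        between += len(xc) * (xc.mean(axis=0) - overall_mean) ** 2
        within += ((xc - xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_top_components(features, labels, k=6):
    """Keep the k dimensions with the highest Fisher ratio (k=6 per feature set in the abstract)."""
    idx = np.argsort(fisher_ratio(features, labels))[::-1][:k]
    return features[:, idx], idx

# Usage sketch: select 6 MFCC and 6 LPMFCC dimensions, then append TEOCC unchanged.
# mixed = np.hstack([select_top_components(mfcc, y)[0],
#                    select_top_components(lpmfcc, y)[0], teocc])
```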

6.
The extraction of dynamic speech feature parameters is studied. In speaker recognition, dynamic features can effectively raise the recognition rate, but traditional extraction algorithms introduce a large amount of redundant and interfering information, which lowers both the recognition rate and the processing speed. To remove these side effects, a dynamic time-frequency cepstral coefficient feature is proposed for speaker recognition systems. Without weakening the representation of speaker-specific distribution characteristics, the method eliminates redundant information and reduces the dimensionality of the sample features. The extracted feature parameters are fed to a Gaussian mixture model-universal background model for speaker classification. Matlab simulations show that dynamic time-frequency cepstral coefficients effectively improve the recognition accuracy of speaker recognition systems.

7.
Gender recognition from speech is performed with Gaussian mixture models (GMM). The feature extraction, the recognition method and the gender recognition procedure are first outlined. Recognition speed is then increased by reducing the number of speech frames used for feature extraction and by lowering the number of Gaussian mixture components. Finally, a new way of computing the posterior probability of a test sample is proposed by combining the posterior probabilities obtained separately from Mel-frequency cepstral coefficient (MFCC) features and from pitch features. Experiments show that this combined posterior probability effectively improves recognition accuracy.
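A minimal scikit-learn sketch of the kind of two-stream GMM gender classifier described above: one GMM per gender is trained for each feature stream (MFCC and pitch), and the two streams are combined at score level. A weighted sum of average frame log-likelihoods is used here as one plausible reading of "combining the posterior probabilities"; the weight and mixture sizes are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gender_gmms(feats_by_gender, n_components=8):
    """feats_by_gender: {'male': (n, d) array, 'female': (n, d) array} for one feature stream."""
    return {g: GaussianMixture(n_components=n_components, covariance_type='diag',
                               random_state=0).fit(x)
            for g, x in feats_by_gender.items()}

def classify_gender(mfcc_test, pitch_test, mfcc_gmms, pitch_gmms, w=0.7):
    """Combine the two streams: weighted sum of average frame log-likelihoods per gender."""
    scores = {}
    for g in mfcc_gmms:
        ll_mfcc = mfcc_gmms[g].score_samples(mfcc_test).mean()
        ll_pitch = pitch_gmms[g].score_samples(pitch_test).mean()
        scores[g] = w * ll_mfcc + (1.0 - w) * ll_pitch
    return max(scores, key=scores.get), scores
```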

8.
Formant frequencies are estimated with a piecewise linear prediction algorithm: a multi-channel filterbank divides the speech spectrum into frequency bands, suitable inverse filters are chosen to approximate the short-time spectrum of each band, and the formant frequencies are then estimated from those inverse filters. Experimental results show that, compared with conventional methods, this approach improves the resolution and accuracy of formant frequency estimation and is less affected by noise.
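The piecewise, filterbank-driven inverse-filter scheme of the abstract is not reproduced here; as a point of reference, this is a minimal sketch of the conventional LPC-root formant estimator that such methods are usually compared against. The pre-emphasis coefficient, LPC order and bandwidth threshold are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(frame, order):
    """Autocorrelation-method LPC: solve the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))          # inverse filter A(z) = 1 - sum a_k z^{-k}

def formants(frame, sr=16000, order=12, max_bw=400.0):
    """Estimate formant frequencies from the roots of the LPC inverse filter."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_coeffs(frame, order))
    roots = roots[np.imag(roots) > 0]            # keep one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -sr / np.pi * np.log(np.abs(roots))    # pole radius -> formant bandwidth
    return np.sort(freqs[(freqs > 90) & (bws < max_bw)])
```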

9.
To identify target materials more efficiently from the acoustic signals reflected by a high-pressure water jet, four targets commonly encountered in landmine detection (mines, stones, bricks and wood blocks) are recognized with different feature extraction methods. Building on the principles of Mel-frequency cepstral coefficients and wavelet packet transform cepstral coefficients, and on the characteristics of the reflected acoustic signals, a feature extraction method that fuses the two is proposed: the wavelet packet transform divides the original reflected signal into sub-bands, one of which is chosen as the boundary between the low- and high-frequency parts; Mel-frequency cepstral coefficients are extracted as features from the low-frequency part, wavelet packet cepstral coefficients from the high-frequency part, and the two feature sets are linearly concatenated into a new feature vector for material identification. A least-squares support vector machine multi-class model is built to compare the recognition rates of the single-feature and fused-feature methods. The experiments show that, at the best low/high-frequency split, the average recognition rate of the fused method reaches 82.8125%, which is 10.3125% and 7.8125% higher than using only Mel-frequency cepstral coefficients or only wavelet packet cepstral coefficients as the feature vector.
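A small PyWavelets sketch of the band-splitting step described above: decompose the signal into frequency-ordered wavelet packet sub-bands, reconstruct a low-frequency part from the bands below a chosen split node and a high-frequency part from the rest, then extract one feature set from each part and concatenate them. The wavelet, decomposition level and split index are assumptions, and mfcc() / wpt_cepstrum() stand in for the two extractors named in the abstract (they are not defined here).

```python
import numpy as np
import pywt

def split_low_high(x, wavelet='db4', level=3, split_index=2):
    """Split a signal into low- and high-frequency parts with a wavelet packet tree.

    Sub-bands at `level` are ordered by frequency; bands [0, split_index) form the
    low part and the remaining bands form the high part, each reconstructed in time.
    """
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode='symmetric', maxlevel=level)
    nodes = wp.get_level(level, order='freq')
    paths = [n.path for n in nodes]

    def reconstruct(selected_paths):
        part = pywt.WaveletPacket(data=None, wavelet=wavelet, mode='symmetric',
                                  maxlevel=level)
        for node in nodes:
            if node.path in selected_paths:
                part[node.path] = node.data      # copy only the selected sub-bands
        return part.reconstruct(update=False)[:len(x)]

    low = reconstruct(set(paths[:split_index]))
    high = reconstruct(set(paths[split_index:]))
    return low, high

# Usage sketch: fused = np.hstack([mfcc(low), wpt_cepstrum(high)])
```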

10.
In the presence of external noise, the Mel-frequency cepstral coefficient features of a voiceprint are easily distorted, which lowers recognition accuracy. To improve accuracy, a support vector machine based algorithm for removing interference from Mel-frequency cepstral coefficient features is proposed. When determining the classification decision function, the relationship between the cepstral coefficients, the voiceprint centroid and the noise is taken into account; the voiceprint features are embedded in the kernel function, the original samples are mapped by a nonlinear transformation into a high-dimensional feature space, and the optimal (or generalized optimal) separating hyperplane is sought there to remove the interference from the speech features. Experiments show that the improved algorithm optimizes the zero-crossing rate, the cepstral features, and the short-time energy features computed with rectangular and Hamming windows of the voiceprint.

11.
A mixed Mel cepstral coefficient feature extraction method based on the Fisher ratio
Mel-frequency cepstral coefficients (MFCC) offer limited recognition accuracy for the mid- and high-frequency parts of speech, and conventional schemes do not consider how each feature dimension affects the recognition result. A feature extraction method is therefore proposed that combines MFCC, inverted Mel cepstral coefficients (IMFCC) and mid-frequency Mel cepstral coefficients (MidMFCC) under the Fisher criterion. The three feature sets are first extracted from the speech signal, the Fisher ratio of each component of each set is computed, and components are selected by their Fisher ratios to form a mixed feature vector that improves the recognition accuracy of mid- and high-frequency speech information. Experimental results show that, under the same conditions, the new features achieve a somewhat higher recognition rate than plain MFCC.

12.
MFCC, which is based on human auditory perception, is more noise-robust and gives higher recognition rates than other speaker features. However, the Mel filterbank has high resolution only in the low-frequency region and low resolution at high frequencies, so important information contained in the high-frequency region is inevitably lost. Exploiting the complementary strengths of MFCC and of R-MFCC, its counterpart in the inverted Mel domain, the two are combined, and the Fisher criterion is introduced to measure the discriminative power of each feature parameter; a new mixed feature is then constructed on the basis of the Fisher criterion. Speaker recognition experiments with a support vector machine using MFCC, R-MFCC and the newly constructed mixed feature show that the Fisher-criterion-selected mixed feature is a feasible choice for speaker recognition.

13.
To address effective feature extraction and the influence of noise in speaker recognition systems, a new speech feature is proposed: S-transform based Mel cepstral coefficients (SMFCC). Starting from conventional Mel-frequency cepstral coefficients (MFCC), the method exploits the two-dimensional time-frequency multi-resolution property of the S-transform and the effective denoising of the two-dimensional time-frequency matrix by singular value decomposition (SVD), combined with statistical analysis, to obtain the final features. Comparative experiments on the TIMIT speech database show that the equal error rate (EER) and minimum detection cost (MinDCF) of the SMFCC features are lower than those of linear prediction cepstral coefficients (LPCC), MFCC and their combination LMFCC; compared with MFCC, the EER and MinDCF08 drop by 3.6% and 17.9% respectively. The results show that the method effectively removes noise from the speech signal and improves local resolution.
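The S-transform itself is not reconstructed here; the sketch below shows only the SVD-based denoising idea the abstract relies on, applied to a plain STFT magnitude matrix as a stand-in time-frequency representation: keep the largest singular values and rebuild the matrix. The retained rank and the STFT parameters are assumptions.

```python
import numpy as np
from scipy.signal import stft

def svd_denoise_tf(x, sr=16000, rank=8, nperseg=256):
    """Low-rank (SVD) denoising of a time-frequency magnitude matrix.

    The abstract applies this idea to the S-transform matrix; here a plain STFT
    magnitude serves as the time-frequency representation.
    """
    f, t, Z = stft(x, fs=sr, nperseg=nperseg)
    mag = np.abs(Z)
    U, s, Vt = np.linalg.svd(mag, full_matrices=False)
    s[rank:] = 0.0                         # discard the small singular values (mostly noise)
    denoised = U @ np.diag(s) @ Vt
    return f, t, denoised
```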

14.
The variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, such as Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine their relative effectiveness in representing stressed speech. Different statistical feature evaluation techniques, such as probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four stressed conditions (Neutral, Compassionate, Anger and Happy) are tested. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. With the VQ classifier, SAF gives the best recognition result, followed by SFF, MFCC and CC. The relative classification results and the relative magnitudes of the F-ratio values for the SFF, MFCC and CC features follow the same order. The SFF and MFCC features show consistent relative performance across all three tests: F-ratio, KS test and VQ classifier.
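A minimal sketch of the VQ-classifier evaluation mentioned above: train one k-means codebook per stress class and assign a test utterance to the class whose codebook yields the lowest average quantization distortion. The codebook size is an assumption.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebooks(feats_by_class, codebook_size=32):
    """feats_by_class: {'Neutral': (n, d) array, 'Anger': ..., ...} of training frames."""
    return {c: kmeans(x.astype(float), codebook_size)[0] for c, x in feats_by_class.items()}

def classify_vq(test_frames, codebooks):
    """Assign the utterance to the class with the smallest mean quantization distortion."""
    distortions = {c: vq(test_frames.astype(float), cb)[1].mean()
                   for c, cb in codebooks.items()}
    return min(distortions, key=distortions.get), distortions
```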

15.
Speech coding enables speech compression without perceptual loss, but it can eliminate or degrade the speech- and speaker-specific features used in a wide range of applications such as automatic speech and speaker recognition, biometric authentication and prosody evaluation. The present work investigates the effect of speech coding on the quality of features extracted from codec-compressed speech, including Mel Frequency Cepstral Coefficients, Gammatone Frequency Cepstral Coefficients, Power-Normalized Cepstral Coefficients, Perceptual Linear Prediction Cepstral Coefficients, Rasta-Perceptual Linear Prediction Cepstral Coefficients, Residue Cepstrum Coefficients and Linear Predictive Coding-derived cepstral coefficients. The codecs selected for this study are G.711, G.729, G.722.2, Enhanced Voice Services, Mixed Excitation Linear Prediction, and three codecs based on a compressive sensing framework. The analysis also covers how the quality of the extracted features varies with the bit rates supported by Enhanced Voice Services, G.722.2 and the compressive sensing codecs. The quality of the epochs, fundamental frequency and formants estimated from codec-compressed speech is analyzed as well. Among the features extracted from the outputs of the selected codecs, the variation introduced by the Mixed Excitation Linear Prediction codec is the smallest, owing to its unique representation of the excitation. For the compressive sensing based codecs, the quality of the extracted features improves drastically as the bit rate increases, due to the waveform-type coding they employ. For the popular Code Excited Linear Prediction codec, based on the analysis-by-synthesis coding paradigm, the impact of the Linear Predictive Coding order on feature extraction is investigated: the quality of the extracted features improves with the prediction order, and optimum performance is obtained for Linear Predictive Coding orders between 20 and 30, varying with gender and the statistical characteristics of the speech. Although the basic purpose of a codec is to compress a single voice source, codec performance in a multi-speaker environment, the most common setting in speech processing applications, is also studied. A two-speaker environment is considered, and the quality of the individual speech signals improves as the diversity of the mixtures passed through the codecs increases. The perceptual quality of the individual speech signals extracted from the codec-compressed speech is almost the same for the Mixed Excitation Linear Prediction and Enhanced Voice Services codecs, but in terms of feature preservation the Mixed Excitation Linear Prediction codec outperforms the Enhanced Voice Services codec.

16.
屈百达  李金宝  徐宝国 《计算机应用》2007,27(10):2547-2548
Extracting robust feature parameters is one of the core problems in speech recognition under noisy conditions. A two-dimensional root cepstral feature is first proposed and then combined with features based on minimum variance distortionless response spectrum estimation (PMCC), yielding a novel robust feature that can be used successfully for continuous speech recognition at different signal-to-noise ratios. Experimental results show that, under various noise conditions and signal-to-noise ratios, the two-dimensional PMCC robust feature achieves better recognition rates than conventional Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).

17.
Application of Laguerre filters to feature extraction for noise-robust speech recognition
To overcome the disadvantages that FIR filters bring to speech recognition systems, such as poor passband/stopband characteristics and high filter order, a Laguerre filterbank is used in place of the FIR filterbank for the front-end processing in zero-crossing-with-peak-amplitude feature extraction. After a careful study of how the FIR filter parameters are determined, the principle of the Laguerre filter and the computation of its parameters are described, and the computed results are given. Isolated-word, speaker-independent recognition experiments show that the Laguerre filterbank not only makes the recognition system more noise-robust than the FIR filterbank but also greatly reduces the filter order.

18.

Speaker verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients, their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients, Perceptual Linear Predictive, and Relative Spectral Transform Perceptual Linear Predictive. In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional support vector machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: linear kernel and Gaussian radial basis function kernel with a logistic regression classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.
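A compact scikit-learn sketch of the parallel classifier combination described in the abstract: a linear-kernel SVM, an RBF-kernel SVM and a logistic regression combined by soft voting (one of several voting rules the paper considers). The feature matrix is assumed to be the fused acoustic features described above; the hyperparameters are illustrative.

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def build_sv_ensemble():
    """Parallel combination of two SVMs (linear and RBF kernels) and a logistic regression."""
    linear_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', probability=True))
    rbf_svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='scale', probability=True))
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return VotingClassifier(
        estimators=[('svm_lin', linear_svm), ('svm_rbf', rbf_svm), ('logreg', logreg)],
        voting='soft')   # soft voting averages the posterior estimates of the three models

# Usage sketch: X is the (n_utterances, n_dims) fused feature matrix, y in {target, impostor}.
# model = build_sv_ensemble(); model.fit(X_train, y_train); p = model.predict_proba(X_test)
```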

