Similar literature
19 similar documents found
1.
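In feature extraction for speech recognition, the traditional autocorrelation method computes feature parameters with poor real-time performance. To overcome this, an adaptive least-mean-square (LMS) algorithm is proposed for extracting frequency-warped linear prediction coefficients (WLPC). Using adaptive LMS, the method both extracts feature parameters that match the characteristics of human hearing and extracts the WLPC coefficients in real time. Experiments with the dynamic time warping (DTW) algorithm compared autocorrelation-based WLPC and adaptive WLPC features in terms of prediction error and isolated-word recognition rate. The results show that the algorithm achieves high classification accuracy and good time performance.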
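
The sample-by-sample update behind adaptive LMS linear prediction can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it uses plain unit delays and omits the frequency warping of WLPC, and the function name, the filter order, and the step size `mu` are assumptions chosen for the example.

```python
import math

def lms_linear_prediction(signal, order=2, mu=0.1):
    """Adapt linear-prediction weights sample by sample with the LMS rule.

    Hypothetical helper: plain (unwarped) linear prediction only.
    """
    weights = [0.0] * order
    errors = []
    for n in range(order, len(signal)):
        # Most recent sample first: x[n-1], x[n-2], ...
        past = [signal[n - k] for k in range(1, order + 1)]
        prediction = sum(w * x for w, x in zip(weights, past))
        error = signal[n] - prediction
        # LMS update: step along the instantaneous error gradient.
        weights = [w + mu * error * x for w, x in zip(weights, past)]
        errors.append(error)
    return weights, errors

# A pure sinusoid satisfies x[n] = 2*cos(w0)*x[n-1] - x[n-2],
# so the adapted weights should drift toward [2*cos(w0), -1].
w0 = 0.3
signal = [math.sin(w0 * n) for n in range(8000)]
weights, errors = lms_linear_prediction(signal)
```

Because each update touches only the current sample and the last `order` samples, the cost per sample is constant, which is the property that makes an adaptive scheme attractive for real-time extraction.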

2.
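When recognizing targets with similar shape contours, global features rarely yield effective discrimination. To address this, a feature extraction method based on the Contourlet transform and kernel principal component analysis plus Fisher linear discriminant (KPCA+FLD) is proposed. The multi-scale local features extracted after Contourlet decomposition are fused by weighted summation; KFD (KPCA+FLD) then reduces the dimensionality of the fused features and selects the most discriminative ones. Finally, a series of simulation experiments covering different feature extraction methods, decomposition levels, kernel functions, and fusion weights verifies the effectiveness of the method.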

3.
Ye Jixiang, Pang Huan. Computer Engineering and Applications (《计算机工程与应用》), 2012, 48(11): 214-217, 223
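Affective computing for speech has attracted wide attention in China and abroad, with extensive research on speech emotion feature extraction in particular. Emotional speech is processed with empirical mode decomposition (EMD) to obtain its first four intrinsic mode functions (IMFs), and the Hilbert transform is applied to each of the four IMFs to obtain its instantaneous frequency and instantaneous amplitude. Statistical features of these quantities are extracted and combined with acoustic features of the emotional speech to form an emotion feature vector, which is then normalized. A support vector machine (SVM) is used to recognize four emotions: anger, happiness, sadness, and calm. Experimental results show that the method recognizes them well.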

4.
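A noise-robust speech recognition method based on MVDR and CCBC   Cited by: 1 (self-citations: 0, citations by others: 1)
A method suited to noise-robust speech recognition is proposed. Its feature extraction is based on minimum variance distortionless response (MVDR) spectral estimation, with frequency warping applied to the features to improve their perceptual resolution. Finally, a spectral transformation compensation method based on canonical correlation analysis (Canonical correlation based on compensation, CCBC) adapts the features, improving system robustness. In comparison experiments against a system based on traditional Mel-frequency cepstral coefficient (MFCC) features under exhibition-hall noise, crowd noise, and car noise, the recognition rate of the system using the proposed method improved significantly.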

5.
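To address the insufficient recognition performance of traditional speech feature parameters in complex environments, a speech feature extraction method based on Gammatone filters and sub-band power normalization is proposed. Building on the power-normalized cepstral coefficients (PNCC) algorithm, the method introduces a smoothed amplitude envelope and a normalized Gammatone filter bank in the front end, suppresses real-world background noise through sub-band power normalization, and finally applies feature warping and channel compensation in the back end. Experiments use a Gaussian mixture model-universal background model (GMM-UBM) classifier to compare the algorithm with other feature parameters. The results show that, across multiple noise environments, the proposed method is more noise-robust than the other feature parameters and still recognizes well at low signal-to-noise ratios.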

6.
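A two-directional two-dimensional weighted locality preserving projections algorithm ((2D)²WLPP) is proposed for reducing the dimensionality of extracted speech features. Ordinary two-dimensional dimensionality reduction algorithms reduce features along only one direction, and the choice of target dimension is very restricted. The proposed method reduces the speech matrix along both the horizontal and vertical directions, greatly reducing the number of extracted speech features. Because different projection vectors differ in how much they matter for preserving local structure, each feature is assigned a different weight coefficient. Experiments show that the algorithm runs fast and achieves a higher recognition rate than existing two-dimensional locality preserving projections.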

7.
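An effective content-based audio feature extraction method   Cited by: 1 (self-citations: 1, citations by others: 0)
Audio feature extraction is the foundation of audio classification, and good features effectively improve classification accuracy. While the frequency-domain Mel-frequency cepstral coefficients (MFCC) are extracted, a discrete wavelet transform is applied to each frame to extract wavelet-domain features, and statistical features are computed from the combined frequency-domain and wavelet-domain features. Audio templates are built with an SVM model to classify pure speech, music, and speech with background music, achieving high recognition accuracy.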
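
The per-frame wavelet step can be sketched as below. The abstract does not name the wavelet or the statistics used, so the Haar wavelet, the two decomposition levels, and the band-energy feature are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def haar_dwt(frame):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.

    The frame length is assumed to be even.
    """
    s = math.sqrt(2.0)
    approx, detail = [], []
    for a, b in zip(frame[0::2], frame[1::2]):
        approx.append((a + b) / s)
        detail.append((a - b) / s)
    return approx, detail

def wavelet_band_energies(frame, levels=2):
    """Energy of each detail band, followed by the final approximation band."""
    energies = []
    current = list(frame)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(c * c for c in current))
    return energies

frame = [4.0, 2.0, 5.0, 5.0]
energies = wavelet_band_energies(frame)
```

Since the Haar transform is orthonormal, the band energies sum to the energy of the frame, so they split the frame energy across scales and could be concatenated with frequency-domain features before classification.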

8.
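The traditional von Neumann architecture is energy-inefficient when processing complex information such as speech; neuromorphic circuits are better suited to intelligent processing of such information. The long-term and short-term features used in common acoustic scene recognition approaches each have shortcomings, whereas a convolutional neural network can learn, through training, features suited to the subsequent classification task, which gives it an advantage in feature extraction. This work studies acoustic scene recognition on spectrograms using the feature extraction and analysis method of a four-layer convolutional neural network, and verifies that acoustic scene recognition can be implemented on a neuromorphic circuit, a brain-inspired computing chip.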

9.
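Point-line visual odometry in simultaneous localization and mapping (SLAM) runs inefficiently when the texture of the environment changes. To address this, an adaptive feature extraction optimizer based on environmental information entropy is designed to improve the efficiency and robustness of the original point-line visual odometry algorithm. The optimizer uses image information entropy as the main influencing factor to determine the optimal features for the odometry to extract, and generates a strategy information map that records the feature extraction choices. For unexplored regions, it predicts the texture environment and matches the prediction quickly against the strategy map to obtain the optimal feature extraction strategy for that region. The average processing time and mapping quality of the point-line visual odometry with the optimizer (APL-VO) were tested on the TUM dataset. Experimental results show that, compared with the original algorithm, point-line visual odometry with the adaptive optimizer is more robust and maps more efficiently in composite environments.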
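
Image information entropy, the quantity the optimizer relies on, can be computed from the grey-level histogram as sketched below. The entropy function is the standard Shannon entropy; the selection rule `choose_features` and its threshold are purely hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a 2-D image."""
    flat = [value for row in pixels for value in row]
    total = len(flat)
    counts = Counter(flat)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def choose_features(pixels, threshold=1.0):
    """Hypothetical rule: texture-rich images use points, flat ones use lines."""
    return "points" if image_entropy(pixels) >= threshold else "lines"

flat_wall = [[7, 7], [7, 7]]        # a single grey level
textured = [[0, 1], [2, 3]]         # four equally likely grey levels
```

A strategy map in this spirit would store the chosen feature type per region so that later frames can look the decision up instead of recomputing it.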

10.
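Detecting solitary pulmonary nodules is key to the early diagnosis of lung cancer. Traditional dot-enhancement filters are highly sensitive for nodule enhancement but produce many false-positive regions. To address this, a method is proposed that identifies pulmonary nodules by computing a 3-D enhanced density index and applying discrimination rules. First, an adaptive bilateral filter denoises and smooths the CT image sequence. Next, the corresponding Hessian matrix and its eigenvalues are computed to obtain pre-enhancement coefficients and the volumes of interest, and the 3-D enhanced density index is constructed by analyzing the pre-enhancement coefficients. Finally, discrimination rules are applied to classify the volumes of interest. The method was tested on two lung CT image datasets, and the results show that it is effective at identifying solitary pulmonary nodules.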

11.
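Text-independent speaker recognition based on KL-wavelet packet analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
A new text-independent speaker recognition method is proposed. It uses the KL transform to normalize and denoise the speech, then uses wavelet packet decomposition coefficients to select frequency bands flexibly, improving time-frequency resolution and better extracting speaker characteristics. Experiments show that the method not only achieves a high recognition rate but also maintains stable performance in noisy environments.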

12.
Online signature verification using a new extreme points warping technique   Cited by: 2 (self-citations: 0, citations by others: 2)
There are two common methodologies to verify signatures: the functional approach and the parametric approach. In this paper, we propose a new warping technique for the functional approach in signature verification. The commonly used warping technique is dynamic time warping (DTW). It was originally used in speech recognition and has been applied to signature verification with some success for the past two decades. The new warping technique we propose is named extreme points warping (EPW). It proves to be more adaptive in signature verification than DTW, given the presence of forgeries. Instead of warping the whole signal as DTW does, EPW warps a set of selected important points. With the use of EPW, the equal error rate is improved by a factor of 1.3 and the computation time is reduced by a factor of 11.
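
For reference, the baseline DTW that EPW is compared against can be sketched with the classic dynamic-programming recurrence. This is a textbook version for one-dimensional sequences with an absolute-difference local cost; it is not the EPW technique, and the function name is an assumption.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The table has `len(a) * len(b)` cells because every sample of the signal takes part in the alignment, which is the cost that warping only a small set of selected points is meant to avoid.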

13.
Automatic recognition of children’s speech using acoustic models trained on adult speech results in poor performance due to differences in speech acoustics. These acoustical differences are a consequence of children having shorter vocal tracts and smaller vocal cords than adults. Hence, speaker adaptation needs to be performed. However, in real-world applications, the amount of adaptation data available may be less than what is needed by common speaker adaptation techniques to yield reasonable performance. In this paper, we first study, in the discrete frequency domain, the relationship between frequency warping in the front-end and corresponding transformations in the back-end. Three common feature extraction schemes are investigated and the linearity of their back-end transformations is discussed. In particular, we show that under certain approximations, frequency warping of MFCC features with Mel-warped triangular filter banks equals a linear transformation in the cepstral space. Based on that linear transformation, a formant-like peak alignment algorithm is proposed to adapt adult acoustic models to children’s speech. The peaks are estimated by Gaussian mixtures using the Expectation-Maximization (EM) algorithm [Zolfaghari, P., Robinson, T., 1996. Formant analysis using mixtures of Gaussians, Proceedings of International Conference on Spoken Language Processing, 1229–1232]. For limited adaptation data, the algorithm outperforms traditional vocal tract length normalization (VTLN) and maximum likelihood linear regression (MLLR) techniques.

14.
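Building on an analysis of the limitations of speech endpoint detection based on short-time energy, this paper introduces a short-time signal-to-noise ratio (SNR) estimation method, designs an adaptive decision threshold, and proposes an adaptive speech endpoint detection algorithm. Simulation experiments on noisy speech in stationary white Gaussian noise, with SNRs from -10 dB to 20 dB, show that the proposed method detects speech endpoints more accurately.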
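
One way to combine short-time energy, a per-frame SNR estimate, and an adaptive noise floor is sketched below. The abstract does not give the paper's estimator or threshold rule, so every parameter here (frame length, number of leading noise frames, the 6 dB margin, the smoothing factor `alpha`) and the function name are assumptions for illustration.

```python
import math

def detect_endpoints(samples, frame_len=80, noise_frames=5,
                     snr_db=6.0, alpha=0.9):
    """Return (first, last) speech frame indices, or None if none is found."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    # Assumption: the first few frames contain background noise only.
    noise = sum(energies[:noise_frames]) / noise_frames + 1e-12
    speech = []
    for idx, energy in enumerate(energies):
        frame_snr = 10.0 * math.log10(energy / noise + 1e-12)
        if frame_snr > snr_db:
            speech.append(idx)
        else:
            # Track the noise floor so the decision threshold adapts.
            noise = alpha * noise + (1.0 - alpha) * energy
    return (speech[0], speech[-1]) if speech else None

# Synthetic check: a weak tone as "noise", a strong tone in samples 800..1599.
samples = [0.01 * math.sin(1.7 * n)
           + (math.sin(0.2 * n) if 800 <= n < 1600 else 0.0)
           for n in range(2400)]
endpoints = detect_endpoints(samples)
```

Because the threshold is expressed relative to the tracked noise energy rather than as a fixed energy level, the same margin applies whether the background is quiet or loud.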

15.
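In the voiceprint password task, data sparsity makes discriminative training difficult. Based on a feature vector that represents distance measures, this paper proposes a new discriminative system framework for voiceprint passwords, performs discriminative training on the new feature vectors of positive and negative samples under the minimum classification error criterion, and thereby converts the voiceprint password task from a verification problem into a two-class classification problem. On a 60-speaker dataset of free-style speech, fusing the discriminative voiceprint password system with a Gaussian mixture model-universal background model (GMM-UBM) system yields an equal error rate of 4.48%, a relative improvement of 17.95% and 59.68% over the GMM-UBM and dynamic time warping (DTW) baseline systems, respectively.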

16.
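A lip-reading feature extraction method applying LDA in the DCT domain   Cited by: 3 (self-citations: 0, citations by others: 3)
To solve visual speech feature extraction, the most critical problem in lip-reading technology, a new feature extraction method based on DCT and LDA is proposed. To extract the feature vectors that best discriminate between mouth shapes, the visual speech region is first transformed and reduced in dimension with the DCT, and the LDA algorithm then extracts from the DCT coefficients the feature vectors that classify mouth shapes best. Experiments on speaker-dependent and speaker-independent lip-reading databases, and on real-time lip-reading recognition, all show that the method achieves a clearly higher lip-reading recognition rate than the traditional manual selection of DCT coefficients and than PCA-based extraction.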

17.
Automatic recognition of the speech of children is a challenging topic in computer-based speech recognition systems. The conventional feature extraction method, namely the Mel-frequency cepstral coefficient (MFCC), is not efficient for children's speech recognition. This paper proposes a novel fuzzy-based discriminative feature representation to address the recognition of Malay vowels uttered by children. Because acoustical speech parameters vary with age, the performance of automatic speech recognition (ASR) systems degrades when recognizing children's speech. To solve this problem, this study addresses the representation of relevant and discriminative features for children's speech recognition. The addressed methods include extraction of MFCC with a narrower filter bank followed by a fuzzy-based feature selection method. The proposed feature selection provides relevant, discriminative, and complementary features. For this purpose, conflicting objective functions for measuring the goodness of the features have to be fulfilled. To this end, fuzzy formulation of the problem and fuzzy aggregation of the objectives are used to address the uncertainties involved in the problem. The proposed method can reduce the dimensionality without compromising the speech recognition rate. To assess the capability of the proposed method, the study analyzed six Malay vowels from recordings of 360 children, ages 7 to 12. Upon extracting the features, two well-known classification methods, namely MLP and HMM, were employed for the speech recognition task. Optimal parameter adjustment was performed for each classifier to adapt them for the experiments. The experiments were conducted in a speaker-independent manner. The proposed method performed better than the conventional MFCC and a number of conventional feature selection methods in the children's speech recognition task. The fuzzy-based feature selection allowed the flexible selection of the MFCCs with the best discriminative ability to enhance the difference between the vowel classes.

18.
The new model reduces the impact of local spectral and temporal variability by estimating a finite set of spectral and temporal warping factors which are applied to speech at the frame level. Optimum warping factors are obtained while decoding in a locally constrained search. The model involves augmenting the states of a standard hidden Markov model (HMM), providing an additional degree of freedom. It is argued in this paper that this represents an efficient and effective method for compensating local variability in speech which may have potential application to a broader array of speech transformations. The technique is presented in the context of existing methods for frequency warping-based speaker normalization for ASR. The new model is evaluated in clean and noisy task domains using subsets of the Aurora 2, the Spanish Speech-Dat-Car, and the TIDIGITS corpora. In addition, some experiments are performed on a Spanish language corpus collected from a population of speakers with a range of speech disorders. It has been found that, under clean or not severely degraded conditions, the new model provides improvements over the standard HMM baseline. It is argued that the framework of local warping is an effective general approach to providing more flexible models of speaker variability.

19.
This paper presents a method for the estimation and mapping of parametric models of speech resonance at formants for voice conversion. The spectral features at formants that contribute to voice characteristics are the trajectories of the frequencies, the bandwidths and intensities of the resonance at formants. The formant features are extracted from the poles of a linear prediction (LP) model of speech. The statistical distributions of formants are modelled by a two-dimensional hidden Markov model (HMM) spanning the time and frequency dimensions. Experimental results are presented which show a close match between HMM-based formant models and the histograms of formants. For voice conversion two alternative methods are explored for mapping the formants of a source speaker to those of a target speaker. The first method is based on an adaptive formant-tracking warping of the frequency response of the LP model and the second method is based on the rotation of the poles of the LP model of speech. Both methods transform all spectral parameters of the resonance at formants of the source speaker towards those of the target speaker. In addition, the issues affecting the selection of the warping ratios for the mapping functions are investigated. Experimental results of formant estimation and perceptual evaluation of voice morphing based on parametric formant models are presented.
