Similar Literature
19 similar documents found (search time: 140 ms)
1.
This paper proposes a linear prediction analysis method. The spectral envelope is obtained by frequency-sampling estimation and is estimated from normalized frequencies; the envelope is specified on the mel frequency scale, sampled autocorrelation estimates are extracted via the IDFT, and from these sampled autocorrelations the spectral envelope cepstral coefficients (SEC) are finally obtained. HMM (Hidden Markov Model) recognition experiments show that, compared with other algorithms, SEC clearly improves recognition performance at low signal-to-noise ratios.

2.
Accurately and effectively extracting speech information in noisy environments is a key difficulty in speech recognition, and applying it in embedded systems has real research value. After comparing and analyzing the traditional speech feature extraction methods (linear prediction cepstral coefficients and Mel frequency cepstral coefficients), a new method is proposed that extracts speech features by combining Mel frequency cepstral coefficients with their first-order differences (MFCC + ΔMFCC). Together with dual-threshold endpoint detection and HMM-based model matching, a hardware and software system was designed around an ARMSX2410 core. Compared with traditional methods, the approach improves system robustness, recognition accuracy, and efficiency, and is suitable for speech recognition in noisy environments.
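The MFCC + ΔMFCC combination described above can be sketched in a few lines: given a frame-by-coefficient MFCC matrix (computed by any standard front end), first-order difference coefficients are obtained with the usual regression formula and appended to the static features. The window half-width `N` and the 13-coefficient placeholder are illustrative assumptions, not values from the paper.

```python
import numpy as np

def delta(features, N=2):
    """First-order difference (delta) coefficients over +/-N frames,
    using the standard regression formula with edge padding."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)) / denom
        for t in range(features.shape[0])
    ])

# Placeholder static MFCCs: 100 frames x 13 coefficients.
mfcc = np.random.randn(100, 13)
# Concatenate static and delta features frame by frame: 100 x 26 per utterance.
combined = np.hstack([mfcc, delta(mfcc)])
```

The doubled dimensionality (static plus dynamic) is what the HMM would be trained on in the system described above.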

3.
徐金甫, 韦岗. 《计算机工程》 (Computer Engineering), 2000, 26(5): 58-59, 89
A noise-robust speech feature is proposed. First, the differenced sequence of the one-sided autocorrelation of the speech signal is computed; the linear prediction coefficients of this differenced sequence are then calculated, and cepstral coefficients are derived from them. Experiments show that, compared with traditional linear prediction cepstral coefficients and with linear prediction cepstral coefficients computed from the one-sided autocorrelation sequence itself, using the linear prediction cepstral coefficients of the differenced one-sided autocorrelation sequence as the feature vector improves the recognition rate of a speech recognition system on noisy speech.
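A minimal sketch of the pipeline described above: one-sided autocorrelation, differencing, linear prediction via the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The frame length, prediction order, and number of cepstra are illustrative assumptions; the paper's exact windowing and normalization are not reproduced.

```python
import numpy as np

def one_sided_autocorr(x, lags):
    """One-sided autocorrelation r[0..lags] of a frame."""
    full = np.correlate(x, x, mode="full")
    mid = len(x) - 1
    return full[mid:mid + lags + 1]

def levinson(r, order):
    """Levinson-Durbin recursion: LPC polynomial coefficients a (a[0] = 1)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] = a[1:i + 1] + k * a[:i][::-1]
        err *= (1.0 - k * k)
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion from LPC polynomial 1 + a[1]z^-1 + ... to cepstra."""
    order = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        an = a[n] if n <= order else 0.0
        c[n] = -an - sum((k / n) * c[k] * (a[n - k] if n - k <= order else 0.0)
                         for k in range(1, n))
    return c[1:]

def differenced_autocorr_lpcc(frame, order=10, n_ceps=12):
    r = one_sided_autocorr(frame, len(frame) - 1)  # one-sided autocorrelation
    d = np.diff(r)                                 # its differenced sequence
    rd = one_sided_autocorr(d, order)              # autocorrelation for LPC fit
    a, _ = levinson(rd, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a single-pole model 1 - b z^-1 the recursion gives c[n] = b^n / n, a useful sanity check on the implementation.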

4.
To improve the efficiency of identifying target material from the sound reflected by a high-pressure water jet, four targets commonly encountered in landmine detection (mines, stones, bricks, and wood blocks) were identified using different feature extraction methods. Based on an analysis of the principles of Mel frequency cepstral coefficients and wavelet packet transform cepstral coefficients, and considering the characteristics of the reflected acoustic signals, a feature extraction method fusing Mel frequency cepstra and wavelet packet transform cepstra is proposed: the wavelet packet transform divides the original reflected signal into several sub-bands, one of which is chosen as the boundary between low and high frequencies; Mel frequency cepstral coefficients are extracted from the low-frequency part and wavelet packet transform cepstral coefficients from the high-frequency part, and the two feature sets are linearly concatenated into a new feature vector for material identification. A least-squares support vector machine multi-class model was built to verify the recognition rates of the single-feature and fused-feature extraction methods. Experimental results show that, at the best low/high-frequency split, the fused-feature method achieves an average recognition rate of 82.8125%, which is 10.3125 and 7.8125 percentage points higher than using Mel frequency cepstral coefficients or wavelet packet transform cepstral coefficients alone, respectively.

5.
Research on Speech Recognition Systems and the Extraction of Feature Parameters (cited 2 times: 0 self-citations, 2 by others)
魏星, 周萍. 《计算机与现代化》 (Computer and Modernization), 2009, (9): 167-168, 172
In a speech recognition system, the choice of feature parameters has a decisive influence on recognition performance. This paper studies several important speech feature parameters, including linear prediction cepstral coefficients, Mel cepstral coefficients, and parameters based on wavelet analysis, analyzes and compares them, and concludes with an outlook on future research in speech recognition.

6.
An Improved Mel Filter for Speaker Recognition (cited 1 time: 0 self-citations, 1 by others)
项要杰, 杨俊安, 李晋徽, 陆俊. 《计算机工程》 (Computer Engineering), 2013, (11): 214-217, 222
Mel frequency cepstral coefficients (MFCC) emphasize the low-frequency information of the speech signal, describe the spectral distribution insufficiently, and cannot effectively distinguish speaker-specific information. By analyzing how the speaker information carried by each frequency band differs, and combining the complementary high- and low-frequency characteristics of the Mel filterbank and the inverse Mel filterbank, an improved Mel filter suited to speaker recognition is proposed. Experimental results show that the new features extracted by the improved Mel filter achieve better recognition performance than traditional Mel cepstral coefficients and inverse Mel cepstral coefficients (IMFCC), with essentially no increase in the training and recognition time of the speaker recognition system.
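The complementary placement of Mel and inverse-Mel filters can be illustrated by constructing both triangular filterbanks. The inverse-Mel bank below is obtained by mirroring the Mel band edges around the Nyquist frequency, which concentrates filters at high frequencies; this is a generic construction, not the paper's specific improved filter, and all parameter values are illustrative.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(edges_hz, n_fft, sr):
    """Triangular filters on rfft bins, given n_filt + 2 band edges in Hz."""
    bins = np.floor((n_fft + 1) * edges_hz / sr).astype(int)
    bins = np.clip(bins, 0, n_fft // 2)
    n_filt = len(edges_hz) - 2
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mel_filterbank(n_filt, n_fft, sr):
    """Filters dense at low frequencies (standard Mel spacing)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filt + 2))
    return triangular_filterbank(edges, n_fft, sr)

def inverse_mel_filterbank(n_filt, n_fft, sr):
    """Filters dense at high frequencies: Mel edges mirrored around Nyquist."""
    mel_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filt + 2))
    edges = np.sort(sr / 2 - mel_edges)
    return triangular_filterbank(edges, n_fft, sr)

fb = mel_filterbank(24, 512, 16000)           # 24 filters x 257 rfft bins
ifb = inverse_mel_filterbank(24, 512, 16000)
```

Applying `fb` to a power spectrum, taking logs, and a DCT gives MFCC-style features; doing the same with `ifb` gives IMFCC-style features that weight the high band instead.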

7.
Within a probabilistic model, a feature compensation method is presented that introduces the dynamic correlation of cepstral predictions. The method uses the expectation-maximization (EM) algorithm to estimate the joint distribution parameters and, based on the prior probability densities of speech and noise, performs minimum mean square error (MMSE) prediction of the speech feature parameters in the cepstral domain to improve recognition accuracy. Experimental results under different noise environments and signal-to-noise ratios show that the method effectively improves the accuracy of Chinese continuous speech recognition in noisy environments.

8.
A Comparative Study of Several Noise Estimation Methods Without Speech Detection (cited 1 time: 0 self-citations, 1 by others)
Noise spectrum estimation is a key step in spectral subtraction. Traditionally, the noise spectrum is estimated by running speech detection on the input, identifying pure-noise segments, and estimating the noise spectrum from those segments. The accuracy of this approach is limited by the performance of the speech detection algorithm, and it degrades rapidly at low signal-to-noise ratios. In recent years, several noise estimation methods without speech detection have been proposed; these methods do not distinguish speech from non-speech segments and update the noise spectrum at every frame. This paper evaluates several such detection-free noise estimation methods, compares their performance when used for spectral subtraction in speech recognition, proposes a new detection-free noise estimation method based on energy clustering, and verifies its good performance experimentally.
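The frame-by-frame update idea can be sketched with a simple stand-in noise tracker; the paper's energy-clustering estimator is not reproduced here. The tracker below follows the magnitude spectrum downward immediately and upward only slowly, so it needs no speech/non-speech decision. The over-subtraction factor `alpha`, spectral floor `beta`, and smoothing constant are illustrative.

```python
import numpy as np

def spectral_subtraction_no_vad(mag_frames, alpha=2.0, beta=0.01, smooth=0.9):
    """Spectral subtraction with a per-frame noise update and no speech detection.
    mag_frames: (n_frames, n_bins) magnitude spectra."""
    noise = mag_frames[0].copy()
    enhanced = np.zeros_like(mag_frames)
    for t, mag in enumerate(mag_frames):
        # Track downward instantly, upward with a slow leak (no VAD needed).
        noise = np.where(mag < noise, mag, smooth * noise + (1.0 - smooth) * mag)
        # Over-subtract, then apply a spectral floor to limit musical noise.
        enhanced[t] = np.maximum(mag - alpha * noise, beta * mag)
    return enhanced

# Stationary-noise sanity check: constant unit-magnitude frames are
# suppressed down to the spectral floor.
frames = np.ones((50, 129))
out = spectral_subtraction_no_vad(frames)
```

In a real system the enhanced magnitudes would be recombined with the noisy phase and overlap-added back to a waveform.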

9.
杜晓青, 于凤芹. 《计算机工程》 (Computer Engineering), 2013, (11): 197-199, 204
The fusion of Mel frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) reflects only the static characteristics of speech, and LPCC describes the low-frequency local features of speech poorly. This paper therefore proposes fusing Hilbert-Huang transform (HHT) cepstral coefficients with relative spectral perceptual linear prediction cepstral coefficients (RASTA-PLPCC), yielding a speaker recognition algorithm that reflects both the speech production mechanism and human auditory perception. The HHT cepstral coefficients embody the production mechanism, reflect the dynamic characteristics of speech, and describe the low-frequency local features of the signal better, remedying the weakness of LPCC. PLPCC embodies auditory perception and outperforms MFCC in recognition. Three fusion schemes were used to combine the two, and the fused features were fed into a Gaussian mixture model for speaker recognition. Simulation results show that the proposed fusion improves the recognition rate by 8.0% over the existing MFCC and LPCC fusion algorithm.

10.
When external noise is present, the attributes of a voiceprint's Mel frequency cepstral coefficient features are easily perturbed by the interference, so recognition accuracy suffers. To improve accuracy, a Mel frequency cepstral coefficient interference-removal algorithm based on support vector machines is proposed. In determining the classification decision function, the relationship between the Mel frequency cepstral coefficients, the voiceprint center, and the noise is fully taken into account; voiceprint features are introduced into the kernel function, the original sample data are mapped by a nonlinear transform into a high-dimensional feature space, and the optimal (or generalized optimal) separating hyperplane is found in that space, removing the interference from the speech features. Experiments show that the improved algorithm optimizes the zero-crossing rate, the cepstral features, and the short-time energy features under rectangular and Hamming windows of the voiceprint.

11.
Speaker verification techniques neglect short-time variation in the feature space even though it contains speaker-related attributes. We propose a simple method to capture and characterize this spectral variation through the eigenstructure of the sample covariance matrix, computed using a sliding window over spectral features. The newly formulated feature vectors representing local spectral variations are used with classical and state-of-the-art speaker recognition systems. Results on multiple speaker recognition evaluation corpora reveal that eigenvectors weighted with their normalized singular values are useful in representing local covariance information. We also show that local variability features can be extracted using mel frequency cepstral coefficients (MFCCs) as well as three recently developed features: frequency domain linear prediction (FDLP), mean Hilbert envelope coefficients (MHECs) and power-normalized cepstral coefficients (PNCCs). Since the information conveyed in the proposed feature is complementary to standard short-term features, we apply different fusion techniques. We observe considerable relative improvements in speaker verification accuracy in combined mode on text-independent (NIST SRE) and text-dependent (RSR2015) speech corpora: up to 12.28% relative improvement in speaker recognition accuracy on the text-independent corpora, and up to 40% relative reduction in EER on the text-dependent corpora. To sum up, combining local covariance information with traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.
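A rough sketch of the local-variability idea: a sample covariance over a sliding window of frame-level features, its eigendecomposition, and the top eigenvectors weighted by normalized eigenvalues stacked into one vector per window. Window length, hop, and the number of retained eigenvectors are illustrative; the paper's exact normalization and fusion steps are not reproduced.

```python
import numpy as np

def local_variability_features(feats, win=20, hop=10, n_eig=3):
    """Per-window covariance eigenstructure over short-term features.
    feats: (n_frames, dim) feature matrix, e.g. MFCCs."""
    out = []
    for start in range(0, feats.shape[0] - win + 1, hop):
        seg = feats[start:start + win]
        cov = np.cov(seg, rowvar=False)        # dim x dim sample covariance
        vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_eig]   # indices of the largest n_eig
        weights = vals[top] / vals[top].sum()  # normalized eigenvalues
        out.append((vecs[:, top] * weights).T.ravel())
    return np.asarray(out)

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 13))          # placeholder MFCC stream
lvf = local_variability_features(feats)         # one row per window, 3 x 13 dims
```

Each row of `lvf` would then be modeled alongside (or fused with) the standard short-term features.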

12.
In this paper, a set of features derived by filtering and spectral peak extraction in the autocorrelation domain is proposed. We focus on the effect of additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, the autocorrelation sequence of a speech signal frame is computed first. Filtering of the autocorrelation is carried out in the second step, and the short-time power spectrum of speech is then obtained through the fast Fourier transform. The power spectrum peaks are calculated by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are projected onto the mel scale and passed through the filter bank. Finally, a set of cepstral coefficients is derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions is shown through a number of speech recognition experiments: a task of multi-speaker isolated-word recognition and another of multi-speaker continuous speech recognition with various artificially added noises such as factory, babble, car and F16, as well as a set of experiments on the Aurora 2 task. Experimental results show significant improvements under noisy conditions in comparison to the results obtained using traditional feature extraction methods. We also report results obtained by applying cepstral mean normalization to the methods to obtain features robust against both additive noise and channel distortion.
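The peak-picking step (differentiating the power spectrum and keeping the maxima where the derivative changes sign) can be sketched as follows; the paper's autocorrelation filtering stage is omitted, and the FFT size is an illustrative choice.

```python
import numpy as np

def autocorr_peak_spectrum(frame, n_fft=512):
    """Power spectrum of the one-sided autocorrelation, with peaks located
    where the spectral derivative changes sign from positive to negative."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    spectrum = np.abs(np.fft.rfft(r, n_fft))
    d = np.diff(spectrum)
    peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    peak_spectrum = np.zeros_like(spectrum)
    peak_spectrum[peaks] = spectrum[peaks]   # keep only the peak magnitudes
    return peak_spectrum, peaks

# A pure tone at 0.1 cycles/sample should produce a peak near rfft bin 51.
n = np.arange(256)
tone = np.cos(2 * np.pi * 0.1 * n)
peak_spec, peaks = autocorr_peak_spectrum(tone)
```

In the full feature pipeline, `peak_spec` would then be projected onto mel filters and converted to cepstra.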

13.
In this paper, auditory inspired modulation spectral features are used to improve automatic speaker identification (ASI) performance in the presence of room reverberation. The modulation spectral signal representation is obtained by first filtering the speech signal with a 23-channel gammatone filterbank. An eight-channel modulation filterbank is then applied to the temporal envelope of each gammatone filter output. Features are extracted from modulation frequency bands ranging from 3 to 15 Hz and are shown to be robust to mismatch between training and testing conditions and to increasing reverberation levels. To demonstrate the gains obtained with the proposed features, experiments are performed with clean speech, artificially generated reverberant speech, and reverberant speech recorded in a meeting room. Simulation results show that a Gaussian mixture model based ASI system, trained on the proposed features, consistently outperforms a baseline system trained on mel-frequency cepstral coefficients. For multimicrophone ASI applications, three multichannel score combination and adaptive channel selection techniques are investigated and shown to further improve ASI performance.

14.
With the spread of mobile phone recording devices and of powerful, easy-to-use digital media editing software, identifying the source phone of a speech recording has become an important topic in multimedia forensics. For this problem, a source phone identification algorithm based on fused spectral features is proposed. First, spectrograms of the same speech recorded on different phones are analyzed, showing that the spectral characteristics of different phones differ. The spectral information content, log spectrum, and phase spectrum features of the speech are then studied. Next, the three features are concatenated into an original fused feature, and the original fused features of all samples form the sample feature space. Finally, feature selection is applied to this feature space using the CfsSubsetEval evaluation function of the WEKA platform under a best-first search strategy, and LibSVM is used for model training and sample identification on the selected feature space. The experiments report classification results of the selected single spectral features and of the fused spectral feature on a speech corpus covering 23 mainstream phone models. The results show that the fused spectral feature effectively improves the average within-brand identification accuracy, reaching 99.96% on the re-recorded TIMIT speech corpus and 99.91% on the self-built CKC-SD speech corpus. Compared with Hanilci's recording device identification algorithm based on Mel cepstral coefficient features, the average identification accuracy is improved by 6.58 and 5.14 percentage points, respectively. The proposed features thus effectively improve average identification accuracy and reduce the within-brand misclassification rate.

15.
Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise dominant time-frequency regions across all the cepstral features. We propose a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions.
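The binary-mask criterion (keep time-frequency bins where speech energy exceeds noise energy) can be written directly. The regression-tree uncertainty transform itself is not sketched here, and the power arrays below are illustrative stand-ins for squared STFT magnitudes.

```python
import numpy as np

def binary_mask(noisy_power, noise_power_est):
    """1 where the estimated speech energy exceeds the noise energy, else 0.
    Both inputs are (n_frames, n_bins) power spectra."""
    speech_power_est = np.maximum(noisy_power - noise_power_est, 0.0)
    return (speech_power_est > noise_power_est).astype(float)

# Toy example: one frame, three bins; speech dominates only the first bin.
noisy = np.array([[10.0, 3.0, 1.0]])
noise = np.array([[2.0, 2.0, 2.0]])
mask = binary_mask(noisy, noise)
```

Multiplying the noisy spectrogram by `mask` keeps the speech-dominated regions; the cepstral-domain uncertainty the paper models arises because this hard selection does not translate cleanly through the cepstral transform.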

16.
To address the poor denoising performance, excessive musical noise, and low speech intelligibility of the basic spectral subtraction algorithm at low signal-to-noise ratios, an improved spectral subtraction algorithm is proposed. The algorithm first computes the cepstral distance of the speech signal to detect noise segments and speech segments, replacing the statistical mean noise estimate used by basic spectral subtraction with a dynamically computed noise value. The subtraction factor is then set dynamically according to the ratio of cepstral distances between the current frame and the noise frames, remedying the fixed subtraction factor of the traditional algorithm, and three methods are used to suppress musical noise. Simulation experiments show that at low signal-to-noise ratios the improved spectral subtraction algorithm can effectively reduce noise and improve both the signal-to-noise ratio and intelligibility, achieving the goal of speech enhancement.

17.
In this work, spectral features extracted from sub-syllabic regions and pitch synchronous analysis are proposed for speech emotion recognition. Linear prediction cepstral coefficients, mel frequency cepstral coefficients and the features extracted from high amplitude regions of the spectrum are used to represent emotion specific spectral information. These features are extracted from the consonant, vowel and transition regions of each syllable to study the contribution of these regions toward recognition of emotions. Consonant, vowel and transition regions are determined using vowel onset points. Spectral features extracted from each pitch cycle are also used to recognize emotions present in speech. The emotions used in this study are: anger, fear, happy, neutral and sad. The emotion recognition performance using sub-syllabic speech segments is compared with the results of the conventional block processing approach, where the entire speech signal is processed frame by frame. The proposed emotion specific features are evaluated using a simulated emotion speech corpus, IITKGP-SESC (Indian Institute of Technology, KharaGPur-Simulated Emotion Speech Corpus). The emotion recognition results obtained using IITKGP-SESC are compared with the results of the Berlin emotion speech corpus. Emotion recognition systems are developed using Gaussian mixture models and auto-associative neural networks. The purpose of this study is to explore sub-syllabic regions to identify the emotions embedded in a speech signal, and if possible, to avoid processing the entire speech signal for emotion recognition without seriously compromising performance.

18.
Design and Application of a Perceptual Weighting Filter Based on the Pseudo-Cepstrum (cited 1 time: 0 self-citations, 1 by others)
To address the severe spectral tilt from low to high frequencies in 50 Hz~7 000 Hz wideband speech, a perceptual weighting filter suitable for wideband speech coding is designed. The filter is designed from the difference between the ISP pseudo-cepstrum and the linear prediction cepstral coefficients; the ISP pseudo-cepstrum is the inverse z-transform of the negation of half the natural logarithm of the reactance function, and is essentially the cepstrum of a polynomial. Experimental results show that the filter is effective in mitigating the spectral tilt problem of wideband speech.

19.
This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. We present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained on plain speech. We update these coefficients such that the glimpse proportion – an objective measure of the intelligibility of speech in noise – increases, while keeping the speech energy fixed. An acoustic analysis reveals that the modified speech is boosted in the region 1–4 kHz, particularly for vowels, nasals and approximants. Results from listening tests employing speech-shaped noise show that the modified speech is as intelligible as a synthetic voice trained on plain speech whose duration, Mel cepstral coefficients and excitation signal parameters have been adapted to Lombard speech from the same speaker. Our proposed method does not require these additional recordings of Lombard speech. In the presence of a competing talker, both modification and adaptation of spectral coefficients give more modest gains.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号