Found 19 similar articles (search time: 93 ms)
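The formant frequencies of speech are estimated with a piecewise linear prediction algorithm. A multi-channel filter bank divides the speech into frequency bands, a suitable inverse filter is then chosen to approximate the short-time spectrum of each band, and the formant frequencies are finally estimated from that inverse filter. Experimental results show that, compared with traditional methods, the approach improves the resolution and accuracy of formant frequency estimation and is less affected by noise.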

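Whispered speech is excited by a noise source; compared with normal speech, its formant positions are shifted and its bandwidths are wider. As a result, extracting whispered-speech formants with the traditional linear prediction method suffers from spurious peaks. An improved algorithm is proposed based on an analysis of the power spectrum: following the principle that pole power is invariant, pole interaction factors are used to correct the formant bandwidths, so that the formants of whispered speech are extracted accurately. Simulation experiments on single-vowel phonemes of Mandarin Chinese demonstrate the effectiveness of the algorithm.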
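For context, the traditional linear-prediction approach that this abstract improves on reads each formant off a pole of the LPC filter: a pole z = r·exp(jθ) corresponds to a frequency θ·fs/(2π) and a 3 dB bandwidth of −(fs/π)·ln r. The sketch below shows only that baseline mapping; it does not implement the pole-interaction correction described in the abstract. The function names and the `max_bandwidth` threshold are assumptions made for illustration.

```python
import cmath
import math


def pole_to_formant(pole, sample_rate):
    """Map one LPC pole z = r*exp(j*theta) to (frequency_hz, bandwidth_hz)."""
    radius = abs(pole)
    angle = cmath.phase(pole)
    frequency = angle * sample_rate / (2.0 * math.pi)
    bandwidth = -math.log(radius) * sample_rate / math.pi
    return frequency, bandwidth


def formants_from_poles(poles, sample_rate, max_bandwidth=400.0):
    """Keep narrow-band poles from the upper half-plane as formant candidates.

    max_bandwidth is a hypothetical threshold, not a value from the abstract.
    """
    candidates = []
    for pole in poles:
        if pole.imag <= 0:
            continue  # keep one pole of each complex-conjugate pair
        frequency, bandwidth = pole_to_formant(pole, sample_rate)
        if bandwidth < max_bandwidth:
            candidates.append((frequency, bandwidth))
    return sorted(candidates)
```

Poles with wide bandwidths are simply discarded here, which is one reason the baseline struggles with whispered speech, where genuine formants are themselves broad.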

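This paper presents a formant speech synthesis system with a cascade-parallel structure. Compared with some speech synthesis systems built on dedicated hardware, it is easier to adjust, more flexible to configure, and more stable. In addition, the formant method offers an economical form of storing and transmitting speech information, reducing both storage requirements and transmission bandwidth.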

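Vowel formants reflect the physical characteristics of the vocal tract (a resonator) and are not easily affected by the environment, so they can better capture a speaker's voiceprint characteristics. On this basis, an SoC (system-on-chip) design for speaker formant-adaptive MFCC (Mel-frequency cepstral coefficient) feature extraction is proposed. First, three sets of formants are extracted from the speaker's vowels to design the Mel triangular filter bank; then the ratio between the matrix parameters of conventional MFCC and of formant-improved MFCC is used to design an adaptive fusion of speaker speech features that improves MFCC. Performance simulation was carried out in MATLAB, the Verilog HDL code was designed in Quartus II, and the SoC design, compilation, simulation, verification, and download were completed on an FPGA (field-programmable gate array) development board. The results show that, at relatively high signal-to-noise ratios, the feature vectors obtained from the adaptively fused, formant-improved MFCC are more robust than those of conventional MFCC, and that the technique has considerable application potential in sensor design for speaker voiceprint identification.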

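To build a more effective automatic speech emotion recognition system, an emotion recognition algorithm based on glottal signal feature parameters and a Gaussian mixture model is proposed. Following the mechanism of human speech production, the algorithm estimates the glottal signal by inverse filtering and linear prediction, and extracts time-domain feature parameters of the glottal signal to characterize the different emotion classes. Experiments use the public BES (Berlin emotion speech database) emotional corpus to automatically recognize seven emotions: anger, boredom, disgust, fear, happiness, neutral, and sadness. The results show that the proposed system recognizes the various emotional states effectively; its recognition accuracy approaches that of human listeners and exceeds that of the traditional pitch frequency and formant parameters.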

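The emergence of AI voice-cloning technology will deal a severe blow to the legal order of modern society. In recent years researchers have focused only on cases where the AI-synthesized speech and the sample speech have the same content, while little work has addressed the forensic identification of questioned material whose content differs from the sample, so such material cannot be identified. To address this, a three-dimensional model based on improved MFCC features is proposed for identifying the source of AI-cloned speech. First, the characteristics of AI-cloned speech that researchers previously analyzed by hand are verified, and two identifiable features are summarized: "abnormally active formant F5" and "abnormal abrupt changes in the energy, formant, and pitch curves". Second, based on these characteristics, second-order differences are used to correct the MFCC coefficients, and an "inverse-difference logical deduction method" is used to further quantize and sample the abrupt changes in the energy, formant, and pitch curves, which are defined as a feature-vector triple for speech identification. Then, with the triple as input, the D-S evidence combination rule fuses the results of comparing the three groups of questioned material against the sample. Finally, a three-dimensional evaluation model for questioned material based on improved MFCC feature parameters is formed. A random-sampling experiment on a population shows that, for AI-cloned speech synthesized from the same person as the clone source, the method achieves an average identification probability of 67.324% with a standard deviation of 7.32%, which the authors describe as a very good identification result.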
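The D-S (Dempster-Shafer) evidence combination rule named in this abstract can be sketched as follows. This is a minimal, generic implementation of Dempster's rule for two mass functions, not the authors' evaluation model; the hypothesis labels "same" and "different" in the usage note are assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass)
    with Dempster's rule, renormalizing by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}
```

For example, with `S = frozenset({"same"})` and `ANY = frozenset({"same", "different"})`, combining `{S: 0.6, ANY: 0.4}` with `{S: 0.7, ANY: 0.3}` concentrates more mass on `S` than either source does alone. Three comparison results can be fused by applying the function twice, since the rule is associative.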

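In HMM-based speech synthesis, a postfilter with fixed parameters cannot adapt to spectra with different degrees of distortion, which reduces the naturalness of the synthesized speech. An improved synthesis algorithm based on adaptive postfilter parameters is therefore proposed. The method adaptively selects the optimal short-term filter parameters according to the flatness of the speech spectrum in order to enhance the formant regions of the synthesized spectrum, and uses a long-term postfilter to optimize the pitch-harmonic structure of the synthesized speech, reducing discontinuities in its fundamental frequency. Simulation results show that the method effectively reduces spectral over-smoothing, and subjective tests show that the naturalness of the synthesized speech is improved.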
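The abstract does not say how spectral flatness is measured or how it is mapped to postfilter parameters. A common definition, assumed here, is the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is 1 for a perfectly flat spectrum and approaches 0 for a strongly peaked one. A minimal sketch of that measure only:

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    A small floor avoids log(0); the floor value is an assumption.
    """
    floor = 1e-12
    values = [max(p, floor) for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean
```

An over-smoothed synthesized spectrum would score closer to 1 than natural speech, which suggests why flatness is a plausible control signal for the strength of formant enhancement.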

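For Chinese speech signals, which are non-stationary and strongly time-varying, a variable time-frequency complex wavelet analysis method is proposed for extracting the amplitude spectrum, phase spectrum, pitch period, and formant information. A complex Gaussian wavelet with n vanishing moments and good time-frequency localization is chosen to extract the amplitude and phase spectra. Experimental results show that the amplitude spectrum, phase spectrum, and wavelet transform spectrum extracted by this method characterize the syllable envelope, detail envelope, and tone of Chinese speech, distinguish unvoiced from voiced sounds, accurately extract the dynamic pitch period, and estimate the formants. This offers a new approach to feature extraction and recognition for Chinese speech.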

A Mel-scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank provides a flexible frequency partition that decomposes the speech signal into M frequency bands. To estimate the difference between the Mel-scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and root mean square bandwidth deviation (RMSBD) with respect to the baseline (the Mel filter bank bandwidth) are calculated. The proposed filter bank gives 40.90% and 49.84% reductions in RBD and RMSBD, respectively, over the 24-dyadic wavelet filter bank. Feature extraction with the proposed filter bank on the AMUAV corpus improves word recognition accuracy (WRA) across the whole SNR range (20 dB to 0 dB) over baseline (MFCC) features. For the AMUAV corpus, the proposed features show a maximum improvement in WRA of 3.93% over baseline features and 3.90% over dyadic wavelet filter bank features. When applied to the VidTIMIT corpus, the proposed features show a maximum improvement in WRA of 1.64% over baseline features and 4.43% over dyadic features.

In this paper, we study the effect of filter bank smoothing on the recognition performance of children's speech. Filter bank smoothing of spectra is done during the computation of the Mel filter bank cepstral coefficients (MFCCs). We study the effect of smoothing both with vocal-tract length normalization (VTLN) and without it. The results from our experiments indicate that, unlike the conventional VTLN implementation, it is better not to scale the bandwidths of the filters during VTLN; only the filter center frequencies need be scaled. Our interpretation of this result is that while the formant center frequencies may approximately scale between speakers, the formant bandwidths do not change significantly. Therefore, scaling the filter bandwidths by a warp factor during conventional VTLN produces differences in spectral smoothing that degrade recognition performance. Similarly, our experiments indicate that for telephone-based speech without normalization it is better to use uniform-bandwidth filters instead of the constant-Q-like filters used in the computation of conventional MFCC. Our interpretation is that constant-Q filters smooth the spectrum excessively at higher frequencies, which degrades performance for children's speech. However, the use of constant-Q filters during VTLN does not create any additional performance degradation. As we will show, during VTLN it is only important that the filter bandwidths are not scaled, irrespective of whether we use constant-Q or uniform-bandwidth filters. With our proposed changes in the filter bank implementation we get comparable performance for adults and about 6% improvement for children, both with and without VTLN, on a telephone-based digit recognition task.
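The distinction drawn in this abstract, scaling only the filter center frequencies versus also scaling the bandwidths, can be illustrated with a small sketch. It assumes symmetric triangular filters described by (lower edge, center, upper edge) in Hz and a single linear warp factor; the function name and the `scale_bandwidth` flag are hypothetical and are not taken from the paper.

```python
def warped_filters(centers_hz, bandwidth_hz, warp, scale_bandwidth=False):
    """Return (low, center, high) edges of triangular filters after VTLN warping.

    scale_bandwidth=False moves only the centers (the variant the abstract
    recommends); scale_bandwidth=True also stretches the bandwidth by the
    warp factor, as in the conventional implementation it describes.
    """
    width = bandwidth_hz * warp if scale_bandwidth else bandwidth_hz
    filters = []
    for center in centers_hz:
        warped_center = center * warp
        filters.append((warped_center - width / 2.0,
                        warped_center,
                        warped_center + width / 2.0))
    return filters
```

With a warp factor of 1.2 and a 200 Hz bandwidth, a filter centered at 1000 Hz moves to 1200 Hz in both variants, but spans 1100 to 1300 Hz when the bandwidth is kept and 1080 to 1320 Hz when it is scaled, so the amount of spectral smoothing differs between speakers only in the second case.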

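Application of Laguerre filters to feature extraction for noise-robust speech recognition   Cited by: 1 (self-citations: 0; citations by others: 1)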
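FIR filters have poor passband and stopband characteristics and require high filter orders, which adversely affects speech recognition systems. To overcome these drawbacks, a Laguerre filter bank replaces the FIR filter bank used for front-end processing in zero-crossing peak amplitude feature extraction. After a careful study of how the FIR filter parameters are determined, the principle of the Laguerre filter and the method for calculating its parameters are described, and the calculated results are given. Isolated-word, speaker-independent speech recognition experiments show that using Laguerre filters not only gives the recognition system better noise robustness than FIR filters but also greatly reduces the filter order.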

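An auditory-perception-based wavelet transform is used to extract formant parameters for cochlear implants. The original speech signal is first decomposed and reconstructed with the auditory-perception-based wavelet transform; formants are then extracted from both the synthesized and the original speech signals using the autocorrelation method and the lattice method. The experimental results show that formant parameter extraction with the auditory-perception-based wavelet transform is feasible and that the synthesized speech signal represents the characteristics of the original signal well. They also confirm that, in a cochlear implant speech processor, formant parameters extracted by the lattice method are more accurate than those extracted by the autocorrelation method.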

In this paper, a set of features derived by filtering and spectral peak extraction in the autocorrelation domain is proposed. We focus on the effect of additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, the autocorrelation sequence of a speech signal frame is computed first. The autocorrelation of the speech signal is filtered in the second step, and the short-time power spectrum of speech is then obtained from the speech signal through the fast Fourier transform. The power spectrum peaks are then calculated by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are projected onto the mel scale and passed through the filter bank. Finally, a set of cepstral coefficients is derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions is shown through a number of speech recognition experiments. A multi-speaker isolated-word recognition task and a multi-speaker continuous speech recognition task, with various artificially added noises such as factory, babble, car, and F16, were used in these experiments. A set of experiments was also carried out on the Aurora 2 task. Experimental results show significant improvements under noisy conditions compared with the results obtained using traditional feature extraction methods. We also report the results obtained by applying cepstral mean normalization to these methods to obtain features that are robust against both additive noise and channel distortion.
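The peak-extraction step of this pipeline (power spectrum, then peaks found by differentiating along frequency) can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: a naive DFT stands in for the FFT so that the example needs no external library, the autocorrelation filtering step is omitted because the abstract does not specify the filter, and a first difference across bins approximates the derivative.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame for bins 0..N/2, via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_peaks(power):
    """Indices where the first difference changes from positive to non-positive."""
    diff = [power[i + 1] - power[i] for i in range(len(power) - 1)]
    return [i + 1 for i in range(len(diff) - 1)
            if diff[i] > 0 and diff[i + 1] <= 0]
```

The magnitudes `power[i]` at the returned indices are what would then be projected onto the mel scale and passed through the filter bank.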

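To address the inadequate recognition performance of traditional speech feature parameters in complex environments, a speech feature extraction method based on Gammatone filters and sub-band energy normalization is proposed. Building on the PNCC feature algorithm, the method introduces a smoothed amplitude envelope and a normalized Gammatone filter bank at the front end, suppresses real-world background noise through sub-band energy normalization, and finally applies feature warping and channel compensation at the back end. Experiments use a Gaussian mixture model-universal background model (GMM-UBM) classifier to compare the algorithm with other feature parameters. The results show that, across a variety of noise environments, the proposed method is more noise-robust than the other feature parameters and still achieves good recognition performance at low signal-to-noise ratios.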

Digital computer processing of speech is of much current interest. This paper examines the synthesis of speech using the wave digital filter, which has been shown to have low coefficient sensitivity and to generate less roundoff error than conventional filters. Also examined are coefficient quantization in the digital formant speech synthesis model and how implementation with the wave filter may serve as a better alternative. Simulation and generation of speech confirm the feasibility and corresponding advantages of implementation with the wave digital filter compared with conventional filters.

An SVM-based speech emotion recognition algorithm   Cited by: 1 (self-citations: 0; citations by others: 1)
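To effectively improve the recognition accuracy of speech emotion recognition systems, an SVM-based speech emotion recognition algorithm is proposed. The algorithm extracts parameters such as the energy, pitch frequency, and formants of the speech signal as emotional features, and uses an SVM (Support Vector Machine) to model and recognize the emotional signals. In emotion recognition experiments in a simulation environment, the proposed algorithm, compared with the artificial neural network approaches ACON (All Cl...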

Chen Xu, Jiang Ye. Computer Engineering (《计算机工程》), 2021, 47(3): 291-297, 303
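Recording replay is currently the main form of spoofed-speech attack that voiceprint recognition technology has to cope with. Because traditional speech features cannot distinguish genuine speech from replayed speech, a Fisher-ratio hybrid cepstral feature extraction algorithm based on a Gaussian filter bank is proposed. The Gaussian filter bank replaces the traditional triangular filter bank, and the linear frequency scale and the inverse ERB frequency scale each replace the Mel frequency scale, giving two new features: Gaussian linear frequency cepstral coefficients (G-LFCC) and Gaussian inverse ERB frequency cepstral coefficients (G-IEFCC). G-LFCC and G-IEFCC are fused using the Fisher criterion to produce a new hybrid feature. This feature increases the separation between genuine and replayed speech in the high-frequency band while reducing the interference in the low-frequency band caused by differences between recording and playback devices. Experiments on the ASVSpoof2017 evaluation data show that the hybrid feature discriminates well: compared with the IMFCC, LFCC, CQCC, and GSV algorithms, the equal error rate is reduced by 21.8%, 38.8%, 58.3%, and 43.7%, respectively.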
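The abstract does not spell out how the Fisher criterion is applied when fusing G-LFCC and G-IEFCC. One common reading, assumed here, is a per-dimension Fisher ratio between the two classes (genuine and replayed): the squared difference of the class means divided by the sum of the class variances. Dimensions with a larger ratio separate the classes better. A minimal sketch of that ratio only, with hypothetical names:

```python
from statistics import fmean, pvariance


def fisher_ratios(class_a, class_b):
    """Per-dimension Fisher ratio for two classes of feature vectors.

    Each class is a list of equal-length feature vectors. The ratio is
    (mean_a - mean_b)^2 / (var_a + var_b), using population variances.
    """
    ratios = []
    for dim_a, dim_b in zip(zip(*class_a), zip(*class_b)):
        between = (fmean(dim_a) - fmean(dim_b)) ** 2
        within = pvariance(dim_a) + pvariance(dim_b)
        # Zero within-class variance is reported as infinite separability.
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios
```

Ranking the dimensions of both feature sets by this ratio and keeping the highest-scoring ones from each would be one way to build a hybrid feature, though whether the authors select or weight dimensions is not stated in the abstract.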
