20 similar documents found (search time: 125 ms)
1.
Speech recognition means using a computer to recognize the content expressed by a speech signal, with the goal of accurately understanding the meaning the speech carries. This paper focuses on the feature-extraction stage of speech recognition. Among the many feature-extraction methods, LPC cepstral coefficients are chosen as the feature parameters: they remove the excitation information of the speech-production process fairly thoroughly, mainly reflect the vocal-tract model, and only a dozen or so cepstral coefficients are needed to describe the formant characteristics of speech well. The speech signal is pre-emphasized, framed, windowed, and autocorrelation-analyzed, after which the LPC cepstral coefficients are extracted. A VC program written from this flow analyzes the speech signal, removing redundant information that is irrelevant to recognition and retaining the information that matters for it.
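The extraction chain described in this abstract (pre-emphasis, windowing, autocorrelation, Levinson-Durbin, then the LPC-to-cepstrum recursion) can be sketched in a few lines of numpy. The paper used a VC program, so this Python version is only an illustration; the pre-emphasis coefficient (0.97) and the orders (12) are assumed typical values, not taken from the paper:

```python
import numpy as np

def lpc_cepstrum(frame, order=12, n_ceps=12):
    """LPC cepstral coefficients for one speech frame (illustrative sketch)."""
    # Pre-emphasis: flattens the spectral tilt of voiced speech
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    x = x * np.hamming(len(x))
    # Autocorrelation, lags 0..order
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1:len(x) + order]
    # Levinson-Durbin: predictor polynomial A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        prev = a[:i][::-1].copy()        # a[i-1], ..., a[0]
        a[1:i + 1] += k * prev           # update coefficients; a[i] becomes k
        e *= (1.0 - k * k)               # residual (excitation) energy shrinks
    # Direct LPC-to-cepstrum recursion for the all-pole model 1/A(z)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= order else 0.0
        for m in range(1, n):
            if n - m <= order:
                acc += (m / n) * c[m] * a[n - m]
        c[n] = -acc
    return c[1:]
```

The Levinson-Durbin recursion solves the autocorrelation normal equations in O(p²), and the final recursion converts the predictor polynomial to cepstral coefficients without computing an explicit spectrum, which is why a short cepstral vector suffices to describe the formant envelope.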
2.
As speaker-recognition technology develops, practical and effective speaker-recognition systems have increasingly become a research focus. The robustness of the speech feature parameters directly affects the performance of a speaker-recognition system. Earlier work concentrated on channel distortion in mobile-communication environments and studied the robustness of differential (delta) cepstra. This paper instead studies, under additive white noise, the robustness of Mel cepstral coefficients and Mel delta-cepstral coefficients, and the improvement in recognition performance after cepstral mean normalization (CMN). The simulation results show that under additive white noise the delta-cepstral parameters are quite robust, and that CMN effectively removes the additive white noise.
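The two techniques compared in this abstract, delta (differential) cepstra and cepstral mean normalization, are each a few lines of numpy. This is a generic sketch rather than the paper's code; the ±2-frame regression window for the deltas is an assumed common choice:

```python
import numpy as np

def cmn(ceps):
    """Cepstral mean normalization: subtract the per-utterance mean of each
    coefficient, removing stationary channel/bias components."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def delta(ceps, N=2):
    """Delta cepstrum via linear regression over +-N neighboring frames."""
    padded = np.pad(ceps, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)) / denom
        for t in range(len(ceps))
    ])
```

Because a stationary additive bias in the cepstral domain is constant over the utterance, both operations suppress it: CMN subtracts it directly, and the delta operation differences it away.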
4.
To improve the robustness of a speaker-recognition system in noisy environments, various anti-noise measures are needed. Mel-frequency cepstral coefficients are used as the speech features and vector quantization for pattern matching, with an improved speech enhancer based on the auditory masking effect serving as a pre-processor that first denoises the speech signal. Experiments with the enhancer show that denoising raises the signal-to-noise ratio of the input and reduces speech distortion, while suppressing background noise and residual musical noise well. Feeding the denoised speech into the speaker-recognition system improves its recognition performance.
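The VQ pattern-matching stage mentioned here is commonly implemented by training a per-speaker codebook (e.g. with k-means/LBG) and scoring a test utterance by its average quantization distortion against each codebook. A minimal k-means-style sketch; the codebook size and iteration count are arbitrary illustrative choices, not the paper's settings:

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Build a speaker codebook from feature vectors with plain k-means."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each codeword to the centroid of its cluster
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                codebook[j] = pts.mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    """Average distance to the nearest codeword: lower = better match."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()
```

At recognition time the speaker whose codebook yields the lowest distortion is selected; the enhancer described in the abstract would run on the waveform before the MFCC features are computed.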
6.
This paper proposes a multi-pitch detection algorithm for mixed speech based on the linear-prediction residual cepstrum. The algorithm first performs linear-prediction analysis on the mixed speech signal, computes the residual between the predicted signal and the original mixture, and applies a cepstral transform to the residual to obtain the linear-prediction residual cepstrum. In that residual cepstrum, image-processing techniques combined with pitch-quefrency matching are used to detect the pitch frequencies of the multiple voices. Finally, in the pitch-labeling stage, the algorithm exploits the continuity of speech, assigning each pitch track to a speaker by minimizing the frame-to-frame change in pitch frequency. Experimental results show that the proposed algorithm can quickly and effectively detect multi-voice pitch information from a single-channel mixture under weak-echo and echo-free conditions.
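The core operation shared by cepstrum-based pitch detectors, including residual-cepstrum methods like the one above, is locating the dominant quefrency peak inside the plausible pitch range. A single-voice sketch on the plain (non-residual) cepstrum; the 60-400 Hz search range is an assumed default, not a value from the paper:

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch as the largest real-cepstrum peak in the quefrency
    band corresponding to plausible pitch periods."""
    spec = np.fft.rfft(frame * np.hamming(len(frame)))
    ceps = np.fft.irfft(np.log(np.abs(spec) + 1e-10))  # real cepstrum
    qmin, qmax = int(fs / fmax), int(fs / fmin)        # quefrency search band
    q = qmin + int(np.argmax(ceps[qmin:qmax]))
    return fs / q
```

A residual-cepstrum method applies the same peak search to the cepstrum of the LP residual, where the vocal-tract envelope has already been removed, which sharpens the pitch peaks for overlapping voices.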
7.
A speech detection algorithm for a speech-recognition system-on-chip
Speech-recognition research has entered the stage of practical deployment, and one key component of a practical recognition system is reliable speech detection. This paper proposes a real-time speech-detection algorithm based on a finite-state-machine model (FSM-SD). A log-maximum-likelihood frame-energy detector and a zero-crossing-rate detector control the transitions between states. For the MFCC (Mel-frequency cepstral coefficient) and LPCC (linear-prediction cepstral coefficient) feature-extraction pipelines, two different frame-energy computations are derived. FSM-SD was applied to a small-vocabulary Mandarin speech-recognition system implemented on an OAK DSP, and experiments confirm that it effectively preserves the system's recognition performance and noise robustness.
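A state-machine detector driven by frame energy and zero-crossing rate, as described above, can be sketched with two states and a hangover counter. The thresholds and hangover length below are arbitrary illustrative values, and the actual FSM-SD algorithm uses a log-maximum-likelihood energy decision and a richer state set:

```python
import numpy as np

SILENCE, SPEECH = 0, 1

def frame_features(frame):
    energy = 10 * np.log10(np.sum(frame ** 2) + 1e-12)  # log frame energy
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # zero-crossings/sample
    return energy, zcr

def fsm_vad(frames, e_on=-20.0, e_off=-30.0, z_on=0.4, hang=3):
    """Two-state sketch: enter SPEECH on high energy or high ZCR
    (the latter catches unvoiced fricatives); return to SILENCE only
    after `hang` consecutive quiet frames (hangover)."""
    state, quiet, labels = SILENCE, 0, []
    for f in frames:
        e, z = frame_features(f)
        if state == SILENCE and (e > e_on or z > z_on):
            state = SPEECH
        elif state == SPEECH:
            if e < e_off and z < z_on:
                quiet += 1
                if quiet >= hang:
                    state, quiet = SILENCE, 0
            else:
                quiet = 0
        labels.append(state)
    return labels
```

The separate on/off thresholds (hysteresis) and the hangover counter prevent the detector from chattering at word boundaries, which matters on a resource-limited system-on-chip.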
11.
In isolated-word recognition, accurately locating the start and end points of the speech signal is essential. Determining the extent of the speech signal can greatly reduce computation in non-real-time systems and improve recognition accuracy. Based on short-time features of speech, namely the short-time average magnitude (or energy) and the short-time average zero-crossing rate, this paper presents an IBM/PC program that performs endpoint detection using these feature parameters.
12.
When detecting electromagnetic-leakage signals, external noise and noise generated inside the detection system contaminate the raw signal with a large amount of electromagnetic noise, making signal detection and extraction difficult; weak signals in particular are easily buried in the noise, leading to detection errors. Removing the noise from the acquired raw signal is therefore the first task. Addressing this problem, this paper applies wavelet-transform theory to remove the noise from electromagnetic-leakage signals, and through concrete... (abstract truncated in the source)
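Wavelet threshold denoising of the kind proposed here follows decompose → threshold the detail coefficients → reconstruct. A self-contained numpy sketch using the Haar wavelet and the universal soft threshold; the truncated abstract does not specify the paper's wavelet or threshold rule, so both are illustrative choices:

```python
import numpy as np

def haar_dwt(x):
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail (high-pass)
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wavelet_denoise(x, levels=3):
    """Soft-threshold Haar denoising; len(x) must be divisible by 2**levels.
    Noise level is estimated from the finest detail band (MAD estimator)."""
    details, a = [], x
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise estimate
    t = sigma * np.sqrt(2 * np.log(len(x)))          # universal threshold
    details = [np.sign(d) * np.maximum(np.abs(d) - t, 0) for d in details]
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a
```

Because broadband noise spreads evenly over all wavelet coefficients while the signal concentrates in a few large ones, thresholding the small detail coefficients removes most of the noise energy with little signal distortion.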
16.
Cochlear-implant CIS speech processing based on an auditory-perception wavelet transform
To overcome the complex parameter tuning of conventional filter banks, a cochlear-implant speech-processing method based on an auditory-perception wavelet transform is proposed. Building on the continuous interleaved sampling (CIS) speech-processing scheme, the method exploits the similarity between the critical bands of human hearing and the auditory-perception wavelet-transform domain to reconstruct the cochlear implant's output signal, which is analyzed with short-time-Fourier-transform spectrograms. Experimental results show that the synthesized speech closely matches the original speech in its spectral-envelope characteristics, and its frequency-domain features approach the actual physiology of the human ear.
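Independently of the wavelet filter bank proposed above, the analysis side of CIS reduces to: split the signal into a few frequency channels, then extract each channel's envelope (classically by rectification and low-pass filtering) to modulate the electrode pulse trains. A crude numpy sketch using FFT-mask band splitting and moving-average smoothing; the band edges and smoothing length are invented for illustration and are not the paper's critical-band design:

```python
import numpy as np

def cis_channels(x, fs, edges=(300, 700, 1400, 2600, 5000), smooth=80):
    """CIS-style analysis sketch: band-split via an FFT mask, then take
    each band's envelope by rectification + moving-average low-pass."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    k = np.ones(smooth) / smooth  # moving-average low-pass kernel
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.fft.irfft(np.where((freqs >= lo) & (freqs < hi), X, 0),
                            len(x))
        envs.append(np.convolve(np.abs(band), k, mode="same"))
    return np.array(envs)
```

In a real implant the channel envelopes modulate interleaved biphasic pulse trains on the electrodes; the wavelet approach in the abstract replaces the fixed band split with an auditory-perception wavelet decomposition.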
17.
IEEE Signal Processing Magazine, 2001, 18(3):26-34
The author's goal is to generate a virtual space close to the real communication environment between network users or between humans and machines. There should be an avatar in cyberspace that projects the features of each user, with a realistic texture-mapped face to generate facial expression and action controlled by a multimodal input signal. Users can also get a view in cyberspace through the avatar's eyes, so they can communicate with each other by gaze crossing. A face-fitting tool based on multi-view camera images is introduced to build a realistic three-dimensional (3-D) face model with texture and geometry very close to the original. This fitting tool is a GUI-based system using simple mouse operations to pick each feature point on the face contour and the face parts, enabling easy construction of a 3-D personal face model. When an avatar is speaking, the voice signal is essential in determining the mouth-shape feature, so a real-time mouth-shape control mechanism is proposed that uses a neural network to convert speech parameters to lip-shape parameters. This network realizes an interpolation between specific mouth shapes given as learning data. The emotional factor can sometimes be captured from speech parameters; this media-conversion mechanism is described. For dynamic modeling of facial expression, a muscle-structure constraint is introduced to produce facial expressions naturally with few parameters. The authors also obtain muscle parameters automatically from local motion vectors on the face computed by optical flow in a video sequence.
18.
Bobillet W., Diversi R., Grivel E., Guidorzi R., Najim M., Soverini U. IEEE Transactions on Signal Processing, 2007, 55(12):5564-5578
In the framework of speech enhancement, several parametric approaches based on an a priori model for a speech signal have been proposed. When using an autoregressive (AR) model, three issues must be addressed. (1) How to deal with AR parameter estimation? Indeed, due to additive noise, the standard least squares criterion leads to biased estimates of AR parameters. (2) Can an estimation of the variance of the additive noise for each speech frame be obtained? A voice activity detector is often used for its estimation. (3) Which estimation rules and techniques (filtering, smoothing, etc.) can be considered to retrieve the speech signal? Our contribution in this paper is threefold. First, we propose to view the identification of the noisy AR process as an errors-in-variables problem. This blind method has the advantage of providing accurate estimations of both the AR parameters and the variance of the additive noise. Second, we propose an alternative algorithm to standard Kalman smoothing, based on a constrained minimum variance estimation procedure with a lower computational cost. Third, the combination of these two steps is investigated. It provides better results than some existing speech enhancement approaches in terms of signal-to-noise-ratio (SNR), segmental SNR, and informal subjective tests.
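Issue (1) above, the bias of least-squares/Yule-Walker AR estimates under additive noise, is easy to demonstrate for an AR(1) process: the observed lag-0 autocorrelation is inflated by the noise variance, so the coefficient estimate shrinks toward zero. Subtracting a known (or separately estimated) noise variance from lag 0 compensates the bias; the paper's errors-in-variables method estimates both quantities jointly, which this minimal sketch does not attempt:

```python
import numpy as np

def ar1_estimate(y, noise_var=0.0):
    """Yule-Walker AR(1) coefficient from sample autocorrelations.
    Passing the additive-noise variance removes it from lag 0,
    compensating the shrinkage bias of the naive estimate."""
    r0 = np.mean(y * y) - noise_var   # lag-0 autocorrelation, noise-corrected
    r1 = np.mean(y[1:] * y[:-1])      # lag-1 autocorrelation (noise-free in mean)
    return r1 / r0
```

For x[n] = 0.9 x[n-1] + e[n] observed as y = x + v with var(v) = 0.25, the naive estimate converges to 0.9·σx²/(σx²+0.25) < 0.9, while the compensated one converges to 0.9.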
20.
A fast real-valued discrete Gabor transform (RDGT) for speech signals is proposed. Computing complex-spectrogram values from the RDGT coefficients, spectrogram generation, and fast reconstruction of the speech signal are discussed, and worked examples are given.
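The paper's fast RDGT is a specific algorithm; as a generic stand-in, a short-time Fourier analysis/synthesis pair illustrates the same spectrogram-and-reconstruction workflow (Hann window, 50% overlap, weighted overlap-add are all standard assumed choices, not the RDGT itself):

```python
import numpy as np

def stft(x, n=256, hop=128):
    """Windowed frames -> one-sided spectra; |result|**2 is the spectrogram."""
    w = np.hanning(n)
    return np.array([np.fft.rfft(w * x[i:i + n])
                     for i in range(0, len(x) - n + 1, hop)])

def istft(X, n=256, hop=128):
    """Weighted overlap-add reconstruction; dividing by the accumulated
    squared window makes the interior reconstruction exact."""
    w = np.hanning(n)
    out = np.zeros(hop * (len(X) - 1) + n)
    norm = np.zeros_like(out)
    for i, F in enumerate(X):
        out[i * hop:i * hop + n] += w * np.fft.irfft(F, n)
        norm[i * hop:i * hop + n] += w * w
    return out / np.maximum(norm, 1e-8)
```

A Gabor transform is exactly such a windowed time-frequency expansion; the real-valued variant in the paper replaces the complex exponentials with real basis functions so the whole pipeline runs in real arithmetic.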