Similar Literature
Found 20 similar documents (search time: 515 ms)
1.
王坤赤  蒋华 《信息技术》2007,(10):20-22
Extraction of the fundamental frequency (pitch) and formant frequencies is widely used in speech coding, speech synthesis, and speech recognition. Based on an in-depth analysis of the time- and frequency-domain properties of speech signals, an effective pitch and formant extraction algorithm is designed around the characteristics of the speech amplitude spectrum. Parameter-extraction tests on real speech show that the algorithm accurately extracts the pitch and formant frequencies of speech from different speakers and under different recording conditions.
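The abstract does not spell out the algorithm, so here is a hedged, generic sketch of the two building blocks it names: pitch estimation from the amplitude spectrum (illustrated with a harmonic product spectrum) and formant estimation from LPC roots. All function names, window choices, and parameter values (FFT size, LPC order, search band) are illustrative assumptions, not the authors' design.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def pitch_from_spectrum(frame, fs, fmin=60.0, fmax=400.0, n_harm=4):
    """F0 estimate via a harmonic product spectrum of the amplitude spectrum."""
    nfft = 4 * len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=nfft))
    hps = spec.copy()
    for h in range(2, n_harm + 1):                      # fold in downsampled copies
        hps[:len(spec) // h] *= spec[::h][:len(spec) // h]
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(hps[band])]

def formants_from_lpc(frame, fs, order=10):
    """Rough formant frequencies from the angles of the LPC polynomial roots."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])        # autocorrelation-method LPC
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]                   # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > 90.0]                          # discard near-DC roots
```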

2.
黄胜华 《电信科学》1990,6(3):43-48
Pitch detection is a very important problem in speech processing. This paper proposes an effective pitch detection algorithm, the Hump Centroid with Automatic Polar Selection (HCAPS) method. The algorithm extracts the pitch period from the waveform centroids of humps of a selected polarity in the voiced segments of the speech signal. After a brief review of the waveform characteristics of voiced speech, the paper focuses on the principle of the HCAPS algorithm and gives flowcharts of its main procedures. Computer simulations show that the algorithm offers high precision and accuracy, low computational cost, good noise immunity, and good adaptability to various kinds of speech, making it a pitch detection method of considerable practical value.
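The HCAPS flowcharts appear only in the paper itself; the toy sketch below merely illustrates the general idea of picking the dominant polarity, segmenting above-threshold humps, and spacing their centroids. The polarity rule, threshold ratio, and return value are assumptions made for illustration, not the published algorithm.

```python
import numpy as np

def hump_centroid_pitch(frame, fs, thresh_ratio=0.4):
    """Toy hump-centroid pitch estimate for one voiced frame (returns F0 in Hz)."""
    x = np.asarray(frame, dtype=float)
    if -x.min() > x.max():                      # crude automatic polarity selection
        x = -x
    mask = x > thresh_ratio * x.max()           # samples belonging to humps
    d = np.diff(mask.astype(int))
    starts, ends = np.flatnonzero(d == 1) + 1, np.flatnonzero(d == -1) + 1
    if mask[0]:
        starts = np.insert(starts, 0, 0)
    if mask[-1]:
        ends = np.append(ends, len(x))
    centroids = [np.sum(np.arange(s, e) * x[s:e]) / np.sum(x[s:e])
                 for s, e in zip(starts, ends)] # waveform centroid of each hump
    if len(centroids) < 2:
        return 0.0                              # frame too short or unvoiced
    period_s = np.median(np.diff(centroids)) / fs
    return 1.0 / period_s
```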

3.
An RCAF pitch-period detection algorithm based on extended spectral subtraction   (Cited by: 1; self-citations: 0; by others: 1)
To address the high error rate of traditional pitch detection algorithms at low signal-to-noise ratios, this paper proposes a pitch-track extraction method based on the Reverse CAMDF Autocorrelation Function (RCAF) with search-and-probe smoothing. Extended spectral subtraction with an adaptive decision criterion is used for speech enhancement, and an estimate of the noise signal is obtained within the speech segments. The pitch period is extracted with the RCAF algorithm and then smoothed with a search-and-probe smoothing algorithm, which lowers the misjudgment rate and improves extraction accuracy. Simulation results show that at a -10 dB SNR the algorithm outperforms traditional methods such as CAMDF and AWAC.
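The paper's exact RCAF definition, adaptive decision criterion, and smoothing pass are not given in the abstract. As a hedged stand-in, the sketch below combines a circular AMDF with the autocorrelation (weighting the ACF by the reciprocal of the CAMDF) and adds a bare-bones magnitude spectral-subtraction front end; all parameter values are assumptions.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction; noise_mag is a pre-estimated noise
    magnitude spectrum with the same length as rfft(noisy)."""
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec)
    clean = np.maximum(mag - alpha * noise_mag, beta * mag)   # spectral floor
    return np.fft.irfft(clean * np.exp(1j * np.angle(spec)), n=len(noisy))

def camdf(x, lag):
    """Circular average magnitude difference function at one lag."""
    return np.mean(np.abs(x - np.roll(x, lag)))

def camdf_weighted_acf_pitch(frame, fs, fmin=60, fmax=400):
    """Pick the lag maximizing ACF divided by CAMDF over the pitch lag range."""
    x = frame - np.mean(frame)
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    acf = np.array([np.dot(x[:-l], x[l:]) for l in lags])
    amdf = np.array([camdf(x, l) for l in lags])
    return fs / lags[np.argmax(acf / (amdf + 1e-8))]
```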

4.
The pitch period is an important parameter of speech signals, and extracting the pitch period of Tibetan speech lays an important foundation for Tibetan speech recognition and synthesis. Starting from an analysis of the pronunciation characteristics of Tibetan, this work analyzes an LPC-based pitch-period extraction algorithm for Tibetan speech; practice shows that the approach is better suited to extracting the pitch period of Tibetan speech at low signal-to-noise ratios. Building on conventional LPC analysis and combining it with the autocorrelation and cepstrum methods, the average relative error is analyzed and computed, and a feature extraction algorithm matching the characteristics of Tibetan speech is summarized.
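As a reminder of the cepstrum half of such a combined scheme, here is a short, generic cepstral pitch detector (not the paper's algorithm); the frame is assumed to span at least two pitch periods, and the search band is an illustrative choice. An LPC-residual or autocorrelation estimate can be computed alongside it and the two compared, e.g., via their relative error.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """F0 from the dominant real-cepstrum peak in the expected lag range."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    ceps = np.fft.irfft(np.log(spec + 1e-12))       # real cepstrum
    qmin, qmax = int(fs / fmax), int(fs / fmin)     # quefrency search range
    lag = qmin + np.argmax(ceps[qmin:qmax + 1])
    return fs / lag
```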

5.
A digit speech recognition model based on a self-organizing neural network is proposed. Speech features are first extracted with a method based on the wavelet transform and linear prediction, and a self-organizing neural network then makes the recognition decision. This approach suits small-vocabulary isolated-word recognition: the network structure is simple, very little training data is required, and real-time performance is good. MATLAB simulations achieve a recognition rate of 98%.
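To illustrate the classifier half of such a system, below is a minimal self-organizing map trained with NumPy on arbitrary feature vectors (the wavelet/LPC front end is not shown). Grid size, learning-rate schedule, and neighborhood width are assumed values, not those of the paper.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM: data is an (n_samples, n_features) array of speech features."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)               # decaying learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5   # shrinking neighborhood
        for x in rng.permutation(data):
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            weights += lr * np.exp(-dist2 / (2 * sigma ** 2))[:, None] * (x - weights)
    return weights, coords

# For recognition, label each node with the majority class of the training
# vectors it wins, then classify a test vector by its best matching unit.
```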

6.
Compared with desktop environments, speech recognition accuracy over telephone networks is still relatively low, and raising it is a pressing need for putting telephone speech recognition into practical use. Previous studies show that the marked drop in telephone recognition accuracy usually stems from data mismatch caused by differing telephone channels between the test and training environments. This paper therefore proposes a statistical-model-based dynamic channel compensation algorithm (SMDC) to reduce the difference, using Bayesian estimation to dynamically track the time-varying characteristics of the telephone channel. Experiments show a relative reduction of about 27% in character error rate (CER) for large-vocabulary continuous speech recognition and about 30% in word error rate (WER) for isolated words. The algorithm also has low structural delay (about 200 ms on average) and low computational complexity, so it can readily be embedded in practical telephone speech recognition applications.
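The SMDC algorithm itself is statistical and Bayesian; a much simpler, widely used baseline for convolutive telephone-channel mismatch is running cepstral mean subtraction, sketched below purely for orientation. The forgetting factor is an assumed value, and this is not the paper's method.

```python
import numpy as np

def running_cepstral_mean_subtraction(cepstra, alpha=0.995):
    """Track a slowly varying channel as an exponentially weighted cepstral
    mean and subtract it frame by frame; cepstra is (n_frames, n_coeffs)."""
    mean = cepstra[0].astype(float).copy()
    out = np.empty_like(cepstra, dtype=float)
    for t, c in enumerate(cepstra):
        mean = alpha * mean + (1.0 - alpha) * c     # online channel estimate
        out[t] = c - mean
    return out
```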

7.
This paper proposes a waveform interpolation coding scheme in which the characteristic-waveform extraction rate adapts to the properties of the input speech frame. A fairly reliable pitch-period estimation algorithm is realized based on the principle of maximizing a doubly weighted long-term prediction gain together with forward pitch decision; the pitch period, degree of voicing, and waveform-surface flatness then determine the waveform extraction rate and the update rates of the Slowly Evolving Waveform (SEW) and Rapidly Evolving Waveform (REW). Experiments show that, compared with WI coding at a fixed extraction rate, the proposed waveform interpolation (WI) coder reduces both the average bit rate and the computational complexity to some extent, and the synthesized speech quality is clearly better than that of a 4.8 kbps CELP coder.
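For orientation, waveform-interpolation coders conventionally obtain the SEW by low-pass filtering each phase track of the characteristic-waveform surface along the extraction-time axis, with the REW as the remainder. The sketch below shows only that decomposition; the cutoff and filter order are assumptions, and the paper's adaptive extraction-rate logic is not reproduced.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_sew_rew(char_waveforms, extraction_rate_hz, cutoff_hz=20.0):
    """char_waveforms: rows = extraction instants, columns = normalized phase.
    Low-pass filter each phase track over time to get the SEW; REW is the rest.
    Assumes enough extraction instants for zero-phase filtering."""
    b, a = butter(2, cutoff_hz / (extraction_rate_hz / 2.0))
    sew = filtfilt(b, a, char_waveforms, axis=0)
    return sew, char_waveforms - sew
```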

8.
Optimization of the pitch preprocessing algorithm in TETRA speech coding   (Cited by: 6; self-citations: 0; by others: 6)
This paper introduces a numerical filtering algorithm into the preprocessing stage of TETRA speech coding. The algorithm effectively removes the influence of the vocal-tract formant structure on pitch detection. Further experiments replace the original preprocessing with a combination of mean removal, low-pass filtering, and numerical filtering, which yields an even better optimization. Ordinary speech, speech with pitch jitter, and speech corrupted by noise were used to test the performance of the optimized algorithms. With both optimizations, the processed speech shows clear periodicity in the time domain, while in the frequency domain the formant influence of the original speech is eliminated or effectively suppressed.
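The paper's numerical filter is not described in the abstract; the sketch below shows only the surrounding preprocessing it mentions (mean removal plus low-pass filtering to attenuate formant structure before pitch detection), with an assumed cutoff frequency and filter order.

```python
import numpy as np
from scipy.signal import butter, lfilter

def pitch_preprocess(x, fs, cutoff_hz=900.0, order=4):
    """Mean removal followed by a Butterworth low-pass, as a generic
    pre-stage for pitch detection (the numerical filter is not reproduced)."""
    y = np.asarray(x, dtype=float) - np.mean(x)
    b, a = butter(order, cutoff_hz / (fs / 2.0))
    return lfilter(b, a, y)
```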

9.
An SHS-based method for separating and detecting the pitches of overlapping speech   (Cited by: 3; self-citations: 1; by others: 2)
The subharmonic summation (SHS) method can accurately extract the pitch period of a single voice. This paper extends the method to separating and detecting the pitches of overlapping speech and proposes a corresponding detection scheme. Experiments show that the proposed method can effectively separate and extract the pitches of two overlapping voices, and it achieves good results even when the two pitch periods are close to each other.
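For reference, a single-voice subharmonic summation scorer looks roughly like the sketch below (candidate grid, harmonic count, and decay weight are assumed values). The paper's extension to two overlapping voices, e.g., peeling off the first voice's harmonics and re-scoring, is not shown.

```python
import numpy as np

def shs_pitch(frame, fs, fmin=60.0, fmax=400.0, n_harm=8, decay=0.84):
    """Score each candidate F0 by a decaying weighted sum of the magnitude
    spectrum at its harmonics; return the best-scoring candidate."""
    nfft = 4 * len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=nfft))
    candidates = np.arange(fmin, fmax, 1.0)
    scores = []
    for f0 in candidates:
        bins = np.round(np.arange(1, n_harm + 1) * f0 * nfft / fs).astype(int)
        bins = bins[bins < len(spec)]
        scores.append(np.sum(decay ** np.arange(len(bins)) * spec[bins]))
    return candidates[int(np.argmax(scores))]
```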

10.
Converting whispered speech to normal speech with a BP neural network   (Cited by: 1; self-citations: 1; by others: 0)
A method for converting Mandarin whispered speech to normal speech based on a BP (back-propagation) neural network is proposed. Formant parameters are first extracted from normal and whispered speech, and a BP network is trained as a model mapping whispered-speech formant parameters to normal-speech ones; the model is then used to derive the normal-speech formant parameters corresponding to the whisper, and formant synthesis converts the whisper into normal speech. Experimental results show that speech converted in this way achieves a DRT score of 80% and a MOS of 3.5, which are satisfactory in terms of both intelligibility and quality.
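A minimal stand-in for the formant-mapping model, using scikit-learn's MLPRegressor on placeholder data; the feature dimensionality, network size, and training setup are assumptions, and the formant analysis/synthesis stages are not shown.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
whisper_formants = rng.random((500, 8))   # placeholder whispered-speech formant vectors
normal_formants = rng.random((500, 8))    # placeholder paired normal-speech targets

model = MLPRegressor(hidden_layer_sizes=(32,), activation='logistic',
                     max_iter=2000, random_state=0)
model.fit(whisper_formants, normal_formants)

# At conversion time, the predicted formant parameters would drive a formant synthesizer.
converted = model.predict(whisper_formants[:10])
```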

11.
CNN-based speaker (voiceprint) recognition for continuous speech   (Cited by: 1; self-citations: 0; by others: 1)
In recent years, with rising living standards, the demand for machine-intelligent voice recognition keeps growing. The Gaussian mixture model-hidden Markov model (GMM-HMM) has been the most important model in speaker recognition research, but its limited ability to model large amounts of speech data and its poor robustness to noise have become a bottleneck. To address this, researchers have turned to deep learning. This work introduces a CNN deep learning model to the problem of speaker recognition from continuous speech and proposes a continuous speaker recognition CNN (CSR-CNN) algorithm. The model extracts fixed-length speech segments in utterance order, forms time-ordered spectrograms, extracts feature sequences with a CNN, and continuously scores combinations of feature sequences through a reward-penalty function. Experiments show that CSR-CNN achieves better recognition than GMM-HMM for continuous, segment-level speaker recognition.
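A small PyTorch CNN over fixed-size log-spectrogram segments gives a feel for the front end described here; the layer sizes are arbitrary assumptions, and the paper's ordered-segment scoring with a reward-penalty function is not reproduced.

```python
import torch
import torch.nn as nn

class SpeakerCNN(nn.Module):
    """Classify fixed-length spectrogram segments into enrolled speakers."""
    def __init__(self, n_speakers):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 4 * 4, n_speakers),
        )

    def forward(self, x):                 # x: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(x))

# Example: score a batch of 8 segments against 10 enrolled speakers.
logits = SpeakerCNN(n_speakers=10)(torch.randn(8, 1, 64, 100))
```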

12.
A phonetic approach to the problem of automatic recognition of isolated words is investigated. A phonetic encoding method is proposed in which each word of a vocabulary is associated with a code sequence of stable phonemes. An information-theoretic estimate of vocabulary confusability, whose calculation relies on the speaker's phonetic database and the SNR of the communications channel, is synthesized using properties of the Kullback-Leibler divergence. In an experimental study of the proposed method, the mutual influence between recognition quality and the proposed confusability estimate is demonstrated on the recognition of Russian words. It is established that the introduced requirement of isolated, syllable-by-syllable pronunciation makes it possible to attain 90-95% recognition accuracy for vocabularies containing 2000 words.
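The confusability estimate itself is derived in the paper; the snippet below shows only the Kullback-Leibler machinery it builds on, with a crude symmetric-divergence notion of how easily two discrete phoneme distributions could be confused. That notion is an illustrative assumption, not the paper's measure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions given as nonnegative vectors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_divergence(p, q):
    """Small values suggest the two distributions are easy to confuse."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```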

13.
Some scientific problems facing Mandarin speech recognition research   (Cited by: 12; self-citations: 0; by others: 12)
杜利民  侯自强 《电子学报》1995,23(10):110-116,61
This paper outlines some scientific problems that must be solved for automatic Mandarin speech recognition to move from laboratory technology to practical commercial technology, and lists structural characteristics and rules of Mandarin phonetic coding. It emphasizes that (1) language models at the initial/final (shengmu/yunmu) level of Mandarin syllables are very helpful for recognition and also provide useful knowledge about written and spoken language; and (2) using discriminative guiding features together with descriptive uniform features helps speed up the recognition search, reduce mismatch, and improve the fine classification of allophones. The paper also discusses important issues and approaches for improving recognition robustness in the acoustic processing stage, and proposes training and adaptation methods that move gradually from supervised, labeled learning to cue-driven guessing, for large-vocabulary continuous Mandarin speech recognition.

14.
This paper describes the implementation of a Speech Understanding System component which tracks the formants of pseudo-syllabic nuclei containing voiced consonants. The nuclei are isolated from continuous speech after a precategorical classification in which feature extraction is carried out by modules organized in a hierarchy of levels. FFT and LPC spectra are the input to the formant tracking system. It works under the control of rules specifying the possible formant evolutions given previously hypothesized phonetic features, and it produces fuzzy graphs rather than the usual formant patterns, because formants are not always evident in the spectrogram.

15.
This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The automatic multimedia content indexing system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. New spectral-based features and a smoothing method in the AS module improved speech detection from the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low WER, as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates the results from the AS, TS, and SC modules; story boundary detection accuracy was improved by combining the TS results with the AS and SC results, compared with using the TS results alone.

16.
17.
A neural network system which combines a self-organizing feature map and a multilayer perceptron for the problem of isolated-word speech recognition is presented. A new method combining self-organizing learning and K-means clustering is used to train the feature map, and an efficient adaptive nearby-search coding method based on the `locality' of the self-organization is designed. The coding method is shown to save about 50% of the computation without degradation in recognition rate compared to full-search coding. Various experiments with different choices of parameters in the system were conducted on the TI 20-word database, with best recognition rates as high as 99.5% for both speaker-dependent and multispeaker-dependent tests.
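The nearby-search idea can be sketched in a few lines: given the map weights and grid coordinates of a trained SOM (for instance, as returned by the SOM sketch under item 5), restrict the best-matching-unit search to nodes near the previous frame's winner. The radius and the fallback rule are assumptions, not the paper's coding scheme.

```python
import numpy as np

def nearby_search_bmu(weights, coords, x, prev_bmu, radius=2.0):
    """Find the BMU for feature vector x among nodes within `radius` grid
    units of the previous BMU; fall back to a full search if none qualify."""
    d = np.linalg.norm(coords - coords[prev_bmu], axis=1)
    nearby = np.flatnonzero(d <= radius)
    if nearby.size == 0:
        nearby = np.arange(len(weights))
    dists = np.sum((weights[nearby] - x) ** 2, axis=1)
    return int(nearby[np.argmin(dists)])
```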

18.
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) and a temporal structure normalization (TSN) filter. Because the DAE has a deep structure and nonlinear processing steps, it is flexible enough to model the highly nonlinear mapping between input and output spaces. We train a DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. After applying the DAE in the cepstral domain to suppress reverberation, we apply a post-processing step based on the TSN filter, which reduces the noise and reverberation effects by normalizing the modulation spectra to reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. By combining the cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
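A minimal cepstral-domain denoising autoencoder in PyTorch, trained with an MSE loss on (noisy, clean) feature pairs, conveys the mapping idea; the depth, widths, activations, and the placeholder batches are assumptions, and the TSN post-filter is not shown.

```python
import torch
import torch.nn as nn

class CepstralDAE(nn.Module):
    """Map reverberant/noisy cepstral feature vectors toward clean ones."""
    def __init__(self, dim=39, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def train_step(model, optimizer, noisy, clean):
    """One MSE training step on a batch of (noisy, clean) feature pairs."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    optimizer.step()
    return loss.item()

model = CepstralDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = train_step(model, opt, torch.randn(32, 39), torch.randn(32, 39))  # placeholder batches
```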

19.
A phonetic word-decoding method for automatic speech recognition is considered. The properties of the Kullback-Leibler divergence are used to derive an estimate of the distribution of the divergence between minimum speech units (e.g., single phonemes) within a single class. It is demonstrated that the minimum variance of the intra-phonemic divergence is reached when the phonetic database is tuned to the voice of a single speaker. The estimates are supported by experimental results on the recognition of vowel sounds and isolated words of the Russian language.

20.
A speaker-independent isolated-word speech recognition simulation system based on hidden Markov models (HMMs) with continuous M-component Gaussian mixture densities is described. By studying how the number of model states, the amount of training, and the choice of feature parameters affect recognition accuracy, it is found that with 4 HMM states, 20 training iterations, and a 48-dimensional mixed LPCC/MFCC feature vector, the system reaches a recognition rate of 90% on Mandarin isolated words.
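Assuming the hmmlearn package is available, a word-model setup of this kind can be sketched as below: one GMM-HMM per vocabulary word, trained on that word's feature sequences and scored at recognition time. The state count, mixture count, and iteration count follow the abstract only loosely, and the 48-dimensional LPCC+MFCC front end is not shown.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_word_model(feature_sequences, n_states=4, n_mix=3, n_iter=20):
    """feature_sequences: list of (n_frames, n_features) arrays for one word."""
    X = np.vstack(feature_sequences)
    lengths = [len(s) for s in feature_sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type='diag', n_iter=n_iter, random_state=0)
    model.fit(X, lengths)
    return model

def recognize(word_models, features):
    """Return the word whose model assigns the highest log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(features))
```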
