首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 849 毫秒
1.
We investigate the performance of an isolated word speech recognition (IWSR) system for degraded speech. We propose a recognition scheme which adapts itself to mild degradations in speech and improves the reliability of recognition significantly. The scheme does not use a priori information regarding the nature and extent of noise. We suggest techniques which adaptively discriminate between noisy and noise-free parameters by using a selective weighting procedure in the final distance calculation. A new measure of performance is adopted to compare several recognition schemes using small data sets. Our scheme lends itself to greater flexibility in handling degradations in speech input than do the existing recognition schemes.  相似文献   

2.
季伟  王力 《通信技术》2013,(12):15-18
语音识别系统中,语音的特征提取是语音识别的关键技术之一。通过对语音的系统研究,提出一种全新的基于流形学习的特征提取方法。流形算法是近些年才发展起来的非线性降维方法,在人脸识别领域已取得较好效果,但在语音识别领域一直处于空白。现提出的基于流形学习LPP算法的语音特征提取方案,是一次重大的尝试,可以为以后深入研究语音识别技术提供较好参考。仿真实验结果表明,该算法与传统特征提取LPCC、MFCC算法相比,可以取得较好的识别率。  相似文献   

3.
针对语音识别实际应用过程中的噪声问题,给出了一种新的抗噪声的特征提取算法,即先利用小波变换将语音信号进行小波子带分解,再根据人耳的听觉掩蔽效应,由谱压缩的技术,将小波变换后的子带语音信号进行压缩,从而提取其对应的语音特征。通过MATLAB软件建立实验平台,仿真实验结果表明该语音特征可以在噪声环境下得到较高的识别率。新的特征参数即充分利用了小波的抗噪声特性又有效地降低了语音识别中的训练环境和识别环境间的失配,具有抗噪声的特点。  相似文献   

4.
应用于语音识别片上系统的语音检测算法   总被引:2,自引:0,他引:2  
语音识别技术的研究已经进入实用化阶段,而实用化语音识别系统中的一个关键技术就是可靠的语音检测。本文提出了一种基于有限状态机模型的实时语音检测算法(FSM-SD)。采用对数最大似然判决帧能量检测器和过零率检测器控制各状态之间的跳转关系。针对语音识别中的MFCC(Mel频标倒谱系数)和LPCC(线性预测倒谱参数)特征提取过程,分别得到两种不同的帧能量计算方法。将FSM-SD应用到在OAK DSP上实现的小词表汉语语音识别系统,通过实验验证了其对系统识别性能和噪声稳健性的有效保证。  相似文献   

5.
王大巍 《电子技术》2010,47(7):21-22
语音口令识别是语音信息处理的一个重要研究方向,本文给出一种基于嵌入式系统的语音口令识别系统的设计方案,硬件系统的核心芯片是嵌入式微处理器,语音口令识别算法采用连续隐马尔克夫模型。实验结果表明,将语音识别系统与嵌入式系统相结合,可以使语音口令识别系统广泛应用于便携式设备中。  相似文献   

6.
提出了一种结合韵律信息的高性能汉语连续数字语音识别算法,该识别算法基于CHMM(连续隐马尔可夫模型),采用MFCC(MEL频率倒谱系数)为主要语音特征参数,结合韵律信息进行连续数字精确分割,能够有效区分易混数字。算法采用两级识别框架来提高语音识别率,其中,第1级对连续数字分割,在此基础上进行数字语音识别,输出各候选结果,第2级在候选结果中确定易混数字对,并运用韵律信息进一步选择正确结果。实验表明,最终汉语连续数字语音识别率有很大提高。  相似文献   

7.
It has become increasingly important to develop hands-free speech recognition techniques for the human-computer interface in car environments. However, severe car noise degrades the speech recognition performance substantially. To compensate the performance loss, it is necessary to adapt the original speech hidden Markov models (HMMs) to meet changing car environments. A novel frame-synchronous adaptation mechanism for in-car speech recognition is presented. This mechanism is intended to perform unsupervised model adaptation efficiently on a frame-by-frame basis instead of a conventional adaptation algorithm relying on batch adaptation data and supervision information. The proposed adaptation scheme is performed during frame likelihood calculation where an optimal equalisation factor is first computed to equalise the model mean vector and the input frame vector. This equalisation factor then serves as a reference index to retrieve an additional bias vector for model mean adaptation. As a result, a rapid and flexible algorithm is exploited to establish a new robust likelihood measure. In experiments on hands-free in-car speech recognition with the microphone far from the talker, this framework is found to be effective in terms of recognition rate and computational cost under various driving speeds  相似文献   

8.
Based on the observation that dissimilar speech enhancement algorithms perform differently for different types of interference and noise conditions, we propose a context-adaptive speech pre-processing scheme, which performs adaptive selection of the most advantageous speech enhancement algorithm for each condition. The selection process is based on an unsupervised clustering of the acoustic feature space and a subsequent mapping function that identifies the most appropriate speech enhancement channel for each audio input, corresponding to unknown environmental conditions. Experiments performed on the MoveOn motorcycle speech and noise database validate the practical value of the proposed scheme for speech enhancement and demonstrate a significant improvement in terms of speech recognition accuracy, when compared to the one of the best performing individual speech enhancement algorithm. This is expressed as accuracy gain of 3.3% in terms of word recognition rate. The advance offered in the present work reaches beyond the specifics of the present application, and can be beneficial to spoken interfaces operating in fast-varying noise environments.  相似文献   

9.
This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using head-mounted/hand-holding microphone in conventional speech recognizer. To improve the speech quality with car noise interference, a linear microphone array is applied and acted as robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay for speech signals collected by different microphones. The estimated delay is adopted in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models to get close to the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay. The increase in the speech sampling rate is helpful to determine the time delay. Incorporating the model adaptation scheme significantly reduces the recognition errors with moderate computation overhead.  相似文献   

10.
基于多媒体融合的图像检索系统的实现   总被引:1,自引:0,他引:1  
提出一种融合了文本、语音、图像等信息特征的图像检索方案,并以MATLAB为平台构建了一种基于语音识别技术的新型图像检索系统.与基于文本或基于内容的图像检索系统相比,该系统既提高了检索性能,又使得人机交互更加便利.  相似文献   

11.
A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features representing contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length in nature and represented by a fixed dimensional feature vector formed by coefficients of discrete orthonormal polynomial expansions for approximating its spectral parameter contours. In the training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In the testing, a frame-based dynamic programming procedure is employed to calculate the matching score of comparing the test utterance with each reference template. Performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusing isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved for the case using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. It is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta cepstral, and delta-delta cepstral coefficients  相似文献   

12.
Numerous speech representations have been reported to be useful in speaker recognition. However, there is much less agreement on which speech representation provides a perfect representation of speaker-specific information conveyed in a speech signal. Unlike previous work, we propose an alternative approach to speaker modeling by the simultaneous use of different speech representations in an optimal way. Inspired by our previous empirical studies, we present a soft competition scheme on different speech representations to exploit different speech representations in encoding speaker-specific information. On the basis of this soft competition scheme, we present a parametric statistical model, generalized Gaussian mixture model (GGMM), to characterize a speaker identity based on different speech representations. Moreover, we develop an expectation-maximization algorithm for parameter estimation in the GGMM. The proposed speaker modeling approach has been applied to text-independent speaker recognition and comparative results on the KING speech corpus demonstrate its effectiveness.  相似文献   

13.
For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.  相似文献   

14.
在语音识别中,MFCC 参数是说话人识别中常用的特征参数之一。文中针对说话人识别速度较慢以及占用资源较大的问题,提出了一种 MFCC 计算的有效方案。利用 MFCC 滤波器的频率响应函数的三角形结构,改进了 Mel 滤波器的设计方法。实验结果表明,文中所提方案在单帧内存访问时间上减少了 83.6%,在保证识别准确率不降低的情况下,使识别速度大幅度提高,降低了说话人识别计算的复杂性。  相似文献   

15.
In speech recognition (ASR) based on hidden Markov models (HMM) it is necessary to obtain a spectral approximation with a reduced set of representation coefficients. The author introduces to the speech parameterisation scheme multitapering and a modification of the usual mel frequency cepstrum coefficients (MFCC) processing scheme based on wavelets on intervals (wavelet frequency coefficients, WFC). Phoneme recognition performance improvements compared to the MFCC have been experimentally verified on data from a speech database, using multitapering and WFC.  相似文献   

16.
This paper presents a statistical model‐based noise suppression approach for voice recognition in a car environment. In order to alleviate the spectral whitening and signal distortion problem in the traditional decision‐directed Wiener filter, we combine a decision‐directed method with an original spectrum reconstruction method and develop a new two‐stage noise reduction filter estimation scheme. When a tradeoff between the performance and computational efficiency under resource‐constrained automotive devices is considered, ETSI standard advance distributed speech recognition font‐end (ETSI‐AFE) can be an effective solution, and ETSI‐AFE is also based on the decision‐directed Wiener filter. Thus, a series of voice recognition and computational complexity tests are conducted by comparing the proposed approach with ETSI‐AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.  相似文献   

17.
舒倩  李银国 《通信技术》2007,40(11):374-375,378
MFCC是语音识别中常用的特征参数,根据MFCC分量对语音端点的敏感性,提出利用平常舍去的识别特征参数分量MFCC0作为语音端点检测的参量.接着根据MFCC0的特性设计了一种新的端点检测方法,该方法简单且无需增加额外的计算量.实验结果表明,基于该方法的语音识别系统不仅可以通过端点检测大大压缩数据量,而且提高了系统的识别率.  相似文献   

18.
李永伟  陶建华  李凯 《信号处理》2023,39(4):632-638
语音情感识别是实现自然人机交互不可缺失的部分,是人工智能的重要组成部分。发音器官的调控引起情感语音声学特征的差异,从而被感知到不同的情感。传统的语音情感识别只是针对语音信号中的声学特征或听觉特征进行情感分类,忽略了声门波和声道等发音特征对情感感知的重要作用。在我们前期工作中,理论分析了声门波和声道形状对感知情感的重要影响,但未将声门波与声道特征用于语音情感识别。因此,本文从语音生成的角度重新探讨了声门波与声道特征对语音情感识别的可能性,提出一种基于源-滤波器模型的声门波和声道特征语音情感识别方法。首先,利用Liljencrants-Fant和Auto-Regressive eXogenous(ARX-LF)模型从语音信号中分离出情感语音的声门波和声道特征;然后,将分离出的声门波和声道特征送入双向门控循环单元(BiGRU)进行情感识别分类任务。在公开的情感数据集IEMOCAP上进行了情感识别验证,实验结果证明了声门波和声道特征可以有效的区分情感,且情感识别性能优于一些传统特征。本文从发音相关的声门波与声道研究语音情感识别,为语音情感识别技术提供了一种新思路。  相似文献   

19.
以ATM实现分组话音通信,需要解决两个基本问题:传输时延和分组丢失。针对这两个基本问题,本文着重讨论了ATM分组话音通信中32kb/sADPCM话音分组丢失的重建技术--于模式匹配的波形替代技术和静默重建技术,分别用以补偿由于网络阻塞造成的分组话音信息丢失而产生的失真和改善重建话音的自然性。  相似文献   

20.
胡文轩  王秋林  李松  洪青阳  李琳 《信号处理》2021,37(10):1816-1824
端到端语音识别模型无需发音词典进行训练,可以大幅降低开发新语种语音识别系统的负担。本文利用端到端模型的这一优势,建立了一种语种无关的端到端多语种语音识别系统。该模型使用基于字符的建模方法进行训练,同时构建多语种输出符号集,使其包括所有目标语言中出现的字符。模型训练生成单一模型,其网络参数为所有语种共享。在OLR竞赛提供的10个语种数据集上,相较于单语种语音识别系统,本文提出的多语种语音识别系统在所有语言上的表现都更加优秀。   相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号