19 similar documents found; search took 89 ms
1.
Objectionable-speech recognition is an effective means of monitoring harmful content in high-definition audio/video services. This paper proposes an acoustic-model-based framework for recognizing objectionable speech, analyzes the implementation methods for its three key components — feature extraction, acoustic model construction, and the objectionable-speech decision model — and lists the advantages and disadvantages of each method. The discussion provides a valuable reference for building an efficient objectionable-speech recognition system.
2.
Combining Acoustic Models Built on Different Units in Continuous Mandarin Speech Recognition
This paper studies the combination of acoustic models trained on different units. In continuous Mandarin speech recognition, popular units include context-dependent initial/final (sheng-yun) units and phoneme units. Experiments show that some Mandarin syllables are recognized more accurately under the initial/final models, while others do better under the phoneme models. The paper proposes a method for combining the two kinds of acoustic models: both are used during recognition, while the model responsible for low accuracy on a given syllable is avoided. Experiments show that with the proposed method the syllable error rate falls by 9.60% and 6.10% relative to the phoneme models and the initial/final models, respectively.
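The selection idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the held-out "preferred model" flag, and the fallback interpolation are all assumptions for the sketch.

```python
def combine_model_scores(score_if, score_ph, prefer_if, alpha=0.5):
    """Combine per-syllable log-likelihoods from two acoustic models.

    score_if  : log-likelihood from the initial/final (sheng-yun) model
    score_ph  : log-likelihood from the phoneme model
    prefer_if : True/False/None -- which model a held-out set found more
                reliable for this syllable (None -> no clear winner)
    alpha     : interpolation weight for the initial/final model
    (Illustrative names; the paper's exact decision rule may differ.)
    """
    if prefer_if is True:      # avoid the model known to be weaker here
        return score_if
    if prefer_if is False:
        return score_ph
    # no clear winner: fall back to log-linear interpolation
    return alpha * score_if + (1.0 - alpha) * score_ph
```

In use, each hypothesized syllable would be scored by both models during decoding, and the combined score would replace the single-model score in the search.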
3.
Model compensation has been applied successfully to speech recognition in noisy environments. Popular model-compensation techniques such as Log-Add and Log-Normal PMC (parallel model combination) can usually only approximate the compensation of dynamic feature parameters, so their recognition accuracy degrades sharply at low signal-to-noise ratios. This paper derives a new compensation method for dynamic model parameters from the derivative of the static features. The new method can be combined with any existing static model-compensation algorithm to produce noisy-speech models for recognition. Experiments show that the new algorithm clearly improves recognition accuracy over the original model-compensation algorithms alone, with only a slight increase in computational complexity.
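A derivative-based dynamic compensation of the kind described above can be sketched in the log-spectral domain. This is a simplified illustration (no DCT/cepstral step, Log-Add static combination assumed), not the paper's derivation:

```python
import numpy as np

def log_add_compensate(mu_x, mu_n, delta_mu_x):
    """Log-Add PMC sketch in the log-spectral domain.

    Static means combine as log(exp(clean) + exp(noise)).  The dynamic
    (delta) mean then follows by the chain rule: the derivative of the
    static mapping w.r.t. the clean static mean scales the clean delta.
    """
    mu_x, mu_n, delta_mu_x = map(np.asarray, (mu_x, mu_n, delta_mu_x))
    ex, en = np.exp(mu_x), np.exp(mu_n)
    mu_y = np.log(ex + en)            # compensated static mean
    gain = ex / (ex + en)             # d(mu_y)/d(mu_x), in (0, 1]
    delta_mu_y = gain * delta_mu_x    # compensated dynamic mean
    return mu_y, delta_mu_y
```

The `gain` factor shows the key point of such methods: dynamic parameters are not simply copied but attenuated where noise dominates the channel.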
4.
Single-channel speech enhancement based on robust principal component analysis (RPCA) is an important approach under Gaussian white noise, but it handles the low-rank speech components poorly and cannot suppress colored noise well. To address this, the paper proposes an improved enhancement algorithm based on whitened-spectrum rearrangement RPCA (WSRRPCA): an optimized noise-whitening model converts enhancement in colored noise into a white-noise problem, and spectrum rearrangement improves the RPCA enhancement step, yielding an overall gain in colored-noise environments. Simulations show that the algorithm achieves effective enhancement in colored noise and outperforms the comparison algorithms in both noise suppression and speech quality.
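The RPCA core that such methods build on decomposes a magnitude spectrogram into a low-rank part plus a sparse part. Below is a generic principal-component-pursuit sketch via inexact ALM; the paper's whitening and spectrum-rearrangement steps (not shown) would be applied to the matrix beforehand:

```python
import numpy as np

def rpca(M, lam=None, n_iter=200, tol=1e-7):
    """Generic RPCA sketch: M ~= L (low rank) + S (sparse), solved by
    inexact augmented Lagrange multipliers.  Not the paper's WSRRPCA --
    only the standard decomposition step it improves upon."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M)
    mu = 1.25 / np.linalg.norm(M, 2)       # start from spectral norm
    rho = 1.5                              # penalty growth factor
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)

    def shrink(X, t):                      # soft thresholding
        return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

    for _ in range(n_iter):
        # singular-value thresholding for the low-rank part
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # elementwise soft thresholding for the sparse part
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                      # primal residual
        Y = Y + mu * R                     # dual update
        mu = rho * mu
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

In the enhancement setting, `S` (the sparse component) is typically taken as the speech estimate and `L` as the stationary noise/background.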
5.
6.
7.
Speech Recognition Technology and Its Applications (Part 1)
Introduction: Speech recognition technology lets machines convert speech signals into the corresponding text or commands through a process of recognition and understanding. Speech recognition is an interdisciplinary field. Over the past two decades it has made remarkable progress and has begun to move from the laboratory to the market.
8.
Environmental noise causes a sharp drop in the performance of speaker recognition systems in practice. This paper presents a robust speaker identification method based on the Gaussian mixture model-universal background model (GMM-UBM) and adaptive parallel model combination (PMC). Adaptive PMC is a noise-robust feature-compensation algorithm that reduces the mismatch between training and test environments, improving recognition accuracy and noise robustness. The algorithm first estimates noise features from the test speech, then fits them with a single Gaussian to obtain the noise mean and covariance, and finally adjusts the mean vectors and covariance matrices of the trained Gaussian mixture models so that they match the test environment as closely as possible. Experiments show that the method accurately reconstructs the GMM parameters of clean speech and significantly improves speaker recognition accuracy, especially at low signal-to-noise ratios.
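The final adjustment step described above can be sketched in a toy additive feature domain. The real method works on MFCC-domain GMMs (where combination involves an exp/log mapping); here we assume features where noise adds linearly, so the update is a simple mean shift and covariance inflation:

```python
import numpy as np

def adapt_gmm_to_noise(means, covs, noise_frames):
    """Toy PMC-style adaptation, assuming an additive feature domain.

    means : (K, D) mixture means trained on clean speech
    covs  : (K, D, D) mixture covariances
    noise_frames : (T, D) noise features estimated from the test speech
    With y = x + n and independent noise, mu' = mu + mu_n and
    Sigma' = Sigma + Sigma_n for every mixture component.
    """
    noise_frames = np.asarray(noise_frames, dtype=float)
    mu_n = noise_frames.mean(axis=0)               # single-Gaussian fit...
    sigma_n = np.cov(noise_frames, rowvar=False)   # ...of the noise
    means_adapted = means + mu_n                   # shift every mean
    covs_adapted = covs + sigma_n                  # inflate every covariance
    return means_adapted, covs_adapted
```

The adapted parameters then replace the clean-trained ones when scoring the noisy test utterance.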
9.
In real environments, the mismatch between training and test conditions sharply degrades speech recognition performance. Model adaptation, which transforms the model parameters toward the recognition environment using a small amount of adaptation data, is one effective way to reduce the impact of this mismatch. Maximum likelihood linear regression (MLLR) is a widely used transform-based adaptation algorithm, but it estimates the model parameters poorly when adaptation data are scarce. This paper proposes a model adaptation algorithm based on maximum likelihood subband linear regression: the channels of the Mel filterbank are divided into several subbands, and the model mean components of all channels within a subband are assumed to share one linear environment transform, increasing the data available per transform. Experiments show that the algorithm copes well with data sparsity, achieving good adaptation from very little data, and is especially suitable for fast model adaptation when only a small amount of data is available.
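The subband-sharing idea can be illustrated with a least-squares stand-in for the ML estimation. This is a deliberate simplification: a shared per-subband scale/offset fit by ordinary least squares, with hypothetical function and argument names, rather than the paper's EM-based MLLR statistics:

```python
import numpy as np

def subband_bias_mllr(model_means, adapt_means, n_subbands):
    """Sketch of subband-shared mean adaptation.

    model_means : (n_states, n_channels) filterbank-domain model means
    adapt_means : (n_states, n_channels) targets from adaptation data
    The channels are split into contiguous subbands; all channels in a
    subband share one linear transform (a, b), so each transform is fit
    from pooled data -- the mechanism that combats data sparsity.
    """
    model_means = np.asarray(model_means, dtype=float)
    adapt_means = np.asarray(adapt_means, dtype=float)
    n_channels = model_means.shape[1]
    bands = np.array_split(np.arange(n_channels), n_subbands)
    adapted = model_means.copy()
    for band in bands:
        x = model_means[:, band].ravel()   # pool all channels in subband
        y = adapt_means[:, band].ravel()
        a, b = np.polyfit(x, y, 1)         # shared linear transform
        adapted[:, band] = a * model_means[:, band] + b
    return adapted
```

With fewer subbands, each transform sees more data (robust but coarse); with one subband per channel, the scheme degenerates toward per-channel estimation and the sparsity problem returns.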
10.
11.
12.
13.
14.
16.
This paper proposes a new multi-pass search algorithm for on-chip speech recognition, built on the continuous-density hidden Markov model (CDHMM) framework. While keeping the recognition rate essentially unchanged, it greatly reduces on-chip memory usage and shortens the search time. For selecting candidate words in the second recognition pass, a confidence-based criterion is proposed that further improves recognition speed and robustness. On a 200-command recognition task the system achieves a recognition rate of 98.83%, and the algorithm still performs well when the vocabulary grows to 600 words.
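One plausible form of confidence-based candidate selection is sketched below; the softmax-style confidence and the threshold are illustrative assumptions, not the paper's exact criterion:

```python
import math

def select_candidates(first_pass_scores, conf_threshold=0.05):
    """Pick second-pass candidates from fast first-pass scores.

    first_pass_scores : dict mapping word -> first-pass log-likelihood
    A word survives if its normalized (posterior-like) confidence over
    all first-pass scores exceeds the threshold, so the candidate list
    shrinks when one word clearly dominates -- unlike a fixed N-best
    list, which always rescores the same number of entries.
    """
    m = max(first_pass_scores.values())          # for numerical stability
    exps = {w: math.exp(s - m) for w, s in first_pass_scores.items()}
    z = sum(exps.values())
    return [w for w, e in exps.items() if e / z >= conf_threshold]
```

Only the surviving words are rescored with the full CDHMM models in the second pass, which is where the memory and time savings come from.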
17.
Sadaoki Furui 《The Journal of VLSI Signal Processing》2005,41(3):245-254
The principal cause of speech recognition errors is a mismatch between trained acoustic/language models and input speech due
to the limited amount of training data in comparison with the vast variation of speech. It is crucial to establish methods
that are robust against voice variation due to individuality, the physical and psychological condition of the speaker, telephone
sets, microphones, network characteristics, additive background noise, speaking styles, and other aspects. This paper overviews
robust architecture and modeling techniques for speech recognition and understanding. The topics include acoustic and language
modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architecture
for spoken dialogue systems, multi-modal speech recognition, and speech summarization. This paper also discusses the most
important research problems to be solved in order to achieve ultimate robust speech recognition and understanding systems.
Dr. Sadaoki Furui is currently a Professor at Tokyo Institute of Technology, Department of Computer Science. He is engaged in a wide range
of research on speech analysis, speech recognition, speaker recognition, speech synthesis, and multimodal human-computer interaction
and has authored or coauthored over 450 published articles. From 1978 to 1979, he served on the staff of the Acoustics Research
Department of Bell Laboratories, Murray Hill, New Jersey, as a visiting researcher working on speaker verification. He is
a Fellow of the IEEE, the Acoustical Society of America and the Institute of Electronics, Information and Communication Engineers
of Japan (IEICE). He was President of the Acoustical Society of Japan (ASJ) from 2001 to 2003 and the Permanent Council for
International Conferences on Spoken Language Processing (PC-ICSLP) from 2000 to 2004. He is currently President of the International
Speech Communication Association (ISCA). He was a member of the Board of Governors of the IEEE Signal Processing Society from 2001 to 2003.
He has served on the IEEE Technical Committees on Speech and MMSP and on numerous IEEE conference organizing committees. He
has served as Editor-in-Chief of both the Journal of Speech Communication and the Transactions of the IEICE. He is an Editorial
Board member of Speech Communication, the Journal of Computer Speech and Language, and the Journal of Digital Signal Processing.
He has received the Yonezawa Prize and the Paper Awards from the IEICE (1975, 1988, 1993, 2003), and the Sato Paper Award from
the ASJ (1985, 1987). He has received the Senior Award from the IEEE ASSP Society (1989) and the Achievement Award from the
Minister of Science and Technology, Japan (1989). He has received the Technical Achievement Award and the Book Award from
the IEICE (2003, 1990). He has also received the Mira Paul Memorial Award from the AFECT, India (2001). In 1993 he served
as an IEEE SPS Distinguished Lecturer. He is the author of “Digital Speech Processing, Synthesis, and Recognition” (Marcel
Dekker, 1989, revised, 2000) in English, “Digital Speech Processing” (Tokai University Press, 1985) in Japanese, “Acoustics
and Speech Processing” (Kindai-Kagaku-Sha, 1992) in Japanese, and “Speech Information Processing” (Morikita, 1998) in Japanese.
He edited “Advances in Speech Signal Processing” (Marcel Dekker, 1992) jointly with Dr. M.M. Sondhi. He has translated into
Japanese “Fundamentals of Speech Recognition,” authored by Drs. L.R. Rabiner and B.-H. Juang (NTT Advanced Technology, 1995)
and “Vector Quantization and Signal Compression,” authored by Drs. A. Gersho and R. M. Gray (Corona-sha, 1998).
18.
19.
A new class‐based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating the acoustic mismatch between training and test environments, but also at reducing the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class‐specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class‐specific reference and test distributions. Experiments on the Aurora 2 database proved the effectiveness of the proposed method by reducing relative errors by 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and by 59.43%, 66.00%, and 50.50% over mel‐cepstral‐based features for test sets A, B, and C, respectively.
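The per-class equalization step can be sketched in one dimension with quantile matching. This assumes class labels are already available for the test features and that reference distributions are stored as training-set quantiles — an assumed input format, not the paper's:

```python
import numpy as np

def class_heq(test_feats, test_labels, ref_quantiles_by_class, n_q=101):
    """Class-based histogram equalization sketch (1-D features).

    Each test feature is pushed through its own class's empirical test
    CDF and then through the inverse of that class's reference CDF, so
    both the environment mismatch and the per-class distribution
    mismatch are reduced.  ref_quantiles_by_class[c] holds n_q reference
    quantiles (training-set order statistics) for class c.
    """
    probs = np.linspace(0.0, 1.0, n_q)
    out = np.empty_like(test_feats, dtype=float)
    for c, ref_q in ref_quantiles_by_class.items():
        idx = np.where(test_labels == c)[0]
        x = test_feats[idx]
        test_q = np.quantile(x, probs)         # test CDF for this class
        u = np.interp(x, test_q, probs)        # x -> CDF value in [0, 1]
        out[idx] = np.interp(u, probs, ref_q)  # inverse reference CDF
    return out
```

With a single class this reduces to conventional histogram equalization; the gain reported above comes from equalizing each phonetic class against its own reference distribution.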