3.
Speech Recognition Based on an Improved Speech Feature Extraction Method (Total citations: 1; self: 1; others: 0)
Building on an analysis of speech feature extraction methods, this paper proposes an improved combined algorithm, using an HMM acoustic model and the Viterbi algorithm for model training and recognition. Experimental results show that the algorithm is robust in noisy environments and effectively improves the accuracy of continuous Mandarin speech recognition under noise, enhancing overall recognition performance; it therefore has practical value for speech recognition systems operating in noisy environments.
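The HMM decoding step mentioned in this abstract can be illustrated with a generic log-domain Viterbi pass (a textbook sketch, not the paper's implementation; the transition, emission, and prior matrices are assumed to come from a trained acoustic model):

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Viterbi decoding for an HMM in the log domain.

    log_A:  (N, N) log transition probabilities (prev state -> cur state)
    log_B:  (T, N) log emission likelihoods per frame
    log_pi: (N,)   log initial state probabilities
    Returns the most likely state sequence as an int array of length T.
    """
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)   # best path score ending in each state
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A       # (N, N): prev -> cur
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[t]
    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path
```

Working in the log domain avoids numerical underflow over long utterances.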
4.
The linear prediction HMM (LPHMM) does not adopt the traditional HMM's assumption that state outputs are independent and identically distributed, yet its recognition performance in practice is unsatisfactory. After analyzing the respective strengths and weaknesses of the two HMMs, this paper proposes a new hybrid model for speech recognition that describes the static characteristics of speech (via the traditional HMM) and its dynamic characteristics (via the LPHMM) separately while combining them organically, modeling real speech more accurately while requiring only minor changes to the system implementation and little additional computation. Experiments on large-vocabulary speaker-independent continuous Mandarin speech recognition show that the hybrid model significantly outperforms both the LPHMM and the traditional HMM. The paper also derives a set of closed-form parameter re-estimation formulas for the LPHMM.
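The idea of scoring static and dynamic characteristics separately and then combining them can be sketched as a weighted log-likelihood sum (a deliberately simplified single-state illustration; the Gaussian static score, the linear-prediction dynamic score, and the weight w are assumptions made here for illustration, not the paper's formulation):

```python
import numpy as np

def static_loglik(x, mu, var):
    # Traditional-HMM-style frame score: diagonal Gaussian on the frame itself.
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))

def dynamic_loglik(x_prev, x, A, var):
    # LPHMM-style frame score: Gaussian on the linear-prediction residual,
    # i.e. how well x is predicted from the previous frame.
    e = x - A @ x_prev
    return -0.5 * np.sum(e ** 2 / var + np.log(2 * np.pi * var))

def combined_score(frames, mu, var_s, A, var_d, w=0.5):
    # Weighted combination of static and dynamic evidence across an utterance.
    score = static_loglik(frames[0], mu, var_s)
    for t in range(1, len(frames)):
        score += w * static_loglik(frames[t], mu, var_s) \
               + (1 - w) * dynamic_loglik(frames[t - 1], frames[t], A, var_d)
    return score
```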
5.
To address the detection and tracking of alphabet hand gestures, this paper proposes a gesture recognition algorithm based on a maximum-likelihood-criterion Hausdorff distance. The algorithm first binarizes the gesture image and extracts the gesture's key points (finger roots and fingertips) from its edge information; it then recognizes the gesture using the maximum-likelihood Hausdorff distance, with a multi-resolution search strategy similar to Rucklidge's that significantly reduces search time without sacrificing success rate or localization accuracy. Experimental results show that the method recognizes alphabet gestures well and also handles partially deformed (rotated and scaled) gestures.
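The core distance underlying this method is the Hausdorff distance between two point sets (shown here in its plain symmetric form; the paper's maximum-likelihood weighting and multi-resolution search are not reproduced):

```python
import numpy as np

def directed_hausdorff(A, B):
    # h(A, B) = max over a in A of (min over b in B of ||a - b||)
    diff = A[:, None, :] - B[None, :, :]          # (|A|, |B|, dim)
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1).max()

def hausdorff(A, B):
    # Symmetric Hausdorff distance between point sets A and B.
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

In template matching, the template's key points are compared against image edge points at candidate poses, and the pose minimizing the distance is retained.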
7.
To address the low accuracy of modulation recognition for continuous phase modulation (CPM) signals, which are nonlinear and have memory, this paper proposes a maximum-likelihood modulation recognition method for CPM signals based on memory factors. The method defines mapped symbols possessing the time-homogeneous Markov property, constructs memory factors from their posterior probabilities, and, by further combining CPM decomposition with the EM algorithm, derives a CPM likelihood function that is separable in time and permits estimation of channel parameters. The method requires few symbols, works over a wide SNR range, recognizes many CPM types with high accuracy, and is robust to phase error. Simulations show that with 200 symbols, an SNR of 0 dB, and arbitrary phase error, the method recognizes eight types of CPM signals with accuracy above 95%.
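Full CPM recognition with memory factors and EM is considerably more involved; the underlying maximum-likelihood decision rule can, however, be shown for the much simpler memoryless case of linear constellations (BPSK/QPSK here are illustrative stand-ins, and the noise variance sigma2 is assumed known):

```python
import numpy as np

# Candidate constellations (unit energy); an assumption for illustration only.
CONSTELLATIONS = {
    "BPSK": np.array([1.0, -1.0], dtype=complex),
    "QPSK": np.exp(1j * np.pi * (2 * np.arange(4) + 1) / 4),
}

def avg_loglik(r, const, sigma2):
    # Average per-sample log-likelihood under equiprobable symbols:
    # log p(r_n) = log sum_s (1/M) exp(-|r_n - s|^2 / sigma2), up to a constant.
    d2 = np.abs(r[:, None] - const[None, :]) ** 2
    ll = np.logaddexp.reduce(-d2 / sigma2, axis=1) - np.log(len(const))
    return ll.mean()

def classify_modulation(r, sigma2):
    # ML rule: pick the hypothesis with the highest average log-likelihood.
    return max(CONSTELLATIONS, key=lambda m: avg_loglik(r, CONSTELLATIONS[m], sigma2))
```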
8.
As an important component of optical systems, a large-aperture flat mirror's surface figure accuracy strongly affects system imaging. Subaperture stitching is a common means of testing large-aperture optical flats, and the stitching algorithm is the core of the technique. This paper studies flat-mirror subaperture stitching and establishes a stitching algorithm and mathematical model based on maximum-likelihood estimation and orthogonalized Zernike polynomial fitting, which effectively enables stitched testing of large flat mirrors. A corresponding stitching program was written, and a 120 mm flat mirror was stitch-tested with a 100 mm interferometer. A comparison between the stitched result and a full-aperture test shows an RMS deviation of 0.002 between the stitched full-aperture phase distribution and the full-aperture measurement, verifying the reliability and accuracy of the algorithm.
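A core sub-step in such stitching algorithms is least-squares fitting of low-order Zernike terms to each subaperture phase map so that alignment errors (piston, tilt, and possibly defocus) can be separated out before stitching. The sketch below uses only the first four terms and is an illustrative assumption, not the paper's full model:

```python
import numpy as np

def zernike_basis(x, y):
    # First few Zernike-like terms on the unit disk:
    # piston, tilt-x, tilt-y, defocus.
    r2 = x ** 2 + y ** 2
    return np.stack([np.ones_like(x), x, y, 2 * r2 - 1], axis=1)

def fit_and_remove(x, y, phase):
    # Least-squares fit of the basis to the measured phase, returning the
    # fitted coefficients and the residual phase with those terms removed.
    Z = zernike_basis(x, y)
    coeffs, *_ = np.linalg.lstsq(Z, phase, rcond=None)
    return coeffs, phase - Z @ coeffs
```

In a stitching context, the per-subaperture piston and tilt coefficients are the free alignment parameters that the maximum-likelihood stitching step optimizes for global consistency.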
10.
Following the work of Fletcher et al., subband recognition methods based on the perceptual-independence assumption have been used for noise-robust speech recognition. This paper extends the subband approach, adopting a multiband framework based on a noise-contamination assumption to reduce the influence of noise. The paper not only analyzes theoretically the potential recognition-performance advantages of this framework, but also proposes a robust speech recognition algorithm for the multiband setting. The study shows that the multiband framework avoids the perceptual-independence requirement and, compared with the subband method, better reduces the influence of noise and improves recognition performance.
11.
A Noise-Adaptive Multi-Stream Combined Subband Speech Recognition Method (Total citations: 3; self: 0; others: 3)
To overcome the limited feature usage of the marginalisation technique in existing missing-data speech recognition, this paper first proposes a reliability estimation method for cepstral feature components, extending marginalisation to common cepstral speech recognition systems. It then exploits the complementary performance, across different noises, of marginalisation recognizers based on full-band and subband cepstral features to propose a noise-adaptive multi-stream combined subband speech recognition method. Experimental results show that the method adaptively selects whichever of the full-band and subband data streams is less affected by noise as the main basis for recognition, effectively improving the system's robustness in changing noise environments.
12.
For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.
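A common way to obtain initial quantized prototypes before any maximum-likelihood refinement (an assumption made here for illustration, not necessarily the paper's procedure) is to cluster the original Gaussian mean vectors, e.g. with Lloyd's k-means:

```python
import numpy as np

def kmeans_prototypes(means, k, iters=20, seed=0):
    """Cluster original distribution mean vectors into k prototypes.

    means: (n, d) array of Gaussian mean vectors from the full model.
    Returns (prototypes, assignment) where assignment[i] is the prototype
    index replacing distribution i in the quantized model.
    """
    rng = np.random.default_rng(seed)
    protos = means[rng.choice(len(means), k, replace=False)]
    assign = np.zeros(len(means), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(means[:, None] - protos[None], axis=2)  # (n, k)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():                # keep old proto if empty
                protos[j] = means[assign == j].mean(axis=0)
    return protos, assign
```

Each original distribution is then replaced by its nearest prototype, and an ML objective (as in the paper) can re-train the prototypes against the data.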
13.
Speech Endpoint Detection Based on Cumulative Subband Energy Change (Total citations: 1; self: 0; others: 1)
Speech endpoint detection in noisy environments plays a very important role in robust speech recognition. Based on the change in the cumulative distribution of noise and speech subband energies, a new endpoint detection algorithm for speech signals is proposed: the degree of subband energy change is computed for each frame, and a threshold on it is used to detect the speech endpoints. Experiments show that, compared with some traditional endpoint detection algorithms, the algorithm is faster and more resistant to noise, making it suitable for endpoint detection at low SNR.
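An energy-ratio endpoint detector in this spirit can be sketched as follows (the frame length, band count, threshold, and the assumption that the first frames are speech-free are all illustrative choices, not the paper's exact algorithm):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def detect_endpoints(x, frame_len=256, hop=128, n_bands=4, ratio=3.0):
    frames = frame_signal(x, frame_len, hop)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(spec, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)  # (T, bands)
    # Noise floor estimated from the first few frames (assumed speech-free).
    noise = band_energy[:5].mean(axis=0) + 1e-12
    change = (band_energy / noise).mean(axis=1)   # average subband energy ratio
    active = change > ratio
    if not active.any():
        return None
    idx = np.flatnonzero(active)
    return idx[0], idx[-1]                        # first/last active frame
```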
14.
Abstract: Sparse representation-based classification (SRC) is a method for face recognition. It represents a test image as a sparse linear combination of the training samples, with representation fidelity measured by an L2- or L1-norm residual term. This model assumes the coding residual follows a Gaussian or Laplacian distribution, which in practice does not describe the coding error accurately. This paper proposes a new sparse coding method, formulated as a constrained regression problem. Maximum-likelihood sparse coding (MSC) seeks the maximum-likelihood estimates of this model's parameters and is highly robust to outliers. Experimental results on the Yale and ORL face databases demonstrate the method's effectiveness and robustness to facial blur, illumination, and expression changes.
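The SRC baseline that MSC modifies can be sketched with ISTA for the L1-regularized coding step followed by class-wise residual comparison (a generic sketch with a plain L2 residual; the paper's maximum-likelihood reweighting of the residual is not reproduced):

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding, the proximal operator of the L1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(D, y, lam=0.01, iters=500):
    # ISTA for  min_a  0.5 * ||y - D a||^2 + lam * ||a||_1
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        a = soft(a - D.T @ (D @ a - y) / L, lam / L)
    return a

def src_classify(D, labels, y, lam=0.01):
    # SRC rule: assign y to the class whose atoms best reconstruct it.
    a = sparse_code(D, y, lam)
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        residuals[c] = np.linalg.norm(y - D[:, mask] @ a[mask])
    return min(residuals, key=residuals.get)
```

MSC replaces the fixed L2 residual above with weights derived from a maximum-likelihood estimate of the residual distribution, down-weighting outlier pixels.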
16.
The maximum likelihood linear spectral transformation (ML‐LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real‐time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML‐LST is approximated and a closed‐form solution is proposed in this paper. It is shown experimentally that the proposed closed‐form solution for the ML‐LST can provide rapid speaker and environment adaptation for robust speech recognition.
17.
Sadaoki Furui, The Journal of VLSI Signal Processing, 2005, 41(3): 245-254
The principal cause of speech recognition errors is a mismatch between trained acoustic/language models and input speech due
to the limited amount of training data in comparison with the vast variation of speech. It is crucial to establish methods
that are robust against voice variation due to individuality, the physical and psychological condition of the speaker, telephone
sets, microphones, network characteristics, additive background noise, speaking styles, and other aspects. This paper overviews
robust architecture and modeling techniques for speech recognition and understanding. The topics include acoustic and language
modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architecture
for spoken dialogue systems, multi-modal speech recognition, and speech summarization. This paper also discusses the most
important research problems to be solved in order to achieve ultimate robust speech recognition and understanding systems.
Dr. Sadaoki Furui is currently a Professor at Tokyo Institute of Technology, Department of Computer Science. He is engaged in a wide range
of research on speech analysis, speech recognition, speaker recognition, speech synthesis, and multimodal human-computer interaction
and has authored or coauthored over 450 published articles. From 1978 to 1979, he served on the staff of the Acoustics Research
Department of Bell Laboratories, Murray Hill, New Jersey, as a visiting researcher working on speaker verification. He is
a Fellow of the IEEE, the Acoustical Society of America and the Institute of Electronics, Information and Communication Engineers
of Japan (IEICE). He was President of the Acoustical Society of Japan (ASJ) from 2001 to 2003 and the Permanent Council for
International Conferences on Spoken Language Processing (PC-ICSLP) from 2000 to 2004. He is currently President of the International
Speech Communication Association (ISCA). He was a member of the Board of Governors of the IEEE Signal Processing Society from 2001 to 2003.
He has served on the IEEE Technical Committees on Speech and MMSP and on numerous IEEE conference organizing committees. He
has served as Editor-in-Chief of both Journal of Speech Communication and the Transactions of the IEICE. He is an Editorial
Board member of Speech Communication, the Journal of Computer Speech and Language, and the Journal of Digital Signal Processing.
He has received the Yonezawa Prize and the Paper Awards from the IEICE (1975, 88, 93, 2003), and the Sato Paper Award from
the ASJ (1985, 87). He has received the Senior Award from the IEEE ASSP Society (1989) and the Achievement Award from the
Minister of Science and Technology, Japan (1989). He has received the Technical Achievement Award and the Book Award from
the IEICE (2003, 1990). He has also received the Mira Paul Memorial Award from the AFECT, India (2001). In 1993 he served
as an IEEE SPS Distinguished Lecturer. He is the author of “Digital Speech Processing, Synthesis, and Recognition” (Marcel
Dekker, 1989, revised, 2000) in English, “Digital Speech Processing” (Tokai University Press, 1985) in Japanese, “Acoustics
and Speech Processing” (Kindai-Kagaku-Sha, 1992) in Japanese, and “Speech Information Processing” (Morikita, 1998) in Japanese.
He edited “Advances in Speech Signal Processing” (Marcel Dekker, 1992) jointly with Dr. M.M. Sondhi. He has translated into
Japanese “Fundamentals of Speech Recognition,” authored by Drs. L.R. Rabiner and B.-H. Juang (NTT Advanced Technology, 1995)
and “Vector Quantization and Signal Compression,” authored by Drs. A. Gersho and R. M. Gray (Corona-sha, 1998).