Found 19 similar documents (search time: 171 ms)
1.
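Research on HMM-Based Speaker-Independent Isolated-Word Speech Recognition for Amdo Tibetan (Total citations: 1, self-citations: 0, cited by others: 1)
Using VC++ 6.0 as the development platform, a speaker-independent isolated-word speech recognition system for Amdo Tibetan is implemented based on the hidden Markov model (HMM). MFCC parameters are extracted from the voiced segments, vector-quantized, and used to train the HMMs, forming a feature template library that is then used for recognition. The endpoint detection method is adapted to the characteristics of Amdo Tibetan, which improves the accuracy of isolated-word speech signal detection and further raises the recognition rate.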
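The recognition step of such a pipeline can be sketched as follows. This is a minimal illustration, not the system from the abstract: it assumes the MFCC frames have already been vector-quantized into codeword indices, and the function names, model parameters, and toy numbers are all hypothetical. Each word's discrete HMM is scored with the scaled forward algorithm, and the word with the highest likelihood is returned.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    n_states = len(start)
    # Initialise with the first observed codeword.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def recognize(obs, models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda w: forward_log_likelihood(obs, *models[w]))

# Two toy 2-state left-to-right models over a 3-symbol codebook (made-up numbers).
models = {
    "word_a": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
print(recognize([0, 0, 1, 1], models))
```

In a real system the model parameters would come from Baum-Welch training on quantized training utterances rather than being written by hand.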
2.
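To meet the requirements of keyword recognition in low-resource or zero-resource scenarios, a keyword recognition algorithm based on automatic audio segmentation and deep neural networks is proposed. First, an improved speech segmentation algorithm based on a distance metric splits the continuous speech stream into isolated syllables, and each syllable is further divided into short audio segments associated with phoneme states; the resulting segments show large feature differences between segments and small feature variance within a segment. Next, an improved vector quantization method encodes the state features of the audio segments, giving high-precision quantization codes for in-vocabulary keywords and low-precision codes for out-of-vocabulary words. Finally, with the syllable as the recognition unit, a compressed state transition matrix is used as the holistic feature of each syllable and fed into a deep neural network for speech recognition. Simulation results show that the algorithm can identify multiple specified keywords in a natural speech stream fairly accurately, is easy to understand and simple to train, and is reasonably robust.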
3.
4.
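Research on an Embedded Speech Interaction System for Humanoid Robots (Total citations: 3, self-citations: 0, cited by others: 3)
This paper presents an embedded speech interaction system for humanoid robots. The system uses high-quality speech acquisition and speech output modules, with the high-performance digital signal processor (DSP) TMS320VC5402 as its hardware core. The HMM speech recognition engine represents speech features with LPC cepstra and their difference (delta) components, and an improved Baum-Welch re-estimation algorithm trains the speech templates from multiple observation sequences. Comparative experiments on how different feature representations affect the recognition results were also carried out. The peripheral control program reports recognition results and handles communication with the host computer. The system achieves very good results on speaker-independent isolated-word recognition with a 200-word vocabulary.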
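The "difference components" of a cepstral feature stream are commonly computed with a regression over neighbouring frames. The sketch below shows that common formulation; the abstract does not say which variant the authors used, so the window size, edge handling, and function names here are assumptions.

```python
def delta_features(frames, window=2):
    """Regression-based delta coefficients for a list of cepstral frames.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, window + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas

def append_deltas(frames, window=2):
    """Concatenate each static frame with its delta coefficients."""
    return [c + d for c, d in zip(frames, delta_features(frames, window))]

# A one-dimensional ramp: the interior slope is 1 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta_features(ramp))
```

Because edge frames are clamped, the deltas at the first and last frames underestimate the true slope; this is a known side effect of this padding choice.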
5.
6.
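A New Template Training Algorithm Based on LBG and DTW (Total citations: 1, self-citations: 1, cited by others: 0)
A new template training algorithm combining LBG and DTW is proposed. It comprises three parts (template training, initial template setup, and empty-subset handling) and offers a complete and effective solution to the template training problem in speech recognition. The algorithm clusters the feature matrices of speech signals and generates their centroids, making isolated-word speech recognition systems better suited to the speaker-independent case and raising the correct recognition rate for speakers outside the training set. A recognition system was designed and implemented; the fast convergence of template training and the system's high recognition rate confirm the algorithm's performance.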
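The DTW half of this combination can be illustrated on its own. The sketch below covers only the matching step (aligning an utterance with stored templates) and omits the LBG clustering and centroid generation that the abstract describes; the function names, step pattern, and toy data are assumptions.

```python
import math

def dtw_distance(a, b):
    """Accumulated Euclidean cost of the best monotonic alignment of a and b.

    a and b are sequences of feature vectors of equal dimension; the
    allowed steps are insertion, deletion, and match.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_template(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Toy one-dimensional templates (made-up values).
templates = {
    "up": [[0.0], [1.0], [2.0]],
    "down": [[2.0], [1.0], [0.0]],
}
print(nearest_template([[0.0], [1.0], [1.0], [2.0]], templates))
```

The example utterance repeats one frame of the "up" template, so the warping path absorbs the repetition at no cost.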
7.
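Keyword spotting is a special kind of speech recognition technique that aims to detect words of interest in continuous speech by discriminating on feature vectors. The paper presents a new detection method that applies linear discriminant analysis (LDA) to reduce the dimensionality of the speech feature parameters, making the classes more clearly separable. Experiments show that the method improves system performance.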
8.
9.
10.
11.
12.
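The maximum entropy model can make full use of context and flexibly combine multiple features. A maximum entropy model is used for part-of-speech tagging of Kazakh; feature templates are designed according to the agglutinative nature and rich morphology of Kazakh, and feature templates with backward dependence on part-of-speech tags are added. The model is also improved: during decoding, the n most probable tags are each added to the feature vector of the next word, and so on until the end of the sentence, and the tag sequence with the highest overall probability is finally selected. Experimental results show that the choice of feature templates is sound and that the improved model reaches an accuracy of 96.8%.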
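The decoding improvement described here resembles a beam search that keeps the n best partial tag sequences at each word. The following sketch shows that idea under stated assumptions: `tag_probs` is a hypothetical stand-in for a trained maximum entropy model, the toy distribution is made up, and only the previous tag is fed forward as context.

```python
import math

def beam_decode(words, tag_probs, n=3):
    """Keep the n most probable partial tag sequences at every position.

    tag_probs(words, i, prev_tag) must return a dict mapping each
    candidate tag for words[i] to its conditional probability.
    """
    beam = [(0.0, [])]  # (log probability, tag sequence so far)
    for i in range(len(words)):
        candidates = []
        for logp, tags in beam:
            prev = tags[-1] if tags else "<s>"
            for tag, p in tag_probs(words, i, prev).items():
                if p > 0.0:
                    candidates.append((logp + math.log(p), tags + [tag]))
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:n]
    return beam[0][1] if beam else []

def toy_tag_probs(words, i, prev_tag):
    """Made-up distribution standing in for a trained model."""
    if words[i] == "the":
        return {"DET": 1.0}
    if words[i] == "book":
        if prev_tag == "DET":
            return {"NOUN": 0.7, "VERB": 0.3}
        return {"NOUN": 0.4, "VERB": 0.6}
    return {"NOUN": 0.5, "VERB": 0.5}

print(beam_decode(["book", "the", "book"], toy_tag_probs, n=2))
```

With n = 1 this reduces to greedy decoding; larger n lets a locally weaker tag survive if it leads to a better overall sequence.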
13.
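An error-correction scheme for speech recognition based on divide and conquer is presented. A confusion network is first used to convert the continuous speech recognition problem into sequential, independent classification subtasks. Each subtask can be treated as an isolated-word recognition problem, in which a dedicated support vector machine is trained to discriminate among the recognition candidates in the confusion network. A fast speech-vector alignment method based on codebook conversion is proposed, which solves the problem that variable-length speech vectors cannot be fed directly to a support vector machine. Experimental results on a Mandarin syllable recognition task show that the scheme effectively improves the system's accuracy.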
14.
S. D. Apte. International Journal of Speech Technology, 2007, 10(1): 57-62
The paper proposes an innovative technique for generation of optimal mother wavelet using LPC trajectory with special reference
to speech recognition. A new wavelet based model is proposed for speech signal processing. Lower order linear predictor coefficients
(LPC) are related to the vocal tract area near lip that is the articulating organ. The trajectory of second LPC is proposed
for the generation of mother wavelet for speech recognition. The observation interval is selected as the pitch period that
represents one complete cycle of speech waveform. LPC of order 10 are evaluated for each pitch synchronous (PS) segment. An
innovative technique is proposed for the generation of mother wavelet. The mother wavelet is separately generated for each
word utterance. This generates a multidimensional space for speech words and increases the recognition accuracy. The wavelet
transform (WT) coefficients are evaluated with respect to the generated mother wavelet for each word utterance and are stored
as template along with the generated mother wavelet for each word utterance. The data base consists of 30 word utterances
recorded locally using the sound recorder facility. In the recognition mode, the external word utterance is scanned and is
divided into PS segments. The trajectory of second LPC is tracked. WT coefficients are evaluated with respect to the mother
wavelet of each word in the vocabulary and are compared with the template for each word. The results indicate 100% recognition
accuracy.
15.
Jianxiong Wu, Chorkin Chan. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993, 15(11): 1174-1185
This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition. The network consists of three subnets in concatenation. The static information within one frame of speech signal is processed in the probabilistic mapping subnet that converts an input vector of acoustic features into a probability vector whose components are estimated probabilities of the feature vector belonging to the phonetic classes that constitute the words in the vocabulary. The dynamics capturing subnet computes the first-order cross correlation between the components of the probability vectors to serve as the discriminative feature derived from the interframe temporal information of the speech signal. These dynamic features are passed for decision-making to the classification subnet, which is a multilayer perceptron (MLP). The architectures of these three subnets are described, and the associated adaptive learning algorithms are derived. The recognition results for a subset of the DARPA TIMIT speech database are reported. The correct recognition rate of the proposed ANN system is 95.5%, whereas that of the best continuous hidden Markov model (HMM)-based system is only 91.0%.
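One plausible reading of the "first-order cross correlation" feature is the average product of class i's probability in one frame with class j's probability in the next frame. The sketch below implements that reading only; the exact definition, normalisation, and function name are assumptions, not taken from the paper.

```python
def first_order_cross_correlation(prob_frames):
    """Average lag-1 cross product between components of probability vectors.

    corr[i][j] = mean over t of prob_frames[t][i] * prob_frames[t + 1][j]
    The result is a fixed-size matrix regardless of utterance length.
    """
    n_frames = len(prob_frames)
    if n_frames < 2:
        raise ValueError("need at least two frames")
    dim = len(prob_frames[0])
    corr = [[0.0] * dim for _ in range(dim)]
    for t in range(n_frames - 1):
        cur, nxt = prob_frames[t], prob_frames[t + 1]
        for i in range(dim):
            for j in range(dim):
                corr[i][j] += cur[i] * nxt[j]
    return [[v / (n_frames - 1) for v in row] for row in corr]

# Three frames alternating between two phonetic classes (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(first_order_cross_correlation(frames))
```

Under this reading the matrix acts like a soft count of class-to-class transitions, which gives a classifier a fixed-length input for utterances of any duration.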
16.
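A Speaker Recognition System Based on MFCC and Weighted Vector Quantization (Total citations: 14, self-citations: 4, cited by others: 14)
The speaker recognition system described in this article uses Mel-frequency cepstral coefficients (MFCC), which reflect how humans perceive speech, as feature parameters. Because the individual dimensions of the feature vector differ in how well they discriminate between speakers, vector quantization is performed with per-dimension weighting. Good results were obtained, and both the computation and the storage needed for training and recognition are fairly low.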
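A weighted distortion measure of this kind can be sketched as below. The abstract does not state how the weights are derived, so they are simply passed in; the codebooks, weights, and function names are hypothetical.

```python
def weighted_distortion(x, c, weights):
    """Weighted squared Euclidean distance between a frame and a codeword."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, c))

def average_distortion(frames, codebook, weights):
    """Mean distortion of each frame to its nearest codeword."""
    total = sum(
        min(weighted_distortion(f, c, weights) for c in codebook) for f in frames
    )
    return total / len(frames)

def identify_speaker(frames, codebooks, weights):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(
        codebooks, key=lambda s: average_distortion(frames, codebooks[s], weights)
    )

# One-codeword codebooks for two speakers (made-up values).
codebooks = {"speaker_a": [[0.0, 0.0]], "speaker_b": [[2.0, 3.0]]}
test_frames = [[0.0, 3.0]]
print(identify_speaker(test_frames, codebooks, [1.0, 1.0]))
print(identify_speaker(test_frames, codebooks, [1.0, 0.1]))
```

The two calls differ only in the weights: down-weighting the second dimension changes which codebook is closest, which is the effect the weighting is meant to exploit.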
17.
18.
19.
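Research on Applying BP Neural Networks to Isolated-Word Pronunciation Recognition (Total citations: 2, self-citations: 1, cited by others: 1)
The learning rule of the BP neural network and the basic principle of its use in speech recognition are introduced. A BP neural network for recognizing common isolated words is built, with vocal tract reflection coefficients chosen as the feature values; a training sample set is constructed and the network is trained. Recognition simulations in MATLAB show that the network performs isolated-word speech recognition fairly well.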
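Reflection coefficients are usually obtained as a by-product of the Levinson-Durbin recursion on a frame's autocorrelation. The sketch below shows that standard route in Python rather than MATLAB; the abstract does not describe the authors' extraction procedure, so the analysis order, sign convention (prediction-error filter 1 + a1 z^-1 + ...), and function names are assumptions.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def levinson_reflection(r, order):
    """Reflection coefficients from autocorrelation values r[0..order]."""
    if r[0] == 0.0:
        return [0.0] * order  # silent frame
    a = [1.0] + [0.0] * order  # prediction-error filter coefficients
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            ks.append(0.0)  # signal already perfectly predicted
            continue
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k  # remaining prediction error energy
    return ks

def reflection_coefficients(frame, order):
    """Reflection coefficients of one speech frame."""
    return levinson_reflection(autocorrelation(frame, order), order)

print(levinson_reflection([1.0, 0.5, 0.25], 2))
```

The printed example uses the autocorrelation of a first-order autoregressive process, so only the first coefficient is non-zero. The resulting coefficient vector per frame is what would be fed to the network's input layer.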