Found 20 similar documents (search time: 171 ms)
1.
2.
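To make better use of the correlation between different speech units in speech recognition, this paper proposes a new model training algorithm based on the Spatial Correlation Transformation (SCT) framework. The algorithm exploits the spatial correlation in the training data to re-estimate model parameters when training speaker-independent models. It applies the spatial correlation transformation to all training data, weakening the spatial correlation among the data so that the re-estimated model depends less on the training data and performs better. Experiments show that combining the SCT-based model training method with the SCT-based feature transformation method reduces the system's average error rate by 18% relative to the baseline system.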
3.
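This paper proposes a structured Gaussian mixture model under constraint conditions and a voice conversion method for non-parallel corpora. A small number of identical syllables are extracted from the original non-parallel corpora of the source and target speakers. During training of the structured Gaussian mixture model, the semantic information carried by these identical syllables and the correspondence between their acoustic features are used to constrain the K-means cluster centers, and the posterior probabilities of speech frames belonging to the model components are corrected during the Expectation-Maximization (EM) iterations. The result is the Structured Gaussian Mixture Model with Constraint condition (C-SGMM). The Acoustic Universal Structure (AUS) principle is then used to match and align the Gaussian distributions of the source and target speakers' constrained models, from which a short-time spectral conversion function is derived. Subjective and objective evaluations show that speech converted with this method outperforms the conventional structured-model voice conversion method in spectral distortion, target preference, and speech quality: the average spectral distortion is only 0.52, the speaker identification accuracy reaches 95.25%, and the average ABX target-preference score is 0.82, bringing performance closer to that of voice conversion methods based on parallel corpora.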
4.
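This paper studies how to apply the modified Fukunaga-Koontz transform to speaker recognition. The modified Fukunaga-Koontz transform is used to reduce the dimensionality of the speaker speech feature space, and Gaussian mixture models are used for speaker modeling. Using the 1conv4w-1conv4w condition of the NIST 2006 evaluation as the experiment, the recognition performance of LDA and of the modified Fukunaga-Koontz transform is compared. The results confirm that the modified Fukunaga-Koontz transform works well for speaker recognition and improves recognition performance considerably over the conventional LDA dimensionality reduction method.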
5.
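Speech Recognition Based on an Improved Speech Feature Extraction Method. Total citations: 1 (self-citations: 1; citations by others: 0)
Based on an analysis of speech feature extraction methods, an improved combined algorithm is proposed, with HMM acoustic models and the Viterbi algorithm used for pattern training and recognition. Experimental results show that the algorithm is robust in noisy environments, effectively improves the accuracy of Chinese continuous speech recognition under noise, and enhances overall recognition performance, so it has practical value for speech recognition systems that operate in noisy environments.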
6.
7.
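An unsupervised query-by-example spoken term detection (QbE-STD) method based on a hierarchical Dirichlet process hidden Markov model (HDPHMM) tokenizer is proposed. The method first applies a hidden Markov model with two state layers, in which the top-layer states represent the discovered acoustic units and the bottom-layer states model the emission probabilities of the top-layer states. Placing a hierarchical Dirichlet process prior on the top-layer states yields the nonparametric Bayesian model HDPHMM. The model is trained on unlabeled speech data and then outputs posterior probability feature vectors for the test speech and the query examples. Non-negative matrix factorization is used to optimize the posteriors into new features, on which a modified segmental dynamic time warping algorithm performs the search, forming the QbE-STD system. Experimental results show that, compared with a baseline system based on a Gaussian mixture model tokenizer, the proposed method performs better, with significantly improved detection precision.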
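The search step in the abstract above relies on dynamic time warping between query and test posterior features. The sketch below shows only plain, whole-sequence DTW, not the modified segmental variant the abstract refers to. The function names and the Euclidean frame distance are illustrative assumptions, not details from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(query, test, dist=euclidean):
    """Total cost of the best monotonic alignment between two frame sequences.

    This is standard DTW over whole sequences; a segmental variant would
    additionally restrict or slide the alignment region.
    """
    n, m = len(query), len(test)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning query[:i] with test[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], test[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A lower cost means the query aligns more closely with the test sequence; a detection system would typically compare such costs against a threshold.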
8.
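Modeling with a Gaussian mixture model that has a fixed number of mixtures does not match the diversity of speaker speech feature distributions, which leads to overfitting or underfitting and degrades recognition performance. An adaptive Gaussian mixture model with a variable number of mixtures is proposed and applied to speaker recognition. During training, based on the clustering properties of the distribution of the speaker's speech feature parameters, absorption-merging and splitting mechanisms dynamically adjust the number of mixtures to obtain a more accurate fit and improve the recognition rate. Experimental results show that with MFCC and BFCC (Bilinear Frequency Cepstrum Coefficients) features, the misrecognition rate falls by a relative 41.41% and 22.21%, respectively.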
9.
10.
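In VoIP speaker recognition, system performance degrades when speaker models trained on original speech (not processed by a codec) are used to recognize test speech that has passed through speech coding and decoding. This paper presents a compensation algorithm for VoIP speaker features (12th-order LPCC coefficients) based on statistical matching and the EM (expectation-maximization) algorithm. The function parameters are estimated under two assumptions about the relationship between the distorted features and the undistorted recognition features: a nonlinear (quadratic) function and a linear function. The resulting compensation function is then used to compensate the distorted features. Experimental results show that the algorithm substantially mitigates the recognition performance degradation caused by the G.729 8 kb/s, G.723.1 6.3 kb/s, and G.723.1 5.3 kb/s codecs widely used in VoIP, and that it also outperforms the CMS (cepstral mean subtraction) method.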
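For reference, the CMS baseline mentioned in the abstract above can be sketched in a few lines. This illustrates cepstral mean subtraction only, not the paper's statistical-matching compensation algorithm. The function name and the list-of-lists frame layout are assumptions made for the example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame.

    `frames` is a list of cepstral vectors (one list of floats per frame),
    for example 12 LPCC coefficients per frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

Because a stationary channel adds a roughly constant offset in the cepstral domain, removing the utterance-level mean cancels much of that offset, which is why CMS is a common baseline for channel and codec mismatch.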
11.
This communication presents a new method for automatic speech recognition in reverberant environments. Our approach consists of selecting the best acoustic model from a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions. Given a speech utterance recorded within a reverberant room, a Maximum Likelihood estimate of the fullband room reverberation time is computed using a statistical model for short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the best acoustic model, i.e., the model trained on a speech database most closely matching the estimated reverberation time, which serves to recognize the reverberated speech utterance. The proposed model selection approach is shown to significantly improve recognition accuracy for a connected digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
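The selection step described above (pick the model whose training condition is closest to the estimated reverberation time) can be sketched as follows. The dictionary layout, the model names, and the absolute-difference criterion are illustrative assumptions; the abstract does not specify how "most closely matching" is measured, and the Maximum Likelihood estimation of the reverberation time itself is not shown.

```python
def select_model(estimated_t60, model_library):
    """Return the (t60, model) pair whose training T60 is closest to the estimate.

    `model_library` is a hypothetical mapping from the reverberation time
    (in seconds) of each training database to the model trained on it.
    """
    if not model_library:
        raise ValueError("model library is empty")
    # Closest match by absolute difference in reverberation time (assumed criterion).
    best_t60 = min(model_library, key=lambda t60: abs(t60 - estimated_t60))
    return best_t60, model_library[best_t60]


# Usage with placeholder model names:
library = {0.2: "model_t60_0.2", 0.4: "model_t60_0.4", 0.8: "model_t60_0.8"}
chosen_t60, chosen_model = select_model(0.45, library)
```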
12.
13.
Wu C.-H., Chen Y.-J., Yan G.-L. IEE Proceedings - Vision, Image and Signal Processing, 2000, 147(1): 55-61
Mandarin speech is known for its tonal characteristic, and prosodic information plays an important role in Mandarin speech recognition. Driven by this property, phonetic and prosodic information are integrated and used for Mandarin telephone speech keyword spotting. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 132 subsyllable models, two general acoustic filler models and one background/silence model are separately trained and used as the basic recognition units. For utterance verification, 12 anti-subsyllable models, 175 context-dependent prosodic models and five anti-prosodic models are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 3088 conversational speech utterances from 33 speakers (20 males and 13 females) and a vocabulary of 2583 faculty names, at 8.5% false rejection, the proposed verification method results in an 18.3% false alarm rate. Furthermore, this method is able to correctly reject 90.9% of non-keywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can benefit the verification performance.
14.
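Combining Acoustic Models of Different Units in Chinese Continuous Speech Recognition. Total citations: 1 (self-citations: 0; citations by others: 1)
This paper studies the combination of acoustic models trained on different acoustic units. In Chinese continuous speech recognition, popular units include context-dependent initial/final units and phoneme units. Experiments show that some Chinese syllables are recognized more accurately with the initial/final model, while others are recognized more accurately with the phoneme model. This paper proposes a method for combining the two acoustic models: it uses both models simultaneously during recognition while avoiding the model that yields a low recognition rate. Experiments show that the method reduces the syllable error rate by 9.60% and 6.10% compared with the phoneme model and the initial/final model, respectively.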
15.
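Civil aviation air-ground communication is critical to flight safety, but because it has a special grammatical structure and pronunciation, acoustic models built for everyday speech recognition cannot be applied effectively to it. For this special context, this paper proposes a speech recognition method for civil aviation air-ground communication based on a bidirectional long short-term memory network (BiLSTM). First, FBANK features are extracted from the air-ground communication speech as input, and a BiLSTM network is trained with connectionist temporal classification (CTC) as the objective function to obtain a BiLSTM/CTC model. Then, the acoustic model, a language model, and an air-ground communication lexicon are used to recognize the speech, and data augmentation and data transfer are combined for enhanced training to improve recognition performance. Experimental results show that the proposed method is suitable for civil aviation air-ground communication speech recognition and that the data-augmented model effectively reduces the word error rate.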
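A CTC-trained network emits one label per frame, including a special blank symbol, so its output must be collapsed into a label sequence. The sketch below shows the standard CTC collapse rule applied to a best-path (greedy) frame labeling. It is an illustration only: the abstract describes decoding with a language model and lexicon, which is not shown, and the blank index of 0 is an assumption.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse per-frame CTC labels: merge repeated labels, then drop blanks.

    `frame_labels` is the most likely label index for each frame; `blank`
    is the index reserved for the CTC blank symbol (assumed to be 0 here).
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Note that a blank between two identical labels keeps them distinct, so the frame labeling `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`.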
16.
Chin-Hui Lee, Qiang Huo. Proceedings of the IEEE, 2000, 88(8): 1241-1269
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
17.
18.
On the basis of psychological acoustic theories and experiments, this paper proposes an acoustic model based on acoustic perceptual features. Compared with the acoustic model based on physiological acoustics, this model is better suited to representing human perceptual features of continuous speech, and hence to recognizing continuous speech.
19.
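In speech emotion recognition, noisy environments, speaking styles, and speaker characteristics can cause feature mismatch between experimental databases. From a phonetic standpoint, this problem arises mostly in cross-database emotion recognition experiments. The mismatch between the trained acoustic model and the utterances used for testing causes a sharp drop in speech emotion recognition performance. The selective-attention acoustic model studied in this paper can effectively detect changing emotional features. The model is further improved with time-frequency atoms so that it can extract salient features across speech databases for emotion recognition. Experimental results show that extracting features from cross-database emotion samples with the proposed method and then applying a typical classifier improves recognition performance by 9 percentage points, which verifies that the method is more robust across databases.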
20.
On the basis of psychological acoustic theories and experiments, this paper proposes an acoustic model based on acoustic perceptual features. Compared with the acoustic model based on physiological acoustics, this model is better suited to representing human perceptual features of continuous speech, and hence to recognizing continuous speech.