Similar documents
 20 similar documents found (search time: 140 ms)
1.
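The LBG algorithm is improved by using fuzzy C-means clustering to determine the centroids, and is applied to vector quantization of MFCC speech-parameter codebooks. Experimental results show that the algorithm achieves a quantization error close to that of plain LBG while determining the codebook size adaptively; the codebook size is significantly reduced, which lowers codebook storage.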

2.
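To address the problem that Bluetooth voice signals lose their speech characteristics after encryption and therefore cannot be transmitted over the voice channel, a transmission model for encrypted Bluetooth voice data is established and a waveform-codebook generation algorithm for encrypted Bluetooth voice transmission is proposed. The algorithm generates an initial modulation codebook by subcarrier modulation, trains on data to obtain a demodulation codebook, and searches for the optimal codebook with a particle-pair algorithm that incorporates a last-place elimination mechanism. Simulation analysis shows that the codebook generation algorithm converges quickly and can generate waveform codebooks with different bit rates and low symbol error rates. Experimental results show that transmitting data over Bluetooth with this waveform codebook yields a low symbol error rate.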

3.
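An improved LBG algorithm based on a variance-normalized distortion measure   Total citations: 2 (self-citations: 1, citations by others: 2)
Vector quantization (VQ) is widely used in speaker recognition systems. VQ codebooks are usually generated with the LBG algorithm, using as the distortion measure a Euclidean distance that weights every vector component equally. In speaker recognition, however, the components of the feature vector have different distributions, and the degree of difference varies from speaker to speaker. Because parameters with different distributions are not equally effective for speaker recognition, the paper proposes a distortion measure that reflects this difference in effectiveness: the variance-normalized distortion measure. Based on this measure, combined with a temporally correlated initial-codebook design method and effective handling of empty cells, the paper proposes an improved LBG algorithm, uses it to train improved VQ speaker models, and carries out speaker recognition experiments.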
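The abstract does not give the formula for the variance-normalized distortion measure. A minimal sketch, assuming it is a squared Euclidean distance in which each component is divided by that component's variance over the training vectors (the function names and toy data are hypothetical):

```python
from statistics import pvariance


def per_dimension_variance(vectors):
    """Population variance of each component across the training vectors."""
    return [pvariance(column) for column in zip(*vectors)]


def variance_normalized_distortion(x, codeword, variances, eps=1e-12):
    """Squared Euclidean distance with each term divided by its variance.

    Assumption: this is one plausible reading of "variance-normalized";
    the paper's exact definition is not stated in the abstract.
    """
    return sum(
        (xi - ci) ** 2 / (vi + eps)
        for xi, ci, vi in zip(x, codeword, variances)
    )


# Toy data: the second component has 10x the spread of the first.
training = [[0.0, 0.0], [2.0, 20.0], [4.0, 40.0]]
variances = per_dimension_variance(training)
d = variance_normalized_distortion([2.0, 20.0], [4.0, 40.0], variances)
```

With plain Euclidean distance the second component would dominate; after normalization both components contribute equally to `d` in this toy case.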

4.
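To address the distortion problem that arises when vector quantization is used for speaker recognition, a method is proposed that, based on the pronunciation characteristics of Mandarin Chinese speech, combines vector quantization with clustering of speech features: before the VQ codebook is trained, the feature vectors are screened by clustering. Experimental results show that, for test utterances 4 s long and a recognition rate held at about 95%, ordinary vector quantization needs a codebook of 64 codewords whereas the proposed method needs only 8, an eightfold reduction. The results indicate that the method not only alleviates, to some extent, the distortion caused by insufficient training samples, but also achieves good recognition results with fewer codewords, thereby improving recognition efficiency.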

5.
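To make full use of the inter-stage correlation of the codebooks, a joint codebook optimization method for multi-stage vector quantization (JCO-MSVQ) codebook design is proposed. In each iteration, the training vectors are first clustered against the codewords, and then the codebooks of all stages are jointly optimized, with each stage's codebook updated in turn using conditional expectations. Experimental data show that, when designing quantization codebooks for 10-dimensional line spectral frequency (LSF) parameters, the algorithm yields lower average quantization distortion than codebooks designed by the stochastic relaxation (SR) algorithm. A 23 bit/frame LSF quantizer achieves an average log spectral distortion of 0.87 dB, meeting the requirement for transparent quantization.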
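For readers unfamiliar with multi-stage VQ, the sketch below shows only the basic encode/decode structure that such a codebook design targets: each stage quantizes the residual left by the previous one. It uses a greedy sequential search and tiny hand-made codebooks; the joint optimization described in the abstract is not shown, and all names and values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def msvq_encode(x, stages):
    """Greedy multi-stage encoding: quantize, subtract, pass the residual on."""
    residual = list(x)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stages):
    """Reconstruction is the sum of the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Two stages with two codewords each (1 bit per stage), illustrative only.
stages = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[-0.25, 0.0], [0.25, 0.0]],
]
indices = msvq_encode([1.2, 1.0], stages)
reconstruction = msvq_decode(indices, stages)
```

Because the decoder output depends on the sum across stages, the stage codebooks are coupled, which is the correlation a joint design method can exploit.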

6.
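The LBG algorithm with a randomly chosen initial codebook is prone to empty cells, tends to get trapped in local minima, and needs many iterations. To address these defects, this paper draws on fuzzy clustering theory to introduce a cascaded fuzzy clustering and LBG algorithm for training vector quantization codebooks: the codebook is first trained with a fuzzy clustering algorithm, the result is used as the initial codebook of the traditional LBG algorithm, and training then continues with traditional LBG. The principle and method of the combined fuzzy clustering and LBG algorithm are discussed; the algorithm was used to train, respectively, speech linear...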
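The cascade described in this abstract can be sketched roughly as follows, assuming standard fuzzy C-means with fuzzifier `m = 2` for the first step and plain generalized Lloyd iterations as the "traditional LBG" step. The paper's actual settings are not given in the abstract, so every parameter, name, and the synthetic data here are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fuzzy_c_means(data, k, m=2.0, iters=50, seed=0):
    """Standard fuzzy C-means; returns k centres to use as an initial codebook."""
    rng = random.Random(seed)
    centres = rng.sample(data, k)
    dim = len(data[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x in data:
            d = [max(sq_dist(x, c), 1e-12) for c in centres]
            # Membership is proportional to (squared distance)^(-1/(m-1)).
            inv = [di ** (-1.0 / (m - 1.0)) for di in d]
            total = sum(inv)
            for i in range(k):
                w = (inv[i] / total) ** m
                den[i] += w
                for j in range(dim):
                    num[i][j] += w * x[j]
        centres = [[num[i][j] / den[i] for j in range(dim)] for i in range(k)]
    return centres


def lloyd_refine(data, codebook, iters=20):
    """Generalized Lloyd (LBG) iterations starting from the given codebook."""
    codebook = [list(c) for c in codebook]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in data:
            i = min(range(len(codebook)), key=lambda n: sq_dist(x, codebook[n]))
            cells[i].append(x)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def mean_distortion(data, codebook):
    return sum(min(sq_dist(x, c) for c in codebook) for x in data) / len(data)


# Synthetic 2-D data with two clusters, illustrative only.
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
data += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)]

initial = fuzzy_c_means(data, k=2)
codebook = lloyd_refine(data, initial)
```

Because every training vector influences every centre in the fuzzy step, no centre is left without support, which is one reason such an initialization tends to avoid empty cells in the subsequent hard-assignment iterations.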

7.
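A speaker recognition method is proposed that combines subtractive clustering with an improved fuzzy C-means clustering algorithm. Mel-frequency cepstral coefficients extracted from the speech signal, together with their differences (deltas), serve as feature parameters; the cluster centres are initialized by subtractive clustering and then refined by the improved fuzzy C-means algorithm to form the codebook. At recognition time, fuzzy-clustering recognition is applied to each utterance to be identified. Simulation results show that the method achieves a higher recognition rate than the improved fuzzy C-means algorithm alone, is fairly robust, and is computationally simple.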

8.
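A speaker clustering algorithm for speech recognition   Total citations: 1 (self-citations: 1, citations by others: 1)
This paper presents a speaker clustering algorithm for robust speech recognition, covering its role and use in speech recognition, the features and distance measures commonly used in clustering, and the concrete implementation steps. The algorithm's performance was tested in two ways: by directly computing the accuracy of utterance clustering, and by measuring how much it improves speaker adaptation, that is, by comparing system performance with and without the algorithm. Experiments show that, with the GLR distance as the distance measure, the utterance clustering accuracy reaches 85169 %; in recognition experiments, using the clustering algorithm makes more data available for speaker adaptation and improves the adaptation, so that the system's error rate approaches that obtained when adaptation uses known speaker information.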

9.
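Adaptive vector quantization is widely used in speech processing. A training method for adaptive VQ codebooks based on the SFCM algorithm is proposed. It uses fuzzy clustering to readjust the membership degrees between training samples and codewords so as to minimize coding distortion, making the codebook better suited to a new speaker, and it is computationally simple. Experimental results show that the method reduces the average coding distortion.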

10.
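An improved spectral-subtraction speech enhancement algorithm based on codebook learning is proposed that can adapt to non-stationary noise environments. The algorithm has a training stage and an enhancement stage. In the training stage, autoregressive models are used to model the spectral shapes of speech and noise, and speech and noise codebooks are constructed. In the enhancement stage, a log-spectral minimization algorithm estimates the speech and noise spectra, and the noise is removed by spectral subtraction. Because the speech and noise spectra are estimated in every time frame, rapidly changing non-stationary noise can be tracked effectively even when speech is present; the autoregressive model yields a smooth estimate of the noise spectrum, which reduces musical noise. Simulations show that, compared with conventional spectral subtraction and multi-band spectral subtraction, the improved method suppresses noise better with less speech distortion.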
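Only the final subtraction step lends itself to a short illustration. The sketch below is generic power spectral subtraction with a spectral floor, assuming a per-frame noise power estimate is already available; the codebook search and autoregressive modelling from the abstract are not shown, and the `floor` parameter and sample values are hypothetical.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Subtract a noise power estimate in each frequency bin.

    Bins that would go negative are clamped to a small fraction of the
    noisy power instead, a common way to limit musical noise.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]


# One frame with three frequency bins, illustrative values only.
noisy = [4.0, 1.0, 0.5]
noise = [1.0, 0.5, 1.0]
enhanced = spectral_subtract(noisy, noise)
```

In the third bin the noise estimate exceeds the observed power, so the floor applies rather than a negative value.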

11.
In this paper, parameter estimation of a state-space model of noise or noisy speech cepstra is investigated. A blockwise EM algorithm is derived for the estimation of the state and observation noise covariance from noise-only input data. It is supposed to be used during the offline training mode of a speech recognizer. Further a sequential online EM algorithm is developed to adapt the observation noise covariance on noisy speech cepstra at its input. The estimated parameters are then used in model-based speech feature enhancement for noise-robust automatic speech recognition. Experiments on the AURORA4 database lead to improved recognition results with a linear state model compared to the assumption of stationary noise.

12.
In this paper we report our recent research whose goal is to improve the performance of a novel speech recognizer based on an underlying statistical hidden dynamic model of phonetic reduction in the production of conversational speech. We have developed a path-stack search algorithm which efficiently computes the likelihood of any observation utterance while optimizing the dynamic regimes in the speech model. The effectiveness of the algorithm is tested on the speech data in the Switchboard corpus, in which the optimized dynamic regimes computed from the algorithm are compared with those from exhaustive search. We also present speech recognition results on the Switchboard corpus that demonstrate improvements of the recognizer’s performance compared with the use of the dynamic regimes heuristically set from the phone segmentation by a state-of-the-art hidden Markov model (HMM) system.

13.
This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework at a very low additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large vocabulary continuous speech recognizer for several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new approach is compared to four-gram back-off language models trained with modified Kneser–Ney smoothing, which has often been reported to be the best known smoothing method. Usually the neural network language model is interpolated with the back-off language model. In that way, consistent word error rate reductions for all considered tasks and languages were achieved, ranging from 0.4% to almost 1% absolute.
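The interpolation mentioned in this abstract is commonly a linear mixture of the two models' word probabilities. A minimal sketch, assuming a single fixed interpolation weight; the weight, the probabilities, and the function names are made up for illustration and are not taken from the paper:

```python
import math


def interpolate(p_neural, p_backoff, weight=0.5):
    """Linear interpolation of two language model probabilities."""
    return weight * p_neural + (1.0 - weight) * p_backoff


def perplexity(word_probs):
    """Perplexity of a word sequence given its per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


# Per-word probabilities from each model for a three-word toy sequence.
p_neural = [0.20, 0.05, 0.10]
p_backoff = [0.10, 0.15, 0.02]
mixed = [interpolate(a, b) for a, b in zip(p_neural, p_backoff)]

ppl_neural = perplexity(p_neural)
ppl_backoff = perplexity(p_backoff)
ppl_mixed = perplexity(mixed)
```

In practice the weight would be tuned on held-out data, for example by choosing the value that minimizes `perplexity` on a development set.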

14.
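Noise-robust speech recognition based on RBF neural networks   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the poor performance of current speech recognition systems in noisy environments, a noise-robust speech recognition system based on RBF neural networks is implemented, exploiting the network's best-approximation property and fast training, with two training algorithms: clustering and fully supervised. In the clustering algorithm, the hidden layer is trained by K-means clustering and the output layer by linear least squares. In the fully supervised algorithm, all parameters are adjusted by gradient descent; as a supervised learning algorithm, it can select well-performing parameters. Experiments show that the fully supervised algorithm achieves a higher recognition rate than the clustering algorithm at different signal-to-noise ratios.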

15.
Research on an embedded speech interaction system for humanoid robots   Total citations: 3 (self-citations: 0, citations by others: 3)
陈斌  郭大勇  施克仁 《机器人》2003,25(5):452-455
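This paper presents an embedded speech interaction system for humanoid robots. The system uses high-quality speech acquisition and speech output modules, with the high-performance digital signal processor (DSP) TMS320VC5402 as its hardware core. The HMM speech recognition engine uses LPC cepstra and their delta components as speech features, and an improved Baum-Welch re-estimation algorithm trains the speech templates from multiple observation sequences. Comparative experiments were also carried out on how different representations of the speech features affect the recognition results. The system's peripheral control program reports recognition results and handles communication with the host computer. The system performs well on speaker-independent isolated-word recognition with a 200-word vocabulary.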

16.
We present a system that can separate and recognize the simultaneous speech of two people recorded in a single channel. Applied to the monaural speech separation and recognition challenge, the system out-performed all other participants – including human listeners – with an overall recognition error rate of 21.6%, compared to the human error rate of 22.3%. The system consists of a speaker recognizer, a model-based speech separation module, and a speech recognizer. For the separation models we explored a range of speech models that incorporate different levels of constraints on temporal dynamics to help infer the source speech signals. The system achieves its best performance when the model of temporal dynamics closely captures the grammatical constraints of the task. For inference, we compare a 2-D Viterbi algorithm and two loopy belief-propagation algorithms. We show how belief-propagation reduces the complexity of temporal inference from exponential to linear in the number of sources and the size of the language model. The best belief-propagation method results in nearly the same recognition error rate as exact inference.

17.
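Noise-robust speech feature techniques and MFCC-based model compensation techniques both suffer from low recognition rates at low signal-to-noise ratios. Combining noise-robust features with model compensation, a model-compensated noisy speech recognition method based on one-sided autocorrelation (OSA) MFCC features is proposed to improve recognizer performance at low SNR. Recognition experiments on the ten English digits 0 to 9 with white noise, F16 noise, and factory noise from NOISEX92 show that the proposed method effectively improves the recognition rate of the OSA-MFCC recognizer in noisy environments, and that at low SNR it clearly outperforms an MFCC recognizer given the same compensation.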
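The abstract does not define the one-sided autocorrelation sequence precisely. A minimal sketch, assuming it is the unnormalized autocorrelation of a frame at non-negative lags only; the MFCC computation applied to that sequence and the model compensation are not shown, and the function name is hypothetical:

```python
def one_sided_autocorrelation(frame):
    """Autocorrelation of a frame for lags 0 .. len(frame) - 1.

    Assumption: unnormalized sum of products, with no windowing or
    lag-dependent scaling.
    """
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag))
        for lag in range(n)
    ]


osa = one_sided_autocorrelation([1.0, 2.0, 3.0])
```

For white noise the autocorrelation is concentrated at lag 0, which is the usual motivation for computing features in the autocorrelation domain.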

18.
19.
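A novel large vocabulary continuous speech recognition algorithm combined with language identification   Total citations: 1 (self-citations: 1, citations by others: 0)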
单煜翔  邓妍  刘加 《自动化学报》2012,38(3):366-374
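A novel large vocabulary continuous speech recognition (LVCSR) algorithm combined with language identification is proposed, and a real-time processing system is built. The algorithm makes full use of the phone recognition hypotheses collected during speech decoding to identify the language while recognizing the speech content. The system can be applied in multilingual environments: it replaces a separate language identification module at lower overall computational cost, and it copes effectively with non-target languages mixed into the same utterance, greatly reducing the meaningless recognition errors they introduce and preventing accumulated errors from misleading subsequent recognition. To integrate speech content recognition and language identification tightly within a single decoding process, the paper proposes three different algorithms for adjusting (reconstructing) the phone lattice produced during decoding: on the one hand they remove the target-language bias introduced by the pronunciation dictionary and language model, and on the other they include richer phone recognition hypotheses in the lattice. Experiments show that the lattice reconstruction algorithms effectively improve language identification accuracy in joint recognition. Tests on a telephone conversation corpus of mixed Chinese and English speech, with Chinese as the target language, show that the proposed joint recognition algorithm reduces the meaningless recognition errors caused by out-of-set languages by 91.76%, with a pure Chinese character error rate of 54.98%.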

20.
Computer speech recognition has been very successful in limited domains and for isolated word recognition. However, widespread use of large-vocabulary continuous-speech recognizers is limited by the speed of current recognizers, which cannot reach acceptable error rates while running in real time. This paper shows how to harness shared memory multiprocessors, which are becoming increasingly common, to increase the speed significantly, and therefore the accuracy or vocabulary size, of a speech recognizer. To cover the necessary background, we begin with a tutorial on speech recognition. We then describe the parallelization of an existing high-quality speech recognizer, achieving a speedup of a factor of 3, 5, and 6 on 4-, 8-, and 12-processors respectively for the benchmark North American business news (NAB) recognition task.
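The reported speedups can be put in terms of parallel efficiency (speedup divided by processor count). The short calculation below uses only the figures quoted in the abstract; the variable names are illustrative.

```python
# Speedup factors quoted in the abstract, keyed by processor count.
speedups = {4: 3.0, 8: 5.0, 12: 6.0}

# Parallel efficiency: fraction of ideal linear speedup actually achieved.
efficiency = {procs: s / procs for procs, s in speedups.items()}

for procs in sorted(efficiency):
    print(f"{procs} processors: efficiency {efficiency[procs]:.3f}")
```

Efficiency falls from 0.75 on 4 processors to 0.625 on 8 and 0.5 on 12, so each added processor contributes less than the one before.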
