共查询到20条相似文献,搜索用时 109 毫秒
1.
2.
3.
4.
针对训练和识别语音数据较少的情况,本文提出了一种新的说话人识别算法.通过核映射,在高维特征空间对说话人的语音特征进行模糊矢量量化.为了增加说话人之间的可区分性,提出了一种基于高维特征空间的码字矢量的权值分配方法,对具有较强区分性的码字矢量分配较大的权值,并将产生的权值和说话人的码书一起形成说话人数据库.识别时,提出一种模糊核加权最近邻近分类器,在高维特征空间中对说话人进行匹配.实验表明,该算法在训练语音少于8s,识别语音为1s时,能够得到较好的识别结果. 相似文献
5.
6.
7.
一种改进的模糊C-均值聚类算法在说话人识别中的应用 总被引:3,自引:0,他引:3
提出了一种将改进的FCM聚类算法与矢量量化相结合的说话人识别的方法。先从语音信号中提取待识别的特征矢量集,再利用矢量量化来设计码本,最后用改进的算法对待识别语音进行辩识。该算法解决了FCM算法对初始值敏感、易陷入局部最优的问题。所使用的特征参数较少,计算比较简单,但识别率较高,且具有较好的鲁棒性。 相似文献
8.
9.
提出一种基于动态时间规整(DTW)和改进的学习矢量量化(LoPLVQ)的神经网络的语音识别方法.该方法用动态时间规整算法先对语音信号进行时间规整,然后通过改进的学习矢量量化神经网络进行语音的分类识别.实验表明,新系统在大规模语音识别方面不仅能缩短训练时间,而且具有较高的识别率. 相似文献
10.
给出了一种应用于电话语音自动拨号的实时语音识别方法。该系统对特定人的语音进行识别,并将识别结果映射成相应的电话号码。实验结果表明该方法具有很高的识别精度和实时的识别速度,并且只需很小的内存空间就可以实现,是一种有效的应用于电话语音自动拨号等方面的语音识别方法。 相似文献
11.
Weerackody V. Reichl W. Potamianos A. 《Wireless Communications, IEEE Transactions on》2002,1(2):282-291
Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system where representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate the satisfactory performance of these schemes for typical wireless cellular channels. In addition, a "soft-feature" error concealment strategy is introduced at the ASR server that uses "soft-outputs" from the channel decoder to compute the marginal distribution of only the reliable features during likelihood computation at the speech recognizer. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. Also considered is a channel decoding technique with source information that improves ASR performance 相似文献
12.
HMM在语音识别系统中的应用 总被引:1,自引:0,他引:1
介绍语音识别技术的应用状况与发展,对基于动态时间伸缩技术、隐含马尔科夫模型及人工神经网络的3种不同的语音识别系统进行了比较,重点介绍了隐含马尔科夫模型(HMM)在语音识别系统中的应用。其中基于HMM的语音识别系统是在UniSpeech芯片上实现基于DHMM的识别系统,然后又在同一平台上实现了基于CHMM的识别系统。 相似文献
13.
14.
The evaluation of the degree of speech impairment and the utility of computer recognition of impaired speech are separately and independently performed. Particular attention is paid to the question concerning whether or not there is a relationship between naive listeners' subjective judgments of impaired speech and the performance of a laboratory version of a speech recognition system. It is a difficult task to relate a speech impairment rating with speech recognition accuracy. Towards this end, a statistical causal model is proposed. This model is very appealing in its structure to support inference, and thus can be applied to perform various assessments such as the success of automatic recognition of dysarthric speech. The application of this model is illustrated with a case study of a dysarthric speaker compared against a normal speaker serving as a control 相似文献
15.
By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio. 相似文献
16.
一种稳健的基于Visemic LDA的口形动态特征及听视觉语音识别 总被引:4,自引:0,他引:4
视觉特征提取是听视觉语音识别研究的热点问题。文章引入了一种稳健的基于Visemic LDA的口形动态特征,这种特征充分考虑了发音时口形轮廓的变化及视觉Viseme划分。文章同时提出了一利利用语音识别结果进行LDA训练数据自动标注的方法。这种方法免去了繁重的人工标注工作,避免了标注错误。实验表明,将'VisemicLDA视觉特征引入到听视觉语音识别中,可以大大地提高噪声条件下语音识别系统的识别率;将这种视觉特征与多数据流HMM结合之后,在信噪比为10dB的强噪声情况下,识别率仍可以达到80%以上。 相似文献
17.
18.
详细介绍一种基于神经网络的自学习非特定人语音识别方法,首次介绍一种语音识别知识的自动检验方法——LVV法,给出系统原理图和知识库的自动完善原理;介绍一种LEA判别法,实现梯度牛顿有效结合神经网络快速学习方法,并给出了实验结果。 相似文献
19.
基于电话用户交换机的语音识别系统研究 总被引:3,自引:0,他引:3
本论文对电话用户交换机研制了一个声控语音命令交换系统,该系统能够实现与特定人无关中小词汇量连续命令语音自动识别,研究中统计了用和命令语句,生成相应识别文法网络,识别系统的训练采用由子词模型构成的复合模型进行强化训练,识别采用令牌传递式改进Viterbi算法,提高系统的识别性能,论文比较了不同语音特征参数以及隐含马尔可夫模型状态数对电话语音识别精度的影响,研究中还开发识别系统拒识系统,在无拒识情况下 相似文献
20.
There has been progress in improving speech recognition using a tightly-coupled modality such as lip movement; and using additional input interfaces to improve recognition of commands in multimodal human? computer interfaces such as speech and pen-based systems. However, there has been little work that attempts to improve the recognition of spontaneous, conversational speech by adding information from a loosely?coupled modality. The study investigated this idea by integrating information from gaze into an automatic speech recognition (ASR) system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language model probability mass according to the visual attention. These systems were tested on a corpus of matched eye movement and related spontaneous conversational British English speech segments (n = 1355) for a visual-based, goal-driven task. The best performing systems had similar word error rates to the baseline ASR system and showed an increase in keyword spotting accuracy. The core values of this work may be useful for developing robust speech-centric multimodal decoding system functions. 相似文献