
1.

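余华, 赵力, 吴镇扬. 《电声技术》, 2004, (8): 35-37

An automatic speaker recognition system was implemented on the TMS320C5416. The system builds its recognition feature vector set from parameters that include a new r-th order linear regression coefficient of the speech cepstrum, and uses fuzzy vector quantization to perform text-dependent speaker recognition. Experimental results show that the system offers high recognition accuracy and fast recognition speed, making it an effective hardware implementation of automatic speaker recognition.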
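The abstract does not define its r-th order regression coefficient, so the sketch below shows only the conventional first-order regression (delta) coefficient that such features generalize. The window half-width `K`, the edge clamping, and the function name are assumptions for illustration, not details taken from the paper.

```python
def delta_cepstrum(frames, K=2):
    """Conventional first-order regression (delta) coefficients.

    frames: list of cepstral vectors, one per analysis frame.
    K: half-width of the regression window (an assumed value).
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, K + 1))
    deltas = []
    for t in range(T):
        row = []
        for d in range(dim):
            num = 0.0
            for k in range(1, K + 1):
                # Clamp indices at the utterance edges (an assumption).
                nxt = frames[min(t + k, T - 1)][d]
                prv = frames[max(t - k, 0)][d]
                num += k * (nxt - prv)
            row.append(num / denom)
        deltas.append(row)
    return deltas
```

For a coefficient track that rises linearly by one unit per frame, the interior delta values come out as 1.0, which is the slope of the least-squares line through the window.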

2.

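Design of a speech-recognition wireless switch control device

《现代电子技术》, 2017, (14)

To operate wirelessly controlled equipment by speech recognition, a device is designed that switches equipment (incandescent lamps and the like) on and off by voice over a wireless link. The device uses the LD3320 as the speech acquisition and processing chip, an STC12C5A60S2 microcontroller for the speech sampling and matching module, an STC15F104E microcontroller for reception and control, and the HC-12 wireless communication module to transmit and receive the data signal. Results show that the device performs well in both speech recognition and wireless transmission, with a recognition rate of about 97%, and that it switches the lamp on and off by voice.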

3.

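A noise-robust recognition method for speaker-dependent Mandarin digit speech  Total citations: 1, self-citations: 0, citations by others: 1

徐文盛, 戴蓓倩, 方绍武, 陆伟. 《电路与系统学报》, 2000, 5(2): 58-61

This paper proposes a robust recognition method that combines a continuous hidden Markov model (CHMM) with an artificial neural network (ANN) for speaker-dependent digit speech recognition in noisy environments. The method takes the CHMM outputs as the system's recognition vectors, uses the pattern classification and self-learning abilities of the ANN to extract speech pre-recognition vectors from the recognition vector space, and then produces the recognition output from the recognition result. Experiments show that this CHMM/ANN-based digit speech recognition method markedly improves the noise robustness of the system and is suited to small- and medium-vocabulary speech recognition systems.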

4.

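Fuzzy kernel speaker recognition based on discriminative weighting  Total citations: 2, self-citations: 1, citations by others: 1

林琳, 王树勋, 陈建. 《电子学报》, 2008, 36(7): 1446-1450

For the case where little speech data is available for training and recognition, this paper proposes a new speaker recognition algorithm. Through a kernel mapping, the speakers' speech features are fuzzy vector quantized in a high-dimensional feature space. To increase the discriminability between speakers, a weight assignment method for codeword vectors in the high-dimensional feature space is proposed: codeword vectors with stronger discriminative power receive larger weights, and the resulting weights are stored together with each speaker's codebook to form the speaker database. For recognition, a fuzzy kernel weighted nearest-neighbor classifier is proposed that matches speakers in the high-dimensional feature space. Experiments show that the algorithm gives good recognition results with less than 8 s of training speech and 1 s of test speech.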

5.

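An ordered clustering method and its application to neural-network speech recognition  Total citations: 3, self-citations: 1, citations by others: 2

史笑兴, 顾明亮, 王太君, 何振亚. 《电路与系统学报》, 2000, 5(2): 99-103

This paper proposes a new network structure, which we call the ordered clustering network. The network performs feature extraction on the speech signal and effectively solves the time normalization problem in neural-network speech recognition. It extracts a fixed number of feature vectors from the feature vector sequence of the input speech signal and feeds this set into a neural-network classifier for recognition. Compared with other neural-network speech recognition methods, using this network as the front end shortens the training and recognition time of the back-end neural-network classifier, simplifies the classifier network, and yields a high recognition rate. Based on this, we built ...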

6.

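Implementation of a VQ-based automatic speaker recognition system

桂苹, 吴镇扬, 赵力, 王维新. 《电声技术》, 2003, (10): 11-14

This paper uses a mixed feature set of LPC cepstral coefficients, delta cepstral coefficients, pitch period, and delta pitch period of the speech signal as the speaker features, and applies VQ to perform automatic speaker recognition. Systematic recognition experiments were run on a database of 10 speakers and 1,800 utterances of Mandarin digits and words. The average recognition rate reached 92% for monosyllabic speech, 96.67% for disyllabic speech, and 97.67% for four-syllable speech.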
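As a rough illustration of VQ-based speaker identification in general, the sketch below trains one codebook per speaker with plain k-means and picks the speaker whose codebook gives the lowest average quantization distortion. The codebook size, the initialization, the squared Euclidean distance, and all function names are assumptions; the paper's feature extraction and codebook design are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iters=10):
    """Build a codebook with plain k-means (an assumed training scheme)."""
    step = max(len(vectors) // size, 1)
    # Deterministic initialization: evenly spaced training vectors.
    codebook = [list(vectors[min(i * step, len(vectors) - 1)])
                for i in range(size)]
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cell is empty
                codebook[i] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook)
               for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook quantizes the input best."""
    return min(codebooks,
               key=lambda name: avg_distortion(vectors, codebooks[name]))
```

In use, `codebooks` would map each enrolled speaker's name to the result of `train_codebook` on that speaker's training features, and `identify` would be called on the feature vectors of the test utterance.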

7.

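Application of an improved fuzzy C-means clustering algorithm to speaker recognition  Total citations: 3, self-citations: 0, citations by others: 3

杨彦, 赵力. 《电声技术》, 2006, (1): 40-43

A speaker recognition method is proposed that combines an improved FCM clustering algorithm with vector quantization. The feature vector set to be recognized is first extracted from the speech signal, vector quantization is then used to design the codebook, and finally the improved algorithm identifies the test speech. The algorithm addresses the sensitivity of FCM to initial values and its tendency to fall into local optima. It uses few feature parameters and is computationally simple, yet achieves a high recognition rate and good robustness.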
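For reference, the baseline that the paper improves on can be sketched as the standard fuzzy C-means iteration: update the memberships from the current centers, then update each center as a membership-weighted mean. This is the textbook algorithm, not the improved variant described in the abstract; the fuzzifier `m = 2`, the fixed iteration count, and the caller-supplied initial centers are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships_for(points, centers, m):
    """Standard FCM membership update; each row sums to 1."""
    rows = []
    for x in points:
        # Floor the distance so a point sitting on a center is handled.
        d = [max(sq_dist(x, c), 1e-12) for c in centers]
        rows.append([
            1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0))
                      for k in range(len(d)))
            for i in range(len(d))
        ])
    return rows

def fuzzy_c_means(points, init_centers, m=2.0, iters=20):
    """Textbook FCM with caller-supplied initial centers."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]  # fuzzified weights
            total = sum(w)
            centers[i] = [sum(wj * x[d] for wj, x in zip(w, points)) / total
                          for d in range(dim)]
    return centers, memberships_for(points, centers, m)
```

Because the result depends on `init_centers`, running this baseline from different starting points is a simple way to observe the initialization sensitivity that the abstract mentions.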

8.

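Automatic speaker recognition  Total citations: 7, self-citations: 0, citations by others: 7

王仁华. 《信号处理》, 1991, (4)

This paper uses the LPC cepstral coefficients of the speech signal as recognition features and applies vector quantization to perform automatic speaker recognition. Systematic recognition experiments on a database of 42 speakers and 7,700 utterances examined how different system parameters affect the recognition rate and produced a number of useful results. The paper also describes a real-time voice-interactive identity verification system developed on this basis; used as a voice lock for a computer, it achieves a correct recognition rate above 95%.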

9.

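A speech recognition method combining an improved LVQ network with DTW

吴金南, 宫宁生. 《微电子学与计算机》, 2009, 26(5)

A speech recognition method is proposed that is based on dynamic time warping (DTW) and an improved learning vector quantization (LoPLVQ) neural network. The method first time-normalizes the speech signal with the DTW algorithm and then classifies the speech with the improved LVQ neural network. Experiments show that, for large-scale speech recognition, the new system both shortens training time and achieves a high recognition rate.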
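The time-alignment step can be illustrated with the classic DTW recurrence, shown here together with a nearest-template classifier. This is a generic sketch: the Euclidean frame distance, the unconstrained warping path, and the template dictionary are assumptions, and the LVQ classification stage of the paper is not shown.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best alignment between two frame sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Allow insertion, deletion, or a diagonal match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the template closest to the utterance."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))
```

Two utterances that differ only in how long a frame is held align at zero cost, which is the property that makes DTW useful for normalizing speaking rate before classification.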

10.

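Application of speech recognition technology to telephone voice dialing

卢玮, 姜晔. 《电声技术》, 2001, (2): 30-32

A real-time speech recognition method for telephone voice dialing is presented. The system recognizes the speech of a specific speaker and maps the recognition result to the corresponding telephone number. Experimental results show that the method has high recognition accuracy and real-time recognition speed, and needs only a small amount of memory, making it an effective speech recognition method for applications such as telephone voice dialing.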

11.

An error-protected speech recognition system for wireless communications

Weerackody V. Reichl W. Potamianos A. 《Wireless Communications, IEEE Transactions on》2002,1(2):282-291

Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system where representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate the satisfactory performance of these schemes for typical wireless cellular channels. In addition, a "soft-feature" error concealment strategy is introduced at the ASR server that uses "soft-outputs" from the channel decoder to compute the marginal distribution of only the reliable features during likelihood computation at the speech recognizer. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. Also considered is a channel decoding technique with source information that improves ASR performance.

12.

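Application of HMM in speech recognition systems  Total citations: 1, self-citations: 0, citations by others: 1

苗苗, 马海武. 《现代电子技术》, 2006, 29(16): 64-66

This paper reviews the applications and development of speech recognition technology and compares three kinds of speech recognition systems, based on dynamic time warping, hidden Markov models, and artificial neural networks, with the focus on the application of the hidden Markov model (HMM) in speech recognition systems. For the HMM-based system, a DHMM-based recognizer was first implemented on the UniSpeech chip, and a CHMM-based recognizer was then implemented on the same platform.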

13.

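Design and implementation of a fully automatic Chinese news subtitle generation system

郑李磊, 谢磊, 芦咪咪, 王晓暄, 杨玉莲, 张艳宁. 《电子学报》, 2011, 39(Z1): 69-74

This paper designs and implements a fully automatic Chinese news subtitle generation system whose input is a news video and whose output is the corresponding subtitle text. Using 《新闻联播》 as the corpus, the system implements audio extraction, audio classification and segmentation, speaker recognition, large-vocabulary continuous speech recognition, video playback, and automatic generation of text subtitles. Automatic subtitle generation avoids the heavy, time-consuming process of adding subtitles by hand. Experiments show that the system has a high recognition rate and can meet the needs of the hearing-impaired and other special...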

14.

A statistical causal model for the assessment of dysarthric speech and the utility of computer-based speech recognition

Sy B.K. Horowitz D.M. 《IEEE transactions on bio-medical engineering》1993,40(12):1282-1298

The evaluation of the degree of speech impairment and the utility of computer recognition of impaired speech are separately and independently performed. Particular attention is paid to the question concerning whether or not there is a relationship between naive listeners' subjective judgments of impaired speech and the performance of a laboratory version of a speech recognition system. It is a difficult task to relate a speech impairment rating with speech recognition accuracy. Towards this end, a statistical causal model is proposed. This model is very appealing in its structure to support inference, and thus can be applied to perform various assessments such as the success of automatic recognition of dysarthric speech. The application of this model is illustrated with a case study of a dysarthric speaker compared against a normal speaker serving as a control.

15.

Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

《Journal of Visual Communication and Image Representation》2015

By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio.

16.

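A robust Visemic LDA-based dynamic lip feature and audio-visual speech recognition  Total citations: 4, self-citations: 0, citations by others: 4

谢磊, 付中华, 蒋冬梅, 赵荣椿, Wernet Verhelst, Hichem Sahli, Jan Conlenis. 《电子与信息学报》, 2005, 27(1): 64-68

Visual feature extraction is an active research topic in audio-visual speech recognition. This paper introduces a robust dynamic lip feature based on Visemic LDA that takes full account of the changes in the lip contour during articulation and of the visual viseme partition. The paper also proposes a method that uses speech recognition results to label the LDA training data automatically, which removes the heavy manual labeling work and avoids labeling errors. Experiments show that introducing the Visemic LDA visual feature into audio-visual speech recognition greatly improves the recognition rate of the speech recognition system under noise; when this visual feature is combined with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at a signal-to-noise ratio of 10 dB.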

17.

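A noise-robust front-end algorithm for Mandarin speech recognition and its performance analysis

林建臻, 孙甲松, 王作英. 《电声技术》, 2004, (3): 45-48, 52

This paper discusses the noise-robust front-end feature extraction algorithm for distributed speech recognition systems proposed by the European Telecommunications Standards Institute (ETSI), which combines several noise-robustness techniques. Taking the characteristics of Mandarin speech into account, the algorithm was implemented within an overall Mandarin speech recognition framework, and experiments and analysis were carried out. Recognition results in typical noise environments show that robustness is considerably improved relative to the baseline MFCC feature extraction algorithm.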

18.

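Research on neural-network-based self-learning speaker-independent speech recognition

徐秀平, 李柱峰. 《电声技术》, 2004, (6): 30-32

This paper describes in detail a self-learning, speaker-independent speech recognition method based on neural networks. It introduces for the first time an automatic verification method for speech recognition knowledge, the LVV method, and gives the system block diagram and the principle by which the knowledge base is automatically refined. It also introduces an LEA decision method and a fast neural-network learning method that effectively combines the gradient and Newton methods, and presents experimental results.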

19.

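Research on a speech recognition system based on a private branch exchange  Total citations: 3, self-citations: 0, citations by others: 3

刘加, 胡凯军. 《电子学报》, 1999, 27(1): 5-7

This paper develops a voice-controlled spoken command switching system for a private branch exchange. The system performs speaker-independent automatic recognition of continuous command speech with a small to medium vocabulary. User command sentences were collected and analyzed statistically to generate the corresponding recognition grammar network. The recognizer is trained with reinforced training on composite models built from subword models, and recognition uses a token-passing modified Viterbi algorithm to improve performance. The paper compares the effect of different speech feature parameters and of the number of hidden Markov model states on telephone speech recognition accuracy. A rejection mechanism for the recognizer was also developed. Without rejection, ...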
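The decoding idea behind Viterbi-style recognizers can be sketched with the plain Viterbi algorithm for a discrete HMM in the log domain. This is the textbook form, not the token-passing variant or the grammar network used in the paper; the dictionary-based model layout and the helper `to_log` are assumptions made for illustration.

```python
import math

def to_log(p):
    """Convert a probability to a log probability, mapping 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log probability."""
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prev = max(states,
                       key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = (best[t - 1][prev] + log_trans[prev][s]
                          + log_emit[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers
    path.reverse()
    return path, best[-1][last]
```

Working in log probabilities avoids numerical underflow on long observation sequences, which matters for continuous command speech.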

20.

Gaze-contingent automatic speech recognition

Cooke N.J. Russell M. 《Signal Processing, IET》2008,2(4):369-380

There has been progress in improving speech recognition using a tightly-coupled modality such as lip movement; and using additional input interfaces to improve recognition of commands in multimodal human-computer interfaces such as speech and pen-based systems. However, there has been little work that attempts to improve the recognition of spontaneous, conversational speech by adding information from a loosely-coupled modality. The study investigated this idea by integrating information from gaze into an automatic speech recognition (ASR) system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language model probability mass according to the visual attention. These systems were tested on a corpus of matched eye movement and related spontaneous conversational British English speech segments (n = 1355) for a visual-based, goal-driven task. The best performing systems had similar word error rates to the baseline ASR system and showed an increase in keyword spotting accuracy. The core values of this work may be useful for developing robust speech-centric multimodal decoding system functions.
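One simple way to picture "redistributing language model probability mass according to the visual attention" is to scale up the probabilities of words tied to the object currently being looked at and then renormalize. The sketch below does this for a unigram table; the unigram model, the multiplicative `boost` factor, and the function name are all hypothetical and are not taken from the paper, whose probabilistic framework is not specified in the abstract.

```python
def redistribute(lm_probs, attended_words, boost=2.0):
    """Shift probability mass toward words linked to the gaze target.

    lm_probs: dict mapping word -> probability (assumed to sum to 1).
    attended_words: set of words associated with the attended object.
    boost: hypothetical multiplicative factor for attended words.
    """
    scaled = {w: p * (boost if w in attended_words else 1.0)
              for w, p in lm_probs.items()}
    total = sum(scaled.values())
    # Renormalize so the result is again a probability distribution.
    return {w: p / total for w, p in scaled.items()}
```

With `boost=1.0` the distribution is unchanged, so the baseline recognizer is recovered as a special case.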