Similar Documents
1.
A new, simple, and practical method of fusing audio and visual information to enhance audiovisual automatic speech recognition is presented, in the context of large-vocabulary recognition of French Canadian speech, and the experimental methodology is described in detail. Visual information about mouth shape is extracted off-line using a cascade of weak classifiers and a Kalman filter, and is combined with the large-vocabulary speech recognition system of the Centre de Recherche Informatique de Montreal. Visual classification is performed by pairwise kernel-based linear discriminant analysis (KLDA) applied in a principal component analysis (PCA) subspace, followed by a binary combination and voting algorithm over 35 French phonetic classes. Three fusion approaches are compared: (1) standard low-level feature-based fusion, (2) decision-based fusion within the transferable belief model (an interpretation of Dempster-Shafer evidence theory), and (3) a combination of (1) and (2). For decision-based fusion, the audio information is treated as a precise Bayesian source, while the visual information is treated as an imprecise evidential source. This treatment ensures that the visual information does not significantly degrade the audio information where the audio performs well (e.g., in a controlled, noise-free environment). Results show a significant reduction in word error rate, to a level comparable to that of more sophisticated systems. To the authors' knowledge, this work is the first to address large-vocabulary audiovisual recognition of French Canadian speech and decision-based audiovisual fusion within the transferable belief model.
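Decision-level fusion in the transferable belief model (TBM) comes down to combining belief masses. The sketch below is a rough illustration only, not the authors' implementation. It applies the unnormalized conjunctive rule used in the TBM, with an option for Dempster's normalized rule, and then takes a decision with the pignistic transform.

The phonetic labels `a`, `b`, `c` and all mass values are hypothetical. The audio source is Bayesian, meaning its mass sits only on single classes. The visual source is imprecise: it cannot separate `a` from `b`, as happens with phones that look identical on the lips.

```python
from itertools import product


def combine(m1, m2, normalize=False):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass).

    normalize=False: unnormalized conjunctive rule of the transferable belief
    model; mass left on the empty set measures the conflict between sources.
    normalize=True: Dempster's rule (conflict removed, masses rescaled).
    """
    fused = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        fused[inter] = fused.get(inter, 0.0) + wa * wb
    if normalize:
        conflict = fused.pop(frozenset(), 0.0)
        if conflict >= 1.0:
            raise ValueError("sources are in total conflict")
        fused = {k: v / (1.0 - conflict) for k, v in fused.items()}
    return fused


def pignistic(m):
    """Pignistic probability BetP, used in the TBM to take a decision."""
    conflict = m.get(frozenset(), 0.0)
    betp = {}
    for focal, mass in m.items():
        if not focal:
            continue  # the empty set carries no decision weight
        share = mass / (len(focal) * (1.0 - conflict))
        for h in focal:
            betp[h] = betp.get(h, 0.0) + share
    return betp


F = frozenset
# Hypothetical precise (Bayesian) audio source ...
audio = {F({"a"}): 0.6, F({"b"}): 0.3, F({"c"}): 0.1}
# ... and a hypothetical imprecise visual source that cannot tell "a" from "b".
visual = {F({"a", "b"}): 0.7, F({"a", "b", "c"}): 0.3}

fused = combine(audio, visual)
betp = pignistic(fused)
print(max(betp, key=betp.get))  # prints a
```

The visual masses sit only on sets that contain both `a` and `b`. As a result, they leave the audio ranking between those two classes unchanged and only penalize `c`. This matches the abstract's point that an imprecise visual source should not degrade reliable audio.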

2.
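Statistical Features for Partial Discharge Image Recognition   Cited: 13, self-cited: 3, cited by others: 13
Drawing on image recognition techniques, this paper proposes a method that distinguishes partial discharge (PD) types using statistical features of PD gray-scale images. These features consist of moment features and correlation statistics. The moment features describe the basic gray-level distribution of the PD gray-scale image, while the correlation statistics describe how strongly the images of the positive and negative power-frequency half-cycles are correlated. Five discharge models simulating internal and external discharges in transformers were designed, and a large number of discharge samples were obtained experimentally. Using the statistical features of the PD gray-scale images with an artificial neural network classifier, high recognition rates were achieved for all five types of discharge samples, indicating that the method works well in practice.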
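The abstract does not define its features precisely. The sketch below therefore rests on assumptions:

- The "moment features" are taken to be first-order gray-level statistics (mean, variance, skewness, excess kurtosis) of the PD gray-scale image.
- The "correlation statistic" between the positive and negative half-cycle images is taken to be the Pearson correlation coefficient.

The function names and the tiny images are hypothetical.

```python
import math


def gray_moments(img):
    """Mean, variance, skewness and excess kurtosis of the gray levels in img."""
    pixels = [float(p) for row in img for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    if std == 0.0:
        return mean, var, 0.0, 0.0  # flat image: higher moments undefined
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4) - 3.0
    return mean, var, skew, kurt


def half_cycle_correlation(pos_img, neg_img):
    """Pearson correlation between the positive and negative half-cycle images."""
    x = [float(p) for row in pos_img for p in row]
    y = [float(p) for row in neg_img for p in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # one image is constant: correlation is undefined
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 2x2 gray-scale images for the two half-cycles.
features = list(gray_moments([[1, 2], [3, 4]]))
features.append(half_cycle_correlation([[0, 1], [2, 3]], [[0, 2], [4, 6]]))
print(features)
```

A feature vector like `features` would then be the input to the neural network classifier the abstract mentions.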

3.
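Speech Emotion Recognition Based on HMMs   Cited: 2, self-cited: 0, cited by others: 2
The emotional information carried in speech signals is important and is an indispensable part of how people perceive the world. Building on speech recognition, this paper applies hidden Markov models (HMMs) to speech emotion recognition. It walks through the whole emotion recognition process with continuous HMMs, covering:

- the classification of emotional speech,
- the acquisition of emotional speech material,
- emotional speech feature extraction, and
- emotional speech recognition.

The approach obtains fairly satisfactory recognition results.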
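HMM-based emotion recognition usually trains one HMM per emotion and assigns an utterance to the model with the highest likelihood. The paper uses continuous-density HMMs. For brevity, this sketch assumes discrete HMMs over already vector-quantized feature symbols and scores a sequence with the scaled forward algorithm. The model parameters and emotion labels are hypothetical.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    pi[i]   initial probability of state i
    A[i][j] transition probability from state i to state j
    B[i][k] probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        total += math.log(scale)
    return total


def classify(obs, models):
    """Pick the emotion whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical one-state models over two quantized feature symbols (0 and 1).
models = {
    "angry": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "sad": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
print(classify([0, 0, 1], models))  # prints angry
```

Rescaling `alpha` at every step and summing the log scale factors gives the same log-likelihood as the unscaled recursion. It also avoids numerical underflow on long utterances.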

4.
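A Fuzzy-Clustering-Based Neural Network Model and Its Application to Seepage Analysis   Cited: 5, self-cited: 0, cited by others: 5
This paper first applies fuzzy clustering to the factor set. It then uses a neural network to build a prediction model between the characteristic values of the class variables of the sample factor set and the sample observations. The result is a prediction approach that integrates fuzzy clustering, fuzzy pattern recognition, and neural networks. Using a seepage calculation for a dam as an example, the traditional statistical forecasting model is compared with the fuzzy-clustering-based neural network model, and the latter gives more accurate forecasts.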
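The abstract does not say which fuzzy clustering algorithm is used. The sketch below substitutes fuzzy c-means (FCM), a common choice, to show how the factor data could be given graded cluster memberships before they are passed to a neural network (the network itself is not shown). The function name and the one-dimensional data are hypothetical.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-9):
    """Fuzzy c-means clustering.

    points:  list of feature vectors (lists of floats)
    centers: initial cluster centers (one list per cluster)
    m:       fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, from the last iteration (rows sum to 1).
    """
    c, dim = len(centers), len(points[0])
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:
                # Point coincides with a center: full membership there.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                       for k in range(c)]
            memberships.append(row)
        # Center update: weighted mean of the points with weights u_ik^m
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in memberships]
            total = sum(w)
            new_centers.append([sum(wi * x[j] for wi, x in zip(w, points)) / total
                                for j in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centers, u = fuzzy_c_means(data, centers=[[0.0], [1.0]])
print([round(c[0], 2) for c in centers])
```

Unlike hard k-means, each sample keeps a membership degree for every cluster. Those degrees could serve as the class-variable inputs for the downstream network.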

5.
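The Storm Water Management Model (SWMM) is an effective tool for studying urban flooding. The values of its sensitive parameters affect both the efficiency of parameter calibration and the accuracy of model predictions. To identify these parameters accurately, this paper carries out sensitivity analysis from two perspectives: a local one, using the modified Morris screening method, and a global one, using the mutual information method. For both peak flow and runoff coefficient, the sensitive parameters are Manning's roughness coefficient for pervious areas and the minimum infiltration rate. Both methods identify the main sensitive parameters, but each has a drawback: the modified Morris method has limited ability to discriminate sensitive parameters, and the mutual information method is less efficient. The authors therefore recommend the modified Morris method when the main sensitive parameters must be identified quickly. For the remaining sensitivity levels, they recommend combining it with the mutual information method.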
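A commonly used form of the modified Morris index perturbs one parameter at a time by fixed percentages. It then averages the relative output change per unit relative parameter change:

S = (1/(n-1)) Σ [(Y[i+1] - Y[i]) / Y0] / [(P[i+1] - P[i]) / 100]

Here Y0 is the output at the base parameter set and P[i] is the parameter's percentage change in run i. The sketch assumes this formulation; the paper's exact variant may differ. The linear `toy_model` and its parameter names stand in for a SWMM run and are purely hypothetical.

```python
def modified_morris_index(model, base, name, steps=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Modified Morris sensitivity index of parameter `name`.

    model: callable mapping a parameter dict to a scalar output (e.g. peak flow)
    base:  base parameter values
    steps: fractional perturbations of the parameter, in increasing order
    """
    y0 = model(base)
    if y0 == 0:
        raise ValueError("base output must be non-zero")
    outputs, percents = [], []
    for s in steps:
        params = dict(base)
        params[name] = base[name] * (1.0 + s)
        outputs.append(model(params))
        percents.append(100.0 * s)
    terms = [((outputs[i + 1] - outputs[i]) / y0)
             / ((percents[i + 1] - percents[i]) / 100.0)
             for i in range(len(steps) - 1)]
    return sum(terms) / len(terms)


def toy_model(p):
    """Hypothetical stand-in for a SWMM simulation returning one output."""
    return 3.0 * p["param_a"] + 0.5 * p["param_b"]


base = {"param_a": 1.0, "param_b": 2.0}
for name in base:
    print(name, round(modified_morris_index(toy_model, base, name), 3))
```

A larger |S| marks a more sensitive parameter. In practice each `model` call would be a full SWMM simulation, which is why this one-at-a-time screening is the cheaper of the two methods.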

6.
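Chinese Speech Emotion Recognition Based on SVM   Cited: 1, self-cited: 0, cited by others: 1
As information technology develops, the demands on human-computer interaction keep rising, and emotional information processing has become an important topic in improving it. This paper proposes a method for classifying emotions in Mandarin Chinese speech, covering four basic human emotions: happiness, anger, fear, and sadness. Energy, fundamental frequency (pitch), speaking rate, and other features are extracted from the speech signal and classified with a support vector machine, giving an average recognition rate of 43.7%.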

7.
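Errors in endpoint detection lower the recognition rate of a speech recognition system, so effective and accurate endpoint detection is an important step in speech recognition. At low signal-to-noise ratios (SNRs), traditional endpoint detection methods do not work well. To raise the recognition rate, this paper proposes a more effective endpoint detection algorithm based on LPC Mel-cepstral features, an improvement on cepstral-feature-based methods. Experiments show that the algorithm detects speech endpoints accurately at low SNR. A comparison of three endpoint detection algorithms shows that the LPC Mel-cepstrum algorithm achieves the highest detection accuracy at low SNR.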

8.
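ZHAO Yan, SUN Jun, SHI Kaixin. Electric Drive (《电气传动》), 2021, 51(7): 59-66
AI-based elderly-care robots can help with the increasingly serious problem of home-based elderly care. Current solutions, however, have several shortcomings:

- smart-home devices from different brands are mutually incompatible;
- mature robot products lack speech and semantic recognition suited to elderly users;
- they lack self-learning capability, respond slowly, and have low sensitivity.

To address these problems, the authors use the Tornado and Home Assistant frameworks to integrate mainstream smart-home devices, and build an N-pod host on the main unit and a humanoid robot. Speech recognition uses a deep convolutional neural network, and a particle swarm optimization algorithm clusters the collected physical indicators of elderly users together with home-environment data. The result is a voice-control solution for multi-brand smart-home devices that meets individually tailored elderly-care needs. Experiments and field use show that the device suits elderly people living at home. Its speech recognition accuracy, control-command accuracy, and response sensitivity all exceed those of comparable products, and it offers considerable economic and social benefits.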

9.
Speech recognition is a hot topic in artificial intelligence. Speech recognition models can generally run only on large servers or dedicated chips. This paper presents a keyword speech recognition system based on a neural network and a conventional STM32 chip. To cope with the limited Flash and ROM resources of the STM32 MCU, the deployment of the speech recognition model is optimized to meet the requirements of keyword recognition. First, MFCC (Mel Frequency Cepstral Coefficient) features are extracted from the audio captured by sensors and fed into a CNN (Convolutional Neural Network) for deep feature extraction. The resulting features then pass through a fully connected layer, which classifies the speech keyword. Deployed on the STM32F429, the model achieves an accuracy of 90.58%, less than 1 percentage point below the 91.49% obtained on a computer, indicating good performance.

10.
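Face recognition and speaker recognition both have low recognition rates in noisy environments. To address this, the authors build on feature-level fusion and combine normalization techniques with SVM theory to propose a multi-biometric recognition model that fuses face and voice. The model works in four steps:

1. Face features are extracted with the discrete cosine transform (DCT) and locality preserving projections (LPP), and voice features with an SVM-based method.
2. The features are fused at the feature level.
3. The distance between the test identity and the templates is computed, and the matching distances are normalized to reduce computation and improve recognition performance.
4. The normalized distances are fed into an SVM for recognition.

Simulation results show that in noisy environments, as the SNR decreases, the fused recognition rate is clearly higher than that of either single system, achieving reliable identity recognition.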

11.
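This paper proposes a design for operating a car audio system by voice command. A speaker-independent in-car audio voice controller is built around the UniSpeech-SDA80D51, a newly released system-on-chip (SoC) speech-processing chip from Infineon (Germany) that combines a DSP core and a microcontroller core. The controller provides voice control of the SL1102C1 car audio unit. The paper describes the structure of the voice controller system, the functions and working principle of the dedicated SDA80D51 speech-processing chip, and speaker-independent...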

12.
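A Speaker Recognition System Based on Vowel MFCCs   Cited: 1, self-cited: 0, cited by others: 1
Speaker recognition is essentially the process of extracting speaker characteristics from speech and classifying them by pattern recognition. Many methods exist for distinguishing speakers. In this paper, vowels are first extracted from the speech and their MFCC (Mel-frequency cepstral coefficient) features are computed. These features are then combined with DTW (dynamic time warping) in experiments with multiple speakers and multiple words. The experiments show that this approach raises the recognition rate by about 5 percentage points, from an average of 83% on whole words to 88% after vowel extraction.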
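DTW template matching compares a test sequence of feature vectors, such as per-frame vowel MFCCs, with stored templates and picks the closest one. The sketch below uses the classic recurrence with Euclidean local cost and the three standard step moves. It is not the paper's exact configuration. The speaker labels and the one-dimensional "MFCC" frames are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    Local cost is the Euclidean distance; allowed steps are (1,0), (0,1), (1,1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(query, templates):
    """Return the label whose template has the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical per-frame features of one vowel for two enrolled speakers.
templates = {
    "speaker_A": [[0.0], [1.0], [2.0]],
    "speaker_B": [[2.0], [2.0], [0.0]],
}
print(recognize([[0.0], [0.0], [1.0], [2.0]], templates))  # prints speaker_A
```

DTW absorbs differences in speaking rate: the repeated first frame of the query aligns with a single template frame at zero cost.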

13.
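A Small-Current Grounding Line-Selection Device Based on Transient Zero-Sequence Current Features   Cited: 3, self-cited: 1, cited by others: 2
This paper proposes an integrated line-selection scheme for small-current grounding systems that uses transient fault information and is easy to implement in engineering practice. The YH-B811 small-current line-selection device was developed on this basis. Its algorithm combines two comparisons:

- The first adaptively captures the transient zero-sequence current amplitude within a characteristic frequency band, tracking changes in system operating mode and fault conditions, and selects the faulted line by comparing these amplitudes.
- The second is used when the fault inception angle is small, the transition resistance is large, and the transient component is therefore small. It extracts and compares the amplitude of the low-frequency active component of the zero-sequence current.

The integrated algorithm covers all cases of single-phase-to-ground faults in systems with ungrounded or non-effectively grounded neutrals. The hardware is a high-precision, fast platform based on the MCF5272 and an FPGA.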

14.
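Large-screen display systems are widely used in all kinds of power-system control centers. Large screens and control consoles have very different display characteristics, so simply reusing the console's human-machine interface (HMI) on the large screen is inappropriate, and large-screen HMIs need dedicated study. This paper designs a new large-screen HMI for control centers. The system is built on a multimodal interface and consists of four modules: multimodal information acquisition, multimodal information recognition, information integration, and multimodal information output. On the input side, it uses speech as the primary channel and gestures as an auxiliary one, which is more natural and practical than a mouse and keyboard. On the output side, it makes full use of visualization and speech synthesis to present system information as graphics, speech, and other media that dispatchers find easier to understand and accept.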

15.
In this paper we present two supervised speaker adaptation methods, feature normalization and an MCE/GPD algorithm, developed for an MSVQ-based adaptive Chinese syllable recognition system. In MSVQ-based speech recognition, each recognition unit is represented as a time sequence of codebooks.

The first method, feature normalization, models inter-speaker variability as a linear transformation. Applying it normalizes the target speaker's speech to reduce inter-speaker acoustic variability.

For the second method, we first present an implementation of the MCE/GPD algorithm for discriminatively training an MSVQ-based speech recognizer. This method is expected to separate confusable classes and enhance speaker adaptation capability. With MCE/GPD, the recognizer's parameters are adjusted iteratively to minimize the classification error rate.

We carried out recognition experiments on highly confusable Chinese syllables to assess performance, using the standard Chinese syllable database CRDB. When the two adaptation methods are combined, the error rate on open data falls by more than 62% with a single set of adaptation training data. The adaptation methods can therefore bring substantial improvement when adaptation data are limited. As the amount of training data grows, MCE/GPD training alone improves speaker adaptation. It can therefore track spectral evolution over time and provides a robust means for adaptive speech recognition. © 1997 John Wiley & Sons, Ltd.

16.
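A DSP-Based Intelligent Control System with Speech Recognition   Cited: 1, self-cited: 0, cited by others: 1
This paper introduces the basic principles of speech recognition and some principles and methods for implementing speech recognition algorithms on the TMS320C32 floating-point digital signal processor (DSP), and describes the DSP implementation techniques. The system uses linear predictive cepstral coefficients as feature parameters. Templates are matched with an improved dynamic time warping (DTW) algorithm that has relatively low computational cost. This supports speaker-dependent, isolated-word, small-vocabulary recognition. The algorithm was simulated in MATLAB before speech recognition was applied to the intelligent control system, and the paper reports experimental results with an error analysis. The correct recognition rate was 89.96%, which gives the system some practical value.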

17.
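Traditional partial discharge (PD) pattern recognition relies on a single type of feature and has low accuracy. To address this, the paper proposes a dual-model fusion method based on the Dempster-Shafer (D-S) evidence combination rule. Phase-resolved partial discharge (PRPD) patterns yield both statistical features and image features, so two models are built: a back-propagation (BP) network for the statistical features and a convolutional neural network (CNN) for the image features. To combine their outputs, the authors propose a D-S combination rule improved with information entropy, which resolves the well-known paradoxes of the classical rule, and build a decision model on it. This fuses the two models' outputs more effectively and lets the two feature types complement each other. Tests on real data show that, compared with either single model, the proposed method recognizes PD patterns stably and accurately.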

18.
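This work studies audio classification for the Dongba classical texts, a "Memory of the World" documentary heritage item. The aim is to classify Dongba audio by extracting speech emotion features, to recognize the emotional state of Dongba classical speech, and to improve human-computer interaction. The authors propose Mel-frequency cepstral coefficients (MFCCs) for speech emotion feature extraction. First- and second-order MFCC differences are added to capture the dynamics of speech, and short-time energy is integrated as well. The final feature vector stacks MFCCs and short-time energy, reflecting the emotional characteristics of speech. Experiments show that this feature set separates the emotional information in speech more clearly. It provides a theoretical basis for speech emotion recognition research and for classifying Dongba classical audio.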
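The feature construction above can be sketched as follows: first- and second-order MFCC differences (deltas) are stacked with per-frame short-time energy. The sketch assumes:

- the MFCCs have already been computed elsewhere, one vector per frame;
- the standard regression formula is used for the differences, with N = 2 and edge frames replicated;
- the energy frames are aligned with the MFCC frames.

The paper may use a different difference window. The ramp input below is a hypothetical stand-in for real MFCCs.

```python
def deltas(frames, N=2):
    """Regression deltas: d_t = sum_{n=1..N} n*(c_{t+n} - c_{t-n}) / (2*sum n^2).

    frames: list of per-frame feature vectors; edge frames are replicated.
    """
    if not frames:
        return []
    T, dim = len(frames), len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for j in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                acc += n * (frames[min(t + n, T - 1)][j] - frames[max(t - n, 0)][j])
            row.append(acc / denom)
        out.append(row)
    return out


def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame."""
    return [sum(x * x for x in signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def stack_features(mfcc, energy):
    """Per frame: MFCC + delta MFCC + delta-delta MFCC + [short-time energy]."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return [m + a + b + [e] for m, a, b, e in zip(mfcc, d1, d2, energy)]


ramp = [[float(t)] for t in range(10)]  # hypothetical 1-D "MFCC" track
print(stack_features(ramp, [1.0] * 10)[5])
```

`zip` silently truncates to the shorter input. In real use, the energy must be computed with the same framing as the MFCCs so that the frame counts match.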

19.
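To make secondary-equipment testing in substations more intelligent and efficient, this paper proposes a power secondary testing system based on intelligent speech recognition. It adopts a second-pass speech recognition engine built on a power-industry-specific vocabulary and designs an embedded test system that recognizes speech offline. The system was used to test intelligent terminals, protection devices, merging units, and other equipment, both in the laboratory and in the field. The results show that the engine with a power-specific vocabulary improves overall recognition rate and stability. Voice control of the test system also completed the entire test procedure without error, making power secondary testers easier to operate, more flexible, and easier to use.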

20.
In this paper, we propose a method for improving recognition of 145 prominent consonant–vowel (CV) units in Indian languages for low bit-rate coded speech. The method works at two levels to reduce confusion among the large number of CV classes: the first level recognizes the vowel category of the CV unit, and the second level recognizes the consonant category. At each level, complementary evidence from support vector machines and hidden Markov models is combined to enhance recognition performance. We demonstrate the method's effectiveness on isolated CV units and on CV units collected from the Telugu broadcast news database.

The vowel onset point (VOP) is used as an anchor for extracting accurate features from each CV unit, so we also propose a method for accurate VOP detection in clean and coded speech. It is based on the spectral energy in the 500–2500 Hz band of the speech segments within the glottal closure region. The speech coders considered are GSM full rate (ETSI 06.10), CELP (FS-1016), and MELP (TI 2.4 kbps). Under both clean and coded conditions, the proposed two-level method improves CV recognition significantly over existing methods. Copyright © 2011 John Wiley & Sons, Ltd.
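The core idea of VOP detection is that vowel onsets bring a sharp rise in spectral energy in the 500–2500 Hz band. The sketch below is a simplified illustration: it computes that band energy per frame with a plain DFT and takes the frame with the largest increase as the VOP. It does not restrict the analysis to the glottal closure region, and it omits any smoothing or evidence combination the paper may use. The function names and the synthetic two-tone test signal are hypothetical.

```python
import math


def band_energy(frame, fs, lo=500.0, hi=2500.0):
    """Energy of the DFT bins whose frequency lies in [lo, hi] Hz."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if lo <= k * fs / n <= hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            total += re * re + im * im
    return total


def detect_vop(signal, fs, frame_len, hop):
    """Sample index of the frame where 500-2500 Hz energy rises most sharply."""
    energies = [band_energy(signal[s:s + frame_len], fs)
                for s in range(0, len(signal) - frame_len + 1, hop)]
    if len(energies) < 2:
        raise ValueError("need at least two frames")
    slopes = [energies[i] - energies[i - 1] for i in range(1, len(energies))]
    i = max(range(len(slopes)), key=slopes.__getitem__)
    return (i + 1) * hop  # start sample of the frame after the largest rise


fs = 8000
# Hypothetical signal: 100 ms of a 200 Hz tone (out of band), then 1000 Hz (in band).
sig = ([math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
       + [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)])
print(detect_vop(sig, fs, frame_len=80, hop=80))  # prints 800
```

An FFT would replace the O(n^2) DFT in practice. The explicit sum keeps the example dependency-free.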

