Similar literature
 20 similar documents found
1.
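Visual feature extraction is a hot topic in audio-visual speech recognition research. This paper introduces a robust dynamic lip-shape feature based on Visemic LDA, which takes full account of both the changes in lip contour during articulation and the visual viseme classes. The paper also proposes a method that uses speech recognition results to label the LDA training data automatically. This removes the laborious manual labeling work and avoids labeling errors. Experiments show that introducing the Visemic LDA visual feature into audio-visual speech recognition can greatly improve the recognition rate of a speech recognition system under noisy conditions; when this visual feature is combined with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at an SNR of 10 dB.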

2.
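Research on Double-Threshold Speech Endpoint Detection in High Impulse Noise Environments   Total citations: 1, self-citations: 0, citations by others: 1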
刘超  庄圣贤 《电子科技》2013,26(4):116-118,123
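Speech endpoint detection is a key technique for identifying valid speech segments; accurate endpoint detection reduces the computation needed for subsequent processing of the speech signal and saves resources. Most current endpoint detection techniques, such as the energy-frequency value, spectral entropy, and wavelet energy entropy, can detect valid speech segments accurately. This paper presents a double-threshold endpoint detection method: double-threshold detection is applied to the short-time average zero-crossing rate and the short-time average energy, and a minimum duration threshold is then set, so that Mandarin speech is identified accurately in high impulse noise environments. Comparative experiments with other methods show that the double-threshold technique effectively improves the speech recognition rate in short-duration, high impulse noise environments. Simulation results show an endpoint detection accuracy of 93%.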
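The abstract does not give the exact decision rule, so the following is only a minimal sketch of the general idea, under stated assumptions: a segment opens when short-time energy exceeds a high threshold, stays open while either the energy or the zero-crossing rate remains above its low threshold, and is kept only if it lasts at least a minimum number of frames. All function names, parameter names, and threshold values are hypothetical.

```python
def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_segments(samples, frame_len=80, hop=80,
                    energy_high=0.1, energy_low=0.02,
                    zcr_thresh=0.25, min_frames=5):
    """Return (start_frame, end_frame) pairs, end exclusive.

    Assumed rule (not taken from the paper): open a segment on high energy,
    close it when both energy and ZCR fall below their low thresholds, and
    drop segments shorter than min_frames (e.g. isolated impulses).
    """
    frames = frame_signal(samples, frame_len, hop)
    segments, start = [], None
    for i, frame in enumerate(frames):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        if start is None:
            if energy > energy_high:
                start = i
        elif energy < energy_low and zcr < zcr_thresh:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames)))
    return segments
```

In this sketch the minimum duration check is what rejects a short, loud click: it opens a segment but the segment closes again before reaching `min_frames`.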

3.
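The purpose of voice activity detection (VAD) is to determine whether speech is present during voice communication: detected silence is suppressed so that it occupies little or no channel bandwidth, and only detected speech is compressed, encoded, and transmitted. Robust speech recognition systems, digital mobile communication, and real-time voice transmission over the Internet all require VAD under adverse acoustic conditions in order to save bandwidth and suppress noise, so VAD is currently an important problem in speech processing. The recent VAD algorithms presented in this paper (EZCR-VAD, STAT-VAD, and E-VAD) are robust for voice detection in low-SNR environments.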

4.
Based on the observation that different speech enhancement algorithms perform differently under different types of interference and noise conditions, we propose a context-adaptive speech pre-processing scheme that adaptively selects the most advantageous speech enhancement algorithm for each condition. The selection process is based on an unsupervised clustering of the acoustic feature space and a subsequent mapping function that identifies the most appropriate speech enhancement channel for each audio input, corresponding to unknown environmental conditions. Experiments performed on the MoveOn motorcycle speech and noise database validate the practical value of the proposed scheme for speech enhancement and demonstrate a significant improvement in speech recognition accuracy compared with the best-performing individual speech enhancement algorithm: an accuracy gain of 3.3% in word recognition rate. The advance offered in the present work reaches beyond the specifics of the present application and can benefit spoken interfaces operating in fast-varying noise environments.
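The selection step described above can be illustrated with a small sketch. It assumes the clustering has already produced centroids and that a table mapping each cluster to an enhancement channel is available; the nearest-centroid rule, the function names, and the channel labels are illustrative assumptions, not details taken from the paper.

```python
def nearest_cluster(feature, centroids):
    """Index of the centroid closest to feature (squared Euclidean distance)."""
    def sq_dist(k):
        return sum((a - b) ** 2 for a, b in zip(feature, centroids[k]))
    return min(range(len(centroids)), key=sq_dist)

def select_enhancer(feature, centroids, cluster_to_enhancer):
    """Map an acoustic feature vector to the enhancement channel assigned
    to its nearest cluster (hypothetical mapping table)."""
    return cluster_to_enhancer[nearest_cluster(feature, centroids)]
```

For instance, with centroids `[[0.0, 0.0], [5.0, 5.0]]` and a hypothetical mapping `{0: "spectral_subtraction", 1: "wiener_filter"}`, the feature `[4.2, 5.1]` falls in cluster 1 and selects the second channel.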

5.
This paper presents a novel audio and video information fusion approach that greatly improves automatic recognition of people in video sequences. Audio and video information are first used independently to obtain confidence values that indicate the likelihood that a specific person appears in a video shot. A post-classifier is then applied to fuse the audio and visual confidence values. The system has been tested on several news sequences, and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.

6.
Endpoint detection is one of the most important steps in speech recognition. In a high-SNR environment, an algorithm based on short-time energy and zero-crossing rate can be used, but when the SNR is low this method may not be accurate. Some researchers have proposed an algorithm based on the MFCC Euclidean distance, which performs better in noisy environments. That algorithm, however, needs two thresholds to find the start and end points, and when the two threshold values are poorly chosen the detection result can be extremely bad. In this paper, we propose an improved algorithm based on the MFCC cosine value. This method reduces errors because it needs only a single threshold. Its benefit is that the detected result reliably contains the real voice component. According to the experimental data, the improved algorithm raises the speech recognition rate by 10% even in a noisy environment (SNR = 0), which shows that the improved method is more robust.
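The abstract does not spell out how the cosine value is used, so the sketch below shows one plausible reading rather than the authors' method: each frame's feature vector is compared with a noise reference (here, the mean of the first few frames, assumed to be speech-free), and frames whose cosine similarity to that reference drops below a single threshold are treated as speech. The feature vectors are assumed to be MFCCs computed elsewhere; all names and default values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_endpoints(features, noise_frames=3, threshold=0.9):
    """Return (first_speech_frame, last_speech_frame), or None if no frame
    differs enough from the noise reference.

    features: list of per-frame feature vectors (assumed to be MFCCs).
    """
    dim = len(features[0])
    # Noise reference: mean of the leading frames, assumed to contain no speech.
    noise_ref = [sum(f[d] for f in features[:noise_frames]) / noise_frames
                 for d in range(dim)]
    speech = [i for i, f in enumerate(features)
              if cosine_similarity(f, noise_ref) < threshold]
    return (speech[0], speech[-1]) if speech else None
```

Taking the first and last frame below the threshold as the endpoints is a deliberately inclusive choice: it may keep some noise frames inside the segment, but it is unlikely to cut off real speech.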

7.
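To address the noise problem in practical applications of speech recognition, a new noise-robust feature extraction algorithm is presented. The speech signal is first decomposed into wavelet subbands by the wavelet transform; then, based on the auditory masking effect of the human ear, the subband speech signals are compressed using spectral compression, and the corresponding speech features are extracted. An experimental platform was built in MATLAB, and the simulation results show that this speech feature achieves a relatively high recognition rate in noisy environments. The new feature parameters both exploit the noise robustness of wavelets and effectively reduce the mismatch between the training and recognition environments, which makes them noise-robust.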

8.
李聪  葛洪伟 《信号处理》2018,34(7):867-875
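Because of environmental noise, the performance of speaker recognition systems degrades sharply in practical applications. This paper proposes a robust speaker identification method that combines the Gaussian mixture model-universal background model (GMM-UBM) with adaptive parallel model combination. Adaptive parallel model combination is a noise-robust feature compensation algorithm that effectively reduces the mismatch between the training and test environments, thereby improving recognition accuracy and noise robustness. The algorithm first estimates noise features from the test speech, then fits a single Gaussian model to the noise features to obtain the noise mean and covariance. Finally, using the noise mean and covariance, it adjusts the mean vectors and covariance matrices of the trained Gaussian mixture model so that they match the test environment as closely as possible. Experimental results show that the method can accurately reconstruct the Gaussian mixture model parameters of clean speech and significantly improves speaker recognition accuracy, especially at low SNR.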

9.
This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using head-mounted or hand-held microphones with conventional speech recognizers. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between speech signals collected by different microphones. The estimated delay is used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay, and that increasing the speech sampling rate helps in determining it. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
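A delay-and-sum beamformer can be sketched in a few lines. The abstract does not define TDCM, so this sketch substitutes plain cross-correlation for the delay estimate; that substitution, the integer-sample delays, and all names are assumptions made for illustration only.

```python
def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) at which sig best matches ref, found by maximizing
    the cross-correlation. A positive lag means sig is delayed relative to ref.
    Cross-correlation stands in here for the paper's TDCM."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[i] * sig[i + lag]
                    for i in range(len(ref)) if 0 <= i + lag < len(sig))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(channels, max_lag=8):
    """Align every channel to the first one and average the aligned samples.

    Returns (lags, output). Samples shifted outside a channel are skipped.
    """
    ref = channels[0]
    lags = [estimate_delay(ref, ch, max_lag) for ch in channels]
    output = []
    for i in range(len(ref)):
        vals = [ch[i + lag] for ch, lag in zip(channels, lags)
                if 0 <= i + lag < len(ch)]
        output.append(sum(vals) / len(vals) if vals else 0.0)
    return lags, output
```

Averaging the aligned channels reinforces the speech, which adds coherently, while uncorrelated noise partially cancels; real arrays usually need fractional-sample delays, which this sketch leaves out.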

10.
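To address the low accuracy and weak adaptability of frame differencing and background subtraction in moving-object detection, a moving-object detection and recognition algorithm is proposed that combines an improved five-frame differencing method with background subtraction and template matching. Moving objects are detected in the video image sequence by fusing improved five-frame differencing with background subtraction; morphological operations remove noise and improve the extraction of the moving object; corner registration, based on image matching features extracted with the Harris algorithm, improves image recognition accuracy; and object detection, recognition, and tracking are achieved by matching the extracted object features against an adaptive template image. Simulation and experimental results show that the method detects and recognizes moving objects well in the presence of noise and partial occlusion, with a recognition rate of 95%.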

11.
This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The automatic multimedia content indexing system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. The new spectral-based features and smoothing method in the AS module improved speech detection performance on the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low WER, as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates the results from the AS module, TS module, and SC module. Combining the TS results with the AS and SC results improved story boundary detection accuracy compared with the TS results alone.

12.
Audio-visual speech recognition (AVSR), which uses both the acoustic and visual signals of speech, has received attention recently because of its robustness in noisy environments. An important issue in decision-fusion-based AVSR systems is determining an appropriate integration weight for combining the speech modalities so as to ensure better performance under various SNR conditions. Generally, the integration weight is calculated from the relative reliability of the two modalities. This paper investigates the effect of the reliability measure on integration weight estimation and proposes a genetic algorithm (GA) based reliability measure that uses an optimum number of best recognition hypotheses, rather than the N best recognition hypotheses, to determine an appropriate integration weight. Recognition accuracy is improved further by optimizing the resulting integration weight with a genetic algorithm. The performance of the proposed integration weight estimation scheme is demonstrated for isolated word recognition (covering commonly used mobile phone functions) in a multi-speaker database experiment. The results show that the proposed schemes improve robust recognition accuracy over conventional unimodal systems and over two related existing bimodal systems, namely the baseline reliability-ratio-based system and the N-best-hypotheses reliability-ratio-based system, under various SNR conditions.
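To make the role of the integration weight concrete, here is a minimal sketch of reliability-ratio decision fusion. It does not implement the GA-based measure proposed in the paper. The reliability measure used (mean gap between the best hypothesis and the remaining ones) is one common choice assumed here for illustration, and the function names and example scores are hypothetical.

```python
def reliability(log_likelihoods):
    """Mean gap between the best log-likelihood and the others.
    A large gap suggests a confident (reliable) modality.
    Expects at least two hypotheses."""
    ranked = sorted(log_likelihoods, reverse=True)
    best, others = ranked[0], ranked[1:]
    return sum(best - x for x in others) / len(others)

def fuse(audio_ll, visual_ll):
    """Combine per-word log-likelihoods from both modalities.

    audio_ll, visual_ll: dicts mapping each candidate word to a log-likelihood.
    Returns (audio_weight, recognized_word).
    """
    r_a = reliability(list(audio_ll.values()))
    r_v = reliability(list(visual_ll.values()))
    weight = r_a / (r_a + r_v) if (r_a + r_v) else 0.5
    fused = {w: weight * audio_ll[w] + (1 - weight) * visual_ll[w]
             for w in audio_ll}
    return weight, max(fused, key=fused.get)
```

When the audio scores are nearly tied, as they tend to be at low SNR, the audio reliability is small, the weight shifts toward the visual stream, and the fused decision can differ from the audio-only one.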

13.
张志华  王炳锡  彭煊 《电声技术》2005,(5):52-54,69
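A new voice detection method is presented that builds on an SNR-based algorithm and applies linear discriminant analysis (LDA) to reduce the dimensionality of the speech feature parameters. The method improves system robustness in high-noise environments. It is also compared with algorithms based on the signal-to-noise ratio (SNR) and on noise/speech statistics (N&S STAT); experiments show that it improves detection efficiency.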

14.
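A High-Performance Hybrid-Domain Blind Digital Audio Watermarking Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
A high-performance adaptive hybrid-domain digital audio watermark embedding algorithm is proposed. The original digital audio is first divided into audio segments; a Barker code is then chosen as the synchronization code and embedded, in the time domain, into the front part of each audio segment; finally, the DWT and DCT are applied to the rear part of each segment, and the watermark is embedded into the frequency-domain coefficients using a quantization modulation strategy. Experimental results show that the algorithm is robust against both common signal processing and desynchronization attacks.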
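The quantization modulation step can be illustrated on its own, leaving the DWT/DCT transforms and the Barker-code synchronization aside. The sketch below is a generic quantization-based embedding rule, assumed for illustration and not taken from the paper: each coefficient is moved to the quarter or three-quarter point of its quantization cell depending on the bit. The function names and the step size are hypothetical.

```python
import math

def embed_bit(coeff, bit, step=0.5):
    """Move coeff to the 1/4 point (bit 0) or 3/4 point (bit 1)
    of the quantization cell of width step that contains it."""
    base = math.floor(coeff / step) * step
    return base + (0.75 if bit else 0.25) * step

def extract_bit(coeff, step=0.5):
    """Recover the bit from the position of coeff inside its cell.
    No access to the original coefficient is needed (blind extraction)."""
    offset = coeff - math.floor(coeff / step) * step
    return 1 if offset >= step / 2 else 0
```

A larger `step` tolerates more distortion (in this sketch a bit survives perturbations smaller than `step / 4`) at the cost of a larger change to the host signal, which is the usual robustness versus imperceptibility trade-off.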

15.
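Speech Endpoint Detection Based on EMD and an Improved Double-Threshold Method   Total citations: 3, self-citations: 0, citations by others: 3
The accuracy of speech endpoint detection directly affects the computational complexity and recognition ability of a speech recognition system. In endpoint detection algorithms based on short-time energy and zero-crossing rate, the energy calculation is not entirely reasonable, and detection performance drops sharply at low SNR. To address this, a speech endpoint detection algorithm based on empirical mode decomposition (EMD) and an improved double-threshold method is proposed. Simulation results show that the proposed algorithm detects endpoints better at low SNR, demonstrating its advantage.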

16.
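Word Boundary Detection in Variable-Energy Noise Environments Based on a Fuzzy Classifier   Total citations: 1, self-citations: 0, citations by others: 1
Word boundary detection errors are one of the main causes of errors in speech recognition, and conventional detection algorithms do not work effectively at low SNR, especially when the background noise energy varies. In this paper, refined time-frequency parameters of the speech signal and the zero-crossing rate are used to train a fuzzy neural classifier for word boundary detection. Experiments under different background noises show that the method adapts to changes in background noise energy and achieves highly accurate word boundary detection.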

17.
张飞宇 《电子科技》2012,25(10):43-45,48
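In an online collaborative learning platform, video speech recognition is an important step in letting users find multimedia files on the network more quickly and in supporting content-based retrieval of teaching videos. The teaching-video speech recognition system is an instance of speech recognition based on hidden Markov models; it aims to extract text from teaching audio/video files and has significant application value. This paper analyzes the requirements of the speech recognition system's application software; performance tests of the relevant functions show that the system implements and demonstrates the conversion of the audio information in a video into text.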

18.
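This paper proposes a news video recognition method based on audio-visual template matching. During template construction, an audio template is extracted from the theme music in the opening of the news video, and a visual template is extracted from the extended face region in anchor shots; together these form the audio-visual template. During recognition, audio template matching is first applied to the TV video stream; the candidate time points that pass matching are used to locate the corresponding video shots; the extended face regions in those shots are then matched against the visual template to identify the anchor shots; and news video recognition is finally completed. Experimental results show that the method is computationally efficient, simple to operate, and of good practical value.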

19.
张佩  夏秀渝  胡连锋  李志昌 《通信技术》2009,42(11):160-162
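Sound source localization based on microphone arrays can be widely used in audio/video conferencing, speaker tracking and recognition, hearing aids, and many other settings. Based on the short-time stationarity of speech signals, this paper proposes an improved method for two-dimensional sound source localization using the MUSIC algorithm. The method alternates, frame by frame, between estimating the number of sources and estimating the source directions, and finally computes statistics and averages of the estimates over multiple frames to obtain the final direction estimate and a more accurate estimate of the number of sources. Simulation results show that the method effectively resolves the large peak-search deviation caused by inaccurate estimation of the number of sources, and that it has good noise robustness.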

20.
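A Speech Detection Algorithm for Speech Recognition Systems-on-Chip   Total citations: 2, self-citations: 0, citations by others: 2
Research on speech recognition technology has entered the practical stage, and a key technique in practical speech recognition systems is reliable speech detection. This paper proposes a real-time speech detection algorithm based on a finite state machine model (FSM-SD). A log maximum-likelihood decision frame-energy detector and a zero-crossing-rate detector control the transitions between states. Two different frame-energy calculation methods are derived, one for the MFCC (Mel-frequency cepstral coefficient) feature extraction process and one for the LPCC (linear prediction cepstral coefficient) process. FSM-SD is applied to a small-vocabulary Mandarin speech recognition system implemented on an OAK DSP, and experiments verify that it effectively safeguards the system's recognition performance and noise robustness.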
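The state set and transition rules of FSM-SD are not given in the abstract, so the following is only a generic four-state sketch of finite-state speech detection. It assumes each frame has already been marked active or inactive by upstream detectors (for example, frame energy and zero-crossing rate); the state names and the onset and hangover lengths are hypothetical.

```python
def fsm_detect(active_flags, onset_frames=3, hangover_frames=4):
    """Return (start_frame, end_frame) speech segments, end exclusive.

    active_flags: per-frame booleans from upstream detectors (assumed).
    A segment is confirmed after onset_frames consecutive active frames and
    closed after hangover_frames consecutive inactive frames.
    """
    state, count, start = "SILENCE", 0, None
    segments = []
    for i, active in enumerate(active_flags):
        if state == "SILENCE":
            if active:
                state, count, start = "ONSET", 1, i
        elif state == "ONSET":
            if active:
                count += 1
                if count >= onset_frames:
                    state = "SPEECH"
            else:
                state = "SILENCE"  # too short: treat as a noise burst
        elif state == "SPEECH":
            if not active:
                state, count = "HANGOVER", 1
        elif state == "HANGOVER":
            if active:
                state = "SPEECH"  # short pause inside speech
            else:
                count += 1
                if count >= hangover_frames:
                    segments.append((start, i - hangover_frames + 1))
                    state = "SILENCE"
    # Close a segment that is still open when the input ends.
    if state == "SPEECH":
        segments.append((start, len(active_flags)))
    elif state == "HANGOVER":
        segments.append((start, len(active_flags) - count))
    return segments
```

The onset state filters out isolated active frames, and the hangover state bridges brief pauses so that one utterance is not split into several segments; both decisions need only a counter, which suits a real-time, frame-by-frame implementation.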
