Similar Documents
20 similar documents found (search time: 234 ms)
1.
This work studies a method for pathological speech recognition using wavelet feature vectors and a multi-class support vector machine (SVM). Speech feature vectors are extracted with the continuous wavelet transform and classified with a multi-class SVM. To avoid the computational complexity of assembling a multi-class classifier from binary SVMs, a multi-class algorithm based on the one-class SVM is proposed: each class independently obtains its own decision function, and a sample is assigned to the class whose decision function takes the maximum value. Experiments show that the combination of the multi-class SVM and wavelet feature vectors achieves good recognition performance and practical value in a pathological speech recognition system.
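A minimal sketch of the per-class decision idea described above, assuming scikit-learn and synthetic two-dimensional features in place of the wavelet feature vectors; each class trains its own one-class SVM and a sample is assigned to the class with the largest decision value. This illustrates the general technique, not the authors' implementation.

```python
# Sketch: one decision function per class, classify by the maximum value.
# OneClassSVM stands in for the per-class model; features are synthetic.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = {0: rng.normal(0.0, 1.0, (100, 2)),   # class 0 samples
           1: rng.normal(4.0, 1.0, (100, 2))}   # class 1 samples

# Train one one-class SVM per class on that class's samples only.
models = {c: OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(Xc)
          for c, Xc in X_train.items()}

def predict(x):
    # Each class yields an independent decision value; pick the largest.
    scores = {c: m.decision_function(x.reshape(1, -1))[0] for c, m in models.items()}
    return max(scores, key=scores.get)

print(predict(np.array([0.2, -0.1])))  # expected: 0
print(predict(np.array([3.8, 4.3])))   # expected: 1
```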

2.
This paper studies language identification for short utterances (segments of 1 s or less) and for easily confused languages. Using the oriental multilingual recognition challenge data set, it compares phone log-likelihood ratio features, Mel-frequency cepstral coefficients, and deep bottleneck features (DBF) on both tasks, showing that DBF performs well on both. To raise accuracy further, an improved DBF-I-VECTOR language identification system is proposed, which lowers the best equal error rate (EER) of the baseline DBF-I-VECTOR system from 12.26% to 10.55% on short utterances and from 5.53% to 2.86% on confusable languages. A comparison of back-end classifiers for the improved system, namely cosine distance scoring (CDS), probabilistic linear discriminant analysis (PLDA), support vector machines (SVM), extreme gradient boosting (XGBoost), and random forests (RF), shows that RF classifies best on the short-utterance task and SVM classifies best on the confusable-language task.
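Of the back-ends compared above, cosine distance scoring is the simplest to illustrate. The sketch below is a generic illustration (not the competition system), assuming each language is summarized by the mean of its training i-vectors; the four-dimensional vectors are toy placeholders.

```python
# Sketch of cosine distance scoring (CDS) over i-vectors:
# score a test i-vector against each language's mean training i-vector.
import numpy as np

def cosine_score(w_test, w_model):
    return float(w_test @ w_model / (np.linalg.norm(w_test) * np.linalg.norm(w_model)))

def classify(w_test, language_means):
    scores = {lang: cosine_score(w_test, mean) for lang, mean in language_means.items()}
    return max(scores, key=scores.get), scores

# Toy 4-dimensional "i-vectors" for illustration only.
language_means = {"zh": np.array([1.0, 0.2, 0.0, 0.1]),
                  "ja": np.array([0.1, 1.0, 0.3, 0.0])}
print(classify(np.array([0.9, 0.25, 0.05, 0.1]), language_means))
```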

3.
A support vector machine method for tone recognition in continuous Mandarin speech
Tone information is very important in Mandarin speech recognition. This work uses support vector machines for tone recognition in continuous Mandarin speech. Voiced segments are first extracted from the continuous speech with a two-stage decision strategy based on the Teager energy operator and the zero-crossing rate, and fixed-dimension tone feature vectors suited to the SVM classification model are then constructed. Six binary SVM models classify the four tones of speaker-independent standard Mandarin; compared with a BP neural network, the SVM achieves a higher recognition rate.
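The voiced-segment decision can be sketched with the discrete Teager energy operator, psi[n] = x(n)^2 - x(n-1)x(n+1), combined with the frame zero-crossing rate. The frame sizes and thresholds below are illustrative assumptions, not the paper's settings.

```python
# Sketch: frame-level Teager energy and zero-crossing rate for voiced/unvoiced decisions.
# Thresholds are illustrative; real systems tune them on data.
import numpy as np

def teager_energy(x):
    # Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]
    return np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])

def zero_crossing_rate(x):
    return np.mean(np.abs(np.diff(np.sign(x)))) / 2.0

def voiced_frames(signal, frame_len=320, hop=160, teo_thr=1e-4, zcr_thr=0.15):
    flags = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        # Stage 1: high Teager energy suggests speech activity;
        # Stage 2: a low zero-crossing rate confirms a voiced frame.
        flags.append(teager_energy(frame) > teo_thr and zero_crossing_rate(frame) < zcr_thr)
    return np.array(flags)
```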

4.
Automatic facial expression classification is an important part of affective information processing. To improve the accuracy and robustness of expression recognition, an automatic facial expression classification method based on a confusion-crossed support vector machine tree is proposed. The method learns pseudo-Zernike moment features with the confusion-crossed SVM tree and classifies facial expressions automatically. The tree structure lets the model decompose the expression recognition problem according to the teacher signal and solve the subproblems at different levels with relatively low complexity. In the training stage, the two sample subsets produced at each internal node are confusion-crossed, which strengthens the model's overall generalization and robustness for facial expression recognition. Experiments on the six basic expressions of the Cohn-Kanade facial expression database reach an accuracy of 96.31%; compared with other recognition methods evaluated on the same database, the method has a clear advantage in accuracy and robustness.

5.
A fault diagnosis method for reciprocating compressor gas valves based on support vector machines is proposed. The vibration signals of the valves serve as the feature vectors for fault identification, and training yields an SVM network for valve fault diagnosis. The classification results on the test samples agree with the actual fault conditions.

6.
Peer-to-peer (P2P) technology has enriched Internet applications but has also brought many security problems, so identifying P2P traffic is a difficult and active topic in network management. Support vector machines work well for P2P identification, but their classification performance depends heavily on the kernel parameters and the penalty parameter. SVM parameter optimization based on genetic algorithms or particle swarm optimization easily falls into local optima, so the optimization still needs improvement. To address this, an SVM parameter optimization method based on chaotic particle swarm optimization is proposed and applied to P2P traffic identification. Classification experiments on real campus network traffic show that the SVM optimized by the chaotic particle swarm achieves higher P2P classification accuracy and computational efficiency.
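A compact sketch of chaos-enhanced particle swarm optimization of the SVM penalty parameter C and RBF kernel width gamma, with cross-validated accuracy as the fitness and a logistic map supplying the initial particle positions. The swarm size, coefficients, and synthetic data are assumptions; this illustrates the general technique rather than the paper's implementation.

```python
# Sketch: chaotic PSO over (log2 C, log2 gamma) with cross-validated SVM accuracy as fitness.
# A logistic-map sequence replaces uniform random initialization; all settings are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
lo, hi = np.array([-3.0, -5.0]), np.array([5.0, 1.0])   # search range for log2(C), log2(gamma)

def fitness(pos):
    C, gamma = 2.0 ** pos
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

def logistic_map(n, x0=0.63, r=4.0):
    seq = np.empty(n)
    for i in range(n):
        x0 = r * x0 * (1.0 - x0)   # chaotic iteration in (0, 1)
        seq[i] = x0
    return seq

n_particles, n_iter, dim = 10, 15, 2
pos = lo + logistic_map(n_particles * dim).reshape(n_particles, dim) * (hi - lo)
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = np.random.rand(n_particles, dim), np.random.rand(n_particles, dim)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best (C, gamma):", 2.0 ** gbest, "CV accuracy:", pbest_fit.max())
```

The usual motivation for the chaotic variant is that the logistic map's sensitivity to initial conditions spreads particles more broadly over the search space, reducing the chance of early convergence to a local optimum.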

7.
A stego-image detection scheme based on a genetic algorithm and a multi-hypersphere one-class support vector machine is proposed. To obtain the features that best reflect the nature of the classification and thus enable effective detection, a genetic algorithm performs image feature selection, with the SVM classification result returned as the fitness value to guide the search for the optimal feature subset. Experimental results show that, compared with a detection scheme that uses the SVM alone without feature selection, the scheme improves the detection rate of stego images.
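A sketch of genetic-algorithm feature selection with the SVM's cross-validation accuracy returned as the fitness, as described above. The population size, mutation rate, and synthetic data are illustrative assumptions, and an ordinary two-class SVM stands in for the multi-hypersphere one-class SVM.

```python
# Sketch: genetic-algorithm feature selection with SVM cross-validation accuracy as fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=1)
rng = np.random.default_rng(1)
n_pop, n_gen, n_feat = 20, 15, X.shape[1]

def fitness(mask):
    if not mask.any():          # an empty feature set is useless
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, (n_pop, n_feat)).astype(bool)   # binary feature masks
for _ in range(n_gen):
    fit = np.array([fitness(ind) for ind in pop])
    # Tournament selection, one-point crossover, bit-flip mutation.
    parents = pop[[max(rng.choice(n_pop, 2), key=lambda i: fit[i]) for _ in range(n_pop)]]
    children = parents.copy()
    for i in range(0, n_pop - 1, 2):
        cut = rng.integers(1, n_feat)
        children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:], parents[i, cut:]
    children ^= rng.random((n_pop, n_feat)) < 0.05
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))
```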

8.
Application of the NAP sequence kernel to speaker recognition
邢玉娟  李明 《计算机工程》2010,36(8):194-196
To address variable-length feature sequences and cross-channel interference in speaker recognition systems, a supervector-based nuisance attribute projection (NAP) kernel is proposed. It is a new type of sequence kernel that lets the support vector machine classify whole speech sequences and removes the channel subspace information irrelevant to speaker recognition from the kernel space. Simulation results show that the kernel effectively improves the classification performance of the SVM and the recognition accuracy of the speaker recognition system.
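The core of nuisance attribute projection is removing a low-rank session/channel subspace from the supervectors before SVM scoring, P = I - U U^T, with the columns of U estimated from within-speaker scatter. A minimal sketch under those assumptions, with random vectors standing in for GMM supervectors:

```python
# Sketch of nuisance attribute projection (NAP): estimate the channel subspace U from
# within-speaker scatter and project it out of every supervector, P = I - U U^T.
import numpy as np

def nap_projection(supervectors, speaker_ids, rank=2):
    X = np.asarray(supervectors, dtype=float)
    # Deviations of each session from its speaker mean capture channel/session
    # variability rather than speaker identity.
    deviations = np.vstack([X[speaker_ids == s] - X[speaker_ids == s].mean(axis=0)
                            for s in np.unique(speaker_ids)])
    # Leading right singular vectors of the deviations span the nuisance subspace.
    _, _, vt = np.linalg.svd(deviations, full_matrices=False)
    U = vt[:rank].T                      # (dim, rank)
    return np.eye(X.shape[1]) - U @ U.T  # projection matrix P

# Toy example: 3 speakers x 4 sessions of 10-dimensional "supervectors".
rng = np.random.default_rng(0)
speakers = np.repeat(np.arange(3), 4)
X = rng.normal(size=(12, 10))
P = nap_projection(X, speakers, rank=2)
X_clean = X @ P.T                        # channel-compensated supervectors for the SVM
```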

9.
Immune-optimized multi-output least squares support vector machine and its application
The traditional support vector machine is a binary classifier and cannot be applied directly to multi-class problems. To solve this, an immune-optimized multi-output least squares support vector machine is proposed, in which an immune algorithm optimizes the parameters of the least squares SVM. The method is applied to modeling a biochemical wastewater treatment process and to speech emotion recognition; simulation results show higher accuracy.

10.
Speaker recognition based on support vector machines and wavelet analysis
To address the speaker recognition problem, a recognition method and framework based on support vector machines and wavelet analysis is proposed: wavelet analysis is applied in signal preprocessing, and on this basis its singularity detection principle is used to separate the speech signal from noise and enhance the speech; training and testing are then performed on the samples, with an SVM carrying out speaker classification.
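As a rough illustration of the wavelet preprocessing step, the sketch below applies standard wavelet soft-threshold denoising, assuming the PyWavelets package. This is a simplified stand-in for the singularity-detection-based separation described above; the wavelet choice and the universal threshold are assumptions.

```python
# Sketch: wavelet-threshold denoising of the speech signal before feature extraction and
# SVM classification (a simplified stand-in for singularity-detection-based separation).
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    uthresh = sigma * np.sqrt(2.0 * np.log(len(signal)))      # universal threshold
    denoised = [coeffs[0]] + [pywt.threshold(c, uthresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[:len(signal)]

# The enhanced signal would then be framed, converted to features, and fed to the SVM.
```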

11.
Topic-based language model adaptation should update the language model interpolation weights as quickly as possible and reduce the number of language model look-ups to meet the real-time requirements of speech recognition. This paper uses a clustering-based method to obtain quantized representations of adjacent bigram word pairs, and uses these to characterize both the recognition history and the centroid of each text topic. The language model weights are updated according to the similarity between the recognition-history vector and each topic centroid vector, and the global language model is discarded. Compared with the traditional EM-based adaptation method, experiments show a clear improvement in recognition performance and real-time behavior, with a relative reduction of 5.1% in recognition error rate, indicating that the method can fairly accurately determine the text topic of the test material.
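A toy sketch of the weight update: the similarity between the recognition-history vector and each topic centroid is turned into interpolation weights over the topic language models, with no global background model. The softmax mapping and the toy vectors are assumptions for illustration.

```python
# Sketch: update topic language-model interpolation weights from the cosine similarity
# between the recognition-history vector and each topic centroid.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def topic_weights(history_vec, topic_centroids, temperature=1.0):
    sims = np.array([cosine(history_vec, c) for c in topic_centroids])
    w = np.exp(sims / temperature)       # softmax turns similarities into mixture weights
    return w / w.sum()

def interpolated_lm_prob(word_probs_per_topic, weights):
    # P(w | history) = sum_t weight_t * P_t(w); no global background model is used.
    return float(np.dot(weights, word_probs_per_topic))

centroids = [np.array([0.9, 0.1, 0.0]), np.array([0.1, 0.8, 0.1])]
history = np.array([0.7, 0.2, 0.1])
w = topic_weights(history, centroids)
print(w, interpolated_lm_prob(np.array([0.02, 0.005]), w))
```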

12.
To reduce the distortion that arises when vector quantization is used for speaker recognition, and drawing on the articulatory characteristics of Mandarin speech, a method that combines vector quantization with clustering of the speech features is proposed: before the VQ codebook is trained, the feature vectors are clustered and filtered. Experimental results show that, with 4 s test segments and a recognition rate of about 95%, the ordinary VQ method needs a 64-codeword codebook while the proposed method needs only 8 codewords, an eightfold reduction. The method therefore not only alleviates, to some extent, the distortion caused by insufficient training data, but also achieves good recognition results with far fewer codewords, improving recognition efficiency.
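A rough sketch of the cluster-then-filter idea, using k-means for both the pre-clustering step and the codebook itself; the cluster counts and the kept fraction are illustrative assumptions, not the paper's settings.

```python
# Sketch: cluster and filter the training feature vectors before training a small VQ codebook,
# then recognize a test utterance by minimum average quantization distortion.
import numpy as np
from sklearn.cluster import KMeans

def filtered_codebook(features, n_clusters=32, keep_fraction=0.8, codebook_size=8):
    # Pre-clustering: keep only the vectors closest to their cluster centers,
    # discarding outlying frames before codebook training.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    dist = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    kept = features[dist <= np.quantile(dist, keep_fraction)]
    return KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(kept).cluster_centers_

def distortion(features, codebook):
    # Average distance of each frame to its nearest codeword.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Recognition: pick the speaker whose codebook gives the lowest distortion, e.g.
# speaker_codebooks = {name: filtered_codebook(train_feats[name]) for name in speakers}
# best = min(speaker_codebooks, key=lambda s: distortion(test_feats, speaker_codebooks[s]))
```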

13.
The speech signal carries linguistic information and also paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain that recognition rate on spontaneous speech, so emotion recognition is an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are also selected using the analysis of variations method. The proposed modular scheme is derived from a comparative study of different features and characteristics of each emotional state, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, the average emotion recognition accuracy can be improved by 2.2%. The proposed modular neural-SVM classifier also improves recognition accuracy by at least 8% compared with the simulated monolithic classifiers.
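One common reading of variance-analysis feature selection is a per-feature ANOVA F-test; the sketch below keeps roughly 78% of the features (discarding about 22%, as in the abstract) before an SVM, using scikit-learn. The data set and the simple pipeline are illustrative assumptions, not the paper's modular classifier.

```python
# Sketch: rank features by a one-way ANOVA F-test and keep the top fraction before an SVM,
# one possible reading of the variance-analysis feature selection described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=50, n_informative=8, random_state=0)
k = int(0.78 * X.shape[1])          # discard roughly 22% of the features
clf = make_pipeline(SelectKBest(f_classif, k=k), SVC(kernel="rbf"))
print("CV accuracy with selection:", cross_val_score(clf, X, y, cv=5).mean())
```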

14.
Audio-visual speech recognition employing both acoustic and visual speech information is a novel extension of acoustic speech recognition, and it significantly improves recognition accuracy in noisy environments. Although various audio-visual speech recognition systems have been developed, a rigorous and detailed comparison of the potential geometric visual features from speakers' faces is essential. Thus, in this paper the geometric visual features are compared and analyzed rigorously for their importance in audio-visual speech recognition. Experimental results show that among the geometric visual features analyzed, lip vertical aperture is the most relevant, and the visual feature vector formed by the vertical and horizontal lip apertures and the first-order derivative of the lip corner angle leads to the best recognition results. Speech signals are modeled by hidden Markov models (HMMs), and using the optimized HMMs and geometric visual features the accuracies of acoustic-only, visual-only, and audio-visual speech recognition are compared. The audio-visual scheme has a much improved recognition accuracy compared with acoustic-only and visual-only recognition, especially at high noise levels. The experimental results showed that a set of as few as three labial geometric features is sufficient to improve the recognition rate by as much as 20% (from 62% with acoustic-only information to 82% with audio-visual information at a signal-to-noise ratio of 0 dB).

15.
The Spert-II fixed-point vector microprocessor system performs training and recall faster than commercial workstations for the neural networks used in speech recognition research. We have packaged a prototype full-custom vector microprocessor, T0, as the Spert-II (Synthetic Perceptron Testbed II) workstation accelerator system. We originally developed Spert-II to accelerate multiparameter neural network training for speech recognition research. Our speech research algorithms change constantly, and neural nets are often integrated with other tasks to form complete applications. We therefore wanted a general-purpose, easily programmable accelerator that could speed up a range of tasks.

16.
关勇  李鹏  刘文举  徐波 《自动化学报》2009,35(4):410-416
Conventional noise-robust algorithms cannot solve the robustness problem of automatic speech recognition (ASR) systems against a competing-speaker background. This paper proposes a hybrid speech separation system based on computational auditory scene analysis (CASA) and speaker model information. Within the CASA framework, the system uses speaker model information and factorial-max vector quantization (MAXVQ) to estimate real-valued masks, so that the target speaker's speech can be effectively separated from a two-speaker mixture, providing a robust recognition front end for the ASR system. Evaluation on the Speech Separation Challenge (SSC) data set shows that, compared with the baseline system, the proposed system improves speech recognition accuracy by 15.68%. The experimental results also confirm the effectiveness of the proposed multi-speaker identification and real-valued mask estimation.

17.
The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortion, causing a dynamic mismatch between speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the expectation maximization algorithm. An experiment using the proposed method with reverberant speech for a reverberation time of 0.5 s revealed that it was possible to achieve an 80% reduction in the relative error rate compared with the recognition of dereverberated speech (word error rate of 31%), and the final error rate was 5.4%, which was obtained by combining the proposed variance compensation and MLLR adaptation.

18.
Content-based audio signal classification into broad categories such as speech, music, or speech with noise is the first step before any further processing such as speech recognition, content-based indexing, or surveillance systems. In this paper, we propose an efficient content-based audio classification approach to classify audio signals into broad genres using a fuzzy c-means (FCM) algorithm. We analyze different characteristic features of audio signals in the time, frequency, and coefficient domains and select the optimal feature vector by applying a novel analytical scoring method to each feature. We utilize an FCM-based classification scheme and apply it to the extracted normalized optimal feature vector to achieve an efficient classification result. Experimental results demonstrate that the proposed approach outperforms existing state-of-the-art audio classification systems by more than 11% in classification performance.
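A minimal NumPy sketch of the standard fuzzy c-means updates applied to a feature matrix; the synthetic two-dimensional data stands in for the normalized optimal audio feature vectors, and the fuzzifier m = 2 is an assumption.

```python
# Sketch of fuzzy c-means (FCM) clustering applied to normalized audio feature vectors.
# Membership and center updates follow the standard FCM formulas; data is synthetic.
import numpy as np

def fcm(X, n_clusters=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per sample
    centers = None
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]            # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        U_new = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# Toy example: three synthetic "genres" in a 2-D feature space.
X = np.vstack([np.random.default_rng(i).normal(i * 4.0, 1.0, (50, 2)) for i in range(3)])
centers, U = fcm(X, n_clusters=3)
labels = U.argmax(axis=1)             # hard decision: highest membership wins
```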

19.
Isolated-word speech recognition, which uses template matching, is one of the core techniques of speech recognition. First, the user utters each word in the vocabulary once, and its feature vector sequence is stored in the template library as a template. Then the feature vectors of the input speech are compared for similarity with each template in the library, and the template with the highest similarity is output as the recognition result. This paper reviews the current state of isolated-word speech recognition and several common techniques, and discusses its applications and development prospects.
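The template-matching step can be sketched with dynamic time warping (DTW) between the input feature sequence and each stored template, assigning the word whose template has the smallest warped distance. Feature extraction (e.g., MFCC) is omitted, and the plain DTW recursion below is one common illustrative choice rather than the only matching method.

```python
# Sketch: isolated-word recognition by template matching with dynamic time warping (DTW).
# Each template is a stored feature-vector sequence; the closest template wins.
import numpy as np

def dtw_distance(a, b):
    # a: (T1, d), b: (T2, d) feature sequences (e.g. MFCC frames).
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)       # length-normalized path cost

def recognize(features, templates):
    # templates: {word: feature sequence recorded once by the user}
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))
```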

20.
A new in-vehicle speech recognition system is proposed. The product of frame energy and frame zero-crossing rate is used as the indicator quantity for speech endpoint detection, MFCCs serve as the speech feature vectors, and recognition is performed with an HMM-based speech recognition model. A new noise-robust recognition method is also proposed: improved iterative Wiener filtering combined with the PUM model suppresses noise interference well and improves the speech recognition rate.
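A sketch of endpoint detection using the product of frame energy and frame zero-crossing rate as the indicator quantity, as described above; the frame length, hop, and the adaptive threshold are assumptions for illustration.

```python
# Sketch: endpoint detection with the product of frame energy and frame zero-crossing rate
# as the decision quantity; frame sizes and threshold are illustrative.
import numpy as np

def endpoint_detect(signal, frame_len=400, hop=160, thr=None):
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]
    energy = np.array([np.sum(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2.0 for f in frames])
    indicator = energy * zcr                       # combined indicator quantity
    if thr is None:
        thr = 0.1 * indicator.mean()               # crude adaptive threshold (assumption)
    active = np.flatnonzero(indicator > thr)
    if len(active) == 0:
        return None
    return active[0] * hop, active[-1] * hop + frame_len   # sample-level start and end
```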
