Similar Literature
19 similar documents retrieved (search time: 234 ms)
1.
To overcome the low recognition accuracy caused by the shortcomings of traditional speech emotion recognition models, process neural networks are introduced into speech emotion recognition. Pitch, amplitude, and voice-quality parameters are extracted as speech emotion features; wavelet analysis is used for denoising and principal component analysis (PCA) for removing redundancy; a process neural network then recognizes four emotions: anger, happiness, sadness, and surprise. Experimental results show that the process neural network achieves better recognition performance than traditional models.
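The redundancy-removal step described above maps directly onto a standard PCA transform once the pitch, amplitude, and voice-quality statistics are collected. A minimal sketch with scikit-learn, assuming a placeholder feature matrix (the abstract does not specify dimensions, and the process-neuron classifier itself is not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: one row per utterance of pitch, amplitude and voice-quality statistics,
# as named in the abstract; the values here are placeholders.
X = np.random.randn(200, 40)

pca = PCA(n_components=0.95)        # keep components explaining 95% of variance
X_reduced = pca.fit_transform(X)    # redundancy-reduced features for the classifier
print(X.shape, "->", X_reduced.shape)
```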

2.
Emotion feature extraction is the key to accurate speech emotion recognition. Traditional approaches use either a single feature or a simple combination of features: a single feature cannot fully reflect emotional variation in speech, while naive combinations introduce substantial redundancy among features and degrade recognition results. To improve the speech emotion recognition rate, an intelligent recognition method based on the ant colony algorithm is proposed. A weighted combination of recognition accuracy and feature-subset dimensionality serves as the objective function, and the ant colony algorithm searches for the optimal feature subset, eliminating redundant information. Simulation tests on Mandarin and Danish emotional speech corpora show that the improved method not only removes redundant and useless features and reduces feature dimensionality, but also raises the recognition rate, making it an effective intelligent method for speech emotion recognition.
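A toy sketch of the weighted objective and ant-colony search loop described above, assuming scikit-learn and placeholder data; the pheromone update rule and the weight alpha are illustrative, not the paper's exact scheme:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(X, y, mask, alpha=0.9):
    """Weighted objective from the abstract: accuracy vs. subset size."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(SVC(), X[:, mask], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

def ant_colony_select(X, y, n_ants=10, n_iter=20, rho=0.1):
    """Toy ant colony: per-feature pheromone acts as selection probability."""
    d = X.shape[1]
    tau = np.full(d, 0.5)                        # pheromone per feature
    best_mask, best_fit = np.ones(d, bool), -np.inf
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        for _ant in range(n_ants):
            mask = rng.random(d) < tau           # each ant samples a subset
            f = fitness(X, y, mask)
            if f > best_fit:
                best_mask, best_fit = mask, f
        tau = (1 - rho) * tau + rho * best_mask  # reinforce the best subset
    return best_mask

X, y = np.random.randn(120, 30), np.random.randint(0, 4, 120)  # placeholder data
print(ant_colony_select(X, y).sum(), "features kept")
```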

3.
This work studies noise suppression in speech signals, where noise corrupts the transmitted speech. Traditional HHT-based noise suppression methods, such as multiscale filtering and threshold denoising, process all IMF components without separating the useful signal from the noise within them, which limits the denoising effect. To improve denoising, a new threshold method based on energy analysis is proposed: after the noisy signal is decomposed by the Hilbert-Huang transform, the energy distribution of the resulting IMF components is analyzed to separate the useful signal from the noise, and threshold denoising is then applied. Simulations show that noise in the speech signal is suppressed and the speech can be identified accurately; the method is also simpler than wavelet denoising, since it requires no choice of wavelet basis, decomposition depth, or threshold criterion, yet reaches or approaches the performance of wavelet denoising.
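A rough sketch of the energy-analysis idea, assuming the third-party PyEMD package for the empirical mode decomposition; the rule used below for locating the noise/signal boundary among the IMFs (first local minimum of IMF energy) is a common heuristic standing in for the paper's unspecified analysis:

```python
import numpy as np
from PyEMD import EMD          # third-party EMD-signal package, assumed installed

def hht_energy_denoise(x):
    """Threshold only the noise-dominated IMFs, keep the rest untouched."""
    imfs = EMD().emd(x)
    energies = np.array([np.sum(imf ** 2) for imf in imfs])
    k = 1                       # first local minimum of IMF energy marks the
    while k < len(energies) - 1 and energies[k] < energies[k - 1]:
        k += 1                  # noise-to-signal boundary among the IMFs
    sigma = np.std(imfs[0])     # noise level estimated from the first IMF
    thr = sigma * np.sqrt(2 * np.log(x.size))
    head = [np.sign(i) * np.maximum(np.abs(i) - thr, 0) for i in imfs[:k]]
    return np.sum(head + list(imfs[k:]), axis=0)

t = np.linspace(0, 1, 2000)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)
print("residual std:", np.std(x - hht_energy_denoise(x)))
```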

4.
This paper applies wavelet denoising to filter external noise out of speech signals. It first introduces the principle and characteristics of wavelet denoising, then implements the method in MATLAB, computes the signal-to-noise ratio, and builds a GUI that visualizes the denoising effect. The time- and frequency-domain plots in the GUI show that wavelet-based speech denoising effectively removes the noise and recovers the original signal.
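The MATLAB pipeline in the abstract (threshold denoising plus an SNR readout) translates directly; a minimal Python sketch with PyWavelets, using a synthetic stand-in for the speech signal:

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db6", level=4):
    """Soft-threshold denoising with the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745     # robust noise estimate
    thr = sigma * np.sqrt(2 * np.log(x.size))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: x.size]

def snr_db(clean, estimate):
    """Output SNR in dB, the figure the abstract's GUI reports."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 300 * t) * np.hanning(fs)   # stand-in "speech"
noisy = clean + 0.2 * np.random.randn(fs)
print("SNR %.1f dB -> %.1f dB"
      % (snr_db(clean, noisy), snr_db(clean, wavelet_denoise(noisy))))
```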

5.
This work addresses the speech recognition rate. Speech is a non-stationary signal containing substantial noise; most current recognition algorithms rest on linear theory and cannot correctly model the nonlinear dynamics of speech, yielding low accuracy. A hybrid noise-robust recognition model (HMM-SVM) is built by combining a hidden Markov model (HMM) with an SVM. The HMM models the temporal structure of the speech signal and produces output probabilities for the utterance to be recognized; these probabilities are fed to the SVM for training, yielding classification information, and the final recognition decision is made from the HMM-SVM results. Simulation results show that HMM-SVM improves recognition accuracy and, especially at low signal-to-noise ratios, markedly improves system performance.
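A compact sketch of the HMM-SVM coupling described above, assuming the third-party hmmlearn package and placeholder frame features: one HMM is fit per class, and the vector of per-class (length-normalized) log-likelihoods becomes the SVM input. State counts and feature dimensions are assumptions:

```python
import numpy as np
from hmmlearn import hmm               # third-party hmmlearn, assumed installed
from sklearn.svm import SVC

def hmm_svm_train(features, labels, n_classes, n_states=4):
    """Fit one HMM per class; per-class log-likelihoods feed the SVM."""
    models = []
    for c in range(n_classes):
        X = np.vstack([f for f, l in zip(features, labels) if l == c])
        lengths = [len(f) for f, l in zip(features, labels) if l == c]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20)
        m.fit(X, lengths)
        models.append(m)
    scores = np.array([[m.score(f) / len(f) for m in models] for f in features])
    return models, SVC().fit(scores, labels)

# Placeholder data: 60 utterances of 13-dim frame features, 3 classes.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((int(rng.integers(40, 80)), 13)) for _ in range(60)]
labels = rng.integers(0, 3, 60)
models, svm = hmm_svm_train(feats, labels, n_classes=3)
```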

6.
Wavelet thresholding can remove noise from speech, but its output exhibits artifacts such as the pseudo-Gibbs phenomenon. To eliminate them, the translation-invariant wavelet transform is introduced into speech denoising and combined with thresholding. Simulation experiments show that this method improves considerably on plain thresholding and raises the signal-to-noise ratio.
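The translation-invariant transform can be approximated by cycle spinning (denoise circularly shifted copies, unshift, average); PyWavelets also offers a stationary wavelet transform (pywt.swt) for the same purpose. A minimal sketch, with the usual universal threshold as an assumption:

```python
import numpy as np
import pywt

def ti_denoise(x, wavelet="db4", level=4, n_shifts=8):
    """Cycle spinning: threshold-denoise shifted copies, unshift and
    average, which suppresses pseudo-Gibbs oscillations at edges."""
    acc = np.zeros_like(x, dtype=float)
    thr = None
    for s in range(n_shifts):
        coeffs = pywt.wavedec(np.roll(x, s), wavelet, level=level)
        if thr is None:
            sigma = np.median(np.abs(coeffs[-1])) / 0.6745
            thr = sigma * np.sqrt(2 * np.log(x.size))
        coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
        acc += np.roll(pywt.waverec(coeffs, wavelet)[: x.size], -s)
    return acc / n_shifts

# A square wave makes the edge artifacts (and their removal) visible.
x = np.sign(np.sin(2 * np.pi * np.linspace(0, 4, 2048))) \
    + 0.2 * np.random.randn(2048)
print("residual std:", np.std(x - ti_denoise(x)))
```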

7.
This paper introduces a model of the optical-feedback self-mixing interference system, explains the principle of wavelet-based denoising, and analyzes wavelet multiresolution decomposition/reconstruction and nonlinear threshold denoising. Combining the two methods effectively removes the high-frequency noise in the self-mixing interference signal.

8.
刘艳  倪万顺 《计算机应用》2015,35(3):868-871
Front-end noise processing directly affects the accuracy and stability of speech recognition. Since the signal separated by standard wavelet denoising is not an optimal estimate of the original, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. The precision of subband-spectral-entropy endpoint detection is used to distinguish noisy-speech segments from noise-only segments and to update the BWT threshold in real time, so that noise wavelet coefficients are identified accurately and the speech is enhanced. Experiments show that, compared with Wiener filtering, the proposed method raises the signal-to-noise ratio (SNR) by about 8% on average and markedly enhances speech in noisy environments.
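A small sketch of the subband-spectral-entropy detector that drives the threshold update; frame sizes and the decision rule are illustrative assumptions, and the BWT itself is not reproduced here:

```python
import numpy as np

def subband_spectral_entropy(frame, n_subbands=16):
    """Entropy of the frame's energy distribution over subbands:
    low for peaky speech spectra, high for broadband noise."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    e = np.array([b.sum() for b in np.array_split(spec, n_subbands)]) + 1e-12
    p = e / e.sum()
    return float(-np.sum(p * np.log(p)))

def speech_frames(x, frame_len=256, hop=128, ratio=0.9):
    """Flag frames whose entropy drops below a data-driven threshold."""
    ents = np.array([subband_spectral_entropy(x[i:i + frame_len])
                     for i in range(0, len(x) - frame_len, hop)])
    return ents < ratio * np.median(ents)

x = np.random.randn(8000)            # placeholder noisy recording
print(speech_frames(x).sum(), "frames flagged as speech")
```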

9.
Speech denoising and enhancement based on orthogonal wavelet packet decomposition
Enhancing noisy speech is an important research topic in speech signal processing; since noise degrades speech quality, background noise must be suppressed. Exploiting the good time-frequency analysis capability of wavelet packets, which closely model the frequency analysis of the basilar membrane in the human ear, a speech denoising and enhancement algorithm based on orthogonal wavelet packets is proposed. The algorithm first decomposes the noisy speech into different frequency ranges, determines a threshold for each range according to the 3σ rule, applies dynamic thresholding at each level, and finally inverse-transforms the processed coefficients to obtain the enhanced signal. Experiments on noisy speech on the MATLAB platform show that the method suppresses white noise while limiting the loss of speech information, and the output speech is intelligible with good perceived quality.
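A minimal PyWavelets sketch of the decomposition with per-subband 3σ thresholding; applying the 3σ rule directly to each node's samples is a simplification of the paper's dynamic thresholding:

```python
import numpy as np
import pywt

def wp_denoise_3sigma(x, wavelet="db8", maxlevel=4):
    """Orthogonal wavelet-packet split, soft threshold per frequency band."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet,
                            mode="symmetric", maxlevel=maxlevel)
    for node in wp.get_level(maxlevel, order="natural"):
        thr = 3 * np.std(node.data)            # 3-sigma rule per subband
        node.data = pywt.threshold(node.data, thr, mode="soft")
    return wp.reconstruct(update=False)[: len(x)]

x = np.random.randn(4096)                      # placeholder noisy speech
print(wp_denoise_3sigma(x).shape)
```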

10.
Because noise characteristics differ across environments, speech denoising in multiple environments remains a research challenge. A denoising technique based on wavelet spectrogram analysis is proposed. Its distinguishing features are: multiscale analysis of the noisy speech using the multiresolution property of the wavelet transform; separation of speech and noise segments using the column autocorrelation function of the spectrogram; and removal of residual noise in the speech segments via point-continuity detection. Experiments show that the method markedly suppresses wideband noise in a variety of environments.
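A rough sketch of the column-autocorrelation test for separating speech-like and noise-like spectrogram columns, assuming SciPy; the window length and the median-based decision threshold are illustrative choices:

```python
import numpy as np
from scipy.signal import stft

def speechlike_columns(x, fs=8000):
    """Speech columns show strong harmonic (periodic) structure in their
    autocorrelation; broadband-noise columns decay quickly."""
    _, _, Z = stft(x, fs=fs, nperseg=256)
    scores = []
    for col in np.abs(Z).T:
        c = col - col.mean()
        ac = np.correlate(c, c, mode="full")[c.size - 1:]
        ac /= ac[0] + 1e-12
        scores.append(ac[1:20].max())          # strength of periodic structure
    scores = np.array(scores)
    return scores > np.median(scores)          # True = speech-like column

x = np.random.randn(16000)                     # placeholder noisy recording
print(speechlike_columns(x).sum(), "columns flagged as speech-like")
```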

11.
Here, the formation of continuous attractor dynamics in a nonlinear recurrent neural network is used to build a nonlinear speech denoising method for robust phoneme recognition and information retrieval. Attractor dynamics are first formed in the recurrent network by training the clean speech subspace as the continuous attractor; the network is then used to recognize speech corrupted by both stationary and nonstationary noise. In this work, the efficiency of a nonlinear feedforward network is compared to that of the same network with a recurrent connection in its hidden layer. The structure and training of this recurrent connection are designed so that the network learns to denoise the signal step by step, using the properties of the attractors it has formed, alongside phone recognition. With these connections, recognition accuracy improves by 21% for the stationary noise and by 14% for the nonstationary noise at 0 dB SNR, relative to a reference feedforward neural network.

12.
The speech signal carries linguistic information and also paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain that recognition rate on spontaneous speech, so emotion recognition is an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on factors such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are selected using the analysis of variance method. The proposed modular scheme is derived from a comparative study of different features and characteristics of individual emotional states, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, average emotion recognition accuracy improves by 2.2%; moreover, the proposed modular neural-SVM classifier improves recognition accuracy by at least 8% compared to the simulated monolithic classifiers.
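The feature-selection step maps naturally onto an ANOVA F-test; a minimal scikit-learn sketch, with the 78/100 split chosen to mirror the reported "discard 22% of features" and the data as placeholders:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: 300 utterances, 100 features, 6 emotion classes.
X = np.random.randn(300, 100)
y = np.random.randint(0, 6, 300)

# Keep the 78 highest-scoring features (~ discard 22%), then classify.
clf = make_pipeline(SelectKBest(f_classif, k=78), SVC())
clf.fit(X, y)
print(clf.score(X, y))
```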

13.
Multi-modal emotion recognition lacks an explicit mapping between emotional state and audio/image features, so extracting effective emotion information from audio-visual data remains challenging. In addition, noise and data redundancy are not modeled well, so emotion recognition models often suffer from low efficiency. Deep neural networks (DNNs) excel at feature extraction and highly nonlinear feature fusion, and cross-modal noise modeling has great potential for addressing data pollution and redundancy. Motivated by this, our paper proposes a deep weighted fusion method for audio-visual emotion recognition. First, we perform cross-modal noise modeling on the audio and video data, which removes most of the data pollution in the audio channel and the redundancy in the visual channel: the noise modeling is implemented with voice activity detection (VAD), and the visual redundancy is reduced by aligning the speech region across the audio and visual streams. We then extract audio emotion features and visual expression features with two feature extractors. The audio extractor, audio-net, is a 2D CNN that accepts image-like Mel-spectrograms as input; the facial expression extractor, visual-net, is a 3D CNN fed with facial expression image sequences. To train the two convolutional networks efficiently on a small dataset, we adopt transfer learning. Next, we employ a deep belief network (DBN) for highly nonlinear fusion of the multi-modal emotion features, training the feature extractors and the fusion network jointly. Finally, emotion classification is performed by a support vector machine on the output of the fusion network. By combining cross-modal feature fusion, denoising, and redundancy removal, our fusion method shows excellent performance on the selected dataset.
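The image-like input to audio-net can be produced as a log Mel-spectrogram; a minimal sketch with librosa, using a synthetic tone in place of a real utterance (parameters such as n_mels=64 are assumptions, not the paper's settings):

```python
import numpy as np
import librosa

sr = 16000
# Placeholder "utterance": a pure tone standing in for real speech audio.
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)   # image-like input for audio-net
print(log_mel.shape)                             # (64, n_frames)
```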

14.
To address the sharp performance drop of most speech recognition systems in noisy environments, a new feature extraction method is proposed. Built on an auditory model, it obtains frequency information by combining the rising zero-crossing rates of the speech signal and of its differenced signal, and obtains intensity information via peak detection and nonlinear amplitude weighting; the two are combined into the output speech features, which are then trained and recognized with a BP neural network and an HMM, respectively. Speaker-independent recognition of 50 words is simulated at different signal-to-noise ratios, and the results show that the zero-crossing and peak-amplitude features with combined difference information are strongly noise robust.
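A small sketch of the frame-level features: rising zero-crossing counts of the signal and of its first difference (frequency cue) plus a nonlinearly weighted peak amplitude (intensity cue); the log weighting below is a stand-in for the abstract's unspecified auditory weighting:

```python
import numpy as np
from scipy.signal import find_peaks

def rising_zcr(frame):
    """Count rising zero crossings (negative to non-negative) in a frame."""
    s = np.signbit(frame)
    return int(np.count_nonzero(s[:-1] & ~s[1:]))

def frame_features(x, frame_len=256, hop=128):
    """Per frame: ZCR of signal, ZCR of its difference, weighted peak amplitude."""
    feats = []
    for i in range(0, len(x) - frame_len, hop):
        f = x[i:i + frame_len]
        peaks, _ = find_peaks(np.abs(f))
        amp = np.log1p(np.abs(f[peaks]).max()) if peaks.size else 0.0
        feats.append([rising_zcr(f), rising_zcr(np.diff(f)), amp])
    return np.array(feats)

x = np.random.randn(8000)                 # placeholder noisy speech
print(frame_features(x).shape)            # (n_frames, 3)
```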

15.
In speech emotion recognition research, most existing deep learning methods do not model features in both the time and frequency domains, and they suffer from long training times and limited accuracy. The spectrogram is a special image, derived from the speech signal, that spans both domains. To fully extract its time-frequency emotion features, a speech emotion recognition model based on parameter transfer and a convolutional recurrent neural network is proposed. The model takes the spectrogram as network input, adopts the AlexNet architecture and transfers its pretrained convolutional-layer weights, reshapes the feature maps output by the CNN, and feeds them into an LSTM (Long Short-Term Memory) network for training. Experimental results show that the proposed method speeds up network training and improves emotion recognition accuracy.
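A PyTorch sketch of the architecture described: transferred AlexNet convolutional layers over the spectrogram, with the feature maps reshaped into a time sequence for an LSTM. The reshaping convention and layer sizes are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class AlexNetLSTM(nn.Module):
    """Pretrained AlexNet conv layers as feature extractor, then an LSTM
    over the time axis of the resulting feature maps."""
    def __init__(self, n_classes=6, hidden=128):
        super().__init__()
        alex = models.alexnet(weights="DEFAULT")   # pretrained conv weights
        self.features = alex.features              # transferred parameters
        self.lstm = nn.LSTM(input_size=256 * 6, hidden_size=hidden,
                            batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, spec):                       # spec: (B, 3, 224, 224)
        fmap = self.features(spec)                 # (B, 256, 6, 6)
        seq = fmap.permute(0, 3, 1, 2).flatten(2)  # width axis -> time sequence
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1])                 # classify from last step

model = AlexNetLSTM()
print(model(torch.randn(2, 3, 224, 224)).shape)    # (2, 6)
```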

16.
This paper explores excitation source features of the speech production mechanism for characterizing and recognizing emotions from the speech signal. The excitation source signal is obtained from speech using linear prediction (LP) analysis and is known as the LP residual. The glottal volume velocity (GVV) signal, derived from the LP residual, is also used to represent the excitation source. The speech signal has a high signal-to-noise ratio around the instants of glottal closure (GC), also known as epochs. The following excitation source features are proposed for characterizing and recognizing emotions: the sequence of LP residual samples and their phase information, epoch parameters and their dynamics at the syllable and utterance levels, and samples of the GVV signal and its parameters. Auto-associative neural networks (AANN) and support vector machines (SVM) are used to build the emotion recognition models, which are evaluated on Telugu and Berlin emotional speech corpora for six emotions: anger, disgust, fear, happiness, neutral, and sadness. Average emotion recognition performance of about 42% to 63% is observed with the different excitation source features; furthermore, combining excitation source and spectral features has been shown to improve performance to as much as 84%.
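Obtaining the LP residual is a single inverse-filtering step once the LP coefficients are estimated; a minimal sketch with librosa and SciPy, using a synthetic tone in place of speech (the epoch picking here is deliberately crude and only illustrative):

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def lp_residual(y, order=12):
    """Inverse-filter speech with its LP coefficients; the residual
    (excitation estimate) is large near glottal closure instants."""
    a = librosa.lpc(y, order=order)        # a[0] == 1
    return lfilter(a, [1.0], y)

sr = 16000
y = np.sin(2 * np.pi * 150 * np.arange(sr) / sr)   # placeholder for speech
r = lp_residual(y)
epochs = np.argsort(np.abs(r))[-10:]               # crude epoch candidates
print(len(r), sorted(epochs)[:3])
```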

17.
To handle the overlap between adjacent frames of the speech signal and improve its adaptability, this paper proposes a speech recognition system based on a hidden Markov model (HMM) combined with a genetic-algorithm neural network. The method uses a wavelet neural network to train Mel-frequency cepstral coefficients (MFCC), then models the temporal structure of the speech with an HMM and computes the HMM output-probability score of the utterance, which serves as the input to the genetic neural network, yielding the classification result. Experimental results show that the improved system is more robust to noise than a plain HMM and improves overall recognition performance.

18.
To address the shortcomings of deep learning algorithms in speech emotion feature extraction and their limited recognition accuracy, this paper extracts effective emotion features from speech data and concatenates them across multiple scales to construct speech emotion features, improving the representational power of the deep model. Since conventional recurrent neural networks cannot handle the long-term dependencies in speech emotion recognition, a two-layer LSTM is adopted, and a model combining multi-scale convolution with the two-layer LSTM is proposed. Experimental results on the CASIA Chinese emotion corpus of the Institute of Automation, Chinese Academy of Sciences, and on the public Berlin emotion dataset (Emo-DB) show that the proposed model achieves considerably higher accuracy than other emotion recognition models.
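A PyTorch sketch of the multi-scale concatenation feeding a two-layer LSTM; the kernel sizes, channel counts, and 40-dimensional frame features are assumptions, not the paper's settings:

```python
import torch
import torch.nn as nn

class MultiScaleConvLSTM(nn.Module):
    """Parallel 1-D convolutions at several kernel sizes, concatenated
    (multi-scale fusion), feeding a two-layer LSTM classifier."""
    def __init__(self, n_feats=40, n_classes=6, hidden=128):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(n_feats, 64, k, padding=k // 2) for k in (3, 5, 7)])
        self.lstm = nn.LSTM(64 * 3, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (B, T, n_feats)
        z = x.transpose(1, 2)              # (B, n_feats, T) for Conv1d
        z = torch.cat([torch.relu(b(z)) for b in self.branches], dim=1)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.fc(out[:, -1])         # classify from the last time step

print(MultiScaleConvLSTM()(torch.randn(4, 100, 40)).shape)   # (4, 6)
```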

19.
To address the many irrelevant features and limited accuracy of existing speech emotion recognition, a method based on a mixed-distribution attention mechanism and a hybrid neural network is proposed. In two channels, a convolutional neural network and a bidirectional long short-term memory network extract the spatial and temporal features of the speech, respectively, and the outputs of the two networks together form the input matrices of a multi-head attention mechanism. Considering the low-rank distribution problem of existing multi-head attention, the attention computation is modified: the low-rank distribution is superimposed, as a mixed distribution, on the similarity of the output features of the two networks; after normalization, the results of all subspaces are concatenated, and a fully connected layer produces the classification output. Experimental results show that the method achieves higher accuracy than existing approaches, verifying its effectiveness.
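A PyTorch sketch of the two-channel design with multi-head attention over the CNN and BiLSTM outputs; the paper's mixed-distribution re-weighting is simplified here to standard scaled dot-product attention, so the sketch shows only the data flow:

```python
import torch
import torch.nn as nn

class DualChannelAttention(nn.Module):
    """CNN channel for spatial features, BiLSTM channel for temporal ones;
    multi-head attention fuses the two before classification."""
    def __init__(self, n_feats=40, d=128, n_classes=6):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_feats, d, 5, padding=2), nn.ReLU())
        self.rnn = nn.LSTM(n_feats, d // 2, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.fc = nn.Linear(d, n_classes)

    def forward(self, x):                                 # x: (B, T, n_feats)
        c = self.cnn(x.transpose(1, 2)).transpose(1, 2)   # (B, T, d) spatial
        r, _ = self.rnn(x)                                # (B, T, d) temporal
        fused, _ = self.attn(query=c, key=r, value=r)     # cross-channel attention
        return self.fc(fused.mean(dim=1))                 # pooled classification

print(DualChannelAttention()(torch.randn(4, 100, 40)).shape)  # (4, 6)
```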
