Similar Literature
20 similar documents found.
1.
To make human–computer interaction more natural and friendly, computers must be able to understand human affective states the way humans do. People use many modalities, such as the face, body gestures, and speech, to express their feelings. In this study, we simulate human perception of emotion by combining emotion-related information from facial expressions and speech. The speech emotion recognition system is based on prosody features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound); facial expression recognition is based on the integrated time motion image and the quantized image matrix, which can be seen as extensions of temporal templates. Experimental results showed that using hybrid features with decision-level fusion improves on the unimodal systems, raising the recognition rate by about 15% relative to the speech unimodal system and by about 30% relative to the facial expression system. With the proposed multi-classifier system, an improved hybrid system, the recognition rate increases by up to 7.5% over hybrid features and decision-level fusion with RBF, up to 22.7% over the speech-based system, and up to 38% over the facial expression-based system.
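As a rough illustration of the decision-level fusion step this abstract describes, the sketch below combines per-class posteriors from two unimodal classifiers with a weighted sum. The weights, class labels, and probability values are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of decision-level fusion for audio-visual emotion
# recognition, assuming two pre-trained unimodal classifiers that each
# output per-class posteriors. Weights and classes are assumptions.
import numpy as np

def decision_level_fusion(p_speech, p_face, w_speech=0.5, w_face=0.5):
    """Combine per-class posteriors from the two unimodal classifiers."""
    fused = w_speech * np.asarray(p_speech) + w_face * np.asarray(p_face)
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize

# Example: posteriors over (angry, happy, sad) from each modality.
p_speech = np.array([0.6, 0.3, 0.1])
p_face = np.array([0.2, 0.7, 0.1])
print(decision_level_fusion(p_speech, p_face))  # fused class posteriors
```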

2.
Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch, and mel-frequency cepstral coefficients. However, the performance of the resulting system is suboptimal because of its computational complexity, which stems from the high-dimensional, correlated feature set produced by fusion. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature-subset selection techniques [sequential forward selection (SFS) and sequential floating forward selection (SFFS)] are used to eliminate the curse-of-dimensionality problem caused by the high-dimensional fused feature vector. Finally, the emotions are classified by several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machines (SVM), and k-nearest neighbors (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained with SFS and SFFS individually. The results reveal that SFFS is the better feature-subset selection method, because SFS suffers from a nesting problem: once a feature is retained, it is difficult to discard. SFFS eliminates this problem by not fixing the set at any stage, letting features float in and out during selection according to the objective function. Experimental results showed that the classifier's efficiency improves by 15–20% with two-stage feature selection compared with feature fusion alone.
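The sketch below illustrates the second-stage subset selection with plain SFS; it assumes the fused features are already in a matrix, and uses random stand-in data. scikit-learn implements greedy SFS; the floating variant (SFFS) is available elsewhere (e.g. mlxtend) and is not shown here.

```python
# A minimal sketch of wrapper-based sequential forward selection (SFS)
# over a fused acoustic feature matrix. Data sizes are stand-ins.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))        # stand-in for fused acoustic features
y = rng.integers(0, 4, size=200)      # stand-in for emotion labels

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),  # wrapper objective: KNN accuracy
    n_features_to_select=10,
    direction="forward",                  # greedy SFS; kept features never leave
)
sfs.fit(X, y)
print(sfs.get_support(indices=True))      # indices of the retained features
```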

3.

Nowadays, automatic speech emotion recognition has numerous applications. One of the important steps of these systems is the feature selection step. Because it is not known which acoustic features of person’s speech are related to speech emotion, much effort has been made to introduce several acoustic features. However, since employing all of these features will lower the learning efficiency of classifiers, it is necessary to select some features. Moreover, when there are several speakers, choosing speaker-independent features is required. For this reason, the present paper attempts to select features which are not only related to the emotion of speech, but are also speaker-independent. For this purpose, the current study proposes a multi-task approach which selects the proper speaker-independent features for each pair of classes. The selected features are then given to the classifier. Finally, the outputs of the classifiers are appropriately combined to achieve an output of a multi-class problem. Simulation results reveal that the proposed approach outperforms other methods and offers higher efficiency in terms of detection accuracy and runtime.
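A rough sketch of such a pairwise (one-vs-one) scheme follows: each class pair gets its own classifier trained on its own feature subset, and a majority vote combines the pairwise decisions. The per-pair subsets, classifier choice, and voting rule here are illustrative assumptions, not the paper's method.

```python
# A minimal sketch of one-vs-one classification with a per-pair feature
# subset; data, subsets, and the SVM classifier are stand-ins.
import itertools
import numpy as np
from sklearn.svm import SVC

def train_pairwise(X, y, pair_features):
    """Train one binary SVM per class pair on that pair's feature subset."""
    models = {}
    for a, b in itertools.combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        feats = pair_features[(a, b)]          # speaker-independent subset
        models[(a, b)] = SVC().fit(X[mask][:, feats], y[mask])
    return models

def predict_pairwise(models, pair_features, x):
    """Majority vote over all pairwise decisions for one sample x."""
    votes = [m.predict(x[pair_features[pair]].reshape(1, -1))[0]
             for pair, m in models.items()]
    return max(set(votes), key=votes.count)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = rng.integers(0, 3, size=120)
# Hypothetical per-pair subsets (the paper selects these per task).
pairs = {(0, 1): [0, 3, 7], (0, 2): [1, 4, 9], (1, 2): [2, 5, 8]}
models = train_pairwise(X, y, pairs)
print(predict_pairwise(models, pairs, X[0]))   # predicted class label
```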


4.
To improve speech emotion recognition accuracy, a two-stage feature selection method is applied to a multi-dimensional feature set built from basic acoustic features, jointly considering the intrinsic relationship between feature parameters and emotion classes so as to build an optimized feature subset with effective emotional discriminability. In the recognition stage, a binary-tree multi-classifier is designed to balance overall system performance against complexity, and a kernel-fusion method is used to improve the SVM model: a multi-kernel SVM recognizes the most easily confused emotions. The algorithm is validated on samples of five emotional states from the Berlin emotional speech database. Experimental results show that combining two-stage feature selection with kernel fusion effectively improves emotion recognition accuracy while providing some robustness to noise.
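The sketch below illustrates one common form of kernel fusion for SVMs: a weighted sum of an RBF and a linear kernel fed to an SVM as a precomputed Gram matrix. The weight, gamma, and data are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of a multi-kernel (fused-kernel) SVM; a weighted sum
# of valid kernels is itself a valid kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def fused_kernel(A, B, w=0.7, gamma=0.1):
    """Weighted combination of RBF and linear kernels."""
    return w * rbf_kernel(A, B, gamma=gamma) + (1 - w) * linear_kernel(A, B)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))    # stand-in acoustic feature vectors
y_train = rng.integers(0, 2, size=100)  # one binary node of the decision tree

clf = SVC(kernel="precomputed")
clf.fit(fused_kernel(X_train, X_train), y_train)

X_test = rng.normal(size=(5, 20))
print(clf.predict(fused_kernel(X_test, X_train)))  # test-vs-train Gram matrix
```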

5.
6.
Gao Qiang, Wang Chu-han, Wang Zhe, Song Xiao-lin, Dong En-zeng, Song Yu. Multimedia Tools and Applications, 2020, 79(37-38): 27057-27074
As a high-level function of the human brain, emotion is the external manifestation of people's psychological characteristics. The emotion has a great...

7.
Spectro-temporal representation of speech has become one of the leading signal-representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of its feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied in the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the clusters' centroid vectors and covariance matrices are taken as the attributes of each frame's secondary feature vector. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on classification of phonemes into the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different phoneme sets compared with MFCC features: the average improvement for voiced plosives is 5.9% using WKM clustering and 6.4% using GMM clustering, and the greatest improvement, about 7.4%, is obtained with WKM clustering on front vowels.
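A rough sketch of the GMM variant of these secondary features follows: fit a small GMM to a frame's spectro-temporal points and concatenate the component means and covariance diagonals into one compact vector. The sizes and the diagonal-covariance choice are assumptions for illustration.

```python
# A minimal sketch of clustering-based secondary feature extraction.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_secondary_features(frame_points, n_components=3):
    """frame_points: (n_points, n_dims) spectro-temporal samples of a frame."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    gmm.fit(frame_points)
    # Concatenate centroids and covariance diagonals into one vector.
    return np.concatenate([gmm.means_.ravel(), gmm.covariances_.ravel()])

rng = np.random.default_rng(0)
frame = rng.normal(size=(500, 2))            # stand-in (rate, scale) points
print(gmm_secondary_features(frame).shape)   # (3*2 + 3*2,) = (12,)
```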

8.
9.
Multimedia Tools and Applications - Humans use many modalities, such as the face, speech, and body gestures, to express their feelings. So, to make emotional computers and make the human-computer...

10.
Speech emotion recognition is one of the emerging research topics of recent years. The extraction of feature parameters directly affects the final recognition efficiency, and feature dimensionality reduction can extract the parameters that best discriminate between emotions. This paper highlights the importance of feature parameters in speech emotion recognition, introduces the basic components of a speech emotion recognition system, surveys the current state of research on feature parameters, describes the dimensionality-reduction methods commonly applied to emotion recognition, and analyzes and compares them. Possible future directions of speech emotion recognition are also discussed.
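As a small illustration of one dimensionality-reduction step such surveys commonly cover, the sketch below projects a high-dimensional acoustic feature matrix onto its principal components; the sizes are stand-ins.

```python
# A minimal PCA sketch for reducing a fused acoustic feature set.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))   # stand-in: 300 utterances, 120-dim features
pca = PCA(n_components=20)        # keep the 20 highest-variance directions
X_low = pca.fit_transform(X)
print(X_low.shape, pca.explained_variance_ratio_.sum())
```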

11.
12.
The accuracy of speech emotion recognition depends largely on how well the features separate different emotions. Starting from an analysis of the time-frequency characteristics of speech and drawing on the selective-attention mechanism of human hearing, this paper proposes a speech emotion recognition algorithm based on spectrogram features. The algorithm first simulates the ear's selective attention by segmenting the emotional spectrogram in the time and frequency domains to form a speech-emotion saliency map. Based on this map, Hu invariant-moment features, texture features, and selected spectrogram features are adopted as the main features for emotion recognition. Finally, emotions are recognized with a support vector machine. Recognition experiments on a speech emotion database show that the proposed algorithm achieves a high recognition rate and good robustness, most notably for the practically important emotion of irritation. In addition, principal-vector analysis across the emotion features shows that the selected features are highly discriminative and practical.
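The sketch below shows how Hu invariant moments can be extracted from a spectrogram image with OpenCV; the random image stands in for the paper's saliency-masked spectrogram, and the log-scaling step is a customary choice, not necessarily the paper's.

```python
# A minimal sketch of Hu-moment extraction from a spectrogram image.
import cv2
import numpy as np

spec = (np.random.default_rng(0).random((128, 256)) * 255).astype(np.uint8)
moments = cv2.moments(spec)          # raw image moments of the spectrogram
hu = cv2.HuMoments(moments).ravel()  # 7 rotation/scale-invariant moments
# Log-scale the moments to compress their large dynamic range.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```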

13.
To address the problem that a single speech feature expresses emotion incompletely, this paper fuses the LSF parameters, which have good quantization and interpolation properties, with the MFCC parameters, which reflect the characteristics of human hearing, and proposes a new line-spectral-weighted MFCC (WMFCC) feature. A Gaussian mixture model is then built over this parameter's model space to obtain GW-MFCC model-space parameters, capturing higher-dimensional detail and further improving recognition performance. Validated on the Berlin emotional speech corpus, the new parameter improves the recognition rate by 5.7% and 6.9% over traditional MFCC and LSF, respectively. The experimental results show that the proposed WMFCC and GW-MFCC parameters effectively capture speech emotion information and raise the speech emotion recognition rate.
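A rough sketch of deriving model-space parameters follows: fit a GMM to an utterance's per-frame features and stack the component means into one higher-dimensional vector (a GMM "supervector"). Using plain MFCC-like frames in place of WMFCCs, and all sizes, are assumptions for illustration.

```python
# A minimal sketch of GMM model-space (supervector) features.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(frames, n_components=8):
    """frames: (n_frames, n_dims) feature matrix of one utterance."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0).fit(frames)
    return gmm.means_.ravel()                  # (n_components * n_dims,)

rng = np.random.default_rng(0)
mfcc_frames = rng.normal(size=(400, 13))       # stand-in for WMFCC frames
print(gmm_supervector(mfcc_frames).shape)      # (104,)
```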

14.
Multimedia Tools and Applications - Research in emotion recognition seeks to develop insights into the variances of features of emotion in one common domain. However, automatic emotion recognition...

15.
To address the problems of many irrelevant features and poor accuracy in existing speech emotion recognition, a method based on a mixed-distribution attention mechanism and a hybrid neural network is proposed. In two channels, a convolutional neural network and a bidirectional long short-term memory network extract the spatial and temporal features of speech, respectively, and the outputs of the two networks together form the input matrices of a multi-head attention mechanism. Considering the low-rank distribution problem of existing multi-head attention, the attention computation is improved: a low-rank distribution is superimposed, as a mixed distribution, on the similarity of the two networks' output features; after normalization, the results of all subspaces are concatenated; and a fully connected layer produces the classification output. Experimental results show that this method achieves higher accuracy than other existing methods, validating its effectiveness.
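A rough PyTorch sketch of the two-channel backbone follows: a CNN channel for spatial features, a BiLSTM channel for temporal features, and multi-head attention over their combined outputs. All sizes are assumptions, and plain softmax attention is used in place of the paper's mixed low-rank distribution.

```python
# A minimal sketch of a CNN + BiLSTM two-channel model with multi-head
# attention for speech emotion classification.
import torch
import torch.nn as nn

class TwoChannelSER(nn.Module):
    def __init__(self, n_mels=40, hidden=64, heads=4, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(                  # spatial channel
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.blstm = nn.LSTM(n_mels, hidden // 2,  # temporal channel
                             bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, n_mels)
        c = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, hidden)
        t, _ = self.blstm(x)                             # (batch, time, hidden)
        a, _ = self.attn(c, t, t)   # CNN output queries the BiLSTM output
        return self.fc(a.mean(dim=1))              # pool over time, classify

model = TwoChannelSER()
logits = model(torch.randn(8, 100, 40))            # 8 utterances, 100 frames
print(logits.shape)                                # torch.Size([8, 4])
```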

16.
International Journal of Speech Technology - Speech emotion recognition is one of the fastest growing areas of interest in the field of affective computing. Emotion detection aids...

17.
Histogram equalization (HEQ) is one of the most efficient and effective techniques for reducing the mismatch between training and test acoustic conditions. However, most current HEQ methods operate merely dimension-wise, without allowing for the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial-temporal feature distribution characteristics for speech feature normalization. Automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task, and the performance of the presented approaches was thoroughly tested and verified by comparison with other popular HEQ methods. The experimental results show that, for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system and give competitive performance relative to the other HEQ methods compared in this paper.
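The sketch below shows the conventional dimension-wise HEQ baseline this paper extends: each feature dimension's empirical CDF is mapped onto a reference Gaussian CDF. The standard-normal reference and the data are illustrative assumptions.

```python
# A minimal sketch of dimension-wise histogram equalization (HEQ).
import numpy as np
from scipy.stats import norm

def heq_dimensionwise(X):
    """X: (n_frames, n_dims). Map each dimension to a standard normal."""
    n = X.shape[0]
    out = np.empty_like(X, dtype=float)
    for d in range(X.shape[1]):
        ranks = np.argsort(np.argsort(X[:, d]))   # 0..n-1 rank of each frame
        cdf = (ranks + 0.5) / n                   # empirical CDF values
        out[:, d] = norm.ppf(cdf)                 # inverse Gaussian CDF
    return out

X = np.random.default_rng(0).gamma(2.0, size=(1000, 13))  # skewed features
print(heq_dimensionwise(X).mean(axis=0).round(2))         # ~0 after HEQ
```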

18.
Advances in Speech Emotion Recognition
This paper first introduces the components of a speech emotion recognition system, then surveys the state of research on emotional features and recognition algorithms: it analyzes the main speech emotion features, describes representative speech emotion recognition algorithms and hybrid models, and compares them. Finally, possible future directions of speech emotion recognition technology are pointed out.

19.
20.
Speech emotion recognition has been one of the interesting issues in speech processing over the last few decades. Modelling the emotion recognition process serves both to understand and to assess the performance of the system. This paper compares two models for speech emotion recognition using vocal tract features, namely the first four formants and their respective bandwidths. The first model is based on a decision tree; the second employs logistic regression. Whereas decision tree models are based on machine learning, regression models have a strong statistical basis. The logistic regression models and decision tree models developed in this work for several binary classification cases were validated by speech emotion recognition experiments on a Malayalam emotional speech database of 2800 speech files collected from ten speakers. The models are not only simple but also meaningful, since they indicate the contribution of each predictor. The experimental results indicate that speech emotion recognition using formants and bandwidths was better modelled by decision trees, which gave higher recognition accuracies than logistic regression. The highest accuracy obtained with a decision tree was 93.63%, for classifying positive-valence emotional speech as surprised or happy using seven features; for the same binary classification, the highest accuracy obtained with logistic regression was 73%, with eight features.
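A rough sketch of the comparison follows: cross-validated accuracy of a decision tree versus logistic regression on an 8-column matrix of four formants and their bandwidths. The random data, tree depth, and fold count are stand-ins, not the paper's setup.

```python
# A minimal sketch comparing the two model families on a binary
# emotion task over formant/bandwidth features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))        # F1-F4 and their bandwidths, per file
y = rng.integers(0, 2, size=400)     # e.g. surprised (0) vs. happy (1)

for model in (DecisionTreeClassifier(max_depth=5, random_state=0),
              LogisticRegression(max_iter=1000)):
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(type(model).__name__, round(acc, 3))
```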

