Similar Literature
20 similar documents found.
1.
To make human–computer interaction more natural and friendly, computers must be able to understand human affective states the way humans do. People use many modalities, such as the face, body gestures, and speech, to express their feelings. In this study, we simulate human perception of emotion by combining emotion-related information from facial expressions and speech. The speech emotion recognition system is based on prosody features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound); facial expression recognition is based on the integrated time motion image and the quantized image matrix, which can be seen as extensions of temporal templates. Experimental results showed that using hybrid features with decision-level fusion improves on the unimodal systems, raising the recognition rate by about 15% relative to the speech unimodal system and by about 30% relative to the facial expression system. With the proposed multi-classifier system, an improved hybrid system, the recognition rate increases by up to 7.5% over hybrid features and decision-level fusion with RBF, up to 22.7% over the speech-based system, and up to 38% over the facial expression-based system.
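As a rough illustration of the decision-level fusion step this abstract describes, the sketch below combines per-class posteriors from two unimodal classifiers with a weighted sum. The weights, class labels, and probability values are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of decision-level fusion for audio-visual emotion
# recognition, assuming two pre-trained unimodal classifiers that each
# output per-class posteriors. Weights and classes are assumptions.
import numpy as np

def decision_level_fusion(p_speech, p_face, w_speech=0.5, w_face=0.5):
    """Combine per-class posteriors from the two unimodal classifiers."""
    fused = w_speech * np.asarray(p_speech) + w_face * np.asarray(p_face)
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize

# Example: posteriors over (angry, happy, sad) from each modality.
p_speech = np.array([0.6, 0.3, 0.1])
p_face = np.array([0.2, 0.7, 0.1])
print(decision_level_fusion(p_speech, p_face))  # fused class posteriors
```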

2.
Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch, and mel-frequency cepstral coefficients. However, the performance of the resulting system is suboptimal because of its computational complexity, which stems from the high-dimensional, correlated feature set produced by fusion. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature-subset selection techniques [sequential forward selection (SFS) and sequential floating forward selection (SFFS)] are used to eliminate the curse-of-dimensionality problem caused by the high-dimensional fused feature vector. Finally, the emotions are classified by several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machines (SVM), and k-nearest neighbors (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained with SFS and SFFS individually. The results reveal that SFFS is the better feature-subset selection method, because SFS suffers from a nesting problem: once a feature is retained, it is difficult to discard. SFFS eliminates this problem by not fixing the set at any stage, letting features float in and out during selection according to the objective function. Experimental results showed that the classifier's efficiency improves by 15–20% with two-stage feature selection compared with feature fusion alone.
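The sketch below illustrates the second-stage subset selection with plain SFS; it assumes the fused features are already in a matrix, and uses random stand-in data. scikit-learn implements greedy SFS; the floating variant (SFFS) is available elsewhere (e.g. mlxtend) and is not shown here.

```python
# A minimal sketch of wrapper-based sequential forward selection (SFS)
# over a fused acoustic feature matrix. Data sizes are stand-ins.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))        # stand-in for fused acoustic features
y = rng.integers(0, 4, size=200)      # stand-in for emotion labels

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),  # wrapper objective: KNN accuracy
    n_features_to_select=10,
    direction="forward",                  # greedy SFS; kept features never leave
)
sfs.fit(X, y)
print(sfs.get_support(indices=True))      # indices of the retained features
```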

3.

Nowadays, automatic speech emotion recognition has numerous applications. One of the important steps of these systems is the feature selection step. Because it is not known which acoustic features of person’s speech are related to speech emotion, much effort has been made to introduce several acoustic features. However, since employing all of these features will lower the learning efficiency of classifiers, it is necessary to select some features. Moreover, when there are several speakers, choosing speaker-independent features is required. For this reason, the present paper attempts to select features which are not only related to the emotion of speech, but are also speaker-independent. For this purpose, the current study proposes a multi-task approach which selects the proper speaker-independent features for each pair of classes. The selected features are then given to the classifier. Finally, the outputs of the classifiers are appropriately combined to achieve an output of a multi-class problem. Simulation results reveal that the proposed approach outperforms other methods and offers higher efficiency in terms of detection accuracy and runtime.
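A rough sketch of such a pairwise (one-vs-one) scheme follows: each class pair gets its own classifier trained on its own feature subset, and a majority vote combines the pairwise decisions. The per-pair subsets, classifier choice, and voting rule here are illustrative assumptions, not the paper's method.

```python
# A minimal sketch of one-vs-one classification with a per-pair feature
# subset; data, subsets, and the SVM classifier are stand-ins.
import itertools
import numpy as np
from sklearn.svm import SVC

def train_pairwise(X, y, pair_features):
    """Train one binary SVM per class pair on that pair's feature subset."""
    models = {}
    for a, b in itertools.combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        feats = pair_features[(a, b)]          # speaker-independent subset
        models[(a, b)] = SVC().fit(X[mask][:, feats], y[mask])
    return models

def predict_pairwise(models, pair_features, x):
    """Majority vote over all pairwise decisions for one sample x."""
    votes = [m.predict(x[pair_features[pair]].reshape(1, -1))[0]
             for pair, m in models.items()]
    return max(set(votes), key=votes.count)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = rng.integers(0, 3, size=120)
# Hypothetical per-pair subsets (the paper selects these per task).
pairs = {(0, 1): [0, 3, 7], (0, 2): [1, 4, 9], (1, 2): [2, 5, 8]}
models = train_pairwise(X, y, pairs)
print(predict_pairwise(models, pairs, X[0]))   # predicted class label
```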


4.
To improve speech emotion recognition accuracy, a two-stage feature selection method is applied to a multi-dimensional feature set built from basic acoustic features, jointly considering the intrinsic relationship between feature parameters and emotion classes so as to build an optimized feature subset with effective emotional discriminability. In the recognition stage, a binary-tree multi-classifier is designed to balance overall system performance against complexity, and a kernel-fusion method is used to improve the SVM model: a multi-kernel SVM recognizes the most easily confused emotions. The algorithm is validated on samples of five emotional states from the Berlin emotional speech database. Experimental results show that combining two-stage feature selection with kernel fusion effectively improves emotion recognition accuracy while providing some robustness to noise.
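The sketch below illustrates one common form of kernel fusion for SVMs: a weighted sum of an RBF and a linear kernel fed to an SVM as a precomputed Gram matrix. The weight, gamma, and data are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of a multi-kernel (fused-kernel) SVM; a weighted sum
# of valid kernels is itself a valid kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def fused_kernel(A, B, w=0.7, gamma=0.1):
    """Weighted combination of RBF and linear kernels."""
    return w * rbf_kernel(A, B, gamma=gamma) + (1 - w) * linear_kernel(A, B)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))    # stand-in acoustic feature vectors
y_train = rng.integers(0, 2, size=100)  # one binary node of the decision tree

clf = SVC(kernel="precomputed")
clf.fit(fused_kernel(X_train, X_train), y_train)

X_test = rng.normal(size=(5, 20))
print(clf.predict(fused_kernel(X_test, X_train)))  # test-vs-train Gram matrix
```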

5.
6.
Gao Qiang, Wang Chu-han, Wang Zhe, Song Xiao-lin, Dong En-zeng, Song Yu. Multimedia Tools and Applications, 2020, 79(37-38): 27057-27074
As a high-level function of the human brain, emotion is the external manifestation of people's psychological characteristics. The emotion has a great...

7.
Spectro-temporal representation of speech has become one of the leading signal-representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of its feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied in the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the clusters' centroid vectors and covariance matrices are taken as the attributes of each frame's secondary feature vector. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on classification of phonemes into the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different phoneme sets compared with MFCC features: the average improvement for voiced plosives is 5.9% using WKM clustering and 6.4% using GMM clustering, and the greatest improvement, about 7.4%, is obtained with WKM clustering on front vowels.
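A rough sketch of the GMM variant of these secondary features follows: fit a small GMM to a frame's spectro-temporal points and concatenate the component means and covariance diagonals into one compact vector. The sizes and the diagonal-covariance choice are assumptions for illustration.

```python
# A minimal sketch of clustering-based secondary feature extraction.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_secondary_features(frame_points, n_components=3):
    """frame_points: (n_points, n_dims) spectro-temporal samples of a frame."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    gmm.fit(frame_points)
    # Concatenate centroids and covariance diagonals into one vector.
    return np.concatenate([gmm.means_.ravel(), gmm.covariances_.ravel()])

rng = np.random.default_rng(0)
frame = rng.normal(size=(500, 2))            # stand-in (rate, scale) points
print(gmm_secondary_features(frame).shape)   # (3*2 + 3*2,) = (12,)
```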

8.
9.
Multimedia Tools and Applications - Humans use many modalities, such as the face, speech, and body gestures, to express their feelings. So, to make emotional computers and make the human-computer...

10.
Speech emotion recognition is one of the emerging research topics of recent years. The extraction of feature parameters directly affects the final recognition efficiency, and feature dimensionality reduction can extract the parameters that best discriminate between emotions. This paper highlights the importance of feature parameters in speech emotion recognition, introduces the basic components of a speech emotion recognition system, surveys the current state of research on feature parameters, describes the dimensionality-reduction methods commonly applied to emotion recognition, and analyzes and compares them. Possible future directions of speech emotion recognition are also discussed.
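As a small illustration of one dimensionality-reduction step such surveys commonly cover, the sketch below projects a high-dimensional acoustic feature matrix onto its principal components; the sizes are stand-ins.

```python
# A minimal PCA sketch for reducing a fused acoustic feature set.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))   # stand-in: 300 utterances, 120-dim features
pca = PCA(n_components=20)        # keep the 20 highest-variance directions
X_low = pca.fit_transform(X)
print(X_low.shape, pca.explained_variance_ratio_.sum())
```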

11.
12.
The accuracy of speech emotion recognition depends largely on how well the features separate different emotions. Starting from an analysis of the time-frequency characteristics of speech and drawing on the selective-attention mechanism of human hearing, this paper proposes a speech emotion recognition algorithm based on spectrogram features. The algorithm first simulates the ear's selective attention by segmenting the emotional spectrogram in the time and frequency domains to form a speech-emotion saliency map. Based on this map, Hu invariant-moment features, texture features, and selected spectrogram features are adopted as the main features for emotion recognition. Finally, emotions are recognized with a support vector machine. Recognition experiments on a speech emotion database show that the proposed algorithm achieves a high recognition rate and good robustness, most notably for the practically important emotion of irritation. In addition, principal-vector analysis across the emotion features shows that the selected features are highly discriminative and practical.
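The sketch below shows how Hu invariant moments can be extracted from a spectrogram image with OpenCV; the random image stands in for the paper's saliency-masked spectrogram, and the log-scaling step is a customary choice, not necessarily the paper's.

```python
# A minimal sketch of Hu-moment extraction from a spectrogram image.
import cv2
import numpy as np

spec = (np.random.default_rng(0).random((128, 256)) * 255).astype(np.uint8)
moments = cv2.moments(spec)          # raw image moments of the spectrogram
hu = cv2.HuMoments(moments).ravel()  # 7 rotation/scale-invariant moments
# Log-scale the moments to compress their large dynamic range.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```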

13.
To address the problem that a single speech feature expresses emotion incompletely, this paper fuses the LSF parameters, which have good quantization and interpolation properties, with the MFCC parameters, which reflect the characteristics of human hearing, and proposes a new line-spectral-weighted MFCC (WMFCC) feature. A Gaussian mixture model is then built over this parameter's model space to obtain GW-MFCC model-space parameters, capturing higher-dimensional detail and further improving recognition performance. Validated on the Berlin emotional speech corpus, the new parameter improves the recognition rate by 5.7% and 6.9% over traditional MFCC and LSF, respectively. The experimental results show that the proposed WMFCC and GW-MFCC parameters effectively capture speech emotion information and raise the speech emotion recognition rate.
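A rough sketch of deriving model-space parameters follows: fit a GMM to an utterance's per-frame features and stack the component means into one higher-dimensional vector (a GMM "supervector"). Using plain MFCC-like frames in place of WMFCCs, and all sizes, are assumptions for illustration.

```python
# A minimal sketch of GMM model-space (supervector) features.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(frames, n_components=8):
    """frames: (n_frames, n_dims) feature matrix of one utterance."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0).fit(frames)
    return gmm.means_.ravel()                  # (n_components * n_dims,)

rng = np.random.default_rng(0)
mfcc_frames = rng.normal(size=(400, 13))       # stand-in for WMFCC frames
print(gmm_supervector(mfcc_frames).shape)      # (104,)
```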

14.
Multimedia Tools and Applications - Research in emotion recognition seeks to develop insights into the variances of features of emotion in one common domain. However, automatic emotion recognition...

15.
To address the problems of many irrelevant features and poor accuracy in existing speech emotion recognition, a method based on a mixed-distribution attention mechanism and a hybrid neural network is proposed. In two channels, a convolutional neural network and a bidirectional long short-term memory network extract the spatial and temporal features of speech, respectively, and the outputs of the two networks together form the input matrices of a multi-head attention mechanism. Considering the low-rank distribution problem of existing multi-head attention, the attention computation is improved: a low-rank distribution is superimposed, as a mixed distribution, on the similarity of the two networks' output features; after normalization, the results of all subspaces are concatenated; and a fully connected layer produces the classification output. Experimental results show that this method achieves higher accuracy than other existing methods, validating its effectiveness.
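A rough PyTorch sketch of the two-channel backbone follows: a CNN channel for spatial features, a BiLSTM channel for temporal features, and multi-head attention over their combined outputs. All sizes are assumptions, and plain softmax attention is used in place of the paper's mixed low-rank distribution.

```python
# A minimal sketch of a CNN + BiLSTM two-channel model with multi-head
# attention for speech emotion classification.
import torch
import torch.nn as nn

class TwoChannelSER(nn.Module):
    def __init__(self, n_mels=40, hidden=64, heads=4, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(                  # spatial channel
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.blstm = nn.LSTM(n_mels, hidden // 2,  # temporal channel
                             bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, n_mels)
        c = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, hidden)
        t, _ = self.blstm(x)                             # (batch, time, hidden)
        a, _ = self.attn(c, t, t)   # CNN output queries the BiLSTM output
        return self.fc(a.mean(dim=1))              # pool over time, classify

model = TwoChannelSER()
logits = model(torch.randn(8, 100, 40))            # 8 utterances, 100 frames
print(logits.shape)                                # torch.Size([8, 4])
```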

16.
International Journal of Speech Technology - Speech emotion recognition is one of the fastest growing areas of interest in the field of affective computing. Emotion detection aids...

17.
Histogram equalization (HEQ) is one of the most efficient and effective techniques for reducing the mismatch between training and test acoustic conditions. However, most current HEQ methods operate merely dimension-wise, without allowing for the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial-temporal feature distribution characteristics for speech feature normalization. Automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task, and the performance of the presented approaches was thoroughly tested and verified by comparison with other popular HEQ methods. The experimental results show that, for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system and give competitive performance relative to the other HEQ methods compared in this paper.
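The sketch below shows the conventional dimension-wise HEQ baseline this paper extends: each feature dimension's empirical CDF is mapped onto a reference Gaussian CDF. The standard-normal reference and the data are illustrative assumptions.

```python
# A minimal sketch of dimension-wise histogram equalization (HEQ).
import numpy as np
from scipy.stats import norm

def heq_dimensionwise(X):
    """X: (n_frames, n_dims). Map each dimension to a standard normal."""
    n = X.shape[0]
    out = np.empty_like(X, dtype=float)
    for d in range(X.shape[1]):
        ranks = np.argsort(np.argsort(X[:, d]))   # 0..n-1 rank of each frame
        cdf = (ranks + 0.5) / n                   # empirical CDF values
        out[:, d] = norm.ppf(cdf)                 # inverse Gaussian CDF
    return out

X = np.random.default_rng(0).gamma(2.0, size=(1000, 13))  # skewed features
print(heq_dimensionwise(X).mean(axis=0).round(2))         # ~0 after HEQ
```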

18.
Advances in Speech Emotion Recognition
This paper first introduces the components of a speech emotion recognition system, then surveys the state of research on emotional features and recognition algorithms: it analyzes the main speech emotion features, describes representative speech emotion recognition algorithms and hybrid models, and compares them. Finally, possible future directions of speech emotion recognition technology are pointed out.

19.
20.
Speech emotion recognition has been one of the interesting issues in speech processing over the last few decades. Modelling the emotion recognition process serves both to understand and to assess the performance of the system. This paper compares two models for speech emotion recognition using vocal tract features, namely the first four formants and their respective bandwidths. The first model is based on a decision tree; the second employs logistic regression. Whereas decision tree models are based on machine learning, regression models have a strong statistical basis. The logistic regression models and decision tree models developed in this work for several binary classification cases were validated by speech emotion recognition experiments on a Malayalam emotional speech database of 2800 speech files collected from ten speakers. The models are not only simple but also meaningful, since they indicate the contribution of each predictor. The experimental results indicate that speech emotion recognition using formants and bandwidths was better modelled by decision trees, which gave higher recognition accuracies than logistic regression. The highest accuracy obtained with a decision tree was 93.63%, for classifying positive-valence emotional speech as surprised or happy using seven features; for the same binary classification, the highest accuracy obtained with logistic regression was 73%, with eight features.
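A rough sketch of the comparison follows: cross-validated accuracy of a decision tree versus logistic regression on an 8-column matrix of four formants and their bandwidths. The random data, tree depth, and fold count are stand-ins, not the paper's setup.

```python
# A minimal sketch comparing the two model families on a binary
# emotion task over formant/bandwidth features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))        # F1-F4 and their bandwidths, per file
y = rng.integers(0, 2, size=400)     # e.g. surprised (0) vs. happy (1)

for model in (DecisionTreeClassifier(max_depth=5, random_state=0),
              LogisticRegression(max_iter=1000)):
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(type(model).__name__, round(acc, 3))
```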

