Similar Literature
20 similar documents retrieved (search time: 412 ms).
1.
The speech signal carries not only linguistic information but also paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain their high recognition rates on spontaneous speech, so emotion recognition is an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared with Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are also selected using the analysis-of-variations method. The proposed modular scheme is derived from a comparative study of different features and of the characteristics of each individual emotional state, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, the average emotion recognition accuracy improves by 2.2%. The proposed modular neural-SVM classifier also improves recognition accuracy by at least 8% compared with the simulated monolithic classifiers.
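A minimal sketch of the kind of pipeline described above: variance-analysis-based feature ranking followed by an SVM classifier. It uses scikit-learn's ANOVA F-test (f_classif) as a stand-in for the paper's analysis-of-variations step; the feature matrix, labels, and the fraction of retained features are placeholders, not values from the paper.

```python
# Sketch: ANOVA-style feature ranking + SVM emotion classifier.
# X (utterances x acoustic features) and y (emotion labels) are placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))      # placeholder feature matrix
y = rng.integers(0, 4, size=300)    # placeholder labels for 4 emotional states

# Keep roughly 78% of the features (the paper reports discarding ~22%).
selector = SelectKBest(score_func=f_classif, k=39)
clf = make_pipeline(selector, SVC(kernel="rbf", C=10, gamma="scale"))
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```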

2.
With the growing demand for understanding human emotional behavior and for human-machine interaction in recent electronic applications, speaker emotion recognition is a key component that has attracted a great deal of attention among researchers. Even though a handful of works on speaker emotion classification are available in the literature, important challenges such as distinct emotions, low-quality recordings, and independent affective states still need to be addressed with a good classifier and discriminative features. Accordingly, a new classifier, called the fractional deep belief network (FDBN), is developed by combining the deep belief network (DBN) with fractional calculus. This classifier is trained with multiple features, namely tonal power ratio, spectral flux, pitch chroma, and Mel frequency cepstral coefficients (MFCC), to make the emotional classes more separable through their spectral characteristics. The proposed FDBN classifier with integrated feature vectors is tested on two databases, the Berlin database of emotional speech and a real-time Telugu database. The performance of the proposed FDBN and the existing DBN classifiers is validated using False Acceptance Rate (FAR), False Rejection Rate (FRR), and accuracy. The proposed FDBN achieves accuracies of 98.39% and 95.88% on the Berlin and Telugu databases, respectively.
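A rough sketch of some of the features listed above (MFCC, pitch chroma, and a simple spectral-flux measure) extracted with librosa; the tonal power ratio and the FDBN classifier itself are not reproduced. The file name and frame settings are assumptions.

```python
# Sketch: frame-level spectral features pooled into one vector per utterance.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical audio file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

S = np.abs(librosa.stft(y))                       # magnitude spectrogram
flux = np.sqrt(np.sum(np.diff(S, axis=1).clip(min=0) ** 2, axis=0))

# One fixed-length vector per utterance: frame-wise means of each feature.
feature_vector = np.concatenate(
    [mfcc.mean(axis=1), chroma.mean(axis=1), [flux.mean()]]
)
print(feature_vector.shape)
```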

3.
The work presented in this paper explores the effectiveness of incorporating excitation source parameters, namely the strength of excitation (SoE) and the instantaneous fundamental frequency (\(F_0\)), for emotion recognition from speech and electroglottographic (EGG) signals. The SoE is an important parameter indicating the pressure with which the glottis closes at the glottal closure instants (GCIs). The SoE is computed by the popular zero frequency filtering (ZFF) method, which accurately estimates the glottal signal characteristics by attenuating or removing the high-frequency vocal tract interactions in speech. An arbitrary impulse sequence, obtained from the estimated GCIs, is used to derive the instantaneous \(F_0\). The SoE and instantaneous \(F_0\) parameters are combined with conventional mel frequency cepstral coefficients (MFCC) to improve the recognition rates of distinct emotions (Anger, Happy and Sad) using Gaussian mixture models as the classifier. The performance of the proposed combination of SoE, instantaneous \(F_0\), and their dynamic features with MFCC is evaluated on emotion utterances from the classical German full-blown emotion speech database (EmoDb, 4 emotions and neutral), which provides simultaneous speech and EGG signals, and from the Surrey Audio-Visual Expressed Emotion database (3 emotions and neutral), for both speaker-dependent and speaker-independent emotion recognition scenarios. To reinforce the effectiveness of the proposed features and for better statistical consistency of the emotion analysis, a fairly large emotion speech database in Tamil, with 220 utterances per emotion and simultaneous EGG recordings, is used in addition to EmoDb. The effectiveness of SoE and instantaneous \(F_0\) in characterizing different emotions is also confirmed by the improved emotion recognition performance on the Tamil speech-EGG emotion database.
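A rough numpy sketch of zero-frequency filtering as commonly described in the literature (not the authors' code): the differenced signal is integrated through a cascade of two zero-frequency resonators (four cumulative sums), the growing trend is removed with a moving average of roughly one average pitch period, and negative-to-positive zero crossings then approximate the GCIs, the local slope the SoE, and the GCI spacing the instantaneous F0. The window length and file name are assumptions.

```python
import numpy as np
import librosa

s, sr = librosa.load("utterance.wav", sr=16000)    # hypothetical audio file
x = np.diff(s, prepend=0.0)                        # remove DC offset

y = x.copy()
for _ in range(4):                                 # two zero-frequency resonators
    y = np.cumsum(y)

win = int(0.01 * sr)                               # ~10 ms trend-removal window
kernel = np.ones(2 * win + 1) / (2 * win + 1)
for _ in range(3):                                 # repeated local-mean subtraction
    y = y - np.convolve(y, kernel, mode="same")

gci = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]     # negative-to-positive crossings
soe = np.abs(y[gci + 1] - y[gci])                  # strength of excitation at GCIs
inst_f0 = sr / np.diff(gci)                        # instantaneous F0 in Hz
print(len(gci), inst_f0[:5])
```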

4.
To address the limitations of voice- and image-based emotion recognition, a new modality is proposed: touch-based emotion recognition. A series of touch emotion recognition studies is carried out on the CoST (corpus of social touch) dataset: the data are preprocessed and several features for touch emotion recognition are proposed. An extreme learning machine (ELM) classifier is used to explore emotion recognition under different gestures, recognizing three emotions (gentle, normal, aggressive) across 14 gestures with high accuracy and short recognition time. The results show that the gesture affects recognition accuracy: the gesture "stroke" achieves the highest classification accuracy under every classifier tested, reaching 72.07%; the extreme learning machine, used as the classifier for touch emotion recognition, provides good classification performance and fast recognition; and some gestures are inherently associated with a particular emotion, which influences the classification results.
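A minimal extreme learning machine sketch of the kind used above for the three touch-emotion classes: a random hidden layer followed by closed-form (pseudo-inverse) output weights. The arrays are placeholders, not the CoST data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))            # placeholder touch-pressure features
y = rng.integers(0, 3, size=400)          # 0=gentle, 1=normal, 2=aggressive
T = np.eye(3)[y]                          # one-hot targets

n_hidden = 200
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                    # random hidden-layer outputs
beta = np.linalg.pinv(H) @ T              # output weights in closed form

pred = np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
print("training accuracy:", (pred == y).mean())
```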

5.
6.
In this paper, we present human emotion recognition systems based on audio and spatio-temporal visual features. The proposed system has been tested on an audio-visual emotion data set with different subjects of both genders. Mel-frequency cepstral coefficient (MFCC) and prosodic features are first identified and then extracted from the emotional speech. For facial expressions, spatio-temporal features are extracted from the visual streams. Principal component analysis (PCA) is applied for dimensionality reduction of the visual features, capturing 97% of the variance. A codebook is constructed for both audio and visual features using Euclidean space, and the histograms of codeword occurrences are fed to a state-of-the-art SVM classifier to obtain the judgment of each classifier. The judgments from the classifiers are then combined using the Bayes sum rule (BSR) as a final decision step. The proposed system is tested on a public data set to recognize human emotions. Experimental results and simulations show that using visual features alone yields an average accuracy of 74.15%, while using audio features alone gives an average recognition accuracy of 67.39%; by combining both audio and visual features, the overall system accuracy improves significantly, up to 80.27%.
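A sketch of the bag-of-features chain described above for one modality: PCA keeping 97% of the variance, a k-means codebook, per-clip histograms of codeword occurrences, and an SVM with probability outputs. Sum-rule fusion of the audio and visual classifiers would simply average their predicted probabilities. The shapes, codebook size, and descriptors are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_clips, n_local, dim = 200, 40, 64
descriptors = rng.normal(size=(n_clips, n_local, dim))   # placeholder local descriptors
labels = rng.integers(0, 6, size=n_clips)

# PCA keeping 97% of the variance, applied to the pooled local descriptors
flat = PCA(n_components=0.97).fit_transform(descriptors.reshape(-1, dim))

# k-means codebook and per-clip histogram of codeword occurrences
k = 32
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(flat)
words = codebook.predict(flat).reshape(n_clips, n_local)
hists = np.stack([np.bincount(w, minlength=k) for w in words])

visual_svm = SVC(probability=True).fit(hists, labels)
p_visual = visual_svm.predict_proba(hists)
# Sum-rule fusion with an audio classifier trained the same way:
# p_fused = (p_audio + p_visual) / 2; the prediction is argmax over emotions.
```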

7.
Recognition of emotion in speech has recently matured into one of the key disciplines in speech analysis, serving next-generation human-machine interaction and communication. However, unlike in automatic speech recognition, emotion recognition from an isolated word or phrase is inappropriate for conversation, because a complete emotional expression may stretch across several sentences and may end on any word in the dialogue. In this paper, we present a segment-based emotion recognition approach for continuous Mandarin Chinese speech, in which the unit of recognition is not a phrase or a sentence but an emotional expression in dialogue. To that end, we first evaluate the performance of several classifiers in short-sentence speech emotion recognition architectures. The experimental results show that the WD-KNN classifier achieves the best accuracy for 5-class emotion recognition among the five classification techniques. We then implement a continuous Mandarin Chinese speech emotion recognition system, based on WD-KNN, with an emotion radar chart that represents the intensity of each emotion component in the speech. The proposed approach shows how emotions can be recognized from speech signals and, in turn, how emotional states can be visualized.

8.
A new method for the recognition of spoken emotions is presented, based on features of the glottal airflow signal. Its effectiveness is tested on the new optimum-path forest (OPF) classifier as well as on six previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network / multi-layer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment with ten speakers (5 male and 5 female), each speaking ten sentences in four different emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths for both the glottal and the corresponding speech signal. Experimental results indicate that the best performance is obtained with the glottal-only features, with SVM and OPF generally providing the highest recognition rates, while performance with GMM or with the combination of glottal and speech features was relatively inferior. For this text-dependent, multi-speaker task the top-performing classifiers achieved perfect recognition rates in the case of 6th-order glottal MFCCs.
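A rough sketch of one common way to approximate the glottal excitation by inverse filtering (LPC residual), followed by low-order MFCCs of that residual; the paper's exact inverse-filtering procedure and the glottal symmetry feature are not reproduced, and the LPC order and file name are assumptions.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

speech, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical audio file

a = librosa.lpc(speech, order=16)          # all-pole vocal-tract model A(z)
glottal = lfilter(a, [1.0], speech)        # inverse filtering -> LPC residual

glottal_mfcc = librosa.feature.mfcc(y=glottal, sr=sr, n_mfcc=6)  # 6th-order MFCCs
speech_mfcc = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=6)
print(glottal_mfcc.shape, speech_mfcc.shape)
```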

9.
For human-machine communication to be as effective as human-to-human communication, research on speech emotion recognition is essential. Among the models and classifiers used to recognize emotions, neural networks appear promising due to their ability to learn and the diversity of possible configurations. Following the convolutional neural network, the capsule neural network (CapsNet), whose inputs and outputs are vectors rather than scalar quantities, allows the network to capture the part-whole relationships that are specific to an object. This paper performs speech emotion recognition based on CapsNet. The corpora for speech emotion recognition have been augmented by adding white noise and changing voices. The feature parameters at the input of the recognition system are mel spectrum images along with characteristics of the sound source, vocal tract, and prosody. On the German emotional corpus EMO-DB, the average accuracy for 4 emotions (neutral, boredom, anger, and happiness) is 99.69%. On the Vietnamese emotional corpus BKEmo, the score is 94.23% for 4 emotions (neutral, sadness, anger, and happiness). The accuracy is highest when all the above feature parameters are combined, and it increases significantly when mel spectrum images are combined with the features directly related to the fundamental frequency.
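A sketch of the two augmentations mentioned above (additive white noise and a simple voice change approximated by pitch shifting) together with the mel-spectrogram input; the capsule network itself is not reproduced. The SNR, pitch step, and file name are assumptions.

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)    # hypothetical audio file

# Additive white noise at roughly 20 dB SNR
snr_db = 20.0
noise = np.random.default_rng(0).normal(size=len(y))
noise *= np.sqrt(np.mean(y**2) / (10 ** (snr_db / 10))) / (np.std(noise) + 1e-12)
y_noisy = y + noise

# "Voice change" approximated here by shifting the pitch by two semitones
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# Mel-spectrogram image used as the network input
mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y_noisy, sr=sr, n_mels=64))
print(mel.shape)
```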

10.
To improve the recognition accuracy of speech emotion recognition systems, this paper proposes a PCA-based multi-level SVM emotion classification algorithm built on the traditional support vector machine (SVM) approach. Emotions that are easy to distinguish are separated first; for highly confusable emotions that cannot be distinguished directly by the multi-level classification strategy, principal component analysis (PCA) is applied for feature dimensionality reduction, and the emotion class of the input speech is then determined level by level. Compared with traditional SVM-based speech emotion recognition, the proposed method raises the average recognition rate over seven emotions by 5.05% while reducing the feature dimensionality by 58.3%, which demonstrates the correctness and effectiveness of the proposed method.

11.
In this work, spectral features extracted from sub-syllabic regions and from pitch-synchronous analysis are proposed for speech emotion recognition. Linear prediction cepstral coefficients, mel frequency cepstral coefficients, and features extracted from the high-amplitude regions of the spectrum are used to represent emotion-specific spectral information. These features are extracted from the consonant, vowel, and transition regions of each syllable to study the contribution of these regions toward the recognition of emotions. The consonant, vowel, and transition regions are determined using vowel onset points. Spectral features extracted from each pitch cycle are also used to recognize the emotions present in speech. The emotions used in this study are anger, fear, happiness, neutral, and sadness. The emotion recognition performance using sub-syllabic speech segments is compared with the results of the conventional block processing approach, where the entire speech signal is processed frame by frame. The proposed emotion-specific features are evaluated using a simulated emotion speech corpus, IITKGP-SESC (Indian Institute of Technology, Kharagpur - Simulated Emotion Speech Corpus). The emotion recognition results obtained on IITKGP-SESC are compared with results on the Berlin emotion speech corpus. Emotion recognition systems are developed using Gaussian mixture models and auto-associative neural networks. The purpose of this study is to explore sub-syllabic regions for identifying the emotions embedded in a speech signal and, if possible, to avoid processing the entire speech signal for emotion recognition without a serious compromise in performance.
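A minimal sketch of the Gaussian mixture model back end: one mixture per emotion fitted on frame-level features, with classification by maximum average log-likelihood. The feature arrays are placeholders, and the sub-syllabic segmentation and auto-associative network are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
emotions = ["anger", "fear", "happy", "neutral", "sad"]

# Placeholder training frames per emotion: (n_frames, n_features)
train = {e: rng.normal(loc=i, size=(500, 13)) for i, e in enumerate(emotions)}
models = {e: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(X) for e, X in train.items()}

def classify(frames):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    scores = {e: m.score(frames) for e, m in models.items()}
    return max(scores, key=scores.get)

test_frames = rng.normal(loc=2, size=(120, 13))     # should match "happy"
print(classify(test_frames))
```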

12.
Speech signals produced under different emotions are markedly non-stationary, and the traditional MFCC reflects only the static characteristics of the speech signal, whereas empirical mode decomposition (EMD) can finely characterize its non-stationary properties. To extract the non-stationary features of emotional speech, empirical mode decomposition is used to decompose the emotional speech signal into a series of intrinsic mode function components; these are passed through a Mel filter bank, the log energies are taken, and an inverse DCT is applied to obtain an improved MFCC that serves as a new feature for emotion recognition. A support vector machine is used to recognize four speech emotions: happiness, anger, boredom, and fear. Simulation results show that the improved MFCC achieves a recognition rate of 77.17% and, under different signal-to-noise ratios, improves the recognition rate by up to 3.26%.

13.
Functional paralanguage carries considerable emotion information and is insensitive to speaker changes. To improve emotion recognition accuracy under the speaker-independent condition, a fusion method that combines functional paralanguage features with the accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. With this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database including six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is put forward to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions is over 67% in the speaker-independent condition using the functional paralanguage features.

14.
Mandarin Speech Emotion Recognition Based on PCA and SVM
蒋海华  胡斌 《计算机科学》2015,42(11):270-273
In speech emotion recognition, the selection and extraction of emotion features is a key step, and no highly effective speech emotion features have yet been established. Therefore, on a Mandarin emotional speech corpus containing six emotions, and taking into account the characteristics that distinguish Mandarin from Western languages, a set of effective emotion features is selected, including Mel-frequency cepstral coefficients, fundamental frequency, short-time energy, short-time average zero-crossing rate, and the first formant; these are extracted and various statistics are computed from them. Principal component analysis (PCA) is then applied for feature extraction, and finally a support vector machine (SVM)-based speech emotion recognition system is used for classification. Experimental results show that, compared with other important published results, the method achieves a higher average emotion recognition rate, demonstrating that the selection, extraction, and modeling of the emotion features are reasonable and effective.
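A minimal sketch of the PCA + SVM chain described above: utterance-level statistics of frame-level acoustic features are reduced with PCA and classified with an SVM. The feature matrix, the retained-variance ratio, and the SVM settings are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(360, 120))          # placeholder utterance-level statistics
y = rng.integers(0, 6, size=360)         # six emotion classes

model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),      # keep 95% of the variance
                      SVC(kernel="rbf", C=10, gamma="scale"))
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```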

15.
Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for speech emotion recognition, such as energy, pitch, and mel frequency cepstral coefficients. However, system performance is not optimal because of the computational complexity caused by the high-dimensional, correlated feature set that results from feature fusion. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature subset selection techniques, sequential forward selection (SFS) and sequential floating forward selection (SFFS), are used to eliminate the curse-of-dimensionality problem arising from the high-dimensional feature vector after fusion. Finally, the emotions are classified using several classifiers: Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained by using SFS and SFFS individually. The results reveal that SFFS is the better choice as a feature subset selection method, because SFS suffers from the nesting problem, i.e., it is difficult to discard a feature once it has been retained in the set. SFFS eliminates this nesting problem by not fixing the set at any stage but letting it float up and down during the selection according to the objective function. Experimental results show that the efficiency of the classifier is improved by 15-20% with the two-stage feature selection method compared with the performance of the classifier with feature fusion alone.
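A sketch of the second-stage selection using scikit-learn's sequential forward selection; the floating variant (SFFS) is not available in scikit-learn and would need another library such as mlxtend. The feature matrix, the target number of features, and the KNN settings are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 60))           # placeholder fused feature vectors
y = rng.integers(0, 7, size=300)

knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=20,
                                direction="forward", cv=5)
model = make_pipeline(sfs, knn).fit(X, y)
mask = model.named_steps["sequentialfeatureselector"].get_support()
print("selected features:", np.flatnonzero(mask))
```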

16.
This paper presents the feature analysis and the design of compensators for speaker recognition under stressed speech conditions. Any condition that causes a speaker to vary his or her speech production from the normal or neutral condition is called a stressed speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration, and environmental noise. Under stressed conditions, the characteristics of the speech signal differ from those of the normal or neutral condition, and because of these changes the performance of a speaker recognition system may degrade. First, six speech features widely used for speaker recognition (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients, linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC), and log-area ratios (LAR)) are analyzed to evaluate their characteristics under stressed conditions. Second, a Vector Quantization (VQ) classifier and a Gaussian Mixture Model (GMM) are used to evaluate speaker recognition results with the different speech features; this analysis helps select the best feature set for speaker recognition under stressed conditions. Finally, four novel VQ-based compensation techniques are proposed and evaluated for improving speaker recognition under stressed conditions: speaker and stressed information based compensation (SSIC), compensation by removal of stressed vectors (CRSV), cepstral mean normalization (CMN), and combination of MFCC and sinusoidal amplitude (CMSA) features. Speech data from the SUSAS database corresponding to four different stressed conditions (Angry, Lombard, Question, and Neutral) are used for the analysis of speaker recognition under stressed conditions.
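A minimal sketch of cepstral mean normalization (CMN), one of the compensation techniques listed above: the per-utterance mean of each MFCC coefficient is subtracted so that slowly varying channel and recording-condition effects are reduced. The file name and MFCC settings are assumptions.

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)     # hypothetical audio file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (n_mfcc, n_frames)

mfcc_cmn = mfcc - mfcc.mean(axis=1, keepdims=True)  # subtract per-utterance mean
# Optional variance normalization as well (cepstral mean and variance normalization)
mfcc_cmvn = mfcc_cmn / (mfcc.std(axis=1, keepdims=True) + 1e-8)
print(mfcc_cmn.mean(axis=1))                        # ~0 for every coefficient
```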

17.
The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of the most important goals is to improve voice-based human-machine interaction. Several works in this domain use the prosodic features or the spectral characteristics of the speech signal with neural networks, Gaussian mixtures, and other standard classifiers, and usually provide no acoustic interpretation of the types of errors in the results. In this paper, the spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models, and the Multilayer Perceptron are tested. These classifiers have been evaluated with different configurations and input features in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves performance over the standard classifiers and the fixed feature sets.

18.
Voting ensembles for spoken affect classification
Affect or emotion classification from speech has much to gain from ensemble classification methods. In this paper we apply a simple voting mechanism to an ensemble of classifiers and attain a modest performance increase compared with the individual classifiers. A natural emotional speech database was compiled from 11 speakers, and listener-judges were used to validate the emotional content of the speech. Thirty-eight prosody-based features correlating characteristics of speech with emotional states were extracted from the data. A classifier ensemble was designed using a multi-layer perceptron, a support vector machine, a K* instance-based learner, a K-nearest neighbour classifier, and a random forest of decision trees. A simple voting scheme determined the most popular prediction. The accuracy of the ensemble is compared with the accuracies of the individual classifiers.
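A minimal sketch of a majority-voting ensemble of the kind described above, combining an MLP, an SVM, a k-NN classifier, and a random forest with scikit-learn's VotingClassifier. The feature matrix and hyperparameters are placeholders, and the K* learner has no scikit-learn equivalent, so it is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 38))           # 38 prosody-based features (placeholder)
y = rng.integers(0, 4, size=300)

ensemble = VotingClassifier(
    estimators=[("mlp", MLPClassifier(max_iter=500)),
                ("svm", SVC()),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("rf", RandomForestClassifier(n_estimators=100))],
    voting="hard")                       # simple majority vote
print("ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```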

19.
Recognizing Human Emotional State From Audiovisual Signals
Machine recognition of human emotional state is an important component of efficient human-computer interaction. The majority of existing works address this problem using audio signals alone or visual information only. In this paper, we explore a systematic approach to recognizing human emotional state from audiovisual signals. The audio characteristics of emotional speech are represented by extracted prosodic, Mel-frequency Cepstral Coefficient (MFCC), and formant frequency features. A face detection scheme based on the HSV color model is used to detect the face against the background, and the visual information is represented by Gabor wavelet features. We perform feature selection using a stepwise method based on the Mahalanobis distance. The selected audiovisual features are used to classify the data into the corresponding emotions. Based on a comparative study of different classification algorithms and the specific characteristics of individual emotions, a novel multi-classifier scheme is proposed to boost recognition performance. The feasibility of the proposed system is tested on a database that incorporates human subjects from different languages and cultural backgrounds. Experimental results demonstrate the effectiveness of the proposed system; the multi-classifier scheme achieves the best overall recognition rate of 82.14%.

20.
Over the last decade, an increasing number of studies have focused on automated recognition of human emotions by machines. However, the performance of machine emotion recognition studies is difficult to interpret because benchmarks have not been established. To provide such a benchmark, we compared machine emotion recognition with human emotion recognition. We gathered facial expressions, speech, and physiological signals from 17 individuals expressing 5 different emotional states. Support vector machines achieved an 82% recognition accuracy based on physiological and facial features. In experiments with 75 humans on the same data, a maximum recognition accuracy of 62.8% was obtained. As machines outperformed humans, automated emotion recognition might be ready to be tested in more practical applications.
