Similar Literature
20 similar documents found
1.
Because the expression of human emotion is shaped by culture and society, the acoustic characteristics of emotional speech differ considerably across languages, so a speech emotion recognition model trained on a single language generalizes poorly. To address this problem, a multilingual speech emotion recognition method based on multi-task attention is proposed. By introducing language identification as an auxiliary task, the model learns the emotional features shared across languages while also learning each language's own emotional characteristics, improving the cross-lingual generalization of the multilingual emotion recognition model. Experiments on dimensional emotion corpora in two languages show that, compared with the baselines, the proposed method improves the mean relative UAR by 3.66%–5.58% on the Valence task and by 1.27%–6.51% on the Arousal task; experiments on discrete emotion corpora in four languages show an improvement in mean relative UAR of 13.43%–15.75% over the baselines. The proposed method therefore effectively extracts language-related emotional features and improves multilingual emotion recognition performance.
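As a hedged sketch (not the paper's implementation) of the kind of joint objective such a multi-task model might optimize, the emotion loss can be combined with a weighted language-identification auxiliary loss over the shared representation. The function names, logit shapes, and the weight `lam` are all assumptions for illustration:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true labels."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

def multi_task_loss(emo_logits, emo_y, lang_logits, lang_y, lam=0.3):
    """Emotion loss plus a weighted language-ID auxiliary loss, so the
    shared encoder is pushed to keep both cross-lingual and
    language-specific emotional cues (lam is a hypothetical weight)."""
    return cross_entropy(emo_logits, emo_y) + lam * cross_entropy(lang_logits, lang_y)
```

In a real system both heads would sit on top of a shared attention encoder; here the logits stand in for those heads' outputs.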

2.
The speech signal carries linguistic information and also paralinguistic information such as emotion. Modern automatic speech recognition systems achieve high performance on neutral-style speech, but they cannot maintain their high recognition rate on spontaneous speech, so emotion recognition is an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are also selected using the analysis of variance method. The proposed modular scheme is derived from a comparative study of different features and the characteristics of individual emotional states, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, the average emotion recognition accuracy improves by 2.2%. The proposed modular neural-SVM classifier also improves recognition accuracy by at least 8% compared to the simulated monolithic classifiers.
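Reading "analysis of variations" as an ANOVA-style criterion, a minimal sketch of variance-ratio feature selection could look like the following. The function names and the `keep_ratio` parameter (chosen here to echo the abstract's "discarding 22% of features") are assumptions, not the paper's code:

```python
import numpy as np

def anova_f_scores(X, y):
    """Per-feature one-way ANOVA F-ratio: between-class variance over
    within-class variance. Higher means more discriminative."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(X) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within + 1e-12)

def select_top_features(X, y, keep_ratio=0.78):
    """Keep the top fraction of features by F-score (e.g. discard ~22%)."""
    scores = anova_f_scores(X, y)
    k = max(1, int(round(X.shape[1] * keep_ratio)))
    return np.argsort(scores)[::-1][:k]
```

The surviving columns would then feed the per-emotion SVM modules.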

3.
This paper proposes a new deep learning architecture for context-based multi-label, multi-task emotion recognition. The architecture is built from three main modules: (1) a body-feature extraction module, a pre-trained Xception network; (2) a scene-feature extraction module, based on a modified VGG16 network; and (3) a fusion-decision module. Three categorical and three continuous loss functions are compared to highlight the importance of synergy between loss functions in multi-task learning. A new loss function, the multi-label focal loss (MFL), is then proposed, based on the focal loss, to deal with imbalanced data. Experimental results on the EMOTIC dataset show that MFL combined with the Huber loss gave better results than any other combination and outperformed the current state of the art on the less frequent labels.
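The paper's exact MFL formulation is not reproduced here, but a plausible NumPy sketch in the spirit of the focal loss (independent per-label sigmoids, with the `(1 - p_t)^gamma` term down-weighting easy examples so rare labels contribute more) might look like:

```python
import numpy as np

def multi_label_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Sketch of a multi-label focal loss. Each label gets its own
    sigmoid probability; p_t is the probability assigned to the true
    state of that label, and (1 - p_t)**gamma shrinks the loss of
    well-classified labels. gamma/alpha values are the usual focal-loss
    defaults, assumed rather than taken from the paper."""
    p = 1.0 / (1.0 + np.exp(-logits))           # per-label probability
    p_t = np.where(targets == 1, p, 1.0 - p)    # prob. of the true state
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)
    return loss.mean()
```

With `gamma=0` the focal term vanishes and this reduces to an alpha-weighted binary cross-entropy.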

4.
Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch, and mel-frequency cepstral coefficients. However, system performance is not optimal because of the computational complexity caused by the high-dimensional, correlated feature set produced by fusion. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature-subset selection techniques, sequential forward selection (SFS) and sequential floating forward selection (SFFS), are used to eliminate the curse-of-dimensionality problem caused by the high-dimensional feature vector after fusion. Finally, the emotions are classified using several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machine (SVM), and k-nearest neighbor (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained using SFS and SFFS individually. The results reveal that SFFS is the better feature-subset selection method, because SFS suffers from the nesting problem: once a feature is retained in the set, it is difficult to discard. SFFS eliminates this nesting problem by not fixing the set at any stage but letting it float up and down during selection according to the objective function. Experimental results show that the classifier's efficiency improves by 15–20% with two-stage feature selection compared with feature fusion alone.
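The floating behavior that distinguishes SFFS from SFS can be sketched generically. This is an illustrative implementation, not the paper's code; `score_fn` stands in for whatever subset-quality criterion (e.g. cross-validated classification rate) the system uses:

```python
def sffs(features, score_fn, k):
    """Sequential floating forward selection (sketch). score_fn maps a
    list of feature indices to a quality score. Unlike plain SFS, every
    forward step is followed by conditional backward steps, so a feature
    added early can still be discarded later; this removes the nesting
    problem. (A full SFFS also tracks the best subset per size to
    guarantee termination; that bookkeeping is omitted here.)"""
    selected = []
    while len(selected) < k:
        # forward step: add the single best remaining feature
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected = selected + [best]
        # floating backward steps: drop any feature (except the one just
        # added) whose removal strictly improves the score
        improved = len(selected) > 2
        while improved:
            improved = False
            current = score_fn(selected)
            for f in selected:
                if f == best:
                    continue
                trial = [g for g in selected if g != f]
                if score_fn(trial) > current:
                    selected = trial
                    improved = True
                    break
            if len(selected) <= 2:
                break
    return selected
```

With a score table where a greedily attractive feature later becomes dead weight, the backward steps discard it, which plain SFS cannot do.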

5.
Speech emotion recognition is an emerging research topic, and the extraction of feature parameters directly affects the final recognition accuracy; dimensionality reduction can extract the feature parameters that best discriminate between emotions. This paper highlights the importance of feature parameters in speech emotion recognition, introduces the basic components of a speech emotion recognition system, surveys the current state of research on feature parameters, describes the dimensionality-reduction methods commonly applied to emotion recognition, and analyzes and compares them. Possible future directions for speech emotion recognition are also discussed.

6.
7.
The accuracy of speech emotion recognition depends largely on how well features discriminate between emotions. Starting from an analysis of the time–frequency characteristics of speech and drawing on the selective attention mechanism of human hearing, a speech emotion recognition algorithm based on spectrogram features is proposed. The algorithm first simulates auditory selective attention by segmenting the emotional spectrogram in the time and frequency domains to form an emotion saliency map. Based on the saliency map, Hu invariant moments, texture features, and selected spectrogram features are adopted as the main features for emotion recognition. Finally, a support vector machine classifies the emotions. Recognition experiments on a speech emotion database show that the proposed algorithm achieves high recognition accuracy and robustness, most notably for the practically important emotion of irritation. Principal-vector analysis of the different emotion features further shows that the selected features are highly discriminative and practical.

8.
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes the domain unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the clusters' centroid vectors and covariance matrices are taken as the attributes of each frame's secondary feature vector. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on classification of phonemes within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained using WKM clustering in the classification of front vowels.
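The centroid-plus-covariance idea can be sketched with plain k-means standing in for the paper's weighted variant (the weighting scheme, cluster count, and function names here are assumptions):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, a stand-in for the weighted K-means in the text."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(iters):
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

def secondary_features(spectro_temporal_frame, k=3):
    """Reduce a high-dimensional spectro-temporal frame (a freq x time
    energy grid) to a compact vector: cluster its (freq, time, energy)
    points and keep each cluster's centroid and per-dimension variance,
    mirroring the centroid/covariance attributes described above."""
    F, T = spectro_temporal_frame.shape
    f, t = np.meshgrid(np.arange(F), np.arange(T), indexing="ij")
    points = np.column_stack([f.ravel(), t.ravel(),
                              spectro_temporal_frame.ravel()]).astype(float)
    centroids, labels = kmeans(points, k)
    variances = np.array([points[labels == j].var(axis=0)
                          if np.any(labels == j) else np.zeros(3)
                          for j in range(k)])
    return np.concatenate([centroids.ravel(), variances.ravel()])
```

Each frame is thus summarized by `k * 3 * 2` numbers regardless of the grid's original dimensionality.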

9.
To improve speech emotion recognition accuracy, a two-stage feature selection method is applied to a multi-dimensional feature set built from basic acoustic features, jointly considering the intrinsic relationship between feature parameters and emotion classes to construct an optimized feature subset with good emotional discriminability. In the recognition stage, a binary-tree multi-classifier is designed to balance overall system performance and complexity, the SVM model is improved with a kernel fusion method, and a multi-kernel SVM is used to recognize the most easily confused emotions. The algorithm is validated on samples of five emotional states from the Berlin emotional speech database. Experimental results show that combining two-stage feature selection with kernel fusion effectively improves emotion recognition accuracy while providing some robustness to noise.

10.
11.
12.
Research Progress in Speech Emotion Recognition
This paper first introduces the components of a speech emotion recognition system and then surveys the state of research on emotional features and recognition algorithms, analyzing the main speech emotion features, describing representative speech emotion recognition algorithms and hybrid models, and comparing them. Finally, possible future directions for speech emotion recognition technology are pointed out.

13.
Emotion recognition has become an important component of human–computer interaction systems. Research on emotion recognition based on electroencephalogram (EEG) signals is mostly conducted by analyzing the EEG signals of all channels. Although some progress has been achieved, several challenges remain in realistic experimental settings, such as high dimensionality, correlation between different features, and feature redundancy; these challenges have hindered the application of emotion recognition to portable human–computer interaction systems and devices. This paper explores how to find the most effective EEG features and channels for emotion recognition so that as little data as possible need be collected. First, discriminative features of different dimensionalities are extracted from the EEG signals for emotion classification, including the first difference, multiscale permutation entropy, Higuchi fractal dimension, and the discrete wavelet transform. Second, the Relief algorithm and a floating generalized sequential backward selection algorithm are integrated into a novel channel selection method. A support vector machine is then employed to classify the emotions, verifying the performance of the channel selection method and the extracted features. Finally, experimental results demonstrate that the optimal channel set, mostly located over the frontal region, is highly similar between the self-collected data set and the public data set, and an average classification accuracy of 91.31% is achieved with the selected 10-channel EEG signals. These findings are valuable for practical EEG-based emotion recognition systems.

14.
Multimedia Tools and Applications - Multimedia content analysis and understanding, such as action recognition and image classification, is a fundamental research problem. One effective strategy to...

15.
Multi-task learning (MTL) aims to enhance the generalization performance of supervised regression or classification by learning multiple related tasks simultaneously. This paper extends current MTL techniques to high-dimensional data sets with structured input and structured output (SISO), where SI means the input features are structured and SO means the tasks are structured. We investigate a previously ignored problem in MTL with SISO data: the interplay of structured feature selection and task-relationship modeling. We hypothesize that combining the structure information of the features with task-relationship inference enables more accurate MTL models. Based on this hypothesis, we design an efficient learning algorithm in which a task covariance matrix over the model parameters captures the task relationships, together with a regularization formulation for incorporating the structured input features. We develop an efficient iterative optimization algorithm for the resulting problem, based on an accelerated first-order gradient method combined with a projected gradient scheme. Using two real-world data sets, we demonstrate the utility of the proposed learning methods.
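Under assumed notation (the abstract gives none), an objective of this family is often written with $W = [\mathbf{w}_1, \dots, \mathbf{w}_T]$ stacking the per-task parameters, $\Omega$ the task covariance matrix, and a structured-sparsity penalty $R(W)$ on the input features:

```latex
\min_{W,\; \Omega \succ 0} \;
  \sum_{t=1}^{T} \mathcal{L}_t(\mathbf{w}_t)
  + \lambda_1 \, \mathrm{tr}\!\left( W \, \Omega^{-1} W^{\top} \right)
  + \lambda_2 \, R(W),
\qquad \text{s.t. } \mathrm{tr}(\Omega) = 1
```

The trace term couples tasks through $\Omega$ (related tasks get similar parameter vectors), while $R(W)$ carries the structured-feature regularization; both pieces match the two ingredients the abstract names, but the exact form here is a sketch, not the paper's formulation.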

16.
To make human–computer interaction more natural and friendly, computers must be able to understand human affective states the same way humans do. People express their feelings through many modalities, such as the face, body gestures, and speech. In this study, we simulate human perception of emotion by combining emotion-related information from facial expression and speech. The speech emotion recognition system is based on prosody features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound); facial expression recognition is based on the integrated time motion image and the quantized image matrix, which can be seen as an extension of temporal templates. Experimental results show that using the hybrid features with decision-level fusion improves on the unimodal systems: the recognition rate improves by about 15% with respect to the speech unimodal system and by about 30% with respect to the facial expression system. Using the proposed multi-classifier system, an improved hybrid system, the recognition rate increases by up to 7.5% over the hybrid features with decision-level fusion with RBF, by up to 22.7% over the speech-based system, and by up to 38% over the facial-expression-based system.

17.
Current computational techniques for emotion recognition have successfully associated emotional changes with EEG signals, so emotions can be identified and classified from EEG signals if appropriate stimuli are applied. However, automatic recognition is usually restricted to a small number of emotion classes, mainly due to the signal's features and noise, EEG constraints, and subject-dependent issues. To address these issues, this paper proposes a novel feature-based emotion recognition model for EEG-based brain–computer interfaces. Unlike other approaches, our method explores a wider set of emotion types and incorporates additional features relevant to signal pre-processing and classification, based on a dimensional model of emotions: Valence and Arousal. It aims to improve the accuracy of the emotion classification task by combining mutual-information-based feature selection methods with kernel classifiers. Experiments with this approach on standard EEG datasets show its promise compared with state-of-the-art computational methods.

18.
A definition of speaker distribution features based on a public codebook is given, and a speaker recognition algorithm based on distribution-feature statistics is proposed. A public codebook is built from the training speech of all reference speakers, partitioning the speech feature space; each reference speaker is then modeled by the statistical distribution of that speaker's training speech over the public codewords. During recognition, a pairwise sequence alignment method matches the distribution statistics of the test speech against the reference speaker models to identify the speaker. Experiments show that the method further improves the speed of VQ-based speaker recognition while maintaining the recognition rate.
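The shared-codebook idea can be sketched as follows: cluster the pooled frames once, then describe each speaker by a normalized histogram over the codewords. (The abstract's sequence-alignment matching is simplified here to histogram comparison; all names and the codebook size are assumptions.)

```python
import numpy as np

def build_codebook(all_frames, k=8, iters=15, seed=0):
    """Public codebook: k-means over frames pooled from every reference
    speaker, partitioning the feature space once for all speakers."""
    rng = np.random.default_rng(seed)
    cb = all_frames[rng.choice(len(all_frames), k, replace=False)].copy()
    for _ in range(iters):
        d = ((all_frames[:, None] - cb[None]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            if np.any(lab == j):
                cb[j] = all_frames[lab == j].mean(0)
    return cb

def distribution_feature(frames, codebook):
    """A speaker's model: how that speaker's frames distribute over the
    shared codewords (a normalized codeword-usage histogram)."""
    d = ((frames[:, None] - codebook[None]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

Identification then reduces to comparing a test utterance's histogram against each reference model, which is much cheaper than per-speaker codebook distortion scoring.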

19.
Over the past few decades, biometric recognition has firmly established itself as an area of tremendous potential for scientific discovery and for advancing state-of-the-art research in the security domain. Hardly a single area of IT is left untouched by increased vulnerabilities, online scams, e-fraud, illegal activities, and even pranks in virtual worlds. In parallel with biometric development, which moved from single biometric recognition methods to multi-modal information fusion, another rising area of research is virtual-world security and avatar recognition. This article explores the links between multi-biometric system architecture and face recognition in virtual worlds, and proposes a methodology that can benefit both applications.

20.
As a graph-analysis method, the fuzzy cognitive map (FCM) has been applied to data classification. To improve its classification accuracy in speech emotion recognition, fusion-based FCM methods are proposed, covering both feature-level fusion and decision-level fusion. Both schemes are analyzed in detail, and the conventional FCM's numeric outputs are converted into probabilistic outputs, giving first-stage recognition results on a common scale for different features. On this basis, an adaptive-weight decision-level fusion method is proposed that fully accounts for the classifier's differing recognition accuracy on different features. Experiments show that the proposed fusion FCM methods achieve better classification performance than any single feature or single classifier, and greatly reduce the confusion between emotions.
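A minimal sketch of the two steps this abstract names, converting numeric classifier outputs to probabilities and fusing them with accuracy-proportional weights, might look like the following (the softmax conversion and the exact weighting rule are assumptions, not the paper's formulas):

```python
import numpy as np

def to_probabilities(raw_outputs):
    """Convert a classifier's numeric (e.g. FCM activation) outputs to a
    probability-style vector via softmax, putting different features'
    first-stage results on a common scale."""
    z = np.asarray(raw_outputs, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_fusion(outputs_per_feature, accuracies):
    """Decision-level fusion with weights proportional to each
    feature-specific classifier's (assumed known) validation accuracy,
    so more reliable features dominate the fused decision."""
    w = np.asarray(accuracies, dtype=float)
    w = w / w.sum()
    probs = np.array([to_probabilities(o) for o in outputs_per_feature])
    fused = (w[:, None] * probs).sum(axis=0)
    return fused, int(fused.argmax())
```

Shifting the accuracy weights flips which feature's opinion wins when the per-feature classifiers disagree.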


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号