Similar Articles
20 similar articles were found.
1.
Human–human interaction consists of various nonverbal behaviors that are often emotion-related. To establish rapport, it is essential that the listener respond with a reactive emotion that makes sense given the speaker's emotional state. However, human–robot interactions generally fail in this regard because most spoken dialogue systems play only a question-answer role. Aiming for natural conversation, we examine an emotion processing module that consists of a user emotion recognition function and a reactive emotion expression function for a spoken dialogue system, to improve human–robot interaction. For the emotion recognition function, we propose a method that combines valence from prosody and sentiment from text by decision-level fusion, which considerably improves performance. Moreover, this method reduces fatal recognition errors, thereby improving the user experience. For the reactive emotion expression function, the system's emotion is divided into an emotion category and an emotion level, which are predicted using the parameters estimated by the recognition function on the basis of distributions inferred from human–human dialogue data. As a result, the emotion processing module can recognize the user's emotion from his/her speech and express a reactive emotion that matches it. An evaluation with ten participants demonstrated that the system enhanced by this module is effective for conducting natural conversation.
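The recognition function described above combines valence from prosody and sentiment from text by decision-level fusion. A minimal sketch of such fusion is shown here, assuming two upstream classifiers already produce per-class probabilities; the weight `w` and the three-class labels are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def decision_level_fusion(prosody_probs, text_probs, w=0.6):
    """Fuse per-class probabilities from a prosody-based valence classifier
    and a text-based sentiment classifier by weighted averaging.
    `w` weights the prosody branch; (1 - w) weights the text branch."""
    prosody_probs = np.asarray(prosody_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    fused = w * prosody_probs + (1.0 - w) * text_probs
    return fused / fused.sum()          # renormalize to a probability distribution

# Example: 3 valence classes (negative, neutral, positive)
prosody = [0.2, 0.5, 0.3]   # from the acoustic/prosodic model
text    = [0.1, 0.3, 0.6]   # from sentiment analysis of the transcript
fused = decision_level_fusion(prosody, text)
print(fused, "->", ["negative", "neutral", "positive"][int(fused.argmax())])
```

Weighted averaging is only one possible fusion rule; the paper may combine the two streams differently.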

2.
We describe an ontological model for the representation and integration of electroencephalographic (EEG) data and apply it to detect human emotional states. The model (BIO_EMOTION) is an ontology-based context model for emotion recognition and acts as a basis for: (1) modeling users' contexts, including user profiles, EEG data, and situation and environment factors, and (2) supporting reasoning about users' emotional states. Because certain ontological concepts in the EEG domain are ill-defined, we formally represent and store these concepts, their taxonomies, and their high-level representations (i.e., rules) in the model. To evaluate its effectiveness in inferring emotional states, the DEAP dataset is used for model reasoning. Results show that our model reaches an average recognition ratio of 75.19% on valence and 81.74% on arousal for eight participants. As mentioned above, the BIO_EMOTION model acts as a bridge between users' emotional states and low-level bio-signal features. It can be integrated into user modeling techniques and used to model web users' emotional states in the human-centric Web, aiming to provide active, transparent, safe, and reliable services to users. In other words, this work aims to create an ontology-based context model for emotion recognition using EEG. In particular, this model fully implements one iteration of the W2T data cycle: from low-level EEG feature acquisition to emotion recognition. A long-term goal of the study is to extend this model to implement the whole W2T data cycle.

3.
As a new cyber-physical application, emotion recognition has been shown to make human-in-the-loop cyber-physical systems (HilCPS) more efficient and sustainable. Therefore, emotion recognition is of great significance for HilCPS. Electroencephalogram (EEG) signals contain abundant, useful information and can objectively reflect human emotional states. Using machine learning to recognize emotion from EEG signals is currently the main approach. This approach depends on the quantity and quality of samples as well as on the capability of the classification model. However, the quantity of EEG samples is often insufficient, their quality is often irregular, and EEG samples are strongly nonlinear. Therefore, an EEG emotion recognition method based on transfer learning (TL) and an echo state network (ESN) for HilCPS is proposed in this paper. First, a selection algorithm for EEG samples based on average Fréchet distance is proposed to improve sample quality. Second, a feature transfer algorithm for EEG samples based on transfer component analysis is proposed to expand the sample quantity. Third, to address the nonlinearity of EEG samples, a classification model based on an ESN is constructed to accurately classify emotional states. Finally, experimental results show that, compared with traditional methods, the proposed method can expand the quantity of high-quality EEG samples and effectively improve the accuracy of emotion recognition.
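To illustrate the echo state network component, the sketch below drives a fixed random reservoir with one EEG trial and returns the final reservoir state as a fixed-length representation that a trained readout (e.g., ridge regression) could classify. The reservoir size, spectral radius, and channel count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RES, N_IN = 100, 8                       # reservoir size, EEG channels (illustrative)

# Fixed random input and reservoir weights, rescaled to the target spectral radius
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def esn_state(trial):
    """Drive the reservoir with one EEG trial (time_steps x channels) and
    return the final reservoir state as a fixed-length representation."""
    x = np.zeros(N_RES)
    for u in trial:
        x = np.tanh(W_in @ u + W @ x)      # simple (non-leaky) state update
    return x

# Synthetic trial; a readout such as ridge regression would be trained on these states
trial = rng.standard_normal((200, N_IN))
print(esn_state(trial).shape)              # (100,)
```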

4.
Tang Zhichuan, Li Xintao, Xia Dan, Hu Yidan, Zhang Lingtao, Ding Jun. Multimedia Tools and Applications, 2022, 81(5): 7085-7101

Self-assessment methods are widely used in art therapy evaluation, but emotion recognition methods based on physiological-signal features are more objective and accurate. In this study, we proposed an electroencephalogram (EEG)-based art therapy evaluation method that evaluates the therapeutic effect based on the emotional changes before and after art therapy. Twelve participants were recruited in a two-step experiment (an emotion stimulation step and a drawing therapy step), and their EEG signals and self-assessment scores were collected. The self-assessment model (SAM) was used to obtain and label the actual emotional states; a long short-term memory (LSTM) network was used to extract deep temporal features of the EEG to recognize emotions. Furthermore, the classification performances for different sequence lengths, time-window lengths, and frequency-band combinations were compared and analyzed. The results showed that emotion recognition models with LSTM deep temporal features achieved better classification performance than state-of-the-art methods with non-temporal features; the classification accuracies in the high-frequency bands (α, β, and γ) were higher than those in the low-frequency bands (δ and θ); and the highest emotion classification accuracy (93.24%) was obtained with a 10-s sequence length, a 2-s time window, and the five-band frequency combination. Our proposed method can be used for effective and accurate emotion recognition and offers an objective approach for assisting therapists or patients in evaluating the effect of art therapy.
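A minimal PyTorch sketch of an LSTM classifier over windowed EEG features follows; the feature dimension, hidden size, number of classes, and the layout of a 10-s sequence split into 2-s windows are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EEGEmotionLSTM(nn.Module):
    """Minimal LSTM classifier over windowed EEG band-power features.
    Input: (batch, time_steps, n_features); output: per-class logits."""
    def __init__(self, n_features=32, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        _, (h, _) = self.lstm(x)       # h: (1, batch, hidden), summary of the sequence
        return self.fc(h[-1])          # classify from the final hidden state

# Hypothetical shapes: a 10-s sequence split into 2-s windows -> 5 time steps
batch = torch.randn(8, 5, 32)          # 8 trials, 5 windows, 32 band-power features
model = EEGEmotionLSTM()
print(model(batch).shape)              # torch.Size([8, 4])
```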


5.
With the essential demand for human emotional behavior understanding and human–machine interaction in recent electronic applications, speaker emotion recognition is a key component that has attracted a great deal of attention among researchers. Even though a handful of works on speaker emotion classification are available in the literature, important challenges such as distinguishing distinct emotions, handling low-quality recordings, and dealing with speaker-independent affective states still need to be addressed with a good classifier and discriminative features. Accordingly, a new classifier, called the fractional deep belief network (FDBN), is developed by combining a deep belief network (DBN) with fractional calculus. This new classifier is trained with multiple features, such as tonal power ratio, spectral flux, pitch chroma, and Mel-frequency cepstral coefficients (MFCC), to make the emotional classes more separable through their spectral characteristics. The proposed FDBN classifier with integrated feature vectors is tested on two databases: the Berlin database of emotional speech and a real-time Telugu database. The performance of the proposed FDBN and the existing DBN classifiers is validated using False Acceptance Rate (FAR), False Rejection Rate (FRR), and accuracy. The experimental results show that the proposed FDBN achieves accuracies of 98.39% and 95.88% on the Berlin and Telugu databases, respectively.
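The feature set named in this abstract (MFCC, pitch chroma, spectral flux) can be approximated with librosa as sketched below; the tonal power ratio is omitted, and the sampling rate, coefficient counts, and mean-pooling are illustrative assumptions rather than the paper's exact front end.

```python
import numpy as np
import librosa

def emotion_features(wav_path, sr=16000):
    """Extract utterance-level spectral features (MFCC, pitch chroma,
    spectral flux) and return a single fixed-length vector."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # (13, frames)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)            # (12, frames)
    S = np.abs(librosa.stft(y))
    flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))     # frame-to-frame spectral flux
    # Utterance-level statistics (means) as a simple fixed-length descriptor
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1), [flux.mean()]])

# feats = emotion_features("utterance.wav")   # vector that a DBN-style classifier could consume
```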

6.
Recognition of emotion in speech has recently matured into one of the key disciplines in speech analysis, serving next-generation human–machine interaction and communication. However, unlike automatic speech recognition, emotion recognition from an isolated word or phrase is inappropriate for conversation, because a complete emotional expression may stretch across several sentences and may end on any word in the dialogue. In this paper, we present a segment-based emotion recognition approach for continuous Mandarin Chinese speech. In this approach, the unit of recognition is not a phrase or a sentence but an emotional expression in dialogue. To that end, the following procedures are presented. First, we evaluate the performance of several classifiers in short-sentence speech emotion recognition architectures. The experimental results show that the WD-KNN classifier achieves the best accuracy for five-class emotion recognition among the five classification techniques. We then implement a continuous Mandarin Chinese speech emotion recognition system, based on WD-KNN, with an emotion radar chart; this system can represent the intensity of each emotion component in speech. The proposed approach shows how emotions can be recognized from speech signals and, in turn, how emotional states can be visualized.
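One plausible reading of a weighted-distance KNN (WD-KNN) is a rank-weighted nearest-neighbour vote, sketched below; the abstract does not specify the exact weighting scheme, so the weights, feature dimensionality, and toy data here are assumptions.

```python
import numpy as np

def wd_knn_predict(X_train, y_train, x, k=5):
    """Rank-weighted K-nearest-neighbour vote: the j-th closest neighbour
    contributes weight (k - j + 1). One possible WD-KNN variant, not
    necessarily the paper's exact formulation."""
    d = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(d)[:k]
    weights = np.arange(k, 0, -1)                 # k, k-1, ..., 1
    scores = {}
    for idx, w in zip(order, weights):
        scores[y_train[idx]] = scores.get(y_train[idx], 0) + w
    return max(scores, key=scores.get)

# Toy example with 2-D prosodic features and 3 emotion labels
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
y = rng.integers(0, 3, 30)
print(wd_knn_predict(X, y, np.array([0.1, -0.2])))
```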

7.
Emotion recognition using physiological signals has gained momentum in the field of human–computer interaction. This work focuses on developing a user-independent emotion recognition system that classifies five emotions (happiness, sadness, fear, surprise, and disgust) and the neutral state. The various stages, such as design of the emotion elicitation protocol, data acquisition, pre-processing, feature extraction, and classification, are discussed. Emotional data were obtained from 30 undergraduate students using emotional video clips. Power and entropy features were obtained in three ways: by decomposing and reconstructing the signal using empirical mode decomposition, by using a Hilbert–Huang transform, and by applying a discrete Fourier transform to the intrinsic mode functions (IMFs). Statistical analysis using analysis of variance indicates significant differences among the six emotional states (p < 0.001). Classification results indicate that applying the discrete Fourier transform to the IMFs, instead of the Hilbert transform, provides comparatively better accuracy for all six classes, with an overall accuracy of 52%. Although the accuracy is modest, it demonstrates the possibility of developing a system that could identify the six emotional states in a user-independent manner using electrocardiogram signals. The accuracy of the system can be improved by investigating the power and entropy of the individual IMFs.
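The power and spectral entropy of an individual IMF via the DFT, as used in this abstract, can be computed as sketched below; the synthetic IMF stands in for one EMD component of an ECG segment and is purely illustrative.

```python
import numpy as np

def power_and_spectral_entropy(imf):
    """Compute the signal power and the spectral (Shannon) entropy of one
    intrinsic mode function via the discrete Fourier transform."""
    power = np.mean(imf ** 2)
    spectrum = np.abs(np.fft.rfft(imf)) ** 2
    p = spectrum / spectrum.sum()                 # normalized spectral distribution
    entropy = -np.sum(p * np.log2(p + 1e-12))     # Shannon entropy in bits
    return power, entropy

# Synthetic IMF standing in for one EMD component of an ECG segment
t = np.linspace(0, 1, 1000)
imf = np.sin(2 * np.pi * 8 * t) * np.hanning(1000)
print(power_and_spectral_entropy(imf))
```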

8.
The speech signal carries linguistic information as well as paralinguistic information such as emotion. Modern automatic speech recognition systems have achieved high performance on neutral-style speech, but they cannot maintain their high recognition rate for spontaneous speech. Emotion recognition is therefore an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural–support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared with Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are also selected using the analysis of variations method. The proposed modular scheme is achieved through a comparative study of different features and the characteristics of each individual emotional state, with the aim of improving recognition performance. Empirical results show that, even after discarding 22% of the features, the average emotion recognition accuracy can be improved by 2.2%. The proposed modular neural–SVM classifier also improves recognition accuracy by at least 8% compared with the simulated monolithic classifiers.
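A minimal sketch of variance-based feature selection feeding an SVM is shown below using scikit-learn's ANOVA F-test; the synthetic data, class count, and the choice to keep roughly 78% of the features (mirroring the "discard 22%" figure) are illustrative assumptions, and the paper's actual feature analysis and modular architecture may differ.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for utterance-level acoustic features and emotion labels
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))            # 200 utterances, 50 features
y = rng.integers(0, 4, 200)                   # 4 emotion classes

# Keep the 39 most discriminative features (~78%), then classify with an SVM
pipe = make_pipeline(SelectKBest(f_classif, k=39), SVC(kernel="rbf"))
print(cross_val_score(pipe, X, y, cv=5).mean())
```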

9.
Advanced Robotics, 2013, 27(1-2): 47-67
Depending on the emotion of speech, the meaning of the speech or the intention of the speaker differs. Therefore, speech emotion recognition, as well as automatic speech recognition, is necessary for precise communication between humans and robots in human–robot interaction. In this paper, a novel feature extraction method is proposed for speech emotion recognition using separation of phoneme classes. In feature extraction, the signal variation caused by different sentences usually overrides the emotion variation and lowers the performance of emotion recognition. However, because the proposed method extracts features from the parts of speech that correspond to limited ranges of the spectral center of gravity (CoG) and formant frequencies, the effects of phoneme variation on the features are reduced. Based on the range of the CoG, obstruent sounds are discriminated from sonorant sounds. Moreover, the sonorant sounds are categorized into four classes by the resonance characteristics revealed by the formant frequencies. The results show that the proposed method, using corpora from 30 different speakers, improves emotion recognition accuracy compared with other methods at the 99% significance level. Furthermore, the proposed method was applied to extract several features, including prosodic and phonetic features, and was implemented on 'Mung' robots as an emotion recognizer for users.
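The spectral center of gravity (CoG) used above to separate obstruent from sonorant regions can be computed per frame as sketched below; the sampling rate, frame length, windowing, and any threshold for the obstruent/sonorant split are illustrative assumptions.

```python
import numpy as np

def spectral_center_of_gravity(frame, sr=16000):
    """Center of gravity (spectral centroid) of one windowed speech frame,
    usable for separating obstruent from sonorant regions by thresholding."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

# Synthetic 25-ms frames: a vowel-like 200 Hz tone has a low CoG,
# while white noise (fricative-like) has a much higher CoG.
t = np.arange(int(0.025 * 16000)) / 16000
print(spectral_center_of_gravity(np.sin(2 * np.pi * 200 * t)))                         # low CoG
print(spectral_center_of_gravity(np.random.default_rng(0).standard_normal(len(t))))    # high CoG
```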

10.
Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech emotion recognition (SER) is one of the critical sources for human evaluation and is applicable in many real-world applications such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that uses speech signals and a spatial–temporal convolution network to recognize the speaker's emotional state. The authors designed a temporal convolutional network (TCN) core block to capture long-term dependencies in speech signals and then feed these temporal cues to a dense network that fuses the spatial features and recognizes global information for the final classification. The proposed network automatically extracts valid sequential cues from speech signals and performs better than state-of-the-art (SOTA) and traditional machine learning algorithms. Results show a high recognition rate compared with SOTA methods. Final unweighted accuracies of 80.84% and 92.31% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin emotional (EMO-DB) datasets indicate the robustness and efficiency of the designed model.
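A generic TCN building block, one dilated causal convolution with a residual connection, is sketched below in PyTorch to illustrate how such a core block captures long-term dependencies; the channel count, kernel size, and dilation are assumptions, and this is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal convolution block with a residual connection,
    in the spirit of a generic TCN core block."""
    def __init__(self, channels=64, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding keeps the block causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                                # x: (batch, channels, time)
        y = nn.functional.pad(x, (self.pad, 0))          # pad only on the left (past side)
        return self.act(self.conv(y)) + x                # residual connection

# Hypothetical input: 64 spectral feature channels over 100 frames
x = torch.randn(4, 64, 100)
print(TCNBlock()(x).shape)                               # torch.Size([4, 64, 100])
```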

11.
Functional paralanguage contains considerable emotion information and is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method that combines functional paralanguage features with the accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. With this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database including six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is used to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for the six emotions is over 67% under the speaker-independent condition when using the functional paralanguage features.

12.
During the COVID-19 pandemic, many individuals around the world suffered from suicidal ideation. Social distancing and quarantining affect patients emotionally. Affective computing is the study of recognizing human feelings and emotions. This technology can be used effectively during a pandemic for facial expression recognition, which automatically extracts features from the human face. A monitoring system plays an important role in detecting the patient's condition and recognizing expression patterns from a safe distance. In this paper, a new method is proposed for emotion recognition and suicidal ideation detection in COVID patients. It helps alert the nurse when the patient's emotion is fear, crying, or sadness. The research presented in this paper introduces image processing technology for the emotional analysis of patients using a machine learning algorithm. The proposed convolutional neural network (CNN) architecture with DnCNN preprocessing enhances recognition performance. The system can analyze the mood of patients either in real time or from video files captured by CCTV cameras. The proposed method achieves higher accuracy than other methods. It detects the likelihood of a suicide attempt based on the stress level and the recognized emotion.

13.
The ability to recognize emotion is one of the hallmarks of emotional intelligence, an aspect of human intelligence that has been argued to be even more important than mathematical and verbal intelligences. This paper proposes that machine intelligence needs to include emotional intelligence and demonstrates results toward this goal: developing a machine's ability to recognize the human affective state given four physiological signals. We describe difficult issues unique to obtaining reliable affective data and collect a large set of data from a subject trying to elicit and experience each of eight emotional states, daily, over multiple weeks. This paper presents and compares multiple algorithms for feature-based recognition of emotional state from these data. We analyze four physiological signals that exhibit problematic day-to-day variations: the features of different emotions on the same day tend to cluster more tightly than do the features of the same emotion on different days. To handle the daily variations, we propose new features and algorithms and compare their performance. We find that seeding a Fisher projection with the results of sequential floating forward search improves the performance of the Fisher projection and provides the highest recognition rates reported to date for classification of affect from physiology: 81 percent recognition accuracy on eight classes of emotion, including neutral.
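A rough scikit-learn sketch of this idea, pairing a forward feature search with a Fisher (LDA) projection, is shown below; plain forward selection stands in for sequential floating forward search, and the synthetic data, feature counts, and classifier are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for daily physiological feature vectors and 8 emotion labels
rng = np.random.default_rng(0)
X = rng.standard_normal((160, 40))
y = rng.integers(0, 8, 160)

# Forward selection picks a feature subset, which then "seeds" the Fisher (LDA)
# projection; a nearest-neighbour classifier operates in the projected space.
selector = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=10,
                                     direction="forward", cv=3)
pipe = make_pipeline(selector, LinearDiscriminantAnalysis(), KNeighborsClassifier())
print(cross_val_score(pipe, X, y, cv=3).mean())
```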

14.
To make human–computer interaction more natural and friendly, computers must be able to understand humans' affective states the same way humans do. There are many modalities, such as the face, body gestures, and speech, that people use to express their feelings. In this study, we simulate human perception of emotion by combining emotion-related information from facial expressions and speech. The speech emotion recognition system is based on prosody features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound), and the facial expression recognition is based on an integrated time motion image and a quantized image matrix, which can be seen as an extension of temporal templates. Experimental results showed that using the hybrid features and decision-level fusion improves on the unimodal systems. This method can improve the recognition rate by about 15% with respect to the speech unimodal system and by about 30% with respect to the facial expression system. Using the proposed multi-classifier system, an improved hybrid system, the recognition rate increases by up to 7.5% over the hybrid features with decision-level fusion and RBF, by up to 22.7% over the speech-based system, and by up to 38% over the facial expression-based system.

15.
Human emotion recognition using brain signals is an active research topic in the field of affective computing. Music is considered a powerful tool for arousing emotions in human beings. This study recognized happy, sad, love, and anger emotions in response to audio music tracks from the electronic, rap, metal, rock, and hip-hop genres. Participants were asked to listen to 1-min audio music tracks for each genre in a noise-free environment. The main objectives of this study were to determine the effect of different genres of music on human emotions and to identify the age group that is most responsive to music. Thirty men and women from three age groups (15–25 years, 26–35 years, and 36–50 years) underwent the experiment, which also included self-reported emotional states after listening to each type of music. Features from three domains (time, frequency, and wavelet) were extracted from the recorded EEG signals and were then used by the classifier to recognize human emotions. The results show that an MLP gives the best accuracy for recognizing human emotion in response to audio music tracks using hybrid features of brain signals. It was also observed that the rock and rap genres generated happy and sad emotions, respectively, in the subjects under study. The brain signals of the 26–35 year age group gave the best emotion recognition accuracy relative to the self-reported emotions.

16.
To obtain richer emotional information and effectively recognize the emotional state of long utterances, a speech emotion recognition method based on multi-granularity segment fusion with Dempster–Shafer (D-S) evidence theory is proposed. Speech samples are divided using two segmentation methods, each segment is recognized with an SVM, and D-S evidence theory is then used to perform decision-level fusion of the per-segment recognition results, yielding an emotion recognition result for each of the two segmentation methods; these two results are further fused to obtain the final result. Experimental results show that the method achieves good overall recognition performance and effectively improves the speech emotion recognition rate.
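The D-S fusion step can be illustrated with Dempster's combination rule over singleton emotion hypotheses, as sketched below; restricting the frame to singletons, the conflict handling, and the toy mass values are simplifying assumptions, not the paper's full evidence model.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Combine two mass functions over the same singleton emotion classes with
    Dempster's rule, normalizing away the conflicting mass."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    joint = np.outer(m1, m2)
    agreement = np.diag(joint).sum()              # mass where both sources agree
    if agreement <= 0:                            # total conflict: fall back to source 1
        return m1
    return np.diag(joint) / agreement             # normalized combined masses

# Per-scheme SVM outputs for 4 emotions (illustrative values)
m_scheme_a = [0.5, 0.2, 0.2, 0.1]
m_scheme_b = [0.6, 0.1, 0.2, 0.1]
fused = dempster_combine(m_scheme_a, m_scheme_b)
print(fused, "->", int(fused.argmax()))
```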

17.
Multi-modal affective data such as EEG and physiological signals are increasingly utilized to analyze human emotional states. However, due to the noise in collected affective data, the performance of emotion recognition is still not satisfactory. In fact, emotion recognition can be regarded as channel coding, which focuses on reliable communication through noisy channels. Using the affective data and their labels, redundant codewords are generated to correct signal noise and recover the emotional label information. We therefore use a multi-label output-codes method to improve the accuracy and robustness of multi-dimensional emotion recognition by training a redundant codeword model, following the idea of error-correcting output codes. Experimental results on the DEAP dataset show that the multi-label output-codes method outperforms traditional machine learning and pattern recognition methods for the prediction of emotional multi-labels.
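The underlying error-correcting output-codes idea can be demonstrated with scikit-learn's OutputCodeClassifier, as sketched below; this is a single-label illustration on synthetic data (the classifier, code size, and features are assumptions) rather than the paper's multi-label formulation.

```python
import numpy as np
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for EEG/physiological feature vectors with 4 emotion classes
rng = np.random.default_rng(0)
X = rng.standard_normal((240, 32))
y = rng.integers(0, 4, 240)

# Each class is assigned a redundant binary codeword (code_size > 1 adds redundancy);
# noisy per-bit predictions are decoded to the nearest codeword.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=3, random_state=0)
print(cross_val_score(ecoc, X, y, cv=3).mean())
```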

18.
Speech signals and glottal signals convey the speaker's emotional state along with linguistic information. Recognizing speakers' emotions and responding to them expressively is very important for human–machine interaction. This work proposes a subject-independent speech emotion/stress recognition system that identifies a speaker's emotion from their voice using features from the OpenSmile toolbox, higher-order spectral features, and a feature selection algorithm. Feature selection plays an important role in overcoming the challenge of dimensionality in many applications. This paper proposes a new particle swarm optimization-assisted biogeography-based algorithm for feature selection. Simulations were conducted using the Berlin Emotional Speech database (BES), the Surrey Audio-Visual Expressed Emotion database (SAVEE), and Speech under Simulated and Actual Stress (SUSAS), and were also validated using eight benchmark datasets of different dimensions and class counts. In total, eight different experiments were conducted, yielding recognition rates in the ranges of 90.31%–99.47% (BES), 62.50%–78.44% (SAVEE), and 85.83%–98.70% (SUSAS). The obtained results convincingly demonstrate the effectiveness of the proposed feature selection algorithm compared with previous works and other metaheuristic algorithms (BBO and PSO).

19.
For the problem of Mandarin Chinese speech emotion recognition, a recognition method based on a pulse-coupled neural network (PCNN) is proposed. The method converts speech into a spectrogram and feeds it into the PCNN; the neuron firing sequence of the output image and its entropy sequence are taken as speech emotion features, which are then used to recognize the emotion. Experimental results show that the method can effectively distinguish the two emotions "happy" and "neutral". This work introduces the PCNN into applied research on speech emotion recognition, opening up a new area of combined speech and image signal processing, and has practical significance for both the theoretical study and the practical application of PCNNs.

20.
Extracting and understanding emotion is highly important for interaction between humans and machine communication systems. The most expressive way humans display emotion is through facial expressions. This paper proposes a multiple emotion recognition system that can recognize combinations of up to three different emotions using an active appearance model (AAM), the proposed classification standard, and a k-nearest neighbor (k-NN) classifier in mobile environments. The AAM captures expression variations, which are calculated by the proposed classification standard according to changes in human expressions in real time. The proposed k-NN can classify the basic emotions (normal, happy, sad, angry, surprise) as well as more ambiguous emotions obtained by combining the basic emotions in real time, and each recognized emotion that can be subdivided has a strength value. Whereas most previous emotion recognition methods recognize only a single emotion at a time, this paper recognizes various emotions as combinations of the five basic emotions. For ease of understanding, the recognized result is presented in three ways on a mobile camera screen. The experiment yielded an average recognition rate of 85%, and 40% of the recognized results showed optimized (combined) emotions. The implemented system can also serve as an example of augmented reality, displaying a combination of real face video and virtual animation with the user's avatar.
