Similar Literature
20 similar documents found
1.
This work studies the relationship between the dimensional space model of emotion and acoustic features of speech, together with methods for automatic speech emotion recognition. The dimensional space model of basic emotions is introduced, emotional features corresponding to arousal and valence are extracted, and global statistical features are used to reduce the influence of textual differences on the emotional features. Recognition of the emotional states anger, happiness, sadness, and calm is studied: Gaussian mixture models (GMMs) are used to model the four basic emotions, and the optimal mixture order of the GMM is determined experimentally, yielding a good fit of the probability distributions of the four emotions in the feature space. Experimental results show that the selected speech features are suitable for recognizing the basic emotion categories, that GMMs model the emotions effectively, and that within the two-dimensional emotion space the features along the valence dimension play an important role in speech emotion recognition.
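A minimal sketch of the GMM-based classification stage described above, using scikit-learn: one Gaussian mixture model is fit per emotion, and a sample is assigned to the emotion whose model yields the highest log-likelihood. The feature dimensionality, mixture order, and random placeholder data are illustrative assumptions; acoustic feature extraction is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["angry", "happy", "sad", "calm"]
N_COMPONENTS = 8  # the "optimal mixture order" would be tuned experimentally

def train_gmms(features_by_emotion):
    """Fit one GMM per emotion on that emotion's feature vectors."""
    models = {}
    for emo, X in features_by_emotion.items():
        gmm = GaussianMixture(n_components=N_COMPONENTS, covariance_type="diag")
        gmm.fit(X)
        models[emo] = gmm
    return models

def classify(models, x):
    """Assign the emotion whose GMM gives the highest log-likelihood."""
    scores = {emo: gmm.score(x.reshape(1, -1)) for emo, gmm in models.items()}
    return max(scores, key=scores.get)

# Toy usage with random placeholder features (39-dim):
rng = np.random.default_rng(0)
data = {emo: rng.normal(i, 1.0, size=(100, 39)) for i, emo in enumerate(EMOTIONS)}
models = train_gmms(data)
print(classify(models, rng.normal(2, 1.0, size=39)))  # likely "sad"
```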

2.
The popularity of computer games has exploded in recent years, yet methods of evaluating user emotional state during play experiences lag far behind. There are few methods of assessing emotional state, and even fewer methods of quantifying emotion during play. This paper presents a novel method for continuously modeling emotion using physiological data. A fuzzy logic model transformed four physiological signals into arousal and valence; a second fuzzy logic model transformed arousal and valence into five emotional states relevant to computer game play: boredom, challenge, excitement, frustration, and fun. Modeled emotions compared favorably with a manual approach, and their mean values were also evaluated against subjective self-reports, exhibiting the same trends as reported emotions for fun, boredom, and excitement. This approach provides a method for quantifying emotional states continuously during a play experience.
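A hand-rolled sketch of the second fuzzy stage described above: arousal and valence (normalized to [0, 1]) are mapped onto fuzzy memberships for the five play-related emotions. The triangular membership shapes and the rule set are illustrative assumptions, not the paper's calibrated model.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def emotion_memberships(arousal, valence):
    low_a, high_a = tri(arousal, -0.5, 0.0, 0.6), tri(arousal, 0.4, 1.0, 1.5)
    neg_v, pos_v = tri(valence, -0.5, 0.0, 0.6), tri(valence, 0.4, 1.0, 1.5)
    # Each rule: emotion strength = min of its antecedent memberships.
    return {
        "boredom":     min(low_a, neg_v),
        "frustration": min(high_a, neg_v),
        "challenge":   min(high_a, tri(valence, 0.2, 0.5, 0.8)),
        "excitement":  min(high_a, pos_v),
        "fun":         min(tri(arousal, 0.2, 0.5, 0.8), pos_v),
    }

print(emotion_memberships(arousal=0.9, valence=0.8))  # excitement dominates
```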

3.
Emotion recognition is a crucial application in human–computer interaction. It is usually conducted using facial expressions as the main modality, which might not be reliable. In this study, we proposed a multimodal approach that uses 2-channel electroencephalography (EEG) signals and eye modality in addition to the face modality to enhance the recognition performance. We also studied the use of facial images versus facial depth as the face modality and adapted the common arousal–valence model of emotions and the convolutional neural network, which can model the spatiotemporal information from the modality data for emotion recognition. Extensive experiments were conducted on the modality and emotion data, the results of which showed that our system has high accuracies of 67.8% and 77.0% in valence recognition and arousal recognition, respectively. The proposed method outperformed most state-of-the-art systems that use similar but fewer modalities. Moreover, the use of facial depth has outperformed the use of facial images. The proposed method of emotion recognition has significant potential for integration into various educational applications.

4.
The work presented in this paper aims at assessing human emotions using peripheral as well as electroencephalographic (EEG) physiological signals on short-time periods. Three specific areas of the valence–arousal emotional space are defined, corresponding to negatively excited, positively excited, and calm-neutral states. An acquisition protocol based on the recall of past emotional life episodes was designed to acquire data from both peripheral and EEG signals. Pattern classification is used to distinguish between the three areas of the valence–arousal space. The performance of several classifiers was evaluated on 10 participants and different feature sets: peripheral features, EEG time–frequency features, and EEG pairwise mutual information (MI) features. Comparison of the results obtained using either peripheral or EEG signals confirms the value of EEG for assessing valence and arousal in emotion recall conditions. The obtained accuracy for the three emotional classes is 63% using EEG time–frequency features, which is better than the results of previous studies using EEG and similar classes. Fusion of the different feature sets at the decision level using a summation rule was also shown to improve accuracy, to 70%. Furthermore, rejection of non-confident samples finally led to a classification accuracy of 80% for the three classes.
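A short sketch of the decision-level fusion and rejection scheme described above: per-feature-set classifiers output class posteriors, which are fused with a summation rule, and samples whose fused confidence falls below a threshold are rejected. The classifier choice, threshold value, and placeholder data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_and_reject(prob_list, threshold=0.5):
    """Sum posteriors across classifiers; flag non-confident samples for rejection."""
    fused = np.sum(prob_list, axis=0) / len(prob_list)  # average = scaled sum
    labels = fused.argmax(axis=1)
    confident = fused.max(axis=1) >= threshold
    return labels, confident

# Toy usage: three feature sets (peripheral, EEG time-frequency, EEG MI),
# three classes (negatively excited, positively excited, calm-neutral).
rng = np.random.default_rng(1)
y = rng.integers(0, 3, 90)
feature_sets = [rng.normal(y[:, None], 2.0, size=(90, 8)) for _ in range(3)]
probs = [LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
         for X in feature_sets]
labels, keep = fuse_and_reject(probs, threshold=0.6)
print(f"accepted {keep.mean():.0%}, accuracy on accepted "
      f"{(labels[keep] == y[keep]).mean():.0%}")
```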

5.

Human–robot interaction has typically relied on estimating human emotions from facial expressions, voice, and gestures, with emotions treated as discrete categories. In this study, facial images from common datasets are instead mapped to continuous emotions: linear regression quantifies emotions numerically as valence and arousal, placing the raw images on the two corresponding coordinate axes. Face images from the Japanese Female Facial Expression (JAFFE) dataset and the extended Cohn–Kanade (CK+) dataset were used, with the emotions for both datasets rated by 85 participants. The best result from a series of experiments achieved a minimum root mean square error on the JAFFE dataset of 0.1661 for valence and 0.1379 for arousal. Compared with previous methods based on stimuli such as songs and sentences, the proposed method showed outstanding emotion estimation performance when tested on common datasets.
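A minimal sketch of the regression setup described above: flattened face images regressed onto valence and arousal with ordinary linear regression, evaluated by RMSE. The image size and random placeholder data are assumptions; JAFFE/CK+ loading and annotation averaging are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))            # placeholder 32x32 grayscale faces, flattened
y = rng.uniform(-1, 1, size=(200, 2))     # columns: valence, arousal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = np.sqrt(((pred - y_te) ** 2).mean(axis=0))
print(f"RMSE valence={rmse[0]:.4f}, arousal={rmse[1]:.4f}")
```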


6.
Whang MC, Lim JS, Boucsein W. Human Factors, 2003, 45(4): 623-634
Despite rapid advances in technology, computers remain incapable of responding to human emotions. An exploratory study was conducted to find out what physiological parameters might be useful to differentiate among 4 emotional states, based on 2 dimensions: pleasantness versus unpleasantness and arousal versus relaxation. The 4 emotions were induced by exposing 26 undergraduate students to different combinations of olfactory and auditory stimuli, selected in a pretest from 12 stimuli by subjective ratings of arousal and valence. Changes in electroencephalographic (EEG), heart rate variability, and electrodermal measures were used to differentiate the 4 emotions. EEG activity separates pleasantness from unpleasantness only in the aroused but not in the relaxed domain, where electrodermal parameters are the differentiating ones. All three classes of parameters contribute to a separation between arousal and relaxation in the positive valence domain, whereas the latency of the electrodermal response is the only differentiating parameter in the negative domain. We discuss how such a psychophysiological approach may be incorporated into a systemic model of a computer responsive to affective communication from the user.

7.
Visual complexity is an apparent feature in website design yet its effects on cognitive and emotional processing are not well understood. The current study examined website complexity within the framework of aesthetic theory and psychophysiological research on cognition and emotion. We hypothesized that increasing the complexity of websites would have a detrimental cognitive and emotional impact on users. In a passive viewing task (PVT) 36 website screenshots differing in their degree of complexity (operationalized by JPEG file size; correlation with complexity ratings in a preliminary study r=.80) were presented to 48 participants in randomized order. Additionally, a standardized visual search task (VST) assessing reaction times, and a one-week-delayed recognition task on these websites were conducted, and participants rated all websites for arousal and valence. Psychophysiological responses were assessed during the PVT and VST. Visual complexity was related to increased experienced arousal, more negative valence appraisal, decreased heart rate, and increased facial muscle tension (musculus corrugator). Visual complexity resulted in increased reaction times in the VST and decreased recognition rates. Reaction times in the VST were related to increases in heart rate and electrodermal activity. These findings demonstrate that visual complexity of websites has multiple effects on human cognition and emotion, including experienced pleasure and arousal, facial expression, autonomic nervous system activation, task performance, and memory. It should thus be considered an important factor in website design.

8.
Different physiological signals have different origins and may reflect different functions of the human body. This paper studied respiration (RSP) signals alone to determine their ability to detect psychological activity. A deep learning framework is proposed to extract and recognize the emotional information carried by respiration. Arousal-valence theory is used to recognize emotions by mapping them into a two-dimensional space. The framework comprises a sparse auto-encoder (SAE) to extract emotion-related features and two logistic regression classifiers, one for arousal and one for valence. The models were established on an international emotion-classification database, the Dataset for Emotion Analysis using Physiological signals (DEAP), and then further evaluated on other subjects using the affect database established by Augsburg University in Germany. The accuracies for valence and arousal classification on DEAP are 73.06% and 80.78% respectively, and the mean accuracy on the Augsburg dataset is 80.22%. This study demonstrates the potential of using respiration collected from wearable devices to recognize human emotions.
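A sketch of the pipeline described above: a sparse auto-encoder learns features from respiration segments, then two logistic regression classifiers predict arousal and valence separately. The layer sizes, the L1 sparsity penalty, and the random placeholder data are assumptions; DEAP preprocessing is not shown.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

class SparseAE(nn.Module):
    def __init__(self, n_in=256, n_hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = self.enc(x)
        return self.dec(h), h

X = torch.rand(400, 256)                      # placeholder RSP segments
ae = SparseAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):
    recon, h = ae(X)
    # Reconstruction loss plus an L1 penalty on activations enforces sparsity.
    loss = nn.functional.mse_loss(recon, X) + 1e-3 * h.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    feats = ae.enc(X).numpy()
y_arousal = torch.randint(0, 2, (400,)).numpy()   # placeholder binary labels
y_valence = torch.randint(0, 2, (400,)).numpy()
clf_arousal = LogisticRegression(max_iter=1000).fit(feats, y_arousal)
clf_valence = LogisticRegression(max_iter=1000).fit(feats, y_valence)
print(clf_arousal.score(feats, y_arousal), clf_valence.score(feats, y_valence))
```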

9.
Automatically recognizing human emotions from spontaneous and non-prototypical real-life data is currently one of the most challenging tasks in the field of affective computing. This article presents our recent advances in assessing dimensional representations of emotion, such as arousal, expectation, power, and valence, in an audiovisual human–computer interaction scenario. Building on previous studies which demonstrate that long-range context modeling tends to increase accuracies of emotion recognition, we propose a fully automatic audiovisual recognition approach based on Long Short-Term Memory (LSTM) modeling of word-level audio and video features. LSTM networks are able to incorporate knowledge about how emotions typically evolve over time so that the inferred emotion estimates are produced under consideration of an optimal amount of context. Extensive evaluations on the Audiovisual Sub-Challenge of the 2011 Audio/Visual Emotion Challenge show how acoustic, linguistic, and visual features contribute to the recognition of different affective dimensions as annotated in the SEMAINE database. We apply the same acoustic features as used in the challenge baseline system whereas visual features are computed via a novel facial movement feature extractor. Comparing our results with the recognition scores of all Audiovisual Sub-Challenge participants, we find that the proposed LSTM-based technique leads to the best average recognition performance that has been reported for this task so far.
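A minimal PyTorch sketch of LSTM-based regression over word-level feature sequences, in the spirit of the approach described above: the network emits per-word estimates of the four affective dimensions. The input dimensionality, sequence lengths, and single-layer architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AffectLSTM(nn.Module):
    def __init__(self, n_features=120, n_hidden=64, n_dims=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_dims)  # arousal, expectation, power, valence

    def forward(self, x):                  # x: (batch, words, features)
        out, _ = self.lstm(x)
        return self.head(out)              # per-word estimates of the 4 dimensions

model = AffectLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 25, 120)               # placeholder word-level audio/video features
y = torch.randn(8, 25, 4)                 # placeholder dimensional annotations
for _ in range(50):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```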

10.
11.
Remote communication between people typically relies on audio and vision, although current mobile devices are increasingly based on detecting touch gestures such as swiping. These gestures could be adapted to interpersonal communication by using tactile technology capable of producing touch stimulation on a user's hand. It has been suggested that such mediated social touch would allow for new forms of emotional communication. The aim was to study whether vibrotactile stimulation that imitates human touch can convey intended emotions from one person to another. For this purpose, devices were used that converted the touch gestures of squeezing and finger touch into vibrotactile stimulation: when one user squeezed his device or touched it with a finger, another user felt corresponding vibrotactile stimulation on her device via four vibrating actuators. In an experiment, participant dyads comprising a sender and a receiver communicated variations in the affective dimensions of valence and arousal using the devices. The sender's task was to create stimulation that would convey an unpleasant, pleasant, relaxed, or aroused emotional intention to the receiver. Both the sender and the receiver rated the stimulation on valence and arousal scales so that the match between the sender's intended emotions and the receiver's interpretations could be measured. The results showed that squeezing was better at communicating unpleasant and aroused emotional intention, while finger touch was better at communicating pleasant and relaxed emotional intention. The results can be used to develop technology that enables people to communicate via touch by choosing a touch gesture that matches the desired emotion.

12.
In this paper, we propose a new genetic programming approach to music emotion classification. Our approach is based on Thayer's arousal-valence plane, a representative model of human emotion in which emotions are determined by psychological arousal and valence. We map music pieces onto the arousal-valence plane and classify their emotion in that space. We extract 85 acoustic features from the music signals, rank them by information gain, and choose the top k features in the feature selection step. Genetic programming is then applied to find an optimal formula that maps music pieces from the feature space onto the arousal-valence space so that music emotions are effectively classified. k-NN and SVM, both widely used classifiers, are used to classify music emotions in the arousal-valence space. To verify our method, we compare it with six existing methods on the same music dataset; this experiment confirms that the proposed method is superior to the others.
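A sketch of the classification stage described above: music pieces mapped into the arousal-valence plane are labeled with k-NN and SVM. The GP search itself is replaced here by a placeholder linear projection standing in for the evolved mapping formula; features and labels are random placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(150, 85))      # 85 acoustic features per piece
labels = rng.integers(0, 4, 150)           # 4 emotion quadrants (placeholder)

# Stand-in for the GP-evolved formula: project 85-D features to 2-D (A, V).
W = rng.normal(size=(85, 2))
av = features @ W                           # (arousal, valence) coordinates

knn = KNeighborsClassifier(n_neighbors=5).fit(av, labels)
svm = SVC(kernel="rbf").fit(av, labels)
print(knn.score(av, labels), svm.score(av, labels))
```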

13.
The aim of this work was to find performances of the same composition with a similar emotional distribution over time. A comparative analysis of musical performances using emotion tracking was proposed, based on a dimensional approach to dynamic music emotion recognition. Music data were annotated and regressors trained; the arousal and valence values predicted by the regressors were then used to compare performances. The obtained results confirm the validity of the assumption that tracking and analyzing arousal and valence over time in different performances of the same composition can indicate their similarity. Detailed results are presented for performances of Prelude No. 1 by Frédéric Chopin, making it possible, for example, to find the performances most similar to that of Arthur Rubinstein. The author identified which performances of the same composition were close to each other and which were quite distant in how arousal and valence are shaped over time. The presented method gives access to knowledge about a performer's shaping of emotion that had previously been available only to music professionals.
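A short sketch of the comparison idea described above: each performance yields a time series of predicted arousal and valence, and performances are ranked by the distance between trajectories. The regressor outputs are random placeholders here, and resampling all performances to a common length is assumed.

```python
import numpy as np

def av_distance(traj_a, traj_b):
    """Mean Euclidean distance between two (T, 2) arousal-valence tracks."""
    return float(np.linalg.norm(traj_a - traj_b, axis=1).mean())

rng = np.random.default_rng(0)
# Placeholder predicted (arousal, valence) tracks, resampled to 120 frames.
performances = {f"performer_{i}": rng.normal(size=(120, 2)) for i in range(5)}
reference = performances["performer_0"]    # e.g. the Rubinstein recording
ranked = sorted(performances,
                key=lambda k: av_distance(performances[k], reference))
print(ranked)  # most similar performances first
```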

14.
Emotion is a summary of subjective cognition produced by the brain. Brain-signal decoding offers a comparatively objective way to study human emotion and related cognitive behavior. This paper proposes an EEG emotion recognition method based on graph attention networks, multi-path graph attention networks (MPGAT). The method builds a graph over the EEG channels, uses convolutional layers to extract time-domain and per-frequency-band features of the EEG signal, and applies graph attention networks to further capture local features of the emotional EEG signal and the intrinsic functional relationships between brain regions, thereby constructing a better EEG representation. MPGAT achieves average cross-subject emotion recognition accuracies of 86.03% and 72.71% on the SEED and SEED-IV datasets, and average cross-subject accuracies of 76.35% and 75.46% on the valence and arousal dimensions of the DREAMER dataset, matching and in part exceeding the performance of current state-of-the-art EEG emotion recognition methods. The proposed EEG processing method may provide a new technical means for research in affective cognitive science and for affective brain-computer interface systems.
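A minimal sketch of the graph-attention stage in the spirit of the method described above, using PyTorch Geometric's GATConv: EEG channels are graph nodes, per-channel band features are node inputs, and attention layers model inter-region relations before pooling to class logits. The graph construction, channel count, and feature sizes are assumptions; this is not the authors' MPGAT implementation.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv
from torch_geometric.utils import dense_to_sparse

class EEGGat(nn.Module):
    def __init__(self, n_feats=5, n_hidden=16, n_classes=3):
        super().__init__()
        self.gat1 = GATConv(n_feats, n_hidden, heads=4, concat=True)
        self.gat2 = GATConv(4 * n_hidden, n_hidden, heads=1)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x, edge_index):       # x: (n_channels, n_feats)
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.head(h.mean(dim=0))     # mean-pool channels -> class logits

adj = (torch.rand(62, 62) > 0.8).float()   # placeholder channel adjacency
edge_index, _ = dense_to_sparse(adj)
x = torch.rand(62, 5)                      # per-channel band-power features
print(EEGGat()(x, edge_index).shape)       # torch.Size([3])
```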

15.
This research applies an innovative way to measure and identify a user's emotion from color composition. Finding an intuitive way to understand human emotion is the key point of this research. The RGB color system, widely used across computer systems, is an additive color system in which red, green, and blue light are combined to produce the full range of colors. The study is based on Thayer's emotion model, which classifies emotions along two axes, valence and arousal, and it gathers users' color responses as RGB input for calculating and forecasting their emotion. In the experiment, 320 samples divided into four emotion groups were used to train the weights of a neural network, and 160 samples were used to verify its accuracy. The results show that the model can validly estimate emotion from a user's color response. The experiment also found that the trends of the different color components, plotted on a Cartesian coordinate system, reveal distinguishable intensities in the RGB color system. Based on this emotion detection model, an affective computing intelligence framework is designed with the emotion component embedded in it.
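A sketch of the idea described above: a small neural network maps an RGB color response to one of the four quadrants of Thayer's valence-arousal plane, trained on 320 samples and tested on 160 as in the abstract. The network shape and the random placeholder data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.random((320, 3))             # 320 RGB responses, scaled to [0, 1]
y_train = rng.integers(0, 4, 320)          # 4 emotion quadrants (placeholder)
X_test = rng.random((160, 3))              # 160 held-out responses
y_test = rng.integers(0, 4, 160)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(f"quadrant accuracy: {clf.score(X_test, y_test):.0%}")
```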

16.
Traditionally, emotion recognition is performed in response to stimuli that engage either one (vision: image, or hearing: audio) or two (vision and hearing: video) human senses. An immersive environment can be generated by engaging more than two human senses while interacting with multimedia content, known as MULtiple SEnsorial media (mulsemedia). This study aims to create a new dataset of multimodal physiological signals for recognizing emotions in response to such content. To this end, four multimedia clips are selected and synchronized with a fan, a heater, an olfaction dispenser, and a haptic vest to add cold air, hot air, olfactory, and haptic effects respectively. Physiological responses, including electroencephalography (EEG), galvanic skin response (GSR), and photoplethysmography (PPG), are recorded to analyze human emotional responses while experiencing mulsemedia content. A t-test applied to arousal and valence scores shows that engaging more than two human senses evokes significantly different emotions. Statistical tests on the EEG, GSR, and PPG responses also show a significant difference between multimedia and mulsemedia content. Classification accuracies of 85.18% and 76.54% are achieved for valence and arousal, respectively, using a K-nearest neighbor classifier and a feature-level fusion strategy.
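A minimal sketch of the classification scheme described above: feature-level fusion concatenates EEG, GSR, and PPG feature vectors before a K-nearest-neighbor classifier predicts binarized valence (the arousal model is analogous). Feature dimensions and placeholder data are assumptions; signal processing is not shown.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
eeg = rng.normal(size=(200, 32))           # placeholder per-trial EEG features
gsr = rng.normal(size=(200, 8))            # placeholder GSR features
ppg = rng.normal(size=(200, 8))            # placeholder PPG features
X = np.hstack([eeg, gsr, ppg])             # feature-level fusion
y = rng.integers(0, 2, 200)                # high/low valence (placeholder)

knn = KNeighborsClassifier(n_neighbors=5)
print(cross_val_score(knn, X, y, cv=5).mean())
```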

17.
To improve the prediction accuracy of PAD (pleasure, arousal, dominance), a clustered PSO-LSSVM model is proposed that combines a least squares support vector machine (LSSVM), optimized by the particle swarm optimization (PSO) algorithm, with emotion cluster analysis. Emotional features are extracted from three types of emotional speech in the TYUT2.0 and Berlin speech databases. Based on these features and the annotated P, A, and D values, per-dimension PSO-LSSVM models are built for each of the three single emotions, and a mixed-emotion PSO-LSSVM model is built across all three emotions. The mixed-emotion model first predicts P, A, and D and computes the distance to the PAD values of the basic emotions; emotions whose distance exceeds a threshold are clustered as mixed emotions, while those below the threshold are clustered with the nearest emotion, whose regression model is then used to predict P, A, and D. Experiments show that this model yields smaller prediction errors for P, A, and D than the LSSVM and PSO-LSSVM models, and that its predictions correlate more strongly with the annotated values, indicating that the clustered PSO-LSSVM model predicts P, A, and D more reliably and accurately.
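A sketch of the PSO-LSSVM idea described above: a plain global-best particle swarm tunes the regularization and RBF kernel width of a least-squares-style kernel regressor predicting one PAD dimension. scikit-learn's KernelRidge is used as a stand-in for LSSVM (a closely related least-squares formulation); the PSO settings and placeholder data are assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))             # placeholder acoustic features
y = rng.uniform(-1, 1, 150)                # placeholder pleasure (P) annotations

def fitness(params):
    alpha, gamma = np.exp(params)          # search in log-space, keep positive
    model = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

# Plain global-best PSO over (log alpha, log gamma).
n_particles, n_iter = 12, 30
pos = rng.uniform(-4, 2, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()
for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best (alpha, gamma):", np.exp(gbest))
```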

18.
Representation of facial expressions using continuous dimensions has been shown to be inherently more expressive and psychologically meaningful than using categorized emotions, and has thus gained increasing attention over recent years. Many sub-problems have arisen in this new field that remain only partially understood. Two of these are a comparison of the regression performance of different texture and geometric features, and the investigation of correlations between the continuous dimensional axes and basic categorized emotions. This paper presents empirical studies addressing these problems and reports results from an evaluation of different methods for detecting spontaneous facial expressions within the arousal–valence (AV) dimensional space. The evaluation compares the performance of texture features (SIFT, Gabor, LBP) against geometric features (FAP-based distances), and the fusion of the two. It also compares the prediction of arousal and valence, obtained using the best fusion method, to the corresponding ground truths. Spatial distribution, shift, similarity, and correlation are considered for the six basic categorized emotions (i.e. anger, disgust, fear, happiness, sadness, surprise). Using the NVIE database, results show that the fusion of LBP and FAP features performs best. The results from the NVIE and FEEDTUM databases reveal novel findings about the correlations of the arousal and valence dimensions to each of the six basic emotion categories.

19.
Research was conducted to develop a methodology to model the emotional content of music as a function of time and musical features. Emotion is quantified using the dimensions valence and arousal, and system-identification techniques are used to create the models. Results demonstrate that system identification provides a means to generalize the emotional content for a genre of music. The average R2 statistic of a valid linear model structure is 21.9% for valence and 78.4% for arousal. The proposed method of constructing models of emotional content generalizes previous time-series models and removes ambiguity from classifiers of emotion.
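A sketch of the system-identification idea described above: a linear ARX model predicts an emotion dimension from lagged values of itself and of a musical feature, fit by least squares and scored with R². The model orders and synthetic signals are assumptions.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Least-squares fit of y[t] = sum a_i*y[t-i] + sum b_j*u[t-j]."""
    k0 = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
            for t in range(k0, len(y))]
    Phi, target = np.array(rows), y[k0:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    pred = Phi @ theta
    r2 = 1 - ((target - pred) ** 2).sum() / ((target - target.mean()) ** 2).sum()
    return theta, r2

rng = np.random.default_rng(0)
u = rng.normal(size=300)                   # placeholder musical feature (e.g. loudness)
y = np.zeros(300)                          # synthetic arousal trace driven by u
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] + 0.3 * u[t - 1] + 0.05 * rng.normal()
theta, r2 = fit_arx(y, u)
print(f"R^2 = {r2:.1%}")
```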

20.
A Regression Approach to Music Emotion Recognition
Content-based retrieval has emerged in the face of content explosion as a promising approach to information access. In this paper, we focus on the challenging issue of recognizing the emotion content of music signals, or music emotion recognition (MER). Specifically, we formulate MER as a regression problem to predict the arousal and valence values (AV values) of each music sample directly. Associated with its AV values, each music sample becomes a point in the arousal-valence plane, so users can efficiently retrieve music samples by specifying a desired point in the emotion plane. Because no categorical taxonomy is used, the regression approach is free of the ambiguity inherent to conventional categorical approaches. To improve performance, we apply principal component analysis to reduce the correlation between arousal and valence, and RReliefF to select important features. An extensive performance study is conducted to evaluate the accuracy of the regression approach for predicting AV values. The best performance, evaluated in terms of the R2 statistic, reaches 58.3% for arousal and 28.1% for valence, using support vector machine as the regressor. We also apply the regression approach to detect the emotion variation within a music selection and find the prediction accuracy superior to existing works. A group-wise MER scheme is also developed to address the subjectivity of emotion perception.
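A minimal sketch of the regression approach described above: support vector regression predicts arousal and valence per music sample, with PCA applied to the features, scored by R². RReliefF feature selection is omitted and the data is a random placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))             # placeholder acoustic features
y_arousal = rng.uniform(-1, 1, 200)        # placeholder AV annotations
y_valence = rng.uniform(-1, 1, 200)

reg = make_pipeline(StandardScaler(), PCA(n_components=20), SVR(kernel="rbf"))
for name, y in [("arousal", y_arousal), ("valence", y_valence)]:
    r2 = cross_val_score(reg, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: R^2 = {r2:.3f}")
```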
