Similar Literature
20 similar articles found.
1.
Emotion recognition is a popular area of human-computer interaction, and its techniques have been applied in medicine, education, safe driving, e-commerce, and other fields. Emotion is conveyed mainly through facial expressions, voice, and speech content, and the facial-muscle, tone, and intonation characteristics differ across emotions, so judgments based on a single-modality feature are comparatively unreliable. Since emotion is perceived chiefly through sight and hearing, this paper proposes a multimodal expression recognition algorithm based on the audio-visual perception system. Emotional features are extracted from the speech and image modalities separately, and multiple classifiers are designed for single-feature emotion classification experiments, yielding several single-feature expression recognition models. In the multimodal experiments on speech and images, a late-fusion strategy is proposed for feature fusion; given the weak dependence among the models, weighted voting is used for model fusion, producing a fused expression recognition model built from the single-feature models. Experiments on the AFEW dataset compare the fused model with the single-feature models and verify that multimodal emotion recognition based on the audio-visual perception system outperforms recognition based on any single modality.
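As a rough illustration of the weighted-voting model fusion described above, the following Python sketch combines the labels of several single-feature models; the weights and label set are illustrative assumptions, not the paper's values.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """predictions: one label per single-feature model; weights: per-model
    weights (e.g. validation accuracy on AFEW)."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

# e.g. three speech models and two image models voting on one clip
print(weighted_vote(["happy", "happy", "neutral", "happy", "sad"],
                    [0.62, 0.58, 0.55, 0.71, 0.66]))  # -> happy
```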

2.
As technology advances, robots and virtual agents will be introduced into the home and healthcare settings to assist individuals, both young and old, with everyday living tasks. Understanding how users recognize an agent's social cues is therefore imperative, especially in social interactions. Facial expression, in particular, is one of the most common non-verbal cues used to display and communicate emotion in on-screen agents (Cassell et al., 2000). Age is important to consider because age-related differences in emotion recognition of human facial expression have been supported (Ruffman et al., 2008), with older adults showing a deficit for recognition of negative facial expressions. Previous work has shown that younger adults can effectively recognize facial emotions displayed by agents (Bartneck and Reichenbach, 2005, Courgeon et al., 2009, Courgeon et al., 2011, Breazeal, 2003); however, little research has compared in depth younger and older adults' ability to label a virtual agent's facial emotions, an important consideration because social agents will be required to interact with users of varying ages. If such age-related differences exist for recognition of virtual agent facial expressions, we aim to understand whether those differences are influenced by the intensity of the emotion, the dynamic formation of the emotion (i.e., a neutral expression developing into an expression of emotion through motion), or the type of virtual character, which differs in human-likeness. Study 1 investigated the relationship between age-related differences, the implication of dynamic formation of emotion, and the role of emotion intensity in emotion recognition of the facial expressions of a virtual agent (iCat). Study 2 examined age-related differences in recognition of emotions expressed by three types of virtual characters differing in human-likeness (non-humanoid iCat, synthetic human, and human). Study 2 also investigated the role of configural and featural processing as a possible explanation for age-related differences in emotion recognition. First, our findings show age-related differences in the recognition of emotions expressed by a virtual agent, with older adults showing lower recognition for the emotions of anger, disgust, fear, happiness, sadness, and neutral. These age-related differences might be explained by older adults having difficulty discriminating similarity in the configural arrangement of facial features for certain emotions; for example, older adults often mislabeled the similar emotion of fear as surprise. Second, our results did not provide evidence for dynamic formation improving emotion recognition; but, in general, the intensity of the emotion improved recognition. Lastly, we learned that emotion recognition, for older and younger adults, differed by character type, from best to worst: human, synthetic human, and then iCat. Our findings provide guidance for design, as well as the development of a framework of age-related differences in emotion recognition.

3.
Human–human interaction consists of various nonverbal behaviors that are often emotion-related. To establish rapport, it is essential that the listener respond to reactive emotion in a way that makes sense given the speaker's emotional state. However, human–robot interactions generally fail in this regard because most spoken dialogue systems play only a question-answer role. Aiming for natural conversation, we examine an emotion processing module that consists of a user emotion recognition function and a reactive emotion expression function for a spoken dialogue system to improve human–robot interaction. For the emotion recognition function, we propose a method that combines valence from prosody and sentiment from text by decision-level fusion, which considerably improves the performance. Moreover, this method reduces fatal recognition errors, thereby improving the user experience. For the reactive emotion expression function, the system's emotion is divided into emotion category and emotion level, which are predicted using the parameters estimated by the recognition function on the basis of distributions inferred from human–human dialogue data. As a result, the emotion processing module can recognize the user's emotion from his/her speech and express a reactive emotion that matches. Evaluation with ten participants demonstrated that the system enhanced by this module is effective for conducting natural conversation.
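A minimal sketch of the decision-level fusion idea above, combining a prosody-based valence score with a text-based sentiment score; the fusion weight and thresholds are assumptions, not the authors' trained parameters.

```python
def fuse_valence(prosody_valence, text_sentiment, alpha=0.5):
    """Both inputs are scores in [-1, 1]; returns the fused valence and a
    coarse reactive-emotion category."""
    fused = alpha * prosody_valence + (1 - alpha) * text_sentiment
    if fused > 0.3:
        return fused, "positive"
    if fused < -0.3:
        return fused, "negative"
    return fused, "neutral"

print(fuse_valence(0.8, 0.4))  # fused score 0.6 -> "positive"
```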

4.
To address the shortcomings of voice- and image-based emotion recognition, a new modality is proposed: touch-based emotion recognition. A series of touch emotion recognition studies were conducted on the CoST (corpus of social touch) dataset: the data were preprocessed and several features for touch emotion recognition were proposed. An extreme learning machine (ELM) classifier was used to investigate recognition under different gestures, identifying three emotions (gentle, normal, rough) across 14 gestures with high accuracy and short recognition times. The results show that the gesture affects recognition accuracy: the gesture "stroke" achieved the highest accuracy under every classifier tested, reaching 72.07%; the extreme learning machine performs well and quickly as a touch emotion classifier; and some gestures inherently correspond to particular emotions, which influences the classification results.
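For orientation, a compact extreme learning machine of the kind used above for the three touch-emotion classes; the hidden-layer size and feature dimensions are illustrative assumptions.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y_onehot):
        # random, untrained hidden layer; only output weights are solved for
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ y_onehot  # closed-form least squares
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)     # gentle / normal / rough
```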

5.
Multimodal emotion recognition is an important topic in affective computing. This paper studies bimodal emotion recognition from facial expressions and body gestures and proposes a bimodal method based on bilateral sparse partial least squares (BSPLS). First, spatiotemporal features of the expression and gesture modalities are extracted from video sequences as emotional feature vectors. Then the BSPLS dimensionality reduction method further extracts the emotional features of the two modalities, which are combined into a new emotional feature vector. Finally, two classifiers are used for emotion classification. Experiments on the widely used FABO bimodal face-and-gesture emotion database, with comparisons against several subspace methods (principal component analysis, canonical correlation analysis, partial least squares regression), evaluate the recognition performance. The results show that fusing the two modalities is more effective than using either alone, and BSPLS achieves the highest emotion recognition rate among the methods compared.
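Bilateral sparse PLS is not a stock library routine; as a loose stand-in for the fusion pipeline above, this sketch reduces each modality with ordinary (non-sparse) PLS and concatenates the projections before classification. Dimensions and the SVM choice are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVC

def fuse_face_gesture(X_face, X_gesture, y, n_comp=10):
    """Project each modality into a low-dimensional latent space, then
    train one classifier on the concatenated projections."""
    pls_f = PLSRegression(n_components=n_comp).fit(X_face, y)
    pls_g = PLSRegression(n_components=n_comp).fit(X_gesture, y)
    Z = np.hstack([pls_f.transform(X_face), pls_g.transform(X_gesture)])
    return SVC().fit(Z, y), pls_f, pls_g
```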

6.
We present a system for facial expression recognition that is evaluated on multiple databases. Automated facial expression recognition systems face a number of characteristic challenges. Firstly, obtaining natural training data is difficult, especially for facial configurations expressing emotions like sadness or fear. Therefore, publicly available databases consist of acted facial expressions and are biased by the authors' design decisions. Secondly, evaluating trained algorithms towards real-world behavior is challenging, again due to the artificial conditions in available image data. To tackle these challenges, and since our goal is to train classifiers for an online system, we use several databases in our evaluation. Comparing classifiers across databases determines the classifiers' capability to generalize more reliably than traditional self-classification.
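The cross-database protocol described above can be summarized in a few lines; the dataset loaders and classifier factory below are hypothetical placeholders, not the authors' code.

```python
def cross_database_eval(datasets, make_classifier):
    """datasets: {name: (X, y)}; trains on each corpus and tests on every
    other one, returning {(train, test): accuracy}."""
    results = {}
    for train_name, (X_tr, y_tr) in datasets.items():
        clf = make_classifier().fit(X_tr, y_tr)
        for test_name, (X_te, y_te) in datasets.items():
            if test_name != train_name:
                results[(train_name, test_name)] = clf.score(X_te, y_te)
    return results
```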

7.
Psychological research findings suggest that humans rely on the combined visual channels of face and body more than any other channel when they make judgments about human communicative behavior. However, most existing systems attempting to analyze human nonverbal behavior are mono-modal and focus only on the face. Research that aims to integrate gestures as a means of expression has only recently emerged. Accordingly, this paper presents an approach to automatic visual recognition of expressive face and upper-body gestures from video sequences suitable for use in a vision-based affective multi-modal framework. Face and body movements are captured simultaneously using two separate cameras. For each video sequence, single expressive frames from both face and body are selected manually for analysis and recognition of emotions. Firstly, individual classifiers are trained from the individual modalities. Secondly, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In the experiments performed, emotion classification using the two modalities achieved better recognition accuracy, outperforming classification using the facial or bodily modality alone.
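The two fusion levels discussed above can be contrasted in a few lines; the classifier choice and posterior inputs are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_level(X_face, X_body, y):
    """Concatenate the modality features, then train one classifier."""
    return LogisticRegression(max_iter=1000).fit(np.hstack([X_face, X_body]), y)

def decision_level(p_face, p_body, w=0.5):
    """Average the per-modality class posteriors, then take the arg-max."""
    return (w * p_face + (1 - w) * p_body).argmax(axis=1)
```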

8.
This paper describes a novel structural approach to recognizing human facial features for emotion recognition. Conventionally, features extracted from facial images are represented by relatively poor representations, such as arrays or sequences, with a static data structure. In this study, we propose to extract facial expression feature vectors as Localized Gabor Features (LGF) and then transform these feature vectors into the FacE Emotion Tree Structures (FEETS) representation. It is an extension of the Human Face Tree Structures (HFTS) representation presented in (Cho and Wong in Lecture notes in computer science, pp 1245–1254, 2005). This facial representation simulates how humans perceive a real face, and both the entities and their relationships contribute to the facial expression features. Moreover, a new structural connectionist architecture based on a probabilistic approach to adaptive processing of data structures is presented. The so-called probabilistic-based recursive neural network (PRNN) model, extended from Frasconi et al. (IEEE Trans Neural Netw 9:768–785, 1998), is developed to train and recognize human emotions by generalizing the FEETS representation. For empirical studies, we benchmarked our emotion recognition approach against other well-known classifiers. Using public domain databases, such as the Japanese Female Facial Expression (JAFFE) database (Lyons et al. in IEEE Trans Pattern Anal Mach Intell 21(12):1357–1362, 1999; Lyons et al. in third IEEE international conference on automatic face and gesture recognition, 1998) and the Cohn–Kanade AU-Coded Facial Expression (CMU) Database (Cohn et al. in 7th European conference on facial expression measurement and meaning, 1997), our proposed system obtains an accuracy of about 85–95% under subject-dependent and subject-independent conditions. Moreover, when tested on images containing artifacts, the proposed model remains robust in performing facial emotion recognition.

9.
Extracting and understanding emotion is of high importance for the interaction between humans and machine communication systems. The most expressive way to display a human's emotion is through facial expression analysis. This paper proposes a multiple emotion recognition system that can recognize combinations of up to a maximum of three different emotions using an active appearance model (AAM), the proposed classification standard, and a k-nearest neighbor (k-NN) classifier in mobile environments. The AAM can capture expression variations that are calculated by the proposed classification standard according to changes in human expressions in real time. The proposed k-NN can classify the basic emotions (normal, happy, sad, angry, surprise) as well as more ambiguous emotions by combining the basic emotions in real time, and each recognized emotion can be subdivided and has an associated strength. Whereas most previous methods of emotion recognition recognize various kinds of single emotions, this paper recognizes various emotions as combinations of the five basic emotions. To be easily understood, the recognized result is presented in three ways on a mobile camera screen. The experiments achieved an average recognition rate of 85%, and a 40% performance rate for the optimized emotions. The implemented system serves as an example of augmented reality, displaying a combination of real face video and virtual animation with the user's avatar.
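A hedged sketch of how the k-NN step above might turn neighbour votes into a blend of up to three emotions with strengths; k and the label set are assumptions.

```python
from collections import Counter

def top_emotions(neighbour_labels, max_mix=3):
    """Convert the k nearest neighbours' labels into up to max_mix
    emotions, each with a strength proportional to its vote share."""
    counts = Counter(neighbour_labels)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common(max_mix)]

print(top_emotions(["happy"] * 4 + ["surprise"] * 3 + ["normal"]))
# [('happy', 0.5), ('surprise', 0.375), ('normal', 0.125)]
```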

10.
11.
Based on the continuous (dimensional) space model of emotion, an improved ranked-election algorithm is proposed to fuse multiple emotion classifiers, achieving good emotion recognition results. First, three classifiers are designed on the basis of hidden Markov models (HMM) and artificial neural networks (ANN); the improved ranked-election algorithm is then used to fuse the three classifiers. Experiments on a Mandarin emotional speech corpus and a German emotional speech corpus show that, compared with several traditional fusion algorithms, the improved ranked election achieves better fusion and clearly outperforms any single classifier. The algorithm is simple as well as portable, and can be used to fuse any number of other emotion classifiers.
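As a sketch of rank-based election fusion in the spirit of the method above (the paper's specific improvements are not reproduced), a Borda-style count over the classifiers' ranked outputs:

```python
from collections import defaultdict

def borda_fusion(rankings):
    """rankings: per-classifier lists of emotion labels, best first;
    returns the label with the highest total rank score."""
    scores = defaultdict(int)
    for ranking in rankings:
        for pos, label in enumerate(ranking):
            scores[label] += len(ranking) - pos  # higher rank, more points
    return max(scores, key=scores.get)

print(borda_fusion([["angry", "sad", "happy"],
                    ["sad", "angry", "happy"],
                    ["angry", "happy", "sad"]]))  # -> angry
```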

12.
Research on an intelligent tutoring system based on emotion recognition
To remedy the lack of affect in traditional intelligent tutoring systems (ITS), an ITS model based on emotion recognition technology is proposed. The model adds an emotion recognition module, built with facial expression recognition and text recognition techniques, to a traditional tutoring system; it can capture and recognize students' learning emotions and apply corresponding affective incentive strategies, realizing emotion-aware teaching.
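As an illustration only, a rule table of the kind such an affective incentive strategy might use; the emotion labels and interventions are assumptions, not the paper's design.

```python
# Hypothetical mapping from a recognized learning emotion to a tutoring action.
INTERVENTIONS = {
    "frustrated": "offer a hint and an easier practice item",
    "bored":      "raise difficulty or switch activity",
    "engaged":    "continue the current lesson plan",
}

def intervene(emotion_label):
    return INTERVENTIONS.get(emotion_label, "continue the current lesson plan")
```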

13.
Development of approaches for reducing the prediction error has been an active research field in collaborative filtering recommender systems, since the accuracy of the prediction plays a crucial role in user purchase preferences. Unlike conventional collaborative filtering methods, which directly use the computed user-to-user similarity values, this paper presents a genetic algorithm approach for refining them before use in the prediction process. The approach was found to yield promising results according to the statistical analysis performed over a variety of neighbourhood sizes for various similarity metrics, including Pearson's Correlation, Extended Jaccard Coefficient, and Vector Cosine Similarity, along with a metric that assigns random weights, used as a benchmark. Results show that the evolutionary approach significantly reduced the prediction error using the evolved weights, and Vector Cosine Similarity showed the best performance.
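A minimal sketch of the evolutionary refinement described above: a GA evolves weights that rescale the raw similarities before rating prediction. The population size, mutation scheme, and fitness callable (e.g. prediction error on a validation split) are assumptions.

```python
import numpy as np

def evolve_weights(sim, fitness, n_gen=50, pop=30, seed=0):
    """sim: raw user-to-user similarity matrix; fitness: callable mapping a
    weighted similarity matrix to a prediction error (lower is better)."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, sim.shape[0]))           # candidate weight vectors
    for _ in range(n_gen):
        f = np.array([fitness(w * sim) for w in P])
        parents = P[f.argsort()[: pop // 2]]      # keep the better half
        children = parents + rng.normal(0, 0.05, parents.shape)  # mutate
        P = np.vstack([parents, np.clip(children, 0, 1)])
    return P[np.argmin([fitness(w * sim) for w in P])]
```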

14.
The traditional collaborative filtering algorithm is a successful recommendation technology. The core idea of this algorithm is to calculate user or item similarity based on user ratings and then to predict ratings and recommend items based on similar users' or similar items' ratings. However, real applications face a problem of data sparsity because most users provide only a few ratings, such that the traditional collaborative filtering algorithm cannot produce satisfactory results. This paper proposes a new topic model-based similarity and two recommendation algorithms: the user-based collaborative filtering with topic model algorithm (UCFTM) and the item-based collaborative filtering with topic model algorithm (ICFTM). Each review is processed using the topic model to generate review topic allocations representing a user's preference for a product's different features. The UCFTM algorithm aggregates all topic allocations of reviews by the same user and calculates the "user most valued features", the product features that the user values most. User similarity is calculated from these features, and ratings are predicted from similar users' ratings. The ICFTM algorithm aggregates all topic allocations of reviews for the same product and calculates the "item most valued features", the most valued features of the product. Item similarity is calculated from these features, and ratings are predicted from similar items' ratings. Experiments on six data sets from Amazon indicate that when most users give only one review and one rating, our algorithms exhibit better prediction accuracy than other traditional collaborative filtering and state-of-the-art topic model-based recommendation algorithms.
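The user-side idea in UCFTM can be sketched as follows, with topic inference itself (e.g. LDA over reviews) omitted; the top-k cutoff is an assumption.

```python
import numpy as np

def user_profile(review_topic_dists, top_k=5):
    """Average a user's review topic distributions and keep only the
    top_k 'most valued features', zeroing the rest."""
    mean = np.mean(review_topic_dists, axis=0)
    profile = np.zeros_like(mean)
    top = np.argsort(mean)[-top_k:]
    profile[top] = mean[top]
    return profile

def user_similarity(p, q):
    """Cosine similarity between two most-valued-feature profiles."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))
```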

15.
Objective: Most current video emotion recognition methods rely on facial expressions alone and ignore the emotional information carried by the physiological signals hidden in facial video. This paper proposes a bimodal video emotion recognition method based on facial expressions and the blood volume pulse (BVP) physiological signal. Method: The video is first preprocessed to obtain the facial video. Two spatiotemporal expression features, LBP-TOP and HOG-TOP, are then extracted from it, and video color magnification is used to recover the BVP signal, from which physiological emotion features are extracted. The two kinds of features are fed into BP (back-propagation) classifiers to train classification models, and fuzzy-integral decision-level fusion produces the final emotion recognition result. Results: On a facial-video emotion database built in our laboratory, the average recognition rates of the expression-only and physiological-only modalities were 80% and 63.75% respectively, while the fused result reached 83.33%, higher than either single modality, demonstrating the effectiveness of the bimodal fusion. Conclusion: The proposed bimodal spatiotemporal feature fusion makes fuller use of the emotional information in video and effectively improves classification performance; comparative experiments against similar video emotion recognition algorithms confirm its superiority. In addition, the fuzzy-integral decision-level fusion effectively suppresses the interference of unreliable decision information, yielding better recognition accuracy.
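A sketch of Sugeno fuzzy-integral fusion for the two modalities above; the fuzzy densities stand in for per-modality reliabilities and are assumptions, not the paper's learned values.

```python
import numpy as np

def sugeno_two_source(h, g):
    """h: the two classifiers' confidences for one class; g: their fuzzy
    densities; returns the fused confidence for that class."""
    order = np.argsort(h)[::-1]          # sort confidences, descending
    h_sorted, g_best = h[order], g[order][0]
    # measure of {best source} is its density; of both sources it is 1
    return max(min(h_sorted[0], g_best), min(h_sorted[1], 1.0))

def fuse(p_face, p_bvp, g=np.array([0.6, 0.5])):
    """p_face, p_bvp: per-class probability vectors; returns class index."""
    fused = [sugeno_two_source(np.array([pf, pb]), g)
             for pf, pb in zip(p_face, p_bvp)]
    return int(np.argmax(fused))
```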

16.
Automatic emotion recognition from speech signals is one of the important research areas, which adds value to machine intelligence. Pitch, duration, energy and Mel-frequency cepstral coefficients (MFCC) are the widely used features in the field of speech emotion recognition. A single classifier or a combination of classifiers is used to recognize emotions from the input features. The present work investigates the performance of the features of Autoregressive (AR) parameters, which include gain and reflection coefficients, in addition to the traditional linear prediction coefficients (LPC), to recognize emotions from speech signals. The classification performance of the features of AR parameters is studied using discriminant, k-nearest neighbor (KNN), Gaussian mixture model (GMM), back propagation artificial neural network (ANN) and support vector machine (SVM) classifiers and we find that the features of reflection coefficients recognize emotions better than the LPC. To improve the emotion recognition accuracy, we propose a class-specific multiple classifiers scheme, which is designed by multiple parallel classifiers, each of which is optimized to a class. Each classifier for an emotional class is built by a feature identified from a pool of features and a classifier identified from a pool of classifiers that optimize the recognition of the particular emotion. The outputs of the classifiers are combined by a decision level fusion technique. The experimental results show that the proposed scheme improves the emotion recognition accuracy. Further improvement in recognition accuracy is obtained when the scheme is built by including MFCC features in the pool of features.
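The class-specific scheme above can be outlined as one binary specialist per emotion, each with its own feature subset, fused by taking the most confident specialist; the SVM choice and feature indices are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_specialists(X, y, feature_sets):
    """feature_sets: {emotion: column indices best for that emotion};
    trains one one-vs-rest specialist per emotional class."""
    return {emo: SVC(probability=True).fit(X[:, cols], y == emo)
            for emo, cols in feature_sets.items()}

def predict(models, feature_sets, x):
    """Decision-level fusion: pick the emotion whose specialist is most
    confident on its own feature subset."""
    conf = {emo: m.predict_proba(x[feature_sets[emo]].reshape(1, -1))[0, 1]
            for emo, m in models.items()}
    return max(conf, key=conf.get)
```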

17.
A media control interface based on mobile-phone gesture recognition
A general-purpose human-machine interface is proposed that recognizes natural gestures signaled with a mobile phone and uses them to control media playback. The user expresses operation intentions by waving the phone; the corresponding gesture data are captured by the phone's built-in three-axis accelerometer, and several algorithms, including dynamic time warping (DTW), recognize the user's gestures, enabling general control of multimedia playback. Experimental results show that the interface achieves high recognition rates for several common phone gestures and supports simple, convenient media control in practical use.
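A minimal dynamic time warping sketch of the kind such a system might use to match 3-axis accelerometer traces against gesture templates; the distance measure and template dictionary are assumptions.

```python
import numpy as np

def dtw(a, b):
    """a, b: (T, 3) accelerometer sequences; returns the alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(trace, templates):
    """templates: {gesture_name: reference trace}; nearest template wins."""
    return min(templates, key=lambda g: dtw(trace, templates[g]))
```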

18.
In this paper, we propose a gesture-based interface designed to interact with panoramic scenes. The system combines novel static gestures with a fast hand tracking method. Our proposal is to use static gestures as shortcuts to activate functionalities of the system (i.e. volume up/down, mute, pause, etc.), and hand tracking to freely explore the panoramic video. The overall system is multi-user, and incorporates a user identification module based on face recognition, which is able both to recognize returning users and to add new users online. The system exploits depth data, making it robust to challenging illumination conditions. We show through experimental results the performance of every component of the system compared to the state of the art. We also show the results of a usability study performed with several untrained users.

19.
20.
This article describes the validation study of our software that uses combined webcam and microphone data for real-time, continuous, unobtrusive emotion recognition as part of our FILTWAM framework. FILTWAM aims at deploying a real-time multimodal emotion recognition method for providing more adequate feedback to learners through an online communication skills training. Herein, timely feedback is needed that reflects the intended emotions learners show and that is also useful for increasing their awareness of their own behavior. At the least, a reliable and valid software interpretation of performed face and voice emotions is needed to warrant such adequate feedback. This validation study therefore calibrates our software. The study uses a multimodal fusion method. Twelve test persons performed computer-based tasks in which they were asked to mimic specific facial and vocal emotions. All test persons' behavior was recorded on video, and two raters independently scored the shown emotions, which were contrasted with the software recognition outcomes. A hybrid method for multimodal fusion in our software achieves accuracies between 96.1% and 98.6% over the predicted emotions with the best-performing WEKA classifiers. The software fulfils its requirements of real-time data interpretation and reliable results.

