Multimodal emotion recognition with action features
Authors: SUN Ya-nan  WEN Yu-hui  SHU Ye-zhi  LIU Yong-jin
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Funding: Tsinghua University Initiative Scientific Research Program (20211080093); China Postdoctoral Science Foundation General Program (2021M701891); National Natural Science Foundation of China (62202257, 61725204)
Abstract: In recent years, emotion recognition from multimodal data has become an important research direction in natural human-computer interaction and artificial intelligence. Work on emotion recognition from the visual modality usually focuses on facial features and rarely considers action features or multimodal features fused with them. Although action is closely related to emotion, extracting effective action information from the visual modality for emotion recognition is difficult. Taking the relationship between action and emotion as its starting point, this work introduces visual-modality action data into MELD, a classic multimodal emotion recognition dataset, extracts body action features with an ST-GCN model, and uses these features for unimodal emotion recognition with an LSTM model. Building on the text and audio features of the MELD dataset, adding body action features improves the accuracy of an LSTM-based multimodal fusion model, and combining body action features with text features improves the accuracy of a contextual memory model over using text features alone. The experiments show that although body action features alone cannot match traditional text and audio features for unimodal emotion recognition, they play an important role in multimodal emotion recognition. The unimodal and multimodal experiments verify that human actions carry emotional information and that using body action features for multimodal emotion recognition has great potential.
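As a rough illustration of the action-feature step described above, the sketch below shows one spatial-temporal graph convolution block in the spirit of ST-GCN: a spatial graph convolution over skeleton joints followed by a temporal convolution over frames, pooled into a fixed-size body action feature. The joint count, adjacency matrix, channel sizes, and kernel sizes are illustrative assumptions, not the paper's configuration; a real ST-GCN additionally uses neighbor partition strategies and learnable edge weights.

```python
# Minimal sketch (assumptions, not the authors' implementation) of an
# ST-GCN-style block that turns a skeleton sequence into an action feature.
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatial-temporal graph convolution block."""
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        # Normalized joint adjacency matrix of the skeleton graph.
        self.register_buffer("A", A)
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution over the frame axis (kernel 9, padding 4).
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        # Aggregate each joint's neighbors through the adjacency matrix.
        x = torch.einsum("nctv,vw->nctw", x, self.A)
        return self.relu(self.temporal(x))

# Toy skeleton: 17 joints, 3 input channels (2D coordinates + confidence),
# 64 frames. The identity adjacency is a placeholder for the skeleton graph.
V = 17
A = torch.eye(V)
block = STGCNBlock(3, 64, A)
clip = torch.randn(8, 3, 64, V)       # (batch, C, T, V)
feat = block(clip).mean(dim=(2, 3))   # global pooling -> (8, 64) action feature
print(feat.shape)
```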


Keywords: action features; emotion recognition; multimodality; action and emotion; visual modality
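The multimodal fusion step can be pictured as a dialogue-level LSTM over concatenated per-utterance features, which is the late-fusion scheme the abstract describes. The sketch below is a minimal PyTorch illustration under assumed feature dimensions, not the paper's model; MELD uses 7 emotion classes.

```python
# Minimal sketch (assumed dimensions, not the authors' model) of LSTM-based
# fusion of per-utterance text, audio, and body action features.
import torch
import torch.nn as nn

class FusionLSTMClassifier(nn.Module):
    def __init__(self, text_dim=600, audio_dim=300, action_dim=256,
                 hidden_dim=128, num_classes=7):
        super().__init__()
        # Dialogue-level LSTM over concatenated per-utterance features.
        self.lstm = nn.LSTM(text_dim + audio_dim + action_dim,
                            hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_feat, audio_feat, action_feat):
        # Each input: (batch, num_utterances, feature_dim).
        fused = torch.cat([text_feat, audio_feat, action_feat], dim=-1)
        out, _ = self.lstm(fused)      # (batch, T, hidden_dim)
        return self.classifier(out)    # per-utterance emotion logits

# Example: a batch of 2 dialogues, 10 utterances each, with
# hypothetical feature dimensions.
model = FusionLSTMClassifier()
logits = model(torch.randn(2, 10, 600),
               torch.randn(2, 10, 300),
               torch.randn(2, 10, 256))
print(logits.shape)  # torch.Size([2, 10, 7])
```

Dropping the `action_feat` input (and its slice of the LSTM input) recovers the text-plus-audio baseline, so the same skeleton serves for the bimodal-versus-trimodal comparison reported in the abstract.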