Continuous Emotion Recognition Based on Perceiver Resampling and Multimodal Fusion
Cite this article: Li Jian, Zhang Qian, Chen Haifeng, Li Jing, Wang Liyan. Continuous emotion recognition based on perceiver resampling and multimodal fusion[J]. Application Research of Computers, 2023, 40(12).
Authors: Li Jian  Zhang Qian  Chen Haifeng  Li Jing  Wang Liyan
Affiliations: School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology (Li Jian, Zhang Qian, Chen Haifeng, Li Jing); School of Arts and Sciences, Shaanxi University of Science and Technology (Wang Liyan)
Funding: Doctoral Research Startup Fund of Shaanxi University of Science and Technology (126022325); Natural Science Basic Research Program of Shaanxi Province (2022JQ-662)
Abstract: Emotion recognition plays an important role in human-computer interaction, and continuous emotion recognition has attracted growing attention because it can detect a broader and more subtle range of emotions. In multimodal continuous emotion recognition, existing methods obtain temporal information that contains considerable redundancy, and they capture cross-modal interactions incompletely. To address these problems, this paper proposes a continuous emotion recognition method based on perceiver resampling and multimodal fusion. First, the perceiver resampling module removes redundant modal information through an asymmetric cross-attention mechanism, compressing the key features that carry temporal relationships into latent vectors and reducing the computational complexity of the subsequent fusion. Second, the multimodal fusion module captures complementary inter-modal information through cross-attention and mines hidden intra-modal information with self-attention, making the feature representation richer and more comprehensive. The mean CCC of arousal and valence reaches 63.62% on Ulm-TSST and 50.09% on Aff-Wild2, demonstrating the effectiveness of the model.
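The two modules described in the abstract can be sketched in a few lines. The following is a minimal single-head NumPy illustration, not the authors' implementation: the latent count (8), feature dimension (16), sequence length (300), and the omission of learned projection matrices, positional encodings, and the final regression head are all simplifying assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (Lq, d) queries attend over (Lk, d) keys/values
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def perceiver_resample(latents, seq):
    # asymmetric cross-attention: a small set of latent vectors queries a
    # long modality sequence, compressing it to a fixed number of vectors
    return attention(latents, seq, seq)

rng = np.random.default_rng(0)
d = 16
audio = rng.normal(size=(300, d))   # long audio feature sequence
video = rng.normal(size=(300, d))   # long video feature sequence
latents = rng.normal(size=(8, d))   # 8 latent vectors (learned in practice)

a = perceiver_resample(latents, audio)   # (8, d) compressed audio
v = perceiver_resample(latents, video)   # (8, d) compressed video

# fusion: cross-attention captures complementary inter-modal information,
# self-attention captures hidden intra-modal information
fused = np.concatenate([attention(a, v, v), attention(v, a, a),
                        attention(a, a, a), attention(v, v, v)], axis=0)
print(fused.shape)  # (32, 16)
```

Because fusion operates on the 8 resampled vectors rather than the 300-step raw sequences, the attention cost of the fusion stage drops accordingly, which is the complexity reduction the abstract refers to.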

Keywords: emotion recognition  perceiver resampling  multimodal fusion  attention mechanism
Received: 2023-04-25
Revised: 2023-11-14

