Emotion Recognition Based on Audiovisual Perception System
Citation: LONG Ying-Chao, DING Mei-Rong, LIN Gui-Jin, LIU Hong-Ye, ZENG Bi-Qing. Emotion recognition based on audiovisual perception system[J]. Computer Systems & Applications, 2021, 30(12): 218-225. DOI: 10.15888/j.cnki.csa.008235
Authors: LONG Ying-Chao  DING Mei-Rong  LIN Gui-Jin  LIU Hong-Ye  ZENG Bi-Qing
Affiliation: School of Software, South China Normal University, Foshan 528225, China
Funding: National Natural Science Foundation of China (61876067); Special Project in Key Fields of Artificial Intelligence of Guangdong Universities (2019KZDZX1033); Special Fund for the Construction of the Guangdong Provincial Key Laboratory of Cyber-Physical Systems (2020B1212060069)
Abstract: As a popular area of human-computer interaction, emotion recognition has been applied in fields such as medicine, education, safe driving, and e-commerce. Emotions are expressed mainly through facial expressions, voice, and speech, and features such as facial muscle movement, tone, and intonation differ across emotions, so emotions determined from a single-modal feature tend to be inaccurate. Since expressed emotions are perceived mainly through vision and hearing, this study proposes a multimodal emotion recognition algorithm based on an audiovisual perception system. Emotion features are extracted separately from the speech and image modalities, and multiple classifiers are designed for single-feature emotion classification experiments, yielding several expression recognition models, each based on a single feature. In the multimodal experiments on speech and images, a late fusion strategy is proposed for feature fusion; given the weak dependence among the different models, weighted voting is used for model fusion, producing a fused expression recognition model built on the single-feature models. Experiments are conducted on the AFEW dataset, and a comparison of the recognition results of the fused model against the single-feature models verifies that multimodal emotion recognition based on the audiovisual perception system outperforms single-modal recognition.

Keywords: emotion recognition  model fusion  multimodal  audiovisual perception system
Received: 2021-03-05
Revised: 2021-04-07

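The late-fusion step described in the abstract combines the predictions of the single-feature models by weighted voting. A minimal sketch, assuming hard (label) votes; the model names, weights, and emotion labels below are hypothetical illustrations, not values from the paper:

```python
from collections import defaultdict

def weighted_vote(votes, weights):
    """Fuse hard predictions from several single-feature models.

    votes:   {model_name: predicted emotion label}
    weights: {model_name: float}, e.g. derived from each model's
             validation accuracy (a higher weight means a more trusted model)
    Returns the emotion label with the largest weighted vote total.
    """
    scores = defaultdict(float)
    for model, label in votes.items():
        # Each model's vote counts in proportion to its weight;
        # unknown models default to a weight of 1.0.
        scores[label] += weights.get(model, 1.0)
    return max(scores, key=scores.get)

# Hypothetical example: two image-based models outvote one audio model.
votes = {"face_cnn": "happy", "frame_lstm": "happy", "audio_svm": "sad"}
weights = {"face_cnn": 0.5, "frame_lstm": 0.4, "audio_svm": 0.6}
fused = weighted_vote(votes, weights)  # "happy" (0.9 vs. 0.6)
```

Because the models are only weakly dependent, as the abstract notes, this kind of decision-level fusion lets each modality's model be trained and tuned independently before combination.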
This article is indexed in databases including Wanfang Data.
