Multimodal sentiment analysis of short videos based on attention

Authors:	HUANG Huan, SUN Li-juan, CAO Ying, GUO Jian, REN Heng-yi

Affiliation:	1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; 2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; 3. College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475001, China

Funding:	National Natural Science Foundation of China (61873131, 61702284); General Program of the Anhui Provincial Department of Science and Technology (1908085MF207); Jiangsu Postdoctoral Research Foundation (2018K009B).

Abstract:	Existing sentiment analysis methods do not make full use of the information in short videos, which leads to inaccurate sentiment analysis results. To address this, we propose the audio-visual multimodal sentiment analysis (AV-MSA) model, which analyzes the sentiment of short videos using visual features from video frames together with audio information. The model has two branches, a visual branch and an audio branch. The audio branch uses a convolutional neural network (CNN) architecture to extract emotional features from audio spectrograms. The visual branch uses 3D convolution to strengthen the temporal correlation of visual features and, building on ResNet, adds an attention mechanism that highlights emotion-related features and makes the model more sensitive to informative features. Finally, a cross-voting mechanism fuses the results of the visual and audio branches to produce the final sentiment prediction. AV-MSA was evaluated on the IEMOCAP and Weibo audio-visual (WB-AV) datasets. Experimental results show that it achieves substantially higher classification accuracy than current short-video sentiment analysis methods.

Keywords:	multimodal sentiment analysis; ResNet; 3D convolutional neural networks; attention; decision fusion
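
The decision-fusion step named in the abstract can be illustrated with a small sketch. The abstract does not spell out the rule behind the cross-voting mechanism, so the code below substitutes plain weighted soft voting over the two branches' class probabilities. The label set, the function names `fuse_branch_votes` and `predict_label`, and the `visual_weight` parameter are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical label set; the datasets named in the abstract may use different classes.
LABELS = ["negative", "neutral", "positive"]


def fuse_branch_votes(
    visual_probs: Sequence[float],
    audio_probs: Sequence[float],
    visual_weight: float = 0.5,
) -> List[float]:
    """Combine two branches' class probabilities by weighted soft voting.

    This stands in for the paper's cross-voting mechanism, whose exact
    rule is not given in the abstract.
    """
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both branches must score the same number of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")

    audio_weight = 1.0 - visual_weight
    fused = [
        visual_weight * v + audio_weight * a
        for v, a in zip(visual_probs, audio_probs)
    ]

    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    # Renormalize so the fused scores form a probability distribution.
    return [score / total for score in fused]


def predict_label(
    visual_probs: Sequence[float],
    audio_probs: Sequence[float],
    visual_weight: float = 0.5,
) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_branch_votes(visual_probs, audio_probs, visual_weight)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best_index]


if __name__ == "__main__":
    visual = [0.2, 0.5, 0.3]  # made-up visual-branch output
    audio = [0.1, 0.2, 0.7]   # made-up audio-branch output
    print(predict_label(visual, audio))
    print(predict_label(visual, audio, visual_weight=0.9))
```

With equal weights the confident audio branch decides the outcome, while a large `visual_weight` lets the visual branch override it. Decision-level fusion of this kind lets each branch be trained and tuned separately, which fits the two-branch design described in the abstract.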
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.