Multimodal sentiment analysis of short videos based on attention
Authors: HUANG Huan  SUN Li-juan  CAO Ying  GUO Jian  REN Heng-yi
Affiliation: (1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China; 2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China; 3. College of Computer and Information Engineering, Henan University, Kaifeng Henan 475001, China)
Funding: National Natural Science Foundation of China (61873131, 61702284); General Program of the Anhui Provincial Department of Science and Technology (1908085MF207); Jiangsu Postdoctoral Research Foundation (2018K009B).
Abstract: Existing sentiment analysis methods do not take full account of the information in short videos, which leads to inappropriate sentiment analysis results. To address this, we propose the audio-visual multimodal sentiment analysis (AV-MSA) model, which analyzes the sentiment of short videos using visual features from frame images together with the audio information in the videos. The model has two branches, a visual branch and an audio branch. The audio branch uses a convolutional neural network (CNN) architecture to extract emotional features from audio spectrograms. The visual branch uses 3D convolution to strengthen the temporal correlation of visual features. In addition, building on ResNet, an attention mechanism is added to highlight emotion-related features and make the model more sensitive to informative features. Finally, a cross-voting mechanism fuses the results of the visual and audio branches to produce the final sentiment prediction. AV-MSA was evaluated on the IEMOCAP and Weibo audio-visual (WB-AV) datasets. Experimental results show that, compared with current short-video sentiment analysis methods, AV-MSA substantially improves classification accuracy.
Keywords: multimodal sentiment analysis  ResNet (residual network)  3D convolutional neural networks  attention  decision fusion
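
The abstract names decision fusion of the visual and audio branches but does not spell out the cross-voting rule. The sketch below is only a generic late-fusion stand-in, not the authors' mechanism: it assumes each branch emits a probability vector over the same sentiment classes, keeps the label when both branches agree, and otherwise falls back to a weighted average of the two vectors. The function name `fuse_decisions` and the parameter `visual_weight` are hypothetical.

```python
from typing import Sequence


def fuse_decisions(visual_probs: Sequence[float],
                   audio_probs: Sequence[float],
                   visual_weight: float = 0.5) -> int:
    """Fuse two per-branch class-probability vectors into one class index.

    Assumption: this agree-else-weighted-average rule is an illustrative
    late-fusion scheme, not the cross-voting mechanism from the paper.
    """
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")

    def argmax(scores: Sequence[float]) -> int:
        # Index of the highest score; ties resolve to the lowest index.
        return max(range(len(scores)), key=lambda i: scores[i])

    visual_vote = argmax(visual_probs)
    audio_vote = argmax(audio_probs)

    # Both branches vote for the same class: accept it directly.
    if visual_vote == audio_vote:
        return visual_vote

    # Branches disagree: combine their probabilities and pick the best class.
    audio_weight = 1.0 - visual_weight
    fused = [visual_weight * pv + audio_weight * pa
             for pv, pa in zip(visual_probs, audio_probs)]
    return argmax(fused)
```

With equal weights, a confident audio branch can override a hesitant visual branch; raising `visual_weight` shifts that balance toward the visual branch. How the actual model weighs the two branches is not stated in the abstract.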
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.