Aspect-level Multimodal Sentiment Classification Based on Attention Fusion Network
XIAN Guang-Ming, ZHAO Zhi-Feng, YANG Xian-Ping. Aspect-level Multimodal Sentiment Classification Based on Attention Fusion Network[J]. Computer Systems & Applications, 2024, 33(2): 94-104
Authors: XIAN Guang-Ming  ZHAO Zhi-Feng  YANG Xian-Ping
Affiliation: School of Software, South China Normal University, Foshan 528225, China
Foundation item: National Natural Science Foundation of China (61070015)
Abstract: A key challenge in aspect-level multimodal sentiment classification is to accurately extract and fuse complementary information from the two modalities of text and vision in order to detect the sentiment polarity of the aspect terms mentioned in the text. Most existing methods combine only a single source of context information with the image, which makes them insensitive to the correlations among aspect, context, and visual information, and imprecise when extracting aspect-related local information from the image. Moreover, incomplete modal information during feature fusion often leads to mediocre fusion results. To address these problems, this study proposes an attention fusion network, AF-Net, for aspect-level multimodal sentiment classification. A spatial transformer network (STN) learns the location information of objects in the image to help extract important local features; a Transformer-based interaction network models the relationships among the aspect, the text, and the image to realize multimodal interaction; similarity information between the features of different modalities is supplemented, and a multi-head attention mechanism fuses the multi-feature information into a multimodal representation. Finally, a Softmax layer produces the sentiment classification result. Experiments and comparisons on two benchmark datasets show that AF-Net achieves better performance and improves aspect-level multimodal sentiment classification.
Keywords: multimodal  sentiment classification  spatial transformer network (STN)  interaction network  similarity information  attention fusion network
Received: 2023-08-01
Revised: 2023-09-01
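
The paper's implementation is not reproduced on this page, but the fusion stage the abstract describes (cross-modal Transformer-style interaction between text and image features, multi-head attention fusion, and a Softmax classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' AF-Net code: the module name, dimensions, mean pooling, and three sentiment classes are all assumptions, and the upstream feature extractors (e.g. a BERT text encoder and a CNN+STN image branch) are assumed and omitted.

```python
import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    """Hypothetical sketch of an AF-Net-style fusion stage (not the authors' code).

    Assumes text and image features are pre-extracted (e.g. by BERT and a
    CNN+STN branch) and already projected to a shared dimension d_model.
    """

    def __init__(self, d_model=768, n_heads=8, n_classes=3):
        super().__init__()
        # Cross-modal interaction: each modality attends to the other.
        self.text_to_image = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Multi-head attention over the concatenated streams fuses the features.
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, d_model); image_feats: (B, R, d_model)
        t2i, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i2t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Concatenate both interacted streams and fuse with self-attention.
        joint = torch.cat([t2i, i2t], dim=1)
        fused, _ = self.fusion(joint, joint, joint)
        # Mean-pool the fused sequence, classify, and apply Softmax.
        logits = self.classifier(fused.mean(dim=1))
        return logits.softmax(dim=-1)

if __name__ == "__main__":
    model = AttentionFusionSketch()
    text = torch.randn(2, 32, 768)   # e.g. 32 token features per sentence
    image = torch.randn(2, 49, 768)  # e.g. 7x7 = 49 region features
    print(model(text, image).shape)  # torch.Size([2, 3])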