Textual-visual semantics-enhanced multimodal named entity recognition method
Citation: Xu Xi, Wang HaiRong, Wang Tong, Ma He. Textual-visual semantics-enhanced multimodal named entity recognition method[J]. Application Research of Computers, 2024, 41(6).
Authors: Xu Xi  Wang HaiRong  Wang Tong  Ma He
Affiliation: College of Computer Science and Engineering, North Minzu University
Funding: Natural Science Foundation of Ningxia (2023AAC03316); Fundamental Research Funds for the Central Universities of North Minzu University (2022PT_S04)
Abstract: To address the problems of missing image-text semantics and ambiguous multimodal representations in multimodal named entity recognition, this paper proposed a textual-visual semantics-enhanced multimodal named entity recognition method. The method used multiple pre-trained models to extract text features, character features, regional visual features, image keywords, and visual labels, so as to comprehensively describe the semantics of the image-text data. It then adopted a Transformer and a cross-modal attention mechanism to mine the complementary semantic relationships between the textual and visual features and to guide feature fusion, thereby generating semantically completed text representations and semantically enhanced multimodal representations. By integrating boundary detection, entity-type detection, and named entity recognition tasks, it constructed a multi-task label decoder that performs fine-grained semantic decoding of the input features to improve the semantic accuracy of the predictions. Finally, this decoder jointly decoded the text representations and the multimodal representations to obtain globally optimal predicted labels. Extensive experiments on the Twitter-2015 and Twitter-2017 benchmark datasets show that the method improves the average F1 score by 1.00% and 1.41% respectively, indicating that the model has a strong named entity recognition capability.

Keywords: multimodal named entity recognition  multimodal representation  multimodal fusion  multi-task learning  named entity recognition
Received: 2023-09-21
Revised: 2024-05-09
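The abstract describes two central components: a cross-modal attention fusion step and a multi-task label decoder. As a rough illustration of how such components are commonly wired together, the following PyTorch sketch is provided; all module names, dimensions, and label-set sizes (CrossModalFusion, MultiTaskLabelDecoder, d_model=768, the BIO tag counts, and so on) are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only -- the record above gives no code, so every module name,
# feature dimension, and label set below is an assumption, not the authors' implementation.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Cross-modal attention fusion in the spirit of the abstract: text tokens and
    regional visual features attend to each other, yielding a semantically
    completed text representation and a semantics-enhanced multimodal one."""

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Transformer layer for intra-text context (assumed configuration)
        self.text_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # text-to-image and image-to-text cross-modal attention
        self.txt2img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_feats: torch.Tensor, visual_feats: torch.Tensor):
        # text_feats:   (batch, T, d_model), e.g. from a pre-trained text encoder
        # visual_feats: (batch, R, d_model), e.g. projected regional visual features
        text_ctx = self.text_layer(text_feats)
        # text queries attend to image regions -> complementary visual semantics
        txt_enh, _ = self.txt2img(text_ctx, visual_feats, visual_feats)
        # image regions attend to text -> visual features aligned with the tokens
        img_enh, _ = self.img2txt(visual_feats, text_ctx, text_ctx)
        # residual fusion produces the multimodal token representation
        multimodal = self.norm(text_ctx + txt_enh)
        return text_ctx, multimodal


class MultiTaskLabelDecoder(nn.Module):
    """Multi-task decoder combining boundary detection, entity-type detection,
    and full NER tagging over a shared token representation (label sets assumed)."""

    def __init__(self, d_model: int = 768, n_boundary: int = 3,
                 n_types: int = 5, n_ner_tags: int = 9):
        super().__init__()
        self.boundary_head = nn.Linear(d_model, n_boundary)  # e.g. B / I / O
        self.type_head = nn.Linear(d_model, n_types)         # e.g. PER/LOC/ORG/MISC/O
        self.ner_head = nn.Linear(d_model, n_ner_tags)       # full BIO x type tag set

    def forward(self, reps: torch.Tensor):
        # reps: (batch, T, d_model) -- text-only or multimodal representation
        return self.boundary_head(reps), self.type_head(reps), self.ner_head(reps)


if __name__ == "__main__":
    fusion = CrossModalFusion()
    decoder = MultiTaskLabelDecoder()
    text = torch.randn(2, 32, 768)    # dummy token features
    vision = torch.randn(2, 49, 768)  # dummy 7x7 regional features
    text_rep, mm_rep = fusion(text, vision)
    # joint decoding: the same decoder scores both representations
    boundary_logits, type_logits, ner_logits = decoder(mm_rep)
    print(ner_logits.shape)  # torch.Size([2, 32, 9])
```

In this sketch the same decoder can be applied to both the text-only and the multimodal representations, mirroring the joint-decoding idea in the abstract; how the resulting sets of logits are combined into globally optimal labels is a design detail the record does not specify.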
