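Few-shot Named Entity Recognition Method Fusing Multimodal Data
Cite this article: ZHANG Tian-Ming, ZHANG Shan, LIU Xi, CAO Bin, FAN Jing. Few-shot named entity recognition method fusing multimodal data [J]. Journal of Software, 2024, 35(3): 1107-1124.
Authors: ZHANG Tian-Ming  ZHANG Shan  LIU Xi  CAO Bin  FAN Jing
Affiliation: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China
Funding: National Natural Science Foundation of China (62276233, 62302451); Zhejiang Provincial Natural Science Foundation (LQ22F020018); Zhejiang Provincial Key Research and Development Program (2023C01048)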
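Abstract: As a key subtask of natural language processing, named entity recognition extracts key information from text, helping many downstream tasks, such as machine translation, text generation, knowledge graph construction, and multimodal data fusion, to understand the complex semantics of text and complete their work effectively. In practice, because of time and labor costs, named entity recognition is often constrained by the scarcity of annotated samples. Although text-based few-shot named entity recognition methods have achieved good generalization performance, the limited sample size severely restricts the semantic information the model can extract, so prediction quality remains poor. To address the challenge that annotation scarcity poses for text-based few-shot named entity recognition, this paper proposes a few-shot named entity recognition model that fuses multimodal data, using multimodal data to supply additional semantic information and improve prediction, which in turn can improve multimodal data fusion and modeling. The method converts image information into text as auxiliary modality information, effectively resolving the poor modality alignment caused by the inconsistent granularity of the semantic information in text and images. To account for the label dependencies in entity recognition, it uses the CRF framework with state-of-the-art meta-learning methods serving as the emission module and the transition module, respectively. To mitigate the negative impact of noisy samples in the auxiliary modality, it proposes a general denoising network based on meta-learning. Even when data is very limited, the denoising network can still effectively evaluate, within the auxiliary modality, the different...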

Keywords: named entity recognition  multimodal data  few-shot learning  meta-learning  denoising network
Received: 2023/7/15
Revised: 2023/9/5

Multimodal data Fusion for Few-shot Named Entity Recognition Method
ZHANG Tian-Ming, ZHANG Shan, LIU Xi, CAO Bin, FAN Jing. Multimodal data Fusion for Few-shot Named Entity Recognition Method [J]. Journal of Software, 2024, 35(3): 1107-1124.
Authors: ZHANG Tian-Ming  ZHANG Shan  LIU Xi  CAO Bin  FAN Jing
Affiliation: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Abstract: As a crucial subtask of natural language processing (NLP), named entity recognition (NER) extracts important information from text, helping downstream tasks such as machine translation, text generation, knowledge graph construction, and multimodal data fusion to understand the complex semantics of text and complete their work effectively. In practice, because of time and labor costs, NER often suffers from a scarcity of annotated data, a setting known as few-shot NER. Although text-based few-shot NER methods have achieved good generalization performance, the small number of samples limits the semantic information the model can extract, so prediction quality remains poor. To this end, this paper proposes a few-shot NER model based on multimodal data fusion, which for the first time uses multimodal data to provide additional semantic information; this improves the model's predictions and can in turn improve multimodal data fusion and modeling. The method converts image information into text as auxiliary modality information, which effectively addresses the poor modality alignment caused by the inconsistent granularity of the semantic information in text and images. To account for label dependencies in few-shot NER, the paper uses the CRF framework and introduces state-of-the-art meta-learning methods as the emission module and the transition module, respectively. To alleviate the negative impact of noisy samples in the auxiliary modality, it proposes a general denoising network based on the idea of meta-learning. The denoising network measures the variability of the samples and evaluates how beneficial each sample is to the model. Finally, the paper reports extensive experiments on real unimodal and multimodal datasets. The experimental results show the strong generalization performance of the proposed method, which outperforms state-of-the-art methods by 10 F1 points in the 1-shot setting.
Keywords:named entity recognition  multi-modal data  few-shot learning  meta learning  denoising network
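
The abstract describes a CRF framework in which one module supplies emission scores and another supplies transition scores. As a rough illustration of how such scores combine at prediction time, the following is a minimal sketch of standard Viterbi decoding for a linear-chain CRF. It is not the paper's implementation: the function name `viterbi_decode`, the three-label tag set, and all score values are hypothetical, and the meta-learned emission and transition modules themselves are not reproduced here; their outputs are simply assumed to be available as plain score tables.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]   : score of label j at token t (assumed emission-module output)
    transitions[i][j] : score of moving from label i to label j (assumed
                        transition-module output)
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] = best score of any path that ends in label j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_labels):
            # Choose the previous label that maximizes the path score into j.
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical tag set: 0 = O, 1 = B-PER, 2 = I-PER.
emissions = [
    [2.0, 1.5, 0.0],  # token 0: emission scores alone slightly prefer O
    [0.0, 0.0, 2.0],  # token 1: emission scores prefer I-PER
]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-PER is heavily penalized
    [0.0, 0.0, 1.0],    # B-PER -> I-PER is encouraged
    [0.0, 0.0, 0.0],
]
print(viterbi_decode(emissions, transitions))  # -> [1, 2]
```

In this toy example, picking the best label per token independently would give O followed by I-PER, which is not a valid tag sequence. The transition scores make the decoder choose B-PER followed by I-PER instead, which is the kind of label dependency a CRF layer is meant to capture.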