
Cross-border National Cultural Entity Recognition Method with Word Set Information
Cite this article:YANG Zhenping, MAO Cunli, LEI Xiongli, GAO Shengxiang, LU Shan, ZHANG Yongbing. Cross-border National Cultural Entity Recognition Method with Word Set Information[J]. Journal of Chinese Information Processing, 2022, 36(10): 88-96
Authors:YANG Zhenping  MAO Cunli  LEI Xiongli  GAO Shengxiang  LU Shan  ZHANG Yongbing
Affiliation:1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
2.Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
3.Kunming Metallurgical College, Kunming, Yunnan 650500, China
Funding:National Natural Science Foundation of China (61732005, 61866019, 61761026, 61972186); Key Project of the Yunnan Applied Basic Research Program (2019FA023); Yunnan Characteristic Industry Digitalization Research and Application Demonstration (202002AD080001); Yunnan Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2019HB006)
Abstract:Entities in the cross-border national culture domain are usually composed of combinations of domain words that describe national cultural characteristics. Current mainstream entity recognition methods based on character representations suffer from fuzzy domain entity boundaries, which leads to recognition errors. To address this, this paper proposes a cross-border national cultural entity recognition method that incorporates word set information, using word sets obtained from a domain lexicon to enhance the word boundary and word semantic information of domain entities. First, a cross-border national cultural domain lexicon is constructed to obtain word set information. Second, a word set attention mechanism computes the weights among word set vectors, and positional encoding is incorporated to strengthen word set position information. Finally, the word set information is incorporated into the feature extraction layer to enhance domain entity boundary information and to alleviate the loss of word-level semantics caused by using character features alone. Experimental results show that, on a cross-border national culture text dataset, the proposed method improves the F1 value by 2.71% over the baseline method.

Keywords:cross-border national culture  entity recognition  word set information  domain lexicon  attention mechanism
Received:2021-11-01
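
The first two steps described in the abstract (looking up word sets in a domain lexicon, then weighting word set vectors with attention plus positional encoding) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the grouping of matched words by B/M/E/S role, the use of the character vector as the attention query, the sinusoidal form of the positional encoding, and the tiny example lexicon are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def build_word_sets(sentence, lexicon, max_len=6):
    """For each character, collect lexicon words that cover it.

    Assumption: words are grouped by the character's role in the word
    (B = begin, M = middle, E = end, S = single-character word).
    """
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].append(word)
                continue
            sets[start]["B"].append(word)
            sets[end - 1]["E"].append(word)
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].append(word)
    return sets


def sinusoidal_position(pos, dim):
    """Sinusoidal positional encoding for one position (assumed form)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_word_sets(char_vec, set_vecs, set_positions):
    """Scaled dot-product attention of one character over its word set vectors.

    Assumption: the positional encoding is added to each word set vector
    before scoring, and the character vector acts as the query.
    Returns (weights, fused_vector).
    """
    dim = len(char_vec)
    if not set_vecs:
        # No lexicon match for this character: nothing to fuse.
        return [], [0.0] * dim
    keys = [
        [v + p for v, p in zip(vec, sinusoidal_position(pos, dim))]
        for vec, pos in zip(set_vecs, set_positions)
    ]
    scores = [
        sum(q * k for q, k in zip(char_vec, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    fused = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, fused


# Hypothetical example: a five-character phrase and a three-word lexicon.
example_sets = build_word_sets("傣族泼水节", {"傣族", "泼水", "泼水节"})
print(example_sets[2]["B"])  # ['泼水', '泼水节']
```

In a full model the fused vector would be combined with the character representation before the feature extraction layer; how that combination is done in the paper is not stated in the abstract, so it is left out here.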