
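Complex Entity Recognition Based on Prior Semantic Knowledge and Type Embedding
Cite this article: JIANG Xiao-Bo, HE Kun, YAN Guang-Yu. Complex Entity Recognition Based on Prior Semantic Knowledge and Type Embedding[J]. Journal of Software (软件学报), 2023, 34(12): 5649-5669.
Authors: JIANG Xiao-Bo  HE Kun  YAN Guang-Yu
Affiliation: School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China
Funding: National Natural Science Foundation of China (U1801262); Science and Technology Planning Project of Guangdong Province (2019B010154003)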
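Abstract: Entity recognition is a key task in information extraction. As information extraction technology has developed, researchers have shifted from recognizing simple entities to recognizing complex ones. Complex entities, however, lack distinctive features and are more complex and varied in syntactic structure and part-of-speech composition, which makes them very hard to recognize. In addition, existing models widely adopt span-based methods to recognize nested entities, which blurs entity boundary detection and degrades recognition performance. To address these problems, an entity recognition model named GIA-2DPE is proposed, based on prior semantic knowledge and type embedding. The model uses keyword sequences of entity categories as prior semantic knowledge to improve its understanding of entities, captures the latent features of different entity types through type embedding, and then fuses the prior knowledge with the type features through a gated interactive attention mechanism to assist complex entity recognition. The model also predicts entity boundaries through 2D probability encoding and uses boundary features and contextual features to detect boundaries more precisely, which improves nested entity recognition. Extensive experiments were conducted on seven English datasets and two Chinese datasets. The results show that GIA-2DPE outperforms current state-of-the-art models and, on the entity recognition task of the ScienceIE dataset, improves the F1 score by up to 10.4% over the baseline.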

Keywords: information extraction  complex entity recognition  gated interactive attention mechanism  2D probability encoding
Received: 2021/12/2
Revised: 2022/2/25

Complex Entity Recognition Based on Prior Semantic Knowledge and Type Embedding
JIANG Xiao-Bo, HE Kun, YAN Guang-Yu. Complex Entity Recognition Based on Prior Semantic Knowledge and Type Embedding[J]. Journal of Software, 2023, 34(12): 5649-5669.
Authors: JIANG Xiao-Bo  HE Kun  YAN Guang-Yu
Affiliation: School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
Abstract: Entity recognition is a key task of information extraction. As information extraction technology has developed, researchers have shifted from recognizing simple entities to recognizing complex ones. Complex entities usually have no explicit features, and their syntactic constructions and part-of-speech compositions are more complicated and varied, which makes recognizing them a great challenge. In addition, existing models widely use span-based methods to identify nested entities, and as a result they tend to detect entity boundaries ambiguously, which hurts recognition performance. To address these problems, this study proposes an entity recognition model, GIA-2DPE, based on prior semantic knowledge and type embedding. The model uses keyword sequences of entity categories as prior semantic knowledge to improve its understanding of entities, uses type embedding to capture latent features of different entity types, and then combines the prior knowledge with the entity-type features through a gated interactive attention mechanism to assist in recognizing complex entities. Moreover, the model uses 2D probability encoding to predict entity boundaries and combines boundary features with contextual features to detect boundaries more accurately, thereby improving nested entity recognition. This study conducts extensive experiments on seven English datasets and two Chinese datasets. The results show that GIA-2DPE outperforms state-of-the-art models and, on the entity recognition task of the ScienceIE dataset, improves the F1 score by up to 10.4% over the baseline.
Keywords: information extraction  complex entity recognition  gated interactive attention  2D probability encoding
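
The abstract gives no implementation details for 2D probability encoding, so the following is only an illustrative sketch of the general idea of scoring spans in a start-by-end table. It assumes a square table in which cell (start, end) holds the probability that the tokens from start to end inclusive form an entity. The function name `decode_spans`, the threshold, and all scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int, float]  # (start index, end index inclusive, score)


def decode_spans(prob: List[List[float]], threshold: float = 0.5) -> List[Span]:
    """Collect every span whose score reaches the threshold.

    Only the upper triangle (end >= start) is scanned, because a span
    cannot end before it starts. Overlapping cells are all kept, so
    nested entities can be returned side by side.
    """
    spans: List[Span] = []
    n = len(prob)
    for start in range(n):
        for end in range(start, n):
            score = prob[start][end]
            if score >= threshold:
                spans.append((start, end, score))
    return spans


# Hypothetical example: made-up scores for a five-token phrase.
tokens = ["South", "China", "University", "of", "Technology"]
prob = [[0.0] * len(tokens) for _ in tokens]
prob[0][1] = 0.9  # "South China"
prob[0][4] = 0.8  # the whole phrase, which contains the span above
prob[2][2] = 0.3  # below the threshold, so it is dropped

entities = decode_spans(prob)
phrases = [" ".join(tokens[s:e + 1]) for s, e, _ in entities]
```

Because each cell is judged independently, the inner span and the outer span that contains it are both returned, which is the property that makes a table of this kind a natural fit for nested entities. How GIA-2DPE actually computes the scores from boundary and contextual features is described in the full paper, not here.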