Chinese title: 支持背景知识的多维端到端短语识别算法研究
Cite as (Chinese): 刘广,涂刚,李政,刘译键,占志强.支持背景知识的多维端到端短语识别算法研究[J].计算机工程与应用,2022,58(8):147-155.
Authors (Chinese): 刘广  涂刚  李政  刘译键  占志强
Affiliation (Chinese): 华中科技大学 计算机科学与技术学院,武汉 430074

Research on Multi-Dimensional End-to-End Phrase Recognition Algorithm Based on Background Knowledge
LIU Guang, TU Gang, LI Zheng, LIU Yijian, ZHAN Zhiqiang. Research on Multi-Dimensional End-to-End Phrase Recognition Algorithm Based on Background Knowledge[J]. Computer Engineering and Applications, 2022, 58(8):147-155.
Authors: LIU Guang  TU Gang  LI Zheng  LIU Yijian  ZHAN Zhiqiang
Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract: Entity recognition and dependency analysis currently rely mainly on deep end-to-end methods based on supervised learning. These methods have two problems: they cannot incorporate background knowledge, and they cannot recognize the multi-granularity, nested structure of natural language. To address both, this paper proposes a dependency syntax annotation rule based on phrase windows, annotates the Chinese Phrase Window Dataset (CPWD) with it, and designs a matching multi-dimensional end-to-end phrase recognition model (the MDM model). The annotation rule takes the phrase as its minimum unit, divides a sentence into seven nestable phrase types, and marks the dependencies between phrases. The MDM model can incorporate background knowledge, recognize the nested phrases of each type in a sentence, and recognize the dependencies between those phrases. Experimental results show that the annotation rule is easy to use and unambiguous, and that the MDM model handles phrase nesting more effectively than the traditional end-to-end algorithm. On the CPWD dataset, the MDM model improves the F1 score by more than 1 percentage point over the end-to-end method. The corresponding method was also applied in the CCL2018 Chinese metaphor sentiment analysis competition, where it improved on the original baseline by more than 1 percentage point and took first place.
Keywords: natural language processing; annotation system; phrase recognition; dependency analysis
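
The abstract describes annotations in which the phrase is the smallest unit, phrases may nest, and each phrase can depend on another. The Python sketch below is one possible way to represent such an annotation and to score predictions with an exact-match F1. It is illustrative only: the abstract does not list the seven phrase types, so the labels `CLAUSE`, `NP`, and `VP` are hypothetical, as are the `Phrase` class, the `head` field, and the choice of exact span-and-label matching as the scoring criterion. It is not the MDM model and may differ from the paper's evaluation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phrase:
    start: int                  # inclusive token index
    end: int                    # exclusive token index
    label: str                  # hypothetical phrase-type label
    head: Optional[int] = None  # list index of the governing phrase (assumed encoding)


def is_nested(inner: Phrase, outer: Phrase) -> bool:
    """Return True if inner lies inside outer's span and the spans differ."""
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def span_f1(gold: list[Phrase], pred: list[Phrase]) -> float:
    """Exact-match F1 over (start, end, label) triples; dependencies are ignored."""
    gold_set = {(p.start, p.end, p.label) for p in gold}
    pred_set = {(p.start, p.end, p.label) for p in pred}
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# A five-token sentence with a nested noun phrase inside a verb phrase.
gold = [
    Phrase(0, 5, "CLAUSE"),
    Phrase(0, 2, "NP", head=0),
    Phrase(2, 5, "VP", head=0),
    Phrase(3, 5, "NP", head=2),
]
# A prediction that gets the boundary of the innermost phrase wrong.
pred = [gold[0], gold[1], gold[2], Phrase(4, 5, "NP", head=2)]

print(is_nested(gold[3], gold[2]))
print(span_f1(gold, pred))
```

Because nested phrases are scored as separate spans, an error on an inner phrase lowers the score even when the enclosing phrase is correct, which is the kind of difference a nesting-aware comparison is meant to expose.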