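Coreference Resolution of Chinese Personal Pronouns Combining Rules and Semantics
Cite this article:张文艳, 李存华, 仲兆满, 王艺, 李莉. Coreference resolution of Chinese personal pronouns combining rules and semantics [J]. Journal of Data Acquisition and Processing (数据采集与处理), 2017, 32(1): 149-156.
Authors:张文艳, 李存华, 仲兆满, 王艺, 李莉
Affiliations:1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116; 2. School of Computer Engineering, Huaihai Institute of Technology, Lianyungang 222005
Abstract:Coreference resolution is a technique for determining whether a referring expression in a text denotes the same entity as content that appeared earlier. It plays an important role in the intelligent processing of massive text collections, and personal pronouns make up a considerable share of all referring expressions. This paper resolves Chinese personal pronouns with a method that combines rules and semantics. On top of the basic grammatical filtering rules, it adds an apposition rule to filter the candidate antecedents of a pronoun. It also proposes a more precise method for computing synonym distance: using Tongyici Cilin (同义词词林) and HowNet (知网), it computes the semantic relatedness between the words associated with the personal pronoun and the words associated with each candidate antecedent, and selects the candidate antecedent with the highest relatedness as the final resolution result. Comparison experiments against other methods and experiments on a real corpus dataset show that the proposed method achieves good results.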

Keywords:coreference resolution  personal pronouns  rules  candidate antecedents  semantic features

Coreference Resolution of Chinese Personal Pronouns With Combination of Semantics and Rules
Affiliation:1.School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; 2.School of Computer Engineering, Huaihai Institute of Technology, Lianyungang, 222005, China
Abstract:Coreference resolution determines whether a referring expression in a text denotes the same entity as an earlier mention, and it plays a crucial role in the intelligent processing of massive text information on the Internet. Personal pronouns account for a considerable share of referring expressions, so this study addresses the resolution of Chinese personal pronouns with an algorithm that combines semantics and rules. On top of the basic grammatical filtering rules, an apposition rule is added to filter candidate antecedents. To compute synonym distance more accurately, the algorithm identifies the words associated with the personal pronoun and with each candidate antecedent, computes their semantic relatedness with the aid of Tongyici Cilin and HowNet, and selects the candidate antecedent with the highest relatedness. Comparison experiments with different methods and experiments on a real corpus dataset show that the presented method achieves good results.
Keywords:coreference resolution  personal pronouns  rules  antecedent  semantic relations
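
The two-stage pipeline described in the abstract (rule-based filtering of candidate antecedents, then selection by semantic relatedness) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Candidate` class, the `PRONOUNS` feature table, the exact form of the apposition rule, and the averaging formula in `relatedness` are all assumptions made for illustration, and the `TOY_SIM` table is a hand-written stand-in for scores that the paper derives from Tongyici Cilin and HowNet.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical agreement features (gender, number) for a few personal pronouns.
PRONOUNS = {"他": ("m", "sg"), "她": ("f", "sg"),
            "他们": ("m", "pl"), "她们": ("f", "pl")}

# Toy word-pair similarities; a stand-in for Tongyici Cilin / HowNet scores.
TOY_SIM = {
    frozenset(("手术", "医生")): 0.8,
    frozenset(("手术", "病人")): 0.6,
    frozenset(("手术", "老师")): 0.1,
    frozenset(("手术", "学生")): 0.1,
}

@dataclass
class Candidate:
    text: str
    gender: str                      # "m", "f", or "?" when unknown
    number: str                      # "sg" or "pl"
    related: list = field(default_factory=list)  # words associated with it
    appositive_to: Optional[str] = None          # phrase it stands in apposition to

def word_sim(a, b):
    """Similarity of two words under the toy table (1.0 for identical words)."""
    return 1.0 if a == b else TOY_SIM.get(frozenset((a, b)), 0.0)

def relatedness(pron_words, cand_words):
    """Assumed formula: mean over pronoun words of the best match on the candidate side."""
    if not pron_words or not cand_words:
        return 0.0
    best = [max(word_sim(p, c) for c in cand_words) for p in pron_words]
    return sum(best) / len(best)

def agrees(pronoun, cand):
    """Basic grammatical filter: number must match, gender must match or be unknown."""
    gender, number = PRONOUNS[pronoun]
    return cand.number == number and cand.gender in (gender, "?")

def resolve(pronoun, pron_words, candidates):
    kept = [c for c in candidates if agrees(pronoun, c)]
    # Apposition filter (assumed form): drop a candidate that only renames
    # another surviving candidate, so the same entity is not scored twice.
    names = {c.text for c in kept}
    kept = [c for c in kept if c.appositive_to not in names]
    if not kept:
        return None
    return max(kept, key=lambda c: relatedness(pron_words, c.related))

candidates = [
    Candidate("张医生", "?", "sg", ["医生", "病人"]),
    Candidate("院长", "?", "sg", ["医生"], appositive_to="张医生"),
    Candidate("李老师", "m", "sg", ["老师", "学生"]),
    Candidate("王女士", "f", "sg", ["病人"]),
]
best = resolve("他", ["手术"], candidates)
print(best.text)  # 张医生
```

In this toy run, 王女士 is removed by the agreement filter, 院长 is removed as an appositive of 张医生, and 张医生 wins on relatedness to the pronoun's associated word 手术. A real system would take the associated words from a parser and the similarity scores from the lexical resources named in the abstract.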