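Chinese cross-document co-reference resolution based on support vector machine classification and semantic information
Cite this article: ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong. Chinese cross-document co-reference resolution based on support vector machine classification and semantic information[J]. 计算机应用 (Journal of Computer Applications), 2013, 33(4): 984-987.
Authors: ZHAO Zhiwei  GU Jinghang  HU Yanan  QIAN Longhua  ZHOU Guodong
Affiliations: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 2. Laboratory of Natural Language Processing, Soochow University, Suzhou, Jiangsu 215006 3. Laboratory of Natural Language Processing, Soochow University, Suzhou, Jiangsu 215006
Funding: National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Major Natural Science Project of Jiangsu Higher Education Institutions
Abstract: The task of cross-document (entity) co-reference resolution (CDCR) is to group all words that are spread across different documents but refer to the same entity into a single co-reference chain. Traditional CDCR mainly uses clustering methods to handle the name disambiguation problem encountered in information retrieval. This paper recasts the clustering problem as a classification problem and uses a Support Vector Machine (SVM) classifier to address both name disambiguation and variant consolidation in information extraction. The method can effectively integrate the word-formation features and pronunciation features of entity names, together with a variety of semantic features drawn from both inside and outside the text. Experiments on a Chinese cross-document co-reference corpus show that, compared with clustering methods, the method improves recall as well as precision.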

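Keywords: cross-document co-reference; information extraction; support vector machine classifier; semantic information; name disambiguation; variant consolidation
Received: 2012-09-24
Revised: 2012-10-30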

Chinese cross document co-reference resolution based on SVM classification and semantics
ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong. Chinese cross document co-reference resolution based on SVM classification and semantics[J]. Journal of Computer Applications, 2013, 33(4): 984-987.
Authors: ZHAO Zhiwei  GU Jinghang  HU Yanan  QIAN Longhua  ZHOU Guodong
Affiliation: 1. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China
2. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China
3. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China
4. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China
Abstract: Cross-Document Co-reference Resolution (CDCR) aims to merge words that are distributed across different texts but refer to the same entity into co-reference chains. Traditional research on CDCR uses clustering methods to address the name disambiguation problem that arises in information retrieval. This paper recasts CDCR as a classification problem and uses a Support Vector Machine (SVM) classifier to resolve both name disambiguation and variant consolidation, both of which are prevalent in information extraction. The method can effectively integrate various features, such as morphological, phonetic, and semantic knowledge collected from the corpus and the Internet. Experiments on a Chinese cross-document co-reference corpus show that the classification method outperforms clustering methods in both precision and recall.
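Keywords: cross-document co-reference; information extraction; Support Vector Machine (SVM) classifier; semantic information; name disambiguation; variant consolidation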
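
The recasting of clustering as pairwise classification described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy mentions, the two features (`char_overlap`, `context_overlap`), the weights and threshold, and the function names are all hypothetical. In particular, a simple thresholded score stands in for the trained SVM decision function, and union-find is one plausible way to turn pairwise "same entity" decisions into co-reference chains.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_overlap(name1: str, name2: str) -> float:
    """Crude name-form feature: overlap of the character sets of two names."""
    return jaccard(set(name1), set(name2))


def context_overlap(ctx1: set, ctx2: set) -> float:
    """Crude semantic feature: overlap of the context words of two mentions."""
    return jaccard(ctx1, ctx2)


def same_entity(m1: dict, m2: dict, threshold: float = 0.6) -> bool:
    """Stand-in for a trained SVM: threshold an equally weighted feature sum."""
    score = 0.5 * char_overlap(m1["name"], m2["name"]) \
        + 0.5 * context_overlap(m1["context"], m2["context"])
    return score >= threshold


def build_chains(mentions: list, classify=same_entity) -> list:
    """Classify every mention pair, then merge positive pairs with union-find."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if classify(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    chains = {}
    for i in range(len(mentions)):
        chains.setdefault(find(i), []).append(i)
    return sorted(chains.values())


# Hypothetical toy mentions from four documents: two people share a name
# (name disambiguation) and each appears under two name forms
# (variant consolidation).
mentions = [
    {"name": "li ming", "context": {"professor", "physics", "university"}},
    {"name": "li ming", "context": {"striker", "football", "league"}},
    {"name": "l. ming", "context": {"professor", "physics", "lecture"}},
    {"name": "ming li", "context": {"football", "league", "goal"}},
]

print(build_chains(mentions))  # prints [[0, 2], [1, 3]]
```

In this toy run, mentions 0 and 1 carry an identical name but are kept apart because their contexts do not overlap, while the name variants in mentions 0 and 2 are merged. A real system would replace `same_entity` with a classifier trained on labeled mention pairs.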
This article is indexed in Wanfang Data and other databases.