
Chinese cross-document co-reference resolution based on support vector machine classification and semantic information

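Cite as:	ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong. Chinese cross-document co-reference resolution based on support vector machine classification and semantic information[J]. Journal of Computer Applications, 2013, 33(4): 984-987.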

Authors:	ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong

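Affiliations:	1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 2. Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006 3. Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006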

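Funding:	National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Major Natural Science Project of Jiangsu Higher Education Institutions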

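Abstract:	The task of cross-document (entity) co-reference resolution (CDCR) is to group all words that are distributed across different texts but refer to the same entity into a co-reference chain. Traditional cross-document co-reference resolution mainly uses clustering methods to solve the name disambiguation problem encountered in information retrieval. This paper converts the clustering problem into a classification problem and uses a Support Vector Machine (SVM) classifier to solve the name disambiguation and multi-name aggregation problems in information extraction. The method can effectively combine the word-formation and pronunciation features of entity names with a variety of semantic features from inside and outside the text. Experiments on a Chinese cross-document co-reference corpus show that, compared with clustering methods, the method improves recall as well as precision.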
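Keywords:	cross-document co-reference; information extraction; Support Vector Machine (SVM) classifier; semantic information; name disambiguation; multi-name aggregation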
Received:	2012-09-24
Revised:	2012-10-30
Chinese cross document co-reference resolution based on SVM classification and semantics

ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong. Chinese cross document co-reference resolution based on SVM classification and semantics[J]. Journal of Computer Applications, 2013, 33(4): 984-987.

Authors:	ZHAO Zhiwei, GU Jinghang, HU Yanan, QIAN Longhua, ZHOU Guodong

Affiliation:	1. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China 2. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China 3. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China 4. Laboratory of Natural Language Processing, Soochow University, Suzhou Jiangsu 215006, China

Abstract:	Cross-Document Co-reference Resolution (CDCR) aims to group words that appear in different texts but refer to the same entity into co-reference chains. Traditional research on CDCR uses clustering methods to address the name disambiguation problem that arises in information retrieval. This paper recasts CDCR as a classification problem and uses a Support Vector Machine (SVM) classifier to resolve both name disambiguation and variant consolidation, both of which are prevalent in information extraction. The method can effectively integrate various features, such as morphological, phonetic, and semantic knowledge collected from the corpus and the Internet. Experiments on a Chinese cross-document co-reference corpus show that the classification method outperforms clustering methods in both precision and recall.

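Keywords:	cross-document co-reference; information extraction; Support Vector Machine (SVM) classifier; semantic information; name disambiguation; variant consolidation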
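
The abstract describes recasting co-reference clustering as classification. One way to picture that idea is as a pairwise decision over mentions followed by a merge step, sketched below in Python. Everything in the sketch is an assumption made for illustration and is not taken from the paper: the toy mentions, the two overlap features, the hand-set weights that stand in for a trained linear SVM, and the use of union-find (transitive closure) to turn pairwise decisions into chains.

```python
from itertools import combinations

# Hypothetical mentions: (document id, surface name, context words).
MENTIONS = [
    ("d1", "Li Na", {"tennis", "open", "champion"}),
    ("d2", "Li Na", {"tennis", "final", "champion"}),
    ("d3", "Li Na", {"singer", "album", "song"}),
    ("d4", "Na Li", {"tennis", "open", "final"}),
]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(m1, m2):
    """Two toy features: name-token overlap and context-word overlap."""
    name_sim = jaccard(set(m1[1].split()), set(m2[1].split()))
    context_sim = jaccard(m1[2], m2[2])
    return (name_sim, context_sim)


# Hand-set weights and bias standing in for a trained linear SVM (assumption).
WEIGHTS = (1.0, 2.0)
BIAS = -1.5


def is_coreferent(features):
    """Linear decision function: a positive score means 'same entity'."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return score > 0


def build_chains(mentions):
    """Merge positively classified pairs with union-find (transitive closure)."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(pair_features(mentions[i], mentions[j])):
            parent[find(i)] = find(j)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention[0])
    return sorted(chains.values())


chains = build_chains(MENTIONS)
print(chains)  # [['d1', 'd2', 'd4'], ['d3']]
```

In this toy data, the singer mention in d3 shares a name with the others but is kept apart (name disambiguation), while the reordered name in d4 is merged with d1 and d2 (variant consolidation). A real system would learn the decision function from labeled mention pairs and use far richer features than these two overlaps.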