Cross-document Chinese personal name entity disambiguation based on hierarchical clustering
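Cite this article: ZHANG Feifei, LI Zonghai, ZHOU Xiaohui, LI Xiaoge. Cross-document Chinese personal name entity disambiguation based on hierarchical clustering[J]. Computer Engineering and Applications, 2014, 50(6): 106-111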
Authors: ZHANG Feifei, LI Zonghai, ZHOU Xiaohui, LI Xiaoge
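Affiliations: 1. Xi'an University of Posts & Telecommunications, Xi'an 710121, China; 2. Jinan Zhonglin Information Technology Co., Ltd., Jinan 250100, China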
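Abstract:
Personal name disambiguation has become an important open problem in natural language processing and information extraction applications. This work uses a Chinese natural language processing and information extraction system to recognize named entities and entity relations and to generate entity information objects (Entity Profiles, EPs). It then addresses entity disambiguation with agglomerative hierarchical clustering on the Hadoop platform, using the personal-information features, entity relations, and context information contained in the EPs. The whole-network news corpus compiled by Harbin Institute of Technology serves as the training and test data. The study focuses on selecting features for Chinese personal name disambiguation and on determining and validating the model parameters. The method achieves F-measures of 91.33% on the training set and 88.73% on the test set, which indicates that it is feasible in practice.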

Keywords: personal name disambiguation; information extraction; similarity; hierarchical clustering
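The abstract does not give the feature weights, the similarity function, the linkage criterion, or the stopping threshold, so the following is only a minimal sketch of threshold-stopped agglomerative clustering over entity profiles. It runs on a single machine rather than on Hadoop. It assumes that each mention's Entity Profile has been reduced to three hypothetical feature groups: `attr` for personal attributes, `rel` for related entities, and `ctx` for context words. Profiles are compared with a weighted sum of per-group cosine similarities, and clusters are merged by average linkage until the best remaining merge falls below a threshold. The names `WEIGHTS`, `profile_similarity` and `agglomerative_cluster`, the weights, the threshold, and the toy data are all illustrative assumptions, not the authors' settings.

```python
import math
from collections import Counter

# Assumed weights for three hypothetical Entity Profile feature groups:
# "attr" = personal attributes, "rel" = related entities, "ctx" = context words.
WEIGHTS = {"attr": 0.5, "rel": 0.3, "ctx": 0.2}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    return dot / (na * nb)


def profile_similarity(p: dict, q: dict) -> float:
    """Weighted sum of per-group cosine similarities; missing groups count as empty."""
    return sum(w * cosine(p.get(g, Counter()), q.get(g, Counter()))
               for g, w in WEIGHTS.items())


def agglomerative_cluster(profiles: list, threshold: float = 0.3) -> list:
    """Average-linkage agglomerative clustering of mention profiles.

    Starts with one cluster per mention and repeatedly merges the most similar
    pair of clusters; stops when no pair reaches `threshold` (hypothetical value).
    Returns a list of clusters, each a list of mention indices.
    """
    n = len(profiles)
    sim = [[profile_similarity(profiles[i], profiles[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                total = sum(sim[x][y] for x in clusters[i] for y in clusters[j])
                s = total / (len(clusters[i]) * len(clusters[j]))
                if s > best:
                    best, bi, bj = s, i, j
        if best < threshold:
            break
        clusters[bi].extend(clusters[bj])  # bi < bj, so deleting bj keeps bi valid
        del clusters[bj]
    return clusters


if __name__ == "__main__":
    # Toy profiles for three mentions of the same name (invented data).
    docs = [
        {"attr": Counter({"professor": 1, "univ_A": 1}), "ctx": Counter({"research": 2, "paper": 1})},
        {"attr": Counter({"professor": 1, "univ_A": 1}), "ctx": Counter({"paper": 1})},
        {"attr": Counter({"actor": 1}), "rel": Counter({"film_X": 1}), "ctx": Counter({"shooting": 1})},
    ]
    print(agglomerative_cluster(docs))  # prints [[0, 1], [2]]
```

Each returned cluster is read as one real-world person. In the toy run, the two mentions that share a title and an affiliation are merged, and the third mention stays on its own. The pairwise similarity matrix grows quadratically with the number of mentions. That cost is one plausible reason to distribute the computation, as the paper does with Hadoop.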

Abstract (English version):
Cross-document entity disambiguation is the problem of determining whether mentions in different documents refer to the same entity or to distinct entities. This paper describes a Chinese information extraction system that performs both document-level and corpus-level IE, using a pipelined, multi-level modular approach to named entity and Entity Profile extraction. It introduces novel features based on document-level entity profiles, studies how feature selection, parameter selection, and parameter validation affect the results, and analyzes those results. Disambiguation is performed with agglomerative hierarchical clustering on Hadoop. Experiments on the whole-network news corpus from Harbin Institute of Technology give an F-measure of 91.33% on the training set and 88.73% on the test set.
Keywords: entity disambiguation; information extraction; similarity; hierarchical clustering
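The abstract reports F-measures but does not say how they are computed. Name-disambiguation evaluations commonly use several clustering measures, including pairwise, B-cubed, and purity-based scores. The following Python sketch therefore shows only one common choice, pairwise precision, recall, and F-measure, and should not be read as the paper's exact metric. The function `pairwise_prf` and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_prf(pred: dict, gold: dict) -> tuple:
    """Pairwise precision, recall and F-measure of a clustering.

    `pred` and `gold` map each mention id to a cluster label. A pair of mentions
    counts as positive when both are placed in the same cluster. By the convention
    used here, an empty set of positive pairs yields precision/recall 1.0.
    """
    ids = sorted(gold)
    pairs = list(combinations(ids, 2))
    pred_pairs = {pr for pr in pairs if pred[pr[0]] == pred[pr[1]]}
    gold_pairs = {pr for pr in pairs if gold[pr[0]] == gold[pr[1]]}
    tp = len(pred_pairs & gold_pairs)  # pairs correctly placed together
    precision = tp / len(pred_pairs) if pred_pairs else 1.0
    recall = tp / len(gold_pairs) if gold_pairs else 1.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = {"m1": "A", "m2": "A", "m3": "A", "m4": "B"}  # invented gold labels
    pred = {"m1": "x", "m2": "x", "m3": "y", "m4": "y"}  # invented system output
    p, r, f = pairwise_prf(pred, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints P=0.500 R=0.333 F=0.400
```

Pairwise scores weight large clusters heavily because the number of pairs grows quadratically with cluster size. B-cubed scoring averages over individual mentions instead and is a frequently used alternative when that bias matters.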
This article is indexed in CNKI, VIP, and other databases.