
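Extracting Part-Whole Relations Based on Coordinate Structure
Cite this article: XIA Fei, CAO Xinyu, FU Jianhui, WANG Shi, CAO Cungen. Extracting Part-Whole Relations Based on Coordinate Structure[J]. Journal of Chinese Information Processing (中文信息学报), 2015, 29(1): 88-96.
Authors: XIA Fei  CAO Xinyu  FU Jianhui  WANG Shi  CAO Cungen
Affiliation: 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
2. University of Chinese Academy of Sciences, Beijing 100049
Funding: National Natural Science Foundation of China (91224006, 61173063, 61035004, 61203284, 309737163); National Social Science Foundation of China (10AYY003)
Abstract: Part-whole relations are a basic and important kind of semantic relation, and acquiring them automatically from text is a fundamental research topic in knowledge engineering. This paper proposes a graph-based method for acquiring part-whole relations from the Web. We first use part-whole relation patterns to download a corpus from Google, then use coordinate-structure patterns to match pairs of part concepts in that corpus, and build a graph from these pairs. A hierarchical clustering algorithm then clusters the graph automatically so that the correct part concepts group together. On top of hierarchical clustering, we exploit the properties of coordinate structures, of the graph, and of the Chinese language, and adjust the edge weights with five methods: penalizing comma edges, removing low-frequency edges, rewarding loops, and up-weighting edges whose nodes share the same suffix or the same prefix. This greatly improves recall without sacrificing the high precision of hierarchical clustering.

Keywords: part-whole relations  graph model  coordinate structure  hierarchical clustering  edge weight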

Extracting Part-Whole Relations Based on Coordinate Structure
XIA Fei, CAO Xinyu, FU Jianhui, WANG Shi, CAO Cungen. Extracting Part-Whole Relations Based on Coordinate Structure[J]. Journal of Chinese Information Processing, 2015, 29(1): 88-96.
Authors: XIA Fei  CAO Xinyu  FU Jianhui  WANG Shi  CAO Cungen
Affiliation: 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Automatic discovery of part-whole relations from the Web is a fundamental and important problem in knowledge engineering. This paper proposes a graph-based method for extracting part-whole relations from the Web. First, we download snippets from Google using part-whole query patterns. We then build a graph by extracting word pairs joined in a coordinate structure from these snippets, with the co-occurring words as nodes and their co-occurrence frequency as the edge weight. A hierarchical clustering method groups the correct parts together, and it is optimized by five methods of adjusting edge weights: reducing the weight of comma edges, cutting low-frequency edges, increasing the weight of edges that lie in a loop, increasing the weight of edges whose two nodes share the same suffix, and increasing the weight of edges whose two nodes share the same prefix. Experimental results show that the five methods increase recall substantially without sacrificing the high precision of hierarchical clustering.
Keywords: part-whole relations  graph model  coordinate structure  hierarchical clustering  edge weight
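
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so every function name, the multiplicative factors (`comma_penalty`, `loop_bonus`, `affix_bonus`), the use of triangles as the only kind of loop, the single-character prefix and suffix test, and the choice of single-linkage clustering cut at a weight threshold are assumptions made here for illustration.

```python
from collections import Counter


def build_graph(pairs, comma_penalty=0.5, min_count=2):
    """Build a weighted graph from (a, b, via_comma) triples.

    Each triple is a word pair matched by a coordinate-structure pattern;
    via_comma marks pairs joined only by a comma. The penalty factor and
    the frequency cut-off are hypothetical values, not taken from the paper.
    """
    raw = Counter()      # plain co-occurrence counts
    weight = Counter()   # counts with comma matches down-weighted
    for a, b, via_comma in pairs:
        if a == b:
            continue
        edge = frozenset((a, b))
        raw[edge] += 1
        weight[edge] += comma_penalty if via_comma else 1.0
    # Remove low-frequency edges.
    return {e: w for e, w in weight.items() if raw[e] >= min_count}


def adjust(graph, loop_bonus=1.5, affix_bonus=1.5):
    """Reward edges on a triangle and edges whose nodes share an affix."""
    neighbours = {}
    for e in graph:
        a, b = tuple(e)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    adjusted = {}
    for e, w in graph.items():
        a, b = tuple(e)
        if neighbours[a] & neighbours[b]:  # a common neighbour closes a loop
            w *= loop_bonus
        if a[-1] == b[-1]:                 # same last character (suffix)
            w *= affix_bonus
        if a[0] == b[0]:                   # same first character (prefix)
            w *= affix_bonus
        adjusted[e] = w
    return adjusted


def cluster(graph, threshold):
    """Single-linkage agglomeration: merge along edges, heaviest first,
    and stop merging once the edge weight falls below the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for e, w in sorted(graph.items(), key=lambda kv: -kv[1]):
        a, b = tuple(e)
        ra, rb = find(a), find(b)          # also registers both nodes
        if w >= threshold and ra != rb:
            parent[ra] = rb
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Calling `cluster(adjust(build_graph(pairs)), threshold)` on triples harvested from snippets returns a list of word sets. In this sketch, candidate parts that are linked by frequent, non-comma coordination and that close loops or share an affix end up in the same set, while words attached only by weak edges stay in their own sets.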