
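从Web获取部分整体关系语料的方法 (A method for acquiring part-whole relation corpora from the Web)
Cite as: 曹馨宇,曹存根.从Web获取部分整体关系语料的方法[J].中文信息学报,2011,25(5):17-24.
Authors: 曹馨宇, 曹存根
Affiliations: 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. Graduate University of the Chinese Academy of Sciences, Beijing 100190
Funding: National Natural Science Foundation of China (60773059)
Abstract: Acquiring part-whole relations is an important component of knowledge acquisition, and the Web is steadily becoming one of its most important resources. Search engines are one effective means of obtaining part-whole knowledge from the Web; we call the set of Web retrieval results that contain part-whole relations a part-whole relation corpus. Because mainstream search engines do not yet support semantic search, an important question is how to construct effective queries that return a corpus rich in part-whole relations, from which the relations themselves can then be extracted. This paper proposes a new query construction method for acquiring part-whole relation corpora from the Web. The method constructs queries based on context words and then uses existing search engines to retrieve the corpus. The corpus it retrieves is compared with those obtained by manually constructed queries and by corpus-based query construction on two criteria: first, the number of sentences in the corpus that contain part-whole relations; second, how easy it is to extract part-whole relations from the corpus. Experimental results show that the proposed method far outperforms the other two.

Keywords: part-whole relation acquisition; corpus acquisition; query construction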

A Method for Acquiring Corpus Rich in Part-Whole Relation from the Web
CAO Xinyu, CAO Cungen. A Method for Acquiring Corpus Rich in Part-Whole Relation from the Web[J]. Journal of Chinese Information Processing, 2011, 25(5): 17-24.
Authors: CAO Xinyu, CAO Cungen
Affiliations: 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of the Chinese Academy of Sciences, Beijing 100190, China
Abstract: Acquiring part-whole relations is an important problem in knowledge acquisition, and the Web has become an important resource for it. Search engines are an effective means of mining part-whole knowledge from the Web. In this paper, the set of retrieval results that contain part-whole relations is called a corpus rich in part-whole relations. Because current mainstream search engines do not support semantic retrieval, constructing queries that effectively retrieve text containing part-whole relations from the Web is a challenging problem. This paper proposes a novel query construction method for acquiring such a corpus: it builds query strings from context words related to part-whole relations and submits them to an existing search engine. We compare the resulting corpus with those obtained by manually constructed queries and by corpus-based query construction on two criteria: the number of retrieved sentences that contain part-whole relations, and the difficulty of extracting part-whole relations from the retrieved text. The results show that our method is superior to both alternatives.
Keywords: part-whole relation acquisition; corpus acquisition; query formulation
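The abstract describes the method only at a high level, so the following is a minimal sketch of what context-word query construction might look like, not the authors' implementation. The context-word list, the function names `build_queries` and `count_candidate_sentences`, and the substring-matching heuristic are all assumptions made for illustration; the paper's actual context words and evaluation procedure are not given here.

```python
from itertools import product

# Hypothetical context words that tend to co-occur with part-whole statements.
# Illustrative assumption only; not the word list used in the paper.
CONTEXT_WORDS = ("consists of", "is composed of", "is part of")


def build_queries(concepts, context_words=CONTEXT_WORDS):
    """Return one quoted query string per (concept, context word) pair."""
    queries = []
    for concept, word in product(concepts, context_words):
        # Many search engines treat a double-quoted string as an exact phrase.
        queries.append(f'"{concept} {word}"')
    return queries


def count_candidate_sentences(sentences, context_words=CONTEXT_WORDS):
    """Count sentences containing at least one context word.

    A crude stand-in for the first comparison criterion in the abstract
    (number of sentences that contain part-whole relations); a real
    evaluation would need manual or more careful automatic judgment.
    """
    return sum(any(word in s for word in context_words) for s in sentences)
```

The resulting strings could be submitted to any search engine, and the returned snippets passed to `count_candidate_sentences` to get a rough comparison between query construction strategies.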
This article is indexed in 万方数据 (Wanfang Data) and other databases.