
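Research on Multi-Document Concept Graph Construction Based on Open-Domain Extraction
Cite this article: Sheng Yongpan, Fu Xuefeng, Wu Tianxing. Research on multi-document concept graph construction based on open-domain extraction[J]. Application Research of Computers, 2020, 37(1): 19-25.
Authors: Sheng Yongpan, Fu Xuefeng, Wu Tianxing
Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731; School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099; School of Computer Science and Engineering, Southeast University, Nanjing 211189
Funding: Scientific research project of the Jiangxi Provincial Department of Education; state-sponsored graduate student program for building high-level universities; Natural Science Foundation of Jiangxi Province; National Natural Science Foundation of China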
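Abstract: Against the background of information overload, mining and organizing core concepts and their semantic connections from multiple documents that share a common topic has become an important challenge in open information extraction. To address this, the paper proposes a model for constructing multi-document concept graphs based on open-domain extraction. First, topic words are mined for a predefined topic and the documents are ranked using an improved TF-IDF algorithm. Then a series of methods, including coreference resolution, discourse weight computation, and open-domain extraction, is used to extract a large number of fact-bearing triple instances from the documents. To remove the noise inherent in open-domain methods and to improve extraction accuracy, a fact filtering algorithm is proposed. It extracts a set of salient facts that have high confidence and good semantic compatibility, and these facts form multiple concept subgraphs. Finally, equivalent concepts and relations in different subgraphs are merged to form a single connected concept graph that expresses the topic. Experiments on the Signal Media news dataset show that the proposed model can mine key information related to a specific topic across documents and organize it effectively, and that the resulting concept graph achieves good results on metrics such as topic concept coverage and the compatibility of facts. In addition, the model is a useful reference for automatic document summarization.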

Keywords: open-domain extraction; multiple documents; concept graph construction
Received: 2018/5/23
Revised: 2019/11/27

Multi-document conceptual graph construction research based on open domain extraction
Sheng Yongpan, Fu Xuefeng and Wu Tianxing. Multi-document conceptual graph construction research based on open domain extraction[J]. Application Research of Computers, 2020, 37(1): 19-25.
Authors: Sheng Yongpan, Fu Xuefeng and Wu Tianxing
Affiliation: University of Electronic Science and Technology of China; Nanchang Institute of Technology; Southeast University
Abstract: Against the background of information overload, it is challenging in open information extraction to mine and organize meaningful concepts and their semantic connections from a set of related documents on the same topic. This paper therefore proposes a multi-document conceptual graph model based on open-domain information extraction. First, documents are ranked according to the improved TF-IDF weight of topic words extracted for the predefined topics. The model then relies on a series of methods, including coreference resolution, weight computation, and open-domain information extraction, to extract numerous representative subject-predicate-object triples from multiple documents. To filter out the noise inherent in open-domain information extraction and to improve extraction accuracy, the paper presents a fact filtering algorithm that retains only the most salient and compatible facts, which form multiple conceptual subgraphs. Finally, equivalent concepts and relationships across different subgraphs are merged to produce a connected conceptual graph that expresses the topic. Experiments on the Signal Media dataset show that the proposed model can identify and effectively group the key information corresponding to specific topics within and across documents, and that the resulting conceptual graph outperforms state-of-the-art algorithms in terms of topic concept coverage and fact compatibility. The model is also of value for automatic document summarization.
Keywords:open-domain extraction  multiple documents  conceptual graph construction
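
The pipeline summarized in the abstract (rank documents by topic words, filter extracted triples, merge equivalent concepts into one graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses plain smoothed TF-IDF instead of the paper's improved variant, a simple confidence threshold as a stand-in for the fact filtering algorithm, and a hand-written alias table in place of equivalence detection. The function names, the `min_confidence` parameter, and the `aliases` mapping are all hypothetical, and the sketch does not check that the merged graph is connected.

```python
import math
from collections import Counter, defaultdict


def rank_documents(docs, topic_words):
    """Rank document indices by the summed TF-IDF weight of the topic words.

    Assumption: plain smoothed TF-IDF, not the paper's improved variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each topic word occurs.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in topic_words}
    scores = []
    for idx, toks in enumerate(tokenized):
        counts = Counter(toks)
        score = 0.0
        for w in topic_words:
            if df[w] == 0 or not toks:
                continue
            tf = counts[w] / len(toks)
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
            score += tf * idf
        scores.append((score, idx))
    # Highest score first; ties keep the original document order.
    return [idx for _, idx in sorted(scores, key=lambda p: (-p[0], p[1]))]


def filter_facts(triples, min_confidence):
    """Keep (subject, predicate, object) for triples above a threshold.

    Assumption: each input triple carries an extractor confidence score;
    the paper's filter also considers semantic compatibility.
    """
    return [(s, p, o) for s, p, o, conf in triples if conf >= min_confidence]


def merge_into_graph(triples, aliases):
    """Map equivalent concept labels to one canonical label and build
    an adjacency structure: concept -> set of (relation, concept)."""
    graph = defaultdict(set)
    for s, p, o in triples:
        s = aliases.get(s, s)
        o = aliases.get(o, o)
        graph[s].add((p, o))
    return dict(graph)
```

For example, with facts such as `("central bank", "raises", "interest rates", 0.9)` and `("the bank", "announced", "policy", 0.8)`, and the alias `{"the bank": "central bank"}`, both facts end up attached to the single concept node `"central bank"`.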
This article is indexed in Wanfang Data and other databases.