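基于事件抽取的网络新闻多文档自动摘要 (Web News Multi-document Summarization Based on Event Extraction)
Cite this article: 韩永峰, 许旭阳, 李弼程, 朱武斌, 陈刚. 基于事件抽取的网络新闻多文档自动摘要[J]. 中文信息学报, 2012, 26(1): 58-67
Authors: 韩永峰 (HAN Yongfeng), 许旭阳 (XU Xuyang), 李弼程 (LI Bicheng), 朱武斌 (ZHU Wubin), 陈刚 (CHEN Gang)
Affiliation: Institute of Information Engineering, PLA Information Engineering University, Zhengzhou, Henan 450002, China
Funding: Major Project of the National Social Science Foundation of China; National "863" Program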
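Abstract (translated from the Chinese): Representative automatic summarization methods currently cluster text segments, which avoids the information redundancy of more traditional methods. However, some text segments in web news are unrelated to the topic; they degrade the clustering and leave the final summary less concise. To address this, the paper introduces event extraction and proposes an event-extraction-based method for multi-document summarization of web news. The method first uses a binary classifier to separate events from non-events in the text. It then uses clustering to convert the documents' original physical division by paragraph or sentence into a logical, content-based division by event. Finally, it generates the summary by extracting, ordering, and polishing the key events. Experimental results show that the method is effective and significantly improves the quality of the generated summaries.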

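Keywords (translated from the Chinese): event extraction; Chinese information processing; classification; news documents; clustering; automatic summarization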

Web News Multi-document Summarization Based on Event Extraction
HAN Yongfeng, XU Xuyang, LI Bicheng, ZHU Wubin, CHEN Gang. Web News Multi-document Summarization Based on Event Extraction[J]. Journal of Chinese Information Processing, 2012, 26(1): 58-67
Authors: HAN Yongfeng, XU Xuyang, LI Bicheng, ZHU Wubin, CHEN Gang
Affiliation: Institute of Information Engineering, PLA Information Engineering University, Zhengzhou, Henan 450002, China
Abstract: Representative automatic summarization methods cluster text segments, which avoids the redundancy of more traditional approaches. However, some text segments in web news are irrelevant to the topic, which degrades the clustering and makes the resulting summary less concise. This paper introduces event extraction and proposes an event-extraction-based method for multi-document summarization of web news. First, a binary classifier distinguishes events from non-events in the news text. Then, clustering transforms the documents' original physical division by paragraph or sentence into a logical, content-based division by event. Finally, the summary is generated by extracting, ordering, and polishing the major events. Experimental results show that the proposed method is effective and significantly improves summary quality.
Keywords: event extraction; Chinese information processing; classification; news text; clustering; automatic summarization
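
The three stages named in the abstract (event/non-event classification, clustering into events, then extraction and ordering of the major events) can be illustrated with a toy sketch. This is not the authors' implementation: the trigger-word rule stands in for the trained binary classifier, greedy Jaccard clustering stands in for whatever clustering algorithm the paper uses, ranking by cluster size is an assumed ordering criterion, and the polishing step is omitted. All function names, the `TRIGGERS` set, and the `threshold` value are hypothetical.

```python
# Illustrative sketch only; every name, rule, and threshold here is an assumption.
import re

# Hypothetical trigger words standing in for a trained event/non-event classifier.
TRIGGERS = {"struck", "killed", "rescued", "announced", "evacuated"}

def tokens(sentence):
    """Return the set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def is_event(sentence):
    """Stand-in binary classifier: an event sentence contains a trigger word."""
    return bool(tokens(sentence) & TRIGGERS)

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_events(sentences, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for s in sentences:
        for c in clusters:
            if jaccard(tokens(s), tokens(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def summarize(documents, max_events=2):
    """Filter event sentences, cluster them, and keep one sentence per large cluster."""
    events = [s for doc in documents for s in doc if is_event(s)]
    clusters = cluster_events(events)
    # Assumption: an event reported by more sentences is more central.
    clusters.sort(key=len, reverse=True)
    return [c[0] for c in clusters[:max_events]]

docs = [
    ["An earthquake struck the coastal city on Monday.",
     "Readers can subscribe to our newsletter.",
     "Officials announced a relief fund for residents."],
    ["A strong earthquake struck the coastal city early Monday.",
     "Click here for more photos."],
]
summary = summarize(docs)
```

In this toy input the two earthquake sentences fall into one cluster, the off-topic sentences are dropped before clustering, and the summary keeps one sentence per event.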
This article is indexed in CNKI, Wanfang Data, and other databases.