
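Automatic summarization method based on thematic term set
Cite this article: LIU Xing-lin, ZHENG Qi-lun, MA Qian-li. Automatic summarization method based on thematic term set[J]. Application Research of Computers, 2011, 28(4): 1322-1324.
Authors: LIU Xing-lin  ZHENG Qi-lun  MA Qian-li
Affiliations: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640; School of Computer Science, Wuyi University, Jiangmen, Guangdong 529020
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640
Funding: Provincial Natural Science Foundation project; Fundamental Research Funds for the Central Universities, South China University of Technology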
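Abstract: A text summarization method based on a thematic term set is proposed for automatically extracting summaries from documents. Given the extracted thematic term set, the method weights each sentence containing a thematic term by that term's weight and sums these contributions to obtain the total weight of each sentence with respect to the thematic term set. It then selects the highest-weighted sentences according to the summary ratio and outputs them in their original document order. Experiments were conducted on the single-document automatic summarization corpus of the HIT Information Retrieval Lab, using intrinsic automatic evalu...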
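The pipeline described in the abstract (weight sentences by thematic terms, keep the top fraction, restore document order) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `summarize`, the substring-based term matching, the once-per-sentence counting of each term, and the rounding rule for the summary ratio are all assumptions, and the thematic term weights are taken as given because the abstract does not say how they are extracted.

```python
import math


def summarize(sentences, term_weights, ratio=0.3):
    """Return the highest-weighted sentences, in original document order.

    sentences    : list of sentence strings, in document order
    term_weights : dict mapping thematic term -> weight (assumed precomputed)
    ratio        : fraction of sentences to keep (the summary ratio)
    """
    if not sentences:
        return []

    # Assumption: a sentence's total weight is the sum of the weights of the
    # thematic terms it contains, each term counted once per sentence.
    scores = [
        sum(weight for term, weight in term_weights.items() if term in sentence)
        for sentence in sentences
    ]

    # Assumption: round the sentence budget up and keep at least one sentence.
    k = max(1, math.ceil(ratio * len(sentences)))

    # Rank by weight (descending); ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    # Restore original order before output.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Substring matching is used here because it works on unsegmented Chinese text; a real system would more likely match against the output of a word segmenter.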

Keywords: automatic summarization  thematic term set  sentence weight  natural language processing
Received: 2010-10-01
Revised: 2011-03-12

Automatic summarization method based on thematic term set
LIU Xing-lin, ZHENG Qi-lun, MA Qian-li. Automatic summarization method based on thematic term set[J]. Application Research of Computers, 2011, 28(4): 1322-1324.
Authors:LIU Xing-lin  ZHENG Qi-lun  MA Qian-li
Abstract: An automatic summarization method based on a thematic term set is proposed for automatically extracting summaries from Chinese documents. Given the extracted thematic term set, the method computes sentence weights from the weights of the thematic terms each sentence contains, sums them to obtain the total weight of each sentence, selects the highest-weighted sentences according to a summary percentage, and finally outputs the selected sentences in their original order. Experiments were conducted on the HIT IR-lab Text Summarization Corpus, and intrinsic automatic evaluation measures were used to assess the method. The proposed method achieved an F-measure of 66.07%, which suggests that it generates high-quality summaries close to the reference abstracts.
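For readers unfamiliar with the reported metric, the sketch below shows one common way an F-measure is computed for extractive summaries. It assumes that evaluation compares the indices of the selected sentences with those of a reference extract and that the balanced F1 variant is used; the abstract does not specify either point, and the function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(selected, reference):
    """Sentence-level precision, recall and balanced F-measure.

    selected  : iterable of sentence indices chosen by the system
    reference : iterable of sentence indices in the reference summary
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)

    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0

    # Guard against division by zero when nothing overlaps.
    if precision + recall == 0:
        return precision, recall, 0.0

    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With this definition, a system that selects sentences {0, 2, 5} against a reference of {0, 2, 3, 4} has precision 2/3, recall 1/2, and an F-measure of 4/7.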
Keywords:automatic summarization  thematic term set  sentence weight  NLP
This article is indexed in Wanfang Data and other databases.