
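Improved TextRank-based Method for Automatic Summarization
Cite this article: YU Shan-shan, SU Jin-dian, LI Peng-fei. Improved TextRank-based Method for Automatic Summarization[J]. Computer Science, 2016, 43(6): 240-247.
Authors: YU Shan-shan  SU Jin-dian  LI Peng-fei
Affiliations: College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006; College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640; College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640
Funding: This work was supported by the Natural Science Foundation of Guangdong Province (2015A030310318), the Medical Science and Technology Research Foundation of Guangdong Province (A2015065), and the National Natural Science Foundation of China.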
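Abstract: When extracting summaries from documents automatically, the classical TextRank algorithm usually considers only the similarity between sentence nodes and ignores the discourse structure of the document and the context of each sentence. To address these problems, and drawing on the structural characteristics of Chinese text, an improved algorithm called iTextRank is proposed. It introduces titles, paragraphs, special sentences, and sentence position and length into the construction of the TextRank network graph, gives an improved sentence-similarity computation together with weight adjustment factors, and applies them to automatic summary extraction for Chinese text. The time complexity of the algorithm is also analyzed. Finally, experiments show that iTextRank has higher accuracy and lower recall than the classical TextRank method.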

Keywords: Chinese text  Automatic summary extraction  TextRank  Discourse structure  Unsupervised learning method
Received: 2016/1/20
Revised: 2016/3/20

Improved TextRank-based Method for Automatic Summarization
YU Shan-shan, SU Jin-dian and LI Peng-fei. Improved TextRank-based Method for Automatic Summarization[J]. Computer Science, 2016, 43(6): 240-247.
Authors: YU Shan-shan  SU Jin-dian  LI Peng-fei
Affiliation: College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China; College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
Abstract: The canonical TextRank algorithm usually considers only the similarity between sentences during automatic summarization and neglects text structure and sentence context. To overcome these disadvantages, we propose an improved method based on TextRank, called iTextRank, that incorporates the structural information of Chinese texts. iTextRank takes important contextual and semantic information into account, including titles, paragraphs, special sentences, and the positions and lengths of sentences, when building the TextRank network graph, computing sentence similarities, and adjusting node weights. We also apply iTextRank to the automatic summarization of Chinese texts and analyze its time complexity. Finally, experimental results show that iTextRank has a higher accuracy rate and a lower recall rate than canonical TextRank.
Keywords: Chinese texts  Automatic summarization  TextRank  Discourse structure  Unsupervised learning methods
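
The abstract describes the approach only in outline, so the following is a minimal sketch and not the authors' iTextRank. It implements canonical TextRank sentence ranking (word-overlap similarity, damped iterative scoring) and then scales each incoming edge by a per-sentence adjustment factor. The factor itself, the parameters alpha and beta, and the function names tokenize, similarity, rank_sentences and summarize are all assumptions made for illustration; the paper's actual similarity formula and weight adjustment factors are not given on this page.

```python
import math
import re


def tokenize(sentence):
    # Lowercased word tokens. Assumption: whitespace-delimited text;
    # Chinese input would need a word segmenter instead.
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    # Word-overlap similarity in the style of canonical TextRank,
    # computed here on sets of distinct words.
    wa, wb = set(a), set(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    return len(wa & wb) / denom if denom > 0 else 0.0


def rank_sentences(sentences, title="", d=0.85, iterations=50,
                   alpha=0.5, beta=0.2):
    tokens = [tokenize(s) for s in sentences]
    title_words = set(tokenize(title))
    n = len(sentences)

    # Hypothetical adjustment factor (NOT the paper's formula):
    # boost sentences that share words with the title, and the lead sentence.
    factor = []
    for i, t in enumerate(tokens):
        overlap = len(set(t) & title_words) / len(title_words) if title_words else 0.0
        lead = 1.0 if i == 0 else 0.0
        factor.append(1.0 + alpha * overlap + beta * lead)

    # Edge weight from sentence i into sentence j: similarity scaled by j's factor.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = similarity(tokens[i], tokens[j]) * factor[j]

    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank-style update used by TextRank.
        scores = [
            (1 - d) + d * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, title="", k=2):
    # Pick the k highest-scoring sentences and return them in document order.
    scores = rank_sentences(sentences, title)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(sentences, title=..., k=2) returns the two top-ranked sentences in their original order. Scaling incoming edges is only one possible way to inject structural information into the graph; multiplying the final scores by the factor would be another.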