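Research on automatic text summarization combining topic feature (融合主题特征的文本自动摘要方法研究)
Cite this article: Luo Fang, Wang Jinghang, He Daosen, Pu Qiumei. Research on automatic text summarization combining topic feature[J]. Application Research of Computers (计算机应用研究), 2021, 38(1): 129-133.
Authors: Luo Fang (罗芳), Wang Jinghang (汪竞航), He Daosen (何道森), Pu Qiumei (蒲秋梅)
Affiliations: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063; Department of Supply Chain and Information Management, Hang Seng University of Hong Kong, Hong Kong 999077; School of Information Engineering, Minzu University of China, Beijing 100081
Funding: Independent Innovation Research Fund of Wuhan University of Technology; Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China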
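Abstract: Traditional graph-model methods for text summarization consider only statistical features or shallow semantic features. They neither mine nor exploit deep topic-level semantic features. To address this, this paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that fuses topic features and scores sentences along multiple dimensions. First, the LDA topic model mines the topic semantics of the text, and a topic importance measure is defined to quantify how topic features affect the importance of a sentence. Next, topic features, statistical features and inter-sentence similarity are combined to improve how the probability transition matrix over the graph model's nodes is constructed. Finally, the summary is extracted and evaluated according to the weights of the sentence nodes. Experimental results show that MDSR achieves its best ROUGE scores when the weights of topic features, statistical features and inter-sentence similarity are in the ratio 3:4:3. At that ratio, ROUGE-1, ROUGE-2 and ROUGE-SU4 reach 53.35%, 35.18% and 33.86%, respectively, outperforming the comparison methods. This indicates that incorporating topic features effectively improves the accuracy of summary extraction.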

Keywords: TextRank; text summarization; semantic features; topic model; probability transition matrix
Received: 2019-09-30
Revised: 2020-12-12
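The three steps above (topic importance, a combined transition matrix, then ranking sentence nodes) can be pictured with a minimal Python sketch. The abstract does not give MDSR's formulas, so every rule here is an assumption for illustration only. A sentence's topic score is the dot product of its LDA topic distribution with the document's, which is a hypothetical stand-in for the paper's topic importance. The statistical score is a placeholder number per sentence. The transition weight from sentence i to sentence j mixes sentence j's topic and statistical scores with the i-j cosine similarity, using the 3:4:3 ratio reported above. The function names `topic_scores`, `transition_matrix` and `rank_sentences` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_scores(doc_topics, sent_topics):
    """Hypothetical topic importance: overlap (dot product) between each
    sentence's topic distribution and the document's topic distribution."""
    return [sum(d * s for d, s in zip(doc_topics, st)) for st in sent_topics]

def transition_matrix(topic, stat, vectors, weights=(0.3, 0.4, 0.3)):
    """Row-stochastic matrix over sentence nodes. Entry [i][j] mixes the
    target sentence's topic and statistical scores with the i-j cosine
    similarity, weighted (topic, statistic, similarity). This mixing rule
    is an assumption, not the paper's published formula."""
    w_topic, w_stat, w_sim = weights
    n = len(vectors)
    matrix = []
    for i in range(n):
        row = [0.0 if i == j else
               w_topic * topic[j] + w_stat * stat[j]
               + w_sim * cosine(vectors[i], vectors[j])
               for j in range(n)]
        total = sum(row)
        # A row with no outgoing weight jumps uniformly instead.
        matrix.append([x / total for x in row] if total else [1.0 / n] * n)
    return matrix

def rank_sentences(matrix, d=0.85, max_iter=100, tol=1e-10):
    """TextRank-style power iteration:
    s_j = (1 - d) / n + d * sum_i s_i * matrix[i][j]."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - d) / n + d * sum(scores[i] * matrix[i][j] for i in range(n))
               for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

# Toy input: 3 sentences, 2 LDA topics, 4-term vocabulary (all values made up).
doc_topics = [0.7, 0.3]
sent_topics = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
stat = [0.5, 0.2, 0.3]  # placeholder statistical scores, e.g. position-based
vectors = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]

matrix = transition_matrix(topic_scores(doc_topics, sent_topics), stat, vectors)
scores = rank_sentences(matrix)
order = sorted(range(len(scores)), key=lambda k: -scores[k])
print(order)  # → [2, 0, 1]; a summary would take the top-k sentences
```

Setting the weights to (0, 0, 1) reduces this sketch to a similarity-only TextRank graph. The topic and statistical terms are what let sentence importance depend on more than lexical overlap.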

Research on automatic text summarization combining topic feature
Luo Fang, Wang Jinghang, He Daosen, Pu Qiumei. Research on automatic text summarization combining topic feature[J]. Application Research of Computers, 2021, 38(1): 129-133.
Authors: Luo Fang, Wang Jinghang, He Daosen and Pu Qiumei
Affiliation: (School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China; Dept. of Supply Chain & Information Management, Hang Seng University of Hong Kong, Hong Kong 999077, China; School of Information Engineering, Minzu University of China, Beijing 100081, China)
Abstract: Traditional graph models for text summarization focus only on statistical or shallow semantic features and do not mine or exploit deep topic-level semantic features. To address this, this paper proposed MDSR (multi-dimension summarization rank), an automatic text summarization method that combines topic features. The method used the LDA model to mine the topic semantics of the text and measured the impact of topic features on a sentence by defining a topic importance score. It then improved the construction of the probability transition matrix over the graph model's nodes by combining topic features with statistical features and inter-sentence similarity. Finally, it extracted and evaluated the summary according to the weights of the sentence nodes. The results show that MDSR achieves its best ROUGE scores when the weight ratio of topic features, statistical features and inter-sentence similarity is 3:4:3. At that ratio, ROUGE-1, ROUGE-2 and ROUGE-SU4 are 53.35%, 35.18% and 33.86%, respectively, outperforming the comparison methods. This shows that incorporating topic features effectively improves the accuracy of summary extraction.
Keywords: TextRank; text summarization; semantic features; LDA; probability transition matrix
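The ROUGE-1 and ROUGE-2 figures above are n-gram overlap scores between a system summary and a reference summary. As a rough illustration, here is a minimal ROUGE-N recall computation for one reference over pre-tokenized text. For Chinese, the tokens are assumed to come from word segmentation or character splitting. The sketch ignores stemming, stop-word handling, multiple references and the F-measure option of the official ROUGE toolkit. It also omits ROUGE-SU4, which counts skip-bigrams with a gap of up to 4 plus unigrams. The abstract does not say whether its percentages are recall, precision or F-scores, so this is not a reproduction of the paper's evaluation. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped n-gram matches / number of reference n-grams."""
    ref = ngrams(reference, n)
    cand = ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter & keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((ref & cand).values())
    return overlap / total

# Toy example with whitespace tokens (made-up sentences).
reference = "the topic model improves the summary".split()
candidate = "the topic model improves summary quality".split()
r1 = rouge_n_recall(candidate, reference, 1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, 2)  # 3 of 5 reference bigrams matched
print(round(r1, 4), round(r2, 4))
```

Clipping matters: "the" appears twice in the reference but once in the candidate, so it contributes only one match to ROUGE-1.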
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).