
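A Text Summarization Model Combining Topic Information Clustering and Encoding
Cite this article: WEI Yuan-yuan, NI Jian-cheng, GAO Feng, WU Jun-qing. A Text Abstract Summarization Model Combined with Theme Information Clustering Coding[J]. Computer Technology and Development, 2021, 0(1)
Authors: WEI Yuan-yuan  NI Jian-cheng  GAO Feng  WU Jun-qing
Affiliation: School of Software, Qufu Normal University
Funding: National Natural Science Foundation of China, Young Scientists Fund (61601261); Shandong Province Graduate Education Quality Improvement Program (SDYY17136)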
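Abstract: Sequence-to-sequence models with an attention mechanism are widely used in abstractive text summarization, but summaries generated this way still suffer from insufficient information encoding and tend to drift away from the topic of the source text. To address this, we propose TICTS (theme information clustering coding text summarization), a summary generation model that incorporates clustered topic information into the encoding. The model combines traditional extractive summarization with deep-learning-based abstractive summarization: a clustering algorithm over word vectors extracts the topic information, and cosine similarity measures the topical relevance between the input text and the extracted key information. This relevance is used as the weight of the topic encoding to correct the attention mechanism, so that the sequence-to-sequence model generates the summary from both the topic information and the attention mechanism. The model is evaluated on the LCSTS dataset with ROUGE as the metric. Compared with the baseline model, it improves the ROUGE-1 score by 1.1, the ROUGE-2 score by 1.3, and the ROUGE-L score by 1.1. The experiments show that summaries generated with topic information cluster encoding stay closer to the topic and are of higher quality.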

Keywords: sequence-to-sequence model  abstractive text summarization  word vector clustering  topic encoding  cosine similarity
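
The topic-extraction step named in the abstract (clustering word vectors, then scoring topical relevance with cosine similarity) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: plain k-means as the clustering algorithm, centroids initialized from the first k vectors, the word nearest each centroid chosen as the topic keyword, mean vectors used to represent the text and the topic, and the hypothetical function names `kmeans`, `topic_words`, and `topic_relevance`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k vectors (illustrative choice)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to the nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        # Recompute each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return centroids, assignment


def topic_words(words, vectors, k):
    """Per cluster, pick the word whose vector is most similar to the centroid."""
    centroids, assignment = kmeans(vectors, k)
    picked = []
    for j, c in enumerate(centroids):
        members = [(cosine_similarity(v, c), w)
                   for w, v, a in zip(words, vectors, assignment) if a == j]
        if members:
            picked.append(max(members)[1])
    return picked


def topic_relevance(text_vectors, topic_vectors):
    """Cosine similarity between the mean text vector and the mean topic vector."""
    return cosine_similarity(mean_vector(text_vectors), mean_vector(topic_vectors))
```

Calling `topic_words(words, vectors, k)` on a tokenized input with pretrained word vectors would yield one keyword per cluster, and `topic_relevance` would then give a single score that a model could use as a topic weight. The paper may select keywords and compute relevance differently.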

A Text Abstract Summarization Model Combined with Theme Information Clustering Coding
WEI Yuan-yuan, NI Jian-cheng, GAO Feng, WU Jun-qing. A Text Abstract Summarization Model Combined with Theme Information Clustering Coding[J]. Computer Technology and Development, 2021, 0(1)
Authors: WEI Yuan-yuan  NI Jian-cheng  GAO Feng  WU Jun-qing
Affiliation: (School of Software, Qufu Normal University, Jining 272000, China)
Abstract: The sequence-to-sequence model combined with the attention mechanism has been widely used in research on abstractive text summarization, but summary generation based on this model still suffers from insufficient information encoding and from generated summaries that deviate from the topic. We therefore present TICTS (theme information clustering coding text summarization), a summarization model based on cluster encoding of topic information. The traditional extractive summarization method is combined with the deep-learning-based abstractive summarization method, and the topic information is extracted with a clustering algorithm over word vectors. The topical relevance between the input text and the extracted key information is calculated by cosine similarity and used as the weight of the topic encoding to modify the attention mechanism, so that the summary is generated by combining the topic information and the attention mechanism on top of the sequence-to-sequence model. The model is tested on the LCSTS dataset. With ROUGE as the evaluation standard, the results improve on the baseline model by 1.1, 1.3 and 1.1 in ROUGE-1, ROUGE-2 and ROUGE-L, respectively. The experiments show that summaries generated by the model with topic information cluster encoding are more relevant to the topic and that summary quality is improved.
Keywords: sequence-to-sequence model  abstractive text summarization  word vector clustering  theme coding  cosine similarity
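
The abstract states that the topic relevance weight is used to modify the attention mechanism but does not give the formula. The sketch below shows one possible form, offered purely as an assumption: softmax attention weights are multiplied by a per-position topic relevance score and renormalized. The function names `topic_weighted_attention` and `context_vector`, and the choice of a multiplicative, per-position correction, are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_weighted_attention(scores, relevance):
    """Scale softmax attention by topic relevance, then renormalize to sum to 1.

    `relevance` holds one non-negative score per encoder position
    (an assumed input, e.g. a cosine similarity to the topic vector).
    """
    base = softmax(scores)
    scaled = [a * r for a, r in zip(base, relevance)]
    total = sum(scaled)
    if total == 0.0:
        # All relevance scores are zero: fall back to ordinary attention.
        return base
    return [s / total for s in scaled]


def context_vector(weights, encoder_states):
    """Weighted sum of encoder hidden states under the given attention weights."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states))
            for d in range(dim)]
```

With equal raw scores, positions with higher topic relevance receive proportionally more attention, which is the intuition behind keeping the generated summary close to the topic.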
This article is indexed in VIP and other databases.