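基于双编码器的短文本自动摘要方法 (Short text automatic summarization method based on dual encoder)
Cite this article: 丁建立,李洋,王家亮. 基于双编码器的短文本自动摘要方法[J]. 计算机应用, 2019, 39(12): 3476-3481. DOI: 10.11772/j.issn.1001-9081.2019050800
Authors: 丁建立 (DING Jianli)  李洋 (LI Yang)  王家亮 (WANG Jialiang)
Affiliation (all three authors): College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China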
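Funding: Major Science and Technology Special Project of the Civil Aviation Administration of China (MHRD20150107, MHRD20160109); Fundamental Research Funds for the Central Universities (3122018C025); Scientific Research Start-up Fund of Civil Aviation University of China (2014QD13X).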
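Abstract: To address the insufficient use of semantic information and the limited summary accuracy of current abstractive text summarization methods, a text summarization method based on a dual encoder is proposed. First, the dual encoder supplies richer semantic information to the sequence-to-sequence (Seq2Seq) architecture, and both the attention mechanism that incorporates dual-channel semantics and the decoder with empirical distribution are optimized. Then, position embeddings and word embeddings are fused in the word embedding generation step, and term frequency-inverse document frequency (TF-IDF), part of speech (POS), and key score (Soc) features are added to optimize the word embedding dimension. The method thus optimizes both the traditional Seq2Seq model and the word feature representation, strengthening the model's semantic understanding while improving summary quality. Experimental results show that, under the Rouge evaluation system, the method scores 10 to 13 percentage points higher than the traditional recurrent neural network with self-attention (RNN+atten) and the multi-layer bidirectional recurrent neural network with self-attention (Bi-MulRNN+atten). Its summaries reflect more accurate semantic understanding and better generation quality, which gives the method better application prospects.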

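Keywords: abstractive text summarization  sequence to sequence (Seq2Seq)  dual encoder  empirical distribution  word feature representation
Received: 2019-05-13
Revised: 2019-07-16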

Short text automatic summarization method based on dual encoder
DING Jianli,LI Yang,WANG Jialiang. Short text automatic summarization method based on dual encoder[J]. Journal of Computer Applications, 2019, 39(12): 3476-3481. DOI: 10.11772/j.issn.1001-9081.2019050800
Authors: DING Jianli  LI Yang  WANG Jialiang
Affiliation: College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Abstract: To address the insufficient use of semantic information and the poor summarization precision of current abstractive text summarization methods, a text summarization method based on a dual encoder was proposed. Firstly, the dual encoder was used to provide richer semantic information to the Sequence to Sequence (Seq2Seq) architecture, and the attention mechanism with dual-channel semantics and the decoder with empirical distribution were optimized. Then, position embedding and word embedding were merged in the word embedding step, and Term Frequency-Inverse Document Frequency (TF-IDF), Part Of Speech (POS), and key Score (Soc) features were added to the word embedding, so that the word embedding dimension was optimized. The proposed method optimizes the traditional Seq2Seq model and the word feature representation, enhancing the model's semantic understanding and improving the quality of the summaries. Experimental results show that the proposed method improves performance in the Rouge evaluation system by 10 to 13 percentage points compared with the traditional Recurrent Neural Network method with attention (RNN+atten) and the Multi-layer Bidirectional Recurrent Neural Network method with attention (Bi-MulRNN+atten). The proposed method therefore understands the semantics of the text more accurately, generates better summaries, and has better application prospects.
Keywords: abstractive text summarization  Sequence to Sequence (Seq2Seq)  dual encoder  empirical distribution  word feature representation
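
The word feature representation described in the abstract (word embedding merged with position embedding, plus TF-IDF and POS features) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices here are assumptions made for illustration: merging by element-wise sum, the sinusoidal position embedding, the one-hot POS encoding, and every function name and toy value. The key score (Soc) is left out because the abstract does not define how it is computed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            tok: (cnt / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return scores


def position_embedding(pos, dim):
    """Sinusoidal position embedding (assumed choice, not from the paper)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def token_features(doc, word_vecs, pos_tags, tag_set, tfidf_scores):
    """Per token: (word embedding + position embedding), then the
    TF-IDF score, then a one-hot POS vector, all concatenated."""
    dim = len(next(iter(word_vecs.values())))
    rows = []
    for i, tok in enumerate(doc):
        pe = position_embedding(i, dim)
        # Merge by element-wise sum (assumption; the abstract only says "merged").
        fused = [w + p for w, p in zip(word_vecs[tok], pe)]
        one_hot = [1.0 if pos_tags[i] == t else 0.0 for t in tag_set]
        rows.append(fused + [tfidf_scores[tok]] + one_hot)
    return rows


# Toy data, invented for illustration only.
docs = [["flight", "delayed", "storm"], ["flight", "landed"]]
word_vecs = {
    "flight": [0.1, 0.2, 0.3, 0.4],
    "delayed": [0.5, 0.1, 0.0, 0.2],
    "storm": [0.3, 0.3, 0.1, 0.0],
    "landed": [0.2, 0.4, 0.1, 0.1],
}
scores = tf_idf(docs)
rows = token_features(
    docs[0], word_vecs, ["NOUN", "VERB", "NOUN"], ["NOUN", "VERB"], scores[0]
)
```

Each row in `rows` has the embedding dimension plus one TF-IDF value plus one entry per POS tag, which is one way the extra features could widen the vector passed to an encoder.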
This article is indexed in Wanfang Data (万方数据) and other databases.