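Abstract Generation Method of Deep Learning Based on Subword Units (基于子词单元的深度学习摘要生成方法)
Cite this article: Chen Xuewen. Abstract Generation Method of Deep Learning Based on Subword Units [J]. Computer Applications and Software (计算机应用与软件), 2020, 37(3): 202-208.
Author: Chen Xuewen (陈雪雯)
Affiliation: USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui
Abstract: Existing abstractive text summarization methods have several limitations: they struggle to produce a reliable representation of the source text, the summary sentences they generate have low semantic similarity to the source text, and they suffer from the out-of-vocabulary (OOV) problem. To address these issues, the paper proposes a hybrid neural network encoder that captures long-range dependencies and contextual information in the source text to obtain a high-quality text representation; a key-phrase-based reranking mechanism that uses key phrases extracted from the source text to rerank the candidate sequences produced by beam search, reducing their semantic distance from the source text; and subword unit extraction, which represents the text with finer-grained units. Experiments on summarization datasets of different lengths show good results in every case.

Keywords: Abstractive text summarization; Byte pair encoding; Beam search; Deep learning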
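
The subword step named in the abstract and keywords is byte pair encoding (BPE). The record does not give the paper's implementation, so the following is only a minimal sketch of the standard BPE merge-learning procedure, assuming a toy word-frequency dictionary as the corpus. The function names `learn_bpe`, `merge_pair`, and `segment`, and the end-of-word marker `</w>`, are illustrative choices and not taken from the paper.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from a {word: frequency} dict."""
    # Start from characters, with an end-of-word marker (an assumed convention).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word, possibly unseen in training, by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_corpus, num_merges=5)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

In this toy run the word "lowest" never appears in the corpus, yet it is split into two learned subword units, which illustrates how subword representations sidestep the OOV problem described in the abstract.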

ABSTRACT GENERATION METHOD OF DEEP LEARNING BASED ON SUBWORD UNITS
Chen Xuewen. ABSTRACT GENERATION METHOD OF DEEP LEARNING BASED ON SUBWORD UNITS [J]. Computer Applications and Software, 2020, 37(3): 202-208.
Authors: Chen Xuewen
Affiliation: USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Application (UBRI), School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
Abstract: Existing abstractive text summarization methods have several limitations, including difficulty in generating a reliable representation of the source text, low semantic similarity between the generated summary sentences and the source text, and the out-of-vocabulary (OOV) word problem. A hybrid neural network encoder structure is proposed to capture the long-distance dependencies and context information of the source text and obtain a high-quality text representation. A reranking mechanism based on key phrases extracted from the source text is proposed to reorder the candidate sequences generated by beam search, so as to reduce the semantic distance between them and the source text. Subword units are extracted from the text so that it is expressed with more fine-grained units. The method was tested on summarization datasets of different lengths and achieved good results on each.
Keywords: Abstractive text summarization; Byte pair encoding; Beam search; Deep learning
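
The abstract describes reranking beam search candidates with key phrases extracted from the source text, but this record does not state the scoring function. The sketch below is therefore an assumption for illustration only: it adds a key-phrase coverage bonus to each candidate's log-probability and sorts by the combined score. The function name `rerank_candidates`, the `weight` parameter, and the coverage formula are hypothetical and may differ from the paper's actual mechanism.

```python
def rerank_candidates(candidates, key_phrases, weight=1.0):
    """Rerank beam search outputs by model score plus key-phrase coverage.

    candidates:  list of (summary_text, log_prob) pairs from beam search.
    key_phrases: phrases extracted from the source text.
    weight:      assumed trade-off between model score and coverage.
    """

    def coverage(text):
        # Fraction of source key phrases that appear verbatim in the candidate.
        if not key_phrases:
            return 0.0
        hits = sum(1 for phrase in key_phrases if phrase in text)
        return hits / len(key_phrases)

    return sorted(
        candidates,
        key=lambda cand: cand[1] + weight * coverage(cand[0]),
        reverse=True,
    )


beam_output = [
    ("a new method is proposed", -1.0),
    ("subword units improve abstractive summarization", -1.2),
]
phrases = ["subword units", "summarization"]
best_text, _ = rerank_candidates(beam_output, phrases)[0]
print(best_text)  # subword units improve abstractive summarization
```

With `weight=0` the ordering falls back to the raw beam search scores, so the parameter controls how strongly closeness to the source text's key phrases can override the model's own ranking.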
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).