Automatic Text Summarization Technology Based on ALBERT-UniLM Model
Citation: SUN Baoshan, TAN Hao. Automatic Text Summarization Technology Based on ALBERT-UniLM Model[J]. Computer Engineering and Applications, 2022, 58(15): 184-190.
Authors: SUN Baoshan, TAN Hao
Affiliation: 1. School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; 2. Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems, Tiangong University, Tianjin 300387, China
Funding: National Natural Science Foundation of China (61972456, 61173032); Natural Science Foundation of Tianjin (20JCYBJC00140)
Abstract: To address the problems that abstractive summarization models understand the source text insufficiently and tend to generate repetitive text, an algorithm combining the dynamic word-vector model ALBERT with the unified pre-trained language model UniLM is proposed, yielding an ALBERT-UniLM summarization model. The model first replaces the traditional BERT baseline with the pre-trained dynamic word-vector model ALBERT for feature extraction to obtain word vectors. A UniLM language model augmented with a pointer network is then fine-tuned on the downstream generation task, and a coverage mechanism is incorporated to reduce repetition and produce the summary text. Experiments use ROUGE as the evaluation metric and are conducted on the single-document Chinese news summarization dataset of the 2018 CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC2018). Compared with the BERT baseline, the ALBERT-UniLM model improves Rouge-1, Rouge-2 and Rouge-L by 1.57%, 1.37% and 1.60%, respectively. The results show that the proposed ALBERT-UniLM model clearly outperforms the other baseline models on text summarization tasks and can effectively improve the quality of generated summaries.
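To make the feature-extraction step concrete, the minimal sketch below shows how contextual word vectors could be obtained from a Chinese ALBERT encoder via the HuggingFace transformers library. This is an illustration under assumed settings, not the authors' released code; the checkpoint name is a placeholder.

```python
# Minimal sketch of ALBERT-based feature extraction (illustrative only).
# Assumption: a Chinese ALBERT checkpoint is available on HuggingFace;
# the name below is a placeholder, not the checkpoint used in the paper.
import torch
from transformers import BertTokenizer, AlbertModel

CHECKPOINT = "voidful/albert_chinese_base"  # assumed/illustrative checkpoint

# Chinese ALBERT checkpoints typically reuse BERT's WordPiece tokenizer.
tokenizer = BertTokenizer.from_pretrained(CHECKPOINT)
model = AlbertModel.from_pretrained(CHECKPOINT)
model.eval()

text = "本文提出一种基于ALBERT-UniLM的生成式摘要模型。"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings with shape (batch, seq_len, hidden_size);
# these vectors would feed the downstream UniLM-style decoder.
token_vectors = outputs.last_hidden_state
print(token_vectors.shape)
```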
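The coverage mechanism mentioned in the abstract is commonly implemented by accumulating past attention distributions and penalizing decoding steps that re-attend to already-covered source positions. The PyTorch sketch below shows one such formulation; the tensor shapes and averaging scheme are assumptions, not the paper's exact loss.

```python
# Sketch of a coverage penalty that discourages repeated attention
# (one common formulation; not necessarily the paper's exact loss).
import torch

def coverage_loss(attentions: torch.Tensor) -> torch.Tensor:
    """attentions: (steps, batch, src_len) decoder-to-source attention
    distributions, one per decoding step."""
    coverage = torch.zeros_like(attentions[0])  # running sum of past attention
    total = attentions.new_zeros(())
    for attn in attentions:
        # Overlap between current attention and accumulated coverage is
        # penalized, discouraging the decoder from revisiting positions.
        total = total + torch.minimum(attn, coverage).sum(dim=-1).mean()
        coverage = coverage + attn
    return total / attentions.size(0)

# Toy usage: 5 decoding steps, batch of 2, source length 7.
attn = torch.softmax(torch.randn(5, 2, 7), dim=-1)
print(coverage_loss(attn))
```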
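Since ROUGE on Chinese text is usually computed over characters rather than whitespace-separated tokens, the self-contained helper below gives a simplified character-level ROUGE-N F1. It is a lightweight stand-in for the official evaluation toolkit, for illustration only.

```python
# Simplified character-level ROUGE-N F1 for Chinese summaries
# (a lightweight stand-in for the official ROUGE toolkit).
from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    def ngrams(s: str) -> Counter:
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "天津工业大学发布基于ALBERT-UniLM的摘要模型"
candidate = "天津工业大学提出ALBERT-UniLM摘要模型"
print(rouge_n_f1(candidate, reference, n=1))  # ROUGE-1 F1
print(rouge_n_f1(candidate, reference, n=2))  # ROUGE-2 F1
```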
Keywords: natural language processing; pre-trained language model; ALBERT model; UniLM model; abstractive summarization