
Truncated Gaussian Distance-based Self-attention Mechanism for Natural Language Inference
ZHANG Peng-fei, LI Guan-yu, JIA Cai-yan. Truncated Gaussian Distance-based Self-attention Mechanism for Natural Language Inference[J]. Computer Science, 2020, 47(4): 178-183
Authors: ZHANG Peng-fei  LI Guan-yu  JIA Cai-yan
Affiliation: (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China)
Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities
Abstract: In natural language inference tasks, attention mechanisms have attracted wide interest because they can effectively capture the importance of each word in its context and thereby improve task performance. Transformer, a non-recurrent deep network based solely on attention mechanisms, not only achieves state-of-the-art performance on machine translation with far fewer parameters and much less training time, but has also produced remarkable results in tasks such as natural language inference (Gaussian-Transformer) and word representation learning (BERT). Gaussian-Transformer is currently one of the best-performing methods for natural language inference. However, although encoding word positions in Transformer with a Gaussian prior distribution greatly increases the weight given to neighboring words, the Gaussian weight of non-neighboring words decays rapidly toward zero, so the influence of distant words that matter for the current word's representation vanishes as the distance grows. This paper therefore proposes a self-attention mechanism based on a truncated Gaussian distance distribution for natural language inference. The method not only highlights the importance of neighboring words, but also preserves the information of non-neighboring words that are important to the representation of the current word. Experimental results on the natural language inference benchmark datasets SNLI and MultiNLI confirm that the truncated Gaussian distance distribution enables the self-attention mechanism to extract the relative position information of words in a sentence more effectively.
Keywords: Natural language inference  Self-attention mechanism  Distance mask  Truncated Gaussian distance mask
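The record gives only this high-level description of the mechanism. As a rough illustration, the following minimal NumPy sketch shows how a distance-based Gaussian prior can bias dot-product self-attention, and how clipping that prior at a floor keeps non-neighboring words from being weighted down to zero. The function names and the hyperparameters `sigma` and `w_min` are assumptions made for illustration, not the paper's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_distance_prior(n, sigma=1.0, w_min=None):
    """Pairwise position weights w[i, j] = exp(-(i - j)^2 / (2 * sigma^2)).

    Without a floor, weights for distant pairs decay toward 0; with a
    floor w_min they are clipped from below, so distant words keep a
    small non-zero weight (one reading of the 'truncated' Gaussian).
    """
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    w = np.exp(-dist2 / (2.0 * sigma ** 2))
    if w_min is not None:
        w = np.maximum(w, w_min)  # truncate the decay at the floor
    return w

def self_attention(X, sigma=1.0, w_min=None):
    """Single-head self-attention with a distance prior on the scores."""
    n, d = X.shape
    scores = (X @ X.T) / np.sqrt(d)                # scaled dot-product scores
    prior = gaussian_distance_prior(n, sigma, w_min)
    attn = softmax(scores + np.log(prior + 1e-9))  # prior enters as a log-bias
    return attn @ X

X = np.random.randn(6, 8)                 # 6 tokens, 8-dim embeddings
plain = self_attention(X, sigma=1.0)      # plain Gaussian prior
truncated = self_attention(X, sigma=1.0, w_min=0.05)  # floored variant
```

With `w_min=None` the log-bias falls off quadratically with distance, reproducing the vanishing influence of distant words that the abstract criticizes; with a floor such as `w_min=0.05`, every position retains at least a constant minimum bias, which sketches how the truncated variant preserves important non-neighboring words.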
This article is indexed in the Weipu (VIP) and Wanfang Data databases.