Residual Based Gated Recurrent Unit
Cite this article: 张忠豪, 董方敏, 胡枫, 吴义熔, 孙水发. Residual based gated recurrent unit [J]. 自动化学报 (Acta Automatica Sinica), 2022, 48(12): 3067-3074.
Authors: 张忠豪, 董方敏, 胡枫, 吴义熔, 孙水发
Affiliation: 1. College of Computer and Information Technology, China Three Gorges University, Yichang 443002
Funding: Supported by the National Natural Science Foundation of China (U1703261, 61871258) and the National Key Research and Development Program of China (2016YFB0800403)
Abstract: Traditional recurrent neural networks are prone to vanishing gradients and network degradation. Exploiting the fact that non-saturating activation functions effectively counter vanishing gradients, and borrowing from convolutional neural networks the residual structure that effectively alleviates network degradation, this paper proposes a residual-based gated recurrent unit (Re-GRU), built on the gated recurrent unit (GRU), to mitigate both problems. Re-GRU improves on GRU in two ways: 1) the activation function of the candidate hidden state is replaced with a non-saturating activation function; 2) residual information is introduced into the candidate hidden state representation. Changing the candidate-state activation not only avoids the vanishing gradient caused by saturating activation functions, but also allows the residual information to be introduced more effectively, making the network more sensitive to gradient changes and thereby alleviating network degradation. Three kinds of experiments were conducted: image recognition, language modeling, and speech recognition. The results show that Re-GRU achieves higher detection performance than the comparison methods while running faster than Highway-GRU and the long short-term memory (LSTM) unit. In particular, it reaches a perplexity of 23.88 on the Penn Treebank dataset in the language-model prediction task, roughly half of the lowest previously reported perplexity.
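For orientation, the equations below give the standard GRU cell on which the abstract's two modifications act. The Re-GRU candidate-state line shown after them is a hedged reading of the abstract only: it assumes ReLU as the non-saturating activation and the previous hidden state as the residual term; the paper's exact formulation may place the residual differently.

```latex
% Standard GRU (\sigma: logistic sigmoid, \odot: element-wise product)
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1})                                % update gate \\
r_t &= \sigma(W_r x_t + U_r h_{t-1})                                % reset gate \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1})\big)     % candidate state \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}

% Re-GRU candidate state (assumed form): non-saturating activation
% plus residual information from the previous hidden state
\tilde{h}_t = \mathrm{ReLU}\big(W_h x_t + U_h (r_t \odot h_{t-1})\big) + h_{t-1}
```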

Keywords: deep learning, recurrent neural network, gated recurrent unit, residual connection
Received: 2019-08-18

Residual Based Gated Recurrent Unit
Affiliation: 1. College of Computer and Information Technology, China Three Gorges University, Yichang 443002; 2. Yichang Key Laboratory of Intelligent Medicine, Yichang 443002
Abstract: Traditional recurrent neural networks are prone to the problems of vanishing gradient and network degradation. Relying on the facts that non-saturating activation functions can effectively overcome the vanishing gradient problem, and that the residual structure in convolutional neural networks can effectively alleviate the degradation problem, we propose a residual-based gated recurrent unit (Re-GRU), built on the gated recurrent unit (GRU), to alleviate both problems. Re-GRU makes two main improvements. One is to replace the activation function of the candidate hidden state in GRU with a non-saturating activation function. The other is to introduce residual information into the candidate hidden state representation of the GRU. The modified candidate-state activation not only effectively avoids the vanishing gradient caused by saturating activation functions, but also introduces residual information that makes the network more sensitive to gradient changes, so as to alleviate the degradation problem. We conducted three kinds of test experiments: image recognition, language modeling, and speech recognition. The results indicate that the proposed Re-GRU achieves higher detection performance than the six comparison methods. Specifically, we achieved a test-set perplexity of 23.88 on the Penn Treebank data set in the language model prediction task, about half of the lowest previously recorded value.
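To make the two modifications concrete, here is a minimal NumPy sketch of a single Re-GRU step. The class name ReGRUCell, the choice of ReLU, and the placement of the residual term (the previous hidden state added to the candidate state) are illustrative assumptions; the abstract does not reproduce the paper's exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ReGRUCell:
    """Sketch of a Re-GRU step per the abstract: (1) the candidate-state
    activation is non-saturating (ReLU here) instead of tanh; (2) residual
    information (here, the previous hidden state) enters the candidate
    state. Illustrative only, not the authors' reference implementation."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Stacked weights for the update (z), reset (r), and candidate (h) paths.
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))
        self.b = np.zeros((3, hidden_size))

    def step(self, x, h_prev):
        Wz, Wr, Wh = self.W
        Uz, Ur, Uh = self.U
        bz, br, bh = self.b
        z = sigmoid(Wz @ x + Uz @ h_prev + bz)   # update gate (unchanged)
        r = sigmoid(Wr @ x + Ur @ h_prev + br)   # reset gate (unchanged)
        # Modification 1: ReLU instead of tanh.
        # Modification 2: residual term h_prev added to the candidate state.
        h_tilde = np.maximum(0.0, Wh @ x + Uh @ (r * h_prev) + bh) + h_prev
        return (1.0 - z) * h_prev + z * h_tilde  # standard GRU interpolation

# Tiny smoke test on random inputs.
cell = ReGRUCell(input_size=8, hidden_size=16)
h = np.zeros(16)
for x in np.random.default_rng(1).normal(size=(5, 8)):
    h = cell.step(x, h)
print(h.shape)  # (16,)
```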
Keywords: Deep learning, recurrent neural network, gated recurrent unit, residual connection