Fund: National Natural Science Foundation of China Youth Science Fund (41706198)
Received: 2021-07-08; Revised: 2021-09-06

Research on Weight Initialization Method in Deep Learning
XING Tongtong, SUN Rencheng, SHAO Fengjing, SUI Yi. Research on Weight Initialization Method in Deep Learning[J]. Computer Engineering, 2022, 48(7): 104-113.
Authors:XING Tongtong  SUN Rencheng  SHAO Fengjing  SUI Yi
Affiliation:School of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
Abstract: Deep neural network training is essentially a process of continually adjusting the initial weights, and the whole process is time-consuming and requires large amounts of data. A pretrained network is, in essence, a collection of trained weight data; if the distribution rules of pretrained weights can be identified and used to initialize untrained networks, training time can be reduced. In this study, a probability-distribution analysis of the pretrained weights of AlexNet and ResNet18 on the ImageNet dataset shows that the weight distribution exhibits the characteristics of a one-sided power-law distribution. Double-logarithmic fitting further verifies that the one-sided weight distribution obeys a truncated power-law distribution. Combining this distribution law with the regularization idea of preventing overfitting, an initialization method based on a Normalized Symmetric Power-Law (NSPL) distribution is proposed. On the AlexNet and ResNet32 networks, the NSPL method is compared experimentally on the CIFAR10 dataset against He initialization with normal and uniform distributions. The results show that the NSPL method converges faster than both of these initialization methods and achieves higher accuracy on ResNet32.
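The double-logarithmic verification step mentioned in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the bin count and the idea of fitting a line to log(density) versus log(magnitude) are assumptions about the general technique, under which an approximately linear fit with negative slope is consistent with a (truncated) power-law tail.

```python
import numpy as np

def powerlaw_slope(weights, bins=50):
    """Fit a line to log(density) vs log(|weight|) over log-spaced bins.

    A roughly linear relationship with negative slope on this double-log
    plot is the usual signature of a power-law-distributed magnitude.
    """
    mags = np.abs(np.asarray(weights).ravel())
    mags = mags[mags > 0]  # log scale requires strictly positive values
    edges = np.logspace(np.log10(mags.min()), np.log10(mags.max()), bins + 1)
    density, _ = np.histogram(mags, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
    keep = density > 0  # drop empty bins before taking logs
    slope, intercept = np.polyfit(np.log(centers[keep]), np.log(density[keep]), 1)
    return slope
```

Applied to a flattened tensor of pretrained weights, the returned slope estimates the power-law exponent of the one-sided magnitude distribution.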
Keywords:deep learning  Convolutional Neural Network(CNN)  pre-training model  weight initialization  symmetric power law distribution  
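Based only on the abstract's description, an NSPL-style initializer can be sketched as below: draw magnitudes from a truncated power law via inverse-transform sampling, symmetrize with random signs, and normalize the scale. The exponent, truncation bounds, and the choice of rescaling to the He-initialization standard deviation sqrt(2 / fan_in) are all illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def nspl_init(shape, alpha=2.0, w_min=1e-3, w_max=1.0, rng=None):
    """Sketch of a normalized symmetric power-law (NSPL-style) initializer.

    Magnitudes follow a truncated power law p(|w|) ~ |w|**(-alpha) on
    [w_min, w_max], sampled by inverting its CDF; signs are assigned
    symmetrically at random; the result is rescaled to a He-style
    standard deviation. All parameter values are assumptions.
    """
    rng = np.random.default_rng(rng)
    u = rng.random(shape)
    # Inverse CDF of the truncated power law (valid for alpha != 1)
    lo, hi = w_min ** (1.0 - alpha), w_max ** (1.0 - alpha)
    mag = (lo + u * (hi - lo)) ** (1.0 / (1.0 - alpha))
    sign = rng.choice([-1.0, 1.0], size=shape)  # symmetric about zero
    w = sign * mag
    fan_in = int(np.prod(shape[1:])) if len(shape) > 1 else shape[0]
    w *= np.sqrt(2.0 / fan_in) / w.std()  # normalize to He-style scale
    return w
```

For a convolutional layer of shape (out_channels, in_channels, kH, kW), fan_in is taken as the product of all trailing dimensions, mirroring the usual He-initialization convention.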