A Synchronized Gradient Descent Algorithm Based on Distributed Coding
Citation: LI Bowen, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, ZHANG Ji. A Synchronized Gradient Descent Algorithm Based on Distributed Coding[J]. Computer Engineering, 2021, 47(4): 68-76, 83.
Authors: LI Bowen  XIE Zaipeng  MAO Yingchi  XU Yuanyuan  ZHU Xiaorui  ZHANG Ji
Affiliation: School of Computer and Information, Hohai University, Nanjing 211100, China
Funding: National Key Research and Development Program of China; Key Program of the National Natural Science Foundation of China
Abstract: The Asynchronous Stochastic Gradient Descent (ASGD) algorithm based on data parallelization requires frequent exchanges of gradient data between distributed computing nodes, which degrades the execution efficiency of the algorithm. This paper proposes a Synchronized Stochastic Gradient Descent (SSGD) algorithm based on distributed coding. The algorithm uses a redundant allocation strategy for computation tasks to quantify the transmission time of each node's intermediate results, thereby reducing the training time of a single batch, and uses the grouped data exchange mode of the coded data transmission strategy to reduce the total volume of data communicated between nodes. Experimental results show that, with a suitable hyperparameter configuration, the proposed algorithm reduces the average distributed training time of Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) by 53.97% and 26.89% compared with the SSGD algorithm, and by 39.11% and 26.37% compared with the ASGD algorithm, demonstrating that it effectively reduces the communication load of the distributed cluster while preserving the training accuracy of the neural networks.
Keywords: neural network; deep learning; distributed coding; Gradient Descent (GD); communication load
Received: 2020-02-06
Revised: 2020-04-04
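The abstract describes two mechanisms: replicating computation tasks across nodes so a synchronized step need not wait for every worker, and exchanging coded (grouped) intermediate results to cut communication volume. The sketch below illustrates only the first idea in a generic form (cyclic replication with straggler-tolerant aggregation); it is not the paper's actual coding or grouping scheme, and all function names, parameters, and the NumPy-based setup are illustrative assumptions.

import numpy as np

def assign_partitions(num_workers: int, redundancy: int) -> dict:
    # Cyclic replication: worker w holds partitions w, w+1, ..., w+redundancy-1 (mod n),
    # so every data partition is stored on `redundancy` different workers.
    return {w: [(w + r) % num_workers for r in range(redundancy)]
            for w in range(num_workers)}

def aggregate(finished_workers, assignment, partition_grads, num_partitions):
    # Sum one copy of each partition gradient, taken from whichever finished worker holds it.
    total, covered = 0.0, set()
    for w in finished_workers:
        for p in assignment[w]:
            if p not in covered:
                total += partition_grads[p]
                covered.add(p)
    if len(covered) < num_partitions:
        raise RuntimeError("the finished workers do not yet cover every partition")
    return total

if __name__ == "__main__":
    n, r = 6, 2                          # 6 workers; every partition is replicated on 2 of them
    rng = np.random.default_rng(0)
    grads = rng.normal(size=n)           # stand-ins for per-partition gradients
    assignment = assign_partitions(n, r)
    # Workers 1 and 4 are slow; the remaining four still cover all 6 partitions,
    # so the synchronized step can finish without waiting for the stragglers.
    print(aggregate([0, 2, 3, 5], assignment, grads, n))
    print(grads.sum())                   # identical value computed without redundancy

With replication factor r, this layout tolerates any r-1 stragglers per step; the paper's coded, grouped data exchange additionally reduces how much each node must transmit, which this sketch does not model.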
