A Synchronized Gradient Descent Algorithm Based on Distributed Coding
Citation: LI Bowen, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, ZHANG Ji. A Synchronized Gradient Descent Algorithm Based on Distributed Coding[J]. Computer Engineering, 2021, 47(4): 68-76, 83.
Authors: LI Bowen  XIE Zaipeng  MAO Yingchi  XU Yuanyuan  ZHU Xiaorui  ZHANG Ji
Affiliation: School of Computer and Information, Hohai University, Nanjing 211100, China
Funding: National Key Research and Development Program of China; Key Program of the National Natural Science Foundation of China
Abstract: The Asynchronous Stochastic Gradient Descent (ASGD) algorithm based on data parallelization requires frequent exchanges of gradient data between distributed computing nodes, which degrades the execution efficiency of the algorithm. This paper proposes a Synchronized Stochastic Gradient Descent (SSGD) algorithm based on distributed coding. The algorithm uses a redundant allocation strategy for computation tasks to quantify the transmission time of each node's intermediate results, thereby reducing the training time of a single batch, and uses the grouped data exchange mode of the coded data transmission strategy to reduce the total volume of data communicated between nodes. Experimental results show that, with a suitable hyperparameter configuration, the proposed algorithm reduces the average distributed training time of Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) by 53.97% and 26.89% compared with the SSGD algorithm, and by 39.11% and 26.37% compared with the ASGD algorithm, demonstrating that it effectively reduces the communication load of the distributed cluster while preserving the training accuracy of the neural networks.
Keywords: neural network; deep learning; distributed coding; Gradient Descent (GD); communication load
Received: 2020-02-06
Revised: 2020-04-04
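The abstract describes two mechanisms: replicating computation tasks across nodes so a synchronized step need not wait for every worker, and exchanging coded (grouped) intermediate results to cut communication volume. The sketch below illustrates only the first idea in a generic form (cyclic replication with straggler-tolerant aggregation); it is not the paper's actual coding or grouping scheme, and all function names, parameters, and the NumPy-based setup are illustrative assumptions.

import numpy as np

def assign_partitions(num_workers: int, redundancy: int) -> dict:
    # Cyclic replication: worker w holds partitions w, w+1, ..., w+redundancy-1 (mod n),
    # so every data partition is stored on `redundancy` different workers.
    return {w: [(w + r) % num_workers for r in range(redundancy)]
            for w in range(num_workers)}

def aggregate(finished_workers, assignment, partition_grads, num_partitions):
    # Sum one copy of each partition gradient, taken from whichever finished worker holds it.
    total, covered = 0.0, set()
    for w in finished_workers:
        for p in assignment[w]:
            if p not in covered:
                total += partition_grads[p]
                covered.add(p)
    if len(covered) < num_partitions:
        raise RuntimeError("the finished workers do not yet cover every partition")
    return total

if __name__ == "__main__":
    n, r = 6, 2                          # 6 workers; every partition is replicated on 2 of them
    rng = np.random.default_rng(0)
    grads = rng.normal(size=n)           # stand-ins for per-partition gradients
    assignment = assign_partitions(n, r)
    # Workers 1 and 4 are slow; the remaining four still cover all 6 partitions,
    # so the synchronized step can finish without waiting for the stragglers.
    print(aggregate([0, 2, 3, 5], assignment, grads, n))
    print(grads.sum())                   # identical value computed without redundancy

With replication factor r, this layout tolerates any r-1 stragglers per step; the paper's coded, grouped data exchange additionally reduces how much each node must transmit, which this sketch does not model.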
