Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU
DONG Xiao, LIU Lei, LI Jing, FENG Xiao-Bing. Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU[J]. Journal of Software, 2020, 31(9): 2944-2964.
Authors: DONG Xiao  LIU Lei  LI Jing  FENG Xiao-Bing
Affiliation: State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Abstract: In recent years, deep convolutional neural networks have shown impressive capability across a wide range of tasks and have been deployed in applications such as object detection, autonomous driving, and machine translation. These models, however, typically carry huge numbers of parameters and impose a heavy computational burden. Neural network pruning identifies and removes parameters that contribute little to model accuracy, reducing both the parameter count and the theoretical amount of computation, and thereby creating an opportunity for efficient execution. In practice, however, pruned sparse models are difficult to execute efficiently on GPUs, and their performance can even fall below that of their well-optimized dense counterparts, so pruning often fails to deliver a real performance gain. This study proposes a sparsity-aware code generating method that produces efficient GPU code for the sparse convolutions in pruned neural networks. First, an operator template is designed for the convolution operator, with several optimizations targeting the GPU architecture. The template source code is compiled and analyzed into an intermediate representation template, from which the proposed generation algorithm emits sparse convolution code specialized to the pruned sparse parameters of a concrete model. Furthermore, data access and data placement are optimized according to the memory access characteristics of neural network execution, which effectively improves memory throughput. Finally, the positions of the nonzero parameters are encoded implicitly into the generated code, so no extra index structure is required, which reduces the memory demand. Experiments demonstrate that, compared with existing methods for executing sparse neural networks on GPUs, the proposed sparsity-aware code generating method effectively improves the performance of sparse convolutional neural networks.
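To make the core idea concrete, below is a minimal, hypothetical sketch of what such generated code could look like for a single-channel 3x3 filter pruned down to three nonzero weights; the kernel name, weight values, and tap positions are illustrative assumptions, not taken from the paper. Because the generator knows the sparse parameters when the code is produced, each surviving weight becomes an immediate constant at its known position, so the kernel reads neither a weight tensor nor sparse index arrays at run time.

// Hypothetical CUDA sketch of generated sparse-convolution code: a 3x3
// filter pruned to three nonzeros, specialized at code-generation time.
// Weight values and (row, col) offsets are baked in as immediates, so no
// index structure (e.g., CSR) is loaded during execution.
__global__ void sparse_conv3x3_generated(const float* __restrict__ in,
                                         float* __restrict__ out,
                                         int H, int W) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int outW = W - 2, outH = H - 2;                 // valid 3x3 convolution
    if (x >= outW || y >= outH) return;

    // Only surviving weights are emitted; pruned taps generate no code.
    // The values 0.50, -1.25, 0.75 and their positions are made up.
    float acc = 0.0f;
    acc +=  0.50f * in[(y + 0) * W + (x + 0)];  // filter position (0, 0)
    acc += -1.25f * in[(y + 1) * W + (x + 2)];  // filter position (1, 2)
    acc +=  0.75f * in[(y + 2) * W + (x + 1)];  // filter position (2, 1)
    out[y * outW + x] = acc;
}

Relative to an index-based sparse kernel, this style trades code size for the elimination of index loads and data-dependent control flow, which is consistent with the abstract's claim that implicit position encoding reduces the memory demand.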
Keywords: neural networks  sparse  GPU  performance optimization  convolution  code generation
Foundation: National Natural Science Foundation of China (61521092); National Key Research and Development Program of China (2017YFB1003103)
Received: 2019-10-05; Revised: 2020-01-13
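The specialization step itself can be pictured as splicing one multiply-add statement per surviving weight into the template. The toy host-side generator below is a made-up stand-in, not the paper's IR-template-based generator or its interface; it merely prints such an unrolled body from a list of nonzeros.

#include <cstdio>
#include <vector>

// Toy stand-in for the code generator: one (row, col, value) triple per
// surviving weight of a pruned 3x3 filter. All names here are invented.
struct Nonzero { int r, c; float v; };

// Print the unrolled accumulation statements that would be spliced into
// a kernel like the sketch shown above.
void emit_sparse_body(const std::vector<Nonzero>& nz) {
    std::printf("float acc = 0.0f;\n");
    for (const Nonzero& w : nz)
        std::printf("acc += %.2ff * in[(y + %d) * W + (x + %d)];\n",
                    w.v, w.r, w.c);
    std::printf("out[y * outW + x] = acc;\n");
}

int main() {
    // Same illustrative nonzeros as in the kernel sketch above.
    emit_sparse_body({{0, 0, 0.50f}, {1, 2, -1.25f}, {2, 1, 0.75f}});
    return 0;
}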