Funding: Higher Education Discipline Innovation and Talent Recruitment Program (111 Project) (B12018)

Received: 2022-10-12
Revised: 2023-05-17

Acceleration and optimization of group convolution calculation based on SW many-core architecture
WANG Xin, ZHANG Ming. Acceleration and optimization of group convolution calculation based on SW many-core architecture[J]. Application Research of Computers, 2023, 40(6): 1745-1749.
Authors:WANG Xin and ZHANG Ming
Affiliation: School of Internet of Things Engineering, Jiangnan University
Abstract: To address the high computational complexity and the large computation and parameter counts of standard convolution, this paper proposes a parallel group convolution algorithm targeting the domestic SW26010P many-core processor. The core idea is to use a dedicated data layout and map the computation across the many cores for parallel execution. Experimental results show that, compared with a single-core serial algorithm, the proposed parallel group convolution algorithm achieves a maximum speedup of 79.5 and a peak effective compute throughput of 186.7 MFLOPS. After further data-parallel optimization with SIMD instructions, the algorithm achieves a maximum speedup of 10.2 over the unoptimized parallel group convolution algorithm.
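The grouped convolution the abstract describes can be sketched in NumPy. This is an illustrative reference implementation only, not the authors' SW26010P code; the NCHW-style layout, names, and the naive loop nest are assumptions. It shows why grouping divides both FLOPs and parameter count by the group count, and why each group is an independent unit of work that can be mapped to a separate core:

```python
import numpy as np

def group_conv2d(x, w, groups):
    """Naive grouped 2-D convolution (valid padding, stride 1).

    x: input feature map, shape (C_in, H, W)
    w: weights, shape (C_out, C_in // groups, kH, kW)

    Channels are split into `groups` independent convolutions, so
    FLOPs and parameters are 1/groups of a standard convolution
    with the same C_in and C_out. Each group touches disjoint data,
    so the groups can run in parallel on separate cores.
    """
    c_in, h, wd = x.shape
    c_out, c_in_g, kh, kw = w.shape
    assert c_in % groups == 0 and c_out % groups == 0
    assert c_in_g == c_in // groups
    oh, ow = h - kh + 1, wd - kw + 1
    y = np.zeros((c_out, oh, ow))
    c_out_g = c_out // groups
    for g in range(groups):  # each iteration is independent work
        xg = x[g * c_in_g:(g + 1) * c_in_g]       # this group's input channels
        for oc in range(g * c_out_g, (g + 1) * c_out_g):
            for i in range(oh):
                for j in range(ow):
                    # dot product of one window with one filter
                    y[oc, i, j] = np.sum(xg[:, i:i + kh, j:j + kw] * w[oc])
    return y
```

With `groups` equal to 1 this reduces to a standard convolution; the paper's parallel algorithm distributes the independent groups (and, after SIMD optimization, vectorizes the inner multiply-accumulate) rather than running this serial loop nest.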
Keywords: convolutional neural network; group convolution; parallel algorithm; data parallelism
