
Design of FPGA accelerator with high parallelism for convolution neural network
Citation: WANG Xiaofeng, JIANG Penglong, ZHOU Hui, ZHAO Xiongbo. Design of FPGA accelerator with high parallelism for convolution neural network[J]. Journal of Computer Applications, 2021, 41(3): 812-819.
Authors: WANG Xiaofeng  JIANG Penglong  ZHOU Hui  ZHAO Xiongbo
Affiliation: 1. Beijing Aerospace Automatic Control Institute, Beijing 100854, China; 2. National Key Laboratory of Science and Technology on Aerospace Intelligence Control, Beijing 100854, China
Funding: Military scientific research funding project; Innovation R&D project of the China Academy of Launch Vehicle Technology
Received: 2020-07-09
Revised: 2020-10-12
Abstract: Most algorithms based on Convolutional Neural Networks (CNNs) are both computation-intensive and memory-intensive, which makes them difficult to apply in embedded fields with low-power requirements, such as aerospace, mobile robotics, and smartphones. To address this problem, a Field Programmable Gate Array (FPGA) accelerator with high parallelism for CNNs was proposed. First, the four kinds of parallelism in CNN algorithms that can be exploited for FPGA acceleration were compared and studied. Then, a Multi-channel Convolutional Rotating-register Pipeline (MCRP) structure was proposed to exploit the parallelism within the convolution kernel (intra-kernel parallelism) concisely and effectively. Finally, by combining input/output channel parallelism with intra-kernel parallelism, a high-parallelism CNN accelerator architecture based on the MCRP structure was proposed and, to verify the rationality of the design, deployed on the Xilinx XCZU9EG chip. With the on-chip Digital Signal Processor (DSP) resources fully utilized, the peak computing capacity of the accelerator reached 2 304 GOPS (Giga Operations Per Second). With the SSD-300 algorithm as the test object, the accelerator achieved an actual computing capacity of 1 830.33 GOPS, corresponding to a hardware utilization of 79.44%. Experimental results show that the MCRP structure can effectively improve the computing capacity of a CNN accelerator, and that an MCRP-based CNN accelerator can largely meet the computing capacity requirements of most embedded applications.
Keywords: Convolutional Neural Network (CNN); high performance; hardware accelerator; parallelism; Field Programmable Gate Array (FPGA)
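As an illustration of the parallelism scheme named in the abstract (input/output channel parallelism combined with parallelism inside the convolution kernel), the following is a minimal software sketch, not the authors' MCRP design, whose rotating-register pipeline is not described in the abstract. The function names, the stride-1 and no-padding convention, and the nested-list tensor layout are all assumptions made for the example. The last helper simply reproduces the utilization figure as the ratio of the reported actual and peak throughput.

```python
def conv2d_reference(x, w):
    """Sequential six-loop convolution (assumed: stride 1, no padding).

    x is indexed [input_channel][row][col];
    w is indexed [output_channel][input_channel][kernel_row][kernel_col].
    """
    c_out, c_in, k = len(w), len(x), len(w[0][0])
    h_out, w_out = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    out = [[[0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for oc in range(c_out):
        for ic in range(c_in):
            for y in range(h_out):
                for col in range(w_out):
                    for ky in range(k):
                        for kx in range(k):
                            out[oc][y][col] += (
                                w[oc][ic][ky][kx] * x[ic][y + ky][col + kx]
                            )
    return out


def conv2d_unrolled(x, w):
    """Same result, reordered so only the output-pixel loops stay sequential.

    The output-channel, input-channel, and kernel-window loops are grouped
    inside one pixel step. In hardware these are the loops that could be
    unrolled onto parallel multipliers, with sum() standing in for the
    adder tree. This is a hypothetical model, not the MCRP structure.
    """
    c_out, c_in, k = len(w), len(x), len(w[0][0])
    h_out, w_out = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    out = [[[0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    for y in range(h_out):            # sequential over output rows
        for col in range(w_out):      # sequential over output columns
            for oc in range(c_out):   # output-channel parallelism
                out[oc][y][col] = sum(
                    w[oc][ic][ky][kx] * x[ic][y + ky][col + kx]
                    for ic in range(c_in)   # input-channel parallelism
                    for ky in range(k)      # parallelism inside the kernel
                    for kx in range(k)
                )
    return out


def utilization(actual_gops, peak_gops):
    """Hardware utilization as the ratio of actual to peak throughput."""
    return actual_gops / peak_gops
```

With the figures reported in the abstract, `utilization(1830.33, 2304)` evaluates to roughly 0.7944, which matches the stated 79.44%. In this model the degree of parallelism per step is the product of the output-channel count, the input-channel count, and the kernel area; how the actual accelerator tiles these dimensions onto DSP slices is not stated in the abstract.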
This article is indexed in Wanfang Data and other databases.