An FPGA Implementation Method for Configurable CNN Co-Accelerator
Citation: JIAN Qiang, ZHANG Pei-yong, WANG Xue-jie. An FPGA Implementation Method for Configurable CNN Co-Accelerator[J]. Acta Electronica Sinica, 2019, 47(7): 1525-1531.
Authors: JIAN Qiang, ZHANG Pei-yong, WANG Xue-jie
Affiliation: 1. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; 2. Zhejiang University City College, Hangzhou, Zhejiang 310015, China
Fund projects: Research on key technologies for sub-picosecond-precision on-chip signal measurement for 14 nm and below process nodes; rapid defect detection of integrated-circuit wafers for 10 nm and below process nodes
Abstract: To address the long computation time of convolutional neural networks, caused chiefly by the high complexity of the convolution operation, this paper proposes an FPGA implementation method for a configurable CNN co-accelerator with an eight-stage pipeline. By embedding the pooling controller in the convolution controller, the computational module obtains more resources. A mirror-tree structure is designed to increase parallelism, and the Map algorithm is applied to raise computational density while speeding up calculation. Experimental results show that the implementation reaches 22.74 GOPS at 32-bit fixed/floating-point precision. Compared with the MAPLE accelerator, computational density is increased by 283.3% and calculation speed by 224.9%; compared with the MCA (Memory-Centric Accelerator), computational density is increased by 14.47% and calculation speed by 33.76%. At 8- to 16-bit fixed-point precision, performance reaches 58.3 GOPS, with computational density 8.5% higher than that of the LBA (Layer-Based Accelerator).
Keywords: convolutional neural network; FPGA; embedded system; convolution; parallel algorithm
Received: 2018-05-30
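The abstract names two structural ideas: reusing the convolution controller to also drive pooling, and a mirror-tree (balanced adder-tree) reduction that sums N partial products in roughly log2(N) levels instead of N-1 serial additions. The paper gives no code; the sketch below is a hypothetical software analogue in Python (the names `tree_sum` and `conv2d_maxpool` are illustrative, not from the paper), assuming a single input channel, stride-1 "valid" convolution, and 2x2 max pooling.

```python
# Hypothetical software analogue of the accelerator's dataflow:
# a balanced ("mirror") adder tree for the MAC reduction, and
# max pooling fused into the same pass that produces conv outputs.

def tree_sum(vals):
    """Reduce values pairwise, as a balanced adder tree would.
    N inputs take ceil(log2(N)) levels instead of N-1 serial adds."""
    vals = list(vals)
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd element carries to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

def conv2d_maxpool(img, kernel, pool=2):
    """Stride-1 'valid' convolution; max pooling is fused into the
    same traversal, mimicking the shared conv/pool control logic."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(img) - kh + 1
    ow = len(img[0]) - kw + 1
    conv = [[tree_sum(img[r + i][c + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]
    # fused pooling pass over the freshly produced conv tile
    return [[max(conv[r + dr][c + dc]
                 for dr in range(pool) for dc in range(pool))
             for c in range(0, ow - pool + 1, pool)]
            for r in range(0, oh - pool + 1, pool)]
```

In hardware, the tree reduction maps to a pipeline of adder stages, so one convolution window's sum completes per cycle once the pipeline fills; this sketch only models the arithmetic, not the timing.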
Indexed by: Wanfang Data and other databases.