Funding: National Key Research and Development Program of China (2016YFA0201800)

Received: 2021-01-22
Revised: 2021-03-19

Design of Quantized CNN Acceleration System Based on FPGA
GONG Jie, ZHAO Shuo, HE Hu, DENG Ning. Design of Quantized CNN Acceleration System Based on FPGA[J]. Computer Engineering, 2022, 48(3): 170-174+196.
Authors: GONG Jie, ZHAO Shuo, HE Hu, DENG Ning
Affiliation: Institute of Microelectronics, Tsinghua University, Beijing 100084, China
Abstract: The convolution and fully connected layers of a deep Convolutional Neural Network (CNN) contain a large number of convolution operations, which greatly increase network scale, parameter count, and computation. Deep CNNs are unsuitable for mobile-device environments, and their parallel computing performance is poor when deployed on CPU/GPU platforms; it is therefore necessary to quantize the convolution parameters and design hardware-based acceleration. Field Programmable Gate Arrays (FPGAs), with low power consumption and high flexibility, meet the requirements of CNN parallel computing, so a CNN quantization method and its acceleration system are designed based on FPGA. The general dynamic fixed-point quantization method proposed in this study quantizes each layer of the network with a different precision, reducing both the loss in network accuracy and the storage requirements of the network parameters. On this basis, a dedicated accelerator and its on-chip system are designed for the quantized CNN to accelerate the network's forward inference. Using the ImageNet ILSVRC2012 dataset, the performance of the proposed quantization method and acceleration system is verified on the VGG-16 and ResNet-50 networks. Experimental results show that after quantization the network sizes of VGG-16 and ResNet-50 are only 13.8% and 24.8% of the originals, respectively, while the Top-1 accuracy loss is less than 1%, indicating that the quantization method is highly effective. Meanwhile, when running VGG-16, the acceleration system outperforms three other FPGA-based acceleration systems, reaching a peak performance of 614.4 GOPs (up to 4.5 times higher) and an energy efficiency of 113.99 GOPs/W (up to 4.7 times higher).
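The per-layer dynamic fixed-point scheme described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation; the rule for splitting bits between integer and fractional parts, and the function names, are assumptions made here for clarity.

```python
import numpy as np

def dynamic_fixed_point_quantize(w, bits=8):
    """Quantize a tensor to `bits`-bit dynamic fixed point.

    The fractional length fl is chosen per tensor (standing in for
    "per layer") from the tensor's dynamic range, so each layer gets
    its own scaling factor 2**fl. Assumed allocation rule: enough
    integer bits to cover the largest magnitude, the rest fractional.
    """
    max_abs = float(np.max(np.abs(w)))
    # Integer bits needed for the largest magnitude, plus the sign bit.
    il = int(np.ceil(np.log2(max_abs + 1e-12))) + 1
    fl = bits - il                      # remaining bits go to the fraction
    scale = 2.0 ** fl
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Round to the nearest representable value and saturate.
    q = np.clip(np.round(w * scale), qmin, qmax).astype(np.int32)
    return q, fl

def dequantize(q, fl):
    """Map fixed-point integers back to real values."""
    return q.astype(np.float64) / (2.0 ** fl)
```

Quantizing each layer independently this way lets layers with a small dynamic range keep more fractional bits, which is one way a network can stay within a small Top-1 accuracy loss while shrinking to 8-bit storage.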
Keywords: Convolutional Neural Network (CNN)  dynamic fixed-point quantization  hardware acceleration  Field Programmable Gate Array (FPGA)  model compression