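Hardware Acceleration Design and FPGA Implementation of YOLOv3-tiny
Cite this article: CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao. Hardware acceleration design and FPGA implementation of YOLOv3-tiny[J]. Computer Engineering & Science, 2021, 43(12): 2139-2149.
Authors: CHEN Hao-min  YAO Sen-jing  XI Yu  ZHANG Fan  XIN Wen-cheng  WANG Long-hai  REN Chao
Affiliations: (1. China Southern Power Grid Digital Grid Research Institute Co., Ltd., Guangzhou, Guangdong 510623; 2. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072)
Funding: National Natural Science Foundation of China (61972282)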
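Abstract: YOLOv3-tiny has excellent object detection capability, but the model still requires substantial computation, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. First, for the fixed-point design of the network, with data precision and resource consumption as design metrics, different fixed-point strategies are proposed based on statistics of the data distribution in the model and a classification of data types. Second, for the parallel design of the network, based on an analysis of the computational characteristics of convolutional neural networks, a scalable architecture of common hardware computing units is designed using loop reordering, loop tiling, loop unrolling, and array partitioning. Third, for the pipelined design of the network, both inter-layer and intra-layer pipelining are studied, and a flexible pipelined computing architecture is designed on the basis of the inter-layer dataflow direction and intra-layer task partitioning. Finally, experiments on the XILINX XC7Z020CLG400-1 platform show that, compared with a single-core ARM-A9 processor running at 667 MHz, the design achieves a speedup of up to 290.56.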
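The abstract does not spell out the fixed-point rules themselves. As a rough illustration of deriving a fixed-point format from data statistics, the sketch below picks, separately for each data type (here the weights and the activations of one layer), the number of fractional bits from the largest observed magnitude, then rounds and saturates. The max-magnitude rule, the 16-bit width, the sample values, and the helper names `choose_frac_bits`, `quantize` and `dequantize` are all assumptions for illustration, not the paper's method.

```python
import math

# Hypothetical helpers for illustration; not the paper's code.

def choose_frac_bits(values, total_bits=16):
    """Return the number of fractional bits for a signed fixed-point format
    (1 sign bit + integer bits + fractional bits = total_bits) chosen so that
    the integer part of the largest magnitude in `values` fits."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return total_bits - 1
    # Integer bits needed for the largest magnitude (never below zero).
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    frac_bits = total_bits - 1 - int_bits
    if frac_bits < 0:
        raise ValueError("values too large for the chosen bit width")
    return frac_bits


def quantize(x, frac_bits, total_bits=16):
    """Round x to the nearest fixed-point code and saturate to the signed range."""
    q = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def dequantize(q, frac_bits):
    """Map an integer fixed-point code back to a real value."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    # Hypothetical statistics for one layer: weights and activations get
    # separate formats because their value ranges differ.
    weights = [0.12, -0.48, 0.91, -0.05]
    activations = [0.0, 3.2, 12.7, 5.5]
    w_frac = choose_frac_bits(weights)      # small range -> more fractional bits
    a_frac = choose_frac_bits(activations)  # larger range -> more integer bits
    print("weights: Q%d.%d" % (16 - 1 - w_frac, w_frac))
    print("activations: Q%d.%d" % (16 - 1 - a_frac, a_frac))
```

Here the weights land in Q0.15 and the activations in Q4.11: the same 16-bit word trades precision for range depending on the data type, which is the kind of trade-off between precision and resource use the abstract refers to.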

Keywords: YOLOv3-tiny  convolutional neural network  FPGA  hardware acceleration
Received: 2020-11-10
Revised: 2021-01-04

Design and FPGA implementation of YOLOv3-tiny hardware acceleration
CHEN Hao-min,YAO Sen-jing,XI Yu,ZHANG Fan,XIN Wen-cheng,WANG Long-hai,REN Chao.Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J].Computer Engineering & Science,2021,43(12):2139-2149.
Authors:CHEN Hao-min  YAO Sen-jing  XI Yu  ZHANG Fan  XIN Wen-cheng  WANG Long-hai  REN Chao
Affiliation:(1.China Southern Power Grid Digital Grid Research Institute Limited Company,Guangzhou 510623; 2.School of Electrical Automation and Information Engineering,Tianjin University,Tianjin 300072,China)
Abstract: YOLOv3-tiny has excellent object detection capability, but the model still requires substantial computation, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. Firstly, for the fixed-point design of the network, with data accuracy and resource consumption as design indicators, different fixed-point strategies are determined through statistics of the data distribution in the model and the division of data types. Secondly, for the parallel design of the network, based on an analysis of the computational characteristics of convolutional neural networks, a scalable common hardware computing unit architecture is designed using loop reordering, loop tiling, loop unrolling, and array partitioning. Then, for the pipeline design of the network, both inter-layer and intra-layer aspects are studied: based on the direction of the inter-layer data flow and the division of tasks within each layer, a flexible pipelined computing architecture is designed. Finally, experiments on the XILINX XC7Z020CLG400-1 platform show that, compared with a single-core ARM-A9 processor at 667 MHz, the proposed design achieves a speedup of up to 290.56.
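To illustrate what loop reordering, loop tiling, loop unrolling, and array partitioning do to a convolution loop nest, the following Python reference model compares a direct convolution with a tiled, reordered version and checks that both give the same result. It is a sketch under stated assumptions: stride 1, no padding, integer data, and hypothetical tile sizes `Tm` (output channels) and `Tn` (input channels). Unrolling and partitioning appear only as comments marking where an HLS tool would apply them. It does not reproduce the paper's architecture.

```python
# Hypothetical reference model; not the paper's HLS code.

def zeros3(a, b, c):
    return [[[0] * c for _ in range(b)] for _ in range(a)]


def conv_naive(inp, w):
    """Direct convolution, stride 1, no padding.
    inp: [C][H][W], w: [M][C][K][K] -> out: [M][H-K+1][W-K+1]."""
    M, C, K = len(w), len(inp), len(w[0][0])
    R, S = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = zeros3(M, R, S)
    for m in range(M):
        for c in range(C):
            for r in range(R):
                for s in range(S):
                    for i in range(K):
                        for j in range(K):
                            out[m][r][s] += w[m][c][i][j] * inp[c][r + i][s + j]
    return out


def conv_tiled(inp, w, Tm=2, Tn=2):
    """Same convolution with the channel loops tiled (Tm output, Tn input
    channels) and moved innermost. On an FPGA these two innermost loops are the
    ones that would be fully unrolled into Tm*Tn parallel multiply-accumulate
    units, with the on-chip buffers partitioned along the channel dimension so
    that all Tm*Tn operands can be read in the same cycle."""
    M, C, K = len(w), len(inp), len(w[0][0])
    R, S = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = zeros3(M, R, S)
    for m0 in range(0, M, Tm):              # tile loop over output channels
        for n0 in range(0, C, Tn):          # tile loop over input channels
            for r in range(R):
                for s in range(S):
                    for i in range(K):
                        for j in range(K):
                            # Candidate loops for full unrolling (e.g. #pragma HLS UNROLL).
                            # min(...) handles the last, partially filled tile.
                            for tm in range(m0, min(m0 + Tm, M)):
                                for tn in range(n0, min(n0 + Tn, C)):
                                    out[tm][r][s] += w[tm][tn][i][j] * inp[tn][r + i][s + j]
    return out


if __name__ == "__main__":
    C, M, H, W, K = 5, 3, 6, 6, 3
    inp = [[[(c * 7 + h * 3 + x) % 5 - 2 for x in range(W)] for h in range(H)]
           for c in range(C)]
    wt = [[[[(m + 2 * c + i - j) % 3 - 1 for j in range(K)] for i in range(K)]
           for c in range(C)] for m in range(M)]
    print(conv_tiled(inp, wt, Tm=2, Tn=2) == conv_naive(inp, wt))
```

Because the unrolled Tm×Tn array has a fixed size that does not depend on the layer shape, one compute unit of this form could in principle be reused across layers with different channel counts; that reuse is one plausible reading of the "scalable common hardware computing unit" in the abstract, not a confirmed detail of the paper.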
Keywords:YOLOv3-tiny  convolutional neural network  field programmable gate array  hardware acceleration     
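Both the inter-layer and the intra-layer pipelining described in the abstract overlap successive stages (whole layers, or load/compute/store tasks inside one layer) so that throughput is bounded by the slowest stage. The timing model below is a minimal sketch assuming ideal stages, enough buffering between stages, and no stalls; `sequential_time`, `pipelined_time`, `simulate`, and the stage times in the example are hypothetical and are not measurements from the paper.

```python
# Hypothetical timing model; stage times are illustrative, not from the paper.

def sequential_time(stage_times, n_items):
    """No overlap: every item passes through all stages before the next starts."""
    return n_items * sum(stage_times)


def pipelined_time(stage_times, n_items):
    """Ideal linear pipeline: the first item needs the sum of all stage times,
    and each later item completes one bottleneck-stage interval after the
    previous one."""
    if n_items == 0:
        return 0
    return sum(stage_times) + (n_items - 1) * max(stage_times)


def simulate(stage_times, n_items):
    """Step-by-step check of the closed form, assuming unbounded buffers between
    stages: an item starts a stage once it has left the previous stage and the
    stage has finished the previous item."""
    stage_free = [0] * len(stage_times)  # time each stage finishes its last item
    done = 0
    for _ in range(n_items):
        ready = 0  # time the current item becomes available to the next stage
        for k, t in enumerate(stage_times):
            stage_free[k] = max(ready, stage_free[k]) + t
            ready = stage_free[k]
        done = ready
    return done


if __name__ == "__main__":
    # e.g. load / compute / store per tile inside one layer
    stages = [3, 5, 2]
    n = 10
    print(sequential_time(stages, n), pipelined_time(stages, n), simulate(stages, n))
```

With hypothetical per-tile times of 3, 5, and 2 cycles for load, compute, and store over 10 tiles, the sequential schedule takes 100 cycles and the pipelined one 3 + 5 + 2 + 9 × 5 = 55. Balancing stage times therefore matters as much as overlapping them, since the slowest stage sets the steady-state rate.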