Design and FPGA implementation of YOLOv3-tiny hardware acceleration

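Cite this article:	CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao. Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J]. Computer Engineering & Science, 2021, 43(12): 2139-2149.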

Authors:	CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao

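Affiliations:	(1. China Southern Power Grid Digital Grid Research Institute Limited Company, Guangzhou, Guangdong 510623; 2. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072)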

Funding:	National Natural Science Foundation of China (61972282)

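Abstract:	YOLOv3-tiny offers excellent object detection capability, but the model still demands considerable computing power, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. First, for the fixed-point design of the network, with data precision and resource consumption as the design metrics, different fixed-point strategies are proposed based on statistics of the data distribution in the model and a classification of data types. Second, for the parallel design of the network, the computational characteristics of convolutional neural networks are analyzed, and loop adjustment, loop tiling, loop unrolling, and array partitioning are used to design a scalable, general-purpose hardware computing unit architecture. Third, for the pipelined design of the network, both the inter-layer and intra-layer aspects are studied, and a flexible pipelined computing architecture is designed based on the inter-layer data flow direction and the intra-layer task partitioning. Finally, experiments on the XILINX XC7Z020CLG400-1 platform show that, compared with a single-core ARM-A9 processor at 667 MHz, the speedup reaches 290.56.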
Keywords:	YOLOv3-tiny; convolutional neural network; FPGA; hardware acceleration
Received:	2020-11-10
Revised:	2021-01-04
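
The abstract mentions choosing fixed-point strategies from the statistics of the data distribution. The following is only a minimal sketch of that general idea, not the authors' method: it picks the number of fractional bits for a group of values from their largest magnitude, then rounds and saturates. The 16-bit word width and the function names `choose_frac_bits`, `quantize`, and `dequantize` are assumptions made for illustration.

```python
import math


def choose_frac_bits(values, total_bits=16):
    """Pick fractional bits so the largest magnitude fits a signed word.

    total_bits=16 is an assumed word width, not taken from the paper.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Integer bits needed for the magnitude; the sign bit is counted separately.
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return total_bits - 1 - int_bits


def quantize(value, frac_bits, total_bits=16):
    """Scale, round to the nearest integer code, and saturate to the signed range."""
    scaled = round(value * 2.0 ** frac_bits)
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def dequantize(code, frac_bits):
    """Map an integer code back to a real value."""
    return code / 2.0 ** frac_bits


# Hypothetical sample data: small-range weights and wider-range activations.
weights = [0.75, -0.3, 0.02]
activations = [6.3, -2.0, 0.5]
w_frac = choose_frac_bits(weights)
a_frac = choose_frac_bits(activations)
```

Groups with a small dynamic range (here the hypothetical weights) keep more fractional bits than groups with a wide range, which is one simple way that different data types can end up with different fixed-point formats.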
Design and FPGA implementation of YOLOv3-tiny hardware acceleration

CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao. Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J]. Computer Engineering & Science, 2021, 43(12): 2139-2149.

Authors:	CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao

Affiliation:	(1. China Southern Power Grid Digital Grid Research Institute Limited Company, Guangzhou 510623; 2. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China)

Abstract:	YOLOv3-tiny has excellent object detection capability, but the computing power required by the model is still large, so it is difficult to use in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. Firstly, for the fixed-point design of the network, with data accuracy and resource consumption as design indicators, different fixed-point strategies are proposed based on statistics of the data distribution in the model and the division of data types. Secondly, for the parallel design of the network, the computational characteristics of the convolutional neural network are analyzed, and a scalable, general-purpose hardware computing unit architecture is designed using loop adjustment, loop tiling, loop unrolling, and array partitioning. Then, for the pipeline design of the network, the research is carried out from two aspects: inter-layer and intra-layer. Based on the direction of the inter-layer data flow and the division of tasks within each layer, a flexible pipelined computing architecture is designed. Lastly, experiments on the XILINX XC7Z020CLG400-1 platform demonstrate that, compared with a single-core ARM-A9 processor at 667 MHz, the proposed design achieves a speedup of up to 290.56.

Keywords:	YOLOv3-tiny; convolutional neural network; field programmable gate array; hardware acceleration
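
The parallelization step named in the abstract (loop adjustment, loop tiling, loop unrolling) can be illustrated with a small software model. This is a hedged sketch, not the paper's architecture: "loop adjustment" is read here as reordering the loop nest, the tile sizes `tile_co` and `tile_ci` are hypothetical, and in a real HLS design the two innermost loops would be the ones unrolled into parallel multiply-accumulate units. Integer data is used so that both loop orders give bit-identical results.

```python
def conv_naive(x, w):
    """Valid convolution, stride 1. x: [C_in][H][W], w: [C_out][C_in][K][K]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for ci in range(c_in):
            for r in range(oh):
                for c in range(ow):
                    for i in range(k):
                        for j in range(k):
                            y[co][r][c] += w[co][ci][i][j] * x[ci][r + i][c + j]
    return y


def conv_tiled(x, w, tile_co=2, tile_ci=2):
    """Same arithmetic with the channel loops tiled and moved innermost."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co0 in range(0, c_out, tile_co):          # tile over output channels
        for ci0 in range(0, c_in, tile_ci):       # tile over input channels
            for r in range(oh):
                for c in range(ow):
                    for i in range(k):
                        for j in range(k):
                            # In hardware, these two loops are the unroll
                            # targets: up to tile_co * tile_ci MACs per step.
                            for co in range(co0, min(co0 + tile_co, c_out)):
                                for ci in range(ci0, min(ci0 + tile_ci, c_in)):
                                    y[co][r][c] += (
                                        w[co][ci][i][j] * x[ci][r + i][c + j]
                                    )
    return y


# Small deterministic integer test data (hypothetical shapes: 3 channels, 4x4 input, 3x3 kernel).
x = [[[(ci * 7 + r * 3 + c) % 5 - 2 for c in range(4)] for r in range(4)]
     for ci in range(3)]
w = [[[[(co + 2 * ci + i - j) % 3 - 1 for j in range(3)] for i in range(3)]
      for ci in range(3)] for co in range(3)]
same = conv_tiled(x, w) == conv_naive(x, w)
```

With three channels and a tile size of two, the last tile is partial, so the `min(...)` bounds also exercise the remainder case. Array partitioning has no direct software analogue here; in hardware it would split the weight and feature-map buffers so that every unrolled multiply-accumulate unit can read its operands in the same cycle.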
This article is indexed in Wanfang Data and other databases.