FPGA-accelerated Low-power MobileNetV2 Network Identification System
Citation: Sun Xiaojian, Lin Ruiquan, Fang Ziqing, Ma Chi. FPGA-accelerated low-power MobileNetV2 network identification system [J]. Computer Measurement & Control, 2023, 31(5): 221-227.
Authors: Sun Xiaojian, Lin Ruiquan, Fang Ziqing, Ma Chi
Affiliation: College of Electrical Engineering and Automation, Fuzhou University
Abstract: In recent years, convolutional neural networks have been widely applied in fields such as image recognition, speech recognition and translation, and autonomous driving thanks to their excellent performance. However, traditional convolutional neural networks (CNNs) have many parameters and a heavy computational load, and suffer from slow inference and high power consumption when deployed on CPUs and GPUs. To address these problems, quantization-aware training (QAT) is used to compress the total number of network parameters to 1/4 of the original while preserving image-classification accuracy. All network weights are stored in the FPGA's on-chip resources, overcoming the off-chip memory bandwidth limitation and reducing the power consumed by off-chip memory accesses. A cooperative pipeline structure is proposed within each layer of the MobileNetV2 network and between adjacent pointwise-convolution layers, greatly improving the network's real-time performance. A memory and data-access optimization strategy is proposed that adjusts the storage layout and read order of the data according to the degree of parallelism, further saving on-chip BRAM resources. Finally, a lightweight MobileNetV2 convolutional-neural-network recognition system with high performance and low power consumption is implemented on a Xilinx Virtex-7 VC707 development board; at a 200 MHz clock it achieves a throughput of 170.06 GOP/s with a power consumption of only 6.13 W, for an energy efficiency of 27.74 GOP/s/W, 92 times that of a CPU and 25 times that of a GPU, a clear advantage over other implementations.

Keywords: hardware acceleration  quantization aware training  MobileNet  parallel computing  pipeline structure
Received: 2022-10-13
Revised: 2022-12-26

FPGA-accelerated Low-power MobileNetV2 Network Identification System
Abstract: In recent years, convolutional neural networks have been widely used in various fields, such as image recognition, speech recognition and translation, and autonomous driving, owing to their excellent performance. However, traditional convolutional neural networks (CNNs) have many parameters and a heavy computational load, and exhibit slow inference and high power consumption when deployed on CPUs and GPUs. To address these problems, quantization-aware training (QAT) is used to compress the total number of network parameters to 1/4 of the original network while maintaining image-classification accuracy. All network weights are deployed in the FPGA's on-chip resources, which overcomes the off-chip memory bandwidth limitation and reduces the power consumed by off-chip memory accesses. A cooperative pipeline structure is proposed within the layers of the MobileNetV2 network and between adjacent pointwise-convolution layers, greatly improving the real-time performance of the network. An optimization strategy for memory and data reading is proposed that adjusts the data storage arrangement and read order according to the degree of parallelism, further saving on-chip BRAM resources. Finally, a lightweight MobileNetV2 convolutional-neural-network recognition system with excellent performance and low power consumption was implemented on a Xilinx Virtex-7 VC707 development board. At a 200 MHz clock it achieves a throughput of 170.06 GOP/s with a power consumption of only 6.13 W and an energy efficiency of 27.74 GOP/s/W, 92 times that of a CPU and 25 times that of a GPU, a clear advantage over other implementations.
Keywords:hardware acceleration  quantization aware training  MobileNet  parallel computing  pipeline structure
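As a rough illustration of the abstract's 1/4 compression figure (not the authors' QAT procedure, which quantizes during training), the storage arithmetic of symmetric 8-bit quantization can be sketched in NumPy: each fp32 weight (4 bytes) becomes one int8 value (1 byte) plus a single shared fp32 scale per tensor.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: fp32 weights -> int8 codes + one fp32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp32 weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)  # example weight tensor
q, scale = quantize_int8(w)

# Storage: 4 bytes/weight (fp32) vs 1 byte/weight (int8) -> 1/4 the size.
print(w.nbytes / q.nbytes)  # 4.0
# Round-trip error is bounded by one quantization step.
print(np.abs(w - dequantize(q, scale)).max() < scale)  # True
```

QAT improves on this post-hoc scheme by simulating the quantization error during training so the network learns weights that tolerate it, which is how the paper keeps classification accuracy at 1/4 the parameter storage.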