A scalable and efficient convolutional neural network accelerator using HLS for a system-on-chip design
Affiliation:1. Università degli Studi di Cagliari, Italy;2. Università degli Studi dell'Aquila, Italy;3. Università degli Studi di Sassari, Italy;4. Thales Alenia Space España, Spain;5. Philips, Netherlands;6. Eindhoven University of Technology, Netherlands;7. Institute of Information Theory and Automation, Czech Republic;8. Tampere University, Finland;9. Charles University, Czechia;10. Universidad de Granada, Spain;11. Nokia, Finland;12. University of Turku, Finland;13. Abinsula, Italy;14. Instituto Tecnológico de Informática, Spain;15. Camea, Czech Republic;16. Seven Solutions, Spain;17. Delft University of Technology, Netherlands
Abstract:This paper presents a configurable convolutional neural network accelerator (CNNA) for a system-on-chip (SoC). The goal is to accelerate inference of different deep learning networks on an embedded SoC platform. The presented CNNA has a scalable architecture that uses high-level synthesis (HLS) and SystemC for the hardware accelerator. It can accelerate any convolutional neural network (CNN) exported from Keras in Python and supports any combination of convolutional, max-pooling, and fully connected layers. A training method with fixed-point quantised weights is proposed and presented in the paper. The CNNA is template-based, enabling it to scale to different targets of the Xilinx Zynq platform. This approach enables design space exploration: several configurations of the CNNA can be evaluated during C and RTL simulation to fit the accelerator to the desired platform and model. The VGG16 CNN was used to test the solution on a Xilinx Ultra96 board using the Python productivity for Zynq (PYNQ) framework. Training with an auto-scaled fixed-point Q2.14 format achieved accuracy close to that of a comparable floating-point model. The accelerator performed inference in 2.0 s with an average power consumption of 2.63 W, corresponding to a power efficiency of 6.0 GOPS/W.
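The Q2.14 format mentioned in the abstract packs each weight into a signed 16-bit word: one sign bit, one integer bit, and 14 fractional bits, giving a representable range of [-2.0, 2.0 - 2^-14] with a step of 2^-14. A minimal sketch of such a quantisation step (not the paper's actual implementation; function names are illustrative):

```python
import numpy as np

FRAC_BITS = 14          # Q2.14: 1 sign bit, 1 integer bit, 14 fractional bits
SCALE = 1 << FRAC_BITS  # 2^14 = 16384

def quantise_q2_14(w):
    """Quantise float weights to signed 16-bit Q2.14 fixed point with saturation."""
    w_q = np.round(np.asarray(w, dtype=np.float64) * SCALE)
    # Saturate to the int16 range, i.e. real values in [-2.0, 2.0 - 2^-14]
    w_q = np.clip(w_q, -(1 << 15), (1 << 15) - 1)
    return w_q.astype(np.int16)

def dequantise_q2_14(w_q):
    """Recover the real value represented by a Q2.14 word."""
    return np.asarray(w_q, dtype=np.float64) / SCALE

# Example: values inside the range round-trip exactly at 2^-14 resolution;
# values at or beyond the range saturate.
weights = np.array([0.5, -1.25, 1.99993896484375, -2.5])
q = quantise_q2_14(weights)
recovered = dequantise_q2_14(q)
```

As a sanity check on the reported figures: 6.0 GOPS/W at 2.63 W is about 15.8 GOPS, which over the 2.0 s inference time corresponds to roughly 31 GOP, consistent with the workload of a VGG16 forward pass.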
Keywords:System-on-chip  FPGA  High-level synthesis  Convolutional neural network  PYNQ
Indexed in ScienceDirect and other databases.