

A fast and scalable architecture to run convolutional neural networks in low density FPGAs
Affiliation: 1. INESC-ID, ISEL, Instituto Politécnico de Lisboa, Portugal; 2. INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Abstract: Deep learning and, in particular, convolutional neural networks (CNN) achieve very good results on several computer vision applications, such as security and surveillance, where image and video analysis are required. These networks are quite demanding in terms of computation and memory and are therefore usually implemented on high-performance computing platforms or devices. Running CNNs on embedded platforms or devices with low computational and memory resources requires careful optimization of system architectures and algorithms to obtain very efficient designs. In this context, Field Programmable Gate Arrays (FPGA) can achieve this efficiency, since the programmable hardware fabric can be tailored to each specific network. In this paper, a very efficient configurable architecture for CNN inference targeting FPGAs of any density is described. The architecture uses fixed-point arithmetic and image batching to reduce computation, memory, and memory bandwidth requirements without compromising network accuracy. The developed architecture supports the execution of large CNNs on any FPGA device, including those with small on-chip memory and limited logic resources. With the proposed architecture, it is possible to infer an image with AlexNet in 4.3 ms on a ZYNQ7020 and in 1.2 ms on a ZYNQ7045.
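The fixed-point arithmetic mentioned in the abstract can be illustrated with a minimal sketch. The word width and fractional-bit split below (8-bit words, 6 fractional bits) are assumptions for illustration only; the paper's actual quantization format is not specified here.

```python
import numpy as np

def quantize(x, frac_bits=6, word_bits=8):
    """Map a float array to signed fixed-point with the given word width.

    Values are scaled by 2**frac_bits, rounded, and saturated to the
    representable range of a word_bits-wide two's-complement integer.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def dequantize(x, frac_bits=6):
    """Convert fixed-point values back to floats."""
    return x.astype(np.float64) / (1 << frac_bits)

def fixed_point_conv1d(act_q, w_q, frac_bits=6):
    """1-D convolution using only integer multiply-accumulates.

    Each product carries 2 * frac_bits fractional bits, so the
    accumulator is shifted right to return to frac_bits fractional bits,
    mirroring what a fixed-point FPGA datapath would do.
    """
    acc = np.convolve(act_q, w_q, mode="valid")  # integer MACs
    return acc >> frac_bits

# Compare against a floating-point reference convolution.
rng = np.random.default_rng(0)
act = rng.uniform(-1, 1, 32)
w = rng.uniform(-1, 1, 5)
ref = np.convolve(act, w, mode="valid")
out = dequantize(fixed_point_conv1d(quantize(act), quantize(w)))
print(np.max(np.abs(out - ref)))  # small quantization error
```

The same scale-round-saturate scheme extends to 2-D convolutions; the key point the paper exploits is that narrow integer MACs cost far fewer FPGA resources than floating-point units, while the accumulated quantization error stays small enough to preserve network accuracy.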
This article is indexed in ScienceDirect and other databases.