Design and scheduling of convolutional neural network accelerator for cloud FPGA

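Cite this article:	Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye, Chen Yao. Design and scheduling of convolutional neural network accelerator for cloud FPGA[J]. Application Research of Computers, 2020, 37(1): 172-177, 182.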

Authors:	Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye, Chen Yao

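Affiliations:	College of Computer Science, Guangdong University of Technology, Guangzhou 510006; College of Computer and Control Engineering, Nankai University, Tianjin 300350; College of Computer Science, Guangdong University of Technology, Guangzhou 510006; Advanced Digital Sciences Center, Singapore 138602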

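Abstract:	The high computational complexity of convolutional neural networks (CNNs) hinders their wide use in real-time and low-power applications. Existing software implementations struggle to meet their requirements for computing performance and power consumption, and traditional FPGA-oriented approaches to building CNNs suffer from a complicated workflow, long development cycles, and limited room for optimization. To address these problems, and based on the characteristics of the CNN computation pattern, this paper proposes a design and scheduling mechanism for a CNN accelerator targeting cloud FPGAs. Drawing on HLS-based designs, introducing loop tiling parameters, and reordering the loops of the convolutional layer, the design builds the network in a modular way and extends the parameters to further optimize the accelerator's processing. A scheduling scheme is derived by analyzing the characteristics of system tasks and resources, and is optimized in terms of both control flow and data flow. Compared with existing work, the proposed design offers a solution that combines flexibility, low energy consumption, high energy efficiency, and high performance, and it also explores an efficient, general scheduling scheme for the accelerator. Experimental results show that the accelerator effectively improves overall computing speed while reducing power consumption.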
Keywords:	convolutional neural network; field programmable gate array; high-level synthesis; accelerator; scheduling
Received:	2018/5/22
Revised:	2018/8/2
Design and scheduling of convolutional neural network accelerator for cloud FPGA

Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye and Chen Yao. Design and scheduling of convolutional neural network accelerator for cloud FPGA[J]. Application Research of Computers, 2020, 37(1): 172-177, 182.

Authors:	Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye and Chen Yao

Affiliation:	College of Computer Science, Guangdong University of Technology

Abstract:	The high computational complexity of convolutional neural networks (CNNs) often obstructs their widespread adoption in real-time and low-power applications. Existing software implementations cannot meet the demands of CNNs for computing performance and power consumption. The traditional FPGA-oriented CNN construction method has problems such as a complicated process, a long development cycle and a small optimization space. To address these problems, and according to the characteristics of the CNN calculation pattern, this paper proposed a design and scheduling mechanism for a CNN accelerator for cloud FPGAs. Drawing on HLS-based designs, introducing loop tiling parameters and reordering the loops of the convolution layer, it constructed the network in a modular way and extended the parameters to further optimize the accelerator's processing. It derived the scheduling scheme by analyzing the characteristics of system tasks and resources, and optimized the scheme in terms of both control flow and data flow. In comparison with existing works, the proposed design provided a solution with flexibility, low energy consumption, high energy efficiency and high performance. The paper also discussed an efficient, general scheduling scheme for the accelerator. Experimental results show that the accelerator can improve the overall computing speed while reducing power consumption.
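
The abstract names loop tiling parameters and loop reordering in the convolution layer without giving details. The following is only a minimal sketch of what those two transformations generally look like, not the authors' implementation: the function names, the tile sizes `Tm` and `Tc`, and the chosen loop order are all assumptions made for illustration, and Python stands in for the HLS C/C++ that such a design would normally use.

```python
def conv_naive(x, w):
    """Reference convolution, stride 1, no padding.
    x is indexed [C][H][W]; w is indexed [M][C][K][K]."""
    M, C, K = len(w), len(x), len(w[0][0])
    R, S = len(x[0]) - K + 1, len(x[0][0]) - K + 1
    out = [[[0] * S for _ in range(R)] for _ in range(M)]
    for m in range(M):                      # output channels
        for r in range(R):                  # output rows
            for s in range(S):              # output columns
                for c in range(C):          # input channels
                    for i in range(K):      # kernel rows
                        for j in range(K):  # kernel columns
                            out[m][r][s] += w[m][c][i][j] * x[c][r + i][s + j]
    return out


def conv_tiled(x, w, Tm=2, Tc=2):
    """Same arithmetic as conv_naive, with two changes:
    - the channel loops are tiled by the (hypothetical) parameters Tm and Tc;
    - the loops are reordered so the two tile loops are innermost, which is
      the part an HLS tool could unroll into parallel multiply-accumulates."""
    M, C, K = len(w), len(x), len(w[0][0])
    R, S = len(x[0]) - K + 1, len(x[0][0]) - K + 1
    out = [[[0] * S for _ in range(R)] for _ in range(M)]
    for m0 in range(0, M, Tm):              # tile over output channels
        for c0 in range(0, C, Tc):          # tile over input channels
            for i in range(K):
                for j in range(K):
                    for r in range(R):
                        for s in range(S):
                            # Innermost tile: at most Tm * Tc independent MACs.
                            for m in range(m0, min(m0 + Tm, M)):
                                for c in range(c0, min(c0 + Tc, C)):
                                    out[m][r][s] += w[m][c][i][j] * x[c][r + i][s + j]
    return out
```

Because reordering and tiling only change the order in which the same products are accumulated, both functions return identical results on integer inputs; the tile sizes would in practice be chosen to fit the FPGA's on-chip resources.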

Keywords:	convolutional neural network (CNN); field programmable gate array; high-level synthesis; accelerator; scheduling
This article is indexed in Wanfang Data and other databases.