
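Design and scheduling of a convolutional neural network accelerator for cloud FPGAs
Cite this article: Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye, Chen Yao. Design and scheduling of convolutional neural network accelerator for cloud FPGA[J]. Application Research of Computers, 2020, 37(1): 172-177, 182.
Authors: Cai Ruichu  Yu Yang  Zhong Chunrong  Lu Ye  Chen Yao
Affiliations: College of Computer Science, Guangdong University of Technology, Guangzhou 510006; College of Computer and Control Engineering, Nankai University, Tianjin 300350; College of Computer Science, Guangdong University of Technology, Guangzhou 510006; Advanced Digital Sciences Center, Singapore 138602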
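Abstract: The high computational complexity of convolutional neural networks (CNNs) hinders their wide use in real-time and low-power applications. Existing software implementations struggle to meet their requirements for computing performance and power consumption, and traditional approaches to building CNNs on FPGAs suffer from complicated workflows, long development cycles, and limited room for optimization. To address these problems, and based on the characteristics of the CNN computation pattern, this paper proposes the design of a CNN accelerator for cloud FPGAs together with its scheduling mechanism. Drawing on designs that are based on HLS technology, introduce loop tiling parameters, and reorder the convolutional-layer loops, the network is built in a modular way, and the parameters are extended to further optimize the accelerator's processing. The scheduling scheme is derived by analyzing the characteristics of the system's tasks and resources, and is optimized in terms of both control flow and data flow. Compared with existing work, the proposed design offers a solution that combines flexibility, low energy consumption, high energy efficiency, and high performance, and it also explores an efficient, general scheduling scheme for the accelerator. Experimental results show that the accelerator effectively improves overall computing speed while reducing power consumption.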

Keywords: convolutional neural network  field-programmable gate array  high-level synthesis  accelerator  scheduling
Received: 2018/5/22
Revised: 2018/8/2

Design and scheduling of convolutional neural network accelerator for cloud FPGA
Cai Ruichu, Yu Yang, Zhong Chunrong, Lu Ye, Chen Yao. Design and scheduling of convolutional neural network accelerator for cloud FPGA[J]. Application Research of Computers, 2020, 37(1): 172-177, 182.
Authors: Cai Ruichu  Yu Yang  Zhong Chunrong  Lu Ye  Chen Yao
Affiliation: College of Computer Science, Guangdong University of Technology
Abstract: The high computational complexity of convolutional neural networks (CNNs) often obstructs their widespread adoption in real-time and low-power applications. Existing software implementations cannot meet the demands of CNNs for computing performance and power consumption, and the traditional FPGA-oriented way of constructing CNNs suffers from a complicated process, a long development cycle, and limited room for optimization. To address these problems, and according to the characteristics of the CNN computation pattern, this paper proposes a design and scheduling mechanism for a CNN accelerator targeting cloud FPGAs. Drawing on HLS-based designs that introduce loop tiling parameters and reorder the loops of the convolutional layer, it constructs the network in a modular way and extends the parameters to further optimize the accelerator's processing. It derives the scheduling scheme by analyzing the characteristics of system tasks and resources, and optimizes the design in terms of both control flow and data flow. Compared with existing work, the proposed design provides a solution that combines flexibility, low energy consumption, high energy efficiency, and high performance. The paper also discusses an efficient, general scheduling scheme for the accelerator. Experimental results show that the accelerator improves computing speed while reducing power consumption.
Keywords: convolutional neural network (CNN)  field-programmable gate array  high-level synthesis  accelerator  scheduling
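
The abstract mentions loop tiling parameters and a reordering of the convolutional-layer loops but gives no code. The following is a minimal sketch of what those two transformations generally look like, written in plain Python rather than HLS C/C++ so that it can be run directly. The function names, the tile sizes `Tm` and `Tn`, and the particular loop order are all assumptions chosen for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: names, tile sizes, and loop order are assumptions,
# not the design described in the paper.

def conv_naive(inp, w, M, N, R, C, K):
    """Reference convolution, stride 1, no padding.

    inp: N input channels, each (R+K-1) x (C+K-1)
    w:   M x N kernels, each K x K
    Returns M output channels, each R x C.
    """
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m in range(M):                      # output channels
        for n in range(N):                  # input channels
            for r in range(R):              # output rows
                for c in range(C):          # output columns
                    for i in range(K):      # kernel rows
                        for j in range(K):  # kernel columns
                            out[m][r][c] += w[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, w, M, N, R, C, K, Tm=2, Tn=2):
    """Same computation with the channel loops tiled and the loops reordered.

    The channel loops are split into tiles of Tm output and Tn input channels
    (hypothetical tiling parameters). The kernel and pixel loops are moved
    outward so that the two innermost loops cover one Tm x Tn block of
    multiply-accumulates, the kind of loop body that HLS tools are commonly
    asked to unroll.
    """
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m0 in range(0, M, Tm):              # tile over output channels
        for n0 in range(0, N, Tn):          # tile over input channels
            for i in range(K):
                for j in range(K):
                    for r in range(R):
                        for c in range(C):
                            # Innermost Tm x Tn block; min() handles tiles
                            # that run past the channel count.
                            for m in range(m0, min(m0 + Tm, M)):
                                for n in range(n0, min(n0 + Tn, N)):
                                    out[m][r][c] += (
                                        w[m][n][i][j] * inp[n][r + i][c + j]
                                    )
    return out
```

Both functions compute the same sums in a different order, so with integer inputs they return identical results. In a hardware setting the tile sizes would trade on-chip resources against parallelism, which is presumably why they are exposed as parameters, though the paper's actual parameter choices are not given on this page.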
This article is indexed in Wanfang Data and other databases.