
Configurable Convolutional Neural Network Accelerator Design Based on Multi-view Parallelism
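Cite this article:YING Sancong,PENG Ling.Configurable Convolutional Neural Network Accelerator Design Based on Multi-view Parallelism[J].Journal of Sichuan University (Engineering Science Edition),2022,54(2):188-195.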
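Authors:YING Sancong  PENG Ling
Affiliation:College of Computer Science, Sichuan University; National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University
Funding:Sichuan Province Major Science and Technology Special Project (2018GZDZX0024); Sichuan Province Science and Technology Program (2020YFG0288)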
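Abstract:To address the high license fees of commercial CPUs and the need to improve the performance of convolutional neural networks (CNNs), this paper proposes a configurable CNN accelerator design based on multi-view parallelism and builds a system on chip (SoC) for the accelerator around RISC-V. First, a set of control access and data access interfaces suited to high-speed co-accelerators is extended. Second, each CNN operation unit is implemented with multi-view parallelism and structure reuse. Different combinations of view parallelism affect the hardware circuit structure of the convolution unit, so multi-view parallelism is achieved by reusing a basic arithmetic structure. The pooling unit consists of row-pooling and column-pooling subunits, which share the row-pooling arithmetic structure. For the fully connected unit, the parameters of the fully connected operation are adjusted to fit the hardware structure of the convolution unit, which enables reuse across models. Then, a separate register group is designed for the hardware structure of each operation unit, and multiple network models are implemented in combination with an open-source RISC-V processor. Finally, the CNN operation units are deployed on different platforms to measure execution time, throughput, and speed. Experimental results show that the convolution speedup of the proposed method is 189 times relative to a CPU, and the convolution throughput for VGG16 reaches 178 GOP/s. Multi-view parallelism therefore delivers acceleration, and different network models can be realized by configuring registers.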
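The row/column split of the pooling unit can be illustrated in software. The sketch below is only an illustration, not the authors' hardware: it assumes max pooling with a square window of size `k` and stride `k`, and the function names (`row_pool`, `max_pool_2d`, `max_pool_2d_direct`) are hypothetical. It shows that a 2-D max pool can be computed as a row pass followed by a column pass, with the column pass reusing the same row-pooling routine on the transposed data.

```python
def row_pool(matrix, k):
    """Max-pool every row with a 1-D window of width k and stride k."""
    return [[max(row[j:j + k]) for j in range(0, len(row) - k + 1, k)]
            for row in matrix]


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*matrix)]


def max_pool_2d(matrix, k):
    """2-D max pooling built from two passes of the same row_pool routine."""
    pooled_rows = row_pool(matrix, k)                  # row pass
    pooled_cols = row_pool(transpose(pooled_rows), k)  # column pass, same routine
    return transpose(pooled_cols)


def max_pool_2d_direct(matrix, k):
    """Reference: take the max over each k x k window directly."""
    h, w = len(matrix), len(matrix[0])
    return [[max(matrix[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 5, 6],
]
print(max_pool_2d(feature_map, 2))  # [[6, 4], [7, 9]]
```

The decomposition works because the maximum over a `k x k` window equals the maximum of the per-row maxima, so one 1-D structure can serve both passes.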

Keywords:Convolutional neural network  Multi-view parallelism  Configurable  System on chip  Reuse  RISC-V  VGG16
Received:2021/4/10
Revised:2021/12/17

Configurable Convolutional Neural Network Accelerator Based on Multi-view Parallelism
YING Sancong,PENG Ling.Configurable Convolutional Neural Network Accelerator Based on Multi-view Parallelism[J].Journal of Sichuan University (Engineering Science Edition),2022,54(2):188-195.
Authors:YING Sancong  PENG Ling
Affiliation:College of Computer Sci., Sichuan Univ., Chengdu 610065, China; National Key Lab. of Fundamental Sci. on Synthetic Vision, Sichuan Univ., Chengdu 610065, China
Abstract:To address the high license fees of commercial CPUs and the need to improve the performance of convolutional neural networks (CNNs), a configurable CNN accelerator based on multi-view parallelism was proposed, and a system on chip (SoC) was built around it with RISC-V. Firstly, a set of interfaces comprising a control access bus and a data access bus was extended for high-speed co-accelerators. Secondly, each CNN operation unit was implemented with both multi-view parallelism and structure reuse. Because different combinations of view parallelism affect the hardware circuit structure of the convolution unit, multi-view parallelism was accomplished by reusing a basic arithmetic structure. The pooling unit was composed of row-pooling and column-pooling submodules, which shared the row-pooling arithmetic structure. For the fully connected unit, the parameters of the fully connected operation were adjusted to fit the hardware structure of the convolution unit, thereby enabling reuse between models. Then, separate register groups were designed for the hardware structures of the different operation units and combined with an open-source RISC-V processor to realize multiple CNN models. Finally, the operation units were deployed on different platforms to measure latency, throughput and speedup ratio. The experimental results demonstrated that the convolution speedup of the proposed method was 189 times relative to the CPU, and that the convolution throughput reached 178 GOP/s for VGG16. Hence, multi-view parallelism achieved an acceleration effect, and different CNN models could be implemented by configuring registers.
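The abstract does not spell out how the fully connected parameters are adjusted to fit the convolution unit. One common mapping, shown here purely as an assumption and not as the authors' method, reshapes each weight row into a kernel the same size as the input feature map, so that a stride-1, unpadded convolution yields a 1 x 1 output per kernel. All function names below are hypothetical.

```python
def conv2d_valid(inputs, kernels):
    """Stride-1, unpadded multi-channel convolution (cross-correlation form).

    inputs:  [C][H][W] nested lists
    kernels: [N][C][KH][KW] nested lists
    returns: [N][H-KH+1][W-KW+1]
    """
    c, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    out = []
    for kern in kernels:
        kh, kw = len(kern[0]), len(kern[0][0])
        plane = [[sum(inputs[ch][i + di][j + dj] * kern[ch][di][dj]
                      for ch in range(c)
                      for di in range(kh)
                      for dj in range(kw))
                  for j in range(w - kw + 1)]
                 for i in range(h - kh + 1)]
        out.append(plane)
    return out


def fully_connected(vector, weights):
    """Reference fully connected layer: one dot product per weight row."""
    return [sum(x * wt for x, wt in zip(vector, row)) for row in weights]


def fc_as_conv(inputs, weights):
    """Run a fully connected layer on the convolution routine.

    Each weight row (length C*H*W, channel-major order) is reshaped into a
    C x H x W kernel, so the convolution output per kernel is a single value.
    """
    c, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    kernels = [[[row[(ch * h + i) * w:(ch * h + i) * w + w] for i in range(h)]
                for ch in range(c)]
               for row in weights]
    return [plane[0][0] for plane in conv2d_valid(inputs, kernels)]


inputs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # C=2, H=2, W=2
weights = [
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, -1, 0, 1, 0, -1],
]
flat = [v for channel in inputs for row in channel for v in row]
print(fc_as_conv(inputs, weights))     # [9, 36, -4]
print(fully_connected(flat, weights))  # [9, 36, -4]
```

Under this mapping the convolution datapath needs no structural change; only the kernel dimensions and weight layout differ, which is consistent with the idea of switching models through configuration.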
Keywords:Convolutional neural network  Multi-view parallelism  Configurable  Reuse  RISC-V  VGG16