High performance reconfigurable accelerator for deep convolutional neural networks

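Cite this article:	QIAO Ruixiu, CHEN Gang, GONG Guoliang, LU Huaxiang. High performance reconfigurable accelerator for deep convolutional neural networks[J]. Journal of Xidian University (西安电子科技大学学报), 2019, 46(3): 130-139.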

Authors:	乔瑞秀 (QIAO Ruixiu), 陈刚 (CHEN Gang), 龚国良 (GONG Guoliang), 鲁华祥 (LU Huaxiang)

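Affiliations:	1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; 2. University of the Chinese Academy of Sciences, Beijing 100049, China; 3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; 4. Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China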

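Funding:	Strategic Priority Research Program of the Chinese Academy of Sciences (Category A), Superconducting Computer Research and Development (XDA18000000); Beijing Municipal Science and Technology Project (Z181100001518006); National Natural Science Foundation of China, Young Scientists Fund (61701473); National Natural Science Foundation of China, Young Scientists Fund (61401423); Science and Technology Service Network Initiative (STS) of the Chinese Academy of Sciences (KFJ-STS-ZDTP-070); National Defense Science and Technology Innovation Fund of the Chinese Academy of Sciences (CXJJ-17-M152)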

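Abstract:	Because the convolutional layers of deep convolutional neural networks vary widely in channel count and kernel size, existing accelerators struggle to compute efficiently across this diversity. To address this, a deep convolutional neural network accelerator based on the neuron mechanisms of the biological brain is proposed. The accelerator offers multiple clustering schemes and link organizations for its brain-like neuron circuits, so it can handle different channel sizes. Three convolution mappings are designed to handle different kernel sizes. Data in local memory is reused efficiently, which greatly reduces data movement and improves computing performance. Tested on an object classification network and an object detection network, the accelerator reaches 498.6×10⁹ and 571.3×10⁹ operations per second, with energy efficiencies of 582.0×10⁹ and 651.7×10⁹ operations per second per watt, respectively.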
Keywords:	deep neural networks; accelerator; reconfigurable architecture; high performance; very large scale integrated circuit
Received:	2019-02-14
High performance reconfigurable accelerator for deep convolutional neural networks

QIAO Ruixiu, CHEN Gang, GONG Guoliang, LU Huaxiang. High performance reconfigurable accelerator for deep convolutional neural networks[J]. Journal of Xidian University, 2019, 46(3): 130-139.

Authors:	QIAO Ruixiu, CHEN Gang, GONG Guoliang, LU Huaxiang

Affiliation:	1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; 2. University of the Chinese Academy of Sciences, Beijing 100049, China; 3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; 4. Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China

Abstract:	In deep convolutional neural networks, the diversity of channel sizes and kernel sizes makes it difficult for existing accelerators to compute efficiently. To address this, a deep convolutional neural network accelerator based on the neuron mechanism of the biological brain is proposed. It provides multiple clustering methods and link organizations for its brain-like neurons to handle different channel sizes, and three mapping methods to handle different convolution kernel sizes. The accelerator reuses local memory data efficiently, which greatly reduces data movement and improves computing performance. Tested on an object classification network and an object detection network, the accelerator achieves 498.6 GOPS and 571.3 GOPS, with energy efficiencies of 582.0 GOPS/W and 651.7 GOPS/W, respectively.
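
The throughput and energy-efficiency figures in the abstract can be related by simple arithmetic: dividing GOPS by GOPS/W gives an implied power draw. The sketch below does only that division. It assumes each pair of figures was measured under the same conditions, which the abstract does not state, and the function name `implied_power_watts` is introduced here purely for illustration.

```python
# Figures quoted in the abstract: (throughput in GOPS, energy efficiency in GOPS/W).
REPORTED = {
    "object classification": (498.6, 582.0),
    "object detection": (571.3, 651.7),
}


def implied_power_watts(gops: float, gops_per_watt: float) -> float:
    """Power implied by dividing throughput by energy efficiency.

    Assumption: both numbers describe the same run, so the ratio is meaningful.
    """
    return gops / gops_per_watt


if __name__ == "__main__":
    for task, (gops, efficiency) in REPORTED.items():
        watts = implied_power_watts(gops, efficiency)
        print(f"{task}: about {watts:.2f} W")
```

Under that assumption the two workloads imply roughly 0.86 W and 0.88 W. These values are derived here and are not reported in the abstract.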

Keywords:	deep neural networks; accelerator; reconfigurable architecture; high performance; very large scale integrated circuit
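
The abstract's claim that reusing data held in local memory reduces data movement can be illustrated with a generic count of input fetches for one 2D convolution. This is not the accelerator's actual dataflow, which the abstract does not describe. It is a minimal sketch assuming a single channel, stride 1, no padding, and an idealized local buffer that keeps every input value after its first fetch. Both function names are hypothetical.

```python
def fetches_without_reuse(height: int, width: int, kernel: int) -> int:
    """Input reads if every output pixel re-reads its full kernel window.

    Assumes stride 1 and no padding (illustrative simplification).
    """
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    return out_h * out_w * kernel * kernel


def fetches_with_full_reuse(height: int, width: int) -> int:
    """Input reads if each input value is fetched once and then reused
    from an idealized local buffer."""
    return height * width


if __name__ == "__main__":
    h, w, k = 224, 224, 3
    naive = fetches_without_reuse(h, w, k)
    reused = fetches_with_full_reuse(h, w)
    print(f"without reuse: {naive} reads")
    print(f"with reuse:    {reused} reads")
    print(f"reduction:     {naive / reused:.2f}x")
```

For a 224×224 input and a 3×3 kernel the counts are 443556 and 50176 reads, a reduction of about 8.8 times. The ratio approaches the kernel area as the input grows, which is why larger kernels benefit more from local reuse in this simplified model.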
