CNN data quantization method based on reconfigurable array
Cite this article: ZHU Jiayang, JIANG Lin, LI Yuancheng, SONG Jia, LIU Shuai. CNN data quantization method based on reconfigurable array[J]. Application Research of Computers, 2024, 41(4): 1070-1076.
Authors: ZHU Jiayang  JIANG Lin  LI Yuancheng  SONG Jia  LIU Shuai
Affiliations: 1. College of Communication and Information Engineering, Xi'an University of Science and Technology; 2. College of Computer Science and Technology, Xi'an University of Science and Technology; 3. College of Electrical and Control Engineering, Xi'an University of Science and Technology
Funding: Key Program of the National Natural Science Foundation of China (61834005)
Abstract: The large number of convolution operations in convolutional neural network (CNN) models greatly increases network size, making the models difficult to deploy on embedded hardware platforms; in addition, mismatches between data of different granularities and the underlying hardware structure lead to low computing efficiency. To address these problems, this paper targeted the project group's reconfigurable array processor, whose processing elements support multiple bit widths. Using software-hardware co-design and reconfigurable computing, the method selected a custom quantization threshold via KL (Kullback-Leibler) divergence and applied stochastic rounding for truncation, so as to find the best binary-point position for fixed-length parameters; it also designed instructions supporting parallel operation at multiple computing granularities, together with a convolution mapping scheme, thereby implementing dynamic data quantization at three different bit widths. Experimental results show that quantizing both weights and feature maps to 8 bit compresses the model to about 50% of its original size with a 2% accuracy loss. Hardware tests with test images quantized to the three bit widths achieve speedups of 1.012, 1.273, and 1.556, respectively, shortening execution time by up to 35.7% and reducing memory accesses by up to 56.2%, while introducing a relative error of less than 1%. These results indicate that the method enables efficient neural network computation at all three quantization bit widths, achieving both hardware acceleration and model compression.
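The two quantization steps named in the abstract, a KL-divergence threshold search followed by fixed-point conversion with stochastic rounding, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (a symmetric signed 8-bit format, the bin counts, and all function names are the sketch's own choices), not the authors' hardware implementation:

```python
import numpy as np

def kl_threshold(activations, num_bins=2048, target_bins=128):
    """Pick a saturation threshold that minimizes the KL divergence between
    the |activation| histogram and its coarsely requantized counterpart
    (entropy-calibration-style threshold search)."""
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_t, best_kl = edges[-1], np.inf
    for i in range(target_bins, num_bins + 1):
        # Reference distribution: counts above bin i are clipped into bin i-1.
        p = hist[:i].astype(np.float64)
        p[i - 1] += hist[i:].sum()
        # Candidate: requantize the first i bins down to target_bins levels,
        # spreading each level's mass uniformly over its nonzero source bins.
        q = np.zeros(i)
        step = i / target_bins
        for j in range(target_bins):
            lo, hi = int(j * step), int((j + 1) * step)
            chunk = hist[lo:hi]
            nz = np.count_nonzero(chunk)
            if nz:
                q[lo:hi] = np.where(chunk > 0, chunk.sum() / nz, 0.0)
        p /= p.sum()
        qs = q.sum()
        if qs == 0:
            continue
        q /= qs
        mask = p > 0
        kl = np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], 1e-12)))
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t

def quantize_stochastic(x, threshold, bits=8, rng=None):
    """Symmetric fixed-point quantization with stochastic rounding:
    round up with probability equal to the fractional part."""
    rng = rng or np.random.default_rng(0)
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    y = np.clip(x, -threshold, threshold) / scale
    floor = np.floor(y)
    y = floor + (rng.random(y.shape) < (y - floor))
    return np.clip(y, -qmax - 1, qmax).astype(np.int8), scale
```

Stochastic rounding keeps the quantization error unbiased in expectation, which is why it is preferred over truncation when repeatedly accumulating low-precision values.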

Keywords: convolutional neural network  data quantization  reconfigurable architecture  parallel mapping  speedup
Received: 2023-07-05
Revised: 2024-03-12
