Funding: Guangxi Key Laboratory of Cryptography and Information Security; National Natural Science Foundation of China; Key Laboratory of Cognitive Radio and Information Processing (Ministry of Education)

Received: 2020-03-24
Revised: 2020-04-26

Neural Network Compression Method Combining Half-Wave Gaussian Quantization and Alternate Update
ZHANG Hongmei, YAN Haibing, ZHANG Xiangli. Neural Network Compression Method Combining Half-Wave Gaussian Quantization and Alternate Update[J]. Computer Engineering, 2021, 47(5): 80-87.
Authors:ZHANG Hongmei  YAN Haibing  ZHANG Xiangli
Affiliation:Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Abstract: To enable the deployment of neural network models on edge devices with limited memory and strict real-time requirements, this paper proposes a hybrid compression method combining Half-Wave Gaussian Quantization (HWGQ) and alternate update. A 2-bit uniform HWGQ is applied to the inputs of the neural network model, and the quantized values are fed into a binary network with a scaling factor, which is trained to obtain an initial binary model. The trained binary model is then fine-tuned layer by layer using the alternate update method to improve its test accuracy. Experimental results on the CIFAR-10 and ImageNet datasets show that the proposed method effectively reduces the memory and time overhead caused by parameter and structural redundancy. At a model compression ratio of about 30, test accuracy improves by 0.8 and 2.0 percentage points over the HWGQ-Net method, and training is about 10 times faster.
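The two quantizers the abstract combines can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the 2-bit level values are assumed uniform (the paper quantizes roughly unit-Gaussian post-batch-norm activations), and the scaling-factor rule shown (mean absolute weight, XNOR-Net style) is an assumption about what "binary network with a scaling factor" refers to.

```python
import numpy as np

# Illustrative uniform 2-bit half-wave quantization levels {0, q, 2q, 3q};
# the step value is a placeholder, not the paper's optimized one.
HWGQ_LEVELS = np.array([0.0, 0.538, 1.076, 1.614])

def hwgq_2bit(x):
    """2-bit half-wave quantizer: negatives clamp to 0 (half-wave
    rectification), positives snap to the nearest of 4 levels."""
    x = np.maximum(x, 0.0)
    step = HWGQ_LEVELS[1]
    idx = np.clip(np.round(x / step), 0, 3).astype(int)
    return HWGQ_LEVELS[idx]

def binarize_with_scale(w):
    """Binarize weights to +/-1 and rescale by alpha = mean(|w|),
    as in XNOR-Net-style binary networks (assumed here)."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha
```

In the training pipeline the abstract describes, activations would pass through `hwgq_2bit` while each layer's weights are replaced by their binarized, scaled version; the subsequent layer-wise alternate update then refines the binary model one layer at a time.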
Keywords:Convolutional Neural Network(CNN)  quantization  model compression  Half-Wave Gaussian Quantization(HWGQ)  alternate update
This article is indexed by VIP, Wanfang Data, and other databases.