高性能人脸识别加速器优化设计及FPGA实现

Cite this article:	吴进, 张伟华, 席萌, 代巍. 高性能人脸识别加速器优化设计及FPGA实现[J]. 计算机工程与应用, 2020, 56(22): 48-54.

Authors:	吴进 (WU Jin), 张伟华 (ZHANG Weihua), 席萌 (XI Meng), 代巍 (DAI Wei)

Affiliation:	西安邮电大学电子工程学院 (School of Electronic Engineering, Xi'an University of Posts and Telecommunications), Xi'an 710121

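Funding:	Natural Science Basic Research Program of Shaanxi Province; National Natural Science Foundation of China; Key Research and Development Program of Shaanxi Province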

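Abstract (translated from the Chinese):	The rapid development of computer vision places ever higher demands on the system performance of embedded products. On traditional Field Programmable Gate Array (FPGA) platforms, computational throughput is poorly matched to memory bandwidth, and general-purpose processors implement Convolutional Neural Networks (CNNs) inefficiently, so performance requirements go unmet. To address these design bottlenecks, a high-performance face recognition neural network accelerator based on the classic LeNet-5 model was designed on the Xilinx ZC706 embedded development platform. Using High Level Synthesis (HLS) tools, the network model was optimized through storage optimization, fixed-point quantization, computation optimization, and related methods, yielding a 7-layer CNN accelerator. Experimental results show that the accelerator runs at 200 MHz, achieves a 126-fold speedup over a CPU and more than a 10-fold speedup over a GPU, and consumes only 2.62 W.
Keywords (translated from the Chinese):	CNN accelerator; Field Programmable Gate Array (FPGA); High Level Synthesis (HLS); storage optimization; fixed-point quantization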
Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator

WU Jin, ZHANG Weihua, XI Meng, DAI Wei. Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator[J]. Computer Engineering and Applications, 2020, 56(22): 48-54.

Authors:	WU Jin, ZHANG Weihua, XI Meng, DAI Wei

Affiliation:	School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China

Abstract:	The rapid development of computer vision demands ever higher system performance from embedded products. Traditional Field Programmable Gate Array (FPGA) platforms suffer from computational throughput that does not match memory bandwidth well, and general-purpose processors implement Convolutional Neural Networks (CNNs) inefficiently, so performance requirements are not met. To address these design bottlenecks, a high-performance face recognition neural network accelerator is designed on the Xilinx ZC706 embedded development platform using the classic LeNet-5 neural network model. Based on High Level Synthesis (HLS) tools, the neural network model is optimized through storage optimization, fixed-point quantization, computational optimization, and other methods, and a 7-layer CNN accelerator is realized. Experimental results show that the CNN accelerator operates at 200 MHz. Compared with the CPU, the accelerator achieves a 126-fold speedup; compared with the GPU, it is more than ten times faster, and its power consumption is only 2.62 W.

Keywords:	CNN accelerator; Field Programmable Gate Array (FPGA); High Level Synthesis (HLS); storage optimization; fixed-point quantization
This article is indexed in Wanfang Data and other databases.