Abstract: | With the rapid development and popularization of artificial intelligence technology, convolutional neural network(CNN) is applied in many fields, and begins to replace most traditional algorithms and gradually deploys to terminal devices. However, the huge data movement and computational complexity of CNN bring huge power consumption and performance challenges to the hardware, which hinders the application of CNN in embedded devices such as smartphones and smart cars. This paper implements a convolutional neural network accelerator based on Winograd convolution algorithm on field-programmable gate array (FPGA). Firstly, a convolution kernel decomposition method for Winograd convolution is proposed. The convolution kernel larger than 3×3 is divided into multiple 3×3 convolution kernels for convolution operation, and the unsynchronized long convolution operation is processed. Then, we design Winograd convolution array and use configurable multiplier to flexibly realize multiplication for data with different accuracy. Experimental results on VGG16 and AlexNet network show that our accelerator has the most energy efficient and 101 times that of the CPU, 5.8 times that of the GPU. At the same time, it has higher energy efficiency than other convolutional neural network accelerators. |