An Efficient Convolutional Neural Network Computation using AVX-512 Instructions

Hiroki Kataoka, Kohei Yamashita, Koji Nakano, Yasuaki Ito, Akihiko Kasagi, Tsuguchika Tabaru

Abstract


Convolutional Neural Networks (CNNs) have recently become widely used for image processing. Because their computational cost is high, the computation needs to be accelerated. In this paper, we propose an efficient implementation using Intel AVX-512 instructions on multicore CPUs. AVX-512 instructions support 512-bit vector operations, in which sixteen 32-bit floating-point operations can be executed simultaneously. To reduce the amount of computation, our implementation uses the idea of a fused filter, which combines a convolutional layer and the pooling layer that follows it. As a result, we achieve a speed-up factor of 1.62 over an existing library implementation that uses the Intel Math Kernel Library for Deep Neural Networks.
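
The fused-filter idea can be illustrated with a small scalar sketch. It is not the authors' AVX-512 implementation. It assumes average pooling (as the keywords suggest) with a p x p window and stride p, and no bias or activation between the convolution and the pooling. All function names here are hypothetical. Under those assumptions, a k x k convolution followed by pooling gives the same result as a single stride-p convolution with a (k+p-1) x (k+p-1) fused filter.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation on nested lists."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(0, w - k + 1, stride)
        ]
        for i in range(0, h - k + 1, stride)
    ]


def avg_pool(fmap, p=2):
    """Non-overlapping p x p average pooling (stride p)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i + a][j + b] for a in range(p) for b in range(p)) / (p * p)
            for j in range(0, w - p + 1, p)
        ]
        for i in range(0, h - p + 1, p)
    ]


def fuse_filter(kernel, p=2):
    """Build the (k+p-1) x (k+p-1) filter that equals conv followed by avg_pool."""
    k = len(kernel)
    size = k + p - 1
    fused = [[0.0] * size for _ in range(size)]
    # Each of the p*p pooled positions contributes a shifted copy of the kernel.
    for dy in range(p):
        for dx in range(p):
            for a in range(k):
                for b in range(k):
                    fused[dy + a][dx + b] += kernel[a][b] / (p * p)
    return fused


# Illustrative data only: a 6 x 6 input and a 3 x 3 filter.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

two_step = avg_pool(conv2d(image, kernel), p=2)       # convolution, then pooling
fused = fuse_filter(kernel, p=2)
one_step = conv2d(image, fused, stride=2)             # single fused pass

max_abs_diff = max(
    abs(x - y)
    for row_a, row_b in zip(two_step, one_step)
    for x, y in zip(row_a, row_b)
)
print(max_abs_diff)
```

In this sketch with k = 3 and p = 2, each pooled output needs 4 x 9 = 36 multiplications in the two-step version but only 16 with the 4 x 4 fused filter, which is the kind of saving the fused filter targets.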


Keywords


Deep Learning; Neural Networks; Convolution; Average Pooling
