An Efficient Convolutional Neural Network Computation using AVX-512 Instructions
Convolutional Neural Networks (CNNs) have recently become widely used for image processing. Because their computational cost is high, the computation needs to be accelerated. In this paper, we propose an efficient implementation using Intel AVX-512 instructions on multicore CPUs. AVX-512 instructions support 512-bit vector operations, in which sixteen 32-bit floating-point operations can be executed simultaneously. To reduce the amount of computation, this implementation uses a fused filter that combines a convolutional layer with the pooling layer that follows it. As a result, we achieve a speed-up factor of 1.62 over an existing implementation based on the Intel Math Kernel Library for Deep Neural Networks.
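The abstract does not specify how the fused filter is constructed, so the following is only a minimal sketch of one way such a fusion can work, under assumptions that are not taken from the paper: the pooling layer is assumed to be non-overlapping average pooling, the convolution is assumed to have stride 1 and no padding, and the function names (`conv2d`, `avg_pool`, `fuse_filter`) are hypothetical. Under those assumptions, a convolution followed by pooling is equivalent to a single strided convolution with a larger, precomputed filter. The sketch is plain Python and does not use AVX-512.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def avg_pool(fmap, pool=2):
    """Non-overlapping pool x pool average pooling (assumed pooling type)."""
    out_h, out_w = len(fmap) // pool, len(fmap[0]) // pool
    return [
        [
            sum(
                fmap[i * pool + dy][j * pool + dx]
                for dy in range(pool)
                for dx in range(pool)
            ) / (pool * pool)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fuse_filter(kernel, pool=2):
    """Build one filter equivalent to `kernel` followed by average pooling.

    Each pooled output averages pool*pool shifted copies of the kernel
    response, so the fused filter is the average of those shifted kernels.
    """
    kh, kw = len(kernel), len(kernel[0])
    fused = [[0.0] * (kw + pool - 1) for _ in range(kh + pool - 1)]
    for dy in range(pool):
        for dx in range(pool):
            for a in range(kh):
                for b in range(kw):
                    fused[dy + a][dx + b] += kernel[a][b] / (pool * pool)
    return fused


# Small illustrative inputs (made-up values, not data from the paper).
image = [[float((3 * r + 5 * c) % 7) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]

two_step = avg_pool(conv2d(image, kernel), pool=2)
one_step = conv2d(image, fuse_filter(kernel, pool=2), stride=2)
```

In this sketch, each pooled output of the two-step path needs four 3x3 convolution results (36 multiply-adds) plus the averaging, whereas the fused 4x4 filter needs 16 multiply-adds. Whether the paper's fused filter takes this form, and how it is mapped onto the 16 single-precision lanes of a 512-bit register, is not stated in the abstract.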