Fast Algorithm of Unified Layer Performing Convolution and Average Pooling on the GPU

Hiroki Tokura, Takahiro Nishimura, Yasuaki Ito, Koji Nakano, Akihiko Kasagi, Tsuguchika Tabaru


Recently, Convolutional Neural Networks (CNNs) have made major contributions in the field of recognition. A CNN has multiple convolution layers, and convolution operations require a large number of floating-point operations, so convolution is a bottleneck of CNN computation. The main contribution of this paper is to present new methods for convolution and average pooling computation on the GPU. First, we present the fused filter method. Average pooling can be regarded as a convolution with a kernel, so a convolution kernel and an averaging kernel can be fused into a single filter. In the fused filter method, convolution with the fused filter reduces the number of floating-point operations needed for the combined convolution and pooling computation. Second, we present the direct sum method. Convolution and average pooling are commutative; in the direct sum method, swapping the order of convolution and average pooling reduces the floating-point operations for the combined computation. Experimental results on an NVIDIA V100 show that the direct sum method attains a speed-up factor of up to 1.8 (single precision) and 4.4 (half precision) over a naive cuDNN-based implementation.
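The two ideas in the abstract can be sketched in NumPy for the 1D, stride-1 case (an illustration only, not the paper's GPU implementation; all array sizes and kernel values here are hypothetical): stride-1 average pooling is itself a convolution with a uniform kernel, so convolution followed by pooling equals one convolution with a fused filter, and the two operations commute.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # input signal (hypothetical size)
k = rng.standard_normal(3)   # convolution kernel
a = np.ones(2) / 2           # 2-tap average-pooling kernel

# Separate passes: convolution, then stride-1 average pooling.
separate = np.convolve(np.convolve(x, k, mode="valid"), a, mode="valid")

# Fused filter idea: pre-convolve the two kernels once, then one pass.
fused_kernel = np.convolve(k, a, mode="full")
fused = np.convolve(x, fused_kernel, mode="valid")

# Direct sum idea: the operations commute, so pooling first and
# convolving afterwards yields the same result.
swapped = np.convolve(np.convolve(x, a, mode="valid"), k, mode="valid")

print(np.allclose(separate, fused), np.allclose(separate, swapped))
```

Both checks hold because convolution is associative and commutative; the GPU methods in the paper exploit this structure to cut floating-point operations.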


Deep Learning; Neural Network; Convolution; Average Pooling; GPU

