Comparison of GPU Implementations of Row-wise and Column-wise Fundamental Algorithms

Daisuke Takafuji, Yasuaki Ito, Koji Nakano


In this paper, we consider GPU implementations of three fundamental algorithms applied to each of L given 1-dimensional arrays: (1) computing the prefix sum, (2) sorting by Bubble sort, and (3) sorting by Bitonic sort. There are two ways to store L 1-dimensional arrays as a 2-dimensional array: each 1-dimensional array can be stored as a row, or each can be stored as a column. We call the resulting 2-dimensional array a row-wise array or a column-wise array, respectively. We compare the computation times of GPU implementations of the three algorithms on a row-wise array and on a column-wise array. The experimental results show that the column-wise GPU implementation of the prefix sum achieves a speedup factor of up to 46.53 over the row-wise one, and that the column-wise GPU implementations of Bubble sort and Bitonic sort achieve speedup factors of up to 58.52 and 17.86, respectively, over their row-wise counterparts.
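
The two storage methods can be illustrated with a small CPU-side sketch in Python. It is not the authors' GPU code: the function names, the flat-list representation, and the idea of assigning one GPU thread to each 1-dimensional array are assumptions made only for illustration. With L arrays of length n, element i of array k sits at index k*n + i in the row-wise array and at index i*L + k in the column-wise array, so the i-th elements of arrays k and k+1 are n positions apart in the first layout and adjacent in the second.

```python
def prefix_sum_row_wise(a, L, n):
    """Prefix sums of L arrays stored row-wise: array k occupies a[k*n : (k+1)*n]."""
    out = list(a)
    for k in range(L):              # assumption: on a GPU, k would be a thread index
        for i in range(1, n):
            out[k * n + i] += out[k * n + i - 1]
    return out


def prefix_sum_column_wise(a, L, n):
    """Prefix sums of L arrays stored column-wise: element i of array k is a[i*L + k]."""
    out = list(a)
    for k in range(L):              # assumption: on a GPU, k would be a thread index
        for i in range(1, n):
            out[i * L + k] += out[(i - 1) * L + k]
    return out


def to_column_wise(a, L, n):
    """Convert a row-wise flat array into the equivalent column-wise flat array."""
    return [a[k * n + i] for i in range(n) for k in range(L)]


if __name__ == "__main__":
    L, n = 2, 3
    row = [1, 2, 3, 4, 5, 6]        # two arrays: [1, 2, 3] and [4, 5, 6]
    col = to_column_wise(row, L, n)
    print(prefix_sum_row_wise(row, L, n))
    print(prefix_sum_column_wise(col, L, n))
```

Both functions compute the same prefix sums and differ only in the index arithmetic. The abstract does not state why the column-wise layout is faster; the sketch only makes visible that, at each step i, the column-wise layout has all arrays touching one contiguous block of L addresses.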
