Accelerating GPU implementation of All Pairs Shortest Path Algorithm

Daisuke Takafuji, Koji Nakano, Yasuaki Ito


The Floyd-Warshall algorithm is a well-known algorithm to compute the distance of
all pairs of nodes of a graph.
The Blocked Floyd-Warshall algorithm, a variant of the Floyd-Warshall has been
proposed to accelerate the Fold-Warshall algorithm on the GPU.
The previously published GPU implementations for the Blocked Floyd-Warshall algorithm
performs many separated kernel calls for costly barrier synchronization.
The main contribution of this paper is to present a more sophisticated implementation
of the Blocked Floyd-Warshall algorithm, which performs no barrier synchronization and
invokes only one kernel call.
Experimental results using NVIDIA Tesla V100 show that
our implementation is 1.02-1.15 times faster than the previously published implementation.
We also propose efficient GPU implementation to execute the Blocked Floyd-Warshall algorithm
for many graphs at the same time using the SKSS technique.
From the experimental results, our implementation is 1.02-1.46 times faster than previously published
implementation for multiple graphs.

Full Text:



  • There are currently no refbacks.