Efficient breadth first search on multi-GPU systems |
| |
Authors: | Enrico Mastrostefano Massimo Bernaschi |
| |
Affiliation: | 1. Sapienza University, Rome, Italy;2. Istituto per le Applicazioni del Calcolo, IAC-CNR, Rome, Italy |
| |
Abstract: | Simple algorithms for the execution of a Breadth First Search on large graphs lead, running on clusters of GPUs, to a situation of load unbalance among threads and un-coalesced memory accesses, resulting in pretty low performances. To obtain a significant improvement on a single GPU and to scale by using multiple GPUs, we resort to a suitable combination of operations to rearrange data before processing them. We propose a novel technique for mapping threads to data that achieves a perfect load balance by leveraging prefix-sum and binary search operations. To reduce the communication overhead, we perform a pruning operation on the set of edges that needs to be exchanged at each BFS level. The result is an algorithm that exploits at its best the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes. |
| |
Keywords: | GPU CUDA BFS Distributed algorithm Large graphs Graph 500 benchmark |
本文献已被 ScienceDirect 等数据库收录! |
|