Efficient breadth first search on multi-GPU systems期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Efficient breadth first search on multi-GPU systems

Authors:	Enrico Mastrostefano Massimo Bernaschi

Affiliation:	1. Sapienza University, Rome, Italy;2. Istituto per le Applicazioni del Calcolo, IAC-CNR, Rome, Italy

Abstract:	Simple algorithms for the execution of a Breadth First Search on large graphs lead, running on clusters of GPUs, to a situation of load unbalance among threads and un-coalesced memory accesses, resulting in pretty low performances. To obtain a significant improvement on a single GPU and to scale by using multiple GPUs, we resort to a suitable combination of operations to rearrange data before processing them. We propose a novel technique for mapping threads to data that achieves a perfect load balance by leveraging prefix-sum and binary search operations. To reduce the communication overhead, we perform a pruning operation on the set of edges that needs to be exchanged at each BFS level. The result is an algorithm that exploits at its best the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes.

Keywords:	GPU CUDA BFS Distributed algorithm Large graphs Graph 500 benchmark
本文献已被 ScienceDirect 等数据库收录！