Performing BMMC Permutations Efficiently on Distributed-Memory Multiprocessors with MPI

Authors:	T. H. Cormen, J. C. Clippinger

Affiliation:	Department of Computer Science, Dartmouth College, Hanover, NH 03755-3510, USA. thc@cs.dartmouth.edu, james@cs.dartmouth.edu

Abstract:	This paper presents an architecture-independent method for performing BMMC (bit-matrix-multiply/complement) permutations on multiprocessors with distributed memory. All interprocessor communication uses the MPI function MPI_Sendrecv_replace(). The number of elements and the number of processors must be powers of 2, with at least one element per processor, and there is no inherent upper bound on the number of elements per processor. Our method transmits only data, without transmitting any source or target indices, which conserves network bandwidth. When data is transmitted, the source and target processors implicitly agree on each other's identity and on the indices of the elements being transmitted. A C-callable implementation of our method is available from Netlib. The implementation allows preprocessing (which incurs a modest cost) to be factored out for multiple runs of the same permutation, even on different data. Data may be laid out in any one of several ways: processor-major, processor-minor, or anything in between. Experimental results indicate that our method works well compared with several other candidate methods on three different platforms. In particular, the slower the interconnection network, the greater the relative advantage of our method. Received June 1, 1997; revised March 10, 1998.
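To make the terminology concrete, the following is a minimal sketch in C of how a BMMC permutation maps a source index to a target index, and of how an index might be split into a processor number under the two extreme layouts. It is not the Netlib implementation and contains no MPI calls. It assumes the standard definition of a BMMC permutation, y = Ax XOR c over GF(2), where A is a nonsingular n-by-n bit matrix and c is an n-bit complement vector. The function names, the row-mask encoding of A, and the reading of processor-major as "processor number in the high bits" and processor-minor as "processor number in the low bits" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Parity (XOR of all bits) of a 64-bit word. */
static unsigned parity64(uint64_t v)
{
    v ^= v >> 32;
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/*
 * Target index of source index x under the BMMC permutation y = A x XOR c.
 * Assumed encoding (hypothetical, for illustration): rows[i] is row i of A
 * as a bit mask, so bit j of rows[i] is the matrix entry A[i][j].
 * Bit i of the product is the GF(2) dot product of row i with x.
 * The caller is responsible for A being nonsingular.
 */
uint64_t bmmc_target(const uint64_t *rows, unsigned n, uint64_t c, uint64_t x)
{
    assert(n >= 1 && n <= 64);
    uint64_t y = 0;
    for (unsigned i = 0; i < n; i++)
        y |= (uint64_t)parity64(rows[i] & x) << i;
    return y ^ c;
}

/*
 * Owner of an n-bit index among 2^p processors, assuming a processor-major
 * layout means the processor number is the high-order p bits.
 * Requires p <= n < 64.
 */
uint64_t owner_processor_major(uint64_t idx, unsigned n, unsigned p)
{
    assert(p <= n && n < 64);
    return idx >> (n - p);
}

/*
 * Owner of an index among 2^p processors, assuming a processor-minor
 * layout means the processor number is the low-order p bits.
 * Requires p < 64.
 */
uint64_t owner_processor_minor(uint64_t idx, unsigned p)
{
    assert(p < 64);
    return idx & ((UINT64_C(1) << p) - 1);
}
```

As a usage example, a 3-bit bit-reversal is the BMMC permutation with rows {0x4, 0x2, 0x1} and c = 0, so source index 1 maps to target index 4. Because both endpoints of a transfer can evaluate the same function, neither needs to send indices along with the data, which is the property the abstract highlights.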

Keywords:	BMMC permutations, Affine transformations, Distributed-memory multiprocessors, MPI
This article is indexed in SpringerLink and other databases.