Search results: 7 records
1.
R. Bunk, U. Leske, R. Krompass, Z. Pretch, K. Rudolf, R. Herbig, K. Pitch, V. A. Tsykanov, O. V. Skiba, V. A. Makarov, L. P. Bol'shakov, P. T. Porodnov, A. A. Maershin, S. S. Keruchen'ko. Atomic Energy, 1989, 67(5): 802-806
Translated from Atomnaya Énergiya, Vol. 67, No. 5, pp. 320–323, November, 1989.
2.
Pipelined broadcast splits a large broadcast message into segments and broadcasts the segments in a pipelined fashion, which lets it achieve high performance on many systems. In this paper, we investigate techniques for efficient pipelined broadcast on clusters connected by multiple Ethernet switches. Specifically, we:

- develop algorithms for computing various contention-free broadcast trees that are suitable for pipelined broadcast on Ethernet switched clusters;
- extend the parameterized LogP model to predict appropriate segment sizes for pipelined broadcast;
- show that the segment sizes computed from the model yield high performance;
- evaluate various pipelined broadcast schemes through experiments on Ethernet switched clusters with various topologies.

The results demonstrate that our techniques are practical and efficient for contemporary Fast Ethernet and Gigabit Ethernet clusters.
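The segment size can be chosen with a small model. The sketch below is a minimal illustration, not the parameterized LogP extension described above. It assumes a plain linear cost in which every s-byte message costs alpha + s*beta. It also assumes a pipelined broadcast along a simple chain of p processes. The function names and parameter values are hypothetical.

```python
import math


def pipelined_chain_time(m, s, p, alpha, beta):
    """Estimated time to broadcast m bytes along a chain of p processes in s-byte segments.

    Assumed cost model (illustrative only): every segment transfer costs alpha + s * beta,
    and a process forwards a segment as soon as it has received it. With k segments and
    p - 1 hops, the last segment reaches the last process after k + (p - 1) - 1 steps.
    """
    if p < 2:
        return 0.0
    k = math.ceil(m / s)  # number of segments (the last one is charged as a full segment)
    hops = p - 1
    return (k + hops - 1) * (alpha + s * beta)


def best_segment_size(m, p, alpha, beta, candidates):
    """Pick the candidate segment size with the smallest modeled completion time."""
    return min(candidates, key=lambda s: pipelined_chain_time(m, s, p, alpha, beta))


if __name__ == "__main__":
    m = 1 << 20       # 1 MiB message
    p = 8             # processes in the chain
    alpha = 50e-6     # assumed per-message startup cost (seconds)
    beta = 1e-8       # assumed per-byte cost (seconds/byte)
    sizes = [1 << i for i in range(10, 21)]  # candidate sizes: 1 KiB ... 1 MiB
    s = best_segment_size(m, p, alpha, beta, sizes)
    print(s, pipelined_chain_time(m, s, p, alpha, beta), pipelined_chain_time(m, m, p, alpha, beta))
```

With these assumed parameters, the search picks 32 KiB segments. That choice is roughly 5x faster than sending the message unsegmented. Minimizing (m/s + p - 2)(alpha + s*beta) analytically gives s ≈ sqrt(m·alpha / ((p - 2)·beta)), about 29.5 KB here, which is close to the searched value.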
3.
Ahmad Faraj, Pitch Patarasuk, Xin Yuan. International Journal of Parallel Programming, 2008, 36(4): 426-453
Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have all been used in such systems. This flexibility makes it hard to develop efficient collective communication algorithms, because the network topology can have a strong impact on communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved using only the links in a spanning tree of the topology, provided the message size is sufficiently large. This result implies that increasing network connectivity beyond the minimum tree connectivity does not improve all-to-all broadcast performance when the most efficient topology-specific algorithm is used. We develop all-to-all broadcast algorithms that achieve near-optimal performance, one for clusters with cut-through switches and one for clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations, and the empirical results confirm our theoretical finding.
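A tree alone can carry contention-free all-to-all broadcast traffic. One way to see this is to order the hosts by a depth-first traversal of the switch tree and run a logical-ring all-gather in that order. A DFS tour crosses every tree link once in each direction. As a result, when hosts are leaves and links are full-duplex, no directed link carries two ring hops at once. The sketch below checks this property on a small hypothetical topology. It illustrates the idea of using only spanning-tree links and is not claimed to be the paper's exact algorithm.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency lists for a tree given as (u, v) link pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def dfs_host_order(adj, root, hosts):
    """Hosts listed in the order a depth-first traversal from `root` visits them."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        if u in hosts:
            order.append(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)

    visit(root)
    return order


def tree_path(adj, src, dst):
    """Unique path from src to dst in a tree, found with BFS parent pointers."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]


def ring_link_usage(adj, ring):
    """Count how many ring hops (host i -> host i+1) traverse each directed link."""
    usage = defaultdict(int)
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]
        path = tree_path(adj, a, b)
        for u, v in zip(path, path[1:]):
            usage[(u, v)] += 1
    return usage


if __name__ == "__main__":
    # Hypothetical three-switch chain s0 - s1 - s2 with two hosts per switch.
    edges = [("s0", "s1"), ("s0", "h0"), ("s0", "h1"),
             ("s1", "h2"), ("s1", "h3"), ("s1", "s2"),
             ("s2", "h4"), ("s2", "h5")]
    hosts = {"h0", "h1", "h2", "h3", "h4", "h5"}
    adj = build_adjacency(edges)
    ring = dfs_host_order(adj, "s0", hosts)
    usage = ring_link_usage(adj, ring)
    print(ring, max(usage.values()))
```

Here the DFS order is h2, h3, h4, h5, h0, h1. Every directed link is used by exactly one ring hop. So in each of the p - 1 all-gather steps, all hosts can send to their successors without two messages sharing a link direction.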
4.
We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, assuming that the reduce operator is associative and commutative. We derive a tight lower bound on the amount of data that must be communicated to complete this operation. We also propose a ring-based algorithm that requires only tree connectivity to achieve bandwidth optimality. The widely used butterfly-like all-reduce algorithm incurs network contention in SMP/multi-core clusters. In contrast, the proposed algorithm can achieve contention-free communication in almost all contemporary clusters, including SMP/multi-core clusters and Ethernet switched clusters with multiple switches. We demonstrate that, when the data size is sufficiently large, the proposed algorithm is more efficient than other algorithms on clusters with different nodal architectures and networking technologies.
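A bandwidth-oriented ring all-reduce is commonly built as a reduce-scatter followed by an all-gather around a logical ring. The simulation below is a minimal sketch of that standard scheme, not necessarily the exact algorithm proposed above. It assumes element-wise addition as the associative, commutative operator. Each process's vector is split into p chunks, and in every step each process sends exactly one chunk to its ring successor. Over 2(p - 1) steps, each process therefore transmits 2(p - 1)/p of its vector, roughly twice the vector size for large p.

```python
def ring_allreduce(data):
    """Simulate ring all-reduce (sum) over p processes.

    data[i][c] is chunk c (a list of numbers) held by process i; each process has p chunks.
    Returns the per-process buffers after the operation and the number of chunks each
    process sent. The input is not modified.
    """
    p = len(data)
    buf = [[list(chunk) for chunk in proc] for proc in data]
    sent = 0

    # Phase 1, reduce-scatter: in step t, process i sends chunk (i - t) mod p to i + 1,
    # which adds it into its own copy. After p - 1 steps, process i holds the full sum
    # of chunk (i + 1) mod p.
    for t in range(p - 1):
        msgs = [((i + 1) % p, (i - t) % p, list(buf[i][(i - t) % p])) for i in range(p)]
        for dst, c, payload in msgs:
            buf[dst][c] = [a + b for a, b in zip(buf[dst][c], payload)]
        sent += 1

    # Phase 2, all-gather: in step t, process i forwards the reduced chunk (i + 1 - t) mod p
    # to i + 1, which overwrites its copy.
    for t in range(p - 1):
        msgs = [((i + 1) % p, (i + 1 - t) % p, list(buf[i][(i + 1 - t) % p])) for i in range(p)]
        for dst, c, payload in msgs:
            buf[dst][c] = payload
        sent += 1

    return buf, sent


if __name__ == "__main__":
    p = 4
    data = [[[10 * i + c, 1] for c in range(p)] for i in range(p)]
    result, sent = ring_allreduce(data)
    print(result[0], sent)
```

With p = 4, every process ends with [[60, 4], [64, 4], [68, 4], [72, 4]] after 2(p - 1) = 6 steps. Each step uses only ring-neighbor links, which is why the scheme needs no more than tree connectivity when the ring is laid out along a tree.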
5.
Ahmad Faraj, Pitch Patarasuk, Xin Yuan. International Journal of Parallel Programming, 2008, 36(6): 543-570
The process arrival pattern describes when different processes arrive at an MPI collective operation, and it can significantly affect the performance of that operation. In this work, we characterize the process arrival patterns in a set of MPI programs on two common cluster platforms. We also use a micro-benchmark to study the process arrival patterns in MPI programs with balanced loads, and we investigate how different process arrival patterns affect collective algorithms. Our results show that:

(1) the differences between the times at which different processes arrive at a collective operation are usually large enough to affect performance;
(2) application developers generally cannot effectively control the process arrival patterns of their MPI programs in the cluster environment, because balancing loads at the application level does not balance the process arrival patterns;
(3) the performance of collective communication algorithms is sensitive to process arrival patterns.

These results indicate that the process arrival pattern is an important factor to consider when developing and optimizing MPI collective routines. We propose a scheme that achieves high performance under different process arrival patterns. We also demonstrate that explicitly considering the process arrival pattern yields MPI collective routines that are more efficient than the current ones.
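To make "process arrival pattern" concrete, the sketch below summarizes a set of per-process arrival times with two simple statistics. The arrival times could be, for example, timestamps each rank records just before entering a collective. The statistics are the spread between the earliest and latest arrival, and the mean absolute deviation from the average arrival time. These particular metrics and the comparison against the collective's own duration are illustrative assumptions, not necessarily the definitions used in the paper.

```python
from statistics import mean


def arrival_imbalance(arrivals):
    """Return (spread, mean_abs_dev) for the arrival times of one collective call.

    spread: latest arrival minus earliest arrival (worst-case gap between processes).
    mean_abs_dev: average distance of each arrival from the mean arrival time.
    """
    if not arrivals:
        raise ValueError("need at least one arrival time")
    avg = mean(arrivals)
    spread = max(arrivals) - min(arrivals)
    mean_abs_dev = mean(abs(a - avg) for a in arrivals)
    return spread, mean_abs_dev


def imbalance_ratio(arrivals, op_time):
    """Spread relative to the time the collective itself takes (hypothetical yardstick).

    A ratio well above 1 suggests processes spend longer waiting for each other
    than the communication itself lasts.
    """
    spread, _ = arrival_imbalance(arrivals)
    return spread / op_time
```

For example, arrivals at 0, 10, 20 and 30 µs give a spread of 30 µs and a mean absolute deviation of 10 µs. If the collective itself takes 15 µs, the ratio is 2, meaning the arrival skew is twice the communication time.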
6.
A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters
Ahmad Faraj, Xin Yuan, P. Patarasuk. IEEE Transactions on Parallel and Distributed Systems, 2007, 18(2): 264-276
We develop a message scheduling scheme for efficiently realizing all-to-all personalized communication (AAPC) on Ethernet switched clusters with one or more switches. To avoid network contention and achieve high performance, the scheme partitions AAPC into phases such that (1) there is no network contention within each phase and (2) the number of phases is minimized. Realizing AAPC with these contention-free phases can therefore potentially achieve the minimum communication completion time. In practice, phased AAPC schemes must introduce synchronization to separate messages in different phases. We investigate various synchronization mechanisms and various methods for incorporating synchronization into the AAPC phases. Experimental results show that, when the message size is large, AAPC implementations based on message scheduling with proper synchronization consistently achieve high performance on clusters with many different network topologies.
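The simplest case is a single switch with full-duplex links. There, a contention-free schedule with the minimum of p - 1 phases is easy to write down: in phase k, node i sends to node (i + k) mod p, so each node sends one message and receives one message per phase. With several switches, consider a tree link that separates the hosts into groups of sizes a and p - a. That link must carry a·(p - a) messages in each direction, at most one per phase, which gives a lower bound on the number of phases. The sketch below computes both. It is an illustration under these assumptions, not the paper's scheduling algorithm for multi-switch trees, and the function names are hypothetical.

```python
from collections import defaultdict


def single_switch_phases(p):
    """Phase k (1 <= k < p): node i sends to node (i + k) mod p."""
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]


def hosts_on_side(adj, start, blocked, hosts):
    """Number of hosts reachable from `start` without going through node `blocked`."""
    seen = {start, blocked}
    stack = [start]
    count = 1 if start in hosts else 0
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v in hosts:
                    count += 1
                stack.append(v)
    return count


def phase_lower_bound(edges, hosts):
    """Lower bound on contention-free AAPC phases for a tree topology.

    Removing link (u, v) splits the hosts into a and p - a; all a * (p - a) messages
    between the two sides cross that link in each direction, at most one per phase.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    p = len(hosts)
    bound = p - 1  # each host must send p - 1 messages over its own link
    for u, v in edges:
        a = hosts_on_side(adj, u, v, hosts)
        bound = max(bound, a * (p - a))
    return bound


if __name__ == "__main__":
    print(single_switch_phases(4))
    # Hypothetical two-switch cluster: 3 hosts on s0, 2 hosts on s1.
    edges = [("s0", "s1"), ("s0", "h0"), ("s0", "h1"), ("s0", "h2"),
             ("s1", "h3"), ("s1", "h4")]
    print(phase_lower_bound(edges, {"h0", "h1", "h2", "h3", "h4"}))
```

In the two-switch example, 3 hosts sit on one switch and 2 on the other. The inter-switch link alone forces at least 3 × 2 = 6 phases, more than the p - 1 = 4 that the host links require.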
7.