
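A Collective Communication Optimization Strategy for Data Parallelism
Cite this article: WANG Jue, HU Chang-Jun, ZHANG Ji-Lin, LI Jian-Jiang. A Collective Communication Optimization Strategy for Data Parallelism[J]. Chinese Journal of Computers, 2008, 31(2): 318-328
Authors: WANG Jue  HU Chang-Jun  ZHANG Ji-Lin  LI Jian-Jiang
Affiliation (all four authors): School of Information Engineering, University of Science and Technology Beijing, Beijing 100083
Funding: National High Technology Research and Development Program of China (863 Program), National Natural Science Foundation of China, Key Project of Science and Technology Research of the Ministry of Education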
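Abstract: Collective communication is a key factor in the efficiency of large-scale data-parallel systems. It arises mainly in two situations: array redistribution between different phases of a program, and array remapping after loop partitioning. Two factors that significantly affect the efficiency of a single collective communication, and that are often overlooked, are message conflicts and unequal message lengths, because they cause a large amount of idle waiting time among processes. Previous studies, however, either cannot completely avoid message conflicts or target only certain special cases. To address this, a more generally applicable communication scheduling strategy, CSS, is proposed for arrays distributed as Block_Cyclic(k). A proof shows that the strategy makes the messages within one communication step conflict-free and as equal in length as possible, thereby minimizing both the schedule generation time and the actual communication time. Test results also show that, compared with traditional communication optimization algorithms and the MPI_Alltoallv implementation, CSS clearly improves communication efficiency.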

Keywords: parallel compilation  data parallelism  collective communication  array redistribution  distributed memory
Received: 2006-10-23
Revised: 2007-10-09
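
The abstract's point about unequal message lengths can be made concrete with a small sketch. The code below is not taken from the paper: it assumes a 1-D array and the usual convention that, under Block_Cyclic(k) over p processes, global index i belongs to process (i // k) mod p. The function names `owner` and `message_lengths` are hypothetical. It counts how many elements each process must send to each other process when an array is redistributed from one block size to another.

```python
def owner(i, k, p):
    """Process that owns global index i under Block_Cyclic(k) over p processes.

    Assumed convention: blocks of k consecutive elements are dealt
    round-robin to processes 0..p-1.
    """
    return (i // k) % p


def message_lengths(n, p, k_src, k_dst):
    """Return lengths[s][d]: number of elements process s sends to process d
    when an n-element array moves from Block_Cyclic(k_src) to Block_Cyclic(k_dst).
    """
    lengths = [[0] * p for _ in range(p)]
    for i in range(n):
        lengths[owner(i, k_src, p)][owner(i, k_dst, p)] += 1
    return lengths


if __name__ == "__main__":
    # Illustrative parameters only; they are not from the paper.
    for row in message_lengths(n=24, p=4, k_src=2, k_dst=3):
        print(row)
```

Even in this tiny case the entries of the matrix differ from one sender/receiver pair to another, which is the length imbalance that a scheduling strategy has to cope with.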

An Optimized Strategy for Collective Communication in Data Parallelism
WANG Jue, HU Chang-Jun, ZHANG Ji-Lin, LI Jian-Jiang. An Optimized Strategy for Collective Communication in Data Parallelism[J]. Chinese Journal of Computers, 2008, 31(2): 318-328
Authors:WANG Jue  HU Chang-Jun  ZHANG Ji-Lin  LI Jian-Jiang
Abstract: Collective communication significantly influences the performance of data-parallel applications. It typically arises in two situations: array redistribution between program phases, and data remapping after loop partitioning. An important factor in its efficiency is often neglected: when node contention and unequal message lengths occur within a single communication step, processes can spend a large amount of time idle. Previous work either fails to avoid communication conflicts completely or addresses only special cases. This paper develops a more general and efficient communication scheduling strategy (CSS) for arrays distributed as Block_Cyclic(k). Based on a proof of a recursive theorem on the elements of the communication table, the strategy generates a communication scheduling table in which each column, corresponding to one communication step, is a permutation of the receiving node numbers. In addition, messages of similar size are placed in the same communication step as far as possible. As a result, the strategy both avoids inter-processor contention and minimizes the actual communication cost of each step. Experimental results show that CSS outperforms the general method and the MPI_Alltoallv implementation.
Keywords:parallel compiling  data parallelism  collective communication  array redistribution  distributed memory
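
The property that each communication step is a permutation of the receiving nodes can be illustrated with a minimal sketch. This is not the CSS algorithm, whose construction the abstract does not give. It uses the well-known cyclic-shift schedule instead, in which sender s targets receiver (s + t) mod p in step t. The function names and the sample length matrix are hypothetical, and the cost of a step is assumed to be the longest message in it, since the other processes wait for that message to finish.

```python
def cyclic_shift_schedule(p):
    """schedule[t][s] is the receiver of sender s in step t.

    Step 0 maps every process to itself (a local copy, no network traffic).
    """
    return [[(s + t) % p for s in range(p)] for t in range(p)]


def is_contention_free(schedule):
    """True if every step assigns each receiver to exactly one sender."""
    p = len(schedule[0])
    return all(sorted(step) == list(range(p)) for step in schedule)


def step_costs(schedule, lengths):
    """Assumed cost model: a step lasts as long as its longest message."""
    return [max(lengths[s][d] for s, d in enumerate(step)) for step in schedule]


if __name__ == "__main__":
    # Hypothetical message-length matrix: lengths[s][d] elements from s to d.
    lengths = [
        [2, 2, 1, 1],
        [1, 1, 2, 2],
        [2, 2, 1, 1],
        [1, 1, 2, 2],
    ]
    schedule = cyclic_shift_schedule(4)
    print(is_contention_free(schedule))
    print(step_costs(schedule, lengths))
```

The cyclic shift removes receiver contention but ignores message sizes, so short and long messages can share a step. According to the abstract, CSS additionally groups messages of similar length into the same step to reduce that idle time.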
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.