Similar literature
 20 similar documents found
1.
  Total citations: 2, self-citations: 0, citations by others: 2
All-to-all personalized communication is a basic communication operation in parallel computing environments, and many results have appeared in the literature. The existing algorithms fall into two kinds: direct and indirect communication algorithms. Optimal direct all-to-all communication algorithms on rings and 2-D tori are known. For indirect algorithms, however, there is a gap between the time complexity of existing algorithms and the lower bound. This paper presents an efficient indirect algorithm for all-to-all communication on rings and 2-D square tori with bidirectional channels. The algorithm is faster than any previous indirect algorithm. The leading terms of its time complexity are p^2/8 on rings and p^(3/2)/8 on 2-D tori, both reaching the theoretical lower bound, where p is the number of processors.
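
The leading terms quoted in this abstract can be checked with a simple counting argument. The sketch below is not the paper's algorithm. It assumes unit-size messages, one message per channel per time step, and an even number of nodes per dimension. It divides the total number of hops that all messages must travel by the number of directed channels, which gives a lower bound on the number of steps. The function names are hypothetical.

```python
def ring_lower_bound(p):
    """Counting lower bound for all-to-all personalized exchange on a
    bidirectional ring of p nodes (assumes p is even)."""
    # Sum of shortest ring distances from one node to all others, times p nodes.
    total_hops = p * sum(min(d, p - d) for d in range(1, p))
    channels = 2 * p  # p links, each usable in both directions
    return total_hops / channels


def torus_lower_bound(k):
    """Same bound for a k x k bidirectional torus with p = k * k nodes
    (assumes k is even)."""
    per_dim = sum(min(d, k - d) for d in range(k))  # distances along one dimension
    per_node = 2 * k * per_dim  # sum of (dx + dy) over all k * k destinations
    p = k * k
    channels = 4 * p  # 2p links, each usable in both directions
    return p * per_node / channels


if __name__ == "__main__":
    print(ring_lower_bound(8), 8 ** 2 / 8)
    print(torus_lower_bound(4), 4 ** 3 / 8)
```

For even sizes the two functions evaluate to p^2/8 and k^3/8 = p^(3/2)/8, matching the leading terms stated in the abstract.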

2.
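Efficient Implementation and Performance Analysis of the Parallel Bitonic Sort Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Sorting is one of the most common operations in computing. Bitonic sort is a very well-known sorting algorithm and also the earliest parallel sorting algorithm, and it has had a far-reaching influence on research into sorting algorithms. Building on the basic idea of bitonic sort, this paper describes an efficient implementation for distributed-memory parallel computers that replaces global communication with local many-to-many communication, which handles the communication problem in bitonic sort well. The computational complexity of the algorithm is Θ((n/p)(log n + log^2 p)), where n is the number of keys to be sorted and p is the number of processors. On a 2-D mesh the communication time complexity is O(2.12132·√p·n/p), whose order reaches the theoretical lower bound. The analysis shows that bitonic sort also has good communication performance and scalability.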
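
For readers unfamiliar with bitonic sort, here is a minimal sequential sketch of the classic compare-exchange network. It is not the distributed implementation the paper describes, and it assumes the input length is a power of two. In a parallel version each index `i` would live on some processor, and the exchange with `i ^ j` is where communication would occur.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using the bitonic sorting network.
    Assumes len(values) is a power of two (or zero)."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The network has log n · (log n + 1) / 2 compare-exchange stages, and every stage touches all n elements.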

3.
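Near-Optimal Scalability: A Practical Scalability Metric   Total citations: 2, self-citations: 0, citations by others: 2
Chen Jun (陈军), Li Xiaomei (李晓梅). 《计算机学报》 (Chinese Journal of Computers), 2001, 24(2): 179-182
Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Earlier scalability models each consider only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the whole. Such models meet the needs of computer researchers, who focus on higher efficiency and utilization, but application scientists care more about short execution time. The near-optimal scalability model proposed in this paper takes both the efficiency and the execution time of a parallel system into account. Analysis of two example algorithms on a typical MPP shows that the model not only describes the scalability of a parallel algorithm but also, when the system is scaled along a suitable scalability curve, yields execution time close to the minimum without low efficiency. This offers guidance for optimally matching algorithms to parallel machines and helps in designing and improving parallel algorithms.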

4.
Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology-specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through switches and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.

5.
  Total citations: 5, self-citations: 0, citations by others: 5
In this paper, we present a general survey of parallel computing. The main topics are parallel computer systems, which are the hardware platform of parallel computing; parallel algorithms, which are its theoretical basis; and parallel programming, which is its software support. We then introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application". Only in this way can parallel computing research develop continuously and become more realistic.

6.
    
Many real-world optimization problems in science and engineering can be solved by genetic algorithms (GAs), but complex problems still require a long execution time. At the same time, there are many under-utilized workstations on the Internet. In this paper, we present a self-adaptive parallel GA system named APGAIN, which utilizes the spare power of heterogeneous workstations on the Internet to solve complex optimization problems. To maintain a balance between exploitation and exploration, we have devised a novel probabilistic rule-driven adaptive model (PRDAM) that adapts the GA parameters automatically. APGAIN is implemented on an Internet computing system called DJM. During the implementation, we found that DJM's original load balancing strategy was insufficient, so we extended the strategy with job migration. The performance of the system is evaluated by solving the traveling salesman problem with data from a public database. Copyright © 2003 John Wiley & Sons, Ltd.

7.
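This paper describes a method for minimizing the communication volume between master-slave pairs in parallel computation on overlapping grids. A maze algorithm classifies the master grid points, reducing master-slave communication to a minimum while keeping the computation correct. For controlling the order of communication under nested overlap, a directed graph of overlap relations is proposed to avoid communication waits and repeated interpolation. Experimental results show that this communication optimization for overlapping grids achieves good parallel efficiency.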

8.
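To meet the needs of large-scale parallel computing on grids built from multiple computer clusters, this paper studies group communication within a single cluster under a multi-level group communication model. It examines the formal relationship among the numbers of active and passive nodes in a single cluster, the capability of each compute node, and the bandwidth of the cluster network; optimizes the communication structure; and describes a communication model based on a capability-optimization mechanism. Theory and experiments show that the model makes full use of the cluster's compute-node capability and network communication capability, and that it is suitable for grid-based parallel computing.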

9.
The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication because all processors need to communicate with all other processors. This can result in severe link contention and degrade performance considerably. Hence, it is necessary to use efficient algorithms in order to get good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. However, on the Delta the number of processors allocated by a user need not be a power of two. We propose algorithms that are even applicable to non-power-of-two meshes on the Delta. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.

10.
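Xu Shun (徐顺), Wang Wu (王武), Zhang Jian (张鉴), Jiang Jinrong (姜金荣), Jin Zhong (金钟), Chi Xuebin (迟学斌). 《软件学报》 (Journal of Software), 2021, 32(8): 2365-2376
Developing high-performance computing (HPC) algorithms and software suited to domestic heterogeneous computing environments is a very important topic, and it matters greatly for keeping China's HPC software development in step with the rapid advance of HPC hardware. The paper first briefly reviews the current state, trends, and challenges of HPC application software, then analyzes the parallel algorithm characteristics of several typical classes of HPC applications, covering cosmological N-body simulation, Earth system models, phase-field dynamics in computational materials science, molecular dynamics, quantum ...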

11.
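This paper proposes a parallel matrix multiplication algorithm, IPBPMM (Interconnected Processor-Based Parallel Matrix Multiplication). The algorithm runs in a cluster parallel computing environment of n independent processors whose topology is a Moore graph of diameter 2 (satisfying n = d^2 + 1, where n is the number of nodes and d is the degree), such as the pentagon, the Petersen graph, or the Hoffman-Singleton graph. Compared with parallel matrix multiplication algorithms such as Cannon's and Fox's, which are based on a 2-D wraparound mesh topology, IPBPMM has lower communication overhead and higher speedup. In addition, its matrix blocks may be distributed randomly among the nodes rather than loaded in advance according to a fixed pattern. IPBPMM also extends well to parallel computing environments composed of several diameter-2 Moore graphs, and its parallel speedup increases as the network grows.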
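
The Moore-graph condition n = d^2 + 1 can be checked on the Petersen graph, one of the topologies named in this abstract. The sketch below is only an illustration and is unrelated to IPBPMM itself. It builds the Petersen graph using the standard construction of an outer 5-cycle, an inner pentagram, and five spokes, then measures its diameter by breadth-first search. The function names are hypothetical.

```python
from collections import deque


def petersen_graph():
    """Adjacency sets of the Petersen graph on vertices 0..9."""
    adj = {v: set() for v in range(10)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer 5-cycle
        link(i, i + 5)                # spoke
        link(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return adj


def diameter(adj):
    """Largest shortest-path distance, assuming the graph is connected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    g = petersen_graph()
    d = len(g[0])
    print(len(g), d, diameter(g), len(g) == d * d + 1)
```

With degree d = 3 the graph has 10 = 3^2 + 1 nodes and diameter 2, so any two processors are at most two hops apart.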

12.
In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. Because our parallel FFT algorithms use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results for 2^p 3^q 5^r-point FFTs on the distributed-memory parallel computers HITACHI SR2201 and IBM SP2 are reported. We achieved about 130 GFLOPS on a 1024-PE HITACHI SR2201 and about 1.25 GFLOPS on a 32-PE IBM SP2.
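
The four-step decomposition mentioned in this abstract can be illustrated with a small sequential sketch. It is not the paper's parallel implementation. It assumes the length factors as n = n1 · n2 and uses a naive O(n^2) DFT for the sub-transforms purely for clarity. The function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, used here for the short sub-transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """DFT of x (length n1 * n2), with input index j = j1 * n2 + j2
    and output index k = k1 + n1 * k2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 transforms of length n1, one per column j2.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Steps 3 and 4: n1 transforms of length n2, written back in natural order.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

In a distributed setting, the regrouping between step 1 and step 3 is a transposition of the data, which is the kind of step that needs all-to-all communication.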

13.
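Developing high-performance computing (HPC) algorithms and software suited to domestic heterogeneous computing environments is a very important topic, and it matters greatly for keeping China's HPC software development in step with the rapid advance of HPC hardware. This paper first briefly reviews the current state, trends, and challenges of HPC application software, then analyzes the parallel algorithm characteristics of several typical classes of HPC applications, covering cosmological N-body simulation, Earth system models, phase-field dynamics in computational materials science, molecular dynamics, quantum computational chemistry, and lattice quantum chromodynamics, which span multiple problems, scales, and fields. Next, it discusses strategies for domestic heterogeneous computing systems and distills problems common to several typical application algorithms and software, involving core algorithms, algorithm development, and optimization strategies. Finally, it summarizes HPC algorithms and software with respect to heterogeneous computing architectures.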

14.
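This paper presents SMT-GA, a genetic algorithm for task mapping and scheduling on heterogeneous computing systems (HCS). It first gives a formal description of the HCS task scheduling problem, then describes in turn the overall framework of SMT-GA, the chromosome design, the method for deriving a schedule from a chromosome, the chromosome fitness function, and the crossover and mutation operators.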

15.
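This paper briefly introduces cluster systems and explains the principles by which they are used for parallel computing. It focuses on the MPI parallel environment and its communication techniques, and analyzes the basic patterns in MPI parallel programs and the communication techniques they use. It closes with an outlook on cluster systems for building MPI parallel environments.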

16.
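Analysis and Measurement of Network Communication Overhead in a Network Parallel Computing Environment   Total citations: 2, self-citations: 0, citations by others: 2
Network communication overhead is an important factor affecting network parallel computing, yet few reports quantify its individual components precisely. Using a timing tool with a resolution of 0.1 microsecond, this paper quantitatively analyzes the performance of IPX/SPX, the network/transport-layer protocol of the NetWare network operating system widely used on Ethernet, and of the NetBIOS emulated session-layer protocol. It examines the main factors in network communication overhead and thereby identifies ways to improve network communication performance.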

17.
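OilCL: A Communication Library for Parallel Computing in Reservoir Numerical Simulation   Total citations: 2, self-citations: 0, citations by others: 2
OilCL is a portable communication library for reservoir numerical simulation. Many communication libraries already exist, such as MPI and PVM, but because they are general-purpose and their interfaces are low-level, they are not well suited to reservoir simulation. OilCL gives reservoir simulation programmers a convenient, natural interface. It supports dynamic creation and release of communication contexts and logical process grids, supports source-based message selection, takes the logical topology as a parameter of collective communication routines, and provides development and run modes. These mechanisms ease the design of reservoir simulation programs and make them more readable.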

18.
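This paper studies all-to-all broadcast routing algorithms on honeycomb tori. The first algorithm is designed around finding a path through all nodes; the key is to identify certain special nodes on the boundary. The second algorithm uses the Hamiltonian property of the honeycomb torus. For a honeycomb torus with n processors, in the former each node has its own dedicated routing strategy and the time complexity is 3n; because computation time is usually much lower than data transfer time, the total communication time can be reduced to n. The latter ...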

19.
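Zhang Yan (张艳), Sun Shixin (孙世新). 《计算机应用》 (Journal of Computer Applications), 2000, 20(10): 29-32
With the emergence of high-speed network technologies such as ATM, networks of workstations (NOW) have become a major platform for parallel processing. Because of their high communication latency, some fine-grained parallel algorithms implemented on parallel machines are no longer suitable for this environment. It is therefore necessary to repartition the tasks of such algorithms and study their parallel implementation in a network environment. On this basis, the paper proposes a new task partitioning strategy for the QR decomposition of matrices and derives a coarse-grained parallel algorithm from it. Experimental results show that the resulting parallel algorithm achieves high speedup in a network parallel computing environment.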

20.
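Ramsey theory is a large and rich area of combinatorics with very important applications in set theory, logic, analysis, and algebra. Computing Ramsey numbers is very difficult; so far the exact values of only 9 Ramsey numbers have been determined. This paper explores the possibility of using DNA biomolecular supercomputing to solve this hard mathematical problem. A DNA computing model that combines the biological operations of the Adleman-Lipton model with the solution space of the sticker model ...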
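
To make the problem concrete, the smallest nontrivial Ramsey number, R(3,3) = 6, can be verified by exhaustive search on an ordinary computer. The sketch below is a plain brute-force check and has nothing to do with the DNA computing model of the paper. It enumerates every 2-coloring of the edges of the complete graph K_n and looks for a monochromatic triangle. The function name is hypothetical.

```python
from itertools import combinations, product


def every_coloring_has_mono_triangle(n):
    """True if every red/blue coloring of the edges of K_n contains
    a triangle whose three edges share one color."""
    edges = list(combinations(range(n), 2))
    index = {edge: i for i, edge in enumerate(edges)}
    triangles = [(index[(a, b)], index[(a, c)], index[(b, c)])
                 for a, b, c in combinations(range(n), 3)]
    for coloring in product((0, 1), repeat=len(edges)):
        if not any(coloring[i] == coloring[j] == coloring[k]
                   for i, j, k in triangles):
            return False  # found a coloring with no monochromatic triangle
    return True


if __name__ == "__main__":
    print(every_coloring_has_mono_triangle(5))
    print(every_coloring_has_mono_triangle(6))
```

K_6 already needs 2^15 colorings, and the search space doubles with every added edge, which is why exact values are known for so few Ramsey numbers.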
