20 similar articles found.
1.
Gu Naijie 《Journal of Computer Science and Technology》 2001, 16(5): 0-0
All-to-all personalized communication is a basic communication operation in parallel computing environments, and many results have appeared in the literature. These algorithms fall into two kinds: direct and indirect communication algorithms. Optimal direct all-to-all communication algorithms on rings and 2-D tori already exist, but for indirect all-to-all communication there is a gap between the time complexity of existing algorithms and the lower bound. This paper presents an efficient indirect all-to-all communication algorithm for rings and 2-D square tori with bidirectional channels. The algorithm is faster than all previous indirect algorithms. The leading terms of its time complexity are $p^2/8$ on rings and $p^{3/2}/8$ on 2-D tori, where p is the number of processors; both reach the theoretical lower bound.
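Where the $p^2/8$ and $p^{3/2}/8$ terms come from can be checked with a simple link-congestion count. This is a sketch under stated assumptions, not the paper's own lower-bound proof: every processor sends one unit message to every other processor, each hop of a message occupies one directed link for one time unit, and all links may be busy at once. Dividing the total hop count by the number of directed links (2p on a bidirectional ring, 4p on a $\sqrt{p} \times \sqrt{p}$ torus) then bounds the time from below. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def ring_distance(a, b, p):
    """Shortest hop count between nodes a and b on a bidirectional ring of p nodes."""
    d = abs(a - b) % p
    return min(d, p - d)


def ring_link_bound(p):
    """Total hops of all-to-all personalized traffic divided by the 2p directed ring links."""
    total_hops = sum(ring_distance(a, b, p) for a in range(p) for b in range(p))
    return Fraction(total_hops, 2 * p)


def torus_link_bound(k):
    """Same count on a k x k torus: p = k*k nodes and 4p directed links."""
    nodes = list(product(range(k), repeat=2))
    total_hops = sum(
        ring_distance(ax, bx, k) + ring_distance(ay, by, k)  # dimension-wise shortest path
        for (ax, ay) in nodes
        for (bx, by) in nodes
    )
    return Fraction(total_hops, 4 * k * k)


if __name__ == "__main__":
    for p in (8, 16):
        print(p, ring_link_bound(p), Fraction(p * p, 8))
    for k in (4, 6):
        print(k * k, torus_link_bound(k), Fraction(k ** 3, 8))
```

For even p (and even k) the counts come out to exactly $p^2/8$ and $k^3/8 = p^{3/2}/8$, matching the leading terms quoted in the abstract.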
2.
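Efficient Implementation and Performance Analysis of the Parallel Bitonic Sort Algorithm  Total citations: 1, self-citations: 0, citations by others: 1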
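Sorting is one of the most common operations in computing. Bitonic sort is a well-known sorting algorithm and one of the earliest parallel sorting algorithms, and it has had a profound influence on research into sorting. Building on the basic idea of bitonic sort, this paper presents an efficient implementation of bitonic sort on distributed-memory parallel computers. By replacing global communication with local many-to-many communication, it effectively solves the communication problem in bitonic sort. The computational complexity of the algorithm is $O\left(\frac{n}{p}(\log n + \log^2 p)\right)$, where n is the number of keys to be sorted and p is the number of processors. On a 2-D mesh, the communication time complexity is $O(2.12132\sqrt{p}\cdot n/p)$, which matches the theoretical lower bound in order of magnitude. The analysis shows that bitonic sort also has good communication performance and scalability.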
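The compare-exchange structure that bitonic sort is built on can be sketched sequentially. This is only the classic sorting network for a power-of-two input, not the paper's distributed implementation: in a distributed version each position would become a block of n/p keys on one processor, and the exchange with partner `i ^ j` would become a merge-split between two processors. The paper's local many-to-many communication scheme is not reproduced here.

```python
def bitonic_sort(keys):
    """Sort a list whose length is a power of two using a bitonic sorting network."""
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:               # compare-exchange distance inside the merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # direction alternates between blocks of size k
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 9, 1, 5, 8, 2, 6]))
```

With p processors there are $\log p(\log p + 1)/2$ such exchange stages across processors, which is where the $\log^2 p$ term in the computational complexity comes from.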
3.
Near-Optimal Scalability: A Practical Scalability Metric  Total citations: 2, self-citations: 0, citations by others: 2
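Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Previous scalability models considered only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the trade-offs as a whole. Such models can satisfy computer researchers, who focus on higher efficiency and utilization, but application scientists care more about short execution time. The near-optimal scalability model proposed in this paper considers both the efficiency and the execution time of a parallel system. Analysis of two example algorithms on a typical MPP shows that the model not only describes the scalability of parallel algorithms, but also that, when a system is scaled along an appropriate scalability curve, execution time stays close to the minimum while efficiency remains reasonably high. This provides guidance for optimally matching algorithms to parallel machines and helps in designing and improving parallel algorithms.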
4.
Ahmad Faraj, Pitch Patarasuk, Xin Yuan 《International Journal of Parallel Programming》 2008, 36(4): 426-453
Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology-specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.
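The abstract does not spell out the algorithms themselves. One standard way to run all-to-all broadcast over only the links of a spanning tree, assumed here and not necessarily the authors' exact construction, is to order the nodes by a depth-first traversal of the tree and run a ring all-gather in that order. The tree paths between consecutive ring nodes then traverse every tree link exactly once in each direction, so all p transfers of a ring step can proceed without link contention. The toy topology `adj` and all function names are hypothetical.

```python
from collections import defaultdict, deque


def bfs_spanning_tree(adj, root):
    """Parent map and children lists of a BFS spanning tree of an undirected graph."""
    parent, children = {root: None}, defaultdict(list)
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                children[u].append(v)
                queue.append(v)
    return parent, children


def dfs_preorder(children, root):
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))  # visit children left to right
    return order


def tree_path(parent, a, b):
    """Directed edges on the unique tree path from a to b."""
    def to_root(x):
        chain = []
        while x is not None:
            chain.append(x)
            x = parent[x]
        return chain
    up, down = to_root(a), to_root(b)
    down_set = set(down)
    lca = next(x for x in up if x in down_set)  # lowest common ancestor
    nodes = up[: up.index(lca) + 1] + down[: down.index(lca)][::-1]
    return list(zip(nodes, nodes[1:]))


def ring_edge_usage(adj, root):
    """DFS ring order and how often each directed tree link is used per ring step."""
    parent, children = bfs_spanning_tree(adj, root)
    ring = dfs_preorder(children, root)
    usage = defaultdict(int)
    for i, a in enumerate(ring):
        for edge in tree_path(parent, a, ring[(i + 1) % len(ring)]):
            usage[edge] += 1
    return ring, dict(usage)


def ring_allgather(ring):
    """In step s, ring[i] forwards the block that originated at ring[i - s]."""
    p = len(ring)
    have = {node: {node} for node in ring}
    for step in range(p - 1):
        for i, node in enumerate(ring):
            have[ring[(i + 1) % p]].add(ring[(i - step) % p])
    return have


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1, 5], 5: [2, 4]}
    ring, usage = ring_edge_usage(adj, 0)
    print(ring, sorted(usage.items()))
```

On this six-node graph the ring order is `[0, 1, 3, 4, 2, 5]`, and each of the 10 directed tree links is used exactly once per step; the non-tree link 4-5 is never used, in line with the abstract's point that extra connectivity beyond a tree does not help.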
5.
In this paper, we present a general survey of parallel computing. The main contents cover parallel computer systems, the hardware platform of parallel computing; parallel algorithms, its theoretical basis; and parallel programming, its software support. We then introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated "architecture-algorithm-programming-application" methodology. Only in this way can parallel computing research develop continuously and become more practical.
6.
Many real-world optimization problems in science and engineering can be solved by genetic algorithms (GAs), but they still require long execution times for complex problems. At the same time, there are many under-utilized workstations on the Internet. In this paper, we present a self-adaptive parallel GA system named APGAIN, which uses the spare power of heterogeneous workstations on the Internet to solve complex optimization problems. To maintain a balance between exploitation and exploration, we have devised a novel probabilistic rule-driven adaptive model (PRDAM) that adapts the GA parameters automatically. APGAIN is implemented on an Internet computing system called DJM. During implementation, we found that DJM's original load-balancing strategy was insufficient, so we extended it with job migration. The performance of the system is evaluated by solving the traveling salesman problem with data from a public database. Copyright © 2003 John Wiley & Sons, Ltd.
7.
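This paper presents a method for minimizing the communication volume between master-slave pairs in parallel overlapping-grid computation. A maze algorithm classifies the master grid points, reducing master-slave communication to a minimum while still guaranteeing correct computation. For controlling the timing of communication in nested overlapping cases, it proposes a directed graph of overlap relations that avoids communication waits and repeated interpolation. Experimental results show that this communication optimization for overlapping grids achieves satisfactory parallel efficiency.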
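The abstract does not describe how the directed graph of overlap relations is used. One plausible reading, sketched here purely as an assumption, is that if the nesting relations form an acyclic graph, a topological order groups the grids into communication stages: a grid exchanges interpolation data only after every grid it depends on has finished, so no grid waits on data that has not been produced and no value is interpolated twice. The grid names and dependencies are hypothetical; Python's standard `graphlib` (3.9+) raises `graphlib.CycleError` if the relations contain a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical nested-overlap relations: grid -> grids it must receive data from first.
overlap_deps = {
    "background": set(),
    "wing": {"background"},
    "flap": {"wing"},
    "nacelle": {"background"},
}


def communication_stages(deps):
    """Group grids into stages; every grid in a stage depends only on earlier stages."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                    # raises graphlib.CycleError on cyclic relations
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(communication_stages(overlap_deps))
```

Grids within one stage are independent of each other and could exchange data concurrently.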
8.
9.
Rajeev Thakur, Ravi Ponnusamy, Alok Choudhary, Geoffrey Fox 《The Journal of Supercomputing》 1995, 8(4): 305-328
The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication because all processors need to communicate with all other processors. This can result in severe link contention and degrade performance considerably. Hence, it is necessary to use efficient algorithms in order to get good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. However, on the Delta the number of processors allocated by a user need not be a power of two. We propose algorithms that are even applicable to non-power-of-two meshes on the Delta. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.
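To see why many complete-exchange algorithms assume a power-of-two processor count, consider the classic pairwise-exchange schedule. It is shown here as a general illustration; the abstract does not say which of the paper's algorithms, if any, matches it. In step s (1 ≤ s < p), processor r swaps blocks with processor r XOR s. Each step is a perfect matching only when p is a power of two, and after p - 1 steps every processor has received exactly one block from every other. Function names are hypothetical.

```python
def pairwise_exchange_schedule(p):
    """Step s (1 <= s < p): processor r exchanges its block with processor r XOR s."""
    if p <= 0 or p & (p - 1):
        raise ValueError("p must be a power of two")
    return [[(r, r ^ s) for r in range(p)] for s in range(1, p)]


def simulate_complete_exchange(p):
    """Return received[d][s]: the block processor d ends up holding from processor s."""
    # buffers[r][d] is the block processor r initially holds for destination d
    buffers = {r: {d: f"{r}->{d}" for d in range(p)} for r in range(p)}
    received = {r: {r: buffers[r][r]} for r in range(p)}  # the block for itself stays local
    for step in pairwise_exchange_schedule(p):
        for src, dst in step:
            received[dst][src] = buffers[src][dst]
    return received


if __name__ == "__main__":
    for s, step in enumerate(pairwise_exchange_schedule(4), start=1):
        print(s, step)
```

For a non-power-of-two p, `r ^ s` can exceed p - 1, which is one reason such machines need different schedules.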
10.
11.
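This paper proposes a parallel matrix multiplication algorithm, IPBPMM (Interconnected Processor-Based Parallel Matrix Multiplication). The algorithm runs on a cluster of n independent processors whose interconnection topology is a Moore graph of diameter 2, such as the pentagon, the Petersen graph, or the Hoffman-Singleton graph (these satisfy $n = d^2 + 1$, where n is the number of nodes and d is the degree). Compared with parallel matrix multiplication algorithms such as Cannon's and Fox's, which are based on a 2-D wraparound mesh (torus) topology, IPBPMM has lower communication overhead and higher speedup. In addition, the matrix blocks can be distributed arbitrarily across the nodes rather than having to be loaded into the nodes in a fixed pattern beforehand. IPBPMM also extends well to parallel environments built by combining several diameter-2 Moore graphs, and its parallel speedup increases as the network grows.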
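The defining property of these topologies, $n = d^2 + 1$ with diameter 2 (so any two processors are at most two hops apart), can be checked directly. This sketch builds the pentagon and the Petersen graph (the latter as the Kneser graph on 2-subsets of a 5-element set, adjacent when disjoint) and measures node count, degree, and diameter by breadth-first search. It verifies the topology only and does not implement IPBPMM; the function names are illustrative.

```python
from collections import deque
from itertools import combinations


def petersen_graph():
    """Petersen graph: 2-subsets of {0..4}, adjacent when disjoint."""
    vertices = [frozenset(c) for c in combinations(range(5), 2)]
    return {v: [u for u in vertices if not (u & v)] for v in vertices}


def eccentricity(adj, src):
    """Largest BFS distance from src (graph assumed connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def moore_check(adj):
    """Return (node count, set of degrees, diameter)."""
    degrees = {len(nbrs) for nbrs in adj.values()}
    diameter = max(eccentricity(adj, v) for v in adj)
    return len(adj), degrees, diameter


if __name__ == "__main__":
    pentagon = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
    print(moore_check(pentagon))          # n = 5 = 2**2 + 1
    print(moore_check(petersen_graph()))  # n = 10 = 3**2 + 1
```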
12.
High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers  Total citations: 3, self-citations: 0, citations by others: 3
In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. Because our parallel FFT algorithms use a cyclic distribution, all-to-all communication takes place only once. Moreover, the input and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of differences in the architecture of the processing elements in distributed-memory parallel computers. Experimental results for $2^p 3^q 5^r$-point FFTs on the distributed-memory parallel computers HITACHI SR2201 and IBM SP2 are reported. We achieved performance of about 130 GFLOPS on a 1024-PE HITACHI SR2201 and about 1.25 GFLOPS on a 32-PE IBM SP2.
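The four-step FFT named in the abstract factors an n-point transform with $n = n_1 n_2$ into short DFTs, a twiddle-factor multiplication, and a transpose. The sketch below is sequential and uses a naive DFT as the inner kernel (an assumption made for brevity; real codes use radix-2, 3 and 5 kernels). On one plausible reading, not confirmed by the abstract, if $n_1$ equals the number of processors, a cyclic distribution places each strided subsequence `x[j1::n1]` on a single processor, so steps 1 and 2 are local and the transpose before step 3 is where the single all-to-all would occur.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT with the convention X[k] = sum_t x[t] * exp(-2*pi*i*t*k/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """Four-step FFT for len(x) == n1 * n2, input and output in natural order."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n1 independent n2-point DFTs over the strided subsequences x[j1::n1].
    y = [dft(x[j1::n1]) for j1 in range(n1)]          # y[j1][k2]
    # Step 2: multiply by twiddle factors exp(-2*pi*i*j1*k2/n).
    for j1 in range(n1):
        for k2 in range(n2):
            y[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent n1-point DFTs across j1 (a transpose in a distributed code).
    z = [dft([y[j1][k2] for j1 in range(n1)]) for k2 in range(n2)]  # z[k2][k1]
    # Step 4: write X[k2 + n2*k1] so the output is in natural order.
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[k2 + n2 * k1] = z[k2][k1]
    return out


if __name__ == "__main__":
    x = [complex(i, (i * 7) % 5) for i in range(12)]  # 12 = 2^2 * 3 points
    err = max(abs(a - b) for a, b in zip(four_step_fft(x, 3, 4), dft(x)))
    print(err)
```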
13.
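Developing high-performance computing (HPC) algorithms and software suited to China's domestically produced heterogeneous computing environments is an important topic, and it is essential if the development of HPC software in China is to keep pace with the rapid, high-level development of its HPC hardware. This paper first briefly introduces the current state, trends, and challenges of HPC application software, and analyzes the parallel algorithm characteristics of several typical classes of HPC applications, covering multiple problems, scales, and domains: cosmological N-body simulation, Earth system models, phase-field dynamics in computational materials science, molecular dynamics, quantum chemistry, and lattice quantum chromodynamics. Next, we discuss strategies for domestic heterogeneous computing systems and distill the common issues shared by several typical application algorithms and software packages, including core algorithms, algorithm development, and optimization strategies. Finally, the paper summarizes HPC algorithms and software for heterogeneous architectures.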
14.
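This paper presents a genetic algorithm, SMT-GA, for task mapping and scheduling on heterogeneous computing systems (HCS). It first gives a formal description of the HCS task scheduling problem, and then describes the overall framework of SMT-GA, the chromosome design, the method for deriving a schedule from a chromosome, the design of the chromosome fitness function, and the design of the crossover and mutation operators.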
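The abstract lists the components of SMT-GA without giving their details. As a generic illustration only, assuming independent tasks, a hypothetical expected-time-to-compute matrix `etc[task][machine]`, and a chromosome that simply stores one machine index per task (none of which is taken from the paper), the pieces fit together like this: decode the chromosome into per-machine loads, score it by makespan, and evolve with one-point crossover, random-reset mutation, and truncation selection.

```python
import random


def makespan(chromosome, etc):
    """Decode a chromosome (machine index per task) and return the latest finish time."""
    finish = [0.0] * len(etc[0])
    for task, machine in enumerate(chromosome):
        finish[machine] += etc[task][machine]
    return max(finish)


def crossover(a, b, rng):
    """One-point crossover producing two children."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chromosome, machines, rate, rng):
    """Reassign each task to a random machine with probability `rate`."""
    return [rng.randrange(machines) if rng.random() < rate else m for m in chromosome]


def evolve(etc, pop_size=20, generations=50, rate=0.1, seed=0):
    rng = random.Random(seed)
    tasks, machines = len(etc), len(etc[0])
    pop = [[rng.randrange(machines) for _ in range(tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: makespan(c, etc))   # lower makespan = fitter
        parents = pop[: pop_size // 2]             # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, machines, rate, rng))
        pop = parents + children[: pop_size - len(parents)]
    return min(pop, key=lambda c: makespan(c, etc))


if __name__ == "__main__":
    etc = [[2, 4], [3, 1], [4, 4], [1, 3]]  # hypothetical 4 tasks x 2 machines
    best = evolve(etc)
    print(best, makespan(best, etc))
```

Because the better half of each generation is carried over unchanged, the best makespan found never gets worse from one generation to the next.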
15.
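This paper briefly introduces cluster systems and explains how they work for parallel computing. It focuses on the MPI parallel environment and its communication techniques, and analyzes the basic patterns of MPI parallel programs and the communication techniques they use. Finally, it offers an outlook on cluster systems built as MPI parallel environments.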
16.
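Analysis and Measurement of Network Communication Overhead in Network Parallel Computing Environments  Total citations: 2, self-citations: 0, citations by others: 2
Network communication overhead is a major factor affecting network parallel computing, yet few reports precisely and quantitatively analyze each component of this overhead. Using a timing tool with a precision of 0.1 µs, this paper quantitatively analyzes the performance of IPX/SPX, the network/transport-layer protocol of the NetWare network operating system widely used on Ethernet, and of the NetBIOS emulated session-layer protocol. It identifies the main factors in network communication overhead and, from them, ways to improve network communication performance.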
17.
18.
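This paper studies all-to-all broadcast routing algorithms on honeycomb tori. The first all-to-all broadcast algorithm is designed around finding a path that passes through all nodes; the key is to identify certain special nodes on the boundary. The second algorithm exploits the Hamiltonian property of the honeycomb torus. For a honeycomb torus with n processors, in the former each node has its own dedicated routing strategy and the time complexity is 3n; because computation time is usually much lower than data transfer time, the total communication time can be reduced to n. The latter...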
19.
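With the emergence of high-speed network technologies such as ATM, networks of workstations (NOW) have become a major platform for parallel processing. Because of their high communication latency, some fine-grained parallel algorithms implemented on parallel machines are not suitable for this environment, so their tasks must be repartitioned and their parallel implementation in a network environment studied anew. On this basis, this paper proposes a new task-partitioning strategy for the QR decomposition of matrices and derives from it a coarse-grained parallel algorithm. Experimental results show that the algorithm achieves high speedup in a network parallel computing environment.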
20.
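Ramsey theory is a vast and rich area of combinatorics with very important applications in set theory, logic, analysis, and algebra. Determining Ramsey numbers is extremely difficult; so far the exact values of only nine Ramsey numbers have been found. This paper explores the possibility of using DNA biomolecular supercomputing to solve this difficult mathematical problem. A DNA computing model that combines the biological operations of the Adleman-Lipton model with the solution space of the sticker model...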
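The difficulty mentioned above comes from the size of the search space: deciding a Ramsey number means examining every 2-coloring of the edges of a complete graph, and that exhaustive space is what DNA models aim to explore in parallel. The sketch below is a conventional brute-force check, not a DNA model, of the smallest nontrivial case, R(3,3) = 6. Some coloring of $K_5$ (1,024 colorings) avoids a monochromatic triangle, while every one of the 32,768 colorings of $K_6$ contains one. Function names are illustrative.

```python
from itertools import combinations, product


def has_monochromatic_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to color 0 or 1."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_coloring_forces_triangle(n):
    """True if every 2-coloring of the edges of K_n has a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):  # 2 ** (n*(n-1)/2) colorings
        if not has_monochromatic_triangle(n, dict(zip(edges, colors))):
            return False
    return True


if __name__ == "__main__":
    print(every_coloring_forces_triangle(5), every_coloring_forces_triangle(6))
```

The number of colorings grows as $2^{n(n-1)/2}$, which is why this direct approach is hopeless for the larger open cases.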