1.

Total citations: 2, self-citations: 0, citations by others: 2

顾乃杰《计算机科学技术学报》2001,16(5):0-0

All-to-all personalized communication is a basic communication operation in parallel computing environments, and many results have appeared in the literature. The existing algorithms fall into two kinds: direct communication algorithms and indirect communication algorithms. Optimal direct all-to-all communication algorithms on rings and 2-D tori exist. For indirect all-to-all communication algorithms, however, there is a gap between the time complexity of existing algorithms and the lower bound. This paper presents an efficient indirect algorithm for all-to-all communication on rings and 2-D square tori with bidirectional channels. The algorithm is faster than any previous indirect algorithm. The leading terms of its time complexity are p^2/8 on rings and p^(3/2)/8 on 2-D tori, both reaching the theoretical lower bound, where p is the number of processors.

2.

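Efficient Implementation and Performance Analysis of a Parallel Bitonic Sorting Algorithm Total citations: 1, self-citations: 0, citations by others: 1

顾乃杰 王旭 陈国良 蒋凡 《计算机研究与发展》2002,39(10):1343-1348

Sorting is one of the most common operations in computing. Bitonic sort is a well-known sorting algorithm and one of the earliest parallel sorting algorithms, and it has had a far-reaching influence on research into sorting algorithms. Building on the basic idea of bitonic sort, this paper presents an efficient implementation for distributed-memory parallel computers. Replacing global communication with local many-to-many communication resolves the communication problem in bitonic sort well. The computational complexity of the algorithm is Θ((n/p)(log n + log^2 p)), where n is the number of keys to be sorted and p is the number of processors. On a 2-D mesh, the communication time complexity is O(2.12132 √p · n/p), whose order matches the theoretical lower bound. The analysis shows that bitonic sort also has good communication performance and scalability.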
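As a reminder of the basic idea the abstract builds on, the following is a minimal sequential sketch of a bitonic sorting network. It is not the distributed implementation described in the paper; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. For n = 2^m keys the network has m(m+1)/2 compare-exchange stages, which is where a log-squared term in the cost comes from.

```python
def bitonic_sort(keys, ascending=True):
    """Sort a list whose length is a power of two with a bitonic network.

    Illustrative sketch only: sequential, in-memory, not the paper's
    distributed-memory algorithm.
    """
    n = len(keys)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(keys)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare-exchange distance in this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks of size k alternate direction; the final
                    # merge (k == n) uses the requested direction.
                    up = ((i & k) == 0) == ascending
                    if (a[i] > a[partner]) == up:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

In a parallel setting, each value of `j` corresponds to one round in which element `i` exchanges data with element `i ^ j`, so the communication pattern is fixed in advance and independent of the data.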

3.

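Near-Optimal Scalability: A Practical Scalability Metric Total citations: 2, self-citations: 0, citations by others: 2

陈军 李晓梅 《计算机学报》2001,24(2):179-182

Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Previous scalability models each considered only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the trade-offs as a whole. Such models can satisfy computer researchers, who focus on higher efficiency and utilization, but application scientists place more emphasis on short execution time. The near-optimal scalability model proposed in this paper considers both the efficiency and the execution time of a parallel system. Analysis of two algorithm examples on a typical MPP shows that the model not only describes the scalability of parallel algorithms but also, when the system is scaled along an appropriate scalability curve, keeps execution time close to the minimum without low efficiency. This offers guidance for optimally matching algorithms to parallel machines and is also useful for designing and improving parallel algorithms.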

4.

Bandwidth Efficient All-to-All Broadcast on Switched Clusters

Ahmad Faraj Pitch Patarasuk Xin Yuan 《International journal of parallel programming》2008,36(4):426-453

Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have been used in such systems. The flexibility poses challenges for developing efficient collective communication algorithms since the network topology can potentially have a strong impact on the communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that near-optimal all-to-all broadcast on a cluster with any topology can be achieved by only using the links in a spanning tree of the topology when the message size is sufficiently large. The result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology specific algorithm is used. All-to-all broadcast algorithms that achieve near-optimal performance are developed for clusters with cut-through and clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations. The empirical results confirm our theoretical finding.

5.

Total citations: 5, self-citations: 0, citations by others: 5

Guo-Liang Chen Guang-Zhong Sun Yun-Quan Zhang and Ze-Yao Mo 《计算机科学技术学报》2006,21(5):665-673

In this paper, we present a general survey of parallel computing. The main contents include parallel computer systems, which are the hardware platform of parallel computing; parallel algorithms, which are its theoretical base; and parallel programming, which is its software support. We then introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application". Only in this way can parallel computing research develop continuously and become more realistic.

6.

Yuk‐Yin Wong Kin‐Hong Lee Kwong‐Sak Leung 《Concurrency and Computation》2003,15(6):581-606

Many real‐world optimization problems in the scientific and engineering fields can be solved by genetic algorithms (GAs) but it still requires a long execution time for complex problems. At the same time, there are many under‐utilized workstations on the Internet. In this paper, we present a self‐adaptive parallel GA system named APGAIN, which utilizes the spare power of the heterogeneous workstations on the Internet to solve complex optimization problems. In order to maintain a balance between exploitation and exploration, we have devised a novel probabilistic rule‐driven adaptive model (PRDAM) to adapt the GA parameters automatically. APGAIN is implemented on an Internet Computing system called DJM. In the implementation, we discover that DJM's original load balancing strategy is insufficient. Hence the strategy is extended with the job migration capability. The performance of the system is evaluated by solving the traveling salesman problem with data from a public database. Copyright © 2003 John Wiley & Sons, Ltd.

7.

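Research on Communication Optimization for Parallel CFD Computation on Overset Grids

刘鑫 陆林生 《计算机工程与设计》2006,27(24):4611-4614

A method is presented for minimizing the communication volume between master-slave pairs in parallel overset-grid computation. A maze algorithm classifies the master grid points, reducing master-slave communication to the minimum while preserving computational correctness. For controlling the order of communication in nested overlapping cases, a directed graph of overlap relations is proposed to avoid communication waits and repeated interpolation. Experimental results show that this communication optimization method for overset grids achieves fairly good parallel efficiency.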

8.

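A Communication Model Based on an Optimization Mechanism in Data Grids

涂占乐 陈庆奎 席与亨 《微计算机信息》2006,22(21):55-57

To meet the needs of large-scale parallel computing on grids built from multiple computer clusters, this paper studies grouped communication within a single cluster under a multi-level grouped communication model. It examines the formal relationship among the numbers of active and passive nodes in a single cluster, the capability of each compute node, and the bandwidth of the cluster network; optimizes the communication structure; and describes a communication model based on a capability optimization mechanism. Theory and experiments show that the model makes full use of the cluster's node computing capability and network communication capability, and that it is suitable for grid-based parallel computing.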

9.

Complete exchange on the CM-5 and Touchstone Delta

Rajeev Thakur Ravi Ponnusamy Alok Choudhary Geoffrey Fox 《The Journal of supercomputing》1995,8(4):305-328

The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication because all processors need to communicate with all other processors. This can result in severe link contention and degrade performance considerably. Hence, it is necessary to use efficient algorithms in order to get good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two. However, on the Delta the number of processors allocated by a user need not be a power of two. We propose algorithms that are even applicable to non-power-of-two meshes on the Delta. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.

10.

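High-Performance Computing Algorithms and Software for Heterogeneous Computing

徐顺 王武 张鉴 姜金荣 金钟 迟学斌 《软件学报》2021,32(8):2365-2376

Developing high-performance computing (HPC) algorithms and software adapted to domestic heterogeneous computing environments is a very important topic, and it matters greatly for enabling China's HPC software development to keep pace with the rapid development of HPC hardware. First, the paper briefly introduces the current state, trends, and challenges of HPC application software, and analyzes the parallel algorithm characteristics of several typical classes of HPC application software, covering cosmological N-body simulation, Earth system models, phase-field dynamics in computational materials, molecular dynamics, quantum...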

11.

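A Parallel Matrix Multiplication Algorithm Based on Moore Graph Networks of Diameter 2

张冰《计算机学报》2013,36(9)

This paper proposes a parallel matrix multiplication algorithm, IPBPMM (Interconnected Processor-Based Parallel Matrix Multiplication). The algorithm runs in a cluster parallel computing environment of n independent processors whose topology is a Moore graph of diameter 2 (satisfying n = d^2 + 1, where n is the number of nodes and d is the degree), such as the pentagon, the Petersen graph, and the Hoffman-Singleton graph. Compared with parallel matrix multiplication algorithms such as Cannon and Fox, which are based on a 2-D wraparound mesh topology, IPBPMM has lower communication overhead and higher speedup. In addition, matrix blocks can be distributed randomly among the nodes and need not be loaded into them in advance according to a fixed pattern. IPBPMM also extends well to parallel computing environments composed of several diameter-2 Moore graph topologies, and its parallel speedup increases as the network grows.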
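The relation n = d^2 + 1 quoted in the abstract can be checked on a small case. The sketch below is not from the paper and does not implement IPBPMM; it builds the Petersen graph with a standard construction (vertices are the 2-element subsets of a 5-element set, adjacent when disjoint) and verifies the degree, node count, and diameter by breadth-first search. The helper names `petersen_graph`, `diameter`, and `moore_bound` are hypothetical.

```python
from collections import deque
from itertools import combinations


def petersen_graph():
    """Adjacency lists of the Petersen graph (Kneser-style construction)."""
    verts = list(combinations(range(5), 2))
    return {u: [v for v in verts if not set(u) & set(v)] for u in verts}


def diameter(adj):
    """Largest shortest-path distance, assuming the graph is connected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def moore_bound(d):
    """Node count of a diameter-2 Moore graph of degree d: n = d^2 + 1."""
    return d * d + 1


if __name__ == "__main__":
    g = petersen_graph()
    degrees = {len(nbrs) for nbrs in g.values()}
    print(len(g), degrees, diameter(g), moore_bound(3))
```

The same bound gives 5 nodes for degree 2 (the pentagon) and 50 nodes for degree 7 (the Hoffman-Singleton graph), matching the three topologies named in the abstract.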

12.

High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers Total citations: 3, self-citations: 0, citations by others: 3

Takahashi Daisuke Kanada Yasumasa 《The Journal of supercomputing》2000,15(2):207-228

In this paper, we propose high-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. We use the four-step or six-step FFT algorithms to implement the radix-2, 3 and 5 parallel 1-D complex FFT algorithms. In our parallel FFT algorithms, since we use cyclic distribution, all-to-all communication takes place only once. Moreover, the input data and output data are both in natural order. We also show that the suitability of a parallel FFT algorithm is machine-dependent because of the differences in the architecture of the processor elements in distributed-memory parallel computers. Experimental results for 2^p 3^q 5^r-point FFTs on two distributed-memory parallel computers, the HITACHI SR2201 and the IBM SP2, are reported. We obtained performance of about 130 GFLOPS on a 1024PE HITACHI SR2201 and about 1.25 GFLOPS on a 32PE IBM SP2.

13.

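High-Performance Computing Algorithms and Software for Heterogeneous Computing

徐顺 王武 张鉴 姜金荣 金钟 迟学斌 《软件学报》2020,31(7)

Developing high-performance computing (HPC) algorithms and software adapted to domestic heterogeneous computing environments is a very important topic, and it matters greatly for enabling China's HPC software development to keep pace with the rapid development of HPC hardware. This paper first briefly introduces the current state, trends, and challenges of HPC application software, and analyzes the parallel algorithm characteristics of several typical classes of HPC application software, covering cosmological N-body simulation, Earth system models, phase-field dynamics in computational materials, molecular dynamics, quantum computational chemistry, and lattice quantum chromodynamics, which span multiple problems, scales, and fields. Second, we discuss strategies for domestic heterogeneous computing systems and distill the common issues of several typical application algorithms and software, involving core algorithms, algorithm development, and optimization strategies. Finally, the paper summarizes HPC algorithms and software for heterogeneous computing architectures.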

14.

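SMT-GA: A Task Scheduling Algorithm for Heterogeneous Computing Systems

陆鑫达 郑飞 《小型微型计算机系统》1999,20(4):241-245

A genetic algorithm, SMT-GA, is given for task mapping and scheduling in heterogeneous computing systems (HCS). The HCS task scheduling problem is first described formally. The paper then presents the overall framework of SMT-GA, the chromosome design, the method for deriving a schedule from a chromosome, the design of the chromosome fitness function, and the design of the crossover and mutation operators.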

15.

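A Brief Analysis of MPI Cluster Communication Technology

CHEN Yan HAO Li-rui 《数字社区&智能家居》2008,(23)

This paper briefly introduces cluster systems and explains how they are used for parallel computing. It focuses on the MPI parallel environment and its communication techniques, and analyzes the basic patterns in MPI parallel programs and the communication techniques they use. Finally, it discusses the prospects for cluster systems built around an MPI parallel environment.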

16.

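Analysis and Measurement of Network Communication Overhead in Network Parallel Computing Environments Total citations: 2, self-citations: 0, citations by others: 2

杜毅 李三立 《小型微型计算机系统》1996,17(11):6-14

Network communication overhead is an important factor affecting network parallel computing, but few reports give a precise quantitative analysis of its individual components. Using a timing tool with a precision of up to 0.1 microseconds, this paper quantitatively analyzes the performance of the network/transport-layer protocol IPX/SPX and the NetBIOS emulation session-layer protocol of the NetWare network operating system, which is widely used on Ethernet. It examines the main factors in network communication overhead and thereby identifies ways to improve network communication performance.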

17.

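OilCL: A Communication Library for Parallel Computing in Numerical Reservoir Simulation Total citations: 2, self-citations: 0, citations by others: 2

熊玉庆 曹建文 张祥 《计算机学报》2000,23(7):744-749

OilCL is a portable communication library for numerical reservoir simulation. Many communication libraries already exist, such as MPI and PVM, but their generality and relatively low-level interfaces make them ill-suited to numerical reservoir simulation. OilCL gives programmers of reservoir simulation codes a convenient, natural interface. It supports dynamic creation and release of communication contexts and logical process grids, supports source-based message selection, takes the logical topology as a parameter of group communication routines, and provides development and run modes. These mechanisms ease the design of reservoir simulation programs and make them more readable.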

18.

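All-to-All Broadcast Algorithms on the Honeycomb Torus

殷玉玲《计算机应用研究》2011,28(7):2492-2493

This paper studies all-to-all broadcast routing algorithms on the honeycomb torus. The first algorithm is designed by finding a path through all nodes, where the key step is to identify certain special nodes on the boundary. The second algorithm exploits the Hamiltonian property of the honeycomb torus. For a honeycomb torus with n processors, in the former each node has its own dedicated routing strategy and the time complexity is 3n; because computation time is usually much lower than data transfer time, the total communication time can be reduced to n. The latter...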

19.

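A Parallel Algorithm for Matrix QR Decomposition in Network Parallel Computing

张艳 孙世新 《计算机应用》2000,20(10):29-32

With the emergence of high-speed network technologies such as ATM, network parallel computing systems (NOW) have become a major platform for parallel processing. Because of their high communication latency, some fine-grained parallel algorithms implemented on parallel machines are no longer suitable for this environment. It is therefore necessary to repartition the tasks of such algorithms and study their parallel implementation in a network environment. On this basis, the paper proposes a new task partitioning strategy for matrix QR decomposition and derives a coarse-grained parallel algorithm from it. Experimental results show that the algorithm achieves a high speedup in a network parallel computing environment.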

20.

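A DNA Computer Algorithm for Computing Ramsey Numbers

李肯立 郭里 唐卓 江勇 李仁发 《计算机研究与发展》2011,48(3):447-454

Ramsey theory is a vast and rich area of combinatorics with very important applications in set theory, logic, analysis, and algebra. Computing Ramsey numbers is very difficult; to date the exact values of only 9 Ramsey numbers have been found. This paper explores the possibility of using DNA biomolecular supercomputing to solve this hard mathematical problem. A DNA computing model that combines the biological operations of the Adleman-Lipton model with the solution space of the sticker model...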