Similar literature (19 results)
1.
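Massively parallel processing (MPP) emphasizes the scalability of both parallel architectures and parallel algorithms. On a scalable parallel architecture, a scalable parallel algorithm should make effective use of a growing number of processors, and its effectiveness is usually measured by the processor efficiency achieved at run time. One commonly overlooked factor is communication efficiency, which is a general issue. This paper defines communication efficiency, studies its relationship to processor efficiency, and, by analyzing the execution of a typical algorithm, examines the communication efficiency of several common parallel architectures. The results show that only processor efficiency and communication efficiency taken together can fully evaluate the scalability of an algorithm and guide the design of parallel architectures.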

2.
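Wang Huajun. 《福建电脑》 (Fujian Computer), 2006, (10): 48, 79
Parallel processing is the key technology of parallel computers. It covers parallel architectures, parallel algorithms, parallel operating systems, and parallel languages and their compilers, of which parallel algorithm design is the most fundamental and important. This paper discusses one of the three approaches to parallel algorithm design: "parallel algorithm design starting from the object itself".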

3.
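Using the LogP model [1], the block-parallel algorithm for FIR filters is analyzed in detail. The effect of block length on the efficiency of the block-parallel algorithm in a networked parallel environment is discussed, a transmission mode that sends data blocks ahead of time is proposed, the parallel efficiency under this mode is derived, and tests are carried out in a networked parallel environment.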

4.
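Based on the divide-and-conquer method for the symmetric tridiagonal eigenproblem, a multilevel hybrid parallel algorithm suited to SMP cluster environments is proposed. Parallel solution within an SMP node uses both coarse-grained and fine-grained OpenMP parallelism. To reduce the load imbalance of the pure MPI algorithm, the hybrid algorithm uses dynamic task assignment. Experiments on the DeepComp 6800 (深腾6800) show that the hybrid parallel algorithm has good scalability and speedup. Keywords: SMP cluster; MPI+OpenMP; hybrid parallelism; parallel solver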

5.
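Collective Communication in MPI-Based Parallel Computing and Its Application   Total citations: 4 (self-citations: 0, citations by others: 4)
The serial LSQR algorithm, which solves large sparse matrix equations effectively, is analyzed for parallelization. Using the collective communication mechanism of the portable message-passing standard MPI, a parallel LSQR algorithm is designed and implemented on a distributed-memory parallel system. The parallel algorithm and program have been applied effectively to tomographic inversion of near-surface seismic models.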

6.
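With the advent of parallel computers, research on parallel algorithms has become imperative. Work in this field currently falls roughly into four categories: research on algorithmic ideas, such as synchronous and asynchronous parallelism; construction of parallel algorithms, which appears in every branch of scientific computing, with basic algebraic operations being of fundamental importance; modification of existing algorithms to give them a higher ...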

7.
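This paper proposes a parallel conjugate gradient algorithm for large-scale sparse matrix eigenproblems. To improve parallel efficiency, a load-balanced row partitioning is designed, a sparse matrix reordering method that overlaps computation with communication is implemented, and preprocessing reduces the volume of messages passed between processes during computation. In addition, to exploit high-performance parallel computing on multicore processors, a hybrid parallel algorithm combining MPI with fine-grained (thread-level) OpenMP is implemented. Tests on the DeepComp 7800 (深腾7800) parallel computer show that communication time remains stable as the number of processes grows, and that the algorithm scales well and is suitable for large-scale sparse eigenproblems.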

8.
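Wang Xingwei, Liu Cong, Cui Jianye, Huang Min. 《计算机应用》 (Journal of Computer Applications), 2005, 25(9): 2094-2097
QoS requirements are expressed as intervals, which supports flexible and heterogeneous QoS. Drawing on microeconomic theory and methods, a pricing strategy based on the Kelly/PSP model is established to provide inter-group fairness. Costs are shared among group members by splitting downstream link costs equally, which provides intra-group fairness. A parallel intelligent QoS multicast routing algorithm is built on a parallelized firing-coupled neural network; it fully exploits the network's inherent parallelism and scales well with both network size and problem size. Combined, these elements form a parallel, fair, intelligent QoS multicast routing mechanism for the IP/DWDM optical Internet. Simulation results show that the mechanism is feasible and effective and that its time efficiency is better than that of the corresponding serial algorithm.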

9.
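Wei Qiong. 《程序员》 (Programmer), 2008, (8): 90-92
This article analyzes and describes how to implement a parallel matrix inversion algorithm on Cell to speed up matrix inversion. The parallel matrix inversion algorithm described is also applicable to other multicore parallel processors.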

10.
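Building on the restarted Arnoldi parallel algorithm that uses the data-parallel model, a refined restarted Arnoldi parallel algorithm is proposed. To reduce the negative impact of weak scalability on parallel performance, the algorithm computes the refined vectors in parallel using a task-graph model, which reduces the number of communications between processor processes and achieves effective parallel computation. Tests on the KD-50-I teraflops machine show that the algorithm has good scalability and parallel efficiency.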

11.
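Scalability of Parallel Algorithms Combined with Parallel Machines   Total citations: 6 (self-citations: 1, citations by others: 5)
Scalability is an important issue in the design of parallel algorithms and high-performance parallel machines. This paper first analyzes two scalability evaluation criteria, isoefficiency and isospeed, and points out their strengths and weaknesses. Then, based on an analysis of parallel computing time, it proposes a new scalability criterion (the equal parallel-overhead-to-computation ratio criterion), which can be used to evaluate the scalability of a parallel algorithm combined with a parallel machine. Finally, the criterion is used to analyze the scalability of two parallel algorithms combined with the YH03 high-performance parallel machine.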

12.
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.

13.
This paper presents the analysis of a parallel formulation of depth-first search. At the heart of this parallel formulation is a dynamic work-distribution scheme that divides the work between different processors. The effectiveness of the parallel formulation is strongly influenced by the work-distribution scheme and the target architecture. We introduce the concept of isoefficiency function to characterize the effectiveness of different architectures and work-distribution schemes. Many researchers considered the ring architecture to be quite suitable for parallel depth-first search. Our analytical and experimental results show that hypercube and shared-memory architectures are significantly better. The analysis of previously known work-distribution schemes motivated the design of substantially improved schemes for ring and shared-memory architectures. In particular, we present a work-distribution algorithm that guarantees close to optimal performance on a shared-memory/ω-network-with-message-combining architecture (e.g., RP3). Much of the analysis presented in this paper is applicable to other parallel algorithms in which work is dynamically shared between different processors (e.g., parallel divide-and-conquer algorithms). The concept of isoefficiency is useful in characterizing the scalability of a variety of parallel algorithms. This work was supported by Army Research Office Grant No. DAAG29-84-K-0060 to the Artificial Intelligence Laboratory, and Office of Naval Research Grant N00014-86-K-0763 to the Computer Science Department at the University of Texas at Austin.

14.
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Lang et al. (1) and Schnorr et al. (2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-per-processor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort.

15.
This paper presents a new formulation of the isoefficiency function that allows the scalability of parallel systems to be analyzed under either balanced or unbalanced workloads. The validity of this new metric is evaluated using some synthetic benchmarks. The experimental results show the importance of considering unbalanced workloads when analyzing the scalability of parallel systems.

16.
The authors present the scalability analysis of a parallel fast Fourier transform (FFT) algorithm on mesh and hypercube connected multicomputers using the isoefficiency metric. The isoefficiency function of an algorithm-architecture combination is defined as the rate at which the problem size should grow with the number of processors to maintain a fixed efficiency. It is shown that it is more cost-effective to implement the FFT algorithm on a hypercube rather than a mesh, despite the fact that large-scale meshes are cheaper to construct than large hypercubes. Although the scope of this work is limited to the Cooley-Tukey FFT algorithm on a few classes of architectures, the methodology can be used to study the performance of various FFT algorithms on a variety of architectures such as SIMD hypercube and mesh architectures and shared-memory architectures.
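The isoefficiency definition in this abstract can be illustrated with a small numerical sketch. It assumes the standard textbook relation E = W / (W + T_o(W, p)), where W is the problem size in unit operations and T_o is the total parallel overhead. The overhead model `toy_overhead` is a hypothetical example chosen for illustration; it is not the FFT overhead derived in the paper.

```python
import math


def efficiency(work, p, overhead):
    """Parallel efficiency E = W / (W + T_o(W, p))."""
    return work / (work + overhead(work, p))


def isoefficiency_work(p, target, overhead):
    """Approximate the smallest problem size W that reaches `target`
    efficiency on p processors.

    Assumes efficiency is non-decreasing in W, which holds when the
    overhead grows more slowly than W itself.
    """
    hi = 1.0
    # Grow the upper bracket until the target efficiency is reached.
    while efficiency(hi, p, overhead) < target:
        hi *= 2.0
    lo = 0.0
    # Bisection on W between an infeasible and a feasible size.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if efficiency(mid, p, overhead) < target:
            lo = mid
        else:
            hi = mid
    return hi


def toy_overhead(work, p):
    # Hypothetical overhead model (an assumption, not from the paper):
    # T_o = 2 * p * log2(p), independent of the problem size.
    return 2.0 * p * math.log2(p)


if __name__ == "__main__":
    for p in (4, 16, 64, 256):
        w = isoefficiency_work(p, 0.8, toy_overhead)
        print(f"p={p:4d}  W needed for E=0.8: {w:.1f}")
```

Under this toy model the required W grows as Θ(p log p); a more slowly growing isoefficiency function indicates a more scalable algorithm-architecture combination.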

17.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy possesses certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log² P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.
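The efficiency figure quoted in this abstract follows from the usual definition of efficiency as speedup divided by processor count. A minimal check, using only the two numbers reported in the abstract (the function name is an illustrative choice):

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = speedup / number of processors."""
    return speedup / processors


if __name__ == "__main__":
    # Speedup of about 985 on 1024 processors, as reported in the abstract.
    e = parallel_efficiency(985, 1024)
    print(round(e, 2))  # 0.96
```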

18.
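As supercomputers grow ever larger, the scalability of parallel algorithms combined with parallel machines is receiving increasing attention, and research on the scalability of real applications is especially urgent. The development of new parallel machines has itself become a grand challenge for scientific computing. Numerical methods able to solve "grand challenge problems", and application software with high parallelism and good scalability, are still lacking. A key issue in large-scale parallel computing is scalability [1]. One cannot expect a large performance gain simply from porting serial code to a parallel system: once the number of processor nodes exceeds 64, 16, or even 8, that approach degrades scalability. China is currently still limited to small and medium-scale computing, and whether existing algorithms and parallel software can solve larger problems deserves attention.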

19.
This paper presents a new expression for an isoefficiency function which can be applied both to homogeneous and heterogeneous systems. Using this new function, called H-isoefficiency, it is now possible to analyze the scalability of heterogeneous clusters. In order to show how this new metric can be used, a theoretical a priori analysis of the scalability of a Gaussian elimination algorithm is presented, together with a model evaluation which demonstrates the correlation between the theoretical analysis and the experimental results.
