首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirement are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a larger number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measure values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p +6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.  相似文献   


2.
随着多处理器的出现,并行技术受到了广泛的关注,成为了加速处理问题速度的重要技术.但是使用并行技术在加速计算的同时也带来了对处理器数量需求的急剧提升,并行成本的显著增加.针对这一问题,通过研究基于PRAM (Parallel Random Access Machine)下的3种最大值查找并行算法中的不足,提出了一种比平衡树算法,快速查找法,双对数深度树方法并行成本(cost)更优的基于数据划分方法的最大值查找并行算法.基于数据划分方法的最大值查找算法有效的解决了现有并行方法中处理器工作量分配不均,对处理器需求过大,实现条件苛刻等问题.为此后类似并行算法降低并行成本提供一个方向.  相似文献   

3.
一类Toeplitz循环三对角方程组的一种分布式并行算法   总被引:4,自引:1,他引:3  
提出一类Toeplitz循环三对方程组的一种分布式并行算法,在求解由一阶线性双曲型方程(如迁移方程)在一定边界条件下导出的隐式差分方程组时,要重复地求解此类Toeplitz循环三对角方程组。算法基于对系数矩阵的分解,贯彻并行算法设计中“分而治之”的原则,充分利用了系数矩阵结构的特殊性。算法实现中通过秦九韶公式的运用,避免了不必要的冗余计算;理论分析和数值试验表明,算法是数值稳定的,且当方程组规模充分大时,该算法加速比趋近线性加速比的理想情况。给出了算法在某分布存储多计算机系统上的数值试验结果。  相似文献   

4.
《国际计算机数学杂志》2012,89(1-4):153-158
Motivated by the recursive partitioning algorithm of Evans [2], we present a new algorithm for inverting tridiagonal matrices. Our derivation of the algorithm is different but elementary. The present algorithm has potential for its vector and parallel implementation.  相似文献   

5.
利用近似三对角Toeplitz矩阵的特殊结构,提出了一种新的求解近似三对角Toeplitz方程组的快速算法.在三对角Toeplitz矩阵的近似LU分解的基础上,利用“分而治之”的思想,并结合秦九韶技术和特殊的数学技巧减少大量的冗余计算,提出了求解近似Toeplitz三对角方程组的快速分布式并行算法,并在理论上证明了算法具有近似于线性的加速比.最后通过数值实验证明,新的并行算法具有较高的并行效率,并且当矩阵阶数n足够大时,算法的加速比趋近于线性加速比.  相似文献   

6.
《国际计算机数学杂志》2012,89(3-4):435-440
This paper presents a parallel algorithm for solving the implicit diffusion difference equations. The basic idea is based on vectorization of the tridiagonal Toeplitz difference equations. This method is superior to the algorithm showed by H. Stone [8]. We computed some examples on an NEC SX-3/44R supercomputer by our method. The results showed a good parallelism with this algorithm.  相似文献   

7.
An efficient parallel algorithm for merging two sorted lists is presented. The algorithm is based on a novel partitioning algorithm that splits the two lists among the processors, in a way that ensures load balance during the merge. The partitioning algorithm can itself be efficiently parallelized, allowing the solution to scale with increased numbers of processors. A shared memory multiprocessor is assumed. The time complexity for partitioning and merging is O(N/p + log N), where p is the number of processors and N is the total number of elements in the two lists. Implementation results on a twenty node Sequent Symmetry multiprocessor are also presented.  相似文献   

8.
三对角线性方程组的一种有效并行算法   总被引:8,自引:0,他引:8  
本文提出一种求解严格对角占优的三对角线性方程组的并行算法(简称PPD算法),新算法计算复杂性约为8n,与最优串行算法追赶法的计算复杂性相同,通信复杂性为常数.目前求解此类方程组的最优并行算法的计算复杂性约为17n,通信复杂性约为logP,相对而言PPD算法的计算性能和通信性能都有大幅度提高.试算结果表明,加速比呈线性增加,并行效率达到90%以上.  相似文献   

9.
一类Toeplitz三对角方程组的有效分布式并行算法   总被引:1,自引:0,他引:1  
针对大型方程组的特点,本文提出了一种求解一类Toeplitz三对角方程组的分布式并行算法.该算法首先并行求出原Toeplitz三对角方程组的近似解,然后在给定的误差范围内对近似解进行修正,该算法的通信机制简单、冗余计算量少.数值试验表明该算法具有较高的并行效率.  相似文献   

10.
基于递归耦合方法的三对角线性方程组分布式并行算法   总被引:1,自引:2,他引:1  
方蓉  赵瑛 《计算机工程与设计》2006,27(4):670-671,687
提出了一种在分布式计算机上用递归倍增方法解三对角线性方程组的并行算法。通过研究算法中的额外开销达到优化标量算法的执行和通讯,并减少了存储开销。当三对角线性方程组的系数矩阵满足对角占优时,该算法在运行过程中不会中断。最后,在采用消息传递编程模型的基于局域网MPI并行环境下对算法进行了评价。数值实验结果表明,该算法是高效的。  相似文献   

11.
51.引言 随着科学技术的发展,对大规模科学计算提出的需求越来越高.一是求解问题的规模越来越大,例如,三维油正模拟、大气和海洋之间的相互作用和核安全分析等都要求解超大规模的非线性方程组(未知数个数高达106~108).另一方面是实时性要求越来越迫切,电力系统安全分析、气象预报等方面提出的实时性需求是最好的铭证. 传统的单机串行式地解决问题的方法已经无法满足客观需求,因此各种形式的向量化和并行(乃至并行十向量化)算法的研究受到普遍的重视. 无论用什么方法求解非线性偏微分方程(组);最终都导致成千上万…  相似文献   

12.
1.引 言 循环三对角线性方程组的求解是诸多应用问题的重要组成部分.例如,周期的样条插值就导致对角占优的循环三对角线性方程组的求解[1],当边界条件为周期边界条件时,一些偏微分方程的离散化也可能导致循环三对角线性方程组的求解.适应计算机体系结构发展的此类方程组的算法研究,是数值并行算法的重要问题之一.文献[2]讨论了适用于共享主存并行机的此类方程组的并行算法,在 Mller和 Scheerer[3]提出的并行化H对角线性方程组解法的划分方法基础上,Chung[4]等对循环块三对角线性方程组进行了研究…  相似文献   

13.
SMP集群系统上矩阵特征问题并行求解器的有效算法   总被引:2,自引:0,他引:2  
对称矩阵三对角化和三对角对称矩阵的特征值求解是稠密对称矩阵特征问题并行求解器的关键步 .针对SMP集群系统的多级体系结构,基于Householder变换的矩阵三对角化和三对角矩阵特征值问题的分而治之算法,给出了它们的MPI OpenMP混合并行算法 .算法研究集中在SMP集群系统环境下的负载平衡、通信开销和性能评价 .混合并行算法的设计结合了粗粒度线程并行模式和任务共享的动态调用方法,改善了MPI算法中的负载平衡问题、降低了通信开销 .在深腾6800上的实验表明,基于混合并行算法的求解器比纯MPI版本的求解器具有更好的性能和可扩展性 .  相似文献   

14.
An optimized parallel algorithm is proposed to solve the problem occurred in the process of complicated backward substitution of cyclic reduction during solving tridiagonal linear systems. Adopting a hybrid parallel model, this algorithm combines the cyclic reduction method and the partition method. This hybrid algorithm has simple backward substitution on parallel computers comparing with the cyclic reduction method. In this paper, the operation count and execution time are obtained to evaluate and make comparison for these methods. On the basis of results of these measured parameters, the hybrid algorithm using the hybrid approach with a multi‐threading implementation achieves better efficiency than the other parallel methods, that is, the cyclic reduction and the partition methods. In particular, the approach involved in this paper has the least scalar operation count and the shortest execution time on a multi‐core computer when the size of equations meets some dimension threshold. The hybrid parallel algorithm improves the performance of the cyclic reduction and partition methods by 19.2% and 13.2%, respectively. In addition, by comparing the single‐iteration and multi‐iteration hybrid parallel algorithms, it is found that increasing iteration steps of the cyclic reduction method does not affect the performance of the hybrid parallel algorithm very much. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

15.
文章针对三对角矩阵,利用矩阵的Schur余子式求矩阵行列式的方法,提出了一种并行求解三对角矩阵及其逆的行列式的算法,应用该算法可以得到较好的加速度。  相似文献   

16.
基于对称三对角特征问题的分而治之方法,提出了一个适合SMP集群环境的多级混合并行算法。SMP节点内的并行求解采用了粗粒度和细粒度两种OpenMP并行。为了改善纯MPI算法中的负载不平衡,混合并行算法使用了动态任务分配方法。在深腾6800上的试验表明,混合并行算法具有好的扩展性和加速比。 关键词:SMP集群;MPI+OpenMP;混合并行;并行求解器  相似文献   

17.
The partition method of Wang for tridiagonal equations is generalized to the arbitrary band case. A stability criterion is given. The algorithm is compared to Gaussian elimination and cyclic reduction.  相似文献   

18.
In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16-Kbit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally conclusions are presented.  相似文献   

19.
王晓锋  毛力 《计算机工程》2011,37(23):83-85
要提高并行网络模拟性能,需对网络模拟拓扑进行有效划分。为此,提出一种并行网络模拟拓扑的优化划分方法。分析影响并行网络模拟性能因素,给出并行网络模拟性能估计模型,以该模型为评价函数,采用遗传算法寻找优化划分,实现并行网络模拟拓扑的优化划分。在PDNS上的实验结果表明,与传统划分方法相比,该优化划分方法的并行模拟性能平均提高13.3%。  相似文献   

20.
对称矩阵三对角化的有效并行块算法设计   总被引:1,自引:0,他引:1  
在矩阵数值计算中,块算法通常比非块算法更有效,但这也增加了并行算法设计和实现的难度.在广义稠密对称矩阵特征问题并行求解器中,并行块算法的构造可应用到正定对称矩阵的Choleski分解、对称矩阵的三对角化和回代转化(back-transiation)操作中.本文将并行块算法的讨论集中在具有代表性的对称矩阵三对角化上,给出在非块存储方式下对称矩阵三对角化的并行块算法设计方法.分析块算法大小同矩阵规模和处理器数量的关系.在深腾6800上的试验表明,我们的算法具有很好的性能,并得到了比ScaLAPACK更高的性能.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号