首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
《国际计算机数学杂志》2012,89(13):3079-3093
The cyclic reduction method is a direct method for solving tridiagonal linear systems. At the first step of this method, a tridiagonal coefficient matrix is transformed into a pentadiagonal form. In this article, we prove that the condition number for eigenvalues of some classes of coefficient matrices always decreases after the first step of the cyclic reduction method.  相似文献   

2.
基于对称三对角特征问题的分而治之方法,提出了一个适合SMP集群环境的多级混合并行算法。SMP节点内的并行求解采用了粗粒度和细粒度两种OpenMP并行。为了改善纯MPI算法中的负载不平衡,混合并行算法使用了动态任务分配方法。在深腾6800上的试验表明,混合并行算法具有好的扩展性和加速比。 关键词:SMP集群;MPI+OpenMP;混合并行;并行求解器  相似文献   

3.
SMP集群系统上矩阵特征问题并行求解器的有效算法   总被引:2,自引:0,他引:2  
对称矩阵三对角化和三对角对称矩阵的特征值求解是稠密对称矩阵特征问题并行求解器的关键步.针对SMP集群系统的多级体系结构,基于Householder变换的矩阵三对角化和三对角矩阵特征值问题的分而治之算法,给出了它们的MPI+OpenMP混合并行算法.算法研究集中在SMP集群系统环境下的负载平衡、通信开销和性能评价.混合并行算法的设计结合了粗粒度线程并行模式和任务共享的动态调用方法,改善了MPI算法中的负载平衡问题、降低了通信开销.在深腾6800上的实验表明,基于混合并行算法的求解器比纯MPI版本的求解器具有更好的性能和可扩展性.  相似文献   

4.
S. Cei  M. Galli  M. Soccio  P. Zellini 《Calcolo》1981,18(4):303-319
We illustrate some 0 (logn) parallel algorithms for invertingn×n tridiagonal and pentadiagonal matrices. Also, an 0(logn) parallel algorithm is proposed to computer th order linear recurrences and the determinant ofr-band Hessenberg matrices.  相似文献   

5.
M. Capovani 《Calcolo》1971,8(1-2):149-159
Sommario Si espone un metodo per calcolare l'inversa di una matrice tridiagonale e si individuano aleune proprietà strutturali della inversa stessa. Tali proprietà sono utilizzate per ottenere la decomposizione di una matrice pentadiagonale nel prodotto di due matrici tridiagonali, una delle quali è simmetrica.
A direct method is proposed for the calculation of the inverse of a tridiagonal matrix. Certain properties of the inverse matrix are then put in evidence; the properties can be used in the decomposition of a pentadiagonal matrix into the product of two tridiagonal matrices, one of which is symmetric.
  相似文献   

6.
Quaternionic least squares (QLS) is an efficient method for solving approximate problems in quaternionic quantum theory. In view of the extensive applications of Hermitian tridiagonal matrices in physics, in this paper we list some properties of basis matrices and subvectors related to tridiagonal matrices, and give an iterative algorithm for finding Hermitian tridiagonal solution with the least norm to the quaternionic least squares problem by making the best use of structure of real representation matrices, we also propose a preconditioning strategy for the Algorithm LSQR-Q in Wang, Wei and Feng (2008) [14] and our algorithm. Numerical experiments are provided to verify the effectiveness of our method.  相似文献   

7.
基于对称三对角矩阵特征求解的分而治之方法,提出了一种改进的使用MPI/Cilk模型求解的混合并行实现,结合节点间数据并行和节点内多任务并行,实现了对分治算法中分治阶段和合并阶段的多任务划分和动态调度.节点内利用Cilk任务并行模型解决了线程级并行的数据依赖和饥饿等待等问题,提高了并行性;节点间通过改进合并过程中的通信流程,使组内进程间只进行互补的数据交换,降低了通信开销.数值实验体现了该混合并行算法在计算效率和扩展性方面的优势.  相似文献   

8.
基于递归耦合方法的三对角线性方程组分布式并行算法   总被引:1,自引:2,他引:1  
方蓉  赵瑛 《计算机工程与设计》2006,27(4):670-671,687
提出了一种在分布式计算机上用递归倍增方法解三对角线性方程组的并行算法。通过研究算法中的额外开销达到优化标量算法的执行和通讯,并减少了存储开销。当三对角线性方程组的系数矩阵满足对角占优时,该算法在运行过程中不会中断。最后,在采用消息传递编程模型的基于局域网MPI并行环境下对算法进行了评价。数值实验结果表明,该算法是高效的。  相似文献   

9.
基于MPI的并行八叉树碰撞检测   总被引:5,自引:1,他引:5  
通过对碰撞检测过程进行分析,发现各节点间相关性较小,存在并行化的可能.在对八叉树碰撞检测算法做适当修改的基础上,结合成熟的消息传递通信(MPI)并行编程环境,提出了基于MPI的并行碰撞检测算法.测试结果表明,碰撞检测效率有较大的提高.  相似文献   

10.
根据分块三对角矩阵逆矩阵的特殊结构,利用其LU和UL分解,并使用Sheman-Morrison-Woodbury公式,得到一个求分块周期三对角矩阵逆矩阵的新算法,并由该算法得到求周期三对角矩阵和对称周期三对角矩阵逆矩阵的新算法。新算法比传统算法的计算复杂度和计算时间要低。  相似文献   

11.
In this paper, we have proposed a pentadiagonal alternating-direction-implicit (Penta-ADI) finite-difference time-domain (FDTD) method for the two-dimensional Schrödinger equation. Through the separation of complex wave function into real and imaginary parts, a pentadiagonal system of equations for the ADI method is obtained, which results in our Penta-ADI method. The Penta-ADI method is further simplified into pentadiagonal fundamental ADI (Penta-FADI) method, which has matrix-operator-free right-hand-sides (RHS), leading to the simplest and most concise update equations. As the Penta-FADI method involves five stencils in the left-hand-sides (LHS) of the pentadiagonal update equations, special treatments that are required for the implementation of the Dirichlet’s boundary conditions will be discussed. Using the Penta-FADI method, a significantly higher efficiency gain can be achieved over the conventional Tri-ADI method, which involves a tridiagonal system of equations.  相似文献   

12.
对称矩阵三对角化是求解稠密特征问题的关键计算过程.针对GPU集群采用了MPI(message passing interface)和GPU级2级并行方法设计实现了基于MPI和CUDA(compute unified device architecture )的稠密对称矩阵三对角化算法.在MPI集群级并行中,通过将2维通信域中行-列通信域间的全局数据通信设计为完全并行的点-点数据通信方式,改善了三对角化MPI并行算法的通信性能.通过改进原矩阵三对角化的MPI并行算法,避免了在GPU级并行中使用的不规则的矩阵-向量运算,这部分的并行性能提升了1倍左右.并且,将在GPU并行中存在的小粒度计算合并为较大粒度计算,该策略可通过加大计算密集度来充分地发挥GPU的计算能力,增加GPU的利用率,从而提升了算法的性能.此外,利用多个CUDA流使算法中独立的CUDA操作可以在不同的流中并发执行.并且,在并行算法中,利用CPU与GPU之间的异步数据传输,使得在不同流中的数据传输和核函数同时执行,隐藏了数据传输的时间,进一步提升了算法的性能.在中国科学院超级计算机系统“元”上,使用Nvidia Tesla K20 GPGPU测试了不同规模矩阵的基于MPI+CUDA的三对角化并行块算法的性能,取得了较好的加速效果与性能,并且具有良好的可扩展性.  相似文献   

13.
In this paper, by using parallel computing along with recursion, we describe a reliable symbolic computational algorithm for inverting cyclic pentadiagonal matrices. The algorithm is implemented in MAPLE. Two other symbolic algorithms are developed and the computational costs for all algorithms are given. An example is presented for the sake of illustration.  相似文献   

14.
生物复杂网络motif发现是一种研究生物网络的重要方法,它基于复杂网络的理论研究,以新的视角来研究生命现象和生命机制,但是在处理较大的网络规模或者需挖掘较大的motif时计算效率低。针对这个问题,在现有串行网络motif发现算法ESU的基础上,提出一种基于消息传递接口(MPI)的并行化ESU算法。该方法在ESU计算过程中优化了节点值以解决节点值依赖问题,并以ESU算法的子图发现策略统计各节点子图数,利用动态规划策略寻找最佳节点分配策略以解决负载不均衡问题。模拟网络数据和真实生物网络数据的实验结果表明,并行化ESU算法优化了节点值依赖问题,实现了基于动态规划的负载均衡策略,其运行时间比串行算法缩短了90%,并且该并行算法对不同类型不同规模的网络都具有较强的适用性,有效地提高了网络motif发现问题的计算效率。  相似文献   

15.
采用MPI多进程和Open MP多线程两级并行相结合的方式,实现了循环盒子法的并行计算,并对其预处理算法进行了改进。在国家超算广州中心的"天河-2"系统上,完成了对亿级网格量的超燃冲压发动机燃烧室算例的测试。结果分析表明,进程盒子法和边界盒子法不存在盒子切割数的选择问题,边界盒子法较其他算法具有更好的加速比,可显著提高壁面距离的计算效率。  相似文献   

16.
为了进一步提高速度受限的多目标粒子群算法(SMPSO)求解多目标优化问题的效率和精度,文中提出基于消息传递接口(MPI)的并行化SMPSO算法(M-SMPSO).采用主从模式的MPI并行程序设计模式,将整个种群分成几个子种群,各子种群分别执行独立进化计算,提高算法效率.此外,为了均衡考虑算法的分布性与收敛性,提出自适应的全局最优解选择策略.使用标准测试函数验证算法性能,实验表明,相比其它多目标算法,文中算法能获得更高的加速比,更快收敛到多目标优化问题的Pareto前沿.  相似文献   

17.
Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical pre-processing step for the block tridiagonal divide-and-conquer algorithm for computing eigensystems and is useful for many algorithms desiring the efficiencies of block structure in matrices. For an “effectively” sparse matrix, which frequently results from applications with strong locality properties, a heuristic parallel algorithm is used to transform it into a block tridiagonal matrix such that the eigenvalue errors remain bounded by some prescribed accuracy tolerance. For a dense matrix without any usable structure, orthogonal transformations are used to reduce it to block tridiagonal form using mostly level 3 BLAS operations. Numerical experiments show that block tridiagonal structure obtained from this algorithm directly affects the computational complexity of the parallel block tridiagonal divide-and-conquer eigensolver. Reduction to block tridiagonal form provides significantly lower execution times, as well as memory traffic and communication cost, over the traditional reduction to tridiagonal form for eigensystem computations.  相似文献   

18.
针对一类典型的多变量耦合三对角工业系统,在研究(块)三对角矩阵计算的基础上,提出了一种新的三对角解耦算法。该算法关键在于构造两补偿矩阵,即前串联补偿阵L(s)和后串联补偿阵R(s),从而使耦合三对角工业系统变为对角系统,实现解耦。考虑到三对角工业系统次对角线上可能存有零传递函数分量,在此讨论了两种L(s)和R(s)的构造算法。经仿真不仅证实了方法的有效性,而且得出了构造的L(s)和R(s)具有多分量相等的特点,这将大大减轻其在工业实现方面的工作量。  相似文献   

19.
详细分析快速多极算法FMM(Fast Multipole Method)的基本原理,并对引力场的势函数的多极展开和泰勒局部展开进行了详细的推导.给出了串行FMM算法的伪码描述,并对其进行并行化分析、处理,对FMM算法进行了并行化研究.最后,在基于MPI的群集并行计算环境下进行大量的实验并采集实验数据,对算法进行并行化性能分析,得到较好的并行加速比和较高的并行效率.  相似文献   

20.
Large-scale parallelized distributed computing has been implemented in the message passing interface (MPI) environment to solve numerically, eight reaction-diffusion equations representing the anatomy and treatment of breast cancer. The numerical algorithm is perturbed functional iterations (PFI) which is completely matrix-free. Fully distributed computations with multiple processors have been implemented on a large scale in the serial PFI-code in the MPI environment. The technique of implementation is general and can be applied to any serial code. This has been validated by comparing the computed results from the serial code and those from the MPI-version of the parallel code.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号