首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
SMP集群系统上矩阵特征问题并行求解器的有效算法   总被引:2,自引:0,他引:2  
对称矩阵三对角化和三对角对称矩阵的特征值求解是稠密对称矩阵特征问题并行求解器的关键步 .针对SMP集群系统的多级体系结构,基于Householder变换的矩阵三对角化和三对角矩阵特征值问题的分而治之算法,给出了它们的MPI OpenMP混合并行算法 .算法研究集中在SMP集群系统环境下的负载平衡、通信开销和性能评价 .混合并行算法的设计结合了粗粒度线程并行模式和任务共享的动态调用方法,改善了MPI算法中的负载平衡问题、降低了通信开销 .在深腾6800上的实验表明,基于混合并行算法的求解器比纯MPI版本的求解器具有更好的性能和可扩展性 .  相似文献   

2.
对称矩阵三对角化的有效并行块算法设计   总被引:1,自引:0,他引:1  
在矩阵数值计算中,块算法通常比非块算法更有效,但这也增加了并行算法设计和实现的难度.在广义稠密对称矩阵特征问题并行求解器中,并行块算法的构造可应用到正定对称矩阵的Choleski分解、对称矩阵的三对角化和回代转化(back-transiation)操作中.本文将并行块算法的讨论集中在具有代表性的对称矩阵三对角化上,给出在非块存储方式下对称矩阵三对角化的并行块算法设计方法.分析块算法大小同矩阵规模和处理器数量的关系.在深腾6800上的试验表明,我们的算法具有很好的性能,并得到了比ScaLAPACK更高的性能.  相似文献   

3.
The computation of selected eigenvalues and eigenvectors of a symmetric (Hermitian) matrix is an important subtask in many contexts, for example in electronic structure calculations. If a significant portion of the eigensystem is required then typically direct eigensolvers are used. The central three steps are: reduce the matrix to tridiagonal form, compute the eigenpairs of the tridiagonal matrix, and transform the eigenvectors back. To better utilize memory hierarchies, the reduction may be effected in two stages: full to banded, and banded to tridiagonal. Then the back transformation of the eigenvectors also involves two stages. For large problems, the eigensystem calculations can be the computational bottleneck, in particular with large numbers of processors. In this paper we discuss variants of the tridiagonal-to-banded back transformation, improving the parallel efficiency for large numbers of processors as well as the per-processor utilization. We also modify the divide-and-conquer algorithm for symmetric tridiagonal matrices such that it can compute a subset of the eigenpairs at reduced cost. The effectiveness of our modifications is demonstrated with numerical experiments.  相似文献   

4.
根据分块三对角矩阵逆矩阵的特殊结构,利用其LU和UL分解,并使用Sheman-Morrison-Woodbury公式,得到一个求分块周期三对角矩阵逆矩阵的新算法,并由该算法得到求周期三对角矩阵和对称周期三对角矩阵逆矩阵的新算法。新算法比传统算法的计算复杂度和计算时间要低。  相似文献   

5.
In this paper, we consider simultaneous band reduction of two dense symmetric matrices by congruent transformations. The ideas of simultaneous tridiagonalization are generalized to propose an efficient algorithm for the simultaneous band reduction. In contrast to the algorithms of simultaneous tridiagonalization which are mainly based on matrix–vector operations, the proposed algorithm of simultaneous band reduction has the advantage that matrix–matrix operations can be fully used to achieve better performance on modern computer architecture. Numerical results are presented to illustrate the effectiveness of our proposed algorithm.  相似文献   

6.
一类Toeplitz循环三对角方程组的一种分布式并行算法   总被引:4,自引:1,他引:3  
提出一类Toeplitz循环三对方程组的一种分布式并行算法,在求解由一阶线性双曲型方程(如迁移方程)在一定边界条件下导出的隐式差分方程组时,要重复地求解此类Toeplitz循环三对角方程组。算法基于对系数矩阵的分解,贯彻并行算法设计中“分而治之”的原则,充分利用了系数矩阵结构的特殊性。算法实现中通过秦九韶公式的运用,避免了不必要的冗余计算;理论分析和数值试验表明,算法是数值稳定的,且当方程组规模充分大时,该算法加速比趋近线性加速比的理想情况。给出了算法在某分布存储多计算机系统上的数值试验结果。  相似文献   

7.
This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirement are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a larger number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measure values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p +6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.  相似文献   


8.
基于对称三对角特征问题的分而治之方法,提出了一个适合SMP集群环境的多级混合并行算法。SMP节点内的并行求解采用了粗粒度和细粒度两种OpenMP并行。为了改善纯MPI算法中的负载不平衡,混合并行算法使用了动态任务分配方法。在深腾6800上的试验表明,混合并行算法具有好的扩展性和加速比。 关键词:SMP集群;MPI+OpenMP;混合并行;并行求解器  相似文献   

9.
The parallel stratagem in this paper uses scattered square decomposition, introduced by G. Fox, for its data assignment and then exploits parallelism in the solution steps of the sequential Householder tridiagonalization algorithm. One may condense a real symmetric full matrix A of order n into a tridiagonal form by the stratagem in concurrent machines where N(= D2) processors are used. Expressions for efficiency and speedup are given for the evaluation of the stratagem. An alternative stratagem which requires less data transmission but more computations is also discussed. The results shown that the Householder Method of tridiagonalization may be implemented on a concurrent machine efficiently by scattered square decomposition provided that the number of matrix elements contained in each processor is much larger than the number of processors of the concurrent machine, and the ratio of the time to transmit one data item from one processor to any other processor to the time to perform a floating-point arithmetic operation is small enough.  相似文献   

10.
Quasi-birth-and-death processes, that is multi-dimensional Markov chains with block tridiagonal transition probability or generator matrices, are appropriate models for various types of queueing systems, amongst many other population dynamics. We consider continuous-time level dependent quasi-birth-and-death processes (LDQBDs) extended by catastrophes, which means that the transition rates are allowed to depend on the process level and additionally in each state the level component may drop to zero such that the generator matrix deviates from the block tridiagonal form in that the first block column is allowed to be completely occupied. A matrix analytic algorithm (MAA) for computing the stationary distribution of such processes is introduced that extends and generalizes similar algorithms for LDQBDs without catastrophes. The algorithm is applied in order to analyze M/M/c queues in random environment with catastrophes and state dependent rates. We present a detailed steady state analysis by computing the stationary distribution for different parameter sets, thereby focusing on the marginal probabilities of the level component which represents the number of customers. It turns out that the stationary marginal distribution is bimodal in the sense that it has two local modes that significantly depend on the specific parameters and rates. We also study the efficiency of our matrix analytic algorithm (MAA). Comparisons with standard solution algorithms for Markov chains demonstrate its superiority in terms of runtime and memory requirements.  相似文献   

11.
A simple direct method and its variants, based on matrix multiplications, to invert square non-singular matrices (need not be positive definite) and to solve systems of linear equations are developed. Since the method and its variants involve only matrix multiplications they are straightforward to parallelise and hence have an advantage over some existing well known direct methods. Theoretical background, analysis of the proposed method and the complexity of the algorithm for sparse and dense matrices are given. Two significantly different parallel algorithms for dense and sparse matrices are developed. In the case of the dense matrix standard parallelisation of matrix multiplications was undertaken. However, for sparse matrices the standard parallel matrix multiplication is not effective which led us to develop a parallel algorithm different from that for the dense matrix. The performances of our algorithms are tested via numerical examples.  相似文献   

12.
一类Toeplitz三对角方程组的一种分布式并行算法   总被引:3,自引:0,他引:3  
文中提出一类Toeplitz三对角方程组的一种分布式并行算法。该算法以系数矩阵的分解为基础,充分利用了系数矩阵结构的特殊性,算法因并行化而引入的冗余计算量非常少,算法的通信机制简单,通信量仅与处理 机台数p有关,与方程组规模n无关,算法具有很高的并行效率,理论分析和数值试验表明,其加速比Sp(n)→p(n→ ∞),此为线性加速比的理想情况。文中给出了算法在分布存储多计算机系统上的数值试验结果。  相似文献   

13.
针对一类典型的多变量耦合三对角工业系统,在研究(块)三对角矩阵计算的基础上,提出了一种新的三对角解耦算法。该算法关键在于构造两补偿矩阵,即前串联补偿阵L(s)和后串联补偿阵R(s),从而使耦合三对角工业系统变为对角系统,实现解耦。考虑到三对角工业系统次对角线上可能存有零传递函数分量,在此讨论了两种L(s)和R(s)的构造算法。经仿真不仅证实了方法的有效性,而且得出了构造的L(s)和R(s)具有多分量相等的特点,这将大大减轻其在工业实现方面的工作量。  相似文献   

14.
We propose a novel divide-and-conquer algorithm for the solution of the all-pair shortest-path problem for directed and dense graphs with no negative cycles. We propose R-Kleene, a compact and in-place recursive algorithm inspired by Kleene's algorithm. R-Kleene delivers a better performance than previous algorithms for randomly generated graphs represented by highly dense adjacency matrices, in which the matrix components can have any integer value. We show that R-Kleene, unchanged and without any machine tuning, yields consistently between 1/7 and 1/2 of the peak performance running on five very different uniprocessor systems.  相似文献   

15.
We tackle the parallelization of Non-Negative Matrix Factorization (NNMF), using the Alternating Least Squares and Lee and Seung algorithms, motivated by its use in audio source separation. For the first algorithm, a very suitable technique is the use of active set algorithms for solving several non-negative inequality constraints least squares problems. We have addressed the NNMF for dense matrix on multicore architectures, by organizing these optimization problems for independent columns. Although in the sequential case, the method is not as efficient as the block pivoting variant used by other authors, they are very effective in the parallel case, producing satisfactory results for the type of applications where is to be used. For the Lee and Seung method, we propose a reorganization of the algorithm steps that increases the convergence speed and a parallelization of the solution. The article also includes a theoretical and experimental study of the performance obtained with similar matrices to that which arise in applications that have motivated this work.  相似文献   

16.
Jan Mayer 《Computing》2009,86(4):291-312
In this article, we present two new algorithms for solving given triangular systems in parallel on a shared memory architecture. Multilevel incomplete LU factorization based preconditioners, which have been very successful for solving linear systems iteratively, require these triangular solves. Hence, the algorithms presented here can be seen as parallelizing the application of these preconditioners. The first algorithm solves the triangular matrix by block anti-diagonals. The drawback of this approach is that it can be difficult to choose an appropriate block structure. On the other hand, if a good block partition can be found, this algorithm can be quite effective. The second algorithm takes a hybrid approach by solving the triangular system by block columns and anti-diagonals. It is usually as effective as the first algorithm, but the block structure can be chosen in a nearly optimal manner. Although numerical results indicate that the speed-up can be fairly good, systems with matrices having a strong diagonal structure or narrow bandwidth cannot be solved effectively in parallel. Hence, for these matrices, the results are disappointing. On the other hand, the results are better for matrices having a more uniform distribution of non-zero elements. Although not discussed in this article, these algorithms can possibly be adapted for distributed memory architectures.  相似文献   

17.
A block parallel partitioning method for computing the eigenvalues of symmetric tridiagonal matrix is presented. The algorithm is based on partitioning, in a way that ensures load balance during computation. This method is applicable to both shared memory- and distributed memory-MIMD systems. Compared with other parallel tridiagonal eigenvalue algorithms existing in the literature, the proposed algorithm achieves a higher speedup of O(p) on a parallel computer with p-fold parallelism, which is linear, and the data communication between processors is less than that required for other methods. The results were tested and evaluated on an MIMD machine, and were within 62% to 98% of the predicted performance.  相似文献   

18.
考虑工作站网络(NOWs)中三对角线性方程组的并行求解,基于最小秩解耦算法与分布治之并行计算模式,并行最小秩解耦算法(PMRD)。它在计算过程中保持原矩阵的结构特征,数值稳定性高,本文给出算法的数值特征分析以及计算与通讯复杂性分析并与Mehrmann分治算比较,所有算法由PVM软件系统实现并在工作站网络中测试。  相似文献   

19.
In this paper, we survey several recent results that highlight an interplay between a relatively new class of quasiseparable matrices and univariate polynomials. Quasiseparable matrices generalize two classical matrix classes, Jacobi (tridiagonal) matrices and unitary Hessenberg matrices that are known to correspond to real orthogonal polynomials and Szegö polynomials, respectively. The latter two polynomial families arise in a wide variety of applications, and their short recurrence relations are the basis for a number of efficient algorithms. For historical reasons, algorithm development is more advanced for real orthogonal polynomials. Recent variations of these algorithms tend to be valid only for the Szegö polynomials; they are analogues and not generalizations of the original algorithms.  相似文献   

20.
Brownian matrices arise in certain models used in digital signal processing. Fast algorithms for solving brownian systems of linear equations can be derived because any brownian matrix is congruent to a tridiagonal matrix. This, and related properties, are developed and extended to matrices having a more general patterned structure.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号