Similar Literature
20 similar references retrieved (search time: 78 ms)
1.
In the direct solution of sparse symmetric positive definite linear systems, finding an ordering of the matrix that minimizes the height of the elimination tree (an indicator of the number of parallel elimination steps) is crucial for computing the Cholesky factor efficiently in parallel. This problem is known to be NP-hard. Although many effective heuristics have been proposed, how close these heuristics come to optimal, and how the height of the elimination tree can be reduced further, remain open questions. This paper investigates both. We introduce a genetic algorithm tailored to this parallel ordering problem, characterized by two novel genetic operators: adaptive merge crossover and tree rotate mutation. Experiments show that our approach is cost-effective in the number of generations needed to reach a solution with a lower elimination tree height.
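As a concrete reference for the quantity being optimized, here is a minimal Python sketch (not the paper's genetic algorithm) of how the elimination-tree height of a symmetrically permuted sparse matrix can be evaluated for a candidate ordering, using Liu's path-compression algorithm; the function name, the edge-list input, and the `order` permutation argument are assumptions made for illustration.

```python
def etree_height(n, edges, order):
    pos = [0] * n                          # pos[v] = column of vertex v after permutation
    for k, v in enumerate(order):
        pos[v] = k
    lower = [[] for _ in range(n)]         # lower[j] = rows i < j with a nonzero in column j
    for u, v in edges:
        i, j = sorted((pos[u], pos[v]))
        if i != j:
            lower[j].append(i)

    parent = [-1] * n                      # elimination-tree parent of each column
    ancestor = [-1] * n                    # path-compression workspace
    for j in range(n):
        for i in lower[j]:
            r = i
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j            # compress the path toward column j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j

    depth = [1] * n                        # longest path from any leaf up to each node
    for j in range(n):                     # a parent always has a larger index than its child
        if parent[j] != -1:
            depth[parent[j]] = max(depth[parent[j]], depth[j] + 1)
    return max(depth)
```

A genetic algorithm of the kind described would call such a function as the fitness of each candidate ordering.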

2.
张博为  吴艳霞  顾国昌  孙霖 《计算机工程》2012,38(11):281-283,286
To solve linear systems over GF(2), this work improves the existing Gaussian elimination algorithm and proposes a parallel hardware structure for quickly computing the unknown vector. Time complexity is reduced by performing the elimination and cyclic row-shift operations in parallel, and the whole algorithm is mapped onto hardware through an interconnection of "smart memory"-like basic cells. Performance analysis shows that, for an n-order binary augmented matrix whose density is far above or below 0.5, the parallel structure needs about 2n clock cycles on average, far less than the software algorithm's time of roughly (1/4)n³. Implementation results on binary non-sparse augmented matrices of order 3 to 50 show a performance improvement of about two orders of magnitude over the software implementation.
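The following sketch shows the underlying GF(2) elimination in software, packing each row of the augmented matrix into a Python integer so that eliminating a row is a single XOR; this mirrors the row-wide parallelism the hardware exploits, but it is not the paper's hardware structure. The bit layout (bit j+1 = coefficient of x_j, bit 0 = right-hand side) and the function name are assumptions.

```python
def solve_gf2(rows, n):
    rows = list(rows)
    where = [-1] * n                       # where[j] = row holding the pivot of x_j
    r = 0
    for j in range(n):
        mask = 1 << (j + 1)
        piv = next((k for k in range(r, len(rows)) if rows[k] & mask), None)
        if piv is None:
            continue                       # x_j is free; it is set to 0 below
        rows[r], rows[piv] = rows[piv], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k] & mask:
                rows[k] ^= rows[r]         # one XOR eliminates column j from a whole row
        where[j] = r
        r += 1
    if any(row == 1 for row in rows):      # a row reduced to 0 = 1: inconsistent system
        return None
    return [rows[where[j]] & 1 if where[j] != -1 else 0 for j in range(n)]
```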

3.
To meet the memory requirements of large-scale linear systems, a parallel Gaussian elimination scheme is proposed for symmetric systems. Since the submatrices arising during Gaussian elimination of a symmetric system remain symmetric, the parallel computation reads and operates on only the triangular part of the data, reducing storage and improving parallel efficiency. Tests show that the scheme's parallel efficiency is better than that of the conventional algorithm and that it can be applied to large-scale numerical computation with symmetric systems.
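A minimal serial sketch of the idea follows: during Gaussian elimination of a symmetric system (without pivoting, e.g. a positive definite one), the trailing submatrix stays symmetric, so only the upper triangle needs to be stored and updated. The packed input format (`A_upper[i]` holds row i of the upper triangle, i.e. A[i, i:]) is an assumption chosen for illustration, not the paper's distributed data layout.

```python
import numpy as np

def solve_symmetric_triangular_storage(A_upper, b):
    n = len(b)
    U = [np.asarray(row, dtype=float).copy() for row in A_upper]
    b = np.asarray(b, dtype=float).copy()
    for k in range(n):                     # elimination restricted to the triangle
        for i in range(k + 1, n):
            f = U[k][i - k] / U[k][0]      # A[i, k] = A[k, i] by symmetry
            U[i] -= f * U[k][i - k:]       # update only A[i, i:]
            b[i] -= f * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):         # back substitution on the stored triangle
        x[i] = (b[i] - U[i][1:] @ x[i + 1:]) / U[i][0]
    return x
```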

4.
Any factorization/back substitution scheme for the solution of linear systems consists of two phases which are different in nature, and hence may be inefficient for parallel implementation on a single computational network. The Gauss-Jordan elimination scheme unifies the nature of the two phases of the solution process and thus seems to be more suitable for parallel architectures, especially if reconfiguration of the communication pattern is not permitted. In this communication, a computational network for the Gauss-Jordan algorithm is presented. This network compares favorably with optimal implementations of the Gauss elimination/back substitution algorithm.
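To make the single uniform phase concrete, here is a minimal NumPy sketch of Gauss-Jordan elimination with partial pivoting: the augmented matrix [A | b] is reduced directly to [I | x], so no separate back-substitution phase (and no second communication pattern) is needed. This is a generic illustration, not the computational network described in the abstract.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))    # partial pivoting for stability
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                        # normalize the pivot row
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]         # eliminate above and below the pivot
    return M[:, -1]
```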

5.
Parallel solution of large-scale dense linear systems on the CUDA architecture
Based on Gauss-Jordan elimination, an improved parallel Gauss-Jordan elimination algorithm adapted to the CUDA architecture is presented. By analyzing the processing steps of the method and the corresponding constraints of CUDA, a five-level grid-strip-group-block-thread organization is proposed on top of CUDA's grid-block-thread hierarchy, the notions of base row and global base row are introduced, and a CUDA-oriented parallel version of Gauss-Jordan elimination is constructed. On large-scale dense linear systems of up to 4,000 unknowns, the algorithm is compared with serial Gauss-Jordan elimination; experimental results show that it makes full use of the GPU hardware and effectively reduces the solution time of large-scale dense linear systems.

6.
A parallel bisection/multisection method for the eigenvalue problem of symmetric banded matrices
This paper presents a parallel bisection/multisection method, together with an improvement of it, for solving the eigenvalue problem of symmetric banded matrices in a distributed environment. The algorithm computes the Sturm sequence of a symmetric banded matrix by a modified Gaussian elimination, and the bisection/multisection results are refined with Rayleigh quotient iteration. No inter-processor communication is needed during parallel execution, making the method particularly suitable for parallel computation in a distributed environment. Numerical experimental results are given.
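A minimal sketch of the Sturm-sequence bisection idea follows, written for the tridiagonal special case of the banded problem: the number of negative pivots produced by (modified) Gaussian elimination of A - x·I equals the number of eigenvalues below x, and bisecting on that count isolates the k-th eigenvalue. The Rayleigh-quotient refinement and the distribution over processors from the paper are omitted; the function names and the `(diag, off)` input format are assumptions.

```python
def sturm_count(diag, off, x):
    count, d = 0, 1.0
    for i in range(len(diag)):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (diag[i] - x) - b2 / d         # pivots of the LDL^T factorization of A - x*I
        if d == 0.0:
            d = -1e-300                    # nudge exact zeros off the singularity
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    spread = max(abs(d) for d in diag) + 2 * max(map(abs, off), default=0.0)
    lo, hi = -spread, spread               # Gershgorin interval containing the spectrum
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) <= k:
            lo = mid                        # at most k eigenvalues lie below mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```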

7.
We consider a graph theoretical model and study a parallel implementation of the well-known Gaussian elimination method on distributed-memory parallel architectures, where the communication delay for transmitting an elementary data item is higher than the computation time of an elementary instruction. We propose and analyze two low-complexity algorithms for scheduling the tasks of parallel Gaussian elimination on an unbounded number of completely connected processors. We compare these two algorithms with a higher-complexity general-purpose scheduling algorithm, the DSC heuristic proposed by A. Gerasoulis and T. Yang (1993).

8.
Solving systems of linear algebraic equations is an important topic in large-scale scientific computing. When solving dense linear systems on distributed-memory MIMD machines, load balance and inter-processor communication are the two dominant influencing factors. This paper discusses the concrete implementation of Gaussian elimination on hypercube-connected distributed MIMD systems. We first describe two communication strategies for Gaussian elimination without pivoting, then extend them to Gaussian elimination with pivoting, and finally propose a new algorithm that greatly improves processor efficiency and achieves nearly fully parallel operation. Some existing experimental data are also given in the paper.

9.
Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In the last few years, the algorithm has received a lot of attention in an attempt to improve its parallel performance. This article surveys recent developments in parallel implementations of Gaussian elimination for shared-memory architectures. Five different flavors are investigated. Three of them are based on different strategies for pivoting: partial pivoting, incremental pivoting, and tournament pivoting. The fourth one replaces pivoting with the Partial Random Butterfly Transformation, and finally, an implementation without pivoting is used as a performance baseline. The technique of iterative refinement is applied to recover numerical accuracy when necessary. All parallel implementations are produced using dynamic, superscalar, runtime scheduling and a tile matrix layout. Results on two multisocket multicore systems are presented, and performance and numerical accuracy are analyzed.
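A minimal sketch of the iterative refinement step the survey applies: when a fast but less stable factorization (no pivoting, or pivoting replaced by a random butterfly transform) loses accuracy, re-using that factorization on the residual recovers it. `solve_with_factorization`, which reuses the already computed factors, is an assumed callback for illustration, not an API from the paper.

```python
import numpy as np

def iterative_refinement(A, b, solve_with_factorization, max_iters=5, tol=1e-12):
    x = solve_with_factorization(b)            # initial solve with the cheap factorization
    for _ in range(max_iters):
        r = b - A @ x                          # residual (ideally in higher precision)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + solve_with_factorization(r)    # correction reuses the same factors
    return x
```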

10.
11.
常静 《现代计算机》2008,(3):106-108
Based on MPI, and using a parallel Gaussian elimination algorithm for solving linear systems as the example, an analyzer for evaluating the performance of parallel algorithms is designed. The analyzer can schedule multiple computers on a network to cooperate in parallel computation, and it analyzes the performance advantage of the parallel algorithm over its serial counterpart.

12.
This paper deals with the theory of hidden surface elimination algorithms in Computer Graphics. A set of functions and abstract data types is defined to help concisely specify a class of hidden surface elimination algorithms in a purely functional language. A formal study of these algorithms is presented, along with theorems of equivalence between some of the specifications. It is shown that such proofs of equivalence help in the construction of existing as well as new algorithms. We also highlight the importance of such a study for exploiting alternative parallel architectures when implementing these algorithms. Further benefits of formal specification and analysis come from its use as a teaching aid and as an effective method for rapid prototyping of these algorithms.

13.

We extend the elimination method of Chawla and Khazal [1] to uncouple partitioned blocks of "periodic" tridiagonal linear systems. At each step of the elimination stage, we now need three simultaneous eliminations: within each block, one usual forward elimination and one backward elimination from across the succeeding block, and one elimination in the last row of the last block. An interesting feature of the present elimination procedure is that, at the end of it, the periodicity of the original system is passed on to the core system. Once the core system is solved, the blocks uncouple and the solution is obtained in parallel from each block by back substitution. For a system of size N, the classical elimination has an arithmetical operations count of 17N. A best serial algorithm, based on the Sherman-Morrison formula, has an operations count of approximately 14N. The present parallel elimination algorithm, employing p partition blocks, has an operations count of O(17N/p) and, in comparison with the Sherman-Morrison algorithm, it can achieve an efficiency of nearly 82% on a p-processor machine.
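For context, here is a minimal sketch of the roughly-14N serial baseline mentioned above, assuming the standard Sherman-Morrison treatment of a periodic (cyclic) tridiagonal system: the wrap-around corner entries are expressed as a rank-one update of an ordinary tridiagonal matrix, which is solved twice with the Thomas algorithm. This is not the paper's block elimination; the arguments are assumed to be NumPy vectors, with a[i] multiplying x[i-1] and c[i] multiplying x[i+1], indices wrapping modulo n.

```python
import numpy as np

def thomas(a, b, c, d):
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def periodic_tridiagonal_solve(a, b, c, d):
    n = len(b)
    gamma = -b[0]
    bb = np.asarray(b, dtype=float).copy()
    bb[0] -= gamma                         # T = A - u v^T is plain tridiagonal
    bb[-1] -= c[-1] * a[0] / gamma
    u = np.zeros(n); u[0], u[-1] = gamma, c[-1]
    v = np.zeros(n); v[0], v[-1] = 1.0, a[0] / gamma
    y = thomas(a, bb, c, d)                # T y = d
    z = thomas(a, bb, c, u)                # T z = u
    return y - (v @ y) / (1.0 + v @ z) * z # Sherman-Morrison correction
```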

14.
This paper discusses the parallel processing of hidden-surface removal for three-dimensional objects. A class of MIMD parallel depth-buffer (z-buffer) algorithms is presented and implemented on a multi-Transputer system. The efficiency of these algorithms is also compared.

15.
《Parallel Computing》1997,22(13):1837-1851
The PAPS (Performance Analysis of Parallel Systems) toolset is a testbed for model-based performance prediction of message-passing parallel applications executed on private-memory multiprocessor computer systems. PAPS makes it possible to describe the execution behavior of the computer hardware and operating system software resources down to a very detailed level, which enables very accurate performance prediction of parallel applications even in the case of substantial performance degradation due to contention for shared resources. In this paper, the fundamental design principles and implementation methodologies for the development of the PAPS toolset are presented and the PAPS parallel system specification formalisms are described. A simplified performance study of a parallel Gaussian elimination application on the nCUBE 2 multiprocessor system is used to demonstrate the usage of the tool.

16.
Using the CUDA platform, a basic method is proposed for accelerating the solution of linear systems by parallel Gauss-Jordan elimination with full (complete) pivoting: the full pivot search, normalization, and elimination operations are implemented in parallel on a general-purpose GPU (GPGPU), while the recovery of the solution vector is done on the CPU. Exploiting the features of NVIDIA's latest Fermi-architecture GPUs and a series of optimizations, the GPGPU achieves a speedup of more than 6.5x over Intel's latest-architecture CPU and more than 10x over Intel's previous-generation CPU.
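A minimal NumPy sketch of the algorithm's structure, not the CUDA kernels: each step searches the whole active submatrix for the pivot (full pivoting), normalizes the pivot row, and eliminates the pivot column everywhere, i.e. the three operations the abstract parallelizes on the GPGPU, while the column swaps are recorded so the solution vector can be recovered at the end, the step the abstract leaves to the CPU.

```python
import numpy as np

def gauss_jordan_full_pivot(A, b):
    M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]
    col_of = np.arange(n)                      # col_of[k] = original variable now in column k
    for k in range(n):
        sub = np.abs(M[k:, k:n])
        r, c = divmod(int(np.argmax(sub)), sub.shape[1])
        r, c = r + k, c + k                    # full pivot over the active submatrix
        M[[k, r]] = M[[r, k]]                  # row swap
        M[:, [k, c]] = M[:, [c, k]]            # column swap (permutes the unknowns)
        col_of[[k, c]] = col_of[[c, k]]
        M[k] /= M[k, k]                        # normalization
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]         # elimination above and below
    x = np.empty(n)
    x[col_of] = M[:, -1]                       # recovery of the solution vector
    return x
```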

17.
The problem of computing the triangular factors of a square, real, symmetric, and positive definite matrix by using the facilities of a multiprocessor MIMD-type computer is considered. The parallel algorithms based on Cholesky decomposition and Gaussian elimination are derived and analyzed in terms of their speedup and efficiency, when the available number of processors is O(n), where n is the size of the matrix. It is shown that the parallel elimination method can achieve the same speedup as the parallel Cholesky method while using only half the number of processors required by the Cholesky method.
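For reference, a minimal serial sketch of the Cholesky factorization (A = L·Lᵀ for a symmetric positive definite A) that the parallel algorithms in the abstract distribute: in the parallel versions, the column scaling and the rank-one update of the trailing submatrix are the operations spread over the O(n) processors.

```python
import numpy as np

def cholesky(A):
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(n):
        L[k, k] = np.sqrt(A[k, k])
        L[k + 1:, k] = A[k + 1:, k] / L[k, k]                      # scale the pivot column
        A[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])  # trailing submatrix update
    return L
```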

18.
吴素萍  王定康 《微计算机信息》2007,23(32):251-252,293
Collision problems in robotics can be expressed as quantifier elimination problems, but some of these problems are so complex that solving them on a single microcomputer takes a very long time or is simply infeasible. This paper proposes a parallelization of the quantifier elimination algorithm on a distributed Maple system. Based on the characteristics of the distributed Maple system and of the algorithm, and through case analysis, two parallel strategies are given, with the aim of improving processor utilization and the efficiency of the quantifier elimination algorithm in the Maple environment.

19.
Parallel data flow analysis methods offer the promise of calculating detailed semantic information about a program at compile time more efficiently than sequential techniques. Previous work on parallel elimination methods (Zobel, 1990) has been hampered by the lack of control over interval size, which can prohibit effective parallel execution of these methods. To overcome this problem, we have designed the region analysis method, a new elimination method for data flow analysis. Region analysis emphasizes flow graph partitioning to enable better load balancing in a more effective parallel algorithm. We present the design of region analysis and the empirical results we have obtained, which indicate the prevalence of large intervals in flow graphs derived from real programs and the performance improvement of region analysis over parallel Allen-Cocke interval analysis. Our implementation analyzed programs from the Perfect Benchmarks and netlib running on a Sequent Symmetry S81.

20.
It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) which is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices which are stored and manipulated in the nodes. We seek to find an optimal matrix partitioning to minimize overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model to determine the optimal partitioning size and perform its experimental verification on the NCUBE parallel processor. For a parallel Gaussian elimination known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size.
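A minimal sketch of the kind of analytical model such a study derives for choosing the submatrix size; the Cannon-style communication pattern and all machine constants here (t_flop, t_startup, t_word) are assumptions for illustration, not the paper's measured NCUBE parameters. An n×n matrix product is split over an s×s grid of nodes (p = s·s), and the model trades per-node computation against per-step communication.

```python
def model_time(n, s, t_flop=1e-8, t_startup=1e-4, t_word=1e-6):
    b = n // s                                          # submatrix (block) dimension
    compute = 2.0 * n * b * b * t_flop                  # each node's share of the 2*n^3 flops
    communicate = s * 2 * (t_startup + b * b * t_word)  # s steps, two block shifts per step
    return compute + communicate

def best_partition(n, grid_sizes=(1, 2, 4, 8, 16, 32)):
    feasible = [s for s in grid_sizes if n % s == 0]
    return min(feasible, key=lambda s: model_time(n, s))   # grid size with smallest predicted time
```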
