Similar Literature
20 similar references retrieved (search time: 78 ms)
1.
In the direct solution of sparse symmetric positive definite linear systems, finding an ordering of the matrix that minimizes the height of the elimination tree (an indicator of the number of parallel elimination steps) is crucial for computing the Cholesky factor efficiently in parallel. This problem is known to be NP-hard. Although many effective heuristics have been proposed, how close these heuristics come to optimal, and how the height of the elimination tree can be reduced further, remain open questions. This paper investigates both. We introduce a genetic algorithm tailored to this parallel ordering problem, characterized by two novel genetic operators: adaptive merge crossover and tree rotate mutation. Experiments show that our approach is cost-effective in the number of generations needed to reach a solution with a lower elimination tree height.
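As a concrete reference for the quantity being optimized, here is a minimal Python sketch (not the paper's genetic algorithm) of how the elimination-tree height of a symmetrically permuted sparse matrix can be evaluated for a candidate ordering, using Liu's path-compression algorithm; the function name, the edge-list input, and the `order` permutation argument are assumptions made for illustration.

```python
def etree_height(n, edges, order):
    pos = [0] * n                          # pos[v] = column of vertex v after permutation
    for k, v in enumerate(order):
        pos[v] = k
    lower = [[] for _ in range(n)]         # lower[j] = rows i < j with a nonzero in column j
    for u, v in edges:
        i, j = sorted((pos[u], pos[v]))
        if i != j:
            lower[j].append(i)

    parent = [-1] * n                      # elimination-tree parent of each column
    ancestor = [-1] * n                    # path-compression workspace
    for j in range(n):
        for i in lower[j]:
            r = i
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j            # compress the path toward column j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j

    depth = [1] * n                        # longest path from any leaf up to each node
    for j in range(n):                     # a parent always has a larger index than its child
        if parent[j] != -1:
            depth[parent[j]] = max(depth[parent[j]], depth[j] + 1)
    return max(depth)
```

A genetic algorithm of the kind described would call such a function as the fitness of each candidate ordering.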

2.
张博为  吴艳霞  顾国昌  孙霖 《计算机工程》2012,38(11):281-283,286
To solve linear systems over GF(2), this work improves the existing Gaussian elimination algorithm and proposes a parallel hardware structure for quickly computing the unknown vector. Time complexity is reduced by performing the elimination and cyclic row-shift operations in parallel, and the whole algorithm is mapped onto hardware through an interconnection of "smart memory"-like basic cells. Performance analysis shows that, for an n-order binary augmented matrix whose density is far above or below 0.5, the parallel structure needs about 2n clock cycles on average, far less than the software algorithm's time of roughly (1/4)n³. Implementation results on binary non-sparse augmented matrices of order 3 to 50 show a performance improvement of about two orders of magnitude over the software implementation.
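The following sketch shows the underlying GF(2) elimination in software, packing each row of the augmented matrix into a Python integer so that eliminating a row is a single XOR; this mirrors the row-wide parallelism the hardware exploits, but it is not the paper's hardware structure. The bit layout (bit j+1 = coefficient of x_j, bit 0 = right-hand side) and the function name are assumptions.

```python
def solve_gf2(rows, n):
    rows = list(rows)
    where = [-1] * n                       # where[j] = row holding the pivot of x_j
    r = 0
    for j in range(n):
        mask = 1 << (j + 1)
        piv = next((k for k in range(r, len(rows)) if rows[k] & mask), None)
        if piv is None:
            continue                       # x_j is free; it is set to 0 below
        rows[r], rows[piv] = rows[piv], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k] & mask:
                rows[k] ^= rows[r]         # one XOR eliminates column j from a whole row
        where[j] = r
        r += 1
    if any(row == 1 for row in rows):      # a row reduced to 0 = 1: inconsistent system
        return None
    return [rows[where[j]] & 1 if where[j] != -1 else 0 for j in range(n)]
```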

3.
To meet the memory requirements of large-scale linear systems, a parallel Gaussian elimination scheme is proposed for symmetric systems. Since the submatrices arising during Gaussian elimination of a symmetric system remain symmetric, the parallel computation reads and operates on only the triangular part of the data, reducing storage and improving parallel efficiency. Tests show that the scheme's parallel efficiency is better than that of the conventional algorithm and that it can be applied to large-scale numerical computation with symmetric systems.
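A minimal serial sketch of the idea follows: during Gaussian elimination of a symmetric system (without pivoting, e.g. a positive definite one), the trailing submatrix stays symmetric, so only the upper triangle needs to be stored and updated. The packed input format (`A_upper[i]` holds row i of the upper triangle, i.e. A[i, i:]) is an assumption chosen for illustration, not the paper's distributed data layout.

```python
import numpy as np

def solve_symmetric_triangular_storage(A_upper, b):
    n = len(b)
    U = [np.asarray(row, dtype=float).copy() for row in A_upper]
    b = np.asarray(b, dtype=float).copy()
    for k in range(n):                     # elimination restricted to the triangle
        for i in range(k + 1, n):
            f = U[k][i - k] / U[k][0]      # A[i, k] = A[k, i] by symmetry
            U[i] -= f * U[k][i - k:]       # update only A[i, i:]
            b[i] -= f * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):         # back substitution on the stored triangle
        x[i] = (b[i] - U[i][1:] @ x[i + 1:]) / U[i][0]
    return x
```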

4.
Any factorization/back substitution scheme for the solution of linear systems consists of two phases which are different in nature, and hence may be inefficient for parallel implementation on a single computational network. The Gauss-Jordan elimination scheme unifies the nature of the two phases of the solution process and thus seems to be more suitable for parallel architectures, especially if reconfiguration of the communication pattern is not permitted. In this communication, a computational network for the Gauss-Jordan algorithm is presented. This network compares favorably with optimal implementations of the Gauss elimination/back substitution algorithm.
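To make the single uniform phase concrete, here is a minimal NumPy sketch of Gauss-Jordan elimination with partial pivoting: the augmented matrix [A | b] is reduced directly to [I | x], so no separate back-substitution phase (and no second communication pattern) is needed. This is a generic illustration, not the computational network described in the abstract.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))    # partial pivoting for stability
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                        # normalize the pivot row
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]         # eliminate above and below the pivot
    return M[:, -1]
```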

5.
Parallel solution of large-scale dense linear systems on the CUDA architecture
Based on Gauss-Jordan elimination, an improved parallel Gauss-Jordan elimination algorithm adapted to the CUDA architecture is presented. By analyzing the processing steps of the method and the corresponding constraints of CUDA, a five-level grid-strip-group-block-thread organization is proposed on top of CUDA's grid-block-thread hierarchy, the notions of base row and global base row are introduced, and a CUDA-oriented parallel version of Gauss-Jordan elimination is constructed. On large-scale dense linear systems of up to 4,000 unknowns, the algorithm is compared with serial Gauss-Jordan elimination; experimental results show that it makes full use of the GPU hardware and effectively reduces the solution time of large-scale dense linear systems.

6.
A parallel bisection/multisection method for the eigenvalue problem of symmetric banded matrices
This paper presents a parallel bisection/multisection method, together with an improvement of it, for solving the eigenvalue problem of symmetric banded matrices in a distributed environment. The algorithm computes the Sturm sequence of a symmetric banded matrix by a modified Gaussian elimination, and the bisection/multisection results are refined with Rayleigh quotient iteration. No inter-processor communication is needed during parallel execution, making the method particularly suitable for parallel computation in a distributed environment. Numerical experimental results are given.
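A minimal sketch of the Sturm-sequence bisection idea follows, written for the tridiagonal special case of the banded problem: the number of negative pivots produced by (modified) Gaussian elimination of A - x·I equals the number of eigenvalues below x, and bisecting on that count isolates the k-th eigenvalue. The Rayleigh-quotient refinement and the distribution over processors from the paper are omitted; the function names and the `(diag, off)` input format are assumptions.

```python
def sturm_count(diag, off, x):
    count, d = 0, 1.0
    for i in range(len(diag)):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (diag[i] - x) - b2 / d         # pivots of the LDL^T factorization of A - x*I
        if d == 0.0:
            d = -1e-300                    # nudge exact zeros off the singularity
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    spread = max(abs(d) for d in diag) + 2 * max(map(abs, off), default=0.0)
    lo, hi = -spread, spread               # Gershgorin interval containing the spectrum
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) <= k:
            lo = mid                        # at most k eigenvalues lie below mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```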

7.
We consider a graph theoretical model and study a parallel implementation of the well-known Gaussian elimination method on distributed-memory parallel architectures, where the communication delay for transmitting an elementary data item is higher than the computation time of an elementary instruction. We propose and analyze two low-complexity algorithms for scheduling the tasks of parallel Gaussian elimination on an unbounded number of completely connected processors. We compare these two algorithms with a higher-complexity general-purpose scheduling algorithm, the DSC heuristic proposed by A. Gerasoulis and T. Yang (1993).

8.
Solving systems of linear algebraic equations is an important topic in large-scale scientific computing. When solving dense linear systems on distributed-memory MIMD machines, load balance and inter-processor communication are the two dominant influencing factors. This paper discusses the concrete implementation of Gaussian elimination on hypercube-connected distributed MIMD systems. We first describe two communication strategies for Gaussian elimination without pivoting, then extend them to Gaussian elimination with pivoting, and finally propose a new algorithm that greatly improves processor efficiency and achieves nearly fully parallel operation. Some existing experimental data are also given in the paper.

9.
Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In the last few years, the algorithm has received a lot of attention in an attempt to improve its parallel performance. This article surveys recent developments in parallel implementations of Gaussian elimination for shared-memory architectures. Five different flavors are investigated. Three of them are based on different strategies for pivoting: partial pivoting, incremental pivoting, and tournament pivoting. The fourth one replaces pivoting with the Partial Random Butterfly Transformation, and finally, an implementation without pivoting is used as a performance baseline. The technique of iterative refinement is applied to recover numerical accuracy when necessary. All parallel implementations are produced using dynamic, superscalar, runtime scheduling and a tile matrix layout. Results on two multisocket multicore systems are presented, and performance and numerical accuracy are analyzed.
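A minimal sketch of the iterative refinement step the survey applies: when a fast but less stable factorization (no pivoting, or pivoting replaced by a random butterfly transform) loses accuracy, re-using that factorization on the residual recovers it. `solve_with_factorization`, which reuses the already computed factors, is an assumed callback for illustration, not an API from the paper.

```python
import numpy as np

def iterative_refinement(A, b, solve_with_factorization, max_iters=5, tol=1e-12):
    x = solve_with_factorization(b)            # initial solve with the cheap factorization
    for _ in range(max_iters):
        r = b - A @ x                          # residual (ideally in higher precision)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + solve_with_factorization(r)    # correction reuses the same factors
    return x
```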

10.
11.
常静 《现代计算机》2008,(3):106-108
Based on MPI, and using a parallel Gaussian elimination algorithm for solving linear systems as the example, an analyzer for evaluating the performance of parallel algorithms is designed. The analyzer can schedule multiple computers on a network to cooperate in parallel computation, and it analyzes the performance advantage of the parallel algorithm over its serial counterpart.

12.
This paper deals with the theory of hidden surface elimination algorithms in Computer Graphics. A set of functions and abstract data types is defined to help concisely specify a class of hidden surface elimination algorithms in a purely functional language. A formal study of these algorithms is presented, along with theorems of equivalence between some of the specifications. It is shown that such proofs of equivalence help in the construction of existing as well as new algorithms. We also highlight the importance of such a study for exploiting alternative parallel architectures when implementing these algorithms. Further benefits of formal specification and analysis come from its use as a teaching aid and as an effective method for rapid prototyping of these algorithms.

13.

We extend the elimination method of Chawla and Khazal [1] to uncouple partitioned blocks of "periodic" tridiagonal linear systems. At each step of the elimination stage, we now need three simultaneous eliminations: within each block, one usual forward elimination and one backward elimination from across the succeeding block, and one elimination in the last row of the last block. An interesting feature of the present elimination procedure is that, at the end of it, the periodicity of the original system is passed on to the core system. Once the core system is solved, the blocks uncouple and the solution is obtained in parallel from each block by back substitution. For a system of size N, the classical elimination has an arithmetical operations count of 17N. A best serial algorithm, based on the Sherman-Morrison formula, has an operations count of approximately 14N. The present parallel elimination algorithm, employing p partition blocks, has an operations count of O(17N/p) and, in comparison with the Sherman-Morrison algorithm, it can achieve an efficiency of nearly 82% on a p-processor machine.
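For context, here is a minimal sketch of the roughly-14N serial baseline mentioned above, assuming the standard Sherman-Morrison treatment of a periodic (cyclic) tridiagonal system: the wrap-around corner entries are expressed as a rank-one update of an ordinary tridiagonal matrix, which is solved twice with the Thomas algorithm. This is not the paper's block elimination; the arguments are assumed to be NumPy vectors, with a[i] multiplying x[i-1] and c[i] multiplying x[i+1], indices wrapping modulo n.

```python
import numpy as np

def thomas(a, b, c, d):
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def periodic_tridiagonal_solve(a, b, c, d):
    n = len(b)
    gamma = -b[0]
    bb = np.asarray(b, dtype=float).copy()
    bb[0] -= gamma                         # T = A - u v^T is plain tridiagonal
    bb[-1] -= c[-1] * a[0] / gamma
    u = np.zeros(n); u[0], u[-1] = gamma, c[-1]
    v = np.zeros(n); v[0], v[-1] = 1.0, a[0] / gamma
    y = thomas(a, bb, c, d)                # T y = d
    z = thomas(a, bb, c, u)                # T z = u
    return y - (v @ y) / (1.0 + v @ z) * z # Sherman-Morrison correction
```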

14.
This paper discusses the parallel processing of hidden-surface removal for three-dimensional objects. A class of MIMD parallel depth-buffer (z-buffer) algorithms is presented and implemented on a multi-Transputer system. The efficiency of these algorithms is also compared.

15.
《Parallel Computing》1997,22(13):1837-1851
The PAPS (Performance Analysis of Parallel Systems) toolset is a testbed for model-based performance prediction of message-passing parallel applications executed on private-memory multiprocessor computer systems. PAPS makes it possible to describe the execution behavior of the computer hardware and operating system software resources down to a very detailed level, which enables very accurate performance prediction of parallel applications even in the case of substantial performance degradation due to contention for shared resources. In this paper, the fundamental design principles and implementation methodologies for the development of the PAPS toolset are presented and the PAPS parallel system specification formalisms are described. A simplified performance study of a parallel Gaussian elimination application on the nCUBE 2 multiprocessor system is used to demonstrate the usage of the tool.

16.
Using the CUDA platform, a basic method is proposed for accelerating the solution of linear systems by parallel Gauss-Jordan elimination with full (complete) pivoting: the full pivot search, normalization, and elimination operations are implemented in parallel on a general-purpose GPU (GPGPU), while the recovery of the solution vector is done on the CPU. Exploiting the features of NVIDIA's latest Fermi-architecture GPUs and a series of optimizations, the GPGPU achieves a speedup of more than 6.5x over Intel's latest-architecture CPU and more than 10x over Intel's previous-generation CPU.
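A minimal NumPy sketch of the algorithm's structure, not the CUDA kernels: each step searches the whole active submatrix for the pivot (full pivoting), normalizes the pivot row, and eliminates the pivot column everywhere, i.e. the three operations the abstract parallelizes on the GPGPU, while the column swaps are recorded so the solution vector can be recovered at the end, the step the abstract leaves to the CPU.

```python
import numpy as np

def gauss_jordan_full_pivot(A, b):
    M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]
    col_of = np.arange(n)                      # col_of[k] = original variable now in column k
    for k in range(n):
        sub = np.abs(M[k:, k:n])
        r, c = divmod(int(np.argmax(sub)), sub.shape[1])
        r, c = r + k, c + k                    # full pivot over the active submatrix
        M[[k, r]] = M[[r, k]]                  # row swap
        M[:, [k, c]] = M[:, [c, k]]            # column swap (permutes the unknowns)
        col_of[[k, c]] = col_of[[c, k]]
        M[k] /= M[k, k]                        # normalization
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]         # elimination above and below
    x = np.empty(n)
    x[col_of] = M[:, -1]                       # recovery of the solution vector
    return x
```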

17.
The problem of computing the triangular factors of a square, real, symmetric, and positive definite matrix by using the facilities of a multiprocessor MIMD-type computer is considered. The parallel algorithms based on Cholesky decomposition and Gaussian elimination are derived and analyzed in terms of their speedup and efficiency, when the available number of processors is O(n), where n is the size of the matrix. It is shown that the parallel elimination method can achieve the same speedup as the parallel Cholesky method while using only half the number of processors required by the Cholesky method.
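For reference, a minimal serial sketch of the Cholesky factorization (A = L·Lᵀ for a symmetric positive definite A) that the parallel algorithms in the abstract distribute: in the parallel versions, the column scaling and the rank-one update of the trailing submatrix are the operations spread over the O(n) processors.

```python
import numpy as np

def cholesky(A):
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(n):
        L[k, k] = np.sqrt(A[k, k])
        L[k + 1:, k] = A[k + 1:, k] / L[k, k]                      # scale the pivot column
        A[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])  # trailing submatrix update
    return L
```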

18.
吴素萍  王定康 《微计算机信息》2007,23(32):251-252,293
Collision problems in robotics can be expressed as quantifier elimination problems, but some of these problems are so complex that solving them on a single microcomputer takes a very long time or is simply infeasible. This paper proposes a parallelization of the quantifier elimination algorithm on a distributed Maple system. Based on the characteristics of the distributed Maple system and of the algorithm, and through case analysis, two parallel strategies are given, with the aim of improving processor utilization and the efficiency of the quantifier elimination algorithm in the Maple environment.

19.
Parallel data flow analysis methods offer the promise of calculating detailed semantic information about a program at compile time more efficiently than sequential techniques. Previous work on parallel elimination methods (Zobel, 1990) has been hampered by the lack of control over interval size, which can prohibit effective parallel execution of these methods. To overcome this problem, we have designed the region analysis method, a new elimination method for data flow analysis. Region analysis emphasizes flow graph partitioning to enable better load balancing in a more effective parallel algorithm. We present the design of region analysis and the empirical results we have obtained, which indicate the prevalence of large intervals in flow graphs derived from real programs and the performance improvement of region analysis over parallel Allen-Cocke interval analysis. Our implementation analyzed programs from the Perfect Benchmarks and netlib running on a Sequent Symmetry S81.

20.
It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) which is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices which are stored and manipulated in the nodes. We seek to find an optimal matrix partitioning to minimize overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model to determine the optimal partitioning size and perform its experimental verification on the NCUBE parallel processor. For a parallel Gaussian elimination known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size.
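A minimal sketch of the kind of analytical model such a study derives for choosing the submatrix size; the Cannon-style communication pattern and all machine constants here (t_flop, t_startup, t_word) are assumptions for illustration, not the paper's measured NCUBE parameters. An n×n matrix product is split over an s×s grid of nodes (p = s·s), and the model trades per-node computation against per-step communication.

```python
def model_time(n, s, t_flop=1e-8, t_startup=1e-4, t_word=1e-6):
    b = n // s                                          # submatrix (block) dimension
    compute = 2.0 * n * b * b * t_flop                  # each node's share of the 2*n^3 flops
    communicate = s * 2 * (t_startup + b * b * t_word)  # s steps, two block shifts per step
    return compute + communicate

def best_partition(n, grid_sizes=(1, 2, 4, 8, 16, 32)):
    feasible = [s for s in grid_sizes if n % s == 0]
    return min(feasible, key=lambda s: model_time(n, s))   # grid size with smallest predicted time
```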
