首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
In this paper, we present a fast numerical algorithm for solving nearly penta-diagonal linear systems and show that the computational cost is less than those of three algorithms in El-Mikkawy and Rahmo, [Symbolic algorithm for inverting cyclic penta-diagonal matrices recursively–Derivation and implementation, Comput. Math. Appl. 59 (2010), pp. 1386–1396], Lv and Le [A note on solving nearly penta-diagonal linear systems, Appl. Math. Comput. 204 (2008), pp. 707–712] and Neossi Nguetchue and Abelman [A computational algorithm for solving nearly penta-diagonal linear systems, Appl. Math. Comput. 203 (2008), pp. 629–634.]. In addition, an efficient way of evaluating the determinant of a nearly penta-diagonal matrix is also discussed. The algorithm is suited for implementation using computer algebra systems (CAS) such as MATLAB, MACSYMA and MAPLE. Some numerical examples are given in order to illustrate the efficiency of our algorithm.  相似文献   

2.
A new numerical algorithm for solving nearly penta-diagonal Toeplitz linear systems is presented. The algorithm is suited for implementation using Computer Algebra Systems (CASs) such as MATLAB, MACSYMA and MAPLE. Numerical examples are given in order to illustrate the efficiency of our algorithm.  相似文献   

3.
For an arbitrary n × n matrix A and an n × 1 column vector b, we present a systolic algorithm to solve the dense linear equations Ax = b. An important consideration is that the pivot row can be changed during the execution of our systolic algorithm. The computational model consists of n linear systolic arrays. For 1 ≤ in, the ith linear array is responsible to eliminate the ith unknown variable xi of x. This algorithm requires 4n time steps to solve the linear system. The elapsed time unit within a time step is independent of the problem size n. Since the structure of a PE is simple and the same type PE executes the identical instructions, it is very suitable for VLSI implementation. The design process and correctness proof are considered in detail. Moreover, this algorithm can detect whether A is singular or not.  相似文献   

4.
The partition method of Wang for tridiagonal equations is generalized to the arbitrary band case. A stability criterion is given. The algorithm is compared to Gaussian elimination and cyclic reduction.  相似文献   

5.
提出了分布式环境下求解块三对角线性方程组的一种并行算法,该算法充分利用系数矩阵结构的特殊性,通过对系数矩阵进行适当分解及近似处理,使算法只在相邻处理机间通信两次。并从理论上给出了算法有效的一个充分条件。最后,在HP rx2600集群上进行了数值实验,结果表明,实算与理论是一致的,并行性也很好。  相似文献   

6.
7.
该文提出了分布式环境下求解周期块三对角线性方程组的一种并行算法,该算法通过对系数矩阵进行一次预处理后,充分利用系数矩阵结构的特殊性,使算法只在相邻处理机间通信两次。并从理论上给出了算法收敛的一个充分条件。最后,在HPrx2600集群上进行了数值试验,结果表明,实算与理论是一致的,并行性也很好。  相似文献   

8.
A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).  相似文献   

9.
The main difficulty in solving linear Diophantine systems is the very rapid growth of intermediate results which makes many algorithms, for solving linear Diophantine systems, impractical even for large computers. One way for controlling this growth is to use the L 3-reduction algorithm, introduced by Lenstra et al. [A.K. Lenstra, H.W. Lenstra, and L. Lov?sz, Factoring polynomials with rational coefficients, Math. Ann. 261 (1982), pp. 515–534.]. Esmaeili [H. Esmaeili, How can we solve a linear Diophantine equation by the basis reduction algorithm, Int. J. Comput. Math. 82 (2005), pp. 1227–1234.] proposed a method for obtaining the general integer solution of a linear Diophantine equation by using L 3-reduction algorithm. Here we propose a procedure for generalizing Esmaeili's method, to a method for obtaining the general integer solution of systems of linear Diophantine equations by using L 3-reduction algorithm. Then we consider the complexity issues and show that the generalized algorithm controls the growth of the intermediate results and the number of required arithmetic operations well. Finally, some illustrative numerical examples are given to show the efficiency of the proposed algorithm.  相似文献   

10.
Any factorization/back substitution scheme for the solution of linear systems consists of two phases which are different in nature, and hence may be inefficient for parallel implementation on a single computational network. The Gauss-Jordan elimination scheme unifies the nature of the two phases of the solution process and thus seems to be more suitable for parallel architectures, especially if reconfiguration of the communication pattern is not permitted. In this communication, a computational network for the Gauss-Jordan algorithm is presented. This network compares favorably with optimal implementations of the Gauss elimination/back substitution algorithm.  相似文献   

11.
分段线性系统最优控制设计的一种混合算法   总被引:4,自引:0,他引:4  
将分段线性系统的最优控制设计问题转化成以反馈增益为寻优参数,以最优控制性能上界为目标的一组双线性矩阵不等式(BMI)问题.将遗传算法与内点法相结合设计出一种混合算法,对BMI问题进行求解.算例仿真表明该算法是简便而有效的.  相似文献   

12.
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU‐GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double‐precision Cholesky factorization and QR factorization. Our approach demonstrates a performance comparable to Intel MKL on shared‐memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared‐memory systems with multiple GPUs. Copyright © 2014 John Wiley & Sons, Ltd.  相似文献   

13.
CUDA架构下大规模稠密线性方程组的并行求解   总被引:1,自引:0,他引:1       下载免费PDF全文
在Gauss-Jordan消去法的基础上,给出了一种适应于CUDA架构的改进Gauss-Jordan消去并行算法。通过分析该方法的处理过程以及CUDA架构的相应限制,在CUDA的grid-block-thread三层组织结构的基础上,从算法构造的角度提出了grid-strip-group-block-thread五层结构,给出了基础行以及全局基础行等概念,并构建了适应于CUDA架构的Gauss-Jordan消去法的并行版本,在最高维数为4 000维的大规模稠密线性方程组的算例求解上与串行Gauss-Jordan消去法进行了比较,实验结果表明,该算法能够充分利用GPU的硬件特性,有效地降低了大规模稠密线性方程组的求解时间。  相似文献   

14.
HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000).The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations.The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform.Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.

New version program summary

Program title: Huge Dense System Solver (HDSS)Catalogue identifier: AEHU_v1_1Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.htmlProgram obtainable from: CPC Program Library, Queen?s University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 87 062No. of bytes in distributed program, including test data, etc.: 1 069 110Distribution format: tar.gzProgramming language: Fortran90, CComputer: Parallel architectures: multiprocessors, computer clustersOperating system: Linux/UnixHas the code been vectorized or parallelized?: Yes, includes MPI primitives.RAM: Tested for up to 190 GBClassification: 6.5External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution).Catalogue identifier of previous version: AEHU_v1_0Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533Does the new version supersede the previous version?: YesNature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities.Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient.Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic.Summary of revisions: Version 1.1
  • • 
    Can be used to solve linear systems using double-precision arithmetic.
  • • 
    New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment.
Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.  相似文献   

15.
Randomized algorithms are gaining ground in high-performance computing applications as they have the potential to outperform deterministic methods, while still providing accurate results. We propose a randomized solver for distributed multicore architectures to efficiently solve large dense symmetric indefinite linear systems that are encountered, for instance, in parameter estimation problems or electromagnetism simulations. The contribution of this paper is to propose efficient kernels for applying random butterfly transformations and a new distributed implementation combined with a runtime (PaRSEC) that automatically adjusts data structures, data mappings, and the scheduling as systems scale up. Both the parallel distributed solver and the supporting runtime environment are innovative. To our knowledge, the randomization approach associated with this solver has never been used in public domain software for symmetric indefinite systems. The underlying runtime framework allows seamless data mapping and task scheduling, mapping its capabilities to the underlying hardware features of heterogeneous distributed architectures. The performance of our software is similar to that obtained for symmetric positive definite systems, but requires only half the execution time and half the amount of data storage of a general dense solver.  相似文献   

16.
This document describes our freely distributed Maple library spectra, for Semidefinite Programming solved Exactly with Computational Tools of Real Algebra. It solves linear matrix inequalities with symbolic computation in exact arithmetic and it is targeted to small-size, possibly degenerate problems for which symbolic infeasibility or feasibility certificates are required.  相似文献   

17.
This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer. The algorithms used here are based on high-level modules that facilitate portability and perform efficiently in a wide range of environments. The modules are chosen to be of a large enough computational granularity so that reasonably optimum performance may be insured. The design of algorithms with such fundamental modules in mind will also facilitate their replacement by others more suited to gain the desired performance on a particular computer architecture.  相似文献   

18.
In this note we prove a technical result concerning the closure of finite N × N Hankel matrices of rank n. In contrast to a corresponding result of Deistler and Hannan, this partial compactification of the space of transfer functions of order n contains boundary points which cannot be interpreted as linear systems of order < n. However, the algebraic geometric structure of is rather simple and of interest for the analysis of critical points of certain distance functions which arise in model reduction.  相似文献   

19.
三角形方程组的一种分布式并行算法   总被引:8,自引:3,他引:5  
提出了分布式环境下求解三角形方程组的一种新的并行算法,该算法基于将系数矩阵和右端顶分,并将其以块行卷帘方式存储在各处理器的局部存储器,利用通信与计算重叠的技术,取得了比块列扫描算法好的效果,当方程组具有多重右端项时,效果尤为突出。文中给出了在YH3M计算机上该算法的数值试验结果及其与块列扫描算法的数值比较结果。  相似文献   

20.
改进的求解线性方程组的并行Arnoldi方法   总被引:1,自引:1,他引:0       下载免费PDF全文
以Galerkin原理为基础,提出了求解循环块三对角线性方程组的并行算法。根据系数矩阵的稀疏性,选取适当的子空间的基,使算法不但不会发生中断,并从理论上证明了当系数矩阵对称正定时,该并行算法收敛。最后,在HP rx2600集群上进行的数值实验结果表明,该算法的并行效率很高,理论和实际计算相一致。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号