1.

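Storage of Large-Scale Finite Element Stiffness Matrices and a Parallel Solution Algorithm  Total citations: 1, self-citations: 0, citations by others: 1

纪国良 冯仰德 《数值计算与计算机应用》2012,33(3):230-240

This paper proposes a method for assembling finite element stiffness matrices directly into a global stiffness matrix stored in compressed format, and designs a preconditioned restarted GMRES(m) parallel solver for the resulting linear system. The assembly method uses an "associated node" data structure that records the connectivity of the mesh nodes and serves as the intermediary during assembly. The method saves a large amount of storage and is simple and efficient. The solver uses Jacobi and sparse approximate inverse (SPAI) preconditioners. Numerical experiments on two- and three-dimensional elasticity problems show that in the 2D case the SPAI preconditioner accelerates convergence well and has good parallel efficiency, whereas in the 3D case the Jacobi preconditioner does more to reduce the time to convergence.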
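The assembly idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one degree of freedom per node, a CSR layout as the compressed format, and a hypothetical function name `assemble_csr`. The per-node adjacency sets play the role of the "associated node" structure, fixing the sparsity pattern before any value is written.

```python
def assemble_csr(num_nodes, elements, element_matrices):
    """Assemble element matrices directly into CSR arrays (one DOF per node).

    elements: list of node-index tuples, one per element (assumed input format).
    element_matrices: list of dense element matrices, same order as elements.
    """
    # Step 1: for each node, collect the nodes that share an element with it.
    adjacency = [set() for _ in range(num_nodes)]
    for nodes in elements:
        for i in nodes:
            adjacency[i].update(nodes)

    # Step 2: the adjacency fixes the sparsity pattern, so the CSR index
    # arrays can be built once, with no intermediate dense matrix.
    indptr, indices = [0], []
    for i in range(num_nodes):
        indices.extend(sorted(adjacency[i]))
        indptr.append(len(indices))
    data = [0.0] * len(indices)

    # Lookup from (row, column) to the offset of that entry in `data`.
    position = {}
    for i in range(num_nodes):
        for k in range(indptr[i], indptr[i + 1]):
            position[(i, indices[k])] = k

    # Step 3: scatter-add every element matrix into its CSR slots.
    for nodes, ke in zip(elements, element_matrices):
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                data[position[(i, j)]] += ke[a][b]
    return indptr, indices, data


# Usage: a 1D bar with three nodes and two identical two-node elements.
ke = [[1.0, -1.0], [-1.0, 1.0]]
indptr, indices, data = assemble_csr(3, [(0, 1), (1, 2)], [ke, ke])
```

A dictionary lookup is used here for clarity; a production code would more likely search the sorted column indices of each row.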

2.

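A Static Scheduling Algorithm for Sparse Matrix Solution on FPGA Architectures

王晞阳 陈继林 李猛 刘首文 《计算机工程》2022,48(7):199-205+213

In power system simulation, solving large sparse matrices consumes substantial storage and computing resources, and failing to exploit matrix sparsity wastes storage and lowers computational efficiency. Current research on sparse matrix solution algorithms mainly targets many-core accelerators and focuses on exploiting the parallelism of level sets to improve parallel efficiency, but frequent cache checks and fine-grained memory accesses on many-core processor architectures can cause performance problems. For the sparse lower-triangular solve on field-programmable gate arrays (FPGAs), a static scheduling solution algorithm is proposed that builds on the hardware structure of the FPGA sparse matrix solver designed by 吴志勇 et al. By preprocessing the sparse matrix and designing the data distribution and instruction layout flows, the solution of the sparse lower-triangular system is statically mapped onto multiple on-chip FPGA processing units, giving a fast parallel solve on the FPGA. All implicit parallel relationships in the serial algorithm are laid out in buffers, so that every computing unit overlaps computation, memory access, and inter-unit communication efficiently and the FPGA hardware resources are used as fully as possible. Tests on typical cases show that the algorithm achieves a 5- to 10-fold speedup over traditional CPU/GPU solution algorithms.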

3.

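A Survey of Preconditioning Techniques for Solving Sparse Linear Systems  Total citations: 1, self-citations: 0, citations by others: 1

骆志刚 仲妍 吴枫 《计算机工程与科学》2010,32(12):89

Efficient solution of sparse linear systems is one of the active research topics in numerical computing, and it includes research on preconditioning techniques. This paper summarizes preconditioning techniques for sparse linear systems by technical category. First, it introduces fill-in reduction strategies, which aim to reduce storage during the solve while preserving the sparse structure of the matrix. Second, it introduces several matching techniques for coefficient matrices of different structures, which aim to make the matrix diagonally dominant. Finally, it introduces construction methods for factorized approximate inverse preconditioners, which are naturally parallel, and parallel solution techniques for incomplete factorization preconditioners.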

4.

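Parallel Solution of Sparse Matrix Eigenvalue Problems Based on Spectrum Slicing

《数值计算与计算机应用》2015,(2)

This paper presents a scheme for solving sparse matrix eigenvalue problems in parallel based on spectrum slicing: the interval containing the eigenvalues is divided into several independent subintervals, and the eigenvalues in each subinterval are computed independently and in parallel. Within the scheme, a method is proposed that estimates the eigenvalue distribution from Gershgorin disc information and then corrects the estimate with bisection and interpolation to improve its accuracy before the spectrum is sliced. The paper also combines spectrum slicing with an approximate spectral projection algorithm based on contour integration to design a multilevel parallel algorithm for eigenvalue problems, and verifies the effectiveness and load balance of the slicing scheme and the efficiency of the parallel eigenvalue solve on the "深腾7000" and "元" supercomputers. Compared with general-purpose solution methods, the parallel algorithm based on spectrum slicing improves performance more than fivefold on 1024 cores, and the scalability of the parallel solve improves markedly.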
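A minimal sketch of the first step, bounding the spectrum with Gershgorin discs, might look as follows. It assumes a real symmetric matrix stored as one `{column: value}` dictionary per row, and the function names are hypothetical. The uniform split at the end is a deliberate simplification: the paper refines the slices with bisection and interpolation, which this sketch does not attempt.

```python
def gershgorin_interval(rows):
    """Return (lo, hi) enclosing all eigenvalues of a real symmetric matrix.

    rows[i] is a dict {column: value} holding the nonzeros of row i.
    """
    lo, hi = float("inf"), float("-inf")
    for i, row in enumerate(rows):
        center = row.get(i, 0.0)
        # Disc radius: sum of absolute off-diagonal entries in the row.
        radius = sum(abs(v) for j, v in row.items() if j != i)
        lo = min(lo, center - radius)
        hi = max(hi, center + radius)
    return lo, hi


def split_interval(lo, hi, parts):
    """Cut [lo, hi] into equal-width subintervals (a naive slicing)."""
    width = (hi - lo) / parts
    return [(lo + k * width, lo + (k + 1) * width) for k in range(parts)]


# Usage: the 3x3 tridiagonal matrix with 2 on the diagonal and -1 beside it.
rows = [{0: 2.0, 1: -1.0}, {0: -1.0, 1: 2.0, 2: -1.0}, {1: -1.0, 2: 2.0}]
lo, hi = gershgorin_interval(rows)      # (0.0, 4.0)
slices = split_interval(lo, hi, 4)      # one slice per worker
```

Equal-width slices rarely contain equal numbers of eigenvalues, which is the load-balance problem the refinement step in the abstract is meant to address.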

5.

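Tree-Structure-Based Parallel Solution of Sparse Triangular Linear Systems

李程 田新民 王鼎兴 郑纬民 《软件学报》1995,6(8):479-485

This paper analyzes the back-substitution phase of direct methods for large sparse linear systems. Based on an improved tree structure (M-tree), it proposes MPFS, a new parallel Forward solution algorithm for sparse triangular linear systems on distributed-memory multicomputer systems. The paper discusses the structural characteristics of the M-tree and analyzes and compares the proposed parallel algorithm with the solution algorithm based on the Elimination-tree. The results show that MPFS not only applies to more sparse matrix systems but also exploits computational parallelism that the Elimination-tree algorithm cannot, which significantly improves solution performance.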

6.

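Three-Dimensional Assembly Constraint Solving with Approximate Jacobian Matrix Updates

丁建完 侯文洁 熊涛 《图学学报》2014,35(3):368

A method for approximately updating the Jacobian matrix in three-dimensional assembly constraint solving is proposed. By approximately updating both full-rank and row-rank-deficient Jacobian matrices during the iteration, the method improves the efficiency of constraint solving. First, formulas for approximately updating the Jacobian matrix and its inverse are added to the nonlinear iterative solution process; next, the conditions that must hold for the approximate update formulas to be used are given; finally, a singular-point perturbation algorithm is described to show how row-rank deficiency of the Jacobian during the iteration is handled. The proposed strategies and algorithms have been implemented in the three-dimensional assembly constraint solving engine CBABench, and the examples presented indicate that the method is effective.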
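The abstract does not say which update formula the authors use. As an illustration only, the sketch below shows one well-known formula of this kind, Broyden's rank-one update, which adjusts the current Jacobian estimate so that it reproduces the most recent step without re-evaluating any derivatives. The function name and the dense list-of-lists storage are assumptions.

```python
def broyden_update(J, dx, df):
    """Return J + ((df - J dx) dx^T) / (dx^T dx).

    J:  current Jacobian estimate (list of rows).
    dx: last step in the unknowns.
    df: observed change in the residual over that step.
    """
    # Mismatch between the observed change and the linear prediction J dx.
    predicted = [sum(Jij * dxj for Jij, dxj in zip(row, dx)) for row in J]
    mismatch = [dfi - pi for dfi, pi in zip(df, predicted)]
    norm_sq = sum(dxj * dxj for dxj in dx)
    # Rank-one correction spreads the mismatch along the step direction.
    return [
        [Jij + mi * dxj / norm_sq for Jij, dxj in zip(row, dx)]
        for row, mi in zip(J, mismatch)
    ]


# Usage: after the update, the new estimate satisfies J_new @ dx == df.
J_new = broyden_update([[1.0, 0.0], [0.0, 1.0]], dx=[1.0, 0.0], df=[2.0, 1.0])
```

The update costs O(n^2) per iteration instead of a full Jacobian evaluation, which is the kind of saving the abstract attributes to approximate updating.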

7.

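GPU-Accelerated Incomplete Cholesky Preconditioned Conjugate Gradient Method

陈尧 赵永华 赵慰 赵莲 《计算机研究与发展》2015,(4):843-850

The incomplete Cholesky factorization preconditioned conjugate gradient (ICCG) method is an effective way to solve large sparse symmetric positive definite linear systems. However, ICCG requires two sparse triangular systems to be solved in every iteration, and the inherently serial nature of sparse triangular solves becomes the bottleneck for running ICCG in parallel on GPUs. An effective GPU-accelerated method for sparse triangular solves is presented. To increase the multithreaded parallelism of the sparse triangular solve on the GPU, level scheduling is applied to the sparse triangular matrices produced by the incomplete Cholesky factorization. To improve parallel performance further, the coefficient matrix is reordered with the approximate minimum degree (AMD) algorithm before level scheduling, and the sparse triangular matrix is sorted by level afterwards; this reduces the number of levels produced by level scheduling and optimizes the GPU memory access pattern of the triangular solve. Numerical experiments show that, compared with an ICCG implementation built on NVIDIA CUSPARSE, the proposed approach more than doubles performance on average.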
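Level scheduling, as named in this abstract, groups the rows of a triangular matrix so that every row in a level depends only on rows in earlier levels. The following is a serial sketch under stated assumptions (lower-triangular CSR input with the diagonal stored, hypothetical function names). It shows the dependency analysis and the order of the solve, not the GPU kernels, AMD reordering, or level sorting described by the authors.

```python
def level_schedule(indptr, indices):
    """Group the rows of a lower-triangular CSR pattern into levels."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        # Row i must wait for every row j < i that it references.
        deps = [level[j] for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    levels = [[] for _ in range(max(level) + 1)] if n else []
    for i, lvl in enumerate(level):
        levels[lvl].append(i)
    return levels


def solve_by_levels(indptr, indices, data, b, levels):
    """Forward substitution, visiting rows level by level."""
    x = [0.0] * len(b)
    for rows in levels:
        # Rows inside one level are mutually independent, so this inner
        # loop is the part that could be handed to parallel threads.
        for i in rows:
            total, diag = b[i], None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    total -= data[k] * x[j]
            x[i] = total / diag
    return x


# Usage: L = [[2,0,0,0],[0,1,0,0],[1,0,1,0],[0,1,1,2]] in CSR form.
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
levels = level_schedule(indptr, indices)          # [[0, 1], [2], [3]]
x = solve_by_levels(indptr, indices, data, [2.0, 1.0, 2.0, 6.0], levels)
```

The number of levels bounds the number of sequential steps, which is why the abstract treats reducing the level count as a route to better parallel performance.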

8.

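A Sparse-View CB-XLCT Imaging Method Based on an Improved Multispectral Strategy

张文元 海琳琦 刘英杰 张海波 《计算机与现代化》2022,(5):96-101

Sparse-view Cone-Beam X-ray Luminescence Computed Tomography (Sparse-view CB-XLCT) is a new multimodal optical tomography technique that shows good potential for real-time detection of early-stage tumors. However, because the projection data are limited, the inverse problem of sparse-view CB-XLCT is more severely ill-posed than that of conventional multi-angle CB-XLCT. To address this, the paper proposes a sparse-view CB-XLCT imaging method with an improved multispectral strategy. First, the system matrix is constructed and the inverse problem is modeled using a multispectral strategy; next, spectral regression is used for feature learning on the high-dimensional system matrix of that inverse problem; then a matrix preprocessing method reduces the column correlation of the system matrix, and the result is modeled as a new inverse problem for accurate reconstruction. Several groups of phantom experiments and noise tests were designed to verify the effectiveness and robustness of the method. The experimental results show that the method not only solves the sparse-view CB-XLCT inverse problem effectively but is also robust.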

9.

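A Parallel Method for Computing the Determinant of Tridiagonal Matrices

玄兆鹏 张莉 付晓林 郭希娟 《计算机工程与应用》2004,40(20):64-66

For tridiagonal matrices, this article uses the Schur complement approach to computing determinants to propose a parallel algorithm for the determinant of a tridiagonal matrix and of its inverse. The algorithm achieves a good speedup.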
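For reference, the serial baseline that a parallel method of this kind has to beat is the standard three-term recurrence for the leading principal minors, f_k = a_k f_{k-1} - b_{k-1} c_{k-1} f_{k-2}. The ratio f_k / f_{k-1} is the k-th pivot, that is, the successive Schur complement. The sketch below implements only this serial recurrence, with a hypothetical function name; it is not the parallel algorithm of the article.

```python
def tridiagonal_det(a, b, c):
    """Determinant of a tridiagonal matrix via the three-term recurrence.

    a: main diagonal (length n), b: superdiagonal, c: subdiagonal
    (both length n - 1).
    """
    prev2, prev1 = 1.0, a[0]          # f_0 = 1, f_1 = a_1
    for k in range(1, len(a)):
        # f_{k+1} = a_{k+1} * f_k - b_k * c_k * f_{k-1}
        prev2, prev1 = prev1, a[k] * prev1 - b[k - 1] * c[k - 1] * prev2
    return prev1


# Usage: the 3x3 matrix with 2 on the diagonal and -1 beside it.
det = tridiagonal_det([2.0, 2.0, 2.0], [-1.0, -1.0], [-1.0, -1.0])  # 4.0
```

Since the determinant of the inverse is the reciprocal of the determinant, the same value also covers the second quantity mentioned in the abstract whenever the matrix is nonsingular.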

10.

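Research and Implementation of a Parallel Algorithm for Graph-Model-Based Image Segmentation

应伟勤 李元香 徐星 王玲玲 《模式识别与人工智能》2007,20(4)

To increase the segmentation speed of the graph-model method, this paper proposes a parallel implementation of it. The scheme computes the similarity matrix in parallel through grid partitioning. Given the sparsity of the similarity matrix and the inherent parallelism of matrix-vector multiplication, a parallel Lanczos algorithm is designed to solve the eigenvalue problem. Experiments in an MPI environment show that the parallel scheme is an effective way to improve the real-time performance of graph-model segmentation.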

11.

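Parallel Sparse Approximate Inverse Preconditioners

迟利华 刘杰 李晓梅 《数值计算与计算机应用》2000,21(2):88-94

1. Introduction. Consider solving the linear system $AX = b$, $X, b \in \mathbb{R}^n$ (1), where $A = (a_{ij})_{n \times n}$ is a large sparse nonsymmetric matrix. System (1) is usually solved with an iterative method, such as the Krylov subspace methods GMRES, BICGSTAB, CGS, TFQMR, and CGS2. Applied directly, an iterative method sometimes converges very slowly or not at all, so preconditioning is needed to accelerate convergence. A left or right preconditioner $M$ is usually used to transform (1) into a form that is easier to solve: $AMy = b$, $X = My$, or $MAX = Mb$ (2). System (2) is then solved with an iterative method, and $M$ is chosen so that $AM$ (or $MA$) is approximately the identity matrix. There are many ways to construct a preconditioner, such as incomplete factorization, SSOR, and polynomial methods; incomplete factorization and SSOR…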

12.

Distributed generic approximate sparse inverses

George A. Gravvanis Christos K. Filelis-Papadopoulos 《The Journal of supercomputing》2014,70(1):365-384

The need for accuracy in the solution of linear systems derived from the discretization of partial differential equations leads to large sparse linear systems. The solution of sparse linear systems requires efficient scalable methods. Iterative solvers require efficient parallel preconditioning methods to solve sparse linear systems effectively. Herewith, a new parallel algorithm for the generic approximate sparse inverse matrix method for distributed memory systems is proposed. The computation of the distributed generic approximate sparse inverse matrix is based on a column-wise approach, which allows the separation into independent problems that can be handled in parallel without synchronization points or intermediate communications. This is achieved by reforming the generic approximate sparse inverse matrix algorithm and its process of computation with a new partial solution method for the computation of the nonzero elements of each column dictated by the approximate inverse sparsity pattern. Moreover, an algorithmic scheme is proposed for the efficient distribution of data amongst the available workstations, along with a load balancing scheme for problems with large standard deviation in the number of nonzero elements per column. Numerical results are presented for the proposed schemes for various model problems.

13.

A rapid method of approximating the asymptote to an iterative sequence

《International Journal of Computer Mathematics》2012,89(1-4):91-95

We discuss a procedure for the adaptive construction of sparse approximate inverse preconditionings for general sparse linear systems. The approximate inverses are based on minimizing a consistent norm of the difference between the identity and the preconditioned matrix. The analysis provides positive definiteness and condition number estimates for the preconditioned system under certain circumstances. We show that for the 1-norm, restricting the size of the difference matrix below 1 may require dense approximate inverses. However, this requirement does not hold for the 2-norm, and similarly reducing the Frobenius norm below 1 does not generally require that much fill-in. Moreover, for the Frobenius norm, the calculation of the approximate inverses yields naturally column-oriented parallelism. General sparsity can be exploited in a straightforward fashion. Numerical criteria are considered for determining which columns of the sparse approximate inverse require additional fill-in. Sparse algorithms are discussed for the location of potential fill-in within each column. Results using a minimum-residual-type iterative method are presented to illustrate the potential of the method.
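The column-oriented parallelism mentioned for the Frobenius norm follows from the identity ||I - AM||_F^2 = sum_j ||e_j - A m_j||_2^2, so each column m_j of M is its own small least-squares problem. The sketch below illustrates this for a fixed, user-supplied sparsity pattern. It is a simplification under stated assumptions: dense list-of-lists storage, hypothetical function names, normal equations rather than a QR factorization, and no adaptive fill-in of the kind the abstract describes.

```python
def solve_dense(G, r):
    """Solve the small system G z = r by Gaussian elimination with pivoting."""
    n = len(r)
    M = [row[:] + [r[i]] for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for k in range(col, n + 1):
                M[i][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = s / M[i][i]
    return z


def spai_fixed_pattern(A, pattern):
    """Approximate inverse M minimising ||I - A M||_F over a fixed pattern.

    pattern[j] lists the row indices allowed to be nonzero in column j of M.
    """
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):  # columns are independent: this loop can run in parallel
        J = pattern[j]
        # Normal equations (A_J^T A_J) m = A_J^T e_j, where A_J = A[:, J].
        G = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in J] for p in J]
        rhs = [A[j][p] for p in J]
        for p, value in zip(J, solve_dense(G, rhs)):
            M[p][j] = value
    return M


# Usage: a diagonal pattern gives a cheap, Jacobi-like approximate inverse.
A = [[4.0, 1.0], [2.0, 3.0]]
M_diag = spai_fixed_pattern(A, [[0], [1]])
```

With the full pattern the result is the exact inverse, which makes a convenient correctness check; the normal equations square the condition number, so a QR-based solve would be the safer choice for ill-conditioned columns.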

14.

Design and implementation of parallel approximate inverse classes using OpenMP

Konstantinos M. Giannoutakis George A. Gravvanis 《Concurrency and Computation》2009,21(2):115-131

A new parallel normalized optimized approximate inverse algorithm, based on the concept of antidiagonal wave pattern, for computing classes of explicitly approximate inverses, is introduced for symmetric multiprocessor systems. The parallel normalized explicit approximate inverses are used in conjunction with parallel normalized explicit preconditioned conjugate gradient schemes for the efficient solution of finite element sparse linear systems. The parallel design and implementation issues of the new algorithm are discussed and the parallel performance is presented using OpenMP. Copyright © 2008 John Wiley & Sons, Ltd.

15.

Parallel simulation of anisotropic diffusion with human brain DT-MRI Data

Ning Kang Jun Zhang Eric S. Carlson 《Computers & Structures》2004,82(28):2389-2399

We conduct simulations for the 3D unsteady state anisotropic diffusion process with DT-MRI data in the human brain by discretizing the governing diffusion equation on a Cartesian grid and adopting a high performance differential-algebraic equation (DAE) solver, the parallel version of the implicit differential-algebraic (IDA) solver, to tackle the resulting large scale system of DAEs. Parallel preconditioning techniques including sparse approximate inverse and banded-block-diagonal preconditioners are used with the GMRES method to accelerate the convergence rate of the iterative solution. We then investigate and compare the efficiency and effectiveness of the two parallel preconditioners. The experimental results of the diffusion simulations on a parallel supercomputer show that the sparse approximate inverse preconditioning strategy, which is robust and efficient with good scalability, gives a much better overall performance than the banded-block-diagonal preconditioner.

16.

High Performance Inverse Preconditioning  Total citations: 1, self-citations: 0, citations by others: 1

George A. Gravvanis 《Archives of Computational Methods in Engineering》2009,16(1):77-108

The derivation of parallel numerical algorithms for solving sparse linear systems on modern computer systems and software platforms has attracted the attention of many researchers over the years. In this paper we present an overview of the design issues of parallel approximate inverse matrix algorithms, based on an anti-diagonal “wave pattern” approach and a “fish-bone” computational procedure, for computing explicitly various families of exact and approximate inverses for solving sparse linear systems. Parallel preconditioned conjugate gradient-type schemes in conjunction with parallel approximate inverses are presented for the efficient solution of sparse linear systems. Applications of the proposed parallel methods by solving characteristic sparse linear systems on symmetric multiprocessor systems and distributed systems are discussed and the parallel performance of the proposed schemes is given, using MPI, OpenMP and Java multithreading.

17.

A parallel Self Mesh-Adaptive N-body method based on approximate inverses

P. E. Kyziropoulos C. K. Filelis-Papadopoulos G. A. Gravvanis C. Efthymiopoulos 《The Journal of supercomputing》2017,73(12):5197-5220

A new parallel Self Mesh-Adaptive N-body method based on approximate inverses is proposed. The scheme is a three-dimensional Cartesian-based method that solves the Poisson equation directly in physical space, using modified multipole expansion formulas for the boundary conditions. Moreover, adaptive-mesh techniques are utilized to form a class of separate smaller n-body problems that can be solved in parallel and increase the total resolution of the system. The solution method is based on the multigrid method in conjunction with the symmetric factored approximate sparse inverse matrix as smoother. The design of the parallel Self Mesh-Adaptive method along with discussion on implementation issues for shared memory computer systems is presented. The new parallel method is evaluated through a series of benchmark simulations using N-body models of isolated galaxies or galaxies interacting with dwarf companions. Furthermore, numerical results on the performance and the speedups of the scheme are presented.

18.

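Sparse Decomposition of Speech Signals Based on the MP Algorithm  Total citations: 4, self-citations: 1, citations by others: 3

井爱雯 刘云 马轶丽 《计算机工程与应用》2009,45(5):144-146

Sparse decomposition is a new method of decomposing speech signals that yields a very concise approximate representation. Built on this decomposition, it can be applied in many areas of speech processing, such as speech compression, speech denoising, and speech recognition. This work uses the Matching Pursuit (MP) algorithm to implement sparse decomposition of speech signals; experimental results show that MP-based sparse decomposition achieves good reconstruction accuracy and high sparsity.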
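The core loop of Matching Pursuit is short enough to sketch. The version below assumes a dictionary of unit-norm atoms given as plain lists and a fixed iteration count as the stopping rule; the function name is hypothetical, and the dictionary the authors used for speech is not stated in the abstract.

```python
def matching_pursuit(signal, atoms, iterations):
    """Greedy sparse decomposition over unit-norm atoms.

    Returns (coefficients, residual), where coefficients[k] is the total
    weight assigned to atoms[k].
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(iterations):
        # Correlate the residual with every atom in the dictionary.
        products = [sum(r * g for r, g in zip(residual, atom)) for atom in atoms]
        # Pick the atom that explains the most of what is left.
        best = max(range(len(atoms)), key=lambda k: abs(products[k]))
        coeffs[best] += products[best]
        # Subtract that atom's contribution from the residual.
        residual = [r - products[best] * g for r, g in zip(residual, atoms[best])]
    return coeffs, residual


# Usage: with the standard basis as dictionary, two iterations keep the
# two largest-magnitude samples of the signal.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, -5.0, 1.0], atoms, iterations=2)
```

The trade-off the abstract reports, reconstruction accuracy against sparsity, corresponds to how many iterations are run: each one adds at most one nonzero coefficient and never increases the residual energy.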

19.

Parallel and Systolic Solution of Normalized Explicit Approximate Inverse Preconditioning

Gravvanis G. A. Giannoutakis K. M. Bekakos M. P. Efremides O. B. 《The Journal of supercomputing》2004,30(2):77-96

A new class of normalized approximate inverse matrix techniques, based on the concept of sparse normalized approximate factorization procedures, is introduced for solving sparse linear systems derived from the finite difference discretization of partial differential equations. Normalized explicit preconditioned conjugate gradient type methods in conjunction with normalized approximate inverse matrix techniques are presented for the efficient solution of sparse linear systems. Theoretical results on the rate of convergence of the normalized explicit preconditioned conjugate gradient scheme and estimates of the required computational work are presented. Application of the new proposed methods on two-dimensional initial/boundary value problems is discussed and numerical results are given. The parallel and systolic implementation of the dominant computational part is also investigated.