
1.

A scalable parallel preconditioned conjugate gradient method for bundle adjustment

Peng Jiaxin Liu Jie Wei Hua 《Applied Intelligence》2022,52(1):753-765

Bundle adjustment is a fundamental problem in computer vision, with important applications such as 3D structure reconstruction from 2D images. This paper focuses on large-scale bundle adjustment tasks, e.g., city-wide 3D reconstruction, which require highly efficient solutions. For this purpose, it is common to apply the Levenberg-Marquardt algorithm, whose bottleneck lies in solving normal equations. The majority of recent methods focus on achieving scalability through modern hardware such as GPUs and distributed systems. On the other hand, the core of the solution, i.e., the math underlying the optimizer for the normal equations, remains largely unimproved since the proposal of the classic parallel bundle adjustment (PBA) algorithm, which increasingly becomes a major limiting factor for the scalability of bundle adjustment.

This paper proposes the parallel preconditioned conjugate gradient (PPCG) method, a novel parallel method for bundle adjustment based on preconditioned conjugate gradient, which achieves significantly higher efficiency and scalability than existing methods on the algorithmic level. The main idea is to exploit the sparsity of the Hessian matrix and reduce its structure parameters through an effective parallel Schur complement method; the result of this step is then fed into our carefully designed PPCG method, which reduces matrix operations that are either expensive (e.g., large matrix inversions or multiplications) or scale poorly to multi-processors (e.g., parallel Reduce operators). Extensive experiments demonstrate that PPCG outperforms existing optimizers by large margins on a wide range of datasets.
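The Schur complement step mentioned above can be sketched in its standard textbook form, which is not necessarily the authors' exact parallel formulation. The symbols are assumptions introduced here for illustration: $B$ is the camera block of the (damped) Hessian, $C$ the point block, $E$ the coupling block, and $\delta_c$, $\delta_p$ the camera and point updates.

```latex
\begin{aligned}
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \delta_c \\ \delta_p \end{bmatrix}
&= \begin{bmatrix} v \\ w \end{bmatrix}, \\
S &= B - E\,C^{-1}E^{\top}, \\
S\,\delta_c &= v - E\,C^{-1}w, \\
\delta_p &= C^{-1}\left(w - E^{\top}\delta_c\right).
\end{aligned}
```

Because $C$ is block diagonal (one small block per 3D point), $C^{-1}$ is cheap to form, and the much smaller reduced system in $S$ is the kind of system a preconditioned conjugate gradient solver would then be applied to.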


2.

A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm

《Parallel Computing》2017

In this study, we present a novel optimization model that can automatically and rapidly generate an optimal parallel preconditioned conjugate gradient (PCG) algorithm for any given linear system on a specific multi-graphics processing unit (GPU) platform. The proposed model has three novel features: (1) it provides a profile-based performance model for each of the main components of the PCG algorithm, namely the vector operation, the inner product, and the sparse matrix-vector multiplication (SpMV); (2) it is general, independent of the problem, and dependent only on the resources of the devices; and (3) it is extensible: once a performance model has been constructed for a vector operation, inner product, or SpMV kernel that is not yet included in the framework, that kernel can be incorporated. The model is constructed only once for each type of GPU. The experiments validate the high efficiency of the proposed model.
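To make the three PCG components named above concrete, the following is a minimal serial sketch in plain Python, assuming a symmetric positive definite matrix and a simple Jacobi (diagonal) preconditioner. The function names `dot`, `axpy`, `matvec`, and `pcg` are hypothetical and chosen for illustration; this is not the multi-GPU model described in the abstract, and the matrix is stored densely only to keep the example short.

```python
from math import sqrt


def dot(u, v):
    """Inner product kernel."""
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Vector operation kernel: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def matvec(A, x):
    """Matrix-vector product kernel (dense rows here, SpMV in practice)."""
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive definite A.

    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # assumed preconditioner M^{-1}
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if sqrt(dot(r, r)) <= tol * b_norm:       # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)               # new search direction
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]  # equals A times [1, 2, 3]
x, iters = pcg(A, b)
```

Each iteration uses one matrix-vector product, two inner products, and three vector updates, which is why a performance model can be built per kernel type.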

3.

A partial preconditioned conjugate gradient method for large eigenproblems

《Computer Methods in Applied Mechanics and Engineering》1987,62(2):195-207

In this paper a preconditioned iterative method suitable for the solution of the generalized eigenvalue problem Ax = λBx is presented. The proposed method is suitable for the determination of extreme eigenvalues and their corresponding eigenvectors of large sparse eigenproblems derived from the finite element discretization method. The solution is obtained through the minimization of the Rayleigh quotient by a preconditioned conjugate gradient (CG) method. The proposed triangular splitting preconditioner combines Evans' SSOR preconditioner with a drop-off tolerance criterion and is called partial preconditioner (PPR). The PPR is attractive in a large FE framework because it is simple and can be implemented at the element level, as opposed to incomplete Choleski preconditioners, which require a sparse assembly. Because of the renewed interest in CG techniques for FE work on microprocessors and parallel computers it is believed that this improved approach to the generalized eigenvalue problem, through the minimization of the Rayleigh quotient, is likely to be very promising.
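For reference, the Rayleigh quotient of the generalized problem Ax = λBx and its gradient take the following standard form, assuming A and B are symmetric and B is positive definite (assumptions that the abstract does not state explicitly):

```latex
R(x) = \frac{x^{\top} A x}{x^{\top} B x},
\qquad
\nabla R(x) = \frac{2}{x^{\top} B x}\,\bigl(A x - R(x)\,B x\bigr).
```

The gradient vanishes exactly when $Ax = R(x)\,Bx$, so stationary points of $R$ are eigenvectors and the minimum value of $R$ is the smallest eigenvalue. A CG-type minimization uses this gradient, typically after applying a preconditioner, as its search information.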

4.

A parallel preconditioned conjugate gradient solution method for finite element problems with coarse–fine mesh formulation

A. Zucchini 《Computers & Structures》2000,78(6):781-787

The object of this paper is a parallel preconditioned conjugate gradient iterative solver for finite element problems with coarse-mesh/fine-mesh formulation. An efficient preconditioner is easily derived from the multigrid stiffness matrix. The method has been implemented, for the sake of comparison, both on an IBM-RISC590 and on a Quadrics-QH1, a massively parallel SIMD machine with 128 processors. Examples of solutions of simple linear elastic problems on rectangular grids are presented and convergence and parallel performance are discussed.

5.

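GPU-based preconditioned conjugate gradient method for sparse linear systems

张健飞 沈德飞 《计算机应用》2013,33(3):825-829

This paper studies GPU-accelerated solution of sparse linear systems by the preconditioned conjugate gradient method. The program was written on the Compute Unified Device Architecture (CUDA) platform, and its performance was tested and analyzed on an NVIDIA GT430 GPU. The sparse matrix is stored in compressed sparse row (CSR) format. Based on the algorithmic characteristics of the preconditioned conjugate gradient method, the paper studies the performance optimization of GPU-based sparse matrix-vector multiplication and measures to speed up data transfer from the CPU to the GPU. The custom sparse matrix-vector multiplication kernel was compared with the cusparseDcsrmv function of the CUSPARSE library and achieved a speedup of up to 2.1 times. For the complete preconditioned conjugate gradient method, the implementation based on custom kernels has a slight advantage over the one based on the CUBLAS and CUSPARSE libraries, and it achieves a speedup of up to 7.4 times over the preconditioned conjugate gradient method on the CPU.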
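As an illustration of the CSR storage and the sparse matrix-vector product discussed above, here is a small serial sketch in plain Python. The helper names `dense_to_csr` and `csr_matvec` are hypothetical, and this is not the authors' CUDA kernel; on a GPU, the outer loop over rows is the part that would typically be distributed across threads.

```python
def dense_to_csr(A):
    """Convert a dense row-major matrix to CSR arrays.

    values  : nonzero entries, row by row
    col_idx : column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):                  # rows are independent of each other
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]   # only stored nonzeros are touched
        y[i] = s
    return y


A_dense = [[4.0, 0.0, 1.0], [0.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
values, col_idx, row_ptr = dense_to_csr(A_dense)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The work per row is proportional to its number of nonzeros, so rows of very different lengths can lead to load imbalance when rows are mapped one-to-one onto threads.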

6.

A class of parallel two-level nonlinear Schwarz preconditioned inexact Newton algorithms

Feng-Nan Hwang 《Computer Methods in Applied Mechanics and Engineering》2007,196(8):1603-1611

We propose and test a new class of two-level nonlinear additive Schwarz preconditioned inexact Newton algorithms (ASPIN). The two-level ASPIN combines a local nonlinear additive Schwarz preconditioner and a global linear coarse preconditioner. This approach is more attractive than the two-level method introduced in [X.-C. Cai, D.E. Keyes, L. Marcinkowski, Nonlinear additive Schwarz preconditioners and applications in computational fluid dynamics, Int. J. Numer. Methods Fluids, 40 (2002), 1463-1470], which is nonlinear on both levels. Since the coarse part of the global function evaluation requires only the solution of a linear coarse system rather than a nonlinear coarse system derived from the discretization of original partial differential equations, the overall computational cost is reduced considerably. Our parallel numerical results based on an incompressible lid-driven flow problem show that the new two-level ASPIN is quite scalable with respect to the number of processors and the fine mesh size when the coarse mesh size is fine enough, and in addition the convergence is not sensitive to the Reynolds numbers.

7.

Nonlinear preconditioned conjugate gradient and least-squares finite elements

《Computer Methods in Applied Mechanics and Engineering》1987,62(2):145-154

A least-squares variational formulation for first-order systems is extended to a class of nonlinear problems. A finite element analysis and associated preconditioners for iterative solution are developed. The preconditioned system is not sensitive to problem (mesh) size as is demonstrated in numerical comparison studies.

8.

A Large scale finite element analysis using domain decomposition method on a parallel computer

G. Yagawa N. Soneda S. Yoshimura 《Computers & Structures》1991,38(5-6):615-625

A parallel finite element analysis based on a domain decomposition technique (DDT) is considered. In the present DDT, an analysis domain is divided into a number of smaller subdomains without overlap. Finite element analyses of the subdomains are performed under the constraint of both displacement continuity and force equivalence among them. The constraint is satisfied through iterative calculations based on either the Uzawa algorithm or the Conjugate Gradient (CG) method. Owing to the iterative algorithm, a large scale finite element analysis can be divided into a number of smaller ones which can be carried out in parallel.

The DDT is implemented on a parallel computer network composed of a number of 32-bit microprocessors, transputers. The developed parallel calculation system named the ‘FEM server type system’ involves peculiar features such as network independence and dynamic workload balance.

The characteristics of the domain decomposition method such as computational speed and memory requirement are first examined in detail through the finite element calculations of homogeneous or inhomogeneous cracked plate subjected to a tensile load on a single CPU computer.

The ‘speedup’ and ‘performance’ features of the FEM server type system are discussed on a parallel computer system composed of up to 16 transputers, with changing network types and domain decompositions. It is clearly demonstrated that the present parallel computing system requires a much smaller amount of computational memory than the conventional finite element method and also that, due to the feature of dynamic workload balancing, high performance (over 90%) is achieved even in a large scale finite element calculation with irregular domain decomposition.

9.

Domain decomposition preconditioners for the conjugate gradient method

G. Meurant 《Calcolo》1988,25(1-2):103-119

The aim of this paper is to compare different techniques for constructing incomplete Domain Decomposition preconditioners which can be used on parallel computers with a large number of processors (typically from 1 to 100) in connection with the Conjugate Gradient method for solving symmetric linear systems that arise from finite difference discretization of two-dimensional elliptic partial differential equations. We summarize some methods which have been presented in recent years and we introduce new ideas to improve the degree of parallelism and the efficiency of our preconditioners. Finally, we show some numerical results that demonstrate the usefulness of the preconditioners described in this paper.

10.

Performance and scalability of preconditioned conjugate gradient methods on parallel computers

Gupta A. Kumar V. Sameh A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(5):455-469

This paper analyzes the performance and scalability of an iteration of the preconditioned conjugate gradient algorithm on parallel architectures with a variety of interconnection networks, such as the mesh, the hypercube, and that of the CM-5 parallel computer. It is shown that for block-tridiagonal matrices resulting from two-dimensional finite difference grids, the communication overhead due to vector inner products dominates the communication overheads of the remainder of the computation on a large number of processors. However, with a suitable mapping, the parallel formulation of a PCG iteration is highly scalable for such matrices on a machine like the CM-5 whose fast control network practically eliminates the overheads due to inner product computation. The use of the truncated Incomplete Cholesky (IC) preconditioner can lead to further improvement in scalability on the CM-5 by a constant factor. As a result, a parallel formulation of the PCG algorithm with the IC preconditioner may execute faster than that with a simple diagonal preconditioner even if the latter runs faster in a serial implementation.

11.

On the restrictively preconditioned conjugate gradient method for solving saddle point problems

Xiao-Fei Peng Wen Li 《International Journal of Computer Mathematics》2016,93(1):142-159

Bai et al. [2003, IMA J Numer. Anal. 23, 561–580] proposed the restrictively preconditioned conjugate gradient (RPCG) method. In this paper, based on the special structure of saddle point systems, we consider the RPCG method and propose a new format. This new format can be obtained by applying the classical PCG method to a simpler system instead of the original format, which greatly reduces computational cost. The new format of the RPCG method can often attain almost the same convergence rate as the original one. In particular, for some practical problems, the former converges faster than the latter. Numerical experiments show the efficiency of the proposed format.

12.

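Parallel finite element computation based on a preconditioned conjugate gradient algorithm with residual smoothing

付朝江 陈洪均 《计算机应用》2015,35(12):3387-3391

Because finite element analysis of elastoplastic problems is very time-consuming, a parallel substructure preconditioned conjugate gradient algorithm with residual smoothing is proposed for a Message Passing Interface (MPI) cluster environment. Using domain decomposition, each substructure is treated as an independent finite element model through interface conditions. In the global analysis, each processor stores only the information of its own substructure and generates the local stiffness matrix. Using diagonal storage and the minimal residual smoothing method, a parallel substructure preconditioned conjugate gradient (PCG) algorithm combined with residual smoothing (MR) is designed. Load balancing is discussed and interprocessor communication is optimized in the parallel algorithm. Elastoplastic stresses and strains are integrated with a substepping scheme, which controls the integration error by automatically adjusting the size of each substep according to a prescribed tolerance. Numerical examples were run on a workstation cluster, the performance of the algorithm was analyzed, and it was compared with the conventional PCG algorithm. The examples show that the proposed algorithm has good speedup and efficiency, outperforms the conventional PCG algorithm, and is an effective parallel solver for finite element analysis of elastoplastic problems.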

13.

A conjugate gradient iterative method

P.K. Khosla S.G. Rubin 《Computers & Fluids》1981,9(2):109-121

A strongly implicit pre-conditioned form of the conjugate gradient method is considered. The resulting iterative technique is applicable for sparse systems of difference equations arising from boundary value problems. The method is used to solve two- and three-dimensional potential flows. In addition, it is extended to a 2 x 2 coupled system to solve the Navier-Stokes equations in stream function-vorticity form.

14.

A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining

Gang Wu Wei Xu Ying Zhang Yimin Wei 《Data mining and knowledge discovery》2013,26(1):27-56

The problem of identifying key genes is of fundamental importance in biology and medicine. The GeneRank model explores connectivity data to produce a prioritization of the genes in a microarray experiment that is less susceptible to variation caused by experimental noise than the one based on expression levels alone. The GeneRank algorithm amounts to solving an unsymmetric linear system. However, when the matrix in question is very large, the GeneRank algorithm is inefficient and can even be infeasible. On the other hand, the adjacency matrix is symmetric in the GeneRank model, while the original GeneRank algorithm fails to exploit the symmetric structure of the problem in question. In this paper, we discover that the GeneRank problem can be rewritten as a symmetric positive definite linear system, and propose a preconditioned conjugate gradient algorithm to solve it. Numerical experiments support our theoretical results, and show superiority of the novel algorithm.
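One way such a symmetric reformulation can be obtained is sketched below; it may or may not match the authors' derivation in detail. The sketch assumes the usual GeneRank formulation, with a nonnegative symmetric adjacency matrix $W$ with zero diagonal, a diagonal degree matrix $D$ (zero degrees replaced by 1), a damping factor $0 < d < 1$, and an expression vector $ex$; these symbols are not given in the abstract.

```latex
\begin{aligned}
\left(I - d\,W D^{-1}\right) r &= (1 - d)\,ex, \\
r = D\,y \;\Longrightarrow\; \left(D - d\,W\right) y &= (1 - d)\,ex .
\end{aligned}
```

Under these assumptions $D - dW$ is symmetric and strictly diagonally dominant with a positive diagonal, hence symmetric positive definite, so a (preconditioned) conjugate gradient method applies; the ranking is recovered as $r = D\,y$.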

15.

Optimization techniques for parallel molecular dynamics using domain decomposition

《Computer Physics Communications》1998,113(2-3):145-167

In this paper we describe the implementation of a new parallelized Molecular Dynamics code for many-particle problems with short-ranged interactions. While the basic algorithms have their foundation in the fairly standard methods of domain decomposition, linked-cell pair search and Verlet pair list, we have developed some refined techniques for optimizing them. The rewards of these optimizations are an overall improvement of up to 45% in scalar performance and very good scaling behavior in the number of processors, even down to a few hundred particles per processor on a CRAY T3E. The best speedup can be obtained for systems with pair forces only, since then the data structures can be organized in a very simple manner. To deal with more complex situations as well, we have developed a partial replicated data scheme which is suitable for simulating many molecules consisting of many simple particles (e.g. polymer chains) for many types of short-range interactions.

16.

A class of explicit preconditioned conjugate gradient methods for solving large finite element systems

《International Journal of Computer Mathematics》2012,89(1-4):189-206

A class of Explicit Preconditioned Conjugate Gradient (EPCG) methods for solving large sparse linear systems of algebraic equations resulting from the Finite Element discretization of Elliptic and Parabolic PDEs is introduced. The EPCG methods are based on explicit Approximate Inverse Matrix techniques and are particularly suitable for solving numerically initial/boundary-value problems on multiprocessor systems. The application of the new methods on 2D-linear boundary-value problems is discussed and numerical results are given.

17.

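Application of a domain decomposition algorithm to parallel computation of the CoLM model

石建辉 蒋宗礼 周文波 《计算机应用》2012,32(11):2994-2997

Considering the characteristics of the Common Land Model (CoLM), and for cases where the model's raw data domain is large and the required computational accuracy is relatively low, a data domain decomposition algorithm based on weighted averaging is proposed. The algorithm decomposes the grid cells according to their land cover types, discretizes each parameter with a one-dimensional finite difference method in time, and forms a weighted sum over the patches contained in each region to obtain the final output for that region. The domain decomposition algorithm is validated with the surface sensible heat flux and evapotranspiration simulated by CoLM, and the performance of the parallel algorithm is analyzed.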

18.

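A shuffled frog leaping algorithm with an embedded conjugate gradient method

《计算机工程与科学》2017,(10):1958-1965

To address the low solution accuracy of the basic shuffled frog leaping algorithm on complex function optimization problems and its tendency to become trapped in local optima, a shuffled frog leaping algorithm with an embedded conjugate gradient method is proposed. The algorithm introduces the conjugate gradient method on top of the memeplex partitioning of the basic algorithm. Because of the partitioning rule, the frogs in the last-ranked memeplexes have poor positions, which severely slows the search of the whole population. A subset of the lower-ranked memeplexes is therefore solved with the conjugate gradient method, which helps the algorithm escape local optima in the middle and late stages of evolution and improves its convergence accuracy. The resulting hybrid algorithm effectively combines the strong global search ability of the basic shuffled frog leaping algorithm with the fast and accurate local search ability of the conjugate gradient method. Numerical experiments show that the improved algorithm achieves higher convergence accuracy than the basic algorithm, avoids becoming trapped in local optima, and produces more stable optimization results.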

19.

Block adaptive shifting algorithm for filtering based on the Toeplitz preconditioned conjugate gradient technique

S.S. R.S. X.F. M.W. X.D. 《Computers & Electrical Engineering》2006,32(6):468-473

We present a new block adaptive algorithm as a variant of the Toeplitz-preconditioned block conjugate gradient (TBCG) algorithm. The proposed algorithm is formulated by combining TBCG algorithm with a data-reusing scheme that is realized by processing blocks of data in an overlapping manner, as in the optimum block adaptive shifting (OBAS) algorithm. Simulation results show that the proposed algorithm is superior to the block conjugate gradient shifting (BCGS), TBCG and Toeplitz-OBAS (TOBAS) algorithms in both convergence rate and tracking property of input signal conditioning.

20.

Diakoptics, domain decomposition and parallel computing

Lai C.-H. 《Computer Journal》1994,37(10):840-846
