首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Dao-qi Chen  R. P. Tewarson 《Computing》1984,33(3-4):315-329
A new quasi-Newton method for unconstrained minimization is presented. It uses sparse triple factorization of an approximation to the sparse Hessian matrix. At each step a new column and a corresponding row of the approximation to the Hessian is determined and its triple factorization is updated. Our method deals with the same updating problem as in J. Bräuninger's paper [2]. However, we make use of a rank-two instead of a rank-one updating scheme. Our method saves over half the number of operations required in J. Bräuninger's method. Moreover, our method utilizes the sparsity and, therefore, only the nonzero entries of the factors need to be stored. The positive definiteness can be preserved easily by taking suitable precautions. Under reasonable conditions our method is globally convergent and locally superlinearly evenn-step ρ-order convergent.  相似文献   

2.
A method is presented for updating the Cholesky factorization of a band symmetric matrix modified by a rank-one matrix which has the same band width. Problems which could involve applications of such a method arise frequently in plasticity and structural optimization where repeated solutions of a band algebraic system with a changing matrix are needed. The Cholesky factorization of a stiffness matrix can be updated after modifying a local stiffness matrix which can be written as a sum of a few rank-one matrices. The number of operations required for the updating is of the order mn or less, where n is the dimension of the global matrix and m is its half band width (including the diagonal).  相似文献   

3.
Sparse factorization is a fundamental tool in scientific computing. As the major component of a sparse direct solver, it represents the dominant computational cost for many analyses. For factorizations which involve sufficient dense math, the substantial computational capability provided by GPUs (Graphics Processing Units) can help alleviate this cost. However, for many other cases, the prevalence of small/irregular dense math and the relatively slow communication between the host and device over the PCIe bus, make it challenging to significantly accelerate sparse factorization using the GPU.In this paper we describe a left-looking supernodal Cholesky factorization algorithm which permits improved utilization of the GPU when factoring sparse matrices. The central idea is to stream subtrees of the elimination tree through the GPU and perform the factorization of each subtree entirely on the GPU. This avoids the majority of the PCIe communication without the need for a complex task scheduler. Importantly, within these subtrees, many independent, small, dense operations are batched to minimize kernel launch overhead and many of these batched kernels are executed concurrently to maximize device utilization.Performance results for commonly studied matrices are presented along with suggested actions for further optimization.  相似文献   

4.
In this paper we present a simplified subsurface scattering model that exploits a diffusion mechanism to provide a simpler solution to the transport equation. Our model is based on numerical analysis techniques that are amenable to Cholesky factorization. We treat the factorization as a precomputed scattering quantity which can be used to significantly speed up multiple scattering calculations as the global light source changes. On low resolution meshes, we have been able to achieve real-time solutions of the subsurface scattering while still maintaining good visual quality of the solution.  相似文献   

5.
Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though State-of-the-Art studies begin to take an interest in small matrices, they usually feature a few hundreds rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current multi-core and many-core SIMD architectures (SSE, AVX2, KNC, AVX512, Neon, Altivec). We focus on the fact that, on some architectures, compilers are unable to vectorize and on other architectures, vectorizing compilers are not efficient. Thus hand-made SIMDization is mandatory. We achieve with these transformations combined with SIMD a speedup from × 14 to × 28 for the whole resolution in single precision compared to the naive code on a AVX2 machine and a speedup from × 6 to × 14 on double precision, both with a strong scalability.  相似文献   

6.
This paper presents a solution to the problem of partitioning the work for sparse matrix factorization to individual processors on a multiprocessor system. The proposed task assignment strategy is based on the structure of the elimination tree associated with the given sparse matrix. The goal of the task scheduling strategy is to achieve load balancing and a high degree of concurrency among the processors while reduçing the amount of processor-to-processor data comnication, even for arbitrarily unbalanced elimination trees. This is important because popular fill-reducing ordering methods, such as the minimum degree algorithm, often produce unbalanced elimination trees. Results from the Intel iPSC/2 are presented for various finite-element problems using both nested dissection and minimum degree orderings.Research supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, U.S. Department of Energy under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems Inc.  相似文献   

7.
《Parallel Computing》1988,7(2):199-210
We develop an algorithm for computing the symbolic Cholesky factorization of a large sparse symmetric positive definite matrix. The algorithm is intended for a message-passing multiprocessor system, such as the hypercube, and is based on the concept of elimination forest. In addition, we provide an algorithm for computing these forests along with a discussion of the algorithm's complexity and a proof of its correctness.  相似文献   

8.
Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and the concerns of low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU system. These approaches increase GPU utilization, thus resulting in substantial computational time reduction. Comparisons are made with the existing parallel solvers by using problems arising from practical applications. The experiment results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on supernodal method for CPU‐GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

9.
This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which call parallel Level 2 or 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth parallel Cholesky algorithm that calls serial Level 3 BLAS provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13–44%;, depending on the problem size and the number of processors used.This work was supported by grants from IMSL, Inc., and Hitachi Data Systems. The first version of this paper was presented as a poster session at Supercomputing '90, New York City, November 1990.  相似文献   

10.
We consider a method to solve constrained system of nonlinear equations based on a modification of the Linear-Programming-Newton method and replacing the first-order information with a quasi-Newton secant update, providing a computationally simple method. The proposed strategy combines good properties of two methods: the least change secant update for unconstrained system of nonlinear equations with isolated solutions and the Linear-Programming-Newton for constrained nonlinear system of equations with possible nonisolated solutions. We analyse the local convergence of the proposed method under a standard error bound condition proving its linear convergence for nonisolated solutions. Numerical experiments were done in order to show the claimed convergence rate.  相似文献   

11.
Preconditioning techniques based on incomplete Cholesky factorization are very efficient in increasing the convergence rates of basic iterative methods. Complicated addressings and high demands for auxiliary storage, or increased factorization time, have reduced their appeal as general purpose preconditioners. In this study an elegant computational implementation is presented which succeeds in reducing both computing storage and factorization time. The proposed implementation is applied to two incomplete factorization schemes. The first is based on the rejection of certain terms according to their magnitude, while the second is based on a rejection criterion relative to the position of the zero terms of the coefficient matrix. Numerical results demonstrate the superiority of the proposed preconditioners over other types of preconditioning matrices, particularly for ill-conditioned problems. They also show their efficiency for large-scale problems in terms of computer storage and CPU time, over a direct solution method using the skyline storage scheme.  相似文献   

12.
一种基于Cholesky分解的动态无偏LS-SVM学习算法   总被引:3,自引:0,他引:3  
蔡艳宁  胡昌华 《控制与决策》2008,23(12):1363-1367
针对最小二乘支持向量机用于在线建模时存在的计算复杂性问题,提出一种动态无偏最小二乘支持向量回归模型.该模型通过改进标准最小二乘支持向量机结构风险的形式消除了偏置项.得到了无偏的最小二乘支持向量机,简化了回归系数的求解.根据模型动态变化过程中核函数矩阵的特点,设计了基于Cholesky分解的在线学习算法.该算法能充分利用历史训练结果,减少计算复杂性.仿真实验表明了所提出模型的有效性.  相似文献   

13.
The present paper is dedicated to the preconditioning of boundary element matrices which are given in wavelet coordinates. We investigate the incomplete Cholesky factorization (ICF) for a pattern which includes also the coefficients of all off-diagonal bands associated with the level–level-interactions. The pattern is chosen in such a way that the ICF is computable in log-linear complexity. Numerical experiments are performed to quantify the effects of the proposed preconditioning.  相似文献   

14.
We present a method, based on a variational problem, for solving a non-smooth unconstrained optimization problem. We assume that the objective function is a Lipschitz continuous and a regular function. In this case the function of our variational problem is semismooth and a quasi-Newton method may be used to solve the variational problem. A convergence theorem for our algorithm and its discrete version is also proved. Preliminary computational results show that the method performs quite well and can compete with other methods.  相似文献   

15.
In this paper, a systematic and unified treatment of computational task models for parallel sparse Cholesky factorization is presented. They are classified as fine-, medium-, and large-grained graph models. In particular, a new medium-grained model based on column-oriented tasks is introduced, and it is shown to correspond structurally to the filled graph of the given sparse matrix. The task scheduling problem for the various task graphs is also discussed. A practical algorithm to schedule the column tasks of the medium-grained model for multiple processors is described. It is based on a heuristic critical path scheduling method. This will give an overall scheme for parallel sparse Cholesky factorization, appropriate for parallel machines with shared-memory architecture like the Denelcor HEP.  相似文献   

16.
We consider the problem of reducing data traffic among processor nodes during the parallel factorization of a sparse matrix on a hypercube multiprocessor. A task assignment strategy based on the structure of an elimination tree is presented. This assignment is aimed at achieving load balancing among the processors and also reducing the amount of processor-to-processor data communication. An analysis of regular grid problems is presented, providing a bound on communication volume generated by the new strategy, and showing that the allocation scheme is optimal in the asymptotic sense. Some experimental results on the performance of this scheme are presented.  相似文献   

17.
Due to its wide applicability, semi-supervised learning is an attractive method for using unlabeled data in classification. In this work, we present a semi-supervised support vector classifier that is designed using quasi-Newton method for nonsmooth convex functions. The proposed algorithm is suitable in dealing with very large number of examples and features. Numerical experiments on various benchmark datasets showed that the proposed algorithm is fast and gives improved generalization performance over the existing methods. Further, a non-linear semi-supervised SVM has been proposed based on a multiple label switching scheme. This non-linear semi-supervised SVM is found to converge faster and it is found to improve generalization performance on several benchmark datasets.  相似文献   

18.
In this paper we study various implementations of Cholesky factorization on SIMD architectures. A submatrix algorithm is implemented on the MasPar MP-2 using both block and torus-wrap data mappings. Both LLT and LDLT (square root free) implementations of the algorithm are investigated. The execution times and performance results for the MP-2 are presented. The performance of these algorithms is characterized in terms of the problem size, machine size and other machine dependent communication and computation parameters. Analysis for the communication and computation complexities for these algorithms is also presented, and models to predict the performance are derived. The torus-wrap mapped implementations outperformed the block approach for all problem sizes. The LDLT implementation outperformed LLT for small to medium problem sizes. © 1997 John Wiley & Sons, Ltd.  相似文献   

19.
Recovering network connectivity structure from high‐dimensional observations is of increasing importance in statistical learning applications. A prominent approach is to learn a Sparse Gaussian Markov Random Field by optimizing regularized maximum likelihood, where the sparsity is induced by imposing L1 norm on the entries of a precision matrix. In this article, we shed light on an alternative objective, where instead of precision, its Cholesky factor is penalized by the L1 norm. We show that such an objective criterion possesses attractive properties that allowed us to develop a very fast Scale‐Free Networks Estimation Through Cholesky factorization (SNETCH) optimization algorithm based on coordinate descent, which is highly parallelizable and can exploit an active set approach. The approach is particularly suited for problems with structures that allow sparse Cholesky factor, an important example being scale‐free networks. Evaluation on synthetically generated examples and high‐impact applications from a biomedical domain of up to more than 900,000 variables provides evidence that for such tasks the SNETCH algorithm can learn the underlying structure more accurately, and an order of magnitude faster than state‐of‐the‐art approaches based on the L1 penalized precision matrix.  相似文献   

20.
One of the widely used methods for solving a nonlinear system of equations is the quasi-Newton method. The basic idea underlining this type of method is to approximate the solution of Newton’s equation by means of approximating the Jacobian matrix via quasi-Newton update. Application of quasi-Newton methods for large scale problems requires, in principle, vast computational resource to form and store an approximation to the Jacobian matrix of the underlying problem. Hence, this paper proposes an approximation for Newton-step based on the update of approximation requiring a computational effort similar to that of matrix-free settings. It is made possible by approximating the Jacobian into a diagonal matrix using the least-change secant updating strategy, commonly employed in the development of quasi-Newton methods. Under suitable assumptions, local convergence of the proposed method is proved for nonsingular systems. Numerical experiments on popular test problems confirm the effectiveness of the approach in comparison with Newton’s, Chord Newton’s and Broyden’s methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号