Similar Literature
20 similar records found.
1.
We present a parallel hybrid asynchronous method for solving large sparse linear systems on a large parallel machine. The method combines a parallel GMRES(m) algorithm with the least squares method, which needs some eigenvalues obtained from a parallel Arnoldi algorithm. All of the algorithms run simultaneously on different processors of an IBM SP3 or IBM SP4 computer. This implementation of the hybrid method allows us to take advantage of the available parallelism and to accelerate convergence by considerably decreasing the number of iterations.
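As a minimal, single-process sketch of the restarted GMRES(m) building block (SciPy's implementation, not the paper's asynchronous hybrid with Arnoldi-driven least-squares acceleration; the matrix below is a small well-conditioned stand-in for a large sparse system):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# A small, well-conditioned sparse tridiagonal system standing in for a
# large sparse problem.
n = 200
A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# GMRES(m): restart every m = 30 iterations to bound the memory and work
# per Krylov cycle; the paper accelerates such restarted cycles using
# eigenvalue information from a concurrently running Arnoldi process.
x, info = gmres(A, b, restart=30)
residual = np.linalg.norm(b - A @ x)
```

`info == 0` signals convergence; the restart length `m` trades per-cycle cost against convergence speed.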

2.
In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

3.
An algorithm for computing the exact least trimmed squares (LTS) estimator of the standard regression model has recently been proposed. The LTS algorithm is adapted to the general linear and seemingly unrelated regressions models with possibly singular dispersion matrices. It searches through a regression tree to find the optimal estimates and has combinatorial complexity. The model is formulated as a generalized linear least squares problem. Efficient matrix techniques are employed to update the generalized residual sum of squares of a subset model. Specifically, the new algorithm utilizes previous computations to update a generalized QR decomposition by a single row. The sparse structure of the model is exploited. Theoretical measures of computational complexity are provided. Experimental results confirm the ability of the new algorithm to identify outlying observations.
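To make the combinatorial nature of exact LTS concrete, here is a brute-force sketch on a tiny problem: enumerate every size-h subset, fit ordinary least squares on each, and keep the subset with the smallest residual sum of squares (the paper organizes this search as a regression tree with incremental QR updates; plain subset enumeration and `lstsq` are used here only for illustration):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, h = 12, 9                     # n observations, trim down to the best h
x = np.arange(n, dtype=float)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(n)
y[0] += 50.0                     # plant one gross outlier

X = np.column_stack([np.ones(n), x])

# Exact LTS by exhaustive search: O(C(n, h)) subset fits.
best_rss, best_subset, best_beta = np.inf, None, None
for subset in combinations(range(n), h):
    idx = list(subset)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    rss = np.sum((y[idx] - X[idx] @ beta) ** 2)
    if rss < best_rss:
        best_rss, best_subset, best_beta = rss, idx, beta
```

The optimal subset excludes the planted outlier, and the trimmed fit recovers the true slope where a full-sample OLS fit would be pulled off it.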

4.
We consider three algorithms for solving linear least squares problems based upon the modified Huang algorithm (MHA) in the ABS class for linear systems recently introduced by Abaffy, Broyden and Spedicato. The first algorithm uses an explicit QR factorization of the coefficient matrix A, computed by applying MHA to the matrix A^T. The second and third algorithms are based upon two representations of the Moore-Penrose pseudoinverse constructed with the use of MHA. The three algorithms are tested on a large set of problems and compared with the NAG code using QR factorization with Householder transformations. The comparison shows that the algorithms based on MHA are generally more accurate.
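The two solution routes the abstract describes, an explicit QR factorization versus a Moore-Penrose pseudoinverse, can be sketched with NumPy stand-ins (Householder QR and SVD-based `pinv` here, not the MHA-based constructions of the paper); for a full-column-rank problem both yield the same least squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))   # overdetermined, full column rank (a.s.)
b = rng.standard_normal(20)

# Route 1: least squares via an explicit QR factorization of A
# (NumPy's Householder QR; the paper obtains QR by applying MHA to A^T).
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Route 2: least squares via the Moore-Penrose pseudoinverse
# (computed by SVD here; the paper builds it from MHA quantities).
x_pinv = np.linalg.pinv(A) @ b
```

Agreement of the two routes is a useful sanity check when comparing implementations for accuracy, as the paper does against the NAG code.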

5.
Quaternionic least squares (QLS) is an efficient method for solving approximation problems in quaternionic quantum theory. Based on Paige's LSQR algorithm and the residual-reducing variant of LSQR proposed in Paige and Saunders [LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Softw. 8(1) (1982), pp. 43-71], we provide two matrix iterative algorithms for finding the least-norm solution of the QLS problem by exploiting the structure of the real representation matrices. Numerical experiments are presented to illustrate the efficiency of our algorithms.
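For orientation, here is the underlying LSQR machinery on an ordinary real sparse least squares problem via SciPy (a stand-in only: the paper applies LSQR-style iterations to the structured real representation of a quaternion matrix, which is four times larger in each dimension):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
A = sparse_random(100, 30, density=0.2, random_state=rng, format="csr")
b = rng.standard_normal(100)

# LSQR iteratively minimizes ||Ax - b||_2 using only products with
# A and A^T, which is what makes structured/sparse matrices attractive.
x = lsqr(A, b, atol=1e-12, btol=1e-12)[0]

# At a least squares solution the normal equations A^T (Ax - b) = 0 hold.
grad = np.linalg.norm(A.T @ (A @ x - b))
```

The tight `atol`/`btol` drive the normal-equations residual close to machine precision.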

6.
Linear least squares problems are commonly solved by QR factorization. When multiple solutions need to be computed with only minor changes in the underlying data, knowledge of the difference between the old data set and the new can be used to update an existing factorization at reduced computational cost. We investigate the viability of implementing QR updating algorithms on GPUs and demonstrate that GPU-based updating for removing columns achieves speed-ups of up to 13.5× compared with full GPU QR factorization. We characterize the conditions under which other types of updates also achieve speed-ups.
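The column-removal update the abstract highlights is available on the CPU side in SciPy, which can serve as a reference point (a CPU sketch of the operation only, not the paper's GPU implementation):

```python
import numpy as np
from scipy.linalg import qr, qr_delete

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 6))
Q, R = qr(A)

# Remove column 2 from A and update the existing factorization with
# Givens rotations instead of refactorizing from scratch.
Q1, R1 = qr_delete(Q, R, k=2, p=1, which='col')
A_del = np.delete(A, 2, axis=1)
```

The updated factors reproduce the matrix with the column removed, at far lower cost than a fresh QR when the matrix is large.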

7.
Parallel algorithms for direct methods of analysis and solution of linear algebra problems with sparse symmetric irregularly structured matrices are considered. The performance of such algorithms is investigated. Upper estimates of the speedup and efficiency factors are obtained for a parallel algorithm for triangular decomposition of sparse matrices. Some results of numerical experiments carried out on a MIMD computer are given.

8.
The least squares twin support vector machine (LSTSVM) solves two linear programming problems in place of a complex quadratic programming problem, which makes it computationally simple and fast to train. However, the hyperplanes obtained by LSTSVM are easily affected by outliers, and its solution lacks sparsity. To address this problem, a robust least squares twin support vector machine based on a truncated least squares loss is proposed, and the model's robustness to outliers is verified theoretically. To enable the model to handle large-scale data, a sparse solution of the new model is obtained from the representer theorem and an incomplete Cholesky factorization, and a sparse robust least squares twin support vector machine algorithm suited to large-scale data containing outliers is proposed. Numerical experiments show that, compared with existing algorithms, the new algorithm improves classification accuracy by 1.97% to 37.7%, sparsity by a factor of 26 to 199, and convergence speed by a factor of 6.6 to 2,027.4.

9.
In this paper, a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for the linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection (Kaczmarz) algorithm and/or the Widrow-Hoff LMS (Adaline) algorithm. The algorithms can be applied to any problem that can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.
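The row-action projection at the heart of the Kaczmarz algorithm is simple enough to sketch directly (a plain NumPy version of the cyclic iteration, not the analog on-chip realization the paper proposes; the well-conditioned test matrix is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned system
x_true = rng.standard_normal(n)
b = A @ x_true

# Cyclic Kaczmarz: project the current iterate onto the hyperplane of one
# equation a_i^T x = b_i at a time.
x = np.zeros(n)
for sweep in range(500):
    for i in range(n):
        a = A[i]
        x += ((b[i] - a @ x) / (a @ a)) * a
```

Each update touches a single row, which is exactly the locality that makes the method attractive for simple analog or real-time hardware.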

10.
A new least squares solution for obtaining asymptotically unbiased and consistent estimates of unknown parameters in noisy linear systems is presented. The proposed algorithms are in many ways more advantageous than the generalized least squares algorithm. Extensions to on-line and multivariable problems can be easily implemented. Examples are given to illustrate the performance of these new algorithms.
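As a generic illustration of the on-line setting the abstract mentions, here is a textbook recursive least squares (RLS) sketch, in which each new sample updates the estimate and an inverse-information matrix P in O(p^2) (this is standard RLS, not the paper's specific algorithm, and the noise model is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true = np.array([1.5, -0.7])

theta = np.zeros(2)
P = 1e4 * np.eye(2)                     # large initial P = weak prior
for t in range(2000):
    phi = rng.standard_normal(2)        # regressor vector
    y = phi @ theta_true + 0.01 * rng.standard_normal()
    gain = P @ phi / (1.0 + phi @ P @ phi)      # Kalman-style gain
    theta = theta + gain * (y - phi @ theta)    # correct the estimate
    P = P - np.outer(gain, phi @ P)             # shrink the covariance
```

With white output noise the estimate converges to the true parameters; the point of the cited work is to retain consistency under less benign noise than this.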

11.
Building on the conventional QR-decomposition detection algorithm for VBLAST-OFDM systems, this paper introduces the ideas of maximum likelihood detection and parallel interference cancellation into the QR algorithm, yielding an improved QR-decomposition detection method that overcomes the conventional algorithm's poor detection performance. QR decomposition is applied starting from the last two layers of signals, which are treated as an equivalent 2-transmit, 2-receive MIMO system. After these layers are detected and decided with an optimal algorithm, the decisions are substituted back into the original QR-decomposition detection of the remaining layers; alternatively, the influence of the decided signals is cancelled in parallel, one layer at a time, before the next equivalent MIMO subsystem is decided, until all signals have been detected. Simulation results show that the improved QR-decomposition detection algorithm achieves better bit error performance than the conventional QR and zero-forcing algorithms.

12.
赵杰, 张春元, 刘超, 周辉, 欧宜贵, 宋淇. Acta Automatica Sinica (《自动化学报》), 2022, 48(8): 2050-2061
To address the low learning efficiency of first-order optimization algorithms and the excessive time and space overhead of second-order optimization algorithms for recurrent neural networks (RNNs), a new mini-batch recursive least squares optimization algorithm is proposed. The algorithm back-propagates a pre-activation linear output error in place of the conventional activated output error and, combined with the equivalent gradient of a weighted linear least squares objective with respect to the hidden layer's linear output, derives layer-by-layer mini-batch recursive least squares solutions for the RNN parameters. Compared with stochastic gradient descent, the proposed algorithm adds only one covariance matrix each at the hidden layer and the output layer of the RNN, so its time and space complexity are only about three times those of stochastic gradient descent. In addition, solutions are given for adapting the algorithm's forgetting factor and for mitigating overfitting. Simulation results show that, for both classification and prediction tasks on sequence data, the proposed algorithm converges faster than mainstream first-order optimization algorithms and is more robust to hyperparameter settings.

13.
In this paper, new noniterative algorithms for the identification of (multivariable) block-oriented nonlinear models consisting of the interconnection of linear time-invariant systems and static nonlinearities are presented. The proposed algorithms are numerically robust, since they are based only on least squares estimation and singular value decomposition. Two different block-oriented nonlinear models are considered in this paper, viz., the Hammerstein model and the Wiener model. For the Hammerstein model, the proposed algorithm provides consistent estimates even in the presence of colored output noise, under weak assumptions on the persistency of excitation of the inputs. For the Wiener model, consistency of the estimates can only be guaranteed in the noise-free case. Key in the derivation of the results is the use of basis functions for the representation of the linear and nonlinear parts of the models. The performance of the proposed identification algorithms is illustrated through simulation examples of two benchmark problems drawn from the process control literature, viz., a binary distillation column and a pH neutralization process.
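The least-squares-plus-SVD idea can be sketched on the simplest Hammerstein case: a polynomial static nonlinearity followed by a short FIR linear part, both illustrative assumptions simplifying the paper's general basis-function setting, in a noise-free simulation. Overparameterize the products h_i * a_j, solve one linear least squares problem, then recover each factor from a rank-1 SVD:

```python
import numpy as np

rng = np.random.default_rng(6)

# True Hammerstein system: static f(u) = u + 0.5 u^2, then FIR filter h.
h_true = np.array([1.0, 0.6, 0.2])
a_true = np.array([1.0, 0.5])

N = 2000
u = rng.standard_normal(N)
f = a_true[0] * u + a_true[1] * u**2
y = np.convolve(f, h_true)[:N]

# Overparameterized LS: unknown Theta[i, j] = h[i] * a[j], with
# regressor u_{t-i}^(j+1) for each (i, j) pair.
rows = []
for t in range(3, N):
    rows.append([u[t - i] ** (j + 1) for i in range(3) for j in range(2)])
Phi = np.array(rows)
theta, *_ = np.linalg.lstsq(Phi, y[3:], rcond=None)
Theta = theta.reshape(3, 2)

# Rank-1 SVD factorization separates h and a (up to a scalar).
U, s, Vt = np.linalg.svd(Theta)
h_est = U[:, 0] * s[0]
a_est = Vt[0]

# Resolve the scaling ambiguity by normalizing a[0] = 1.
h_est = h_est * a_est[0]
a_est = a_est / a_est[0]
```

Because estimation reduces to one least squares solve and one SVD, the procedure is noniterative, which is the numerical-robustness point of the abstract.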

14.
王晞阳, 陈继林, 李猛, 刘首文. Computer Engineering (《计算机工程》), 2022, 48(7): 199-205, 213
In power system simulation, solving large sparse matrix systems consumes substantial storage and computing resources, and failure to exploit the sparsity of the matrix leads to wasted memory and low computational efficiency. Existing research on sparse solution algorithms mainly targets many-core accelerator hardware and focuses on extracting the parallelism of level sets to improve parallel efficiency, but frequent cache checks and fine-grained memory accesses on many-core processor architectures may cause performance problems. For the problem of solving sparse lower-triangular systems on field-programmable gate arrays (FPGAs), a statically scheduled solution algorithm is proposed, building on the FPGA sparse-solver hardware architecture designed by Wu Zhiyong et al. By preprocessing the sparse matrix and designing the data-distribution and instruction-arrangement flows, the lower-triangular solve is statically mapped onto processing elements across multiple FPGA chips, achieving parallel high-speed solution of sparse lower-triangular systems on FPGAs. All implicit parallelism in the serial algorithm is arranged into buffers so that every compute unit can efficiently overlap computation, memory access, and inter-unit communication, making maximal use of the FPGA's hardware resources. Test results on typical cases show that, compared with conventional CPU/GPU solution algorithms, the proposed algorithm achieves a 5x to 10x speedup.
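The parallelism that such solvers, whether on FPGAs or GPUs, exploit in a sparse triangular solve is commonly exposed by level scheduling: row i depends on every earlier row j with L[i, j] != 0, and rows in the same dependency level can be solved concurrently. A minimal sketch (generic level scheduling, not the paper's static FPGA mapping):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny sparse lower-triangular matrix.
L = csr_matrix(np.array([
    [2.0, 0.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 4.0, 2.0, 5.0],
]))

# level[i] = length of the longest dependency chain ending at row i;
# rows sharing a level are independent and can be solved in parallel.
n = L.shape[0]
level = np.zeros(n, dtype=int)
for i in range(n):
    start, end = L.indptr[i], L.indptr[i + 1]
    deps = [j for j in L.indices[start:end] if j < i]
    if deps:
        level[i] = 1 + max(level[j] for j in deps)
```

Here rows 0 and 2 form level 0, row 1 is level 1, and row 4 (index 3) is level 2; a static scheduler assigns each level's rows to processing elements ahead of time.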

15.
As multicore systems continue to gain ground in the high‐performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine‐grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as ‘tiles’). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out‐of‐order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd.

16.
An important factorization algorithm for polynomials over finite fields was developed by Niederreiter. The factorization problem is reduced to solving a linear system over the finite field in question, and the solutions are used to produce the complete factorization of the polynomial into irreducibles. One characteristic feature of the linear system arising in the Niederreiter algorithm is that, if the polynomial to be factorized is sparse, then so is the Niederreiter matrix associated with it. In this paper, we investigate the special case of factoring trinomials over the binary field. We develop a new algorithm for solving the linear system using sparse Gaussian elimination with the Markowitz ordering strategy. Implementing the new algorithm to solve the Niederreiter linear system for trinomials over F2 suggests that the system is not only initially sparse, but also preserves its sparsity throughout the Gaussian elimination phase. When used with other methods for extracting the irreducible factors using a basis for the solution set, the resulting algorithm provides a more memory-efficient and sometimes faster sequential alternative for achieving high-degree trinomial factorizations over F2.
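Linear algebra over F2 has a convenient machine representation: store each row as an integer bitset, so that XOR is row addition mod 2. The toy eliminator below is dense and unpivoted beyond column order, unlike the paper's Markowitz-ordered sparse elimination, but it shows the arithmetic the Niederreiter system is solved with:

```python
def gf2_rank(rows, ncols):
    """Rank over F2 of a binary matrix whose rows are int bitsets
    (bit j of a row = entry in column j)."""
    rows = list(rows)
    rank = 0
    for col in range(ncols):
        # Find a pivot row with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows))
                      if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue                      # no pivot: move to next column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]     # XOR = row addition over F2
        rank += 1
    return rank

# Row 3 is the XOR of rows 1 and 2, so the rank is 2; the nontrivial
# null space is what Niederreiter's method mines for factors.
r = gf2_rank([0b101, 0b011, 0b110], 3)
```

A basis of the null space of the reduced system is what the factor-extraction step consumes.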

17.
An algorithm is proposed for self-tuning optimal fixed-lag smoothing or filtering for linear discrete-time multivariable processes. A z-transfer function solution to the discrete multivariable estimation problem is first presented. This solution involves spectral factorization of polynomial matrices and assumes knowledge of the process parameters and the noise statistics. The assumption is then made that the signal-generating process and noise statistics are unknown. The problem is reformulated so that the model is in an innovations signal form, and implicit self-tuning estimation algorithms are proposed. The parameters of the innovations model of the process can be estimated using an extended Kalman filter or, alternatively, extended recursive least squares. These estimated parameters are used directly in the calculation of the predicted, smoothed, or filtered estimates. The approach is an attempt to generalize the work of Hagander and Wittenmark.

18.
Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and the concerns of low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU system. These approaches increase GPU utilization, thus resulting in substantial computational time reduction. Comparisons are made with the existing parallel solvers by using problems arising from practical applications. The experiment results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on supernodal method for CPU‐GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.

19.
Two algorithms for solving the piecewise linear least-squares approximation problem for plane curves are presented. The first is for the case when the L2 residual (error) norm in any segment is not to exceed a pre-assigned value. The second algorithm is for the case when the number of segments is given and a (balanced) L2 residual norm solution is required. The given curve is first digitized, and either algorithm is then applied to the discrete points. For each segment, we obtain the upper triangular matrix R in the QR factorization of the (augmented) coefficient matrix of the resulting system of linear equations. The least-squares solutions are calculated in terms of the R (and Q) matrices. The algorithms then work in an iterative manner by updating the least-squares solutions for the segments via updating the R matrices. The calculation requires as little computational effort as possible. Numerical results and comments are given. This, in a way, is a tutorial paper.
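A minimal two-segment version of the problem can be sketched by scanning candidate breakpoints on a digitized curve, fitting each segment by least squares, and keeping the split with the smallest total residual (the paper avoids refitting from scratch by updating the per-segment R factors; plain `lstsq` refits are used here only for illustration):

```python
import numpy as np

# Digitized "tent" curve with a kink at t = 1.
t = np.linspace(0.0, 2.0, 81)
y = np.where(t < 1.0, t, 2.0 - t)

def seg_rss(lo, hi):
    """Least-squares line fit on points lo..hi-1; return residual sum
    of squares."""
    X = np.column_stack([np.ones(hi - lo), t[lo:hi]])
    _, rss, *_ = np.linalg.lstsq(X, y[lo:hi], rcond=None)
    return rss[0] if rss.size else 0.0

# Scan breakpoints, keeping both segments non-degenerate.
best_k, best_rss = None, np.inf
for k in range(5, 77):
    total = seg_rss(0, k) + seg_rss(k, len(t))
    if total < best_rss:
        best_k, best_rss = k, total
```

The optimal breakpoint lands at the kink, where both segments are fit essentially exactly.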

20.
Parallel programming is elusive. The relative performance of different parallel implementations varies with machine architecture, system, and problem size. How to compare different implementations over a wide range of machine architectures and problem sizes has not been well addressed, due to its difficulty. Scalability has been proposed in recent years to reveal the scaling properties of parallel algorithms and machines. In this paper, the relation between scalability and execution time is carefully studied. The concepts of crossing-point analysis and range comparison are introduced. Crossing-point analysis finds slow/fast performance crossing points of parallel algorithms and machines. Range comparison compares performance over a wide range of ensemble and problem sizes via scalability and crossing-point analysis. Three algorithms from scientific computing are implemented on an Intel Paragon and an IBM SP2 parallel computer. Experimental and theoretical results show how the combination of scalability, crossing-point analysis, and range comparison provides a practical solution for scalable performance evaluation and prediction. While our tests were conducted on homogeneous parallel computers, the proposed methodology applies to heterogeneous and network computing as well.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号