Similar Literature
20 similar records found.
1.
We present a parallel hybrid asynchronous method for solving large sparse linear systems on a large parallel machine. The method combines a parallel GMRES(m) algorithm with the least squares method, which needs some eigenvalues obtained from a parallel Arnoldi algorithm. All of the algorithms run simultaneously on different processors of an IBM SP3 or IBM SP4 computer. This implementation of the hybrid method allows us to take advantage of the available parallelism and to accelerate convergence by considerably decreasing the number of iterations.
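As a minimal, single-process sketch of the restarted GMRES(m) building block (SciPy's implementation, not the paper's asynchronous hybrid with Arnoldi-driven least-squares acceleration; the matrix below is a small well-conditioned stand-in for a large sparse system):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# A small, well-conditioned sparse tridiagonal system standing in for a
# large sparse problem.
n = 200
A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# GMRES(m): restart every m = 30 iterations to bound the memory and work
# per Krylov cycle; the paper accelerates such restarted cycles using
# eigenvalue information from a concurrently running Arnoldi process.
x, info = gmres(A, b, restart=30)
residual = np.linalg.norm(b - A @ x)
```

`info == 0` signals convergence; the restart length `m` trades per-cycle cost against convergence speed.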

2.
In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

3.
An algorithm for computing the exact least trimmed squares (LTS) estimator of the standard regression model has recently been proposed. The LTS algorithm is adapted to the general linear and seemingly unrelated regressions models with possibly singular dispersion matrices. It searches through a regression tree to find the optimal estimates and has combinatorial complexity. The model is formulated as a generalized linear least squares problem. Efficient matrix techniques are employed to update the generalized residual sum of squares of a subset model. Specifically, the new algorithm utilizes previous computations to update a generalized QR decomposition by a single row. The sparse structure of the model is exploited. Theoretical measures of computational complexity are provided. Experimental results confirm the ability of the new algorithm to identify outlying observations.
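To make the combinatorial nature of exact LTS concrete, here is a brute-force sketch on a tiny problem: enumerate every size-h subset, fit ordinary least squares on each, and keep the subset with the smallest residual sum of squares (the paper organizes this search as a regression tree with incremental QR updates; plain subset enumeration and `lstsq` are used here only for illustration):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, h = 12, 9                     # n observations, trim down to the best h
x = np.arange(n, dtype=float)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(n)
y[0] += 50.0                     # plant one gross outlier

X = np.column_stack([np.ones(n), x])

# Exact LTS by exhaustive search: O(C(n, h)) subset fits.
best_rss, best_subset, best_beta = np.inf, None, None
for subset in combinations(range(n), h):
    idx = list(subset)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    rss = np.sum((y[idx] - X[idx] @ beta) ** 2)
    if rss < best_rss:
        best_rss, best_subset, best_beta = rss, idx, beta
```

The optimal subset excludes the planted outlier, and the trimmed fit recovers the true slope where a full-sample OLS fit would be pulled off it.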

4.
We consider three algorithms for solving linear least squares problems based upon the modified Huang algorithm (MHA) in the ABS class for linear systems recently introduced by Abaffy, Broyden and Spedicato. The first algorithm uses an explicit QR factorization of the coefficient matrix A, computed by applying MHA to the matrix A^T. The second and third algorithms are based upon two representations of the Moore-Penrose pseudoinverse constructed with the use of MHA. The three algorithms are tested on a large set of problems and compared with the NAG code using QR factorization with Householder transformations. The comparison shows that the algorithms based on MHA are generally more accurate.
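The two solution routes the abstract describes, an explicit QR factorization versus a Moore-Penrose pseudoinverse, can be sketched with NumPy stand-ins (Householder QR and SVD-based `pinv` here, not the MHA-based constructions of the paper); for a full-column-rank problem both yield the same least squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))   # overdetermined, full column rank (a.s.)
b = rng.standard_normal(20)

# Route 1: least squares via an explicit QR factorization of A
# (NumPy's Householder QR; the paper obtains QR by applying MHA to A^T).
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Route 2: least squares via the Moore-Penrose pseudoinverse
# (computed by SVD here; the paper builds it from MHA quantities).
x_pinv = np.linalg.pinv(A) @ b
```

Agreement of the two routes is a useful sanity check when comparing implementations for accuracy, as the paper does against the NAG code.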

5.
Quaternionic least squares (QLS) is an efficient method for solving approximation problems in quaternionic quantum theory. Based on Paige's LSQR algorithm and the residual-reducing variant of LSQR proposed in Paige and Saunders [LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Softw. 8(1) (1982), pp. 43-71], we provide two matrix iterative algorithms for finding the least-norm solution of the QLS problem by exploiting the structure of the real representation matrices. Numerical experiments are presented to illustrate the efficiency of our algorithms.
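For orientation, here is the underlying LSQR machinery on an ordinary real sparse least squares problem via SciPy (a stand-in only: the paper applies LSQR-style iterations to the structured real representation of a quaternion matrix, which is four times larger in each dimension):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
A = sparse_random(100, 30, density=0.2, random_state=rng, format="csr")
b = rng.standard_normal(100)

# LSQR iteratively minimizes ||Ax - b||_2 using only products with
# A and A^T, which is what makes structured/sparse matrices attractive.
x = lsqr(A, b, atol=1e-12, btol=1e-12)[0]

# At a least squares solution the normal equations A^T (Ax - b) = 0 hold.
grad = np.linalg.norm(A.T @ (A @ x - b))
```

The tight `atol`/`btol` drive the normal-equations residual close to machine precision.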

6.
Linear least squares problems are commonly solved by QR factorization. When multiple solutions need to be computed with only minor changes in the underlying data, knowledge of the difference between the old data set and the new can be used to update an existing factorization at reduced computational cost. We investigate the viability of implementing QR updating algorithms on GPUs and demonstrate that GPU-based updating for removing columns achieves speed-ups of up to 13.5× compared with full GPU QR factorization. We characterize the conditions under which other types of updates also achieve speed-ups.
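The column-removal update the abstract highlights is available on the CPU side in SciPy, which can serve as a reference point (a CPU sketch of the operation only, not the paper's GPU implementation):

```python
import numpy as np
from scipy.linalg import qr, qr_delete

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 6))
Q, R = qr(A)

# Remove column 2 from A and update the existing factorization with
# Givens rotations instead of refactorizing from scratch.
Q1, R1 = qr_delete(Q, R, k=2, p=1, which='col')
A_del = np.delete(A, 2, axis=1)
```

The updated factors reproduce the matrix with the column removed, at far lower cost than a fresh QR when the matrix is large.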

7.
Parallel algorithms for direct methods of analysis and solution of linear algebra problems with sparse symmetric irregularly structured matrices are considered. The performance of such algorithms is investigated. Upper estimates of the speedup and efficiency factors are obtained for a parallel algorithm for triangular decomposition of sparse matrices. Some results of numerical experiments carried out on a MIMD computer are given.

8.
The least squares twin support vector machine (LSTSVM) solves two linear programming problems in place of a complex quadratic programming problem, which makes it computationally simple and fast to train. However, the hyperplanes obtained by LSTSVM are easily affected by outliers, and its solution lacks sparsity. To address this problem, a robust least squares twin support vector machine based on a truncated least squares loss is proposed, and the model's robustness to outliers is verified theoretically. To enable the model to handle large-scale data, a sparse solution of the new model is obtained from the representer theorem and an incomplete Cholesky factorization, and a sparse robust least squares twin support vector machine algorithm suited to large-scale data containing outliers is proposed. Numerical experiments show that, compared with existing algorithms, the new algorithm improves classification accuracy by 1.97% to 37.7%, sparsity by a factor of 26 to 199, and convergence speed by a factor of 6.6 to 2,027.4.

9.
In this paper, a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for the linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection (Kaczmarz) algorithm and/or the Widrow-Hoff LMS (Adaline) algorithm. The algorithms can be applied to any problem that can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.
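The row-action projection at the heart of the Kaczmarz algorithm is simple enough to sketch directly (a plain NumPy version of the cyclic iteration, not the analog on-chip realization the paper proposes; the well-conditioned test matrix is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned system
x_true = rng.standard_normal(n)
b = A @ x_true

# Cyclic Kaczmarz: project the current iterate onto the hyperplane of one
# equation a_i^T x = b_i at a time.
x = np.zeros(n)
for sweep in range(500):
    for i in range(n):
        a = A[i]
        x += ((b[i] - a @ x) / (a @ a)) * a
```

Each update touches a single row, which is exactly the locality that makes the method attractive for simple analog or real-time hardware.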

10.
A new least squares solution for obtaining asymptotically unbiased and consistent estimates of unknown parameters in noisy linear systems is presented. The proposed algorithms are in many ways more advantageous than the generalized least squares algorithm. Extensions to on-line and multivariable problems can be easily implemented. Examples are given to illustrate the performance of these new algorithms.
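As a generic illustration of the on-line setting the abstract mentions, here is a textbook recursive least squares (RLS) sketch, in which each new sample updates the estimate and an inverse-information matrix P in O(p^2) (this is standard RLS, not the paper's specific algorithm, and the noise model is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true = np.array([1.5, -0.7])

theta = np.zeros(2)
P = 1e4 * np.eye(2)                     # large initial P = weak prior
for t in range(2000):
    phi = rng.standard_normal(2)        # regressor vector
    y = phi @ theta_true + 0.01 * rng.standard_normal()
    gain = P @ phi / (1.0 + phi @ P @ phi)      # Kalman-style gain
    theta = theta + gain * (y - phi @ theta)    # correct the estimate
    P = P - np.outer(gain, phi @ P)             # shrink the covariance
```

With white output noise the estimate converges to the true parameters; the point of the cited work is to retain consistency under less benign noise than this.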

11.
Building on the conventional QR-decomposition detection algorithm for VBLAST-OFDM systems, this paper introduces the ideas of maximum likelihood detection and parallel interference cancellation into the QR algorithm, yielding an improved QR-decomposition detection method that overcomes the conventional algorithm's poor detection performance. QR decomposition is applied starting from the last two layers of signals, which are treated as an equivalent 2-transmit, 2-receive MIMO system. After these layers are detected and decided with an optimal algorithm, the decisions are substituted back into the original QR-decomposition detection of the remaining layers; alternatively, the influence of the decided signals is cancelled in parallel, one layer at a time, before the next equivalent MIMO subsystem is decided, until all signals have been detected. Simulation results show that the improved QR-decomposition detection algorithm achieves better bit error performance than the conventional QR and zero-forcing algorithms.

12.
赵杰, 张春元, 刘超, 周辉, 欧宜贵, 宋淇. Acta Automatica Sinica (《自动化学报》), 2022, 48(8): 2050-2061
To address the low learning efficiency of first-order optimization algorithms and the excessive time and space overhead of second-order optimization algorithms for recurrent neural networks (RNNs), a new mini-batch recursive least squares optimization algorithm is proposed. The algorithm back-propagates a pre-activation linear output error in place of the conventional activated output error and, combined with the equivalent gradient of a weighted linear least squares objective with respect to the hidden layer's linear output, derives layer-by-layer mini-batch recursive least squares solutions for the RNN parameters. Compared with stochastic gradient descent, the proposed algorithm adds only one covariance matrix each at the hidden layer and the output layer of the RNN, so its time and space complexity are only about three times those of stochastic gradient descent. In addition, solutions are given for adapting the algorithm's forgetting factor and for mitigating overfitting. Simulation results show that, for both classification and prediction tasks on sequence data, the proposed algorithm converges faster than mainstream first-order optimization algorithms and is more robust to hyperparameter settings.

13.
In this paper, new noniterative algorithms for the identification of (multivariable) block-oriented nonlinear models consisting of the interconnection of linear time-invariant systems and static nonlinearities are presented. The proposed algorithms are numerically robust, since they are based only on least squares estimation and singular value decomposition. Two different block-oriented nonlinear models are considered in this paper, viz., the Hammerstein model and the Wiener model. For the Hammerstein model, the proposed algorithm provides consistent estimates even in the presence of colored output noise, under weak assumptions on the persistency of excitation of the inputs. For the Wiener model, consistency of the estimates can only be guaranteed in the noise-free case. Key in the derivation of the results is the use of basis functions for the representation of the linear and nonlinear parts of the models. The performance of the proposed identification algorithms is illustrated through simulation examples of two benchmark problems drawn from the process control literature, viz., a binary distillation column and a pH neutralization process.
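The least-squares-plus-SVD idea can be sketched on the simplest Hammerstein case: a polynomial static nonlinearity followed by a short FIR linear part, both illustrative assumptions simplifying the paper's general basis-function setting, in a noise-free simulation. Overparameterize the products h_i * a_j, solve one linear least squares problem, then recover each factor from a rank-1 SVD:

```python
import numpy as np

rng = np.random.default_rng(6)

# True Hammerstein system: static f(u) = u + 0.5 u^2, then FIR filter h.
h_true = np.array([1.0, 0.6, 0.2])
a_true = np.array([1.0, 0.5])

N = 2000
u = rng.standard_normal(N)
f = a_true[0] * u + a_true[1] * u**2
y = np.convolve(f, h_true)[:N]

# Overparameterized LS: unknown Theta[i, j] = h[i] * a[j], with
# regressor u_{t-i}^(j+1) for each (i, j) pair.
rows = []
for t in range(3, N):
    rows.append([u[t - i] ** (j + 1) for i in range(3) for j in range(2)])
Phi = np.array(rows)
theta, *_ = np.linalg.lstsq(Phi, y[3:], rcond=None)
Theta = theta.reshape(3, 2)

# Rank-1 SVD factorization separates h and a (up to a scalar).
U, s, Vt = np.linalg.svd(Theta)
h_est = U[:, 0] * s[0]
a_est = Vt[0]

# Resolve the scaling ambiguity by normalizing a[0] = 1.
h_est = h_est * a_est[0]
a_est = a_est / a_est[0]
```

Because estimation reduces to one least squares solve and one SVD, the procedure is noniterative, which is the numerical-robustness point of the abstract.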

14.
王晞阳, 陈继林, 李猛, 刘首文. Computer Engineering (《计算机工程》), 2022, 48(7): 199-205, 213
In power system simulation, solving large sparse matrix systems consumes substantial storage and computing resources, and failure to exploit the sparsity of the matrix leads to wasted memory and low computational efficiency. Existing research on sparse solution algorithms mainly targets many-core accelerator hardware and focuses on extracting the parallelism of level sets to improve parallel efficiency, but frequent cache checks and fine-grained memory accesses on many-core processor architectures may cause performance problems. For the problem of solving sparse lower-triangular systems on field-programmable gate arrays (FPGAs), a statically scheduled solution algorithm is proposed, building on the FPGA sparse-solver hardware architecture designed by Wu Zhiyong et al. By preprocessing the sparse matrix and designing the data-distribution and instruction-arrangement flows, the lower-triangular solve is statically mapped onto processing elements across multiple FPGA chips, achieving parallel high-speed solution of sparse lower-triangular systems on FPGAs. All implicit parallelism in the serial algorithm is arranged into buffers so that every compute unit can efficiently overlap computation, memory access, and inter-unit communication, making maximal use of the FPGA's hardware resources. Test results on typical cases show that, compared with conventional CPU/GPU solution algorithms, the proposed algorithm achieves a 5x to 10x speedup.
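The parallelism that such solvers, whether on FPGAs or GPUs, exploit in a sparse triangular solve is commonly exposed by level scheduling: row i depends on every earlier row j with L[i, j] != 0, and rows in the same dependency level can be solved concurrently. A minimal sketch (generic level scheduling, not the paper's static FPGA mapping):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny sparse lower-triangular matrix.
L = csr_matrix(np.array([
    [2.0, 0.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 4.0, 2.0, 5.0],
]))

# level[i] = length of the longest dependency chain ending at row i;
# rows sharing a level are independent and can be solved in parallel.
n = L.shape[0]
level = np.zeros(n, dtype=int)
for i in range(n):
    start, end = L.indptr[i], L.indptr[i + 1]
    deps = [j for j in L.indices[start:end] if j < i]
    if deps:
        level[i] = 1 + max(level[j] for j in deps)
```

Here rows 0 and 2 form level 0, row 1 is level 1, and row 4 (index 3) is level 2; a static scheduler assigns each level's rows to processing elements ahead of time.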

15.
As multicore systems continue to gain ground in the high‐performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine‐grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as ‘tiles’). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out‐of‐order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd.

16.
An important factorization algorithm for polynomials over finite fields was developed by Niederreiter. The factorization problem is reduced to solving a linear system over the finite field in question, and the solutions are used to produce the complete factorization of the polynomial into irreducibles. One characteristic feature of the linear system arising in the Niederreiter algorithm is that, if the polynomial to be factorized is sparse, then so is the Niederreiter matrix associated with it. In this paper, we investigate the special case of factoring trinomials over the binary field. We develop a new algorithm for solving the linear system using sparse Gaussian elimination with the Markowitz ordering strategy. Implementing the new algorithm to solve the Niederreiter linear system for trinomials over F2 suggests that the system is not only initially sparse, but also preserves its sparsity throughout the Gaussian elimination phase. When used with other methods for extracting the irreducible factors using a basis for the solution set, the resulting algorithm provides a more memory-efficient and sometimes faster sequential alternative for achieving high-degree trinomial factorizations over F2.
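Linear algebra over F2 has a convenient machine representation: store each row as an integer bitset, so that XOR is row addition mod 2. The toy eliminator below is dense and unpivoted beyond column order, unlike the paper's Markowitz-ordered sparse elimination, but it shows the arithmetic the Niederreiter system is solved with:

```python
def gf2_rank(rows, ncols):
    """Rank over F2 of a binary matrix whose rows are int bitsets
    (bit j of a row = entry in column j)."""
    rows = list(rows)
    rank = 0
    for col in range(ncols):
        # Find a pivot row with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows))
                      if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue                      # no pivot: move to next column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]     # XOR = row addition over F2
        rank += 1
    return rank

# Row 3 is the XOR of rows 1 and 2, so the rank is 2; the nontrivial
# null space is what Niederreiter's method mines for factors.
r = gf2_rank([0b101, 0b011, 0b110], 3)
```

A basis of the null space of the reduced system is what the factor-extraction step consumes.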

17.
An algorithm is proposed for self-tuning optimal fixed-lag smoothing or filtering for linear discrete-time multivariable processes. A z-transfer function solution to the discrete multivariable estimation problem is first presented. This solution involves spectral factorization of polynomial matrices and assumes knowledge of the process parameters and the noise statistics. The assumption is then made that the signal-generating process and noise statistics are unknown. The problem is reformulated so that the model is in an innovations signal form, and implicit self-tuning estimation algorithms are proposed. The parameters of the innovations model of the process can be estimated using an extended Kalman filter or, alternatively, extended recursive least squares. These estimated parameters are used directly in the calculation of the predicted, smoothed, or filtered estimates. The approach is an attempt to generalize the work of Hagander and Wittenmark.

18.
Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and the concerns of low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU system. These approaches increase GPU utilization, thus resulting in substantial computational time reduction. Comparisons are made with the existing parallel solvers by using problems arising from practical applications. The experiment results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on supernodal method for CPU‐GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.

19.
Two algorithms for solving the piecewise linear least-squares approximation problem for plane curves are presented. The first is for the case when the L2 residual (error) norm in any segment is not to exceed a pre-assigned value. The second algorithm is for the case when the number of segments is given and a (balanced) L2 residual norm solution is required. The given curve is first digitized, and either algorithm is then applied to the discrete points. For each segment, we obtain the upper triangular matrix R in the QR factorization of the (augmented) coefficient matrix of the resulting system of linear equations. The least-squares solutions are calculated in terms of the R (and Q) matrices. The algorithms then work in an iterative manner by updating the least-squares solutions for the segments via updating the R matrices. The calculation requires as little computational effort as possible. Numerical results and comments are given. This, in a way, is a tutorial paper.
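A minimal two-segment version of the problem can be sketched by scanning candidate breakpoints on a digitized curve, fitting each segment by least squares, and keeping the split with the smallest total residual (the paper avoids refitting from scratch by updating the per-segment R factors; plain `lstsq` refits are used here only for illustration):

```python
import numpy as np

# Digitized "tent" curve with a kink at t = 1.
t = np.linspace(0.0, 2.0, 81)
y = np.where(t < 1.0, t, 2.0 - t)

def seg_rss(lo, hi):
    """Least-squares line fit on points lo..hi-1; return residual sum
    of squares."""
    X = np.column_stack([np.ones(hi - lo), t[lo:hi]])
    _, rss, *_ = np.linalg.lstsq(X, y[lo:hi], rcond=None)
    return rss[0] if rss.size else 0.0

# Scan breakpoints, keeping both segments non-degenerate.
best_k, best_rss = None, np.inf
for k in range(5, 77):
    total = seg_rss(0, k) + seg_rss(k, len(t))
    if total < best_rss:
        best_k, best_rss = k, total
```

The optimal breakpoint lands at the kink, where both segments are fit essentially exactly.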

20.
Parallel programming is elusive. The relative performance of different parallel implementations varies with machine architecture, system, and problem size. How to compare different implementations over a wide range of machine architectures and problem sizes has not been well addressed, due to its difficulty. Scalability has been proposed in recent years to reveal the scaling properties of parallel algorithms and machines. In this paper, the relation between scalability and execution time is carefully studied. The concepts of crossing-point analysis and range comparison are introduced. Crossing-point analysis finds slow/fast performance crossing points of parallel algorithms and machines. Range comparison compares performance over a wide range of ensemble and problem sizes via scalability and crossing-point analysis. Three algorithms from scientific computing are implemented on an Intel Paragon and an IBM SP2 parallel computer. Experimental and theoretical results show how the combination of scalability, crossing-point analysis, and range comparison provides a practical solution for scalable performance evaluation and prediction. While our tests were conducted on homogeneous parallel computers, the proposed methodology applies to heterogeneous and network computing as well.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号