首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Gaussian elimination and LU factoring have been greatly studied from the algorithmic point of view, but much less from the point view of the best output format. In this paper, we give new output formats for fraction free LU factoring and for QR factoring. The formats and the algorithms used to obtain them are valid for any matrix system in which the entries are taken from an integral domain, not just for integer matrix systems. After discussing the new output format of LU factoring, the complexity analysis for the fraction free algorithm and fraction free output is given. Our new output format contains smaller entries than previously suggested forms, and it avoids the gcd computations required by some other partially fraction free computations. As applications of our fraction free algorithm and format, we demonstrate how to construct a fraction free QR factorization and how to solve linear systems within a given domain.  相似文献   

2.
QR and LU decompositions are the most important matrix decomposition algorithms. Many studies work on accelerating these algorithms by FPGA or ASIC in a case by case style. In this paper, we propose a unified framework for the matrix decomposition algorithms, combining three QR decomposition algorithms and LU algorithm with pivoting into a unified linear array structure. The QR and LU decomposition algorithms exhibit the same two-level loop structure and the same data dependency. Utilizing the similarities in loop structure and data dependency of matrix decomposition, we unify a fine-grained algorithm for all four matrix decomposition algorithms. Furthermore, we present a unified co-processor structure with a scalable linear array of processing elements (PEs), in which four types of PEs are same in the structure of memory channels and PE connections, but the only difference exists in the internal structure of data path. Our unified co-processor, which is IEEE 32-bit floating-point precision, is implemented and mapped onto a Xilinx Virtex5 FPGA chip. Experimental results show that our co-processors can achieve speedup of 2.3 to 14.9 factors compared to a Pentium Dual CPU with double SSE threads.  相似文献   

3.
基于有限元总刚矩阵的大规模稀疏性、对称性等特性,采用全稀疏存储结构以及最小填入元算法,使得计算机的存储容量达到最少。为了节省计算机的运算时间,对总刚矩阵进行符号LU分解方法,大大减少了数值求解过程中的数据查询。这种全稀疏存储结构和符号LU分解相结合的求解方法,使大规模稀疏线性化方程组的求解效率大大提高。数值算例证明该算法在时间和存贮上都较为占优,可靠高效,能够应用于有限元线性方程组的求解。  相似文献   

4.
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for matrices of arbitrary shapes on the CRAY-2 and CRAY Y-MP supercomputers. Several techniques have been used to reduce the scratch space requirement for this algorithm while simultaneously preserving a high level of performance. When the resulting Strassen-based matrix multiply routine is combined with some routines from the new LAPACK library, LU decomposition can be performed with rates significantly higher than those achieved by conventional means. We succeeded in factoring a 2048 × 2048 matrix on the CRAY Y-MP at a rate equivalent to 325 MFLOPS.This work is supported through NASA Contract NAS 2-12961.  相似文献   

5.
A new algorithm is presented for computing the solution of a Hankel system with integer entries by means of structured matrix techniques. By combining subresultant theory and factorization properties of Hankel matrices, we prove that this algorithm has a Boolean sequential cost which is almost optimal among the algorithms based on fast integer LU factorization.  相似文献   

6.
LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block‐partitioned algorithms in order to perform matrix–matrix operations, that is, to obtain the highest performance by maximizing reuse of data in the upper levels of memory, such as cache. Since parallel computers have different performance ratios of computation and communication, the optimal computational block sizes are different from one another in order to generate the maximum performance of an algorithm. Therefore, the ata matrix should be distributed with the machine specific optimal block size before the computation. Too small or large a block size makes achieving good performance on a machine nearly impossible. In such a case, getting a better performance may require a complete redistribution of the data matrix. In this paper, we present parallel LU, QR, and Cholesky factorization routines with an ‘algorithmic blocking’ on two‐dimensional block cyclic data distribution. With the algorithmic blocking, it is possible to obtain the near optimal performance irrespective of the physical block size. The routines are implemented on the Intel Paragon and the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

7.
韩林  张春海  徐建良 《计算机科学》2016,43(Z11):520-522
针对保密性较高的工作数据或者其他原因导致的内外网物理隔离环境下数据交换困难的问题,通过对二维码生成和解析过程的研究,并且利用二维码可以携带数据、成本低和可随载体移动的特性,提出了使用二维码来解决一些特殊情况下的数据交换问题。由于单个二维码可携带数据有限,提出利用Protocol Buffer格式和LZMA压缩算法来简化和压缩需要通过二维码传输的数据,对于大型数据则采取多个二维码的组合方式传输。另外还简述了基于二维码的数据交换的应用前景。  相似文献   

8.
9.
A variant of the fraction free form of Gaussian elimination is presented. This algorithm reduces the amount of arithmetic involved when the matrix has many zero entries. The advantage can be great for matrices with symbolic entries (integers, polynomials, expressions in trigonometric functions, etc.). These claims are supported with some analysis and experimental data.  相似文献   

10.

One of the important design problems in systolic array processing is the development of a systematic methodology for transforming an algorithm represented in some high level constructs into a systolic architecture specified by the timing of data movement and the interconnection of processing elements such that the design requirements are satisfied. In this paper a method using the SFG (signal flow graph) of a given algorithm to design systolic arrays through graphic mapping and retiming is presented. An algorithm is first represented by a DG (dependence graph). Then the DG is mapped into an SFG by a graph projection. Cut-set retiming procedures are then applied to derive a regular localised SFG from which a systolic array design can be obtained for the given matrix examples i.e. LU and QR decompositions.  相似文献   

11.
VFP数据表中的通用型字段中可存储声音、图像、动画等类型的文件,文件添加到通用型字段中VFP提供了相应的命令格式,但将通用型字段中的文件导出恢复到原对应的格式文件却没有提供相应的命令,本文介绍了如何利用文件格式标识确定文件存储位置并导出文件的方法。  相似文献   

12.
The realization of linear time-varying systems specified by an analytic weighting pattern is approached in a novel manner using an algebraic framework defined over the ring of analytic functions. Realizations are given by a state representation consisting of a first-order vector differential equation and an output equation, both with analytic coefficients. Various new criteria for realizability are derived, including conditions given in terms of the finiteness of modules over the ring of analytic functions generated by the elementary rows or columns of a (generalized) Hankel matrix. These results are related to local criteria for realizability specified in terms of the rank of matrix functions, as developed in the work of Silverman and Meadows [5], [8], [9] and Kalman [7]. It is shown that the construction of minimal realizations reduces to the problem of computing a basis for a finite free module defined over the ring of analytic functions. A minimal realization algorithm is then derived using a constructive procedure for computing bases for finite free modules over a Bezout domain. The Silverman-Meadows realization algorithm [5] is a special case of the procedure given here. In the last part of the paper, the realization algorithm is applied to the problem of system reduction.  相似文献   

13.
By a tensor problem in general, we mean one where all the data on input and output are given (exactly or approximately) in tensor formats, the number of data representation parameters being much smaller than the total amount of data. For such problems, it is natural to seek for algorithms working with data only in tensor formats maintaining the same small number of representation parameters—by the price of all results of computation to be contaminated by approximation (recompression) to occur in each operation. Since approximation time is crucial and depends on tensor formats in use, in this paper we discuss which are best suitable to make recompression inexpensive and reliable. We present fast recompression procedures with sublinear complexity with respect to the size of data and propose methods for basic linear algebra operations with all matrix operands in the Tucker format, mostly through calls to highly optimized level-3 BLAS/LAPACK routines. We show that for three-dimensional tensors the canonical format can be avoided without any loss of efficiency. Numerical illustrations are given for approximate matrix inversion via proposed recompression techniques.   相似文献   

14.
In solving application problems,many large-scale nonlinear systems of equaions result in sparse Jacobian matrices.Such nonlinear systems are called sparse nonlinear systems.The irregularity of the locations of nonzrero elements of a general sparse matrix makes it very difficult to generally map sparse matrix computations to multiprocessors for parallel processing in a well balanced manner.To overcome this difficulty,we define a new storage scheme for general sparse matrices in this paper,With the new storage scheme,we develop parallel algorithms to solve large-scale general sparse systems of equations by interval Newton/Generalized bisection methods which reliably find all numerical solutions within a given domain.I n Section 1,we provide an introduction to the addressed problem and the interval Newton‘s methods.In Section 2,some currently used storage schemes for sparse systems are reviewed.In Section 3,new index schemes to store general sparse matrices are reported.In Section 4,we present a parallel algorithm to evaluate a general sparse Jacobian matrix.In Section 5,we present a parallel algorithm to solve the corresponding interval linear system by the all-row preconditioned scheme.Conclusions and future work are discussed in Section 6.  相似文献   

15.
The paper presents a general purpose free format input data system. This free format input data system is designed and presented as an interface between the user and the parent program. The system converts user's free format input data set into the fixed format data set of the parent program. Its features include data entries free of format specifications, appearance of leading, trailing and embedded blanks in a data line, horizontal repetition of data entries, vertical repetition of data entries, data group identification, comment data lines, tab settings, i.e. skipping to predetermined fields, elimination of group terminators and teletype compatibility. Organization of the program is described in detail. Full notation, sample example problem—its input and program output, program listing with comment cards and its implementation details are also provided. This input system is ideally suited as a “front end” for a general purpose finite element program. The system has been in use with the “NISA” finite element computer program since early 1976.  相似文献   

16.
Visual formats have advanced beyond single‐view images and videos: 3D movies are commonplace, researchers have developed multi‐view navigation systems, and VR is helping to push light field cameras to mass market. However, editing tools for these media are still nascent, and even simple filtering operations like color correction or stylization are problematic: naively applying image filters per frame or per view rarely produces satisfying results due to time and space inconsistencies. Our method preserves and stabilizes filter effects while being agnostic to the inner working of the filter. It captures filter effects in the gradient domain, then uses input frame gradients as a reference to impose temporal and spatial consistency. Our least‐squares formulation adds minimal overhead compared to naive data processing. Further, when filter cost is high, we introduce a filter transfer strategy that reduces the number of per‐frame filtering computations by an order of magnitude, with only a small reduction in visual quality. We demonstrate our algorithm on several camera array formats including stereo videos, light fields, and wide baselines.  相似文献   

17.
In this note we present a constructive technique for obtaining polynomial matrix fraction representations for reachable linear systems over a principal ideal domain. An example is given to illustrate the algorithm.  相似文献   

18.
对于MIMO系统而言,其复杂度主要集中在检测上。在分析现有MIMO检测算法的基础上,给出了一种新的基于Givens旋转的排序QR分解检测器的实现方法。该方法通过将复数域的矩阵进行对称变换,转化到实数域进行QR分解,大大降低了运算量。依据上述方法,给出了硬件实现流程和模块结构图。软件仿真和硬件实现结果表明该检测方法在保证检测性能的基础上,大大降低了其硬件实现的复杂度,节省了FPGA中宝贵的乘法器资源和逻辑资源,保证了后续MIMO原理样机的研制。  相似文献   

19.
基于GPU的高性能稀疏矩阵向量乘及CG求解器优化   总被引:1,自引:1,他引:0  
以有限元/有限差分等为代表的一类数值方法,其总体矩阵常常具有“带状”、稀疏的特点。针对“带状”稀疏矩阵,提出和实现了一种高效的矩阵向量乘存储格式和算法“bDIA"。基于nVidia的GTX280系列GPU对其进行了测试,结果显示:与CUSP支持的5种常见稀疏矩阵存储格式和算法相比较,所提出的bDIA格式以及相应的spMV算法的单双精度浮点效率均可以提高1倍以上,并突破了该系列GPU在spMV计算时4%的单精度浮点效率上限和22.2%的双精度浮点效率上限;应用于共扼梯度(CG)与稳定双共扼梯度(BiCGStab)求解器,相对于DIA格式均有1.5倍左右的加速。  相似文献   

20.
We present a lattice algorithm specifically designed for some classical applications of lattice reduction. The applications are for lattice bases with a generalized knapsack-type structure, where the target vectors have bounded depth. For such applications, the complexity of the algorithm improves traditional lattice reduction by replacing some dependence on the bit-length of the input vectors by some dependence on the bound for the output vectors. If the bit-length of the target vectors is unrelated to the bit-length of the input, then our algorithm is only linear in the bit-length of the input entries, which is an improvement over the quadratic complexity floating-point LLL algorithms. To illustrate the usefulness of this algorithm we show that a direct application to factoring univariate polynomials over the integers leads to the first complexity bound improvement since 1984. A second application is algebraic number reconstruction, where a new complexity bound is obtained as well.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号