Similar Articles (20 results)
1.
In this work, efficient algorithms for sparse computations (reordering algorithms, storage schemes, symbolic factorization, master degree-of-freedom, L1D1U numerical factorization, forward and backward solutions) are developed and integrated into the proposed procedures. In order to exploit fast saxpy operations offered by many vector computers, take advantage of available cache in many workstations, and minimize data movements into fast memory, special storage schemes are designed to store the (unsymmetrical) coefficient matrix. Thus, the upper triangular portion of the coefficient matrix is stored in a compressed sparse row format, while the lower triangular portion of the same matrix is stored in a compressed sparse column format. A reordering algorithm is applied to one portion of the matrix to minimize fill-ins. Unsymmetrical matrix–vector multiplication has also been computed sparsely for “error-norm check” purposes. The entire set of sparse procedures has been coded in standard Fortran.
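The split layout described above can be illustrated with a short C sketch — a minimal illustration, not the authors' Fortran code; the struct and its field names are invented for the example, and the diagonal is assumed to be held in a separate dense array.

```c
/* Hypothetical container for an n-by-n unsymmetric sparse matrix:
 * diagonal stored densely, strict upper triangle in CSR,
 * strict lower triangle in CSC. */
typedef struct {
    int     n;
    double *diag;                     /* n diagonal entries            */
    int    *u_rowptr, *u_colidx;      /* CSR of the strict upper part  */
    double *u_val;
    int    *l_colptr, *l_rowidx;      /* CSC of the strict lower part  */
    double *l_val;
} SplitMatrix;

/* y = A*x using both halves; the CSR half is traversed by rows and the
 * CSC half by columns, so both loops stream through memory contiguously. */
void split_spmv(const SplitMatrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n; ++i)
        y[i] = A->diag[i] * x[i];

    for (int i = 0; i < A->n; ++i)                    /* upper triangle */
        for (int p = A->u_rowptr[i]; p < A->u_rowptr[i + 1]; ++p)
            y[i] += A->u_val[p] * x[A->u_colidx[p]];

    for (int j = 0; j < A->n; ++j)                    /* lower triangle */
        for (int p = A->l_colptr[j]; p < A->l_colptr[j + 1]; ++p)
            y[A->l_rowidx[p]] += A->l_val[p] * x[j];  /* saxpy-style update */
}
```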

2.
We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k.
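As a point of reference for the operation being analyzed, the following C sketch computes y := Ax from kN coordinate (row, column, value) triples. It is only meant to fix notation; the paper's contribution is the external-memory I/O analysis, not the in-memory kernel shown here.

```c
#include <stddef.h>

/* y := A*x for a sparse N-by-N matrix given as nnz = kN coordinate triples.
 * In the external-memory setting the triples would be streamed from disk;
 * here they are simply read from in-memory arrays. */
void coo_spmv(size_t nnz, const int *row, const int *col, const double *val,
              const double *x, double *y, int N)
{
    for (int i = 0; i < N; ++i)
        y[i] = 0.0;
    for (size_t t = 0; t < nnz; ++t)
        y[row[t]] += val[t] * x[col[t]];
}
```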

3.
In this paper we show how to utilize Java's native arrays for matrix computations. The disadvantages of Java arrays used as a 2D array for dense matrix computation are discussed and ways to improve the performance are examined. We show how to create efficient dynamic data structures for sparse matrix computations using Java's native arrays. This data structure is unique for Java and shown to be more dynamic and efficient than the traditional storage schemes for large sparse matrices. Numerical testing indicates that this new data structure, called Java Sparse Array, is competitive with the traditional Compressed Row Storage scheme on matrix computation routines. Java gives increased flexibility without losing efficiency. Compared with other object-oriented data structures Java Sparse Array is shown to have the same flexibility. Copyright © 2004 John Wiley & Sons, Ltd.
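The Java Sparse Array keeps a separate index array and value array for every row, so a single row can grow or be swapped without rebuilding the whole matrix. The C sketch below mimics that jagged, row-wise layout (the paper itself uses Java's native arrays of arrays); the type and function names are illustrative.

```c
#include <stdlib.h>

/* Row-wise jagged layout: each row owns its own index and value arrays,
 * so a single row can be reallocated without touching the others. */
typedef struct {
    int      n;       /* number of rows                   */
    int     *len;     /* nonzeros in each row             */
    int    **col;     /* col[i] has len[i] column indices */
    double **val;     /* val[i] has len[i] values         */
} JaggedSparse;

/* y = A*x over the jagged structure. */
void jagged_spmv(const JaggedSparse *A, const double *x, double *y)
{
    for (int i = 0; i < A->n; ++i) {
        double s = 0.0;
        for (int k = 0; k < A->len[i]; ++k)
            s += A->val[i][k] * x[A->col[i][k]];
        y[i] = s;
    }
}

/* Replacing a row only reallocates that row's two arrays. */
void jagged_set_row(JaggedSparse *A, int i, int nnz,
                    const int *cols, const double *vals)
{
    free(A->col[i]);
    free(A->val[i]);
    A->col[i] = malloc(nnz * sizeof(int));
    A->val[i] = malloc(nnz * sizeof(double));
    for (int k = 0; k < nnz; ++k) {
        A->col[i][k] = cols[k];
        A->val[i][k] = vals[k];
    }
    A->len[i] = nnz;
}
```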

4.
In earlier work we have introduced the “Recursive Sparse Blocks” (RSB) sparse matrix storage scheme oriented towards cache efficient matrix–vector multiplication (SpMV) and triangular solution (SpSV) on cache based shared memory parallel computers. Both the transposed (SpMV_T) and symmetric (SymSpMV) matrix–vector multiply variants are supported. RSB stands for a meta-format: it recursively partitions a rectangular sparse matrix in quadrants; leaf submatrices are stored in an appropriate traditional format — either Compressed Sparse Rows (CSR) or Coordinate (COO). In this work, we compare the performance of our RSB implementation of SpMV, SpMV_T, SymSpMV to that of the state-of-the-art Intel Math Kernel Library (MKL) CSR implementation on Intel's recent Sandy Bridge processor. Our results on a few dozen large real-world matrices suggest the efficiency of the approach: in all of the cases, RSB's SymSpMV (and in most cases, SpMV_T as well) took less than half of MKL CSR's time; SpMV's advantage was smaller. Furthermore, RSB's SpMV_T is more scalable than MKL's CSR, in that it performs almost as well as SpMV. Additionally, we include comparisons to a state-of-the-art Compressed Sparse Blocks (CSB) implementation. We observed RSB to be slightly superior to CSB in SpMV_T, slightly inferior in SpMV, and better (in most cases by a factor of two or more) in SymSpMV. Although RSB is a non-traditional storage format and thus needs a special constructor, it can be assembled from CSR or any other similar row-ordered representation in the time of a few dozen matrix–vector multiply executions. Thanks to its significant advantage over MKL's CSR routines for symmetric or transposed matrix–vector multiplication, in most of the observed cases the assembly cost has been observed to amortize within fewer than fifty iterations.
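The recursive quadrant structure can be sketched in a few lines of C. This is a simplified, serial illustration of the idea — COO-only leaves, no CSR leaves, no compact index types, and no parallel scheduling, all of which the real RSB implementation provides.

```c
#include <stddef.h>

/* Simplified RSB-style node: either four quadrant children or a COO leaf. */
typedef struct RsbNode {
    struct RsbNode *child[4];   /* NW, NE, SW, SE; all NULL for a leaf */
    int   nnz;                  /* leaf: number of stored entries      */
    int  *row, *col;            /* leaf: indices local to the quadrant */
    double *val;
    int   row_off, col_off;     /* quadrant origin in the full matrix  */
} RsbNode;

/* y += A*x by descending the quadrant tree; leaves use plain COO.
 * The caller is expected to zero y before the first call. */
void rsb_spmv(const RsbNode *node, const double *x, double *y)
{
    if (node == NULL)
        return;
    if (node->child[0] || node->child[1] || node->child[2] || node->child[3]) {
        for (int q = 0; q < 4; ++q)        /* internal node: recurse */
            rsb_spmv(node->child[q], x, y);
        return;
    }
    for (int t = 0; t < node->nnz; ++t)    /* leaf: local COO product */
        y[node->row_off + node->row[t]] +=
            node->val[t] * x[node->col_off + node->col[t]];
}
```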

5.
The solution of a large class of problems requires the repeated evaluation of matrix vector products: y = Ax. An appropriate data decomposition and communications system to exchange x components among processors is necessary for efficient evaluation of these vector products on an MIMD concurrent computer. A communications system is presented for the case of a sparse matrix A that arises from a finite element or finite difference discretization of a partial differential equation on an irregular region, or from some kind of finite range interaction between particles. The method presented here uses a domain decomposition of the physical space to distribute A and x among processors. A packed form of the matrix is used which turns out to be very convenient for setting up the data structures necessary to send and receive the extra x components. The resulting communications scheme has been used in a multigrid solver for finite element static elasticity problems and in a program which solves an eigenvalue problem. Speed-up factors were determined on a 32 processor Caltech/JPL Mark II hypercube with good results. The communications system is not hypercube specific and can easily be implemented on other types of MIMD parallel computers.
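The kind of per-neighbour bookkeeping such a scheme needs can be sketched as follows; the structure and names are assumptions made for illustration, and the paper's actual packed matrix format and hypercube message layer are not reproduced here.

```c
/* Per-neighbour communication lists used before the local product y = A*x.
 * send_idx lists the locally owned components of x a neighbour needs;
 * received components are appended to x_local as "ghost" entries. */
typedef struct {
    int  nbr;        /* neighbour processor id                        */
    int  nsend;      /* number of x components to send to nbr         */
    int *send_idx;   /* local indices of those components             */
    int  nrecv;      /* number of ghost components received from nbr  */
    int  recv_off;   /* where they start in the ghost part of x_local */
} ExchangeList;

/* Pack the outgoing x components for one neighbour into a message buffer;
 * the matching unpack on the receiving side is a plain contiguous copy
 * into x_local[nown + recv_off ..]. */
void pack_send_buffer(const ExchangeList *ex, const double *x_local, double *buf)
{
    for (int k = 0; k < ex->nsend; ++k)
        buf[k] = x_local[ex->send_idx[k]];
}
```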

6.
The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing the performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors. SpMV implementations for NVIDIA GPUs have already appeared on the scene. This work proposes and evaluates a new implementation of SpMV for NVIDIA GPUs based on a new format, ELLPACK-R, that allows storage of the sparse matrix in a regular manner. A comparative evaluation against a variety of storage formats previously proposed has been carried out based on a representative set of test matrices. The results show that, although the performance strongly depends on the specific pattern of the matrix, the implementation based on ELLPACK-R achieves higher overall performance. Moreover, a comparison with standard state-of-the-art superscalar processors reveals that significant speedup factors are achieved with GPUs. Copyright © 2010 John Wiley & Sons, Ltd.
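ELLPACK-R pads every row to a common width but keeps a separate row-length array so each thread stops at its row's true length, and the padded arrays are stored column-major so that threads handling consecutive rows read consecutive addresses. The serial C sketch below mirrors that layout, with the outer loop standing in for one GPU thread per row; it illustrates the format and is not the authors' CUDA kernel.

```c
/* ELLPACK-R layout for an n-row matrix padded to a common width:
 * val and col are stored column-major, so entry k of row i sits at
 * index k*n + i; rl[i] is the true number of nonzeros in row i. */
void ellr_spmv(int n, const double *val, const int *col,
               const int *rl, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {        /* on the GPU: one thread per row */
        double s = 0.0;
        for (int k = 0; k < rl[i]; ++k)  /* stop at the real row length    */
            s += val[k * n + i] * x[col[k * n + i]];
        y[i] = s;
    }
}
```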

7.
This work studies the solution of large sparse linear systems on GPUs and proposes a blocked sparse-matrix storage format, HMEC (hybrid multiple ELL and CSR). The storage structure of the coefficient matrix is optimized by reordering, and the matrix is partitioned into blocks in a given ratio; ELL and CSR storage are combined so that each block is stored in the format that matches its characteristics. Large sparse linear systems are then solved with an incomplete-LU-preconditioned BiCGStab method for unsymmetric matrices and an incomplete-Cholesky-preconditioned conjugate gradient method for symmetric positive definite matrices. Experiments show that, compared with implementations based on other storage formats, storing the sparse matrix in HMEC format and implementing the two methods as GPU kernels yields speedups of up to 31.89% and 17.50%, respectively.
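A hedged C sketch of the block-wise mixed storage idea: each block of rows carries a format tag and is processed by the matching ELL or CSR loop. The abstract does not give the exact partitioning rule or the GPU kernels, so the structure and names below are illustrative only.

```c
/* One block of rows stored either in ELL or CSR form; HMEC's mixing of
 * formats is reduced here to a per-block tag and a dispatch loop. */
typedef enum { BLK_ELL, BLK_CSR } BlockFormat;

typedef struct {
    BlockFormat fmt;
    int row_begin, row_end;      /* half-open row range of the block */
    /* ELL payload (row-major here for simplicity) */
    int ell_width;
    const double *ell_val;       /* (row_end - row_begin) * ell_width */
    const int    *ell_col;       /* padded entries carry col = -1     */
    /* CSR payload */
    const int    *csr_rowptr;    /* indexed by block-local row        */
    const int    *csr_col;
    const double *csr_val;
} Block;

void hybrid_spmv(const Block *blocks, int nblocks, const double *x, double *y)
{
    for (int b = 0; b < nblocks; ++b) {
        const Block *B = &blocks[b];
        for (int i = B->row_begin; i < B->row_end; ++i) {
            int li = i - B->row_begin;   /* row index local to the block */
            double s = 0.0;
            if (B->fmt == BLK_ELL) {
                for (int k = 0; k < B->ell_width; ++k) {
                    int c = B->ell_col[li * B->ell_width + k];
                    if (c >= 0)
                        s += B->ell_val[li * B->ell_width + k] * x[c];
                }
            } else {
                for (int p = B->csr_rowptr[li]; p < B->csr_rowptr[li + 1]; ++p)
                    s += B->csr_val[p] * x[B->csr_col[p]];
            }
            y[i] = s;
        }
    }
}
```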

8.
Many high performance computing applications require computing both the sparse matrix-vector product (SMVP) and the sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such a circumstance, it is critical to maintain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse block (CSB) format proposed by Buluç et al. allows computing both problems on multi-core CPUs with nearly identical throughputs. On the other hand, a direct porting of CSB to graphics processing units (GPUs), which have been recently recognized as a powerful general purpose computing platform, turns out to be inefficient. In this work, we propose a new data structure, designated as expanded CSB (eCSB), to minimize the throughput gap between SMVP and SMTVP computations on GPUs, while at the same time enabling a high computing throughput. We also use a hybrid storage format to store the elements in each block, which can be selected dynamically at runtime. Experimental results show that the proposed techniques implemented on a Kepler GPU deliver similar throughput for both SMVP and SMTVP, and that this throughput is up to 13 times higher than that of the CPU-based CSB implementation. In addition, our eCSB procedure outperforms previous GPU results by up to 188% and 914% in computing SMVP and SMTVP, respectively, and we validate the effectiveness of eCSB by means of the wall-clock time of a biconjugate gradient solver: our eCSB is 25% faster than Compressed Sparse Row (CSR) and 6% faster than HYB. Copyright © 2014 John Wiley & Sons, Ltd.
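The throughput gap the paper targets comes from the asymmetry of the two products over a row-oriented format: the forward product gathers into one output per row, while the transposed product scatters into many, which a parallel implementation must resolve with atomics or a second, transposed copy of the matrix. The serial C sketch below shows both products from one CSR structure to make that asymmetry concrete; it is not the CSB or eCSB code.

```c
/* Forward product y = A*x over CSR: one private accumulator per row. */
void csr_spmv(int n, const int *rowptr, const int *col, const double *val,
              const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
            s += val[p] * x[col[p]];
        y[i] = s;
    }
}

/* Transposed product y = A^T*x over the same CSR: scattered updates,
 * so different rows may write the same output entry when run in parallel. */
void csr_spmv_transposed(int n, const int *rowptr, const int *col,
                         const double *val, const double *x, double *y)
{
    for (int j = 0; j < n; ++j)
        y[j] = 0.0;
    for (int i = 0; i < n; ++i)
        for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
            y[col[p]] += val[p] * x[i];
}
```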

9.
A computer program, MRF, has been designed to generate all possible equations for chemical reactions between a chosen set of phases. All subsets of the phases are chosen and arranged into an equation and balanced using the formula Ax = b. A is an m × n matrix of rank n composed of n compositional vectors, b is an m-dimensional vector, and x is an (n + 1)-dimensional vector of reaction coefficients. An m × n matrix also allows the solution to be independent of the components chosen. Provision is made for the exclusion of up to 5-phase assemblages.

10.
The open-source instruction set architecture RISC-V offers high performance, modularity, simplicity, and extensibility, and is increasingly used in fields such as the Internet of Things and cloud computing; its vector extension, the V module, provides good support for numerical matrix computation. Sparse matrix-vector multiplication (SpMV), an important component of numerical matrix computation, is of significant research interest. Exploiting the vector configurability and addressing features of the RISC-V instruction set, implementations based on CSR and ELLPA…

11.
12.
《Parallel Computing》1988,6(2):165-184
The multiplication of a vector by a matrix and the solution of triangular linear systems are the most demanding operations in the majority of iterative techniques for the solution of linear systems. Data-driven VLSI networks that perform these two operations efficiently for certain sparse matrices are introduced. In order to avoid computations that involve zero operands, the non-zero elements in a sparse matrix are organized in the form of non-overlapping stripes, and only the elements within the stripe structure of the matrix are manipulated. Detailed analysis of the networks proves that both operations may be completed in n global cycles with minimal communication overhead, where n is the order of the linear system. The number of cells in each network, as well as the communication overhead, are determined by the stripe structure of the matrix. Different stripe structures for the class of sparse matrices generated in Finite Element Analysis are examined in a separate paper.
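One simple way to realize a stripe-based product in software is sketched below, under the assumption (made only for this sketch) that a stripe holds at most one stored element per row; the paper's precise stripe definition and its mapping onto the VLSI networks are not reproduced here.

```c
/* One "stripe": at most one stored element per row (an assumption for
 * this sketch).  col[i] is the column of row i's element in this stripe,
 * or -1 if the stripe has no element in that row. */
typedef struct {
    int     n;        /* number of rows                    */
    int    *col;      /* length n; -1 marks an empty slot  */
    double *val;      /* length n; ignored where col == -1 */
} Stripe;

/* Accumulate y += A*x stripe by stripe; only entries inside the stripe
 * structure are touched, so zero operands are never multiplied.
 * The caller is expected to zero y first. */
void stripes_spmv(const Stripe *stripes, int nstripes,
                  const double *x, double *y)
{
    for (int s = 0; s < nstripes; ++s)
        for (int i = 0; i < stripes[s].n; ++i)
            if (stripes[s].col[i] >= 0)
                y[i] += stripes[s].val[i] * x[stripes[s].col[i]];
}
```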

13.
Interval Newton/Generalized Bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the corresponding interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval Jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices. We then present an iterative interval linear solver that utilizes the proposed index storage scheme. It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance of interval Newton/Generalized Bisection methods when the Jacobian matrices are sparse. In Section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce a new index scheme to store general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.
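A hedged C sketch of the ingredients such a solver manipulates: an interval type with its add and multiply operations, and sparse storage holding interval entries. The abstract does not spell out the proposed indexed scheme, so the sketch falls back to a plain coordinate layout; what it illustrates is only the interval arithmetic and the "operate on stored entries only" idea.

```c
/* Closed interval [lo, hi]; interval arithmetic is what an interval
 * Newton step manipulates instead of point values. */
typedef struct { double lo, hi; } Interval;

static Interval iv_add(Interval a, Interval b)
{
    Interval r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

static Interval iv_mul(Interval a, Interval b)
{
    /* the product interval is spanned by the four endpoint products */
    double p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    Interval r = { p[0], p[0] };
    for (int k = 1; k < 4; ++k) {
        if (p[k] < r.lo) r.lo = p[k];
        if (p[k] > r.hi) r.hi = p[k];
    }
    return r;
}

/* Sparse interval matrix in a simple coordinate layout; the paper's
 * general indexed scheme differs, but the point is the same: only the
 * stored interval entries take part in the product y = A*x. */
typedef struct {
    int nnz;
    int *row, *col;
    Interval *val;
} SparseIntervalMatrix;

void interval_spmv(const SparseIntervalMatrix *A, const Interval *x,
                   Interval *y, int n)
{
    for (int i = 0; i < n; ++i) { y[i].lo = 0.0; y[i].hi = 0.0; }
    for (int t = 0; t < A->nnz; ++t)
        y[A->row[t]] = iv_add(y[A->row[t]], iv_mul(A->val[t], x[A->col[t]]));
}
```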

14.
In this paper, we present a novel volumetric mesh representation suited for parallel computing on modern GPU architectures. The data structure is based on a compact, ternary sparse matrix storage of boundary operators. Boundary operators correspond to the first-order top-down relations of k-faces to their (k − 1)-face facets. The compact, ternary matrix storage format is based on compressed sparse row matrices with signed indices and allows for efficient parallel computation of indirect and bottom-up relations. This representation is then used in the implementation of several parallel volumetric mesh algorithms including Laplacian smoothing and volumetric Catmull-Clark subdivision. We compare these algorithms with their counterparts based on OpenVolumeMesh and achieve speedups from 3× to 531×, for sufficiently large meshes, while reducing memory consumption by up to 36%.
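Because a boundary operator only has coefficients in {−1, 0, +1}, the sign of each coefficient can be folded into the stored facet index, which is what "compressed sparse row matrices with signed indices" suggests. The C sketch below shows one possible encoding (indices shifted by one so the sign of index 0 is never needed); the paper's actual encoding and its GPU kernels may differ.

```c
/* Boundary operator stored as CSR whose column entries carry the sign of
 * the +/-1 coefficient in the index itself: +(j+1) or -(j+1) refers to
 * facet j with positive or negative orientation. Rows are k-faces. */
typedef struct {
    int  nk;          /* number of k-faces (rows)   */
    int *rowptr;      /* CSR row pointers           */
    int *signed_col;  /* signed, one-based facet id */
} BoundaryCSR;

/* Boundary of a k-chain, accumulated facet by facet (a transposed product
 * over this row-wise storage), without a separate value array. */
void apply_boundary(const BoundaryCSR *B, const double *chain_in,
                    double *chain_out, int n_facets)
{
    for (int j = 0; j < n_facets; ++j)
        chain_out[j] = 0.0;
    for (int i = 0; i < B->nk; ++i)
        for (int p = B->rowptr[i]; p < B->rowptr[i + 1]; ++p) {
            int s = B->signed_col[p];
            if (s > 0) chain_out[s - 1]  += chain_in[i];
            else       chain_out[-s - 1] -= chain_in[i];
        }
}
```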

15.
In solving application problems, many large-scale nonlinear systems of equations result in sparse Jacobian matrices. Such nonlinear systems are called sparse nonlinear systems. The irregularity of the locations of nonzero elements of a general sparse matrix makes it very difficult to map sparse matrix computations to multiprocessors for parallel processing in a well-balanced manner. To overcome this difficulty, we define a new storage scheme for general sparse matrices in this paper. With the new storage scheme, we develop parallel algorithms to solve large-scale general sparse systems of equations by interval Newton/Generalized Bisection methods, which reliably find all numerical solutions within a given domain. In Section 1, we provide an introduction to the addressed problem and interval Newton's methods. In Section 2, some currently used storage schemes for sparse systems are reviewed. In Section 3, new index schemes to store general sparse matrices are reported. In Section 4, we present a parallel algorithm to evaluate a general sparse Jacobian matrix. In Section 5, we present a parallel algorithm to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.

16.
尹孟嘉  许先斌  何水兵  胡婧  叶从欢  张涛 《计算机科学》2017,44(4):182-187, 206
Sparse matrix-vector multiplication (SpMV) is a basic operation widely used in large-scale linear system solvers and matrix eigenvalue computations, but it often becomes the bottleneck of iterative methods and limits overall algorithm performance. For matrices of different shapes, the choice of storage format can have a large impact on the performance of the corresponding algorithm. Through experimental analysis, we identify how performance varies for different matrix shapes under different storage structures and build an effective performance model that guides the estimation of sparse matrix computation cost and the selection of a suitable storage format. On 14 groups of test cases in the CSR, COO, and HYB formats and 8 groups in the ELL format, the difference between the performance model's predictions and the measurements is below 9%.
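As a hedged illustration of the kind of input such a performance model consumes, the C sketch below extracts simple row-length statistics from a CSR row-pointer array. The actual feature set and the model are not given in the abstract; this is only an example of shape features, not the paper's model.

```c
/* Simple shape features of a sparse matrix, computed from its CSR row
 * pointers.  A format-selection or performance model would take
 * statistics of this kind as input. */
typedef struct {
    double mean_nnz_per_row;
    double var_nnz_per_row;
    int    max_nnz_per_row;
} MatrixFeatures;

MatrixFeatures row_features(int n, const int *rowptr)
{
    MatrixFeatures f = { 0.0, 0.0, 0 };
    for (int i = 0; i < n; ++i) {
        int len = rowptr[i + 1] - rowptr[i];
        f.mean_nnz_per_row += len;
        if (len > f.max_nnz_per_row)
            f.max_nnz_per_row = len;
    }
    f.mean_nnz_per_row /= n;
    for (int i = 0; i < n; ++i) {
        double d = (rowptr[i + 1] - rowptr[i]) - f.mean_nnz_per_row;
        f.var_nnz_per_row += d * d;
    }
    f.var_nnz_per_row /= n;
    return f;
}
```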

17.
In many scientific applications, dynamic data redistribution is used to enhance algorithm performance and achieve data locality in parallel programs on distributed memory multi-computers. In this paper, we present a new method, the Compressed Diagonals Remapping (CDR) technique, which aims at efficient runtime data redistribution of banded sparse matrices. The main idea of the proposed technique is first to compress the source matrix into a Compressed Diagonal Matrix (CDM) form. Based on the compressed diagonal matrix, a one-dimensional local and global index transformation method can be applied to carry out data redistribution on the compressed diagonal matrix. This process is equivalent to redistributing the data of the two-dimensional banded sparse matrix. The CDR technique uses an efficient one-dimensional indexing scheme to perform data redistribution on a banded sparse matrix. A significant improvement of this approach is that a processor does not need to determine the complicated sending or receiving data sets for dynamic data redistribution. The indexing cost is reduced significantly. The second advantage of the present technique is that the packing/unpacking stages become optimal, owing to the consecutive layout of column elements in a compressed diagonal matrix. Another contribution of our methods is the ability to handle sparse matrix redistribution under two disjoint processor grids in the source and destination phases. A theoretical model to analyze the performance of the proposed technique is also presented in this paper. To evaluate the performance of our methods, we have implemented the present techniques on an IBM SP2 parallel machine along with the v2m algorithm and a dense redistribution strategy. The experimental results show that our technique provides significant improvement for runtime data redistribution of banded sparse matrices in most test samples.
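The compression step itself is easy to sketch in C: a banded matrix is packed into a dense n × (bl + bu + 1) rectangle whose columns are the diagonals, after which each row is a short contiguous segment — the property the one-dimensional index transformation relies on. The bandwidth parameters and function name below are illustrative; the CDR redistribution machinery built on top of this compression is not shown.

```c
#include <stdlib.h>

/* Compressed Diagonal Matrix (CDM) view of a banded n-by-n matrix with
 * lower bandwidth bl and upper bandwidth bu: element A(i, j) is kept at
 * cdm[i][j - i + bl] in a dense n-by-(bl + bu + 1) array. */
double *band_to_cdm(int n, int bl, int bu, const double *A /* n*n, row-major */)
{
    int w = bl + bu + 1;
    double *cdm = calloc((size_t)n * w, sizeof(double));
    for (int i = 0; i < n; ++i) {
        int jlo = (i - bl > 0)     ? i - bl : 0;
        int jhi = (i + bu < n - 1) ? i + bu : n - 1;
        for (int j = jlo; j <= jhi; ++j)
            cdm[i * w + (j - i + bl)] = A[i * n + j];
    }
    return cdm;   /* caller frees; rows of cdm are contiguous segments */
}
```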

18.
《Parallel Computing》1988,7(1):111-130
The principal theme herein is the direct hardware implementation on a special-purpose network of processors, the Wavefront Array Processor (WAP), of an alternate matrix procedure for the solution of linear systems Ax = b, where A is a compact dense (n × n) matrix. The method is based on the factorization of the coefficient matrix into components which are of ‘butterfly’ form, i.e., interlocking matrix quadrants, and for its implementation the concept of computational ‘dewavefronts’ is investigated.

19.
The recently introduced circulant block-factorization preconditioners are studied in this paper. The general approach is first formulated for the case of block-tridiagonal sparse matrices. Then an estimate of the condition number of the preconditioned matrix for a model anisotropic Dirichlet boundary value problem is derived in the form κ < √2ε(n+1) + 2, where N = n² is the size of the discrete problem, and ε stands for the ratio of the anisotropy. Various numerical tests demonstrating the behavior of the circulant block-factorization preconditioners for anisotropic problems are presented.

20.
Let X and Y be generic n by n matrices of indeterminates. Let S = k[x1, …, xr, y1, …, yr], where k is a field of characteristic 0 and r = n². Let I ⊂ S be the ideal generated by the entries of the matrix XY − YX. We will consider the ring S/I and show that it is Cohen-Macaulay for the case n = 4. In order to calculate its Groebner basis we use a product order with 3 blocks of variables and reverse lexicographic order in each block. This makes the computation much smaller and less time consuming.
