1.

High Performance Inverse Preconditioning (total citations: 1; self-citations: 0; citations by others: 1)

George A. Gravvanis 《Archives of Computational Methods in Engineering》2009,16(1):77-108

The derivation of parallel numerical algorithms for solving sparse linear systems on modern computer systems and software platforms has attracted the attention of many researchers over the years. In this paper we present an overview of the design issues of parallel approximate inverse matrix algorithms, based on an anti-diagonal “wave pattern” approach and a “fish-bone” computational procedure, for computing explicitly various families of exact and approximate inverses for solving sparse linear systems. Parallel preconditioned conjugate gradient-type schemes in conjunction with parallel approximate inverses are presented for the efficient solution of sparse linear systems. Applications of the proposed parallel methods to characteristic sparse linear systems on symmetric multiprocessor systems and distributed systems are discussed, and the parallel performance of the proposed schemes is given, using MPI, OpenMP and Java multithreading.
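
To make the idea of a conjugate gradient scheme preconditioned by an explicit approximate inverse concrete, here is a minimal serial sketch. It is not the wave-pattern or fish-bone algorithm from the paper: the approximate inverse `M` is simply the inverse of the diagonal of `A` (an assumption chosen for brevity), and the names `matvec`, `dot` and `pcg` are illustrative. The point it shows is that, with an explicit `M`, preconditioning reduces to a matrix-vector product.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, M, tol=1e-10, max_iter=200):
    """Preconditioned CG for symmetric positive definite A.

    M is an explicit approximation of the inverse of A, so applying
    the preconditioner is just z = M r (no triangular solves).
    """
    x = [0.0] * len(b)
    r = list(b)                    # residual b - A x for the zero initial guess
    z = matvec(M, r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = matvec(M, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small tridiagonal test matrix (1D Poisson-like), an assumed example problem.
n = 8
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
# Assumed stand-in for an approximate inverse: the inverse of the diagonal.
M = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b, M)
```

Because the two matrix-vector products per iteration are row-wise independent, this form is the part that parallel implementations distribute; the inner products remain global reductions.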

2.

Parallel, iterative solution of sparse linear systems: Models and architectures

Daniel A Reed Merrell L Patrick 《Parallel Computing》1985,2(1):45-67

Solving large, sparse, linear systems of equations is a fundamental problem in large scale scientific and engineering computation. A model of a general class of asynchronous, iterative solution methods for linear systems is developed. In the model, the system is solved by creating several cooperating tasks that each compute a portion of the solution vector. A data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented. This model is used to derive an execution time model for predicting parallel execution time and an optimal number of tasks given the dimension and sparsity of the coefficient matrix and the costs of computation, synchronization, and communication. The suitability of different parallel architectures for solving randomly sparse linear systems is discussed. Based on the complexity of task scheduling, one parallel architecture, based on a broadcast bus, is presented and analyzed.

3.

Sparse iterative algorithm software for large-scale MIMD machines: An initial discussion and implementation

John N. Shadid Ray S. Tuminaro 《Concurrency and Computation》1992,4(6):481-497

The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.

4.

Performance analysis of parallel Schwarz preconditioners in the LES of turbulent channel flows

Pasqua D’Ambra Daniela di Serafino Salvatore Filippone 《Computers & Mathematics with Applications》2013,65(3):352-361

We present a comparative study of parallel Schwarz preconditioners in the solution of linear systems arising in a Large Eddy Simulation (LES) procedure for turbulent plane channel flows. This procedure applies a time-splitting technique to suitably filtered Navier–Stokes equations, in order to decouple the continuity and momentum equations, and uses a semi-implicit scheme for time integration and finite volumes for space discretisation. This approach requires the solution of four sparse linear systems at each time step, accounting for a large part of the overall simulation; hence the linear system solvers are a crucial component in the whole procedure. Several preconditioners are applied in the simulation of a reference test case for the LES community, using discretisation grids of different sizes, with the aim of analysing the effects of different algorithmic choices defining the preconditioners, and identifying the most effective ones for the selected problem. The preconditioners, coupled with the GMRES method, are run within SParC-LES, a recently developed LES code based on the PSBLAS and MLD2P4 libraries for parallel sparse matrix computations and preconditioning.

5.

Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems

《Parallel Computing》2007,33(3):159-173

We discuss the performance of direct summation codes used in the simulation of astrophysical stellar systems on highly distributed architectures. These codes compute the gravitational interaction among stars in an exact way and have an O(N²) scaling with the number of particles. They can be applied to a variety of astrophysical problems, like the evolution of star clusters, the dynamics of black holes, the formation of planetary systems, and cosmological simulations. The simulation of realistic star clusters with sufficiently high accuracy cannot be performed on a single workstation but may be possible on parallel computers or grids. We have implemented two parallel schemes for a direct N-body code and we study their performance on general purpose parallel computers and large computational grids. We present the results of timing analyses conducted on the different architectures and compare them with the predictions from theoretical models. We conclude that the simulation of star clusters with up to a million particles will be possible on large distributed computers in the next decade. Simulating entire galaxies, however, will in addition require new hybrid methods to speed up the calculation.
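
The O(N²) scaling of direct summation comes from evaluating every pairwise interaction. A minimal serial sketch of that kernel follows; it is not either of the parallel schemes studied in the paper, and the function name `accelerations`, the unit gravitational constant and the three-body test data are assumptions made for illustration.

```python
import math


def accelerations(pos, mass, G=1.0):
    """Gravitational accelerations by direct summation over all pairs.

    Each of the N(N-1)/2 pairs is visited once and Newton's third law
    supplies the reaction term, so the cost grows as O(N^2).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc


# Assumed toy configuration: three bodies in the z = 0 plane.
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
mass = [1.0, 2.0, 3.0]
acc = accelerations(pos, mass)
```

Parallel versions typically split the outer loop over particles across processes, which is why every process needs the positions of all particles at each step.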

6.

A parallel block row-action method for solving large sparse linear systems on distributed memory multiprocessors

Marco D'apuzzo Maria Assunta De Rosa 《Concurrency and Computation》1994,6(1):69-84

Recently developed block-iterative versions of some row-action algorithms for solving general systems of sparse linear equations allow parallelism in the computations when the underlying problem is appropriately decomposed. However, problems associated with the parallel implementation of these algorithms have to be addressed. In this paper we present an implementation on distributed memory multiprocessors of a block version of the Kaczmarz row-action method. One of the main issues related to the efficient implementation of this method in a concurrent environment is to develop suitable communication schemes in order to reduce the amount of communication needed at each iteration. We propose two data distribution strategies which lead to different computation and communication schemes. To verify and compare the effectiveness of the proposed strategies, numerical experiments have been carried out on a Symult S2010 and a Meiko Computing Surface. The performance evaluation has been done using a scaled efficiency model.
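
For readers unfamiliar with row-action methods, the sketch below shows the classical (non-block, serial) Kaczmarz iteration, in which the current iterate is projected onto the hyperplane defined by one equation at a time. The block version and the data distribution strategies of the paper are not reproduced here; the function name `kaczmarz`, the sweep count and the test system are assumptions.

```python
def kaczmarz(A, b, sweeps=200):
    """Classical cyclic Kaczmarz iteration for a consistent system A x = b.

    For each row a_i the iterate is updated as
        x <- x + (b_i - a_i . x) / ||a_i||^2 * a_i,
    i.e. x is projected onto the hyperplane a_i . x = b_i.
    """
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm2 = sum(a * a for a in row)
            if norm2 == 0.0:
                continue           # skip empty rows
            step = (bi - sum(a * xj for a, xj in zip(row, x))) / norm2
            x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Assumed test system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
x = kaczmarz(A, b)
```

Each update touches only one row of the matrix, which is what makes the method attractive for sparse systems; block variants process groups of rows together so that the groups can be assigned to different processors.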

7.

Scalable plasma simulation with ELMFIRE using efficient data structures for process communication

Artur Signell Francisco Ogando Mats Aspnäs 《Computer Physics Communications》2008,179(5):330-338

We describe the parallel full-f gyrokinetic particle-in-cell plasma simulation code ELMFIRE and the issue of solving an electrostatic potential from particle data distributed across several MPI (Message Passing Interface) processes. The potential is solved through a linear system with a strongly sparse matrix and ELMFIRE stores data of the estimated non-zero diagonals of the whole matrix in every MPI process. We present and compare several memory efficient structures for gathering the matrix data while keeping only a local part of the matrix in each process. We also demonstrate that these alternative structures improve scalability, thus enabling ELMFIRE to use more MPI processes and a finer time and space scale than before without sacrificing performance.

8.

Parallel scalability analysis of the multigrid solvers in HYPRE

Xu Xiaowen, Mo Zeyao, Cao Xiaolin 《Journal of Software》2009,20(Z1):8-14

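The scalability of SMG and BoomerAMG, the multigrid solvers of the high-performance preconditioner library HYPRE, is tested and analyzed on thousands of processors of a domestically built (Chinese) large-scale parallel machine. The study yields several conclusions that are instructive for research on linear solver algorithms and for the development of parallel implementation techniques. These conclusions offer some guidance for the application and development of linear solvers in the numerical simulation of real, complex physical systems.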

9.

Multicoloring of grid-structured PDE solvers on shared-memory multiprocessors

Hwang-Cheng Wang Kai Hwang 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(11):1195-1205

In order to execute a parallel PDE (partial differential equation) solver on a shared-memory multiprocessor, we have to avoid memory conflicts in accessing multidimensional data grids. A new multicoloring technique is proposed for speeding up sparse matrix operations. The new technique enables parallel access of grid-structured data elements in the shared memory without causing conflicts. The coloring scheme is formulated as an algebraic mapping which can be easily implemented with low overhead on commercial multiprocessors. The proposed multicoloring scheme has been tested on an Alliant FX/80 multiprocessor for solving 2D and 3D problems using the CGNR method. Compared to the results reported by Saad (1989) on an identical Alliant system, our results show a factor of 30 times higher performance in Mflops. Multicoloring transforms sparse matrices into ones with a diagonal diagonal block (DDB) structure, enabling parallel LU decomposition in solving PDE problems. The multicoloring technique can also be extended to solve other scientific problems characterized by sparse matrices.
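
As a point of reference for what a coloring expressed as an algebraic mapping looks like, the sketch below uses the classic two-color (red-black) ordering for a 5-point stencil on a 2D grid. This is a textbook scheme, not the new mapping proposed in the paper; the function names and the grid size are assumptions. Points of the same color are never stencil neighbors, so all points of one color can be updated concurrently.

```python
def color(i, j, num_colors=2):
    """Algebraic color mapping; (i + j) mod 2 gives the red-black ordering."""
    return (i + j) % num_colors


def color_groups(nx, ny, num_colors=2):
    """Partition the grid points of an nx-by-ny grid by color."""
    groups = [[] for _ in range(num_colors)]
    for i in range(nx):
        for j in range(ny):
            groups[color(i, j, num_colors)].append((i, j))
    return groups


def is_conflict_free(nx, ny):
    """Check that no two 5-point-stencil neighbors share a color."""
    for i in range(nx):
        for j in range(ny):
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < nx and nj < ny and color(i, j) == color(ni, nj):
                    return False
    return True


groups = color_groups(4, 5)
ok = is_conflict_free(4, 5)
```

Ordering the unknowns color by color makes the diagonal blocks of the reordered matrix diagonal, which is the structural property the abstract refers to.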

10.

Distributed model predictive control for polytopic uncertain systems subject to actuator saturation

Langwen Zhang Jingcheng Wang Chuang Li 《Journal of Process Control》2013,23(8):1075-1089

In this paper, we present a distributed model predictive control (MPC) algorithm for polytopic uncertain systems subject to actuator saturation. The global system is decomposed into several subsystems. A set invariance condition for polytopic uncertain systems with input saturation is identified and a min–max distributed MPC strategy is proposed. The distributed MPC controller is designed by solving a linear matrix inequality (LMI) optimization problem. An iterative algorithm is developed to coordinate the subsystems. Case studies are carried out to illustrate the effectiveness of the proposed algorithm.

11.

Virtue: performance visualization of parallel and distributed applications

Shaffer E., Reed D.A., Whitmore S., Schaeffer B. 《Computer》1999,32(12):44-51

High-speed, wide-area networks have made it both possible and desirable to interconnect geographically distributed applications that control distributed collections of scientific data, remote scientific instruments and high-performance computer systems. Historically, performance analysis has focused on monolithic applications executing on large, stand-alone, parallel systems. In such a domain, measurement, postmortem analysis and code optimization suffice to eliminate performance bottlenecks and optimize applications. Distributed visualization, data mining and analysis tools allow scientists to collaboratively analyze and understand complex phenomena. Likewise, real-time performance measurement and immersive performance display systems (i.e., systems providing large stereoscopic displays of complex data) enable collaborating groups to interact with executing software, tuning its behavior to meet research and performance goals. To satisfy these demands, the authors designed Virtue, a prototype system that integrates collaborative, immersive performance visualization with real-time performance measurement and adaptive control of applications on computational grids. These tools enable physically distributed users to explore and steer the behavior of complex software in real time and to analyze and optimize distributed application dynamics.

12.

An improved parallel generalized conjugate residual algorithm

Zhao Libin, Tian Youxian 《Computer Engineering》2009,35(4):80-82

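For the solution of large nonsymmetric sparse linear systems, an improved generalized conjugate residual (IGCR) algorithm is presented. It exploits intrinsic properties of the generalized conjugate residual (GCR) algorithm to eliminate the data dependencies in GCR's inner-product computations. IGCR has the same convergence behavior as GCR, and when run in parallel on an MPI-based distributed-memory cluster it incurs half as many synchronization points as GCR. Numerical results and theoretical analysis show that IGCR outperforms GCR.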
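
The baseline GCR iteration that the abstract refers to can be sketched as follows. This is the standard serial algorithm, not the improved IGCR variant, whose reformulation is not given in the abstract; the helper names and the 3×3 test system are assumptions. The comments mark the inner products, each of which becomes a global reduction (and hence a synchronization point) when the vectors are distributed across processes.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; a global reduction in a distributed setting."""
    return sum(a * b for a, b in zip(u, v))


def gcr(A, b, tol=1e-10, max_iter=50):
    """Standard GCR for a nonsymmetric system A x = b (zero initial guess)."""
    x = [0.0] * len(b)
    r = list(b)
    P, AP = [], []                 # search directions p_j and their images A p_j
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                    # reduction: residual norm
            return x, k
        p, Ap = list(r), matvec(A, r)
        # Make A p orthogonal to all previous A p_j.
        for pj, Apj in zip(P, AP):
            beta = dot(Ap, Apj) / dot(Apj, Apj)       # reductions
            p = [u - beta * v for u, v in zip(p, pj)]
            Ap = [u - beta * v for u, v in zip(Ap, Apj)]
        alpha = dot(r, Ap) / dot(Ap, Ap)              # reductions
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        P.append(p)
        AP.append(Ap)
    return x, max_iter


# Assumed nonsymmetric test system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [-1.0, 3.0, 1.0],
     [0.0, -1.0, 2.0]]
b = [6.0, 8.0, 4.0]
x, iters = gcr(A, b)
```

The inner products depend on one another through the updated vectors, which is the data dependency that forces separate synchronizations in a straightforward parallel implementation.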

13.

Distributed generic approximate sparse inverses

George A. Gravvanis Christos K. Filelis-Papadopoulos 《The Journal of supercomputing》2014,70(1):365-384

The need for accuracy in the solution of linear systems derived from the discretization of partial differential equations leads to large sparse linear systems. The solution of sparse linear systems requires efficient scalable methods. Iterative solvers require efficient parallel preconditioning methods to solve sparse linear systems effectively. Herewith, a new parallel algorithm for the generic approximate sparse inverse matrix method for distributed memory systems is proposed. The computation of the distributed generic approximate sparse inverse matrix is based on a column-wise approach, which allows the separation into independent problems that can be handled in parallel without synchronization points or intermediate communications. This is achieved by reforming the generic approximate sparse inverse matrix algorithm and its process of computation with a new partial solution method for the computation of the nonzero elements of each column dictated by the approximate inverse sparsity pattern. Moreover, an algorithmic scheme is proposed for the efficient distribution of data amongst the available workstations, along with a load balancing scheme for problems with large standard deviation in the number of nonzero elements per column. Numerical results are presented for the proposed schemes for various model problems.

14.

Application of the parallel GCR(k) algorithm in a multiscale forecast model

Tian Youxian, Zhao Libin 《Computer Engineering and Design》2009,30(14)

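For the solution of the nonsymmetric sparse linear systems obtained by discretizing a multiscale forecast model, an improved GCR(k) algorithm, IGCR(k), is presented. It exploits intrinsic properties of GCR(k) to eliminate the data dependencies in GCR(k)'s inner-product computations. IGCR(k) has the same convergence behavior as GCR(k), and when run in parallel on an MPI-based distributed-memory cluster it incurs half as many synchronization points as GCR(k). Numerical results and theoretical analysis show that the improved algorithm outperforms GCR(k).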

15.

A general iterative sparse linear solver and its parallelization for interval Newton methods

Chenyi Hu Anna Frolov R. Baker Kearfott Qing Yang 《Reliable Computing》1995,1(3):251-263

Interval Newton/Generalized Bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the corresponding interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval Jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices. We then present an iterative interval linear solver that utilizes the proposed index storage scheme. It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance for interval Newton/Generalized Bisection methods when the Jacobian matrices are sparse. In Section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce a new index scheme to store general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.

16.

SPIKE: A parallel environment for solving banded linear systems

Eric Polizzi 《Computers & Fluids》2007,36(1):113-120

The hybrid banded linear solver SPIKE is proposed as a parallel environment for solving banded systems that are either dense or sparse within the band. The SPIKE algorithm is a domain decomposition technique that allows performing independent calculations on each subdomain or partition of the original linear system. The interface problem leads to a reduced linear system of much smaller size than that of the original system. Three different members of the SPIKE family are described. Each handles the reduced system in a different way depending on the characteristics of the system and the architecture of the high-end parallel computing platform. Numerical experiments are presented that demonstrate the effectiveness of our parallel scheme. Comparisons with the corresponding algorithms of ScaLAPACK are also provided for those banded systems that are dense within the band. A SPIKE scheme with multi-level parallelism is also introduced for solving large banded systems that are sparse within the band.

17.

Structure-adaptive parallel solution of sparse triangular linear systems

Ehsan Totoni Michael T. Heath Laxmikant V. Kale 《Parallel Computing》2014

Solving sparse triangular systems of linear equations is a performance bottleneck in many methods for solving more general sparse systems. Both for direct methods and for many iterative preconditioners, it is used to solve the system or improve an approximate solution, often across many iterations. Solving triangular systems is notoriously resistant to parallelism, however, and existing parallel linear algebra packages appear to be ineffective in exploiting significant parallelism for this problem.
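
The resistance to parallelism mentioned above comes from the chain of dependencies in forward substitution: row i cannot be solved until every earlier unknown it references is known. The sketch below shows sparse forward substitution together with a simple level computation, where rows in the same level are mutually independent. Level scheduling is a standard technique used here only for illustration; it is not the structure-adaptive method of the paper, and the row-list storage format, function names and test matrix are assumptions.

```python
def forward_substitution(rows, b):
    """Solve L x = b for sparse lower-triangular L.

    rows[i] is a list of (j, value) pairs with j <= i and must
    contain the diagonal entry (i, L_ii).
    """
    x = [0.0] * len(b)
    for i, row in enumerate(rows):
        s = b[i]
        diag = None
        for j, v in row:
            if j == i:
                diag = v
            else:
                s -= v * x[j]      # needs x[j] from an earlier row
        x[i] = s / diag
    return x


def dependency_levels(rows):
    """Level of row i is 1 + the highest level among the rows it depends on.

    Rows sharing a level can be solved concurrently; the number of
    distinct levels bounds the available parallelism.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j != i]
        level[i] = 1 + max(deps) if deps else 0
    return level


# Assumed 4x4 lower-triangular test matrix; exact solution is [1, 2, 3, 4].
rows = [
    [(0, 2.0)],
    [(1, 1.0)],
    [(0, 1.0), (2, 4.0)],
    [(1, 2.0), (2, 1.0), (3, 5.0)],
]
b = [2.0, 2.0, 13.0, 27.0]
x = forward_substitution(rows, b)
levels = dependency_levels(rows)
```

In this example rows 0 and 1 form one level and can be solved together, while rows 2 and 3 must each wait for the level before them.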

18.

Multiagent organization of system diagnosis of large toroidal digital systems

V. A. Vedeshenkov V. N. Lebedev 《Automation and Remote Control》2008,69(2):318-333

The components of large digital systems structured as toroidal grids are tested and diagnosed in three stages. At the first stage, the components of the subsystems as specified by the primary layout are checked in parallel. At the following two stages, the subsystems obtained from the primary subsystems by one-step right (downward) shift are checked in parallel as well. For the large digital systems, a multiagent organization of system diagnosis was developed. The functional characteristics and interrelations of the seven types of agents required for realization of the test and diagnosis components of the large digital systems under consideration were determined. An example of the multiagent organization of diagnosis of the components of a nine-module toroidal subsystem was considered.

19.

GRID technology for structural analysis

J.M. Alonso C. de Alfonso G. García V. Hernández 《Advances in Engineering Software》2007,38(11-12):738

This paper presents a High Performance Computing-based application for 3D structural analysis of buildings. Since the solution of a large sparse linear system of equations is the most time-consuming phase, several public domain parallel numerical libraries, with state-of-the-art capabilities, have been tested. The parallel application developed reduces the analysis time and allows larger structures to be simulated. Nevertheless, structural engineers rarely have access to high-cost parallel machines. Thus, a Grid Structural Analysis service that integrates the parallel application has been implemented, taking advantage of computers geographically distributed across the Internet. This service makes it possible to simulate in a realistic way, and concurrently, a large number of different structural alternatives for large buildings during their design stage, without resorting to structural simplifications or investing in expensive computers.

20.

Parallelization of a Monte Carlo particle transport simulation code

P. Hadjidoukas D. Emfietzoglou 《Computer Physics Communications》2010,181(5):928-936

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics, as more particle tracks can be simulated with low response time.