首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Michael H. Hohn 《Software》2004,34(9):797-813
This paper describes a concise specification language for linear partial differential equations (PDEs) on a union of rectangles, along with three tools: a pretty-printer, TEX generator, and a code generator. The pretty-printer and TEX generator help users by allowing equations to be specified (and read) in their natural form, while the code generator allows implementors to separate their numerical solver from the input equations, and greatly simplifies testing. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

2.
3.
We describe the Uintah Computational Framework (UCF), a set of software components and libraries that facilitate the simulation of partial differential equations on structured adaptive mesh refinement grids using hundreds to thousands of processors. The UCF uses a non-traditional approach to achieving parallelism, employing an abstract taskgraph representation to describe computation and communication. This representation has a number of advantages that affect the performance of the resulting simulation. We demonstrate performance of the system on a solid mechanics algorithm, two different computational fluid-dynamics (CFD) algorithms, as well as coupled CFD/mechanics algorithms. We show performance of the UCF using up to 2000 processors.  相似文献   

4.
In order to execute a parallel PDE (partial differential equation) solver on a shared-memory multiprocessor, we have to avoid memory conflicts in accessing multidimensional data grids. A new multicoloring technique is proposed for speeding sparse matrix operations. The new technique enables parallel access of grid-structured data elements in the shared memory without causing conflicts. The coloring scheme is formulated as an algebraic mapping which can be easily implemented with low overhead on commercial multiprocessors. The proposed multicoloring scheme bas been tested on an Alliant FX/80 multiprocessor for solving 2D and 3D problems using the CGNR method. Compared to the results reported by Saad (1989) on an identical Alliant system, our results show a factor of 30 times higher performance in Mflops. Multicoloring transforms sparse matrices into ones with a diagonal diagonal block (DDB) structure, enabling parallel LU decomposition in solving PDE problems. The multicoloring technique can also be extended to solve other scientific problems characterized by sparse matrices  相似文献   

5.
In this paper we describe and compare two different methods for solving general sparse triangular systems in distributed memory multiprocessor architectures. The two methods involve some preprocessing overheads so they are primarily of interest in solving many systems with the same coefficient matrix. Both algorithms start off from the idea of the classical substitution method. The first algorithm we present introduces a concept of data driven flow and makes use of non-blocking communications in order to dynamically extract the inherent parallelism of sparse systems. The second algorithm uses a reordering technique for the unknowns, so the final system can be grouped in variable blocksizes where the rows are independent and can be solved in parallel. This latter technique is called level scheduling because of the way it is represented in the adjacency graph. These methods have been tested in the Fujitsu AP1000 and the Cray T3D and T3E multicomputers. The performance has been analysed using matrices from the Harwell-Boeing collection.  相似文献   

6.
This paper explores the problem of solving triangular linear systems on parallel distributed-memory machines. Working within the LogP model, tight asymptotic bounds for solving these systems using forward/backward substitution are presented. Specifically, lower bounds on execution time independent of the data layout, lower bounds for data layouts in which the number of data items per processor is bounded, and lower bounds for specific data layouts commonly used in designing parallel algorithms for this problem are presented in this paper. Furthermore, algorithms are provided which have running times within a constant factor of the lower bounds described. One interesting result is that the popular two-dimensional block matrix layout necessarily results in significantly longer running times than simpler one-dimensional schemes. Finally, a generalization of the lower bounds to banded triangular linear systems is presented.  相似文献   

7.
This paper describes algorithms for solving narrow banded systems and the Helmholtz difference equations that are suitable for multiprocessing systems. The organization of the algorithms highlight the large grain parallelism inherent in the problems.  相似文献   

8.
Yu-Xin Ren   《Computers & Fluids》2003,32(10):1379-1403
This paper presents a robust finite volume shock-capturing scheme based on the rotated approximate Riemann solver. A general framework for constructing the rotated Riemann solver is described and a rotated Roe scheme is discussed in detail. It is found that the robustness of the rotated shock-capturing scheme is closely related to the way in which the direction of upwind differencing is determined. When the upwind direction is determined by the velocity-difference vector, the rotated Roe scheme demonstrates a robust shock-capturing capability and the shock instabilities or carbuncle phenomena can be eliminated completely. The dissipation property associated with the linear field of the rotated flux-difference splitting scheme is analyzed, and several test cases are presented to validate the proposed scheme.  相似文献   

9.
In this paper, we describe the decomposition of six algorithms: two partial differential equations (PDE) solvers (successive over-relaxation [SOR] and alternating direction implicit [ADI]), fast Fourier transform (FFT), Monte Carlo simulations, Simplex linear programming, and Sparse solvers. The algorithms were selected not only because of their importance in scientific applications, but also because they represent a variety of computational (structured to irregular) and communication (low to high) requirements. We present the performance results of these algorithms on two shared-memory VAX/VMSTM1 multiprocessor prototypes: the VAX 6300 series with up to 8 processors and the M31 with up to 22 processors. We demonstrate that by efficient decomposition it is possible to achieve high performance for all algorithms on both prototypes. We describe the efficient decomposition techniques applied to optimize the performance of parallel algorithms. Also, we discuss the performance implications due to different cache designs on two multiprocessors.An earlier version of this paper was presented at Supercomputing '90.At the time of writing, all three authors were with Digital Equipment Corporation, VMS Systems and Servers Group.  相似文献   

10.
Two parallel programming models represented by OpenMP and MPI are compared for PDE solvers based on regular sparse numerical operators. As a typical representative of such an operator, a finite difference approximation of the Euler equations for fluid flow is considered.The comparison of programming models is made with regard to uniform memory access (UMA), non-uniform memory access (NUMA), and self-optimizing NUMA (NUMA-opt) computer architectures. By NUMA-opt, we mean NUMA systems extended with self-optimization algorithms, in order to reduce the non-uniformity of the memory access time.The main conclusions of the study are: (1) that OpenMP is a viable alternative to MPI on UMA and NUMA-opt architectures; (2) that OpenMP is not competitive on NUMA platforms, unless special care is taken to get an initial data placement that matches the algorithm; (3) that for OpenMP to be competitive in the NUMA-opt case, it is not necessary to extend the OpenMP model with additional data distribution directives, nor to include user-level access to the page migration library.  相似文献   

11.
A generalized mapping strategy that uses a combination of graph theory, mathematical programming, and heuristics is proposed. The authors use the knowledge from the given algorithm and the architecture to guide the mapping. The approach begins with a graphical representation of the parallel algorithm (problem graph) and the parallel computer (host graph). Using these representations, the authors generate a new graphical representation (extended host graph) on which the problem graph is mapped. An accurate characterization of the communication overhead is used in the objective functions to evaluate the optimality of the mapping. An efficient mapping scheme is developed which uses two levels of optimization procedures. The objective functions include minimizing the communication overhead and minimizing the total execution time which includes both computation and communication times. The mapping scheme is tested by simulation and further confirmed by mapping a real world application onto actual distributed environments  相似文献   

12.
Recently,a qualitative approach was proposed for 3-D shape recovery based on a hybrid object representation^[1].In this approach,aspect recovery is the most important stage which binds regions in the image into meaningful aspects to support 3-D primitive recovery.There is no known polynomial time algorithm to solve this problem.the previous approach dealt with this problem by using a heuristic method based on the conditional probability.Unlike the previous method,this paper presents a novel parallel voting scheme to conquer the problem for efficiency.For this purpose,the previous global aspect representation is replaced with a distributed representation of aspects.Based on this representation,a three-layer parallel voting network for aspect recovery is proposed.For evaluating likelihood,a continuous Hopfield net is employed so that all aspect coverings in decreasing order of likelihood can be enumerated.The paper describes this method in detail and demonstrates its usefulness with simulation.  相似文献   

13.
This paper discusses the parallel implementation of a multigrid full approximation scheme (FAS) for the solution of non-linear elliptic PDEs in both 2 and 3 dimensions. The method used for smoothing is Red Black Newton approximation. The purpose of this paper is to investigate whether it is possible to construct a 16 processor transputer network which permits the efficient execution of multigrid algorithms. In particular, our aim is to maintain the parallel efficiency of the underlying iterative method, whilst achieving vastly improved convergence rates due to multigrid.  相似文献   

14.
15.
A number of important phenomena in ecology can be modeled by one-dimensional, nonlinear reaction-diffusion PDEs. This paper considers a modified Fisher PDE for which the diffusion term is nonlinear. A nonstandard finite difference scheme is constructed using methods generated by the previous work of Mickens. As a check on the mathematical properties of this scheme, a linear stability analysis is carried out for the two fixed-points appearing in the differential and difference equations. The finite difference scheme is shown to have solutions which satisfy a positivity condition as well as the requirement of boundedness. Further, the scheme is explicit and a functional relationship is obtained between the space and time step-sizes. A numerical test of the scheme is done for a particular initial/boundary value problem. A brief discussion of how the work can be extended and/or generalized is also given.  相似文献   

16.
17.
Adaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. This scheme interleaves a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a SAMR cosmology application, the ENZO code. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 57% and the quality of load balancing can be improved by a factor of six, as compared to the original DLB scheme used in ENZO.  相似文献   

18.
19.
A new method, namely, the parallel two-level hybrid (PTH) method, is developed to solve tridiagonal systems on parallel computers. PTH has two levels of parallelism. The first level is based on algorithms developed from the Sherman-Morrison modification formula, and the second level can choose different parallel tridiagonal solvers for different applications. By choosing different outer and inner solvers and by controlling its two-level partition, PTH can deliver better performance for different applications on different machine ensembles and problem sizes. In an extreme case, the two levels of parallelism can be merged into one, and PTH can be the best algorithm otherwise available. Theoretical analyses and numerical experiments indicate that PTH is significantly better than existing methods on massively parallel computers. For instance, using PTH in a fast Poisson solver results in a 2-folds speedup compared to a conventional parallel Poisson solver on a 512 nodes IBM machine. When only the tridiagonal solver is considered, PTH is over 10 times faster than the currently used implementation.  相似文献   

20.
Earlier work on parallel copying garbage collection is based on parallelization of sequential schemes with breadth-first traversing of live data. Recently it has been demonstrated that sequential copying schemes with depth-first traversing of live data are more flexible and more efficient than the corresponding ones with breadth-first traversal. A clear advantage of the former is that they work with no extra space overheads on non-contiguous memory blocks, which allows more flexible implementation. This paper shows how to parallelize an efficient depth-first copying scheme while retaining its high efficiency and flexibility. Research interests: His research interests include techniques for parallel and distributed implementation of functional, logic, concurrent constraints and object-oriented programming systems. He is also interested in performance analysis and garbage collection. His current interest is efficient techniques for distributed implementation of the concurrent programming language Oz.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号