1.

Implementation of a parallel unstructured Euler solver on shared- and distributed-memory architectures

D. J. Mavriplis Raja Das Joel Saltz R. E. Vermeland 《The Journal of supercomputing》1995,8(4):329-344

An efficient three-dimensional unstructured Euler solver is parallelized on a CRAY Y-MP C90 shared-memory computer and on an Intel Touchstone Delta distributed-memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between the two differing architectures are made. This work was sponsored in part by ARPA (NAG-1-1485) and by NASA Contract No. NAS1-19480 while authors Mavriplis, Saltz and Das were in residence at ICASE, NASA Langley Research Center, Hampton, Virginia. This research was performed in part using the Intel Touchstone Delta System operated by Caltech on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by NASA Langley Research Center and the Center for Research in Parallel Processing. The content of the information does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.

2.

Coupling of a nonlinear finite element structural method with a Navier-Stokes solver

Raymond E. Gordnier Robert Fithen 《Computers & Structures》2003,81(2):75-89

A new three-dimensional viscous aeroelastic solver is developed in the present work. A well validated full Navier-Stokes code is coupled with a nonlinear finite element plate model. Implicit coupling between the computational fluid dynamics and structural solvers is achieved using a subiteration approach. Computations of several benchmark static and dynamic plate problems are used to validate the finite element portion of the code. This coupled aeroelastic scheme is then applied to the problem of three-dimensional panel flutter. Inviscid and viscous supersonic results match previous computations using the same aerodynamic method coupled with a finite difference structural solver. For the case of subsonic flow, multiple solutions consisting of static, upward and downward deflections of the panel are discussed. The particular solution obtained is shown to be sensitive to the cavity pressure specified underneath the panel.

3.

Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

Li Xiao Xiaodong Zhang Zhengqian Kuang Baiming Feng Jichang Kang 《The Journal of supercomputing》2006,38(2):189-217

Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many such applications have been shifted from expensive MPP boxes to cost-effective Networks of Workstations (NOW). Auto-CFD-NOW is a pre-compiler that transforms Fortran CFD sequential programs to efficient message-passing parallel programs running on NOW. Our work makes the following three unique contributions. First, this pre-compiler is highly automatic, requiring a minimum number of user directives for parallelization. Second, we have applied a dependency analysis technique for the CFD applications, called analysis after partitioning. We propose a mirror-image decomposition technique to parallelize self-dependent field loops that are hard to parallelize by existing methods. Finally, traditional optimizations of communication focus on eliminating redundant synchronizations. We have developed an optimization scheme which combines all the non-redundant synchronizations in CFD programs to further reduce the communication overhead. The Auto-CFD-NOW has been implemented on networks of workstations and has been successfully used for automatically parallelizing structured CFD application programs. Our experiments show its effectiveness and scalability for parallelizing large CFD applications. This work is supported in part by the China National Aerospace Science Foundation, and by the U.S. National Science Foundation under grants CCR-9812187, CCR-0098055, CCF-0325760, CCF 0514078, and CNS 0549006.

4.

Shared-Memory Parallel Vector Implementation of the Immersed Boundary Method for the Computation of Blood Flow in the Beating Mammalian Heart (total citations: 3; self-citations: 0; citations by others: 3)

McQueen David Peskin Charles 《The Journal of supercomputing》1997,11(3):213-236

This paper describes the parallel implementation of the immersed boundary method on a shared-memory machine such as the Cray C-90 computer. In this implementation, outer loops are parallelized and inner loops are vectorized. The sustained computation rates achieved are 0.258 Gflops with a single processor, 1.89 Gflops with 8 processors, and 2.50 Gflops with 16 processors. An application to the computer simulation of blood flow in the heart is presented.
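The sustained rates quoted in this abstract can be turned into speedup and parallel efficiency figures with a few lines of arithmetic. The sketch below is not from the paper: the function name `speedup_and_efficiency` and the `rates` dictionary are hypothetical, and it assumes the single-processor rate is the right baseline for both multiprocessor runs.

```python
# Sustained rates quoted in the abstract, in Gflops, keyed by processor count.
rates = {1: 0.258, 8: 1.89, 16: 2.50}


def speedup_and_efficiency(rates):
    """Return {processors: (speedup, efficiency)} relative to the 1-processor rate.

    Speedup is rate(p) / rate(1); efficiency is speedup / p.
    Assumes the dictionary contains an entry for one processor.
    """
    base = rates[1]
    return {p: (r / base, r / base / p) for p, r in rates.items()}


if __name__ == "__main__":
    for p, (s, e) in sorted(speedup_and_efficiency(rates).items()):
        print(f"{p:2d} processors: speedup {s:.2f}, efficiency {e:.1%}")
```

Under that assumption the speedup is roughly 7.3 on 8 processors and 9.7 on 16, so efficiency drops from about 92% to about 61% as the processor count doubles.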

5.

Distributed Shared Arrays: An Integration of Message Passing and Multithreading on SMP Clusters

Ramzi Basharahil Brian Wims Cheng-Zhong Xu Song Fu 《The Journal of supercomputing》2005,31(2):161-184

This paper presents a Distributed Shared Array runtime system to support Java-compliant multithreaded programming on clusters of symmetric multiprocessors (SMPs). As a hybrid of message passing and shared address space programming models, the DSA programming model allows programmers to explicitly control data distribution so as to take advantage of the deep memory hierarchy, while relieving them from error-prone orchestration of communication and synchronization at run-time. The DSA system is developed as an integral component of mobility support middleware for grid computing so that DSA-based virtual machines can be reconfigured to adapt to the varying resource supplies or demand over the course of a computation. The DSA runtime system also features a directory-based cache coherence protocol in support of replication of user-defined sharing granularity and a communication proxy mechanism for reducing network contention. We demonstrate the programmability of the model in a number of parallel applications and evaluate its performance on a cluster of SMP servers, in particular, the impact of the coherence granularity.

6.

ParC—An Extension of C for Shared Memory Parallel Processing

YOSI BEN-ASHER DROR G. FEITELSON LARRY RUDOLPH 《Software》1996,26(5):581-612

ParC is an extension of the C programming language with block-oriented parallel constructs that allow the programmer to express fine-grain parallelism in a shared-memory model. It is suitable for the expression of parallel shared-memory algorithms, and also conducive to the parallelization of sequential C programs. In addition, performance-enhancing transformations can be applied within the language, without resorting to low-level programming. The language includes closed constructs to create parallelism, as well as instructions to cause the termination of parallel activities and to enforce synchronization. The parallel constructs are used to define the scope of shared variables, and also to delimit the sets of activities that are influenced by termination or synchronization instructions. The semantics of parallelism are discussed, especially relating to the discrepancy between the limited number of physical processors and the potentially much larger number of parallel activities in a program.

7.

An efficient passive planar micromixer with ellipse-like micropillars for continuous mixing of human blood

Nhut Tran-Minh Tao Dong Frank Karlsen 《Computer methods and programs in biomedicine》2014

In this paper, a passive planar micromixer with ellipse-like micropillars is proposed to operate in the laminar flow regime for high mixing efficiency. With a splitting and recombination (SAR) concept, the diffusion distance of the fluids in a micromixer with ellipse-like micropillars was decreased. Thus, the space used by the micromixer in an automatic sample collection system is also minimized. Numerical simulation was conducted to evaluate the performance of the proposed micromixer by solving the governing Navier–Stokes equation and convection–diffusion equation. With software (COMSOL 4.3) for computational fluid dynamics (CFD) we simulated the mixing of fluids in a micromixer with ellipse-like micropillars and in a basic T-type mixer in a laminar flow regime. The efficiency of the proposed micromixer is shown in numerical results and is verified by measurement results.