1.

Parallel computational methods for 3D simulation of a parafoil with prescribed shape changes

《Parallel Computing》1997,23(9):1349-1363

In this paper we describe parallel computational methods for 3D simulation of the dynamics and fluid dynamics of a parafoil with prescribed, time-dependent shape changes. The mathematical model is based on the time-dependent, 3D Navier-Stokes equations governing the incompressible flow around the parafoil and Newton's law of motion governing the dynamics of the parafoil, with the aerodynamic forces acting on the parafoil calculated from the flow field. The computational methods developed for these 3D simulations include a stabilized space-time finite element formulation to accommodate the shape changes, special mesh generation and mesh moving strategies developed for this purpose, iterative solution techniques for the large, coupled nonlinear equation systems involved, and parallel implementation of all these methods on scalable computing systems such as the Thinking Machines CM-5. As an example, we report a 3D simulation of a flare maneuver in which the parafoil velocity is reduced by pulling down the flaps. This simulation requires the solution of over 3.6 million coupled, nonlinear equations at every time step.

2.

Using multiple levels of parallelism to enhance the performance of domain decomposition solvers

L. Giraud, A. Haidar, S. Pralet 《Parallel Computing》2010,36(5-6):285-296

Large-scale scientific simulations are nowadays fully integrated into many scientific and industrial applications. Many of these simulations rely on models based on PDEs that lead to the solution of huge linear or nonlinear systems of equations involving millions of unknowns. In that context, the use of large high-performance computers in conjunction with advanced, fully parallel and scalable numerical techniques is mandatory to tackle these problems efficiently. In this paper, we consider a parallel linear solver based on a domain decomposition approach. Its implementation naturally exploits two levels of parallelism, which offers the flexibility to combine the numerical and the parallel implementation scalabilities. Combining the two levels of parallelism enables optimal use of the computing resources while preserving attractive numerical performance. Consequently, such a numerical technique appears to be a promising candidate for intensive simulations on massively parallel platforms. The robustness and parallel numerical performance of the solver are investigated on large, challenging linear systems arising from the finite element discretization of structural mechanics applications.

3.

A comparison of some methods for bounding connected and disconnected solution sets of interval linear systems

R. Baker Kearfott 《Computing》2008,82(1):77-102

Finding bounding sets for the solutions of systems of algebraic equations with uncertainties in the coefficients, as well as rapidly but rigorously locating all solutions to nonlinear systems or global optimization problems, involves bounding the solution sets of systems of equations with wide interval coefficients. In many cases, singular systems are admitted within the intervals of uncertainty of the coefficients, leading to unbounded solution sets with more than one disconnected component. This, combined with the fact that computing exact bounds on the solution set is NP-hard, limits the range of techniques available for bounding the solution sets of such systems. However, its componentwise nature and other properties make the interval Gauss–Seidel method suited to computing meaningful bounds in a predictable amount of computing time. For this reason, we focus on the interval Gauss–Seidel method. In particular, we study and compare various preconditioning techniques that we have developed over the years but have not fully investigated. Based on a detailed study of the preconditioners on some simple, specially designed small systems, we propose two heuristic algorithms, then study the behavior of the preconditioners on some larger, randomly generated systems, as well as on a small selection of systems from the Matrix Market collection.
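
To make the componentwise update concrete, here is a minimal Python sketch of plain interval Gauss–Seidel sweeps on a small interval system. It is not the author's code: intervals are hypothetical (lo, hi) tuples, no preconditioner is applied, and there is no outward rounding, so the enclosures are not rigorous. When a diagonal interval contains zero, the extended division that produces the disconnected solution pieces discussed above is not implemented; the sketch raises an error instead.

```python
# Minimal interval Gauss-Seidel sketch. Intervals are (lo, hi) tuples.
# Assumption: plain floating point, no outward rounding, so the
# enclosures are NOT rigorous; no preconditioner is applied.

def i_sub(a, b):
    """Interval subtraction a - b."""
    return (a[0] - b[1], a[1] - b[0])

def i_mul(a, b):
    """Interval multiplication: min/max over the endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def i_div(a, b):
    """Interval division; the divisor must not contain zero."""
    if b[0] <= 0.0 <= b[1]:
        # Extended division (which can split the result into two pieces)
        # would be needed here; it is deliberately left out of this sketch.
        raise ValueError("divisor interval contains zero")
    return i_mul(a, (1.0 / b[1], 1.0 / b[0]))

def i_intersect(a, b):
    """Intersection of two intervals, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def interval_gauss_seidel(A, b, x, sweeps=10):
    """Refine the box x meant to enclose the solutions of A x = b.

    A is a list of rows of intervals, b a list of intervals, x the initial
    box. Returns the refined box, or None if the box provably (up to
    rounding) contains no solution."""
    x = list(x)
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            acc = b[i]
            for j in range(n):
                if j != i:
                    acc = i_sub(acc, i_mul(A[i][j], x[j]))
            # Solve row i for x_i, then intersect with the current bound.
            new = i_intersect(i_div(acc, A[i][i]), x[i])
            if new is None:
                return None
            x[i] = new
    return x
```

For example, `interval_gauss_seidel([[(4, 4), (1, 1)], [(1, 1), (3, 3)]], [(1, 1), (2, 2)], [(-10, 10), (-10, 10)])` contracts the box toward the point solution (1/11, 7/11).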

4.

Parallel computation of three-dimensional nonlinear magnetostatic problems

David Levine, William Gropp, Kimmo Forsman, Lauri Kettunen 《Concurrency and Computation》1999,11(2):109-120

We describe a general-purpose parallel code for computing accurate solutions to large, computationally demanding, 3D nonlinear magnetostatic problems. The code, CORAL, is based on a volume integral equation formulation. Using an IBM SP parallel computer and iterative solution methods, we successfully solved the dense linear systems inherent in such formulations. A key component of our work was the use of the PETSc library, which provides parallel portability and access to the latest linear algebra solution technology. Copyright © 1999 John Wiley & Sons, Ltd.

5.

A parallel computational framework to solve flow and transport in integrated surface–subsurface hydrologic systems

《Environmental Modelling & Software》2014

Hydrologic modeling requires handling a wide range of highly nonlinear processes from the hillslope scale to the continental scale, so the computational efficiency of the model becomes a critical issue for water resource management. This work implements and evaluates a flexible parallel computing framework for hydrologic simulations by applying OpenMP in the HydroGeoSphere (HGS) model. HGS is a 3D control-volume finite element model that solves the nonlinear coupled equations describing surface–subsurface water flow, solute migration and energy transport. The computing efficiency of HGS is improved by three parallel computing schemes: 1) parallelization of Jacobian matrix assembly, 2) multi-block node reordering for performing the LU solve efficiently, and 3) parameter privatization for reducing memory access latency. Regarding the accuracy and consistency of the solutions obtained with parallel computing, differences in the solutions are entirely due to the use of a finite linear solver iteration tolerance, which produces slightly different solutions that all satisfy the convergence tolerance. The maximum difference in the head solution between the serial and parallel simulations is less than 10⁻³ m, using typical convergence tolerances. With the parallel schemes developed in this work, three key achievements can be summarized: (1) a physically based hydrologic simulator can be parallelized in a manner that allows the same code to be executed on various shared memory platforms with minimal maintenance; (2) a general, flexible and robust parallel iterative sparse-matrix solver can be implemented in a wide range of numerical models employing either structured or unstructured meshes; and (3) the methodology is flexible, especially for the efficient construction of the coefficient and Jacobian matrices, compared with other parallelized hydrologic models that use parallel library packages.

6.

Self-Generation Fuzzy Modeling Systems through Hierarchical Recursive-Based Particle Swarm Optimization

Hsuan-Ming Feng 《Cybernetics and Systems》2013,44(6):623-639

In this article, fuzzy modeling systems are developed automatically by Hierarchical Recursive-Based Particle Swarm Optimization (HRPSO). HRPSO, which combines Fuzzy C-Means (FCM) clustering, Particle Swarm Optimization (PSO), and recursive least-squares techniques, self-constructs adjustable parameters for approximating a nonlinear function and a discrete dynamic system. The heuristic PSO is an evolutionary computing technique for solving complex global problems. However, its training time becomes impractical for large population sizes and many adjustable parameters. To approximate the actual output of nonlinear functions quickly, the input-output training data are first clustered by an FCM algorithm. Favorable features are then extracted from the training data, and fuzzy structures with fewer adjustable parameters are collected as the initial population of the PSO; in this way the FCM procedure directly extracts the necessary small PSO populations from the training samples. After that, the recursive-based PSO tunes the adjustable parameters to construct the desired fuzzy modeling system quickly. The proposed HRPSO therefore determines fuzzy modeling systems with a small number of fuzzy rules that achieve high accuracy within a short training time. Simulation results on two nonlinear problems demonstrate the efficiency of the resulting fuzzy modeling systems.
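
The FCM step that seeds the PSO population can be illustrated with the textbook Fuzzy C-Means update (Bezdek's formulation, with fuzzifier m = 2 by default). This is a generic pure-Python sketch, not the HRPSO implementation; the PSO and recursive least-squares stages are not shown, and the function name `fcm` is hypothetical.

```python
import math
import random

def fcm(points, c, m=2.0, iters=100, seed=0):
    """Textbook Fuzzy C-Means clustering.

    points: list of equal-length tuples; c: number of clusters;
    m > 1: fuzzifier. Returns (centers, memberships), where
    memberships[i][k] is the degree to which point i belongs to cluster k."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)**(2/(m-1)).
        for i in range(n):
            dists = [math.dist(points[i], ck) for ck in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                k0 = dists.index(0.0)
                U[i] = [1.0 if k == k0 else 0.0 for k in range(c)]
            else:
                U[i] = [1.0 / sum((dists[k] / dists[j]) ** exponent for j in range(c))
                        for k in range(c)]
    return centers, U
```

The resulting cluster centers would then serve as the starting points for fuzzy rules; how HRPSO turns them into an initial PSO population is not specified in the abstract.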

7.

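Parallel computing technology for large-scale reservoir numerical simulation software and progress in its application on Beowulf systems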

曹建文, 刘洋, 孙家昶, 姚继锋, 潘峰 《Journal of Numerical Methods and Computer Applications》2006,27(2):86-95

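This paper mainly introduces the research progress in China on parallel computing technology for large-scale reservoir numerical simulation. It presents computational examples and application results of fine-scale reservoir simulation on domestically produced Beowulf systems, gives numerical simulation results and performance analyses for reservoir application cases on the scale of one million grid points at different processor counts, and implements a post-processing display system that visualizes massive data as 3D plots, 2D plots, and tables.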

8.

Newton Iterative Parallel Finite Element Algorithm for the Steady Navier-Stokes Equations

Yinnian He, Liquan Mei, Yueqiang Shang, Juan Cui 《Journal of scientific computing》2010,44(1):92-106

A method combining Newton iteration with a parallel finite element algorithm is applied to solve the steady Navier-Stokes equations under the strong uniqueness condition. The algorithm applies m Newton iterations to the nonlinear problem on a coarse grid in the domain Ω and then solves a linear problem on a fine grid in subdomains Ω_j ⊂ Ω, j = 1, …, M, in a parallel environment. The error of the Newton iterative parallel finite element solution with respect to the solution of the steady Navier-Stokes equations is then analyzed for large m, small coarse mesh size H, and fine mesh size h ≪ H. Finally, numerical tests demonstrate the effectiveness of the algorithm.
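
The coarse-Newton, fine-linear idea can be sketched on a much simpler model problem. The example below assumes the 1D equation -u'' + u³ = f on (0, 1) with zero boundary values as a stand-in for the Navier-Stokes equations and uses finite differences instead of finite elements: m Newton steps on a coarse grid of size H, linear interpolation to the fine grid of size h, then one linear solve linearized about the interpolant. The fine-grid solve is done globally here, whereas the abstract describes parallel subdomain solves. All function names are hypothetical.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def linearized_solve(u_lin, f_vals, h):
    """One Newton step for -u'' + u^3 = f, linearized about u_lin:
    (-D2 + 3 u_lin^2) u = f + 2 u_lin^3, with u = 0 at both ends."""
    n = len(u_lin)
    off = [-1.0 / h**2] * n
    diag = [2.0 / h**2 + 3.0 * v * v for v in u_lin]
    rhs = [fv + 2.0 * v**3 for fv, v in zip(f_vals, u_lin)]
    return thomas(off, diag, off, rhs)

def two_grid_newton(f, n_coarse, n_fine, m):
    """m Newton steps on the coarse grid, then one linear fine-grid solve."""
    H, h = 1.0 / n_coarse, 1.0 / n_fine
    xc = [i * H for i in range(1, n_coarse)]
    fc = [f(x) for x in xc]
    uH = [0.0] * len(xc)
    for _ in range(m):                      # coarse-grid Newton iterations
        uH = linearized_solve(uH, fc, H)
    nodes = [0.0] + uH + [0.0]              # add the Dirichlet boundary values

    def interp(x):                          # piecewise-linear prolongation
        k = min(int(x / H), n_coarse - 1)
        t = x / H - k
        return (1.0 - t) * nodes[k] + t * nodes[k + 1]

    xf = [i * h for i in range(1, n_fine)]
    u_lin = [interp(x) for x in xf]
    # The abstract solves this fine-grid linear problem on subdomains in
    # parallel; here it is solved once over the whole domain.
    return xf, linearized_solve(u_lin, [f(x) for x in xf], h)
```

With the manufactured solution u = sin(πx), i.e. f = π² sin(πx) + sin³(πx), the fine-grid result from an 8-interval coarse grid and a 64-interval fine grid stays close to the exact solution, even though the expensive nonlinear iterations ran only on the coarse grid.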

9.

A fine grained parallel smooth particle mesh Ewald algorithm for biophysical simulation studies: Application to the 6-D torus QCDOC supercomputer

Bin Fang, Yuefan Deng 《Computer Physics Communications》2007,177(4):362-377

In order to model complex heterogeneous biophysical macrostructures with non-trivial charge distributions, such as globular proteins in water, it is important to evaluate the long-range forces present in these systems accurately and efficiently. The Smooth Particle Mesh Ewald summation technique (SPME) is commonly used to determine the long-range part of the electrostatic energy in large-scale molecular simulations. While the SPME technique does not give rise to a performance bottleneck on a single processor, current implementations of SPME on massively parallel supercomputers become problematic at large processor counts, limiting the time and length scales that can be reached. Here, a synergistic investigation involving method improvement, parallel programming and novel architectures is employed to address this difficulty. A relatively simple modification of the SPME technique is described that improves both accuracy and efficiency on massively parallel and scalar computing platforms. Our fine-grained parallel implementation of the modified SPME method for the novel QCDOC supercomputer, with its 6D-torus architecture, is then given. Numerical tests of algorithm performance on up to 1024 processors of the QCDOC machine at BNL are presented for two systems of interest: a β-hairpin solvated in explicit water, a system of 1142 water molecules and a 20-residue protein for a total of 3579 atoms; and the HIV-1 protease solvated in explicit water, a system of 9331 water molecules and a 198-residue protein for a total of 29508 atoms.

10.

Efficient unsteady high Reynolds number flow computations on unstructured grids

Peter Lucas, Hester Bijl, Alexander H. van Zuijlen 《Computers & Fluids》2010,39(2):271-9215

Despite the advances in computer power and numerical algorithms over the last decades, solutions to unsteady flow problems remain computationally intensive. Especially for high Reynolds number flows, nonlinear multigrid, which is commonly used to solve the nonlinear systems of equations, converges slowly: the stiffness induced by high-aspect-ratio cells and turbulence is not tackled well by this solution method. In this paper, it is investigated whether a Jacobian-free Newton-Krylov (JFNK) solution method can speed up unsteady flow computations at high Reynolds numbers. Preconditioning of the linear systems that arise after Newton linearization is commonly performed with matrix-free preconditioners or with approximate factorizations based on crude approximations of the Jacobian. Approximate factorizations based on a Jacobian that matches the target residual operator are unpopular because these preconditioners consume a large amount of memory and can suffer from robustness issues. However, they remain appealing because they closely resemble A⁻¹. In this paper, it is shown that a JFNK solution method with an approximate factorization preconditioner based on a Jacobian that approximately matches the target residual operator yields a speedup by a factor of 2.5 to 12 over nonlinear multigrid for two-dimensional high Reynolds number flows. The solution method performs as well as nonlinear multigrid for three-dimensional laminar problems. Modest memory consumption is achieved by partially lumping the Jacobian before constructing the approximate factorization preconditioner, whereas robustness is ensured by enhanced diagonal dominance.
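
The core of a Jacobian-free Newton-Krylov method is that the Krylov solver only needs Jacobian-vector products, which can be approximated by a finite difference of the residual, J(u)v ≈ (F(u + εv) - F(u))/ε. The sketch below pairs that product with a small unpreconditioned GMRES in pure Python. It deliberately omits the approximate-factorization preconditioner that the paper identifies as the key ingredient, and the step-size rule for ε is a common heuristic, not necessarily the authors' choice. Function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def gmres(matvec, b, max_iter, tol=1e-12):
    """Unrestarted, unpreconditioned GMRES from a zero initial guess."""
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0.0] * n
    V = [[x / beta for x in b]]     # orthonormal Krylov basis
    R = []                          # columns of the rotated Hessenberg matrix
    cs, sn = [], []                 # Givens rotation coefficients
    g = [beta]                      # rotated right-hand side
    for j in range(min(max_iter, n)):
        w = matvec(V[j])
        h = []
        for vi in V:                # Arnoldi with modified Gram-Schmidt
            hij = dot(w, vi)
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, vi)]
        h_next = norm(w)
        h.append(h_next)
        for i in range(j):          # apply earlier rotations to the new column
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        r = math.hypot(h[j], h[j + 1])
        cs.append(h[j] / r)
        sn.append(h[j + 1] / r)
        h[j], h[j + 1] = r, 0.0
        g.append(-sn[j] * g[j])
        g[j] = cs[j] * g[j]
        R.append(h)
        if abs(g[j + 1]) < tol * beta or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(R)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on the triangular part
        y[i] = (g[i] - sum(R[l][i] * y[l] for l in range(i + 1, k))) / R[i][i]
    x = [0.0] * n
    for yl, vl in zip(y, V):
        x = [xk + yl * vk for xk, vk in zip(x, vl)]
    return x

def jfnk(F, u0, tol=1e-10, max_newton=30, krylov_iters=50):
    """Jacobian-free Newton-Krylov: the Jacobian is never formed."""
    u = list(u0)
    for _ in range(max_newton):
        Fu = F(u)
        if norm(Fu) < tol:
            break

        def jac_vec(v, u=u, Fu=Fu):
            # Finite-difference directional derivative of the residual.
            nv = norm(v)
            if nv == 0.0:
                return [0.0] * len(v)
            e = 1e-7 * (1.0 + norm(u)) / nv
            Fp = F([uk + e * vk for uk, vk in zip(u, v)])
            return [(a - b) / e for a, b in zip(Fp, Fu)]

        du = gmres(jac_vec, [-x for x in Fu], krylov_iters)
        u = [uk + dk for uk, dk in zip(u, du)]
    return u
```

For instance, `jfnk(lambda u: [u[0]**2 + u[1]**2 - 4.0, u[0] - u[1]], [1.0, 0.5])` converges to (√2, √2) using only residual evaluations. In a flow solver, a preconditioner would be applied inside `gmres`, which is where the paper's approximate factorization would enter.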

11.

A parallel monotone iterative method for the numerical solution of multi-dimensional semiconductor Poisson equation

Yiming Li 《Computer Physics Communications》2003,153(3):359-372

Various self-consistent semiconductor device simulation approaches require the solution of the Poisson equation, which describes the potential distribution for a specified doping profile (or charge density). In this paper, we solve the multi-dimensional nonlinear semiconductor Poisson equation numerically with the finite volume method and the monotone iterative method on a Linux cluster. Owing to the nonlinear property of the Poisson equation, the proposed method converges monotonically for arbitrary initial guesses. Compared with Newton's iterative method, it is easy to implement, relatively robust, and fast, requiring much less computation time, and the algorithm is inherently parallel in large-scale computing. The method has been implemented successfully; tests of the resulting parallel nonlinear Poisson solver on a variety of devices show good efficiency and robustness. Benchmarks are also included to demonstrate the excellent parallel performance of the method.
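
A minimal sketch of a monotone iteration, assuming a simplified, scaled 1D stand-in for the equilibrium semiconductor Poisson equation, -u'' + sinh(u) = d(x) with u = 0 at both ends; the paper's multi-dimensional finite volume scheme and its exact iteration are not reproduced. Each step solves only a linear tridiagonal system. If the shift λ is at least cosh(u) everywhere between a lower and an upper solution, the iterates started from the upper solution decrease monotonically (this sketch needs such a starting point, unlike the arbitrary initial guesses claimed in the abstract). The names and the choice d = 1 are illustrative.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def monotone_poisson(d, h, u_upper, lam, max_iter=500, tol=1e-12):
    """Monotone iteration for -u'' + sinh(u) = d with u = 0 at both ends.

    Each step solves the LINEAR problem
        (-D2 + lam) u_new = lam*u - sinh(u) + d,
    which requires lam >= cosh(u) between the lower and upper solutions.
    Returns the final iterate and the list of all iterates."""
    n = len(d)
    off = [-1.0 / h**2] * n
    diag = [2.0 / h**2 + lam] * n
    u = list(u_upper)
    history = [u]
    for _ in range(max_iter):
        rhs = [lam * uk - math.sinh(uk) + dk for uk, dk in zip(u, d)]
        u_new = thomas(off, diag, off, rhs)
        history.append(u_new)
        done = max(abs(p - q) for p, q in zip(u_new, u)) < tol
        u = u_new
        if done:
            break
    return u, history
```

With d = 1, the constant asinh(1) is an upper solution (its sinh equals 1) and 0 is a lower solution, so λ = cosh(asinh(1)) = √2 suffices. Every iteration is a linear solve, which is what makes the method straightforward to parallelize compared with assembling Newton Jacobians.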

12.

Improving the energy efficiency of data-intensive applications running on clusters

Weifeng Liu, Jie Zhou, Bin Gong, Hongjun Dai, Meng Guo 《International Journal of Parallel, Emergent and Distributed Systems》2020,35(3):246-259

As an alternative to traditional computing architectures, cloud computing is now growing rapidly. In general, however, it builds on models such as cluster computing. Supercomputers are getting more and more powerful, helping scientists gain a more in-depth understanding of the world. At the same time, clusters of commodity servers have become mainstream in the IT industry, powering not only large Internet services but also a growing number of data-intensive scientific applications, such as MPI-based deep learning applications. To reduce energy costs, more and more effort is being made to improve the energy efficiency of HPC systems. Because I/O accesses account for a large portion of the execution time of data-intensive applications, it is critical to design energy-aware parallel I/O functions to address the challenges of HPC energy efficiency. The Message Passing Interface (MPI), the de facto standard for designing parallel applications in cluster environments, has been widely used in high-performance computing; therefore, obtaining energy consumption information for MPI applications is critical for improving the energy efficiency of HPC systems. In this work, we first present our energy measurement tool, a software framework that eases energy data collection in cluster environments. We then present an approach that optimises the energy efficiency of parallel I/O operations. The energy scheduling algorithm is evaluated on a cluster.

13.

On a high-order compact scheme and its utilization in parallel solution of a time-dependent system on a distributed memory processor

Okon H. Akpan 《The Journal of supercomputing》2012,60(3):410-419

The focus of this study is the design of a parallel solution method that uses a fourth-order compact scheme. The applicability of the method is demonstrated on a time-dependent parabolic system with Neumann boundaries. The core of the parallel computing facilities used in the study is a 2-head-node, 224-compute-node Apple Xserve G5 multiprocessor. The system is first discretized in both time and space such that it remains within its stability regime, before being solved with the method. The solution requires time marching in which every time step, h_t, calls for a single parallel solve of the intermediary subsystems generated. The solution uses p processors, with p ranging from 3 to 63. The speedups, s_p, approach their limiting value of p only when p is small. The solution produces good computational results at large p, but poor results as p becomes progressively smaller. Also, the parallel solution produces accurate results with good speedups and efficiencies only when p is within some reasonable range of values. The intermediary systems generated by this method are linear and fine-grained and are therefore best suited for solution on massively parallel processors. The solution method proposed in this study is, therefore, expected to yield more impressive results if applied in a massively parallel computing environment.
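
The abstract does not spell out which fourth-order compact scheme is used. As an illustration only, the sketch below implements the standard compact (Padé) approximation of a second derivative, which couples neighboring derivative values through a tridiagonal system, and checks its fourth-order convergence on sin(x). It is serial and covers only the spatial building block, not the time marching or the parallel solve of the intermediary subsystems; function names are hypothetical.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def compact_second_derivative(u, h, w_left, w_right):
    """Fourth-order compact (Pade) approximation of u'' on a uniform grid:
        (w[i-1] + 10*w[i] + w[i+1]) / 12 = (u[i-1] - 2*u[i] + u[i+1]) / h**2
    at the interior nodes, with the end values of u'' assumed known."""
    n = len(u) - 2                           # number of interior nodes
    a, b, c = [1.0 / 12] * n, [10.0 / 12] * n, [1.0 / 12] * n
    rhs = [(u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2 for i in range(1, n + 1)]
    rhs[0] -= w_left / 12.0                  # move known end values to the RHS
    rhs[-1] -= w_right / 12.0
    return [w_left] + thomas(a, b, c, rhs) + [w_right]

def max_error(N):
    """Max error of the scheme for u = sin(x) on [0, pi] with N intervals."""
    h = math.pi / N
    x = [i * h for i in range(N + 1)]
    w = compact_second_derivative([math.sin(t) for t in x], h, 0.0, 0.0)
    return max(abs(wi + math.sin(t)) for wi, t in zip(w, x))
```

Halving h should reduce `max_error` by roughly a factor of 16, the signature of fourth-order accuracy, while each evaluation costs only one tridiagonal solve.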

14.

Development of a parallel direct simulation code to investigate reactive flows

《Computers & Fluids》1996,25(5):485-496

Solving the Navier-Stokes equations with detailed modeling of the transport and reaction terms remains a very difficult challenge at present. Direct simulations of two-dimensional reactive flows using accurate models for the chemical reactions generally require days of computing time on today's most powerful serial vector supercomputers. Up to now, realistic three-dimensional simulations have remained practically impossible. Working with parallel computers currently seems to be the only possible way to investigate more complicated problems at acceptable cost; however, the lack of standards for parallel architectures constitutes a real obstacle. In this paper, we describe the structure of a parallel two-dimensional direct simulation code using detailed transport, thermodynamic and reaction models. By separating the modules that control the parallel work from the flow solver, a high degree of compatibility between parallel computers using distributed memory and message-passing communication can be achieved. A dynamic load-balancing procedure is implemented to optimize the distribution of the load among the different nodes. Efficiencies obtained with this code on many different architectures are given. First application examples concerning the interaction between vortices and a diffusion flame are shown to illustrate the possibilities of the solver.

15.

Performance evaluation of a parallel cascade semijoin algorithm for computing path expressions in object database systems

王国仁, 于戈 《Journal of Computer Science and Technology》2002,17(2)

With the emergence of new applications, especially on the Web, such as e-commerce, digital libraries and DNA banks, object database systems show stronger functions than other kinds of database systems owing to their powerful ability to represent complex semantics and relationships. One distinguishing feature of object database systems is the path expression, and most queries on an object database are based on path expressions because they are the most natural and convenient way to access an object database, for example, to navigate the hyperlinks in a web-based database. The execution of path expressions is usually extremely expensive on a very large database. Therefore, improving the execution efficiency of path expressions is critical for the performance of object databases. As an important approach to realizing high-performance query processing, the parallel processing of path expressions on distributed object databases is explored in this paper. Up to now, some algorithms for computing path expressions and for optimizing path expression processing have been proposed for centralized environments, but few approaches have been presented for computing path expressions in parallel. In this paper, a new parallel algorithm for computing path expressions, named Parallel Cascade Semijoin (PCSJ), is proposed. Moreover, a new scheduling strategy called the right-deep zigzag tree is designed to further improve the performance of the PCSJ algorithm. The experiments have been implemented in a NOW (network of workstations) distributed and parallel environment. The results show that the PCSJ algorithm outperforms the other two parallel algorithms (the parallel version of the forward pointer chasing algorithm (PFPC) and the index splitting parallel algorithm (IndexSplit)) when computing path expressions with restrictive predicates, and that the right-deep zigzag tree scheduling strategy performs better than the right-deep tree scheduling strategy.
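
The general idea of evaluating a path expression with a restrictive predicate as a cascade of semijoins can be sketched as follows. The schema (persons, cars, makers), the right-to-left evaluation order, and the thread pool standing in for the nodes of a NOW are all assumptions for illustration; the actual PCSJ algorithm and its right-deep zigzag tree scheduling are not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor

def semijoin(partitions, ref_attr, keep):
    """Keep the oids of objects whose reference attribute points into `keep`.
    Each partition is filtered independently, as one node would do locally."""
    def local(part):
        return {oid for oid, obj in part.items() if obj[ref_attr] in keep}
    with ThreadPoolExecutor() as pool:
        return set().union(*pool.map(local, partitions))

def cascade_semijoin(path, predicate):
    """Evaluate a path expression C1.r1.C2.r2...Cn with a predicate on Cn.

    path: [(partitions_of_C1, "r1"), ..., (partitions_of_Cn, None)], where
    each partition maps oid -> attribute dict. Returns the qualifying C1 oids."""
    last_partitions, _ = path[-1]
    survivors = {oid for part in last_partitions
                 for oid, obj in part.items() if predicate(obj)}
    # Walk the path right to left, shrinking each class with a semijoin.
    for partitions, ref_attr in reversed(path[:-1]):
        survivors = semijoin(partitions, ref_attr, survivors)
    return survivors
```

For a hypothetical query such as "persons whose car is made by a German maker", starting from the restrictive predicate means each semijoin only ships a small set of surviving oids between nodes, which is one intuition for why cascades help when predicates are selective.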

16.

Fast Four‐Way Parallel Radix Sorting on GPUs

Linh Ha, Jens Krüger, Cláudio T. Silva 《Computer Graphics Forum》2009,28(8):2368-2378

Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques as well as developing new sorting approaches is crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known Graphics Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.
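
A "four-way" radix sort processes 2 bits per pass, so each pass distributes keys into four buckets. The serial Python sketch below shows the per-pass histogram, exclusive prefix sum, and stable scatter that a GPU implementation would perform in parallel across thread blocks. It is not the authors' kernel, and it handles only non-negative integer keys; the function name is hypothetical.

```python
def radix_sort_4way(keys, key_bits=32):
    """LSD radix sort of non-negative ints, 2 bits (4 buckets) per pass."""
    for shift in range(0, key_bits, 2):
        # 1) Histogram of the current 2-bit digit.
        counts = [0, 0, 0, 0]
        for k in keys:
            counts[(k >> shift) & 3] += 1
        # 2) Exclusive prefix sum: start offset of each bucket.
        offsets = [0, 0, 0, 0]
        for d in range(1, 4):
            offsets[d] = offsets[d - 1] + counts[d - 1]
        # 3) Stable scatter: keys keep their relative order within a bucket,
        #    which is what makes later passes preserve earlier ones.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & 3
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

A 32-bit key therefore needs 16 passes; using 2 bits per pass keeps the per-pass histogram to four counters, which is cheap to keep in fast on-chip memory on a GPU.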

17.

Platform design for large-scale artificial market simulation and preliminary evaluation on the K computer

Takuma Torii, Tomio Kamada, Kiyoshi Izumi, Kenta Yamada 《Artificial Life and Robotics》2017,22(3):301-307

Artificial market simulations have the potential to be a strong tool for studying rapid and large market fluctuations and for designing financial regulations. High-frequency traders, who trade multiple assets simultaneously within a millisecond, are said to be a cause of rapid and large market fluctuations. For such a large-scale problem, this paper proposes a software platform for large-scale, high-frequency artificial market simulations (Plham: /plʌm/). The computing platform Plham enables modeling of financial markets composed of many kinds of assets and a large number of agents trading on short timescales. The key design feature of Plham is the separation of artificial market models (simulation models) from their execution (execution models). This allows users to define their simulation models without parallel computing expertise and to choose whichever execution model they need. The platform provides a prototype execution model for parallel simulations that exploits the variety in trading frequency among traders, that is, the fact that some traders do not require up-to-date information on markets that change on the order of milliseconds. We evaluated a prototype implementation on the K computer using up to 256 computing nodes.

18.

GPU-supported large-scale parallel processing for geometric correction of SAR images

杨景辉, 程春泉, 张继贤, 黄国满 《Journal of Image and Graphics》2015,20(3):374-385

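Objective: Geometric correction (also called geocoding) is an important step in the synthetic aperture radar (SAR) image processing chain; it is computationally demanding and requires a geometric positioning model. For spaceborne SAR images, this paper adopts the rational polynomial coefficient (RPC) positioning model and proposes a GPU-supported, large-scale parallel processing method for geometric correction. Method: The method exploits the GPU's powerful computing resources and the fact that every pixel goes through the same processing steps during geometric correction. A large batch of pixels is loaded onto the GPU at a time and one thread is assigned to each pixel; each thread performs the computationally expensive steps of rational function evaluation, projection transformation, and interpolation resampling. GPU parallel performance is improved by optimizing the configuration of the dimGrid and dimBlock parameters. The method handles large-format SAR images through tile-based processing and works with a variety of tile sizes. Result: Experiments show a computational speedup of 38 to 44. To analyze the characteristics of GPU parallel processing comprehensively and objectively, the overall speedup was also computed; several experiments analyze the factors affecting overall acceleration, and an optimization that improves I/O performance by reading and writing large blocks is proposed. Conclusion: The method is simple in form and general, applicable to almost all spaceborne SAR images and to different image sizes, and yields a clear speedup.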
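
The one-thread-per-pixel structure described above can be sketched in Python, with each call to `correct_pixel` playing the role of one GPU thread and `grid_dims` showing the ceiling division used to size the thread grid for a given block size. The rational functions are a hypothetical first-order stand-in for the real RPC polynomials, the projection transform is omitted, and all names are illustrative.

```python
def grid_dims(width, height, block=(16, 16)):
    """Ceil-divide the image by the block size so that every pixel gets
    one thread, as when choosing dimGrid for a given dimBlock in CUDA."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)

def rational(num, den, x, y):
    """HYPOTHETICAL first-order stand-in for an RPC ratio of polynomials.
    Real RPC models use cubic polynomials (20 coefficients each) in
    normalized latitude, longitude and height."""
    return (num[0] + num[1] * x + num[2] * y) / (den[0] + den[1] * x + den[2] * y)

def bilinear(img, r, c, nodata=0.0):
    """Bilinear resampling of img (list of rows, at least 2x2) at (r, c)."""
    h, w = len(img), len(img[0])
    if r < 0 or c < 0 or r > h - 1 or c > w - 1:
        return nodata                      # outside the input image
    r0, c0 = min(int(r), h - 2), min(int(c), w - 2)
    fr, fc = r - r0, c - c0
    top = img[r0][c0] * (1 - fc) + img[r0][c0 + 1] * fc
    bottom = img[r0 + 1][c0] * (1 - fc) + img[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr

def correct_pixel(img, x, y, model):
    """Work done by ONE thread: map an output pixel to image (row, col)
    through the rational functions, then resample the input image."""
    r = rational(model["row_num"], model["row_den"], x, y)
    c = rational(model["col_num"], model["col_den"], x, y)
    return bilinear(img, r, c)

def correct_tile(img, out_w, out_h, model):
    # On a GPU, each (x, y) below would be a separate thread.
    return [[correct_pixel(img, x, y, model) for x in range(out_w)]
            for y in range(out_h)]
```

Because no pixel depends on any other, a tile maps directly onto a CUDA grid; in this sketch, `grid_dims(1000, 500)` gives a 63 × 32 grid of 16 × 16 blocks.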

19.

A hybrid parallel streamline generation system for heterogeneous clusters

刘俊, 高阳, 单桂华, 迟学斌 《Computer Systems & Applications》2021,30(3):60-69

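Streamlines are one of the main methods of flow field visualization. Because streamline generation for large-scale flow fields is computationally expensive, it usually requires a parallel computing environment such as a high-performance computer, combined with parallelized algorithms, to accelerate the computation. As heterogeneous computing systems become increasingly common, to make full use of the computing power of parallel heterogeneous environments and achieve more efficient parallel streamline generation, this paper adopts a technical architecture based on data-parallel primitives combined with distributed message passing, and designs a suite suitable for hetero...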
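
Because the abstract is truncated, the system's details are unknown. The sketch below only illustrates the standard building block of streamline generation: integrating a seed point through a steady velocity field with the classical fourth-order Runge-Kutta method. It assumes an analytic 2D field (real systems interpolate gridded data and stop at domain boundaries) and maps over independent seeds with a thread pool to hint at the data parallelism; names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def trace_streamline(velocity, seed, h=0.01, steps=100):
    """Trace one streamline of a steady 2D field with classical RK4."""
    x, y = seed
    points = [(x, y)]
    for _ in range(steps):
        k1 = velocity(x, y)
        k2 = velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = velocity(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        points.append((x, y))
    return points

def trace_all(velocity, seeds, h=0.01, steps=100):
    """Seeds are independent, so tracing is a data-parallel map over them."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: trace_streamline(velocity, s, h, steps), seeds))
```

In the rotating field (-y, x), for example, each traced streamline stays on the circle through its seed, which makes a convenient correctness check. In a distributed setting, the hard part is that a streamline can leave the data block owned by one process, which is where message passing comes in.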

20.

A survey on parallel and distributed multi-agent systems for high performance computing simulations

《Computer Science Review》2016

Simulation has become an indispensable tool for researchers to explore systems without resorting to real experiments. Depending on the characteristics of the modeled system, the methods used to represent it may vary. Multi-agent systems are often used to model and simulate complex systems. In any case, increasing the size and precision of the model increases the amount of computation, requiring the use of parallel systems when it becomes too large. In this paper, we focus on parallel platforms that support multi-agent simulations and their execution on high-performance resources such as parallel clusters. Our contribution is a survey of existing platforms and their evaluation in the context of high-performance computing. We present a qualitative analysis of several multi-agent platforms, their tests in high-performance computing execution environments, and the performance results for the only two platforms that fulfill the high-performance computing constraints.