Similar Literature
1.
2.
The formulation for numerically solving the two-dimensional Newcomb equation has been extended to calculate the vacuum energy integral using a vector potential method. Following this extension, the stability code MARG2D has been adapted and parallelized in order to reduce the CPU time substantially. The MARG2D code enables fast stability analysis of ideal external MHD modes from low to high toroidal mode numbers on the basis of a single physical model, and thus serves as a powerful tool both in integrated simulations where it is combined with transport codes and in the analysis of tokamak edge plasma experiments.

3.
I report on a new version of the magnetohydrodynamics code NIRVANA targeted at the study of astrophysical problems. The new version allows for distributed-memory simulations supporting adaptive mesh refinement. Numerical algorithms include dissipative terms (viscosity, Ohmic diffusion, thermal heat conduction) in conservative form. Domain decomposition is preferably block-wise for unigrid applications but adopts space-filling curve techniques for adaptive mesh applications with a hierarchical block-structured mesh. The code architecture facilitates workload balancing among processors for arbitrary mesh refinement depths, maintaining intra-level data locality via space-filling curve mappings while ensuring inter-level data locality by applying a novel technique called block sharing. It is demonstrated that comparable performance can be achieved for problems with locally highly refined grids. The data transfer between processors exploits the coarse-granularity concept of parallel computing and uses the MPI library. Conservation properties of the numerical method carry over to the parallel framework; in particular, the solenoidality condition for the magnetic field is preserved to roundoff precision by the constrained transport machinery. The focus of this paper is on implementation details related to the parallelization and on a code performance analysis.
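As a concrete illustration of the space-filling-curve mapping described above, the sketch below (the function names and the way blocks are ranked on the curve are assumptions for illustration, not taken from the NIRVANA source) computes a 2D Morton (Z-order) key for a mesh block and assigns blocks to processors by cutting the curve into contiguous chunks, which is the usual way such mappings preserve data locality while balancing workload.

```c
#include <stdint.h>

/* Interleave the bits of (ix, iy) to obtain a 2D Morton (Z-order) key.
   Blocks that are adjacent on the curve tend to be adjacent in space,
   which keeps intra-level data local after distribution. */
static uint64_t morton2d(uint32_t ix, uint32_t iy)
{
    uint64_t key = 0;
    for (int b = 0; b < 32; ++b) {
        key |= ((uint64_t)((ix >> b) & 1u)) << (2 * b);
        key |= ((uint64_t)((iy >> b) & 1u)) << (2 * b + 1);
    }
    return key;
}

/* Assign a block to a processor by splitting the curve (blocks sorted by
   Morton key, curve_rank = position in that order) into nprocs chunks. */
static int owner_of(uint64_t curve_rank, uint64_t nblocks, int nprocs)
{
    return (int)((curve_rank * (uint64_t)nprocs) / nblocks);
}
```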

4.
The filling-in theory of brightness perception has gained much attention recently owing to the success of vision models. However, the theory and its instantiations have suffered from dealing incorrectly with transitive brightness relations. This paper describes an advance in the filling-in theory that overcomes the problem. The advance is incorporated into the BCS/FCS neural network model, allowing it, for the first time, to account for all of Arend's test stimuli for assessing brightness perception models. The theory also suggests a new teleology for parallel ON- and OFF-channels.

5.
The possibilities of a programming environment that accommodates the specific features of different types of parallel computers are presented in the framework of computational structural mechanics. An extension of the development environment of the finite element code CASTEM 2000 has been realized to offer the user a global view of all objects of the parallel application. To ease the implementation of parallel applications, this system hides data transfers between processors and allows direct reuse of modules of the original sequential code. It is an object-based shared virtual memory system that supports parallelism by data distribution (for non-structured data) or by control distribution; it is therefore well suited to “mechanic” parallelism. To validate this programming environment, domain decomposition techniques well suited to parallel computation have been used.

6.
The numerical investigation of the interaction of large, solid particles with fluids is an important area of research for many manufacturing processes. Such studies frequently lead to models that are very large and require the use of parallel solution techniques. This paper presents the results of a parallel implementation of a serial code for the direct numerical simulation of solid-liquid flows. The base code is a serial, arbitrary Lagrangian-Eulerian (ALE) formulation of the equations of motion, which treats the particles as solid bodies embedded in the flow domain. This particular model poses some interesting difficulties for domain decomposition approaches to parallel solution. In particular, it is not fully understood how the partitioning of the particles among the subdomains influences the performance of parallel solvers. We present several strategies for partitioning the solid particles, focusing on their effectiveness in terms of parallel speedup and efficiency.
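For reference, the speedup and efficiency figures by which such partitioning strategies are compared are conventionally defined as in the small C helper below (a generic definition, not code from the paper; the timings in main are made-up placeholders).

```c
#include <stdio.h>

/* Strong-scaling metrics: speedup S(p) = T(1)/T(p), efficiency E(p) = S(p)/p. */
static double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

static double efficiency(double t_serial, double t_parallel, int nprocs)
{
    return speedup(t_serial, t_parallel) / nprocs;
}

int main(void)
{
    double t1 = 120.0, t16 = 9.5;   /* placeholder timings in seconds */
    printf("speedup = %.2f, efficiency = %.2f\n",
           speedup(t1, t16), efficiency(t1, t16, 16));
    return 0;
}
```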

7.
A scalable and portable Fortran code has been developed to calculate Coulomb interaction potentials of charged particles on parallel computers, based on the fast multipole method. A unique feature of the code is its ability to calculate microscopic stress tensors due to the Coulomb interactions, which is useful in constant-pressure simulations and local stress analyses. The code is applicable to various boundary conditions, including periodic boundary conditions in two and three dimensions, corresponding to slab and bulk systems, respectively. The numerical accuracy of the code is tested by comparing its results with those obtained by the Ewald summation method and by direct calculations. Scalability tests show a parallel efficiency of 0.98 for 512 million charged particles on 512 IBM SP3 processors. The timing results on the IBM SP3 are also compared with those on the IBM SP4.
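The direct calculation used as one of the accuracy references is simple enough to sketch. The C routine below is a generic O(N^2) pairwise Coulomb energy sum (the unit convention and function name are assumptions for illustration; the paper's code is Fortran and uses the fast multipole method precisely to avoid this quadratic cost).

```c
#include <math.h>
#include <stddef.h>

/* Direct pairwise Coulomb energy U = sum_{i<j} q_i q_j / r_ij, with the
   Coulomb prefactor absorbed into the charges. Serves only as an accuracy
   reference against which a fast multipole result can be checked. */
double coulomb_direct(size_t n, const double *x, const double *y,
                      const double *z, const double *q)
{
    double u = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            double dx = x[i] - x[j];
            double dy = y[i] - y[j];
            double dz = z[i] - z[j];
            u += q[i] * q[j] / sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return u;
}
```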

8.
This article proposes a programming language called “Espace” for parallel and distributed computation. In general, it is difficult to write a distributed, parallel program because of multi-threading, message passing, client management, and so on. Espace adds a few simple syntax rules to Java, so developers do not need to know in detail how to write parallel, distributed source code. This work applies Espace to parallelize an evolutionary computation program, and shows that the Espace compiler converts an evolutionary computation program written in Java into a distributed, parallel system with the addition of only a few words to the program.

9.
Glare is a consequence of light scattered within the human eye when looking at bright light sources. This effect can be exploited for tone mapping, since adding glare to the depiction of high-dynamic range (HDR) imagery on a low-dynamic range (LDR) medium can dramatically increase perceived contrast. Even though most, if not all, subjects report perceiving glare as a bright pattern that fluctuates in time, it has until now only been modeled as a static phenomenon. We argue that the temporal properties of glare are a strong means to increase perceived brightness and to produce realistic and attractive renderings of bright light sources. Based on the anatomy of the human eye, we propose a model that enables real-time simulation of dynamic glare on a GPU. This allows an improved depiction of HDR images on LDR media for interactive applications like games and feature films, or even for initially static HDR images to which movement is added. Through psychophysical studies, we validate that our method improves perceived brightness and that dynamic glare renderings are often perceived as more attractive, depending on the chosen scene.

10.
Zhao Bo, Zhao Rongcai, Xu Jinlong, Gao Wei. Computer Science, 2015, 42(1): 50-53, 58
To fully exploit the computing power of high-performance computers, relieve programmers of the burden of designing and writing parallel programs, and enlarge the set of available software, an approach was designed and implemented that uses an interactive interface to uncover the vectorizable statements in a program, optimize the vector statements in the generated code, and improve the execution efficiency of the generated code. This approach is of major significance for fully exploiting the computing power of high-performance computers, enhancing system usability, and broadening the range of applications, and it also provides effective assistance and tool support. The incremental, intelligent, backtracking vectorization code-tuning framework analyzes and transforms the serial program submitted by the user, applying serial program analysis, data dependence analysis, vectorization analysis and related techniques, and then, based on the analysis results, transforms and optimizes the program to automatically generate the final vectorized code. By analyzing the latent parallelism in the serial program and automatically transforming it into an equivalent vectorized form, the method greatly simplifies the programmer's work.
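A minimal example of the kind of loop such a framework targets is given below: once dependence analysis confirms that the iterations are independent, the serial loop can be annotated (or rewritten) so that the compiler emits vector instructions. The restrict qualifiers and the OpenMP simd pragma are standard C idioms used here only for illustration; they are not output of the tool described in the paper.

```c
/* Serial form: element-wise multiply-add with no loop-carried dependence. */
void saxpy_serial(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Vectorizable form: restrict asserts that x and y do not alias, and the
   simd pragma asks the compiler to generate SIMD code for the loop. */
void saxpy_vector(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```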

11.
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly implicit finite difference method, second-order accurate in both space and time, on equally spaced grids. The resulting system of linear algebraic equations at each time level has been solved by the Preconditioned Conjugate Gradient (PCG) method. Three parallel implementations of the model have been developed: the MPI and PETSc codes are based on a message-passing paradigm, while the OpenMP code is based on a shared address space paradigm. These three implementations are evaluated on two current high-performance parallel architectures, a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the factors most relevant to the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibit three stages characterized by asynchronous, synchronous and then asynchronous oscillations before a quiescent state is reached, and that the fast system reaches steady state in much less time than the slow variables.
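For readers unfamiliar with the solver named above, a Jacobi-preconditioned conjugate gradient iteration for a symmetric positive-definite system Ax = b looks roughly as follows. Dense row-major storage and a diagonal preconditioner are used only to keep the sketch short; the abstract does not specify the storage format or preconditioner actually used.

```c
#include <math.h>
#include <stdlib.h>

/* Jacobi-preconditioned CG for a dense SPD system A x = b (A row-major). */
void pcg(int n, const double *A, const double *b, double *x,
         int max_it, double tol)
{
    double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p), *Ap = malloc(n * sizeof *Ap);

    for (int i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; }
    for (int i = 0; i < n; ++i) { z[i] = r[i] / A[i * n + i]; p[i] = z[i]; }

    double rz = 0.0;
    for (int i = 0; i < n; ++i) rz += r[i] * z[i];

    for (int it = 0; it < max_it; ++it) {
        for (int i = 0; i < n; ++i) {                    /* Ap = A * p */
            Ap[i] = 0.0;
            for (int j = 0; j < n; ++j) Ap[i] += A[i * n + j] * p[j];
        }
        double pAp = 0.0;
        for (int i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        double alpha = rz / pAp;

        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rnorm += r[i] * r[i];
        }
        if (sqrt(rnorm) < tol) break;

        for (int i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];  /* z = M^{-1} r */
        double rz_new = 0.0;
        for (int i = 0; i < n; ++i) rz_new += r[i] * z[i];
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    free(r); free(z); free(p); free(Ap);
}
```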

12.
In this paper, we describe lazy threads, a new approach for implementing multithreaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls, with the attendant efficient stack management and direct transfer of control and data, yet a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code generation strategy that allows us to represent potential parallel work with no overhead.
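The actual mechanism relies on compiler support and specialized stack management, but the basic idea (a potentially parallel call that degrades into an ordinary sequential call when extra parallelism is not worthwhile) can be caricatured with the pthreads sketch below. The cutoff heuristic and all names are invented for illustration; real lazy threads avoid creating a thread at all unless the continuation is actually stolen.

```c
#include <pthread.h>

typedef struct { long n; long result; } task_t;

/* Plain sequential recursion: what every call degrades to by default. */
static long fib_seq(long n) { return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2); }

static void *fib_task(void *arg)
{
    task_t *t = arg;
    t->result = fib_seq(t->n);
    return NULL;
}

/* A "parallel-ready" call: spawn a thread only while outstanding
   parallelism is scarce; otherwise just make a sequential call. */
long fib_maybe_parallel(long n, int depth)
{
    if (n < 2) return n;
    if (depth >= 3)                    /* enough parallelism already */
        return fib_seq(n);

    pthread_t tid;
    task_t t = { n - 1, 0 };
    if (pthread_create(&tid, NULL, fib_task, &t) != 0)
        return fib_seq(n);             /* degrade if spawning fails */

    long right = fib_maybe_parallel(n - 2, depth + 1);
    pthread_join(tid, NULL);
    return t.result + right;
}
```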

13.
In this work, we introduce a Model Driven Development method for developing context-aware pervasive systems. This method allows us to specify a context-aware pervasive system at a high level of abstraction by means of a set of models that describe both the system functionality and the context information. From these models, an automated code generation strategy is applied. This strategy generates the Java code that provides the system functionality, as well as an OWL specification that represents the context information and allows us to manage this information without additional burden. Furthermore, this specification is used by a reasoner at runtime to infer context knowledge that is not directly observable, and it is also used by machine learning algorithms to support system adaptation according to the context information.

14.
There are many paradigms being promoted and explored for programming parallel computers, including modified sequential languages, new imperative languages and applicative languages. SISAL is an applicative language designed by a consortium of industrial and research organizations for the specification and execution of parallel programs. It allows programs to be written with little concern for the structure of the underlying machine, so the programmer is free to explore different ways of expressing the parallelism. A major problem with applicative languages has been their poor efficiency at handling large data structures; to counter this problem, SISAL includes some advanced memory management techniques for reducing the amount of data copying that occurs. In this paper we discuss the implementation of some image processing benchmarks in SISAL and C to evaluate the effectiveness of the memory management code. In general, the SISAL program was easier to code than the C version (augmented with the PARMACS macros) because we were not concerned with the parallel implementation details. We found that the SISAL performance was in general comparable to that of C, and that it could be brought in line with an efficient parallel C implementation by some programmer-specified code transformations.

15.
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message-passing communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs and interconnects, and easily allows various formulas to model the execution and communication times of particular blocks of code. A simulator engine within the MERPSYS environment simulates the execution of an application consisting of processes with various codes, to which distinct labels are assigned. The simulator runs one Java thread per label and scales computation and communication times accordingly. This approach allows fast coarse-grained simulation of large applications on large-scale systems. We have performed tests and verification of the simulator's results for three real parallel applications implemented with C/MPI and run on real HPC clusters: a master-slave code computing similarity measures of points in a multidimensional space, a geometric single-program multiple-data application computing heat distribution, and a divide-and-conquer application performing merge sort. In all cases the simulator gave results very similar to the real ones for configurations of up to 1000 processes. Furthermore, it allowed us to make predictions of execution times on configurations beyond the hardware resources available to us.
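The first of the three test applications follows the classical master-slave pattern; a bare-bones C/MPI version of that pattern (not the benchmark used in the paper; the task payload is a placeholder) is sketched below.

```c
#include <mpi.h>
#include <stdio.h>

#define NTASKS   100
#define TAG_WORK 1
#define TAG_STOP 2

/* Placeholder for the per-task computation (e.g. a similarity measure). */
static double compute(int task) { return (double)task * task; }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                        /* master: hand out tasks on demand */
        int next = 0, active = 0;
        double total = 0.0, res;
        MPI_Status st;
        for (int w = 1; w < size && next < NTASKS; ++w, ++next, ++active)
            MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        for (int w = active + 1; w < size; ++w)     /* idle slaves: stop now */
            MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
        while (active > 0) {
            MPI_Recv(&res, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            total += res;
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++next;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
        printf("sum = %f\n", total);
    } else {                                /* slave: work until told to stop */
        int task;
        MPI_Status st;
        while (1) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double r = compute(task);
            MPI_Send(&r, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```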

16.
We have developed a flexible hybrid-decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or in any combination of the two. Performance statistics for 32-, 64-, 128- and 512-Si-atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers are presented, together with a comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90.

17.
Zheng Huiqing, Lin Jiayu, Zhang Bin. Microprocessors, 2010, 31(1): 105-108, 111
Because of the special architecture of DSP devices, the C compilers on these platforms are relatively inefficient: the generated assembly code contains a large amount of redundancy, cannot fully exploit the powerful computing capability of the DSP, and does not meet practical requirements. Assembly-level optimization of C programs has therefore become a common practice in DSP software development and porting. The DM642 is one of the higher-performance fixed-point DSP chips in TI's new generation of parallel processors. Drawing on experience in optimizing the G.726 speech coding and compression algorithm on this chip, the authors discuss in detail the optimization strategies used during TMS320C64x DSP assembly optimization and give corresponding examples.

18.
Computers & Fluids, 2007, 36(5): 961-973
A two-dimensional (2D) magnetohydrodynamics (MHD) code with visualization and parallel processing capability is presented in this paper. The code utilizes a fluctuation splitting (FS) scheme that runs on structured or unstructured triangular meshes. The first FS scheme including a wave model, Model-A, was developed by Roe [Roe PL. Discrete models for the numerical analysis of time-dependent multi-dimensional gas dynamics. J Comp Phys 1986;63:458-76] for the solution of Euler's equations. The first 2D MHD wave model, MHD-A, was then developed by Balci and Aslan [Balci S. The numerical solutions of two dimensional MHD equations by fluctuation splitting scheme on triangular meshes, Ph.D. Thesis, University of Marmara, Science-Art Faculty, Physics Dept, Istanbul, Turkey; 2000; Aslan N. MHD-A: A fluctuation splitting wave model for planar magnetohydrodynamics. J Comp Phys 1999;153:437-66] to solve MHD problems including shocks and discontinuities. It was then shown in [Balci S, Aslan N. Two dimensional MHD solver by fluctuation splitting and dual time stepping. Int J Numer Meth Fluids, in press] that this code is capable of producing reliable results in compressible and nearly incompressible limits and under the effect of gravitational fields, and that it reduces identically to Roe's Model-A in the Euler limit with no sonic problems at rarefaction fans (Balci and Aslan, in press). An important feature of this code is its ability to run time-dependent or steady problems on structured or unstructured triangular meshes that can be generated automatically by the code for specified domains. To use the parallel processing capability of the code, the triangular meshes are decomposed into blocks in order to share the workload among a number of processors (here personal computers) connected by Ethernet. Due to the compact nature of the FS scheme, only one set of data transfers is required between neighboring processors. As will be shown, this results in a minimal amount of communication overhead and makes the scheme robust for parallel processing. The other important feature of the new code is its visualization capability: as the code is running, color images of scalar quantities (density, pressure, Mach number, etc.) or vector plots of vectorial quantities (velocity, magnetic field, etc.) can be followed on the screen. The extended code, called PV-MHDA, also allows following the trajectories of particles in time by means of a recently included particle-in-cell (PIC) algorithm. Because the numerical dissipation embedded in its wave model reflects real physical viscosity and resistivity, it runs accurately for compressible flows (including shocks) as well as nearly incompressible flows (e.g., the Kelvin-Helmholtz instability). The user-friendly visualization and large-scale computation capabilities of the code allow a more thorough analysis of MHD problems in two-dimensional complex domains.
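The single data transfer between neighboring processors mentioned above corresponds to the familiar halo (ghost-cell) exchange. A minimal C/MPI sketch for a 1D chain of blocks is given below; the block layout, buffer layout and function name are assumptions for illustration and are not taken from PV-MHDA.

```c
#include <mpi.h>

/* Exchange one layer of boundary values with the left and right neighbors.
   u[0] and u[nlocal+1] are ghost cells; u[1..nlocal] is locally owned data.
   MPI_Sendrecv pairs each send with a receive, so a single call per
   direction suffices and no deadlock can occur. */
void halo_exchange(double *u, int nlocal, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send first owned cell to the left, receive right ghost from the right */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                 &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    /* send last owned cell to the right, receive left ghost from the left */
    MPI_Sendrecv(&u[nlocal],     1, MPI_DOUBLE, right, 1,
                 &u[0],          1, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```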

19.
The arrival of multicore systems, along with the speed-up potential available in graphics processing units, has given us unprecedented low-cost computing power. These systems address some of the known architectural problems, but at the expense of considerably increased programming complexity. Heterogeneity, at both the architectural and programming levels, poses a great challenge to programmers. Many proposals have been put forth to facilitate the job of programmers. Leaving aside proposals based on the development of new programming languages, because of the effort this represents for the user (effort to learn and to reuse code), the remaining proposals are based on transforming sequential code into parallel code, or on transforming parallel code designed for one architecture into parallel code designed for another. A different approach relies on the use of skeletons: the programmer has available a set of parallel skeletons that form the basis for developing parallel code while programming sequentially. In this context, we propose a methodology for developing an automatic source-to-source transformation in a specific domain. This methodology is instantiated in a framework aimed at solving dynamic programming problems. Using this framework, the final user (a physician, mathematician, biologist, etc.) can express her problem as an equation in LaTeX, and the system will automatically generate the optimal parallel code for homogeneous or heterogeneous architectures. This approach allows for great portability toward these new emerging architectures and for great productivity, as evidenced by the computational results.
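To make the dynamic-programming target concrete, consider a recurrence such as the 0/1 knapsack: each row of the table depends only on the previous row, so the entries within a row can be computed in parallel. The OpenMP version below is only a generic illustration of the kind of code such a framework could emit; it is not output of the actual tool.

```c
#include <string.h>

/* 0/1 knapsack: best value achievable with each capacity 0..capacity.
   Each row depends only on the previous one, so the inner loop over
   capacities is parallelized with OpenMP. */
int knapsack(int n, const int *weight, const int *value, int capacity)
{
    int prev[capacity + 1], cur[capacity + 1];   /* C99 variable-length arrays */
    memset(prev, 0, sizeof prev);

    for (int i = 0; i < n; ++i) {
        #pragma omp parallel for
        for (int c = 0; c <= capacity; ++c) {
            int best = prev[c];
            if (c >= weight[i] && prev[c - weight[i]] + value[i] > best)
                best = prev[c - weight[i]] + value[i];
            cur[c] = best;
        }
        memcpy(prev, cur, sizeof prev);
    }
    return prev[capacity];
}
```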

20.
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while taking advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developer community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.
