
1.

Quantum computer simulation using the CUDA programming model

Eladio Gutiérrez Sergio Romero Emilio L. Zapata 《Computer Physics Communications》2010,181(2):283-85

Quantum computing emerges as a field that captures great theoretical interest. Its simulation represents a problem with high memory and computational requirements, which makes the use of parallel platforms advisable. In this work we deal with the simulation of an ideal quantum computer on the Compute Unified Device Architecture (CUDA), as such a problem can benefit from the high computational capacities of Graphics Processing Units (GPU). After all, modern GPUs are becoming very powerful computational architectures, which is causing a growing interest in their application for general-purpose computing. CUDA provides an execution model oriented towards a more general exploitation of the GPU, allowing it to be used as a massively parallel SIMT (Single-Instruction Multiple-Thread) multiprocessor. A simulator that takes into account memory reference locality issues is proposed, showing that the challenge of achieving high performance depends strongly on the explicit exploitation of the memory hierarchy. Several strategies have been experimentally evaluated, obtaining good performance results in comparison with conventional platforms.
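As a rough illustration of why memory access patterns matter in this kind of simulation, here is a minimal sketch of a state-vector update for one single-qubit gate. It assumes the common state-vector representation (2^n complex amplitudes), which the abstract does not spell out; the function name `apply_single_qubit_gate` and the constant `HADAMARD` are hypothetical, and the code is plain sequential Python rather than CUDA.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector of length 2**n.

    Amplitudes are paired at distance 2**target, so the access stride
    grows with the index of the target qubit.
    """
    stride = 1 << target
    new_state = list(state)
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset      # amplitude with target bit = 0
            i1 = i0 + stride        # amplitude with target bit = 1
            a0, a1 = state[i0], state[i1]
            new_state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state

# Hypothetical example gate: the Hadamard matrix.
_S = 1.0 / math.sqrt(2.0)
HADAMARD = [[_S, _S], [_S, -_S]]

# Two-qubit register starting in |00>, Hadamard applied to qubit 0.
state = apply_single_qubit_gate([1.0, 0.0, 0.0, 0.0], HADAMARD, 0)
```

Each pair of amplitudes is updated independently of every other pair, which is what makes the update a natural fit for many parallel threads; the growing stride is one plausible source of the locality issues the abstract mentions.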

2.

Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems

《Parallel Computing》2007,33(3):159-173

We discuss the performance of direct summation codes used in the simulation of astrophysical stellar systems on highly distributed architectures. These codes compute the gravitational interaction among stars in an exact way and have an O(N²) scaling with the number of particles. They can be applied to a variety of astrophysical problems, like the evolution of star clusters, the dynamics of black holes, the formation of planetary systems, and cosmological simulations. The simulation of realistic star clusters with sufficiently high accuracy cannot be performed on a single workstation but may be possible on parallel computers or grids. We have implemented two parallel schemes for a direct N-body code and we study their performance on general purpose parallel computers and large computational grids. We present the results of timing analyses conducted on the different architectures and compare them with the predictions from theoretical models. We conclude that the simulation of star clusters with up to a million particles will be possible on large distributed computers in the next decade. Simulating entire galaxies, however, will additionally require new hybrid methods to speed up the calculation.
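The O(N²) scaling comes from evaluating every pairwise interaction. A minimal sequential sketch of direct summation, assuming Newtonian point masses and units where G = 1, might look like this; the function name `direct_accelerations` and the `softening` parameter are illustrative choices and are not taken from the paper.

```python
def direct_accelerations(positions, masses, G=1.0, softening=0.0):
    """Gravitational acceleration on each particle by direct summation.

    The double loop visits every ordered pair (i, j), hence O(N^2) work.
    `softening` (hypothetical parameter) avoids division by zero when
    two particles coincide.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

# Two unit masses separated by distance 2 along the x axis.
example_acc = direct_accelerations(
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 1.0]
)
```

The outer loop over i has no dependencies between iterations, so it can be split across processors; what then has to be communicated is the particle positions, which is the kind of cost the timing analyses in the abstract are concerned with.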

3.

Design issues for high performance simulation

《Simulation Practice and Theory》1998,6(3):221-242

One of the key issues in designing new simulation models for parallel execution, or in the migration of existing models to parallel platforms, is the mapping of the application architecture to the parallel system architecture. In this mapping process, we can easily lose track of the inherent locality present in the different architecture layers. In this paper, we present an overview of these issues and examine, by means of several case studies, the consequences of the design and implementation choices for the various mapping processes. We will show that the potential for high performance simulation comes from a holistic approach, taking into account all aspects from the application to the underlying hardware.

4.

Simulating artificial neural networks on parallel architectures

Serbedzija N.B. 《Computer》1996,29(3):56-63

Parallelization is necessary to cope with the high computational and communication demands of neuroapplications, but general purpose parallel machines soon reach performance limitations. The article explores two approaches: parallel simulation on general purpose computers, and simulation/emulation on neurohardware. Different parallelization methods are discussed, and the most popular techniques are explained. While the software approach looks for an optimal programming model for neural processing, the hardware approach tries to imitate the neuroparadigm using the best of silicon technology.

5.

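Improved WinSock for communication within a compressor station simulation trainer

Luo Jianguo, Liu Shangming, Kou Kexin, Gao Zhe 《Computer Simulation》(计算机仿真) 2005,22(6):223-226

Large simulation systems usually consist of multiple computers, each handling one module of the system, with data shared among the computers. Data communication between multiple computers is a common problem, and Windows Socket is a common way to solve it. However, in the gas turbine and centrifugal compressor unit simulation trainer discussed in this paper, the client-side gas pipeline system is implemented with iFix, and iFix does not support Socket well. The paper therefore combines Socket and DDE technology, relaying through Excel, to solve the communication between the server and the iFix pipeline system. In addition, the Socket send and receive functions were improved to resolve the communication blocking problem inherent in Socket.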

6.

OpenMP parallelization of agent-based models

Federico Massaioli Filippo Castiglione Massimo Bernaschi 《Parallel Computing》2005,31(10-12):1066

Agent-based models, an emerging paradigm for the simulation of complex systems, appear very well suited to parallel processing. However, during the parallelization of a simulator of financial markets, we found that some features of these codes highlight non-trivial issues of the present hardware/software platforms for parallel processing. Here we present the results of a series of tests, on different platforms, of simplified codes that reproduce such problems and can be used as a starting point in the search for a possible solution.

7.

Design issues in parallel simulation languages

Rajaei H. Ayani R. 《Design & Test of Computers, IEEE》1993,10(4):52-63

The authors address several key issues in designing languages for parallel discrete-event simulation and survey the state-of-the-art techniques aimed at solving these problems. Attention is given to issues that are specific to parallel simulation, e.g., the parallel synchronization schemes, or issues that have not previously been a problem for sequential simulation, e.g., termination. Various specialized PSLs (parallel simulation languages) may also have quite different design issues. The problem of achieving transparency is addressed. In particular it is observed that a major difficulty in achieving the design criteria is the overhead introduced by the methods for solving the problems considered. In some cases making the design criteria less constrained appears to be unavoidable. The authors also propose several useful high-level language constructs to facilitate modeling in order to have the simulation system deal with the low-level details transparently. They show that extending the capability of an existing programming language is the simplest available technique for designing a PSL.

8.

Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code

David W. Walker 《Concurrency and Computation》1990,2(4):257-288

The efficient implementation of particle-in-cell (PIC) plasma simulation codes on distributed memory concurrent computers is made difficult by the conflicting aims of balancing the computational load and minimizing interprocessor communication. This paper surveys previous work on PIC plasma simulation codes on advanced architecture computers, identifies the main issues in parallelizing such codes, and discusses different decomposition schemes. For MIMD concurrent computers the adaptive Eulerian (AE) decomposition scheme is attractive because it seeks to maintain approximate load balance dynamically while avoiding excessive non-local communication. The load balance and communication characteristics of a large-scale AE/PIC code on the hypercube are investigated by simulating the behavior of the parallel code on a Cray-2. The results show that for the three-dimensional problems studied, in which the particle distribution is highly inhomogeneous, the communication of data between the particle and mesh decompositions dominates the communication overhead. Performing the load balancing may also make a significant contribution to the concurrent overhead if the grain size is sufficiently small. The advantages of the simulation approach in investigating the behavior of concurrent large-scale applications are discussed, together with portability and software engineering issues.

9.

The programming model of ASSIST, an environment for parallel and distributed portable applications

Marco Vanneschi 《Parallel Computing》2002,28(12):595-1732

A software development system based upon integrated skeleton technology (ASSIST) is a proposal of a new programming environment oriented to the development of parallel and distributed high-performance applications according to a unified approach. The main goals are: high-level programmability and software productivity for complex multidisciplinary applications, including data-intensive and interactive software; performance portability across different platforms, in particular large-scale platforms and grids; effective reuse of parallel software; efficient evolution of applications through versions that scale according to the underlying technologies.

The purpose of this paper is to show the principles of the proposed approach in terms of the programming model (successive papers will deal with the environment implementation and with performance evaluation). The features and the characteristics of the ASSIST programming model are described according to an operational semantics style and using examples to drive the presentation, to show the expressive power and to discuss the research issues.

According to our previous experience in structured parallel programming, in ASSIST we wish to overcome some limitations of the classical skeletons approach to improve generality and flexibility, expressive power and efficiency for irregular, dynamic and interactive applications, as well as for complex combinations of task and data parallelism. A new paradigm, called “parallel module” (parmod), is defined which, in addition to expressing the semantics of several skeletons as particular cases, is able to express more general parallel and distributed program structures, including both data-flow and nondeterministic reactive computations. ASSIST allows the programmer to design the applications in the form of generic graphs of parallel components. Another distinguishing feature is that ASSIST modules are able to utilize external objects, including shared data structures and abstract objects (e.g. CORBA), with standard interfacing mechanisms. In turn, an ASSIST application can be reused and exported as a component for other applications, possibly expressed in different formalisms.

10.

Integrating task parallelism in data parallel languages for parallel programming on NOWs

K. J. Binu D. Janaki Ram 《Concurrency and Computation》2000,12(13):1291-1315

A number of high‐level parallel programming platforms for networks of workstations (NOWs) have been developed in recent times. Most of these platforms target the exploitation of data parallelism in applications. They do not allow expressibility of applications as a collection of tasks along with their precedence relationships. As a result, the control or task parallelism in an application cannot be expressed or exploited. The current work aims at integrating the notion of task parallelism and precedence relationships among constituting tasks to such high‐level data parallel platforms for NOWs. Our model of integration provides for arbitrary nesting of data and task parallel modules. Also, the precedence relationships are clearly reflected from the program structure. The model relieves the programmer from the need to design applications for non‐determinism in the order of completion of constituting tasks. The design of the runtime support as well as system‐level book keeping is discussed. The model is general enough to be applied to a wide range of data parallel platforms. A specific case of integrating the model into anonymous remote computing (ARC), a data parallel programming platform, is presented. The performance related aspects are also discussed. Copyright © 2000 John Wiley & Sons, Ltd.

11.

Concepts and implementation of parallel finite element analysis

K. N. Chiang R. E. Fulton 《Computers & Structures》1990,36(6):1039-1046

The design of complex engineering systems such as advanced aircraft structures and offshore platforms requires continually increasing levels of detail in supporting analysis. The finite element method is widely used as a computational method with which to model physical systems in various engineering problems. For detailed analyses of complex designs, structural models composed of several thousands of degrees of freedom are no longer uncommon. Such design activities require large order finite element and/or finite difference models and excessive computation demands in both calculation speed and information management. The computer simulation of the nonlinear dynamic response of structures and the implementation of parallel FEM systems on a high speed multiprocessor have received considerable attention in recent years. The driving forces of these activities included the reliable simulation of automotive and aircraft crash phenomena, and the increased performance of computers. Most existing major structural analysis software systems were designed 10–20 years ago and have been optimized for current sequential computers. Such systems often are not well structured to take maximum advantage of the recent and continuing revolution in parallel vector computing capabilities. These parallel vector computer architectures not only occur in the form of large supercomputers, but are now also occurring for minicomputers and even engineering workstations. To benefit from advances in parallel computers, software must be developed which takes maximum advantage of the parallel processing feature.

12.

Performance‐steered design of software architectures for embedded multicore systems

Alessio Bechini Cosimo Antonio Prete 《Software》2002,32(12):1155-1173


13.

Large-scale homo- and heterogeneous parallel paradigm design based on CFD application PHengLEI

YunBo Wan Zhong Zhao Jie Liu Laiping Zhang Yong Zhang Jianqiang Chen 《Concurrency and Computation》2024,36(5):e7933

The development of computational fluid dynamics (CFD) highly depends on high-performance computers. Computer hardware has evolved rapidly, yet scalable CFD parallel software remains scarce. In this article, we design a highly scalable CFD parallel paradigm for both homogeneous and heterogeneous supercomputers. The paradigm achieves the separation of communication and computation and automatically adapts to various solvers and hardware environments, thus reducing programming difficulties and increasing automatic parallelization. Meanwhile, the number of communications is greatly reduced and the scalability of the program is improved through implementing centralized communication and two-level partitioning techniques. Complex flow problems for real aircraft were then computed on different hardware platforms with a grid size of ten billion. The homogeneous computer hardware includes Intel Xeon Gold 6258R and Phytium 2000+ processors, and the heterogeneous computer platforms include NVIDIA Tesla V100 and SW26010 processors. High parallel efficiency was obtained on all computer platforms, verifying that the paradigm has good automatic parallelization, scalability, and stability. The paradigm in this article has an important reference value for CFD massively parallel computing and can promote the development and application of CFD technology.

14.

Parallel models of computation: an introductory survey

M. Leoncini 《Calcolo》1989,26(2-4):209-236

The paper gives an overview of some models of computation which have proved successful in laying a foundation for a general theory of parallel computation. We present three models of parallel computation, namely boolean and arithmetic circuit families, and Parallel Random Access Machines. They represent different viewpoints on parallel computing: boolean circuit families are useful for in-depth theoretical studies on the power and limitations of parallel computers; Parallel Random Access Machines are the most general vehicles for designing highly parallel algorithms; arithmetic circuit families are an important tool for undertaking studies related to one of the most active areas in parallel computing, i.e. parallel algebraic complexity.

15.

Legacy code and parallel computing: updating and parallelizing a numerical model

Tinetti Fernando G. Perez Maximiliano J. Fraidenraich Ariel Altenberg Adolfo E. 《The Journal of supercomputing》2020,76(7):5636-5654

In this paper, we present several important details in the process of legacy code parallelization, mostly related to the problem of maintaining numerical output of a legacy code while obtaining a balanced workload for parallel processing. Since we maintained the non-uniform mesh imposed by the original finite element code, we have to develop a specially designed data distribution among processors so that data restrictions are met in the finite element method. In particular, we introduce a data distribution method that is initially used in shared memory parallel processing and obtain better performance than the previous parallel program version. Besides, this method can be extended to other parallel platforms such as distributed memory parallel computers. We present results including several problems related to performance profiling on different (development and production) parallel platforms. The use of new and old parallel computing architectures leads to different behavior of the same code, which in all cases provides better performance in multiprocessor hardware.


16.

A simple method to solve the forward displacement analysis of the general six-legged parallel manipulator

Jaime Gallardo-Alvarado 《Robotics and Computer》2014

In this work a simple method to solve the forward displacement analysis of the general 6-6 fully parallel manipulator is applied. The method is based on generating closure equations upon the unknown coordinates of three points embedded in the moving platform. The method is easy to follow and applies to both planar and three-dimensional moving platforms. Numerical examples are included to show the application of the method.

17.

Implementing an ODE code on distributed memory computers

K. Burrage B. Pohl 《Computers & Mathematics with Applications》1994,28(10-12)

In this paper, it is shown how to adapt an existing package (VODE) for solving systems of ordinary differential equations on serial computers to distributed memory parallel computers. The approach taken is based on waveform relaxation in which the problem is decomposed into a sequence of subproblems which are then solved independently using VODE on each processor. Communication between subtasks is provided by a generic software environment p4. This approach allows the development of general purpose parallel software for ODEs which is both reliable and portable.
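To make the waveform relaxation idea concrete, the sketch below applies a Jacobi-style iteration to a small coupled linear system, x' = -2x + y and y' = x - 2y. Everything here is an assumption for illustration: the test system, the function names, and above all the use of forward Euler as the subproblem solver in place of VODE. The two subproblem loops run one after the other here, whereas the paper solves them on separate processors.

```python
def waveform_relaxation(x0, y0, h, steps, sweeps):
    """Jacobi waveform relaxation for x' = -2x + y, y' = x - 2y.

    In each sweep every subproblem is integrated over the whole time
    window using the other component's waveform from the previous sweep,
    so the two integrations are independent of each other.
    """
    x = [x0] * (steps + 1)   # initial guess: constant waveforms
    y = [y0] * (steps + 1)
    for _ in range(sweeps):
        x_new, y_new = [x0], [y0]
        for n in range(steps):   # subproblem 1, uses old y waveform
            x_new.append(x_new[n] + h * (-2.0 * x_new[n] + y[n]))
        for n in range(steps):   # subproblem 2, uses old x waveform
            y_new.append(y_new[n] + h * (x[n] - 2.0 * y_new[n]))
        x, y = x_new, y_new      # exchange waveforms between sweeps
    return x, y

def coupled_euler(x0, y0, h, steps):
    """Reference: forward Euler applied to the full coupled system."""
    x, y = [x0], [y0]
    for n in range(steps):
        x.append(x[n] + h * (-2.0 * x[n] + y[n]))
        y.append(y[n] + h * (x[n] - 2.0 * y[n]))
    return x, y

wr_x, wr_y = waveform_relaxation(1.0, 0.0, 0.05, 20, 25)
ref_x, ref_y = coupled_euler(1.0, 0.0, 0.05, 20)
```

With this explicit discretization each sweep makes at least one more time step agree with the coupled solution, so 25 sweeps over 20 steps reproduce the reference. The only data exchanged between sweeps are the waveforms themselves, which is the role the abstract assigns to the p4 communication layer.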

18.

Hidden Surface Elimination on Parallel Processors

Julian C. Highfield Helmut E. Bez 《Computer Graphics Forum》1992,11(5):293-307


19.

Modelling parallel databases with process algebra

C. S. Pua M. H. Williams D. H. Marwick 《Parallel Computing》2000,26(13-14)

With the current interest in using parallel computers as database servers to provide a scaleable parallel application which satisfies a real commercial need, there is a corresponding interest in performance prediction of parallel database systems. Both analytical and simulation approaches have been used and reported in the literature. This paper reports on an investigation into how a stochastic extension to classical process algebra (performance evaluation process algebra, PEPA) may be used for this purpose. This paradigm has a small but powerful set of elements which offers great flexibility for performance modelling. The paper describes how the approach has been adapted to handle database models, including the development of a technique, the decompositional approach, to handle the state-space explosion of parallel database models. It concludes with a comparison between the results obtained using this approach and those obtained using a different analytical approach.

20.

An efficient method for inverse dynamics of kinematically defective parallel platforms

Jianfeng Li Jinsong Wang Xinjun Liu 《Journal of Field Robotics》2002,19(2):45-61

In addition to the general six‐degree‐of‐freedom parallel platforms, parallel platforms with fewer than six DOF can also be used in the structural design of robotic manipulators. The common property of these parallel platforms is that the motion parameters used to describe the position and orientation of the movable platform are six, but fewer than six are independent. In their general configurations, arbitrary six‐dimensional motion of the platform cannot be achieved by the actuators mounted on the legs, therefore they are kinematically defective. Because of this defect, the inverse dynamic analysis method, which is applicable to the general six‐DOF parallel platforms, cannot be directly used for the kinematically defective parallel platforms (KDPPs). In this paper, an effective method for formulating the inverse dynamics of KDPPs is presented. Using the proposed method, three different KDPPs are studied and their inverse dynamic formulas are derived. © 2002 John Wiley & Sons, Inc.