期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

General meta-model based co-simulations applied to mechanical systems

Alexander Siemers Dag Fritzson Iakov Nakhimovski 《Simulation Modelling Practice and Theory》2009,17(4):612-624

A fully functional meta-model co-simulation environment that supports integration of many different simulation tool specific models into a co-simulation is described in this paper.The continuously increasing performance of modern computer systems has a large influence on simulation technologies. It results in more and more detailed simulation models. Different simulation models typically focus on different parts (sub-systems) of the complete system, e.g., the gearbox of a car, the driveline, or even a single bearing inside the gearbox. To fully understand the complete system it is necessary to investigate several or all parts simultaneously. This is especially true for transient (dynamic) simulation models with several interconnected parts. One solution for a more complete and accurate system analysis is to couple different simulation models into one coherent simulation, also called a co-simulation. This also allows existing simulation models to be reused and preserves the investment in these models.Existing co-simulation applications are often capable of interconnecting two specific simulators where a unique interface between these tools is defined. However, a more general solution is needed to make co-simulation modelling applicable for a wider range of tools. Any such solution must also be numerically stable and easy to use in order to be functional for a larger group of people.The presented approach for mechanical system co-simulations is based upon a general framework for co-simulation and meta-modelling [9]. Several tool specific simulation models can be integrated and connected by means of a meta-model. A platform independent, centralised, meta-model simulator is presented that executes and monitors the co-simulation. All simulation tools that participate in the co-simulation implement a single, well defined, external interface that is based on a numerically stable method for force/moment interaction. 相似文献

2.

Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs

V. Galiano O. López M. P. Malumbres H. Migallón 《The Journal of supercomputing》2013,64(1):4-16

In this work, we analyze the behavior of several parallel algorithms developed to compute the two-dimensional discrete wavelet transform using both OpenMP over a multicore platform and CUDA over a GPU. The proposed parallel algorithms are based on both regular filter-bank convolution and lifting transform with small implementations changes focused on both the memory requirements reduction and the complexity reduction. We compare our implementations against sequential CPU algorithms and other recently proposed algorithms like the SMDWT algorithm over different CPUs and the Wippig&Klauer algorithm over a GTX280 GPU. Finally, we analyze their behavior when algorithms are adapted to each architecture. Significant execution times improvements are achieved on both multicore platforms and GPUs. Depending on the multicore platform used, we achieve speed-ups of 1.9 and 3.4 using two and four processes, respectively, when compared to the sequential CPU algorithm, or we obtain speed-ups of 7.1 and 8.9 using eight and ten processes. Regarding GPUs, the GPU convolution algorithm using the GPU shared memory obtains speed-ups up to 20 when compared to the CPU sequential algorithm. 相似文献

3.

Parsec: a parallel simulation environment for complex systems

Bagrodia R. Meyer R. Takai M. Yu-An Chen Xiang Zeng Martin J. Ha Yoon Song 《Computer》1998,31(10):77-85

Design and development costs for extremely large systems could be significantly reduced if only there were efficient techniques for evaluating design alternatives and predicting their impact on overall system performance metrics. Due to the systems' analytical intractability, simulation is the most common performance evaluation technique for such systems. However, the long execution times needed for sequential simulation models often hampers evaluation. The slow speeds of sequential model execution have led to growing interest in the use of parallel execution for simulating large-scale systems. Widespread use of parallel simulation, however; has been significantly hindered by a lack of tools for integrating parallel model execution into the overall framework of system simulation. Another drawback to widespread use of simulations is the cost of model design and maintenance. The simulation environment the authors developed at UCLA attempts to address some of these issues. It consists of three primary components: a parallel simulation language called Parsec (parallel simulation environment for complex systems), its GUI, called Pave, and the portable runtime system that implements the simulation algorithms 相似文献

4.

The simulation model partitioning problem: An adaptive solution based on self-Clustering

《Simulation Modelling Practice and Theory》2017

This paper is about partitioning in parallel and distributed simulation. That means decomposing the simulation model into a number of components and to properly allocate them on the execution units. An adaptive solution based on self-clustering, that considers both communication reduction and computational load-balancing, is proposed. The implementation of the proposed mechanism is tested using a simulation model that is challenging both in terms of structure and dynamicity. Various configurations of the simulation model and the execution environment have been considered. The obtained performance results are analyzed using a reference cost model. The results demonstrate that the proposed approach is promising and that it can reduce the simulation execution time in both parallel and distributed architectures. 相似文献

5.

Performance evaluation of fork and join synchronization primitives 总被引：1，自引：0，他引：1

Andrzej Duda Tadeusz Czachórski 《Acta Informatica》1987,24(5):525-553

Summary The paper presents a performance model of fork and join synchronization primitives. The primitives are used in parallel programs executed on distributed systems. Three variants of the execution of parallel programs with fork and join primitives are considered and queueing models are proposed to evaluate their performance on a finite number of processors. Synchronization delays incurred by the programs are represented by a state-dependent server with service rate depending on a particular synchronization scheme. Closed form results are presented for the two processor case and a numerical method is proposed for many processors. Fork-join queueing networks having more complex structure i.e., processors arranged in series and in parallel, are also analyzed in the same manner. The networks can model the execution of jobs with a general task precedence graph corresponding to a nested structure of the fork-join primitives. Some performance indices of the parallel execution of programs are studied. The results show that the speedup which can be obtained theoretically in a parallel system may be decreased significantly by synchronization constraints.This research we carried out while the author was visiting ISEM, Université de Paris-Sud, France 相似文献

6.

A parallel multigrid solver for incompressible flows on computing architectures with accelerators

Vassilios G. Mandikas Emmanuel N. Mathioudakis 《The Journal of supercomputing》2017,73(11):4931-4956

An efficient parallel multigrid pressure correction algorithm is proposed for the solution of the incompressible Navier–Stokes equations on computing architectures with acceleration devices. The pressure correction procedure is based on the numerical solution of a Poisson-type problem, which is discretized using a fourth-order finite difference compact scheme. Since this is the most time-consuming part of the solver, we propose a parallel pressure correction algorithm using an iterative method based on a block cyclic reduction solution method combined with a multigrid technique. The grid points are numbered with respect to the red–black ordering scheme for the parallel Gauss–Seidel smoother. These parallelization techniques allow the execution of the entire simulation computations on the acceleration device, minimizing memory communication costs. The realization is developed using the OpenACC API, and the numerical method is demonstrated for the solution of two classical incompressible flow test problems. The first is the two-dimensional lid-driven cavity problem over equal mesh sizes while the other is the Stokes boundary layer, which is a decent benchmark problem for unequal mesh spacing. The effect of several multigrid components on modern and legacy acceleration architectures is examined. Eventually the performance investigation demonstrates that the proposed parallel multigrid solver achieves an acceleration of more than 10\(\times \) over the sequential solver and more than 4\(\times \) over multi-core CPU only realizations for all tested accelerators. 相似文献

7.

基于Modelica的多领域建模与联合仿真

赵建军吴紫俊《计算机辅助工程》2011,20(1):168-172

为实现多领域建模仿真环境与其他仿真环境的联合仿真,提出基于Modelica多领域建模的联合仿真方案.该方案基于Modelica多领域模型的连接机制,通过Modelica模型与Simulink模块的转换机理,实现在S-Function联合仿真框架下的联合仿真.基于Modelica的多领域物理系统建模仿真工具MWorks与... 相似文献

8.

Parallel adaptive mesh generation

《Computing Systems in Engineering》1991,2(1):75-101

Unstructured meshes have proved to be a powerful tool for adaptive remeshing of finite element idealizations. This paper presents a transputer-based parallel algorithm for two dimensional unstructured mesh generation. A conventional mesh generation algorithm for unstructured meshes is reviewed by the authors, and some program modules of sequential C source code are given. The concept of adaptivity in the finite element method is discussed to establish the connection between unstructured mesh generation and adaptive remeshing.After these primary concepts of unstructured mesh generation and adaptivity have been presented, the scope of the paper is widened to include parallel processing for un-structured mesh generation. The hardware and software used is described and the parallel algorithms are discussed. The Parallel C environment for processor farming is described with reference to the mesh generation problem. The existence of inherent parallelism within the sequential algorithm is identified and a parallel scheme for unstructured mesh generation is formulated. The key parts of the source code for the parallel mesh generation algorithm are given and discussed. Numerical examples giving run times and the consequent “speed-ups” for the parallel code when executed on various numbers of transputers are given. Comparisons between sequential and parallel codes are also given. The “speed-ups” achieved when compared with the sequential code are significant. The “speed-ups” achieved when networking further transputers is not always sustained. It is demonstrated that the consequent “speed-up” depends on parameters relating to the size of the problem. 相似文献

9.

Component-Based Derivation of a Parallel Stiff ODE Solver Implemented in a Cluster of Computers

Mantas Ruiz Jose M. Lopera Julio Ortega Carrillo de la Plata Jose A. 《International journal of parallel programming》2002,30(2):99-148

A component-based methodological approach to derive distributed implementations of parallel ODE solvers is proposed. The proposal is based on the incorporation of explicit constructs for performance polymorphism into a methodology to derive group parallel programs of numerical methods from SPMD modules. These constructs enable the structuring of the derivation process into clearly defined steps, each one associated with a different type of optimization. The approach makes possible to obtain a flexible tuning of a parallel ODE solver for several execution contexts and applications. Following this methodological approach, a relevant parallel numerical scheme for solving stiff ODES has been optimized and implemented on a PC cluster. This numerical scheme is obtained from a Radau IIA Implicit Runge–Kutta method and exhibits a high degree of potential parallelism. Several numerical experiments have been performed by using several test problems with different structural characteristics. These experiments show satisfactory speedup results. 相似文献

10.

Parallel distributed seismic migration

G. S. Almasi D. Hale T. McLuckie J. Bell A. Gordon 《Concurrency and Computation》1993,5(2):105-131

We report significant speed-up for seismic migration running in parallel on networkconnected IBM RISC/6000 workstations, A sustained performance of 15 MFLOP is obtained on a single-entry-level model 320, and speed-ups as high as 5 are obtained for six workstations connected by Ethernet or token ring. Our parallel software uses remote procedure calls provided by NCS (Network Computing System). We have run over a dozen workstations in parallel, but speed-ups become limited by network data rate. Fiber-optic communication should allow much greater speed-ups, and we describe our preliminary results with the fiberoptic serial link adapter of the RISC/6000. We also present a simple theoretical model that agrees well with our measurements and allows speed-up to be predicted from a knowledge of the ratio of computation to communication, which can be determined empirically before the program is parallellzed. We conclude with a brief discussion of alternative software approaches and programming models for network-connected parallel systems. In particular, our program was recently ported to PVM and Linda, and preliminary measurements yield speed-ups very close to those described here. 相似文献

11.

Multi-Level Parallelism for the Cardiac Bidomain Equations

Carolina Ribeiro Xavier Rafael Sachetto Oliveira Vinicius da Fonseca Vieira Rodrigo Weber dos Santos Wagner MeiraJr. 《International journal of parallel programming》2009,37(6):572-592

Cardiovascular diseases are associated with high mortality rates in the globe. The development of new drugs, new medical equipment and non-invasive techniques for the heart demand multidisciplinary efforts towards the characterization of cardiac anatomy and function from the molecular to the organ level. Computational modeling has demonstrated to be a useful tool for the investigation and comprehension of the complex biophysical processes that underlie cardiac function. The set of Bidomain equations is currently one of the most complete mathematical models for simulating the electrical activity in cardiac tissue. Unfortunately, large scale simulations, such as those resulting from the discretization of an entire heart, remain a computational challenge. In order to reduce simulation execution times, parallel implementations have traditionally exploited data parallelism via numerical schemes based on domain-decomposition. However, it has been verified that the parallel efficiency of these implementations severely degrades as the number of processors increases. In this work we propose and implement a new parallel algorithm for the solution of cardiac models. By relaxing the coherence of the execution, a new level of parallelism could be identified and exploited: pipelining. A synchronous parallel algorithm that uses both pipelining and data decomposition techniques was implemented and used the MPI library for communication. Numerical tests were performed in two different cluster configurations. Our preliminary results indicated that the proposed algorithm is able to increase the parallel efficiency up to 20% on an 8-core cluster. On a 32-core cluster the multi-level algorithm was 1.7 times faster than the traditional domain decomposition algorithm. In addition, the numerical precision was kept under control (relative errors under 6%) when the relaxed coherence execution was adopted. 相似文献

12.

Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

Tong Zhou Jingfei Jiang 《The Journal of supercomputing》2016,72(11):4181-4203

Solving large-scale sparse linear systems over GF(2) plays a key role in fluid mechanics, simulation and design of materials, petroleum seismic data processing, numerical weather prediction, computational electromagnetics, and numerical simulation of unclear explosions. Therefore, developing algorithms for this issue is a significant research topic. In this paper, we proposed a hyper-scale custom supercomputer architecture that matches specific data features to process the key procedure of block Wiedemann algorithm and its parallel algorithm on the custom machine. To increase the computation, communication, and storage performance, four optimization strategies are proposed. This paper builds a performance model to evaluate the execution performance and power consumption for our custom machine. The model shows that the optimization strategies result in a considerable speedup, even three times faster than the fastest supercomputer, TH2, while consuming less power. 相似文献

13.

MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

《Simulation Modelling Practice and Theory》2017

In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects and easily allows various formulas to model execution and communication times of particular blocks of code. A simulator engine within the MERPSYS environment simulates execution of the application that consists of processes with various codes, to which distinct labels are assigned. The simulator runs one Java thread per label and scales computations and communication times adequately. This approach allows fast coarse-grained simulation of large applications on large-scale systems. We have performed tests and verification of results from the simulator for three real parallel applications implemented with C/MPI and run on real HPC clusters: a master-slave code computing similarity measures of points in a multidimensional space, a geometric single program multiple data parallel application with heat distribution and a divide-and-conquer application performing merge sort. In all cases the simulator gave results very similar to the real ones on configurations tested up to 1000 processes. Furthermore, it allowed us to make predictions of execution times on configurations beyond the hardware resources available to us. 相似文献

14.

无线网络环境下控制系统的仿真研究

彭丽萍彭晨岳东《工业控制计算机》2007,20(1):43-45

由于无线网络控制系统的复杂化,在分析和设计的过程中,联合仿真是研究无线网络控制系统必须解决的关键问题之一.着重介绍一种基于MATLAB/SIMULINK的联合仿真工具--TRUETIME工具箱.首先介绍工具箱的组成,然后利用该工具箱,以直流电机为对象,建立一个无线网络控制系统仿真模型,最后数值仿真说明该方法的有效性. 相似文献

15.

SCAC: Weakly-coupled execution model for massively parallel systems

《Microprocessors and Microsystems》2019

This work proposes an execution model for massively parallel systems aiming at ensuring the communications overlap by the computations. This model is named SCAC : Synchronous Communication Asynchronous Computation. This weakly-coupled model separates the execution of communication phases from those of computation in order to facilitate their overlapping, thus covering the data transfer time. To allow the simultaneous execution of these phases, we propose an approach based on three levels : two globally-centralized/locally-distributed hierarchical control levels and a parallel computation level. A generic and parametric implementation of the SCAC model was performed to fit different applications. This implementation allows the designer to choose the system components (from pre-designed ones) and to set its parameters in order to build the adequate SCAC configuration for the target application. An analytical estimation is proposed to predict the execution time of an application running in SCAC mode, in order to facilitate the parallel program design and the SCAC architecture configuration. The SCAC model was validated by simulation, synthesis and implementation on an FPGA platform, with different examples of parallel computing applications. The comparison of the results obtained by the SCAC model with other models has shown its effectiveness in terms of flexibility and speed-up. 相似文献

16.

MWorks与FLUENT的协同仿真方法和实现

刘炜陈立平丁建完《计算机辅助工程》2016,25(6):68-72

Modelica技术难以仿真动态特性在空间上非定常分布的流动、传热和燃烧等复杂问题,因此提出一种基于MWorks与FLUENT的协同仿真方法,设计协同仿真的耦合方式、数据交换机制和仿真架构,利用Modelica外部函数、MWorks仿真用户接口和FLUENT UDF编写协同仿真程序.以某单向阀为研究对象对此协同仿真方法的有效性进行验证.所提出的方法可实现Modelica技术与CFD技术的优势互补,能够为多领域系统仿真提供新的解决方法. 相似文献

17.

Modeling message-passing programs with a Performance Evaluating Virtual Parallel Machine

D.A. P.D. 《Performance Evaluation》2005,60(1-4):165-187

We present a new performance modeling system for message-passing parallel programs that is based around a Performance Evaluating Virtual Parallel Machine (PEVPM). We explain how to develop PEVPM models for message-passing programs using a performance directive language that describes a program’s serial segments of computation and message-passing events. This is a novel bottom-up approach to performance modeling, which aims to accurately model when processing and message-passing occur during program execution. The times at which these events occur are dynamic, because they are affected by network contention and data dependencies, so we use a virtual machine to simulate program execution. This simulation is done by executing models of the PEVPM performance directives rather than executing the code itself, so it is very fast. The simulation is still very accurate because enough information is stored by the PEVPM to dynamically create detailed models of processing and communication events. Another novel feature of our approach is that the communication times are sampled from probability distributions that describe the performance variability exhibited by communication subject to contention. These performance distributions can be empirically measured using a highly accurate message-passing benchmark that we have developed. This approach provides a Monte Carlo analysis that can give very accurate results for the average and the variance (or even the probability distribution) of program execution time. In this paper, we introduce the ideas underpinning the PEVPM technique, describe the syntax of the performance modeling language and the virtual machine that supports it, and present some results, for example, parallel programs to show the power and accuracy of the methodology. 相似文献

18.

Implementation of the subspace iteration method for eigenproblems on a symmetric multiprocessor system

《Computing Systems in Engineering》1993,4(1):87-98

The popular subspace iteration method for eigenproblems in science and engineering is reviewed briefly. Its suitability for execution on a parallel computer is then discussed. The algorithm was programmed for a shared memory symmetric multiprocessor system. The implementation of the algorithm on the parallel system is described. Timings, speed-ups and efficiencies for the parallel version of the program are given. It is concluded that this is a highly parallelizable algorithm, and high efficiency, in terms of processor utilization, was predicted and achieved. 相似文献

19.

Hierarchical parallel approach for GSM mobile network design

《Journal of Parallel and Distributed Computing》2006,66(2):274-290

Cellular network design is a major issue in second generation GSM mobile telecommunication systems. In this paper, a new model of the problem in its full practical complexity, based on multiobjective constrained combinatorial optimization, has been used. We propose an evolutionary algorithm that aims at approximating the Pareto frontier of the problem, which removes the need for a cellular network designer to rank or weight objectives a priori. Specific coding scheme and genetic operators have been designed. Advanced intensification and diversification search techniques, such as elitism and adaptive sharing, have been used.Three complementary hierarchical parallel models have been designed to improve the solution quality and robustness, to speed-up the search and to solve large instances of the problem. The obtained Pareto fronts and speed-ups on different parallel architectures show the efficiency and the scalability of the parallel model. Performance evaluation of the algorithm has been carried out on different realistic benchmarks. The obtained results show the impact of the proposed parallel models and the introduced search mechanisms. 相似文献

20.

Truncated Integration for Simultaneous Simulation of Sintering Using a Separated Representation

B. Sarbandi S. Cartel J. Besson D. Ryckelynck 《Archives of Computational Methods in Engineering》2010,17(4):455-463

Recent developments of multidimensional solvers using separated representation make it possible to account for the multidimensionality of mechanical models in materials science when doing numerical simulations. This paper aims to extend the separated representation to inseparable equations using an efficient integration scheme. It focuses on the dependence of constitutive equations on material coefficients. Although these coefficients can be optimized using few experimental results, they are not very well known because of the natural variability of material properties. Therefore, the mechanical state can be viewed as a function depending not only on time and space variables but also on material coefficients. This is illustrated in this paper by a sensitivity analysis of the response of a sintering model with respect to variations of material coefficients. The considered variations are defined around an optimized value of coefficients adjusted by experimental results. The proposed method is an incremental method using an extension of the integration scheme developed for the Hyper Reduction method. During the incremental solution, before the adaptation of the representation, an assumed separation representation is used as a reduced-order model. We claim that a truncated integration scheme enables to forecast the reduced-state variables related to the assumed separated representation. The fact that the integrals involved in the formulation can not be written as a sum of products of one-dimensional integrals, this approach reduces the extent of the integration domain. 相似文献