Similar Documents
1.
This paper proposes two viable computing strategies for distributed parallel systems: domain division with sub-domain overlapping, and asynchronous communication. We have implemented a parallel computing procedure for simulating the growth of Ti thin films in a system of 1000 × 1000 atoms by means of the Monte Carlo (MC) method. This approach greatly reduces the computation time needed to simulate large-scale thin film growth at realistic deposition rates. The multi-lattice MC model of deposition comprises two basic events: deposition and surface diffusion. Since diffusion constitutes more than 90% of the total simulation time of the whole deposition process at high temperature, we concentrated on implementing a new parallel diffusion simulation that reduces communication time during simulation. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. The parallel algorithms we propose can simulate the thin film growth process efficiently.
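The two event types named above, deposition and surface diffusion, lend themselves to a compact kinetic Monte Carlo loop. Below is a minimal serial sketch in Python, assuming a solid-on-solid height model, made-up event rates, and downhill-only hops; the paper's domain decomposition and asynchronous communication layers are deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 64                               # lattice size (the paper uses 1000 x 1000)
height = np.zeros((L, L), dtype=int)            # solid-on-solid height field
dep_rate, diff_rate = 1.0, 10.0                 # made-up per-site event rates
moves = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])

def kmc_step():
    """Select deposition or diffusion in proportion to its total rate."""
    if rng.random() < dep_rate / (dep_rate + diff_rate):
        i, j = rng.integers(L, size=2)          # deposition: drop an atom anywhere
        height[i, j] += 1
    else:
        i, j = rng.integers(L, size=2)          # diffusion: try a hop to a neighbour
        di, dj = moves[rng.integers(4)]
        ni, nj = (i + di) % L, (j + dj) % L     # periodic boundaries
        if height[ni, nj] < height[i, j]:       # downhill hops only (simplified)
            height[i, j] -= 1
            height[ni, nj] += 1

for _ in range(200_000):
    kmc_step()
print("mean height:", height.mean(), "roughness:", height.std())
```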

2.
Metal organic chemical vapor deposition (MOCVD) is a very efficient technology for uniformly growing multi-chip, multilayer, large-area thin films. The kinetic Monte Carlo (KMC) method is an important research tool for the dynamic simulation of atomic-scale thin film growth. Based on the KMC method, this paper proposes an algorithm for simulating the growth of GaInP thin films by MOCVD. A KMC simulation and a visual emulation of GaInP thin film growth in an MOCVD reactor are realized. The simulation and visualization results display the growth process of GaInP thin films in the MOCVD reactor faithfully and intuitively, and the simulation results agree well with experiment. The visualization results provide a theoretical basis for optimizing the process parameters used to grow GaInP thin films by MOCVD.

3.
This study presents and demonstrates an algorithm for computing a dynamic model of a thin film deposition process. The proposed algorithm operates on high-dimensional kinetic Monte Carlo (KMC) simulations and consists of applying principal component analysis (PCA) to reduce the state dimension, a self-organizing map (SOM) to group similar surface configurations, and simple cell mapping (SCM) to identify the transitions between different surface configuration groups. The error associated with this model reduction approach is characterized by running more than 1000 test simulations with highly dynamic and random input profiles. The global error, which is the normalized Euclidean distance between the simulated and predicted states, is found to be less than 1% on average relative to the test simulation results. This indicates that our reduced-order dynamic model, which was developed using a rather small simulation set, was able to accurately predict the evolution of the film microstructure for much larger simulation sets and a wide range of process conditions. Minimization of the deposition time to reach a desired film structure has also been achieved using this model. Hence, our study shows that the proposed algorithm is useful for extracting dynamic models from high-dimensional and noisy molecular simulation data.
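As a rough illustration of this kind of pipeline, the sketch below runs PCA on synthetic "surface snapshot" data via an SVD, groups the reduced states, and counts group-to-group transitions. K-means is used here as a simple stand-in for the paper's SOM, and a raw transition-count matrix as a crude analogue of simple cell mapping; the data, dimensions, and group count are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for KMC snapshots: n_steps surface states of dimension n_sites
n_steps, n_sites, n_pc, n_groups = 500, 400, 5, 8
states = rng.random((n_steps, n_sites))          # replace with real KMC output

# --- PCA via SVD: project states onto the leading principal components ---
mean = states.mean(axis=0)
U, S, Vt = np.linalg.svd(states - mean, full_matrices=False)
scores = (states - mean) @ Vt[:n_pc].T           # reduced-order coordinates

# --- grouping: k-means as a simple stand-in for the paper's SOM ---
centers = scores[rng.choice(n_steps, n_groups, replace=False)]
for _ in range(20):
    labels = np.argmin(((scores[:, None] - centers) ** 2).sum(-1), axis=1)
    for k in range(n_groups):
        if (labels == k).any():
            centers[k] = scores[labels == k].mean(axis=0)

# --- transition counts between groups, a crude analogue of cell mapping ---
T = np.zeros((n_groups, n_groups))
for a, b in zip(labels[:-1], labels[1:]):
    T[a, b] += 1
T /= T.sum(axis=1, keepdims=True).clip(min=1)    # row-normalised transitions
print(T.round(2))
```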

4.
The Monte Carlo (MC) method is an important particle transport simulation method in nuclear reactor design and analysis. The MC method can model complex geometries and produces highly accurate results; its drawback is the large amount of time required to simulate particle populations on the order of hundreds of millions. Improving the performance of Monte Carlo codes has therefore become a challenge for large-scale Monte Carlo numerical simulation. Based on the Reactor Monte Carlo analysis code RMC, a series of optimizations were carried out, including TCMalloc-based dynamic memory allocation optimization and OpenMP thread scheduling strategies…

5.
A parallel implementation of a Monte Carlo algorithm for modeling the scattering of electrons in solids and the resulting X-ray production is described. Two issues important for accurate and fast parallel simulation are discussed: random number generation and load balancing. Timing results for the parallel simulation are given which show that even modest-sized parallel machines can be competitive with conventional vector supercomputers for Monte Carlo trajectory simulations. Examples of parallel calculations performed to analyze specimen composition data and to characterize electron microscope performance are briefly highlighted.
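Both issues the abstract highlights, uncorrelated per-processor random number streams and load balancing, can be illustrated with standard Python tools. The sketch below is not the paper's code: it spawns independent NumPy streams with SeedSequence and balances load by handing out many small trajectory batches to a process pool; the "physics" is a placeholder.

```python
import numpy as np
from multiprocessing import Pool

def run_trajectories(args):
    """Simulate a batch of electron 'trajectories' with a private RNG stream."""
    seed_seq, n_traj = args
    rng = np.random.default_rng(seed_seq)    # independent, non-overlapping stream
    depths = np.zeros(n_traj)
    for t in range(n_traj):
        # placeholder physics: exponential path lengths until energy is exhausted
        energy, depth = 1.0, 0.0
        while energy > 0.01:
            depth += rng.exponential(0.1)    # free path (arbitrary units)
            energy *= rng.uniform(0.7, 0.95) # fractional energy loss per collision
        depths[t] = depth
    return depths

if __name__ == "__main__":
    n_workers, n_total, batches = 4, 64_000, 64
    # many small batches rather than one per worker -> dynamic load balancing
    seeds = np.random.SeedSequence(2024).spawn(batches)
    tasks = [(s, n_total // batches) for s in seeds]
    with Pool(n_workers) as pool:
        depths = np.concatenate(pool.map(run_trajectories, tasks, chunksize=1))
    print("mean penetration depth:", depths.mean())
```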

6.
Thanks to the dramatic decrease in computer costs, the no less dramatic increase in computer capabilities, and the availability of free software and libraries for setting up small parallel computing installations, the scientific community is now in a position where parallel computation is within easy reach even of moderately budgeted research groups. The software package PMCD (Parallel Monte Carlo Driver) was developed to drive the Monte Carlo simulation of a wide range of user-supplied models in parallel computing environments. A typical Monte Carlo simulation involves using a software implementation of a function to repeatedly generate function values, and typically these implementations were developed for sequential runs. Our driver was developed to enable Monte Carlo simulations to run in parallel with minimal changes to the original code that implements the function of interest to the researcher. In this communication we present the main goals and characteristics of our software, together with a simple study of its expected performance. Monte Carlo simulations are informally classified as "embarrassingly parallel", meaning that the gains from parallelizing a Monte Carlo run should be close to ideal, i.e. with speedups close to linear. Our simple study shows that, without compromising ease of use and implementation, one can obtain performance very close to this ideal.
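In the spirit of the driver described above, though not its actual interface, a minimal embarrassingly-parallel Monte Carlo driver can be sketched in a few lines of Python: the user supplies a function taking only an RNG, and the driver splits the samples across workers. All names here are illustrative.

```python
import numpy as np
from multiprocessing import Pool

def _worker(user_fn, seed_seq, n):
    rng = np.random.default_rng(seed_seq)    # private stream per worker
    return np.array([user_fn(rng) for _ in range(n)])

def mc_drive(user_fn, n_samples, n_workers=4, seed=0):
    """Run user_fn(rng) n_samples times in parallel and return all results.

    The user function only has to accept an RNG argument -- close in spirit
    to the 'minimal changes to the original code' goal described above.
    """
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    per_worker = [n_samples // n_workers] * n_workers
    per_worker[0] += n_samples % n_workers   # absorb the remainder
    with Pool(n_workers) as pool:
        chunks = pool.starmap(
            _worker, [(user_fn, s, n) for s, n in zip(seeds, per_worker)])
    return np.concatenate(chunks)

# example user model: estimate pi by sampling the unit square
def in_circle(rng):
    x, y = rng.random(2)
    return float(x * x + y * y <= 1.0)

if __name__ == "__main__":
    print("pi ~", 4 * mc_drive(in_circle, 200_000).mean())
```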

7.
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed-memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on partitioning photons equally among the processors; it uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed-memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher-energy photon emitters, where explicit modeling of the phantom and collimator is required. For 131I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
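A minimal sketch of the photon-partitioning pattern, using mpi4py rather than the paper's code, with NumPy's SeedSequence standing in for SPRNG and a trivial slab-attenuation model standing in for the SIMIND physics:

```python
# run with e.g.: mpiexec -n 4 python photons.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_photons_total = 1_000_000
n_local = n_photons_total // size             # equal share per processor

# one uncorrelated stream per rank (SPRNG stand-in)
rng = np.random.default_rng(np.random.SeedSequence(42).spawn(size)[rank])

# placeholder transport: count photons crossing an attenuating slab
mu, thickness = 0.15, 10.0                    # attenuation (1/cm), slab (cm)
path = rng.exponential(1.0 / mu, n_local)     # sampled free path lengths
local_hits = int((path > thickness).sum())    # photons crossing uninteracted

total_hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print("transmitted fraction:", total_hits / n_photons_total)
```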

8.
We test a recent proposal to use approximate trivializing maps in a field theory to speed up Hybrid Monte Carlo simulations. Simulating the CP^(N-1) model, we find a small improvement with the leading-order transformation, which is, however, offset by the additional computational overhead. The scaling of the algorithm towards the continuum is not changed. In particular, the effect of the topological modes on the autocorrelation times is studied.

9.
The Monte Carlo method is frequently used to simulate light transport in turbid media because of its simplicity and flexibility, which allow complicated geometrical structures to be analyzed. Monte Carlo simulations are, however, time-consuming because of the necessity to track the paths of individual photons. The computational cost is mainly associated with the evaluation of logarithmic and trigonometric functions and the generation of pseudo-random numbers. In this paper, a Monte Carlo algorithm was developed and optimized by approximating the logarithmic and trigonometric functions. The approximations are based on polynomial and rational functions, and their errors are less than 1% of the values of the original functions. The proposed algorithm was verified by simulations of the time-resolved reflectance at several source-detector separations. The results of the calculation using the approximated algorithm were compared with those of Monte Carlo simulations obtained with exact computation of the logarithmic and trigonometric functions, as well as with the solution of the diffusion equation. The errors in the moments of the simulated distributions of photon times of flight (total number of photons, mean time of flight and variance) are less than 2% over a range of optical properties typical of living tissues. The proposed approximated algorithm speeds up the Monte Carlo simulations by a factor of 4. The developed code can be used on parallel machines, allowing further acceleration.
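One way to build such an approximation (the paper's exact scheme may differ) is to range-reduce the argument with frexp and fit a low-degree polynomial to the logarithm of the mantissa, as in this NumPy sketch:

```python
import numpy as np

# fit a quintic to ln(x) on the mantissa range [0.5, 1); with frexp range
# reduction this covers all positive floats: ln(m * 2**e) = ln(m) + e*ln(2)
xs = np.linspace(0.5, 1.0, 1000, endpoint=False)
coeffs = np.polynomial.polynomial.polyfit(xs, np.log(xs), deg=5)
LN2 = float(np.log(2.0))

def fast_log(x):
    m, e = np.frexp(x)                        # x = m * 2**e with m in [0.5, 1)
    return np.polynomial.polynomial.polyval(m, coeffs) + e * LN2

# e.g. -log(u) is used to sample photon step lengths; u drawn from (0, 1]
u = 1.0 - np.random.default_rng(3).random(100_000)
abs_err = np.abs(fast_log(u) - np.log(u))
print("max absolute error: %.3e" % abs_err.max())
```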

10.
Transient simulation in circuit simulation tools such as SPICE and Xyce depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared-memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared-memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structures that arise in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.

11.
A robust and efficient methodology is presented for treating large-scale reliability-based structural optimization problems. The optimization is performed with evolution strategies, while the reliability analysis is carried out with the Monte Carlo simulation method incorporating the importance sampling technique to reduce the sample size. Efficient hybrid methods are implemented to solve the reanalysis-type problems that arise in the optimization phase with evolution strategies and in the reliability analysis with Monte Carlo simulations. These hybrid solution methods are based on the preconditioned conjugate gradient algorithm using efficient preconditioning schemes. The numerical tests presented demonstrate the computational advantages of the proposed methods, which become more pronounced for large-scale optimization problems.
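The variance-reduction step can be illustrated on a toy limit-state function with a known failure probability. In the sketch below, crude Monte Carlo barely sees any failures at the given sample size, while sampling from a density shifted to the design point (assumed known here) recovers the small probability; the limit state and all parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

# limit state g(x) < 0 means failure; here g(x1,x2) = beta*sqrt(2) - x1 - x2
# for standard normal inputs, so the exact failure probability is Phi(-beta)
beta = 3.5
g = lambda x: beta * np.sqrt(2) - x[:, 0] - x[:, 1]

n = 20_000

# crude Monte Carlo: only a handful of failures expected at this sample size
x = rng.standard_normal((n, 2))
print("crude MC estimate:", (g(x) < 0).mean())

# importance sampling: sample around the design point x* = beta/sqrt(2)*(1,1)
shift = np.full(2, beta / np.sqrt(2))
y = rng.standard_normal((n, 2)) + shift
# weight = f(y)/h(y) for standard normal f and mean-shifted normal h
w = np.exp(-y @ shift + 0.5 * shift @ shift)
print("importance sampling estimate:", ((g(y) < 0) * w).mean())
```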

12.
The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to iteratively obtain a close estimate of the density of states. It has been applied successfully in many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm for shared-memory computers using the OpenMP API. This implementation is applied to Ising model systems with promising speedups. We also examine the effect on running speed of different strategies for accessing the shared memory space during the update procedure. Allowing data races is recommended for the sake of simulation efficiency; this treatment does not affect the accuracy of the final density of states obtained.
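For reference, a serial Wang-Landau walk for a small 2D Ising model looks roughly as follows; the paper's actual contribution, the OpenMP shared-memory parallelization and the data-race analysis, is exactly the part omitted from this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

L = 8                                  # small 2D Ising lattice
spins = rng.choice([-1, 1], size=(L, L))

def site_energy(s, i, j):
    return -s[i, j] * (s[(i+1) % L, j] + s[(i-1) % L, j] +
                       s[i, (j+1) % L] + s[i, (j-1) % L])

E = sum(site_energy(spins, i, j) for i in range(L) for j in range(L)) // 2
bins = range(-2 * L * L, 2 * L * L + 1, 4)   # energies step by 4 for this model
idx = {e: k for k, e in enumerate(bins)}

lng = np.zeros(len(idx))               # ln g(E), refined iteratively
hist = np.zeros(len(idx))
lnf = 1.0                              # ln of the modification factor f

while lnf > 1e-3:                      # iterate until f -> 1
    for _ in range(10_000):
        i, j = rng.integers(L, size=2)
        dE = -2 * site_energy(spins, i, j)
        dlng = lng[idx[E]] - lng[idx[E + dE]]
        # accept with min(1, g(E)/g(E')); guard avoids exp overflow
        if dlng >= 0 or rng.random() < np.exp(dlng):
            spins[i, j] *= -1
            E += dE
        k = idx[E]
        lng[k] += lnf                  # multiplicative update of g(E)
        hist[k] += 1
    visited = hist[hist > 0]
    if visited.min() > 0.8 * visited.mean():   # simple flatness criterion
        hist[:] = 0
        lnf /= 2
print("ln g estimated on", int((lng > 0).sum()), "energy bins")
```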

13.
We present a new algorithm, called the linked neighbour list (LNL), which substantially speeds up off-lattice Monte Carlo simulations of fluids by avoiding the computation of the molecular energy before every attempted move. We introduce a few variants of the LNL method targeted at minimising the memory footprint or augmenting memory coherence and cache utilisation. Additionally, we present a few algorithms which drastically accelerate neighbour finding. We test our methods on the simulation of a dense off-lattice Gay-Berne fluid subjected to periodic boundary conditions, observing a speedup factor of about 2.5 with respect to a well-coded implementation based on a conventional link-cell approach. We provide several implementation details of the key data structures and algorithms used in this work.
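For context, the conventional link-cell approach that the paper benchmarks against can be sketched compactly: bin particles into cells at least as wide as the cutoff, then search only the 27 surrounding cells. Parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

n, box, rcut = 1000, 10.0, 1.5
pos = rng.random((n, 3)) * box               # particles in a periodic box

# --- cell list: bin particles into cells no smaller than the cutoff ---
ncell = int(box // rcut)                     # cells per dimension
cell_of = (pos / (box / ncell)).astype(int) % ncell
cells = {}                                   # (cx,cy,cz) -> list of particle ids
for p, c in enumerate(map(tuple, cell_of)):
    cells.setdefault(c, []).append(p)

def neighbours(p):
    """Particles within rcut of p, searching only the 27 adjacent cells."""
    cx, cy, cz = cell_of[p]
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                key = ((cx+dx) % ncell, (cy+dy) % ncell, (cz+dz) % ncell)
                for q in cells.get(key, []):
                    if q == p:
                        continue
                    d = pos[q] - pos[p]
                    d -= box * np.round(d / box)   # minimum-image convention
                    if d @ d < rcut * rcut:
                        out.append(q)
    return out

print("neighbours of particle 0:", neighbours(0))
```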

14.
We implemented a GPU-based parallel code to perform Monte Carlo simulations of the two-dimensional q-state Potts model. The algorithm is based on a checkerboard update scheme and assigns independent random number generators to each thread. The implementation allows systems of up to ~10^9 spins to be simulated, with an average time per spin flip of 0.147 ns on the fastest GPU card tested, representing a speedup of up to 155× compared with an optimized serial code running on a high-end CPU. The possibility of performing high-speed simulations at sufficiently large system sizes allowed us to provide positive numerical evidence for the existence of metastability in very large systems based on Binder's criterion, namely, on the existence or not of specific heat singularities at spinodal temperatures different from the transition temperature.
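The essence of the checkerboard scheme is that sites of one color share no bonds, so an entire sublattice can be updated simultaneously. The sketch below reproduces that logic with vectorized NumPy on the CPU as a stand-in for the paper's CUDA kernels; lattice size, q and temperature are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

q, L, T = 6, 128, 0.7                        # q-state Potts on an L x L lattice
spins = rng.integers(q, size=(L, L))
ii, jj = np.indices((L, L))
parity = (ii + jj) % 2                       # the two checkerboard sublattices

def sweep(spins, T):
    for color in (0, 1):                     # same-color sites share no bonds,
        mask = parity == color               # so the sublattice updates at once
        prop = rng.integers(q, size=(L, L))  # proposed new states
        nb = np.stack([np.roll(spins, s, a) for s in (1, -1) for a in (0, 1)])
        e_old = -(nb == spins).sum(axis=0)   # Potts energy: -1 per equal bond
        e_new = -(nb == prop).sum(axis=0)
        accept = rng.random((L, L)) < np.exp(-(e_new - e_old) / T)
        spins[mask & accept] = prop[mask & accept]   # Metropolis acceptance
    return spins

for _ in range(200):
    sweep(spins, T)
order = np.bincount(spins.ravel(), minlength=q).max() / spins.size
print("largest-state fraction:", order)
```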

15.
Peigin S., Epstein B., Rubin T., Seror S. The Journal of Supercomputing, 2004, 27(1): 49-68
We present a highly scalable parallelization of a high-accuracy 3D serial multiblock Navier-Stokes solver. The code solves the full Navier-Stokes equations and is capable of performing large-scale computations for practical configurations in an industrial environment. The parallelization strategy is based on the geometrical domain decomposition principle and on the concept of overlapped communication and computation. An important advantage of this strategy is that the suggested type of message passing ensures very high scalability of the algorithm from the network point of view, because, on average, the communication work per processor does not increase as the number of processors increases. The parallel multiblock-structured Navier-Stokes solver, based on parallel virtual machine (PVM) routines, was implemented on a 106-processor distributed-memory cluster managed by the MOSIX software package. Analysis of the results demonstrated a high level of parallel efficiency (speedup) of the computational algorithm. This allowed the execution time for large-scale computations employing 10 million grid points to be reduced from an estimated 46 days on the SGI ORIGIN 2000 computer (in serial single-user mode) to 5-6 hours on the 106-processor cluster. Thus, the parallel multiblock full Navier-Stokes code can be used for large-scale practical aerodynamic simulations of a complete aircraft on multimillion-point grids on a daily basis, as needed in industry.
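The communication pattern underlying this kind of geometric domain decomposition, exchanging a layer of overlap ("ghost" or halo cells) with neighbouring subdomains before each stencil update, can be sketched in miniature with mpi4py (an assumption; the paper uses PVM):

```python
# a 1D ghost-cell (halo) exchange sketch (run: mpiexec -n 4 python halo.py);
# nonblocking Isend/Irecv would additionally allow the overlapped
# communication and computation the paper describes
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size   # periodic neighbours

n_local = 100
u = np.full(n_local + 2, float(rank))    # interior cells plus 2 ghost cells

def exchange_halo(u):
    # send rightmost interior cell right, receive left ghost, and vice versa
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)

exchange_halo(u)
# with up-to-date ghosts, a stencil update needs no further communication
u[1:-1] = 0.25 * (u[:-2] + u[2:]) + 0.5 * u[1:-1]
```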

16.
In this work we have studied the dynamic scaling behavior of two scaling functions and have shown that the scaling functions obey dynamic finite-size scaling rules. Dynamic finite-size scaling of scaling functions opens possibilities for a wide range of applications. As an application, we have calculated the dynamic critical exponent z of Wolff's cluster algorithm for the 2-, 3- and 4-dimensional Ising models. Configurations with vanishing initial magnetization are chosen in order to avoid complications due to the initial magnetization. The dynamic finite-size scaling behavior observed during the early stages of the Monte Carlo simulation yields vanishing values of z for Wolff's cluster algorithm in all three cases, consistent with the values obtained from autocorrelations. In particular, the vanishing dynamic critical exponent obtained for d=3 implies that the Wolff algorithm is more efficient at eliminating critical slowing down in Monte Carlo simulations than previously reported.
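The Wolff single-cluster update studied above is itself short enough to sketch; a serial Python version for the 2D Ising model, with an arbitrary lattice size:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(8)

L, T = 32, 2.269                             # near the 2D Ising critical point
spins = rng.choice([-1, 1], size=(L, L))
p_add = 1.0 - np.exp(-2.0 / T)               # Wolff bond probability (J = 1)

def wolff_step(spins):
    """Grow one cluster from a random seed site and flip it as a whole."""
    i, j = rng.integers(L, size=2)
    s0 = spins[i, j]
    cluster = {(i, j)}
    frontier = deque([(i, j)])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x+1) % L, y), ((x-1) % L, y), (x, (y+1) % L), (x, (y-1) % L):
            # add aligned neighbours to the cluster with probability p_add
            if spins[nx, ny] == s0 and (nx, ny) not in cluster and rng.random() < p_add:
                cluster.add((nx, ny))
                frontier.append((nx, ny))
    for x, y in cluster:
        spins[x, y] = -s0
    return len(cluster)

sizes = [wolff_step(spins) for _ in range(1000)]
print("mean cluster size:", np.mean(sizes))
```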

17.
This paper simulates the European option pricing problem using the Monte Carlo method, and designs and implements a parallel algorithm on a distributed-memory cluster system using the portable message-passing standard MPI. The algorithm effectively addresses the enormous computational load involved in financial computing, improving computational efficiency and shortening computation time considerably, and achieves good performance.
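A serial NumPy sketch of the underlying pricing computation, checked against the Black-Scholes closed form, is given below; parallelizing it with MPI amounts to splitting the sample draws across ranks and averaging with a reduction, as in the SPECT sketch in item 7. All market parameters here are made up.

```python
import numpy as np
from math import log, sqrt, exp, erf

rng = np.random.default_rng(9)

S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0   # made-up market data
n = 2_000_000

# terminal price under risk-neutral geometric Brownian motion
Z = rng.standard_normal(n)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
mc_price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()  # call payoff

# Black-Scholes closed form for comparison
N = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF
d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
bs_price = S0 * N(d1) - K * exp(-r * T) * N(d2)

print("Monte Carlo: %.4f   closed form: %.4f" % (mc_price, bs_price))
```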

18.
Parallel Computing, 1997, 23(9): 1249-1260
A parallel algorithm for direct simulation Monte Carlo (DSMC) calculation of diatomic molecular rarefied gas flows is presented. Reliable simulation of such flows requires an efficient molecular collision model. Using the molecular dynamics method, the collision of N2 molecules is simulated; for this molecular dynamics simulation, the parameter decomposition method is applied for parallel computing. From these results, a statistical collision model for diatomic molecules is constructed. For validation, this model is applied within the direct simulation Monte Carlo method to simulate the energy distribution at equilibrium and the structure of a normal shock wave. For the DSMC calculation, domain decomposition is applied. It is shown that the collision process of diatomic molecules can be calculated precisely and that the parallel algorithm can be implemented efficiently on a parallel computer.

19.
A three-dimensional electromagnetic particle-in-cell code with Monte Carlo collisions (PIC-MCC) is developed for MIMD parallel supercomputers. This code uses a standard relativistic leapfrog scheme incorporating Monte Carlo calculations to push plasma particles and to include collisional effects on particle orbits. A local finite-difference time-domain method is used to update the self-consistent electromagnetic fields. The code is implemented using the General Concurrent PIC (GCPIC) algorithm, which uses domain decomposition to divide the computation among the processors. Particles must be exchanged between processors as they move among subdomains. Message passing is implemented using the Express Cubix library and PVM. We evaluate the performance of this code using a 512-processor Intel Touchstone Delta, a 512-processor Intel Paragon, and a 256-processor CRAY T3D. It is shown that a high parallel efficiency exceeding 95% has been achieved on all three machines for large problems. We have run PIC-MCC simulations using several hundred million particles with several million collisions per time step. For these large-scale simulations, the particle push time achieved is in the range of 90-115 ns/particle/time step, and the collision calculation time is in the range of a few hundred nanoseconds per collision.
