1.
Peter Michielse 《Parallel Computing》1990,13(3):359-368
In the oil industry it is common practice to simulate the exploitation of an oil reservoir by means of a numerical method. Such a method may use dynamic local grid refinement to track the fronts of water and oil that move through the reservoir. In this paper, we discuss a domain decomposition method that may be used to parallelize reservoir simulation. The parallel algorithm and timing experiments on a hypercube-type parallel computer are considered.
2.
Particle-in-cell (PIC) simulation is widely used in many branches of physics and engineering. In this paper, we give an analysis of the particle-field decomposition method and the domain decomposition method in parallel particle-in-cell beam dynamics simulation. The parallel performance of the two decomposition methods was studied on the Cray XT4 and the IBM Blue Gene/P computers. The domain decomposition method shows better scalability but is slower than the particle-field decomposition in most cases (up to a few thousand processors) for macroparticle-dominant applications. The particle-field decomposition method also shows less memory usage than the domain decomposition method due to its use of perfect static load balance. For applications with a smaller ratio of macroparticles to grid points, the domain decomposition method exhibits better scalability and faster speed. Application of the particle-field decomposition scheme to high-resolution macroparticle-dominant parallel beam dynamics simulation for a future light source linear accelerator is presented as an example.
3.
Equations of motion based on an atomic group scaling scheme are described for a molecular system with bond constraints. The NPT ensemble extended system method is employed along with a numerical integration scheme using an operator technique. For parallelization of the integration scheme, a domain decomposition scheme is employed based on a group of atoms which share common constraints. This decomposition scheme fits well into the integration scheme and involves no extra inter-processor communication during the SHAKE/RATTLE procedures. An example is given for a solvated protein system containing a total of 23 558 atoms on 64 processors.
4.
A parallel molecular dynamics simulation method, designed for large-scale problems and employing dynamic spatial domain decomposition for short-ranged molecular interactions, is proposed. In this parallel cellular molecular dynamics (PCMD) simulation method, the link-cell data structure is used to reduce the search time required for forming the cut-off neighbor list as well as for domain decomposition, which utilizes the multi-level graph-partitioning technique. A simple threshold scheme (STS), in which workload imbalance is monitored and compared with a threshold value at runtime, is proposed to decide the proper time for repartitioning the domain. The simulation code is implemented and tested on a distributed-memory parallel machine (a PC cluster system). Parallel performance is studied using approximately one million L-J atoms in the condensed, vaporized and supercritical states. Results show that fairly good parallel efficiency at 49 processors can be obtained for the condensed and supercritical states (∼60%), while it is comparatively lower for the vaporized state (∼40%).
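The threshold test described in this abstract can be illustrated with a minimal sketch. The abstract does not give the imbalance metric or the threshold value, so the metric below (maximum load relative to mean load) and the names `imbalance`, `should_repartition`, and `threshold` are assumptions made only for illustration, not the authors' implementation.

```python
from statistics import mean


def imbalance(loads):
    """Relative workload imbalance, (max - mean) / mean.

    This particular metric is an assumption; the abstract only says that
    imbalance is monitored and compared with a threshold.
    """
    avg = mean(loads)
    if avg == 0:
        return 0.0
    return (max(loads) - avg) / avg


def should_repartition(loads, threshold=0.1):
    """Return True when the measured imbalance exceeds the threshold.

    `threshold` is a hypothetical tuning parameter.
    """
    return imbalance(loads) > threshold


# Per-processor workloads, e.g. atoms or pair interactions per domain.
balanced = [100, 102, 98, 100]
skewed = [160, 80, 80, 80]
print(should_repartition(balanced))  # imbalance 0.02, below the threshold
print(should_repartition(skewed))    # imbalance 0.6, above the threshold
```

In a real run the check would be evaluated every few time steps, and a positive result would trigger the graph repartitioning step.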
5.
In this paper, we address three issues concerning data replica placement in hierarchical Data Grids that can be presented as tree structures. The first is how to ensure load balance among replicas. To achieve this, we propose a placement algorithm that finds the optimal locations for replicas so that their workload is balanced. The second issue is how to minimize the number of replicas. To solve this problem, we propose an algorithm that determines the minimum number of replicas required when the maximum workload capacity of each replica server is known. Finally, we address the issue of service quality by proposing a new model in which each request must be given a quality-of-service guarantee. We describe new algorithms that ensure both workload balance and quality of service simultaneously.
6.
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.
7.
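A parallel FDTD algorithm based on message passing is implemented on a cluster of workstations (COW), a parallel computing system built from high-performance PCs interconnected over a network. It achieves sufficient speedup and effectively addresses the shortcomings of the conventional FDTD method when computing electromagnetic scattering from electrically large targets. The computational domain is partitioned, and each subdomain exchanges field values at its boundaries with the adjacent subdomains, which parallelizes the FDTD algorithm. The parallel FDTD method is used to study electromagnetic wave scattering from a dielectric layer; the results show that the parallel algorithm agrees with the serial computation and effectively improves computational efficiency. Finally, an optimization that hides communication is presented, which further improves the efficiency of the parallel computation.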
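The boundary exchange between adjacent subdomains can be sketched on a toy one-dimensional FDTD grid. Everything here is an assumption made for illustration: the abstract specifies neither the grid, the update coefficients, nor the message-passing calls, so the sketch uses a normalized 1-D leapfrog update, splits the grid into two subdomains, and models each message as a plain variable copy inside one process. The function names `fdtd_serial` and `fdtd_two_domains` are hypothetical.

```python
def fdtd_serial(ez, hy, steps, c=0.5):
    """Reference 1-D leapfrog update on the whole grid (toy, normalized units)."""
    ez, hy = list(ez), list(hy)
    n = len(ez)
    for _ in range(steps):
        for i in range(n - 1):
            hy[i] += c * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += c * (hy[i] - hy[i - 1])
    return ez, hy


def fdtd_two_domains(ez, hy, steps, cut, c=0.5):
    """Same update with the grid split at index `cut` (1 <= cut <= n - 1).

    Each subdomain touches only its own arrays plus one ghost value received
    from its neighbour. In a real code each ghost copy would be a message.
    """
    ez_l, hy_l = list(ez[:cut]), list(hy[:cut])
    ez_r, hy_r = list(ez[cut:]), list(hy[cut:])
    for _ in range(steps):
        # Exchange 1: the right subdomain sends its first E value to the left.
        ghost_ez = ez_r[0]
        for i in range(len(hy_l) - 1):
            hy_l[i] += c * (ez_l[i + 1] - ez_l[i])
        hy_l[-1] += c * (ghost_ez - ez_l[-1])
        for i in range(len(hy_r) - 1):
            hy_r[i] += c * (ez_r[i + 1] - ez_r[i])
        # Exchange 2: the left subdomain sends its last H value to the right.
        ghost_hy = hy_l[-1]
        for i in range(1, len(ez_l)):
            ez_l[i] += c * (hy_l[i] - hy_l[i - 1])
        ez_r[0] += c * (hy_r[0] - ghost_hy)
        for i in range(1, len(ez_r)):
            ez_r[i] += c * (hy_r[i] - hy_r[i - 1])
    return ez_l + ez_r, hy_l + hy_r


# A single pulse in the middle of a 20-cell grid.
ez0 = [0.0] * 20
ez0[10] = 1.0
hy0 = [0.0] * 20
print(fdtd_two_domains(ez0, hy0, 30, cut=8) == fdtd_serial(ez0, hy0, 30))
```

Because each subdomain performs the same arithmetic on the same operands as the serial loop, the two versions produce identical fields, which mirrors the agreement between parallel and serial results reported in the abstract.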
8.
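This paper applies a domain decomposition algorithm to parallel computation for reservoir simulation, seeking the best algorithm for solving three-dimensional, three-phase numerical simulation problems efficiently in parallel. Based on a comparative study of the popular preconditioned conjugate gradient algorithm and the GMRES algorithm, an improved GMRES algorithm is proposed; its advantages are that the iteration parameters need no tuning, convergence is fast, and relatively accurate solutions can be obtained. The solver is used to parallelize a three-dimensional, three-phase black-oil model code. In computations on model problems and real reservoirs, it is substantially faster than both the software's original algorithm and the GMRES algorithm. Parallel efficiency is high, and the parallelized simulation software can effectively solve numerical simulation problems for large integrated structural reservoirs.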
9.
10.
We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation, based on the adaptive finite volume method with a posteriori error estimation, has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied to solve the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can reduce the execution time by up to an order of magnitude. Numerical results on various submicron VLSI devices are compared with measured data to show the accuracy and efficiency of the method.
11.
《Journal of Computer and System Sciences》2016,82(2):282-309
Cloud computing has gained popularity in recent years, delivering various services as cost-effective platforms. However, the increasing energy consumption needs to be addressed in order to preserve the cost-effectiveness of these systems. In this work, we target the storage infrastructure in a cloud system and introduce several energy-efficient storage node allocation methods by exploiting the metadata heterogeneity of cloud users. Our proposed methods preserve load balance on demand and switch inactive nodes into low-energy modes to save energy. We provide a mathematical model to estimate the outcome of the proposed methods and conduct theoretical and simulation-based analyses using real-world workloads.
12.
PARAMICS—Parallel microscopic simulation of road traffic
This paper describes work done on the original PARAMICS project, which was developed for the Edinburgh Parallel Computing Centre to examine parallel microscopic road traffic simulation. The simulator, constructed originally for a Thinking Machines Connection Machine (CM-200), uses a data-parallel approach to simulate approximately 200,000 vehicles on 20,000 miles of roadway. More recent work has focused on the use of a message-passing paradigm, with a 256-node CRAY T3D as the target machine. The message-passing version of PARAMICS, PARAMICS-MP, is inherently scalable and can model many smaller networks on a broad range of platforms. An earlier version of this paper was presented at Supercomputing '94.
13.
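Studying the mechanical properties of composite solid propellants with micromechanical methods requires a representative mesoscale unit-cell model of the propellant. To address the low computational efficiency common to current algorithms, and based on the performance characteristics of generating particle-packing models with a molecular dynamics approach, load balance and message communication are analyzed, three criteria for the parallel model are proposed, a domain decomposition parallel strategy is designed, and the parallel algorithm is implemented with two levels of parallelism: shared-memory and distributed-memory. Finally, examples on an IBM BladeCenter cluster show that the algorithm alleviates load imbalance and reduces communication overhead; these experimental data confirm the algorithm's efficiency and show that it achieves the goal of faster unit-cell generation.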
14.
Nuttita Pophet Narongrit Kaewbanjak Jack Asavanant Mansour Ioualalen 《Computers & Fluids》2011,40(1):258-268
Numerical simulation of tsunami propagation in large basins across the ocean demands high computational capability in terms of CPU time and memory allocation. Due to this limitation, the use of sequential codes on a single scientific workstation is possible only for small-scale tsunami problems. To overcome this difficulty, a parallel Boussinesq wave model is developed based on the original FUNWAVE sequential model for efficient simulation of long wave propagation, coastal inundation and runup. The numerical domain is decomposed into small sub-domains using a domain decomposition technique, and each processor performs the calculations for its sub-domain. The wave information is exchanged between processors via the message passing interface (MPI). We show the effectiveness of this parallel code on distributed- and shared-memory computer clusters in simulating two tsunami events: the 2004 Indian Ocean and the 1999 Vanuatu tsunamis. Communication in the overlapping domains and load balancing in the partitioned domains are considered to ensure the efficiency of this method. It is found that the performance of the parallel model for both large- and small-scale tsunami problems is very satisfactory. Finally, the parallel model is applied to a spatial hierarchical grid methodology for a location-specific numerical simulation. Grid sensitivity and improved simulation results for runups along the Phang Nga coastline from Takua Thung to Khao Lak are presented.
15.
16.
《Ergonomics》2012,55(10):1413-1423
An electromyographic (EMG) study of the lumbar paraspinal muscles during load carrying was undertaken in a group of 24 healthy subjects, 12 male and 12 female. Two different magnitude loads (10% and 20% of the subject's body weight) and four different carrying positions were compared with walking without an external load. Results indicated changes in back muscle activity showing a significant interaction between load magnitude and carrying position. Compared to walking without an external load, lumbar paraspinal EMG activity showed slight decreases when loads were carried in a backpack position or in the hand ipsilateral to the muscle. EMG activity contralateral to the hand carrying the load was significantly increased. Significant increases occurred when loads were carried anterior to the chest with the arms and a significant difference was found between male and female subjects for this carrying position. These findings have implications for the selection of carrying methods.
17.
The fast Fourier transform (FFT) is an essential primitive that has been applied in various fields of science and engineering. In this paper, we present a decomposition method for the parallelization of multi-dimensional FFTs with the smallest communication amounts for all ranges of the number of processes compared to previously proposed methods. This is achieved by two distinguishing features: adaptive decomposition and transpose order awareness. In the proposed method, the FFT data is decomposed on a row-wise basis that maps the multi-dimensional data into one-dimensional data, and translates the corresponding coordinates from multiple dimensions into one dimension so that the one-dimensional data can be divided and allocated equally to the processes using a block distribution. As a result, and unlike previous works in which the dimensions of decomposition are predefined, our method can adaptively decompose the FFT data on the lowest possible dimensions depending on the number of processes. In addition, this row-wise decomposition provides plenty of alternatives in data transpose, and different transpose orders result in different amounts of communication. We identify the best transpose orders with the smallest communication amounts for the 3-D, 4-D, and 5-D FFTs by analyzing all possible cases. We also develop a general parallel software package for the most popular 3-D FFT based on our method using the 2-D domain decomposition. Numerical results show good performance and scaling properties of our implementation in comparison with other parallel packages. Given both communication efficiency and scalability, our method is promising in the development of highly efficient parallel packages for the FFT.
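The row-wise mapping and block distribution described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses plain row-major linearization and hands any remainder to the lowest-numbered ranks, and the function names `linear_index`, `block_range`, and `owner` are hypothetical. It covers only the coordinate mapping and the division among processes, not the transpose-order analysis.

```python
def linear_index(coords, shape):
    """Row-major (row-wise) index of `coords` in an array of the given shape."""
    index = 0
    for c, n in zip(coords, shape):
        if not 0 <= c < n:
            raise IndexError("coordinate out of range")
        index = index * n + c
    return index


def block_range(total, nprocs, rank):
    """Half-open range [start, stop) of 1-D indices owned by `rank`.

    Assumption: when `total` is not divisible by `nprocs`, the first
    `total % nprocs` ranks each hold one extra element.
    """
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def owner(coords, shape, nprocs):
    """Rank that owns the element at `coords` under the block distribution."""
    total = 1
    for n in shape:
        total *= n
    idx = linear_index(coords, shape)
    base, extra = divmod(total, nprocs)
    cutoff = extra * (base + 1)  # indices below this sit on the larger blocks
    if idx < cutoff:
        return idx // (base + 1)
    return extra + (idx - cutoff) // base


shape = (4, 4, 4)                      # a small 3-D data set, 64 elements
print(linear_index((1, 2, 3), shape))  # (1 * 4 + 2) * 4 + 3 = 27
print(owner((1, 2, 3), shape, 8))      # 8 elements per rank, so rank 3
print(block_range(64, 6, 4))           # uneven case: ranks 0-3 hold 11 elements
```

Because the distribution is defined on the flattened index, the number of processes does not have to divide any single dimension of the data, which is what allows the decomposition to adapt to the process count.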
18.
19.
《Parallel Computing》2014,40(10):646-660
Monte Carlo (MC) neutral particle transport codes are considered the gold standard for nuclear simulations, but they cannot be robustly applied to high-fidelity nuclear reactor analysis without accommodating several terabytes of materials and tally data. While this is not a large amount of aggregate data for a typical high performance computer, MC methods are only embarrassingly parallel when the key data structures are replicated for each processing element, an approach which is likely infeasible on future machines. The present work explores the use of spatial domain decomposition to make full-scale nuclear reactor simulations tractable with Monte Carlo methods, presenting a simple implementation in a production-scale code. Good performance is achieved for mesh-tallies of up to 2.39 TB distributed across 512 compute nodes while running a full-core reactor benchmark on the Mira Blue Gene/Q supercomputer at the Argonne National Laboratory. In addition, the effects of load imbalances are explored with an updated performance model that is empirically validated against observed timing results. Several load balancing techniques are also implemented to demonstrate that imbalances can be largely mitigated, including a new and efficient way to distribute extra compute resources across finer domain meshes.
20.
《International Journal of Computer Mathematics》2012,89(2):165-177
The iterative Multilevel Averaging Weight (MAW) algorithm presented in [1] is modified in this paper to solve the dynamic load imbalance problems arising in two-dimensional short-range parallel molecular dynamics simulations. First, five types of load balancing models are given, which allow detailed study of the algorithm. In particular, for strip decomposition, the number of iterations needed for the system to converge from an initially unbalanced state to a well-balanced state is shown to be bounded by 2 log P, where P is the number of processors. This bound lets the algorithm efficiently track fluctuations in the molecular density as the simulation progresses, and it is much better than that of the Cellular Automaton Diffusion (CAD) scheme presented in [2]. Second, we apply the MAW algorithm to the load imbalance problem in parallel molecular dynamics simulation of higher-speed wall collisions. Finally, numerical results and parallel computing performance are reported for MPI-1.2 on a PC cluster of 64 Pentium-III 500 MHz nodes connected by 100 Mbps switches.