Similar documents
 20 similar documents found
1.
The present paper studies two particle management strategies for dynamically adaptive Cartesian grids by means of a particle-in-cell code. One holds the particles within the grid cells, the other within the grid vertices. The fundamental challenge for the algorithmic strategies results from the fact that particles may run through the grid without velocity constraints. To facilitate this, we rely on multiscale grid representations. They allow us to lift and drop particles between different spatial resolutions. We call this cell-based strategy particle in tree (PIT). Our second approach assigns particles to vertices describing a dual grid (PIDT) and augments the lifts and drops with multiscale linked cells. Our experiments validate the two schemes using an electrostatic particle-in-cell code, retrieving the dispersion relation of Langmuir waves in a thermal plasma. They reveal that different particle and grid characteristics favour different realisations. The possibility that particles can tunnel through an arbitrary number of grid cells implies that most data is exchanged between neighbouring ranks, while very little data is transferred non-locally. This constrains the scalability, as the code potentially has to realise global communication. We show that the merger of an analysed tree grammar with PIDT allows us to predict particle movements among several levels and to skip parts of this global communication a priori. It is capable of outperforming several established implementations based upon trees and/or space-filling curves.

2.

The most widely used technique to allow for parallel simulations in molecular dynamics is spatial domain decomposition, where the physical geometry is divided into boxes, one per processor. This technique can inherently produce computational load imbalance when either the spatial distribution of particles or the computational cost per particle is not uniform. This paper shows the benefits of using a hybrid MPI+OpenMP model to deal with this load imbalance. We consider LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), a prototypical molecular dynamics simulator that provides its own balancing mechanism and an OpenMP implementation for many of its modules, allowing for a hybrid setup. In this work, we extend and optimize the current OpenMP implementation of LAMMPS and evaluate three different setups: MPI-only, MPI with the LAMMPS balance mechanism, and a hybrid setup using our improved OpenMP version. This comparison is made using the five standard benchmarks included in the LAMMPS distribution plus two additional test cases. Results show that the hybrid approach deals with load balancing problems more effectively (50% improvement versus MPI-only for a highly imbalanced test case) than the LAMMPS balance mechanism (only 43% improvement) and also improves simulations with issues other than load imbalance.
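
The load imbalance described in this abstract can be illustrated with a small calculation. The following is a minimal sketch, not code from the paper or from LAMMPS: it assumes a one-dimensional domain split into equal-width boxes (one per processor) and measures imbalance as the maximum particle count per box divided by the mean count. The function names `box_loads` and `imbalance_factor` are hypothetical.

```python
def box_loads(positions, n_boxes, length):
    """Count particles per equal-width box in a 1D domain [0, length)."""
    counts = [0] * n_boxes
    width = length / n_boxes
    for x in positions:
        # Clamp so a particle sitting exactly at `length` lands in the last box.
        idx = min(int(x / width), n_boxes - 1)
        counts[idx] += 1
    return counts


def imbalance_factor(counts):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Hypothetical particle positions, clustered near the left edge of the domain.
positions = [0.05, 0.1, 0.15, 0.2, 0.22, 0.24, 0.6, 0.9]
counts = box_loads(positions, n_boxes=4, length=1.0)
factor = imbalance_factor(counts)
print(counts, factor)  # [6, 0, 1, 1] 3.0
```

With a factor of 3.0, the busiest processor holds three times the average load, so the other processors idle for most of each step unless work is redistributed or shared among threads.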


3.
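To achieve realistic simulation of wind-blown sand movement at small scales, the smoothed particle hydrodynamics (SPH) method, a mesh-free formulation based on Lagrangian mechanics, is adopted. It resolves the problems that Eulerian grid methods encounter with large mesh deformation or deforming boundaries, and it overcomes the difficulty that a fixed Eulerian grid cannot track the trajectory of an arbitrary single particle. The method therefore has unique advantages for studying wind-blown sand movement. However, as the number of SPH particles in the wind-sand flow increases, the method's low computational efficiency and large computational scale become especially apparent. To improve efficiency, a two-dimensional coupled gas-sand two-phase model with SPH-GPU parallel acceleration is built on the CUDA hardware and software platform. First, the serial program is profiled to identify the most time-consuming hotspots that are suitable for parallelization. Second, the GPU parallel model is validated: at the macroscopic level it yields the spatio-temporal evolution of the sand particle cloud, and at the microscopic level it yields the saltation trajectories of typical sand grains as well as variant sharp-cornered trajectories. Finally, the computational efficiency of the CPU and the GPU is compared for three different particle counts. The simulation results demonstrate that the SPH-GPU parallel method can be further applied to the numerical simulation of wind-sand flow.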

4.
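A merge Monte Carlo localization (Merge-MCL) method for mobile robots, based on particle-merging resampling, is proposed. First, the robot's workspace is partitioned into discrete grid cells to build a grid set. Then a particle-merging technique based on the spatial proximity of particles is proposed, which adaptively adjusts the size of the particle set while keeping the spatial distribution of the particles reasonable. The proposed particle-merging resampling method alleviates particle weight degeneracy and avoids the loss of diversity caused by conventional resampling. Simulation results show that particle-merging resampling effectively controls the size of the particle set and that the Merge-MCL method is robust and effective.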
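
The abstract does not give the merging rule, so the following is only a sketch under stated assumptions: particles that fall into the same grid cell are merged into one particle whose position is the weight-averaged position and whose weight is the sum of the merged weights. The function name `merge_particles` and the `(x, y, weight)` tuple layout are hypothetical.

```python
from collections import defaultdict
import math


def merge_particles(particles, cell_size):
    """Merge weighted 2D particles that share a grid cell.

    particles: list of (x, y, weight) tuples.
    Returns one (x, y, weight) tuple per occupied cell. This merging rule
    is an assumption for illustration, not the rule used in the paper.
    """
    cells = defaultdict(list)
    for x, y, w in particles:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y, w))

    merged = []
    for key in sorted(cells):
        group = cells[key]
        total = sum(w for _, _, w in group)
        if total == 0.0:
            continue  # a cell holding only zero-weight particles is dropped
        mx = sum(x * w for x, _, w in group) / total
        my = sum(y * w for _, y, w in group) / total
        merged.append((mx, my, total))
    return merged


# Two nearby particles share cell (0, 0); the third sits alone in cell (1, 0).
particles = [(0.1, 0.1, 0.2), (0.3, 0.3, 0.2), (1.5, 0.2, 0.6)]
merged = merge_particles(particles, cell_size=1.0)
print(len(particles), "->", len(merged))
```

Because the merged weight is the sum of the group's weights, the total weight of the particle set is preserved while the number of particles shrinks wherever particles cluster.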

5.
A high-speed micro-digital holographic particle tracking velocimetry system is constructed on a PC grid environment that runs Windows XP with AD-POWERs as the parallelization tool. Two algorithms for the high-speed system are evaluated under the same PC grid environment. Both methods are based on a computer-generated hologram algorithm. One method is a division algorithm based on the time development of the measurements, while the other is a division algorithm based on the spatial reconstruction of the measurements. In the former case, the performance is increased by a factor of 3.3 by using 4 PCs. The present system can compute huge hologram images and output them “on-site” at an experimental facility.

6.
Collective communication operations are widely used in MPI applications and play an important role in their performance. However, the network heterogeneity inherent to grid environments represents a great challenge for developing efficient high-performance computing applications. In this work we propose a generic framework based on communication models and adaptive techniques for dealing with collective communication patterns on grid platforms. Toward this goal, we address the hierarchical organization of the grid, selecting the most efficient communication algorithms at each network level. Our framework is also adaptive to grid load dynamics, since it considers transient network characteristics for dividing the nodes into clusters. Our experiments with the broadcast operation on a real grid setup indicate that an adaptive framework allows significant performance improvements on MPI collective communications.

7.
Particle filtering and mean shift (MS) are two successful approaches to visual tracking. Both have their respective strengths and weaknesses. In this paper, we propose to integrate the advantages of the two approaches for improved tracking. By incorporating the MS optimization into particle filtering to move particles to local peaks in the likelihood, the proposed mean shift embedded particle filter (MSEPF) improves the sampling efficiency considerably. Our work is conducted in the context of developing a hand control interface for a robotic wheelchair. We realize real-time hand tracking in dynamic environments of the wheelchair using MSEPF. Extensive experimental results demonstrate that MSEPF outperforms the MS tracker and the conventional particle filter in hand tracking. Our approach produces reliable tracking while effectively handling rapid motion and distraction with roughly 85% fewer particles. We also present a simple method for dynamic gesture recognition. The hand control interface based on the proposed algorithms works well in dynamic environments of the wheelchair.
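
The idea of moving particles to local peaks can be sketched in one dimension. This is an illustrative stand-in, not the paper's tracker: the paper works with an image-based likelihood, whereas the sketch below assumes a Gaussian kernel density built from a handful of hypothetical sample points and applies the standard mean-shift update to each particle. The names `mean_shift_step` and `shift_particles` and the sample values are assumptions.

```python
import math


def mean_shift_step(x, samples, bandwidth):
    """One Gaussian-kernel mean-shift update of position x toward a local mode."""
    weights = [math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total


def shift_particles(particles, samples, bandwidth, iterations):
    """Move every particle uphill by repeated mean-shift updates."""
    moved = []
    for x in particles:
        for _ in range(iterations):
            x = mean_shift_step(x, samples, bandwidth)
        moved.append(x)
    return moved


# Hypothetical samples forming two modes, near 0 and near 10.
samples = [-0.2, 0.0, 0.2, 9.8, 10.0, 10.2]
# Two particles that start away from the modes.
moved = shift_particles([1.5, 8.0], samples, bandwidth=1.0, iterations=20)
print(moved)
```

Each particle ends up at the mode nearest to where it started, which is why fewer particles are needed to cover the high-likelihood regions.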

8.
Numerical grid generation techniques play an important role in the numerical solution of partial differential equations on arbitrarily shaped regions. For coastal ocean modeling, in particular, a one-block grid covering the region under study is commonly used. Most bodies of water of interest have complicated coastlines, e.g., the Persian Gulf and the Mediterranean Sea. Since such one-block grids are not boundary conforming, the number of unused grid points can be a relatively large portion of the entire domain space. Other disadvantages of using a one-block grid include large memory requirements and long computer processing time. Multiblock grid generation and dual-level parallel techniques are used to overcome these problems. Message Passing Interface (MPI) is used to parallelize the Multiblock Grid Princeton Ocean Model (MGPOM) such that each grid block is assigned to a unique processor. Since not all grid blocks are of the same size, the workload varies between MPI processes. To alleviate this, OpenMP dynamic threading is used to improve load balance. Performance results from the MGPOM model on a one-block grid, a twenty-block grid, and a forty-two-block grid after a 90-day simulation for the Persian Gulf demonstrate the efficacy of the dual-level parallel code version.

9.
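Driven by petaflop-scale computing power, numerical software has entered a historic turning point characterized by massive parallelism, and scalability and fault tolerance have become the two key technologies for large-scale numerical simulation. petaPar is a next-generation general-purpose numerical simulation code developed for petascale computing; it takes meshfree methods, which complement traditional numerical techniques, as its starting point. Within a unified framework, petaPar implements smoothed particle hydrodynamics (SPH) and the material point method (MPM), the two most mature and effective meshfree/particle algorithms, and supports a variety of strength models, failure models, and equations of state. Its MPM supports an improved contact algorithm that can handle the discontinuous deformation and interaction of more than a million discrete bodies. The system has the following features: 1) High scalability: it achieves complete overlap of computation and communication in the extreme case of one patch per core and supports dynamic load balancing. 2) Fault tolerance: it supports unattended restart with a changed number of processes, so the computation need not be aborted when local hot faults occur in the system hardware. 3) It adapts to the trend toward heterogeneous hardware architectures and supports both the flat MPI and the MPI+Pthreads parallel models. The code was tested for scalability at full-system scale on the Titan petascale supercomputer; the results show that it scales linearly to 260,000 CPU cores, with parallel efficiencies of 100% for SPH and 96% for MPM.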

10.
This paper reports on a parallel implementation of a general 3D multi-block CFD code. The parallelization is achieved by using three strategies. Firstly, it is done on dual-processor PC clusters running Windows NT. A multi-thread programming model is adopted for the multi-block code, where one thread corresponds to a block. Shared memory is used for the exchange of inner boundaries between neighboring blocks (threads) on the same node, while WinSockets are employed for those on different nodes. Secondly, the parallelization is extended to UNIX operating systems. MPI is applied for all the message passing between different processors, including those on the same node. Thirdly, Pthreads (POSIX threads), a standardized application interface for threads, are adopted to take advantage of the shared-memory feature of the SMP nodes, while MPI is only applied for the message passing between processors on different nodes. In all the strategies, a static load-balancing method is employed for equitable distribution of computational work to specified nodes. The parameters of the present code are studied in detail to facilitate the explanation of the speedup results. Two examples are provided to show the speedup and load balancing of the parallel calculation. A detailed comparison is made to evaluate the efficiency of the different strategies.

11.

This paper is about enhancing the smart grid by proposing a new hybrid feature-selection method called feature selection-based ranking (FSBR). In general, feature selection excludes non-promising features from the data collected at the Fog. This can be achieved using filter methods, wrapper methods, or a hybrid of the two. Our proposed method consists of two phases: a filter phase and a wrapper phase. In the filter phase, the whole data set goes through different ranking techniques (i.e., relative weight ranking, effectiveness ranking, and information gain ranking). The results of these rankings are sent to a fuzzy inference engine to generate the final ranks. In the wrapper phase, data is selected based on the final ranks and passed to three different classifiers (i.e., Naive Bayes, Support Vector Machine, and neural network) to select the best set of features based on the performance of the classifiers. This process can enhance the smart grid by reducing the amount of data sent to the cloud, decreasing computation time, and decreasing data complexity. Thus, the FSBR methodology enables user load forecasting (ULF) to make fast decisions, to react quickly in short-term load forecasting, and to provide high prediction accuracy. The authors explain the suggested approach via numerical examples. Two datasets are used in the experiments. On the first dataset, the proposed method was compared with six other methods and achieved the best accuracy of 91%. On the second dataset, the generalization dataset, the proposed method achieved 90% accuracy in a comparison with fourteen different methods.


12.
This paper presents a comparative study of three different strategies to improve the performance of particle filters, in the context of visual contour tracking: the unscented particle filter, the Rao-Blackwellized particle filter, and the partitioned sampling technique. The tracking problem analyzed is the joint estimation of the global and local transformation of the outline of a given target, represented following the active shape model approach. The main contributions of the paper are the novel adaptations of the considered techniques to this generic problem, and the quantitative assessment of their performance in extensive experiments.

13.
The paper describes a particle-resolved simulation method for turbulent flow laden with finite-size particles. The method is based on the multiple-relaxation-time lattice Boltzmann equation. The no-slip boundary condition on the moving particle boundaries is handled by a second-order interpolated bounce-back scheme. The populations at a newly converted fluid lattice node are constructed from the equilibrium distribution with non-equilibrium corrections. MPI implementation details are described and the resulting code is found to be computationally efficient with good scalability. The method is first validated using unsteady sedimentation of a single particle and sedimentation of a random suspension. It is then applied to a decaying isotropic turbulence laden with particles of Kolmogorov to Taylor microscale sizes. At a given particle volume fraction, the dynamics of the particle-laden flow is found to depend mainly on the effective particle surface area and particle Stokes number. The presence of finite-size inertial particles enhances dissipation at small scales while reducing kinetic energy at large scales. This is in accordance with related studies. The normalized pivot wavenumber is found to depend not only on the particle size, but also on the ratio of particle size to flow scales and the particle-to-fluid density ratio.

14.
The particle filter technique has been used extensively over the past few years to track objects in challenging environments. Due to its nonlinear nature and the fact that it does not assume a Gaussian probability density function, it tends to outperform other available tracking methods. A novel adaptive sample count particle filter (ASCPF) tracking method is presented in this paper, for which the main motivation is to accurately track an object in crowded scenes using fewer particles and hence with reduced computational overhead. Instead of taking a fixed number of particles, a particle range technique is used where an upper and lower bound for the range is initially identified. Particles are made to switch between an active and inactive state within this identified range. The idea is to keep the number of active particles to a minimum and only to increase this as and when required. Active contours are also utilized to determine a precise area of support around the tracked object from which the color histograms used by the particle filter can be accurately calculated. This, together with the variable particle spread, allows a more accurate proposal distribution to be generated while using fewer computational resources. Experimental results show that the proposed method not only tracks the object with comparable accuracy to existing particle filter techniques but is up to five times faster.

15.
In cloud computing, cost optimization is a prime concern for load scheduling. Swarm-based meta-heuristics are prominently used for load scheduling in distributed computing environments. Conventional load scheduling approaches require a lot of resources and use strategies that are non-adaptive and static in the computation, thereby increasing the response time, the waiting time, and the total cost of computation. Swarm intelligence-based load scheduling is adaptive, intelligent, collective, random, decentralized, self-collective, and stochastic, and is based on biologically inspired mechanisms rather than on conventional mechanisms. The genetic algorithm schedules the particles based on mutation and crossover techniques. The force and acceleration acting on a particle help in finding the velocity and position of the next particle. The best positions of the particles are assigned to cloudlets to be executed on the virtual machines in the cloud. The paper proposes a new load scheduling technique, the Hybrid Genetic-Gravitational Search Algorithm (HG-GSA), for reducing the total cost of computation. The total computational cost includes the cost of execution and transfer. It works on a hybrid crossover technique-based gravitational search algorithm for searching for the best position of the particle in the search space. The best position of the particle is used for calculating the force. The HG-GSA is compared to the existing approaches in the CloudSim simulator. According to the convergence and statistical analysis of the results, the proposed HG-GSA approach reduces the total cost of computation considerably compared to the existing PSO, Cloudy-GSA, and LIGSA-C approaches.

16.
Interacting and annealing are two powerful strategies that are applied in different areas of stochastic modelling and data analysis. Interacting particle systems approximate a distribution of interest by a finite number of particles where the particles interact between the time steps. In computer vision, they are commonly known as particle filters. Simulated annealing, on the other hand, is a global optimization method derived from statistical mechanics. A recent heuristic approach to fuse these two techniques for motion capturing has become known as the annealed particle filter. In order to analyze these techniques, we rigorously derive in this paper two algorithms with annealing properties based on the mathematical theory of interacting particle systems. Convergence results and sufficient parameter restrictions enable us to point out limitations of the annealed particle filter. Moreover, we evaluate the impact of the parameters on the performance in various experiments, including the tracking of articulated bodies from noisy measurements. Our results provide general guidance on suitable parameter choices for different applications.
Jürgen Gall

17.
In particle transport simulations, radiation effects are often described by the discrete ordinates (Sn) form of the Boltzmann equation. In each ordinate direction, the solution is computed by sweeping the radiation flux across the grid. A parallel Sn sweep on an unstructured grid can be explicitly modeled as a topological traversal through an equivalent directed acyclic graph (DAG), which is a data-driven algorithm. Its traditional design using the MPI model results in irregular communication of massive numbers of short messages, which cannot be efficiently handled by the MPI runtime. Meanwhile, in high-end HPC cluster systems, multicore has become the standard processor configuration of a single node. The traditional data-driven algorithm of Sn sweeps has not exploited the potential advantages of multi-threading on multicore shared memory. These advantages, however, as we shall demonstrate, could provide an elegant solution to the problems in the previous MPI-only design. In this paper, we give a new design of data-driven parallel Sn sweeps using hybrid MPI and Pthread programming, namely Sweep-H, to exploit the hierarchical parallelism of processes and threads. With special multi-threading techniques and a vertex schedule policy, Sweep-H achieves more efficient communication and better load balance. We further present an analytical performance model for Sweep-H to reveal why and when it is advantageous over its former MPI counterpart. On a 64-node multicore cluster system with 12 cores per node, 768 cores in total, Sweep-H achieves nearly linear scalability for moderate problem sizes, and better absolute performance than the previous MPI algorithm on more than 16 nodes (up to a twofold speedup on 64 nodes).

18.
19.
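To address real-time target detection and tracking under interval measurements, a Bernoulli particle filter algorithm based on the unscented transform (Bernoulli-Upf) is proposed. Building on the Bernoulli particle filter (Bernoulli-pf), the algorithm incorporates the unscented Kalman filter (UKF). When it generates persistently surviving particles in the prediction step, the fused algorithm takes the current measurement fully into account, guiding the particles toward high-likelihood regions so that the particle distribution comes closer to the posterior distribution of the true state. Simulation experiments show that the Bernoulli-Upf algorithm achieves better estimation accuracy than the Bernoulli-pf algorithm.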

20.
Visualizing dynamic participating media in particle form by fully solving equations from the light transport theory is a computationally very expensive process. In this paper, we present a computational pipeline for particle volume rendering that is easily accelerated by the current GPU. To fully harness its massively parallel computing power, we transform input particles into a volumetric density field using a GPU-assisted, adaptive density estimation technique that iteratively adapts the smoothing length for local grid cells. Then, the volume data is visualized efficiently based on the volume photon mapping method where our GPU techniques further improve the rendering quality offered by previous implementations while performing rendering computation in acceptable time. It is demonstrated that high quality volume renderings can be easily produced from large particle datasets in time frames of a few seconds to less than a minute.
