Similar literature
1.
Ordinal optimization has emerged as an efficient technique for simulation and optimization. Exponential convergence rates can be achieved in many cases. In this paper, we present a new approach that can further enhance the efficiency of ordinal optimization. Our approach intelligently determines the optimal number of simulation replications (or samples) and significantly reduces the total simulation cost. Numerical illustrations are included. The results indicate that, for a 10-design example, our approach obtains an additional 74% reduction in computation time beyond the reduction already obtained through ordinal optimization.

2.
In this paper we apply the ideas of ordinal optimization and the technique of Standard Clock (SC) simulation to the voice-call admission-control problem in integrated voice/data multihop radio networks. This is an important problem in networking that is not amenable to exact analysis with the usual network modeling techniques. We first describe the use of the SC approach on sequential machines and quantify the speedup in simulation time that it achieves in a number of queueing examples. We then develop an efficient simulation model for wireless integrated networks based on the SC approach, which permits the parallel simulation of a large number of admission-control policies, thereby reducing computation time significantly. This model extends the basic SC approach in that it incorporates fixed-length data packets, whereas SC simulation is normally limited to systems with exponentially distributed interevent times. Using this model, we demonstrate the effectiveness of ordinal-optimization techniques, which provide a remarkably good ranking of admission-control policies after relatively short simulation runs, thereby facilitating the rapid determination of good policies. Moreover, we demonstrate that crude, inaccurate analytical and simulation models can provide highly accurate policy rankings that can be used in conjunction with ordinal-optimization methods, provided that they incorporate the key aspects of system operation.

3.
Ordinal optimization has emerged as an efficient technique for simulation optimization. A good allocation of simulation samples across designs can dramatically improve the efficiency of ordinal optimization further. We investigate the efficiency gains of using dynamic simulation allocation for ordinal optimization by comparing the sequential version of the optimal computing budget allocation (OCBA) method with optimal static and one-step look-ahead dynamic allocation schemes that have "perfect information" on the sampling distribution. Computational results indicate that this sequential version of OCBA, which is based on estimated performance, can easily outperform the optimal static allocation derived using the true sampling distribution. These results imply that the advantage of sequential allocation often outweighs that of having accurate estimates of the means and variances when determining a good simulation budget allocation. Furthermore, the performance of the perfect-information dynamic scheme can be viewed as an approximate upper bound on the performance of different sequential schemes, thus providing a target for further achievable efficiency improvements using dynamic allocations.
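The abstract does not state the allocation rule itself. As a rough illustration, the sketch below uses the asymptotic OCBA ratios as they are commonly given in the OCBA literature; it is a one-shot static split, not the sequential procedure evaluated in the paper. The function name `ocba_allocation` and the convention that a smaller mean is better are assumptions made for this example.

```python
import math

def ocba_allocation(means, stds, budget):
    """Split `budget` replications across designs using asymptotic OCBA ratios.

    Assumptions of this sketch: smaller mean is better, the means are
    distinct, and every standard deviation is positive.
    """
    k = len(means)
    b = min(range(k), key=lambda i: means[i])  # index of the observed best design
    others = [i for i in range(k) if i != b]

    weights = [0.0] * k
    for i in others:
        delta = means[i] - means[b]            # gap to the observed best
        weights[i] = (stds[i] / delta) ** 2    # N_i proportional to (sigma_i / delta_i)^2

    # N_b = sigma_b * sqrt(sum over i != b of (N_i / sigma_i)^2)
    weights[b] = stds[b] * math.sqrt(
        sum((weights[i] / stds[i]) ** 2 for i in others)
    )

    total = sum(weights)
    return [budget * w / total for w in weights]
```

In a sequential scheme, the sample means and standard deviations would be re-estimated after each batch and the split recomputed; designs that are close to the best and noisy receive the most replications.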

4.
Collective behaviour of winged insects is a wondrous and familiar phenomenon in the real world. In this paper, we introduce a highly efficient field‐based approach to simulate various insect swarms. Its core idea is to construct a smooth yet noise‐aware governing velocity field that can be further decomposed into two sub‐fields: (i) a divergence‐free curl‐noise field to model noise‐induced movements of individual insects in a swarm, and (ii) an enhanced global velocity field to control the navigational paths in a complex environment along which all the insects in a swarm fly. Through simulation experiments and comparisons with existing crowd simulation approaches, we demonstrate that our approach effectively simulates various insect swarm behaviours, including aggregation, positive phototaxis, sedation, and mass migration. Besides its high efficiency, our approach is well suited to parallel implementation on GPUs (e.g. the speedup achieved through GPU acceleration is higher than 50 if the number of simulated insects is more than 10 000 on an off‐the‐shelf computer). Our approach is the first multi‐agent modelling system that introduces curl‐noise into the agents' velocity field and uses its non‐scattering nature to maintain non‐colliding movements in 3D crowd simulation.
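The divergence-free property of a curl-noise field can be checked numerically. The sketch below is only an illustration under stated assumptions: `potential` is a hypothetical smooth vector potential built from sines and cosines that stands in for a real noise function, and the derivatives are plain central differences. It is not the field construction used in the paper.

```python
import math

def potential(p):
    """Hypothetical smooth vector potential standing in for a noise function."""
    x, y, z = p
    return (math.sin(1.3 * y) * math.cos(0.7 * z),
            math.sin(0.9 * z) * math.cos(1.1 * x),
            math.sin(1.7 * x) * math.cos(0.5 * y))

def partial(field, p, comp, axis, h=1e-3):
    """Central difference of component `comp` of `field` along `axis`."""
    fwd = list(p)
    bwd = list(p)
    fwd[axis] += h
    bwd[axis] -= h
    return (field(fwd)[comp] - field(bwd)[comp]) / (2.0 * h)

def curl_velocity(p):
    """Velocity v = curl(potential); the curl of any smooth field has zero divergence."""
    return (partial(potential, p, 2, 1) - partial(potential, p, 1, 2),
            partial(potential, p, 0, 2) - partial(potential, p, 2, 0),
            partial(potential, p, 1, 0) - partial(potential, p, 0, 1))

def divergence(field, p):
    """Numerical divergence of a 3D vector field at point p."""
    return sum(partial(field, p, axis, axis) for axis in range(3))
```

Evaluating `divergence(curl_velocity, point)` at any point should give a value that is zero up to floating-point rounding, which is the property that keeps agents advected by such a field from bunching together or scattering.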

5.
Cycle-accurate simulation of processors is extremely time consuming. Sampling can greatly reduce simulation time while retaining good accuracy. Previous research on sampled simulation has focused on the accuracy of CPI. However, most simulations are used to evaluate the benefit of some microarchitectural enhancement, for which speedup is a more important metric than CPI. We employ the ratio estimator from statistical sampling theory to design efficient sampling that measures speedup and quantifies its error. We show that to achieve a given relative error limit for speedup, it is not necessary to estimate CPI to the same accuracy. In our experiment, estimating speedup requires about 9X fewer instructions to be simulated in detail than estimating CPI for the same relative error limit. Therefore, using the ratio estimator to evaluate speedup is much more cost-effective and offers great potential for reducing simulation time. We also discuss the reason for this interesting and important result.
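The ratio estimator mentioned above can be sketched as follows. This is the textbook form with a linearised standard error and no finite-population correction; the function name, the argument names, and the choice of cycles per sampling unit as the measured quantity are assumptions of this example, not details taken from the paper.

```python
import math

def speedup_ratio_estimate(base_cycles, enhanced_cycles):
    """Estimate speedup = total base cycles / total enhanced cycles.

    Each list holds the cycle count of the same sampling unit, simulated
    once on the baseline and once on the enhanced microarchitecture.
    Returns (ratio, approximate standard error of the ratio).
    """
    n = len(base_cycles)
    if n < 2 or n != len(enhanced_cycles):
        raise ValueError("need at least two paired sampling units")

    ratio = sum(base_cycles) / sum(enhanced_cycles)
    mean_enhanced = sum(enhanced_cycles) / n

    # Linearised residuals d_i = y_i - R * x_i. They sum to zero by
    # construction and stay small when the two runs are strongly correlated.
    residuals = [y - ratio * x for y, x in zip(base_cycles, enhanced_cycles)]
    s2 = sum(d * d for d in residuals) / (n - 1)

    std_err = math.sqrt(s2 / n) / mean_enhanced
    return ratio, std_err
```

Because the error depends on the residuals rather than on the raw cycle counts, strong correlation between the baseline and enhanced runs of the same unit shrinks it, which is consistent with the abstract's observation that speedup needs fewer detailed samples than CPI.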

6.
In this paper, we articulate a novel plastic phase-field (PPF) method that tightly couples the phase-field with plastic treatment to efficiently simulate ductile fracture with GPU optimization. At the theoretical level of physically-based modeling and simulation, our PPF approach assumes that the fracture sensitivity of the material increases with plastic strain accumulation. Accordingly, we first develop a hardening-related fracture toughness function for phase-field evolution. Second, we follow the associative flow rule and adopt a novel degraded von Mises yield criterion. In this way, we establish a tight coupling of the phase-field and plastic treatment, with which our PPF method can present distinct elastoplasticity, necking, and fracture characteristics during ductile fracture simulation. At the numerical level, for GPU optimization, we further devise an advanced parallel framework that takes full advantage of the hierarchical architecture. Our strategy dramatically enhances the computational efficiency of preprocessing and phase-field evolution for our PPF with the material point method (MPM). In extensive experiments on a variety of benchmarks, our method reaches a speedup of up to 1.56× over the primary GPU MPM. Finally, our comprehensive simulation results confirm that this new PPF method can efficiently and realistically simulate complex ductile fracture phenomena in 3D interactive graphics and animation.

7.
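Sensitivity Estimation and Optimization Algorithms for Closed Queueing Networks Based on Parallel Simulation   Total citations: 2, self-citations: 0, citations by others: 2
Based on the theory of Markov performance potentials, an effective parallel simulation algorithm is established for sensitivity estimation and optimization of a class of closed queueing networks. Common random numbers are used so that all processors use the same sample path, which reduces the communication time between processors. A simulation example on an SPMD parallel computer shows that the parallel simulation algorithm significantly increases computation speed for the optimization of closed queueing networks.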
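The idea of common random numbers can be illustrated with a much simpler model than the closed queueing networks treated in the paper. The sketch below is an assumption-laden stand-in: it simulates a single-server FIFO queue with the Lindley recursion and drives two parameter settings with the same seed, so that both see the same underlying random stream. The names `mean_wait` and `crn_difference` are hypothetical.

```python
import math
import random

def mean_wait(service_rate, seed, customers=2000, arrival_rate=0.5):
    """Mean waiting time in a single-server FIFO queue (Lindley recursion)."""
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(customers):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        u_arrival = 1.0 - rng.random()
        u_service = 1.0 - rng.random()
        interarrival = -math.log(u_arrival) / arrival_rate
        service = -math.log(u_service) / service_rate
        total += wait
        wait = max(0.0, wait + service - interarrival)
    return total / customers

def crn_difference(rate_a, rate_b, seed):
    """Difference in mean wait when both settings share one random stream."""
    return mean_wait(rate_a, seed) - mean_wait(rate_b, seed)
```

Because both runs consume identical uniform draws in the same order, each worker can regenerate the stream locally from the seed instead of receiving samples from another processor, and the difference between the two settings is not masked by independent sampling noise.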

8.
Ordinal Comparison via the Nested Partitions Method   Total citations: 7, self-citations: 0, citations by others: 7
We analyze a new approach for simulation-based optimization of discrete event systems that draws on two recent stochastic optimization methods: an adaptive sampling approach called the nested partitions method, and ordinal optimization. The ordinal optimization perspective provides new insights into the convergence of the nested partitions method and guidelines for its implementation. We also use this approach to show that global convergence requires relatively few simulation runs, and we propose new effective variants of the algorithm. Simulation results are presented to demonstrate the key results.

9.
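This paper studies a CUDA-MPI algorithm for the fast Fourier transform on hexagonal domains (FFTH) and its implementation. First, by making full use of CUDA's hierarchical parallelism and its library functions, we design an efficient CUDA algorithm for FFTH. For double-precision complex data of size 3×2048², our CUDA program achieves a 12-fold speedup over the serial CPU program, and up to a 40-fold speedup if data transfer between host memory and device memory is excluded; its computational efficiency is essentially the same as that of the two-dimensional square-domain FFT provided by CUFFT. On this basis, by studying transpose and sorting algorithms for distributed parallel data on GPUs, we design an optimized CUDA-MPI algorithm for FFTH. For data of size 3×8192² in a computing environment of 10 nodes × 6 GPUs, our CUDA-MPI program achieves a 55-fold speedup over the serial CPU program; it is considerably more efficient than both the MPI-parallel version of FFTW and a parallel square-domain FFT built on local CUFFT computation and the FFTW parallel transpose. The study and testing of the CUDA-MPI algorithm for FFTH provide a reference for exploring new scalable algorithms for large-scale heterogeneous CPU+GPU computer systems.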

10.
This contribution describes a parallel approach for determining the collision state of a large collection of ellipsoids. Collision detection is required in granular dynamics simulation, where it can be combined with a differential variational inequality solver or a discrete element method to approximate the time evolution of a collection of rigid bodies interacting through frictional contact. The proposed approach is structured on three levels. At the lowest level, the collision information associated with two colliding ellipsoids is obtained as the solution of a two-variable unconstrained optimization problem for which first- and second-order sensitivity information is derived analytically. Although this optimization approach suffices to resolve the collision problem between any two arbitrary ellipsoids, a less versatile but more efficient approach precedes it to gauge whether two ellipsoids are actually in contact and require the more costly optimization approach. This intermediate level draws on the analytical solution of a third-order polynomial obtained from the characteristic equation of two arbitrary ellipsoids. Finally, this intermediate level is invoked by the outer level only when a 3D spatial binning algorithm indicates that two ellipsoids share the same bin (box) and therefore could potentially collide. This multi-level approach is implemented in parallel; when executed on a ubiquitous Graphics Processing Unit (GPU) card, it scales linearly and yields a speedup of two orders of magnitude over a similar algorithm executed on the Central Processing Unit (CPU). The GPU-based ellipsoid contact detection algorithm yields a 14-fold speedup over a CPU-based sphere contact detection algorithm implemented in the third-party open-source Bullet Physics Library (BPL). The proposed methodology provides the efficiency demanded by granular dynamics applications, which routinely handle scenarios with millions of collision events.
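The outer, spatial-binning level described above can be sketched in a few lines. This serial Python version is only an illustration, not the paper's GPU implementation: it assumes each ellipsoid is wrapped in a bounding sphere (given by `centers` and `radii`), uses a uniform grid with a hypothetical `cell_size`, and reports every pair of bodies that touch at least one common bin.

```python
from collections import defaultdict
from itertools import combinations, product
import math

def candidate_pairs(centers, radii, cell_size):
    """Return index pairs (i, j), i < j, whose bounding spheres share a bin."""
    bins = defaultdict(list)
    for idx, (center, radius) in enumerate(zip(centers, radii)):
        # Range of grid cells covered by the sphere's axis-aligned bounding box.
        lo = [math.floor((c - radius) / cell_size) for c in center]
        hi = [math.floor((c + radius) / cell_size) for c in center]
        for cell in product(*(range(l, h + 1) for l, h in zip(lo, hi))):
            bins[cell].append(idx)

    pairs = set()
    for members in bins.values():
        # Bodies are appended in index order, so each pair comes out as (i, j) with i < j.
        pairs.update(combinations(members, 2))
    return pairs
```

Only the pairs returned here would be passed on to the cheaper intermediate contact test, and only those that pass it would reach the optimization-based lowest level.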
