Similar documents
 Found 20 similar documents; search took 15 ms
1.
Graphics processing units (GPUs) suitable for general-purpose numerical computation are now available with performance in excess of 1 teraflops, one to two orders of magnitude faster than conventional desktop CPUs. Monte Carlo particle transport algorithms are ideally suited to parallel processing architectures and so are good candidates for acceleration using a GPU. We have developed a general-purpose code that computes the transport of high-energy (>1 keV) photons through arbitrary 3-dimensional geometry models, simulates their physical interactions, and performs tallying and variance reduction. We describe a new algorithm, the particle-per-block technique, that provides a good match with the underlying GPU multiprocessor hardware design. Benchmarking against an existing CPU-based simulation running on a single core of a commodity desktop CPU demonstrates that our code can accurately model X-ray transport, with an approximately 35-fold speed-up.
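
One core step of any photon transport code of this kind is sampling the distance to the next interaction. The following is a minimal Python sketch of that step only, not the authors' GPU code or their particle-per-block technique. It assumes a homogeneous slab with a single attenuation coefficient `mu` (a hypothetical parameter) and checks the sampled transmission against the Beer-Lambert law exp(-mu * thickness).

```python
import math
import random

def sample_free_path(mu, rng):
    """Sample a distance to the next interaction from an exponential law.

    mu is the linear attenuation coefficient (assumed constant here).
    """
    # 1 - random() lies in (0, 1], which avoids log(0).
    return -math.log(1.0 - rng.random()) / mu

def transmitted_fraction(mu, thickness, n_photons=100_000, seed=3):
    """Fraction of photons whose first free path exceeds the slab thickness."""
    rng = random.Random(seed)
    passed = sum(1 for _ in range(n_photons)
                 if sample_free_path(mu, rng) > thickness)
    return passed / n_photons

if __name__ == "__main__":
    mu, thickness = 0.2, 5.0
    print("Monte Carlo :", transmitted_fraction(mu, thickness))
    print("Beer-Lambert:", math.exp(-mu * thickness))
```

Each photon history is independent of the others, which is why this class of algorithm maps well onto parallel hardware.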

2.
3.
GPU (CUDA C) and multi-core (OpenMP) versions of the Reaction Ensemble Monte Carlo (REMC) method are presented. The REMC algorithm is a powerful tool for investigating the equilibrium behavior of chemically reacting systems under highly non-ideal conditions. Both the GPU and the multi-core versions of the code are particularly efficient when the total potential energy of the system must be calculated, as in constant-pressure systems. Results obtained for helium plasma at high pressure show differences between the real and ideal cases.

4.
Statistical tests are often performed to discover which experimental variables are reacting to specific treatments. Time-series statistical models usually require the researcher to make assumptions about the distribution of measured responses, and these assumptions may not hold. Randomization tests can be applied to data in order to generate null distributions non-parametrically. However, large numbers of randomizations are required for the precise p-values needed to control false discovery rates. When testing tens of thousands of variables (genes, chemical compounds, or otherwise), significant q-value cutoffs can be extremely small (on the order of 10^-5 to 10^-8). This requires high-precision p-values, which in turn require large numbers of randomizations. The NVIDIA® Compute Unified Device Architecture (CUDA®) platform for general-purpose computing on graphics processing units (GPGPU) was used to implement an application that performs high-precision randomization tests via Monte Carlo sampling, for quickly screening custom test statistics in experiments with large numbers of variables, such as microarrays, next-generation sequencing read counts, chromatographic signals, or other abundance measurements. The software has been shown to achieve a speedup of more than 12-fold on a graphics processing unit (GPU) compared with a powerful central processing unit (CPU). The main limitation is concurrent random access to shared memory on the GPU. The software is available from the authors.
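
To make the link between the number of randomizations and p-value precision concrete, here is a minimal CPU-only Python sketch of a Monte Carlo randomization test. It is not the authors' CUDA software. The test statistic (absolute difference of group means) and the add-one p-value estimator are assumptions chosen for illustration.

```python
import random

def mean_diff(a, b):
    """Difference between the means of two samples."""
    return sum(a) / len(a) - sum(b) / len(b)

def randomization_p_value(treated, control, n_rand=10_000, seed=0):
    """Two-sided Monte Carlo randomization test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(mean_diff(treated, control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_rand):
        # Randomly reassign the group labels and recompute the statistic.
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:n_t], pooled[n_t:])) >= observed:
            hits += 1
    # Add-one estimator: the result can never be exactly 0.
    return (hits + 1) / (n_rand + 1)

if __name__ == "__main__":
    treated = [10.1, 9.8, 10.3, 10.0, 9.9]
    control = [5.0, 5.2, 4.9, 5.1, 4.8]
    print(randomization_p_value(treated, control))
```

With this estimator the smallest attainable p-value is 1 / (n_rand + 1), so resolving cutoffs near 10^-8 needs on the order of 10^8 randomizations per variable. That cost is what motivates moving the inner loop to a GPU.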

5.
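A GPU-accelerated Gaussian blur algorithm based on OpenACC   Total citations: 1, self-citations: 0, citations by others: 1
To address the coding complexity and low efficiency of using low-level APIs for GPU acceleration, this paper rewrites traditional serial code using OpenACC, a middle-layer acceleration technology, in order to improve development efficiency and simplify the code. Taking the traditional serial Gaussian blur algorithm as the target, OpenACC directives are added to it, a GPU-accelerated algorithm based on OpenACC directives is proposed, and the algorithm flow is analyzed and explained. Comparison with the results of native CUDA and the serial Gaussian blur shows that, as the number of processed pixels increases, the performance of the serial Gaussian blur varies exponentially, whereas that of CUDA and OpenACC varies linearly. The results show that, without changing the structure of the original non-parallel code, the algorithm obtains image processing quality and performance close to those of CUDA simply by adding efficient OpenACC directives, with higher code development efficiency than CUDA.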

6.
We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup is almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow higher particle energies to be studied with more accurate physical models, and improve statistics, as more particle tracks can be simulated with a low response time.

7.
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468-4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation which also employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As a proof of concept, we reproduce the critical temperature of the 2D Ising model using finite-size scaling techniques.
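
The checkerboard decomposition mentioned in this abstract can be illustrated with a short serial Python sketch. This is a plain CPU illustration under stated assumptions (coupling J = 1, no external field, periodic boundaries, even lattice size). It uses neither multi-spin coding nor CUDA or MPI, so it is not the authors' implementation.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of an L x L periodic Ising lattice (L even)."""
    size = len(spins)
    for color in (0, 1):
        # Sites with (i + j) % 2 == color have neighbours only of the other
        # colour, so on a GPU all of them could be updated at the same time.
        for i in range(size):
            for j in range(size):
                if (i + j) % 2 != color:
                    continue
                neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                              + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
                # Energy change if spin (i, j) is flipped.
                delta_e = 2 * spins[i][j] * neighbours
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]

if __name__ == "__main__":
    rng = random.Random(1)
    lattice = [[1] * 16 for _ in range(16)]
    for _ in range(100):
        checkerboard_sweep(lattice, beta=0.3, rng=rng)
    print(sum(map(sum, lattice)) / 256)  # magnetization per spin
```

Because the two sub-lattices never read a site that is being written in the same half-sweep, the inner double loop has no data dependencies within one colour.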

8.
An algorithm is described for the Monte Carlo simulation of neutron diffusion, including the treatment of fission neutrons, which is efficient on a computer with SIMD architecture. Tests carried out on the ICL DAP give results which compare favourably for timing with those obtained using standard UKAEA serial programs.

9.
Processor evolution has reached a critical point at which it will soon be impossible to increase the clock frequency much further. Processor designers such as Motorola, Intel and IBM have all realised that the only way to improve the FLOP/Watt ratio is to develop multi-core devices. One of the most recent examples of a multi-core processor is the new Sony/Toshiba/IBM Cell/B.E. processor. Because they are well suited to running in parallel, Monte Carlo methods are often considered embarrassingly parallel. This paper describes how a common Monte Carlo based financial simulation can be calculated in parallel using the Cell/B.E. multi-core processor. The measured performance and the achieved multi-core speed-up are also presented. With this increasingly available technology, financial simulations can now be performed in a fraction of the time they used to take. This can also be achieved with a limited power and volume budget using commercially available technology. The main challenge with multi-core devices is clearly programmability. The work presented here describes how this challenge could be dealt with. A basic MPI library has been developed to handle the partitioning and communication of data. Thread creation follows the POSIX thread creation model. MPI together with POSIX makes the application portable between various multi-processor systems and multi-core devices. The conclusions indicate that a function-offload MPI implementation on the Cell/B.E. multi-core processor can efficiently be used to speed up the Monte Carlo solution of financial simulations. The conclusions are also applicable to other situations where an algorithm can be easily parallelized.

10.
Quantum Monte Carlo methods enable us to determine the ground-state properties of atomic or molecular clusters. Here, we present a reconfigurable computing architecture using Field Programmable Gate Arrays (FPGAs) to accelerate two computationally intensive kernels of a Quantum Monte Carlo (QMC) application applied to N-body systems. We focus on two key kernels of the QMC application: acceleration of potential energy and wave function calculations. We compare the performance of our application on two reconfigurable platforms. Firstly, we use a dual-processor 2.4 GHz Intel Xeon augmented with two reconfigurable development boards consisting of Xilinx Virtex-II Pro FPGAs. Using this platform, we achieve a speedup of 3× over a software-only implementation. Following this, the chemistry application is ported to the Cray XD1 supercomputer equipped with Xilinx Virtex-II Pro and Virtex-4 FPGAs. The hardware-accelerated application on one node of the high performance system equipped with a single Virtex-4 FPGA yields a speedup of approximately 25× over the serial reference code running on one node of the dual-processor dual-core 2.2 GHz AMD Opteron. This speedup is mainly attributed to the use of pipelining, the use of fixed-point arithmetic for all calculations and the fine-grained parallelism using FPGAs. We can further enhance the performance by operating multiple instances of our design in parallel.

11.
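Owing to the substantial increase in GPU (graphics processing unit) performance and the development of GPU programmability, GPU-based ray tracing algorithms have gradually become a research focus. Because ray tracing is computationally demanding, this paper analyzes the basic principles of the ray tracing algorithm and implements it in NVIDIA's CUDA (Compute Unified Device Architecture) environment using a uniform grid as the acceleration structure. Experimental results show that this computing model achieves a faster overall running speed than traditional CPU-based ray tracing, and that GPUs are well suited to high-density data computation.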

12.
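Parallelization of the Monte Carlo neutron-photon transport code MCNP   Total citations: 2, self-citations: 0, citations by others: 2
1. Introduction  With the advent of parallel computers, parallel algorithms and parallel systems have also developed continuously, for example PVM (Parallel Virtual Machine), SMP (Shared Memory Processors), MPI (Message Passing Interface) and HPF (High Performance Fortran). These parallel systems work on essentially the same principles and differ mainly in their parallel directives and data-transfer mechanisms. Among them, PVM and MPI are highly general, small in system size, easy to use and highly portable, and are comparatively easy to install, test, program and deploy; it is currently internationally recognized…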

13.
Many highly developed Monte Carlo tools for the evaluation of cross sections based on tree matrix elements exist and are used by experimental collaborations in high energy physics. As the evaluation of one-loop matrix elements has recently been undergoing enormous progress, the combination of one-loop matrix elements with existing Monte Carlo tools is on the horizon. This would lead to phenomenological predictions at the next-to-leading order level. This note summarises the discussion of the next-to-leading order multi-leg (NLM) working group on this issue, which took place during the workshop on Physics at TeV Colliders at Les Houches, France, in June 2009. The result is a proposal for a standard interface between Monte Carlo tools and one-loop matrix element programs. Dedicated to the memory of, and in tribute to, Thomas Binoth, who led the effort to develop this proposal for Les Houches 2009. Thomas led the discussions, set up the subgroups, collected the contributions, and wrote and edited this paper. He made a promise that the paper would be on the arXiv the first week of January, and we are faithfully fulfilling his promise. In his honour, we would like to call this the Binoth Les Houches Accord.

14.
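This paper presents a new optimization method for loan portfolio decisions. The model uses CVaR, which better reflects the credit risk characteristics of a loan portfolio, as its risk measure. Because historical data on individual loans are difficult to obtain in practice, a Monte Carlo simulation method implemented in Matlab is given, so that the model can be solved efficiently with linear programming techniques. Finally, an example is given.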
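
As a rough illustration of how simulated scenarios feed a CVaR estimate, the sketch below generates portfolio losses by Monte Carlo and computes empirical VaR and CVaR. It is written in Python rather than the Matlab used in the paper. The independent-default loss model, the exposures and the default probabilities are all hypothetical, and the linear programming step of the paper is not reproduced.

```python
import random

def simulate_losses(exposures, default_probs, n_scen=20_000, seed=1):
    """Simulate portfolio losses assuming independent loan defaults."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scen):
        # A loan contributes its full exposure when it defaults.
        loss = sum(e for e, p in zip(exposures, default_probs)
                   if rng.random() < p)
        losses.append(loss)
    return losses

def var_cvar(losses, alpha=0.95):
    """Empirical VaR and CVaR (mean loss in the tail at or beyond VaR)."""
    ordered = sorted(losses)
    k = min(int(alpha * len(ordered)), len(ordered) - 1)
    tail = ordered[k:]
    return ordered[k], sum(tail) / len(tail)

if __name__ == "__main__":
    scenario_losses = simulate_losses([100, 200, 300], [0.05, 0.02, 0.01])
    print(var_cvar(scenario_losses, alpha=0.95))
```

Because CVaR averages the whole tail rather than reading off a single quantile, it is never smaller than VaR at the same confidence level.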

15.
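For complex equipment supported by a three-level maintenance organization, the equipment's operation and maintenance process is analyzed to obtain a state transition diagram of operation and maintenance over the entire service life, and a model relating the maintenance period to the average availability over the entire service life is established. Using Monte Carlo simulation together with a worked example, the optimal maintenance period that maximizes average availability is obtained, demonstrating the applicability and sensitivity of the model, which can provide a basis for equipment maintenance decisions.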

16.
We report on Monte Carlo simulations of a single coarse-grained polystyrene chain in spherical confinement. To this end we employ a variant of the freely rotating chain model, the parameters of which are chosen to mimic polystyrene in good solvent conditions. Entanglements are analyzed as a function of molecular weight and capsid radius to provide an educated guess about the structure of a single polystyrene chain in a miniemulsion droplet. We also show that significant knotting occurs first when the radius of the confining sphere falls below the chain's radius of gyration.

17.
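This paper introduces the Monte Carlo method, points out a problem that arises when it is used to simulate the Buffer problem, and gives an improved method. It also presents the principle and method for generating random variables with arbitrary distributions by the Monte Carlo method, and carries out computer simulations and quality checks for Beta-distributed and standard normal random variables.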
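
The abstract does not say which generation methods the paper uses. As one plausible illustration, the Python sketch below draws Beta variates by acceptance-rejection with a uniform proposal and standard normal variates by the Box-Muller transform. Both choices are assumptions, and the rejection sampler as written only handles shape parameters a, b > 1.

```python
import math
import random

def beta_kernel(x, a, b):
    """Unnormalized Beta(a, b) density on [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1)

def sample_beta_rejection(a, b, rng):
    """Acceptance-rejection with a uniform proposal; requires a, b > 1."""
    mode = (a - 1) / (a + b - 2)
    peak = beta_kernel(mode, a, b)  # envelope height
    while True:
        x = rng.random()
        if rng.random() * peak <= beta_kernel(x, a, b):
            return x

def sample_normal_box_muller(rng):
    """One standard normal variate from two uniforms (Box-Muller)."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

if __name__ == "__main__":
    rng = random.Random(42)
    betas = [sample_beta_rejection(2, 5, rng) for _ in range(20_000)]
    normals = [sample_normal_box_muller(rng) for _ in range(20_000)]
    print(sum(betas) / len(betas), "vs exact mean", 2 / 7)
    print(sum(normals) / len(normals), "vs exact mean", 0.0)
```

A simple quality check of the kind the abstract mentions is to compare sample moments with their exact values, as the usage lines do for the means.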

18.
We discuss the advantages of parallelization by multithreading on graphics processing units (GPUs) for parallel tempering Monte Carlo computer simulations of an exemplified bead-spring model for homopolymers. Since the sampling of a large ensemble of conformations is a prerequisite for the precise estimation of statistical quantities, such as typical indicators for conformational transitions like the peak structure of the specific heat, the advantage of a strong increase in the performance of Monte Carlo simulations cannot be overestimated. Employing multithreading and utilizing the massive power of the large number of cores on GPUs, available on modern but standard graphics cards, we find a rapid increase in efficiency when porting parts of the code from the central processing unit (CPU) to the GPU.
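
For readers unfamiliar with parallel tempering, the exchange step between neighbouring temperatures can be sketched in a few lines of Python. This shows only the standard replica-exchange acceptance rule, with energies standing in for full conformations. The bead-spring model and the GPU multithreading of the paper are not reproduced, and the function names are hypothetical.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Branching on the sign avoids overflow in exp for large positive delta.
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swaps(betas, energies, rng):
    """Try one exchange between each pair of neighbouring replicas."""
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1],
                             energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies

if __name__ == "__main__":
    rng = random.Random(5)
    print(attempt_swaps([2.0, 1.0, 0.5], [3.0, 2.0, 1.0], rng))
```

Between exchange attempts each replica evolves independently at its own temperature, which is the part that benefits from running one replica per thread group.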

19.
The allocation of design and manufacturing tolerances has a significant effect on both manufacturing cost and quality. This paper considers nonlinearly constrained tolerance allocation problems. The purpose is to minimize the ratio between the sum of the manufacturing costs (tolerance costs) and the risk (the probability that the geometrical requirements are respected). Monte Carlo simulation and a genetic algorithm are adopted to solve these problems. As the simplest and most popular method for non-linear statistical tolerance analysis, Monte Carlo simulation is introduced into the framework. Moreover, in order to make the framework efficient, the genetic algorithm is improved according to the features of the Monte Carlo simulation. An illustrative example (a hyperstatic mechanism) is given to demonstrate the efficiency of the proposed approach.

20.
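Performance analysis of active sonar signal detection based on the Monte Carlo method   Total citations: 1, self-citations: 0, citations by others: 1
In analyzing the detection performance of active sonar signals, computer simulations currently tend to assume that the statistics of the reverberation envelope follow the Rayleigh distribution model, for which a fairly complete theory exists. In modern high-resolution sonar systems, however, the statistics of the reverberation envelope do not follow the Rayleigh model, and analyzing the receiver operating characteristic then involves a large amount of tedious formula derivation. This paper therefore uses the Monte Carlo statistical test method to analyze active sonar signal detection performance against a Rayleigh-distributed reverberation background. Together with simulation of the receiver operating characteristic curve, error curves between the theoretical detection probability and the simulation results are obtained. The error curves show that the Monte Carlo method is feasible for evaluating active sonar signal detection performance.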
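
The comparison between theory and simulation described here can be illustrated in the simplest case. The Python sketch below assumes a background-only envelope formed from two independent zero-mean Gaussian quadrature components, which makes the envelope Rayleigh distributed. It estimates the probability of exceeding a threshold and compares it with the closed form exp(-T^2 / (2 sigma^2)). The detector structure and parameter names are assumptions, not the paper's receiver model.

```python
import math
import random

def theoretical_pfa(threshold, sigma=1.0):
    """P(envelope > threshold) for a Rayleigh envelope with scale sigma."""
    return math.exp(-threshold ** 2 / (2.0 * sigma ** 2))

def monte_carlo_pfa(threshold, sigma=1.0, n_trials=100_000, seed=7):
    """Estimate the same probability by direct simulation."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        in_phase = rng.gauss(0.0, sigma)
        quadrature = rng.gauss(0.0, sigma)
        # The envelope of two i.i.d. Gaussian components is Rayleigh.
        if math.hypot(in_phase, quadrature) > threshold:
            exceed += 1
    return exceed / n_trials

if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        print(t, theoretical_pfa(t), monte_carlo_pfa(t))
```

Plotting the difference between the two columns against the threshold gives an error curve of the kind the abstract refers to. The same simulation loop still works when the envelope statistics have no convenient closed form.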
