Similar literature
1.
Refined models and nonlinear time-history analysis have been important developments in the field of urban regional seismic damage simulation. However, the application of refined models has been limited by their high computational cost when they are implemented on traditional central processing unit (CPU) platforms. In recent years, graphics processing unit (GPU) technology has been developed and adopted rapidly because of its powerful parallel computing capability and low cost. Hence, a coarse-grained parallel approach for seismic damage simulations of urban areas based on refined models and GPU/CPU cooperative computing is proposed. The buildings are modeled using a multi-story concentrated-mass shear (MCS) model, and their seismic responses are simulated using nonlinear time-history analysis. The benchmark cases demonstrate that the performance-to-price ratio of the proposed approach can be 39 times as great as that of a traditional CPU approach. Finally, a seismic damage simulation of a medium-sized urban area is implemented to demonstrate the capacity and advantages of the proposed method.

2.
Most computational fluid dynamics (CFD) simulations require massive computational power, which is usually provided by traditional High Performance Computing (HPC) environments. Although interactivity of the simulation process is highly appreciated by scientists and engineers, present CFD simulations are usually executed non-interactively because of the limitations of typical HPC environments. A recent trend is to harness the parallel computational power of graphics processing units (GPUs) for general-purpose applications. As an alternative to traditional massively parallel computing, GPU computing has also gained popularity in the CFD community, especially for its application to the lattice Boltzmann method (LBM). For instance, Tölke and others presented very efficient implementations of the LBM for 2D as well as 3D space (Toelke J, in Comput Visual Sci. (2008); Toelke J and Krafczyk M, in Int J Comput Fluid Dyn 22(7): 443–456 (2008)). In this work we motivate the use of GPU computing to facilitate interactive CFD simulations. In our approach, the simulation is executed on multiple GPUs instead of traditional HPC environments, which allows the integration of the complete simulation process into a single desktop application. To demonstrate the feasibility of our approach, we show a fully bidirectional fluid-structure interaction for self-induced membrane oscillations in a turbulent flow. The efficiency of the approach allows a 3D simulation close to real time.

3.
We port a high-order finite-element application that performs the numerical simulation of seismic wave propagation resulting from earthquakes in the Earth to NVIDIA GeForce 8800 GTX and GTX 280 graphics cards using CUDA. This application runs in single precision and is therefore a good candidate for implementation on current GPU hardware, which either does not support double precision or supports it only at the cost of reduced performance. We discuss and compare two implementations of the code: one that has maximum efficiency but is limited to the memory size of the card, and one that can handle larger problems but is less efficient. We use a coloring scheme to efficiently handle summation operations over nodes on a topology with variable valence. We perform several numerical tests and performance measurements and show that in the best case we obtain a speedup of 25.
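
The coloring idea mentioned in this abstract can be illustrated with a small sketch. Elements that share a mesh node receive different colors, so all elements of one color can add their contributions to the nodes in parallel without write conflicts. The function name `color_elements`, the mesh layout, and the greedy strategy are assumptions made for illustration; the abstract does not say how the authors build their coloring.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a node never get the same color.

    `elements` is a list of tuples of node indices (hypothetical layout).
    Returns one color index per element.
    """
    # Map each node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbouring elements (those sharing a node).
        used = {colors[nb] for n in nodes for nb in node_to_elems[n]
                if colors[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


# Four quadrilateral elements on a 3x3 node grid; all share the centre node 4.
mesh = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
colors = color_elements(mesh)
```

Because every element in this tiny mesh touches node 4, each one ends up with its own color. On a larger mesh the number of colors stays small, and each color becomes one conflict-free parallel pass.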

4.
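Recent advances in graphics processing units (GPUs) make high-performance general-purpose computing available at low cost. The GPU programming models CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) give programmers a rich set of C-like application programming interfaces (APIs) for exploiting the parallel computing capability of GPUs. Using graphics hardware for acceleration, this work analyzes existing GPU implementations of the N-body problem through a new GPU processing model, the parallel time-space model, and on that basis proposes a new algorithm for fast N-body simulation on GPUs, implemented on an AMD Radeon HD 5850. Experiments show a speedup of about 400 times over a CPU implementation and of 2 to 5 times over existing GPU implementations.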
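
For orientation, the baseline that N-body accelerations are usually measured against is the direct all-pairs summation, sketched below in plain Python. This is not the algorithm proposed in the abstract, whose details are not given there. The function name `accelerations`, the gravitational constant `g`, and the `softening` parameter are illustrative assumptions.

```python
def accelerations(pos, mass, g=1.0, softening=0.0):
    """Direct O(N^2) gravitational accelerations for bodies in 3D.

    pos  : list of [x, y, z] positions
    mass : list of masses
    A small positive `softening` avoids division by zero for close pairs.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):          # on a GPU, one thread would typically own one i
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening * softening
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += g * mass[j] * d[k] * inv_r3
    return acc


# Two unit masses one unit apart on the x axis attract each other equally.
acc = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
```

The outer loop iterations are independent of one another, which is what makes the problem a natural fit for data-parallel hardware.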

5.
Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel "co-processors" to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA's Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory-hierarchy-specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.

6.
We present a general-purpose computing on graphics processing units (GPGPU) based computational program and framework for the electronic dynamics of atomic systems under intense laser fields. We present our results using the case of hydrogen; however, the code is trivially extensible to tackle problems within the single-active electron (SAE) approximation. Building on our previous work, we introduce CLTDSE, the first available GPGPU-based implementation of the Taylor, Runge–Kutta and Lanczos based methods created specifically with strong-field ab initio simulations in mind. The code makes use of finite difference methods and the OpenCL framework for GPU acceleration. The specific example system used is the classic test system, hydrogen. After introducing the standard theory and the specific quantities that are calculated, the code, including installation and usage, is discussed in depth. This is followed by some examples and a short benchmark between an Intel Xeon CPU with 8 hardware threads (i.e. logical cores) and an AMD 6970 GPU, where the parallel algorithm runs 10 times faster on the GPU than on the CPU.

7.
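Optimization methods for computing the principal eigenvectors of large-scale sparse matrices   Cited by: 1 (self-citations: 0, citations by others: 1)
Computing the principal eigenvectors of a matrix (principal eigenvectors computing, PEC) is an important problem in scientific and engineering computing. With the rise of general-purpose computing on graphics processing units (GPGPU), using GPUs to accelerate PEC for large-scale sparse matrices has attracted wide attention. This work analyzes the performance bottlenecks of PEC from two perspectives, the characteristics of the application and the characteristics of the GPU architecture. It proposes a GPU-oriented sparse matrix storage format, GPU-ELL, together with an optimized thread mapping strategy for GPUs, and designs a corresponding optimized PEC algorithm. Experiments on an ATI Radeon HD 5850 show a speedup of up to about 200 times over a conventional CPU and of 2 times over existing GPU implementations.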
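
As a rough illustration of the two ingredients named in this abstract, the sketch below stores a small matrix in the standard ELLPACK (ELL) layout and finds its principal eigenvector by power iteration. It uses the textbook ELL format, not the paper's GPU-ELL variant, whose layout the abstract does not describe. The helper names `to_ell`, `ell_matvec`, and `power_iteration` are hypothetical.

```python
import math


def to_ell(dense):
    """Convert a dense matrix to ELL: every row is padded to the same
    number of stored entries (the nonzero count of the widest row)."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]
    width = max(len(r) for r in rows)
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def ell_matvec(cols, vals, x):
    """y = A x; the fixed row width lets each row map to one GPU thread."""
    return [sum(v * x[j] for j, v in zip(c, r)) for c, r in zip(cols, vals)]


def power_iteration(cols, vals, iters=200):
    """Repeatedly apply A and renormalize to approach the dominant eigenpair."""
    n = len(cols)
    x = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        y = ell_matvec(cols, vals, x)
        lam = math.sqrt(sum(t * t for t in y))  # estimate of |lambda_max|
        x = [t / lam for t in y]
    return lam, x


A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
cols, vals = to_ell(A)
eigenvalue, eigenvector = power_iteration(cols, vals)
```

Padding entries carry a zero value, so they add nothing to the product. The price is wasted storage when row lengths vary widely, which is the usual motivation for ELL variants.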

8.
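A protein molecular dynamics simulation program based on the AMBER force field was implemented with OpenMP and the Compute Unified Device Architecture (CUDA) on a heterogeneous parallel architecture combining multi-core central processing units (CPUs) and graphics processing units (GPUs). By sensibly partitioning the program into CPU single-threaded, CPU multi-threaded, and GPU multi-threaded parts, the implementation makes efficient use of the machine's processing power. Performance tests show that, relative to optimized serial CPU computation, the heterogeneous multi-core CPU-GPU parallel model has a strong performance advantage. In particular, porting the force calculation, which accounts for 90% of the total execution time, to the GPU yields a speedup of up to 12 times.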

9.
In recent years, graphics processing units (GPUs) have seen ever-growing use in a wide range of computational analyses in the life sciences. Despite its great potential, GPU computing risks remaining a niche for specialists because of the programming and optimization skills it requires. In this work we present cupSODA, a simulator of biological systems that exploits the remarkable memory bandwidth and computational capability of GPUs. cupSODA efficiently executes large numbers of simulations in parallel, which are usually required to investigate the emergent dynamics of a given biological system under different conditions. cupSODA works by automatically deriving the system of ordinary differential equations from a reaction-based mechanistic model, defined according to mass-action kinetics, and then applying the numerical integration algorithm LSODA. We show that cupSODA can achieve an 86× speedup on GPUs with respect to equivalent executions of LSODA on the CPU.
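
The derivation step described in this abstract, turning a reaction-based model into ordinary differential equations under mass-action kinetics, can be sketched as follows. This is a minimal illustration, not cupSODA's code. The function name `mass_action_rhs`, the tuple layout of a reaction, and the example network are assumptions, and the LSODA integration step is omitted.

```python
def mass_action_rhs(reactions, x):
    """Return dx/dt for a reaction network under mass-action kinetics.

    Each reaction is (rate_constant, reactants, products), where reactants
    and products map species index -> stoichiometric coefficient.
    """
    dxdt = [0.0] * len(x)
    for k, reactants, products in reactions:
        # Rate = k times the product of reactant concentrations, each raised
        # to its stoichiometric coefficient.
        rate = k
        for s, nu in reactants.items():
            rate *= x[s] ** nu
        for s, nu in reactants.items():
            dxdt[s] -= nu * rate   # reactants are consumed
        for s, nu in products.items():
            dxdt[s] += nu * rate   # products are formed
    return dxdt


# Hypothetical network: A + B -> C (k = 2.0) and C -> A + B (k = 0.5),
# with species indexed A = 0, B = 1, C = 2.
reactions = [
    (2.0, {0: 1, 1: 1}, {2: 1}),
    (0.5, {2: 1}, {0: 1, 1: 1}),
]
derivs = mass_action_rhs(reactions, [1.0, 3.0, 4.0])
```

With concentrations A = 1, B = 3, C = 4, the forward rate is 6 and the reverse rate is 2, so A and B each change at -4 and C at +4. A function of this shape is what an ODE integrator would call at every step.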

10.
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle–particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.

11.
The computing power of graphics processing units (GPU) has increased rapidly, and there has been extensive research on general-purpose computing on GPU (GPGPU) for cryptographic algorithms such as RSA, Elliptic Curve Cryptosystem (ECC), NTRU, and Advanced Encryption Standard. With the rise of GPGPU, commodity computers have become complex heterogeneous GPU+CPU systems. This new architecture poses new challenges and opportunities in high-performance computing. In this paper, we present high-speed parallel implementations of the rainbow method based on perfect tables, which is known as the most efficient time-memory trade-off, in the heterogeneous GPU+CPU system. We give a complete analysis of the effect of multiple checkpoints on reducing the cost of false alarms and take advantage of it for load balancing between GPU and CPU. For GTX460, our implementation is about 1.86 and 3.25 times faster than other GPU-accelerated implementations, RainbowCrack and Cryptohaze, respectively, and for GTX580, 1.53 and 2.40 times faster. Copyright © 2014 John Wiley & Sons, Ltd.

12.
When a tsunami occurs, predicting its arrival time is critical for evacuating people from coastal areas. Many tsunami-related problems remain to be solved in order to reduce the negative effects of this serious disaster. Numerical modeling of tsunami wave propagation is computationally intensive, and its calculations need to be accelerated by parallel processing. The method of splitting tsunami (MOST) is one of the well-known numerical solvers for tsunami modeling. We have developed a tsunami propagation code based on the MOST algorithm and implemented different parallel optimizations for GPU and FPGA. In our latest study, the best performance was obtained with an OpenCL kernel implementing the tsunami simulation on an AMD Radeon 280X GPU. This paper focuses on the design and evaluation of an FPGA implementation using OpenCL. After several kernel modifications, the performance of the FPGA design generated automatically by the Altera offline compiler follows that of the GPU.

13.
In this work, a parallel graphics processing unit (GPU) version of the Monte Carlo stochastic grid bundling method (SGBM) for pricing multi-dimensional early-exercise options is presented. To extend the method's applicability, the problem dimensions and the number of bundles will be increased drastically. This makes SGBM very expensive in terms of computational costs on conventional hardware systems based on central processing units. A parallelization strategy of the method is developed, and the general-purpose computing on graphics processing units paradigm is used to reduce the execution time. An improved technique for bundling asset paths, which is more efficient on parallel hardware, is introduced. Thanks to the performance of the GPU version of SGBM, a general approach for computing the early-exercise policy is proposed. Comparisons between sequential and GPU parallel versions are presented.

14.
The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz's, Cimmino's, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are performed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method.

15.
The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to keep both the current pace of performance improvements and the power consumption under a strict power budget, HPC systems must be more energy efficient than ever. As a response to this need, energy-efficient and low-power processors began to make their way into the market. In this paper we employ a novel low-power processor, the MPPA-256 manycore, to perform seismic wave propagation simulations. It has 256 cores connected by a NoC, no cache-coherence and only a limited amount of on-chip memory. We describe how its particular architectural characteristics influenced our solution for an energy-efficient implementation. As a counterpoint to the low-power MPPA-256 architecture, we employ Xeon Phi, a performance-centric manycore. Although both processors share some architectural similarities, the challenges to implement an efficient seismic wave propagation kernel on these platforms are very different. In this work we compare the performance and energy efficiency of our implementations for these processors to proven and optimized solutions for other hardware platforms such as general-purpose processors and a GPU. Our experimental results show that MPPA-256 has the best energy efficiency, consuming at least 77% less energy than the other evaluated platforms, whereas the performance of our solution for the Xeon Phi is on par with a state-of-the-art solution for GPUs.

16.
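Prestack reverse time migration (RTM) is currently the most accurate seismic data imaging method in seismic exploration. It extrapolates wavefields with the two-way acoustic wave equation and can image media with complex structures accurately. This paper applies the cross-correlation imaging condition, correlating the source wavefield with the receiver wavefield at the same time instants. To address the heavy computational load of RTM, graphics processing units (GPUs) are introduced into the RTM computation: the many-core architecture of the GPU is fully exploited, and a CUDA-based parallel algorithm replaces conventional serial CPU computation to accelerate the most time-consuming steps of the algorithm, wavefield extrapolation and correlation imaging. Tests on complex models show that, while preserving RTM imaging accuracy, the GPU-parallel algorithm greatly improves computational efficiency compared with conventional CPU computation, enabling efficient, high-accuracy imaging of complex media with GPU-accelerated prestack RTM.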
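
The cross-correlation imaging condition named in this abstract can be written as a sum over time of the product of the two wavefields at each grid point. The sketch below shows only that accumulation step on made-up numbers. The function name `crosscorrelation_image` and the nested-list layout are assumptions, and the wavefield extrapolation that would produce the snapshots is not shown.

```python
def crosscorrelation_image(source_wavefield, receiver_wavefield):
    """Zero-lag cross-correlation imaging condition.

    Both inputs are lists of time snapshots taken at the same time levels;
    each snapshot is a 2D grid (list of rows). image[iz][ix] accumulates
    S(iz, ix, t) * R(iz, ix, t) over all t.
    """
    nz = len(source_wavefield[0])
    nx = len(source_wavefield[0][0])
    image = [[0.0] * nx for _ in range(nz)]
    for s_snap, r_snap in zip(source_wavefield, receiver_wavefield):
        # Every grid point is independent, so on a GPU each (iz, ix)
        # could be handled by its own thread.
        for iz in range(nz):
            for ix in range(nx):
                image[iz][ix] += s_snap[iz][ix] * r_snap[iz][ix]
    return image


# Two time levels on a 2x2 grid (made-up numbers).
S = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [3.0, 0.0]]]
R = [[[2.0, 5.0], [0.0, 1.0]], [[4.0, 1.0], [1.0, 7.0]]]
image = crosscorrelation_image(S, R)
```

In a real RTM code the receiver wavefield is propagated backward in time, so the source snapshots must be stored or recomputed to pair them at matching time levels.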

17.
Molecular dynamics (MD) is an important research tool extensively applied in materials science. Running MD on a graphics processing unit (GPU) is an attractive new approach for accelerating MD simulations. Currently, GPU implementations of MD usually run in a one-host-process-one-GPU (OHPOG) scheme. This scheme may pose a limitation on the system size that an implementation can handle due to the small device memory relative to the host memory. In this paper, we present a one-host-process-multiple-GPU (OHPMG) implementation of MD with embedded-atom-model or semi-empirical tight-binding many-body potentials. Because more device memory is available in an OHPMG process, the system size that can be handled is increased to a few million or more atoms. In comparison with the serial CPU implementation, in which Newton's third law is applied to improve the computational efficiency, our OHPMG implementation has achieved a 28.9x–86.0x speedup in double precision, depending on the system size, the cut-off ranges and the number of GPUs. The implementation can also handle a group of small simulation boxes in one run by combining the small boxes into a large box. This approach greatly improves the GPU computing efficiency when a large number of MD simulations for small boxes are needed for statistical purposes.

18.
We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed, the traditional approach based on Gaussian elimination and the Gauss–Jordan elimination alternative, and several high performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). Numerical experiments show the efficiency attained by the proposed implementations and how the computation of large-scale inverses, which only a few years ago would have required a distributed-memory cluster, takes only a few minutes on a hybrid architecture formed by a multicore CPU and a GPU.
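
To make the Gauss–Jordan alternative concrete, here is a minimal sequential sketch with partial pivoting. It is a textbook version for illustration only, not one of the high performance implementations evaluated in the paper, and the function name `gauss_jordan_inverse` and the singularity threshold are assumptions.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate the column from every other row; these row updates are
        # independent of one another.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in m]


A = [[4.0, 7.0], [2.0, 6.0]]
A_inv = gauss_jordan_inverse(A)
```

Unlike the Gaussian elimination route, which factorizes first and then solves triangular systems, each Gauss–Jordan step updates all other rows at once, giving uniform, independent work per row.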

19.
This paper presents a deep and extensive performance analysis of the particle filter (PF) algorithm for a very compute-intensive 3D multi-view visual tracking problem. We compare different implementations and parameter settings of the PF algorithm on a CPU platform that takes advantage of the multithreading capabilities of modern processors and on a graphics processing unit (GPU) platform that uses the NVIDIA CUDA computing environment as its development framework. We extend our experimental study to each individual stage of the PF algorithm and evaluate the quality versus performance trade-off among different ways to design these stages. We have observed that the GPU platform performs better than the multithreaded CPU platform when handling a large number of particles, but we also demonstrate that hybrid CPU/GPU implementations can run almost as fast as GPU-only solutions.

20.
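GRAPES (Global and Regional Assimilation and Prediction System) is China's new-generation numerical weather prediction system, developed independently by the Chinese Academy of Meteorological Sciences. Because it processes very large volumes of data and has strict real-time requirements, it has long been an active research topic in parallel computing. This work is the first to apply general-purpose GPU (graphics processing unit) computing and CUDA to parallelize the RRTM (Rapid Radiative Transfer Model) longwave radiation module among the physical processes of the GRAPES_Meso model. Based on performance profiling and on the characteristics of the GPU architecture, the program was optimized in terms of code, memory usage, and compiler options, achieving a 14× speedup. Tests show that the RRTM longwave radiation module computes correctly, stably, and effectively in parallel on the GPU, laying a foundation for the future parallelization of the GRAPES system on GPU platforms.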
