首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 546 毫秒
1.
The simulation of electromagnetic (EM) waves propagation in the dielectric media is presented using Compute Unified Device Architecture (CUDA) implementation of finite‐difference time‐domain (FDTD) method on graphic processing unit (GPU). The FDTD formulation in the dielectric media is derived in detail, and GPU‐accelerated FDTD method based on CUDA programming model is described in the flowchart. The accuracy and speedup of the presented CUDA‐implemented FDTD method are validated by the numerical simulation of the EM waves propagating into the lossless and lossy dielectric media from the free space on GPU, by comparison with the original FDTD method on CPU. The comparison of the numerical results of CUDA‐implemented FDTD method on GPU and original FDTD method on CPU demonstrates that the CUDA‐implemented FDTD method on GPU can obtain better application speedup performance with reasonable accuracy. © 2016 Wiley Periodicals, Inc. Int J RF and Microwave CAE 26:512–518, 2016.  相似文献   

2.
The one‐step leapfrog alternative‐direction‐implicit finite‐difference time‐domain (ADI‐FDTD), free from the Courant‐Friedrichs‐Lewy (CFL) stability condition and sub‐step computations, is efficient when dealing with fine grid problems. However, solution of the numerous tridiagonal systems still imposes a great computational burden and makes the method hard to execute in parallel. In this paper, we proposed an efficient graphic processing unit (GPU)‐based parallel implementation of the one‐step leapfrog ADI‐FDTD for the far‐field EM scattering simulation of objects, in which we present and analyze the manners of calculation area division and thread allocation and a data layout transformation of z components is proposed to achieve better memory access mode, which is a key factor affecting GPU execution efficiency. The simulation experiment is carried out to verify the accuracy and efficiency of the GPU‐based implementation. The simulation results show that there is a good agreement between the proposed one‐step leapfrog ADI‐FDTD method and Yee's FDTD in solving the far‐field scattering problem and huge benefits in performance were encountered when the method was accelerated using GPU technology.  相似文献   

3.
为解决时域有限差分(FDTD)算法应用于电大尺寸目标仿真的巨大耗时问题,应用FDTD算法的并行特性和通用图形处理器(GPGPU)技术,实现了一种基于计算统一设备架构(CUDA)的三维FDTD并行计算方法,采用了时域卷积完全匹配层(CPML)吸收边界条件模拟开域空间,对不同网格数目标仿真计算。进一步结合FDTD算法和CUDA的特点进行了优化,当计算空间元胞数在十万数量级及以上时,优化前后GPU运算相对于同时期的CPU分别可获得10和25倍以上的加速,结果表明该方法较适合用于实际电磁问题的仿真。  相似文献   

4.
本文研究六边形区域上快速傅里叶变换(FFTH)的CUDA-MPI算法及其实现.首先,我们通过充分利用CUDA的层次化并行机制及其库函数,设计了FFTH的高效率的CUDA算法.对于规模为3×2048~2的双精度复数类型数据,我们设计的CUDA程序与CPU串行程序相比可以达到12倍加速比,如果不计内存和显存之间的数据传输,则加速比可达40倍;其计算效率与CUFFT所提供的二维方形区域FFT程序的效率基本一致.在此基础上,我们通过研究GPU上分布式并行数据的转置与排序算法,优化设计了FFTH的CUDA-MPI算法.在3×8192~2的数据规模、10节点×6GPU的计算环境下,我们的CUDA-MPI程序与CPU串行程序相比达到了55倍的加速;其效率比MPI并行版FFTW以及基于CUFFT本地计算和FFTW并行转置的方形区域并行FFT的效率都要高出很多.FFTH的CUDA-MPI算法研究和测试为大规模CPU+GPU异构计算机系统的可扩展新型算法的探索提供了参考.  相似文献   

5.
CUDA并行技术与数字图像几何变换   总被引:2,自引:0,他引:2  
CUDA是GPU通过并发执行多个线程以实现大规模快速并行计算能力的技术,它能使对GPU编程变得更容易。介绍了CUDA基本特性及主要编程模型,在此基础上,提出并实现了基于NVIDIA CUDA技术的图像快速几何变换。采用位置偏移增量代替原变换算法中大量乘法运算,并把CUDA技术的快速并行计算能力应用到数字图像几何变换中,解决了基于CPU的传统图像几何变换运算效率低下的问题。实验结果证明使用CUDA技术,随着处理图像尺寸的增加,对数字图像几何变换处理效率最高能够提高到近100倍。  相似文献   

6.
The hybrid implicit‐explicit (HIE) finite‐difference time‐domain (FDTD) method with the convolutional perfectly matched layer (CPML) is extended to a full three‐dimensional scheme in this article. To demonstrate the application of the CPML better, the entire derivation process is presented, in which the fine scale structure is changed from y‐direction to z‐direction of the propagation innovatively. The numerical examples are adopted to verify the efficiency and accuracy of the proposed method. Numerical results show that the HIE‐FDTD with CPML truncation has the similar relative reflection error with the FDTD with CPML method, but it is much better than the methods with Mur absorbing boundary. Although Courant‐Friedrich‐Levy number climbs to 8, the maximum relative error of the proposed HIE‐CPML remains more below than ?71 dB, and CPU time is nearly 72.1% less than the FDTD‐CPML. As an example, a low‐pass filter is simulated by using the FDTD‐CPML and HIE‐CPML methods. The curves obtained are highly fitted between two methods; the maximum errors are lower than ?79 dB. Furthermore, the CPU time saved much more, accounting for only 26.8% of the FDTD‐CPML method while the same example simulated.  相似文献   

7.
To verify the effect of artificial anisotropy parameters in one‐step leapfrog hybrid implicit‐explicit finite‐difference time‐domain (FDTD) method, we calculated several microwave components with different characteristics. Introduced auxiliary field variable can reduce the program difficulty and improve the computational efficiency without additional computational time and memory cost. Analyses of the numerical results are proved that the calculation time is reduced to about one‐sixth compared to the traditional FDTD method for the same example simulated. The memory cost and relative error are remained at a good level. The numerical experiments for microwave circuit and antenna have been well demonstrated the method available.  相似文献   

8.
In this article, a hybrid algorithm based on traditional finite‐difference time‐domain (FDTD) and weakly conditionally stable finite‐difference time‐domain (WCS‐FDTD) algorithm is proposed. In this algorithm, the calculation domain is divided into fine‐grid region and coarse‐grid region. The traditional FDTD method is used to calculate the field value in the coarse‐grid region, while the WCS‐FDTD method is used in the fine‐grid region. The spatial interpolation scheme is applied to the interface of the coarse grid region and fine grid region to insure the stability and precision of the presented hybrid algorithm. As a result, a relatively large time step size, which is only determined by the spatial cell sizes in the coarse grid region, is applied to the entire calculation domain. This scheme yields a significant reduction both of computation time and memory requirement in comparison with the conventional FDTD method and WCS‐FDTD method, which are validated by using numerical results.  相似文献   

9.
This article presents a new design of multiband planar inverted‐F antenna with slotted ground plane and S‐etched slot on the radiation patch. The proposed antenna is optimized using an efficient global hybrid optimization method combining bacterial swarm optimization and Nelder‐Mead (BSO‐NM) algorithm to cover a very important six service bands including GSM900, GPS1575, DCS1800, PCS1900, ISM2450, and 4G5000 MHz with enhanced bandwidths. The BSO‐NM algorithm in Matlab code is linked to the CST Microwave studio software to simulate the antenna. To validate the results, the antenna is analyzed using the finite difference time domain (FDTD) method. A good agreement is achieved between the results of EM simulation and that produced from the FDTD method. © 2012 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2013.  相似文献   

10.
Sorting is a very important task in computer science and becomes a critical operation for programs making heavy use of sorting algorithms. General‐purpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU‐based implementations of the quicksort were presented in literature: the GPU‐quicksort, a compute‐unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA‐quicksort an iterative GPU‐based implementation of the sorting algorithm. CUDA‐quicksort has been designed starting from GPU‐quicksort. Unlike GPU‐quicksort, it uses atomic primitives to perform inter‐block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA‐quicksort is up to four times faster than GPU‐quicksort and up to three times faster than CDP‐quicksort. An in‐depth analysis of the performance between CUDA‐quicksort and GPU‐quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using the CUDA dynamic parallelism, we implemented a recursive version of the CUDA‐quicksort. Experimental results show that CUDA‐quicksort is faster than the CDP‐quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

11.
针对目前基于普通DSP的FIR算法速度低、扩展性差的缺点,提出并实现基于CUDA平台实现的FIR滤波算法。由于在CUDA中程序可以直接操作数据而无需借助于图形系统的API,使开发者能够在GPU 强大计算能力的基础上建立起一种效率更高的密集数据计算解决方案。该算法将CUDA用于FIR滤波器输入输出关系计算,采用矩阵乘法的并行运算技术,在GPU上建立并行滤波模型,并对算法进行了优化。实验结果表明,在Tesla C1060平台上,和传统的基于DSP的FIR滤波算法计算速度相比,基于CUDA平台计算FIR滤波算法时,其加速比可接近30,解决了传统基于DSP计算FIR滤波算法速度较慢、扩展性差的问题。  相似文献   

12.
由于Madab软件的网络通信局限,使得在并行时域有限差分(FDTD)计算仿真中,难以实现子域间的消息发送与接收操作.针对这个问题,提出一种新的基于磁盘-内存互逆映射的解决方法,在简化并行FDTD算法实现的同时,显著提高了算法执行性能.作为算法实现的应用,对光子晶体光波导的电磁耦合效应进行了数值仿真研究,结果证实:波导耦合区域内不同半径比介质柱所导致的结构变化将造成耦合长度的改变,且其耦合关系曲线具有平稳区与迅变区两类不同特性的变化范围区间.  相似文献   

13.
CUDA并行计算技术在情报信息研判中的应用   总被引:3,自引:0,他引:3  
文章在研究公安情报信息研判技术的基础上,提出了一种基于CUDA并行计算技术的方法,实现对公安情报信息中文本信息快速分类的方法,实现将CUDA技术的快速计算能力应用到公安情报研判工作中。该文从介绍CUDA技术的概况出发,阐述了基于CUDA并行计算技术的文本分类方法,以及该方法的详细实现过程,解决了高效处理海量文本信息的问题。实验结果证明,CUDA并行计算技术在公安情报信息研判工作中卓有成效。  相似文献   

14.
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data‐parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth‐first frustum traversal through a bounding volume hierarchy for the scene, and localized ray‐primitive intersections. We utilize the well known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth‐first BVH traversal is based on parallel frustum‐bounding box intersection tests and parallel scan per each BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray‐primitive intersection routines our pipeline is ~3x faster than an optimized standard depth first ray tracing implemented in one kernel.  相似文献   

15.
We present an implementation approach for Marching Cubes (MC) on graphics hardware for OpenGL 2.0 or comparable graphics APIs. It currently outperforms all other known graphics processing units (GPU)‐based iso‐surface extraction algorithms in direct rendering for sparse or large volumes, even those using the recently introduced geometry shader (GS) capabilites. To achieve this, we outfit the Histogram Pyramid (HP) algorithm, previously only used in GPU data compaction, with the capability for arbitrary data expansion. After reformulation of MC as a data compaction and expansion process, the HP algorithm becomes the core of a highly efficient and interactive MC implementation. For graphics hardware lacking GSs, such as mobile GPUs, the concept of HP data expansion is easily generalized, opening new application domains in mobile visual computing. Further, to serve recent developments, we present how the HP can be implemented in the parallel programming language CUDA (compute unified device architecture), by using a novel 1D chunk/layer construction.  相似文献   

16.
在计算机断层扫描(CT)图像重建领域,当投影数据不完备或者含有噪声时,相对于滤波反投影(FBP)算法,联合代数重建方法(SART)能重建出质量更高、更符合临床诊断要求的图像。但SART方法非常耗时,而算法的并行实现是解决这一问题的有效途径之一。提出一种基于nVIDIA通用设备计算架构(CUDA)实现的SART并行运算方法。实验结果表明,该方法在不牺牲重建图像质量的基础上,重建时间大为缩减,更有利于临床应用。  相似文献   

17.
韦博文  李涛  李广宇  汪致恒  何沐  师悦龄  刘路遥  张瑞 《计算机科学》2016,43(Z11):167-169, 196
针对海量遥感数据应用中日益显著的处理效率低下和计算瓶颈问题,基于通用计算机图形处理单元的编程开发使用OpenCL并行处理技术对遥感数据处理及其过程进行加速,旨在为遥感影像大数据处理提供一条更为高效的途径。在不同显卡平台上对影像畸变纠正实施并行处理,结果表明,OpenCL技术在提高影像畸变纠正的速度方面作用显著,可取得29.1倍的最高加速效果;与CUDA并行处理技术的交叉验证进一步凸显了OpenCL技术在异构平台上实施并行处理时所具有的通用性的优势。  相似文献   

18.
基于GPU的现代并行优化算法   总被引:2,自引:2,他引:0  
针对现代优化算法在处理相对复杂问题中所面临的求解时间复杂度较高的问题,引入基于GPU的并行处理解决方法。首先从宏观角度阐释了基于计算统一设备架构CUDA的并行编程模型,然后在GPU环境下给出了基于CUDA架构的5种典型现代优化算法(模拟退火算法、禁忌搜索算法、遗传算法、粒子群算法以及人工神经网络)的并行实现过程。通过对比分析在不同环境下测试的实验案例统计结果,指出基于GPU的单指令多线程并行优化策略的优势及其未来发展趋势。  相似文献   

19.
基于CUDA的汇流分析并行算法的研究与实现*   总被引:2,自引:0,他引:2  
针对基于数字高程模型(DEM)生成流域等流时线的快速运算问题,提出了一种基于统一设备计算架构(CUDA)平台同时可发挥图形处理器(GPU)并行运算特性的汇流分析的快速并行算法。采用改进后的归并排序算法进行数据排序及新的内存分配策略和改进的并行算法进行汇流分析。用该并行算法和CPU上的串行算法, 对生成基于DEM的等流时线运算时间和矩阵乘法运算时间进行分析验证。实验结果表明,基于CUDA的汇流分析并行算法能提高系统的计算效率,具有较好的效果。  相似文献   

20.
We present a flexible and highly efficient hardware‐assisted volume renderer grounded on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is exclusively based on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the volume ray integral is evaluated in a fragment shader; hence, volume rendering is performed entirely on the GPU within a single pass through the pipeline. We apply a CUDA‐based visibility ordering achieving rendering and sorting performance of over 6 M Tet/s for unstructured datasets. Furthermore, as each tetrahedron is processed independently, we employ a data‐parallel solution which is neither bound by GPU memory size nor does it rely on auxiliary volume information. In addition, iso‐surfaces can be readily extracted during the rendering process, and time‐varying data are handled without extra burden.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号