
1.

谭彩凤, 马安国, 邢座程. 《Computer Engineering & Science》2009, 31(Z1)

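CUDA makes general-purpose computing on the GPU convenient for programmers, but it does not provide an application interface for random number generation. This paper therefore proposes and implements an algorithm that generates uniform random numbers in parallel on the CUDA development platform, and tests show that the algorithm is feasible. On this basis, the basic genetic algorithm is optimized and all of its operations are implemented in parallel on the GPU, which improves its speed and accuracy. The paper analyzes how population size and the number of generations affect the algorithm's speedup and accuracy, and compares it with the MATLAB toolbox. Experiments show that, compared with the MATLAB genetic algorithm toolbox, the genetic algorithm implemented on the CUDA platform achieves higher performance and better accuracy.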
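The core problem in item 1 is giving every GPU thread its own uniform random stream without any inter-thread communication. The abstract does not describe the paper's generator, so the sketch below swaps in a well-known approach as an assumption: each simulated thread derives a private SplitMix64 state from (seed, thread index) and emits floats in [0, 1). Threads are emulated sequentially in Python; `thread_uniforms` and `launch` are hypothetical names, and the per-thread mixing constant is arbitrary.

```python
# Hypothetical helper names; a CPU emulation of one-RNG-stream-per-thread.
MASK64 = (1 << 64) - 1

def splitmix64(x):
    """One SplitMix64 step: returns (new_state, 64-bit output)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    z = x
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return x, z ^ (z >> 31)

def thread_uniforms(seed, thread_id, count):
    """Uniform floats in [0, 1) for one emulated thread.

    The state depends only on (seed, thread_id), so no thread needs another
    thread's state. Streams are offsets into one 2**64-period sequence:
    overlap is unlikely for short streams but not ruled out.
    """
    state = (seed ^ (thread_id * 0xD1B54A32D192ED03)) & MASK64  # arbitrary odd constant
    out = []
    for _ in range(count):
        state, bits = splitmix64(state)
        out.append((bits >> 11) * (1.0 / (1 << 53)))  # top 53 bits -> [0, 1)
    return out

def launch(seed, num_threads, per_thread):
    """Emulate a kernel launch: each 'thread' fills its own row."""
    return [thread_uniforms(seed, tid, per_thread) for tid in range(num_threads)]
```

Because each row is a pure function of `(seed, thread_id)`, the same numbers are produced regardless of thread scheduling order, which is what makes results reproducible across GPU launches.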

2.

Shuffle up and deal: accelerating GPGPU Monte Carlo simulation with application to option pricing

Aurelien Cassagnes, Yu Chen, Hirotada Ohashi. 《Concurrency and Computation》2015, 27(17): 5203-5213

In this paper, we demonstrate a speedup opportunity for Monte Carlo simulation on graphics processing unit (GPU) architectures, with a financial application. We reduce the volume of random numbers that are actually generated by replacing the generation phase with some shuffling based on the built-in shuffle instructions of the Compute Unified Device Architecture (CUDA). We study various shuffling patterns and durations and select the best among them with regard to induced correlation, using the Granger causality test. We then study the accuracy and variance of the results achieved by our shuffled general-purpose GPU (GPGPU) Monte Carlo method, which halves computation time while the error remains marginal. Copyright © 2015 John Wiley & Sons, Ltd.
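The idea in item 2 is to generate fewer random numbers and let the lanes of a warp exchange the ones they already hold. The sketch below assumes a 32-lane warp and emulates a rotation shuffle (lane i reads the value of lane (i + delta) mod 32, the pattern one could express with CUDA's `__shfl_sync`) in plain Python. The arithmetic-average Asian call, the parameter values and the reuse schedule are illustrative assumptions, not the shuffle patterns the paper selects with the Granger causality test; `shfl_rotate` and `asian_call_shuffled` are hypothetical names.

```python
# Hypothetical helper names; illustration of shuffle-based reuse of normals.
import math
import random

WARP = 32

def shfl_rotate(vals, delta):
    """Emulate a warp rotation: lane i receives the value held by lane (i + delta) % 32."""
    return [vals[(i + delta) % WARP] for i in range(WARP)]

def asian_call_shuffled(s0, strike, r, sigma, t, steps, reuse, warps, seed=0):
    """Arithmetic-average Asian call priced with shuffled standard normals.

    Fresh normals are drawn only every `reuse` steps; in between, each lane
    takes another lane's previous normal via shfl_rotate. Returns
    (price estimate, number of normals actually generated).
    """
    rng = random.Random(seed)
    dt = t / steps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    generated = 0
    for _ in range(warps):
        s = [s0] * WARP
        running = [0.0] * WARP
        z = []
        for step in range(steps):
            if step % reuse == 0:
                z = [rng.gauss(0.0, 1.0) for _ in range(WARP)]
                generated += WARP
            else:
                z = shfl_rotate(z, step % reuse)  # reuse instead of regenerating
            for lane in range(WARP):
                s[lane] *= math.exp(drift + vol * z[lane])  # GBM step
                running[lane] += s[lane]
        for lane in range(WARP):
            payoff_sum += max(running[lane] / steps - strike, 0.0)
    price = math.exp(-r * t) * payoff_sum / (warps * WARP)
    return price, generated
```

With `steps=8` and `reuse=4`, the generator is called on only two of the eight steps. Whether the cross-lane correlation this induces is acceptable is exactly what the paper tests; this sketch does not check it.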

3.

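Improvement of the MDx Differential Attack Algorithm and Its Efficient Implementation on GPGPU (Total citations: 1; self-citations: 0; other citations: 1)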

周林, 韩文报, 祝卫华, 王政. 《Chinese Journal of Computers》2010, 33(7)

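Hash functions are widely used in commerce, security and other fields, and the MDx family of hash algorithms is the most widely used among them. Attacks on MDx hash algorithms are therefore significant both in theory and in practice. Since Professor Wang Xiaoyun proposed the differential attack algorithm and broke MDx algorithms such as MD5 and MD4, research on this attack has received growing attention. Taking the differential attack on MD5 as an example, this paper improves Klima's MD5 tunneling differential attack, analyzes the feasibility and technical requirements of implementing it on GPGPU, and develops it in CUDA under Visual Studio 6.0. Running on a GeForce 9800 GX2, the CUDA program finds an MD5 collision pair every 1.35 s on average. Compared with an implementation on a PC with a quad-core Core 2 Quad Q9000 (2.0 GHz), the GeForce 9800 GX2 implementation achieves 11.5 times the performance-to-price ratio.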

4.

Performance evaluation of CUDA programming for 5-axis machining multi-scale simulation

《Computers in Industry》2015

5-axis milling simulations in CAM software are mainly used to detect collisions between the tool and the part. They are very limited for investigating surface topography in order to validate machining strategies and machining parameters such as chordal deviation, scallop height and tool feed. Z-buffer or N-buffer machining simulations are more precise but require long computation times, especially when realistic cutting tool models that include cutting-edge geometry are used. The aim of this paper is therefore to evaluate the NVIDIA CUDA architecture for speeding up Z-buffer or N-buffer machining simulations. Several parallel computing strategies are investigated and compared with single-threaded and multi-threaded CPU implementations, relative to the complexity of the simulation. Simulations are conducted on two configurations, with NVIDIA Quadro 4000 and GeForce GTX 560 graphics cards.
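The Z-buffer approach in item 4 stores one height per grid cell and lowers it wherever the tool passes. Every cell is updated independently, so one GPU thread per cell is the natural mapping. The sketch below assumes a ball-end tool, tool positions given as tip (lowest point) coordinates, and a serial Python loop standing in for the per-cell threads; it ignores the cutting-edge geometry the paper explicitly models, and `ball_end_height` and `mill` are hypothetical names.

```python
# Hypothetical helper names; simplified Z-buffer milling with a ball-end tool.
import math

def ball_end_height(dx, dy, tip_z, radius):
    """Height of the ball-end tool's lower surface at offset (dx, dy) from its axis,
    or None if the point lies outside the tool footprint."""
    d2 = dx * dx + dy * dy
    if d2 > radius * radius:
        return None
    # Sphere centre sits at tip_z + radius; take the lower hemisphere surface.
    return tip_z + radius - math.sqrt(radius * radius - d2)

def mill(zbuf, cell, tool_path, radius):
    """Lower a Z-buffer (list of rows of heights) along a list of (x, y, tip_z) positions.

    Each (i, j) update reads and writes only its own cell, so the two inner
    loops could become one GPU thread per cell.
    """
    rows, cols = len(zbuf), len(zbuf[0])
    for x, y, tip_z in tool_path:
        for i in range(rows):
            for j in range(cols):
                h = ball_end_height(j * cell - x, i * cell - y, tip_z, radius)
                if h is not None and h < zbuf[i][j]:
                    zbuf[i][j] = h
    return zbuf
```

For a single plunge of a radius-3 tool to height 2 at (5, 5) on an 11 × 11 grid initialized to 10, the cell under the axis drops to 2.0, the cell 3 units away on the footprint edge drops to 5.0, and the corner cells stay at 10.0.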

5.

Efficient CUDA Implementation Techniques for the RSA Algorithm (Total citations: 1; self-citations: 1; other citations: 0)

孙迎红, 童元满, 王志英. 《Computer Engineering and Applications》2011, 47(2): 84-87

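As a new computing architecture that supports general-purpose computing on GPUs, CUDA (Compute Unified Device Architecture) has been widely applied to large-scale data-parallel computing. RSA is a compute-intensive public-key cryptographic algorithm. This paper presents an efficient CUDA-based parallel implementation of RSA whose key idea is to launch a large number of independent, concurrent Montgomery modular multiplication threads, and it describes the concrete thread organization, data storage structures and shared-memory-based performance optimizations. Using this CUDA implementation method, the performance and throughput of RSA were measured on a GPU. Experimental results show that the CUDA implementation achieves a speedup of more than 40x over a general-purpose CPU implementation of RSA.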
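Item 5 builds on Montgomery modular multiplication, which replaces division by the modulus with masks and shifts by a power of two. The sketch below shows the textbook REDC form on Python integers; it is not the paper's multi-word GPU kernel, and its thread organization and shared-memory layout are not modeled. In a CUDA setting, many such independent multiplications (for example, one per message) would run as concurrent threads. `montgomery_setup`, `redc` and `mont_modexp` are hypothetical names; `pow(n, -1, r)` needs Python 3.8 or later.

```python
# Hypothetical helper names; textbook Montgomery (REDC) modular exponentiation.

def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                 # multiplying by R^2 then reducing enters Montgomery form
    return r, n_prime, r2

def redc(t, n, k, n_prime):
    """Montgomery reduction: returns t * R^-1 mod n, valid for 0 <= t < n * R."""
    r_mask = (1 << k) - 1
    m = ((t & r_mask) * n_prime) & r_mask   # chosen so that t + m*n ≡ 0 (mod R)
    u = (t + m * n) >> k                    # exact division by R; u < 2n
    return u - n if u >= n else u

def mont_modexp(base, exp, n, k):
    """Left-to-right square-and-multiply, with every product reduced by REDC."""
    r, n_prime, r2 = montgomery_setup(n, k)
    x = redc((base % n) * r2, n, k, n_prime)   # base * R mod n
    acc = redc(r2, n, k, n_prime)              # 1 * R mod n
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc, n, k, n_prime)
        if bit == "1":
            acc = redc(acc * x, n, k, n_prime)
    return redc(acc, n, k, n_prime)            # leave Montgomery form
```

With the toy RSA key n = 3233 (61 × 53), e = 17, d = 2753, `mont_modexp(65, 17, 3233, 12)` returns 2790, and `mont_modexp(2790, 2753, 3233, 12)` recovers 65, matching Python's built-in `pow`.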

6.

Design of an Information-Entropy Multi-Population Genetic Algorithm on the CUDA Platform

李正夫, 王希诚, 李克秋, 姚翔, 董悦丽. 《Computer Engineering and Applications》2016, 52(1): 12-16

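To further improve the computational efficiency of the information-entropy multi-population genetic algorithm and shorten its computation time, a CUDA-based version of the algorithm is proposed. After analyzing which parts of the original algorithm can be parallelized, the algorithm is restructured for GPU acceleration on the CUDA platform, with the genetic operators, penalty function, space contraction factor and other components computed in parallel, which effectively improves efficiency. Numerical tests on example problems show that the CUDA parallel algorithm achieves a high speedup over the original algorithm while retaining its fast convergence and computational accuracy.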

7.

CUDA-Based Parallelization of the BP Algorithm and Experimental Validation

孙香玉, 冯百明, 杨鹏斐. 《Computer Engineering and Applications》2013, (23): 31-34, 51

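CUDA is a widely used GPU general-purpose computing model, and the BP (backpropagation) algorithm is one of the most widely used neural network models. This paper proposes a method for parallelizing the BP algorithm with CUDA. When a BP neural network is trained with this method, the data are transferred to the GPU before training begins; once training starts, computing the inputs, outputs and errors of the hidden and output layers and updating the weights and biases are all performed on the GPU. Applied to training on handwritten digit images, the method achieves a speedup of 6.12 to 8.17 over training on a quad-core CPU. When the networks trained on the CPU and on the GPU are used to recognize the same test set, the GPU-trained network's recognition rate is 0.05% to 0.22% higher than that of the CPU-trained one.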

8.

A 3D CPML-FDTD Parallel Method Based on the CUDA Architecture

胡媛, 李康, 孔凡敏, 杜刘革. 《Computer Engineering and Applications》2011, 47(25): 220-223

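To address the enormous time cost of applying the finite-difference time-domain (FDTD) algorithm to simulations of electrically large targets, a 3D FDTD parallel computing method based on the Compute Unified Device Architecture (CUDA) is implemented, exploiting the inherent parallelism of FDTD and general-purpose GPU (GPGPU) technology. The convolutional perfectly matched layer (CPML) absorbing boundary condition is used to model open space, and targets with different numbers of grid cells are simulated. The method is further optimized according to the characteristics of FDTD and CUDA. When the computational domain contains on the order of 100,000 cells or more, the GPU computation achieves speedups of more than 10x before optimization and more than 25x after optimization over a contemporary CPU. The results indicate that the method is well suited to simulating practical electromagnetic problems.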
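Item 8's speedup rests on the fact that, within each half time step, every FDTD field update depends only on neighbouring values from the previous half step. The sketch below illustrates this with a 1D Yee scheme in normalized units (Courant number 0.5) and reflecting PEC ends. It is a deliberately reduced stand-in: the paper's 3D grid and CPML absorbing boundary are not implemented, and the Gaussian source parameters and the name `fdtd_1d` are assumptions.

```python
# Hypothetical function name; 1D Yee FDTD without CPML, illustration only.
import math

def fdtd_1d(n_cells, n_steps, src, courant=0.5):
    """Ez at integer grid points, Hy at half points between them.

    Within each loop over i, every update is independent of the others,
    which is what maps onto one GPU thread per cell.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for t in range(n_steps):
        for i in range(n_cells - 1):        # H update from the spatial difference of E
            hy[i] += courant * (ez[i + 1] - ez[i])
        for i in range(1, n_cells - 1):     # E update from H; ez[0], ez[-1] stay 0 (PEC)
            ez[i] += courant * (hy[i] - hy[i - 1])
        ez[src] += math.exp(-((t - 30) / 10.0) ** 2)  # soft Gaussian source
    return ez
```

With a source at the centre of an odd-length grid, the field stays mirror-symmetric about the source, which is a quick sanity check on the update signs.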

9.

A Parallel Algorithm for Receptor Scoring Grid Generation under CUDA (Total citations: 1; self-citations: 0; other citations: 1)

李正夫, 王希诚, 郭权. 《Application Research of Computers》2013, 30(3): 814-816

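Generating scoring grids for molecular docking takes a great deal of computation time. To address this, a parallel scoring grid generation algorithm based on the Compute Unified Device Architecture (CUDA) is proposed. The algorithm processes one of the three dimensions of the traditional 3D computation space in parallel on the graphics processing unit (GPU), which reduces total generation time and improves the efficiency of scoring grid generation. Experimental results show that, by exploiting the GPU's floating-point capability, the proposed parallel algorithm significantly shortens scoring grid generation time compared with the traditional method, offering a new approach to scoring grid generation.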

10.

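Parallel Solution of Large-Scale Dense Linear Systems under the CUDA Architecture (Total citations: 1; self-citations: 0; other citations: 1)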

杨梅, 李志民, 曹大勇. 《Computer Engineering and Applications》2011, 47(32): 27-30

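Building on Gauss-Jordan elimination, this paper presents an improved parallel Gauss-Jordan elimination algorithm adapted to the CUDA architecture. After analyzing the steps of the method and the corresponding constraints of the CUDA architecture, the authors extend CUDA's three-level grid-block-thread organization into a five-level grid-strip-group-block-thread structure from the standpoint of algorithm construction, introduce the concepts of the base row and the global base row, and construct a parallel version of Gauss-Jordan elimination suited to CUDA. The algorithm is compared with serial Gauss-Jordan elimination on large-scale dense linear systems of up to 4,000 unknowns. The experimental results show that it makes full use of the GPU's hardware characteristics and effectively reduces the time needed to solve large-scale dense linear systems.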
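In item 10, the parallelism comes from the structure of each pivot step: choosing and normalizing the pivot row is sequential, but once that is done, every other row's update is independent and can be handed to separate threads or thread blocks. The sketch below is a serial Python version with partial pivoting that marks those two phases. It does not reproduce the paper's grid-strip-group-block-thread hierarchy or its base-row scheme, and `gauss_jordan_solve` is a hypothetical name.

```python
# Hypothetical function name; serial Gauss-Jordan with the parallel phase marked.

def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    a: list of n rows of n numbers; b: list of n numbers. Inputs are copied.
    """
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Sequential phase: choose the pivot row and normalize it.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Parallel phase: each row r != col is updated independently.
        for r in range(n):
            if r != col:
                f = m[r][col]
                if f != 0.0:
                    m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]
```

For the system 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3, the function returns approximately [2, 3, -1].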

11.

A metaheuristic framework for stochastic combinatorial optimization problems based on GPGPU with a case study on the probabilistic traveling salesman problem with deadlines

Dennis Weyland, Roberto Montemanni, Luca Maria Gambardella. 《Journal of Parallel and Distributed Computing》2013

In this work we propose a general metaheuristic framework for solving stochastic combinatorial optimization problems based on general-purpose computing on graphics processing units (GPGPU). This framework is applied to the probabilistic traveling salesman problem with deadlines (PTSPD) as a case study. Computational studies reveal significant improvements over state-of-the-art methods for the PTSPD. Additionally, our results reveal the huge potential of the proposed framework and sampling-based methods for stochastic combinatorial optimization problems.

12.

An improved method for the removal of ring artifacts in synchrotron radiation images by using GPGPU computing with compute unified device architecture

Leqing Zhu, Dadong Wang, Huiyan Wang. 《Concurrency and Computation》2014, 26(18): 2880-2892

Ring artifacts are a common problem in computed tomography, positron emission tomography, magnetic resonance imaging and synchrotron radiation images. Before the images are processed further, for example by segmentation and quantification, these artifacts must be removed or suppressed; otherwise, they may introduce additional errors into the segmentation and subsequent analysis. This paper proposes an improved ring artifact removal method based on the biorthogonal wavelet transform, the one-dimensional fast Fourier transform and Gaussian damping, implemented for general-purpose computing on graphics processing units with the Compute Unified Device Architecture. The experimental results show that the proposed algorithms run several hundred times faster than the previous algorithms on the CPU. This performance improvement makes the algorithms much more practical for processing large volumes of images in real time. Copyright © 2013 John Wiley & Sons, Ltd.

13.

A Parallel Algorithm for Raster River Network Extraction under the Compute Unified Device Architecture

王玉着, 刘修国, 张唯. 《Journal of Computer Applications》2015, 35(4): 960-963

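To address the low efficiency of extracting raster river networks from large-scale, high-resolution digital terrain data, a parallel algorithm based on the Compute Unified Device Architecture (CUDA) is proposed that extracts raster river networks with an inundation (flooding) model. The graphics processing unit (GPU) is used to decompose the flow accumulation computation into independent tasks processed in parallel, and asynchronous data transfer reduces data exchange time, which accelerates river network extraction. Experimental results show that the algorithm is markedly more efficient than the serial river network extraction algorithm: on an NVIDIA GeForce GTX 660, extracting the river network from a 600 MB digital elevation model (DEM) with a grid size of 9784 × 8507 achieves a speedup of 62.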
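Item 13 accelerates flow accumulation by splitting it into independent tasks. The sketch below swaps in the common D8 scheme rather than the paper's inundation (flooding) model: each cell drains to its steepest-descent neighbour, and accumulation is computed by repeated relaxation passes in which each cell's new value depends only on the previous pass, the kind of data-parallel step that can map to one thread per cell. The pass-until-stable loop and the names `d8_receivers` and `flow_accumulation` are assumptions for illustration.

```python
# Hypothetical helper names; D8 flow accumulation by Jacobi-style passes.
import math

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_receivers(dem):
    """Map each cell to its steepest-descent neighbour, or None if no neighbour is lower."""
    rows, cols = len(dem), len(dem[0])
    rec = {}
    for i in range(rows):
        for j in range(cols):
            best, target = 0.0, None
            for di, dj in OFFSETS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    slope = (dem[i][j] - dem[ni][nj]) / math.hypot(di, dj)
                    if slope > best:
                        best, target = slope, (ni, nj)
            rec[(i, j)] = target
    return rec

def flow_accumulation(dem):
    """Repeat acc_new(c) = 1 + sum of acc_old(u) over donors u until nothing changes."""
    rec = d8_receivers(dem)
    rows, cols = len(dem), len(dem[0])
    donors = {c: [] for c in rec}
    for c, t in rec.items():
        if t is not None:
            donors[t].append(c)
    acc = {c: 1 for c in rec}
    while True:
        new = {c: 1 + sum(acc[u] for u in donors[c]) for c in rec}  # independent per cell
        if new == acc:
            return [[acc[(i, j)] for j in range(cols)] for i in range(rows)]
        acc = new
```

Because every receiver is strictly lower than its donor, the drainage graph has no cycles and the passes stabilize after at most the length of the longest flow path plus one. Cells whose accumulation exceeds a chosen threshold would then be marked as channel cells.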

14.

A CUDA-Based Parallel Accelerated Rendering Algorithm (Total citations: 1; self-citations: 1; other citations: 0)

刘镇, 郝冬宁, 梅向东. 《Journal of Image and Graphics》2013, 18(11): 1457-1461

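GPUs can process massive amounts of data quickly and efficiently, and have therefore become a research focus in graphics and image data processing in recent years. Existing GPU rendering suffers from low resource utilization and excessive bandwidth consumption when processing scenes that contain many identical or similar models. To address this, a CUDA-based accelerated rendering method is proposed on top of the existing GPU rendering architecture. The method first builds a model of the existing GPU rendering pipeline and uses it to identify the pipeline's shortcomings, which motivates the use of constant memory. It then analyzes the characteristics of constant memory and its effect on rendering, and introduces a constant-memory-based control method to accelerate rendering, with the whole rendering process controlled by the rendering algorithm. Experimental results show that the method effectively addresses these problems and accelerates rendering.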

15.

A GPGPU-Based Real-Time Rendering Algorithm for Massive Mountain Terrain Data

王春, 马纯永, 陈戈. 《Journal of Computer Applications》2009, 29(8)

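Targeting the characteristics of massive mountain terrain data, this work builds on the GPU-based geometry clipmaps algorithm, applies a simplified workflow combined with GPGPU techniques, and adopts a more reasonable scheme for organizing and exchanging elevation data. By introducing elevation error data, it resolves the cracks between different resolution levels, and it extends the method to use high-resolution remote sensing imagery as terrain texture. The result is real-time visualization of massive terrain data suitable for virtual reality systems.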

16.

A performance study of general-purpose applications on graphics processors using CUDA (Total citations: 1; self-citations: 0; other citations: 1)

Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Kevin Skadron. 《Journal of Parallel and Distributed Computing》2008

Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of general-purpose applications compared to contemporary general-purpose processors (CPUs). This paper uses NVIDIA’s C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU. GPU performance is compared to both single-core and multicore CPU performance, with multicore CPU implementations written using OpenMP. The paper also discusses advantages and inefficiencies of the CUDA programming model and some desirable features that might allow for greater ease of use and also more readily support a larger body of applications.

17.

Parallel Implementation and Analysis of the SKINNY Encryption Algorithm Based on CUDA

解文博, 韦永壮, 刘争红. 《Journal of Computer Applications》2021, 41(4): 1136-1141

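To address the low efficiency of implementing the SKINNY encryption algorithm on the central processing unit (CPU), a fast implementation method based on the graphics processing unit (GPU) is proposed. First, an optimization scheme based on the structural features of SKINNY is proposed that merges five separate step operations into a single combined operation. Then, the characteristics of the algorithm's electronic codebook (ECB) and counter (CTR) modes are analyzed, and parallel design schemes covering parallel granularity, memory allocation and other aspects are given. Experiments...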
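Item 17 analyzes ECB and CTR modes because in both, each block can be processed independently, which is what makes one-thread-per-block GPU parallelism possible. The sketch below shows only the structure of CTR mode. SKINNY itself is not implemented: a SHA-256-based keyed function stands in for the block cipher as an assumption for illustration. It is neither SKINNY nor a vetted encryption scheme. `keystream_block` and `ctr_xcrypt` are hypothetical names.

```python
# Hypothetical helper names; CTR-mode structure with a stand-in keyed function (NOT SKINNY).
import hashlib

BLOCK = 16  # bytes, i.e. a 128-bit block

def keystream_block(key, nonce, index):
    """Stand-in for E_K(nonce || counter); for illustration only, not a real cipher."""
    data = key + nonce + index.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:BLOCK]

def ctr_xcrypt(key, nonce, data, order=None):
    """CTR mode: encryption and decryption are the same XOR with the keystream.

    `order` lets the caller process blocks in any sequence, showing that
    blocks are independent (each could be one GPU thread).
    """
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    out = bytearray(len(data))
    for b in (order if order is not None else range(n_blocks)):
        ks = keystream_block(key, nonce, b)
        start = b * BLOCK
        chunk = data[start:start + BLOCK]
        out[start:start + len(chunk)] = bytes(x ^ y for x, y in zip(chunk, ks))
    return bytes(out)
```

Processing the blocks in reverse order produces the same ciphertext as processing them in order, and applying `ctr_xcrypt` twice with the same key and nonce restores the plaintext.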

18.

Implementation of a CUDA-Based Multi-Scale Retinex Image Enhancement Algorithm

王正宁, 刘昌忠, 陈雷霆, 吴宏刚, 吴敏. 《Journal of Computer Applications》2010, 30(9): 2441-2443

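Multi-scale Retinex image enhancement is an image enhancement algorithm based on color constancy theory. It produces good enhancement results, but its computation time grows significantly as image resolution increases. By analyzing and exploiting the parallel processing capabilities of Compute Unified Device Architecture (CUDA) graphics processing units (GPUs), a CUDA-based parallel multi-scale Retinex image enhancement algorithm is proposed in which the most time-consuming modules, namely multi-scale Gaussian filtering, log-domain differencing and dynamic range compression, are computed in parallel on the GPU. Experimental results show that the proposed algorithm significantly increases computation speed, and the maximum speedup exceeds 100x as image resolution increases.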
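Item 18 names the three costly stages of multi-scale Retinex: Gaussian filtering at several scales, log-domain differencing and dynamic range compression. The sketch below runs them on a small grayscale image in pure Python, computing a weighted sum over scales of log I - log(G_sigma * I). The scales, equal weights, +1 offset before taking logs, clamp-to-edge borders and min-max stretch used as range compression are all assumptions, since the abstract does not give them. Every output pixel in each stage is computed independently, which is what a one-thread-per-pixel CUDA kernel would exploit. `gaussian_kernel`, `blur` and `msr` are hypothetical names.

```python
# Hypothetical helper names; multi-scale Retinex on a list-of-rows grayscale image.
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian taps with radius 3*sigma (at least 1)."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k], radius

def blur(img, sigma):
    """Separable Gaussian blur with clamp-to-edge borders."""
    k, rad = gaussian_kernel(sigma)
    h, w = len(img), len(img[0])
    tmp = [[sum(k[d + rad] * img[y][min(max(x + d, 0), w - 1)] for d in range(-rad, rad + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[d + rad] * tmp[min(max(y + d, 0), h - 1)][x] for d in range(-rad, rad + 1))
             for x in range(w)] for y in range(h)]

def msr(img, sigmas=(1.0, 3.0, 8.0)):
    """Equal-weight average of log(I + 1) - log(G_sigma * I + 1), stretched to [0, 255]."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    weight = 1.0 / len(sigmas)
    for s in sigmas:
        g = blur(img, s)
        for y in range(h):
            for x in range(w):
                # +1 avoids log(0) on black pixels (an assumption of this sketch).
                out[y][x] += weight * (math.log(img[y][x] + 1.0) - math.log(g[y][x] + 1.0))
    lo = min(min(r) for r in out)
    hi = max(max(r) for r in out)
    if hi - lo < 1e-12:  # flat response: nothing to stretch
        return [[0.0] * w for _ in range(h)]
    return [[255.0 * (v - lo) / (hi - lo) for v in r] for r in out]
```

A uniform image yields an all-zero response, while an isolated bright pixel is the only location where the log difference is positive, so it maps to the top of the output range.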

19.

Design and Implementation of a CUDA Algorithm for Effective Resonance Integrals of Lattice Cells

任成磊, 蒲鹏, 韩定定. 《Computer Engineering & Science》2016, 38(2): 224-230

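Safe control of a nuclear reactor requires accurate, real-time calculation of the effective resonance integrals or group cross sections of the core and breeding materials. Because the calculation involves a large number of integrals and very large nuclide cross-section datasets, conventional methods are very time-consuming. Based on the Compute Unified Device Architecture (CUDA) platform and using the computing power of the graphics processing unit (GPU), the entire calculation is decomposed for parallel execution, with many threads computing simultaneously, which greatly increases computation speed and reduces time cost. Experimental results show that the results computed in parallel on the GPU do not differ noticeably from the original data, and that the speedup is significant.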

20.

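Design and Implementation of a CUDA-Based Parallel Particle Swarm Optimization Algorithm (Total citations: 1; self-citations: 0; other citations: 1)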

蔡勇, 李光耀, 王琥. 《Application Research of Computers》2013, 30(8): 2415-2418

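Particle swarm optimization (PSO) takes too long when processing large amounts of data and solving large-scale complex problems. To address this, a fine-grained parallel PSO algorithm running on the graphics card (GPU) is studied. Based on an analysis of the traditional PSO algorithm and widely used GPU parallel computing techniques, a parallel PSO method is designed and implemented. The method runs on the Compute Unified Device Architecture (CUDA) and uses a large number of GPU threads to process each particle's search in parallel, accelerating the convergence of the whole swarm. The program makes full use of CUDA's built-in math libraries, which ensures its stability and ease of programming. Results on several benchmark optimization test functions show that, with the same convergence behavior, the CUDA-based parallel PSO method achieves speedups of up to 90x over a serial CPU-based method.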
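Item 20 assigns one GPU thread to each particle. The sketch below is a serial Python version of the standard global-best PSO update in which the per-particle loop body is the part a thread would run, followed by a global-best reduction after all particles move. The inertia weight 0.7, acceleration coefficients 1.5, search bounds, swarm size and the sphere test function are assumptions for illustration, not the paper's settings; `pso` and `sphere` are hypothetical names.

```python
# Hypothetical function names; serial global-best PSO with the per-particle body marked.
import random

def pso(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim; returns (best position, best value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):  # on a GPU: one thread per particle
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Global-best reduction: a separate step once all particles have moved.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val

def sphere(x):
    """Benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)
```

Keeping the global-best update outside the per-particle loop mirrors a GPU design in which particles move in one kernel and a reduction kernel then picks the best, at the cost of particles seeing a global best that is one iteration old.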