1.

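Wang Cong (王聪), Feng Yanqiu (冯衍秋) 《计算机工程与应用》 (Computer Engineering and Applications) 2011, 47(17): 203-206

Using the parallel processing power of general-purpose GPU (GPGPU) computing, an existing sparse magnetic resonance imaging (Sparse MRI) reconstruction algorithm is parallelized on the NVIDIA CUDA framework so that it can meet the demands of practical use. Sparse MRI reconstruction involves a large number of floating-point operations and is too slow for practical use unless it is accelerated and optimized. Experiments show that an NVIDIA GTX275 GPU cuts the running time from over 4 minutes to about 3.4 seconds, a 76-fold speedup over an Intel Q8200 CPU.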

2.

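A GPGPU thread block compaction scheduling method based on two-stage synchronization

Zhang Jun (张军), He Yanxiang (何炎祥), Shen Fanfan (沈凡凡), Jiang Nan (江南), Li Qing'an (李清安) 《计算机研究与发展》 (Journal of Computer Research and Development) 2016, 53(6): 1173-1185

General-purpose graphics processing units (GPGPUs) are increasingly used in general-purpose computing for high-performance, high-throughput workloads, and their SIMD (single instruction, multiple data) execution model gives them strong parallel computing capability. Mainstream GPGPUs execute computing tasks efficiently by running large numbers of highly parallel threads. When handling control flow with conditional branches, however, a GPGPU processes the different branch paths serially, one after another, which reduces its parallel computing capability. Building on an analysis of earlier thread block compaction scheduling methods that target inefficient branch handling, this paper proposes TSTBC (two-stage synchronization based thread block compaction scheduling). A logic unit that judges whether a thread block is suitable for compaction analyzes the effectiveness of compaction in two stages, further reducing the number of ineffective thread block compactions. Simulation results show that the method improves the effectiveness of thread block compaction, disturbs data locality within thread groups less than comparable methods, and effectively lowers the miss rate of the on-chip L1 data cache; relative to the baseline architecture, system performance improves by 19.27%.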
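The cost of serialized branch paths, and the reason compaction does not always pay off, can be illustrated with a small counting model. This is only a sketch under simplifying assumptions: a warp size of 32, one two-way branch, and idealized compaction that ignores lane positions. The function names are hypothetical, and the model does not reproduce the paper's two-stage TSTBC logic.

```python
WARP_SIZE = 32  # assumed warp width


def passes_without_compaction(taken):
    """Count warp-level passes when each warp serializes its branch paths.

    `taken` holds one boolean per thread in the block (True = branch taken).
    A warp needs one pass per distinct branch direction among its threads.
    """
    passes = 0
    for start in range(0, len(taken), WARP_SIZE):
        warp = taken[start:start + WARP_SIZE]
        passes += len(set(warp))
    return passes


def passes_with_compaction(taken):
    """Count passes after regrouping the block's threads by branch direction.

    Idealized: threads on the same path are packed into dense warps,
    ignoring the lane constraints that real hardware imposes.
    """
    n_taken = sum(taken)
    n_not_taken = len(taken) - n_taken

    def warps_needed(n_threads):
        return -(-n_threads // WARP_SIZE)  # ceiling division

    return warps_needed(n_taken) + warps_needed(n_not_taken)
```

In this model, a 128-thread block whose threads alternate between the two paths needs 8 passes without compaction and 4 with it, while a block whose threads all take the same path needs 4 passes either way. The second case is the kind of compaction that brings no benefit, which is what an effectiveness check aims to avoid.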

3.

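A fast and exact GPGPU-based method for computing determinants of large-integer matrices

《计算机工程》 (Computer Engineering) 2018, (3): 47-54

Most traditional methods for computing the determinant of a numerical matrix are serial and suffer from frequent elementary transformations and slow computation. To address this, a method based on general-purpose graphics processing units (GPGPU) is proposed to compute determinants of large-integer matrices quickly and exactly. In a many-core environment, the method uses GPGPU and modular methods to compute the integer matrix determinant in parallel, which speeds up the computation and avoids floating-point error, and applies the Chinese remainder theorem to obtain the exact result. Experiments show that, compared with common software such as Maple and NTL, the method is faster and uses less memory, addresses memory blow-up during computation, and has a clear advantage for high-order integer matrix determinants.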
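The combination of modular determinants and the Chinese remainder theorem can be sketched serially as follows. This is a minimal illustration, not the paper's GPU implementation: the function names are hypothetical, the per-prime determinants are computed one after another here (they are independent, so they are the natural unit of parallel work), and the caller is assumed to supply distinct primes whose product exceeds twice the absolute value of the determinant. It requires Python 3.8 or later for modular inverses via `pow`.

```python
def det_mod_p(matrix, p):
    """Determinant of an integer matrix modulo a prime p (Gaussian elimination)."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        # Find a row with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)  # modular inverse of the pivot
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t, m1 * m2


def det_exact(matrix, primes):
    """Exact integer determinant reconstructed from residues modulo each prime."""
    residue, modulus = 0, 1
    for p in primes:
        residue, modulus = crt_pair(residue, modulus, det_mod_p(matrix, p), p)
    # Map to the symmetric range so negative determinants are recovered.
    return residue - modulus if residue > modulus // 2 else residue
```

Because every intermediate value stays below the chosen prime, no entry grows during elimination, which is one way to read the abstract's remark about avoiding memory blow-up.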

4.

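GPGPU technology and its application in medical image processing

Ma Qianli (马千里), Qin Chang (秦畅), Bian Chunhua (卞春华) 《现代计算机》 (Modern Computer) 2010, (8): 35-37, 46

This paper introduces the basic principles and characteristics of general-purpose graphics processing unit (GPGPU) technology, analyzes its applications in medical imaging, and optimizes and tests two algorithms commonly used in medical image processing: convolution filtering and anisotropic diffusion filtering. The results show that the technology can greatly increase image processing speed, making complex medical image processing and visualization feasible on ordinary computers.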

5.

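GPGPU-based parallel preprocessing of digital images  Total citations: 2, self-citations: 0, citations by others: 2

Song Xiaoli (宋晓丽), Wang Qing (王庆) 《计算机测量与控制》 (Computer Measurement & Control) 2009, 17(6): 1169-1171

The paper first briefly introduces the background, characteristics, and memory model of CUDA (Compute Unified Device Architecture). Using general-purpose GPU (GPGPU) computing and CUDA, it then implements parallel image histogram equalization and thin cloud removal. Compared with traditional CPU-based methods, the two GPGPU-based preprocessing operations run about 40 and 80 times faster, respectively, which is of considerable practical value for large-scale real-time image processing.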
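For readers unfamiliar with histogram equalization, the operation itself can be sketched in a few lines. This is a serial reference version under stated assumptions, not the paper's CUDA code: the image is a flat list of integer gray levels in the range 0 to `levels - 1`, and the function name `equalize` is hypothetical. The histogram pass and the final lookup pass are both per-pixel operations, which is what makes the method a natural fit for a GPU.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    # Pass 1: histogram of gray levels.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # constant image: nothing to spread out

    # Lookup table that stretches the CDF over the full gray range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]

    # Pass 2: remap every pixel through the lookup table.
    return [lut[v] for v in pixels]
```

The darkest gray level present maps to 0 and the brightest to `levels - 1`, with intermediate levels spaced according to how many pixels lie at or below them.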

6.

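A study and comparison of parallel Gauss-Jordan elimination on the CUDA platform

Mao Fei (毛飞), Chen Zhijun (陈智骏), Liang Xiaofei (梁效斐), Cao Qiying (曹奇英) 《计算机应用与软件》 (Computer Applications and Software) 2011, (9)

Using the CUDA platform, this paper proposes a basic method that implements parallel full pivoting, normalization, and elimination on a general-purpose graphics processing unit (GPGPU) to accelerate full-pivoting Gauss-Jordan elimination for solving systems of linear equations. The solution vector is recovered on the CPU. Exploiting the features of NVIDIA's latest Fermi-architecture GPUs through a series of optimizations, the GPGPU achieves a speedup of more than 6.5 times over a CPU of Intel's latest architecture and more than 10 times over a previous-generation Intel CPU.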
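The three steps named in the abstract (full pivot selection, normalization, elimination) and the final recovery of the solution vector can be sketched serially as follows. This is a plain floating-point reference version with a hypothetical function name, not the paper's CUDA kernels. The recovery step is needed because full pivoting swaps columns, which permutes the order of the unknowns.

```python
def gauss_jordan_full_pivot(a, b, eps=1e-12):
    """Solve a x = b by Gauss-Jordan elimination with full pivoting."""
    n = len(a)
    # Augmented matrix [a | b] in floating point.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    col_order = list(range(n))  # which unknown each column currently holds

    for k in range(n):
        # Full pivoting: largest absolute value in the remaining submatrix.
        pr, pc = max(
            ((r, c) for r in range(k, n) for c in range(k, n)),
            key=lambda rc: abs(m[rc[0]][rc[1]]),
        )
        if abs(m[pr][pc]) < eps:
            raise ValueError("matrix is singular to working precision")
        m[k], m[pr] = m[pr], m[k]  # row swap
        if pc != k:  # column swap, tracked for later recovery
            for row in m:
                row[k], row[pc] = row[pc], row[k]
            col_order[k], col_order[pc] = col_order[pc], col_order[k]

        # Normalization: scale the pivot row so the pivot becomes 1.
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]

        # Elimination: clear column k in every other row.
        for r in range(n):
            if r != k and m[r][k] != 0.0:
                f = m[r][k]
                m[r] = [v - f * p for v, p in zip(m[r], m[k])]

    # Recovery: undo the column permutation to restore the unknowns' order.
    x = [0.0] * n
    for k, original_index in enumerate(col_order):
        x[original_index] = m[k][n]
    return x
```

In each iteration, the pivot search is a reduction over the submatrix and the elimination updates every row independently, so both map naturally onto many parallel threads.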

7.

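Fast GPGPU-based biological sequence alignment  Total citations: 1, self-citations: 0, citations by others: 1

Ma Haichen (马海晨), Wei Gang (韦刚), Wu Baifeng (吴百蜂) 《计算机工程》 (Computer Engineering) 2012, 38(4): 241-244

An efficient biological sequence alignment scheme is proposed for CPU-GPU heterogeneous platforms. The scheme exploits the parallel processing power of the GPU and restructures the Smith-Waterman algorithm under the OpenCL framework, optimizing read latency, write latency, the recombination function, and data transfer to speed up sequence alignment. Experiments show that the algorithm achieves up to about a 100-fold performance improvement over the traditional serial algorithm on a CPU.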
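The Smith-Waterman recurrence that the scheme restructures can be sketched as below. The sketch makes several assumptions that the abstract does not state: a linear gap penalty, the scoring values shown as defaults, and a hypothetical function name. The table is filled one anti-diagonal at a time to make the parallel structure visible, since cells on the same anti-diagonal depend only on the two previous anti-diagonals. Whether the paper uses this particular ordering is not stated in the abstract.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of s and t (linear gap penalty)."""
    rows, cols = len(s) + 1, len(t) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    # Walk the table by anti-diagonals (i + j = d). Every cell on one
    # anti-diagonal could be computed at the same time.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - (cols - 1))
        last_i = min(rows - 1, d - 1)
        for i in range(first_i, last_i + 1):
            j = d - i
            step = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a new local alignment
                h[i - 1][j - 1] + step,  # align s[i-1] with t[j-1]
                h[i - 1][j] + gap,       # gap in t
                h[i][j - 1] + gap,       # gap in s
            )
            best = max(best, h[i][j])
    return best
```

The inner loop over `i` is the part a GPU implementation would typically distribute across threads, with a synchronization point between consecutive anti-diagonals.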

8.

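An introduction to GPU parallel computing programming techniques

Wang Zehuan (王泽寰), Wang Peng (王鹏) 《数据与计算发展前沿》 (Frontiers of Data and Computing) 2013, 4(1): 81-87

General-purpose computing on GPUs has flourished in recent years, and the numbers of developers and of GPGPU applications are growing rapidly. To meet the requirements of different applications and the habits of different developers, NVIDIA and its partners have developed many different programming techniques for GPUs based on the CUDA architecture. This paper describes their characteristics and intended users in detail, in the hope of helping developers choose the programming technique best suited to their programming habits and application requirements.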

9.

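Research on parallel computing with general-purpose graphics processing units (GPGPU)

Zhang Pengbo (张鹏博), Guo Bing (郭兵), Huang Yichun (黄义纯), Cao Yabo (曹亚波) 《单片机与嵌入式系统应用》 (Microcontrollers & Embedded Systems) 2017, 17(8)

As the graphics processing unit (GPU) has evolved from a device used only for graphics and image rendering into the general-purpose graphics processing unit (GPGPU), a parallel computing platform, its computing power has grown steadily. Based on a study of the GPGPU architecture, this paper examines thread scheduling for GPGPU parallel computing in depth, explains the principle of GPU thread scheduling, and reveals the shortcomings of the SIMT scheduling model. The relationship between system power consumption and system operating frequency is explained through a derivation.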

10.

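Real-time solid surface extraction based on GPGPU

Li Bochun (黎柏春), Yang Jianyu (杨建宇), Yu Tianbiao (于天彪), Wang Wanshan (王宛山) 《计算机工程与设计》 (Computer Engineering and Design) 2014, (12): 4273-4277

To extract the surfaces of three-dimensional solids in real time, a method based on GPGPU parallel computing is proposed. After analyzing the principle of the depth peeling algorithm and the GPU graphics rendering pipeline, the paper presents an algorithm that uses depth peeling on the GPU to extract 3D solid surfaces in real time. The algorithm is implemented by controlling the GPU rendering pipeline with GLSL, the OpenGL high-level shading language, and its pseudocode is given. Tests on models of a dragon, an impeller, and a tool swept volume show that the algorithm works well; in particular, surface extraction for the tool swept volume meets real-time requirements.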

11.

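Improvement of the MDx differential attack algorithm and its efficient implementation on GPGPU  Total citations: 1, self-citations: 0, citations by others: 1

Zhou Lin (周林), Han Wenbao (韩文报), Zhu Weihua (祝卫华), Wang Zheng (王政) 《计算机学报》 (Chinese Journal of Computers) 2010, 33(7)

Hash functions are widely used in commerce, security, and other fields, and the MDx family of hash algorithms is the most widely used among them. Attacks on the MDx family are therefore important both in theory and in practice. Since Professor Wang Xiaoyun (王小云) proposed the differential attack and broke MD5, MD4, and other MDx algorithms, research on the attack has drawn growing attention. Taking the differential attack on MD5 as an example, this paper improves the MD5 tunnel differential attack proposed by Klima, analyzes the feasibility and technical requirements of implementing it on GPGPU, and develops it in CUDA under Visual studio 6.0. Running on a GeForce 9800 GX2, the CUDA program finds one MD5 collision pair every 1.35 s on average. Compared with an implementation on a PC with a quad-core Core 2 Quad Q9000 (2.0 GHz), the GeForce 9800 GX2 implementation achieves 11.5 times the cost-performance ratio.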

12.

Carlos Reaño, Federico Silla, Adrián Castelló, Antonio J. Peña, Rafael Mayo, Enrique S. Quintana‐Ortí, José Duato 《Concurrency and Computation》2015,27(14):3746-3770

Graphics processing units (GPUs) are being increasingly embraced by the high‐performance computing community as an effective way to reduce execution time by accelerating parts of their applications. Remote CUDA (rCUDA) was recently introduced as a software solution to address the high acquisition costs and energy consumption of GPUs that constrain further adoption of this technology. Specifically, rCUDA is a middleware that allows a reduced number of GPUs to be transparently shared among the nodes in a cluster. Although the initial prototype versions of rCUDA demonstrated its functionality, they also revealed concerns with respect to usability, performance, and support for new CUDA features. In response, in this paper, we present a new rCUDA version that (1) improves usability by including a new component that allows an automatic transformation of any CUDA source code so that it conforms to the needs of the rCUDA framework, (2) consistently features low overhead when using remote GPUs thanks to an improved new communication architecture, and (3) supports multithreaded applications and CUDA libraries. As a result, for any CUDA‐compatible program, rCUDA now allows the use of remote GPUs within a cluster with low overhead, so that a single application running in one node can use all GPUs available across the cluster, thereby extending the single‐node capability of CUDA. Copyright © 2014 John Wiley & Sons, Ltd.

13.

Workflow of the Grover algorithm simulation incorporating CUDA and GPGPU

Xiangwen Lu, Jiabin Yuan, Weiwei Zhang 《Computer Physics Communications》2013

The Grover quantum search algorithm, one of only a few representative quantum algorithms, can speed up many classical algorithms that use search heuristics. No true quantum computer has yet been developed. For the present, simulation is one effective means of verifying the search algorithm. In this work, we focus on the simulation workflow using a compute unified device architecture (CUDA). Two simulation workflow schemes are proposed. These schemes combine the characteristics of the Grover algorithm and the parallelism of general-purpose computing on graphics processing units (GPGPU). We also analyzed the optimization of memory space and memory access from this perspective. We implemented four programs on CUDA to evaluate the performance of schemes and optimization. Through experimentation, we analyzed the organization of threads suited to Grover algorithm simulations, compared the storage costs of the four programs, and validated the effectiveness of optimization. Experimental results also showed that the distinguished program on CUDA outperformed the serial program of libquantum on a CPU with a speedup of up to 23 times (12 times on average), depending on the scale of the simulation.
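What a classical simulation of Grover's algorithm computes can be shown with a short state-vector sketch. It is a serial illustration under stated assumptions, not one of the paper's CUDA programs or workflow schemes: a single marked item, real-valued amplitudes (sufficient for this algorithm), and a hypothetical function name. The state vector has 2^n entries, which is why memory space and memory access dominate at larger scales.

```python
import math


def grover_probabilities(n_qubits, marked):
    """Simulate Grover search for one marked index; return final probabilities."""
    size = 1 << n_qubits
    # Uniform superposition over all basis states.
    amp = [1.0 / math.sqrt(size)] * size
    # About (pi / 4) * sqrt(N) iterations maximize the success probability.
    iterations = int(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitude.
        amp[marked] = -amp[marked]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amp) / size
        amp = [2.0 * mean - a for a in amp]
    return [a * a for a in amp]
```

Each iteration consists of a reduction (the mean) followed by an independent update of every amplitude, which is the kind of work that maps onto GPU threads.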

14.

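Research on a parallel region filling algorithm

An Luosheng (安洛生), Wang Limin (王利敏) 《电脑与微电子技术》 (Computer and Microelectronics Technology) 2014, (23): 6-8

Region filling is a common operation in graphics processing. Taking advantage of current multi-core CPUs and the general-purpose computing capability of NVIDIA graphics cards, this paper implements a method for filling specified regions in parallel. The algorithm uses a multi-seed approach and multithreading to complete the fill quickly, and it avoids the need, found in traditional algorithms, to set seed positions manually. After filling, the results are checked and invalid filled regions are discarded to obtain the desired result. Experiments show that the multi-core CPU speedup is significant for relatively large images.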

15.

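PVM parallel programming on the Win32 platform  Total citations: 4, self-citations: 0, citations by others: 4

Zhang Xinyi (张信一), Li Daiping (李代平), Zhang Wen (章文) 《计算机应用研究》 (Application Research of Computers) 2004, 21(5): 102-104, 108

This paper focuses on methods for parallel programming on the PVM platform, including how to set up a PVM runtime environment on Win32 and how to partition tasks and data, and proposes a Master/Slave parallel programming pattern. Finally, an example of parallel computing applied to geophysical prospecting data processing is given to summarize the parallel program design method.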

16.

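A GPGPU-based SIFT acceleration algorithm  Total citations: 3, self-citations: 1, citations by others: 3

Yang Tiantian (杨天天), Lu Yunping (鲁云萍), Zhang Weihua (张为华) 《电子技术应用》 (Application of Electronic Technique) 2015, 41(1)

SIFT is one of the most widely used local-feature-based image feature extraction algorithms, but its running speed limits its range of application. To address this, a parallel SIFT acceleration algorithm is designed and implemented on the graphics processor (GPGPU): each core module of the algorithm is mapped onto the GPGPU's compute units and optimized for GPGPU characteristics. Tests show that the GPGPU-based parallel SIFT algorithm achieves a 118.2-fold speedup over the original serial version and a throughput of 76.86 images/s, a clear performance improvement over existing techniques.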

17.

Parallel multi-dimensional range query processing with R-trees on GPU

Jinwoong Kim, Sul-Gi Kim, Beomseok Nam 《Journal of Parallel and Distributed Computing》2013

General-purpose computing on graphics processing units (GP-GPU) has emerged as a new cost-effective parallel computing paradigm in high performance computing research that enables large amounts of data to be processed in parallel. Large-scale, data-intensive scientific applications have been playing an important role in modern high performance computing research. A common access pattern in such scientific data analysis applications is the multi-dimensional range query, but not much research has been conducted on multi-dimensional range queries on the GPU. Inherently multi-dimensional indexing trees such as R-trees are not well suited to the GPU environment because of their irregular tree traversal. Traversing an irregular tree search path makes it hard to maximize the utilization of massively parallel architectures. In this paper, we propose a novel MPTS (Massively Parallel Three-phase Scanning) R-tree traversal algorithm for multi-dimensional range queries that converts recursive access to tree nodes into sequential access. Our extensive experimental study shows that the MPTS R-tree traversal algorithm on an NVIDIA Tesla M2090 GPU consistently outperforms the traditional recursive R-tree search algorithm on Intel Xeon E5506 processors.

18.

Real-time stereo on GPGPU using progressive multi-resolution adaptive windows  Total citations: 1, self-citations: 0, citations by others: 1

Yong Zhao Gabriel Taubin 《Image and vision computing》2011,29(6):420-432

We introduce a new GPGPU-based real-time dense stereo matching algorithm. The algorithm is based on a progressive multi-resolution pipeline which includes background modeling and dense matching with adaptive windows. For applications in which only moving objects are of interest, this approach significantly reduces the overall computation cost and preserves high-definition details. Running on an off-the-shelf commodity graphics card, our implementation achieves 36 fps stereo matching on 1024 × 768 stereo video with a fine 256 pixel disparity range. This is effectively the same as 7200 M disparity evaluations per second. For scenes where the static background assumption holds, our approach outperforms all published alternative algorithms in terms of speed by a large margin. We envision a number of potential applications such as real-time motion capture, as well as tracking, recognition and identification of moving objects in multi-camera networks.

19.

A metaheuristic framework for stochastic combinatorial optimization problems based on GPGPU with a case study on the probabilistic traveling salesman problem with deadlines

Dennis Weyland, Roberto Montemanni, Luca Maria Gambardella 《Journal of Parallel and Distributed Computing》2013

In this work we propose a general metaheuristic framework for solving stochastic combinatorial optimization problems based on general-purpose computing on graphics processing units (GPGPU). This framework is applied to the probabilistic traveling salesman problem with deadlines (PTSPD) as a case study. Computational studies reveal significant improvements over state-of-the-art methods for the PTSPD. Additionally, our results reveal the huge potential of the proposed framework and sampling-based methods for stochastic combinatorial optimization problems.