
1.

Total citations: 1; self-citations: 0; citations by others: 1

M.J. Harvey, G. De Fabritiis 《Computer Physics Communications》2011,(4):1093-1099

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, “Swan” for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance.

Program summary

Program title: Swan
Catalogue identifier: AEIH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Public License version 2
No. of lines in distributed program, including test data, etc.: 17 736
No. of bytes in distributed program, including test data, etc.: 131 177
Distribution format: tar.gz
Programming language: C
Computer: PC
Operating system: Linux
RAM: 256 Mbytes
Classification: 6.5
External routines: NVIDIA CUDA, OpenCL
Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore x86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.
Solution method: Swan performs a translation of CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.
Restrictions: No support for CUDA C++ features
Running time: Nominal
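
To give a feel for what translating CUDA kernel source into an OpenCL equivalent involves, here is a minimal Python sketch of a textual rewrite. It is not Swan's implementation: the table `CUDA_TO_OPENCL`, the function `translate_kernel`, and the sample kernel `scale` are hypothetical names chosen for illustration, and the sketch covers only a handful of one-dimensional built-ins. The mappings themselves (for example `threadIdx.x` to `get_local_id(0)`) are the standard correspondences between the two models.

```python
import re

# Hypothetical, deliberately incomplete table of CUDA built-ins and their
# OpenCL C counterparts. This is an illustration, not Swan's rule set.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "gridDim.x": "get_num_groups(0)",
}


def _qualify_pointer_params(match):
    """Prefix pointer parameters of a kernel with the __global address space."""
    params = [p.strip() for p in match.group(2).split(",")]
    params = ["__global " + p if "*" in p else p for p in params]
    return match.group(1) + ", ".join(params) + match.group(3)


def translate_kernel(cuda_source: str) -> str:
    """Rewrite a simple CUDA kernel into OpenCL C by textual substitution."""
    # Try longer keys first so that no key shadows a longer one.
    keys = sorted(CUDA_TO_OPENCL, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    text = pattern.sub(lambda m: CUDA_TO_OPENCL[m.group(0)], cuda_source)
    # OpenCL requires an explicit address space on kernel pointer arguments;
    # this sketch assumes every such pointer refers to global memory.
    header = re.compile(r"(__kernel\s+void\s+\w+\s*\()([^)]*)(\))")
    return header.sub(_qualify_pointer_params, text)


cuda_kernel = """__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}"""

opencl_kernel = translate_kernel(cuda_kernel)
print(opencl_kernel)
```

A real translator also has to deal with multi-dimensional indices, textures, templates, and the host-side launch code, which is why the summary above describes the manual conversion as straightforward but laborious.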

2.

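Application of an improved principal curve algorithm to fingerprint skeleton extraction Total citations: 1; self-citations: 0; citations by others: 1

Ma Chi, Zhang Hongyun, Miao Duoqian 《Computer Engineering and Applications》2010,46(16):170-173

In fingerprint recognition systems, extracting the fingerprint skeleton is an important and difficult step, and the skeletons produced by traditional thinning algorithms are easily disturbed by noise. Therefore, based on a study of the characteristics of fingerprint data and of Kégl's principal curve algorithm, an improved principal curve algorithm is proposed. Experimental results show that, compared with traditional algorithms, the improved algorithm is more efficient and more effective: the fingerprint skeletons it extracts contain more information and are more accurate, more reliable, and more robust to noise.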

3.

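A robust parallel thinning algorithm for binary images Total citations: 4; self-citations: 0; citations by others: 4

Bao Jianjun, Fan Jing 《Computer Aided Engineering》2006,15(4):43-46

Based on an analysis of two typical parallel thinning algorithms, a new Enhanced Parallel Thinning Algorithm (EPTA) is proposed. Extensive comparative experiments show that the new algorithm effectively resolves the problems of lost diagonal-line information, redundant pixels, and spurious branches, and that it is efficient and robust.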
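
For readers unfamiliar with parallel thinning, the sketch below implements the classic Zhang-Suen algorithm in Python. This is a swapped-in, well-known example of the technique: the abstract above does not say which two algorithms were analyzed, and the code is not EPTA. The function names and the set-of-coordinates image representation are assumptions made for illustration.

```python
def neighbours(p, pixels):
    """Return the 8 neighbours P2..P9 of p (clockwise from north) as 0/1 flags."""
    r, c = p
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    return [1 if (r + dr, c + dc) in pixels else 0 for dr, dc in offsets]


def zhang_suen(pixels):
    """Thin a binary image given as a set of (row, col) foreground pixels."""
    pixels = set(pixels)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            # "Parallel": all decisions in a sub-iteration use the same
            # snapshot of the image; deletions are applied afterwards.
            to_delete = set()
            for p in pixels:
                n = neighbours(p, pixels)
                p2, p3, p4, p5, p6, p7, p8, p9 = n
                b = sum(n)  # number of foreground neighbours
                # number of 0 -> 1 transitions around the neighbourhood
                a = sum(1 for i in range(8)
                        if n[i] == 0 and n[(i + 1) % 8] == 1)
                if not (2 <= b <= 6 and a == 1):
                    continue
                if step == 0:
                    removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if removable:
                    to_delete.add(p)
            if to_delete:
                pixels -= to_delete
                changed = True
    return pixels


# A 3-pixel-thick horizontal bar, rows 1..3 and columns 1..5.
block = {(r, c) for r in range(1, 4) for c in range(1, 6)}
skeleton = zhang_suen(block)
print(sorted(skeleton))
```

Zhang-Suen is known to shorten or distort some diagonal strokes and to leave occasional redundant pixels, which is the kind of defect the abstract says EPTA is designed to address.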

4.

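Skeleton-based local editing of 3D meshes Total citations: 1; self-citations: 0; citations by others: 1

Duan Dequan, Li Junfen 《Computer Engineering and Applications》2006,42(33):88-90,99

Editing complex meshes is a key technique in 3D animation design, and a number of fairly successful schemes and algorithms already exist. Building on a study of skeleton generation for 3D meshes and its optimization, a skeleton-based local mesh editing algorithm is proposed. The user first selects the editing region and draws an editing curve corresponding to the deformed skeleton; then, using the correspondence between mesh vertices, the skeleton curve, and the editing curve, editing operations such as translation and rotation of mesh vertices are carried out. Experiments show that the algorithm is intuitive to edit with and easy to control, and that it is well suited to editing local mesh features.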

5.

Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator

Mark Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez 《Parallel Computing》2013

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose, graphics processing unit (GPU).

6.

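A complete image thinning algorithm based on spanning trees Total citations: 2; self-citations: 3; citations by others: 2

Li Su, Tan Yonglong 《Computer Engineering and Design》2006,27(21):4006-4007,4070

Image thinning is an important step in image processing. Many thinning algorithms exist, but all have shortcomings that limit their range of use. A spanning-tree-based image thinning algorithm is proposed: the original image is preprocessed with a morphological thinning algorithm, a spanning tree is built for each connected component of the intermediate result, and the structural features of the trees are used to thin the connected components one by one. Experiments on fingerprint images show that the algorithm thins images completely, effectively removes spurs, reduces noise interference, avoids redundant connectivity at crossing points, and adapts well to different inputs.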

7.

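A survey of 3D image skeletonization methods

Han Guoqiang, Tian Xuhong, Li Zhiyuan, Situ Zhiyuan 《Journal of Chinese Computer Systems》2007,28(9):1695-1699

The paper first introduces the problem statement and basic concepts of three-dimensional (3D) image skeletonization, then surveys three commonly used classes of 3D skeletonization methods in terms of principles, implementation techniques, and detailed classification, and compares and analyzes the three classes with respect to implementation difficulty, skeleton properties, and computational speed. Finally, future research directions for 3D skeletonization are outlined.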

8.

Thinning and segmenting handwritten characters by line following

Claude Chouinard, Réjean Plamondon 《Machine Vision and Applications》1992,5(3):185-197

This article presents a new thinning algorithm particularly well suited to handwriting characters or engineering drawing images. The line following scheme is used with major improvements to reduce the distortion at intersections. The new algorithm uses a thinning window to detect the shape and type of each intersection it is about to thin. This added information allows for a much more accurate thinning process. The algorithm also segments the skeleton into line segments as it is generated. These lines correlate partly with the strokes that produced the image. To analyze the performance of the new algorithm, a comparative study of the skeletons is performed over speed and quality criteria. This study shows that the algorithm reduces the distortion at the line intersections while remaining fast.

9.

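A fast triangulation and skeletonization algorithm for complex ribbon-like images Total citations: 3; self-citations: 1; citations by others: 3

Yang Yijun, Meng Xiangxu, Yang Chenglei, Zeng Wei, Zhong Shengwei 《Journal of Computer-Aided Design & Computer Graphics》2003,15(10):1270-1274

To compute the skeleton of a ribbon-like image quickly and accurately for subsequent recognition, reconstruction, and similar processing, a skeletonization algorithm based on fast triangulation is proposed. The approximating polygon of the image boundary is first triangulated, producing a set of triangles with topological relationships; a local skeleton is then generated according to each triangle's type; finally, the local skeletons are connected to form the skeleton of the whole ribbon-like image. The algorithm makes full use of both global and local image information and is independent of resolution.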
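
The idea of generating a local skeleton from each triangle's type can be sketched in Python as follows. The abstract does not give the paper's rules, so the ones used here are a common convention assumed for illustration: a triangle is classified by how many of its edges are internal (not on the polygon boundary), and the helper names `midpoint` and `local_skeleton` are hypothetical. The triangulation is specified by hand rather than computed.

```python
def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def local_skeleton(tri, boundary_edges):
    """Return skeleton segments for one triangle, chosen by its type.

    Assumed rules (not taken from the paper):
      junction (3 internal edges): centroid to each edge midpoint
      sleeve   (2 internal edges): midpoint to midpoint
      terminal (1 internal edge):  midpoint to the opposite vertex
    """
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    internal = [e for e in edges if frozenset(e) not in boundary_edges]
    mids = [midpoint(*e) for e in internal]
    if len(internal) == 3:
        centroid = (sum(v[0] for v in tri) / 3, sum(v[1] for v in tri) / 3)
        return [(centroid, m) for m in mids]
    if len(internal) == 2:
        return [(mids[0], mids[1])]
    if len(internal) == 1:
        a, b = internal[0]
        apex = next(v for v in tri if v != a and v != b)
        return [(mids[0], apex)]
    return []


# A 4-by-2 strip whose boundary polygon has six vertices, split into four
# triangles by hand.
polygon = [(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]
boundary = {frozenset((polygon[i], polygon[(i + 1) % len(polygon)]))
            for i in range(len(polygon))}
triangles = [
    ((0, 0), (2, 0), (0, 2)),
    ((2, 0), (2, 2), (0, 2)),
    ((2, 0), (4, 0), (2, 2)),
    ((4, 0), (4, 2), (2, 2)),
]

segments = [s for t in triangles for s in local_skeleton(t, boundary)]
for s in segments:
    print(s)
```

Because neighbouring triangles share the midpoint of their common internal edge, the per-triangle segments join end to end, which matches the final connection step in the abstract.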

10.

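A practical parallel thinning algorithm and its implementation Total citations: 7; self-citations: 0; citations by others: 7

Lü Yue, Shi Pengfei 《Computer Engineering and Design》2000,21(4):53-56

A practical parallel thinning algorithm is introduced, and its thinning templates and thinning conditions are analyzed. Experimental results show that the image skeletons obtained by the algorithm not only avoid excessive erosion but also have good connectivity.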

11.

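Parallelization and implementation of GPU-based molecular dynamics simulation

Fei Hui, Zhang Yunquan, Wang Ke, Xu Yawu 《Computer Science》2011,38(9):276-278

Molecular dynamics simulation is an important computational means of obtaining the properties of liquids and solids and is widely used in chemistry, physics, biology, medicine, materials science, and many other fields. The complexity of the simulated systems and the demand for accuracy make the computation enormous and time-consuming. Parallel computing is an important way to accelerate large-scale molecular dynamics simulation. With computing power of hundreds of GFlops or even more than a TFlops, GPUs offer a new acceleration option for compute-intensive applications such as molecular dynamics simulation. A GPU-based parallel algorithm for molecular dynamics simulation, oApT-AD, is proposed and implemented under both the OpenCL and CUDA frameworks. Performance tests show that, on a Tesla C1060 graphics card, the OpenCL implementation achieves a speedup of up to 120 times over a serial CPU implementation. A comparison shows that the algorithm's performance under CUDA is roughly the same as under OpenCL. The algorithm can also be extended to two or more GPUs and scales well.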

12.

Numerical modeling of gravitational wave sources accelerated by OpenCL

Gaurav Khanna, Justin McKennon 《Computer Physics Communications》2010,181(9):1605-1611

In this work, we make use of the OpenCL framework to accelerate an EMRI modeling application using the hardware accelerators - Cell BE and Tesla CUDA GPU. We describe these compute technologies and our parallelization approach in detail, present our performance results, and then compare them with those from our previous implementations based on the native CUDA and Cell SDKs. The OpenCL framework allows us to execute identical source-code on both architectures and yet obtain strong performance gains that are comparable to what can be derived from the native SDKs.

13.

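A study of OpenCL-accelerated ray tracing

Huang Tao 《Computer and Modernization》2011,(2):65-69

Current GPU computing power makes it more feasible to run parallel ray tracing algorithms for real-time scenes partitioned with a kD-tree. Graphics processing units (GPUs) are efficient at polygon rendering, and the programmability of their internal units has led to wide use beyond polygon rendering. This paper describes in detail a kD-tree traversal algorithm implemented in OpenCL and improves the intersection tests that dominate the computation, while also raising the utilization of GPU compute capability and memory, thereby improving the efficiency of the ray tracing algorithm.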

14.

Sérgio E.D. Dias, Abel J.P. Gomes 《Concurrency and Computation》2011,23(17):2280-2291

Computing the surface of a molecule (e.g., a protein) plays an important role in the analysis of its geometric structure as needed in the study of interactions between proteins, protein folding, protein docking, and so forth. There are a number of algorithms for the computation of molecular surfaces and their triangulations, but only a few take advantage of graphics processing unit computing. This paper describes a graphics processing unit‐based marching cubes algorithm to triangulate molecular surfaces. In the end of the paper, a performance analysis of three implementations (i.e., serial CPU, CUDA, and OpenCL) of the marching cubes‐based triangulation algorithm takes place as a way to realize beforehand how molecular surfaces can be rendered in real‐time in the future. Copyright © 2011 John Wiley & Sons, Ltd.

15.

A topology-preserving parallel 3D thinning algorithm for extracting the curve skeleton

Wenjie Xie, Robert P. Thompson, Renato Perucchio 《Pattern recognition》2003,36(7):1529-1544

We introduce a new topology-preserving 3D thinning procedure for deriving the curve voxel skeleton from 3D binary digital images. Based on a rigorously defined classification procedure, the algorithm consists of sequential thinning iterations each characterized by six parallel directional sub-iterations followed by a set of sequential sub-iterations. The algorithm is shown to produce concise and geometrically accurate 3D curve skeletons. The thinning algorithm is also insensitive to object rotation and only moderately sensitive to noise. Although this thinning procedure is valid for curve skeleton extraction of general elongated objects, in this paper, we specifically discuss its application to the orientation modeling of trabecular biological tissues.

16.

Fast thinning algorithm for binary images

CJ Ammann, AG Sartori-Angus 《Image and vision computing》1985,3(2):71-79

A fast thinning algorithm is proposed which achieves its increase in speed by applying any existing thinning algorithm to a greatly reduced amount of image information. The procedure compacts the image, applies an optimal thresholding routine, thins the result, and then expands the skeleton to its original scale. Results of testing the algorithm on a number of images are shown.
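
The compact, threshold, thin, expand pipeline described in this abstract can be outlined in Python as below. Every detail is an assumption made for illustration: the block size `k`, the fixed threshold `t` (the paper's "optimal thresholding routine" is not specified in the abstract), the choice to map each reduced pixel back to the centre of its block, and all function names. The thinning step is passed in as a callable, reflecting the abstract's point that any existing thinning algorithm can be used.

```python
def compact(image, k):
    """Reduce a binary image (list of rows) by summing each k-by-k block."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r + dr][c + dc]
                 for dr in range(k) for dc in range(k)
                 if r + dr < rows and c + dc < cols)
             for c in range(0, cols, k)]
            for r in range(0, rows, k)]


def threshold(counts, t):
    """Binarise block sums: a block is foreground if it holds at least t pixels."""
    return [[1 if v >= t else 0 for v in row] for row in counts]


def expand(small, k):
    """Map each reduced-scale pixel back to the centre of its original block."""
    return {(r * k + k // 2, c * k + k // 2)
            for r, row in enumerate(small)
            for c, v in enumerate(row) if v}


def fast_thin(image, thin, k=2, t=2):
    """Compact, threshold, thin at reduced scale, then expand the result."""
    reduced = threshold(compact(image, k), t)
    return expand(thin(reduced), k)


# A 4x4 image whose left half is foreground. The identity function stands in
# for a real thinning routine so that the example stays self-contained.
image = [[1, 1, 0, 0] for _ in range(4)]
result = fast_thin(image, thin=lambda grid: grid)
print(sorted(result))
```

With `k = 2` the thinning routine sees a quarter of the original pixels, which is where the speed-up claimed in the abstract would come from; the trade-off is that the expanded skeleton is only as precise as the block size.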

17.

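Application analysis of a fast image distortion correction method using OpenCL on heterogeneous platforms

Wei Bowen, Li Tao, Li Guangyu, Wang Zhiheng, He Mu, Shi Yueling, Liu Luyao, Zhang Rui 《Computer Science》2016,43(Z11):167-169, 196

To address the increasingly prominent problems of low processing efficiency and computational bottlenecks in applications involving massive remote sensing data, OpenCL parallel processing, built on general-purpose GPU programming, is used to accelerate remote sensing data processing, with the aim of providing a more efficient route to processing large remote sensing image datasets. Image distortion correction was parallelized on different graphics card platforms. The results show that OpenCL markedly speeds up image distortion correction, achieving a maximum speedup of 29.1 times; cross-validation against CUDA parallel processing further highlights the generality advantage of OpenCL for parallel processing on heterogeneous platforms.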

18.

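Drawing and intelligent modification algorithms for 3D free-form curves

Han Li, Tang Di 《Computer Engineering and Design》2006,27(24):4755-4758

Much current research on generating curves and surfaces is devoted to ever more precise adjustment, from placing control points and knots to tuning curvature and tangent vectors. However, the tedious computation of geometric parameters and the complex mathematical concepts involved limit their use by designers, and they are particularly ill-suited to early conceptual design. This study describes pen-input-based algorithms for drawing, recognizing, and intuitively modifying free-form 3D curves. The algorithms support drawing arbitrary free-form 3D curves and, through an optimized sampling mechanism, automatically recognize the input and generate an adaptive B-spline approximating curve. They extend constraint-based curve generation and go on to propose a simple local modification technique that introduces control over a scale factor and region spacing, which effectively addresses curve smoothness; their efficiency is verified in practice.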

19.

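Application of 3D visualization technology in micro air vehicle simulation

Hou Yu, Fang Zongde, Liu Lan 《Application Research of Computers》2004,21(5):181-182,188

The paper first briefly introduces the role and significance of 3D visualization technology in flight simulation of micro air vehicles, then focuses on vehicle motion simulation based on 3D visualization and terrain simulation based on fractal techniques. On this basis, static and dynamic simulation methods for the vehicle's virtual flight are discussed. Real-time 3D graphics display in the simulation software is implemented with Visual C++ and OpenGL.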

20.

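GPU-based fast rigid registration of 3D medical images Total citations: 3; self-citations: 1; citations by others: 2

Qin An, Xu Jian, Feng Qianjin, Meng Xiaolin, Chen Wufan 《Application Research of Computers》2010,27(3):1198-1200

Automatic 3D registration maps multiple image datasets into a common coordinate system and is widely used in medical image analysis. However, current mainstream 3D rigid registration algorithms (such as FLIRT) are slow: rigid registration of data of size 256³ takes about 300 s, which cannot meet the needs of fast clinical applications. A fast 3D registration technique based on the CUDA (compute unified device architecture) architecture is therefore proposed, which uses GPU (graphic processing unit) parallel computing to implement the coordinate transformation, linear interpolation, and similarity measure computation in registration. Experiments on clinical 3D medical images show that, while maintaining registration accuracy, the technique raises the speed…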