
1.

Fast numerical scheme for gradient vector flow computation using a multigrid method

Han X. Xu C. Prince J.L. 《Image Processing, IET》2007,1(1):48-55

The gradient vector flow (GVF) deformable model was introduced by Xu and Prince as an effective approach to overcome the limited capture range problem of classical deformable models and their inability to progress into boundary concavities. It has found many important applications in the area of medical image processing. The simple iterative method proposed in the original work on GVF, however, is slow to converge. A new multigrid method is proposed for GVF computation on 2D and 3D images. Experimental results show that the new implementation significantly improves the computational speed by at least an order of magnitude, which facilitates the application of GVF deformable models in processing large medical images.
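
The "simple iterative method" mentioned in this abstract can be pictured as explicit time-stepping of the GVF diffusion equations. The following is a minimal pure-Python sketch under stated assumptions. It is not the authors' code and not the multigrid scheme. The function names, the weight `mu`, the step `dt`, the iteration count, and the replicated-border handling are all illustrative choices.

```python
def gradient(f):
    """Finite-difference gradient of a 2D grid (central inside, one-sided at borders)."""
    h, w = len(f), len(f[0])
    fx = [[0.0] * w for _ in range(h)]
    fy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            fx[y][x] = (f[y][xr] - f[y][xl]) / max(xr - xl, 1)
            fy[y][x] = (f[yd][x] - f[yu][x]) / max(yd - yu, 1)
    return fx, fy


def laplacian(u):
    """5-point Laplacian with replicated (zero-flux) borders."""
    h, w = len(u), len(u[0])
    return [[u[max(y - 1, 0)][x] + u[min(y + 1, h - 1)][x]
             + u[y][max(x - 1, 0)] + u[y][min(x + 1, w - 1)]
             - 4.0 * u[y][x]
             for x in range(w)] for y in range(h)]


def gvf(edge_map, mu=0.2, dt=0.5, iterations=200):
    """Explicit iteration of u_t = mu*lap(u) - (u - f_x)*|grad f|^2 (same for v)."""
    fx, fy = gradient(edge_map)
    h, w = len(edge_map), len(edge_map[0])
    mag2 = [[fx[y][x] ** 2 + fy[y][x] ** 2 for x in range(w)] for y in range(h)]
    u = [row[:] for row in fx]  # start from the plain gradient field
    v = [row[:] for row in fy]
    for _ in range(iterations):
        lap_u, lap_v = laplacian(u), laplacian(v)
        u = [[u[y][x] + dt * (mu * lap_u[y][x] - (u[y][x] - fx[y][x]) * mag2[y][x])
              for x in range(w)] for y in range(h)]
        v = [[v[y][x] + dt * (mu * lap_v[y][x] - (v[y][x] - fy[y][x]) * mag2[y][x])
              for x in range(w)] for y in range(h)]
    return u, v


# A vertical edge in column 4 of a 9x9 edge map (toy input, assumed).
edge = [[1.0 if x == 4 else 0.0 for x in range(9)] for _ in range(9)]
u0, _ = gvf(edge, iterations=0)   # plain gradient: zero away from the edge
u, v = gvf(edge, iterations=200)  # diffused field reaches the image border
```

In this toy case the plain gradient is zero at the image border, while the diffused field points toward the edge from both sides, which is the extended capture range the abstract refers to. Every iteration sweeps the whole grid, which illustrates why a faster solver is attractive for large images.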

2.

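Skeleton snake model based on GVF (total citations: 1; self-citations: 0; cited by others: 1)

王洪剑 孙志宏 彭思龙 《计算机应用》2004,24(9):1-3,43

A fast-converging skeleton snake algorithm based on gradient vector flow (GVF) is proposed. The algorithm first detects the skeleton of the object from the characteristics of the external forces inside concavities after the GVF transform. It then uses the skeleton as a guide to modify the direction and magnitude of the external forces so that the snake converges quickly. The algorithm not only handles deep concavities that GVF cannot, but is also far faster than GVF.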

3.

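Fast GPU computation of integral images (total citations: 1; self-citations: 0; cited by others: 1)

王志国 王贵锦 施陈博 苗权 林行刚 《计算机应用研究》2011,28(10):3913-3916

A method for computing integral images on the GPU is proposed. An integral image can be built by applying a prefix sum to the rows of the input image and then to the columns. A prefix sum is the operation that computes, for an array, the sum of the elements from the start position up to each index. A segmented prefix sum principle is proposed. Applied to integral image computation on the GPU, it reduces data dependencies between threads, lowers memory access overhead, and improves the work efficiency of GPU threads. The proposed algorithm improves speed by about a factor of two over previous algorithms. It can be used for GPU acceleration of image processing algorithms that rely on integral images.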
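
The row-then-column construction described in this abstract can be sketched sequentially as follows. This is a minimal illustration, not the paper's GPU implementation. All function names are hypothetical, and `segmented_prefix_sum` is only one plausible reading of the segmented idea: scan fixed-length segments independently, then add a carry.

```python
def prefix_sum(values):
    """Inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    out, running = [], 0
    for v in values:
        running += v
        out.append(running)
    return out


def segmented_prefix_sum(values, segment_len):
    """Scan each segment on its own (parallelizable), then add the carry of earlier segments."""
    segments = [prefix_sum(values[i:i + segment_len])
                for i in range(0, len(values), segment_len)]
    out, carry = [], 0
    for seg in segments:
        out.extend(v + carry for v in seg)
        carry = out[-1]
    return out


def integral_image(image):
    """Prefix-sum every row, then every column of the row-scanned result."""
    rows = [prefix_sum(row) for row in image]
    cols = [prefix_sum(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def box_sum(ii, top, left, bottom, right):
    """Sum of image[top..bottom][left..right] (inclusive) from four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(image)
```

Once the integral image exists, any rectangular sum costs at most four lookups regardless of the rectangle size, which is why algorithms built on integral images benefit from a fast construction step.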

4.

Almost optimal column-wise prefix-sum computation on the GPU

Hiroki Tokura Toru Fujita Koji Nakano Yasuaki Ito Jacir L. Bordim 《The Journal of supercomputing》2018,74(4):1510-1521

Row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing, such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a one-dimensional array can be computed efficiently on the GPU. Hence, row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for computing column-wise prefix-sums, because it performs inefficient strided access to the global memory. The main contribution of this paper is to present an almost optimal column-wise prefix-sum algorithm on the GPU. Quite surprisingly, experimental results using NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2–6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal.
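
The strided-access issue can be pictured with a row-major array. Walking down one column jumps `width` elements per step, whereas sweeping row by row touches consecutive addresses. The sketch below is a sequential illustration of the row-sweep idea only. It is not the algorithm of the paper, and the function name and flat layout are assumptions.

```python
def column_prefix_sums(flat, width):
    """Column-wise inclusive prefix sums of a row-major matrix stored in a flat list."""
    height = len(flat) // width
    out = list(flat)
    for r in range(1, height):
        base, prev = r * width, (r - 1) * width
        # Within one row the indices base + c are contiguous; on a GPU,
        # neighbouring threads would read neighbouring addresses here.
        for c in range(width):
            out[base + c] += out[prev + c]
    return out


matrix = [1, 2, 3,
          4, 5, 6]  # 2 rows x 3 columns, row-major
sums = column_prefix_sums(matrix, 3)
```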

5.

Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster

Wang Xian Aoki Takayuki 《Parallel Computing》2011,37(9):521-535

GPGPU has drawn much attention for accelerating non-graphics applications. A simulation using the D3Q19 model of the lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library. The GPU code runs on the multi-node GPU cluster TSUBAME of Tokyo Institute of Technology, which is equipped with a total of 680 NVIDIA Tesla GPUs. For multi-GPU computation, a domain partitioning method is used to distribute the computational load across multiple GPUs, and GPU-to-GPU data transfer becomes a severe overhead for the total performance. The parallel results of 1D, 2D and 3D domain partitioning were compared and analyzed. With a 384 × 384 × 384 mesh and 96 GPUs, the performance of 3D partitioning is about 3-4 times higher than that of 1D partitioning. The performance curve deviates from the ideal line because of the long communication time between GPUs. To hide the communication time, we introduced a technique that overlaps computation and communication, in which the data transfer and the computation run simultaneously in two streams. Using 8-96 GPUs, the performance increases by a factor of about 1.1-1.3 in overlapping mode. As a benchmark problem, a large-scale computation of flow around a sphere at Re = 13,000 was carried out successfully using a 2000 × 1000 × 1000 mesh and 100 GPUs. For this computation with 2 giga lattice nodes, 6.0 h were needed to process 100,000 time steps. Under this condition, the computational time (2.79 h) and the data communication time (3.06 h) are almost the same.
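
One way to see why the partitioning dimension matters is to count the boundary cells an interior subdomain must exchange with its neighbors. The sketch below does that arithmetic for the 384 × 384 × 384 mesh on 96 GPUs. The function name, the one-cell-thick halo, and the 4 × 4 × 6 split used for the 3D case are assumptions, since the abstract does not state the decomposition.

```python
def halo_cells(mesh, parts):
    """Cells on the faces that one interior subdomain exchanges with its neighbors."""
    sub = [n // p for n, p in zip(mesh, parts)]
    total = 0
    for axis in range(3):
        if parts[axis] > 1:  # only partitioned axes create exchange faces
            others = [sub[a] for a in range(3) if a != axis]
            total += 2 * others[0] * others[1]
    return total


mesh = (384, 384, 384)
one_d = halo_cells(mesh, (96, 1, 1))   # slabs of 4 x 384 x 384
three_d = halo_cells(mesh, (4, 4, 6))  # blocks of 96 x 96 x 64 (assumed split)
ratio = one_d / three_d
```

Under these assumptions the 1D slabs exchange roughly 6.9 times as many cells per subdomain as the 3D blocks. This counts communication volume only, so it is not expected to match the measured 3-4 times performance difference, which also includes computation time.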

6.

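Automatic medical image segmentation based on gradient vector flow (total citations: 3; self-citations: 0; cited by others: 3)

何源 罗予频 胡东成 《计算机应用》2007,27(1):149-151

An automatic image segmentation algorithm based on gradient vector flow is proposed. The algorithm first converts the gradient vector flow field into a scalar field, which significantly simplifies seed point selection and region growing. After an initial segmentation of the image is obtained, an algorithm based on the region adjacency graph merges similar regions to produce the final segmentation. Experimental results show that the algorithm effectively solves the problem of automatically segmenting multiple target regions in medical images.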

7.

Efficient numerical schemes for gradient vector flow

Djamal Boukerroui 《Pattern recognition》2012,45(1):626-636

Since its publication more than 10 years ago, the gradient vector flow (GVF) technique has been used and adapted to various models and problems. Its effectiveness has greatly contributed to its popularity. The main drawback of GVF and its generalisations, however, is their expensive computational load and its consequence on the capture range. In this work, we propose and compare different efficient numerical schemes to solve the GVF and its generalisations.

8.

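Applied research on general-purpose computation on graphics hardware (total citations: 2; self-citations: 0; cited by others: 2)

张杨 诸昌钤 何太军 《计算机应用》2005,25(9):2192-2195

In research on graphics hardware acceleration of general-purpose computation, the computing models available under the OpenGL framework are synthesized. Experiments measure the performance of this computing architecture and analyze several ways to improve it. On this basis, a GPU-based parallel method for computing the two-dimensional discrete cosine transform is introduced. In a single rendering pass on the GPU, the method simultaneously applies the discrete cosine transform to 8×8 pixel blocks across one to four color channels of an image. Experiments show that, on the hardware used, the GPU-accelerated parallel discrete cosine transform is hundreds of times faster than a CPU implementation of the same algorithm.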
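
For reference, the 8×8 block transform that this entry accelerates can be written as two small matrix products. The sketch below is a plain CPU illustration of the orthonormal 2D DCT-II and its inverse. It is not the OpenGL implementation from the paper, and the function names and the choice of orthonormal scaling are assumptions.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row u holds the u-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


block = [[float(8 * r + c) for c in range(8)] for r in range(8)]
coeffs = dct2(block)
restored = idct2(coeffs)
```

Each 8×8 block is transformed independently of every other block, which is what makes the operation a natural fit for one-pass parallel execution across an image.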

9.

An optimized approach to histogram computation on GPU

Juan Gómez-Luna José María González-Linares José Ignacio Benavides Nicolás Guil 《Machine Vision and Applications》2013,24(5):899-908

A histogram is a compact representation of the distribution of data in an image with a wide range of applications in diverse fields. Histogram generation is an inherently sequential operation where every pixel votes in a reduced set of bins. This makes finding efficient parallel implementations very desirable but challenging, because on graphics processing units thousands of threads may be atomically updating a small number of histogram bins. Under these circumstances, collisions among threads will be very frequent and such collisions will serialize thread execution, seriously damaging the performance. In this paper we propose a highly optimized approach to histogram calculation, which tackles such performance bottlenecks. It uses histogram replication for eliminating position conflicts, padding to reduce bank conflicts, and an improved access to input data called interleaved read access. Our so-called $\mathcal{R}$-per-block approach to histogram calculation has been successfully compared to the main state-of-the-art works using four histogram-based image processing kernels and two real image databases. Results show that our proposal is between 1.4 and 15.7 times faster than every previous implementation for histograms of up to 4,096 bins.
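
The replication idea can be sketched sequentially: several private copies of the histogram absorb the votes, so that updates landing on the same bin are spread over different copies, and the copies are summed at the end. The sketch below is only an illustration of that principle. It does not model shared memory, padding, or interleaved reads, and the function name and round-robin assignment of pixels to replicas are assumptions.

```python
def histogram_replicated(pixels, bins, replicas):
    """Accumulate votes into `replicas` private histograms, then merge them."""
    subs = [[0] * bins for _ in range(replicas)]
    for i, value in enumerate(pixels):
        # "Thread" i updates its own copy, so neighbouring threads voting
        # for the same bin do not collide on one counter.
        subs[i % replicas][value] += 1
    return [sum(column) for column in zip(*subs)]


pixels = [0, 1, 1, 3, 3, 3]
hist = histogram_replicated(pixels, bins=4, replicas=2)
```

The merged result is identical to a single-histogram count. The trade-off is extra memory for the copies and a final reduction step.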

10.

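General-purpose computing model based on graphics processing units (total citations: 4; self-citations: 4; cited by others: 0)

王磊 张春燕 《计算机应用研究》2009,26(6):2356-2358

Based on the characteristics of GPU graphics processing, the parallel processing mechanisms and data mapping involved in applying the GPU to general-purpose computation are analyzed. A mapping mechanism and a general design method for a GPU general-purpose computing model are proposed. Performance tests are carried out on the GPU's throughput, data stream processing capability, and basic arithmetic capability, providing a reference for the design, implementation, and performance optimization of GPU general-purpose algorithms.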

11.

Voxelized Minkowski sum computation on the GPU with robust culling

Wei Li Sara McMains 《Computer aided design》2011,43(10):1270-1283

We present a new approach for computing the voxelized Minkowski sum (excluding any enclosed voids) of two polyhedral objects using programmable Graphics Processing Units (GPUs). We first cull out surface primitives that will not contribute to the final boundary of the Minkowski sum, analyzing and adaptively bounding the rounding errors of the culling algorithm to solve the floating point error problem. The remaining surface primitives are then rendered to depth textures along six orthogonal directions to generate an initial solid voxelization of the Minkowski sum. Finally we employ fast flood fill to find all the outside voxels. We generate both solid and surface voxelizations of Minkowski sums without enclosed voids and support high volumetric resolution of 1024³ with low video memory cost. The whole algorithm runs on the GPU and is at least one order of magnitude faster than existing boundary representation (B-rep) based algorithms. It avoids the large number of 3D Boolean operations needed in most existing algorithms and is easy to implement. The voxelized Minkowski sums can be used in a variety of applications including motion planning and penetration depth computation.

12.

Mean shift based gradient vector flow for image segmentation

Huiyu Zhou Xuelong Li Gerald Schaefer M. Emre Celebi Paul Miller 《Computer Vision and Image Understanding》2013,117(9):1004-1016

In recent years, gradient vector flow (GVF) based algorithms have been successfully used to segment a variety of 2-D and 3-D imagery. However, due to the compromise of internal and external energy forces within the resulting partial differential equations, these methods may lead to biased segmentation results. In this paper, we propose MSGVF, a mean shift based GVF segmentation algorithm that can successfully locate the correct borders. MSGVF is developed so that when the contour reaches equilibrium, the various forces resulting from the different energy terms are balanced. In addition, the smoothness constraint of image pixels is kept so that over- or under-segmentation can be reduced. Experimental results on publicly accessible datasets of dermoscopic and optic disc images demonstrate that the proposed method effectively detects the borders of the objects of interest.

13.

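Parallel optimization of a GPU-based multigrid Navier-Stokes solver

刘冰 陆忠华 李新亮 胡晓东 《数据与计算发展前沿》2013,4(3):56-67

With the surge in industrial computing demand, the field of computational fluid dynamics (CFD) is paying increasing attention to computational efficiency. Building on their in-house Navier-Stokes solver, the authors introduce a multigrid convergence acceleration algorithm and combine it with the NVIDIA GPU computing platform, accelerating CFD from both the numerical-method side and the high-performance-computing side. Numerical test cases show that the multigrid GPU solver achieves a speedup of more than 45 times in double precision over the CPU version of the code.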

14.

Automatic tongue image segmentation based on gradient vector flow and region merging

Jifeng Ning David Zhang Chengke Wu Feng Yue 《Neural computing & applications》2012,21(8):1819-1826

This paper presents a region merging-based automatic tongue segmentation method. First, gradient vector flow is modified as a scalar diffusion equation to diffuse the tongue image while preserving the edge structures of tongue body. Then the diffused tongue image is segmented into many small regions by using the watershed algorithm. Third, the maximal similarity-based region merging is used to extract the tongue body area under the control of tongue marker. Finally, the snake algorithm is used to refine the region merging result by setting the extracted tongue contour as the initial curve. The proposed method is qualitatively tested on 200 images by traditional Chinese medicine practitioners and quantitatively tested on 50 tongue images using the receiver operating characteristic analysis. Compared with the previous active contour model-based bi-elliptical deformable contour algorithm, the proposed method greatly enhances the segmentation performance, and it could reliably extract the tongue body from different types of tongue images.

15.

Continuous force field analysis for generalized gradient vector flow field

Annupan Rodtook 《Pattern recognition》2010,43(10):3522-159

We propose a modification of the generalized gradient vector flow field techniques based on a continuous force field analysis. At every iteration the generalized gradient vector flow method obtains a new, improved vector field. However, the numerical procedure always employs the original image to calculate the gradients used in the source term. The basic idea developed in this paper is to use the resulting vector field to obtain an improved edge map and use it to calculate a new gradient based source term. The improved edge map is evaluated by new continuous force field analysis techniques inspired by a preceding discrete version. The approach leads to a better convergence and better segmentation accuracy as compared to several conventional gradient vector flow type methods.

16.

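Performance analysis of vector dot products implemented on the GPU

郭雷 刘进锋 《计算机工程与应用》2012,48(2):201-202

CUDA is a relatively convenient technology for general-purpose computation on GPUs. Several CUDA-based vector dot product algorithms on the GPU are studied, and the performance of each is compared and analyzed. Experiments show that the fastest GPU algorithm is about 7 times faster than the CPU algorithm.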
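
A common way to structure a dot product for parallel hardware is an element-wise multiply followed by a pairwise (tree) reduction. The sketch below shows that pattern sequentially. It is a generic illustration with a hypothetical function name, and the entry does not say which variants the paper actually compares.

```python
def dot_tree_reduction(a, b):
    """Dot product as element-wise products followed by a pairwise reduction."""
    partial = [x * y for x, y in zip(a, b)]  # one product per "thread"
    if not partial:
        return 0
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad so the list splits into two equal halves
        half = len(partial) // 2
        # Each step halves the number of active elements, so n products
        # are combined in about log2(n) steps instead of n - 1.
        partial = [partial[i] + partial[i + half] for i in range(half)]
    return partial[0]


result = dot_tree_reduction([1, 2, 3], [4, 5, 6])
```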

17.

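Image registration based on improved gradient vector flow and maximum mutual information

王秀友 汪继文 孙道德 王峰 《计算机工程与设计》2007,28(23):5677-5679

Multimodal medical image registration based on maximum mutual information has become a focus of research in medical image processing. Low-order mutual information considers only the statistical properties of gray levels and ignores spatial information, so a method that combines the spatial information of the image's gradient vector flow with maximum mutual information is adopted for medical image registration. Experiments show that the method greatly improves registration speed and accuracy and reduces the misregistration rate.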

18.

Efficient data partitioning for the GPU computation of moment functions

Manuel Jesús Martín Requena Pablo Moscato Manuel Ujaldón 《Journal of Parallel and Distributed Computing》2014

In our previous work, we have provided tools for an efficient characterization of biomedical images using Legendre and Zernike moments, showing their relevance as biomarkers for classifying image tiles coming from bone tissue regeneration studies (Ujaldón, 2009) [24]. As part of our research quest for efficiency, we developed methods for accelerating those computations on GPUs (Martín-Requena and Ujaldón, 2011) and . This new stage of our work focuses on the efficient data partitioning to optimize the execution on many-cores and clusters of GPUs to attain gains up to three orders of magnitude when compared to the execution on multi-core CPUs of similar age and cost using 1 Mpixel images. We deploy a successive and successful chain of optimizations which exploit symmetries in trigonometric functions and access patterns to image pixels which are effectively combined with massive data parallelism on GPUs to enable (1) real-time processing for our set of input biomedical images, and (2) the use of high-resolution images in clinical practice.

19.

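Segmentation of the corpus callosum in the brain based on the gradient vector flow field

汤敏 《计算机工程与应用》2008,44(25):215-218

A medical image segmentation method based on the gradient vector flow field is presented. Whether the initial contour lies inside or outside the true boundary, the deformable contour has a wide capture range and good convergence, and after the iterative algorithm a final contour very close to the true image boundary is obtained. The method is also robust to noisy images, which makes it particularly suitable for medical image segmentation. The method is applied to segment the corpus callosum in MRI images. Experimental results show that, compared with traditional manual methods, the corpus callosum contour extracted with the gradient vector flow field method is clear and of good quality, and the time required is greatly reduced, which is of positive significance for clinical application.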

20.

Motion vector extrapolation for parallel motion estimation on GPU

Yi Gao Jun Zhou 《Multimedia Tools and Applications》2014,68(3):701-715

The powerful parallel computing ability of the Graphics Processing Unit (GPU) has shown striking superiority for accelerating motion estimation in the conventional hybrid video encoding process. Unfortunately, the motion information of the neighboring macroblocks is not available for the current macroblock, so parallel motion estimation on the GPU is not well favored. To tackle this problem while achieving a high acceleration ratio, most existing solutions ignore the motion vector cost, which inevitably causes severe rate-distortion loss. In this paper, a novel motion vector extrapolation based approach (MVEA) is presented for enhancing the rate-distortion performance of parallel motion estimation on the GPU, which is based on the study of motion vector recovery strategies for frame loss error concealment. Furthermore, the efficient implementation of MVEA on Compute Unified Device Architecture (CUDA) is also investigated. Simulation results show that MVEA can achieve a maximum peak signal-to-noise ratio enhancement of 0.8 dB with a negligible increase in computational cost.