Found 20 similar articles (search time: 0 ms)
1.
The gradient vector flow (GVF) deformable model was introduced by Xu and Prince as an effective approach to overcome the limited capture range of classical deformable models and their inability to progress into boundary concavities. It has found many important applications in medical image processing. The simple iterative method proposed in the original work on GVF, however, is slow to converge. A new multigrid method is proposed for GVF computation on 2D and 3D images. Experimental results show that the new implementation improves the computational speed by at least an order of magnitude, which facilitates the application of GVF deformable models to large medical images.
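The "simple iterative method" referred to above can be illustrated with an explicit update of the GVF diffusion equations, u ← u + Δt[μ∇²u − (u − f_x)(f_x² + f_y²)], and likewise for v with f_y. The following is only a minimal sketch under stated assumptions: pure Python lists, unit grid spacing, replicated borders, and a hypothetical function name `gvf_explicit` with illustrative values for `mu` and `dt`. It is not the multigrid method of the abstract.

```python
def gvf_explicit(edge_map, mu=0.2, iterations=50, dt=0.5):
    """Explicit GVF iteration on a 2D edge map given as a list of rows.

    Assumptions of this sketch: unit grid spacing, replicated borders,
    and a time step small enough for the explicit scheme to stay stable.
    """
    h, w = len(edge_map), len(edge_map[0])

    def at(field, y, x):
        # Clamp indices so that border values are replicated.
        return field[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference gradient (fx, fy) of the edge map.
    fx = [[0.5 * (at(edge_map, y, x + 1) - at(edge_map, y, x - 1))
           for x in range(w)] for y in range(h)]
    fy = [[0.5 * (at(edge_map, y + 1, x) - at(edge_map, y - 1, x))
           for x in range(w)] for y in range(h)]
    mag2 = [[fx[y][x] ** 2 + fy[y][x] ** 2 for x in range(w)]
            for y in range(h)]

    # The field (u, v) starts as the gradient and is then diffused.
    u = [row[:] for row in fx]
    v = [row[:] for row in fy]
    for _ in range(iterations):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                lap_u = (at(u, y, x + 1) + at(u, y, x - 1)
                         + at(u, y + 1, x) + at(u, y - 1, x) - 4.0 * u[y][x])
                lap_v = (at(v, y, x + 1) + at(v, y, x - 1)
                         + at(v, y + 1, x) + at(v, y - 1, x) - 4.0 * v[y][x])
                new_u[y][x] = u[y][x] + dt * (
                    mu * lap_u - mag2[y][x] * (u[y][x] - fx[y][x]))
                new_v[y][x] = v[y][x] + dt * (
                    mu * lap_v - mag2[y][x] * (v[y][x] - fy[y][x]))
        u, v = new_u, new_v
    return u, v
```

Each sweep spreads information by only one grid cell, which is one way to see why the plain iteration converges slowly on large images and why a multigrid solver can help.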
2.
3.
4.
Hiroki Tokura Toru Fujita Koji Nakano Yasuaki Ito Jacir L. Bordim 《The Journal of supercomputing》2018,74(4):1510-1521
Row-wise and column-wise prefix-sum computation of a matrix has many applications in image processing, such as computing the summed area table and the Euclidean distance map. It is known that the prefix-sums of a one-dimensional array can be computed efficiently on the GPU. Hence, row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for column-wise prefix-sums because it performs inefficient strided access to the global memory. The main contribution of this paper is an almost optimal column-wise prefix-sum algorithm on the GPU. Quite surprisingly, experimental results using an NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2–6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal.
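To make the relationship between the two prefix-sum passes and the summed area table concrete, here is a small sequential sketch in Python. The function names are illustrative only, and the code does not attempt to reproduce the GPU algorithm of the paper; it only shows what the row-wise and column-wise passes compute.

```python
from itertools import accumulate


def row_prefix_sums(matrix):
    """Inclusive prefix sums along each row."""
    return [list(accumulate(row)) for row in matrix]


def column_prefix_sums(matrix):
    """Inclusive prefix sums down each column."""
    result = []
    running = [0] * len(matrix[0])
    for row in matrix:
        # Add the current row to the running column totals.
        running = [acc + value for acc, value in zip(running, row)]
        result.append(running)
    return result


def summed_area_table(matrix):
    """Row-wise pass followed by a column-wise pass."""
    return column_prefix_sums(row_prefix_sums(matrix))


def box_sum(sat, top, left, bottom, right):
    """Sum of the inclusive rectangle using at most four table lookups."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]
    return total
```

In a row-major layout, the column-wise pass touches elements that are one full row apart in memory, which is the strided access pattern the abstract identifies as the bottleneck on the GPU.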
5.
GPGPU has drawn much attention for accelerating non-graphics applications. A simulation using the D3Q19 model of the lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library. The GPU code runs on the multi-node GPU cluster TSUBAME of Tokyo Institute of Technology, which is equipped with a total of 680 NVIDIA Tesla GPUs. For multi-GPU computation, a domain partitioning method is used to distribute the computational load across multiple GPUs, and GPU-to-GPU data transfer becomes a severe overhead for the total performance. The parallel results for 1D, 2D and 3D domain partitioning were compared and analyzed. With a 384 × 384 × 384 mesh and 96 GPUs, the performance with 3D partitioning is about 3-4 times higher than that with 1D partitioning. The performance curve deviates from the ideal line because of the long communication time between GPUs. To hide the communication time, we introduced a technique that overlaps computation and communication, in which the data transfer and the computation are executed simultaneously in two streams. Using 8-96 GPUs, the performance increases by a factor of about 1.1-1.3 in the overlapping mode. As a benchmark problem, a large-scale computation of flow around a sphere at Re = 13,000 was carried out successfully using a 2000 × 1000 × 1000 mesh and 100 GPUs. For this computation with 2 Giga lattice nodes, processing 100,000 time steps took 6.0 h. Under this condition, the computation time (2.79 h) and the data communication time (3.06 h) are almost the same.
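One way to see why the partitioning dimension matters is to count the cells on the faces that an interior subdomain shares with its neighbours. The sketch below is a back-of-the-envelope estimate only: the function name `halo_cells` and the 4 × 4 × 6 process grid are hypothetical (the abstract does not state the actual layout), and the count ignores which D3Q19 distributions are really exchanged per face.

```python
def halo_cells(mesh, parts):
    """Face cells an interior subdomain exchanges with its neighbours.

    mesh  -- global mesh size (nx, ny, nz)
    parts -- number of subdomains along each axis (px, py, pz)
    Assumes the mesh divides evenly and counts two faces per split axis.
    """
    nx, ny, nz = (m // p for m, p in zip(mesh, parts))
    cells = 0
    if parts[0] > 1:
        cells += 2 * ny * nz  # faces perpendicular to x
    if parts[1] > 1:
        cells += 2 * nx * nz  # faces perpendicular to y
    if parts[2] > 1:
        cells += 2 * nx * ny  # faces perpendicular to z
    return cells


mesh = (384, 384, 384)
slab_1d = halo_cells(mesh, (96, 1, 1))   # 1D partitioning into 96 slabs
block_3d = halo_cells(mesh, (4, 4, 6))   # hypothetical 3D layout for 96 GPUs
print(slab_1d, block_3d, slab_1d / block_3d)
```

Under these assumptions the 1D slabs exchange several times more face cells per GPU than the 3D blocks, which is qualitatively consistent with the reported advantage of 3D partitioning, although the measured factor also depends on message counts and latency.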
6.
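Automatic segmentation of medical images based on gradient vector flow  Total citations: 3, self-citations: 0, citations by others: 3
An automatic image segmentation algorithm based on gradient vector flow is proposed. The algorithm first converts the gradient vector flow field into a scalar field, which significantly simplifies the seed-point selection and region-growing steps. After an initial segmentation of the image is obtained, an algorithm based on the region adjacency graph merges similar regions to produce the final segmentation. Experimental results show that the algorithm effectively solves the problem of automatically segmenting multiple target regions in medical images.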
7.
Djamal Boukerroui 《Pattern recognition》2012,45(1):626-636
Since its publication more than 10 years ago, the gradient vector flow (GVF) technique has been used and adapted to various models and problems. Its effectiveness has greatly contributed to its popularity. The main drawback of GVF and its generalisations, however, is their expensive computational load and its consequence for the capture range. In this work, we propose and compare different efficient numerical schemes to solve the GVF and its generalisations.
8.
9.
Juan Gómez-Luna José María González-Linares José Ignacio Benavides Nicolás Guil 《Machine Vision and Applications》2013,24(5):899-908
A histogram is a compact representation of the distribution of data in an image, with a full range of applications in diverse fields. Histogram generation is an inherently sequential operation in which every pixel votes in a reduced set of bins. This makes efficient parallel implementations very desirable but challenging, because on graphics processing units thousands of threads may be atomically updating a small number of histogram bins. Under these circumstances, collisions among threads are very frequent and serialize thread execution, seriously damaging performance. In this paper we propose a highly optimized approach to histogram calculation that tackles these performance bottlenecks. It uses histogram replication to eliminate position conflicts, padding to reduce bank conflicts, and an improved access to input data called interleaved read access. Our so-called $\mathcal{R}$-per-block approach to histogram calculation has been successfully compared to the main state-of-the-art works using four histogram-based image processing kernels and two real image databases. Results show that our proposal is between 1.4 and 15.7 times faster than every previous implementation for histograms of up to 4,096 bins.
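The idea of histogram replication can be sketched sequentially: instead of all threads voting into one shared histogram, votes are spread over several private copies that are summed at the end. The following Python emulation is an illustration only. The name `replicated_histogram` and the rule assigning thread `i` to replica `i % replicas` are assumptions of this sketch, and padding and interleaved read access are not modelled.

```python
def replicated_histogram(pixels, num_bins, replicas):
    """Emulate histogram replication with `replicas` sub-histograms.

    Each emulated thread (one per pixel) votes into the copy selected by
    its index, so threads mapped to different copies never update the
    same counter. The copies are reduced bin by bin at the end.
    """
    sub_histograms = [[0] * num_bins for _ in range(replicas)]
    for thread_id, value in enumerate(pixels):
        sub_histograms[thread_id % replicas][value] += 1
    # Final reduction: add the replicas together, bin by bin.
    return [sum(column) for column in zip(*sub_histograms)]
```

More replicas mean fewer collisions per copy but a larger memory footprint and a longer final reduction, which is the trade-off any replication scheme has to balance.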
10.
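General-purpose computing model based on graphics processing units  Total citations: 4, self-citations: 4, citations by others: 0
Based on the characteristics of GPU graphics processing, this paper analyzes the parallel processing mechanism and data mapping involved in applying GPUs to general-purpose computing, and proposes a mapping mechanism and a general design method for a GPU general-purpose computing model. It also benchmarks the GPU's throughput, data-stream processing capability, and basic mathematical operation performance, providing a reference for the design, implementation, and performance optimization of GPU general-purpose computing algorithms.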
11.
We present a new approach for computing the voxelized Minkowski sum (excluding any enclosed voids) of two polyhedral objects using programmable Graphics Processing Units (GPUs). We first cull out surface primitives that will not contribute to the final boundary of the Minkowski sum, analyzing and adaptively bounding the rounding errors of the culling algorithm to solve the floating point error problem. The remaining surface primitives are then rendered to depth textures along six orthogonal directions to generate an initial solid voxelization of the Minkowski sum. Finally we employ fast flood fill to find all the outside voxels. We generate both solid and surface voxelizations of Minkowski sums without enclosed voids and support high volumetric resolution of $1024^3$ with low video memory cost. The whole algorithm runs on the GPU and is at least one order of magnitude faster than existing boundary representation (B-rep) based algorithms. It avoids the large number of 3D Boolean operations needed in most existing algorithms and is easy to implement. The voxelized Minkowski sums can be used in a variety of applications including motion planning and penetration depth computation.
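The flood-fill step can be illustrated on a small voxel grid: starting from a voxel known to be outside, a breadth-first search marks every empty voxel reachable through the six face neighbours, and whatever remains unmarked is treated as solid, including enclosed voids. This is a sequential CPU sketch with a hypothetical function name and a corner seed chosen for illustration, not the GPU flood fill of the paper.

```python
from collections import deque


def outside_voxels(occupied, size, seed=(0, 0, 0)):
    """Return the set of empty voxels connected to `seed`.

    occupied -- set of (x, y, z) voxels that block the fill
    size     -- edge length of the cubic grid
    The seed is assumed to lie outside the object.
    """
    if seed in occupied:
        raise ValueError("seed voxel must be empty")
    outside = {seed}
    queue = deque([seed])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            neighbour = (x + dx, y + dy, z + dz)
            inside_grid = all(0 <= c < size for c in neighbour)
            if (inside_grid and neighbour not in occupied
                    and neighbour not in outside):
                outside.add(neighbour)
                queue.append(neighbour)
    return outside
```

Taking the complement of the returned set gives a solid voxelization in which enclosed voids are filled, matching the "excluding any enclosed voids" convention stated in the abstract.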
12.
Huiyu Zhou Xuelong Li Gerald Schaefer M. Emre Celebi Paul Miller 《Computer Vision and Image Understanding》2013,117(9):1004-1016
In recent years, gradient vector flow (GVF) based algorithms have been successfully used to segment a variety of 2-D and 3-D imagery. However, due to the compromise of internal and external energy forces within the resulting partial differential equations, these methods may lead to biased segmentation results. In this paper, we propose MSGVF, a mean shift based GVF segmentation algorithm that can successfully locate the correct borders. MSGVF is developed so that when the contour reaches equilibrium, the various forces resulting from the different energy terms are balanced. In addition, the smoothness constraint of image pixels is kept so that over- or under-segmentation can be reduced. Experimental results on publicly accessible datasets of dermoscopic and optic disc images demonstrate that the proposed method effectively detects the borders of the objects of interest.
13.
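With the surge in industrial computing demand, the field of computational fluid dynamics (CFD) is paying increasing attention to computational efficiency. Building on an in-house Navier-Stokes solver, the authors introduce a multigrid convergence-acceleration algorithm and combine it with the NVIDIA GPU computing platform, accelerating CFD from both the numerical-method side and the high-performance-computing side. Numerical test cases show that the multigrid-based GPU solver achieves a speedup of more than 45 times in double precision relative to the CPU version of the code.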
14.
This paper presents a region merging-based automatic tongue segmentation method. First, gradient vector flow is modified as a scalar diffusion equation to diffuse the tongue image while preserving the edge structures of the tongue body. Second, the diffused tongue image is segmented into many small regions using the watershed algorithm. Third, maximal similarity-based region merging is used to extract the tongue body area under the control of a tongue marker. Finally, the snake algorithm refines the region merging result, with the extracted tongue contour as the initial curve. The proposed method is qualitatively tested on 200 images by traditional Chinese medicine practitioners and quantitatively tested on 50 tongue images using receiver operating characteristic analysis. Compared with the previous active contour model-based bi-elliptical deformable contour algorithm, the proposed method greatly enhances segmentation performance and reliably extracts the tongue body from different types of tongue images.
15.
Annupan Rodtook 《Pattern recognition》2010,43(10):3522-159
We propose a modification of the generalized gradient vector flow field techniques based on a continuous force field analysis. At every iteration the generalized gradient vector flow method obtains a new, improved vector field. However, the numerical procedure always employs the original image to calculate the gradients used in the source term. The basic idea developed in this paper is to use the resulting vector field to obtain an improved edge map and use it to calculate a new gradient based source term. The improved edge map is evaluated by new continuous force field analysis techniques inspired by a preceding discrete version. The approach leads to a better convergence and better segmentation accuracy as compared to several conventional gradient vector flow type methods.
16.
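CUDA is a relatively simple technology for general-purpose computing on GPUs. Several CUDA-based vector dot product algorithms on the GPU are studied, and the performance of each is compared and analyzed. Experiments show that the fastest GPU algorithm is about 7 times faster than the CPU algorithm.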
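A common building block of GPU dot products is a tree-style reduction: element-wise products are formed first, then pairs of partial sums are combined in rounds until one value remains. The abstract does not say which variants were studied, so the following Python emulation is only an assumed, generic illustration of that pattern, with a hypothetical function name.

```python
def dot_tree_reduction(a, b):
    """Dot product computed as products followed by a pairwise reduction.

    Each round emulates one parallel step in which element i is combined
    with element i + half, halving the number of partial sums.
    """
    partial = [x * y for x, y in zip(a, b)]
    if not partial:
        return 0
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad so the list splits into two halves
        half = len(partial) // 2
        partial = [partial[i] + partial[i + half] for i in range(half)]
    return partial[0]
```

For n products the reduction needs about log2(n) rounds instead of n − 1 sequential additions, which is where the parallel speed-up comes from when each round runs across many threads.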
17.
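Multimodal medical image registration based on maximum mutual information has become a hot topic in medical image processing. Low-order mutual information considers only the statistical properties of gray levels and ignores spatial information, so a method that combines the spatial information of the image's gradient vector flow with maximum mutual information is used for medical image registration. Experiments show that the method greatly improves registration speed and accuracy and reduces the misregistration rate.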
18.
Manuel Jesús Martín Requena Pablo Moscato Manuel Ujaldón 《Journal of Parallel and Distributed Computing》2014
In our previous work, we provided tools for an efficient characterization of biomedical images using Legendre and Zernike moments, showing their relevance as biomarkers for classifying image tiles coming from bone tissue regeneration studies (Ujaldón, 2009) [24]. As part of our research quest for efficiency, we developed methods for accelerating those computations on GPUs (Martín-Requena and Ujaldón, 2011). This new stage of our work focuses on efficient data partitioning to optimize execution on many-cores and clusters of GPUs, attaining gains of up to three orders of magnitude compared to execution on multi-core CPUs of similar age and cost using 1 Mpixel images. We deploy a successive and successful chain of optimizations that exploit symmetries in trigonometric functions and access patterns to image pixels; these are combined with massive data parallelism on GPUs to enable (1) real-time processing for our set of input biomedical images, and (2) the use of high-resolution images in clinical practice.
19.
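Tang Min 《Computer Engineering and Applications》2008,44(25):215-218
This paper introduces a medical image segmentation method based on the gradient vector flow field. Whether the initial contour lies inside or outside the true boundary, the deformable contour has a wide capture range and good convergence, and after the iterative algorithm it yields a final contour very close to the true image boundary. The method is also robust to noisy images, which makes it particularly suitable for medical image segmentation. The method is applied to segment the corpus callosum in MRI images. Experimental results show that, compared with the traditional manual approach, the corpus callosum contours extracted with the gradient vector flow method are clear and of good quality, and the time required is greatly reduced, which is of positive significance for clinical use.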
20.
The powerful parallel computing ability of the Graphics Processing Unit (GPU) has shown striking superiority for accelerating motion estimation in the conventional hybrid video encoding process. Unfortunately, the motion information of the neighboring macroblocks is not available for the current macroblock, which makes parallel motion estimation on the GPU difficult. To tackle this problem while achieving a high acceleration ratio, most existing solutions ignore the motion vector cost, which inevitably causes severe rate-distortion loss. In this paper, a novel motion vector extrapolation based approach (MVEA) is presented for enhancing the rate-distortion performance of parallel motion estimation on the GPU, based on a study of motion vector recovery strategies for frame loss error concealment. Furthermore, an efficient implementation of MVEA on Compute Unified Device Architecture (CUDA) is also investigated. Simulation results show that MVEA can achieve a maximum peak signal-to-noise ratio enhancement of 0.8 dB with a negligible increase in computational cost.