Similar Literature
20 similar articles found (search time: 15 ms)
1.
We present an efficient algorithm to perform approximate offsetting operations on geometric models using GPUs. Our approach approximates the boundary of an object with point samples and computes the offset by merging the balls centered at these points. The underlying approach uses Layered Depth Images (LDI) to organize the samples into structured points and performs parallel computations using multiple cores. We use spatial hashing to accelerate intersection queries and balance the workload among various cores. Furthermore, the problem of offsetting with a large distance is decomposed into successive offsetting using smaller distances. We derive bounds on the accuracy of offset computation as a function of the sampling rate of the LDI and the offset distance. In practice, our GPU-based algorithm can accurately compute offsets of models represented using hundreds of thousands of points in a few seconds on a GeForce GTX 580 GPU. We observe a speedup of more than 100 times over prior serial CPU-based approximate offset computation algorithms.
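The core idea of approximating the offset as a union of balls of radius d centered at boundary samples can be illustrated with a brute-force CUDA sketch that classifies the voxels of a grid as inside or outside the offset. It deliberately omits the paper's LDI organization and spatial hashing, and the grid size, sample data and names are illustrative assumptions rather than the authors' implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Brute-force sketch of offsetting as a union of balls: a voxel is inside
// the offset if it lies within distance d of any boundary sample point.
__global__ void offsetVoxels(const float3* samples, int n, float d,
                             int3 dims, float h, unsigned char* inside)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int total = dims.x * dims.y * dims.z;
    if (idx >= total) return;
    int z = idx / (dims.x * dims.y);
    int y = (idx / dims.x) % dims.y;
    int x = idx % dims.x;
    float3 p = make_float3(x * h, y * h, z * h);
    float d2 = d * d;
    unsigned char hit = 0;
    for (int i = 0; i < n; ++i) {      // O(N) per voxel; the paper replaces
        float dx = p.x - samples[i].x; // this loop with an LDI-organized
        float dy = p.y - samples[i].y; // sample set plus spatial hashing
        float dz = p.z - samples[i].z;
        if (dx * dx + dy * dy + dz * dz <= d2) { hit = 1; break; }
    }
    inside[idx] = hit;
}

int main()
{
    const int n = 2;
    const int3 dims = make_int3(16, 16, 16);
    float3* samples;  unsigned char* inside;
    cudaMallocManaged(&samples, n * sizeof(float3));
    cudaMallocManaged(&inside, dims.x * dims.y * dims.z);
    samples[0] = make_float3(0.5f, 0.5f, 0.5f);   // two sample points on a
    samples[1] = make_float3(1.0f, 0.5f, 0.5f);   // hypothetical boundary
    int total = dims.x * dims.y * dims.z;
    offsetVoxels<<<(total + 255) / 256, 256>>>(samples, n, 0.3f, dims, 0.1f, inside);
    cudaDeviceSynchronize();
    int count = 0;
    for (int i = 0; i < total; ++i) count += inside[i];
    printf("voxels inside offset: %d\n", count);
    return 0;
}
```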

2.
In this study, we describe a GPU-based filter for image denoising whose principle rests on Matheron's level-sets theory, first introduced in 1975 but rarely implemented because of its high computation cost. We use the fact that, within a natural image, significant contours of objects coincide with parts of the image level-lines. The presented algorithm assumes a priori knowledge of the corrupting noise type and uses the polygonal level-line modeling constraint to estimate the gray level of each pixel of the denoised image by local maximum-likelihood optimization. Over the 512 × 512 pixel test images, the freely available implementation of the state-of-the-art BM3D algorithm achieves mean improvements of 9.56 dB in peak signal-to-noise ratio and 36 % in mean structural similarity index in 4.3 s. Over the same images, our implementation features a high quality/runtime ratio, with mean improvements of 7.14 dB and 30 % in 9 ms, which is 470 times as fast and potentially allows processing high-definition video images at 19 fps.

3.
4.
A mathematical formulation of the 3D vortex method has been developed for calculation on MDGRAPE-2, a special-purpose computer originally designed for molecular dynamics simulations. We assessed this hardware on a few representative problems and compared the results obtained with and without it. It is found that the generation of appropriate function tables, which are used when calling the libraries embedded in MDGRAPE-2, is of primary importance for retaining accuracy. The error arising from the approximation is evaluated by computing a pair of vortex rings impinging on each other. Consequently, an acceleration of about 50 times is achieved by MDGRAPE-2, while the errors in statistical quantities such as kinetic energy and enstrophy remain negligible.
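The pairwise interaction that such special-purpose hardware evaluates through function tables is the regularized Biot–Savart summation over vortex elements. Below is a direct O(N^2) CUDA sketch of that summation for illustration only; the smoothing parameter, the toy ring of elements and all names are assumptions, not the paper's MDGRAPE-2 code.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Direct O(N^2) evaluation of the regularized Biot-Savart law for vortex
// elements: u(x_i) = (1/4pi) * sum_j alpha_j x (x_i - x_j) / (|r|^2 + s^2)^{3/2}.
__device__ float3 cross3(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

__global__ void biotSavart(const float3* pos, const float3* alpha, int n,
                           float s2, float3* vel)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 u = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {
        float3 r = make_float3(pos[i].x - pos[j].x,
                               pos[i].y - pos[j].y,
                               pos[i].z - pos[j].z);
        float d2 = r.x * r.x + r.y * r.y + r.z * r.z + s2;  // smoothed distance
        float w = rsqrtf(d2 * d2 * d2);                     // d2^{-3/2}
        float3 c = cross3(alpha[j], r);
        u.x += c.x * w;  u.y += c.y * w;  u.z += c.z * w;
    }
    float k = 1.f / (4.f * 3.14159265f);
    vel[i] = make_float3(k * u.x, k * u.y, k * u.z);
}

int main()
{
    const int n = 256;
    float3 *pos, *alpha, *vel;
    cudaMallocManaged(&pos, n * sizeof(float3));
    cudaMallocManaged(&alpha, n * sizeof(float3));
    cudaMallocManaged(&vel, n * sizeof(float3));
    for (int i = 0; i < n; ++i) {                    // elements on a unit circle,
        float t = 2.f * 3.14159265f * i / n;         // a toy "vortex ring"
        pos[i] = make_float3(cosf(t), sinf(t), 0.f);
        alpha[i] = make_float3(-sinf(t) / n, cosf(t) / n, 0.f);
    }
    biotSavart<<<(n + 127) / 128, 128>>>(pos, alpha, n, 1e-4f, vel);
    cudaDeviceSynchronize();
    printf("u[0] = (%g, %g, %g)\n", vel[0].x, vel[0].y, vel[0].z);
    return 0;
}
```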

5.
Fast GPU-based Adaptive Tessellation with CUDA

6.
7.
Metaballs are implicit surfaces widely used to model curved objects, represented by the isosurface of a density field defined by a set of points. Recently, the results of particle-based simulations have often been visualized using a large number of metaballs; however, such visualizations have high rendering costs. In this paper we propose a fast technique for rendering metaballs on the GPU. Instead of using polygonization, the isosurface is evaluated directly in a per-pixel manner. For such evaluation, all metaballs contributing to the isosurface need to be extracted along each viewing ray, within the limited memory of GPUs. We handle this by keeping a list of the metaballs contributing to the isosurface and updating it efficiently. Our method requires neither expensive precomputation nor the acceleration data structures often used in existing ray-tracing techniques. With several optimizations, we can display a large number of moving metaballs quickly.
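For reference, the per-pixel evaluation the paper starts from can be sketched as a brute-force CUDA ray marcher that sums a finite-support falloff over all metaballs at every step. The paper's contribution, maintaining a per-ray list of contributing metaballs, is exactly what removes this all-balls loop; the falloff function, iso level, step count and ball data below are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Brute-force per-pixel metaball isosurface: each thread marches a ray along
// +z and sums a simple finite-support falloff over *all* balls at each step.
__device__ float field(float3 p, const float4* balls, int n)  // xyz = center, w = radius
{
    float f = 0.f;
    for (int i = 0; i < n; ++i) {
        float dx = p.x - balls[i].x, dy = p.y - balls[i].y, dz = p.z - balls[i].z;
        float t = 1.f - (dx * dx + dy * dy + dz * dz) / (balls[i].w * balls[i].w);
        if (t > 0.f) f += t * t;        // zero outside the ball's radius
    }
    return f;
}

__global__ void raymarch(const float4* balls, int n, int W, int H,
                         float iso, float* depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    float3 p = make_float3((x + 0.5f) / W, (y + 0.5f) / H, 0.f);  // orthographic camera
    float hit = -1.f;
    for (int step = 0; step < 256; ++step) {
        p.z = step * (1.f / 256.f);
        if (field(p, balls, n) >= iso) { hit = p.z; break; }
    }
    depth[y * W + x] = hit;             // -1 means the ray missed the isosurface
}

int main()
{
    const int n = 2, W = 64, H = 64;
    float4* balls; float* depth;
    cudaMallocManaged(&balls, n * sizeof(float4));
    cudaMallocManaged(&depth, W * H * sizeof(float));
    balls[0] = make_float4(0.4f, 0.5f, 0.5f, 0.25f);
    balls[1] = make_float4(0.6f, 0.5f, 0.5f, 0.25f);
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    raymarch<<<grid, block>>>(balls, n, W, H, 0.25f, depth);
    cudaDeviceSynchronize();
    printf("depth at center pixel: %g\n", depth[(H / 2) * W + W / 2]);
    return 0;
}
```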

8.
Conclusions Most image and signal processing algorithms are local algorithms, regardless of the specific physical nature of the images and signals. Because of the strong correlation of the operations required to determine two neighboring elements, image processing can be substantially accelerated by using fast algorithms. Some considerations regarding the construction of such algorithms on the basis of recursive computation have been examined in this paper. The development of specific algorithms for fast local image processing requires further work, because we still do not have the explicit form of the algorithm for the determination of a general local function. We have discussed some particular examples of fast algorithms, including linear and median filtering. Linear filtering and algorithms with linear function evaluation are often used for digital image and signal processing in various fields. The general form of linear filtering can be determined using piecewise-linear approximation, which also ensures that the number of operations is independent of the window size, albeit at the cost of a certain loss of accuracy. By sacrificing some accuracy of linear filtering we accelerate the processing; conversely, higher accuracy is achieved at the cost of reduced processing speed. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 146–157, January–February, 1994.
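A minimal example of the recursive computation discussed here is the running (sliding-sum) box filter, whose per-pixel cost is independent of the window size because each new window sum is obtained from the previous one. The CUDA sketch below illustrates that principle under assumed names, sizes and border handling; it is not code from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Recursive (running-sum) box filter along rows: each output sample reuses
// the previous window sum, so the cost per pixel does not depend on the
// window half-width r. One thread per row; borders are clamped to the edge.
__global__ void rowBoxFilter(const float* in, float* out, int W, int H, int r)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= H) return;
    const float* row = in + y * W;
    float* orow = out + y * W;
    float sum = 0.f;
    for (int x = -r; x <= r; ++x)                 // initial window centered at x = 0
        sum += row[min(max(x, 0), W - 1)];
    float norm = 1.f / (2 * r + 1);
    for (int x = 0; x < W; ++x) {
        orow[x] = sum * norm;
        int enter = min(x + r + 1, W - 1);        // sample entering the window
        int leave = max(x - r, 0);                // sample leaving the window
        sum += row[enter] - row[leave];
    }
}

int main()
{
    const int W = 256, H = 4, r = 7;
    float *in, *out;
    cudaMallocManaged(&in, W * H * sizeof(float));
    cudaMallocManaged(&out, W * H * sizeof(float));
    for (int i = 0; i < W * H; ++i) in[i] = (i % W < W / 2) ? 0.f : 1.f;  // step edge
    rowBoxFilter<<<1, H>>>(in, out, W, H, r);
    cudaDeviceSynchronize();
    printf("filtered value at the edge: %g\n", out[W / 2]);
    return 0;
}
```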

9.
Fast inverse offset computation using polygon rendering hardware
Mold and die parts are usually fabricated using 3-axis numerically controlled milling machines with ball-end, flat-end or round-end cutters. The cutter location (CL) surface, which represents the trajectory of the cutter's reference point as the cutter slides over a part, is important for preventing the gouging problem. This surface is equivalent to the inverse offset shape of the part, which is the top surface of the swept volume of the inverse cutter moving around the part surface. The author proposes a fast method for computing the inverse offset shape of a polyhedral part using the hidden-surface-elimination mechanism of polygon rendering hardware. In this method, the CL surface is obtained by simply rendering the component objects of the swept volume. An experimental program is implemented and demonstrated.
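The same CL surface can be written down directly as a height-map maximum: for a ball-end cutter of radius R, the tip height at each grid cell is the largest height at which the cutter ball touches some point of the part. The CUDA sketch below computes this maximum by brute force over a point-sampled part, whereas the paper lets the Z-buffer of the rendering hardware take the maximum while rasterizing the swept volume. Grid resolution, sample data and names are assumptions.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Brute-force inverse-offset (cutter-location) height map for a ball-end
// cutter of radius R: each grid cell keeps the maximum tip height at which
// the cutter ball just touches some part sample.
__global__ void inverseOffset(const float3* part, int n, float R,
                              int W, int H, float cell, float* clHeight)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    float cx = (x + 0.5f) * cell, cy = (y + 0.5f) * cell;
    float best = -1e30f;
    for (int i = 0; i < n; ++i) {
        float dx = cx - part[i].x, dy = cy - part[i].y;
        float d2 = dx * dx + dy * dy;
        if (d2 <= R * R) {
            float tip = part[i].z + sqrtf(R * R - d2) - R;  // tip height when the
            best = fmaxf(best, tip);                        // ball touches sample i
        }
    }
    clHeight[y * W + x] = best;   // -1e30 marks cells the cutter never touches
}

int main()
{
    const int n = 1, W = 32, H = 32;
    float3* part; float* cl;
    cudaMallocManaged(&part, n * sizeof(float3));
    cudaMallocManaged(&cl, W * H * sizeof(float));
    part[0] = make_float3(0.5f, 0.5f, 0.2f);          // a single part sample
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    inverseOffset<<<grid, block>>>(part, n, 0.1f, W, H, 1.f / 32.f, cl);
    cudaDeviceSynchronize();
    printf("CL height above the sample: %g\n", cl[16 * W + 16]);
    return 0;
}
```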

10.
The semi-classical atomic-orbital close-coupling method is a well-known approach for the calculation of cross sections in ion–atom collisions. It strongly relies on the fast and stable computation of exchange integrals. We present an upgrade to earlier implementations of the Fourier-transform method. For this purpose, we implement an extensive library for symbolic storage of polynomials, relying on sophisticated tree structures to allow fast manipulation and numerically stable evaluation. Using this library, we considerably speed up the creation and computation of exchange integrals. This enables us to compute cross sections for more complex collision systems.

Program summary

Program title: TXINT
Catalogue identifier: AEHS_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHS_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 12 332
No. of bytes in distributed program, including test data, etc.: 157 086
Distribution format: tar.gz
Programming language: Fortran 95
Computer: All with a Fortran 95 compiler
Operating system: All with a Fortran 95 compiler
RAM: Depends heavily on input, usually less than 100 MiB
Classification: 16.10
Nature of problem: Analytical calculation of one- and two-center exchange matrix elements for the close-coupling method in the impact parameter model.
Solution method: Similar to the code of Hansen and Dubois [1], we use the Fourier-transform method suggested by Shakeshaft [2] to compute the integrals. However, we considerably speed up the calculation using a library for symbolic manipulation of polynomials.
Restrictions: We restrict ourselves to a defined collision system in the impact parameter model.
Unusual features: A library for symbolic manipulation of polynomials, where polynomials are stored in a space-saving left-child right-sibling binary tree (see the sketch after this summary). This provides stable numerical evaluation and fast mutation while maintaining full compatibility with the original code.
Additional comments: This program makes heavy use of the features introduced by the Fortran 90 standard, most prominently pointers, derived types and allocatable structures, and a small portion of Fortran 95. Only newer compilers support these features. The following compilers support all features needed by the program:
  • GNU Fortran Compiler “gfortran” from version 4.3.0
  • GNU Fortran 95 Compiler “g95” from version 4.2.0
  • Intel Fortran Compiler “ifort” from version 11.0
Running time: Heavily dependent on input, usually less than one CPU second.
References:
[1] J.-P. Hansen, A. Dubois, Comput. Phys. Commun. 67 (1992) 456.
[2] R. Shakeshaft, J. Phys. B: At. Mol. Opt. Phys. 8 (1975) L134.
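To illustrate the "unusual feature" mentioned in the summary, here is a small host-side sketch (in C++, matching the other examples in this list rather than the Fortran 95 of TXINT) of how a multivariate polynomial can be stored in a left-child right-sibling tree and evaluated recursively. The node layout, helper names and example polynomial are invented for illustration and do not reproduce the library's actual data structures.

```cuda
#include <cstdio>

// Left-child right-sibling polynomial tree: each node carries an exponent of
// the current variable, its child is the coefficient polynomial in the
// remaining variables (or a constant at the last variable), and its sibling
// is the next term in the same variable.
struct PolyNode {
    int       exp;        // exponent of the current variable
    double    coeff;      // constant coefficient, used only at the last level
    PolyNode* child;      // coefficient polynomial in the remaining variables
    PolyNode* sibling;    // next term in the same variable
};

// Evaluate the (sub)polynomial rooted at `node` at variable level `level`
// of the point `vars` (an array of nvars values).
double evalPoly(const PolyNode* node, const double* vars, int level, int nvars)
{
    double sum = 0.0;
    for (const PolyNode* t = node; t != nullptr; t = t->sibling) {
        double c = (t->child && level + 1 < nvars)
                       ? evalPoly(t->child, vars, level + 1, nvars)
                       : t->coeff;
        double p = 1.0;
        for (int e = 0; e < t->exp; ++e) p *= vars[level];
        sum += c * p;
    }
    return sum;
}

int main()
{
    // Encode p(x, y) = 3*x^2*y + 2*y^3 + 5 with x as level 0 and y as level 1.
    PolyNode yA  = {1, 3.0, nullptr, nullptr};       // 3 * y
    PolyNode yB1 = {3, 2.0, nullptr, nullptr};       // 2 * y^3
    PolyNode yB2 = {0, 5.0, nullptr, nullptr};       // 5
    yB1.sibling = &yB2;
    PolyNode xA = {2, 0.0, &yA, nullptr};            // x^2 * (3*y)
    PolyNode xB = {0, 0.0, &yB1, nullptr};           // x^0 * (2*y^3 + 5)
    xA.sibling = &xB;
    double vars[2] = {2.0, 1.5};                     // evaluate at x = 2, y = 1.5
    printf("p(2, 1.5) = %g\n", evalPoly(&xA, vars, 0, 2));  // expect 29.75
    return 0;
}
```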

11.
The gradient vector flow (GVF) deformable model was introduced by Xu and Prince as an effective approach to overcome the limited capture range of classical deformable models and their inability to progress into boundary concavities. It has found many important applications in the area of medical image processing. The simple iterative method proposed in the original work on GVF, however, is slow to converge. A new multigrid method is proposed for GVF computation on 2D and 3D images. Experimental results show that the new implementation improves computational speed by at least an order of magnitude, which facilitates the application of GVF deformable models to large medical images.
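For context, the "simple iterative method" that the multigrid solver replaces is an explicit relaxation of the GVF equations, u <- u + dt*(mu*laplacian(u) - (u - fx)*(fx^2 + fy^2)) and the analogous update for v. A one-step CUDA sketch of that scheme is shown below; mu, the time step, the toy edge map and all names are illustrative assumptions, not the paper's multigrid code.

```cuda
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

// One explicit update step of the GVF equations (the original slow scheme).
__global__ void gvfStep(const float* fx, const float* fy,
                        const float* u, const float* v,
                        float* uNew, float* vNew,
                        int W, int H, float mu, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int i = y * W + x;
    int xm = max(x - 1, 0), xp = min(x + 1, W - 1);
    int ym = max(y - 1, 0), yp = min(y + 1, H - 1);
    float lapU = u[y * W + xm] + u[y * W + xp] + u[ym * W + x] + u[yp * W + x] - 4.f * u[i];
    float lapV = v[y * W + xm] + v[y * W + xp] + v[ym * W + x] + v[yp * W + x] - 4.f * v[i];
    float g = fx[i] * fx[i] + fy[i] * fy[i];          // edge-map gradient magnitude^2
    uNew[i] = u[i] + dt * (mu * lapU - (u[i] - fx[i]) * g);
    vNew[i] = v[i] + dt * (mu * lapV - (v[i] - fy[i]) * g);
}

int main()
{
    const int W = 64, H = 64;  const float mu = 0.2f, dt = 1.f;
    float *fx, *fy, *u, *v, *uN, *vN;
    size_t bytes = W * H * sizeof(float);
    cudaMallocManaged(&fx, bytes); cudaMallocManaged(&fy, bytes);
    cudaMallocManaged(&u, bytes);  cudaMallocManaged(&v, bytes);
    cudaMallocManaged(&uN, bytes); cudaMallocManaged(&vN, bytes);
    for (int i = 0; i < W * H; ++i) {                 // toy edge-map gradient:
        fx[i] = (i % W == W / 2) ? 1.f : 0.f;         // a single vertical edge
        fy[i] = 0.f;  u[i] = fx[i];  v[i] = 0.f;
    }
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    for (int it = 0; it < 100; ++it) {                // fixed-point iteration
        gvfStep<<<grid, block>>>(fx, fy, u, v, uN, vN, W, H, mu, dt);
        cudaDeviceSynchronize();
        std::swap(u, uN);  std::swap(v, vN);
    }
    printf("u near the edge: %g\n", u[(H / 2) * W + W / 2 + 3]);
    return 0;
}
```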

12.
We report fast computation of computer-generated holograms (CGHs) using Xeon Phi coprocessors, recently released by Intel, which integrate a large number of x86-based cores on one chip. CGHs can generate arbitrary light wavefronts and are therefore a promising technology for many applications, for example three-dimensional displays, diffractive optical elements, and the generation of arbitrary beams. However, CGH generation incurs an enormous computational cost. In this paper, we describe implementations of several CGH generation algorithms on the Xeon Phi and compare the Xeon Phi, a CPU, and a graphics processing unit (GPU) in terms of performance and ease of programming.
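A common workload in such comparisons is the point-source summation in the Fresnel approximation, where every hologram pixel accumulates a cosine fringe from every object point. The sketch below writes it as a CUDA kernel only to stay consistent with the other examples in this list (the paper targets the Xeon Phi); pixel pitch, wavelength and object points are illustrative assumptions.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Point-source CGH in the Fresnel approximation: each hologram pixel sums
// A_j * cos( pi/(lambda*z_j) * ((x-x_j)^2 + (y-y_j)^2) ) over object points.
__global__ void pointSourceCGH(const float4* obj, int n,   // xyz = position, w = amplitude
                               int W, int H, float pitch, float lambda,
                               float* holo)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    float px = (x - W / 2) * pitch, py = (y - H / 2) * pitch;
    float acc = 0.f;
    for (int j = 0; j < n; ++j) {
        float dx = px - obj[j].x, dy = py - obj[j].y;
        float phase = 3.14159265f / (lambda * obj[j].z) * (dx * dx + dy * dy);
        acc += obj[j].w * cosf(phase);
    }
    holo[y * W + x] = acc;
}

int main()
{
    const int n = 3, W = 512, H = 512;
    const float pitch = 8e-6f, lambda = 633e-9f;        // 8 um pixels, red HeNe line
    float4* obj;  float* holo;
    cudaMallocManaged(&obj, n * sizeof(float4));
    cudaMallocManaged(&holo, W * H * sizeof(float));
    obj[0] = make_float4(0.f, 0.f, 0.1f, 1.f);          // three object points,
    obj[1] = make_float4(1e-3f, 0.f, 0.1f, 1.f);        // about 0.1 m from the
    obj[2] = make_float4(0.f, 1e-3f, 0.12f, 1.f);       // hologram plane
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    pointSourceCGH<<<grid, block>>>(obj, n, W, H, pitch, lambda, holo);
    cudaDeviceSynchronize();
    printf("fringe value at the hologram center: %g\n", holo[(H / 2) * W + W / 2]);
    return 0;
}
```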

13.
The Journal of Supercomputing - The lateral interaction in accumulative computation (LIAC) algorithm is a biologically inspired method that allows us to detect moving objects from image sequences...

14.
汤颖  肖廷哲  范菁 《计算机科学》2014,41(2):290-296
Searching for similar image regions is a key problem in many graphics and image applications and is also a computational bottleneck. Traditional acceleration methods such as ANN (Approximate Nearest Neighbor) are slow when handling larger image regions and do not support exact search in non-metric spaces. We propose a general GPU-accelerated computational framework for parallel search of similar image regions; the framework is extensible to support arbitrary distance functions. For the Euclidean distance (a metric space) and the Chamfer distance (a non-metric space), both widely used in image processing, we present efficient CUDA-based similar-region search algorithms, giving a fairly complete treatment of similarity computation in different kinds of spaces. Furthermore, in designing the specific CUDA acceleration algorithms, the parallel computation is optimized according to the characteristics of each distance computation. The method uses an exhaustive search strategy and achieves exact search under both the Euclidean and Chamfer distances while greatly improving computational efficiency. Experimental results show that, while preserving exact search, the accelerated algorithm runs one to two orders of magnitude faster than traditional acceleration methods, and an application to texture synthesis shows that the algorithm can quickly synthesize high-quality textures.
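A minimal version of the exhaustive CUDA search under the Euclidean distance is sketched below: one thread per candidate position computes the sum of squared differences against the template, so the minimum found afterwards is exact. The Chamfer-distance variant and the paper's further optimizations are omitted, and the image, template position and sizes are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Exhaustive similar-region search: one thread per candidate position
// computes the SSD between the template patch and the image window.
__global__ void ssdSearch(const float* image, int W, int H,
                          const float* patch, int pw, int ph, float* dist)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > W - pw || y > H - ph) return;
    float ssd = 0.f;
    for (int j = 0; j < ph; ++j)
        for (int i = 0; i < pw; ++i) {
            float d = image[(y + j) * W + (x + i)] - patch[j * pw + i];
            ssd += d * d;
        }
    dist[y * (W - pw + 1) + x] = ssd;
}

int main()
{
    const int W = 128, H = 128, pw = 8, ph = 8;
    float *img, *patch, *dist;
    cudaMallocManaged(&img, W * H * sizeof(float));
    cudaMallocManaged(&patch, pw * ph * sizeof(float));
    cudaMallocManaged(&dist, (W - pw + 1) * (H - ph + 1) * sizeof(float));
    unsigned s = 12345u;                                  // pseudo-random toy image
    for (int i = 0; i < W * H; ++i) { s = s * 1103515245u + 12345u; img[i] = (s >> 16) / 65535.f; }
    for (int j = 0; j < ph; ++j)                          // template = window at (40, 20)
        for (int i = 0; i < pw; ++i) patch[j * pw + i] = img[(20 + j) * W + (40 + i)];
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    ssdSearch<<<grid, block>>>(img, W, H, patch, pw, ph, dist);
    cudaDeviceSynchronize();
    int best = 0;                                         // exact minimum over all candidates
    for (int i = 1; i < (W - pw + 1) * (H - ph + 1); ++i)
        if (dist[i] < dist[best]) best = i;
    printf("best match at (%d, %d)\n", best % (W - pw + 1), best / (W - pw + 1));
    return 0;
}
```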

15.
Zernike moments have been extensively used and have received much research attention in a number of fields: object recognition, image reconstruction, image segmentation, edge detection and biomedical imaging. However, computation of these moments is time consuming. Thus, we present a fast computation technique to calculate exact Zernike moments by using cascaded digital filters. The novelty of the method proposed in this paper lies in computing exact geometric moments directly from the digital filter outputs: the mathematical relationship between the filter outputs and exact geometric moments is derived, and these moments are then used in the formulation of exact Zernike moments. A comparison with other state-of-the-art alternatives shows that the proposed algorithm improves on current computation times and uses less memory.

16.
Periodic centroidal Voronoi tessellation (CVT) in hyperbolic space provides a nice theoretical framework for computing the constrained CVT on high-genus (genus > 1) surfaces. This paper addresses two computational issues related to such a hyperbolic CVT framework: (1) efficient reduction of unnecessary site copies in neighbor domains on the universal covering space, based on two special rules; (2) GPU-based parallel algorithms to compute a discrete version of the hyperbolic CVT. Our experiments show that, with the dramatically reduced number of unnecessary site copies in neighbor domains and the GPU-based parallel algorithms, we significantly speed up the computation of CVT for high-genus surfaces. The proposed discrete hyperbolic CVT is guaranteed to converge and produces high-quality results.

17.
Spatial database operations are typically performed in two steps. In the filtering step, indexes and the minimum bounding rectangles (MBRs) of the objects are used to quickly determine a set of candidate objects. In the refinement step, the actual geometries of the objects are retrieved and compared to the query geometry or to each other. Because of the complexity of the computational geometry algorithms involved, the CPU cost of the refinement step is usually the dominant cost of the operation for complex geometries such as polygons. Although many run-time and pre-processing-based heuristics have been proposed to alleviate this problem, the CPU cost still remains the bottleneck. In this paper, we propose a novel approach to address this problem using the efficient rendering and searching capabilities of modern graphics hardware. This approach does not require expensive pre-processing of the data or changes to existing storage and index structures, and it is applicable to both intersection and distance predicates. We evaluate this approach by comparing its performance with leading software solutions. The results show that by combining hardware and software methods, the overall computational cost can be reduced substantially for both spatial selections and joins. We integrated this hardware/software co-processing technique into a popular database to evaluate its performance in the presence of indexes, pre-processing and other proprietary optimizations. Extensive experimentation with real-world data sets shows that the hardware-accelerated technique not only outperforms the run-time software solutions but also performs as well as, if not better than, pre-processing-assisted techniques.
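The filter/refine split described here is easy to picture in code: the filtering step only needs cheap MBR overlap tests, which can be run one object per thread, while the surviving candidates go on to the exact geometric test (the step the paper offloads to graphics hardware). The CUDA sketch below shows just the filtering step with invented data and names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Parallel filtering step: test each object's MBR against the query MBR and
// flag it as a candidate for the exact (refinement) test.
struct MBR { float xmin, ymin, xmax, ymax; };

__host__ __device__ bool overlaps(const MBR& a, const MBR& b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

__global__ void filterStep(const MBR* objects, int n, MBR query, int* candidate)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    candidate[i] = overlaps(objects[i], query) ? 1 : 0;
}

int main()
{
    const int n = 4;
    MBR* objs;  int* cand;
    cudaMallocManaged(&objs, n * sizeof(MBR));
    cudaMallocManaged(&cand, n * sizeof(int));
    objs[0] = {0.f, 0.f, 1.f, 1.f};
    objs[1] = {2.f, 2.f, 3.f, 3.f};
    objs[2] = {0.5f, 0.5f, 2.5f, 2.5f};
    objs[3] = {4.f, 4.f, 5.f, 5.f};
    MBR query = {0.8f, 0.8f, 2.2f, 2.2f};
    filterStep<<<1, n>>>(objs, n, query, cand);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i)
        printf("object %d: %s\n", i, cand[i] ? "candidate (refine further)" : "rejected");
    return 0;
}
```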

18.
As today's standard screening methods frequently fail to detect breast cancer before metastases have developed, early diagnosis is still a major challenge. With the promise of high-quality volume images, three-dimensional ultrasound computer tomography is likely to improve this situation, but it has high computational needs. In this work, we investigate the acceleration of the ray-based transmission reconstruction by a GPU-based implementation of the iterative numerical optimization algorithm TVAL3. We identified the regular and transposed sparse matrix–vector multiplications as the performance-limiting operations. For accelerated reconstruction we propose two different concepts and devise a hybrid scheme as the optimal configuration. In addition, we investigate multi-GPU scalability and derive the optimal number of devices for our two primary use cases: a fast preview mode and a high-resolution mode. In order to achieve a fair estimation of the speedup, we compare our implementation to an optimized CPU version of the algorithm. Using our accelerated implementation we reconstructed a preview 3D volume with 24,576 unknowns, a voxel size of (8 mm)³ and approximately 200,000 equations in 0.5 s. A high-resolution volume with 1,572,864 unknowns, a voxel size of (2 mm)³ and approximately 1.6 million equations was reconstructed in 23 s. This constitutes an acceleration of over one order of magnitude in comparison to the optimized CPU version.
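Since the regular and transposed sparse matrix–vector products are named as the performance-limiting operations, a baseline CUDA kernel for the regular product in CSR format is sketched below (one thread per row). The paper's tuned kernels, the transposed variant and the multi-GPU scheme are not reproduced; the tiny example matrix and all names are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple CSR sparse-matrix-vector multiply: one thread per matrix row.
__global__ void spmvCsr(int rows, const int* rowPtr, const int* colIdx,
                        const float* vals, const float* x, float* y)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float sum = 0.f;
    for (int k = rowPtr[r]; k < rowPtr[r + 1]; ++k)
        sum += vals[k] * x[colIdx[k]];
    y[r] = sum;
}

int main()
{
    // 3x3 example:  [ 1 0 2 ]
    //               [ 0 3 0 ]
    //               [ 4 0 5 ]
    const int rows = 3, nnz = 5;
    int *rowPtr, *colIdx;  float *vals, *x, *y;
    cudaMallocManaged(&rowPtr, (rows + 1) * sizeof(int));
    cudaMallocManaged(&colIdx, nnz * sizeof(int));
    cudaMallocManaged(&vals, nnz * sizeof(float));
    cudaMallocManaged(&x, rows * sizeof(float));
    cudaMallocManaged(&y, rows * sizeof(float));
    int rp[] = {0, 2, 3, 5}, ci[] = {0, 2, 1, 0, 2};
    float va[] = {1.f, 2.f, 3.f, 4.f, 5.f}, xv[] = {1.f, 1.f, 1.f};
    for (int i = 0; i <= rows; ++i) rowPtr[i] = rp[i];
    for (int i = 0; i < nnz; ++i) { colIdx[i] = ci[i]; vals[i] = va[i]; }
    for (int i = 0; i < rows; ++i) x[i] = xv[i];
    spmvCsr<<<1, rows>>>(rows, rowPtr, colIdx, vals, x, y);
    cudaDeviceSynchronize();
    printf("y = (%g, %g, %g)\n", y[0], y[1], y[2]);   // expect (3, 3, 9)
    return 0;
}
```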

19.
Fast algorithms are presented for performing computations in a probabilistic population model. This is a variant of the standard population protocol model, in which finite-state agents interact in pairs under the control of an adversary scheduler, where all pairs are equally likely to be chosen for each interaction. It is shown that when a unique leader agent is provided in the initial population, the population can simulate a virtual register machine with high probability in which standard arithmetic operations like comparison, addition, subtraction, and multiplication and division by constants can be simulated in O(n log^5 n) interactions using a simple register representation or in O(n log^2 n) interactions using a more sophisticated representation that requires an extra O(n log^O(1) n)-interaction initialization step. The central method is the extensive use of epidemics to propagate information from and to the leader, combined with an epidemic-based phase clock used to detect when these epidemics are likely to be complete. Applications include a reduction of the cost of computing a semilinear predicate to O(n log^5 n) interactions from the previously best-known bound of O(n^2 log n) interactions and simulation of a LOGSPACE Turing machine using O(n log^2 n) interactions per step after an initial O(n log^O(1) n)-interaction startup phase. These bounds on interactions translate into polylogarithmic time per step in a natural parallel model in which each agent participates in an expected Θ(1) interactions per time unit. Open problems are discussed, together with simulation results that suggest the possibility of removing the initial-leader assumption. An extended abstract of this paper previously appeared in DISC 2006 [6]. Some additional material previously appeared in DISC 2007 [7]. The second author was supported in part by NSF grants CNS-0305258 and CNS-0435201.
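The epidemic primitive at the heart of these bounds is easy to simulate. The host-side sketch below (plain C++, kept in the same source style as the other examples) runs a one-way epidemic started by the leader under a uniformly random pairwise scheduler and counts the interactions until every agent is infected, which concentrates around roughly 2n ln n. The population size, seed and names are arbitrary choices for illustration, not the paper's protocol code.

```cuda
#include <cstdio>
#include <cmath>
#include <random>
#include <vector>

// One-way epidemic under a random pairwise scheduler: in each interaction the
// responder copies the initiator's "infected" bit; we count how many
// interactions pass before the whole population is infected.
int main()
{
    const int n = 10000;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<char> infected(n, 0);
    infected[0] = 1;                       // the unique leader starts the epidemic
    long long interactions = 0;
    int done = 1;
    while (done < n) {
        int a = pick(rng), b = pick(rng);  // initiator a, responder b
        if (a == b) continue;              // the scheduler picks two distinct agents
        ++interactions;
        if (infected[a] && !infected[b]) { infected[b] = 1; ++done; }
    }
    printf("n = %d, interactions until full infection: %lld (compare 2 n ln n = %.0f)\n",
           n, interactions, 2.0 * n * std::log((double)n));
    return 0;
}
```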

20.
Traditional volume rendering does not incorporate a number of optical properties that are typically observed for semi-transparent materials, such as glass or water, in the real world. Therefore, we have extended GPU-based raycasting to spectral volume rendering based on the Kubelka–Munk theory of light propagation in parallel colorant layers of a turbid medium. This allows us to demonstrate the effects of selective absorption and dispersion in refractive materials by generating volume renderings using real physical optical properties. We show that this extended volume rendering technique can be easily incorporated into a flexible framework for GPU-based volume raycasting. Our implementation shows promising performance for a number of real data sets. In particular, we obtain up to 100 times the performance of a comparable CPU implementation. Electronic supplementary material is available in the online version of this article and is accessible to authorized users.
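For reference, the standard Kubelka–Munk relations behind such a renderer give, for a homogeneous layer of thickness d with absorption K and scattering S (per wavelength), a = 1 + K/S, b = sqrt(a^2 - 1), R = sinh(bSd)/(a sinh(bSd) + b cosh(bSd)) and T = b/(a sinh(bSd) + b cosh(bSd)); two stacked layers then combine with Kubelka's layer-addition formulas. The host-side sketch below evaluates these textbook formulas with invented coefficients; it is not the paper's GPU raycaster.

```cuda
#include <cstdio>
#include <cmath>

// Kubelka-Munk reflectance/transmittance of one layer and composition of two
// stacked layers, evaluated per wavelength band.
struct RT { double R, T; };

RT kubelkaMunkLayer(double K, double S, double d)
{
    double a = 1.0 + K / S;
    double b = std::sqrt(a * a - 1.0);
    double sh = std::sinh(b * S * d), ch = std::cosh(b * S * d);
    return { sh / (a * sh + b * ch),       // reflectance of the layer
             b  / (a * sh + b * ch) };     // transmittance of the layer
}

RT composite(RT top, RT bottom)            // Kubelka's layer-addition formulas
{
    double denom = 1.0 - top.R * bottom.R;
    return { top.R + top.T * top.T * bottom.R / denom,
             top.T * bottom.T / denom };
}

int main()
{
    RT l1 = kubelkaMunkLayer(0.05, 1.0, 0.5);   // weakly absorbing turbid layer
    RT l2 = kubelkaMunkLayer(0.50, 0.2, 1.0);   // more absorbing layer beneath it
    RT stack = composite(l1, l2);
    printf("layer1: R=%.3f T=%.3f\n", l1.R, l1.T);
    printf("layer2: R=%.3f T=%.3f\n", l2.R, l2.T);
    printf("stack : R=%.3f T=%.3f\n", stack.R, stack.T);
    return 0;
}
```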
