首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Compared with its competitors such as the bounding volume hierarchy, a drawback of the kd‐tree structure is that a large number of triangles are repeatedly duplicated during its construction, which often leads to inefficient, large and tall binary trees with high triangle redundancy. In this paper, we propose a space‐efficient kd‐tree representation where, unlike commonly used methods, an inner node is allowed to optionally store a reference to a triangle, so highly redundant triangles in a kd‐tree can be culled from the leaf nodes and moved to the inner nodes. To avoid the construction of ineffective kd‐trees entailing computational inefficiencies due to early, possibly unnecessary, ray‐triangle intersection calculations that now have to be performed in the inner nodes during the kd‐tree traversal, we present heuristic measures for determining when and how to choose triangles for inner nodes during kd‐tree construction. Based on these metrics, we describe how the new form of kd‐tree is constructed and stored compactly using a carefully designed data layout. Our experiments with several example scenes showed that our kd‐tree representation technique significantly reduced the memory requirements for storing the kd‐tree structure, while effectively suppressing the unavoidable frame‐rate degradation observed during ray tracing.  相似文献   

2.
We propose a method for creating a bounding volume hierarchy (BVH) that is optimized for all frames of a given animated scene. The method is based on a novel extension of surface area heuristic to temporal domain (T‐SAH). We perform iterative BVH optimization using T‐SAH and create a single BVH accounting for scene geometry distribution at different frames of the animation. Having a single optimized BVH for the whole animation makes our method extremely easy to integrate to any application using BVHs, limiting the per‐frame overhead only to refitting the bounding volumes. We evaluated the T‐SAH optimized BVHs in the scope of real‐time GPU ray tracing. We demonstrate, that our method can handle even highly complex inputs with large deformations and significant topology changes. The results show, that in a vast majority of tested scenes our method provides significantly better run‐time performance than traditional SAH and also better performance than GPU based per‐frame BVH rebuild.  相似文献   

3.
This paper proposes an adaptive rendering technique for ray‐bundle tracing. Ray‐bundle tracing can be done by per‐pixel linked‐list construction on a GPU rasterization pipeline. This rasterization based approach offers significant benefits for the efficient generation of light maps (e.g., hardware acceleration, tessellation, and recycling of shaders used in real‐time graphics). However, it is inapplicable to large and complex scenes due to the limited capacity of the GPU memory because it requires a high‐resolution frame buffer and high‐capacity node buffer for the linked‐lists. In addition, memory overflow can potentially occur on the per‐pixel linked‐list since the memory usage of the lists is usually unknown before the rendering process. We introduce an adaptive tiling technique with memory usage prediction. Our method uses an appropriately tiled frame buffer, thus eliminating almost all of the overflow risks thanks to our adaptive tile subdivision scheme. Using this technique, we are able to render high‐quality light maps of large and complex scenes which cannot be computed using previous ray‐bundle based methods.  相似文献   

4.
We present a novel method for massively parallel hierarchical scene processing on the GPU, which is based on sequential decomposition of the given hierarchical algorithm into small functional blocks. The computation is fully managed by the GPU using a specialized task pool which facilitates synchronization and communication of processing units. We present two applications of the proposed approach: construction of the bounding volume hierarchies and collision detection based on divide‐and‐conquer ray tracing. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization.  相似文献   

5.
We propose a unified rendering approach that jointly handles motion and defocus blur for transparent and opaque objects at interactive frame rates. Our key idea is to create a sampled representation of all parts of the scene geometry that are potentially visible at any point in time for the duration of a frame in an initial rasterization step. We store the resulting temporally‐varying fragments (t‐fragments) in a bounding volume hierarchy which is rebuild every frame using a fast spatial median construction algorithm. This makes our approach suitable for interactive applications with dynamic scenes and animations. Next, we perform spatial sampling to determine all t‐fragments that intersect with a specific viewing ray at any point in time. Viewing rays are sampled according to the lens uv‐sampling for depth‐of‐field effects. In a final temporal sampling step, we evaluate the predetermined viewing ray/t‐fragment intersections for one or multiple points in time. This allows us to incorporate all standard shading effects including transparency. We describe the overall framework, present our GPU implementation, and evaluate our rendering approach with respect to scalability, quality, and performance.  相似文献   

6.
We present a flexible and highly efficient hardware‐assisted volume renderer grounded on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is exclusively based on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the volume ray integral is evaluated in a fragment shader; hence, volume rendering is performed entirely on the GPU within a single pass through the pipeline. We apply a CUDA‐based visibility ordering achieving rendering and sorting performance of over 6 M Tet/s for unstructured datasets. Furthermore, as each tetrahedron is processed independently, we employ a data‐parallel solution which is neither bound by GPU memory size nor does it rely on auxiliary volume information. In addition, iso‐surfaces can be readily extracted during the rendering process, and time‐varying data are handled without extra burden.  相似文献   

7.
Level‐of‐Detail structures are a key component for scalable rendering. Built from raw 3D data, these structures are often defined as Bounding Volume Hierarchies, providing coarse‐to‐fine adaptive approximations that are well‐adapted for many‐view rasterization. Here, the total number of pixels in each view is usually low, while the cost of choosing the appropriate LoD for each view is high. This task represents a challenge for existing GPU algorithms. We propose ManyLoDs, a new GPU algorithm to efficiently compute many LoDs from a Bounding Volume Hierarchy in parallel by balancing the workload within and among LoDs. Our approach is not specific to a particular rendering technique, can be used on lazy representations such as polygon soups, and can handle dynamic scenes. We apply our method to various many‐view rasterization applications, including Instant Radiosity, Point‐Based Global Illumination, and reflection/refraction mapping. For each of these, we achieve real‐time performance in complex scenes at high resolutions.  相似文献   

8.
Empty‐space skipping is an essential acceleration technique for volume rendering. Image‐order empty‐space skipping is not well suited to GPU implementation, since it must perform checks on, essentially, a per‐sample basis, as in kd‐tree traversal, which can lead to a great deal of divergent branching at runtime, which is very expensive in a modern GPU pipeline. In contrast, object‐order empty‐space skipping is extremely fast on a GPU and has negligible overheads compared with approaches without empty‐space skipping, since it employs the hardware unit for rasterisation. However, previous object‐order algorithms have been able to skip only exterior empty space and not the interior empty space that lies inside or between volume objects. In this paper, we address these issues by proposing a multi‐layer depth‐peeling approach that can obtain all of the depth layers of the tight‐fitting bounding geometry of the isosurface by a single rasterising pass. The maximum count of layers peeled by our approach can be up to thousands, while maintaining 32‐bit float‐point accuracy, which was not possible previously. By raytracing only the valid ray segments between each consecutive pair of depth layers, we can skip both the interior and exterior empty space efficiently. In comparisons with 3 state‐of‐the‐art GPU isosurface rendering algorithms, this technique achieved much faster rendering across a variety of data sets.  相似文献   

9.
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data‐parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth‐first frustum traversal through a bounding volume hierarchy for the scene, and localized ray‐primitive intersections. We utilize the well known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth‐first BVH traversal is based on parallel frustum‐bounding box intersection tests and parallel scan per each BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray‐primitive intersection routines our pipeline is ~3x faster than an optimized standard depth first ray tracing implemented in one kernel.  相似文献   

10.
Simulation of light transport through lens systems plays an important role in graphics. While basic imaging properties can be conveniently derived from linear models (like ABCD matrices), these approximations fail to describe nonlinear effects and aberrations that arise in real optics. Such effects can be computed by proper ray tracing, for which, however, finding suitable sampling and filtering strategies is often not a trivial task. Inspired by aberration theory, which describes the deviation from the linear ray transfer in terms of wavefront distortions, we propose a ray‐space formulation for nonlinear effects. In particular, we approximate the analytical solution to the ray tracing problem by means of a Taylor expansion in the ray parameters. This representation enables a construction‐kit approach to complex optical systems in the spirit of matrix optics. It is also very simple to evaluate, which allows for efficient execution on CPU and GPU alike, including the computation of mixed derivatives of any order. We evaluate fidelity and performance of our polynomial model, and show applications in high‐quality offline rendering and at interactive frame rates.  相似文献   

11.
For ray tracing based methods, traversing a hierarchical acceleration data structure takes up a substantial portion of the total rendering time. We propose an additional data structure which cuts off large parts of the hierarchical traversal. We use the idea of ray classification combined with the hierarchical scene representation provided by a bounding volume hierarchy. We precompute short arrays of indices to subtrees inside the hierarchy and use them to initiate the traversal for a given ray class. This arrangement is compact enough to be cache‐friendly, preventing the method from negating its traversal gains by excessive memory traffic. The method is easy to use with existing renderers which we demonstrate by integrating it to the PBRT renderer. The proposed technique reduces the number of traversal steps by 42% on average, saving around 15% of time of finding ray‐scene intersection on average.  相似文献   

12.
We present a real‐time rendering algorithm for inhomogeneous, single scattering media, where all‐frequency shading effects such as glows, light shafts, and volumetric shadows can all be captured. The algorithm first computes source radiance at a small number of sample points in the medium, then interpolates these values at other points in the volume using a gradient‐based scheme that is efficiently applied by sample splatting. The sample points are dynamically determined based on a recursive sample splitting procedure that adapts the number and locations of sample points for accurate and efficient reproduction of shading variations in the medium. The entire pipeline can be easily implemented on the GPU to achieve real‐time performance for dynamic lighting and scenes. Rendering results of our method are shown to be comparable to those from ray tracing.  相似文献   

13.
Stackless KD-Tree Traversal for High Performance GPU Ray Tracing   总被引:1,自引:1,他引:1  
Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts. In this paper we present a novel packet ray traversal implementation that completely eliminates the need for maintaining a stack during kd-tree traversal and that reduces the number of traversal steps per ray. While CPUs benefit moderately from the stackless approach, it improves GPU performance significantly. We achieve a peak performance of over 16 million rays per second for reasonably complex scenes, including complex shading and secondary rays. Several examples show that with this new technique GPUs can actually outperform equivalent CPU based ray tracers.  相似文献   

14.
Image space photon mapping has the advantage of simple implementation on GPU without pre‐computation of complex acceleration structures. However, existing approaches use only a single image for tracing caustic photons, so they are limited to computing only a part of the global illumination effects for very simple scenes. In this paper we fully extend the image space approach by using multiple environment maps for photon mapping computation to achieve interactive global illumination of dynamic complex scenes. The two key problems due to the introduction of multiple images are 1) selecting the images to ensure adequate scene coverage; and 2) reliably computing ray‐geometry intersections with multiple images. We present effective solutions to these problems and show that, with multiple environment maps, the image‐space photon mapping approach can achieve interactive global illumination of dynamic complex scenes. The advantages of the method are demonstrated by comparison with other existing interactive global illumination methods.  相似文献   

15.
Modern supercomputers enable increasingly large N‐body simulations using unstructured point data. The structures implied by these points can be reconstructed implicitly. Direct volume rendering of radial basis function (RBF) kernels in domain‐space offers flexible classification and robust feature reconstruction, but achieving performant RBF volume rendering remains a challenge for existing methods on both CPUs and accelerators. In this paper, we present a fast CPU method for direct volume rendering of particle data with RBF kernels. We propose a novel two‐pass algorithm: first sampling the RBF field using coherent bounding hierarchy traversal, then subsequently integrating samples along ray segments. Our approach performs interactively for a range of data sets from molecular dynamics and astrophysics up to 82 million particles. It does not rely on level of detail or subsampling, and offers better reconstruction quality than structured volume rendering of the same data, exhibiting comparable performance and requiring no additional preprocessing or memory footprint other than the BVH. Lastly, our technique enables multi‐field, multi‐material classification of particle data, providing better insight and analysis.  相似文献   

16.
This paper aims at rendering interactive visual effects inherent to complex interactions between trees and rain in real‐time in order to increase the realism of natural rainy scenes. Such a complex phenomenon involves a great number of physical processes influenced by various interlinked factors and its rendering represents a thorough challenge in Computer Graphics. We approach this problem by introducing an original method to render drops dripping from leaves after interception of raindrops by foliage. Our method introduces a new hydrological model representing interactions between rain and foliage through a phenomenological approach. Our model reduces the complexity of the phenomenon by representing multiple dripping drops with a new fully functional form evaluated per‐pixel on‐the‐fly and providing improved control over density and physical properties. Furthermore, an efficient real‐time rendering scheme, taking full advantage of latest GPU hardware capabilities, allows the rendering of a large number of dripping drops even for complex scenes.  相似文献   

17.
Ray‐traced global illumination (GI) is becoming widespread in production rendering but incoherent secondary ray traversal limits practical rendering to scenes that fit in memory. Incoherent shading also leads to intractable performance with production‐scale textures forcing renderers to resort to caching of irradiance, radiosity, and other values to amortize expensive shading. Unfortunately, such caching strategies complicate artist workflow, are difficult to parallelize effectively, and contend for precious memory. Worse, these caches involve approximations that compromise quality. In this paper, we introduce a novel path‐tracing framework that avoids these tradeoffs. We sort large, potentially out‐of‐core ray batches to ensure coherence of ray traversal. We then defer shading of ray hits until we have sorted them, achieving perfectly coherent shading and avoiding the need for shading caches.  相似文献   

18.
We propose a novel algorithm for construction of bounding volume hierarchies (BVHs) for multi‐core CPU architectures. The algorithm constructs the BVH by a divisive top‐down approach using a progressively refined cut of an existing auxiliary BVH. We propose a new strategy for refining the cut that significantly reduces the workload of individual steps of BVH construction. Additionally, we propose a new method for integrating spatial splits into the BVH construction algorithm. The auxiliary BVH is constructed using a very fast method such as LBVH based on Morton codes. We show that the method provides a very good trade‐off between the build time and ray tracing performance. We evaluated the method within the Embree ray tracing framework and show that it compares favorably with the Embree BVH builders regarding build time while maintaining comparable ray tracing speed.  相似文献   

19.
Matrix Trees     
We propose a new data representation for octrees and kd‐trees that improves upon memory size and algorithm speed of existing techniques. While pointerless approaches exploit the regular structure of the tree to facilitate efficient data access, their memory footprint becomes prohibitively large as the height of the tree increases. Pointerbased trees require memory consumption proportional to the number of tree nodes, thus exploiting the typical sparsity of large trees. Yet, their traversal is slowed by the need to follow explicit pointers across the different levels. Our solution is a pointerless approach that represents each tree level with its own matrix, as opposed to traditional pointerless trees that use only a single vector. This novel data organization allows us to fully exploit the tree's regular structure and improve the performance of tree operations. By using a sparse matrix data structure we obtain a representation that is suited for sparse and dense trees alike. In particular, it uses less total memory than pointer‐based trees even when the data set is extremely sparse. We show how our approach is easily implemented on the GPU and illustrate its performance in typical visualization scenarios.  相似文献   

20.
We present a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory. Rays are organized in large packets that can be distributed among the two units as needed. Testing visibility between rays and the scene is mostly performed using an optimized kernel on the GPU, but the CPU can help as necessary. The CPU cores typically handle most or all shading, which makes it easy to support complex appearances. For efficiency, the CPU cores shade whole batches of rays by sorting them on material and shading each material using a vectorized kernel. In addition, we introduce a method to support light paths with arbitrary recursion, such as multiple recursive Whitted‐style ray tracing and adaptive sampling where the result of a ray is examined before sending the next, while still batching up rays for the benefit of GPU‐accelerated traversal and vectorized shading. This allows our system to achieve high rendering performance while maintaining the flexibility to accommodate different rendering algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号