首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data‐parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth‐first frustum traversal through a bounding volume hierarchy for the scene, and localized ray‐primitive intersections. We utilize the well known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth‐first BVH traversal is based on parallel frustum‐bounding box intersection tests and parallel scan per each BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray‐primitive intersection routines our pipeline is ~3x faster than an optimized standard depth first ray tracing implemented in one kernel.  相似文献   

2.
We propose a method for creating a bounding volume hierarchy (BVH) that is optimized for all frames of a given animated scene. The method is based on a novel extension of surface area heuristic to temporal domain (T‐SAH). We perform iterative BVH optimization using T‐SAH and create a single BVH accounting for scene geometry distribution at different frames of the animation. Having a single optimized BVH for the whole animation makes our method extremely easy to integrate to any application using BVHs, limiting the per‐frame overhead only to refitting the bounding volumes. We evaluated the T‐SAH optimized BVHs in the scope of real‐time GPU ray tracing. We demonstrate, that our method can handle even highly complex inputs with large deformations and significant topology changes. The results show, that in a vast majority of tested scenes our method provides significantly better run‐time performance than traditional SAH and also better performance than GPU based per‐frame BVH rebuild.  相似文献   

3.
We present a spatial index structure to accelerate ray tracing on GPUs. It is a flat, non‐hierarchical spatial subdivision of the scene into axis aligned cells of varying size. In order to construct it, we first nest an octree into each cell of a uniform grid. We then apply two optimization passes to increase ray traversal performance: First, we reduce the expected cost for ray traversal by merging cells together. This adapts the structure to complex primitive distributions, solving the “teapot in a stadium” problem. Second, we decouple the cell boundaries used during traversal for rays entering and exiting a given cell. This allows us to extend the exiting boundaries over adjacent cells that are either empty or do not contain additional primitives. Now, exiting rays can skip empty space and avoid repeating intersection tests. Finally, we demonstrate that in addition to the fast ray traversal performance, the structure can be rebuilt efficiently in parallel, allowing for ray tracing dynamic scenes.  相似文献   

4.
Compared with its competitors such as the bounding volume hierarchy, a drawback of the kd‐tree structure is that a large number of triangles are repeatedly duplicated during its construction, which often leads to inefficient, large and tall binary trees with high triangle redundancy. In this paper, we propose a space‐efficient kd‐tree representation where, unlike commonly used methods, an inner node is allowed to optionally store a reference to a triangle, so highly redundant triangles in a kd‐tree can be culled from the leaf nodes and moved to the inner nodes. To avoid the construction of ineffective kd‐trees entailing computational inefficiencies due to early, possibly unnecessary, ray‐triangle intersection calculations that now have to be performed in the inner nodes during the kd‐tree traversal, we present heuristic measures for determining when and how to choose triangles for inner nodes during kd‐tree construction. Based on these metrics, we describe how the new form of kd‐tree is constructed and stored compactly using a carefully designed data layout. Our experiments with several example scenes showed that our kd‐tree representation technique significantly reduced the memory requirements for storing the kd‐tree structure, while effectively suppressing the unavoidable frame‐rate degradation observed during ray tracing.  相似文献   

5.
Photorealistic image synthesis is a computationally demanding task that relies on ray tracing for the evaluation of integrals. Rendering time is dominated by tracing long paths that are very incoherent by construction. We therefore investigate the use of SIMD instructions to accelerate incoherent rays. SIMD is used in the hierarchy construction, the tree traversal and the leaf intersection. This is achieved by increasing the arity of acceleration structures, which also reduces memory requirements. We show that the resulting hierarchies can be built quickly and are smaller than acceleration structures known so far while at the same time outperforming them for incoherent rays. Our new acceleration structure speeds up ray tracing by a factor of 1.6 to 2.0 compared to a highly optimized bounding interval hierarchy implementation, and 1.3 to 1.6 compared to an efficient kd‐tree. At the same time, the memory requirements are reduced by 10–50%. Additionally we show how a caching mechanism in conjunction with this memory efficient hierarchy can be used to speed up shadow rays in a global illumination algorithm without increasing the memory footprint. This optimization decreased the number of traversal steps up to 50%.  相似文献   

6.
In this paper we introduce the constrained tetrahedralization as a new acceleration structure for ray tracing. A constrained tetrahedralization of a scene is a tetrahedralization that respects the faces of the scene geometry. The closest intersection of a ray with a scene is found by traversing this tetrahedralization along the ray, one tetrahedron at a time. We show that constrained tetrahedralizations are a viable alternative to current acceleration structures, and that they have a number of unique properties that set them apart from other acceleration structures: constrained tetrahedralizations are not hierarchical yet adaptive; the complexity of traversing them is a function of local geometric complexity rather than global geometric complexity; constrained tetrahedralizations support deforming geometry without any effort; and they have the potential to unify several data structures currently used in global illumination.  相似文献   

7.
Ray‐traced global illumination (GI) is becoming widespread in production rendering but incoherent secondary ray traversal limits practical rendering to scenes that fit in memory. Incoherent shading also leads to intractable performance with production‐scale textures forcing renderers to resort to caching of irradiance, radiosity, and other values to amortize expensive shading. Unfortunately, such caching strategies complicate artist workflow, are difficult to parallelize effectively, and contend for precious memory. Worse, these caches involve approximations that compromise quality. In this paper, we introduce a novel path‐tracing framework that avoids these tradeoffs. We sort large, potentially out‐of‐core ray batches to ensure coherence of ray traversal. We then defer shading of ray hits until we have sorted them, achieving perfectly coherent shading and avoiding the need for shading caches.  相似文献   

8.
We propose a novel algorithm for construction of bounding volume hierarchies (BVHs) for multi‐core CPU architectures. The algorithm constructs the BVH by a divisive top‐down approach using a progressively refined cut of an existing auxiliary BVH. We propose a new strategy for refining the cut that significantly reduces the workload of individual steps of BVH construction. Additionally, we propose a new method for integrating spatial splits into the BVH construction algorithm. The auxiliary BVH is constructed using a very fast method such as LBVH based on Morton codes. We show that the method provides a very good trade‐off between the build time and ray tracing performance. We evaluated the method within the Embree ray tracing framework and show that it compares favorably with the Embree BVH builders regarding build time while maintaining comparable ray tracing speed.  相似文献   

9.
Modern supercomputers enable increasingly large N‐body simulations using unstructured point data. The structures implied by these points can be reconstructed implicitly. Direct volume rendering of radial basis function (RBF) kernels in domain‐space offers flexible classification and robust feature reconstruction, but achieving performant RBF volume rendering remains a challenge for existing methods on both CPUs and accelerators. In this paper, we present a fast CPU method for direct volume rendering of particle data with RBF kernels. We propose a novel two‐pass algorithm: first sampling the RBF field using coherent bounding hierarchy traversal, then subsequently integrating samples along ray segments. Our approach performs interactively for a range of data sets from molecular dynamics and astrophysics up to 82 million particles. It does not rely on level of detail or subsampling, and offers better reconstruction quality than structured volume rendering of the same data, exhibiting comparable performance and requiring no additional preprocessing or memory footprint other than the BVH. Lastly, our technique enables multi‐field, multi‐material classification of particle data, providing better insight and analysis.  相似文献   

10.
State‐of‐the‐art density estimation methods for rendering participating media rely on a dense photon representation of the radiance distribution within a scene. A critical bottleneck of such kernel‐based approaches is the excessive number of photons that are required in practice to resolve fine illumination details, while controlling the amount of noise. In this paper, we propose a parametric density estimation technique that represents radiance using a hierarchical Gaussian mixture. We efficiently obtain the coefficients of this mixture using a progressive and accelerated form of the Expectation‐Maximization algorithm. After this step, we are able to create noise‐free renderings of high‐frequency illumination using only a few thousand Gaussian terms, where millions of photons are traditionally required. Temporal coherence is trivially supported within this framework, and the compact footprint is also useful in the context of real‐time visualization. We demonstrate a hierarchical ray tracing‐based implementation, as well as a fast splatting approach that can interactively render animated volume caustics.  相似文献   

11.
We propose a unified rendering approach that jointly handles motion and defocus blur for transparent and opaque objects at interactive frame rates. Our key idea is to create a sampled representation of all parts of the scene geometry that are potentially visible at any point in time for the duration of a frame in an initial rasterization step. We store the resulting temporally‐varying fragments (t‐fragments) in a bounding volume hierarchy which is rebuild every frame using a fast spatial median construction algorithm. This makes our approach suitable for interactive applications with dynamic scenes and animations. Next, we perform spatial sampling to determine all t‐fragments that intersect with a specific viewing ray at any point in time. Viewing rays are sampled according to the lens uv‐sampling for depth‐of‐field effects. In a final temporal sampling step, we evaluate the predetermined viewing ray/t‐fragment intersections for one or multiple points in time. This allows us to incorporate all standard shading effects including transparency. We describe the overall framework, present our GPU implementation, and evaluate our rendering approach with respect to scalability, quality, and performance.  相似文献   

12.
We present a novel method for massively parallel hierarchical scene processing on the GPU, which is based on sequential decomposition of the given hierarchical algorithm into small functional blocks. The computation is fully managed by the GPU using a specialized task pool which facilitates synchronization and communication of processing units. We present two applications of the proposed approach: construction of the bounding volume hierarchies and collision detection based on divide‐and‐conquer ray tracing. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization.  相似文献   

13.
This paper presents a digital storytelling approach that generates automatic animations for time‐varying data visualization. Our approach simulates the composition and transition of storytelling techniques and synthesizes animations to describe various event features. Specifically, we analyze information related to a given event and abstract it as an event graph, which represents data features as nodes and event relationships as links. This graph embeds a tree‐like hierarchical structure which encodes data features at different scales. Next, narrative structures are built by exploring starting nodes and suitable search strategies in this graph. Different stages of narrative structures are considered in our automatic rendering parameter decision process to generate animations as digital stories. We integrate this animation generation approach into an interactive exploration process of time‐varying data, so that more comprehensive information can be provided in a timely fashion. We demonstrate with a storm surge application that our approach allows semantic visualization of time‐varying data and easy animation generation for users without special knowledge about the underlying visualization techniques.  相似文献   

14.
Stackless traversal algorithms for ray tracing acceleration structures require significantly less storage per ray than ordinary stack‐based ones. This advantage is important for massively parallel rendering methods, where there are many rays in flight. On SIMD architectures, a commonly used acceleration structure is the MBVH, which has multiple bounding boxes per node for improved parallelism. It scales to branching factors higher than two, for which, however, only stack‐based traversal methods have been proposed so far. In this paper, we introduce a novel stackless traversal algorithm for MBVHs with up to four‐way branching. Our approach replaces the stack with a small bitmask, supports dynamic ordered traversal, and has a low computation overhead. We also present efficient implementation techniques for recent CPU, MIC (Intel Xeon Phi) and GPU (NVIDIA Kepler) architectures.  相似文献   

15.
We introduce image-space radiosity and a hierarchical variant as a method for interactively approximating diffuse indirect illumination in fully dynamic scenes. As oft observed, diffuse indirect illumination contains mainly low-frequency details that do not require independent computations at every pixel. Prior work leverages this to reduce computation costs by clustering and caching samples in world or object space. This often involves scene preprocessing, complex data structures for caching, or wasted computations outside the view frustum. We instead propose clustering computations in image space, allowing the use of cheap hardware mipmapping and implicit quadtrees to allow coarser illumination computations. We build on a recently introduced multiresolution splatting technique combined with an image-space lightcut algorithm to intelligently choose virtual point lights for an interactive, one-bounce instant radiosity solution. Intelligently selecting point lights from our reflective shadow map enables temporally coherent illumination similar to results using more than 4096 regularly-sampled VPLs.  相似文献   

16.
We present a performance comparison of bounding volume hierarchies and kd‐trees for ray tracing on many‐core architectures (GPUs). The comparison is focused on rendering times and traversal characteristics on the GPU using data structures that were optimized for very high performance of tracing rays. To achieve low rendering times, we extensively examine the constants used in termination criteria for the two data structures. We show that for a contemporary GPU architecture (NVIDIA Kepler) bounding volume hierarchies have higher ray tracing performance than kd‐trees for simple and moderately complex scenes. On the other hand, kd‐trees have higher performance for complex scenes, in particular for those with high depth complexity. Finally, we analyse the causes of the performance discrepancies using the profiling characteristics of the ray tracing kernels.  相似文献   

17.
We present a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory. Rays are organized in large packets that can be distributed among the two units as needed. Testing visibility between rays and the scene is mostly performed using an optimized kernel on the GPU, but the CPU can help as necessary. The CPU cores typically handle most or all shading, which makes it easy to support complex appearances. For efficiency, the CPU cores shade whole batches of rays by sorting them on material and shading each material using a vectorized kernel. In addition, we introduce a method to support light paths with arbitrary recursion, such as multiple recursive Whitted‐style ray tracing and adaptive sampling where the result of a ray is examined before sending the next, while still batching up rays for the benefit of GPU‐accelerated traversal and vectorized shading. This allows our system to achieve high rendering performance while maintaining the flexibility to accommodate different rendering algorithms.  相似文献   

18.
Matrix Trees     
We propose a new data representation for octrees and kd‐trees that improves upon memory size and algorithm speed of existing techniques. While pointerless approaches exploit the regular structure of the tree to facilitate efficient data access, their memory footprint becomes prohibitively large as the height of the tree increases. Pointerbased trees require memory consumption proportional to the number of tree nodes, thus exploiting the typical sparsity of large trees. Yet, their traversal is slowed by the need to follow explicit pointers across the different levels. Our solution is a pointerless approach that represents each tree level with its own matrix, as opposed to traditional pointerless trees that use only a single vector. This novel data organization allows us to fully exploit the tree's regular structure and improve the performance of tree operations. By using a sparse matrix data structure we obtain a representation that is suited for sparse and dense trees alike. In particular, it uses less total memory than pointer‐based trees even when the data set is extremely sparse. We show how our approach is easily implemented on the GPU and illustrate its performance in typical visualization scenarios.  相似文献   

19.
In this paper we review the traversal algorithms for kd‐trees for ray tracing. Ordinary traversal algorithms such as sequential, recursive, and those with neighbour‐links have different limitations, which led to several new developments within the last decade. We describe algorithms exploiting ray coherence and algorithms designed with specific hardware architecture limitations such as memory latency and consumption in mind. We also discuss the robustness of traversal algorithms as one issue that has been neglected in previous research.  相似文献   

20.
Stackless KD-Tree Traversal for High Performance GPU Ray Tracing   总被引:1,自引:1,他引:1  
Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts. In this paper we present a novel packet ray traversal implementation that completely eliminates the need for maintaining a stack during kd-tree traversal and that reduces the number of traversal steps per ray. While CPUs benefit moderately from the stackless approach, it improves GPU performance significantly. We achieve a peak performance of over 16 million rays per second for reasonably complex scenes, including complex shading and secondary rays. Several examples show that with this new technique GPUs can actually outperform equivalent CPU based ray tracers.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号