Found 20 similar documents; search took 31 ms.
We present a novel highly parallel method for optimizing bounding volume hierarchies (BVH) targeting contemporary GPU architectures. The core of our method is based on the insertion‐based BVH optimization that is known to achieve excellent results in terms of the SAH cost. The original algorithm is, however, inherently sequential: no efficient parallel version of the method exists, which limits its practical utility. We reformulate the algorithm while exploiting the observation that there is no need to remove the nodes from the BVH prior to finding their optimized positions in the tree. We can search for the optimized positions for all nodes in parallel while simultaneously tracking the corresponding SAH cost reduction. We update in parallel all nodes for which a better position was found while efficiently handling potential conflicts during these updates. We implemented our algorithm in CUDA and evaluated the resulting BVH in the context of GPU ray tracing. The results indicate that the method is able to achieve the best ray traversal performance among the state‐of‐the‐art GPU‐based BVH construction methods.
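To make the "SAH cost" that this abstract optimizes concrete, here is a minimal Python sketch of the standard surface area heuristic cost of a binary BVH. It is not taken from the paper: the `Node` layout, the function names, and the unit cost constants `c_trav` and `c_isect` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box plus two children or a leaf count."""
    lo: Vec3
    hi: Vec3
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    num_tris: int = 0  # meaningful only for leaves


def surface_area(lo: Vec3, hi: Vec3) -> float:
    """Surface area of an axis-aligned box."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(root: Node, c_trav: float = 1.0, c_isect: float = 1.0) -> float:
    """Sum the expected traversal and intersection work, weighting every node
    by its surface area relative to the root (the usual SAH cost model)."""
    root_area = surface_area(root.lo, root.hi)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        rel_area = surface_area(node.lo, node.hi) / root_area
        if node.left is None:
            # Leaf: pay one intersection test per triangle.
            total += c_isect * rel_area * node.num_tris
        else:
            # Interior node: pay one traversal step, then visit both children.
            total += c_trav * rel_area
            stack.extend((node.left, node.right))
    return total
```

An insertion-based optimizer of the kind described above would accept a node move only if it lowers this quantity; how the paper tracks that reduction in parallel is not shown here.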

We propose a novel algorithm for construction of bounding volume hierarchies (BVHs) for multi‐core CPU architectures. The algorithm constructs the BVH by a divisive top‐down approach using a progressively refined cut of an existing auxiliary BVH. We propose a new strategy for refining the cut that significantly reduces the workload of individual steps of BVH construction. Additionally, we propose a new method for integrating spatial splits into the BVH construction algorithm. The auxiliary BVH is constructed using a very fast method such as LBVH based on Morton codes. We show that the method provides a very good trade‐off between the build time and ray tracing performance. We evaluated the method within the Embree ray tracing framework and show that it compares favorably with the Embree BVH builders regarding build time while maintaining comparable ray tracing speed.
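The Morton codes mentioned above order primitives along a space-filling curve, which is what makes an LBVH-style auxiliary build fast. The following Python sketch shows one common way to compute a 30-bit 3D Morton code; it assumes coordinates already normalized to [0, 1], and the function names and the 10-bits-per-axis choice are illustrative assumptions rather than details from the paper.

```python
def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that bit i moves to bit 3*i."""
    out = 0
    for i in range(10):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def quantize(x: float) -> int:
    """Map a coordinate in [0, 1] to a 10-bit integer, clamping out-of-range input."""
    return min(max(int(x * 1024.0), 0), 1023)


def morton3d(x: float, y: float, z: float) -> int:
    """Interleave the quantized x, y, z bits into a single 30-bit key."""
    return (expand_bits(quantize(x)) << 2) | (expand_bits(quantize(y)) << 1) | expand_bits(quantize(z))


def sort_by_morton(centroids):
    """Return primitive indices ordered along the Morton curve."""
    return sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))
```

Sorting primitive centroids by this key places spatially close primitives next to each other, so a hierarchy can be derived from the sorted sequence.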

We present a novel, compact bounding volume hierarchy, TSS BVH, for ray tracing subdivision surfaces computed by the Catmull‐Clark scheme. We use Tetrahedron Swept Sphere (TSS) as a bounding volume to tightly bound limit surfaces of such subdivision surfaces given a user tolerance. Geometric coordinates defining our TSS bounding volumes are implicitly computed from the subdivided mesh via a simple vertex ordering method, and each level of our TSS BVH is associated with a single distance bound, utilizing the Catmull‐Clark scheme. These features result in a linear space complexity as a function of the tree depth, while many prior BVHs have exponential space complexity. We have tested our method against different benchmarks with path tracing and photon mapping. We found that our method achieves up to two orders of magnitude of memory reduction with a high culling ratio over the prior AABB BVH methods, when we represent models with two to four subdivision levels. Overall, our method achieves a threefold performance improvement as a result. These results rest on our theorem, which rigorously computes our TSS bounding volumes.

We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data‐parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth‐first frustum traversal through a bounding volume hierarchy for the scene, and localized ray‐primitive intersections. We utilize the well‐known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth‐first BVH traversal is based on parallel frustum‐bounding box intersection tests and a parallel scan per BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray‐primitive intersection routines our pipeline is ~3x faster than an optimized standard depth‐first ray tracing implemented in one kernel.
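The scan primitive mentioned above is typically used to compact irregular data: each element that survives a test learns its output slot from an exclusive prefix sum over the keep flags. The Python sketch below illustrates the idea sequentially; the function names are hypothetical, and a GPU version would run the scan and the scatter in parallel rather than in loops.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    out = [0] * len(flags)
    running = 0
    for i, f in enumerate(flags):
        out[i] = running
        running += f
    return out


def compact(items, keep):
    """Stream compaction: keep items whose flag is 1, preserving order.

    Each loop iteration writes to a distinct slot given by the scan, so the
    iterations are independent and could map to one GPU thread each.
    """
    offsets = exclusive_scan(keep)
    total = (offsets[-1] + keep[-1]) if keep else 0
    out = [None] * total
    for i, item in enumerate(items):
        if keep[i]:
            out[offsets[i]] = item
    return out
```

For a breadth-first traversal, `items` could be the nodes of one BVH level and `keep` the result of the frustum-box test, so that only intersected nodes proceed to the next level without any per-ray stack.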

Raytracing metaballs is a problem that has numerous applications in the rendering of dynamic soft objects such as fluids. However, current techniques are either limited in the visual effects that they can render or their performance drops as the number of metaballs and their density increase. We present a new acceleration structure based on BVH and kd‐tree for efficient raytracing of a large number of metaballs. This structure is built from an adapted SAH using a fast greedy algorithm and allows the visualization of several hundred thousand metaballs at interactive‐to‐real‐time framerates. Our method can handle arbitrary rays to simulate any complex secondary effects such as reflections or soft shadows, and is robust with respect to the density of metaballs. We achieve this performance thanks to a balanced CPU‐GPU (using CUDA) implementation of the animation, structure creation, and rendering.

In this paper we present a hybrid algorithm for building the bounding volume hierarchy (BVH) that is used in accelerating ray tracing of animated models. This algorithm precomputes densely packed clusters of triangles on surfaces. Following that, a set of clusters is used to rebuild the BVH in every frame. Our approach utilizes the assumption that groups of connected triangles remain connected throughout the course of the animation. We introduce a novel heuristic to create triangle clusters that are designed for high performance ray tracing. This heuristic combines the density of connectivity, geometric size and the shape of the cluster.
Our approach accelerates the BVH builder by an order of magnitude by rebuilding only the set of clusters, which is much smaller than the original set of triangles. The speed-up is achieved against a 'brute-force' BVH builder that repartitions all triangles in every frame of animation without using any pre-clustering. The rendering performance is not affected when a cluster contains a few dozen triangles. We demonstrate the real-time/interactive ray tracing performance for highly-dynamic complex models.

We propose a unified rendering approach that jointly handles motion and defocus blur for transparent and opaque objects at interactive frame rates. Our key idea is to create a sampled representation of all parts of the scene geometry that are potentially visible at any point in time for the duration of a frame in an initial rasterization step. We store the resulting temporally‐varying fragments (t‐fragments) in a bounding volume hierarchy which is rebuilt every frame using a fast spatial median construction algorithm. This makes our approach suitable for interactive applications with dynamic scenes and animations. Next, we perform spatial sampling to determine all t‐fragments that intersect with a specific viewing ray at any point in time. Viewing rays are sampled according to the lens uv‐sampling for depth‐of‐field effects. In a final temporal sampling step, we evaluate the predetermined viewing ray/t‐fragment intersections for one or multiple points in time. This allows us to incorporate all standard shading effects including transparency. We describe the overall framework, present our GPU implementation, and evaluate our rendering approach with respect to scalability, quality, and performance.

We present a new SAH‐guided approach to subdividing triangles as the scene is coarsely partitioned into smaller sets of spatially coherent triangles. Our triangle split approach is integrated into the partitioning stage of a fast BVH construction algorithm, but may as well be used as a standalone pre‐split pass. Our algorithm significantly reduces the number of split triangles compared to previous methods, while at the same time improving ray tracing performance compared to competing fast BVH construction techniques. We compare performance on Intel's Embree ray tracer and show that BVH construction with our splitting algorithm is always faster than Embree's pre‐split construction algorithm. We also show that our algorithm builds significantly improved quality trees that deliver higher ray tracing performance. Our algorithm is implemented in Embree's open source ray tracing framework, and the source code will be released in late 2015.

We present a new data structure for object space partitioning that can be represented completely implicitly. The bounds of each node in the tree structure are recreated at run‐time from the scene objects contained therein. By applying a presorting procedure to the geometry, only a known fraction of the geometry is needed to locate the bounding planes of any node. We evaluate the impact of the implicit bounding plane representation and compare our algorithm to a classic bounding volume hierarchy. Though the representation is completely implicit, we still achieve interactive frame rates on commodity hardware.

We present a novel method for massively parallel hierarchical scene processing on the GPU, which is based on sequential decomposition of the given hierarchical algorithm into small functional blocks. The computation is fully managed by the GPU using a specialized task pool which facilitates synchronization and communication of processing units. We present two applications of the proposed approach: construction of the bounding volume hierarchies and collision detection based on divide‐and‐conquer ray tracing. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization.

We present a performance comparison of bounding volume hierarchies and kd‐trees for ray tracing on many‐core architectures (GPUs). The comparison is focused on rendering times and traversal characteristics on the GPU using data structures that were optimized for very high performance of tracing rays. To achieve low rendering times, we extensively examine the constants used in termination criteria for the two data structures. We show that for a contemporary GPU architecture (NVIDIA Kepler) bounding volume hierarchies have higher ray tracing performance than kd‐trees for simple and moderately complex scenes. On the other hand, kd‐trees have higher performance for complex scenes, in particular for those with high depth complexity. Finally, we analyse the causes of the performance discrepancies using the profiling characteristics of the ray tracing kernels.

We present a photon mapping technique capable of computing high quality global illumination at interactive frame rates. By extending the concept of photon differentials to efficiently handle diffuse reflections, we generate footprints at all photon hit points. These enable illumination reconstruction by density estimation with variable kernel bandwidths without having to locate the k nearest photon hits first. Adapting an efficient BVH construction process for ray tracing acceleration, we build photon maps that enable the fast retrieval of all hits relevant to a shading point. We present a heuristic that automatically tunes the BVH build's termination criterion to the scene and illumination conditions. As all stages of the algorithm are highly parallelizable, we demonstrate an implementation using NVIDIA's CUDA manycore architecture running at interactive rates on a single GPU. Both light source and camera may be freely moved with global illumination fully recalculated in each frame.

We present novel parallel algorithms for collision detection and separation distance computation for rigid and deformable models that exploit the computational capabilities of many‐core GPUs. Our approach uses thread and data parallelism to perform fast hierarchy construction, updating, and traversal using tight‐fitting bounding volumes such as oriented bounding boxes (OBB) and rectangular swept spheres (RSS). We also describe efficient algorithms to compute a linear bounding volume hierarchy (LBVH) and update it using refitting methods. Moreover, we show that tight‐fitting bounding volume hierarchies offer improved performance on GPU‐like throughput architectures. We use our algorithms to perform discrete and continuous collision detection including self‐collisions, as well as separation distance computation between non‐overlapping models. In practice, our approach (gProximity) can perform these queries in a few milliseconds on a PC with an NVIDIA GTX 285 card on models composed of tens or hundreds of thousands of triangles used in cloth simulation, surgical simulation, virtual prototyping and N‐body simulation. Moreover, we observe more than an order of magnitude performance improvement over prior GPU‐based algorithms.
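Refitting, as mentioned above, keeps the tree topology fixed and only recomputes bounding volumes bottom-up after the geometry has moved. The Python sketch below shows this for axis-aligned boxes, which is a simplification: the paper uses OBB and RSS volumes, and the `BVHNode` layout and `refit` function here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BVHNode:
    """Hypothetical flat-array BVH node; prim >= 0 marks a leaf."""
    left: int = -1
    right: int = -1
    prim: int = -1
    lo: tuple = (0.0, 0.0, 0.0)
    hi: tuple = (0.0, 0.0, 0.0)


def refit(nodes, prim_boxes, root=0):
    """Recompute every node's box from the updated primitive boxes.

    A pre-order walk records parents before their children, so processing
    that order in reverse guarantees children are refitted first.
    """
    order = []
    stack = [root]
    while stack:
        i = stack.pop()
        order.append(i)
        if nodes[i].prim < 0:
            stack.extend((nodes[i].left, nodes[i].right))

    for i in reversed(order):
        node = nodes[i]
        if node.prim >= 0:
            # Leaf: take the box of the moved primitive.
            node.lo, node.hi = prim_boxes[node.prim]
        else:
            # Interior node: union of the two child boxes.
            l, r = nodes[node.left], nodes[node.right]
            node.lo = tuple(min(a, b) for a, b in zip(l.lo, r.lo))
            node.hi = tuple(max(a, b) for a, b in zip(l.hi, r.hi))
```

Refitting costs time linear in the number of nodes, which is why it suits deforming models, although tree quality can degrade when the motion is large.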

Modern supercomputers enable increasingly large N‐body simulations using unstructured point data. The structures implied by these points can be reconstructed implicitly. Direct volume rendering of radial basis function (RBF) kernels in domain‐space offers flexible classification and robust feature reconstruction, but achieving performant RBF volume rendering remains a challenge for existing methods on both CPUs and accelerators. In this paper, we present a fast CPU method for direct volume rendering of particle data with RBF kernels. We propose a novel two‐pass algorithm: first sampling the RBF field using coherent bounding hierarchy traversal, then subsequently integrating samples along ray segments. Our approach performs interactively for a range of data sets from molecular dynamics and astrophysics up to 82 million particles. It does not rely on level of detail or subsampling, and offers better reconstruction quality than structured volume rendering of the same data, exhibiting comparable performance and requiring no additional preprocessing or memory footprint other than the BVH. Lastly, our technique enables multi‐field, multi‐material classification of particle data, providing better insight and analysis.

In this paper, we present a new approach for shape‐grammar‐based generation and rendering of huge cities in real‐time on the graphics processing unit (GPU). Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real‐time directly on the GPU. We also present a robust and efficient way to dynamically update a scene's derivation tree and geometry, enabling us to exploit frame‐to‐frame coherence. Our combined generation and rendering is significantly faster than all previous work. For detailed scenes, we are capable of generating geometry more rapidly than even just copying pregenerated data from main memory, enabling us to render cities with thousands of buildings at up to 100 frames per second, even with the camera moving at supersonic speed.

We present an efficient algorithm for building an adaptive bounding volume hierarchy (BVH) in linear time on commodity graphics hardware using CUDA. BVHs are widely used as an acceleration data structure to quickly ray trace animated polygonal scenes. We accelerate the construction process with auxiliary grids that help us build high quality BVHs with SAH in O(k·n). We partition scene triangles and build a temporary grid structure only once. We also handle non-uniformly tessellated and long/thin triangles that we split into several triangle references with tight bounding box approximations. We make no assumptions on the type of geometry or animation motion. However, our algorithm takes advantage of coherent geometry layout and coherent frame-by-frame motion. We demonstrate the performance and quality of resulting BVHs that are built quickly with good spatial partitioning.

In this paper we propose a simple but effective method to modify a BVH based on ray distribution for improved ray tracing performance. Our method starts with an initial BVH generated by any state‐of‐the‐art offline algorithm. Then by traversing a small set of sample rays we collect statistics at each node of the BVH. Finally, a simple but ultra‐fast BVH contraction algorithm modifies the initial binary BVH to a multi‐way BVH. The overall acceleration for ray‐primitive testing is about 25% for incoherent diffuse rays and 30% for shadow rays, which is significant as a data structure optimization. Similar results are also presented for packet ray tracing, and for Quad‐BVHs the improvement is 10% to 15%. The approach has the advantages of being simple and compatible with almost any existing BVH and ray tracing techniques, and it requires very little extra work to generate the modified tree.

The generation of inbetween frames that interpolate a given set of key frames is a major component in the production of a 2D feature animation. Our objective is to considerably reduce the cost of the inbetweening phase by offering an intuitive and effective interactive environment that automates inbetweening when possible while allowing the artist to guide, complement, or override the results. Tight inbetweens, which interpolate similar key frames, are particularly time‐consuming and tedious to draw. Therefore, we focus on automating these high‐precision and expensive portions of the process. We have designed a set of user‐guided semi‐automatic techniques that fit well with current practice and minimize the number of required artist‐gestures. We present a novel technique for stroke interpolation from only two keys which combines a stroke motion constructed from logarithmic spiral vertex trajectories with a stroke deformation based on curvature averaging and twisting warps. We discuss our system in the context of a feature animation production environment and evaluate our approach with real production data.

Variable bit rate compression can achieve better quality and compression rates than fixed bit rate methods. Nonetheless, GPU texturing uses lossy fixed bit rate methods like DXT to allow random access and on‐the‐fly decompression during rendering. Changes in games and GPUs since DXT was developed make its compression artifacts less acceptable, and texture bandwidth less of an issue, but texture size is a serious and growing problem. Games use a large total volume of texture data, but have a much smaller active set. We present a new paradigm that separates GPU decompression from rendering. Rendering is from uncompressed data, avoiding the need for random access decompression. We demonstrate this paradigm with a new variable bit rate lossy texture compression algorithm that is well suited to the GPU, including a new GPU‐friendly formulation of range decoding, and a new texture compression scheme averaging 12.4:1 lossy compression ratio on 471 real game textures with a quality level similar to traditional DXT compression. The total game texture set is stored in the GPU in compressed form, and decompressed for use in a fraction of a second per scene.

For ray tracing based methods, traversing a hierarchical acceleration data structure takes up a substantial portion of the total rendering time. We propose an additional data structure which cuts off large parts of the hierarchical traversal. We use the idea of ray classification combined with the hierarchical scene representation provided by a bounding volume hierarchy. We precompute short arrays of indices to subtrees inside the hierarchy and use them to initiate the traversal for a given ray class. This arrangement is compact enough to be cache‐friendly, preventing the method from negating its traversal gains by excessive memory traffic. The method is easy to use with existing renderers, which we demonstrate by integrating it into the PBRT renderer. The proposed technique reduces the number of traversal steps by 42% on average, saving around 15% of the time spent finding ray‐scene intersections on average.
