Similar literature
 20 similar documents found (search time: 31 ms)
1.
Special relativistic visualization offers the possibility of experiencing the optical effects of traveling near the speed of light, including apparent geometric distortions as well as Doppler and searchlight effects. Early high-quality computer graphics images of relativistic scenes were created using offline, computationally expensive CPU-side 4D ray tracing. Alternative approaches such as image-based rendering and polygon-distortion methods are able to achieve interactivity, but exhibit inferior visual quality due to sampling artifacts. In this paper, we introduce a hybrid rendering technique based on polygon distortion and local ray tracing that facilitates interactive high-quality visualization of multiple objects moving at relativistic speeds in arbitrary directions. The method starts by calculating tight image-space footprints for the apparent triangles of the 3D scene objects. The final image is generated using a single image-space ray tracing step incorporating Doppler and searchlight effects. Our implementation uses GPU shader programming and hardware texture filtering to achieve high rendering speed.
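The Doppler and searchlight effects mentioned in this abstract follow from standard special-relativistic relations. The sketch below is only an illustration of those textbook formulas, not the paper's shader code; the function names and the convention for `cos_theta` (cosine of the angle, in the observer's frame, between the source velocity and the direction from the source to the observer) are assumptions made here.

```python
import math


def doppler_factor(beta: float, cos_theta: float) -> float:
    """Return D = 1 / (gamma * (1 - beta * cos_theta)).

    beta is the source speed as a fraction of the speed of light.
    The observed frequency is D times the emitted frequency.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return 1.0 / (gamma * (1.0 - beta * cos_theta))


def searchlight_scale(beta: float, cos_theta: float) -> float:
    """Scale factor for wavelength-integrated radiance, which goes as D**4."""
    return doppler_factor(beta, cos_theta) ** 4
```

For a source approaching head-on at 0.6c, `doppler_factor(0.6, 1.0)` is about 2, so frequencies double (blueshift) and the integrated radiance grows by roughly a factor of 16.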

2.
This paper proposes an adaptive rendering technique for ray‐bundle tracing. Ray‐bundle tracing can be done by per‐pixel linked‐list construction on a GPU rasterization pipeline. This rasterization based approach offers significant benefits for the efficient generation of light maps (e.g., hardware acceleration, tessellation, and recycling of shaders used in real‐time graphics). However, it is inapplicable to large and complex scenes due to the limited capacity of the GPU memory because it requires a high‐resolution frame buffer and high‐capacity node buffer for the linked‐lists. In addition, memory overflow can potentially occur on the per‐pixel linked‐list since the memory usage of the lists is usually unknown before the rendering process. We introduce an adaptive tiling technique with memory usage prediction. Our method uses an appropriately tiled frame buffer, thus eliminating almost all of the overflow risks thanks to our adaptive tile subdivision scheme. Using this technique, we are able to render high‐quality light maps of large and complex scenes which cannot be computed using previous ray‐bundle based methods.

3.
We present a novel highly parallel method for optimizing bounding volume hierarchies (BVH) targeting contemporary GPU architectures. The core of our method is based on the insertion‐based BVH optimization that is known to achieve excellent results in terms of the SAH cost. The original algorithm is, however, inherently sequential: no efficient parallel version of the method exists, which limits its practical utility. We reformulate the algorithm while exploiting the observation that there is no need to remove the nodes from the BVH prior to finding their optimized positions in the tree. We can search for the optimized positions for all nodes in parallel while simultaneously tracking the corresponding SAH cost reduction. We update in parallel all nodes for which a better position was found while efficiently handling potential conflicts during these updates. We implemented our algorithm in CUDA and evaluated the resulting BVH in the context of GPU ray tracing. The results indicate that the method is able to achieve the best ray traversal performance among the state‐of‐the‐art GPU‐based BVH construction methods.
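The SAH cost that this optimization targets can be evaluated for any given BVH. The following is a minimal sketch of the commonly used formulation (traversal cost for inner nodes plus intersection cost for leaves, each weighted by the node's surface area relative to the root). The `Node` class, the function names, and the default cost constants are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                      # minimum corner of the node's bounding box
    hi: Vec3                      # maximum corner of the node's bounding box
    children: List["Node"] = field(default_factory=list)
    prim_count: int = 0           # number of primitives (leaves only)


def surface_area(node: Node) -> float:
    """Surface area of the node's axis-aligned bounding box."""
    dx, dy, dz = (h - l for l, h in zip(node.lo, node.hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(root: Node, c_traverse: float = 1.0, c_intersect: float = 1.0) -> float:
    """Sum area-weighted traversal and intersection costs over the tree."""
    root_area = surface_area(root)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        rel_area = surface_area(node) / root_area
        if node.children:
            total += c_traverse * rel_area
            stack.extend(node.children)
        else:
            total += c_intersect * rel_area * node.prim_count
    return total
```

A lower value predicts cheaper ray traversal, which is why insertion-based optimizers accept a node move only when it reduces this sum.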

4.
We present an efficient algorithm for object‐space proximity queries between multiple deformable triangular meshes. Our approach uses the rasterization capabilities of the GPU to produce an image‐space representation of the vertices. Using this image‐space representation, inter‐object vertex‐triangle distances and closest points lying under a user‐defined threshold are computed in parallel by conservative rasterization of bounding primitives and sorted using atomic operations. We additionally introduce a similar technique to detect penetrating vertices. We show how mechanisms of modern GPUs such as mipmapping, Early‐Z and Early‐Stencil culling can optimize the performance of our method. Our algorithm is able to compute dense proximity information for complex scenes made of more than a hundred thousand triangles in real time, outperforming a CPU implementation based on bounding volume hierarchies by more than an order of magnitude.

5.
We present a performance comparison of bounding volume hierarchies and kd‐trees for ray tracing on many‐core architectures (GPUs). The comparison is focused on rendering times and traversal characteristics on the GPU using data structures that were optimized for very high performance of tracing rays. To achieve low rendering times, we extensively examine the constants used in termination criteria for the two data structures. We show that for a contemporary GPU architecture (NVIDIA Kepler) bounding volume hierarchies have higher ray tracing performance than kd‐trees for simple and moderately complex scenes. On the other hand, kd‐trees have higher performance for complex scenes, in particular for those with high depth complexity. Finally, we analyse the causes of the performance discrepancies using the profiling characteristics of the ray tracing kernels.

6.
Hierarchical culling is a key acceleration technique used to efficiently handle massive models for ray tracing, collision detection, etc. To support such hierarchical culling, bounding volume hierarchies (BVHs) combined with meshes are widely used. However, BVHs may require a very large amount of memory space, which can negate the benefits of using BVHs. To address this problem, we present a novel hierarchical‐culling oriented compact mesh representation, HCCMesh, which tightly integrates a mesh and a BVH together. As an in‐core representation of the HCCMesh, we propose an i‐HCCMesh representation that provides an efficient random hierarchical traversal and high culling efficiency with a small runtime decompression overhead. To further reduce the storage requirement, the in‐core representation is compressed to our out‐of‐core representation, o‐HCCMesh, by using a simple dictionary‐based compression method. At runtime, o‐HCCMeshes are fetched from an external drive and decompressed to the i‐HCCMeshes stored in main memory. The i‐HCCMesh and o‐HCCMesh show 3.6:1 and 10.4:1 compression ratios on average, compared to a naively compressed (e.g., quantized) mesh and BVH representation. We test the HCCMesh representations with ray tracing, collision detection, photon mapping, and non‐photorealistic rendering. Because of the reduced data access time, a smaller working set size, and a low runtime decompression overhead, we can handle models ten times larger on commodity hardware without the expensive disk I/O thrashing. When we avoid the disk I/O thrashing using our representation, we can improve the runtime performance by up to two orders of magnitude over using a naively compressed representation.

7.
We propose an efficient and robust image‐space denoising method for noisy images generated by Monte Carlo ray tracing methods. Our method is based on two new concepts: virtual flash images and homogeneous pixels. Inspired by recent developments in flash photography, virtual flash images emulate photographs taken with a flash, to capture various features of rendered images without taking additional samples. Using a virtual flash image as an edge‐stopping function, our method can preserve image features that existing edge‐stopping functions alone, such as normals and depth values, do not capture well. While denoising each pixel, we consider only homogeneous pixels, that is, pixels that are statistically equivalent to each other. This makes it possible to define a stochastic error bound of our method, and this bound goes to zero as the number of ray samples goes to infinity, irrespective of denoising parameters. To highlight the benefits of our method, we apply it to two Monte Carlo ray tracing methods, photon mapping and path tracing, with various input scenes. We demonstrate that using virtual flash images and homogeneous pixels with a standard denoising method outperforms state‐of‐the‐art image‐space denoising methods.

8.
Beam tracing combines the flexibility of ray tracing and the speed of polygon rasterization. However, beam tracing so far only handles linear transformations; thus, it is only applicable to linear effects such as planar mirror reflections but not to non‐linear effects such as curved mirror reflection, refraction, caustics and shadows. In this paper, we introduce non‐linear beam tracing to render these non‐linear effects. Non‐linear beam tracing is highly challenging because commodity graphics hardware supports only linear vertex transformation and triangle rasterization. We overcome this difficulty by designing a non‐linear graphics pipeline and implementing it on top of a commodity GPU. This allows beams to be non‐linear where rays within the same beam do not have to be parallel or intersect at a single point. Using these non‐linear beams, real‐time GPU applications can render secondary rays via polygon streaming similar to how they render primary rays. A major strength of this methodology is that it naturally supports fully dynamic scenes without the need to pre‐store a scene database. Utilizing our approach, non‐linear ray tracing effects can be rendered in real‐time on a commodity GPU under a unified framework.

9.
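The computing power of current GPUs makes it increasingly feasible to run parallel ray tracing algorithms for real-time scenes partitioned with kD-Trees. Graphics processing units (GPUs) are highly efficient at polygon rendering, and the programmability of their internal units has led to their wide use in areas beyond polygon rendering. This paper describes in detail a kD-Tree traversal algorithm implemented in OpenCL, improves the intersection tests that account for most of the computation, and increases the utilization of both GPU compute capability and memory, thereby improving the efficiency of the ray tracing algorithm.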

10.
We propose a method for creating a bounding volume hierarchy (BVH) that is optimized for all frames of a given animated scene. The method is based on a novel extension of the surface area heuristic to the temporal domain (T‐SAH). We perform iterative BVH optimization using T‐SAH and create a single BVH accounting for scene geometry distribution at different frames of the animation. Having a single optimized BVH for the whole animation makes our method extremely easy to integrate into any application using BVHs, limiting the per‐frame overhead only to refitting the bounding volumes. We evaluated the T‐SAH optimized BVHs in the scope of real‐time GPU ray tracing. We demonstrate that our method can handle even highly complex inputs with large deformations and significant topology changes. The results show that in a vast majority of tested scenes our method provides significantly better run‐time performance than traditional SAH and also better performance than GPU based per‐frame BVH rebuild.

11.
Image space photon mapping has the advantage of simple implementation on GPU without pre‐computation of complex acceleration structures. However, existing approaches use only a single image for tracing caustic photons, so they are limited to computing only a part of the global illumination effects for very simple scenes. In this paper we fully extend the image space approach by using multiple environment maps for photon mapping computation to achieve interactive global illumination of dynamic complex scenes. The two key problems due to the introduction of multiple images are 1) selecting the images to ensure adequate scene coverage; and 2) reliably computing ray‐geometry intersections with multiple images. We present effective solutions to these problems and show that, with multiple environment maps, the image‐space photon mapping approach can achieve interactive global illumination of dynamic complex scenes. The advantages of the method are demonstrated by comparison with other existing interactive global illumination methods.

12.
At each shade point, the spherical visibility function encodes occlusion from surrounding geometry, in all directions. Computing this function is difficult and point‐sampling approaches, such as ray‐tracing or hardware shadow mapping, are traditionally used to efficiently approximate it. We propose a semi‐analytic solution to the problem where the spherical silhouette of the visibility is computed using a search over a 4D dual mesh of the scene. Once computed, we are able to semi‐analytically integrate visibility‐masked spherical functions along the visibility silhouette, instead of over the entire hemisphere. In this way, we avoid the artefacts that arise from using point‐sampling strategies to integrate visibility, a function with unbounded frequency content. We demonstrate our approach on several applications, including direct illumination from realistic lighting and computation of pre‐computed radiance transfer data. Additionally, we present a new frequency‐space method for exactly computing all‐frequency shadows on diffuse surfaces. Our results match ground truth computed using importance‐sampled stratified Monte Carlo ray‐tracing, with comparable performance on scenes with low‐to‐moderate geometric complexity.

13.
We describe a new technique for coherent out‐of‐core point‐based global illumination and ambient occlusion. Point‐based global illumination (PBGI) is used in production to render tremendously complex scenes, so in‐core storage of point and octree data structures quickly becomes a problem. However, a simple out‐of‐core extension of a classical top‐down octree building algorithm would be extremely inefficient due to the large amount of I/O required. Our method extends previous PBGI algorithms with an out‐of‐core technique that uses minimal I/O and stores data on disk compactly and in coherent chunks for later access during shading. Using properties of a space‐filling Z‐curve, we are able to preprocess the data in two passes: an external ID‐sort and an octree construction pass.
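The space‐filling Z‐curve referred to here is usually realized with Morton codes, which interleave the bits of quantized coordinates so that sorting by the code groups spatially nearby points. The sketch below shows that generic encoding only; the function name `morton3d` and the 10‐bit default are assumptions made for illustration, and the paper's actual ID format is not given in the abstract.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-curve) code.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


# Sorting integer grid points by their Morton code orders them along the Z-curve.
points = [(3, 3, 3), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
ordered = sorted(points, key=lambda p: morton3d(*p))
```

Because each group of three code bits selects one octant at one tree level, points sharing a code prefix fall into the same octree cell, which is what makes a sort-then-build pipeline possible.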

14.
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data‐parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth‐first frustum traversal through a bounding volume hierarchy for the scene, and localized ray‐primitive intersections. We utilize the well‐known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth‐first BVH traversal is based on parallel frustum‐bounding box intersection tests and a parallel scan per BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray‐primitive intersection routines our pipeline is ~3x faster than an optimized standard depth first ray tracing implemented in one kernel.
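The scan primitive named in this abstract is commonly used for stream compaction: an exclusive prefix sum over keep/discard flags gives every surviving element its output slot, so no stack or serial queue is needed. The following is a sequential reference sketch of that idea in Python, not the paper's CUDA kernels; `exclusive_scan` and `compact` are names chosen here for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def exclusive_scan(values: Sequence[int]) -> List[int]:
    """out[i] is the sum of values[0..i-1]; out[0] is 0."""
    out: List[int] = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact(items: Sequence[T], keep: Sequence[int]) -> List[T]:
    """Keep items whose flag is 1, writing each to the slot given by the scan."""
    offsets = exclusive_scan(keep)
    total = (offsets[-1] + keep[-1]) if keep else 0
    out: List[T] = [None] * total  # type: ignore[list-item]
    # On a GPU every iteration of this loop would be an independent thread.
    for i, (item, flag) in enumerate(zip(items, keep)):
        if flag:
            out[offsets[i]] = item
    return out
```

In a breadth‐first traversal, the flags would mark which frustum and node pairs passed the intersection test, and the compacted list becomes the work queue for the next BVH level.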

15.
We present an interactive GPU‐based algorithm for accurately rendering high‐quality, dynamic glossy reflection effects from both HDR environment maps and local scene objects. Our method uses hardware rasterization to produce primary pixels, and GPU‐based BRDF importance sampling [CK07] to quickly generate reflected rays. We utilize a fast GPU ray tracer proposed by Carr et al. [CHCH06] to compute reflection hits. Our main contribution is an adaptive level‐of‐detail (LOD) control algorithm that greatly improves ray tracing performance during reflection shading. Specifically, we use the solid angle represented by each reflected ray to adaptively pick the level of termination in the BVH traversal step during ray tracing. This leads to a 2x to 3x speedup over an unmodified implementation of [CHCH06]. Based on the same solid angle measure, we derive a texture filtering formula to reduce reflection aliasing artifacts, taking advantage of hardware MIP mapping. This extends the filtering algorithm presented in [CK07] from environment mapping to local scene reflection. Using our algorithm, we demonstrate interactive rendering rates for several scenes featuring dynamic lighting and material changes, spatially varying BRDF parameters, and rigid‐body object movement.

16.
Stackless KD-Tree Traversal for High Performance GPU Ray Tracing   Total citations: 1, self-citations: 1, citations by others: 1
Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts. In this paper we present a novel packet ray traversal implementation that completely eliminates the need for maintaining a stack during kd-tree traversal and that reduces the number of traversal steps per ray. While CPUs benefit moderately from the stackless approach, it improves GPU performance significantly. We achieve a peak performance of over 16 million rays per second for reasonably complex scenes, including complex shading and secondary rays. Several examples show that with this new technique GPUs can actually outperform equivalent CPU based ray tracers.

17.
This paper reviews the latest developments of displacement mapping algorithms implemented on the vertex, geometry, and fragment shaders of graphics cards. Displacement mapping algorithms are classified as per‐vertex and per‐pixel methods. Per‐pixel approaches are further categorized into safe algorithms that aim at correct solutions in all cases, unsafe techniques that may fail in extreme cases but are usually much faster than safe algorithms, and combined methods that exploit the robustness of safe and the speed of unsafe techniques. We discuss the possible roles of vertex, geometry and fragment shaders in implementing these algorithms. Then the particular GPU‐based bump, parallax, relief, sphere, horizon mapping, cone stepping, local ray tracing, pyramidal and view‐dependent displacement mapping methods, as well as their numerous variations, are reviewed, along with implementation details of the shader programs. We present these methods using uniform notation and also point out where different authors used different names for similar concepts. In addition to basic displacement mapping, self‐shadowing and silhouette processing are also reviewed. Based on the experience gained from reimplementing these methods, we compare their performance and quality and present their advantages and disadvantages fairly.

18.
This paper shows that it is possible to efficiently break the barrier of a 1 triangle/clock rasterization rate for microtriangles in modern GPU architectures. The fixed throughput of the special purpose culling and triangle setup stages of the classic pipeline limits the GPU scalability to rasterize many triangles in parallel when these cover very few pixels. In contrast, the shader core counts and increasing GFLOPs in modern GPUs clearly suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. In this paper, we present a very efficient SIMD-like rasterization code targeted at very small triangles that scales very well with the number of shader cores and has higher performance than traditional edge equation based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241, 2006) with two fixed point instructions to meet the rasterization precision requirement. This paper also introduces a novel subpixel Bounding Box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the 2×2-pixel stamp test efficiency. The proposed shader rasterization program can run on top of the original pixel shader program in such a way that selected fragments are rasterized, attribute interpolated and pixel shaded in the same pass. Our results show that our technique yields better performance than a classic rasterizer at 8 or more shader cores, with speedups as high as 4× for 16 shader cores.
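For context, the traditional edge equation approach that this abstract uses as a baseline can be sketched in a few lines: test each pixel center inside the triangle's bounding box against three edge functions. This is a generic textbook illustration, not the paper's SIMD shader code or its subpixel bounding box optimization; the function names, the pixel-center convention (x + 0.5, y + 0.5), and the absence of a tie-breaking fill rule are all simplifying assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area term: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(a: Point, b: Point, c: Point) -> List[Tuple[int, int]]:
    """Return integer pixel coordinates whose centers lie inside the triangle."""
    if edge(a, b, c) < 0:          # enforce counter-clockwise winding
        b, c = c, b
    x0 = math.floor(min(a[0], b[0], c[0]))
    x1 = math.ceil(max(a[0], b[0], c[0]))
    y0 = math.floor(min(a[1], b[1], c[1]))
    y1 = math.ceil(max(a[1], b[1], c[1]))
    covered = []
    for y in range(y0, y1):        # visit only pixels inside the bounding box
        for x in range(x0, x1):
            p = (x + 0.5, y + 0.5)
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                covered.append((x, y))
    return covered
```

For a microtriangle the bounding box holds only a handful of pixels, so the per-triangle setup work dominates the per-pixel tests, which is the imbalance the abstract describes.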

19.
We present a real‐time rendering algorithm for inhomogeneous, single scattering media, where all‐frequency shading effects such as glows, light shafts, and volumetric shadows can all be captured. The algorithm first computes source radiance at a small number of sample points in the medium, then interpolates these values at other points in the volume using a gradient‐based scheme that is efficiently applied by sample splatting. The sample points are dynamically determined based on a recursive sample splitting procedure that adapts the number and locations of sample points for accurate and efficient reproduction of shading variations in the medium. The entire pipeline can be easily implemented on the GPU to achieve real‐time performance for dynamic lighting and scenes. Rendering results of our method are shown to be comparable to those from ray tracing.

20.
Voxel‐based approaches are today's standard to encode volume data. Recently, directed acyclic graphs (DAGs) were successfully used for compressing sparse voxel scenes as well, but they are restricted to a single bit of (geometry) information per voxel. We present a method to compress arbitrary data, such as colors, normals, or reflectance information. By decoupling geometry and voxel data via a novel mapping scheme, we are able to apply the DAG principle to encode the topology, while using a palette‐based compression for the voxel attributes, leading to a drastic memory reduction. Our method outperforms existing state‐of‐the‐art techniques and is well‐suited for GPU architectures. We achieve real‐time performance on commodity hardware for colored scenes with up to 17 hierarchical levels (a 128K³ voxel resolution), which are stored fully in core.
