Similar Documents
1.
We introduce a novel flexible approach to spatiotemporal exploration of rectilinear scalar volumes. Our out‐of‐core representation, based on per‐frame levels of hierarchically tiled non‐redundant 3D grids, efficiently supports spatiotemporal random access and streaming to the GPU in compressed formats. A novel low‐bitrate codec, which stores into fixed‐size pages a variable‐rate approximation based on sparse coding with learned dictionaries, is exploited to meet stringent bandwidth constraints during time‐critical operations, while a near‐lossless representation is employed to support high‐quality static frame rendering. A flexible high‐speed GPU decoder and raycasting framework mixes and matches GPU kernels performing parallel object‐space and image‐space operations for seamless support, on fat and thin clients, of different exploration use cases, including animation and temporal browsing, dynamic exploration of single frames, and high‐quality snapshots generated from near‐lossless data. The quality and performance of our approach are demonstrated on large data sets with thousands of multi‐billion‐voxel frames.
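To illustrate the sparse-coding idea only: the sketch below approximates a small block as a few weighted dictionary atoms using plain matching pursuit. The Haar-like dictionary used in the test, the sparsity budget k, and the function names are assumptions; dictionary learning, the paper's actual encoder, and its fixed-size page layout are not shown.

```python
from typing import List, Tuple

Vector = List[float]

# Illustrative sketch: plain matching pursuit over a fixed, unit-norm dictionary.
# Assumption: this stands in for the paper's sparse coder, which is not reproduced.

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal: Vector, atoms: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Greedily pick up to k (atom index, coefficient) pairs; atoms must have unit norm."""
    residual = list(signal)
    code: List[Tuple[int, float]] = []
    for _ in range(k):
        # Choose the atom most correlated with the part of the signal not yet explained.
        best = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        coeff = dot(residual, atoms[best])
        if coeff == 0.0:
            break
        code.append((best, coeff))
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
    return code

def reconstruct(code: List[Tuple[int, float]], atoms: List[Vector], n: int) -> Vector:
    """Sum the selected atoms weighted by their coefficients."""
    out = [0.0] * n
    for idx, coeff in code:
        out = [o + coeff * a for o, a in zip(out, atoms[idx])]
    return out
```

In a scheme like this only the (index, coefficient) pairs are stored, so smooth blocks can be coded with fewer atoms than detailed ones, which is one way a variable-rate approximation arises.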

2.
We introduce efficient, large‐scale fluid simulation on GPU hardware using the fluid‐implicit particle (FLIP) method over a sparse hierarchy of grids represented in NVIDIA® GVDB Voxels. Our approach handles tens of millions of particles within a virtually unbounded simulation domain. We describe novel techniques for parallel sparse grid hierarchy construction and fast incremental updates on the GPU for moving particles. In addition, our FLIP technique introduces sparse, work‐efficient parallel data gathering from particles to voxels, and a matrix‐free GPU‐based conjugate gradient solver optimized for sparse grids. Our results show that our method can run simulations up to an order of magnitude faster on the GPU than FLIP simulations running on the CPU.
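"Matrix‐free" means the solver never stores the system matrix; it only calls a function that applies the operator to a vector. The CPU sketch below solves a 1D pressure‐like Poisson problem (the [-1, 2, -1] stencil with zero Dirichlet boundaries) this way. The grid, operator, and tolerance are illustrative assumptions, not the paper's sparse‐grid GPU solver.

```python
from typing import Callable, List

Vec = List[float]

# Illustrative sketch: conjugate gradient that only needs a function computing A @ p.
# Assumption: a 1D Laplacian stands in for the paper's sparse-grid pressure operator.

def apply_neg_laplacian(p: Vec) -> Vec:
    """A p for the 1D stencil [-1, 2, -1] with zero Dirichlet boundaries, never forming A."""
    n = len(p)
    out = []
    for i in range(n):
        left = p[i - 1] if i > 0 else 0.0
        right = p[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * p[i] - left - right)
    return out

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(apply_a: Callable[[Vec], Vec], b: Vec,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vec:
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x, with x starting at zero
    d = list(r)   # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ad = apply_a(d)
        alpha = rr / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x
```

Because the operator is just a function, the same solver loop works for any symmetric positive definite stencil, including one evaluated only over active sparse‐grid cells.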

3.
In this paper, we present the first algorithm for progressive sampling of 3D surfaces with blue noise characteristics that runs entirely on the GPU. The performance of our algorithm is comparable to state‐of‐the‐art GPU Poisson‐disk sampling methods, while additionally producing ordered sequences of samples in which every prefix exhibits good blue noise properties. The basic idea is to reduce the 3D sampling domain to a set of 2.5D images, which we sample in parallel using the rasterization hardware of current GPUs. This allows for simple visibility‐aware sampling that only captures the surface as seen from outside the sampled object, which is especially useful for point‐based level‐of‐detail rendering methods. However, our method can easily be extended to sample the entire surface without changing the basic algorithm. We provide a statistical analysis showing that our algorithm produces good blue noise characteristics for every prefix of the resulting sample sequence, and we compare its performance with related state‐of‐the‐art sampling methods.
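The key property is that every prefix of the output is well distributed. A simple way to produce such an ordering, shown below as a CPU stand‐in and not the authors' rasterization‐based GPU method, is greedy farthest‐point sampling over a set of candidate points: each new sample is the candidate farthest from all samples chosen so far, so the insertion radius never grows. The candidate set and start index are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

# Illustrative sketch: farthest-point ordering of candidate surface points.
# Assumption: candidates are given; the paper instead samples rasterized 2.5D images.

def farthest_point_order(candidates: List[Point], count: int,
                         start: int = 0) -> Tuple[List[int], List[float]]:
    """Return sample indices in progressive order and each sample's insertion distance."""
    order = [start]
    radii = [math.inf]
    # Distance from every candidate to its nearest already-selected sample.
    nearest = [math.dist(p, candidates[start]) for p in candidates]
    while len(order) < min(count, len(candidates)):
        nxt = max(range(len(candidates)), key=lambda i: nearest[i])
        order.append(nxt)
        radii.append(nearest[nxt])
        q = candidates[nxt]
        nearest = [min(d, math.dist(p, q)) for d, p in zip(nearest, candidates)]
    return order, radii
```

Since nearest-sample distances can only shrink as samples are added, the radii list is non‐increasing, so any prefix of the order keeps a guaranteed minimum spacing.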

4.
Bounding volume hierarchies (BVHs) have been widely adopted as the acceleration structure in broad‐phase collision detection. Previous state‐of‐the‐art BVH‐based collision detection approaches exploited the spatio‐temporal coherence of simulations by maintaining a bounding volume test tree (BVTT) front. A major drawback of these algorithms is that large deformations in the scenes decrease culling efficiency and slow down collision queries. Moreover, for front‐based methods, inefficient caching on the GPU caused by the arbitrary layout of BVH and BVTT front nodes becomes a critical performance issue. We present a fast and robust BVH‐based collision detection scheme on the GPU that addresses these problems by ordering and restructuring BVHs and BVTT fronts. Our techniques are based on histogram sort and an auxiliary structure, the BVTT front log, through which we analyze the dynamic status of the BVTT front and the BVH quality. Our approach efficiently handles inter‐ and intra‐object collisions and performs especially well in simulations with considerable spatio‐temporal coherence. The benchmark results demonstrate that our approach is significantly faster than the previous BVH‐based method and also outperforms other state‐of‐the‐art spatial subdivision schemes in terms of speed.
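Reordering the front with a histogram sort lets neighboring GPU threads touch nearby BVH nodes. A minimal CPU sketch of that idea follows, assuming each front entry is a pair of BVH node indices sorted by the first index with a stable counting (histogram) sort; the pair layout and key choice are assumptions, not the paper's data layout.

```python
from typing import List, Tuple

# Assumption: a front entry is (node index in BVH A, node index in BVH B).
FrontEntry = Tuple[int, int]

def histogram_sort(front: List[FrontEntry], num_nodes: int) -> List[FrontEntry]:
    """Stable counting sort of front entries by their first node index."""
    counts = [0] * num_nodes
    for a, _ in front:
        counts[a] += 1
    # Exclusive prefix sum turns per-node counts into output offsets.
    offsets = [0] * num_nodes
    running = 0
    for i, c in enumerate(counts):
        offsets[i] = running
        running += c
    out: List[FrontEntry] = [(0, 0)] * len(front)
    for entry in front:
        slot = offsets[entry[0]]
        out[slot] = entry
        offsets[entry[0]] = slot + 1
    return out
```

On a GPU, the counting pass would typically use atomic increments and the offsets a parallel prefix scan; the sequential version above only shows the data flow.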

5.
Cloth simulations, widely used in computer animation and apparel design, can be computationally expensive for real‐time applications. Several parallelization techniques have been proposed for visual simulation of cloth on CPU or GPU clusters; they often rely on spatial domain decomposition, which has a large communication overhead. In this paper, we propose a novel time‐domain parallelization technique that uses a two‐level mesh representation to resolve the time‐dependency issue, and we develop a practical algorithm to smooth the state transition from the coarse mesh to the corresponding fine mesh. Load estimation and load balancing techniques for online partitioning are also proposed to maximize the speedup. Our method achieves nearly linear performance scaling on manycore clusters and outperforms spatial‐domain parallelization on a diverse set of benchmarks.
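Time‐domain parallelization uses a cheap coarse model to predict future states, which expensive fine solves then correct in parallel. The classical parareal iteration has this predictor‐corrector structure; the sketch below applies it to the scalar ODE dy/dt = -y. The coarse and fine propagators (one vs. one hundred explicit Euler steps per slice) are assumptions standing in for the paper's coarse and fine cloth meshes, and this is parareal, not the authors' algorithm.

```python
from typing import Callable, List

Propagator = Callable[[float], float]

# Illustrative sketch of classical parareal; the propagators below are assumptions.

def euler(steps: int, dt_slice: float, rate: float = -1.0) -> Propagator:
    """Propagate y' = rate * y across one time slice using explicit Euler steps."""
    h = dt_slice / steps
    def step(y: float) -> float:
        for _ in range(steps):
            y = y + h * rate * y
        return y
    return step

def parareal(y0: float, slices: int, coarse: Propagator, fine: Propagator,
             iterations: int) -> List[float]:
    # Initial guess: one sequential sweep with the cheap coarse propagator.
    u = [y0]
    for n in range(slices):
        u.append(coarse(u[n]))
    for _ in range(iterations):
        # The fine solves are independent across slices: this is the parallel part.
        fine_vals = [fine(u[n]) for n in range(slices)]
        coarse_old = [coarse(u[n]) for n in range(slices)]
        new_u = [y0]
        for n in range(slices):
            # Sequential coarse prediction plus the fine-minus-coarse correction.
            new_u.append(coarse(new_u[n]) + fine_vals[n] - coarse_old[n])
        u = new_u
    return u
```

After k iterations the first k slices match the sequential fine solution, so speedup comes from converging in far fewer iterations than there are slices.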

6.
We introduce a novel method for interactive generation of visually consistent, snow‐covered landscapes and provide control of their dynamic evolution over time. Our main contribution is the real‐time phenomenological simulation of avalanches and other user‐guided events, such as tracks left by Nordic skiing, which can be applied to interactively sculpt the landscape. The terrain is modeled as a height field with additional layers for stable, compacted, unstable, and powdery snow, which in combination behave as a semi‐viscous fluid. We incorporate the impact of several phenomena, including sunlight, temperature, prevailing wind direction, and skiing activities. The snow evolution includes snow‐melt and snow‐drift, which affect the stability of the snow mass and the probability of avalanches. A user can shape landscapes and their evolution either with a variety of interactive brushes or by prescribing events along a winter‐season timeline. Our optimized GPU implementation allows interactive updates of snow type and depth across a large (10 × 10 km) terrain, including real‐time avalanches, making it suitable for visual assets in computer games. We evaluate our method through perceptual comparison against existing methods and real snow‐depth data.
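A common phenomenological way to make snow on a height field slide, sketched below in 1D, is a stability (angle‐of‐repose) rule: wherever the snow surface between neighbors is steeper than a threshold, snow moves downhill until the pair is just stable. The threshold, the single snow layer, and the sweep order are assumptions; the paper uses several snow layers and additional environmental factors.

```python
from typing import List

# Illustrative sketch: one snow layer on a 1D height field, relaxed by a slope threshold.
# Assumption: a simple talus rule stands in for the paper's avalanche model.

def relax_snow(ground: List[float], snow: List[float], max_slope: float,
               dx: float = 1.0, max_sweeps: int = 1000, eps: float = 1e-12) -> List[float]:
    snow = list(snow)
    limit = max_slope * dx  # largest stable height difference between neighbors
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(snow) - 1):
            diff = (ground[i] + snow[i]) - (ground[i + 1] + snow[i + 1])
            if abs(diff) <= limit + eps:
                continue
            src, dst = (i, i + 1) if diff > 0 else (i + 1, i)
            # Moving half the excess brings this pair exactly to the limit,
            # unless the source cell runs out of snow first (bare ground stays put).
            amount = min(snow[src], (abs(diff) - limit) / 2.0)
            if amount > 0.0:
                snow[src] -= amount
                snow[dst] += amount
                moved = True
        if not moved:
            break
    return snow
```

Snow is only moved between cells, never created or destroyed, so total snow mass is conserved while an unstable pile spreads out.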

7.
We propose a method for creating a bounding volume hierarchy (BVH) that is optimized for all frames of a given animated scene. The method is based on a novel extension of the surface area heuristic to the temporal domain (T‐SAH). We perform iterative BVH optimization using T‐SAH and create a single BVH accounting for the scene geometry distribution at different frames of the animation. Having a single optimized BVH for the whole animation makes our method extremely easy to integrate into any application using BVHs, limiting the per‐frame overhead to refitting the bounding volumes. We evaluated the T‐SAH optimized BVHs in the scope of real‐time GPU ray tracing. We demonstrate that our method can handle even highly complex inputs with large deformations and significant topology changes. The results show that in the vast majority of tested scenes our method provides significantly better run‐time performance than traditional SAH and also better performance than a GPU‐based per‐frame BVH rebuild.
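The surface area heuristic scores a split by the areas of the child boxes relative to the parent. One natural temporal extension, sketched below as an assumption about the general idea rather than the paper's exact T‐SAH formula, is to average the per‐frame SAH cost of the same split over all animation frames. The cost constants c_trav and c_isect are illustrative.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box as (min corner, max corner).
AABB = Tuple[Tuple[float, float, float], Tuple[float, float, float]]

def surface_area(box: AABB) -> float:
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(boxes: Sequence[AABB]) -> AABB:
    lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
    hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
    return (lo, hi)

def sah_cost(left: Sequence[AABB], right: Sequence[AABB],
             c_trav: float = 1.0, c_isect: float = 1.0) -> float:
    """Classic SAH: traversal cost plus area-weighted intersection cost of both children."""
    parent = surface_area(union(list(left) + list(right)))
    return (c_trav
            + c_isect * len(left) * surface_area(union(left)) / parent
            + c_isect * len(right) * surface_area(union(right)) / parent)

def temporal_sah_cost(frames_left: List[Sequence[AABB]],
                      frames_right: List[Sequence[AABB]]) -> float:
    """Assumption: average SAH cost of one fixed split, evaluated on every frame."""
    costs = [sah_cost(l, r) for l, r in zip(frames_left, frames_right)]
    return sum(costs) / len(costs)
```

A split that is excellent in one frame but poor when the children drift apart gets penalized, which is the intuition behind optimizing a single BVH for the whole animation.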

8.
We present a performance comparison of bounding volume hierarchies and kd‐trees for ray tracing on many‐core architectures (GPUs). The comparison is focused on rendering times and traversal characteristics on the GPU using data structures that were optimized for very high performance of tracing rays. To achieve low rendering times, we extensively examine the constants used in termination criteria for the two data structures. We show that for a contemporary GPU architecture (NVIDIA Kepler) bounding volume hierarchies have higher ray tracing performance than kd‐trees for simple and moderately complex scenes. On the other hand, kd‐trees have higher performance for complex scenes, in particular for those with high depth complexity. Finally, we analyse the causes of the performance discrepancies using the profiling characteristics of the ray tracing kernels.

9.
We present a level of detail (LOD) method designed for tree branches. It can be combined with methods for processing tree foliage to facilitate navigation through large virtual forests. Starting from a skeletal representation of a tree, we fit polygon meshes of various densities to the skeleton, adjusting the mesh density according to the required visual fidelity. For distant models, these branch meshes are gradually replaced with semi‐transparent lines until the tree recedes to a few lines. Construction of these complete LOD models is guided by error metrics to ensure smooth transitions between adjacent LOD models. We then present an instancing technique for discrete LOD branch models, consisting of polygon meshes plus semi‐transparent lines. Line models with different transparencies are instanced on the GPU by merging multiple tree samples into a single model. Our technique reduces the number of draw calls on the GPU and increases rendering performance. Our experiments demonstrate that large‐scale forest scenes can be rendered with excellent detail and shadows in real time.
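Instancing saves draw calls by batching all trees that currently use the same discrete LOD model. The sketch below shows that bookkeeping under two assumptions: LOD levels are chosen from hypothetical distance thresholds (the paper uses error metrics), and each non‐empty batch costs one instanced draw call.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch: distance thresholds are hypothetical stand-ins for error-metric LOD selection.

def select_lod(distance: float, thresholds: Sequence[float]) -> int:
    """Index of the first threshold the distance does not exceed; the coarsest LOD otherwise."""
    for level, limit in enumerate(thresholds):
        if distance <= limit:
            return level
    return len(thresholds)

def batch_instances(trees: List[Tuple[str, float]],
                    thresholds: Sequence[float]) -> Dict[int, List[str]]:
    """Group tree instances by LOD level; each group can be drawn with one instanced call."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for tree_id, distance in trees:
        batches[select_lod(distance, thresholds)].append(tree_id)
    return dict(batches)
```

With this grouping, the draw-call count depends on the number of LOD levels in view rather than on the number of trees.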

10.
We present a flexible and highly efficient hardware‐assisted volume renderer built on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is based exclusively on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the volume ray integral is evaluated in a fragment shader; hence, volume rendering is performed entirely on the GPU within a single pass through the pipeline. We apply a CUDA‐based visibility ordering, achieving rendering and sorting performance of over 6 M Tet/s for unstructured datasets. Furthermore, as each tetrahedron is processed independently, we employ a data‐parallel solution that is neither bound by GPU memory size nor reliant on auxiliary volume information. In addition, iso‐surfaces can be readily extracted during the rendering process, and time‐varying data are handled without extra burden.
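The fragment shader evaluates the volume ray integral over the segment where a viewing ray crosses one tetrahedron. Assuming constant extinction tau and emitted value over each segment (a simplification; PT variants often interpolate scalars linearly), a segment of length l has opacity 1 - exp(-tau * l), and visibility‐sorted fragments are blended front to back. The sketch below shows only that arithmetic; the names are illustrative.

```python
import math
from typing import List, Tuple

# Assumption: (extinction tau, segment length, emitted gray value), constant per segment.
Segment = Tuple[float, float, float]

def segment_opacity(tau: float, length: float) -> float:
    return 1.0 - math.exp(-tau * length)

def composite_front_to_back(segments: List[Segment]) -> Tuple[float, float]:
    """Accumulate (color, alpha) over segments that are already sorted near to far."""
    color, alpha = 0.0, 0.0
    for tau, length, value in segments:
        a = segment_opacity(tau, length)
        # Each segment is attenuated by the transparency accumulated in front of it.
        color += (1.0 - alpha) * a * value
        alpha += (1.0 - alpha) * a
    return color, alpha
```

This is why the visibility ordering matters: front‐to‐back blending is only correct if fragments arrive sorted by depth.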

11.
Variable bit rate compression can achieve better quality and compression rates than fixed bit rate methods. Nonetheless, GPU texturing uses lossy fixed bit rate methods such as DXT to allow random access and on‐the‐fly decompression during rendering. Changes in games and GPUs since DXT was developed make its compression artifacts less acceptable and texture bandwidth less of an issue, but texture size is a serious and growing problem. Games use a large total volume of texture data but have a much smaller active set. We present a new paradigm that separates GPU decompression from rendering. Rendering is from uncompressed data, avoiding the need for random‐access decompression. We demonstrate this paradigm with a new variable bit rate lossy texture compression algorithm that is well suited to the GPU, including a new GPU‐friendly formulation of range decoding, and a new texture compression scheme averaging a 12.4:1 lossy compression ratio on 471 real game textures with a quality level similar to traditional DXT compression. The total game texture set is stored on the GPU in compressed form and decompressed for use in a fraction of a second per scene.
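Range coding is a variant of arithmetic coding. As a compact stand‐in, the sketch below uses rANS (range asymmetric numeral systems), a closely related entropy coder whose decoder is likewise a short loop of integer operations. Python's unbounded integers let it skip renormalization, so this illustrates the principle only and is not the paper's range decoder; the symbol frequencies are assumed known to both encoder and decoder.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch: rANS without renormalization (swapped in for range coding).

def cumulative(freqs: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Start offset of each symbol's slot range, plus the total frequency M."""
    cum, total = {}, 0
    for sym in sorted(freqs):
        cum[sym] = total
        total += freqs[sym]
    return cum, total

def rans_encode(symbols: Sequence[int], freqs: Dict[int, int]) -> int:
    cum, m = cumulative(freqs)
    x = 1
    # Encode in reverse so that decoding yields symbols in the original order.
    for s in reversed(symbols):
        f = freqs[s]
        x = (x // f) * m + cum[s] + (x % f)
    return x

def rans_decode(x: int, count: int, freqs: Dict[int, int]) -> List[int]:
    cum, m = cumulative(freqs)
    out = []
    for _ in range(count):
        slot = x % m
        # Find the symbol whose slot range contains `slot` (a lookup table in practice).
        s = next(sym for sym in cum if cum[sym] <= slot < cum[sym] + freqs[sym])
        out.append(s)
        x = freqs[s] * (x // m) + slot - cum[s]
    return out
```

Frequent symbols grow the state x by a factor of only about M/f, so a skewed message fits in far fewer bits than a fixed‐width code would need.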

12.
Modern 3D capture pipelines produce dense surface meshes at high speed, which challenges geometric operators to process such massive data on‐the‐fly. In particular, the goal of instantaneous feature‐preserving smoothing and clustering rules out global variational optimizers, so one usually relies on high‐performance parallel kernels based on simple measures computed from the positions and normal vectors associated with the surface vertices. Although these operators are effective on small supports, they fail to properly capture larger‐scale surface structures. To cope with this problem, we propose to enrich the surface representation with filtered quadrics, a compact and discriminating range space to guide processing. Compared to normal‐based approaches, this additional vertex attribute significantly improves feature preservation for fast bilateral filtering and mode‐seeking clustering, while exhibiting a memory cost linear in the number of vertices and retaining the simplicity of convolutional filters. In particular, the overall performance of our approach stems from its natural compatibility with modern fine‐grained parallel computing architectures such as graphics processing units (GPUs). As a result, filtered quadrics offer a superior ability to handle a broad spectrum of frequencies and preserve large salient structures, delivering meshes on‐the‐fly for interactive and streaming applications, as well as quickly processing large data collections, which is instrumental in learning‐based geometry analysis.
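Reading "quadric" in the Garland–Heckbert sense (an assumption about the representation): a plane with unit normal n and offset d gives the 4×4 matrix Q = q qᵀ with q = (n, d), and vᵀQv for v = (p, 1) is the squared distance from point p to that plane. Because quadrics combine linearly, averaging the quadrics of neighboring vertices yields a quadric that still measures distance to the neighborhood's planes. The sketch uses uniform neighbor weights; the paper's kernel and range weighting are not reproduced.

```python
from typing import List, Sequence

Matrix = List[List[float]]

# Illustrative sketch: plane quadrics and a uniformly weighted quadric average.

def plane_quadric(normal: Sequence[float], point: Sequence[float]) -> Matrix:
    """Q = q q^T for the plane through `point` with unit `normal`."""
    d = -sum(n * p for n, p in zip(normal, point))
    q = list(normal) + [d]
    return [[a * b for b in q] for a in q]

def quadric_error(Q: Matrix, p: Sequence[float]) -> float:
    """v^T Q v with homogeneous v = (p, 1)."""
    v = list(p) + [1.0]
    return sum(v[i] * Q[i][j] * v[j] for i in range(4) for j in range(4))

def average_quadrics(quadrics: List[Matrix]) -> Matrix:
    """Assumption: uniform weights stand in for the paper's filter kernel."""
    k = len(quadrics)
    return [[sum(Q[i][j] for Q in quadrics) / k for j in range(4)] for i in range(4)]
```

Unlike an averaged normal, which blurs two planes meeting at a crease into one tilted direction, the averaged quadric still encodes both planes, which hints at why it preserves larger features better.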

13.
Zippy: A Framework for Computation and Visualization on a GPU Cluster
Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large‐scale general‐purpose computation and visualization applications. However, the programming model for high‐performance general‐purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts GPU cluster programming with a two‐level parallelism hierarchy and a non‐uniform memory access (NUMA) model. Zippy preserves the advantages of both message passing and shared‐memory models. It employs global arrays (GA) to simplify the communication, synchronization, and collaboration among multiple GPUs. Moreover, it exposes data locality to the programmer for optimal performance and scalability. We present three example applications developed with Zippy: sort‐last volume rendering, Marching Cubes isosurface extraction and rendering, and lattice Boltzmann flow simulation with online visualization. They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.

14.
We present an approach for the automatic generation, interactive exploration and real‐time modification of disassembly procedures for complex, multipartite CAD data sets. In order to lift the performance barriers prohibiting interactive disassembly planning, we run a detailed analysis on the input model to identify recurring part constellations and efficiently determine blocked part motions in parallel on the GPU. Building on the extracted information, we present an interface for computing and editing extensive disassembly sequences in real‐time while considering user‐defined constraints and avoiding unstable configurations. To evaluate the performance of our C++/CUDA implementation, we use a variety of openly available CAD data sets, ranging from simple to highly complex. In contrast to previous approaches, our work enables interactive disassembly planning for objects which consist of several thousand parts and require cascaded translations during part removal.
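Once blocked part motions are known, a disassembly sequence can be found by repeatedly removing some part that at least one direction leaves unblocked by the parts still present. The sketch assumes a precomputed blocking relation, blockers[part][direction] giving the set of parts that obstruct that translation, as a stand‐in for the paper's GPU analysis; it ignores stability, user constraints, and cascaded translations.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: part -> direction -> set of parts that block that translation.
Blocking = Dict[str, Dict[str, Set[str]]]

def disassembly_sequence(blockers: Blocking) -> Optional[List[Tuple[str, str]]]:
    """Greedy order of (part, direction) removals, or None if the remaining parts interlock."""
    remaining = set(blockers)
    sequence: List[Tuple[str, str]] = []
    while remaining:
        step = None
        for part in sorted(remaining):
            for direction, blocked_by in sorted(blockers[part].items()):
                # A part is free along a direction if none of its blockers remain.
                if not (blocked_by & remaining):
                    step = (part, direction)
                    break
            if step:
                break
        if step is None:
            return None
        sequence.append(step)
        remaining.discard(step[0])
    return sequence
```

Removing a part can only unblock others, so the greedy loop never needs to backtrack; it fails only when every remaining part is blocked in every direction.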

15.
Displacement mapping is routinely used to add geometric details in a fast and easy‐to‐control way, both in offline rendering and, more recently, in interactive applications such as games. However, it went largely unnoticed (with the exception of McGuire and Whitson [MW08]) that, when displacement mapping is applied to a surface with a low‐distortion parametrization, the parametrization becomes distorted because the displacement changes the geometry. Typical resulting artifacts are “rubber band”‐like distortion patterns in areas of strong displacement change, where a small isotropic area in texture space is mapped to a large anisotropic area in world space. We describe a fast, fully GPU‐based two‐step procedure to resolve this problem. First, a correction deformation is computed from the displacement map. Second, we propose two variants for applying this correction when computing displacement mapping. The first variant is backward‐compatible and can resolve the artifact in any rendering pipeline without modifying it and without requiring additional computation at render time, but only works for bijective parametrizations. The second variant works for more general parametrizations, but requires modifying the rendering code and incurs a very small computational overhead.

16.
Bidirectional Texture Functions (BTFs) are among the highest‐quality material representations available today and are thus well suited whenever an exact reproduction of the appearance of a material or complete object is required. In recent years, BTFs have started to find application in various industrial settings, and there is also a growing interest in the cultural heritage domain. BTFs are usually measured from real‐world samples and can easily amount to tens or hundreds of gigabytes. By using data‐driven compression schemes, such as matrix or tensor factorization, a more compact but still faithful representation can be derived. This way, BTFs can be employed for real‐time rendering of photo‐realistic materials on the GPU. However, scenes containing multiple BTFs, or even single objects with high‐resolution BTFs, easily exceed the GPU memory available on today's consumer graphics cards unless quality is drastically reduced by the compression. In this paper, we propose the Bidirectional Sparse Virtual Texture Function, a hierarchical level‐of‐detail approach for the real‐time rendering of large BTFs that requires only a small amount of GPU memory. More importantly, for larger numbers of BTFs or higher resolutions, the GPU and CPU memory demand grows only marginally and the GPU workload remains constant. For this, we extend the concept of sparse virtual textures by choosing an appropriate prioritization, finding a trade‐off between factorization components and spatial resolution. Besides GPU memory, the high demand on bandwidth poses a serious limitation for the deployment of conventional BTFs. We show that our proposed representation can be combined with an additional transmission compression and then be employed for streaming the BTF data to the GPU from local storage media or over the Internet. In combination with the introduced prioritization, this allows for fast visualization of relevant content in the user's field of view and subsequent progressive refinement.

17.
Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. Many sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features of various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization on GPUs has not been explored sufficiently because of the complexity involved in its efficient implementation and concerns about low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebra operations. We also design a subtree‐based parallel method for multi‐GPU systems. These approaches increase GPU utilization, resulting in a substantial reduction in computation time. Comparisons are made with existing parallel solvers using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we obtained speedups in the range 1.59× to 2.31× using one GPU and 1.80× to 3.21× using two GPUs. Relative to a state‐of‐the‐art solver based on the supernodal method for CPU‐GPU heterogeneous platforms, we obtained speedups in the range 1.52× to 2.30× using one GPU and 2.15× to 2.76× using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.
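Supernodal sparse Cholesky is organized around the elimination tree: the parent of column j is the first row below the diagonal where column j of the factor L has a nonzero. Disjoint subtrees can be factorized independently, which is the kind of structure a subtree‐based multi‐GPU split can exploit. The sketch below computes the tree with Liu's algorithm using path compression, in the style of CSparse's cs_etree; the column‐list input format is an assumption.

```python
from typing import List

# Illustrative sketch: elimination tree of a symmetric matrix from its upper triangle.
# Assumption: upper_cols[k] lists row indices i < k with A[i][k] != 0.

def elimination_tree(n: int, upper_cols: List[List[int]]) -> List[int]:
    """Return parent[j] for each column, or -1 for a root."""
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current subtree root
    for k in range(n):
        for i in upper_cols[k]:
            # Climb from i to the root of its current subtree, redirecting the path to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k
                i = nxt
    return parent
```

A tridiagonal matrix yields a single chain (no subtree parallelism), while an arrow matrix yields many leaves under one root, each of which could be processed independently.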

18.
Multi‐Light Image Collections (MLICs), i.e., stacks of photos of a scene acquired with a fixed viewpoint and varying surface illumination, provide large amounts of visual and geometric information. In this survey, we provide an up‐to‐date integrative view of MLICs as a means to gain insight into objects through the analysis and visualization of the acquired data. After a general overview of MLIC capture and storage, we focus on the main approaches to producing representations usable for visualization and analysis. In this context, we first discuss methods for direct exploration of the raw data. We then summarize approaches that strive to emphasize shape and material details by fusing all acquisitions into a single enhanced image. Subsequently, we focus on approaches that produce relightable images through intermediate representations. This can be done either by fitting various analytic forms of the light transform function or by locally estimating the parameters of physically plausible models of shape and reflectance and using them for visualization and analysis. We then review techniques that improve object understanding by using illustrative approaches to enhance relightable models, or by extracting features and derived maps. We also review how these methods are applied in several main application domains, and which tools are available to perform MLIC visualization and analysis. We finally point out relevant research issues, analyze research trends, and offer guidelines for practical applications.

19.
In this paper, we provide a smooth extension of the energy‐aware Gauss‐Seidel iteration to the Position‐Based Dynamics (PBD) method. This extension is inspired by equalizing the changes in kinetic and potential energy and builds on the foundations of the recent extended version of the PBD algorithm (XPBD). The proposed method is not meant to conserve the total energy of the system; instead, it modifies each position constraint based on the equality of the kinetic and potential energy changes within the Gauss‐Seidel process of the XPBD algorithm. Our extension provides an implicit solution for relatively better stiffness during the simulation of elastic objects. We apply our solution directly within each Gauss‐Seidel iteration, and it is independent of both the simulation step size and the integration method. To demonstrate the benefits of our proposed extension at higher frame rates, we develop an efficient and practical mesh coloring algorithm for the XPBD method that enables parallel processing on a GPU. During the initialization phase, all mesh primitives are grouped according to their connectivity. Afterwards, all these groups are computed simultaneously on a GPU during the simulation phase. We demonstrate the benefits of our method with many spring potential and strain‐based continuous material constraints. Our proposed algorithm is easy to implement and seamlessly fits into existing position‐based frameworks.
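In a parallel Gauss‐Seidel sweep for XPBD, constraints that share a particle must not be projected at the same time. Greedy graph coloring groups constraints so that all constraints of one color can be processed in parallel, one color after another. The sketch colors distance (edge) constraints; it is a generic greedy coloring under that assumption, not necessarily the paper's connectivity‐based grouping.

```python
from typing import Dict, List, Set, Tuple

# Assumption: each constraint is a distance constraint between two particle indices.
Edge = Tuple[int, int]

def color_constraints(constraints: List[Edge]) -> List[List[int]]:
    """Return groups of constraint indices; no two constraints in a group share a particle."""
    colors_at_particle: Dict[int, Set[int]] = {}
    groups: List[List[int]] = []
    for idx, (a, b) in enumerate(constraints):
        used = colors_at_particle.get(a, set()) | colors_at_particle.get(b, set())
        color = 0
        while color in used:  # smallest color not already touching either particle
            color += 1
        if color == len(groups):
            groups.append([])
        groups[color].append(idx)
        colors_at_particle.setdefault(a, set()).add(color)
        colors_at_particle.setdefault(b, set()).add(color)
    return groups
```

Each group then maps to one parallel kernel launch per solver iteration, so fewer colors means fewer synchronization points.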

20.
With fierce competition between CPU and graphics processing unit (GPU) platforms, performance evaluation has become the focus of various sectors. In this paper, we take a well‐known algorithm in the field of biosequence matching and database searching, the Smith–Waterman (S‐W) algorithm, as an example and demonstrate approaches that fully exploit its performance potential on CPU, GPU, and field‐programmable gate array (FPGA) computing platforms. For CPU platforms, we apply two optimizations, single instruction, multiple data (SIMD) vectorization and multithreading, together with compiler options, to gain over 70× speedups over naive CPU versions on quad‐core CPU platforms. For GPU platforms, we propose the combination of coalesced global memory accesses, shared memory tiles, and loop unrolling, achieving 50× speedups over initial GPU versions on an NVIDIA GeForce GTX 470 card. Experimental results show that the GTX 470 GPU achieves a 12× speedup, rather than the 100× reported by some studies, over an Intel quad‐core Q9400 CPU, with the same manufacturing technology and fully optimized schemes on both. In addition, for FPGA platforms, we customize a linear systolic array for the S‐W algorithm in a 45‐nm FPGA chip from Xilinx (XC6VLX760), with up to 1024 processing elements. At a clock rate of only 133 MHz, the FPGA platform reaches the highest performance and becomes the most power‐efficient platform, using only 25 W compared with 190 W for the GTX 470. Copyright © 2011 John Wiley & Sons, Ltd.
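For reference, the Smith–Waterman recurrence fills a score matrix in which each cell is the maximum of zero, a diagonal match/mismatch step, and a gap step from above or from the left. The sketch uses a linear gap penalty and illustrative scores (the abstract does not state the scoring scheme) and returns the best local alignment score.

```python
from typing import List

# Illustrative sketch: scoring values are assumptions, not those used in the study.

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev: List[int] = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero is what makes the alignment local rather than global.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Cells on the same anti‐diagonal depend only on earlier anti‐diagonals, which is the wavefront parallelism that GPU kernels and systolic arrays exploit.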

