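To render multiresolution geometry images efficiently and realistically, the concept of hierarchical bounding spheres that satisfy the saturation property and the idea of a screen-space error criterion are introduced into the design of the node error metric, and a node error metric suited to view-dependent geometry images is proposed. To address the complexity of traditional crack-removal algorithms for multiresolution geometry images and the difficulty of accelerating them on the GPU, a tangent function is used to control vertex offsets; cracks that appear when nodes of different resolutions in the geometry-image quadtree are rendered are removed quickly on the GPU, giving a fast, smooth transition between node meshes. Experimental results show that the node error metric enables accurate selection of geometry-image quadtree nodes and preserves 3D model features well; the crack-handling algorithm is easy to implement on the GPU and achieves high frame rates.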

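The KD-tree is a commonly used spatial acceleration algorithm in 3D scene rendering. Because SIMD computing platforms do not support recursion, the use of KD-trees on the GPU is limited, so a new parallel KD-tree algorithm based on the SIMD architecture is proposed. By threading the KD-tree at construction time, the algorithm not only eliminates the use of a stack but also avoids a large number of useless traversal steps, because it never has to backtrack to the root node; this yields efficient GPU-based parallel acceleration. Experimental results show that, per second, the threaded KD-tree algorithm ...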

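Application of the GPU to shadow rendering in complex scenes   Cited by: 4 (self-citations: 0, citations by others: 4)
By making effective use of the computing power and programmability of the graphics processing unit (GPU) in graphics hardware, a large amount of computation is offloaded from the CPU. Shadow computation is performed on the GPU with vertex and fragment programs, which accelerates shadow rendering for complex scenes. An image-space shadow algorithm is chosen for GPU-accelerated rendering. The rendering process is implemented with the Cg graphics programming language and OpenGL; it meets the needs of general complex 3D scene applications and achieves satisfactory real-time rendering.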

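In recent years, as GPU programmability has improved, many point-based rendering algorithms can be ported to the GPU, which both frees the CPU for other work and speeds up the algorithms. Because current GPUs do not support the epsilon-z-buffering algorithm, most GPU-based rendering algorithms rely on multi-pass rendering to achieve high rendering quality. However, these algorithms must rasterize a large number of potentially visible splats (surface disks) in both the first and second passes, and perform heavy computation on them in the pixel shader of the second pass. This paper proposes an improved GPU-based multi-pass rendering algorithm. Unlike earlier multi-pass algorithms, it determines all visible splats, i.e., those closest to the viewpoint, after rasterizing and depth-testing the potentially visible splats in the first pass only. The second pass then rasterizes and performs per-pixel computation only on these visible splats, which avoids a large amount of unnecessary computation.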
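The two-pass idea in the abstract above can be sketched on the CPU as follows. This is only an illustrative sketch, not the paper's shader code: it assumes each splat is given as a hypothetical `(splat_id, depth, pixels)` tuple with one constant depth, and the names `find_visible_splats` and `shade_visible` are invented for the example.

```python
def find_visible_splats(splats):
    """Pass 1: depth-test every splat and keep the ids that are nearest at some pixel.

    `splats` is a list of (splat_id, depth, pixels); a smaller depth is closer
    to the viewpoint. This flat representation is an assumption of the sketch.
    """
    nearest = {}  # pixel -> (depth, splat_id) of the closest splat seen so far
    for splat_id, depth, pixels in splats:
        for pixel in pixels:
            best = nearest.get(pixel)
            if best is None or depth < best[0]:
                nearest[pixel] = (depth, splat_id)
    return {splat_id for _, splat_id in nearest.values()}


def shade_visible(splats, visible_ids, shade):
    """Pass 2: run the expensive `shade` callback only for splats that survived pass 1."""
    return {
        splat_id: shade(splat_id)
        for splat_id, _, _ in splats
        if splat_id in visible_ids
    }
```

With three splats where the middle one is fully covered by a nearer splat, only two calls to `shade` are made, which is the saving the abstract describes.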

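On mainstream personal-computer hardware, to speed up the rendering of complex, dense geometric models made up of more than a million triangles, the advantages of geometry-based modeling and rendering (GBMR) and point-based modeling and rendering (PBMR) are combined in a hybrid point-polygon method that uses both triangles and points as primitives for object modeling and rendering. In the preprocessing stage, the model surface is segmented into patches, the triangles and the vertex point cloud of each patch are stored, and the vertex point cloud is sorted by vertex importance and serialized into a linear structure. In the real-time rendering stage, view-dependent culling and back-face culling are performed, and each patch is rendered with triangles or with points according to its distance from the viewpoint. The whole process makes full use of the graphics processing unit (GPU), realizing GPU-based continuous multiresolution rendering of hybrid point-polygon objects and effectively improving rendering efficiency for complex models.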

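Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA) is a text classification algorithm based on a probabilistic generative model that improves on Latent Dirichlet Allocation (LDA). Unlike LDA, which can only run on a single machine, it can run in a distributed framework and process data in parallel. Mahout implements HD-LDA on the Hadoop framework, but because the computation on each node is heavy, classifying big data still takes too long. Although a large text collection is spread over multiple nodes for iterative inference, inference over the documents on each node is still sequential, so classifying the whole collection still takes a long time. It is therefore proposed to combine Hadoop with graphics processing units (GPUs): the inference for each node's text collection is moved to the GPU so that multiple documents on a node are inferred in parallel, and multiple GPUs running in parallel accelerate HD-LDA. Application results show that this method gives HD-LDA in a distributed framework a 7x speedup on large text collections.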

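Multi-representative-point feature tree and spatial clustering algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
Spatial data are massive, complex, continuous, spatially autocorrelated, and subject to missing values and errors. Spatial clustering algorithms must therefore be efficient, handle clusters of complex shape, produce results that do not depend on the order of the data in space, and be robust to outliers; existing algorithms struggle to meet all of these requirements at once. This paper proposes a data structure suited to massive, complex spatial data: the multi-representative-point feature tree. Based on this tree, CAMFT, a clustering algorithm for mining massive, complex spatial data, is proposed. The algorithm uses the multi-representative-point feature tree to compress massive data and combines it with random sampling to further strengthen its ability to handle massive data; the tree also preserves the clustering features of complex shapes, which makes it suitable for complex spatial data. Experiments show that CAMFT quickly processes spatial data containing outliers and complex-shaped clusters, that its results are independent of the spatial ordering of the objects, and that it is more efficient than the comparable existing clustering algorithms BIRCH and CURE.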

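In recent years, electronic design automation (EDA) researchers have tried to use the high-performance computing power of the graphics processing unit (GPU) to accelerate IC parameter analysis. To analyze power/ground networks (P/G networks) quickly on the GPU, an efficient parallel P/G network analysis algorithm based on the classical successive over-relaxation (SOR) algorithm is designed. Guided by the principles of GPU parallel acceleration, the algorithm makes the following improvements: 1) A red-black relaxation ordering. All nodes are divided into red and black classes such that every neighbor of a red node is black and every neighbor of a black node is red; red and black nodes are relaxed alternately, which guarantees data consistency in GPU parallel computation. For a P/G network with N nodes, one red or black relaxation can update N/2 nodes simultaneously, so in theory N/2 parallel threads can be launched at once. 2) An optimized data structure. Accesses to the data are coalesced to ensure optimal access to GPU global memory. 3) Relaxation flags are counted quickly by parallel reduction in shared memory, and the zero-copy technique is used to copy the flags quickly, so that the decision whether to continue relaxing is made quickly. Extensive experiments show that, compared with a single-threaded CPU program, the speedup of this algorithm grows linearly with the number of physical threads the GPU provides and reaches up to 242x, making it the GPU algorithm with the best speedup in current EDA research.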
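The red-black ordering in item 1) can be illustrated with a small serial sketch. It assumes a uniform resistive grid with fixed boundary voltages, so each interior node relaxes toward the average of its four neighbors; that grid model, the function name `red_black_sor`, and the parameter defaults are assumptions for illustration and are not taken from the paper.

```python
def red_black_sor(grid, omega=1.5, tol=1e-8, max_sweeps=10_000):
    """Relax the interior of `grid` in place and return the number of sweeps used.

    Boundary values stay fixed. Nodes with (i + j) even are "red", the others
    "black". Each half-sweep updates one colour only, so every update reads
    only nodes of the other colour; on a GPU those updates could run in parallel.
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for colour in (0, 1):  # 0 = red half-sweep, 1 = black half-sweep
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    avg = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                  + grid[i][j - 1] + grid[i][j + 1])
                    change = omega * (avg - grid[i][j])  # over-relaxed correction
                    grid[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:  # serial stand-in for the "relaxation flag" check
            return sweep
    return max_sweeps
```

Because the two colours never read their own colour during a half-sweep, the inner loops have no ordering dependence, which is what makes the N/2-thread parallelism described above possible.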

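To improve real-time rendering efficiency for multiresolution geometry images, the idea of the enumerated quadtree is introduced into the design of the traditional quadtree, and a method is proposed that treats a single enumerated-quadtree node as an independent rendering unit, greatly reducing the number of draw calls. To address the cracks that tend to appear along the edges between different levels of a geometry image during real-time rendering, the enumerated-quadtree structure is used in two stages: in the data preprocessing stage, a piecewise function removes cracks inside a node when the base rendering nodes are constructed; in the real-time rendering stage, an arctangent function controls vertex offsets on the GPU to remove cracks between nodes. Experimental results show that the enumerated-quadtree structure makes full use of the GPU's batch-processing capability and effectively improves rendering efficiency; the crack-handling algorithm is easy to implement on the GPU and achieves high frame rates and good rendering quality.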

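Fast rendering combining hierarchical visibility with level-of-detail terrain models   Cited by: 2 (self-citations: 0, citations by others: 2)
Real-time rendering of large-scale terrain models is an important research topic in virtual reality. To accelerate terrain rendering, view-dependent dynamic multiresolution level-of-detail (LOD) models are commonly used, but the efficiency of these algorithms still leaves room for improvement. This paper proposes a fast terrain rendering method that combines hierarchical visibility with LOD terrain models. The algorithm exploits the horizon property of terrain models to build, during preprocessing, a hierarchical block visibility structure for the multiresolution block model, so that the visibility of terrain blocks from the current viewpoint can be determined quickly; this reduces both the detail processing in the multiresolution model and the number of triangles rendered. In addition, to eliminate visibility errors caused by changes in the level of detail of the terrain model, a hierarchical visibility computation method is proposed that corrects for the dynamic visibility changes introduced by the multiresolution model. Experimental results show that the algorithm effectively improves rendering efficiency and is feasible.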

Sequential point trees are the state-of-the-art technique for rendering point models: hierarchical points are rearranged sequentially according to their geometric errors so that they can be processed on the GPU for fast rendering. This paper presents a view-dependent method that augments sequential point trees by embedding the hierarchical tree structure in the sequential list of hierarchical points. With this method, two kinds of indices are constructed so that points are rendered mostly from near to far and from coarse to fine. As a result, invisible points can be culled view-dependently and efficiently for hardware acceleration, while the advantages of sequential point trees are fully retained. The new method therefore runs much faster than conventional sequential point trees, and the speedup is especially large when the objects have complex occlusion relationships and are viewed closely, because invisible points then make up a high percentage of the points at finer levels.

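To address the heavy computation and low speed of traditional ray casting, which make real-time reconstruction difficult without hardware acceleration, a ray-casting algorithm based on GPU programming that quickly computes resampling-point values is proposed. First, a GPU program determines the end point and direction of each cast ray. Second, an accelerated step-size sampling method determines the positions of the resampling points, and a fast composite interpolation method computes their color values. Finally, early opacity termination further accelerates reconstruction. Experimental results show that the method has low computational complexity and high execution efficiency. While preserving the quality of the reconstructed image, it is 6 times faster than existing CPU-based ray casting and 2 times faster than traditional GPU-based ray casting.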
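Early opacity termination, the last step above, can be sketched as standard front-to-back compositing that stops once the accumulated opacity passes a cutoff. The sketch below assumes scalar (grayscale) colors and a cutoff of 0.95; the function name `composite_ray` and both of those choices are illustrative assumptions, not values from the paper.

```python
def composite_ray(samples, opacity_cutoff=0.95):
    """Composite (color, alpha) samples ordered front to back along one ray.

    Returns (accumulated_color, accumulated_alpha, samples_used). The loop
    stops early once the ray is nearly opaque, because later samples would
    contribute almost nothing to the pixel.
    """
    color_acc = 0.0
    alpha_acc = 0.0
    used = 0
    for color, alpha in samples:
        weight = (1.0 - alpha_acc) * alpha  # remaining transparency times sample opacity
        color_acc += weight * color
        alpha_acc += weight
        used += 1
        if alpha_acc >= opacity_cutoff:  # early ray termination
            break
    return color_acc, alpha_acc, used
```

For ten identical samples with alpha 0.5, the accumulated opacity reaches 0.96875 after five samples, so the remaining five are never evaluated.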

The hidden point removal (HPR) operator introduced by Katz et al. [KTB07] provides an elegant solution for the problem of estimating the visibility of points in point samplings of surfaces. Since the method requires computing the three‐dimensional convex hull of a set with the same cardinality as the original cloud, the method has been largely viewed as impractical for real‐time rendering of medium to large clouds. In this paper we examine how the HPR operator can be used more efficiently by combining several image space techniques, including an approximate convex hull algorithm, cloud sampling, and GPU programming. Experiments show that this combination permits faster renderings without overly compromising the accuracy.

A particle system for interactive visualization of 3D flows   Cited by: 3 (self-citations: 0, citations by others: 3)
We present a particle system for interactive visualization of steady 3D flow fields on uniform grids. For the number of particles we target, particle integration needs to be accelerated and the transfer of these sets for rendering must be avoided. To fulfill these requirements, we exploit features of recent graphics accelerators to advect particles in the graphics processing unit (GPU), saving particle positions in graphics memory, and then sending these positions through the GPU again to obtain images in the frame buffer. This approach allows for interactive streaming and rendering of millions of particles and it enables virtual exploration of high resolution fields in a way similar to real-world experiments. The ability to display the dynamics of large particle sets using visualization options like shaded points or oriented texture splats provides an effective means for visual flow analysis that is far beyond existing solutions. For each particle, flow quantities like vorticity magnitude and λ2 are computed and displayed. Built upon a previously published GPU implementation of a sorting network, visibility sorting of transparent particles is implemented. To provide additional visual cues, the GPU constructs and displays visualization geometry like particle lines and stream ribbons.
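Particle advection in a steady field amounts to numerically integrating each particle's position through the velocity field. The abstract does not say which integrator is used, so the sketch below assumes a midpoint (second-order Runge-Kutta) step and an analytic rotation field in place of a sampled uniform grid; `advect`, `trace`, and `rotation_field` are hypothetical names for illustration.

```python
def advect(position, velocity, dt):
    """Advance one particle by one midpoint (second-order Runge-Kutta) step."""
    vx, vy, vz = velocity(position)
    midpoint = (position[0] + 0.5 * dt * vx,
                position[1] + 0.5 * dt * vy,
                position[2] + 0.5 * dt * vz)
    mx, my, mz = velocity(midpoint)  # velocity evaluated halfway through the step
    return (position[0] + dt * mx,
            position[1] + dt * my,
            position[2] + dt * mz)


def trace(position, velocity, dt, steps):
    """Apply `advect` repeatedly and return the final position."""
    for _ in range(steps):
        position = advect(position, velocity, dt)
    return position


def rotation_field(p):
    """Assumed test field: steady rigid rotation about the z axis."""
    return (-p[1], p[0], 0.0)
```

In the rotation field a particle should stay on its circle, so the drift of its radius gives a quick check of integration accuracy. On a GPU, each particle would be updated independently in the same way, with the velocity fetched from a texture rather than a Python function.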

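Ray-casting volume rendering algorithm using GPU programming   Cited by: 6 (self-citations: 0, citations by others: 6)
The traditional ray-casting volume rendering algorithm is reimplemented on a graphics processing unit (GPU) with a programmable pipeline. The volume data are first stored in video memory as a 3D texture; vertex and fragment programs then move the computation of ray entry and exit points and the ray traversal onto the GPU; finally, different rendering effects are obtained with different formulas for blending the colors of the sample points. The algorithm needs to draw only a single quadrilateral to complete the 3D reconstruction. Experimental results show that, when reconstructing with lighting effects, the algorithm meets the requirements of real-time interactive rendering and can produce complex effects such as translucency.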

Research in rendering large point clouds traditionally focused on the generation and use of hierarchical acceleration structures that allow systems to load and render the smallest fraction of the data with the largest impact on the output. The generation of these structures is time consuming, however, and therefore ill-suited for tasks such as quickly looking at scan data stored in widely used unstructured file formats, or immediately displaying the results of point-cloud processing tasks. We propose a progressive method that is capable of rendering any point cloud that fits in GPU memory in real time, without the need to generate hierarchical acceleration structures in advance. Our method supports data sets with a large number of attributes per point, achieves a load performance of up to 100 million points per second, displays already loaded data in real time while remaining data is still being loaded, and is capable of rendering up to one billion points using an on-the-fly generated shuffled vertex buffer as its data structure, instead of slow-to-generate hierarchical structures. Shuffling is done during loading in order to allow efficiently filling holes with random subsets, which leads to a higher quality convergence behavior.
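The reason a shuffled buffer helps is that any contiguous slice of a uniformly shuffled array is itself a random subset of the points, so drawing the buffer slice by slice fills the image evenly rather than region by region. The sketch below shuffles a fully loaded list, whereas the paper shuffles during loading; `shuffled_buffer` and `progressive_batches` are hypothetical helper names.

```python
import random


def shuffled_buffer(points, seed=0):
    """Return a uniformly shuffled copy of `points` (the input is left untouched).

    A fixed seed is an assumption made only so the sketch is reproducible.
    """
    buffer = list(points)
    random.Random(seed).shuffle(buffer)
    return buffer


def progressive_batches(buffer, batch_size):
    """Yield consecutive slices of the shuffled buffer, one per frame.

    Each slice is a random subset of the cloud, so every frame adds points
    spread over the whole model instead of completing one region at a time.
    """
    for start in range(0, len(buffer), batch_size):
        yield buffer[start:start + batch_size]
```

A renderer following this idea would draw one batch per frame on top of the previous result until the whole buffer has been covered.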

We present a geometry compression scheme for restricted quadtree meshes and use this scheme for the compression of adaptively triangulated digital elevation models (DEMs). A compression factor of 8–9 is achieved by employing a generalized strip representation of quadtree meshes to incrementally encode vertex positions. In combination with adaptive error-controlled triangulation, this allows us to significantly reduce bandwidth requirements in the rendering of large DEMs that have to be paged from disk. The compression scheme is specifically tailored for GPU-based decoding, since it minimizes dependent memory access operations. We can thus trade CPU operations and CPU–GPU data transfer for GPU processing, resulting in streaming of DEMs from main memory into GPU memory that is twice as fast. A novel storage format for decoded DEMs on the GPU facilitates a sustained rendering throughput of about 300 million triangles per second. Due to these properties, the proposed scheme enables rendering that scales with the display resolution independently of the data size. For a maximum screen-space error below 1 pixel it achieves frame rates of over 100 fps, even on high-resolution displays. We validate the efficiency of the proposed method by presenting experimental results on scanned elevation models of several hundred gigabytes.

Empty‐space skipping is an essential acceleration technique for volume rendering. Image‐order empty‐space skipping is not well suited to GPU implementation, since it must perform checks on, essentially, a per‐sample basis, as in kd‐tree traversal; this can lead to a great deal of divergent branching at runtime, which is very expensive in a modern GPU pipeline. In contrast, object‐order empty‐space skipping is extremely fast on a GPU and has negligible overheads compared with approaches without empty‐space skipping, since it employs the hardware unit for rasterisation. However, previous object‐order algorithms have been able to skip only exterior empty space and not the interior empty space that lies inside or between volume objects. In this paper, we address these issues by proposing a multi‐layer depth‐peeling approach that can obtain all of the depth layers of the tight‐fitting bounding geometry of the isosurface in a single rasterising pass. Our approach can peel up to thousands of layers while maintaining 32‐bit floating‐point accuracy, which was not possible previously. By raytracing only the valid ray segments between each consecutive pair of depth layers, we can skip both the interior and exterior empty space efficiently. In comparisons with three state‐of‐the‐art GPU isosurface rendering algorithms, this technique achieved much faster rendering across a variety of data sets.

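For view-dependent rendering of point-sampled surfaces, a new surface-based hierarchical clustering simplification algorithm is proposed. Unlike the commonly used space-partitioning-based strategies, its main advantage is that it uses a normal-cone half-angle error criterion to track the undulation of the surface effectively, which provides reliable global error control for the clustering simplification process. In the offline simplification stage, together with various predefined clustering constraints, the algorithm builds a continuous hierarchical multiresolution representation of the point-sampled surface model. In the real-time rendering stage, hierarchical visibility culling and optimized tree traversal improve overall system performance. In addition, an added silhouette-enhancement mechanism lets the system maintain good visual quality even at large screen-space projection errors and high simplification rates.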
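A normal-cone half-angle test can be sketched as follows: a cluster's normals are bounded by a cone, and the cluster is accepted only if the cone is narrow enough. The abstract does not define how the cone is built, so this sketch assumes the cone axis is the normalized mean normal (which is simple but not necessarily the tightest cone); `normal_cone_half_angle` and `can_merge` are hypothetical names.

```python
import math


def normal_cone_half_angle(normals):
    """Return (axis, half_angle) of a cone containing all unit `normals`.

    Assumption: the axis is the normalized mean normal, and the half-angle is
    the largest angle between the axis and any normal in the cluster.
    """
    sx = sum(n[0] for n in normals)
    sy = sum(n[1] for n in normals)
    sz = sum(n[2] for n in normals)
    length = math.sqrt(sx * sx + sy * sy + sz * sz)
    if length == 0.0:
        raise ValueError("normals cancel out; no meaningful cone axis")
    axis = (sx / length, sy / length, sz / length)
    half_angle = 0.0
    for n in normals:
        dot = n[0] * axis[0] + n[1] * axis[1] + n[2] * axis[2]
        dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
        half_angle = max(half_angle, math.acos(dot))
    return axis, half_angle


def can_merge(normals, max_half_angle):
    """Accept a cluster only if its normal cone is within the error bound."""
    return normal_cone_half_angle(normals)[1] <= max_half_angle
```

A flat patch yields a half-angle of zero and is always mergeable, while a patch spanning a sharp edge yields a wide cone and is rejected, which is how such a criterion follows surface undulation.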

In this paper, we present a new approach for shape‐grammar‐based generation and rendering of huge cities in real‐time on the graphics processing unit (GPU). Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real‐time directly on the GPU. We also present a robust and efficient way to dynamically update a scene's derivation tree and geometry, enabling us to exploit frame‐to‐frame coherence. Our combined generation and rendering is significantly faster than all previous work. For detailed scenes, we are capable of generating geometry more rapidly than even just copying pregenerated data from main memory, enabling us to render cities with thousands of buildings at up to 100 frames per second, even with the camera moving at supersonic speed.

