
1.

王文博殷宏解文彬王家腾《计算机应用》2015,35(6):1716-1719

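To overcome the drawbacks of subdividing terrain meshes in the vertex shader, which requires generating extra templates and a complex computation of subdivision levels, this paper proposes a level of detail (LOD) terrain rendering algorithm that subdivides the terrain mesh with tessellation shaders. A chunked quadtree organizes the coarse terrain mesh into a hierarchy, and an LOD discriminant function selects the active terrain blocks. Tessellation factors are computed in the tessellation control shader from the continuous three-dimensional distance to the viewpoint, and the outer tessellation factors are handled so that cracks are eliminated. Displacement mapping is implemented in the tessellation evaluation shader to displace the height component of the refined mesh. In addition, the quadtree structure is stored in a vertex buffer to reduce resource exchange between the central processing unit (CPU) and the graphics processing unit (GPU), and a subdivision queue is introduced to accelerate subdivision. Experiments show that the algorithm achieves smooth LOD transitions and good subdivision results, and effectively improves GPU utilization and terrain rendering efficiency.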
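
The abstract does not give the tessellation factor formula, so the following is only a minimal sketch of one common way to derive a factor from viewpoint distance. The linear falloff, the `near`/`far` range and the function name `edge_tess_factor` are assumptions for illustration, not the authors' method. Evaluating the factor at the midpoint of an edge is one way to make two patches that share the edge agree on its outer factor, which is the condition for avoiding cracks.

```python
import math


def edge_tess_factor(p0, p1, eye, max_factor=64.0, min_factor=1.0,
                     near=10.0, far=1000.0):
    """Outer tessellation factor for one patch edge (hypothetical formula).

    The factor depends only on the 3D distance from the eye to the edge
    midpoint, so both patches sharing the edge compute the same value.
    """
    mid = tuple((a + b) / 2.0 for a, b in zip(p0, p1))
    distance = math.dist(mid, eye)
    # Map distance to [0, 1]: 0 at or before `near`, 1 at or beyond `far`.
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    # Linear blend from the finest level (close) to the coarsest (far away).
    return max_factor + t * (min_factor - max_factor)
```

In a real renderer this logic would live in the tessellation control shader; the Python version only illustrates the arithmetic.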

2.

Tree Branch Level of Detail Models for Forest Navigation


Xiaopeng Zhang Guanbo Bao Weiliang Meng Marc Jaeger Hongjun Li Oliver Deussen Baoquan Chen 《Computer Graphics Forum》2017,36(8):402-417

We present a level of detail (LOD) method designed for tree branches. It can be combined with methods for processing tree foliage to facilitate navigation through large virtual forests. Starting from a skeletal representation of a tree, we fit polygon meshes of various densities to the skeleton while the mesh density is adjusted according to the required visual fidelity. For distant models, these branch meshes are gradually replaced with semi‐transparent lines until the tree recedes to a few lines. Construction of these complete LOD models is guided by error metrics to ensure smooth transitions between adjacent LOD models. We then present an instancing technique for discrete LOD branch models, consisting of polygon meshes plus semi‐transparent lines. Line models with different transparencies are instanced on the GPU by merging multiple tree samples into a single model. Our technique reduces the number of draw calls in GPU and increases rendering performance. Our experiments demonstrate that large‐scale forest scenes can be rendered with excellent detail and shadows in real time.

3.

An efficient GPU out‐of‐core framework for interactive rendering of large‐scale CAD models

Junjie Xue Gang Zhao Wenlei Xiao 《Computer Animation and Virtual Worlds》2016,27(3-4):231-240

Real‐time rendering of large‐scale engineering computer‐aided design (CAD) models has been recognized as a challenging task. Because of the constraints of limited graphics processing unit (GPU) memory size and computation capacity, a massive model with hundreds of millions of triangles cannot be loaded and rendered in real time using most modern GPUs. In this paper, an efficient GPU out‐of‐core framework is proposed for interactively visualizing large‐scale CAD models. To improve efficiency of data fetching from CPU host memory to GPU device memory, a parallel offline geometry compression scheme is introduced to minimize the storage cost of each primitive by compressing the levels of detail (LOD) geometries into a highly compact format. At the rendering stage, occlusion culling and LOD processing algorithms are integrated and implemented with an efficient GPU‐based approach to determine a minimal scale of primitives to be transferred for each frame. A prototype software system is developed to preprocess and render massive CAD models with the proposed framework. Experimental results show that users can walk through massive CAD models with hundreds of millions of triangles at high frame rates using our framework. Copyright © 2016 John Wiley & Sons, Ltd.

4.

GPU-Accelerated TIP Technique

闯跃龙《计算机辅助工程》2008,17(4):81-84

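To speed up Tour Into the Picture (TIP) rendering, a method based on the Graphics Processing Unit (GPU) is proposed that makes full use of the GPU's computing power. Background texture extraction is moved from the CPU to the GPU and TIP rendering uses the GPU fixed-function pipeline, while the CPU handles depth computation and texture extraction for the foreground models. The CPU and GPU can therefore work in parallel, which significantly speeds up texture mapping, shortens the overall TIP rendering time, and meets the real-time requirement of users roaming in the virtual scene.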

5.

Improved Geoclipmap Algorithm for Terrain Visualization

张建廷刘福太艾祖亮《计算机应用》2010,30(12):3292-3294

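In real-time rendering of large terrains, the massive volume of terrain data and the limited data bandwidth of the hardware are the main factors restricting rendering efficiency. Building on the Geoclipmap algorithm, a geometry scene graph (GSG) organization is used to improve the efficiency of loading data from external storage, and sinc filtering is applied for resampling during mipmap pyramid generation to avoid losing terrain detail. To reduce data traffic from the CPU to the graphics processing unit (GPU), a two-level view frustum culling technique based on hierarchical bounding spheres is proposed, and normal generation is moved into the GPU fragment shader. Experimental results show that the algorithm preserves terrain realism, effectively improves rendering efficiency, and meets the requirements of real-time rendering of large terrains.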
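
The abstract only names the two-level culling idea, so the sketch below shows a generic version under stated assumptions: frustum planes are given as `(nx, ny, nz, d)` with unit normals pointing into the frustum, and the data layout (`groups` holding `blocks`) and both function names are hypothetical. A coarse test on the group sphere rejects or accepts all of its blocks at once; only groups that straddle a plane fall through to per-block tests.

```python
def classify_sphere(center, radius, planes):
    """Return 'outside', 'inside' or 'intersect' for a sphere vs. a frustum.

    `planes` holds (nx, ny, nz, d) tuples with unit normals pointing inward.
    """
    state = "inside"
    for nx, ny, nz, d in planes:
        signed = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed < -radius:
            return "outside"      # entirely behind one plane
        if signed < radius:
            state = "intersect"   # straddles this plane
    return state


def cull_two_level(groups, planes):
    """Level 1 tests each group sphere; level 2 tests blocks only if needed."""
    visible = []
    for group in groups:
        state = classify_sphere(group["center"], group["radius"], planes)
        if state == "outside":
            continue              # whole group rejected with one test
        for block in group["blocks"]:
            if state == "inside" or classify_sphere(
                    block["center"], block["radius"], planes) != "outside":
                visible.append(block["name"])
    return visible
```

The benefit is that most blocks are decided by the single test on their parent sphere, so fewer per-block tests are performed.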

6.

Interactive Ray Tracing of Large Models Using Voxel Hierarchies

Attila T. Áfra 《Computer Graphics Forum》2012,31(1):75-88

We propose an efficient approach for interactive visualization of massive models with CPU ray tracing. A voxel‐based hierarchical level‐of‐detail (LOD) framework is employed to minimize rendering time and required system memory. In a pre‐processing phase, a compressed out‐of‐core data structure is constructed, which contains the original primitives of the model and the LOD voxels, organized into a kd‐tree. During rendering, data is loaded asynchronously to ensure a smooth inspection of the model regardless of the available I/O bandwidth. With our technique, we are able to explore data sets consisting of hundreds of millions of triangles in real‐time on a desktop PC with a quad‐core CPU.

7.

Adaptive GPU Terrain Rendering Based on a Comprehensive LOD Factor (total citations: 1, self-citations: 0, citations by others: 1)


张兵强张立民艾祖亮张建廷《计算机工程》2012,38(12):201-204

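Based on a quadtree organization of terrain block data, an adaptive terrain rendering algorithm for the graphics processing unit (GPU) is proposed. A comprehensive level-of-detail factor serves as the evaluation function for terrain block nodes and quantifies the static terrain block error, the dynamic view-dependent error, and the viewpoint movement speed. Smooth transitions of elevation values are implemented in the vertex shader to eliminate popping, and "skirts" are added to cover cracks. Experimental results show that the algorithm adapts well to the terrain and achieves a high frame rate and high GPU utilization.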

8.

Real-Time Leaf Simplification Method Based on Viewpoint Mutual Information

王超凡王标佘江峰《计算机系统应用》2020,29(12):35-44

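Three-dimensional tree models are widely used in virtual geographic environments, 3D city scenes, and other fields, but because trees contain rich geometric information, large-scale forest scenes are difficult to render efficiently. We therefore design a real-time tree simplification method based on Viewpoint Mutual Information (VMI). In preprocessing, the tree is divided into nodes with parent-child relationships according to the topology of its branches; the average importance of each leaf over multiple viewpoints is then computed from VMI and used to sort the leaves, so that less important leaves are removed first during simplification. For real-time simplification, we propose a view-dependent simplification method that greatly reduces the amount of data to be rendered. To improve performance when rendering forest scenes, we apply several rendering optimizations to avoid unnecessary Level Of Detail (LOD) switches.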

9.

Interactive Large‐Scale Procedural Forest Construction and Visualization Based on Particle Flow Simulation


Štefan Kohek Damjan Strnad 《Computer Graphics Forum》2018,37(1):389-402

Interactive visualization of large forest scenes is challenging due to the large amount of geometric detail that needs to be generated and stored, particularly in scenarios with a moving observer such as forest walkthroughs or overflights. Here, we present a new method for large‐scale procedural forest generation and visualization at interactive rates. We propose a hybrid approach by combining geometry‐based and volumetric modelling techniques with gradually transitioning level of detail (LOD). Nearer trees are constructed using an extended particle flow algorithm, in which particle trails outline the tree ramification in an inverse direction, i.e. from the leaves towards the roots. Reduced geometric representation of a tree is obtained by subsampling the trails. For distant trees, a new volumetric rendering technique in pixel‐space is introduced, which avoids geometry formation altogether and enables visualization of vast forest areas with millions of unique trees. We demonstrate that a GPU‐based implementation of the proposed method provides interactive frame rates in forest overflight scenarios, where new trees are constructed and their LOD adjusted on the fly.

10.

Parallel computing of 3D smoking simulation based on OpenCL heterogeneous platform

Zhiyong Yuan Weixin Si Xiangyun Liao Zhaoliang Duan Yihua Ding Jianhui Zhao 《The Journal of supercomputing》2012,61(1):84-102

Open Computing Language (OpenCL) is an open royalty-free standard for general purpose parallel programming across Central Processing Units (CPUs), Graphic Processing Units (GPUs) and other processors. This paper introduces OpenCL to implement real-time smoking simulation in a virtual surgery training simulation system. Firstly, the Computational Fluid Dynamics (CFD) is adopted to construct the real-time smoking simulation model based on the Navier–Stokes (N-S) equations of an incompressible fluid under the condition of normal temperature and pressure. Then we propose a parallel computing technique based on OpenCL to accomplish the parallel computing of smoking simulation model on CPU and GPU, respectively. Finally, we render the smoke in real time by using a three-dimensional (3D) texture volume rendering method. Experimental results show that the parallel computing technique we have proposed achieves a satisfactory effect on image quality and rendering rate both on CPU and GPU.

11.

Advances in GPU-Accelerated Medical Image Registration

查珊珊王远军聂生东《计算机应用》2015,35(9):2486-2491

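Because current medical image registration techniques cannot meet clinical real-time requirements, this paper reviews medical image registration accelerated by the graphics processing unit (GPU). It first gives an overview of general-purpose GPU computing and then, following the basic framework of medical image registration, examines in depth the recent development of GPU-accelerated medical image registration in China and abroad. For the nonlinear registration of positron emission tomography (PET) and computed tomography (CT) data, registration experiments are carried out on both central processing unit (CPU) and GPU platforms, and a comparison of the results shows the advantage of GPU-accelerated registration. With the GPU-accelerated nonlinear registration method that combines free-form deformation (FFD) with normalized mutual information (NMI), the mutual information value after registration is slightly lower than that of the CPU platform, but registration is 12 times as fast. GPU-accelerated registration algorithms achieve a large speedup while maintaining registration accuracy.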
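
The abstract names normalized mutual information (NMI) as the similarity measure but does not state which normalization is used. As a hedged illustration, the sketch below uses the widely cited form NMI = (H(A) + H(B)) / H(A, B), computed from intensity histograms; the choice of this form, the flat-list image representation, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a histogram given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI = (H(A) + H(B)) / H(A, B) for two equal-length intensity lists.

    Ranges from 1.0 (independent images) to 2.0 (one image determines
    the other). Assumes intensities are already quantized into bins.
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)  # joint histogram
    if h_ab == 0.0:
        raise ValueError("both images are constant; NMI is undefined")
    return (h_a + h_b) / h_ab
```

In a registration loop this value would be re-evaluated after each candidate deformation, and building the joint histogram is typically the step that benefits from GPU parallelism.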

12.

GPU-Based Parallel Computing of the Total Variation Dual Model for Image Denoising

赵明超陈智斌文有为《计算机应用》2016,36(5):1228-1231

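This paper studies image denoising based on total variation (TV). Because computation on the central processing unit (CPU) is slow, a parallel computing method on the graphics processing unit (GPU) is proposed. The dual model of the total variation minimization problem is considered, the relationship between the primal and dual variables is established, and the dual variable is solved with a gradient projection algorithm. Numerical experiments are run on both the GPU and the CPU. The results show that the dual algorithm of the TV denoising model runs more efficiently on the GPU than on the CPU, and that the advantage of GPU parallel computing grows as the image size increases.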
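
To make the primal-dual relationship concrete, here is a minimal sequential sketch of projected gradient on the dual of the TV model min_u 0.5*||u - f||^2 + lam*TV(u). It is written for a 1D signal rather than an image, and the step size 1/(4*lam), the iteration count and the function names are assumptions; it is not the paper's GPU implementation. The primal variable is recovered as u = f - lam * D^T p, and each dual entry is updated independently of the others, which is what makes the scheme suitable for parallel hardware.

```python
def total_variation(u):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


def primal_from_dual(f, p, lam):
    """u = f - lam * D^T p, where (D^T p)_j = p[j-1] - p[j] (zero at the ends)."""
    n = len(f)
    return [f[j] - lam * ((p[j - 1] if j > 0 else 0.0)
                          - (p[j] if j < n - 1 else 0.0))
            for j in range(n)]


def tv_denoise_1d(f, lam, iterations=200):
    """Projected gradient on the dual variable p, constrained to [-1, 1]."""
    n = len(f)
    p = [0.0] * (n - 1)          # one dual value per neighbouring pair
    step = 1.0 / (4.0 * lam)     # assumed step, from the bound ||D D^T|| <= 4
    for _ in range(iterations):
        u = primal_from_dual(f, p, lam)
        # Gradient step along D u, then projection onto the box [-1, 1].
        p = [min(1.0, max(-1.0, p[i] + step * (u[i + 1] - u[i])))
             for i in range(n - 1)]
    return primal_from_dual(f, p, lam)
```

For an image, the same update applies with a two-component dual variable per pixel and a projection onto the unit disc instead of an interval.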

13.

Implementation and Verification of General-Purpose Computing on Graphics Processors


齐记杨孔庆杨磊《计算机工程与应用》2009,45(33):67-69

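This paper discusses the use of graphics cards for general-purpose scientific computing and, using basic operations on large matrices, compares CPU and GPU computation in detail. For basic matrix operations with suitable matrix blocking, the GPU computes about 50 times faster than the CPU. Moreover, the low price of graphics cards makes large-scale computation feasible for more researchers.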
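
The abstract does not describe its blocking scheme, so the following is only a generic sketch of blocked (tiled) matrix multiplication, with `blocked_matmul` and the `block` parameter as illustrative names. Each output tile accumulates products of matching input tiles; on a GPU, one tile would typically map to one group of threads working from fast local memory.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices (lists of rows) tile by tile.

    The result equals the ordinary product; only the loop order changes,
    so that each small tile of `a` and `b` is reused while it is "hot".
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):            # tile row of the output
        for j0 in range(0, m, block):        # tile column of the output
            for k0 in range(0, k, block):    # matching tiles of a and b
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

Pure Python gains nothing from this ordering; the point is only to show the tiling structure that a GPU kernel would exploit.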

14.

Seamless Terrain Scene Rendering Method Based on Binary Tree and GPU (total citations: 1, self-citations: 0, citations by others: 1)

曹巍段光耀《计算机应用》2012,32(9):2548-2552

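A seamless terrain rendering method based on the graphics processing unit (GPU) is designed. The method builds a multi-level terrain mesh on a binary tree, and the mesh is represented by terrain templates indexed by row and column numbers. Elevation data are converted into an elevation texture suitable for the GPU to read, and elevation values are then sampled from the texture with vertex texture fetch (VTF) for rendering; the whole process runs on the GPU, which improves the efficiency of terrain data access. Cracks are eliminated with the forced-split method of the Real-time Optimally Adapting Meshes (ROAM) algorithm, which constrains the levels of adjacent terrain blocks. Finally, rendering uses TriangleStrip, which avoids passing the vertex coordinates shared by adjacent triangles repeatedly and reduces the amount of data sent to the GPU. The rendering efficiency of the algorithm is tested on two terrain data sets, and its frame rate is compared with that of the Clipmap algorithm. The results show that the algorithm effectively solves the crack problem of block-based data and meets the requirements of interactive terrain rendering.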

15.

State‐of‐the‐Art in Compressed GPU‐Based Direct Volume Rendering

M. Balsa Rodríguez E. Gobbetti J.A. Iglesias Guitián M. Makhinya F. Marton R. Pajarola S.K. Suter 《Computer Graphics Forum》2014,33(6):77-100

Great advancements in commodity graphics hardware have favoured graphics processing unit (GPU)‐based volume rendering as the main adopted solution for interactive exploration of rectilinear scalar volumes on commodity platforms. Nevertheless, long data transfer times and GPU memory size limitations are often the main limiting factors, especially for massive, time‐varying or multi‐volume visualization, as well as for networked visualization on the emerging mobile devices. To address this issue, a variety of level‐of‐detail (LOD) data representations and compression techniques have been introduced. In order to improve capabilities and performance over the entire storage, distribution and rendering pipeline, the encoding/decoding process is typically highly asymmetric, and systems should ideally compress at data production time and decompress on demand at rendering time. Compression and LOD pre‐computation does not have to adhere to real‐time constraints and can be performed off‐line for high‐quality results. In contrast, adaptive real‐time rendering from compressed representations requires fast, transient and spatially independent decompression. In this report, we review the existing compressed GPU volume rendering approaches, covering sampling grid layouts, compact representation models, compression techniques, GPU rendering architectures and fast decoding techniques.

16.

Flame Simulation with Parallel CPU and GPU Computing

王栋栋庄雷《计算机应用》2009,29(6):1702-1710

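Flame fluid is simulated with the particle-interpolation-based SPH method. The GPU accelerates the computation of particle states, while the CPU computes particle neighbor relations in parallel and controls the particle emission rate. A vortex field computation is added to the SPH model fairly efficiently, which increases the detail of particle motion. For particle rendering, a chromaticity field, directed point spreading, and color sharpening are used to obtain a reasonably continuous flame image from the discrete spatial distribution of particles. Because the method is a Lagrangian approach to fluid simulation, the flame is physically realistic, and because the architecture uses the GPU for the main computation with the CPU in a supporting role, the simulation runs in real time.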

17.

Accelerating Molecular Dynamics Simulation with Multi-Core CPUs and GPUs

林江宏林锦贤吕暾《计算机应用》2011,31(3):843-847

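A protein molecular dynamics simulation program based on the AMBER force field is implemented with OpenMP and Compute Unified Device Architecture (CUDA) on a heterogeneous parallel architecture that combines a multi-core central processing unit (CPU) with a graphics processing unit (GPU). By sensibly partitioning the program into CPU single-threaded, CPU multi-threaded, and GPU multi-threaded parts, the processing power of the computer is used efficiently. Performance tests show that, relative to optimized serial CPU computation, the multi-core CPU-GPU heterogeneous parallel model has a strong performance advantage; in particular, porting the force computation, which accounts for 90% of total execution time, to the GPU yields a speedup of up to 12 times.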

18.

GPU-Based Acceleration of Low-Density Parity-Check Code Decoding

徐启迪刘争红郑霖《计算机应用》2022,42(12):3841-3846

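With the development of communication technology, communication terminals increasingly rely on software to support multiple communication standards and protocols. The traditional software-defined radio architecture, which uses the computer's central processing unit (CPU) as its computing unit, cannot meet the throughput requirements of wideband data in high-speed wireless communication systems such as multiple-input multiple-output (MIMO). To address this, an acceleration method for a low-density parity-check (LDPC) code decoder based on the graphics processing unit (GPU) is proposed. First, based on a theoretical analysis of how GPU parallel heterogeneous computing performs in the GNU Radio 4G/5G physical-layer signal processing modules, the layered normalized min-sum (LNMS) algorithm, which has higher parallel efficiency, is adopted. Second, the decoding latency is reduced through a global synchronization strategy, careful allocation of GPU memory, and stream parallelism, while GPU multi-threading is used to parallelize the LDPC decoding flow. Finally, the proposed GPU-accelerated decoder is implemented and verified on a software-defined radio platform, and the bit error rate performance of the parallel decoder and the bottlenecks of its acceleration are analyzed. Experimental results show that, compared with traditional serial CPU processing, the CPU+GPU heterogeneous platform raises the LDPC decoding rate to about 200 times the original and the decoder throughput exceeds 1 Gb/s, a considerable improvement over traditional decoders, especially for large volumes of data.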
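
As a hedged illustration of the normalized min-sum rule that LNMS builds on, the sketch below computes the messages one check node sends back to its variable nodes. Layered scheduling, the parity-check matrix and the GPU mapping are all omitted, and the normalization factor `alpha=0.75` and the function name are assumptions rather than values from the paper.

```python
def check_node_update(incoming, alpha=0.75):
    """Normalized min-sum update for a single check node.

    `incoming` holds the log-likelihood ratios arriving from the connected
    variable nodes. The message returned to node i uses every input except
    its own: product of their signs times the smallest magnitude, scaled
    by `alpha` to compensate for min-sum overestimating reliability.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two connections")
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(alpha * sign * min(abs(v) for v in others))
    return outgoing
```

Every check node in a layer can be updated independently with this rule, which is the property a GPU decoder exploits by assigning check nodes to threads.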

19.

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy

《Parallel Computing》2016

Graphic Processing Units (GPUs) are widely used in high performance computing, due to their high computational power and high performance per Watt. However, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation, while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. However, there are several problems with GPU-controlled communication. The main problem is intra-GPU synchronization, since GPU blocks are non-preemptive. Therefore, the use of communication requests within a GPU can easily result in a deadlock. In this work we show how dynamic parallelism solves this problem. GPU-controlled communication in combination with dynamic parallelism allows keeping the control flow of multi-GPU applications on the GPU and bypassing the CPU completely. Using other in-kernel synchronization methods results in massive performance losses, due to the forced serialization of the GPU thread blocks. Although the performance of applications using GPU-controlled communication is still slightly worse than the performance of hybrid applications, we will show that performance per Watt increases by up to 10% while still using commodity hardware.

20.

Optimized GPU evaluation of arbitrary degree NURBS curves and surfaces

Adarsh Krishnamurthy Rahul Khardekar Sara McMains 《Computer aided design》2009,41(12):971-980

This paper presents a new unified and optimized method for evaluating and displaying trimmed NURBS surfaces using the Graphics Processing Unit (GPU). Trimmed NURBS surfaces, the de facto standard in commercial mechanical CAD modeling packages, are currently being tessellated into triangles before being sent to the graphics card for display since there is no native hardware support for NURBS. Other GPU-based NURBS evaluation and display methods either approximated the NURBS patches with lower degree patches or relied on specific hard-coded programs for evaluating NURBS surfaces of different degrees. Our method uses a unified GPU fragment program to evaluate the surface point coordinates of any arbitrary degree NURBS patch directly, from the control points and knot vectors stored as textures in graphics memory. This evaluated surface is trimmed during display using a dynamically generated trim-texture calculated via alpha blending. The display also incorporates dynamic Level of Detail (LOD) for real-time interaction at different resolutions of the NURBS surfaces. Different data representations and access patterns are compared for efficiency and the optimized evaluation method is chosen. Our GPU evaluation and rendering speeds are more than 40 times faster than evaluation using the CPU.
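
For readers unfamiliar with what "evaluating" a NURBS of arbitrary degree involves, the sketch below evaluates a point on a NURBS curve with the textbook Cox-de Boor recursion. It is a plain CPU illustration of the underlying mathematics, not the paper's fragment-program method; the function names are illustrative, and a surface would apply the same basis functions in two parameter directions.

```python
def basis(i, p, u, knots):
    """Cox-de Boor B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    span_left = knots[i + p] - knots[i]
    if span_left > 0:           # skip terms with zero-length knot spans
        left = (u - knots[i]) / span_left * basis(i, p - 1, u, knots)
    span_right = knots[i + p + 1] - knots[i + 1]
    if span_right > 0:
        right = ((knots[i + p + 1] - u) / span_right
                 * basis(i + 1, p - 1, u, knots))
    return left + right


def nurbs_point(u, degree, control_points, weights, knots):
    """Rational combination of control points for any degree.

    Valid for u in the half-open parameter range
    [knots[degree], knots[-degree - 1]).
    """
    numerator = [0.0] * len(control_points[0])
    denominator = 0.0
    for i, (point, weight) in enumerate(zip(control_points, weights)):
        b = basis(i, degree, u, knots) * weight
        denominator += b
        numerator = [n + b * c for n, c in zip(numerator, point)]
    return [n / denominator for n in numerator]
```

With weights `[1, sqrt(2)/2, 1]` and control points `(1, 0)`, `(1, 1)`, `(0, 1)`, the degree-2 curve traces a quarter of the unit circle, which is a convenient check that the rational weighting is applied correctly.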