Similar Documents
20 similar documents found (search time: 171 ms)
1.
田泽  张骏  许宏杰  郭亮  黎小玉 《计算机科学》2013,40(Z6):210-216
Graphics processing units (GPUs) are being adopted ever more widely thanks to their powerful graphics acceleration and strong performance in general-purpose computing. However, as chip scale and integration density continue to grow, the power consumption of a single GPU chip has reached as much as 376 W, two to three times that of a high-end general-purpose processor. The reliability, stability, and chip-cost problems brought by high power consumption have made the "power wall" one of the key problems to be overcome in future GPU design. Working at the architectural level and drawing on the structure of the GPU rendering pipeline, this paper describes low-power GPU design techniques from the perspectives of depth testing and culling, the shader data path, texture mapping and compression, rendering strategy, the register file, and the on-chip cache, and points out directions for further research on low-power GPU design.

2.
In recent years, the rapidly expanding programmability of graphics processing units (GPUs), combined with the high speed and parallelism of the rendering pipeline, has made general-purpose computing on GPUs (GPGPU) a research hotspot. To address the low efficiency of the back-propagation (BP) algorithm for large-scale neural networks, a GPU-accelerated BP algorithm is proposed. The forward computation and backward learning of the BP network are converted into GPU texture-rendering passes, exploiting the GPU's powerful floating-point throughput and highly parallel execution to solve the BP algorithm. Experimental results show that the method runs significantly faster while keeping the accuracy of the results unchanged.
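A minimal sketch of the idea in this abstract (not the paper's code): the forward and backward passes of BP reduce to data-parallel matrix arithmetic, which is what makes them map onto texture-rendering passes. NumPy stands in for the per-texel shader work; the layer sizes, sigmoid activation, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, target, w1, w2, lr=0.1):
    # Forward pass: each matrix product corresponds to one rendering pass.
    h = sigmoid(x @ w1)
    y = sigmoid(h @ w2)
    # Backward pass: error terms are again elementwise/matrix operations.
    dy = (y - target) * y * (1 - y)
    dh = (dy @ w2.T) * h * (1 - h)
    w2 -= lr * h.T @ dy
    w1 -= lr * x.T @ dh
    return y, w1, w2

rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 8)) * 0.5   # assumed 4-8-1 network
w2 = rng.standard_normal((8, 1)) * 0.5
x = rng.standard_normal((16, 4))
t = (x.sum(axis=1, keepdims=True) > 0).astype(float)

err0 = None
for _ in range(500):
    y, w1, w2 = bp_step(x, t, w1, w2)
    if err0 is None:
        err0 = np.mean((y - t) ** 2)  # error at the initial weights
print(f"initial MSE {err0:.4f} -> final MSE {np.mean((y - t) ** 2):.4f}")
```

Every operation here is embarrassingly parallel across array elements, which is the property the GPU formulation exploits.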

3.
This work studies video-texture techniques for 3D movie playback, analyzes the state of the art at home and abroad together with the open problems and challenges, and proposes and designs an effective solution. DirectShow is used to schedule and manage the video data, with Direct3D as the graphics rendering API. The parallelizable parts of the traditional rendering pipeline are extracted and accelerated using the parallel computing capability of programmable GPUs, and GPU texture compression is used to work around limited video memory. Experimental results show that the design effectively raises the frame rate, relieves the CPU load, and removes the performance bottleneck in real-time rendering, giving it strong practical value.

4.
For the analysis and processing of massive graphics data sets, wavelet-transform methods are widely used, but their computational cost makes real-time requirements hard to meet. In recent years GPU performance has improved dramatically, and deep pipelines and parallel execution provide a good platform for real-time computation. Building on the matrix form of the wavelet transform and the GPU programming model, a GPU-based wavelet-transform method is proposed: discrete data points are mapped to textures using the correspondence between arrays and textures, the wavelet transform is cast as products of high-dimensional matrices and vectors, and intermediate results are obtained via render-to-texture. The method fully exploits the parallelism of the GPU pipeline; experiments show that it effectively reduces computation time and meets real-time rendering requirements.
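A minimal sketch of the matrix formulation this abstract describes: one level of a discrete wavelet transform written as a matrix-vector product, the form that maps naturally onto GPU render-to-texture passes. The Haar filter and the tiny input are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def haar_matrix(n):
    # One level of the orthonormal Haar transform for even n:
    # rows 0..n/2-1 compute pairwise averages, rows n/2..n-1 pairwise details.
    s = 1.0 / np.sqrt(2.0)
    H = np.zeros((n, n))
    for i in range(n // 2):
        H[i, 2 * i] = s
        H[i, 2 * i + 1] = s
        H[n // 2 + i, 2 * i] = s
        H[n // 2 + i, 2 * i + 1] = -s
    return H

x = np.array([4.0, 6.0, 10.0, 12.0])
H = haar_matrix(4)
coeffs = H @ x           # one "rendering pass": a matrix-vector product
restored = H.T @ coeffs  # H is orthogonal, so its transpose inverts it
print(np.allclose(restored, x))  # True
```

On the GPU, the same product is evaluated per output element by a shader, with the intermediate coefficient vector held in a texture.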

5.
An optimized real-time rendering algorithm for realistic water surfaces based on GPU programming is proposed. The graphics and mathematical techniques required for water-surface rendering are introduced. Water-wave modeling, bump mapping with texture blending, and reflection and refraction effects are implemented with a fixed vertex stream, and the final water surface is drawn with fragment (pixel) shading on the programmable pipeline. Experiments show that the method simulates realistic water surfaces well and meets the water-rendering needs of 3D games and animation.

6.
An optimized real-time rendering algorithm for realistic water surfaces based on GPU programming is proposed. The graphics and mathematical techniques required for water-surface rendering are introduced. Water-wave modeling, bump mapping with texture blending, and reflection and refraction effects are implemented with a fixed vertex stream, and the final water surface is drawn with fragment (pixel) shading on the programmable pipeline. Experiments show that the method simulates realistic water surfaces well and meets the water-rendering needs of 3D games and animation.

7.
With improving GPU performance and programmability, certain stages of the graphics pipeline and certain graphics algorithms are gradually migrating from the CPU to the GPU. This paper introduces the basics of programmable graphics hardware and analyzes how GPU-based ray tracing is implemented. Six test scenes of increasing complexity, from 2,016 to 60,960 triangles, were rendered with both GPU and CPU ray tracers at three different resolutions. Comparison and analysis of the results characterize GPU-accelerated ray tracing.

8.
Techniques, Status, and Challenges of General-Purpose Computation on Graphics Processing Units   Total citations: 72 (self-citations: 4, citations by others: 72)
吴恩华 《软件学报》2004,15(10):1493-1504
For years, graphics processing units (GPUs) have advanced at a pace far exceeding Moore's law. This development has greatly increased the speed and quality of computer graphics and spurred the rapid growth of graphics-related application areas. Meanwhile, the high speed and parallelism of the GPU rendering pipeline, together with its recently developed programmability, provide a good platform for general-purpose computation beyond graphics, making GPU-based general-purpose computing a research hotspot over the past two or three years. Starting from the history of GPUs and the basic architecture of modern GPUs, this paper explains the principles behind using GPUs for general-purpose computation, surveys its main application areas and latest developments, and describes in detail applications and techniques in fluid simulation, algebraic computation, database processing, and spectrum analysis, including the authors' own work on fluid simulation. Software tools for GPU applications and their latest developments are also covered in detail. Finally, the prospects of general-purpose computing on GPUs are discussed, and the future challenges of the field are analyzed from both the hardware and the software perspective.

9.
To address image rendering built on top of dynamic graphics drawing, a pipelined method that reorganizes the OpenGL rendering flow is proposed, based on the core graphics and rendering technologies of Mac OS X, so that the GPU fully takes over both drawing and rendering acceleration. The whole pipeline requires no CPU involvement, improving graphics-subsystem performance while optimizing application responsiveness.

10.
Surface Rendering Based on GPU Projected Grids   Total citations: 1 (self-citations: 0, citations by others: 1)
Surface-rendering techniques are especially important for the virtual styling of ships, cars, and aircraft and for the design of real-time flight-simulation systems. To overcome the shortcomings of traditional surface-rendering methods and improve real-time performance, the ever-growing rendering power of GPUs, including their programmability and highly parallel computation, is exploited to implement view-dependent surface rendering with a projected grid on the GPU. As projection rays cast from the viewpoint pass through the projected grid, surface meshes with different levels of detail are generated automatically according to the visual importance of the details, and the level-of-detail requirements of the mesh are updated in real time. A stable frame rate is maintained throughout rendering, and the generated view-dependent surfaces are smooth. Experiments confirm that the method meets real-time interaction requirements and has broad application prospects in engineering virtual simulation.

11.
Great advancements in commodity graphics hardware have favoured graphics processing unit (GPU)‐based volume rendering as the main adopted solution for interactive exploration of rectilinear scalar volumes on commodity platforms. Nevertheless, long data transfer times and GPU memory size limitations are often the main limiting factors, especially for massive, time‐varying or multi‐volume visualization, as well as for networked visualization on the emerging mobile devices. To address this issue, a variety of level‐of‐detail (LOD) data representations and compression techniques have been introduced. In order to improve capabilities and performance over the entire storage, distribution and rendering pipeline, the encoding/decoding process is typically highly asymmetric, and systems should ideally compress at data production time and decompress on demand at rendering time. Compression and LOD pre‐computation does not have to adhere to real‐time constraints and can be performed off‐line for high‐quality results. In contrast, adaptive real‐time rendering from compressed representations requires fast, transient and spatially independent decompression. In this report, we review the existing compressed GPU volume rendering approaches, covering sampling grid layouts, compact representation models, compression techniques, GPU rendering architectures and fast decoding techniques.

12.
In this paper we present a streaming compression scheme for gigantic point sets including per-point normals. This scheme extends our previous Duodecim approach [21] in two different ways. First, we show how to use this approach for the compression and rendering of high-resolution iso-surfaces in volumetric data sets. Second, we use deferred shading of point primitives to considerably improve rendering quality. Iso-surface reconstruction is performed in a hexagonal close packing (HCP) grid, into which the initial data set is resampled. Normals are resampled from the initial domain using volumetric gradients. By incremental encoding, only slightly more than 3 bits per surface point and 5 bits per surface normal are required at high fidelity. The compressed data stream can be decoded in the graphics processing unit (GPU). Decoded point positions are saved in graphics memory, and they are then used on the GPU again to render point primitives. In this way, high-quality gigantic data sets can be rendered directly from their compressed representation in local GPU memory at interactive frame rates (see Fig. 1).
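A minimal sketch of the principle behind the entry's incremental encoding (this is not the Duodecim codec): storing small deltas between successive quantized positions instead of absolute coordinates shrinks the symbol range dramatically, which is what makes bit rates of a few bits per point possible once an entropy coder is applied. The 1-D stream, the sample values, and the omission of the HCP grid layout and entropy coding are all simplifying assumptions.

```python
import numpy as np

def delta_encode(q):
    # Replace each quantized position by its difference from the previous one;
    # the first element keeps its absolute value.
    d = np.diff(q, prepend=q[:1])
    d[0] = q[0]
    return d

def delta_decode(d):
    # Running sum restores the absolute positions exactly.
    return np.cumsum(d)

q = np.array([100, 101, 101, 103, 104, 104, 106])  # quantized positions
d = delta_encode(q)
restored = delta_decode(d)
print(int(np.abs(d[1:]).max()))  # 2 -- deltas need far fewer bits than absolutes
```

The decode side is a prefix sum, a cheap and parallel-friendly operation, which fits the entry's claim that the stream can be decoded on the GPU.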

13.
14.
彭伟  李建新  闫镔  童莉  陈健  管士勇 《计算机应用》2011,31(8):2221-2224
GPU-accelerated volume rendering has become a research hotspot in volume visualization, but large data sets that exceed video memory cannot be loaded directly, which is a bottleneck for GPU use. Bricking (block-based) techniques solve this problem while preserving image quality, but the frequent loading and access of bricks markedly reduce rendering speed. To address this, an optimal bricking of large data sets is obtained by building an optimization model, and the speed of octree-based bricked volume rendering is further improved by constructing a node-index texture and an improved distance-template design. Experimental results show a clear acceleration effect.

15.
We present a geometry compression scheme for restricted quadtree meshes and use this scheme for the compression of adaptively triangulated digital elevation models (DEMs). A compression factor of 8–9 is achieved by employing a generalized strip representation of quadtree meshes to incrementally encode vertex positions. In combination with adaptive error-controlled triangulation, this allows us to significantly reduce bandwidth requirements in the rendering of large DEMs that have to be paged from disk. The compression scheme is specifically tailored for GPU-based decoding, since it minimizes dependent memory access operations. We can thus trade CPU operations and CPU–GPU data transfer for GPU processing, resulting in twice faster streaming of DEMs from main memory into GPU memory. A novel storage format for decoded DEMs on the GPU facilitates a sustained rendering throughput of about 300 million triangles per second. Due to these properties, the proposed scheme enables scalable rendering with respect to the display resolution independent of the data size. For a maximum screen-space error below 1 pixel it achieves frame rates of over 100 fps, even on high-resolution displays. We validate the efficiency of the proposed method by presenting experimental results on scanned elevation models of several hundred gigabytes.

16.
Bidirectional Texture Functions (BTFs) are among the highest quality material representations available today and thus well suited whenever an exact reproduction of the appearance of a material or complete object is required. In recent years, BTFs have started to find application in various industrial settings and there is also a growing interest in the cultural heritage domain. BTFs are usually measured from real‐world samples and easily consist of tens or hundreds of gigabytes. By using data‐driven compression schemes, such as matrix or tensor factorization, a more compact but still faithful representation can be derived. This way, BTFs can be employed for real‐time rendering of photo‐realistic materials on the GPU. However, scenes containing multiple BTFs or even single objects with high‐resolution BTFs easily exceed available GPU memory on today's consumer graphics cards unless quality is drastically reduced by the compression. In this paper, we propose the Bidirectional Sparse Virtual Texture Function, a hierarchical level‐of‐detail approach for the real‐time rendering of large BTFs that requires only a small amount of GPU memory. More importantly, for larger numbers or higher resolutions, the GPU and CPU memory demand grows only marginally and the GPU workload remains constant. For this, we extend the concept of sparse virtual textures by choosing an appropriate prioritization, finding a trade‐off between factorization components and spatial resolution. Besides GPU memory, the high demand on bandwidth poses a serious limitation for the deployment of conventional BTFs. We show that our proposed representation can be combined with an additional transmission compression and then be employed for streaming the BTF data to the GPU from local storage media or over the Internet. In combination with the introduced prioritization this allows for the fast visualization of relevant content in the user's field of view and a consecutive progressive refinement.

17.
Hardware-accelerated volume rendering using the GPU is now the standard approach for real-time volume rendering, although limited graphics memory can present a problem when rendering large volume data sets. Volumetric compression in which the decompression is coupled to rendering has been shown to be an effective solution to this problem; however, most existing techniques were developed in the context of software volume rendering, and all but the simplest approaches are prohibitive in a real-time hardware-accelerated volume rendering context. In this paper we present a novel block-based transform coding scheme designed specifically with real-time volume rendering in mind, such that the decompression is fast without sacrificing compression quality. This is made possible by consolidating the inverse transform with dequantization in such a way as to allow most of the reprojection to be precomputed. Furthermore, we take advantage of the freedom afforded by off-line compression in order to optimize the encoding as much as possible while hiding this complexity from the decoder. In this context we develop a new block classification scheme which allows us to preserve perceptually important features in the compression. The result of this work is an asymmetric transform coding scheme that allows very large volumes to be compressed and then decompressed in real-time while rendering on the GPU.
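A minimal sketch of the consolidation trick this abstract names, folding dequantization into the inverse transform so that decoding becomes a single precomputed matrix multiply. The DCT-like transform, quantization steps, and signal are illustrative assumptions, not the paper's actual codec.

```python
import numpy as np

n = 8
# Orthonormal DCT-II basis as an example forward transform.
k = np.arange(n)
T = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
T[0] *= np.sqrt(0.5)

q = np.linspace(1.0, 4.0, n)          # assumed per-coefficient quantization steps
x = np.sin(np.linspace(0.0, 2.0, n))  # assumed input block
coeff_q = np.round((T @ x) / q)       # encode: transform, then quantize

# Naive decode: dequantize, then apply the inverse transform (two passes).
naive = T.T @ (coeff_q * q)

# Consolidated decode: the dequantization scales are folded into a
# precomputed matrix, so decoding is one multiply of the raw symbols.
D = T.T * q[None, :]  # column i of the inverse transform scaled by q[i]
fast = D @ coeff_q
print(np.allclose(naive, fast))  # True: identical result, one pass
```

Precomputing `D` moves work from decode time to encode time, matching the asymmetric encoder-heavy/decoder-light design the abstract describes.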

18.
现代3D图形处理器已从固定渲染管线发展成可编程渲染管线,且其并行度越来越高,研究并设计高性能的3D图形处理器对3D图形处理具有重要意义。着色器是实现3D图形处理器的核心,因此开发性能高、面积小、功耗低又易于扩展的着色器对3D图形处理器的开发具有重要作用。提出的统一架构图形处理器基于单指令多线程和单指令多数据,单指令多线程可以提高图形处理的并行度,从而提高图形处理性能;单指令多数据可以降低设计复杂度,从而实现面积小、功耗低又易于扩展的着色器。实验结果表明,提出的统一架构图形处理器在面积较小、功耗较低的情况下实现了较高的性能,且设计可扩展性较好。  相似文献   

19.
In medicine, interactive three-dimensional volume visualization of large volume datasets is a challenging task. One of the major challenges in graphics processing unit (GPU)-based volume rendering algorithms is the limited size of texture memory imposed by current GPU architecture. We attempt to overcome this limitation by rendering only visible parts of large CT datasets. In this paper, we present an efficient, high-quality volume rendering algorithm using GPUs for rendering large CT datasets at interactive frame rates on standard PC hardware. We subdivide the volume dataset into uniform-sized blocks and take advantage of combinations of early ray termination, empty-space skipping and visibility culling to accelerate the whole rendering process and render visible parts of the volume data. We have implemented our volume rendering algorithm for a large volume data set of 512 x 304 x 1878 dimensions (visible female), and achieved real-time performance (i.e., 3-4 frames per second) on a 2.4 GHz Pentium 4 PC equipped with an NVIDIA GeForce 6600 graphics card (256 MB video memory). This method can be used as a 3D visualization tool of large CT datasets for doctors or radiologists.
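A minimal sketch of one acceleration named in this entry, early ray termination: in front-to-back alpha compositing, a ray stops sampling once its accumulated opacity is close to 1, since further samples cannot visibly change the pixel. The sample values and threshold below are illustrative assumptions.

```python
import numpy as np

def composite_front_to_back(colors, alphas, threshold=0.99):
    # Standard front-to-back compositing with an early-out.
    acc_c, acc_a = 0.0, 0.0
    steps = 0
    for c, a in zip(colors, alphas):
        acc_c += (1.0 - acc_a) * a * c
        acc_a += (1.0 - acc_a) * a
        steps += 1
        if acc_a >= threshold:  # early ray termination
            break
    return acc_c, acc_a, steps

# A ray through 100 uniform samples of opacity 0.3 saturates quickly:
colors = np.full(100, 0.8)
alphas = np.full(100, 0.3)
c, a, steps = composite_front_to_back(colors, alphas)
print(steps)  # 13 -- the remaining 87 samples are skipped
```

With opacity 0.3 per sample, accumulated opacity after n samples is 1 - 0.7^n, which first reaches 0.99 at n = 13, so dense regions terminate rays after a small fraction of the samples.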

20.
A CUDA-Based Parallel Accelerated Rendering Algorithm   Total citations: 1 (self-citations: 1, citations by others: 0)
GPUs can process massive data quickly and efficiently and have therefore become a research hotspot in graphics and image processing in recent years. Existing GPU rendering suffers from low resource utilization and excessive bandwidth consumption when a scene contains many identical or similar models. Building on the existing GPU rendering architecture, a CUDA-based accelerated rendering method is proposed. A model of the current GPU rendering mode is built to expose its shortcomings, which motivates the use of constant memory; the properties of constant memory and its effect on rendering are then analyzed, and a constant-memory-based control scheme is introduced to accelerate rendering, with the whole process driven by the rendering algorithm. Experimental results show that the method addresses the above problems well and achieves accelerated rendering.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号