Similar literature
 Found 20 similar documents; search took 125 ms
1.
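Rasterization is a key stage of the graphics rendering pipeline, and the rasterization accelerator is an important component that determines the performance of a graphics processor. Based on a perspective-correct interpolation algorithm in barycentric coordinates, and building on the edge-function algorithm, a bidirectional four-row parallel scanning scheme is proposed that optimizes the rasterization algorithm and improves the efficiency of the graphics rendering pipeline. A multithreaded, bidirectional parallel-scanning rasterization accelerator was implemented in hardware; the verification environment used a 20 nm XCVU440 platform chip, and the frequency after synthesis and implementation was 125 M...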
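As a rough illustration of the two ideas named in abstract 1 (edge functions for coverage and perspective-correct interpolation in barycentric coordinates), here is a minimal software sketch. It scans pixels serially, so it does not reproduce the paper's bidirectional four-row parallel scan or its hardware design; the vertex layout `(x, y, w, attr)` and the function names are assumptions made for this example.

```python
def edge(a, b, p):
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return [(x, y, attr)] for pixel centers covered by the triangle.

    Each vertex is a hypothetical tuple (x, y, w, attr): screen position,
    clip-space w, and one scalar attribute to interpolate.
    """
    area = edge(v0, v1, v2)
    if area <= 0:  # degenerate or oppositely wound triangles are skipped in this sketch
        return []
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            e0, e1, e2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if e0 < 0 or e1 < 0 or e2 < 0:
                continue  # outside at least one edge
            # Screen-space barycentric weights of v0, v1, v2.
            b0, b1, b2 = e0 / area, e1 / area, e2 / area
            # Perspective correction: attr/w and 1/w are linear in screen space.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr_over_w = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2]
                           + b2 * v2[3] / v2[2])
            covered.append((x, y, attr_over_w / inv_w))
    return covered
```

A quick sanity check on this sketch: an attribute that is the same at all three vertices should come out unchanged at every covered pixel, whatever the `w` values are.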

2.
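The geometric transformation unit (covering rotation, translation, and scaling) is the only basic unit in a graphics accelerator that expresses graphics animation, and its performance is critical to the accelerator as a whole. Based on the characteristics of geometric transformation in graphics accelerators, this paper proposes a software design model and, using a parallel pipelined structure, designs and implements the geometric transformation unit in hardware, increasing the accelerator's running speed. Finally, the circuit was downloaded to an FPGA development board for verification. Experimental results show that the geometric transformation circuit designed here performs its function correctly, and that its parallel design shows its advantage most clearly when processing large volumes of data.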

3.
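Filling is a very important operation in graphics processing. Turbo C 2.0 has a very good fill function, but while developing a graphics support package in C for the TVGA card, the author had no choice but to rewrite the fill function. How can a bounded graphics region be filled? Few references cover the subject. Below are some of the author's thoughts, offered for readers' reference. 1. Recursive method. What filling should accomplish is to start from the fill point and spread outward in all directions until a boundary is hit. More specifically, first obtain the color of the fill point, then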
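The recursive idea described in abstract 3 (start at the fill point and spread in all directions until a boundary is reached) can be sketched as below. The abstract is truncated before it gives the author's actual steps, so this is a generic 4-connected boundary fill written under that assumption, not the article's code; `grid`, `fill`, and `boundary` are illustrative names.

```python
def boundary_fill(grid, x, y, fill, boundary):
    """Recursively fill the 4-connected region containing (x, y).

    grid is a list of rows of color values; spreading stops at cells
    that already hold the fill color or the boundary color.
    """
    if not (0 <= y < len(grid) and 0 <= x < len(grid[0])):
        return  # outside the image
    if grid[y][x] == fill or grid[y][x] == boundary:
        return  # already filled, or hit the boundary
    grid[y][x] = fill
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        boundary_fill(grid, x + dx, y + dy, fill, boundary)
```

Recursion depth grows with the size of the region, so in Python this form is only practical for small regions; the queue-based and non-recursive variants mentioned in other entries on this page avoid that limit.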

4.
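Polygon region filling is one of the most basic operations in graphics and image processing. Combining the idea of algebraic curve integrals with the active edge table technique, this article proposes a new algebraic integration algorithm for arbitrary polygons. Compared with traditional polygon filling algorithms, the new algorithm not only fills arbitrary polygon regions effectively (such as regions with holes and self-intersecting regions), but is also fast and efficient. It thus effectively resolves the technical difficulty of converting arbitrary vector graphics to raster graphics. Extensive application to handwritten character filling and to computing feature values of polygon regions shows that the algorithm has strong practical value for vector-to-raster conversion, character filling, and polygon region feature computation.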

5.
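I read Song Deshun's article "A Fast Filling Algorithm for Graphics Regions" (《图形区域的快速填充算法》) in issue 9 of your journal in 1996, which describes a fast region-filling algorithm implemented with a circular queue. That algorithm is not ideal, however: for the same region in interior-point representation and with the same circular queue, the new algorithm presented here fills 3 to 4 times faster than the original. I. Shortcomings of the original algorithm. In the fill_area() function, the original algorithm, for each row in the region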

6.
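An improved vector pattern filling method   Total citations: 1, self-citations: 0, citations by others: 1
Region filling is an indispensable function in most of today's graphics and image processing software, and generally takes the form of either vector pattern filling or raster pattern filling. Vector filling is generally used where high resolution and high pixel counts are required; because it is complex and time-consuming, it sees only limited use. Raster filling, with its simple process and mature methods, is widely used. Hybrid vector-raster filling is a new filling mode that combines the strengths of the two: it uses the simple raster filling process to achieve the high-quality results of vector filling. Experimental results show that hybrid filling retains vector filling's distortion-free scaling while offering the efficiency of raster filling.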

7.
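This paper discusses non-recursive algorithms for filling arbitrary closed graphics regions, and concludes with an efficient implementation in Borland C.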

8.
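A simple pattern filling algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Region filling is widely used in computer graphics and image processing. This paper implements a pattern filling method based on feasible regions. The algorithm correctly fills contours of arbitrarily complex shape, and is easy to implement, fast, simple, and easy to understand, which gives it considerable practical value.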

9.
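Diamond has announced the Viper V550 graphics accelerator. Designed for business customers and general users, Diamond's new graphics accelerator uses the latest PC graphics processing technology to suit the new generation of business software, 3D games, and the Internet environment. The Viper V550 comes in two retail versions, PCI and AGP, and uses NVIDIA's powerful RIVA TNT 3D graphics controller, with advanced 128-bit processing and a 16 MB video memory design. Benchmark results show that, relative to its competitors, the Viper V550 is 39% faster in its AGP version and 46% faster in its PCI version. The Viper V550 uses NVIDIA's RIVA TNT processor, the first chip with single-chip 128-bit 3D processing capability, able to process two pixels per clock cycle. Adjustable true-color resolution reaches 1920×1200, and the fill rate reaches 250 million pixels per second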

10.
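To meet the automation and generality requirements of region filling in image analysis, the seed fill algorithm is improved and a reverse-injection seed fill algorithm is proposed. Compared with conventional region filling algorithms, its distinguishing feature is that all filled regions, including the initial seed points, are handled autonomously and efficiently by the computer. This makes it generally applicable to arbitrarily complex regions and addresses the shortcomings of the scan-line and seed fill algorithms. The algorithm can fill multiple regions in a single pass and is highly efficient for many densely packed regions.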

11.
We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For the conventional x86 architectures, our approach exploits task parallelism via the OmpSs runtime as well as a message-passing implementation based on MPI, respectively yielding a dynamic and static schedule of the work to the cores, with different numeric semantics to those of the sequential ILUPACK. For the graphics processor we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while keeping the numeric semantics of the sequential case.

12.
The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose graphics processing unit (GPU).

13.
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle–particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.

14.
Cabral B., Hunter C.L. Computer, 1989, 22(8): 77-84
An examination is made of three projects underway at LLNL. The Magic (Machine Graphics in Color) Project is developing techniques to move data from a Cray supercomputer and effectively display it on a high-resolution color terminal in the scientist's office. In a follow-on to this project, laser disk technology provides a real-time animation capability. The Graphics Workstation Project has examined visualization applications of workstations with graphics accelerators and is currently extending this work to "graphics minisupercomputers". The Advanced Visualization Research Project uses high-end graphics devices to perform algorithm development, especially in the implementation of volumetric methods.

15.
The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-level linear algebra operations in the overall, distributed solver algorithm. Optimizations incorporated include: (1) an efficient memory mapping and synchronization interface to minimize data movement, (2) multi-process sharing of the accelerator within a node to obtain balanced load with multicore processors, and (3) an automatic memory management system to efficiently utilize accelerator memory when sub-matrices spill over the limits of device memory. Results are reported from our novel implementation that uses MAGMA and CUBLAS accelerator software systems simultaneously with ACML (2013) [2] for multithreaded execution on processors. Overall, using 940 nVidia Tesla X2090 accelerators and 15,040 cores, the best heterogeneous execution delivers a 10.9-fold reduction in run time relative to an already efficient parallel multicore-only baseline implementation that is highly optimized with intra-node and inter-node concurrency and computation–communication overlap. Detailed quantitative results are presented to explain all critical runtime components contributing to hybrid performance.

16.
Problems of computational actuarial mathematics, dynamic financial analysis, and optimization of insurance business, and the possibility of solving them by parallel computing on graphics accelerators, are discussed. The ruin probability and other performance criteria of an insurance company are estimated by the Monte Carlo method, which in many cases is the only applicable method. Because the ruin probability is small, an astronomical number of simulations may be required to estimate it with acceptable accuracy. Parallelizing the Monte Carlo method and using graphics accelerators makes it possible to obtain the desired result in a reasonable time. The results of numerical experiments on the developed actuarial modeling system are presented; the system can use any graphics accelerator that supports Nvidia CUDA 1.3 or higher.
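To make the Monte Carlo estimate in abstract 16 concrete, the sketch below estimates a finite-horizon ruin probability on the CPU. The abstract does not state which risk model the authors simulate, so the classical compound-Poisson surplus process with exponential claim sizes is an assumption here, as are all parameter names; the GPU parallelization the abstract discusses is not shown.

```python
import random


def ruin_probability(u, c, lam, mean_claim, horizon, n_paths, seed=0):
    """Estimate P(surplus drops below 0 before `horizon`) by simulation.

    Assumed model: surplus(t) = u + c*t - (sum of claims up to t), with
    claims arriving as a Poisson process of rate `lam` and exponentially
    distributed sizes with mean `mean_claim`.
    """
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        t, surplus = 0.0, u
        while True:
            dt = rng.expovariate(lam)  # waiting time until the next claim
            t += dt
            if t > horizon:
                break  # survived the whole horizon
            # Premium income since the last claim, minus the claim itself.
            surplus += c * dt - rng.expovariate(1.0 / mean_claim)
            if surplus < 0:  # ruin can only happen at a claim instant
                ruined += 1
                break
    return ruined / n_paths
```

Each path is independent of the others, which is what makes this kind of estimate a natural fit for the parallel hardware the abstract describes: the standard error shrinks only as one over the square root of `n_paths`, so small ruin probabilities need very many paths.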

17.
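MobileNet is a deep neural network widely used in embedded applications. To address the low efficiency of its hardware implementations while retaining scalability across different hardware resource budgets, an FPGA-based MobileNet accelerator architecture is proposed. A three-stage pipelined acceleration array is designed around the network's stacked structure, achieving computational efficiency above 70% across multiplier budgets from 0 to 4000. Finally, on XIL...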

18.
How some recently introduced graphics accelerators address window clipping is discussed. Some graphics accelerators can download the window-clip rectangle data structures used to represent the visible areas of a window to specialized RAM or hardware registers. These accelerators have special hardware and/or software to hold the clipping information and to properly clip graphics primitives. Others use window clip planes, which allow easy hardware representation of windows with arbitrary shape and complexity. They are ideal for clipping graphics output to nonrectangular windows because graphics output performance does not degrade as the number of rectangles needed to represent the window increases. CPU-based window clipping is examined. Devices in each of these classes are described.

19.
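Systolic arrays have a regular structure and high throughput, suit matrix multiplication algorithms, and are widely used to design high-performance convolution and matrix multiplication accelerators. In deep-submicron processes, increasing array size to raise chip compute performance leads to problems such as lower frequency and sharply higher power consumption. Therefore, drawing on 3D integrated circuit technology, this paper proposes 3D-MMA, a double-precision floating-point matrix multiplication accelerator that maps a planar systolic array onto a 3D integrated circuit. First, a blocked mapping and scheduling algorithm is designed for the structure to improve matrix multiplication efficiency. Second, an acceleration system based on 3D-MMA is proposed, a performance model of 3D-MMA is built, and its design space is explored. Finally, the implementation cost is evaluated and compared with existing state-of-the-art accelerators. Experimental results show that with a memory bandwidth of 160 GB/s and a stack of four 16×16 systolic array layers, 3D-MMA reaches a peak performance of 3 TFLOPS at 99% efficiency, with an implementation cost lower than that of a 2D implementation. In the same process, 3D-MMA delivers 1.36 and 1.92 times the performance of a linear array accelerator and a K40 GPU respectively, with a far smaller area. The work explores the advantages of 3D integrated circuits in designing high-performance matrix multiplication accelerators and offers a useful reference for further improving the performance of future high-performance computing platforms.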
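For readers unfamiliar with why systolic arrays suit matrix multiplication, the following cycle-level sketch models a generic output-stationary planar array. It is not the 3D-MMA design or its blocked scheduling algorithm; the dataflow (operands skewed so that processing element (i, j) sees the pair `a[i][k]`, `b[k][j]` on cycle `i + j + k`) is an assumed textbook arrangement, and the function name is illustrative.

```python
def systolic_matmul(a, b):
    """Multiply a (n x m) by b (m x p) on a simulated n x p systolic array.

    Returns (product, cycles). Each processing element (i, j) keeps its
    own accumulator and performs at most one multiply-add per cycle.
    """
    n, m, p = len(a), len(b), len(b[0])
    acc = [[0.0] * p for _ in range(n)]
    cycles = n + m + p - 2  # last operand pair reaches PE (n-1, p-1) on cycle n+m+p-3
    for t in range(cycles):
        for i in range(n):
            for j in range(p):
                k = t - i - j  # index of the operand pair arriving this cycle
                if 0 <= k < m:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles
```

In this model the pipeline fill and drain cost `n + p - 2` cycles on top of the `m` cycles of useful work per element, which is one reason large or stacked arrays depend on good blocking to stay near peak efficiency.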

20.
The GRAphics AcceLerator (GRAAL) design-exploration framework is an open system that offers a coherent development methodology for hardware/software cosimulation and codesign of embedded 3D graphics accelerators. GRAAL incorporates tools to help visually debug graphics algorithms implemented in hardware and to estimate performance in terms of throughput, power consumption, and area.
