1.

Frishman Y Tal A 《IEEE transactions on visualization and computer graphics》2007,13(6):1310-1319

This paper presents a new algorithm for force-directed graph layout on the GPU. The algorithm, whose goal is to compute layouts accurately and quickly, makes two contributions. The first is a general multi-level scheme based on spectral partitioning. The second is computing the layout on the GPU. Since the GPU requires a data-parallel programming model, the challenge is devising a mapping of a naturally unstructured graph into a well-partitioned structured one. This is done by computing a balanced partitioning of a general graph. The multi-level scheme is general and has the potential to be used not only for computation on the GPU, but also on emerging multi-core architectures. The algorithm manages to compute high-quality layouts of large graphs in a fraction of the time required by existing algorithms of similar quality. An application for visualization of the topologies of ISP (Internet Service Provider) networks is presented.

2.

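GPU-Accelerated Typhoon Visualization Method

秦绪佳 张勤锋 陈坚 郑红波 徐晓刚 《中国图象图形学报》2012,17(2):293-300

Visualization of natural phenomena is an important research topic in computer graphics and virtual reality. Building on an analysis of the traditional ray casting algorithm, this paper proposes an improved ray casting algorithm based on a spherical shell volume. The GPU is applied to volume rendering of the spherical-shell data field, and vertex and pixel shader programs are designed for that data field. The typhoon source data format is also parsed to generate volume data for typhoon visualization, and the proposed algorithm is used to visualize typhoon cloud layers and factors. Experimental results show that the GPU-based spherical-shell ray casting algorithm achieves good real-time typhoon visualization on the surface of a sphere.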

3.

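GPU-Based Fluid Dynamics Simulation (total citations: 2; self-citations: 0; citations by others: 2)

杨冰 蒋杰 应龙 吴玲达 《计算机工程与应用》2007,43(11):7-10

A GPU-based method for visualizing fluid dynamics is proposed. First, the physical model of fluid dynamics is analyzed and expressed mathematically, and a solution method is given. Second, an algorithm for simulating fluid dynamics on the GPU is designed that produces realistic motion while keeping algorithmic complexity under control. Finally, experiments show that the algorithm runs far more efficiently than earlier CPU-based algorithms and that the simulation results are realistic and credible. The algorithm draws on the strengths of earlier methods, uses an optimal physical model and a fast numerical solver for the details of fluid dynamics visualization, and is robust and innovative.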

4.

Zippy: A Framework for Computation and Visualization on a GPU Cluster (total citations: 1; self-citations: 0; citations by others: 1)

Zhe Fan Feng Qiu Arie E. Kaufman 《Computer Graphics Forum》2008,27(2):341-350

Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large-scale general-purpose computation and visualization applications. However, the programming model for high-performance general-purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts GPU cluster programming with a two-level parallelism hierarchy and a non-uniform memory access (NUMA) model. Zippy preserves the advantages of both message-passing and shared-memory models. It employs global arrays (GA) to simplify communication, synchronization, and collaboration among multiple GPUs. Moreover, it exposes data locality to the programmer for optimal performance and scalability. We present three example applications developed with Zippy: sort-last volume rendering, Marching Cubes isosurface extraction and rendering, and lattice Boltzmann flow simulation with online visualization. They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.

5.

Accelerating Louvain community detection algorithm on graphic processing unit

Mohammadi Maryam Fazlali Mahmood Hosseinzadeh Mehdi 《The Journal of supercomputing》2021,77(6):6056-6077

The Louvain community detection algorithm is a hierarchical clustering method that addresses an NP-hard problem. Its execution time when finding communities in large graphs is therefore a challenge. Parallelization is an effective way to reduce Louvain's execution time. In this paper, we propose an adaptive CUDA Louvain method (ACLM) algorithm that benefits from the graphics processing unit (GPU). ACLM uses the shared memory of the GPU, as well as the optimal number of threads in the GPU blocks. These features minimize parallelization overhead and accelerate the calculation of modularity parameters. The proposed algorithm allocates threads to each block based on the number of required streaming multiprocessors (SMs) and warps on the GPU. The implementation results show that ACLM can effectively accelerate execution by 77% compared with the competing method on large graph benchmarks.
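
The modularity score that Louvain-style methods try to increase can be illustrated with a small sketch. The Python function below is not the ACLM implementation. It assumes an undirected, unweighted graph given as an edge list and a hypothetical `community` dictionary, and it uses the standard definition Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its vertices, and m is the number of edges.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a community label
               (a hypothetical input format chosen for this sketch).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # number of edges with both ends in community c
    degree = defaultdict(int)    # sum of vertex degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, placing each triangle in its own community gives a positive score, while placing all six vertices in one community gives zero. A GPU version would typically parallelize the per-edge accumulation, which is the part the abstract describes as the calculation of modularity parameters.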

6.

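Design and Implementation of a Parallel Depth-First Algorithm on a GPU Cluster

余莹 李肯立 郑光勇 《计算机科学》2015,42(1):82-85

A naive execution of depth-first search on large graphs on a GPU cluster leads to load imbalance among threads and uncoalesced memory accesses, which results in poor performance. To markedly improve performance on both a single GPU and multiple GPUs, the data are rearranged through a series of operations before processing. A new technique for mapping threads to data is proposed that achieves perfect load balance by using prefix-sum and binary search operations. To reduce communication overhead, a pruning operation is applied to the set of edges that must be exchanged in each DFS branch. Experimental results show that the algorithm exploits as much of the available parallelism as possible on a single GPU and minimizes communication overhead in a multi-GPU environment. On a GPU cluster, it can efficiently perform a distributed DFS on graphs containing billions of nodes.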

7.

Efficient asynchronous executions of AMR computations and visualization on a GPU system

Hari K. Raghavan Sathish S. Vadhiyar 《Journal of Parallel and Distributed Computing》2013

Adaptive Mesh Refinement (AMR) is a method that dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. In-situ visualization plays an important role in analyzing the time-evolving characteristics of the domain structures. Continuous visualization of the output data for various timesteps results in a better study of the underlying domain and the model used for simulating the domain. In this paper, we develop strategies for continuous online visualization of time-evolving data for AMR applications executed on GPUs. We reorder the meshes for computations on the GPU based on the user's input about the subdomain to be visualized. This makes the data available for visualization at a faster rate. We then perform asynchronous executions of the visualization steps and fix-up operations on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in a 60% improvement in response time and a 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the timesteps.

8.

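Research on GPU Volume Rendering Methods for 3D Non-Uniform Rectilinear Grids

袁斌 《图学学报》2010,31(3):76

The rapid development of computer graphics hardware can be used to accelerate visualization. For non-uniform rectilinear grids, this paper presents a CPU ray casting algorithm based on a uniform auxiliary grid, a GPU ray casting algorithm based on auxiliary textures, and a slice-based 3D texture volume rendering algorithm, and tests them on an Nvidia GeForce 6800GT graphics card. The results show that the GPU algorithms are far faster than the CPU algorithm, and that slice-based 3D texture volume rendering is faster than GPU ray casting.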

9.

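GPU-Based Rendering Method for Free Surfaces of Non-Newtonian Fluids (total citations: 1; self-citations: 0; citations by others: 1)

蒋杰 应龙 杨冰 吴玲达 《计算机工程与应用》2007,43(18):19-23

A GPU-based method for rendering the free surface of non-Newtonian fluids is proposed. First, the physical model of non-Newtonian fluids is analyzed and the laws of fluid motion are described mathematically. Second, a visual model suited to the characteristics of non-Newtonian fluids is designed, methods for rendering fluid motion and the free surface are proposed, and a corresponding GPU implementation is designed. Finally, experiments show that the algorithm can realistically reproduce the free surface of non-Newtonian fluids within a reasonable time. The algorithm draws on the strengths of earlier methods, adopts a sound mathematical model, and exploits the computational characteristics of the GPU to render the free surface of non-Newtonian fluids, improving considerably on earlier algorithms in both rendering quality and efficiency.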

10.

GO: A cluster algorithm for graph visualization

《Journal of Visual Languages and Computing》2015

In the big data age, graph data such as user networks on Facebook and Flickr are becoming large. Reducing the visual complexity of a graph layout is a challenging problem. Clustering graphs is regarded as one of the effective ways to address this problem. Most current graph visualization systems, however, directly use existing clustering algorithms that were not originally developed for visualization. For graph visualization, a clustering algorithm should meet specific requirements such as a sufficient cluster size and automatic determination of the number of clusters. After identifying the requirements of clustering graphs for visualization, in this paper we present a new clustering algorithm that is specifically designed for visualization so as to reduce the visual complexity of a layout, together with a strategy for improving the scalability of our algorithm. Experiments have demonstrated that our proposed algorithm is capable of detecting clusters in the way that graph visualization requires.

11.

Efficient breadth first search on multi-GPU systems

Enrico Mastrostefano Massimo Bernaschi 《Journal of Parallel and Distributed Computing》2013

Naive algorithms for Breadth First Search on large graphs, when run on clusters of GPUs, suffer from load imbalance among threads and uncoalesced memory accesses, resulting in poor performance. To obtain a significant improvement on a single GPU and to scale across multiple GPUs, we use a suitable combination of operations to rearrange data before processing. We propose a novel technique for mapping threads to data that achieves perfect load balance by leveraging prefix-sum and binary search operations. To reduce communication overhead, we prune the set of edges that need to be exchanged at each BFS level. The result is an algorithm that makes the best use of the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes.
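
The idea of mapping threads to data with a prefix sum and a binary search can be sketched as follows. This is an illustrative CPU version in Python, not the authors' code: the function name `map_threads_to_edges` and the `degrees` dictionary are assumptions made for the example. Each logical thread is given exactly one outgoing edge of the current frontier, so a high-degree vertex no longer stalls a single thread.

```python
from bisect import bisect_right
from itertools import accumulate

def map_threads_to_edges(frontier, degrees):
    """Assign one logical thread to each outgoing edge of the frontier.

    frontier: list of vertex ids in the current BFS level.
    degrees:  dict vertex id -> out-degree (hypothetical input format).
    Returns a list whose entry k is (vertex, local_edge_index) for thread k.
    """
    # Exclusive prefix sum: offsets[i] is the first thread id owned by frontier[i].
    counts = [degrees[v] for v in frontier]
    offsets = [0] + list(accumulate(counts))
    total = offsets[-1]
    mapping = []
    for k in range(total):                # on a GPU, each k would be one thread
        i = bisect_right(offsets, k) - 1  # binary search for the owning vertex
        mapping.append((frontier[i], k - offsets[i]))
    return mapping
```

With a frontier of three vertices of out-degree 2, 0, and 3, five threads are created and the zero-degree vertex receives none. On a GPU the loop body would run once per thread, with the prefix sum computed by a parallel scan.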

12.

Efficient decomposition of strongly connected components on GPUs

《Journal of Systems Architecture》2014,60(1):1-10

The GPU (Graphics Processing Unit) has recently become one of the most power-efficient processors in embedded and many other environments, and has been integrated into more and more SoCs (Systems on Chip). Thus modern GPUs play a very important role in power-aware computing. Strongly Connected Component (SCC) decomposition is a fundamental graph algorithm that has wide applications in model checking, electronic design automation, social network analysis and other fields. GPUs have been shown to have great potential in accelerating many types of computations, including graph algorithms. Recent work has demonstrated the plausibility of GPU SCC decomposition, but the implementation is inefficient because it gives insufficient consideration to the distinctive GPU programming model, which leads to poor performance on irregular and sparse graphs. This paper presents a new GPU SCC decomposition algorithm that focuses on full utilization of contemporary embedded and desktop GPU architectures. In particular, a subgraph numbering scheme is proposed to facilitate the safe and efficient management of the subgraph IDs and to serve as the basis of efficient source selection. Furthermore, we adopt a multi-source partition procedure that greatly reduces the recursion depth and use a vertex labeling approach that optimizes GPU memory access. The evaluation results show that the proposed approach achieves up to 41× speedup over Tarjan’s algorithm, one of the most efficient sequential SCC decomposition algorithms, and up to 3.8× speedup over the previous GPU algorithms.

13.

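Research and Implementation of Dynamic Flow Field Visualization Based on Geometry Shaders

樊宇 吕晓琪 张继凯 王月明 张信雪 《计算机工程与应用》2019,55(9):157-161

To address the low efficiency of visualizing complex flow fields and to speed up visualization, a fast flow field visualization algorithm based on geometry shaders is proposed. Geometry shaders are introduced into the flow field visualization pipeline, and the GPU's parallel processing and image processing capabilities are used to draw arrows and streamlines for the flow field in real time. An integral color mapping method is then applied; compared with conventional linear color mapping, it distributes colors more evenly and strengthens the sense of gradation in flow field intensity. Experiments show that the algorithm effectively reflects the distribution of flow field features, reduces the amount of data transferred during visualization, lowers resource waste, and improves rendering efficiency.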

14.

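Parallel Algorithm for Tissue Motion Visualization in Ultrasound Imaging on the Fermi Architecture

何兴无 《计算机系统应用》2013,22(4):147-152

In clinical real-time ultrasound imaging systems, tissue motion, such as cardiac motion, is important diagnostic information for physicians. Two-dimensional vector field visualization based on line integral convolution can show both the magnitude and the direction of a motion vector field. However, the algorithm involves a large amount of complex computation, especially in streamline tracing, which makes it a major performance bottleneck in clinical real-time imaging systems. This paper therefore studies and proposes a parallel motion visualization algorithm based on the Fermi-architecture GPU (graphics processing unit), an emerging high-performance parallel computing platform. Test results show that, compared with a CPU-based implementation, GPU processing on the Fermi architecture can not only…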

15.

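Research on GPU-Based 3D Visualization Modeling of Underground Pipelines

刘浩 赵文吉 段福洲 曹巍 潘李亮 《计算机工程与应用》2013,49(18):145-148

Three-dimensional modeling and visualization of underground pipelines is an important part of building a "digital city". After summarizing the shortcomings of existing real-time pipeline modeling algorithms, this paper proposes a real-time 3D visualization modeling algorithm for underground pipelines implemented with GPU programming. Using the programmability of modern GPUs, all pipeline modeling computation is moved to the GPU; the CPU only passes in pipe diameters and pipeline node coordinates. The GPU's geometry shader computes the vertex coordinates of the pipeline model, generates pipeline vertex data automatically, and builds the pipeline triangle mesh, while lighting and texture mapping give the pipeline material a realistic appearance. Experimental results show that the algorithm overcomes the defects of existing modeling algorithms and can perform real-time 3D visualization modeling of large-scale pipe networks while preserving the fidelity of the pipeline fit.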

16.

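Parallel Transitive Closure Algorithm for Heterogeneous Architectures

肖汉 郭宝云 李彩林 周清雷 《计算机工程》2021,47(8):131-139

Traditional methods for computing the transitive closure of a graph suffer from heavy computation and long running times. To speed up transitive closure on large data sets, this paper combines the compute-intensive nature of the algorithm with the features of the Open Computing Language (OpenCL) framework, uses parallel submatrix multiplication optimized with local memory together with parallel blocked matrix multiplication, and proposes an OpenCL-based parallel transitive closure algorithm. The local-memory-optimized parallel submatrix multiplication streamlines the computation steps, improves the memory utilization of the graphics processing unit (GPU), and reduces data access latency. The parallel blocked matrix multiplication handles large matrices and improves utilization of the GPU's compute cores. Experimental results show that, compared with a serial CPU algorithm, a parallel algorithm based on Open Multi-Processing (OpenMP), and a parallel algorithm based on the Compute Unified Device Architecture (CUDA), the proposed algorithm under OpenCL on an NVIDIA GeForce GTX 1070 platform achieves speedups of 593.14×, 208.62×, and 1.05×, respectively.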
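
To make the link between transitive closure and blocked matrix multiplication concrete, here is a minimal serial sketch in Python. It is not the paper's OpenCL kernel. Computing the closure by repeated squaring of the Boolean matrix A ∨ I is an assumption made for this example, and the function names and the `block` tile size are hypothetical. The tiled loop structure mirrors how a GPU version might stage tiles in local memory.

```python
def bool_matmul_blocked(a, b, block=2):
    """Boolean product of two n x n matrices, computed tile by tile."""
    n = len(a)
    c = [[False] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Combine tile (i0, k0) of a with tile (k0, j0) of b.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        if a[i][k]:
                            for j in range(j0, min(j0 + block, n)):
                                if b[k][j]:
                                    c[i][j] = True
    return c

def transitive_closure(adj, block=2):
    """Reachability by paths of length >= 1, via repeated Boolean squaring."""
    n = len(adj)
    # r starts as A or I: reachability using at most one edge (or none).
    r = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps = 1
    while steps < n:          # after the loop, r covers paths of up to n edges
        r = bool_matmul_blocked(r, r, block)
        steps *= 2
    # Multiply by A once more so that only paths with at least one edge remain.
    a = [[bool(x) for x in row] for row in adj]
    return bool_matmul_blocked(a, r, block)
```

Repeated squaring needs only about log2(n) matrix products, each of which is the kind of dense, regular workload that maps well onto GPU compute cores.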

17.

Algorithm visualization using tree graphs

Konstantinos Konstantinides 《The Visual computer》1991,7(4):220-228

Recent advances in graphics workstations allow the development of improved visualization tools for algorithm and program development. Algorithm visualization permits better analysis, development, and presentation of the algorithm characteristics. In this paper, we present a simple algorithm visualization technique using tree graphs. The technique is applied to the visualization of three sorting algorithms: the bubble sort, the quicksort, and the merge and sort, and one matrix algorithm, the Gaussian elimination. Key states of the data are displayed on the nodes, while the graph itself represents the underlying structure of the algorithm. All graphics are displayed under the X Window environment using simple graphics and window programming techniques.

18.

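A Hierarchical Layout Algorithm Based on Graph Matching

赵玉聪 钟志农 吴烨 景宁 《计算机与现代化》2015,(8):107

For undirected graphs with many nodes and fairly uniform degrees, a hierarchical layout algorithm based on graph matching (Graph Matching Hierarchy, GMH) is proposed, following the idea of level-by-level expansion. The large graph is recursively simplified using graph matching, the FR algorithm is applied to lay out the coarsest graph, and a barycenter layout algorithm is then used to expand the graph. Experimental results show that GMH improves visualization efficiency and layout quality, and that the hierarchical layout results are easier to understand.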

19.

Performance of dynamic texture segmentation using GPU

Francisco Gómez Fernández María Elena Buemi Juan Manuel Rodríguez Julio C. Jacobo-Berlles 《Journal of Real-Time Image Processing》2016,11(2):375-383

This work assesses the use of GPU computation in dynamic texture segmentation under the mixture of dynamic textures (MDT) model. In this generative video model, the observed texture is a time-varying process driven by a hidden state process. The use of mixtures in this model allows simultaneous handling of different visual processes. The use of GPU computing is growing in high-performance applications, but adapting existing algorithms so that they benefit from it is not an easy task. In this paper, we made two implementations, one on the CPU and the other on the GPU, of a known segmentation algorithm based on MDT. In the MDT algorithm, there is a matrix inversion process that is highly demanding in terms of computing power. We compare the performance gain obtained by porting this matrix inversion process to the GPU with the gain obtained by porting the whole MDT segmentation process to the GPU. We also study real-time motion segmentation performance by separating the learning part of the algorithm from the segmentation part, leaving the learning stage as an off-line process and keeping the segmentation as an online process. The results of the performance analyses allow us to decide the cases in which the full GPU implementation of the motion segmentation process is worthwhile.

20.

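Research and Implementation of a Cone-Beam CT Inspection Imaging Simulation System (total citations: 1; self-citations: 0; citations by others: 1)

金智勇 赵星 张慧滔 杨涛 董莹莹 张朋 《小型微型计算机系统》2010,31(1)

A cone-beam CT inspection imaging simulation system for CT teaching and research is designed and implemented. The system can simulate control of CT equipment, generate forward projection data, reconstruct the scanned object, and display the reconstruction in 3D. Its key techniques are as follows: the CT equipment is modeled at true scale, which makes the cone-beam CT working process visible; the CT scene is organized with a scene graph, which simplifies management of scene graphics; and the GPU is used to accelerate CT forward projection, image reconstruction, and volume rendering, which goes some way toward solving the problem of fast computation on large cone-beam CT data volumes. Experimental results show that the system runs fast enough to meet the needs of interactive use.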