Similar Literature
 20 similar documents were retrieved.
1.
We propose a new isotropic remeshing method based on Centroidal Voronoi Tessellation (CVT). Constructing a CVT requires repeatedly computing the Restricted Voronoi Diagram (RVD), defined as the intersection between a 3D Voronoi diagram and an input mesh surface. Existing methods use approximations of the RVD. In this paper, we introduce an efficient algorithm that computes the RVD exactly and robustly. As a consequence, we achieve better remeshing quality than approximation-based approaches, without sacrificing efficiency. Our method for RVD computation uses a simple procedure and a kd-tree to quickly identify and compute the intersection of each triangle face with its incident Voronoi cells. Its time complexity is O(m log n), where n is the number of seed points and m is the number of triangles of the input mesh. Fast convergence of the CVT is achieved using a quasi-Newton method, which proved much faster than Lloyd's iteration. Examples are presented to demonstrate that our method yields better remeshing quality than state-of-the-art approaches.
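As a point of reference for the optimizer the abstract compares against, the following minimal 2D sketch (my illustration, not the authors' code) implements plain Lloyd iteration for an approximate CVT: dense sample points stand in for the restricted Voronoi cells, each seed collects its nearest samples, and each seed then moves to the centroid of its cell. The exact RVD computation on a mesh surface and the quasi-Newton acceleration described in the paper are not shown.

```python
import numpy as np

def lloyd_cvt(seeds, samples, iters=30):
    """Approximate a 2D centroidal Voronoi tessellation by Lloyd iteration:
    assign dense sample points to their nearest seed, then move each seed to
    the centroid of its assigned samples."""
    seeds = seeds.copy()
    for _ in range(iters):
        d = np.linalg.norm(samples[:, None, :] - seeds[None, :, :], axis=2)
        owner = d.argmin(axis=1)                  # nearest-seed assignment
        for k in range(len(seeds)):
            cell = samples[owner == k]
            if len(cell):
                seeds[k] = cell.mean(axis=0)      # centroid update
    return seeds

rng = np.random.default_rng(0)
pts = rng.random((20000, 2))      # dense samples standing in for the domain
print(lloyd_cvt(rng.random((32, 2)), pts)[:3])
```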

2.
Heterogeneous computing platforms composed of multi-core CPUs and GPUs have become an important direction in high-performance computing. To improve the query performance of column-oriented databases and make full use of the computing resources of such platforms, a set of primitive scheduling methods is proposed on top of a previously defined set of column-database primitives. The approach comprises a primitive execution mechanism, a dynamic-programming-based CPU primitive scheduling method, and a GPU primitive scheduling method built on a GPU memory management mechanism, so that the system can use multi-core CPU resources sensibly and exploit the locality of data resident in GPU memory to improve overall performance. Tests on several typical TPC-H benchmark queries show that the CPU primitive scheduling method makes queries more stable, while the GPU primitive scheduling method makes them faster. The experiments also reveal shortcomings of column-database scheduling on this heterogeneous platform, which point out directions for future improvement.

3.
Dataflow programming languages simplify programming in their target domains by cleanly separating task computation from data communication, so that applications can be parallelized at both the task level and the data level. To exploit the abundant data parallelism, task parallelism, and pipeline parallelism present on hybrid GPU/CPU architectures, this paper proposes and implements a task partitioning method and a multi-granularity scheduling strategy for dataflow programs on such architectures, covering the classification of tasks, the horizontal splitting of GPU-side tasks, and the balancing of discrete CPU-side tasks; a software-pipelined schedule is constructed, and OpenCL target code is generated after compiler optimization. Task classification assigns each task to a suitable computing platform according to its computational characteristics and the volume of communication between tasks; horizontal splitting distributes GPU-side tasks evenly across the GPUs by exploiting their parallelism, so that the high cost of inter-GPU communication does not degrade overall performance; CPU-side balancing assigns the discrete CPU tasks to appropriately chosen cores to maintain load balance and improve per-core utilization. Experiments on a hybrid platform of multiple NVIDIA Tesla C2050 GPUs and multi-core CPUs, using typical multimedia algorithms as test programs, demonstrate the effectiveness of the partitioning method and the scheduling strategy.
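The "balancing of discrete CPU-side tasks" step is essentially a multiprocessor load-balancing problem. The sketch below is a hypothetical illustration (not the paper's implementation) using the greedy longest-processing-time heuristic: tasks are taken in decreasing cost order and each one is placed on the currently least-loaded core.

```python
import heapq

def balance_tasks(task_costs, n_cores):
    """Greedy longest-processing-time assignment of discrete tasks to CPU cores:
    always place the next-largest task on the currently least-loaded core."""
    loads = [(0.0, core, []) for core in range(n_cores)]
    heapq.heapify(loads)
    for name, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, core, assigned = heapq.heappop(loads)
        assigned.append(name)
        heapq.heappush(loads, (load + cost, core, assigned))
    return {core: (load, assigned) for load, core, assigned in loads}

# hypothetical per-task cost estimates (e.g. profiled execution times)
print(balance_tasks({"t%d" % i: c for i, c in enumerate([9, 7, 6, 5, 5, 4, 2])}, 3))
```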

4.
The present paper describes an optimized C++ library for the study of electromagnetics. The implementation is based on the Finite-Difference Time-Domain (FDTD) method for transient analysis and the Finite Element Method (FEM) for electrostatics. Both methods share the same core and are optimized for CPU and GPU computing. To illustrate its use, the FEM is applied to solve Laplace's equation, analyzing the relation between surface curvature and electrostatic potential of a long cylindrical conductor, whereas FDTD is applied to analyze thin-film filters at optical wavelengths. Furthermore, the performance of the CPU and GPU versions is compared as a function of the simulation grid size. This approach allows the study of a wide range of electromagnetic problems, taking advantage of the benefits of each numerical method and the computing power of modern CPUs and GPUs.

5.
Efficient use of computing resources in heterogeneous environments is key to improving task efficiency and cluster utilization. Kubernetes is the de facto choice for container orchestration, but in heterogeneous scheduling scenarios its scheduler lacks fine-grained GPU information and cannot satisfy user-defined requirements; moreover, with mixed CPU/GPU node deployments the scheduler is unaware of heterogeneous resources, which leads to resource contention. Taking into account both the distribution of heterogeneous resources across nodes and their hardware state, this paper proposes a fine-grained CPU/GPU heterogeneous resource scheduling strategy based on Kubernetes. A device-plugin mechanism collects detailed GPU information on each node and reports GPU resource metrics to the scheduling algorithm. On top of the existing CPU and memory filters, an additional filter on user-defined GPU attributes selects the nodes that meet fine-grained requirements. For mixed CPU/GPU deployments, the scoring algorithm of the scheduler is improved to detect the application type dynamically, applying a load-balancing policy to CPU applications and a best-fit policy to GPU applications; this ensures that both kinds of applications are scheduled correctly and that the fragmented resources of GPU nodes are used when CPU resources run short. Experiments on fine-grained GPU scheduling and on mixed CPU/GPU deployments show that the strategy schedules GPUs effectively and avoids resource contention.
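A toy filter-and-score scheduler in the spirit of the strategy described above; all field names (cpu_free, gpu_mem_mb, gpu_model, ...) are invented for illustration, and a real implementation would plug into the Kubernetes device-plugin API and scheduler framework rather than plain dictionaries.

```python
def schedule_pod(pod, nodes):
    """Filter nodes on CPU/memory and user-specified GPU attributes, then score:
    spread CPU pods (load balancing) and pack GPU pods (best fit)."""
    def fits(n):
        ok = n["cpu_free"] >= pod["cpu"] and n["mem_free"] >= pod["mem"]
        if pod.get("gpu", 0):                          # fine-grained GPU filter
            ok = (ok and n["gpu_free"] >= pod["gpu"]
                  and n["gpu_mem_mb"] >= pod.get("gpu_mem_mb", 0)
                  and (not pod.get("gpu_model") or n["gpu_model"] == pod["gpu_model"]))
        return ok

    def score(n):
        if pod.get("gpu", 0):                 # GPU pod: best fit, fewest leftover GPUs
            return -(n["gpu_free"] - pod["gpu"])
        return n["cpu_free"] - pod["cpu"]     # CPU pod: most free CPU, spread the load

    feasible = [n for n in nodes if fits(n)]
    return max(feasible, key=score)["name"] if feasible else None

# hypothetical node inventory reported by a device plugin
nodes = [
    {"name": "n1", "cpu_free": 8, "mem_free": 32, "gpu_free": 2, "gpu_mem_mb": 16384, "gpu_model": "T4"},
    {"name": "n2", "cpu_free": 2, "mem_free": 8,  "gpu_free": 0, "gpu_mem_mb": 0,     "gpu_model": ""},
]
print(schedule_pod({"cpu": 1, "mem": 2, "gpu": 1, "gpu_model": "T4"}, nodes))
```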

6.
Sparse matrix-vector multiplication (SpMV) is an important kernel in solving sparse linear systems, but because the non-zero elements are sparse, the computational density is low and efficiency suffers. To cope with the irregularity of sparse matrices, hybrid storage formats can be used for SpMV, improving compression efficiency and widening applicability. HYB is a widely used hybrid format with fairly stable performance. With GPU parallel computing now commonplace and CPUs increasingly multi-core, heterogeneous parallel systems that combine GPUs and multi-core CPUs have gained broad acceptance. Exploiting the storage characteristics of the ELL and COO parts of the HYB format, the two parts are split between the CPU and the GPU for cooperative parallel computation; this makes full use of both kinds of computing resources and plays to the strengths of each processor, improving the utilization of computing resources. Based on an analysis of the CPU+GPU heterogeneous computing model, the data partitioning and sharing of the hybrid format are optimized, so that the advantages of the heterogeneous environment can be exploited and computing performance improved.
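A NumPy sketch of SpMV on a HYB-stored matrix, with the regular ELL part and the irregular COO remainder computed separately and summed, mirroring the split described above. In the paper the ELL part goes to the GPU and the COO part to the CPU; here both halves simply run in NumPy, so the sketch only illustrates the data layout, not the devices.

```python
import numpy as np

def hyb_spmv(ell_cols, ell_vals, coo_rows, coo_cols, coo_vals, x):
    """SpMV on a HYB matrix: padded ELL part (shape n_rows x K, zero-padded)
    plus a COO remainder for the rows that exceed the ELL width."""
    n_rows, k = ell_cols.shape
    y_ell = np.zeros(n_rows)
    for j in range(k):                                   # ELL: regular, column-wise access
        y_ell += ell_vals[:, j] * x[ell_cols[:, j]]
    y_coo = np.zeros(n_rows)
    np.add.at(y_coo, coo_rows, coo_vals * x[coo_cols])   # COO: scatter-add
    return y_ell + y_coo

# 3x3 matrix [[4,1,0],[0,3,0],[2,0,5]]: ELL width 1 holds the first nonzero per row,
# the remaining nonzeros go to COO
ell_cols = np.array([[0], [1], [0]]); ell_vals = np.array([[4.], [3.], [2.]])
coo_rows = np.array([0, 2]); coo_cols = np.array([1, 2]); coo_vals = np.array([1., 5.])
print(hyb_spmv(ell_cols, ell_vals, coo_rows, coo_cols, coo_vals, np.array([1., 2., 3.])))
```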

7.
Graphics Processing Units (GPUs) are widely used in high performance computing, due to their high computational power and high performance per Watt. However, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation, while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. However, there are several problems with GPU-controlled communication. The main problem is intra-GPU synchronization, since GPU blocks are non-preemptive; the use of communication requests within a GPU can therefore easily result in a deadlock. In this work we show how dynamic parallelism solves this problem. GPU-controlled communication in combination with dynamic parallelism allows keeping the control flow of multi-GPU applications on the GPU and bypassing the CPU completely. Using other in-kernel synchronization methods results in massive performance losses, due to the forced serialization of the GPU thread blocks. Although the performance of applications using GPU-controlled communication is still slightly worse than that of hybrid applications, we show that performance per Watt increases by up to 10% while still using commodity hardware.

8.
郭新钊  张军 《计算机仿真》2010,27(1):218-221
Simulating water surface effects greatly improves the realism of natural environment simulation, but traditional CPU-based simulation has the drawback of occupying CPU time and system resources. To address this, a water surface simulation method based on the graphics processing unit (GPU) is established; the realization of water surface effects on the GPU and the reconstruction of the water surface mesh on the GPU are discussed. Because both the computation and the mesh reconstruction are carried out on the GPU, the powerful graphics processing capability of the GPU is fully exploited, no extra system overhead is incurred, the rendering of water surface detail is enhanced, and both the realism and the real-time performance of the water surface are improved.

9.
CUDA is a widely used general-purpose GPU computing model, and back-propagation (BP) is one of the most widely used neural network models. A method for parallelizing the BP algorithm with CUDA is proposed. When training a BP network with this method, the data are transferred to the GPU before training starts; during training, computing the inputs, outputs, and errors of the hidden and output layers and updating the weights and biases are all carried out on the GPU. Applying the method to training on handwritten digit images yields speedups of 6.12-8.17 over training on a quad-core CPU. When the models trained on the CPU and on the GPU are used to recognize the same test images, the GPU-trained model achieves a recognition rate 0.05%-0.22% higher than the CPU-trained one.
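For concreteness, here is a plain NumPy version of the batch back-propagation step the abstract describes (forward pass, output and hidden errors, weight and bias updates); on the GPU each of these matrix operations would become a CUDA kernel. This is an illustrative sketch, not the paper's implementation, and the small XOR call at the end is only a smoke test.

```python
import numpy as np

def train_bp(X, T, hidden=64, lr=0.1, epochs=200, seed=0):
    """Batch back-propagation for one hidden layer of sigmoid units."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, T.shape[1])); b2 = np.zeros(T.shape[1])
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W1 + b1)                 # hidden-layer output
        Y = sig(H @ W2 + b2)                 # output-layer output
        dY = (Y - T) * Y * (1 - Y)           # output error (squared-loss gradient)
        dH = (dY @ W2.T) * H * (1 - H)       # back-propagated hidden error
        W2 -= lr * H.T @ dY / len(X); b2 -= lr * dY.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
params = train_bp(X, T, hidden=8, lr=0.5, epochs=5000)   # tiny smoke test
```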

10.
As the general-purpose computing capability of GPUs has grown, new and more efficient processing techniques have been applied to image processing. A number of image processing algorithms have already been ported to the GPU with good speedups, but they do not fully exploit the computing power of all the processing units in a heterogeneous CPU/GPU system. Building on a study of GPU programming models and parallel algorithm design, this paper proposes a cooperative parallel image processing model for heterogeneous CPU/GPU environments. The model takes the computing power of each processing unit in the heterogeneous system into account, and its effectiveness for high-resolution grayscale image processing is verified with a median filtering algorithm. Experimental results show that the model generalizes well in heterogeneous CPU/GPU environments and can easily be extended to other image processing algorithms.
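A small sketch of the cooperative idea using median filtering, as in the paper's experiment: the image rows are split between two workers in proportion to an assumed capability ratio, with halo rows at the seam so the stitched result equals the single-device result exactly. Here both halves run in NumPy on the CPU; which half would go to the GPU is only indicated in comments, and the 80/20 split is an assumed capability ratio, not a measured one.

```python
import numpy as np

def median_filter(img, r=1):
    """(2r+1) x (2r+1) median filter with edge padding."""
    p = np.pad(img, r, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(p, (2 * r + 1, 2 * r + 1))
    return np.median(win, axis=(2, 3))

def cooperative_filter(img, gpu_share=0.8, r=1):
    """Split rows between a 'GPU' worker and a 'CPU' worker in proportion to
    an assumed capability ratio, with r halo rows at the seam."""
    cut = int(img.shape[0] * gpu_share)
    top = median_filter(img[:cut + r], r)[:cut]          # would run on the GPU
    bot = median_filter(img[max(cut - r, 0):], r)[r:]    # would run on the CPU
    return np.vstack([top, bot])

img = np.random.default_rng(0).integers(0, 256, (512, 512)).astype(np.uint8)
assert np.array_equal(cooperative_filter(img), median_filter(img))   # seam is exact
```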

11.
Mesh reordering is one of the important means of improving the efficiency of CPU and GPU parallel computation in computational fluid dynamics. Unstructured meshes store their data without regularity, and the resulting indirect data accesses cause memory latency; in GPU computing in particular, indirect accesses lead to non-coalesced memory accesses, which amplify the impact of that latency. To address this, the Reverse Cuthill-McKee reordering method is used to improve the data locality of unstructured meshes, and a numbering-oriented reordering method is also designed. Test cases show that mesh reordering does not affect the final results. The impact of reordering on an unstructured solver is compared on CPU and GPU: on the CPU, the running time of some hot-spot functions drops by about 20% and the overall running time by 15%-20%; on the GPU, the running time of most hot-spot functions drops by 35%-60% and the overall running time by about 40%.
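SciPy ships a Reverse Cuthill-McKee implementation, so the reordering step can be sketched directly. The toy adjacency matrix below is a generic illustration, not the solver's actual mesh data structures; a smaller bandwidth after permutation means indirect accesses land in nearby memory.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_reorder(adj):
    """Reverse Cuthill-McKee reordering of a (symmetric) mesh adjacency graph.
    Returns the permutation and the permuted adjacency matrix."""
    perm = reverse_cuthill_mckee(adj, symmetric_mode=True)
    return perm, adj[perm][:, perm]

# toy cell-to-cell adjacency of 6 mesh cells
rows = [0, 0, 1, 2, 3, 4]
cols = [3, 5, 4, 5, 4, 5]
a = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))
a = a + a.T                                  # symmetrize
perm, a_rcm = rcm_reorder(a)
print(perm)                                  # new cell numbering
```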

12.
CPU/GPU heterogeneous systems have great development potential, and in-depth study of parallel optimization on CPU/GPU heterogeneous platforms can maximize the overall computing power of the system. Optimizing the CPU/GPU task partition balances the load between the CPU and the GPU, improving the utilization of computing resources and shortening the execution time of computing tasks; optimizing the GPU thread partition lets the GPU achieve higher speed. Together, these improve overall system performance.

13.
The powerful computing capability of GPUs has made the CPU-GPU heterogeneous architecture a hot research direction in high-performance computing. Although the performance-per-watt of GPUs is high, power consumption remains one of the key limiting factors when building large-scale computing systems. Existing power optimization work for GPUs mainly focuses on reducing the power of the GPU itself, without considering the CPU and the GPU together as a whole. This paper analyzes in depth how CUDA programs run on CPU-GPU heterogeneous systems, summarizes the task dependences involved, and presents a method of representing program execution as an AOV (activity-on-vertex) network. On this basis, the critical path of program execution is analyzed, the parts of the program where energy can be saved are identified, and the corresponding frequency scaling amounts are computed, minimizing the overall energy consumption of the program while keeping its performance unchanged.
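The critical-path analysis on an AOV network can be illustrated with a small longest-path computation: tasks with positive slack are off the critical path, and those are the candidates whose frequency can be lowered without lengthening total execution time. The task names, durations, and the assumption that the edge list is topologically ordered are my own, for illustration only.

```python
def critical_path_slack(tasks, deps):
    """Earliest/latest-start analysis of an AOV task graph.
    tasks = {name: duration}; deps = [(u, v), ...] meaning u must finish before v,
    assumed to be listed in topological order. Returns the slack of each task."""
    est = {t: 0.0 for t in tasks}                      # earliest start times
    for u, v in deps:
        est[v] = max(est[v], est[u] + tasks[u])
    makespan = max(est[t] + tasks[t] for t in tasks)
    lst = {t: makespan - tasks[t] for t in tasks}      # latest start times
    for u, v in reversed(deps):
        lst[u] = min(lst[u], lst[v] - tasks[u])
    return {t: lst[t] - est[t] for t in tasks}

# hypothetical CUDA program: host-to-device copy, two kernels, a CPU step, copy back
tasks = {"h2d": 2, "kernelA": 5, "kernelB": 3, "cpu_post": 1, "d2h": 2}
deps = [("h2d", "kernelA"), ("h2d", "kernelB"),
        ("kernelB", "cpu_post"), ("kernelA", "d2h"), ("cpu_post", "d2h")]
print(critical_path_slack(tasks, deps))   # kernelB and cpu_post have slack > 0
```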

14.
A method for GPU-accelerated three-dimensional finite-difference time-domain (FDTD) electromagnetic field simulation based on the Open Computing Language (OpenCL) is proposed. The method uses the parallel processing capability of the graphics processing unit (GPU), combined with the OpenCL interface standard, to achieve high-performance acceleration of 3D FDTD with convolutional perfectly matched layer (CPML) absorbing boundary conditions. FDTD simulation parameters are first set and memory is allocated dynamically; the OpenCL computation parameters are then initialized and the accelerated FDTD simulation of the 3D electromagnetic model is carried out under OpenCL. The method raises the speed of FDTD electromagnetic simulation significantly, achieving speedups of 5-8x over CPU computation, and the CPML absorbing boundary condition allows the propagation of electromagnetic waves in free space to be modeled. Programs compiled with OpenCL can run on either CPU or GPU hardware and can fully exploit the parallel computing power of multi-core CPUs, giving FDTD electromagnetic simulation broader practical applicability.
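To make the core update concrete, here is a bare 1D FDTD leapfrog in NumPy (my sketch; the paper's solver is 3D, OpenCL-accelerated, and adds CPML absorbing boundaries, none of which is shown here).

```python
import numpy as np

def fdtd_1d(steps=600, n=400, src=100):
    """Bare 1D FDTD (Yee leapfrog) update with a soft sine source and a
    Courant number of 0.5; boundaries simply reflect."""
    ez = np.zeros(n)
    hy = np.zeros(n)
    for t in range(steps):
        hy[:-1] += 0.5 * (ez[1:] - ez[:-1])          # H update (half step)
        ez[1:]  += 0.5 * (hy[1:] - hy[:-1])          # E update (half step)
        ez[src] += np.sin(2 * np.pi * t / 30.0)      # soft source injection
    return ez

print(np.abs(fdtd_1d()).max())
```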

15.
Parallel Computing, 2014, 40(8): 425-447
EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP–OpenCL model of parallel programming opens the way to harness the power of CPU–GPU platforms in a portable way. In order to better utilize features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds, and better exploit the theoretical floating point efficiency of CPU–GPU platforms. The main contributions of the paper are:
  • a method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
  • a method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques;
  • a method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources;
  • an approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
Hybrid platforms tested in this study contain different numbers of CPUs and GPUs – from solutions consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors of different vendors are employed in these systems – both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components allows us to achieve the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over the GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively.
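The "space and temporal blocking" adaptation for multicore CPUs can be illustrated with a simple spatially blocked stencil sweep: the grid is processed in tiles small enough to stay cache-resident. The 5-point averaging stencil below is a stand-in, not the MPDATA operator, and temporal blocking (fusing several sweeps per tile) is not shown.

```python
import numpy as np

def stencil_blocked(a, bs=64):
    """One 5-point averaging sweep over the interior, processed in bs x bs tiles
    (spatial blocking) so each tile stays cache-resident."""
    out = a.copy()
    n, m = a.shape
    for i0 in range(1, n - 1, bs):
        for j0 in range(1, m - 1, bs):
            i1, j1 = min(i0 + bs, n - 1), min(j0 + bs, m - 1)
            out[i0:i1, j0:j1] = 0.2 * (a[i0:i1, j0:j1]
                                       + a[i0-1:i1-1, j0:j1] + a[i0+1:i1+1, j0:j1]
                                       + a[i0:i1, j0-1:j1-1] + a[i0:i1, j0+1:j1+1])
    return out

a = np.random.default_rng(1).random((1024, 1024))
ref = a.copy()
ref[1:-1, 1:-1] = 0.2 * (a[1:-1, 1:-1] + a[:-2, 1:-1] + a[2:, 1:-1]
                         + a[1:-1, :-2] + a[1:-1, 2:])
assert np.allclose(stencil_blocked(a), ref)   # blocking does not change the result
```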

16.
The computing power of graphics processing units (GPUs) has increased rapidly, and there has been extensive research on general-purpose computing on GPUs (GPGPU) for cryptographic algorithms such as RSA, Elliptic Curve Cryptosystem (ECC), NTRU, and the Advanced Encryption Standard. With the rise of GPGPU, commodity computers have become complex heterogeneous GPU+CPU systems. This new architecture poses new challenges and opportunities in high-performance computing. In this paper, we present high-speed parallel implementations of the rainbow method based on perfect tables, which is known as the most efficient time-memory trade-off, on the heterogeneous GPU+CPU system. We give a complete analysis of the effect of multiple checkpoints on reducing the cost of false alarms and take advantage of it for load balancing between GPU and CPU. For the GTX460, our implementation is about 1.86 and 3.25 times faster than the GPU-accelerated implementations RainbowCrack and Cryptohaze, respectively, and for the GTX580, 1.53 and 2.40 times faster. Copyright © 2014 John Wiley & Sons, Ltd.
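A toy sketch of the checkpoint idea analyzed in the paper: each stored chain records a few parity bits of intermediate values, and during lookup a candidate is rejected as soon as one of its recomputed checkpoint bits disagrees, avoiding most of the cost of resolving a false alarm. The hash, reduction function, chain length, and checkpoint positions are all illustrative choices of mine, not those of RainbowCrack or Cryptohaze.

```python
import hashlib

CHAIN_LEN = 1000
CHECKPOINTS = (250, 500, 750)   # positions where one parity bit of state is stored

def h(x: bytes) -> bytes:
    return hashlib.md5(x).digest()

def reduce_fn(digest: bytes, pos: int) -> bytes:
    # toy reduction: map the digest back into a 6-digit password space, varied by position
    n = (int.from_bytes(digest[:8], "big") + pos) % (10 ** 6)
    return f"{n:06d}".encode()

def build_chain(start: bytes):
    """Walk one rainbow chain, recording one parity bit at each checkpoint."""
    x, bits = start, []
    for pos in range(CHAIN_LEN):
        d = h(x)
        if pos in CHECKPOINTS:
            bits.append(d[0] & 1)
        x = reduce_fn(d, pos)
    return x, tuple(bits)           # (chain endpoint, checkpoint bits)

def candidate_matches(target_digest: bytes, guess_pos: int, stored_bits):
    """Rebuild the chain suffix from the guessed position; reject early if a
    recomputed checkpoint bit disagrees with the stored one (false-alarm filter)."""
    x = reduce_fn(target_digest, guess_pos)
    bit_idx = sum(1 for c in CHECKPOINTS if c <= guess_pos)
    for pos in range(guess_pos + 1, CHAIN_LEN):
        d = h(x)
        if pos in CHECKPOINTS:
            if (d[0] & 1) != stored_bits[bit_idx]:
                return False        # mismatch: no need to walk to the endpoint
            bit_idx += 1
        x = reduce_fn(d, pos)
    return True                     # checkpoints agree; still verify the endpoint

end, bits = build_chain(b"000042")
print(end, bits)
```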

17.
As a special form of parallel computing, heterogeneous computing can exploit different computing resources according to the characteristics of the computing task, offering great advantages in improving server computing performance, energy efficiency, and real-time behavior; however, current heterogeneous computing environments suffer from programming complexity and a lack of trustworthiness guarantees. To address these problems, a programming framework based on state transition matrices (STM) is proposed that can integrate GPU and FPGA resources. Through state transition matrices, CUDA and Vivad...

18.
In practical engineering applications, traditional serial CPU computation often fails to meet the speed requirements of combustion numerical simulation. Exploiting the greater computing power of the GPU, the physical equations of combustion are discretized on a staggered grid and the discretized equations are solved with the preconditioned stabilized bi-conjugate gradient method (PBiCGSTAB); parallel matrix-vector multiplication and inverse-matrix-vector multiplication algorithms oriented to GPU programming are also explored, yielding a feasible method for the numerical solution of laminar diffusion combustion on the GPU. Experimental results show that the parallel GPU program achieves a speedup of roughly 10x or more over the serial CPU program, with results consistent with the actual physics, so the proposed method is both feasible and efficient.
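For reference, a preconditioned BiCGSTAB solve can be sketched with SciPy on a stand-in system (a 1D Laplacian with a simple Jacobi preconditioner); the paper's solver targets the staggered-grid combustion equations and runs its matrix-vector products as GPU kernels, neither of which is reproduced here.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import bicgstab, LinearOperator

def solve_poisson_like(n=100):
    """Solve A x = b for a 1D Laplacian-like matrix with BiCGSTAB and a
    Jacobi (diagonal) preconditioner, as a stand-in for the PBiCGSTAB solve."""
    A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    M = LinearOperator((n, n), matvec=lambda v: v / A.diagonal())  # Jacobi M^-1
    x, info = bicgstab(A, b, M=M, maxiter=2000)                    # info == 0 on success
    return x, info

x, info = solve_poisson_like()
print(info, np.linalg.norm(x))
```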

19.
Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), resulting in the wide usage of heterogeneous computing systems that utilize both the CPU and the GPU together. In heterogeneous computing systems, the efficiency of the scheduling scheme, which selects the device (CPU or GPU) that executes each application, is one of the most critical factors in determining performance. This paper proposes a dynamic scheduling scheme that selects the device for each application based on estimated-execution-time information. The proposed scheme chooses between the CPU and the GPU so as to minimize the completion time, resulting in better system performance, even though it requires a training period to collect the execution history. According to our simulations, the proposed estimated-execution-time scheduling improves the utilization of the CPU and the GPU compared to existing scheduling schemes, resulting in reduced execution time and enhanced energy efficiency of heterogeneous computing systems.
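A minimal sketch of estimated-execution-time device selection: per-(application, device) execution history feeds a running-average estimate, and each submitted job goes to whichever device yields the smaller estimated completion time (device free time plus estimated run time). The class and method names are my own illustration, not the paper's interface.

```python
from collections import defaultdict

class EETScheduler:
    """Pick CPU or GPU per job so the estimated completion time is smallest;
    estimates come from a running average of observed execution times."""

    def __init__(self):
        self.history = defaultdict(list)          # (app, device) -> past run times
        self.busy_until = {"cpu": 0.0, "gpu": 0.0}

    def estimate(self, app, device, default=1.0):
        runs = self.history[(app, device)]
        return sum(runs) / len(runs) if runs else default   # default during training

    def submit(self, app, now):
        # completion time = when the device frees up + estimated run time
        best = min(("cpu", "gpu"),
                   key=lambda d: max(self.busy_until[d], now) + self.estimate(app, d))
        start = max(self.busy_until[best], now)
        self.busy_until[best] = start + self.estimate(app, best)
        return best, self.busy_until[best]

    def record(self, app, device, measured):
        self.history[(app, device)].append(measured)   # feed back observed run time

sched = EETScheduler()
sched.record("fft", "gpu", 0.2); sched.record("fft", "cpu", 1.5)
print(sched.submit("fft", now=0.0))   # -> ('gpu', 0.2)
```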

20.
To effectively improve the computing performance of heterogeneous CPU/GPU clusters, a two-level dynamic scheduling algorithm supporting cooperative CPU-GPU computation on heterogeneous clusters is proposed. Data are distributed dynamically according to the measured computing capability of each node and to task requests, tasks are scheduled dynamically between the CPU and the GPU within each node, and a double-queue mechanism for data caching and data processing improves the transfer and processing efficiency of the heterogeneous cluster. The algorithm lets the more capable nodes do more work and avoids the long-tail effect caused by single-node performance bottlenecks. Experimental results show that the algorithm improves performance by a factor of 11 over traditional MPI/GPU parallel computing.
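The double-queue mechanism (a data-caching queue feeding a data-processing stage so that transfer overlaps computation) can be sketched with two threads; fetch_block and process_block below are placeholders for the cluster's actual data transfer and CPU/GPU compute steps, and the queue depth is an assumed tuning parameter.

```python
import queue
import threading

def run_node(fetch_block, process_block, n_blocks, depth=4):
    """Double-queue pipeline: one thread stages data into a cache queue while
    another drains it and processes, overlapping transfer with computation."""
    cache_q = queue.Queue(maxsize=depth)     # staged (transferred) data blocks
    results = queue.Queue()                  # processed results

    def stager():
        for i in range(n_blocks):
            cache_q.put(fetch_block(i))      # e.g. MPI receive / disk read
        cache_q.put(None)                    # sentinel: no more data

    def worker():
        while (blk := cache_q.get()) is not None:
            results.put(process_block(blk))  # e.g. GPU kernel or CPU compute

    t1, t2 = threading.Thread(target=stager), threading.Thread(target=worker)
    t1.start(); t2.start(); t1.join(); t2.join()
    return [results.get() for _ in range(results.qsize())]

out = run_node(lambda i: list(range(i, i + 4)), sum, n_blocks=8)
print(out)
```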
