Similar Documents
20 similar documents found.
1.
Because the Map and Reduce operations of the MapReduce model demand frequent CPU computation, CPU utilization can reach 100% under large parallel workloads. The GPU offers stronger parallel computing capability than the CPU, so using it judiciously both shortens the time the CPU is occupied and lets the GPU's participation balance the system's computing power. Drawing on the complementary strengths of GPU and MapReduce technology, this paper designs a cloud computing model based on dual MapReduce and GPU parallelism. Theoretical modeling and experimental validation show that the model supports parallel processing of MapReduce tasks across multiple GPUs and improves high-performance computing performance.
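To make the division of labor concrete, here is a minimal, illustrative sketch (not the paper's actual model) of a hybrid map phase that routes large chunks to a GPU-style vectorized path and small chunks to CPU workers. The names `cpu_map` and `gpu_map` and the 10,000-element threshold are invented for the sketch, and NumPy vectorization merely stands in for a real GPU kernel:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def cpu_map(chunk):
    # ordinary CPU-side map work
    return [x * x for x in chunk]

def gpu_map(chunk):
    # stand-in for a GPU kernel: vectorized NumPy plays the GPU's role here
    return (np.asarray(chunk) ** 2).tolist()

def hybrid_map(chunks, gpu_threshold=10_000):
    # route big chunks to the "GPU" path, small ones to CPU workers,
    # so the GPU absorbs the bulk work and the CPU stays below saturation
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(gpu_map if len(c) >= gpu_threshold else cpu_map, c)
                   for c in chunks]
        return [f.result() for f in futures]
```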

2.
Cloud computing, with its powerful storage and computing capacity, has become an effective approach to mining massive datasets. The classical incremental association-rule updating algorithm FUP must repeatedly scan the original dataset and is therefore unsuited to massive data. Aiming to improve the efficiency of incremental association-rule updates over massive data, this paper combines the FUP algorithm with the MapReduce programming model of cloud computing and proposes MRFUP, a MapReduce-based incremental updating algorithm for association rules. The algorithm scans the original dataset only once and makes full use of the cloud's storage and parallel computing power. Experimental results on Hadoop show that MRFUP improves both the capacity and the efficiency of processing massive data and suits association-rule mining at that scale.
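The core idea, counting support over the new increment only and merging with stored counts instead of rescanning the old data, can be sketched in a few lines of Python. This is a heavily simplified illustration, not the MRFUP algorithm itself: real FUP/MRFUP also prunes candidates using knowledge of which itemsets were frequent before, and MRFUP distributes the counting as MapReduce jobs:

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, k):
    # one scan of a dataset partition: count all k-itemsets it contains
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), k))
    return counts

def incremental_update(old_counts, old_n, increment, min_sup=0.4, k=2):
    # FUP-style update: scan only the increment, reuse the stored counts
    merged = old_counts + count_itemsets(increment, k)
    total_n = old_n + len(increment)
    return {s: c for s, c in merged.items() if c / total_n >= min_sup}

old_db = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
old_counts = count_itemsets(old_db, k=2)          # computed once, then stored
new_frequent = incremental_update(old_counts, len(old_db),
                                  [["a", "b", "d"], ["a", "b"]])
print(new_frequent)   # ('a', 'b') stays frequent without rescanning old_db
```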

3.
An Incremental Updating Algorithm for Association Rules Based on MapReduce   Total citations: 1 (self: 0, others: 1)
Cloud computing, with its powerful storage and computing capacity, has become an effective approach to mining massive datasets. The classical incremental association-rule updating algorithm FUP must repeatedly scan the original dataset and is unsuited to massive data. Aiming to improve the efficiency of incremental association-rule updates over massive data, this paper combines the FUP algorithm with the MapReduce programming model of cloud computing and proposes MRFUP, a MapReduce-based incremental updating algorithm for association rules. The algorithm scans the original dataset only once and makes full use of the cloud's storage and parallel computing power. Experimental results on Hadoop show that MRFUP improves both the capacity and the efficiency of processing massive data and suits association-rule mining at that scale.

4.
Traditional rendering techniques use the CPU to compute or blend the colors of a data volume, which is inefficient and slow for large volumes. This paper therefore proposes performing the color computation and blending on the GPU. The method fully exploits the GPU's powerful parallel processing: the volume data to be rendered are submitted to the GPU as textures, and the GPU performs the necessary color interpolation and blending before rendering directly. Experimental results show that the method can fuse multiple attributes into one rendering, organically combining the strengths of each attribute; it supports integrated evaluation of oil and gas reservoirs, improves the accuracy of reservoir analysis and interpretation, and, by using hardware acceleration, renders quickly.
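As a rough illustration of the color blending such a renderer performs, here is a front-to-back "over" compositing loop for the samples along one ray, written in NumPy on the CPU; the paper does this per fragment on the GPU, and the 0.999 early-termination threshold is an arbitrary choice for the sketch:

```python
import numpy as np

def composite_ray(samples_rgba):
    # front-to-back "over" operator: each sample contributes what the
    # accumulated opacity in front of it lets through
    color = np.zeros(3)
    alpha = 0.0
    for r, g, b, a in samples_rgba:
        w = (1.0 - alpha) * a
        color += w * np.array([r, g, b])
        alpha += w
        if alpha >= 0.999:          # early ray termination
            break
    return color, alpha

# two semi-transparent samples: red in front of green
print(composite_ray([(1, 0, 0, 0.5), (0, 1, 0, 0.5)]))
```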

5.
张宇  张延松  陈红  王珊 《软件学报》2016,27(5):1246-1265
General-purpose GPUs, with their massive parallel computing power, have become an emerging high-performance computing platform and, in recent years, a research focus in high-performance database implementation. Existing GPU database research, however, follows the ROLAP (relational OLAP) multidimensional analysis model, concentrating on GPU implementations and performance optimization of relational operators, centered on parallel hash-join algorithms. A GPU has thousands of parallel compute units but few logic-control units: compared with a CPU it is far stronger at parallel computation but far weaker at control flow and complex memory management, so main-memory database query-processing algorithms that need complex data structures and memory management cannot be ported directly to the GPU. This paper proposes semi-MOLAP, a hybrid OLAP multidimensional analysis model oriented to the GPU's vector-computing characteristics, which combines the direct array access and computation of MOLAP (multidimensional OLAP) with the storage efficiency of ROLAP. A purely array-based GPU semi-MOLAP model simplifies GPU data management, reduces the complexity of the GPU semi-MOLAP algorithms, and raises their code-execution efficiency. Furthermore, based on the respective characteristics of GPU and CPU computation, the semi-MOLAP operators are split into cooperative CPU and GPU computation, improving the utilization of both processors and overall OLAP query performance.
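The "direct array access" that distinguishes MOLAP-style aggregation from a ROLAP hash join can be seen in a small NumPy sketch: when dimension keys are already dense array offsets, a group-by aggregation becomes a scatter-add into a cube, with no hash table at all. The column names and sizes below are invented for illustration, and the paper's semi-MOLAP operators run on the GPU, not in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1_000_000
product = rng.integers(0, 100, n_rows)      # hypothetical dimension: 100 products
region = rng.integers(0, 50, n_rows)        # hypothetical dimension: 50 regions
sales = rng.random(n_rows).astype(np.float32)

# MOLAP-style aggregation: the dimension keys index the cube directly,
# so SUM(sales) GROUP BY product, region is one scatter-add
cube = np.zeros((100, 50), dtype=np.float32)
np.add.at(cube, (product, region), sales)
print(cube[3, 7])   # total sales for product 3 in region 7
```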

6.
Most parallel processing systems for remote-sensing imagery rely on foreign commercial products, while the workflow support and parallel performance of domestically developed parallel processing systems fall short of production-scale needs. Building on Hadoop's HDFS and MapReduce cluster architecture, cooperative CPU/GPU parallel processing, memory mapping, and BMP techniques, this paper proposes a workflow-driven, high-performance distributed parallel processing platform architecture. Experimental results show that the mixed-granularity parallel architecture, across the workstation cluster and within each workstation, improves the platform's parallel processing performance and offers an independently developed solution for batch production of massive remote-sensing image products.

7.
The MapReduce programming model is a parallel computing framework widely used to process massive data in cloud environments. For data-intensive computation under this framework, however, strong data-transfer dependencies between cluster nodes overload inter-node message handling. This paper proposes an improved MapReduce model based on a message-broker mechanism to optimize the data flow. Experimental results show that the broker-based MapReduce framework improves load balancing for data-intensive applications.

8.
Prospects for GPU Computing in Oil and Gas Exploration   Total citations: 1 (self: 1, others: 0)
Data processing in oil and gas exploration involves a great deal of computation and needs the support of high-performance computing. The PC clusters popular today exhibit some problems in this work, and the GPU, as an auxiliary computing device, can work with the CPU on compute-intensive tasks. As an emerging high-performance computing technology, GPU programming is by its nature best suited to small- and medium-scale compute-intensive environments, so practitioners should weigh the configuration model carefully when adopting it; a cooperative GPU/CPU working mode can effectively raise the computational efficiency of a processing system.

9.
CPUs and GPUs each have their strengths: the CPU devotes most of its resources to caching and control, while the GPU devotes most of its resources to data computation. This article compares the two technologies, with the goal of pairing a high-performance processor with the processing power of a discrete graphics card, thereby raising the computer's operating efficiency, delivering better cost-effectiveness, and giving users a better choice.

10.
Research on a Parallel K-means Clustering Algorithm Based on Cloud Computing   Total citations: 2 (self: 0, others: 2)
Data today grows explosively and is stored at massive scale, which brings clustering research many problems, such as computational complexity and insufficient computing power. Cloud platforms, through load balancing and dynamic provisioning of large amounts of virtual computing resources, effectively break through this time- and energy-consuming bottleneck and show unique advantages in massive-scale data mining. This paper studies a parallel K-means algorithm on the Hadoop cloud platform and, combining it with the MapReduce distributed computing model, presents the design method and strategy, covering the three MapReduce phases of map, shuffle, and reduce. Simulation results show that the parallel K-means algorithm is highly efficient.
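One MapReduce round of K-means, as described here, maps each point to its nearest centroid and reduces each group to its mean. The sketch below runs the round serially in plain Python just to show the map/shuffle/reduce decomposition; on Hadoop each phase would be distributed:

```python
from collections import defaultdict

def mapper(point, centroids):
    # map: emit (nearest centroid id, point)
    cid = min(range(len(centroids)),
              key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centroids[j])))
    return cid, point

def reducer(points):
    # reduce: the new centroid is the mean of the assigned points
    return [sum(xs) / len(points) for xs in zip(*points)]

def kmeans_round(data, centroids):
    groups = defaultdict(list)          # shuffle: group pairs by centroid id
    for point in data:
        cid, p = mapper(point, centroids)
        groups[cid].append(p)
    return [reducer(pts) for _, pts in sorted(groups.items())]

data = [[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [3.9, 4.0]]
print(kmeans_round(data, [[0.0, 0.0], [4.0, 4.0]]))
# [[0.05, 0.1], [3.95, 4.05]]
```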

11.
Graphics processing units (GPU) have taken an important role in the general purpose computing market in recent years. At present, the common approach to programming GPU units is to write GPU specific cod...

12.
With the continuing development of GPGPU computing technology, the architecture of HPC systems is quietly undergoing a transformation, one that gives high-performance computing a new direction. CUDA is a C-language programming platform from NVIDIA for developing parallel computing applications on GPGPUs; through it, the high-performance computing capability of suitable graphics cards can be harnessed for large-scale high-performance computation, effectively raising system utilization. This article surveys the current state of GPU development and shows how to use CUDA programming technology to develop parallel computing software.
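For readers who have not seen CUDA-style code, a minimal data-parallel kernel looks like the following. It is written with Numba's CUDA bindings so it stays in Python like the other sketches here; it assumes an NVIDIA GPU and the `numba` package, and it illustrates the programming model rather than anything from this article:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vec_add(a, b, out):
    i = cuda.grid(1)                 # global thread index
    if i < out.size:                 # guard threads past the array end
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.ones(n, dtype=np.float32)
b = np.full(n, 2.0, dtype=np.float32)
out = np.empty(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vec_add[blocks, threads_per_block](a, b, out)   # Numba copies the arrays for us
print(out[:4])   # [3. 3. 3. 3.]
```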

13.
In light of GPUs’ powerful floating-point operation capacity, heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing (HPC). However, due to the complexity of programming on GPUs, porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big challenge. The OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific computing. To effectively inherit existing OpenMP applications and reduce the transplant cost, we extend OpenMP with a group of compiler directives, which explicitly divide tasks among the CPU and the GPU, and map time-consuming computing fragments to run on the GPU, thus dramatically simplifying the transplantation. We have designed and implemented MPtoStream, a compiler of the extended OpenMP for AMD’s stream processing GPUs. Our experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system, incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU, over the execution on the Xeon CPU alone.

14.
Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
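The projection algorithm mentioned above centers on a pressure Poisson solve. As a rough, CPU-side illustration of that inner kernel (the paper implements it as CUDA kernels tuned to the GPU memory hierarchy), here is a Jacobi iteration for the pressure Poisson equation on a unit-spaced 2D grid:

```python
import numpy as np

def jacobi_pressure(p, div, iters=50):
    # Jacobi relaxation for the pressure Poisson equation lap(p) = div
    # on a unit-spaced grid; boundary values of p are held fixed
    for _ in range(iters):
        p[1:-1, 1:-1] = 0.25 * (p[2:, 1:-1] + p[:-2, 1:-1]
                                + p[1:-1, 2:] + p[1:-1, :-2]
                                - div[1:-1, 1:-1])
    return p

p = np.zeros((64, 64))
div = np.zeros((64, 64))
div[32, 32] = 1.0                     # a point source of divergence
p = jacobi_pressure(p, div)
```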

15.
MapReduce: A New Distributed Parallel Programming Model   Total citations: 3 (self: 0, others: 3)
MapReduce is a distributed parallel programming model proposed by Google for the parallel processing of large-scale data. Inspired by functional programming languages, the MapReduce model splits a large-scale data processing job into a number of independently runnable Map tasks, which are dispatched to different machines and produce intermediate files in a given format; a number of Reduce tasks then merge these intermediate files into the final output. When processing large-scale data with MapReduce, users can concentrate on writing the Map and Reduce functions, while the complex concerns of parallel computing, such as the distributed file system, job scheduling, fault tolerance, and inter-machine communication, are handled by the MapReduce system, which greatly lowers the overall programming difficulty. MapReduce is increasingly the mainstream programming model on cloud computing platforms, though the open-source MapReduce system provided by the Apache Hadoop project still awaits further refinement.
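The canonical example of this decomposition is word count: Map emits (word, 1) pairs, the system groups them by key, and Reduce sums each group. Below is a serial Python sketch of the three phases, which Hadoop would run distributed:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # map: emit one (word, 1) pair per word
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # the framework's job: group intermediate pairs by key
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(word, group):
    # reduce: sum the counts for one key
    return word, sum(count for _, count in group)

documents = ["map reduce map", "reduce cloud map"]
pairs = [kv for doc in documents for kv in map_phase(doc)]
print(dict(reduce_phase(w, g) for w, g in shuffle(pairs)))
# {'cloud': 1, 'map': 3, 'reduce': 2}
```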

16.
As a general-purpose, scalable parallel programming model for coding highly parallel applications, CUDA from NVIDIA provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. It has proven effective at programming multithreaded many-core GPUs that scale transparently to hundreds of cores; as a result, scientists across industry and academia use CUDA to dramatically speed up production codes. GPU-based clusters are likely to play an essential role in future cloud computing centers, because some computation-intensive applications may require GPUs as well as CPUs. In this paper, we adopted PCI pass-through technology and set up virtual machines in a virtual environment, enabling use of the NVIDIA graphics card and CUDA high-performance computing inside them. In this way, the virtual machine has not only virtual CPUs but also a real GPU for computing, and its performance is expected to increase dramatically. This paper measured the performance difference between physical and virtual machines using CUDA, and investigated how the number of virtual CPUs affects CUDA performance in a virtual machine. Finally, we compared the CUDA performance of two open-source virtualization hypervisor environments, with and without PCI pass-through. Through the experimental results, we can tell which environment is most efficient for CUDA in a virtual environment.
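The three abstractions named here, thread blocks, shared memory, and barrier synchronization, come together in the classic block-wise reduction. The sketch below uses Numba's CUDA bindings to stay in Python; it assumes an NVIDIA GPU with the `numba` package installed and is an illustration of the abstractions, not code from the paper:

```python
import numpy as np
from numba import cuda, float32

TPB = 256   # threads per block (a compile-time constant for shared memory)

@cuda.jit
def block_sum(x, partial):
    sm = cuda.shared.array(TPB, float32)   # one shared buffer per thread block
    tid = cuda.threadIdx.x
    i = cuda.grid(1)
    sm[tid] = x[i] if i < x.size else 0.0
    cuda.syncthreads()                     # barrier: whole block is loaded
    stride = TPB // 2
    while stride > 0:                      # tree reduction in shared memory
        if tid < stride:
            sm[tid] += sm[tid + stride]
        cuda.syncthreads()
        stride //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = sm[0]   # one partial sum per block

n = 1 << 20
x = np.ones(n, dtype=np.float32)
partial = np.zeros((n + TPB - 1) // TPB, dtype=np.float32)
block_sum[partial.size, TPB](x, partial)
print(partial.sum())                       # final combine on the CPU: 1048576.0
```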

17.
High computational power of GPUs (Graphics Processing Units) offers a promising accelerator for general-purpose computing. However, the need for dedicated programming environments has made the usage of GPUs rather complicated, and a GPU cannot directly execute binary code of a general-purpose application. This paper proposes a two-phase virtual execution environment (GXBIT) for automatically executing general-purpose binary applications on CPU/GPU architectures. GXBIT incorporates two execution phases. The first phase is responsible for extracting parallel hot spots from the sequential binary code. The second phase is responsible for generating the hybrid executable (both CPU and GPU instructions) for execution. This virtual execution environment works well for any applications that run repeatedly. The performance of generated CUDA (Compute Unified Device Architecture) code from GXBIT on a number of benchmarks is close to 63% of the hand-tuned GPU code. It also achieves much better overall performance than the native platforms.

18.
张丹丹  徐莹  徐磊 《计算机科学》2012,39(4):296-298,303
This paper studies several parallel programming models on heterogeneous CPU+GPU platforms and implements multilevel parallel algorithms for the lattice Boltzmann method using CUDA, MPI+CUDA, and MPI+OpenMP+CUDA. The results show good speedup. The proposed method of adjusting the CPU/GPU load balance through a computation-ratio parameter offers a useful reference for multilevel parallel processing and effective resource utilization on heterogeneous platforms.
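The ratio-based load balancing described here reduces to a one-parameter split of the lattice. The toy below only shows the idea: the parameter name and the row-wise decomposition are invented for the illustration, and in the paper the two halves would run concurrently on the GPU and the CPU:

```python
def split_rows(n_rows, gpu_ratio):
    # gpu_ratio: fraction of the lattice assigned to the GPU, tuned to
    # match the measured throughput ratio of the two processors
    n_gpu = int(n_rows * gpu_ratio)
    return range(0, n_gpu), range(n_gpu, n_rows)

gpu_rows, cpu_rows = split_rows(1024, 0.8)   # e.g. GPU ~4x faster than the CPU
print(len(gpu_rows), len(cpu_rows))          # 819 205
```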

19.
Hybrid CPU/GPU clusters have recently drawn much attention in high performance computing because of their excellent execution performance and energy efficiency. Many supercomputing sites in the newest TOP 500 and Green 500 lists are built from hybrid CPU/GPU clusters instead of CPU clusters. However, the programming complexity of hybrid CPU/GPU clusters is so high that most users hesitate to move toward this new cluster computing platform. To resolve this problem, we propose a distributed PTX virtual machine called BigGPU for heterogeneous clusters in this paper. As the name suggests, this virtual machine is physically a distributed system aimed at re-compiling and executing PTX code in parallel by aggregating the CPUs and GPUs available in a computational cluster. With the support of this virtual machine, users can regard a hybrid CPU/GPU cluster as a single large-scale GPU. Consequently, they can develop applications using only CUDA, without combining MPI and multithreading APIs, while simultaneously using distributed CPUs and GPUs to solve the same problem. Moreover, they need not handle load balancing among heterogeneous processors or the device-memory and thread-configuration constraints of physical GPUs, because BigGPU supports a large-scale virtual device memory space and virtual thread configuration. We also evaluate the execution performance of BigGPU. Our experimental results show that BigGPU can effectively exploit the computational power of CPUs and GPUs to enhance the execution performance of users' CUDA programs.

20.
Recent developments in graphics processing units (GPU) can provide high-performance general-purpose computing at low cost. The GPU-based CUDA (compute unified device architecture) and OpenCL (open computing language) programming models provide programmers with ample C-like application programming interfaces (API), making it easy to exploit the GPU's parallel computing power. Using graphics hardware for accelerated computation, this paper analyzes existing GPU N-body implementations through a new GPU processing model, the parallel time-space model, proposes a new algorithm for fast N-body simulation on the GPU, and implements it on an AMD Radeon HD 5850. Experimental results show a speedup of about 400x over a CPU implementation and of 2x to 5x over existing GPU implementations.
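For reference, the all-pairs force calculation that such GPU implementations accelerate is compact enough to state exactly. Here is a direct O(N²) step in NumPy (CPU-side, with G = 1 and Plummer softening `eps`, both choices made for the sketch); the GPU versions compute the same pairwise sums, tiled across thousands of threads:

```python
import numpy as np

def nbody_step(pos, vel, mass, dt=1e-3, eps=1e-3):
    d = pos[None, :, :] - pos[:, None, :]       # d[i, j] = r_j - r_i, shape (N, N, 3)
    dist2 = (d ** 2).sum(axis=-1) + eps ** 2    # softened squared distances
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)               # no self-interaction
    acc = (d * (mass[None, :, None] * inv_d3[:, :, None])).sum(axis=1)
    vel += dt * acc                             # semi-implicit Euler update
    pos += dt * vel
    return pos, vel

rng = np.random.default_rng(1)
pos = rng.standard_normal((256, 3))
vel = np.zeros((256, 3))
mass = np.ones(256)
pos, vel = nbody_step(pos, vel, mass)
```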
