首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 718 毫秒
1.
当前GPU集群的主流编程模型是MPI与CUDA的松散耦合,采用这种编程模型进行编程,存在编程复杂度大、程序的可移植性差、执行效率低等问题。为此,提出一种面向通用计算GPU集群的任务自动分配系统StreamMAP。对编译器进行改造,以编译制导的方式提供集群任务的计算资源需求,通过运行时系统动态地发现、建立并维护系统资源拓扑,设计一种较为契合GPU集群应用特征的任务分配策略。实验结果表明,StreamMAP系统能降低集群应用程序的编程复杂度,使之较为高效地利用GPU集群的计算资源,且程序的可移植性和可扩展性也得到了保证。  相似文献   

2.
应用GPU集群加速计算蛋白质分子场   总被引:1,自引:2,他引:1  
针对生物化学计算中采用量子化学理论计算蛋白质分子场所带来的巨大计算量的问题,搭建起一个GPU集群系统,用来加速计算基于量子化学的蛋白质分子场.该系统采用消息传递并行编程环境(MPI)连接集群各结点,以开放多线程OpenMP编程标准作为多核CPU编程环境,以CUDA语言作为GPU编程环境,提出并实现了集群系统结点中GPU和多核CPU协同计算的并行加速架构优化设计.在保持较高计算精度的前提下,结合MPI,OpenMP和CUDA混合编程模式,大大提高了系统的计算性能,并对不同体系和规模的蛋白质分子场模拟进行了计算分析.与相应的CPU集群、GPU单机和CPU单机计算方法对比,该GPU集群大幅度地提高了高分辨率复杂蛋白质分子场模拟的计算效率,比CPU集群的平均计算加速比提高了7.5倍.  相似文献   

3.
在集群与GPU组成的异构并行计算平台上,使用MPI+CUDA混合编程模型,实现基于ABEEMσπ模型的分子动力学模拟中电荷分布的计算.通过对电荷分布分布求解中的计算部分移植到GPU上进行,并针对算法中通信开销大和资源未充分利用的问题,通过异构平台的异步并发方法进行优化,提高了求解效率.性能测试结果表明,相比于单纯MPI并行算法,优化后GPU加速的异构并行算法,在化学大分子模型电荷分布计算上,有着明显的性能优势.  相似文献   

4.
由于超强的计算能力、高速访存带宽、支持大规模数据级并行程序设计等特点,GPU已经成为超级计算机和高性能计算(HPC)集群的主流加速器。随着处理单元的发展和集群节点的拓展,GPU集群不仅在节点层面呈现异构化,节点内也趋于异构化,大大提高了在GPU集群中编程的复杂度。主流GPU异构集群系统大多采用针对GPU的异构计算编程模型与面向分布式内存的消息传递模型的简单结合方式,这种方式使得GPU集群程序设计缺乏确定的准则,往往是低效而且易错的。为了提高在GPU集群中编程的效率,降低编程复杂度,以及实现平台无关性,提出一套异构GPU集群的并行分布式编程的解决方案。该方案通过采用扩展语言方法提出了编程框架DISPAR,并实现了预处理器系统StreamCC。实验证明了其可行性。  相似文献   

5.
根据21CMA相关器的算法特点,在对比基于CPU并行的MPI集群、MPI+CUDA异构并行集群和Hadoop+CUDA异构并行集群的架构特点的基础上,提出了一种基于Hadoop+CUDA平台实现软相关器的方法。本方法利用GPU在计算FFT、向量乘和向量加等密集型计算模型的优势,设计相关器的并行模型,使其性能较前期在CPU并行的MPI集群实现的相关器有了大幅提升。同时,本文选择广泛应用于大数据处理平台的Hadoop软件架构,利用Hadoop Streaming工具实现非Java编写的程序在分布式系统中并行执行,非常便捷地获得了集群系统的线性加速比。Hadoop HDFS并行文件系统管理结果数据和过程日志更加灵活可靠,为后续的大数据分析提供了支撑环境。  相似文献   

6.
张丹丹  徐莹  徐磊 《计算机科学》2012,39(4):296-298,303
对CPU+GPU异构平台下的多种并行编程模式进行了研究,并针对格子Boltzmann方法实现了CUDA,MPI+CUDA,MPI+OpenMP+CUDA多级并行算法。结果表明,算法具有较好的加速性能;提出的根据计算量比例参数调节CPU和GPU之间负载均衡的方法,对于在异构平台上实现多级并行处理及资源的有效利用具有一定的参考和应用价值。  相似文献   

7.
为了研究GPU的通用计算能力和适合SMP集群的编程模型,首次提出MPI+CUDA多粒度混合并行编程的新方法,节点间采用MPI实现粗粒度并行,节点内采用CUDA实现细粒度并行的混合编程方式.利用此方法在搭建的3节点SMP集群环境中,测试了大规模矩阵乘问题的并行计算能力.实验结果表明,该方法能够显著提升并行效率,同时证明MPI+CUDA混合编程模型能够充分发挥SMP集群节点间分布式存储和节点内共享内存的优势,为装有CUDA-enabled GPU的SMP集群提供了一种有效的并行策略.  相似文献   

8.
MPI (Message Passing Interface)专为节点密集型大规模计算集群设计,然而,随着MPI+CUDA (Compute Unified Device Architecture)应用程序以及计算节点拥有GPU的计算机集群的出现,类似于MPI的传统通信库已无法满足.而在机器学习领域,也面临着同样的挑战,如Caff以及CNTK (Microsoft CognitiveToolkit)的深度学习框架,由于训练过程中, GPU会缓存庞大的数据量,而大部分机器学习训练的优化算法具有迭代性特点,导致GPU间的通信数据量大,通信频率高,这些已成为限制深度学习训练性能提升的主要因素之一,虽然推出了像NCCL(Nvidia Collective multi-GPU Communication Library)这种解决深度学习通信问题的集合通信库,但也存在不兼容MPI等问题.因此,设计一种更加高效、符合当前新趋势的通信加速机制便显得尤为重要,为解决上述新形势下的挑战,本文提出了两种新型通信广播机制:(1)一种基于MPI_Bcast的管道链PC (Pipelined Chain)通信机制:为GPU缓存提供高效的节点内外通信.(2)一种适用于多GPU集群系统的基于拓扑感知的管道链TA-PC (TopologyAware Pipelined Chain)通信机制:充分利用多GPU节点间的可用PCIe链路.为了验证提出的新型广播设计,分别在三种配置多样化的GPU集群上进行了实验:GPU密集型集群RX1、节点密集型集群RX2、均衡型集群RX3.实验中,将新的设计与MPI+NCCL1 MPI_Bcast进行对比实验,对于节点内通信和节点间的通信,分别取得了14倍和16.6倍左右的性能提升;与NCCL2的对比试验中,小中型消息取得10倍左右的性能提升,大型消息取得与其相当的性能水平,同时TA-PC设计相比于PC设计,在64GPU集群上实现50%左右的性能提升.实验结果充分说明,提出的解决方案在可移植性以及性能方面有较大的优势.  相似文献   

9.
为提高大规模并行计算的并行效率,充分发挥CPU与GPU的功能特点,特别是体现GPU强大的运算能力,提出了用消息传递接口(MPI)将一组GPU连接起来。使GPU通用计算与计算流体力学中的LBM(latticeBoltzmannmethod)算法相结合。根据GPU通用计算与LBM算法的原理,使MPI作为计算分配的机制,CUDA(compute unified device architecture)作为主要的计算执行引擎,建立支持CUDA的GPU集群,在集群上对LBM算法中的D2Q9模型进行二维方腔流数值模拟。实验结果表明,利用GPU组模拟与CPU模拟结果一致,更充分发挥了GPU的计算能力,提高了并行效率。  相似文献   

10.
本文介绍了GPU并行计算的优越性,并对基于GPU平台的开发框架和编程环境CUDA给予概述;在CUDA环境中开发DCT算法代码,实现了DCT算法代码从CPU平台向GPU平台的移植;并通过对比两个计算平台上DCT算法的计算耗时,分析了GPU计算平台的优越性。  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
Abstract  This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号