Similar Documents
 20 similar documents found (search time: 156 ms)
1.
A New Grid Environment Model: TGrid
Based on an analysis of the shortcomings of existing grid environments, this paper proposes a new grid environment model: TGrid, a tree-structured grid architecture and environment that supports high-performance computing, subject-oriented resource sharing, and next-generation requirement modeling. TGrid organizes grid nodes and integrates various resources in a tree structure, achieving bottom-up, multi-level, demand-oriented resource abstraction and the fusion of multiple kinds of resources. Because the tree structure matches natural hierarchical organization, hierarchical management of the grid system is easy to implement, the load on central nodes is reduced, load balancing of large-scale applications becomes feasible, and resource lookup efficiency improves. TGrid shares grid resources in the form of virtual resources: a distributed JVM (TJVM) virtualizes the CPU and main-memory resources of grid nodes, a multi-database middleware (TDOD) integrates and shares database-level resources, and Globus grid services (GService) share other software and data resources. This tree-shaped grid provides a new solution for the ever-growing demands of grid applications.
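The tree-structured organization described above can be illustrated with a minimal sketch: each node aggregates its subtree's resources bottom-up so that lookups only descend into subtrees that can satisfy a request. The node names and resource keys are hypothetical; this is not the TGrid implementation, only an illustration of the idea.

```python
# Minimal sketch of a tree-structured grid node registry (hypothetical names,
# not the actual TGrid code): parents hold bottom-up aggregates of their
# subtrees, so lookups prune whole branches without contacting every leaf.
class GridNode:
    def __init__(self, name, resources=None):
        self.name = name
        self.resources = dict(resources or {})  # e.g. {"cpu_cores": 8}
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def aggregate(self):
        """Bottom-up aggregation: fold children's totals into this node's own."""
        total = dict(self.resources)
        for child in self.children:
            for key, value in child.aggregate().items():
                total[key] = total.get(key, 0) + value
        return total

    def find(self, key, amount):
        """Descend only into subtrees whose aggregate can satisfy the request."""
        if self.aggregate().get(key, 0) < amount:
            return None
        for child in self.children:
            hit = child.find(key, amount)
            if hit is not None:
                return hit
        return self if self.resources.get(key, 0) >= amount else None

root = GridNode("center")
site = GridNode("site-a")
site.add_child(GridNode("worker-1", {"cpu_cores": 16, "mem_gb": 64}))
site.add_child(GridNode("worker-2", {"cpu_cores": 8, "mem_gb": 32}))
root.add_child(site)
print(root.aggregate())                 # {'cpu_cores': 24, 'mem_gb': 96}
print(root.find("cpu_cores", 12).name)  # worker-1
```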

2.
The increasing demand for performance has stimulated the wide adoption of many-core accelerators like the Intel® Xeon Phi™ coprocessor, which is based on Intel's Many Integrated Core architecture. While many HPC applications running in native mode have been tuned to run efficiently on Xeon Phi, it is still unclear how a managed runtime like the JVM performs on such an architecture. In this paper, we present the first measurement study of a set of Java HPC applications on Xeon Phi under the JVM. One key obstacle to the study is that there is currently little Java support for Xeon Phi. This paper presents results based on the first port of the OpenJDK platform to Xeon Phi, in which the HotSpot virtual machine acts as the kernel execution engine. The main difficulty is the incompatibility between the Xeon Phi ISA and the assembly library of the HotSpot VM. By evaluating the multithreaded Java Grande benchmark suite and our ported Java Phoenix benchmarks, we quantitatively study the performance and scalability issues of the JVM on Xeon Phi and draw several conclusions from the study. To fully utilize the vector computing capability and hide the significant memory access latency on the coprocessor, we present a semi-automatic vectorization scheme and a software prefetching model in HotSpot. Together with 60 physical cores and tuning, our optimized JVM achieves average speedups of 2.7x and 3.5x over a Xeon CPU processor using vectorization and prefetching, respectively. Our study also indicates that it is viable and potentially performance-beneficial to run applications written for a managed runtime like the JVM on Xeon Phi.

3.
Hybrid CPU/GPU clusters have recently drawn a lot of attention in high performance computing because of their excellent execution performance and energy efficiency. Many supercomputing sites in the latest TOP500 and Green500 lists are built from hybrid CPU/GPU clusters instead of CPU clusters. However, the programming complexity of hybrid CPU/GPU clusters is so high that most users hesitate to move toward this new cluster computing platform. To resolve this problem, we propose a distributed PTX virtual machine called BigGPU for heterogeneous clusters. As its name suggests, this virtual machine is physically a distributed system aimed at recompiling and executing PTX code in parallel by aggregating the CPUs and GPUs available in a computational cluster. With the support of this virtual machine, users can regard a hybrid CPU/GPU cluster as a single large-scale GPU. Consequently, they can develop applications using only CUDA, without combining MPI and multithreading APIs, while simultaneously using distributed CPUs and GPUs to solve the same problem. Moreover, they need not handle load balancing among heterogeneous processors or the device-memory and thread-configuration constraints of physical GPUs, because BigGPU supports a large-scale virtual device memory space and thread configuration. We also evaluate the execution performance of BigGPU in this paper. Our experimental results show that BigGPU can indeed effectively exploit the computational power of CPUs and GPUs to enhance the execution performance of users' CUDA programs.

4.
Ghahramani B., Pauley M.A. Computer, 2003, 36(9): 109-111
Java programs are executed by a Java virtual machine (JVM), which interprets intermediate compiled bytecode that is nominally platform independent. Although early versions of Java interpreted unoptimized bytecode in a relatively unsophisticated manner, recent developments including static analysis, just-in-time compilation, JVM optimization, and instruction-level optimizations have improved execution efficiency. Consequently, Java is now competitive with C and C++ for some applications and on some platforms. Despite Java's increasing popularity, there is a lingering perception that deficiencies in the language make it unsuitable for high-performance computing. In this paper we address some of those deficiencies and discuss the suitability of using Java in a distributed environment.

5.
Water supply network simulation is widely used in urban water distribution and dispatching, and is an important technique for monitoring and maintaining urban water supply networks. City-scale networks generate massive amounts of computational data, so ordinary computing platforms cannot meet the computing-power demand of network simulation. To improve the efficiency of city-scale water supply network simulation, an effective parallelization scheme is proposed. The "Songshan" supercomputer adopts a CPU + DCU (Deep Computing Unit) heterogeneous architecture; exploiting its strength in dense data computation, the water supply network simulation is run on "Songshan". Following the Heterogeneous-Compute Interface for Portability (HIP) heterogeneous programming model, heterogeneous computation of the simulation is implemented on the supercomputer, and, combined with a pipe-data partitioning scheme, multiple processes are launched through the Message Passing Interface (MPI) to accelerate data communication with the DCUs. Redefining data types solves the problem of transferring structs during computation, enabling large-scale dense computation with multiple DCUs within a single node. Comparisons across computing platforms and multiple computing strategies show that, relative to a traditional x86 platform, the optimized scheme achieves speedups of 5.269 on small-scale data and 10.760 on large-scale data, and its computing performance is clearly higher than that of a traditional GPU heterogeneous platform using the CUDA (Compute Unified Device Architecture) programming model.
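A minimal sketch of the pipe-data partitioning idea mentioned above, using mpi4py to scatter contiguous chunks of pipe records to worker processes. The record fields, chunking rule, and placeholder computation are assumptions for illustration only; the paper's actual data layout and its HIP/DCU kernels are not shown.

```python
# Sketch only: split hypothetical pipe records into per-rank chunks and
# scatter them; the real system offloads the per-chunk hydraulic computation
# to DCUs via HIP kernels, which is not reproduced here.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # Hypothetical pipe records: (pipe_id, length_m, diameter_mm, roughness)
    pipes = [(i, 100.0 + i, 300.0, 0.013) for i in range(10_000)]
    step = (len(pipes) + size - 1) // size
    chunks = [pipes[i * step:(i + 1) * step] for i in range(size)]
else:
    chunks = None

local_pipes = comm.scatter(chunks, root=0)   # pickled Python objects, so the
                                             # struct-transfer issue does not
                                             # arise at this (Python) level
local_value = sum(p[1] * p[3] for p in local_pipes)      # placeholder kernel
total = comm.reduce(local_value, op=MPI.SUM, root=0)
if rank == 0:
    print("aggregate (placeholder) result:", total)
```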

6.
Edge intelligent computing places complex and diverse demands on hardware resources that traditional computing platforms can hardly sustain, and heterogeneous parallel computing platforms have become one of the key routes for deploying edge intelligence algorithms. Driven by deep learning algorithms and edge computing, this paper surveys heterogeneous parallel computing platforms. On the one hand, it reviews the strengths and weaknesses of adapting traditional computing platforms to edge intelligent computing, points out their limitations in edge scenarios, such as the sharp conflict between computing power and power consumption, and traces the technical development along three key threads: instruction model, communication mechanism, and memory hierarchy. On the other hand, it compares recent representative products of typical heterogeneous platforms in terms of computing speed and power consumption, and then gives platform-selection recommendations for different application scenarios and constraints: prefer a CPU+X heterogeneous combination in general; prefer CPU+FPGA under strict power constraints; prefer CPU+GPU for scenarios whose functionality iterates quickly; and prefer an ASIC computing platform when the algorithm is mature and both real-time performance and power consumption requirements are high. Finally, it raises problems and challenges for heterogeneous parallel computing platforms in four aspects, namely unifying the instruction model, making communication mechanisms lightweight, making the memory hierarchy flexible, and completing the development ecosystem, in the hope of inspiring researchers in this field.
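The selection recommendations summarized above can be encoded directly as a small decision function. The function name and its boolean inputs are illustrative only; the rules themselves follow the survey's stated advice.

```python
def recommend_platform(strict_power_budget: bool,
                       fast_iteration: bool,
                       mature_algorithm: bool,
                       needs_realtime_and_low_power: bool) -> str:
    """Encodes the survey's platform-selection advice for edge intelligence.

    The default recommendation is a CPU+X heterogeneous combination; the
    specific X depends on the constraints below.
    """
    if mature_algorithm and needs_realtime_and_low_power:
        return "ASIC"          # fixed-function accelerator for stable workloads
    if strict_power_budget:
        return "CPU+FPGA"      # best performance per watt under tight power caps
    if fast_iteration:
        return "CPU+GPU"       # most flexible for rapidly evolving functionality
    return "CPU+X (choose X by the dominant constraint)"

print(recommend_platform(strict_power_budget=True,
                         fast_iteration=False,
                         mature_algorithm=False,
                         needs_realtime_and_low_power=False))  # CPU+FPGA
```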

7.
Edge computing requires high real-time performance and large-volume interactive data processing. Long scheduling time, high communication latency, and load imbalance among heterogeneous edge nodes are the core problems limiting edge computing performance, and traditional cloud computing platforms can hardly meet these new requirements. This paper studies scheduling optimization for Storm edge nodes in an edge computing environment and builds a Storm task-offloading scheduling model for edge computing. For the real-time dynamic assignment of topology tasks among heterogeneous edge nodes, a heuristic dynamic programming algorithm (Inspire Dynamic Programming, IDP) is proposed, which achieves globally optimized scheduling by changing the ordering and assignment of Storm Task instances and the mapping between Task instances and Slots. In addition, to address the limitation that topology parallelism is bounded by the JVM stack depth, a scheduling strategy based on the bat algorithm is proposed. Experimental results show that, compared with Storm's own scheduling algorithm, the proposed algorithms improve edge-node CPU utilization by about 60% on average and cluster throughput by about 8.2% on average, and therefore satisfy the high real-time processing requirements among edge nodes.
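As a rough illustration of the task-to-slot mapping problem that the IDP algorithm addresses, the sketch below greedily places task instances (largest estimated load first) onto the heterogeneous node with the lowest capacity-normalized projected load. This is a plain greedy baseline with hypothetical inputs, not the paper's IDP or bat-algorithm scheduler.

```python
def greedy_assign(task_loads, node_capacities, slots_per_node):
    """Map task instances to (node, slot) pairs, balancing normalized node load.

    task_loads: estimated CPU demand per task instance.
    node_capacities: relative CPU capacity of each heterogeneous edge node.
    """
    num_nodes = len(node_capacities)
    node_load = [0.0] * num_nodes
    next_slot = [0] * num_nodes          # round-robin slot index per node
    mapping = {}
    # Largest tasks first, each onto the node with the lowest projected load.
    for task, load in sorted(enumerate(task_loads), key=lambda t: -t[1]):
        n = min(range(num_nodes),
                key=lambda i: (node_load[i] + load) / node_capacities[i])
        mapping[task] = (n, next_slot[n] % slots_per_node)
        node_load[n] += load
        next_slot[n] += 1
    return mapping

# Hypothetical topology: 6 task instances, 2 edge nodes (one twice as fast),
# and 2 slots per node.
print(greedy_assign([5, 3, 2, 2, 1, 1], [2.0, 1.0], slots_per_node=2))
```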

8.
The increased popularity of Grid systems and cycle sharing across organizations requires scalable systems that provide facilities to locate resources, to be fair in the use of those resources, to allow resource providers to host untrusted applications safely, and to allow resource consumers to monitor the progress and correctness of jobs executing on remote machines. This paper presents such a framework that locates computational resources with a peer-to-peer network, assures fair resource usage with a distributed credit accounting system, provides resource contributors a safe environment, for example a Java Virtual Machine (JVM), to host untrusted applications, and provides resource consumers a monitoring system, GridCop, to track the progress and correctness of remotely executing jobs. We present the details of the credit accounting subsystem and the GridCop remote job monitoring subsystem. GridCop and the distributed credit accounting system together enable incremental payments so that the risk for both resource providers and resource consumers is bounded. (This work was supported by NSF CAREER award grant ACI-0238379 and NSF grants CCR-0313026 and CCR-0313033.)

9.
A traditional Java program is executed by having a software Java Virtual Machine (JVM) interpret or recompile the Java bytecode and hand it to the native CPU, which greatly limits execution speed. A hardware JVM processor executes Java bytecode directly and therefore speeds up Java programs substantially, making hardware JVM processors the most effective way to break through the performance bottleneck of Java programs. Taking JOP and picoJava as examples and following the JVM specification, this paper analyzes the most important aspects of hardware JVM processors: the pipeline structure, the stack organization and how its operations are implemented, the instruction folding technique, and the mapping between bytecode and microcode, and proposes improvements.
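Instruction folding, one of the techniques analyzed above, merges a short stack-manipulation sequence into a single register-style operation. The sketch below detects one classic foldable pattern (load, load, arithmetic op, store) in a bytecode stream; the pattern set is deliberately simplified and does not reproduce picoJava's actual folding rules.

```python
# Simplified illustration of instruction folding: a 4-instruction stack
# sequence such as [iload_1, iload_2, iadd, istore_3] can be issued as one
# folded operation "local3 = local1 + local2" instead of four stack steps.
LOADS = {"iload", "iload_0", "iload_1", "iload_2", "iload_3"}
OPS = {"iadd", "isub", "imul"}
STORES = {"istore", "istore_0", "istore_1", "istore_2", "istore_3"}

def fold(bytecodes):
    folded, i = [], 0
    while i < len(bytecodes):
        window = bytecodes[i:i + 4]
        if (len(window) == 4 and window[0] in LOADS and window[1] in LOADS
                and window[2] in OPS and window[3] in STORES):
            folded.append(("FOLDED", *window))   # one issue slot instead of four
            i += 4
        else:
            folded.append(bytecodes[i])
            i += 1
    return folded

print(fold(["iload_1", "iload_2", "iadd", "istore_3", "iconst_0", "ireturn"]))
```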

10.
赵姗, 郝春亮, 翟健, 李明树. 软件学报 (Journal of Software), 2020, 31(9): 2965-2979
In recent years, heterogeneous multi-core processors have gradually become mainstream in mobile computing environments. Compared with traditional homogeneous processor designs, such heterogeneous multi-core processors meet a device's computing demands at a lower power cost. However, the microarchitectural differences among CPU cores in a heterogeneous environment also pose new challenges to some basic mechanisms of the operating system. Targeting the load-balancing problem of scheduling on performance-asymmetric heterogeneous multi-core systems, this paper proposes a system-level load-balancing mechanism, S-Bridge, which reduces the impact of processor microarchitectural differences and of differences in task execution requirements on traditional load balancing. The main contribution of S-Bridge is to provide generic, heterogeneity-aware load-balancing interfaces at the system level, so that any scheduler can easily be adapted to a heterogeneous multi-core processor system. Experiments based on the CFS and HMP schedulers on an ARM platform, together with validation of S-Bridge's generality on an x86 platform, show that S-Bridge supports rapid implementation on different real platforms and kernel versions, with an average performance improvement of more than 15%, reaching 65% in some cases.
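S-Bridge's internal interfaces are not given in the abstract, so the sketch below only illustrates the underlying idea of heterogeneity-aware load balancing: comparing per-core load after scaling it by core capacity rather than comparing raw run-queue lengths. All names and numbers are hypothetical.

```python
# Hypothetical illustration: on a performance-asymmetric CPU, compare
# capacity-scaled load instead of raw load when picking a placement target.
def scaled_load(runqueue_load: float, core_capacity: float) -> float:
    """Normalize load by core capacity (e.g. big core = 1.0, LITTLE core = 0.4)."""
    return runqueue_load / core_capacity

def pick_target_core(loads, capacities):
    """Return the core index a new or migrated task should go to."""
    return min(range(len(loads)),
               key=lambda i: scaled_load(loads[i], capacities[i]))

# Two big cores and two LITTLE cores with hypothetical run-queue loads:
loads = [3.0, 2.0, 1.0, 1.0]
capacities = [1.0, 1.0, 0.4, 0.4]
print(pick_target_core(loads, capacities))  # big core 1: 2.0/1.0 beats 1.0/0.4
```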

11.
Parallel Computing, 2014, 40(8): 425-447
EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP–OpenCL model of parallel programming opens the way to harness the power of CPU–GPU platforms in a portable way. In order to better utilize the features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds and better exploit the theoretical floating point efficiency of CPU–GPU platforms. The main contributions of the paper are:
  • method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
  • method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques;
  • method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources;
  • approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.
Hybrid platforms tested in this study contain different numbers of CPUs and GPUs – from solutions consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors of different vendors are employed in these systems – both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components allows us to achieve the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively.
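A minimal sketch of the general idea of spreading a 2D grid across CPU and GPU components in proportion to their measured throughput. The throughput numbers and the simple row-wise split are illustrative assumptions, not the paper's MPDATA decomposition or its OpenMP–OpenCL implementation.

```python
def split_rows(total_rows, throughputs):
    """Partition grid rows among devices proportionally to measured throughput.

    throughputs: e.g. {"cpu": 120.0, "gpu0": 480.0} in hypothetical cells/ms.
    Returns {device: (first_row, last_row_exclusive)}.
    """
    total = sum(throughputs.values())
    shares, start = {}, 0
    devices = list(throughputs)
    for i, dev in enumerate(devices):
        # Give the remainder to the last device so every row is covered.
        rows = (total_rows - start if i == len(devices) - 1
                else round(total_rows * throughputs[dev] / total))
        shares[dev] = (start, start + rows)
        start += rows
    return shares

# Hypothetical throughputs for one CPU and two GPUs on a 2048-row grid:
print(split_rows(2048, {"cpu": 120.0, "gpu0": 480.0, "gpu1": 480.0}))
```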

12.
Load Sharing in Heterogeneous Computing
曾国荪, 陆鑫达. 软件学报 (Journal of Software), 2000, 11(4): 551-556
In a message-passing-based heterogeneous parallel computing system, each processor or computer schedules and executes jobs autonomously and independently. When a divisible job initially resides on one processor, that processor can, in order to improve performance, ask other heterogeneous processors to share the load and take part in cooperative computation, reducing the job completion time. This paper proposes a load-sharing scheme for heterogeneous computing. First, a load-sharing protocol is invoked to collect each processor's current permission data for load sharing, including its shared time window and computing capacity. Then, a function relating the amount of work to the job completion time is constructed; this function is the theoretical basis for selecting a suitable group of processors, optimizing the job partition, and minimizing the job completion time.
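The completion-time function described above can be sketched as follows: for a divisible workload shared among processors with known speeds and availability windows, the completion time of a candidate partition is the largest per-processor finish time, and a proportional-to-speed split is one simple candidate partition. This is an illustrative model with hypothetical numbers, not the paper's exact formulation.

```python
def completion_time(work_shares, speeds, available_from):
    """Finish time of a partitioned job = max over processors of
    (time the processor becomes available + its share / its speed)."""
    return max(t0 + w / s for w, s, t0 in zip(work_shares, speeds, available_from))

def proportional_split(total_work, speeds):
    """Naive candidate partition: share work in proportion to speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]

# Hypothetical job of 1200 work units on three heterogeneous processors:
speeds = [10.0, 5.0, 3.0]          # work units per second
available_from = [0.0, 2.0, 0.0]   # seconds until each processor is free
shares = proportional_split(1200, speeds)
print(shares)                                           # [666.7, 333.3, 200.0]
print(completion_time(shares, speeds, available_from))  # ~68.7 s, limited by
                                                        # processor 1's 2 s delay
```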

13.
Grid Technology and Its Applications
The grid makes it possible to treat a computer network as a unified computing resource. It is a new technology for coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations; it differs from traditional distributed computing mainly in its focus on large-scale resource sharing, innovative applications, and high performance. This article explains the basic concepts of the grid and surveys its applications and development.

14.
A Structured Parallel Programming Framework for Heterogeneous Computing
With the arrival of the artificial intelligence era, heterogeneous computing plays an increasingly important role in fields such as deep learning and scientific computing. One bottleneck in applying heterogeneous computing systems is the lack of efficient software development frameworks: existing frameworks that support GPUs, DSPs, and FPGAs, such as OpenCL and CUDA, are based on C/C++ and traditional parallel programming methods, which lowers development productivity, makes programs hard to reason about and debug, and makes cooperation and scheduling among computing devices inflexible. This paper proposes a structured parallel programming framework for heterogeneous computing platforms based on a scripting language. It offers structured parallel programming interfaces, supports mapping computational tasks onto heterogeneous computing devices, and eases the reasoning about and verification of parallel programs. A structured scheduling algorithm based on a genetic algorithm is designed and implemented to fully exploit the computing power of the heterogeneous system and improve software development efficiency. Experimental results show that the proposed framework achieves speedups of 1.5x to 2.5x over a single processor on a CPU+GPU platform.
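A compact sketch of genetic-algorithm task-to-device mapping of the general kind described above: a chromosome is a device assignment per task and the fitness is the resulting makespan. Task costs, device speeds, and GA parameters are hypothetical; this is not the framework's actual scheduling algorithm.

```python
import random

TASK_COST = [4, 8, 3, 6, 5, 7]      # hypothetical work per task
DEVICE_SPEED = [1.0, 4.0]           # device 0 = CPU, device 1 = GPU (relative)

def makespan(assignment):
    load = [0.0] * len(DEVICE_SPEED)
    for task, dev in enumerate(assignment):
        load[dev] += TASK_COST[task] / DEVICE_SPEED[dev]
    return max(load)

def evolve(pop_size=30, generations=50, mutation=0.1):
    rng = random.Random(0)
    pop = [[rng.randrange(len(DEVICE_SPEED)) for _ in TASK_COST]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(TASK_COST))
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < mutation:          # random reassignment mutation
                child[rng.randrange(len(child))] = rng.randrange(len(DEVICE_SPEED))
            children.append(child)
        pop = survivors + children
    best = min(pop, key=makespan)
    return best, makespan(best)

print(evolve())   # an assignment whose makespan balances the CPU and the GPU
```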

15.
安鑫, 康安, 夏近伟, 李建华, 陈田, 任福继. 计算机应用 (Journal of Computer Applications), 2020, 40(10): 3081-3087
Heterogeneous multi-core processors have become the mainstream solution for modern embedded systems, and good online mapping or scheduling methods are crucial for them to deliver their high-performance, low-power advantages. For the dynamic mapping and scheduling of applications on heterogeneous multi-core systems, this paper proposes a mapping and scheduling solution that uses machine learning to evaluate program performance quickly and accurately, and a detection technique for program phase changes to determine when to remap so as to maximize system performance. On the one hand, the scheme senses the differences in computing capability and workload behavior caused by heterogeneity through a careful selection of static and dynamic features of the cores and the running programs, and thus builds a more accurate prediction model; on the other hand, it introduces phase detection to reduce the number of online mapping computations as much as possible, thereby providing a more efficient scheduling scheme. The effectiveness of the proposed scheduling scheme is validated on the SPLASH-2 benchmarks. Experimental results show that, compared with the default Linux Completely Fair Scheduler (CFS), the proposed method improves system computing performance by 52% and CPU utilization by 9.4%, indicating that it can effectively improve dynamic application mapping and scheduling on heterogeneous multi-core systems.
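The phase-detection idea used above to trigger remapping can be sketched as follows: keep a short window of runtime feature vectors (for example, normalized performance-counter readings) and flag a phase change when the current vector drifts too far from the window average. The features and threshold are hypothetical, and the predictor that chooses the new core mapping is omitted.

```python
from collections import deque
import math

class PhaseDetector:
    """Flags a program phase change when the current feature vector deviates
    from the recent average by more than `threshold` (Euclidean distance)."""
    def __init__(self, window=8, threshold=0.25):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, features):
        if len(self.history) == self.history.maxlen:
            mean = [sum(col) / len(self.history) for col in zip(*self.history)]
            if math.dist(features, mean) > self.threshold:
                self.history.clear()      # start a new phase window
                self.history.append(features)
                return True               # caller should consider remapping
        self.history.append(features)
        return False

detector = PhaseDetector()
# Hypothetical per-interval features: (IPC, cache-miss rate, branch-miss rate)
stable = [(1.2, 0.05, 0.02)] * 10
shifted = [(0.6, 0.30, 0.08)] * 3
for sample in stable + shifted:
    if detector.update(sample):
        print("phase change detected at", sample)
```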

16.
Grid computing is mainly concerned with large-scale resource sharing, and this sharing is highly controlled. To address file resource sharing and management in a grid environment, this paper proposes a grid file resource sharing model, FsvGrid. The model introduces a registration and notification mechanism, and adopts a message-passing mechanism that combines deterministic and non-deterministic algorithms, enabling efficient cooperation among grid nodes. It uses a layered structure to hide the diversity of file resources, improves the security of sharing by making sharing controllable, and proposes a way of managing file resources through virtual organizations, solving the difficulty of managing distributed resources.

18.
Ensuring adequate use of the computing resources for highly fluctuating availability in multi-user computational environments requires effective prediction models, which play a key role in achieving application performance for large-scale distributed applications. Predicting the processor availability for scheduling a new process or task in a distributed environment is a basic problem that arises in many important contexts. The present paper aims at developing a model for single-step-ahead CPU load prediction that can be used to predict the future CPU load in a dynamic environment. Our prediction model is based on the control of multiple Local Adaptive Network-based Fuzzy Inference Systems Predictors (LAPs) via Naïve Bayesian Network inference between cluster states of CPU load time points obtained by the C-means clustering process. Experimental results show that our model performs better and has less overhead than other approaches reported in the literature.
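As a much-simplified analogue of the cluster-plus-local-predictor structure described above (not the paper's ANFIS/Bayesian model), the sketch below clusters recent CPU-load windows with plain k-means and fits one linear one-step-ahead predictor per cluster; at prediction time the current window is routed to the nearest cluster's predictor.

```python
import numpy as np

def train(load_series, window=4, k=3, iters=20, seed=0):
    """Cluster load windows with k-means and fit one linear predictor per cluster."""
    X = np.array([load_series[i:i + window] for i in range(len(load_series) - window)])
    y = np.array(load_series[window:])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):                                  # plain k-means
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    models = []
    for j in range(k):                                      # per-cluster least squares
        Xj, yj = X[labels == j], y[labels == j]
        if len(Xj) == 0:
            models.append(np.zeros(X.shape[1] + 1))
            continue
        A = np.c_[Xj, np.ones(len(Xj))]
        coef, *_ = np.linalg.lstsq(A, yj, rcond=None)
        models.append(coef)
    return centers, models

def predict(centers, models, recent_window):
    w = np.asarray(recent_window)
    j = int(np.argmin(((centers - w) ** 2).sum(-1)))        # route to nearest cluster
    return float(np.r_[w, 1.0] @ models[j])

# Hypothetical CPU-load trace (fraction of one core) with a mild periodic pattern:
trace = [0.2 + 0.1 * np.sin(i / 5.0) + 0.02 * (i % 3) for i in range(200)]
centers, models = train(trace)
print(predict(centers, models, trace[-4:]))   # one-step-ahead load estimate
```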

19.
Middleware solutions for heterogeneous distributed systems aim to meet the demanding requirements of large-scale distributed applications related to performance, flexibility, extensibility, portability, availability, reliability, safety, security, trust, and scalability, in the context of large numbers of users and wide geographic distribution of heterogeneous hardware and software resources. The solutions used in the design, implementation, and deployment of systems with such capabilities are based on monitoring, scheduling, optimization, sharing, balancing, discovery, and synchronization methods and techniques that are continuously improved. This special issue presents advances in virtual machine management solutions in clouds, object storage platforms, HPC heterogeneous platforms, middleware for Android systems, and reliability and performance in large-scale distributed applications.

20.
Over recent years, peer-to-peer (P2P) systems have become an important part of the Internet. Millions of users have been attracted to their structures and services. P2P computing is a distributed computing paradigm that uses the Internet to connect thousands, or even millions, of users into a single large virtual computer based on the sharing of computational resources. One of the most critical aspects of the design of P2P computing systems is the development of scheduling techniques to manage the computational resources efficiently and in a scalable way. This paper proposes a cooperative scheduling mechanism with a two-level topology designed to work on large-scale distributed computing P2P systems. Our main contribution is proposing three criteria that use only local information to schedule tasks, thus providing scalability to the overall scheduling system. By setting up these three criteria, the system can be easily adapted to work efficiently with very different kinds of distributed applications. The extensive experimentation carried out justifies the importance of good scheduling in such heterogeneous systems, but also emphasizes the importance of having a scheduling algorithm capable of being adapted to the requirements of different kinds of application.
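The paper's three local-information criteria are not spelled out in the abstract, so the sketch below only illustrates the general shape of such a rule in a two-level topology: a super-peer scores its locally known workers on locally observable quantities (relative speed, queue occupancy, link latency) and dispatches to the best score. The fields and weights are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    relative_speed: float   # benchmark score known locally at the super-peer
    queued_tasks: int       # current queue length reported by the worker
    latency_ms: float       # link latency measured by the super-peer

def score(w: Worker, alpha=1.0, beta=0.5, gamma=0.01) -> float:
    """Higher is better; uses only information the super-peer already holds."""
    return alpha * w.relative_speed - beta * w.queued_tasks - gamma * w.latency_ms

def dispatch(workers):
    return max(workers, key=score)

peers = [Worker("w1", 2.0, 5, 40.0),
         Worker("w2", 1.0, 0, 10.0),
         Worker("w3", 3.0, 9, 120.0)]
print(dispatch(peers).name)   # w2: lightly loaded and nearby beats faster but busy peers
```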
