Similar Articles
20 similar articles found.
1.
Research on Virtualization Technology for High-Productivity Computer Systems
High-productivity computers place higher demands on system performance, security, reliability, and usability. Virtual machine technology, with its strong security and flexible configuration and management, has been widely applied in areas such as service consolidation and security management. However, owing to limitations in performance, management, and architectural suitability, it has not yet seen practical use on high-performance parallel computer systems. We propose a virtualization technique oriented to high-performance parallel computers: the high-performance virtual computing zone (HPVZ). While preserving system performance, HPVZ provides a virtualized high-performance computing environment with independently user-customizable runtime environments, quality-of-service management, security isolation, and live migration. Tests show that HPVZ preserves the raw computing power of the high-performance computer while making it easier to use, changing the traditional usage model of high-performance computers.

2.
邹秀件. 《网友世界》, 2014, (18): 10-10.
This article introduces the application of virtualization technology in consolidating heterogeneous resources, resource management, system fault tolerance, virtualized system environments, and parallel programming environments. It points out shortcomings in these applications and proposes corresponding remedies, in the hope of providing guidance and reference for better use of virtualization technology in high-performance computer systems.

3.
Power consumption is one of the most prominent obstacles to further performance improvements in future high-performance computer systems. This paper investigates the effectiveness of dynamic voltage scaling (DVS), a typical low-power technique, when applied to high-performance computer systems. An energy model for DVS in the HPC domain is established, and the concepts of clock energy and real energy of a running program are introduced. On three typical computer systems, a smart power meter is used to measure system energy consumption with DVS enabled, demonstrating the effectiveness of DVS for energy savings in high-performance computing.
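The abstract does not give the energy model itself; as a rough, hedged sketch (an assumed textbook-style form, not the paper's formulation), the effect of dynamic voltage scaling on a run at frequency f can be written as:

% Assumed DVS energy model (illustrative, not taken from the paper):
% dynamic power scales with effective switched capacitance C_eff, supply voltage V,
% and clock frequency f; total energy adds static power over the frequency-dependent runtime T(f).
\begin{align}
P_{\mathrm{dyn}} &\approx C_{\mathrm{eff}}\, V^{2} f, \\
E(f) &\approx C_{\mathrm{eff}}\, V^{2} f\, T(f) + P_{\mathrm{static}}\, T(f).
\end{align}

Lowering f allows a lower V and so cuts dynamic energy roughly quadratically, but it lengthens T(f) and grows the static term; a power meter captures both effects, which is presumably the motivation for distinguishing the measured "real energy" from a clock-derived estimate.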

4.
周铁成. 《福建电脑》, 2009, 25(5): 111-112.
Clusters are currently the dominant solution for high-performance computer systems. As cluster scale grows, however, problems have emerged: difficult installation and management, high failure rates, and the lack of a convenient environment for developing and debugging parallel programs. Building on the Rocks cluster installation package and the Xen virtualization technology, this paper constructs a high-performance virtual cluster, which simplifies cluster deployment and management while improving system reliability, fault tolerance, and parallel program development efficiency.

5.
High-performance computing is an essential platform for operational and research applications in meteorology, and in recent years the China Meteorological Administration (CMA) has introduced several high-performance computer systems to strengthen its meteorological services and R&D capability. As users and applications grow, effectively managing HPC system resources has become an important issue. This paper describes in detail the design and implementation of CMA's unified resource management platform for high-performance computer systems. The platform provides unified, fine-grained resource accounting, statistical analysis, and billing across multiple heterogeneous HPC systems. With it, system administrators can track system operation and resource usage in real time and adjust resource allocation and scheduling policies accordingly, making resource utilization more rational and efficient and improving overall system effectiveness.

6.
High-performance computers are widely used in meteorological departments and play an important role there; scientific and efficient operation and maintenance (O&M) of HPC clusters is the primary task in keeping these systems running normally. Based on the practical experience with the meteorological HPC system for the Wuhan Military World Games, this paper introduces its operational applications, runtime monitoring, and maintenance management, offering a useful reference for operational and research staff as well as O&M managers.

7.
Massively parallel processing (MPP) is an important approach to building high-performance computing. This paper discusses the key technical points of MPP and its application in the design of high-performance computers, high-performance processors, and high-performance systems-on-chip.

8.
Sustained performance is an important measure of how well a high-performance computer system performs in real application domains. This paper briefly reviews common performance evaluation methods for HPC systems and, building on an application benchmark suite, proposes a relative sustained performance model. Experiments based on a meteorological HPC application evaluation show that the relative sustained performance model can distinguish the performance of HPC systems from different vendors, providing a reference for system selection, and to some extent reflects the scalability of the meteorological application itself.
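The abstract does not state the model's exact form; one hedged possibility, with all symbols assumed here for illustration, is to weight each benchmark application's sustained performance on the system under test s against a reference system r:

% Assumed form of a relative sustained performance metric (not taken from the paper):
\begin{equation}
R_{s} \;=\; \sum_{i=1}^{n} w_{i}\, \frac{P^{\mathrm{sustained}}_{s,i}}{P^{\mathrm{sustained}}_{r,i}},
\qquad \sum_{i=1}^{n} w_{i} = 1,
\end{equation}

where P^sustained_{s,i} is the sustained (delivered) performance of application benchmark i on system s and w_i reflects that application's share of the real workload.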

9.
The interconnect is a key component of a high-performance computer system and a determinant of its communication performance; its main function is to transfer messages among the large number of compute nodes in the system. Its communication bandwidth and latency therefore directly affect how fully the computing power and efficiency of a high-performance computer system can be exploited. This paper focuses on high-bandwidth, low-latency interconnect technology for high-performance computers, to better support the delivery of system computing power and efficiency. It studies performance metrics for high-performance computer systems and ways to improve them, and analyzes and identifies the key factors that affect system speedup. It analyzes the techniques and state of the art in interconnect topology, switching, flow control, and routing algorithms, and summarizes technical approaches to improving interconnect communication performance. This paper …
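The abstract does not specify its speedup model; as a generic, assumed illustration of how interconnect latency L and bandwidth B enter the speedup analyzed in the paper:

% Assumed speedup model (illustrative): serial fraction (1 - phi), parallel fraction phi over p nodes,
% plus a communication term determined by the interconnect's latency L and bandwidth B.
\begin{equation}
S(p) \;=\; \frac{T_{1}}{(1-\phi)\,T_{1} + \dfrac{\phi\, T_{1}}{p} + T_{\mathrm{comm}}(p)},
\qquad
T_{\mathrm{comm}}(p) \;\approx\; k(p)\, L + \frac{m(p)}{B},
\end{equation}

where T_1 is the sequential time, phi the parallelizable fraction, k(p) the number of messages, and m(p) the communicated data volume; reducing L and raising B directly lifts S(p), consistent with the paper's point that bandwidth and latency determine how much of the system's computing power is realized.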

10.
A Cluster Service System Based on a Campus Network
With the rapid development of computer and network technology, high-performance computer cluster systems are becoming increasingly common. This paper first discusses a Linux-based cluster service system on a campus network, and then uses the Dawning (曙光) TC4000 cluster as an example to describe cluster management and distribution on the campus network and high-performance applications on its nodes.

11.
Yongpeng Liu, Hong Zhu. 《Software》, 2010, 40(11): 943-964.
This paper surveys the research on power management techniques for high-performance systems. These include both commercial high-performance clusters and scientific high-performance computing (HPC) systems. Power consumption has rapidly risen to an intolerable scale, resulting in both high operating costs and high failure rates, so it is now a major cause for concern and has imposed new challenges on the development of high-performance systems. In this paper, we first review the basic mechanisms that underlie power management techniques. Then we survey two fundamental techniques for power management: metrics and profiling. After that, we review the research for the two major types of high-performance systems: commercial clusters and supercomputers. Based on this, we discuss the new opportunities and problems presented by the recent adoption of virtualization techniques, and again we present the most recent research. Finally, we summarize and discuss future research directions. Copyright © 2010 John Wiley & Sons, Ltd.

12.
Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited despite its potential for both improving resource utilization as well as providing resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMWare Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMWare), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization provided by OpenVZ provides the best overall performance, particularly for MPI scalability. With the knowledge gained by our VM evaluation, we extend OpenVZ to include support for checkpointing and fault-tolerance for MPI-based virtual server distributed computing.
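A minimal sketch of the kind of comparison such an evaluation boils down to; the metric names and all numbers below are placeholders for illustration, not results from the article:

# Hedged sketch: compute per-benchmark overhead of each virtualization layer
# relative to a native run, assuming higher scores are better (e.g. MB/s, GFLOPS).
from typing import Dict

def overhead_percent(native: float, virtualized: float) -> float:
    """Performance lost under virtualization, as a percentage of the native score."""
    return (native - virtualized) / native * 100.0

# Placeholder numbers for illustration only -- not measurements from the article.
results: Dict[str, Dict[str, float]] = {
    "native": {"mpi_bandwidth_MBps": 920.0, "linpack_GFLOPS": 41.0},
    "OpenVZ": {"mpi_bandwidth_MBps": 905.0, "linpack_GFLOPS": 40.5},
    "Xen":    {"mpi_bandwidth_MBps": 850.0, "linpack_GFLOPS": 39.2},
    "VMWare": {"mpi_bandwidth_MBps": 640.0, "linpack_GFLOPS": 37.8},
}

for layer in ("OpenVZ", "Xen", "VMWare"):
    for metric, native_score in results["native"].items():
        loss = overhead_percent(native_score, results[layer][metric])
        print(f"{layer:7s} {metric:22s} overhead: {loss:5.1f}%")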

13.
Prospects for applying virtualization technology in high-performance computations on the x64 systems are studied. Principal reasons for performance degradation when parallel programs are running in virtual environments are considered. The KVM/QEMU and Palacios virtualization systems are considered in detail, with the HPC Challenge and NAS Parallel Benchmarks used as benchmarks. A modern computing cluster built on the Infiniband high-speed interconnect is used in testing. The results of the study show that, in general, virtualization is reasonable for a wide class of high-performance applications. Fine tuning of the virtualization systems involved made it possible to reduce overheads from 10–60% to 1–5% on the majority of tests from the HPC Challenge and NAS Parallel Benchmarks suites. The main bottlenecks of virtualization systems are reduced performance of the memory system (which is critical only for a narrow class of problems), costs associated with hardware virtualization, and the increased noise caused by the host operating system and hypervisor. Noise can have a negative effect on performance and scalability of fine-grained applications (applications with frequent small-scale communications). The influence of noise significantly increases as the number of nodes in the system grows.

14.
A high performance computer (HPC) is a large, complex system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation and sequential simulation, are not applicable to system-level simulations of HPC systems. Even parallel simulation on large-scale parallel machines faces many difficulties with scalability, reliability, generality, and efficiency. According to the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform: ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSAs), and entities. Secondly, we emphasize some key techniques of ArchSim, including the synchronization protocol, the communication mechanism and the distributed checkpointing/restart mechanism. We then test the main performance indices of ArchSim with the phold benchmark and analyze the extra overhead it introduces. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance tests and HPC system simulations demonstrate that ArchSim achieves high speedup and scalability on the parallel host machine and supports system-level simulations for the architecture design of HPC systems.

15.
The large-scale emergence in the last decade of various cloud solutions, ranging from software-as-a-service (SaaS) based solutions for business process management and implementation to very sophisticated private cloud solutions capable of high performance computing (HPC) and efficient virtualization, constitutes the building blocks for engineering the next generation of flexible enterprise systems that can respond with great agility to changes in their environment. These new technologies are adopted to a certain degree by manufacturing enterprises in order to advance into a new era of mass customization where flexibility, scalability and agility are the differentiating factors. In this context, this paper introduces the virtualized manufacturing execution system (vMES), an intermediate layer in the manufacturing stack, and discusses the advantages and limitations of this approach for manufacturing enterprises. A classification of MES workloads based on the ISA-95 function model is presented, focusing on the virtualization techniques suitable for each workload, considering the algorithms and technologies used and the virtualization overhead. A pilot vMES implementation using a parallel process for smart resource provisioning and automatic scaling is also presented. The pilot implementation, using six Adept robots, an IBM CloudBurst 2.1 private cloud, and an ISA-95 based MES, is described; the virtualization sequence is analyzed in several scenarios of resource workload collocation on physical cloud blades, with and without perturbations.

16.
李春艳, 张学杰. 《计算机应用》, 2013, 33(12): 3580-3585.
Cloud computing is a new model of utilizing Internet resources to provide a variety of IT services, and it has been widely applied in many fields, including high-performance computing. Virtualization, however, introduces some performance overhead, and because different cloud platforms implement virtualization differently, the performance of HPC services running on them varies widely. Using the HPC Challenge (HPCC) Benchmark and the NAS Parallel Benchmark (NPB), this work evaluates CPU, memory, network, scalability, and realistic HPC workloads, and compares and analyzes the HPC performance of platforms such as Nimbus, OpenNebula, and OpenStack. Experiments show that OpenStack delivers the best performance for compute-intensive HPC workloads, making it a good open-source cloud platform choice for high-performance computing.

17.
Parallel and distributed simulation is a powerful tool for developing complex agent-based simulations. Complex simulations require parallel and distributed high performance computing solutions, because their sequential counterparts cannot give answers in a feasible total execution time. Therefore, for the advance of computing science, it is important that High Performance Computing (HPC) techniques and solutions be proposed and studied. In the literature, we can find some agent-based modeling and simulation tools that use HPC. However, none of these tools is designed to let the HPC expert propose new techniques and solutions without great effort. In this paper, we introduce Care High Performance Simulation (HPS), a scientific instrument that enables researchers to: (1) develop techniques and solutions of high performance distributed simulation for agent-based models; and (2) study, design, and implement complex agent-based models that require HPC solutions. Care HPS was designed to develop new agent-based models easily and quickly. It was also designed to extend and implement new solutions for the main issues of parallel and distributed simulation, such as synchronization, communication, load and computation balancing, and partitioning algorithms. We conducted experiments to show the completeness and functionality of Care HPS. The results show that Care HPS can be used as a scientific instrument for the advance of the field of parallel and distributed agent-based simulation.

18.
Multicore systems are widely deployed in both embedded and high-end computing infrastructures. However, traditional virtualization systems cannot effectively isolate shared microarchitectural resources among virtual machines (VMs) running on multicore systems. CPU- and memory-intensive VMs contending for these resources will lead to serious performance interference, which makes virtualization systems less efficient and VM performance less stable. In this paper, we propose a contention-aware performance prediction model for virtualized multicore systems to quantify the performance degradation of VMs. First, we identify the performance interference factors and design synthetic micro-benchmarks to obtain each VM's contention sensitivity and intensity features, which are correlated with VM performance degradation. Second, based on the contention features, we build a VM performance prediction model using machine learning techniques to quantify the precise levels of performance degradation. The proposed model can be used to optimize VM performance on multicore systems. Our experimental results show that the performance prediction model achieves high accuracy, with a mean absolute error of 2.83%.
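A hedged sketch of how such a contention-aware predictor could be built; the feature set, the choice of regressor, and all numbers are assumptions for illustration, not the paper's actual model:

# Minimal sketch of a contention-aware performance-degradation predictor.
# Features (assumed): a co-location's cache/memory-bandwidth sensitivity and intensity,
# as profiled with synthetic micro-benchmarks; target: observed slowdown in percent.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 4))                                 # placeholder profiling data
y = 5.0 * X[:, 0] * X[:, 3] + 3.0 * X[:, 1] * X[:, 2]    # placeholder degradation (%)

model = RandomForestRegressor(n_estimators=100, random_state=0)
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
print(f"cross-validated MAE: {mae:.2f}%")

model.fit(X, y)
new_pair = np.array([[0.6, 0.2, 0.4, 0.7]])              # hypothetical VM-pair feature vector
print(f"predicted degradation: {model.predict(new_pair)[0]:.2f}%")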

19.
In recent years, High Performance Computing (HPC) systems have been shifting from expensive massively parallel architectures to clusters of commodity PCs to take advantage of cost and performance benefits. Fault tolerance in such systems is a growing concern for long-running applications. In this paper, we briefly review the failure rates of HPC systems and survey fault tolerance approaches for HPC systems and the issues with these approaches. Rollback-recovery techniques are discussed in detail, as they are the approach most often used for long-running applications on HPC systems. Specifically, the feature requirements of rollback-recovery are discussed and a taxonomy is developed for over twenty popular checkpoint/restart solutions. The intent of this paper is to aid researchers in the domain as well as to facilitate the development of new checkpointing solutions.

20.
A Survey of GPU Virtualization Techniques
With the growth of compute-intensive applications, cloud platforms such as Amazon's and Alibaba's have begun to introduce GPU (graphics processing unit) accelerated computing. Allowing multiple users to share GPUs on a cloud platform improves GPU utilization, reduces cost, and also facilitates effective GPU management. With support from the virtual machine monitor and various hardware and software components, GPU virtualization provides a feasible way to share GPUs on cloud platforms. This paper reviews recent progress in GPU virtualization, first classifying the approaches by the commonalities of their technical frameworks, then comparing and analyzing them in terms of extensibility, sharing, transparency of use, performance, and scalability. Finally, it summarizes the open problems and future directions of GPU virtualization.
