1.

Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud

Ifeanyi P. Egwutuoha Shiping Chen David Levy Bran Selic Rafael Calvo 《International Journal of Parallel, Emergent and Distributed Systems》2014,29(4):363-378

Cloud computing offers new computing paradigms, capacity and flexible solutions to high performance computing (HPC) applications. For example, Hardware as a Service (HaaS) allows users to provision a large number of virtual machines (VMs) for computation-intensive applications. Due to the large number of VMs and electronic components in HPC systems in the cloud, any fault during the execution would result in re-running the applications, which will cost time, money and energy. In this paper we presented a proactive fault tolerance (FT) approach to HPC systems in the cloud to reduce the wall-clock execution time and dollar cost in the presence of faults. We also developed a generic FT algorithm for HPC systems in the cloud. Our algorithm does not rely on a spare node prior to prediction of a failure. We also developed a cost model for executing computation-intensive applications on HPC systems in the cloud. We analysed the dollar cost of provisioning spare nodes and checkpointing FT to assess the value of our approach. Our experimental results obtained from a real cloud execution environment show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be reduced by as much as 30%. The frequency of checkpointing of computation-intensive applications can be reduced by up to 50% with our FT approach for HPC in the cloud compared with current FT approaches.

2.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Ocaña Fernanda Baião Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.

3.

Energy-credit scheduler: An energy-aware virtual machine scheduler for cloud systems

《Future Generation Computer Systems》2014

Virtualization facilitates the provision of flexible resources and improves energy efficiency through the consolidation of virtualized servers into a smaller number of physical servers. As an increasingly essential component of the emerging cloud computing model, virtualized environments bill their users based on processor time or the number of virtual machine instances. However, accounting based only on the depreciation of server hardware is not sufficient because the cooling and energy costs for data centers will exceed the purchase costs for hardware. This paper suggests a model for estimating the energy consumption of each virtual machine without dedicated measurement hardware. Our model estimates the energy consumption of a virtual machine based on in-processor events generated by the virtual machine. Based on this estimation model, we also propose a virtual machine scheduling algorithm that can provide computing resources according to the energy budget of each virtual machine. The suggested schemes are implemented in the Xen virtualization system, and an evaluation shows that the suggested schemes estimate and provide energy consumption with errors of less than 5% of the total energy consumption.

4.

LATOC: an enhanced load balancing algorithm based on hybrid AHP-TOPSIS and OPSO algorithms in cloud computing

Moori Ayeh Barekatain Behrang Akbari Mehdi 《The Journal of Supercomputing》2022,78(4):4882-4910

Providing the required level of service quality is one of the most significant challenges in cloud computing because of software and hardware complexity, the differing features of tasks and computing resources, and the lack of appropriate task distribution in cloud computing environments. Recent research in this field shows that the lack of smart prioritization and ordering of tasks in scheduling (an NP-hard problem) has had a strong effect, resulting in poor load balancing, increased response time, increased total execution time and decreased average resource use. Accordingly, the proposed method, called LATOC, first considers the key criteria of an input task, such as the required processing unit, the data length of the task and the execution time. It then prioritizes tasks in separate queues using the technique for order preference by similarity to ideal solution (TOPSIS) and the analytic hierarchy process (AHP), combined in the form of a hybrid intelligent algorithm (AHP-TOPSIS). Each ordered task is placed in a separate priority queue according to its priority level, and optimized particle swarm optimization is then used to assign each task from each priority queue to virtual machines. Many simulations based on various scenarios in the CloudSim simulator show that the smart assignment of prioritized tasks by LATOC improves important cloud computing parameters such as total execution time and average resource use compared with similar methods.
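
As a rough illustration of the TOPSIS step mentioned in the abstract above, the following Python sketch ranks tasks by their closeness to an ideal solution. The task tuples, the weights (which in LATOC would come from AHP) and the decision to treat every criterion as a cost are assumptions made for this example; they are not values or rules taken from the paper, and `topsis_rank` is a hypothetical helper name.

```python
import math


def topsis_rank(tasks, weights, benefit):
    """Rank alternatives by TOPSIS closeness; return (order, scores).

    tasks   -- list of equal-length tuples of criterion values
    weights -- one weight per criterion (assumed here, not derived with AHP)
    benefit -- one bool per criterion: True if larger is better
    """
    n = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(t[j] ** 2 for t in tasks)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * t[j] / norms[j] for j in range(n)] for t in tasks]

    # Ideal and anti-ideal points, taken criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)

    # Highest closeness score first.
    order = sorted(range(len(tasks)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical tasks: (processing units, data length, execution time).
example_tasks = [(2, 100.0, 5.0), (4, 400.0, 20.0), (1, 50.0, 2.0)]
example_order, example_scores = topsis_rank(
    example_tasks, weights=[0.5, 0.3, 0.2], benefit=[False, False, False]
)
print(example_order)
```

In this toy input the third task is smallest on every criterion, so it coincides with the ideal point and is ranked first. A real prioritization would need weights and benefit/cost directions chosen for the workload at hand.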

5.

Security-Preserving Live Migration of Virtual Machines in the Cloud

Fengzhe Zhang Haibo Chen 《Journal of Network and Systems Management》2013,21(4):562-587

Hypervisor-based process protection is a novel approach that provides isolated execution environments for applications running on untrusted commodity operating systems. It is based on off-the-shelf hardware and trusted hypervisors while it meets the requirement of security and trust for many cloud computing models, especially third-party data centers and a multi-tenant public cloud, in which sensitive data are out of the control of the users. However, as the hypervisor extends semantic protection to the process granularity, such a mechanism also breaks the platform independence of virtual machines and thus prohibits live migration of virtual machines, which is another highly desirable feature in the cloud. In this paper, we extend hypervisor-based process protection systems with live migration capabilities by migrating the protection-related metadata maintained in the hypervisor together with virtual machines and protecting sensitive user contents using encryption and hashing. We also propose a security-preserving live migration protocol that addresses several security threats during live migration procedures including timing-related attacks, replay attacks and resumption order attacks. We implement a prototype system based on Xen and Linux. Evaluation results show that performance degradation in terms of both total migration time and downtime is reasonably low compared to the unmodified Xen live migration system.

6.

An elasticity model for High Throughput Computing clusters

Ruben S. Montero Rafael Moreno-Vozmediano Ignacio M. Llorente 《Journal of Parallel and Distributed Computing》2011,71(6):750-757

Different methods have been proposed to dynamically provide scientific applications with execution environments that hide the complexity of distributed infrastructures. Recently, virtualization has emerged as a promising technology to provide such environments. In this work we present a generic cluster architecture that extends the classical benefits of virtual machines to the cluster level, thus providing cluster consolidation, cluster partitioning and support for heterogeneous environments. Additionally, the capacity of the virtual clusters can be supplemented with resources from a commercial cloud provider. The performance of this architecture has been evaluated in the execution of High Throughput Computing workloads. Results show that, in spite of the overhead induced by the virtualization and cloud layers, these virtual clusters constitute a feasible and well-performing HTC platform. Additionally, we propose a performance model to characterize these variable capacity (elastic) cluster environments. The model can be used to dynamically dimension the cluster using cloud resources, according to a fixed budget, or to estimate the cost of completing a given workload in a target time.
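
To make the last point of the abstract above concrete, here is a deliberately simple Python sketch of sizing a cluster with cloud nodes to finish a workload within a target time and pricing the result. It does not reproduce the paper's performance model: constant per-node throughput, whole-hour billing, and the function and parameter names are all assumptions introduced for illustration.

```python
import math


def cloud_nodes_needed(workload_jobs, target_hours, local_nodes, local_rate, cloud_rate):
    """Cloud nodes required to finish the workload within the target time.

    Rates are jobs per node per hour and are assumed constant.
    """
    required_rate = workload_jobs / target_hours   # jobs/hour the cluster must sustain
    deficit = required_rate - local_nodes * local_rate
    return max(0, math.ceil(deficit / cloud_rate))


def cloud_cost(n_cloud_nodes, target_hours, price_per_node_hour):
    """Cost of renting the cloud nodes, assuming billing in whole hours."""
    return n_cloud_nodes * math.ceil(target_hours) * price_per_node_hour


# Hypothetical figures: 1000 jobs in 10 hours, 4 local nodes at 10 jobs/hour,
# cloud nodes at 8 jobs/hour and 0.5 currency units per node-hour.
nodes = cloud_nodes_needed(1000, 10, local_nodes=4, local_rate=10.0, cloud_rate=8.0)
print(nodes, cloud_cost(nodes, 10, price_per_node_hour=0.5))
```

The same two functions can be inverted by hand for the fixed-budget case: divide the budget by the hourly price to get the affordable node-hours, then solve for the completion time.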

7.

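A Dynamic Trustworthiness Verification Mechanism for Execution Environments in the Cloud Computing Model (Total citations: 1; self-citations: 0; citations by others: 1)

Liu Chuanyi, Lin Jie, Tang Bo 《Journal of Software (软件学报)》2014,25(3):662-674

How to provide users with a provable and verifiable trusted execution environment is an important problem facing the cloud computing model. This paper proposes TCEE (trusted cloud execution environment), a dynamic mechanism for verifying the trustworthiness of the user execution environment. By extending the existing chain of trust, trust is propagated into the user's virtual machine, and the integrity of the memory and file system of the user execution environment is verified periodically. TCEE introduces a trusted third party (TTP) to remotely verify and audit the trustworthiness of the execution environment of the user's virtual machine, which relieves users of maintaining the information and mechanisms needed for trust verification and also avoids leaking sensitive information about the cloud platform. A prototype system based on TCEE is implemented, and the effectiveness and performance overhead of TCEE are quantitatively tested and evaluated. Experimental results show that the mechanism can effectively detect typical threats against memory and file systems while introducing only a small performance overhead to the user execution environment.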

8.

Accelerator Virtualization Framework Based on Inter-VM Exitless Communication

Dingji Li Zeyu Mi Baodong Wu Xun Chen Yongwang Zhao Zuohua Ding Haibo Chen 《International Journal of Software and Informatics》2021,11(2):169-193

The increasing deployment of artificial intelligence has placed unprecedented requirements on the computing power of cloud computing. Cloud service providers have integrated accelerators with massive parallel computing units in the data center. These accelerators need to be combined with existing virtualization platforms to partition the computing resources. The current mainstream accelerator virtualization solution is the PCI passthrough approach, which however does not support fine-grained resource provisioning. Some manufacturers have also started to provide time-sliced multiplexing schemes and use drivers to cooperate with specific hardware to divide resources and time slices among different virtual machines, which unfortunately suffer from poor portability and flexibility. One alternative but promising approach is based on API forwarding, which forwards the virtual machine's request to the back-end driver for processing through a separate driver model. Yet, the communication due to API forwarding can easily become the performance bottleneck. This paper proposes Wormhole, an accelerator virtualization framework based on the C/S architecture that supports rapid delegated execution across virtual machines. It aims to provide upper-level users with an efficient and transparent way to virtualize accelerators with API forwarding while ensuring strong isolation between multiple users. By leveraging hardware virtualization features, the framework minimizes performance degradation through exitless inter-VM control flow switching. Experimental results show that Wormhole's prototype system can achieve up to 5 times performance improvement over traditional open-source virtualization solutions such as GVirtuS in the training test of the classic model.

9.

Harnessing Cloud Technologies for a Virtualized Distributed Computing Infrastructure (Total citations: 1; self-citations: 0; citations by others: 1)

di Costanzo A. de Assuncao M.D. Buyya R. 《IEEE Internet Computing》2009,13(5):24-33

The InterGrid system aims to provide an execution environment for running applications on top of interconnected infrastructures. The system uses virtual machines as building blocks to construct execution environments that span multiple computing sites. Such environments can be extended to operate on cloud infrastructures, such as Amazon EC2. This article provides an abstract view of the proposed architecture and its implementation; experiments show the scalability of an InterGrid-managed infrastructure and how the system can benefit from using the cloud.

10.

Evaluation of messaging middleware for high-performance cloud computing

Roberto R. Expósito Guillermo L. Taboada Sabela Ramos Juan Touriño Ramón Doallo 《Personal and Ubiquitous Computing》2013,17(8):1709-1719

Cloud computing is posing several challenges, such as security, fault tolerance, access interface singularity, and network constraints, both in terms of latency and bandwidth. In this scenario, the performance of communications depends both on the network fabric and its efficient support in virtualized environments, which ultimately determines the overall system performance. To solve the current network constraints in cloud services, their providers are deploying high-speed networks, such as 10 Gigabit Ethernet. This paper presents an evaluation of high-performance computing message-passing middleware on a cloud computing infrastructure, Amazon EC2 cluster compute instances, equipped with 10 Gigabit Ethernet. The analysis of the experimental results, compared with those from a similar testbed, has shown the significant impact that virtualized environments still have on communication performance, which demands more efficient communication middleware support to overcome the current cloud network limitations.

11.

Mobile agent middleware for mobile computing

Bellavista P. Corradi A. Stefanelli C. 《Computer》2001,34(3):73-81

Mobile computing requires an advanced infrastructure that integrates suitable support protocols, mechanisms, and tools. This mobility middleware should dynamically reallocate and trace mobile users and terminals and permit communication and coordination of mobile entities. In addition, open and untrusted environments must overcome system heterogeneity and grant the appropriate security level. Solutions to these issues require compliance with standards to interoperate with different systems and legacy components and a reliable security infrastructure based on standard cryptographic mechanisms and tools. Many proposals suggest using mobile agent technology middleware to address these issues. A mobile agent moves entities in execution together with code and achieved state, making it possible to upgrade distributed computing environments without suspending service. We propose three mobile computing services: user virtual environment (UVE), mobile virtual terminal (MVT), and virtual resource management (VRM). UVE provides users with a uniform view of their working environments independent of current locations and specific terminals. MVT extends traditional terminal mobility by preserving the terminal execution state for restoration at new locations, including active processes and subscribed services. VRM permits mobile users and terminals to maintain access to resources and services by automatically requalifying the bindings and moving specific resources or services to permit load balancing and replication.

12.

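An Energy Consumption Optimization and Management Method for Random Tasks on Cloud Computing Platforms (Total citations: 5; self-citations: 0; citations by others: 5)

Tan Yiming, Zeng Guosun, Wang Wei 《Journal of Software (软件学报)》2012,23(2):266-278

Cloud computing systems waste energy in two ways at run time: idle compute nodes produce a large amount of idle energy consumption, and mismatched task scheduling produces a large amount of "luxury" energy consumption. To address this waste, this paper proposes an energy consumption optimization and management method based on task scheduling. First, the cloud computing system is modeled with a queueing model, its mean response time and mean power are analyzed, and an energy consumption model of the system is established. A task scheduling strategy based on high service intensity and low execution energy is then proposed to control idle energy consumption and "luxury" energy consumption, respectively. Based on this strategy, the scheduling algorithm ME3PC (minimum expectation execution energy with performance constraints) is designed to minimize the expected execution energy while satisfying performance constraints. Experimental results show that the algorithm substantially reduces the energy overhead of the cloud computing system while guaranteeing execution performance.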
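
The abstract above mentions a queueing model from which mean response time and mean power are derived, without stating which queue is used. As a minimal sketch only, the Python below assumes a single node behaving as an M/M/1 queue and a two-level power model (one power draw when idle, another when busy). The queue type, the power model and the name `mm1_energy_profile` are assumptions for illustration, not the paper's formulation.

```python
def mm1_energy_profile(arrival_rate, service_rate, p_idle, p_busy):
    """Mean response time and mean power of one node modelled as an M/M/1 queue.

    arrival_rate, service_rate -- tasks per second (lambda and mu)
    p_idle, p_busy             -- power draw in watts when idle and when serving
    """
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for a stable queue")
    utilisation = arrival_rate / service_rate  # service intensity rho
    return {
        "utilisation": utilisation,
        # Standard M/M/1 result: T = 1 / (mu - lambda).
        "mean_response_time": 1.0 / (service_rate - arrival_rate),
        # The node is busy a fraction rho of the time and idle otherwise.
        "mean_power": (1.0 - utilisation) * p_idle + utilisation * p_busy,
        # Share of the mean power that is spent while no task is running.
        "idle_power": (1.0 - utilisation) * p_idle,
    }


# Hypothetical node: 2 tasks/s arriving, 4 tasks/s served, 100 W idle, 200 W busy.
print(mm1_energy_profile(2.0, 4.0, p_idle=100.0, p_busy=200.0))
```

Under these assumptions, raising the service intensity shrinks the idle share of the power budget but lengthens the mean response time, which is the trade-off a performance-constrained scheduler has to balance.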

13.

A high performance scientific cloud computing environment for materials simulations

K. Jorissen F.D. Vila J.J. Rehr 《Computer Physics Communications》2012,183(9):1911-1919

We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offer functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a Java-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.

14.

Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization

Hong Jun Choi Dong Oh Son Jong Myon Kim Cheol Hong Kim 《The Journal of Supercomputing》2014,69(1):330-356

Hardware parallelism should be exploited to improve the performance of computing systems. Single instruction multiple data (SIMD) architecture has been widely used to maximize the throughput of computing systems by exploiting hardware parallelism. Unfortunately, branch divergence due to branch instructions causes underutilization of computational resources, resulting in performance degradation of SIMD architecture. The graphics processing unit (GPU) is a representative parallel architecture based on SIMD architecture. In recent computing systems, GPUs can process general-purpose applications as well as graphics applications with the help of convenient APIs. However, contrary to graphics applications, general-purpose applications include many branch instructions, resulting in serious performance degradation of the GPU due to branch divergence. In this paper, we propose a concurrent warp execution (CWE) technique to reduce the performance degradation of the GPU in executing general-purpose applications by increasing resource utilization. The proposed CWE enables selecting co-warps to activate more threads in the warp, leading to concurrent execution of combined warps. According to our simulation results, the proposed architecture provides a significant performance improvement (5.85% over PDOM, 91% over DWF) with little hardware overhead.

15.

Adapting grid computing environments dependable with virtual machines: design, implementation, and evaluations

Xuanhua Shi Hai Jin Song Wu Wei Zhu Li Qi 《The Journal of Supercomputing》2013,66(3):1152-1166

Due to its potential, using virtual machines in grid computing is attracting increasing attention. Most research focuses on how to create or destroy virtual execution environments for different kinds of applications, while the policy of managing the virtual environments is not widely discussed. This paper proposes the design, implementation, and evaluation of an adaptive and dependable virtual execution environment for grid computing, ADVE, which focuses on the policy of managing virtual machines in grid environments. To build dependable virtual execution environments for grid applications, ADVE provides a set of adaptive policies for managing virtual machines, such as when to create and destroy a virtual execution environment and when to migrate applications from one virtual execution environment to a new one. We conduct experiments over a cluster to evaluate the performance of ADVE, and the experimental results show that ADVE can improve the throughput and the reliability of grid resources with the adaptive management of virtual machines.

16.

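A Cloud Computing Scheduling Method Based on a Dynamic Virtual Machine Model

Zuo Liyun 《Computer Engineering and Applications (计算机工程与应用)》2011,47(23):71-75

Adding virtual machines to the cloud computing environment makes full use of the resource-sharing advantages of cloud computing and of its parallel and distributed computing capabilities. This paper proposes a model system in which virtual machines can be dynamically added or removed as needed, which effectively reduces the cost of using the cloud and improves cost efficiency. Two resource scheduling algorithms applicable to the model system are studied: Adaptive First Come First Serve (AFCFS) and Largest Job First Served (LJFS). They aim to avoid unnecessary delays as far as possible and to maximize system performance, both of which are very important for resource scheduling algorithms in distributed systems. The simulation experiments use response time, waiting time and arrival rate as performance metrics and the performance-to-cost ratio as the cost metric to compare the performance efficiency of several algorithms and to validate the cost efficiency of the model system. Experimental results show that the algorithms can be applied efficiently in cloud computing environments and improve both the performance efficiency and the cost efficiency of the system.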
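
The abstract above names a Largest Job First Served (LJFS) policy but does not spell out its rules. One plausible reading, sketched below in Python, sorts jobs by length in descending order and gives each one to the virtual machine that becomes free earliest. Identical VMs, job lengths known in advance, a fixed VM count (the paper's dynamic adding and removing of VMs is not modelled) and the name `ljfs_schedule` are all assumptions for this illustration.

```python
import heapq


def ljfs_schedule(job_lengths, n_vms):
    """Assign jobs largest-first to whichever VM becomes free earliest.

    Returns (plan, makespan), where plan is a list of
    (job index, VM id, start time) in dispatch order.
    """
    if n_vms < 1:
        raise ValueError("n_vms must be at least 1")
    # Min-heap of (time at which the VM becomes free, VM id).
    free_at = [(0.0, vm) for vm in range(n_vms)]
    heapq.heapify(free_at)

    plan = []
    largest_first = sorted(enumerate(job_lengths), key=lambda p: p[1], reverse=True)
    for job, length in largest_first:
        start, vm = heapq.heappop(free_at)
        plan.append((job, vm, start))
        heapq.heappush(free_at, (start + length, vm))

    makespan = max(t for t, _ in free_at)
    return plan, makespan


# Hypothetical job lengths in seconds, scheduled on two VMs.
example_plan, example_makespan = ljfs_schedule([3, 8, 5, 2], n_vms=2)
print(example_plan, example_makespan)
```

Dispatching the largest jobs first tends to leave the short jobs to fill the gaps at the end, which usually keeps the finishing times of the VMs close together.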

17.

Overhead Analysis of Scientific Workflows in Grid Environments

Prodan R. Fahringer T. 《IEEE Transactions on Parallel and Distributed Systems》2008,19(3):378-393

Scientific workflows are a topic of great interest in the grid community that sees in the workflow model an attractive paradigm for programming distributed wide-area grid infrastructures. Traditionally, the grid workflow execution is approached as a pure best effort scheduling problem that maps the activities onto the grid processors based on appropriate optimization or local matchmaking heuristics such that the overall execution time is minimized. Even though such heuristics often deliver effective results, the execution in dynamic and unpredictable grid environments is prone to severe performance losses that must be understood for minimizing the completion time or for the efficient use of high-performance resources. In this paper, we propose a new systematic approach to help the scientists and middleware developers understand the most severe sources of performance losses that occur when executing scientific workflows in dynamic grid environments. We introduce an ideal model for the lowest execution time that can be achieved by a workflow and explain the difference to the real measured grid execution time based on a hierarchy of performance overheads for grid computing. We describe how to systematically measure and compute the overheads from individual activities to larger workflow regions and adjust well-known parallel processing metrics to the scope of grid computing, including speedup and efficiency. We present a distributed online tool for computing and analyzing the performance overheads in real time based on event correlation techniques and introduce several performance contracts as quality-of-service parameters to be enforced during the workflow execution beyond traditional best effort practices. We illustrate our method through postmortem and online performance analysis of two real-world workflow applications executed in the Austrian grid environment.

18.

Energy efficiency of dynamic management of virtual cluster with heterogeneous hardware

Jukka Kommeri Tapio Niemi Jukka K. Nurminen 《The Journal of Supercomputing》2017,73(5):1978-2000

Cloud computing is an essential part of today’s computing world. A continuously increasing amount of computation with varying resource requirements is placed in large data centers. The variation among computing tasks, both in their resource requirements and time of processing, makes it possible to optimize the usage of physical hardware by applying cloud technologies. In this work, we develop a prototype system for load-based management of virtual machines in an OpenStack computing cluster. Our prototype is based on an idea of ‘packing’ idle virtual machines into special park servers optimized for this purpose. We evaluate the method by running real high-energy physics analysis software in an OpenStack test cluster and by simulating the same principle using the CloudSim simulator software. The results show a clear improvement, 9–48%, in the total energy efficiency when using our method together with resource overbooking and heterogeneous hardware.

19.

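A Cloud Scheduling Strategy for Long-Job Environments

Jiang Weicheng, Li Lanying, Guo Jun, Xu Caocao 《Computer Engineering & Science (计算机工程与科学)》2017,39(8):1431-1437

With the spread of cloud computing, a great deal of data processing is now carried out through cloud services. Existing algorithms seldom consider the differing computing capabilities of virtual machines in heterogeneous systems, so some tasks wait too long. This paper proposes an algorithm that adjusts the load of virtual machines in real time. Based on the resource virtualization characteristics of cloud computing, a method for evaluating the computing capability of virtual machines is given. According to the capability of each virtual machine and its state changes at run time, the task size is adjusted adaptively to meet real-time requirements. Task scheduling coordinates task completion times, keeps the load on the virtual machines dynamically balanced, shortens the total execution time of long jobs, and improves the throughput and overall service capability of the system, thereby increasing its benefit. Experimental results show that the proposed algorithm adaptively adjusts the task size and schedules tasks so as to keep the virtual machine loads balanced.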

20.

ContainerCloudSim: An environment for modeling and simulation of containers in cloud data centers

Sareh Fotuhi Piraghaj Amir Vahid Dastjerdi Rodrigo N. Calheiros Rajkumar Buyya 《Software》2017,47(4):505-521

Containers are increasingly gaining popularity and becoming one of the major deployment models in cloud environments. To evaluate the performance of scheduling and allocation policies in containerized cloud data centers, there is a need for evaluation environments that support scalable and repeatable experiments. Simulation techniques provide repeatable and controllable environments, and hence, they serve as a powerful tool for such purpose. This paper introduces ContainerCloudSim, which provides support for modeling and simulation of containerized cloud computing environments. We developed a simulation architecture for containerized clouds and implemented it as an extension of CloudSim. We described a number of use cases to demonstrate how one can plug in and compare their container scheduling and provisioning policies in terms of energy efficiency and SLA compliance. Our system is highly scalable as it supports simulation of a large number of containers, given that there are more containers than virtual machines in a data center. Copyright © 2016 John Wiley & Sons, Ltd.