
1.

MUC: Updating cloud applications dynamically via multi-version execution

《Future Generation Computer Systems》2017

Cloud applications usually need to provide service continuously; however, updating them to fix bugs or add new features interrupts the service. Conventional Dynamic Software Updating (DSU) systems try to update applications while they run, but they rarely account for the communication between the application being updated and other programs, which can lead to inconsistency. Therefore, DSU systems cannot be applied directly in the cloud, where an application normally interacts with other parties. We propose an improved DSU system to update cloud applications dynamically, using multi-version execution to handle the inconsistency issue. When a new update arrives, instead of updating the application to the new version in place, we fork a new process of the old version and dynamically update it to the new version, then run the two versions concurrently until the update finishes. To show the feasibility of the proposed solution, we implemented a prototype system called MUC (Multi-version execution for Updating of Cloud) on Linux and applied it to update three cloud applications: Redis, Memcached, and Icecast.

2.

A simulator for adaptive parallel applications

Basile Schaeli Sebastian Gerlach Roger D. Hersch 《Journal of Computer and System Sciences》2008,74(6):983-999

Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. Detailed simulations can help identify allocation strategies and problem decomposition parameters that increase the efficiency of parallel applications. We describe a simulation framework supporting dynamic node allocation which, given a simple cluster model, predicts the running time of parallel applications, taking CPU and network sharing into account. Simulations can be carried out without modifying the application code. Thanks to partial direct execution, simulation times and memory requirements are reduced. In partial direct execution simulations, the application's parallel behavior is retrieved via direct execution, and the duration of individual operations is obtained from a performance prediction model or from prior measurements. Simulations may then vary cluster model parameters, operation durations, and problem decomposition parameters to analyze their impact on application performance and identify the limiting factors. We implemented the proposed techniques by adding direct execution simulation capabilities to the Dynamic Parallel Schedules parallelization framework. We introduce the concept of dynamic efficiency to express resource utilization efficiency as a function of time. We verify the accuracy of our simulator by comparing the actual running time and dynamic efficiency of parallel program executions with the values predicted by the simulator under different parallelization and dynamic node allocation strategies.

3.

Implementing a dynamic processor allocation policy for multiprogrammed parallel applications in the Solaris™ operating system

Kelvin K. Yue David J. Lilja 《Concurrency and Computation》2001,13(6):449-464

Parallel applications typically do not perform well in a multiprogrammed environment that uses time‐sharing to allocate processor resources to the applications' parallel threads. Co‐scheduling related parallel threads, or statically partitioning the system, can often reduce the applications' execution times, but at the expense of reducing overall system utilization. To address this problem, there has been increasing interest in dynamically allocating processors to applications based on their resource demands and the dynamically varying system load. The Loop‐Level Process Control (LLPC) policy (Yue K, Lilja D. Efficient execution of parallel applications in multiprogrammed multiprocessor systems. 10th International Parallel Processing Symposium, 1996; 448–456) dynamically adjusts the number of threads an application is allowed to execute based on the application's available parallelism and the overall system load. This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler, and provides further evidence of the performance improvement possible with this dynamic allocation strategy. In this implementation, applications are automatically parallelized and enhanced with the appropriate LLPC hooks so that each application interacts with the modified version of the Solaris operating system. The parallelism of each application is then adjusted automatically at run time in a multiprogrammed environment so that all applications obtain a fair share of the total processing resources. Copyright © 2001 John Wiley & Sons, Ltd.
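
The core of a loop-level policy like LLPC, capping an application's thread count by both its available parallelism and the current system load, can be sketched in a few lines. The fair-share rule below (each running application gets about P/n of the P processors) and the function name `allowed_threads` are assumptions for illustration, not the exact LLPC heuristic described in the paper.

```python
def allowed_threads(available_parallelism: int,
                    num_processors: int,
                    num_running_apps: int) -> int:
    """Hypothetical LLPC-style rule (assumed, not the paper's heuristic).

    A parallel loop may use at most as many threads as it has parallelism,
    and at most a fair share of the processors given the current load.
    At least one thread is always allowed so the loop can make progress.
    """
    if num_running_apps < 1:
        raise ValueError("at least one application must be running")
    fair_share = max(1, num_processors // num_running_apps)
    return max(1, min(available_parallelism, fair_share))


if __name__ == "__main__":
    # A loop with 64 independent iterations on a 16-processor machine,
    # re-evaluated as more applications start running.
    for apps in (1, 2, 3, 8):
        print(apps, allowed_threads(64, 16, apps))
```

In this sketch the rule would be re-evaluated at every parallel loop entry, so when other applications start or finish, the next loop picks up a different thread count.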

4.

An efficient grid scheduling strategy for data parallel applications

Kashif Hesham Khan Kalim Qureshi Mostafa Abd-El-Barr 《The Journal of supercomputing》2014,68(3):1487-1502

Scheduling large-scale applications in heterogeneous grid systems is a fundamental NP-complete problem that is critical to obtaining good performance and execution cost. Achieving high performance in a grid system requires effective task partitioning, resource management, and load balancing. The heterogeneous and dynamic nature of a grid, as well as the diverse demands of applications running on it, makes grid scheduling a major challenge. Existing schedulers in wide-area heterogeneous systems require a large amount of information about the application and the grid environment to produce reasonable schedules. However, this information may not be available, may be too expensive to collect, or may increase the runtime overhead of the scheduler to the point that the scheduler becomes ineffective. We believe that no single scheduler is appropriate for all grid systems and applications. Data parallel applications, in which the data can be partitioned further, can benefit from efficient resource management, smart resource selection, and load balancing, whereas in functional (non-divisible-task) parallel applications such partitioning is impossible, difficult, or expensive in terms of performance. In this paper, we propose a scheduler for data parallel applications (SDPA) that offers an efficient task partitioning and load balancing strategy for data parallel applications in grid environments. SDPA offers two major features: it maintains job priority even when an insufficient number of free resources is available, and it pre-assigns tasks to cut node idle time. SDPA selects nodes according to the nature of the task and the availability of the nodes' resources. Simulation results show that SDPA improves on strategies reported in the literature in terms of execution time, throughput, and waiting time.

5.

Adaptive middleware supporting scalable performance for high-end network services

Byoung-Dai Lee Jon B. Weissman Young-Kwang Nam 《Journal of Network and Computer Applications》2009,32(3):510-524

Network service-based computation is a promising paradigm for scientific, engineering, and enterprise computing. Network services allow users to focus on their applications and obtain services when needed, simply by invoking them across the network. In this paper, we show that an adaptive, general-purpose run-time infrastructure supporting effective resource management can be built for a wide range of high-end network services running in a single-site cluster and in a Grid. The primary components of the run-time infrastructure are: (1) dynamic performance prediction; (2) adaptive intra-site resource management; and (3) adaptive inter-site resource management. The novel aspect of our approach is that the run-time system can dynamically select the most appropriate performance predictor or resource management strategy over time. This capability not only improves performance but also makes the infrastructure reusable across different high-end services. To evaluate the effectiveness and applicability of our approach, we used the infrastructure to transform two different classes of high-end applications, data parallel and distributed applications, into network services. The experimental results show that the network services running on the infrastructure significantly reduce overall service times under dynamically varying conditions.

6.

Comparing Processor Allocation Strategies in Multiprogrammed Shared-Memory Multiprocessors

Kelvin K. Yue David J. Lilja 《Journal of Parallel and Distributed Computing》1998,49(2):183

Small-scale shared-memory multiprocessors are commonly used in a workgroup environment where multiple applications, both parallel and sequential, are executed concurrently while sharing the processors and other system resources. To utilize the processors efficiently, an effective allocation strategy is required. In this paper, we use performance data obtained from an SGI multiprocessor to evaluate several processor allocation strategies when running two parallel programs simultaneously. We examine gang scheduling (coscheduling), static space-sharing (space partitioning), and a dynamic allocation scheme called loop-level process control (LLPC) with three different dynamic allocation heuristics. We use regression analysis to quantify the measured data and thereby explore the relationship between the degree of parallelism of the application, specific system parameters (such as the size of the system), the processor allocation strategy, and the resulting performance. This study shows that dynamically partitioning the system using LLPC or similar heuristics provides better performance for applications with a high degree of parallelism than either gang scheduling or static space-sharing.

7.

Dynamic web worker pool management for highly parallel javascript web applications

Javier Verdú Juan José Costa Alex Pajuelo 《Concurrency and Computation》2016,28(13):3525-3539

JavaScript web applications are improving in performance, mainly thanks to new HTML5 standards. Among others, the Web Workers API allows multithreaded JavaScript web apps to exploit parallel processors. However, developers have difficulty determining the minimum number of web workers that provides the highest performance. Even if developers find this optimal number, it is a static value configured at the beginning of execution. Because users tend to run other applications in the background, the estimated number of web workers may be suboptimal, either overloading or underutilizing the system. In this paper, we propose a solution that lets highly parallel web apps dynamically adapt the number of running web workers to the resources actually available, avoiding the need to estimate a static optimal number of threads. The solution consists of adding a web worker pool and a simple management algorithm to the web app. The results show that, even with co‐running applications, our approach dynamically enables a number of web workers close to the optimum. Our proposal, which is independent of the web browser, overcomes the lack of knowledge of the underlying processor architecture as well as changes in dynamic resource availability. Copyright © 2015 John Wiley & Sons, Ltd.
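
One simple way to adapt a worker pool to the resources actually available is hill climbing on measured throughput: keep growing (or shrinking) the pool while throughput does not drop, and reverse direction when it does. The sketch below is in Python for consistency with the other examples here, whereas a browser implementation would be JavaScript; the `WorkerPoolController` class and the simulated throughput curve are hypothetical and are not the management algorithm from the paper.

```python
from dataclasses import dataclass


@dataclass
class WorkerPoolController:
    """Hypothetical hill-climbing controller for a worker pool size.

    After each measurement interval the caller reports the observed
    throughput (tasks per second). The controller keeps moving the pool
    size in the same direction while throughput does not drop, reverses
    when it does, and also turns around at the size bounds.
    Assumes max_size > min_size.
    """
    size: int = 1
    min_size: int = 1
    max_size: int = 64
    step: int = 1                 # +1 grows the pool, -1 shrinks it
    last_throughput: float = 0.0

    def update(self, throughput: float) -> int:
        if throughput < self.last_throughput:
            self.step = -self.step  # things got worse: reverse direction
        self.last_throughput = throughput
        proposed = self.size + self.step
        if not self.min_size <= proposed <= self.max_size:
            self.step = -self.step  # hit a bound: turn around
            proposed = self.size + self.step
        self.size = proposed
        return self.size


def simulated_throughput(workers: int) -> float:
    # Hypothetical machine on which 4 workers saturate the free cores and
    # extra workers only add contention.
    return min(workers, 4) - 0.5 * max(0, workers - 4)


if __name__ == "__main__":
    ctrl = WorkerPoolController()
    sizes = [ctrl.update(simulated_throughput(ctrl.size)) for _ in range(10)]
    print(sizes)  # [2, 3, 4, 5, 4, 3, 4, 5, 4, 3]
```

The pool climbs to the saturation point and then hovers around it; if background load changed the throughput curve, the same loop would drift toward the new optimum without any static configuration.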

8.

G‐BLAST: a Grid‐based solution for mpiBLAST on computational Grids

Chao‐Tung Yang Tsu‐Fen Han Heng‐Chuan Kan 《Concurrency and Computation》2009,21(2):225-255

Over the past few years, research and development in bioinformatics (e.g. genomic sequence alignment) has grown rapidly, fueling continuing demand for vast computing power. This trend usually calls for parallel computing techniques, because cluster computing technology reduces execution times and increases genomic sequence alignment efficiency. For example, mpiBLAST is a parallel version of NCBI BLAST that combines NCBI BLAST with the Message Passing Interface (MPI) standard. However, because most laboratories cannot build powerful cluster computing environments, Grid computing frameworks have been designed to meet this need. Grid computing environments coordinate the resources of distributed virtual organizations and satisfy the various computational demands of bioinformatics applications. In this paper, we report on the design and implementation of a BioGrid framework, called G‐BLAST, that performs genomic sequence alignments using Grid computing environments and accessible mpiBLAST applications. G‐BLAST is also suitable for cluster computing environments with a server node and several client nodes. G‐BLAST can select the most appropriate work nodes, dynamically fragment genomic databases, and self‐adjust according to performance data. To enhance G‐BLAST's capability and usability, we also provide a WSRF Grid Service Portal and a Grid Service GUI desktop application that let general users submit jobs and host administrators maintain work nodes. Copyright © 2008 John Wiley & Sons, Ltd.

9.

Dynamically adapting to system load and program behavior in multiprogrammed multiprocessor systems

Iffat H. Kazi David J. Lilja 《Concurrency and Computation》2002,14(12):957-985

Parallel execution of application programs on a multiprocessor system may lead to performance degradation if the workload of a parallel region is not large enough to amortize the overheads associated with the parallel execution. Furthermore, if too many processes are running on the system in a multiprogrammed environment, the performance of the parallel application may degrade due to resource contention. This work proposes a comprehensive dynamic processor allocation scheme that takes both program behavior and system load into consideration when dynamically allocating processors. This mechanism was implemented on the Solaris operating system to dynamically control the execution of parallel C and Java application programs. Performance results show the effectiveness of this scheme in dynamically adapting to the current execution environment and program behavior, and that it outperforms a conventional time‐shared system. Copyright © 2002 John Wiley & Sons, Ltd.
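
The amortization argument (a parallel region should only run in parallel if its work outweighs the parallel overheads, and only on processors that are actually free) can be made concrete with a simple cost model. The model below, T(1) = W and T(p) = W/p + c0 + c1·p, and the function `choose_processors` are assumptions for illustration, not the scheme from the paper.

```python
def choose_processors(work: float, fork_join_overhead: float,
                      per_thread_overhead: float, idle_processors: int) -> int:
    """Pick the processor count with the lowest estimated region time.

    Hypothetical cost model: T(1) = work (serial, no overhead) and
    T(p) = work / p + fork_join_overhead + per_thread_overhead * p for p > 1.
    Only currently idle processors are considered, so a heavily loaded
    system naturally pushes the choice toward serial execution.
    """
    best_p, best_time = 1, work
    for p in range(2, max(1, idle_processors) + 1):
        t = work / p + fork_join_overhead + per_thread_overhead * p
        if t < best_time:
            best_p, best_time = p, t
    return best_p


if __name__ == "__main__":
    print(choose_processors(10, 20, 1, 8))     # small region: stays serial
    print(choose_processors(1000, 20, 1, 8))   # large region, idle system
    print(choose_processors(1000, 20, 1, 2))   # large region, loaded system
```

With these illustrative numbers a small region stays serial because the overhead alone exceeds its work, while a large region uses as many processors as the current load leaves idle.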

10.

Parallel Algorithms for Dynamic Shortest Path Problems

Ismail Chabini & Sridevi Ganugapati 《International Transactions in Operational Research》2002,9(3):279-302

The development of intelligent transportation systems (ITS) and the resulting need to solve a variety of dynamic traffic network models and management problems require faster‐than‐real‐time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete‐time dynamic networks from all nodes and all departure times to one destination node. This algorithm, known as algorithm DOT, has an optimal worst‐case running‐time complexity, which implies that no algorithm with a better worst‐case computational complexity can be discovered. Consequently, to solve all‐to‐one shortest path problems in dynamic networks faster, one needs to explore avenues other than the design of sequential solution algorithms alone. Using commercially available high‐performance computing platforms to develop parallel implementations of sequential algorithms is one such avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared‐memory and two message‐passing dynamic shortest path algorithm implementations, derived from algorithm DOT using two parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded for two types of parallel computing environments: a message‐passing environment based on the Parallel Virtual Machine (PVM) library and a multi‐threading environment based on the SUN Microsystems Multi‐Threads (MT) library. We also develop a time‐based parallel version of algorithm DOT for the case of minimum‐time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an 'ideal' theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared‐memory machine with eight processors. Satisfactory speed‐ups over the running time of the sequential algorithms are achieved, in particular on shared‐memory machines. Numerical results indicate that shared‐memory computers are the most appropriate type of parallel computing platform for computing dynamic shortest paths in real‐time ITS applications.
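
The decreasing-order-of-time idea commonly associated with algorithm DOT can be sketched as follows. Assumptions for this sketch: travel times are positive integers given for departure times 0..T-1, the last value holds for all later times (the network is static beyond the horizon), and waiting at nodes is not allowed. The function name and data layout are hypothetical; this is not the authors' code.

```python
import heapq
import math


def dot_all_to_one(nodes, travel_time, dest, horizon):
    """All-to-one minimum travel times in a discrete-time dynamic network.

    travel_time[(i, j)] is a list of `horizon` positive integers (one per
    departure time 0..horizon-1); the value at index horizon-1 is assumed
    to hold for every later departure time. No waiting at nodes.
    Returns labels, where labels[t][i] is the minimum travel time from
    node i, departing at time t, to `dest`.
    """
    # Static phase: beyond the horizon the network is static, so run
    # Dijkstra from `dest` over reversed arcs using the final travel times.
    static = {i: math.inf for i in nodes}
    static[dest] = 0
    reverse = {i: [] for i in nodes}
    for (i, j), times in travel_time.items():
        reverse[j].append((i, times[horizon - 1]))
    heap = [(0, dest)]
    while heap:
        dist, j = heapq.heappop(heap)
        if dist > static[j]:
            continue  # stale heap entry
        for i, w in reverse[j]:
            if dist + w < static[i]:
                static[i] = dist + w
                heapq.heappush(heap, (static[i], i))

    labels = [None] * horizon
    labels[horizon - 1] = static
    # Dynamic phase, in decreasing order of time: every travel time is at
    # least 1, so the label read at arrival time t + d is already computed.
    for t in range(horizon - 2, -1, -1):
        current = {i: math.inf for i in nodes}
        current[dest] = 0
        for (i, j), times in travel_time.items():
            d = times[t]
            arrival = min(t + d, horizon - 1)  # later times reuse static labels
            current[i] = min(current[i], d + labels[arrival][j])
        labels[t] = current
    return labels


if __name__ == "__main__":
    # Link B->C is fast only for departures at t = 1.
    arcs = {("A", "C"): [5, 5, 5], ("A", "B"): [1, 1, 1], ("B", "C"): [4, 1, 4]}
    labels = dot_all_to_one(["A", "B", "C"], arcs, dest="C", horizon=3)
    print(labels[0])  # {'A': 2, 'B': 4, 'C': 0}
```

In this sketch each time slice depends only on later slices, and within a slice the per-arc updates are independent, which is the kind of structure a topology-based decomposition could split across processors; a decomposition by destination would instead run separate calls for different `dest` values in parallel.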

11.

A survey on dynamic graph processing on GPUs: concepts, terminologies and systems

Hongru GAO Xiaofei LIAO Zhiyuan SHAO Kexin LI Jiajie CHEN Hai JIN 《Frontiers of Computer Science》2024,18(4):184106

Graphs, which model real-world entities as vertices and the relationships among them as edges, have proven to be a powerful tool for describing real-world problems. In most real-world scenarios, entities and their relationships change constantly. Graphs that record such changes are called dynamic graphs. In recent years, the wide range of application scenarios for dynamic graphs has stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics results. As dynamic graphs grow larger, dynamic graph processing systems face higher performance requirements. With their massive parallel processing power and high memory bandwidth, GPUs have become mainstream vehicles for accelerating dynamic graph processing tasks. GPU-based dynamic graph processing systems mainly address two challenges: maintaining the graph data when updates occur (i.e., graph updating) and producing analytics results in time (i.e., graph computing). In this paper, we survey GPU-based dynamic graph processing systems and review their methods for addressing both graph updating and graph computing. To discuss existing dynamic graph processing systems on GPUs comprehensively, we first introduce the terminology of dynamic graph processing and then develop a taxonomy describing the methods employed for graph updating and graph computing. In addition, we discuss the challenges and future research directions of dynamic graph processing on GPUs.
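
The split between graph updating and graph computing can be illustrated on the CPU with a structure that both stores the graph and keeps one analytics result current as update batches arrive. The sketch below maintains connected components under edge insertions with a union-find structure; it is a hypothetical, sequential illustration and does not reflect GPU data structures or any specific system covered by the survey. Edge deletions are deliberately not handled, since they generally require more than an incremental union-find.

```python
class IncrementalComponents:
    """Hypothetical CPU sketch: connected components kept up to date
    under batches of edge insertions (deletions are not supported)."""

    def __init__(self, num_vertices: int):
        self.parent = list(range(num_vertices))
        self.adj = [set() for _ in range(num_vertices)]  # stored graph
        self.num_components = num_vertices

    def _find(self, v: int) -> int:
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def apply_batch(self, edges):
        for u, v in edges:
            if v in self.adj[u]:
                continue                          # duplicate edge: no change
            self.adj[u].add(v)                    # graph updating
            self.adj[v].add(u)
            ru, rv = self._find(u), self._find(v)  # graph computing
            if ru != rv:
                self.parent[ru] = rv
                self.num_components -= 1

    def same_component(self, u: int, v: int) -> bool:
        return self._find(u) == self._find(v)


if __name__ == "__main__":
    g = IncrementalComponents(5)
    g.apply_batch([(0, 1), (2, 3)])
    print(g.num_components)        # 3
    g.apply_batch([(1, 2), (0, 1)])
    print(g.num_components)        # 2
    print(g.same_component(0, 3))  # True
```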

12.

An Instance-Level Dynamic Application Update Technique for PaaS

Jie ZHANG Chun CAO Dongliang YU 《Computer Science》2015,42(12):60-64

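Cloud computing is an important area of current information technology, and Platform as a Service (PaaS) has become one of the hot topics in industry research. PaaS platforms provide users with highly available, highly scalable environments for developing, deploying, and running applications. However, when applications deployed to the cloud need continual updates to fix bugs or add features, current mainstream PaaS platforms lack effective support for online application updating, which weakens their own high availability. To solve this problem, we propose a dynamic update framework for PaaS platforms. Building on existing research on dynamic software updating, we extend mechanisms such as transaction management, dynamic dependency management, and version management for applications on the PaaS platform to provide run-time, instance-level dynamic application updating. We implemented and evaluated the framework on Cloud Foundry, and the results demonstrate the effectiveness of this dynamic update technique.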

13.

Case for dynamic deployment in a grid-based distributed query processor

A. Mukherjee P. Watson 《Future Generation Computer Systems》2012,28(1):171-183

Grid computing enables users to run computationally expensive applications on dynamically acquired distributed resources. Users can combine structured data and analysis components from distributed sites into new applications. Distributed query processing offers an established way of structuring such computations, and well-known tools such as OGSA-DAI and OGSA-DQP provide, respectively, a common interface to heterogeneous databases and a way of exploiting distributed resources. However, these significant benefits are often undermined by high communication costs due to the need to move data between distributed resources. This paper describes an approach that addresses this by dynamically deploying query processing engines, analysis services, and databases within virtual machines at internet scale, so as to reduce communication costs. Results of internet-scale experiments are presented to demonstrate the performance benefits. Further, deploying components dynamically based on requirements allows the creation of an ad-hoc runtime engine and thus opens up the possibility of a virtual marketplace for software and hardware resources.

14.

Flow‐sensitive runtime estimation: an enhanced hot spot detection heuristics for embedded Java just‐in‐time compilers


Seong‐Won Lee Soo‐Mook Moon Seong‐Moo Kim 《Software》2016,46(6):841-864

Java just‐in‐time compilers often compile only hot methods, because compilation overhead is part of the running time. This requires precise and efficient hot spot detection: distinguishing hot methods from cold ones, detecting them as early as possible, and keeping the detection overhead small. Hot spot detection is especially important in embedded applications, because they behave more like the start‐up phase of a regular application, where methods are not executed heavily and it is therefore unclear which methods are hot. Because a long‐running method is likely to be hot, we can detect hot methods by measuring their running time during interpretation. However, precisely measuring running time during execution is too expensive, especially on embedded systems, so many counter‐based heuristics, such as Oracle's HotSpot heuristic, have been proposed to estimate it. The problem is that, although these heuristics have low overhead, they do not estimate the running time precisely, which may lead to imprecise hot spot detection. This paper proposes a new hot spot detection heuristic called flow‐sensitive runtime estimation, which estimates the running time more precisely than other heuristics with relatively low overhead. It counts only important bytecode instructions dynamically, yet it can obtain the precise count of all interpreted bytecode instructions with a simple arithmetic calculation. We also propose a static analysis technique to predict hot methods that spend a large amount of execution time once invoked, so that they can be compiled at their first invocation. Our experimental results show that these techniques improve performance by an average of 7.4% compared with the HotSpot heuristic when the benchmarks run once, which is often regarded as representative of start‐up phase behavior.
Even for real embedded Java applications such as the digital TV Java Xlet applications, our techniques can improve the user response time by an average of 7.1%. Copyright © 2015 John Wiley & Sons, Ltd.
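
The claim that counting only some instructions can still yield an exact total follows from basic-block structure: if the interpreter increments a counter only when it enters a basic block, the number of executed instructions is each block's entry count times its length. The sketch below shows that arithmetic under stated assumptions; the names, the trace format, and the hot-method threshold are hypothetical, and the paper's choice of which bytecodes count as "important" may differ.

```python
from collections import Counter


def total_bytecodes(block_lengths, block_entries):
    """Exact count of interpreted bytecodes from per-block entry counters.

    Hypothetical setup: block_lengths maps each basic-block id to its number
    of bytecode instructions, and block_entries lists block ids in the order
    the interpreter entered them. Only block entries are counted; since a
    basic block runs to completion once entered (exceptions ignored here),
    the total is the sum of entries * length over all blocks.
    """
    entries = Counter(block_entries)  # one increment per block entry
    return sum(count * block_lengths[block] for block, count in entries.items())


def is_hot(block_lengths, block_entries, threshold):
    # Hypothetical hot-method test on the estimated running time.
    return total_bytecodes(block_lengths, block_entries) >= threshold


if __name__ == "__main__":
    # A method with an entry block, a loop body executed 10 times, and an exit.
    lengths = {"entry": 4, "loop": 6, "exit": 2}
    trace = ["entry"] + ["loop"] * 10 + ["exit"]
    print(total_bytecodes(lengths, trace))  # 66
```

A real interpreter would keep one counter per block rather than a trace; the trace here only makes the example easy to follow.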

15.

ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing

Ciprian Docan Fan Zhang Tong Jin Hoang Bui Qian Sun Julian Cummings Norbert Podhorszki Scott Klasky Manish Parashar 《Concurrency and Computation》2015,27(14):3724-3745

Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership‐class resources has become a critical challenge. The data have to be extracted from the computing nodes and transported to consumer nodes so that they can be processed, analyzed, visualized, archived, and so on. Several recent research efforts have addressed data‐related challenges at different levels. One attractive approach is to offload expensive input/output operations to a smaller set of dedicated computing nodes known as a staging area. However, even with this approach, the data still have to be moved from the staging area to consumer nodes for processing, which remains a bottleneck. In this paper, we investigate an alternative approach: moving the data‐processing code to the staging area instead of moving the data to the data‐processing code. Specifically, we describe the ActiveSpaces framework, which provides (1) programming support for defining the data‐processing routines to be downloaded to the staging area and (2) runtime mechanisms for transporting the code associated with these routines to the staging area, executing the routines on the nodes that are part of the staging area, and returning the results. We also present an experimental performance evaluation of ActiveSpaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade‐offs between transporting data and transporting the code required for data processing during coupling, and we characterize sweet spots for each option. Copyright © 2014 John Wiley & Sons, Ltd.
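
The data-versus-code trade-off can be illustrated with a first-order cost model: moving data costs the raw transfer plus processing at the consumer, while moving code costs shipping the code and the results plus processing on the (usually smaller) staging area. The model, the function name, and all numbers below are hypothetical; ActiveSpaces' own evaluation is far more detailed.

```python
def compare_transfer_options(data_bytes, code_bytes, result_bytes,
                             bandwidth_bytes_per_s,
                             consumer_compute_s, staging_compute_s):
    """Hypothetical first-order cost model for 'move data' vs 'move code'.

    move data: ship the raw data to the consumer, then process it there.
    move code: ship the code to the staging area, process the data there,
               and ship only the results back.
    Contention, overlap of transfer and compute, and latency are ignored.
    """
    move_data_s = data_bytes / bandwidth_bytes_per_s + consumer_compute_s
    move_code_s = ((code_bytes + result_bytes) / bandwidth_bytes_per_s
                   + staging_compute_s)
    choice = "move code" if move_code_s < move_data_s else "move data"
    return choice, move_data_s, move_code_s


if __name__ == "__main__":
    GB, MB = 1e9, 1e6
    # 10 GB of data, 1 MB of code, 100 MB of results over a 1 GB/s link.
    print(compare_transfer_options(10 * GB, 1 * MB, 100 * MB, 1 * GB, 2.0, 5.0))
    # Same sizes, but processing on the staging area is much slower.
    print(compare_transfer_options(10 * GB, 1 * MB, 100 * MB, 1 * GB, 2.0, 20.0))
```

Under this model the "sweet spot" boundary is simply where the two estimates cross: large data with small results favors moving code, while expensive processing on a small staging area can tip the balance back toward moving data.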

16.

A Hybrid Analysis of an Optimization Approach for Cluster Applications

Ming Zhu Wentong Cai Bu-Sung Lee Xudong Wu 《The Journal of supercomputing》2005,32(3):191-215

Cluster/distributed computing has become a popular, cost-effective alternative to high-performance parallel computers. Many parallel programming languages and related programming models have become widely accepted on clusters. However, high communication overhead is a major shortcoming of running parallel applications in cluster/distributed computing environments. To reduce the communication overhead, and thus the completion time, of a parallel application, this paper introduces and evaluates an efficient Key Message (KM) approach to support parallel computing in cluster computing environments. We briefly present the model and algorithm, and then evaluate the performance of the algorithm using analytical and simulation methods. The analysis results show that as the network background load increases or the computation-to-communication ratio decreases, the KM approach yields a greater improvement in the communication of a parallel application over a system that does not use it.

17.

Relay computing and dynamic load-splitting for compute-intensive high-volume data

Jia LIAO Yang CHEN Qiulan BAO Xuehua LIAO Zhousen ZHU 《Journal of Computer Applications》2021,41(9):2646-2651

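To address the slow computation of high-volume data and the heavy computational load on the server side, we propose a relay computing and dynamic load-splitting model for compute-intensive high-volume data. First, in a distributed environment, in-memory data storage technology is used to determine the computational load and complexity level of each computing task, while nodes are ranked by their resource capacity. Then, tasks are dynamically assigned to different nodes for parallel computation, and a relay processing mode is used to decompose the computing tasks, effectively guaranteeing the performance and accuracy requirements of high-volume, complex computing tasks. Comparative analysis shows that with 10,000 or more data items, multiple nodes achieve shorter running times and faster computation than a single node; moreover, when applied in practice, the model not only reduces running time in high-concurrency scenarios but also saves computing resources.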
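
Ranking nodes by capacity and then dispatching tasks by estimated cost resembles classic greedy list scheduling on machines of different speeds. The sketch below uses the longest-task-first, earliest-finish-time rule as a stand-in; it is a standard heuristic, not the model proposed in the paper, and the inputs are hypothetical.

```python
def assign_tasks(task_costs, node_speeds):
    """Greedy: largest task first, each to the node that would finish it earliest.

    task_costs are in abstract work units; node_speeds are work units per
    second, so faster (higher-ranked) nodes naturally absorb more work.
    Returns (list of (task index, node index), makespan in seconds).
    """
    order = sorted(range(len(task_costs)), key=lambda k: task_costs[k],
                   reverse=True)
    finish = [0.0] * len(node_speeds)  # time at which each node becomes free
    assignment = []
    for k in order:
        node = min(range(len(node_speeds)),
                   key=lambda n: finish[n] + task_costs[k] / node_speeds[n])
        finish[node] += task_costs[k] / node_speeds[node]
        assignment.append((k, node))
    return assignment, max(finish)


if __name__ == "__main__":
    # Three tasks on a fast node (speed 2) and a slow node (speed 1).
    print(assign_tasks([8, 4, 4], [2, 1]))  # ([(0, 0), (1, 1), (2, 0)], 6.0)
```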

18.

Tuple switching network—When slower may be better

Justin Y. Shi Moussa Taifi Abdallah Khreishah Jie Wu 《Journal of Parallel and Distributed Computing》2012

This paper reports an application-dependent network design for extreme-scale high performance computing (HPC) applications. Traditional scalable network designs focus on fast point-to-point transmission of generic data packets. The proposed network focuses instead on the sustainability of HPC applications through statistical multiplexing of semantic data objects. For HPC applications using data-driven parallel processing, a tuple is a semantic object. We report the design and implementation of a tuple switching network for data parallel HPC applications that gains performance and reliability at the same time as computing and communication resources are added. We describe a sustainability model and a simple computational experiment to demonstrate an extreme-scale application's sustainability as the system mean time between failures (MTBF) decreases. Assuming a threefold slowdown from statistical multiplexing and a 35% time loss per checkpoint, a two-tier tuple switching framework would produce sustained performance and energy savings for extreme-scale HPC applications using more than 1024 processors or facing an MTBF of less than 6 hours. Higher processor counts or higher checkpoint overheads accelerate the benefits.
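
The reliability side of the argument rests on a property of fault-tolerant tuple-space (Linda-style) processing: because any worker can take any work tuple, a tuple held by a failed worker can be returned to the space and retried. The single-process simulation below illustrates only that property; it does not model the paper's network, its statistical multiplexing, or its sustainability model, and all names and parameters are hypothetical.

```python
import random
from collections import deque


def run_bag_of_tasks(tasks, failure_prob, seed=0):
    """Hypothetical tuple-space style bag of tasks with worker failures.

    Work tuples (key, value) wait in a shared space; whichever worker is
    free takes the next one. If that worker fails before finishing, the
    tuple is put back in the space and redone later, so a failure costs
    extra attempts (time) but never loses work.
    """
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    rng = random.Random(seed)
    space = deque(tasks)          # pending work tuples
    results = {}
    attempts = 0
    while space:
        key, value = space.popleft()
        attempts += 1
        if rng.random() < failure_prob:
            space.append((key, value))    # worker lost: re-inject the tuple
        else:
            results[key] = value * value  # stand-in for the real computation
    return results, attempts


if __name__ == "__main__":
    work = [(i, i) for i in range(100)]
    for p in (0.0, 0.2, 0.5):
        results, attempts = run_bag_of_tasks(work, p)
        print(p, len(results), attempts)
```

Every run produces all 100 results; only the number of attempts grows with the failure rate, which is the "slower but sustained" behavior the title alludes to.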

19.

Low‐latency Java communication devices on RDMA‐enabled networks

Roberto R. Expósito Guillermo L. Taboada Sabela Ramos Juan Touriño Ramón Doallo 《Concurrency and Computation》2015,27(17):4852-4879

Providing high‐performance inter‐node communication is key to running high performance computing applications efficiently on parallel architectures. Current system deployments aggregate a significant number of cores interconnected via advanced networking hardware with Remote Direct Memory Access (RDMA) mechanisms, which enable zero‐copy and kernel‐bypass features. Java is becoming increasingly attractive for parallel programming thanks to useful characteristics of the language, particularly its built‐in multithreading support, portability, ease of learning, and high productivity, along with continuing increases in Java virtual machine performance. However, current parallel Java applications generally suffer from inefficient communication middleware, mainly based on protocols with high communication overhead that do not take full advantage of RDMA‐enabled networks. This paper presents efficient low‐level Java communication devices that overcome these constraints by fully exploiting the underlying RDMA hardware, providing low‐latency and high‐bandwidth communications for parallel Java applications. A performance evaluation on representative RDMA networks and parallel systems shows significant point‐to‐point performance increases compared with previous Java communication middleware, yielding up to 40% improvement in application‐level performance on 4096 cores of a Cray XE6 supercomputer. Copyright © 2015 John Wiley & Sons, Ltd.

20.

Implementation and performance evaluation of a scheduling algorithm for divisible load parallel applications in a cloud computing environment

Leila Ismail Latifur Khan 《Software》2015,45(6):765-781

Cloud computing is an emerging technology in which information technology resources are virtualized and offered to users as a set of computing resources on a pay‐per‐use basis. It is seen as an effective infrastructure for high performance applications. Divisible load applications occur in many scientific and engineering domains. However, dividing an application and deploying it in a cloud computing environment makes it challenging to obtain optimal performance, because of the overheads introduced by cloud virtualization and the supporting cloud middleware. We therefore present the results of a series of extensive experiments on scheduling divisible load applications in a cloud environment to decrease overall application execution time, taking into account the cloud networking and computing capacities presented to the application's user. We experiment with real applications in the Amazon cloud computing environment. Our experiments analyze the reasons for the discrepancies between a theoretical model and reality and propose adequate solutions. These discrepancies are due to three factors: network behavior, application behavior, and cloud virtualization. Our results show that applying the algorithm results in a maximum ratio of 1.41 between the measured normalized makespan and the ideal makespan for applications in which the communication-to-computation ratio is large. They also show that the algorithm is effective for such applications in a heterogeneous setting, reaching a ratio of 1.28 for large data sets. For applications following the ensemble clustering model, in which the computation-to-communication ratio is large and variable, we obtained a maximum ratio of 4.7 for a large data set and 2.11 for a small data set. Applying the algorithm also yields a significant speedup. These results are revealing for the types of applications considered in the experiments.
The experiments also reveal the impact of the choice of Amazon-provided platforms on the performance of the applications under study. Given the emergence of cloud computing for high performance applications, the results in this paper can be widely adopted by cloud computing developers. Copyright © 2014 John Wiley & Sons, Ltd.
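
The idea behind divisible load scheduling, and behind a measured-versus-ideal makespan ratio like the one reported above, can be illustrated with the simplest textbook model: split the load in proportion to worker speed so that all workers finish at the same time. The paper's algorithm accounts for cloud networking and computing capacities, which this sketch deliberately omits; the function names and the numbers in the usage example are hypothetical and unrelated to the reported results.

```python
def divisible_load_split(total_load, speeds):
    """Split a divisible load so that all workers finish simultaneously.

    Simplest divisible-load model, assumed for this sketch: computation
    only (no communication or start-up cost), and worker i processes
    speeds[i] load units per second. Worker i then gets the fraction
    speeds[i] / sum(speeds), and the ideal makespan is
    total_load / sum(speeds).
    """
    total_speed = sum(speeds)
    shares = [total_load * s / total_speed for s in speeds]
    return shares, total_load / total_speed


def makespan_ratio(measured_s, ideal_s):
    # Measured makespan normalized by the model's ideal makespan (1.0 = ideal).
    return measured_s / ideal_s


if __name__ == "__main__":
    shares, ideal = divisible_load_split(1200, [1, 2, 3])
    print(shares, ideal)                 # [200.0, 400.0, 600.0] 200.0
    print(makespan_ratio(250.0, ideal))  # 1.25
```

Every worker in the example finishes after 200 seconds; a measured makespan above that, caused for instance by network or virtualization effects the model ignores, shows up as a ratio above 1.0.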