Similar Documents
20 similar documents retrieved.
1.
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as their performance degradation is tolerable. However, unpredictable evictions of guest jobs lead to fluctuating completion times. Checkpoint-recovery is an attractive mechanism for recovering from such “failures”. Today’s FGCS systems often use expensive, high-performance dedicated checkpoint servers. However, in geographically distributed clusters, this may incur high checkpoint transfer latencies. In this paper we present a distributed checkpointing system called Falcon that uses available disk resources of the FGCS machines as shared checkpoint repositories. However, an unavailable storage host may lead to loss of checkpoint data. Therefore, we model the failures of a storage host and develop a prediction algorithm for choosing reliable checkpoint repositories. We experiment with Falcon in the university-wide Condor testbed at Purdue and show improved and consistent performance for guest jobs in the presence of irregular resource availability.
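As a rough illustration of the repository-selection idea described above, the Python sketch below ranks storage hosts by the empirical probability, estimated from a sampled up/down history, that they stay available for one checkpoint interval, and picks the most reliable ones. The trace format, scoring rule and host names are assumptions for illustration, not Falcon's actual failure model or prediction algorithm.

    # Illustrative sketch (not Falcon's actual predictor): rank storage hosts by the
    # empirical probability of staying available for the next checkpoint interval,
    # estimated from a sampled up/down history, and pick the k most reliable ones.

    def availability_score(history, horizon):
        """history: list of 0/1 samples (1 = host was up); horizon: samples per interval."""
        if len(history) <= horizon:
            return 0.0
        survived = attempts = 0
        for start in range(len(history) - horizon):
            if history[start]:                      # host was up at the start of the window
                attempts += 1
                if all(history[start:start + horizon]):
                    survived += 1                   # ...and stayed up for the whole window
        return survived / attempts if attempts else 0.0

    def choose_repositories(host_histories, horizon, k):
        ranked = sorted(host_histories,
                        key=lambda h: availability_score(host_histories[h], horizon),
                        reverse=True)
        return ranked[:k]

    # Example: pick 2 of 3 hosts for checkpoints that must survive 3 sampling intervals.
    traces = {"hostA": [1] * 50, "hostB": [1, 0] * 25, "hostC": [1] * 40 + [0] * 10}
    print(choose_repositories(traces, horizon=3, k=2))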

2.
Meta-schedulers map jobs to computational resources that are part of a Grid, such as clusters, which in turn have their own local job schedulers. Existing Grid meta-schedulers either target system-centric metrics, such as utilisation and throughput, or prioritise jobs based on utility metrics provided by the users. The system-centric approach gives less importance to users’ individual utility, while the user-centric approach may have adverse effects such as poor system performance and unfair treatment of users. Therefore, this paper proposes a novel meta-scheduler, based on the well-known double auction mechanism, that aims to satisfy users’ service requirements as well as ensuring balanced utilisation of resources across a Grid. We have designed valuation metrics that commodify both the complex resource requirements of users and the capabilities of available computational resources. Through simulation using real traces, we compare our scheduling mechanism with other common mechanisms widely used by both existing market-based and traditional meta-schedulers. The results show that our meta-scheduling mechanism not only satisfies up to 15% more user requirements than others, but also improves system utilisation through load balancing.
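To make the auction step concrete, the following sketch shows a plain double-auction clearing rule: sort bids and asks and match them while a bid still covers an ask. It is a generic textbook mechanism used here for illustration; the paper's valuation metrics for resource requirements and capabilities are not reproduced.

    # Illustrative double-auction clearing (not the paper's exact mechanism): users bid a
    # price per resource unit, providers ask a price per unit; the highest bids are matched
    # with the lowest asks while a bid still meets or exceeds an ask.

    def clear_double_auction(bids, asks):
        """bids/asks: lists of (name, price). Returns list of (buyer, seller, trade_price)."""
        bids = sorted(bids, key=lambda b: b[1], reverse=True)   # most generous buyers first
        asks = sorted(asks, key=lambda a: a[1])                 # cheapest providers first
        matches = []
        for (buyer, bid), (seller, ask) in zip(bids, asks):
            if bid < ask:
                break                                           # no further mutually acceptable pair
            matches.append((buyer, seller, (bid + ask) / 2))    # split the surplus at the midpoint
        return matches

    print(clear_double_auction(
        bids=[("userA", 9.0), ("userB", 5.0), ("userC", 3.0)],
        asks=[("clusterX", 4.0), ("clusterY", 6.0)]))
    # -> [('userA', 'clusterX', 6.5)]   userB's 5.0 bid does not cover clusterY's 6.0 ask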

3.
The frequent and volatile unavailability of volunteer-based Grid computing resources challenges Grid schedulers to make effective job placements. The manner in which host resources become unavailable will have different effects on different jobs, depending on their runtime and their ability to be checkpointed or replicated. A multi-state availability model can help improve scheduling performance by capturing the various ways a resource may be available or unavailable to the Grid. This paper uses a multi-state model and analyzes a machine availability trace in terms of that model. Several prediction techniques then forecast resource transitions into the model’s states. We analyze the accuracy of our predictors, which outperform existing approaches. We also propose and study several classes of schedulers that utilize the predictions, and a method for combining scheduling factors. We characterize the inherent tradeoff between job makespan and the number of evictions due to failure, and demonstrate how our schedulers can navigate this tradeoff under various scenarios. Lastly, we propose job replication techniques, which our schedulers utilize to replicate those jobs that are most likely to fail. Our replication strategies outperform others, as measured by improved makespan and fewer redundant operations. In particular, we define a new metric for replication efficiency, and demonstrate that our multi-state availability predictor can provide information that allows our schedulers to be more efficient than others that blindly replicate all jobs or some static percentage of jobs.
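The sketch below illustrates one simple way a multi-state availability predictor can feed a replication decision: estimate transition frequencies from a per-machine state trace and replicate jobs on machines whose next transition is likely to evict them. The state names, the empirical-frequency predictor and the risk threshold are illustrative assumptions rather than the paper's actual predictors or schedulers.

    # Illustrative sketch of a multi-state availability predictor: estimate transition
    # frequencies from a per-machine state trace and flag machines likely to leave the
    # "available" state so the scheduler can replicate the jobs placed on them.

    from collections import Counter, defaultdict

    FAIL_STATES = {"user_present", "powered_off"}          # states where guest jobs are evicted

    def transition_probs(trace):
        counts = defaultdict(Counter)
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
        return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
                for s, nxts in counts.items()}

    def eviction_risk(trace):
        """Probability that the next observed transition leaves the machine unusable."""
        probs = transition_probs(trace).get(trace[-1], {})
        return sum(p for state, p in probs.items() if state in FAIL_STATES)

    trace = ["available", "available", "user_present", "available", "available",
             "user_present", "available", "available", "available"]
    risk = eviction_risk(trace)
    print(f"risk={risk:.2f}", "-> replicate job" if risk > 0.25 else "-> no replica needed")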

4.
We present the design, implementation, and performance evaluation of a suite of resource policing mechanisms that allow guest processes to efficiently and unobtrusively exploit otherwise idle workstation resources. Unlike traditional policies that harvest cycles only from unused machines, we employ fine-grained cycle stealing to exploit resources even from machines that have active users. We developed a suite of kernel extensions that enable these policies to operate without significantly impacting host processes: 1) a new starvation-level CPU priority for guest jobs, 2) a new page replacement policy that imposes hard bounds on physical memory usage by guest processes, and 3) a new I/O scheduling mechanism called rate windows that throttles guest processes' usage of I/O and network bandwidth. We evaluate both the individual impact of each mechanism and their combined utility for our fine-grained cycle stealing.
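The paper implements rate windows inside the kernel; the user-space Python sketch below only illustrates the underlying idea: track recent I/O in a sliding window and delay the next request whenever it would push the window over the permitted rate. The window size and rate limit are arbitrary example values.

    # Illustrative user-space sketch of the "rate windows" idea (the paper implements it as a
    # kernel mechanism): keep a sliding window of recent I/O completions and delay the next
    # request whenever the bytes moved inside the window would exceed the permitted rate.

    import time
    from collections import deque

    class RateWindow:
        def __init__(self, max_bytes_per_sec, window_sec=1.0):
            self.rate = max_bytes_per_sec
            self.window = window_sec
            self.events = deque()                       # (timestamp, nbytes) of recent operations

        def throttle(self, nbytes):
            now = time.monotonic()
            while self.events and now - self.events[0][0] > self.window:
                self.events.popleft()                   # drop operations that left the window
            used = sum(b for _, b in self.events)
            if used + nbytes > self.rate * self.window:
                time.sleep((used + nbytes) / self.rate - self.window)   # wait until under the cap
            self.events.append((time.monotonic(), nbytes))

    # Guest process capped at ~1 MB/s: each 256 KB "write" is delayed as needed.
    rw = RateWindow(max_bytes_per_sec=1_000_000)
    for _ in range(8):
        rw.throttle(256_000)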

5.
Adaptive Execution of Jobs in Computational Grid Environment
In a computational grid, jobs must adapt to the dynamically changing heterogeneous environment with the objective of maintaining quality of service. In order to enable adaptive execution of multiple jobs running concurrently in a computational grid, we propose an integrated performance-based resource management framework that is supported by a multi-agent system (MAS). The multi-agent system initially allocates the jobs onto different resource providers based on a resource selection algorithm. Later, during runtime, if the performance of any job degrades or quality of service cannot be maintained for some reason (resource failure or overloading), the multi-agent system assists the job in adapting to the system. This paper focuses on the part of our framework that supports adaptive execution. Adaptive execution is achieved through reallocation and local tuning of jobs, using both mobile and static agents. The paper summarizes the design and implementation and demonstrates the efficiency of the framework through experiments on a local grid test bed.

6.
This paper presents a resource selection system for exploiting graphics processing units (GPUs) as general-purpose computational resources in desktop Grid environments. Our system allows Grid users to share remote GPUs, which are traditionally dedicated to local users who directly see the display output. The key contribution of the paper is to develop this novel system for non-dedicated environments. We first present criteria for defining idle GPUs from the Grid users’ point of view. Based on these criteria, our system uses a screensaver approach with sensors that detect idle resources at low overhead; the key to this low overhead is avoiding GPU intervention during resource monitoring. Detected idle GPUs are then selected through a matchmaking service, making the system adaptive to the rapid advance of GPU architectures. Although the system itself is not yet interoperable with current desktop Grid systems, our idea can be applied to screensaver-based systems such as BOINC. We evaluate the system using Windows PCs with three generations of nVIDIA GPUs. The experimental results show that our system achieves a low overhead of at most 267 ms, minimizing interference to local users while maximizing the performance delivered to Grid users. Case studies in an office environment further demonstrate the effectiveness of the system in terms of the amount of idle time detected.
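A hedged sketch of the idleness criteria follows. The three sensor functions are hypothetical placeholders standing in for the paper's screensaver and host-side sensors, and the thresholds are invented; the point being illustrated is that the check never queries the GPU itself.

    # Illustrative sketch of host-side idleness criteria that avoid touching the GPU itself.
    # The three sensor functions are hypothetical placeholders for the paper's screensaver
    # and OS-level sensors; the thresholds are illustrative.

    def seconds_since_last_user_input():      # hypothetical sensor (e.g., backed by OS APIs)
        return 900.0

    def screensaver_active():                 # hypothetical sensor
        return True

    def host_cpu_utilization():               # hypothetical sensor, 0.0 .. 1.0
        return 0.12

    def gpu_is_idle_for_grid(min_idle_sec=600, max_cpu_load=0.25):
        """Declare the GPU harvestable from cheap host-side signals only, never by querying
        the GPU, so that resource monitoring does not interfere with local users."""
        return (screensaver_active()
                and seconds_since_last_user_input() >= min_idle_sec
                and host_cpu_utilization() <= max_cpu_load)

    if gpu_is_idle_for_grid():
        print("advertise this GPU to the matchmaking service")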

7.
In this paper, we propose a new method for multi-user, multicriteria job scheduling in Grid environments with QoS guarantees concerning time and cost. The main goal of our method is to find a fair schedule for jobs submitted by multiple users. To obtain a schedule that is satisfactory for each user, we aim to find a set of advance reservations (ARs) for multiple users at once. This goal is achieved through adequate use of the Ordered Weighted Averaging (OWA) operator and a Multiobjective Evolutionary Algorithm (MOEA) with carefully designed problem representation and operators. We also propose a data structure and algorithm for managing and searching resource availability time slots. The efficiency and usefulness of our approach were demonstrated by computational experiments conducted within a simulation environment.
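The Ordered Weighted Averaging step can be illustrated directly, as in the sketch below: weights are applied to the sorted per-user satisfaction values, so putting more weight on the lowest values favours fair schedules. The scores and weights are made-up examples, and in the paper the operator is embedded inside the evolutionary search rather than used standalone.

    # Illustrative OWA aggregation over per-user satisfaction scores (the paper embeds this
    # inside a multiobjective evolutionary algorithm; the weights and scores below are
    # made-up examples). Emphasising the worst-off users pushes the search toward fair schedules.

    def owa(values, weights):
        """Ordered Weighted Averaging: weights are applied to the *sorted* values."""
        assert len(values) == len(weights) and abs(sum(weights) - 1.0) < 1e-9
        return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

    user_satisfaction = [0.9, 0.4, 0.7]       # one score per user for a candidate schedule
    fair_weights = [0.1, 0.3, 0.6]            # most weight on the worst-served user
    print(round(owa(user_satisfaction, fair_weights), 4))   # 0.1*0.9 + 0.3*0.7 + 0.6*0.4 = 0.54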

8.
Grids consist of the aggregation of numerous dispersed computational, storage and network resources, able to satisfy even the most demanding computing jobs. Due to the data-intensive nature of Grid jobs, there is an increasing interest in Grids using optical transport networks as this technology allows for the timely delivery of large amounts of data. Such Grids are commonly referred to as Lambda Grids.

An important aspect of Grid deployment is the allocation and activation of installed network capacity, needed to transfer data and jobs to and from remote resources. However, the exact nature of a Grid’s network traffic depends on the way arriving workload is scheduled over the various Grid sites. As Grids may feature large numbers of resources, jobs and users, solving the combined Grid network dimensioning and workload scheduling problem requires scalable mathematical methods such as Divisible Load Theory (DLT). Lambda Grids introduce additional complexity, since wavelength granularity and wavelength continuity or conversion constraints must be enforced. Additionally, Grid resources cannot be expected to be available at all times. Therefore, the extra complexity of resilience against possible resource failures must be taken into account when modelling the combined Grid network dimensioning and workload scheduling problem, reinforcing the need for scalable solution methods. In this work, we tackle the combined Lambda Grid dimensioning and workload scheduling problem and incorporate single-resource failure or unavailability scenarios. We use Divisible Load Theory to address the scalability problem and compare non-resilient Lambda Grid dimensioning with the dimensions needed to survive single-resource failures. We distinguish three failure scenarios relevant to Lambda Grid deployment: computational element, network link and optical cross-connect failure. Using regular network topologies, we derive analytical bounds on the dimensioning cost. To validate these bounds, we present comparisons of the resulting Grid dimensions assuming 2-tier Grid operation as a function of varying wavelength granularity, fiber/wavelength cost models, traffic demand asymmetry and Grid scheduling strategy for a specific set of optical transport networks.


9.
Research on a Resource and Job Description Language Based on the Grid Computing Market Model
The Grid computing market model applies economic concepts to Grid resource management and job scheduling. This paper analyses the requirements that the Grid computing market model places on resource and job description languages, briefly introduces the Classified Advertisements (ClassAd) language, and points out its shortcomings in describing resources and jobs under the market model. Corresponding improvements and extensions are then proposed so that resources and jobs can be described more flexibly and at a finer granularity under an economic model.
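Since the improved description language itself is not reproduced here, the Python sketch below only illustrates the ClassAd-style symmetric matchmaking idea extended with a price attribute, which is the flavour of extension the abstract describes. All attribute names, the budget rule and the cheapest-offer tie-break are assumptions for illustration.

    # Illustrative Python sketch of ClassAd-style symmetric matchmaking extended with an
    # economic attribute: each side publishes attributes plus a requirements predicate over
    # the other side's advertisement. Attribute names and the price rule are assumptions,
    # not the paper's actual language extension.

    job_ad = {
        "Cpus": 4, "MemoryMB": 8192, "BudgetPerHour": 0.50,
        "Requirements": lambda res: res["Cpus"] >= 4 and res["MemoryMB"] >= 8192
                                    and res["PricePerHour"] <= 0.50,
    }
    resource_ads = [
        {"Name": "nodeA", "Cpus": 8, "MemoryMB": 16384, "PricePerHour": 0.40,
         "Requirements": lambda job: job["BudgetPerHour"] >= 0.30},
        {"Name": "nodeB", "Cpus": 2, "MemoryMB": 4096, "PricePerHour": 0.10,
         "Requirements": lambda job: True},
    ]

    def match(job, resources):
        """Two-way match: the job's and the resource's requirements must both hold;
        among the candidates, prefer the cheapest offer."""
        ok = [r for r in resources if job["Requirements"](r) and r["Requirements"](job)]
        return min(ok, key=lambda r: r["PricePerHour"])["Name"] if ok else None

    print(match(job_ad, resource_ads))    # -> nodeA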

10.
Online algorithms for advance resource reservations
We consider the problem of providing QoS guarantees to Grid users through advance reservation of resources. Advance reservation mechanisms provide the ability to allocate resources to users based on agreed-upon QoS requirements and increase the predictability of a Grid system, yet incorporating such mechanisms into current Grid environments has proven to be a challenging task due to the resulting resource fragmentation. We use concepts from computational geometry to present a framework for tackling the resource fragmentation, and for formulating a suite of scheduling strategies. We also develop efficient implementations of the scheduling algorithms that scale to large Grids. We conduct a comprehensive performance evaluation study using simulation, and we present numerical results to demonstrate that our strategies perform well across several metrics that reflect both user- and system-specific goals. Our main contribution is a timely, practical, and efficient solution to the problem of scheduling resources in emerging on-demand computing environments.
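The paper formulates slot management with computational geometry; the simplified sketch below conveys only the basic admission question for an advance reservation: find the earliest start time at which the requested units fit for the whole duration, given the reservations already booked on a fixed-capacity resource.

    # Simplified illustration of advance-reservation admission (the paper's framework uses
    # computational geometry; this is a plain list-based earliest-fit search). Reservations
    # are (start, end, units) tuples on a resource with a fixed capacity.

    def units_in_use(reservations, t):
        return sum(u for s, e, u in reservations if s <= t < e)

    def earliest_slot(reservations, capacity, duration, units, earliest_start, horizon):
        """Return the earliest start >= earliest_start at which `units` fit for `duration`."""
        # Usage only drops at reservation end times, so it suffices to try the requested
        # start and every later reservation end as candidate start times.
        candidates = sorted({earliest_start} | {e for _, e, _ in reservations if e > earliest_start})
        edges = sorted({s for s, _, _ in reservations} | {e for _, e, _ in reservations})
        for start in candidates:
            if start + duration > horizon:
                break
            probe_points = [start] + [t for t in edges if start < t < start + duration]
            if all(units_in_use(reservations, t) + units <= capacity for t in probe_points):
                return start
        return None

    booked = [(0, 10, 4), (5, 15, 2)]          # two existing reservations on a 6-unit resource
    print(earliest_slot(booked, capacity=6, duration=5, units=3, earliest_start=0, horizon=100))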

11.
Grid computing is a new-generation computing platform proposed for solving large-scale, resource-intensive problems and is a current direction in parallel and distributed processing, with resource management being one of the key technologies of computational Grids. Integrating and managing the wide variety of available resources is the foundation of Grid applications, but the distribution, dynamism, heterogeneity and autonomy of these resources, together with the need for coordinated consistency, make Grid resource management and scheduling a difficult problem. Market-based economic resource management and scheduling algorithms are well suited to the resource management problem in computational Grids, but they suffer from issues such as fixed scheduling prices and poor load balancing. This paper proposes an economic-model-based resource broker for Grid environments that relies on a multi-dimensional QoS-guided scheduling policy and heuristic, economic-model-driven adjustment of resource prices to improve and optimise the allocation of computational Grid resources.
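The broker described above combines price adjustment with QoS-guided selection. The sketch below is an illustrative rendering of those two ingredients under assumed formulas: a load-driven price nudge and a weighted QoS score evaluated over affordable resources; none of the weights or attributes come from the paper.

    # Illustrative sketch of the broker's two ingredients described above: a heuristic that
    # nudges each resource's price with its load, and a multi-dimensional QoS score used to
    # pick a resource within the user's budget. Weights, formulas and attributes are
    # illustrative assumptions, not the paper's exact broker.

    def adjust_price(price, utilization, target=0.7, step=0.1):
        """Raise the price when a resource is hotter than the target load, lower it otherwise."""
        return max(0.01, price * (1 + step * (utilization - target)))

    def qos_score(res, weights):
        return (weights["speed"] * res["speed"]
                - weights["price"] * res["price"]
                - weights["queue"] * res["queue_len"])

    resources = [
        {"name": "siteA", "speed": 1.0, "price": 0.30, "queue_len": 5, "utilization": 0.9},
        {"name": "siteB", "speed": 0.6, "price": 0.15, "queue_len": 1, "utilization": 0.4},
    ]
    weights = {"speed": 1.0, "price": 2.0, "queue": 0.05}
    budget = 0.25

    for r in resources:                                   # periodic price update
        r["price"] = round(adjust_price(r["price"], r["utilization"]), 4)

    affordable = [r for r in resources if r["price"] <= budget]
    if affordable:
        best = max(affordable, key=lambda r: qos_score(r, weights))
        print(best["name"], best["price"])                # siteB stays within budget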

12.
We consider a scheduling problem where jobs have to be carried out by parallel identical machines. The attributes of a job j are a fixed start time s_j, a fixed finish time f_j, and a resource requirement r_j. Every machine owns R units of a renewable resource necessary to carry out jobs. A machine can process more than one job at a time, provided the resource consumption does not exceed R. The jobs must be processed in a non-preemptive way. Within this setting, the problem is to decide whether a feasible schedule for all jobs exists or not. We discuss this decision problem and prove that it is strongly NP-complete even when the number of resources is fixed to any value R ≥ 2. Moreover, we suggest an implicit enumeration algorithm whose time complexity is O(n log n) in the number n of jobs when the number m of machines and the number R of resources per machine are fixed. The roles of storage layout and preemption are also discussed.
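The existence question is strongly NP-complete, but checking a candidate assignment is straightforward, as the sketch below shows: for each machine, sweep the fixed start and finish events of the jobs placed on it and verify that the summed requirement never exceeds R. The assignment and values are made-up examples.

    # Verify a *given* assignment of jobs to machines (the existence question itself is
    # NP-complete): sweep each machine's start/finish events and check that the summed
    # resource requirement never exceeds R. Job tuples are (s_j, f_j, r_j).

    def assignment_feasible(assignment, R):
        """assignment: dict machine -> list of (start, finish, requirement) jobs placed on it."""
        for machine, jobs in assignment.items():
            events = []
            for s, f, r in jobs:
                events.append((s, r))          # job starts: requirement goes up
                events.append((f, -r))         # job finishes: requirement goes down
            events.sort(key=lambda e: (e[0], e[1]))   # process releases before starts at ties
            load = 0
            for _, delta in events:
                load += delta
                if load > R:
                    return False
        return True

    jobs_on_m1 = [(0, 4, 1), (1, 3, 1), (2, 5, 1)]
    print(assignment_feasible({"m1": jobs_on_m1}, R=2))   # False: three jobs overlap at t=2
    print(assignment_feasible({"m1": jobs_on_m1}, R=3))   # True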

13.
Use of virtualization in Infrastructure as a Service (IaaS) environments provides benefits to both users and providers: users can make use of resources following a pay-per-use model and negotiate performance guarantees, whereas providers can provide quick, scalable and hardware-fault-tolerant service and also utilize resources efficiently and economically. With increased acceptance of virtualization-based systems, an important issue is that of virtual machine migration-enabled consolidation and dynamic resource provisioning. Effective resource provisioning can result in higher gains for users and providers alike. Most hosted applications (for example, web services) are multi-tiered and can benefit from their various tiers being hosted on different virtual machines. These mutually communicating virtual machines may get colocated on the same physical machine or placed on different machines as part of consolidation and flexible provisioning strategies. In this work, we argue the need for network affinity-awareness in resource provisioning for virtual machines. First, we empirically quantify the change in CPU resource usage due to colocation or dispersion of communicating virtual machines for both the Xen and KVM virtualization technologies. Next, we build models based on these empirical measurements to predict the change in CPU utilization when transitioning between colocated and dispersed placements. Because the modeling process is independent of the virtualization technology and of specific applications, the resulting model is generic and application-agnostic. Via extensive experimentation, we evaluate the applicability of our models for synthetic and benchmark application workloads. We find that the models have high prediction accuracy, with a maximum prediction error within 2% absolute CPU usage.
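A minimal sketch of the modelling step, under the assumption of a linear relation: fit coefficients from a few invented calibration measurements of colocated CPU usage and inter-VM traffic, then predict the CPU usage expected after dispersing the VMs. The numbers and the exact feature set are illustrative, not the paper's measurements or model form.

    # Illustrative sketch of the modelling step: fit a linear relation between CPU usage
    # measured under colocated placement plus the inter-VM traffic rate, and CPU usage under
    # dispersed placement. The calibration points are made up, and the linear form is an
    # assumption for illustration.

    import numpy as np

    # (cpu_colocated %, inter-VM traffic Mbps) -> cpu_dispersed %
    calibration = [
        ((20.0,  50.0), 24.0),
        ((35.0, 100.0), 43.0),
        ((50.0, 150.0), 62.0),
        ((60.0, 200.0), 76.0),
    ]
    X = np.array([[c, t, 1.0] for (c, t), _ in calibration])   # [cpu, traffic, intercept]
    y = np.array([d for _, d in calibration])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict_dispersed_cpu(cpu_colocated, traffic_mbps):
        return float(coef @ [cpu_colocated, traffic_mbps, 1.0])

    print(f"{predict_dispersed_cpu(40.0, 120.0):.1f}% CPU expected after separating the VMs")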

14.
Traditional resource management techniques (resource allocation, admission control and scheduling) have been found to be inadequate for many shared Grid and distributed systems that consist of autonomous, dynamic distributed resources contributed by multiple organisations. They provide no incentive for users to request resources judiciously and appropriately, and do not accurately capture the true value, importance and deadline (the utility) of a user’s job. Furthermore, they provide no compensation for resource providers to contribute their computing resources to shared Grids, as traditional approaches have a user-centric focus on maximising throughput and minimising waiting time rather than maximising a provider’s own benefit. Consequently, researchers and practitioners have been examining the appropriateness of ‘market-inspired’ resource management techniques to address these limitations. Such techniques aim to smooth out access patterns and reduce the chance of transient overload by providing a framework for users to be truthful about their resource requirements and job deadlines, and by offering incentives for service providers to prioritise urgent, high-utility jobs over low-utility jobs. We examine the recent innovations in these systems (from 2000 to 2007), looking at the state of the art in price setting and negotiation, Grid economy management and utility-driven scheduling and resource allocation, and identify the advantages and limitations of these systems. We then look to the future of these systems, examining the emerging ‘Catallaxy’ market paradigm. Finally, we consider the future directions that need to be pursued to address the limitations of the current generation of market-oriented Grids and utility computing systems.

15.
In many domains, the previous decade was characterized by increasing data volumes and growing complexity of data analyses, creating new demands for batch processing on distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as execution duration, required memory or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i.e., methods that can be applied to any workload without modification, since the workload is usually a black box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, limitations and challenges, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field.

16.
Production parallel systems are space-shared, and resource allocation on such systems is usually performed using a batch queue scheduler. Jobs submitted to the batch queue experience a variable delay before the requested resources are granted. Predicting this delay can assist users in planning experiment time frames and choosing sites with shorter turnaround times, and can also help meta-schedulers make scheduling decisions. In this paper, we present an integrated adaptive framework, Qespera, for prediction of queue waiting times on parallel systems. We propose a novel algorithm based on spatial clustering for predictions using the history of job submissions and executions. The framework uses an adaptive set of strategies: choosing either distributions or summaries of features to represent the system state and to compare with history jobs, varying the weights associated with the features for each job prediction, and selecting a particular algorithm dynamically for performing the prediction depending on the characteristics of the target and history jobs. Our experiments with real workload traces from different production systems demonstrate up to 22% reduction in average absolute error and up to 56% reduction in percentage prediction error over existing techniques. We also report prediction errors of less than 1 h for a majority of the jobs.
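The sketch below illustrates the general history-based idea in a single fixed configuration: predict a job's wait as a similarity-weighted average over past jobs, with similarity computed from weighted job and system-state features. The features, weights and kernel are assumptions; Qespera's contribution is precisely that it adapts such choices per prediction.

    # Illustrative sketch of history-based wait-time prediction: estimate a new job's queue
    # wait as the similarity-weighted average over past jobs, where similarity compares job
    # features and the system state at submission.

    import math

    history = [  # (requested_cpus, requested_hours, queued_jobs_at_submit, observed_wait_hours)
        (64, 2.0, 10, 0.5), (256, 12.0, 40, 6.0), (128, 6.0, 25, 2.5), (32, 1.0, 5, 0.2),
    ]
    feature_weights = (0.01, 0.3, 0.05)          # scales CPU count, runtime, queue depth

    def similarity(a, b):
        d2 = sum(w * (x - y) ** 2 for w, x, y in zip(feature_weights, a, b))
        return math.exp(-d2)                     # Gaussian kernel on the weighted distance

    def predict_wait(target):
        num = den = 0.0
        for *features, wait in history:
            s = similarity(target, tuple(features))
            num, den = num + s * wait, den + s
        return num / den

    print(f"predicted wait: {predict_wait((96, 4.0, 20)):.2f} hours")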

17.
Resource Dissemination and Discovery Mechanisms in Computational Grids
The resource management system is one of the principal services that a computational Grid must provide in order to realise resource sharing across the Grid. Its basic functions are to accept resource requests from machines within the Grid, allocate the requested resources to the requesters, and schedule those resources appropriately so that the jobs requesting them can run. Resource dissemination, resource discovery and resource scheduling constitute the main content of a computational Grid resource management system. Resource dissemination and discovery provide the means by which machines inside the Grid can form a view of the available resources and their states.

18.
This paper investigates the interactions between agents representing Grid users and the providers of Grid resources, with the aim of maximizing the aggregate utility of all Grid users in a computational Grid. It proposes a price-based resource allocation model that maximizes the utility of Grid users and providers. Existing distributed resource allocation schemes assume the resource provider to be capable of measuring user resource demand and of calculating and communicating prices, neither of which actually holds in practice. This paper addresses these challenges as follows. First, a Grid user's utility is defined as a function of the resource units allocated to that user. We formalize resource allocation using nonlinear optimization theory, incorporating both Grid resource capacity constraints and job completion times; an optimal solution maximizes the aggregate utility of all Grid users. Second, this paper proposes a new optimization-based Grid resource pricing algorithm for allocating resources to Grid users while maximizing the revenue of Grid providers. Simulation results show that the proposed algorithm is more efficient than the compared allocation scheme. Li Chunlin received the ME in computer science from Wuhan Transportation University in 2000, and the PhD degree in Computer Software and Theory from Huazhong University of Science and Technology in 2003. She is now an associate professor of Computer Science at Wuhan University of Technology. Her research interests include computational grid, distributed computing and mobile agents. She has published over 15 papers in international journals. Li Layuan received the BE degree in Communication Engineering from Harbin Institute of Military Engineering, China, in 1970 and the ME degree in Communication and Electrical Systems from Huazhong University of Science and Technology, China, in 1982. Since 1982, he has been with the Wuhan University of Technology, China, where he is currently a Professor and PhD tutor of Computer Science, and Editor in Chief of the Journal of WUT. He is Director of International Society of High-Technol and a paper reviewer for IEEE INFOCOM, ICCC and ISRSDC. His research interests include high speed computer networks, protocol engineering and image processing. Professor Li has published over 150 technical papers and is the author of six books. He was also awarded the National Special Prize by the Chinese Government in 1993.
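A small price-iteration sketch of the idea, under an assumed logarithmic utility: each user demands the amount that maximises utility minus payment at the current price, and the provider moves the price with excess demand until demand matches capacity. The utility form, weights and step size are illustrative, not the paper's formulation.

    # Illustrative price-iteration sketch of the idea above: each user requests the amount of
    # resource that maximises a concave utility minus payment at the current price, and the
    # provider raises or lowers the price with excess demand until demand meets capacity.

    weights = [4.0, 2.0, 1.0]       # per-user utility weights, u_i(x) = w_i * ln(1 + x)
    capacity = 10.0                 # resource units offered by the provider
    price, step = 1.0, 0.05

    def demand(w, p):
        # argmax_x  w*ln(1+x) - p*x  ->  x = w/p - 1, never negative
        return max(0.0, w / p - 1.0)

    for _ in range(200):                                   # tatonnement iterations
        demands = [demand(w, price) for w in weights]
        excess = sum(demands) - capacity
        if abs(excess) < 1e-6:
            break
        price = max(1e-6, price + step * excess)           # raise price when over-demanded

    print(f"clearing price {price:.3f}, allocation {[round(d, 2) for d in demands]}")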

19.
The popularity and availability of Internet connectivity have opened up opportunities for network-centric collaborative work that were impossible a few years ago. Contending traffic flows in this collaborative scenario share different kinds of resources, such as network links, buffers, and router CPU. The goal should hence be overall fairness in the allocation of multiple resources rather than fairness in any single resource. In this paper, we firstly present a novel QoS-aware resource scheduling algorithm called the Weighted Composite Bandwidth and CPU Scheduler (WCBCS), which jointly allocates the fair share of the link bandwidth as well as the processing resource to all competing flows. WCBCS also uses a simple and adaptive online prediction scheme for reliably estimating the processing times of incoming data packets. Secondly, we present analytical results, extensive NS-2 simulation work, and experimental results from our implementation on the Intel IXP2400 network processor. The simulation and implementation results show that our low-complexity scheduling algorithm can efficiently maximise CPU and bandwidth utilisation while maintaining guaranteed Quality of Service (QoS) for each individual flow.
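The sketch below conveys the composite-fairness idea in simplified form: charge every flow for both the link time and the (predicted) CPU time its packets consume, and always serve the backlogged flow with the least weighted composite service received so far. It is not the WCBCS algorithm itself, and the flow parameters are invented.

    # Illustrative sketch of composite fair scheduling: charge each flow for both the link
    # time and the CPU time its packet consumes, and always serve the backlogged flow with
    # the least weighted composite service so far. A simplification for illustration only.

    LINK_BPS = 100e6          # link capacity used to convert bytes into transmission time
    flows = {                 # per-flow weight plus a queue of (bytes, predicted_cpu_seconds)
        "voip": {"weight": 2.0, "queue": [(200, 1e-5)] * 4, "service": 0.0},
        "bulk": {"weight": 1.0, "queue": [(1500, 5e-6)] * 4, "service": 0.0},
    }

    def composite_cost(nbytes, cpu_s):
        return nbytes * 8 / LINK_BPS + cpu_s          # seconds of link time + seconds of CPU

    schedule = []
    while any(f["queue"] for f in flows.values()):
        # pick the backlogged flow with the smallest normalised composite service
        name = min((n for n, f in flows.items() if f["queue"]),
                   key=lambda n: flows[n]["service"] / flows[n]["weight"])
        nbytes, cpu_s = flows[name]["queue"].pop(0)
        flows[name]["service"] += composite_cost(nbytes, cpu_s)
        schedule.append(name)

    print(schedule)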

20.
In this paper we study the single-machine batch scheduling problem under batch availability, where both setup and job processing times are controllable by allocating a continuously divisible nonrenewable resource. Under batch availability a set of jobs is processed contiguously and completed together, when the processing of the last job in the batch is finished. We present polynomial time algorithms to find the job sequence, the partition of the job sequence into batches and the resource allocation, which minimize the total completion time or the total production cost (inventory plus resource costs).
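For concreteness, the completion rule stated above can be written out as follows (the notation is introduced here, not taken from the paper): with the sequenced jobs partitioned into batches $B_1, \dots, B_k$ processed in that order, setup times $s_i$ and job processing times $p_j$ (both resource-dependent), every job of batch $B_i$ completes at

    C_{B_i} = \sum_{l=1}^{i} \Bigl( s_l + \sum_{j \in B_l} p_j \Bigr)

so the total completion time objective is $\sum_{i=1}^{k} |B_i| \, C_{B_i}$, while the production-cost variant additionally accounts for the cost of the allocated resource.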
