Similar literature
20 similar articles found (search time: 15 ms)
1.
All existing fault-tolerant job scheduling algorithms for computational grids were proposed under the assumption that all sites apply the same fault-tolerance strategy. They ignore that each grid site may have its own fault-tolerance strategy, because each site is itself an autonomous domain. In fact, it is very common for multiple fault-tolerance strategies to be adopted at the same time in a large-scale computational grid. Different fault-tolerance strategies may have different hardware and software requirements. For instance, if a grid site employs the job checkpointing mechanism, each computational node must be able to periodically transmit the transient state of the job execution to the server. If a job fails, it migrates to another computational node and resumes from the last stored checkpoint. Therefore, in this paper we propose a genetic algorithm for job scheduling that addresses the heterogeneity of fault-tolerance mechanisms in a computational grid. We assume that the system supports four kinds of fault-tolerance mechanisms: job retry, job migration without checkpointing, job migration with checkpointing, and job replication. Because each fault-tolerance mechanism has different requirements for gene encoding, we also propose a new chromosome encoding approach that integrates the four kinds of mechanisms in a chromosome. The risky nature of the grid environment is also taken into account in the algorithm. The risk relationship between jobs and nodes is defined by the security demand and the trust level. Simulation results show that our algorithm achieves a shorter makespan and is more effective at reducing the job failure rate than the Min–Min and sufferage algorithms.

2.
Grid computing utilizes distributed heterogeneous resources to support complicated computing problems. Grids can be classified into two types: computing grids and data grids. Job scheduling in a computing grid is a very important problem. To utilize grids efficiently, we need a good job scheduling algorithm to assign jobs to resources in grids. In the natural environment, ants have a tremendous ability to team up to find an optimal path to food resources. An ant algorithm simulates the behavior of ants. In this paper, we propose a Balanced Ant Colony Optimization (BACO) algorithm for job scheduling in the Grid environment. The main contributions of our work are to balance the entire system load while trying to minimize the makespan of a given set of jobs. According to the experimental results, BACO outperforms the other job scheduling algorithms it was compared with.
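The makespan that the abstract above aims to minimize can be illustrated with a small calculation. This is only a sketch of the metric, not of BACO itself; the function name, the list-based job-to-resource assignment, and the sample numbers are hypothetical.

```python
def makespan(job_times, assignment, num_resources):
    """Return the finishing time of the most heavily loaded resource.

    job_times[i]  -- execution time of job i (hypothetical units)
    assignment[i] -- index of the resource that job i is assigned to
    """
    loads = [0.0] * num_resources
    for job, resource in enumerate(assignment):
        # Jobs on the same resource are assumed to run one after another.
        loads[resource] += job_times[job]
    return max(loads)


# Moving the 2-unit job off the busy resource balances the load
# and shortens the makespan.
unbalanced = makespan([3, 2, 4], [0, 0, 1], num_resources=2)
balanced = makespan([3, 2, 4], [0, 1, 1], num_resources=2)
```

A load-balancing scheduler would prefer the assignment with the smaller maximum load, since that value is the makespan.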

3.
Executing large-scale applications in distributed computing infrastructures (DCI), for example modern Cloud environments, involves optimization of several conflicting objectives such as makespan, reliability, energy, or economic cost. Despite this trend, scheduling in heterogeneous DCIs has been traditionally approached as a single or bi-criteria optimization problem. In this paper, we propose a generic multi-objective optimization framework supported by a list scheduling heuristic for scientific workflows in heterogeneous DCIs. The algorithm approximates the optimal solution by considering user-specified constraints on objectives in a dual strategy: maximizing the distance to the user’s constraints for dominant solutions and minimizing it otherwise. We instantiate the framework and algorithm for a four-objective case study comprising makespan, economic cost, energy consumption, and reliability as optimization goals. We implemented our method as part of the ASKALON environment (Fahringer et al., 2007) for Grid and Cloud computing and demonstrate through extensive real and synthetic simulation experiments that our algorithm outperforms related bi-criteria heuristics while meeting the user constraints most of the time.

4.
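How to further maximize resource utilization in cloud computing environments is a current research hotspot. A resource allocation model for the cloud computing environment is established, the bat algorithm is used for cloud resource scheduling, and the concept of membrane computing is introduced, yielding a bat algorithm based on membrane computing. The membrane system is decomposed into a main membrane and auxiliary membranes: individual bats perform local optimization within the auxiliary membranes, and the optimized individuals are passed to the main membrane for global optimization, thereby meeting the requirements of optimized cloud resource allocation. Simulation comparisons with other algorithms on the CloudSim platform show that the algorithm improves system processing time and efficiency in the cloud computing environment and makes resource allocation more reasonable.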

5.
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential moving average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict dynamic contents in distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, which is a distributed, Grid-enabled, multiple query processing middleware system we developed to optimize query processing for data analysis and visualization applications.
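As a rough illustration of the exponential moving average mentioned in the abstract above, the sketch below smooths a stream of numeric observations. The abstract does not say which quantity is averaged or which smoothing factor is used, so the function name, the `alpha` parameter, and the sample data are assumptions made only for this example.

```python
def exponential_moving_average(observations, alpha=0.5):
    """Return the running exponential moving average of a numeric sequence.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values give
    more weight to recent observations.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    averages = []
    current = None
    for value in observations:
        if current is None:
            # The first observation seeds the average.
            current = value
        else:
            current = alpha * value + (1.0 - alpha) * current
        averages.append(current)
    return averages


# Hypothetical signal: 1.0 if a cached result was reused in a time step, else 0.0.
smoothed = exponential_moving_average([1.0, 0.0, 1.0], alpha=0.5)
```

A scheduler could treat the latest smoothed value as a forecast of how likely a server is to still hold a useful cached result, though how MQO does this is not described in the abstract.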

6.
A hybrid genetic algorithm for the job shop scheduling problems   Total citations: 19, self-citations: 0, citations by others: 19
The Job Shop Scheduling Problem (JSSP) is one of the most general and difficult of all traditional scheduling problems. The goal of this research is to develop an efficient scheduling method based on genetic algorithms to address the JSSP. We design a scheduling method based on a Single Genetic Algorithm (SGA) and a Parallel Genetic Algorithm (PGA). In the scheduling method, the representation, which encodes the job number, is made to be always feasible; the initial population is generated by integrating the representation with the G&T algorithm; new genetic operators and a new selection method are designed to better transmit the temporal relationships in the chromosome; and an island-model PGA is proposed. The scheduling methods based on genetic algorithms are tested on five standard JSSP benchmarks. The results are compared with other proposed approaches. Compared to the traditional genetic algorithm, the proposed approach yields a significant improvement in solution quality. The superior results indicate the successful incorporation of a method to generate the initial population into the genetic operators.

7.
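Rational resource scheduling in cloud computing environments is a current research hotspot. To address the shortcomings of the particle swarm optimization algorithm, membrane computing theory is introduced, and a cloud resource scheduling algorithm based on particle swarm optimization improved by membrane computing (PSO-MC) is proposed. The cloud resource scheduling problem is analyzed and its objective function is established. Inspired by membrane computing, the particles are placed in membranes: particles in the main membrane perform refined local optimization, while particles in the auxiliary membranes perform global search. Search results are exchanged between membrane regions to find the optimal solution of the cloud resource scheduling problem. Simulation experiments on the algorithm are carried out on the CloudSim platform. The results show that the PSO-MC algorithm reduces the average task completion time, improves task processing efficiency, and makes cloud computing resource scheduling more reasonable.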

8.
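XU Xiaoyong, PAN Yu, LING Chen. Journal of Computer Applications (《计算机应用》), 2012, 32(7): 1913-1915
In cloud computing environments, how to schedule resources effectively and shorten task execution time while also reducing energy consumption has become an important problem. To this end, an energy-saving scheduling model is built with task execution time and energy consumption as the optimization objectives. The non-dominated sorting genetic algorithm (NSGA-II) is improved by adopting a special population initialization method and introducing a learning mechanism, and is then applied to the energy-saving scheduling problem in cloud computing. Finally, tests on numerical examples verify that the proposed algorithm can reduce task execution time while effectively lowering energy consumption.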

9.
By using the notion of an elite pool, this paper presents an effective asexual genetic algorithm for solving the job shop scheduling problem. Based on mutation operations, the algorithm selectively picks the solution with the highest quality from the pool, modifies it, and can then replace the solution with the lowest quality with the modified solution. The elite pool is initially filled with a number of non-delay schedules, and then, in each iteration, the best solution of the elite pool is removed and mutated in a biased fashion by running a limited tabu search procedure. A decision strategy which balances exploitation versus exploration determines (i) whether any intermediate solution along the run of tabu search should join the elite pool, and (ii) whether, when a new solution joins the pool, the worst solution should leave the pool. The genetic algorithm procedure is repeated until either a time limit is reached or the elite pool becomes empty. The results of extensive computational experiments on the benchmark instances indicate that the success of the procedure significantly depends on the employed mechanism of updating the elite pool. In these experiments, the optimal value of the well-known 10 × 10 instance, ft10, is obtained in 0.06 s. Moreover, for larger problems, solutions within one percent of the best known solutions are achieved within several seconds.

10.
Large scale distributed systems typically comprise hundreds to millions of entities (applications, users, companies, universities) that have only a partial view of resources (computers, communication links). How to fairly and efficiently share such resources between entities in a distributed way has thus become a critical question.

11.
Assembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design of a fault-tolerant and trust-aware scheduler, which allows Bag-of-Tasks applications to be executed on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named the Promethee scheduler, combines a pull-based scheduler with the multi-criteria Promethee decision-making algorithm. Because multi-criteria scheduling multiplies the number of possible scheduling strategies, we propose SOFT, a methodology for finding the optimal scheduling strategies given a set of application requirements. The method is validated with a simulator that fully implements the Promethee scheduler and recreates a hybrid DCI environment including an Internet Desktop Grid, a Cloud, and a Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed according to three distinct criteria (price, expected completion time, and trust) while maximizing the useful employment of the infrastructure from the resource owner's point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making it realistic to integrate the scheduler into a wide range of resource management software.

12.
This paper investigates the design of fault-tolerant TDMA-based data aggregation scheduling (DAS) protocols for wireless sensor networks (WSNs). DAS is a fundamental pattern of communication in wireless sensor networks where sensor nodes aggregate and relay data to a sink node. However, any such DAS protocol needs to be cognisant of the fact that crash failures can occur. We make the following contributions: (i) we identify a necessary condition to solve the DAS problem, (ii) we introduce a strong and weak version of the DAS problem, (iii) we show several impossibility results due to the crash failures, (iv) we develop a modular local algorithm that solves stabilising weak DAS and (v) we show, through simulations and an actual deployment on a small testbed, how specific instantiations of parameters can lead to the algorithm achieving very efficient stabilisation.

13.
Easy proofs are given of the impossibility of solving several consensus problems (Byzantine agreement, weak agreement, Byzantine firing squad, approximate agreement and clock synchronization) in certain communication graphs. It is shown that, in the presence of m faults, no solution to these problems exists for communication graphs with fewer than 3m+1 nodes or less than 2m+1 connectivity. While some of these results had previously been proved, the new proofs are much simpler, provide considerably more insight, apply to more general models of computation, and (particularly in the case of clock synchronization) significantly strengthen the results.
Michael J. Fischer is currently Professor of Computer Science at Yale University, New Haven, CT, where he heads the Theory of Computation Group. He is also Editor-in-Chief of the Journal of the Association for Computing Machinery. His research interests include theory of distributed systems, cryptographic protocols, and computational complexity. Dr. Fischer received the B.S. degree in mathematics from the University of Michigan, Ann Arbor, in 1963, and the M.A. and Ph.D. degrees in applied mathematics from Harvard University, Cambridge, MA, in 1965 and 1968, respectively. He has taught previously at Carnegie-Mellon University, the Massachusetts Institute of Technology, and the University of Washington.
Nancy Lynch is currently Associate Professor of Computer Science at M.I.T., and heads the Theory of Distributed Systems group in M.I.T.'s Laboratory for Computer Science. Her interests are in all aspects of distributed computing theory, including formal models, algorithms, analysis, and correctness proofs. Dr. Lynch received the B.S. degree in mathematics from Brooklyn College in 1968 and the Ph.D. degree in mathematics from M.I.T. in 1972. She has served on the faculty of Tufts University, the University of Southern California, Florida International University, and Georgia Tech.
Michael Merritt is currently a member of the technical staff with AT&T Bell Laboratories. During the 1984–85 academic year, he was a visiting lecturer at M.I.T., sponsored by Bell Labs. His research interests include distributed computation, cryptography and security. Dr. Merritt received the B.S. degree in computer science and philosophy from Yale in 1978 and the M.Sc. and Ph.D. degrees in 1980 and 1983, respectively, both in information and computer science from Georgia Tech. He is a member of SIGACT and of Computer Professionals for Social Responsibility.
This paper has appeared in the ACM Conference Proceedings of PODC 1985. © 1985, Association for Computing Machinery, reprinted by permission.
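The two lower bounds stated in the abstract of item 13 (at least 3m+1 nodes and at least 2m+1 connectivity for m faults) can be written as a small check. This is only a sketch of the necessary conditions as quoted there; the function names and the example values are hypothetical, and passing the check does not by itself mean a protocol exists for a particular model.

```python
def passes_lower_bounds(num_nodes, connectivity, m):
    """Check the necessary conditions quoted in the abstract for m faults:
    at least 3m+1 nodes and at least 2m+1 connectivity."""
    return num_nodes >= 3 * m + 1 and connectivity >= 2 * m + 1


def max_faults_allowed_by_bounds(num_nodes, connectivity):
    """Largest m for which both lower bounds still hold (0 if none)."""
    by_nodes = (num_nodes - 1) // 3
    by_connectivity = (connectivity - 1) // 2
    return max(0, min(by_nodes, by_connectivity))


# A 7-node graph with connectivity 3 is limited by its connectivity,
# not by its node count.
limit = max_faults_allowed_by_bounds(7, 3)
```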

14.
In its simplest structure, cloud computing technology is a massive collection of connected servers residing in a datacenter and continuously changing to provide services to users on demand through a front-end interface. The failure of a task during execution is no longer an accident but a frequent attribute of scheduling systems in a large-scale distributed environment. Recently, computational intelligence techniques have been widely used to solve scheduling problems in the cloud environment, but only a few emphasize the issue of fault tolerance. This research paper puts forward a Checkpointed League Championship Algorithm (CPLCA) scheduling scheme to be used in the cloud computing system. It is a fault-tolerance-aware task scheduling mechanism that uses the checkpointing strategy in addition to task migration against unexpected failures in the execution of independent tasks. The simulation results show that the proposed CPLCA scheme produces an improvement of 41%, 33% and 23% as compared with Ant Colony Optimization (ACO), the Genetic Algorithm (GA) and the basic league championship algorithm (LCA) respectively, as measured by the total average makespan of the schemes. Considering the total average response time of the schemes, the CPLCA scheme produces an improvement of 54%, 57% and 30% as compared with ACO, GA and LCA respectively. It also yields a significant decrease in job execution failures, as measured in terms of failure metrics and performance improvement rate. From the results obtained, CPLCA provides an improvement in both task scheduling performance and failure awareness that is more appropriate for scheduling in the cloud computing model.

15.
16.
In fault diagnosis, the set of minimal diagnoses is commonly calculated. However, due to, for example, limited computation resources, the search for the set of minimal diagnoses is in some applications focused on the smaller set of diagnoses with minimal cardinality. The key contribution of this paper is an algorithm that calculates the diagnoses with minimal cardinality in a distributed system. The algorithm is constructed such that the computationally intensive tasks are distributed to the different units in the distributed system, thereby reducing the need for a powerful central diagnostic unit.

17.
This paper deals with the problem of distributed job shop scheduling, in which the classical single-facility job shop is extended to the multi-facility one. The mathematical formulation of the problem is comprehensively discussed. Two different mixed integer linear programming models, based on sequence variables and position variables respectively, are proposed. Using the commercial software CPLEX, the small-sized problems are optimally solved. To solve large-sized problems, besides adapting three well-known heuristics, three greedy heuristics are developed. The basic idea behind the developed heuristics is to iteratively insert operations (one at each iteration) into a sequence to build up a complete permutation of operations. The permutation scheme, although having several advantages, suffers from redundancy, that is, many different permutations represent the same schedule. The issue is analyzed in order to recognize redundant permutations, which improves the efficiency of the heuristics. Comprehensive experiments are conducted to evaluate the performance of the two models and the six heuristics. The results show that the sequence-based model and the greedy heuristics equipped with redundancy exclusion are effective for the problem.

18.
The job shop scheduling problem is a typical NP-hard problem. To solve the job shop scheduling problem more effectively, some genetic operators were designed in this paper. In order to increase the diversity of the population, a mixed selection operator based on the fitness value and the concentration value was given. To make full use of the characteristics of the problem itself, a new crossover operator based on the machine and a new mutation operator based on the critical path were specifically designed. To find the critical path, a new algorithm for finding the critical path from a schedule was presented. Furthermore, a local search operator was designed, which can greatly improve the local search ability of the GA. Based on all these, a hybrid genetic algorithm was proposed and its convergence was proved. Computer simulations were made on a set of benchmark problems and the results demonstrated the effectiveness of the proposed algorithm.

19.
This paper studies the problem of broadcasting in synchronous point-to-point networks, where one initiator owns a piece of information that has to be transmitted to all other vertices as fast as possible. The model of fractional dynamic faults with threshold is considered: in every step either a fixed number c(G)−1, where c(G) is the edge connectivity of the communication graph, or a fraction α of sent messages can be lost depending on which quantity is larger.
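The loss rule in the fault model above, the larger of c(G)−1 and a fraction α of the sent messages, can be sketched as a one-line bound. The abstract does not say how a fractional count is rounded, so rounding down with `math.floor` is an assumption here, as are the function and parameter names.

```python
import math


def max_lost_messages(edge_connectivity, alpha, sent_messages):
    """Upper bound on messages lost in one step under the threshold model.

    edge_connectivity -- c(G), the edge connectivity of the communication graph
    alpha             -- fraction of sent messages that may be lost
    sent_messages     -- number of messages sent in this step
    Rounding the fractional term down is an assumption of this sketch.
    """
    fixed_threshold = edge_connectivity - 1
    fractional = math.floor(alpha * sent_messages)
    return max(fixed_threshold, fractional)


# With few messages in flight the fixed threshold dominates;
# with many, the fractional term does.
few = max_lost_messages(edge_connectivity=4, alpha=0.25, sent_messages=8)
many = max_lost_messages(edge_connectivity=3, alpha=0.5, sent_messages=10)
```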

20.
This paper presents a dynamic scheduling method for real-time tasks in multicore processors that tolerates single and multiple transient faults. The scheduling is performed based on three important issues: (1) currently released tasks, (2) currently available processor cores, and (3) the number of faults and their occurrences. Using task utilization along with a defined criticality threshold in the proposed scheduling method, the current ready tasks are divided into critical and noncritical ones. Based on whether a task is critical or noncritical, an appropriate fault-tolerance policy is exploited. Moreover, scheduling decisions are made to fulfill two key goals: (1) increasing scheduling feasibility and (2) decreasing the total task execution time. Several simulation experiments are carried out to compare the proposed method with two well-known methods, called checkpointing with rollback recovery and hardware replication. Experimental results reveal that in the presence of multiple transient faults, the feasibility rate of the proposed method is considerably higher than that of the other well-known fault-tolerance methods. Moreover, the average timing overhead of this method is lower than that of the traditional methods.
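The split of ready tasks into critical and noncritical ones described above can be sketched as follows. The abstract gives neither the utilization formula nor the direction of the comparison, so this example assumes the common definition of utilization as execution time divided by period and assumes that tasks above the threshold count as critical; the function name, the tuple layout, and the numbers are hypothetical.

```python
def split_by_criticality(tasks, threshold):
    """Partition tasks into (critical, noncritical) name lists.

    tasks     -- iterable of (name, execution_time, period) tuples
    threshold -- hypothetical criticality threshold on utilization
    Assumption: utilization = execution_time / period, and a task whose
    utilization exceeds the threshold is treated as critical.
    """
    critical, noncritical = [], []
    for name, execution_time, period in tasks:
        utilization = execution_time / period
        if utilization > threshold:
            critical.append(name)
        else:
            noncritical.append(name)
    return critical, noncritical


ready_tasks = [("sensor", 2, 4), ("logger", 1, 10)]
critical, noncritical = split_by_criticality(ready_tasks, threshold=0.3)
```

A scheduler following the abstract would then pick a different fault-tolerance policy for each group; which policy goes with which group is not stated there.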
