Scheduling large-scale applications in heterogeneous distributed computing systems is a fundamental NP-complete problem that is critical to achieving good performance and low execution cost. In this paper, we address the scheduling problem of an important class of large-scale Grid applications drawn from real-world scenarios, characterized by a huge number of homogeneous, concurrent, and computationally intensive tasks that are the main sources of performance, cost, and storage bottlenecks. We propose a new formulation of this problem based on a cooperative distributed game-theoretic method, applied through three low-time-complexity algorithms that optimize three important metrics in scientific computing: execution time, economic cost, and storage requirements. We present comprehensive experiments using simulation and real-world applications that demonstrate the effectiveness of our approach in terms of time and fairness compared with other related algorithms.

This work presents a novel parallel micro evolutionary algorithm for scheduling tasks in distributed heterogeneous computing and grid environments. The scheduling problem in heterogeneous environments is NP-hard, so significant effort has been devoted to developing efficient methods that provide good schedules in reduced execution times. The parallel micro evolutionary algorithm is implemented using MALLBA, a general-purpose library for combinatorial optimization. The experimental analysis reports numerical results on both well-known problem instances and large instances that model medium-sized grid environments. The comparative study of traditional methods and evolutionary algorithms shows that the parallel micro evolutionary algorithm achieves high problem-solving efficacy, outperforms results previously reported in the related literature, and scales well on high-dimensional problem instances.

Contemporary operating systems for single-ISA (instruction set architecture) multi-core systems attempt to distribute tasks equally among all the CPUs. This approach works relatively well when all CPUs have the same capability. However, CPU capabilities sometimes differ. For instance, static capability asymmetry results from the advent of new asymmetric hardware, and dynamic capability asymmetry arises from operating system (OS) noise outside the application, caused by networking or I/O handling. These asymmetries make it hard for the OS scheduler to distribute tasks evenly, resulting in less efficient load balancing. In this paper, we propose a user-level load balancer for parallel applications, called the 'capability balancer', which recognizes differences in CPU capability and lets subtasks share the entire CPU capability fairly. The balancer can coexist with the existing kernel-level load balancer without degrading the kernel balancer's behavior, and it distributes CPU capability fairly among tasks with very little overhead. For real workloads such as the NAS Parallel Benchmark (NPB), we achieved speedups of up to 9.8% and 8.5% under dynamic and static asymmetry, respectively. In a competitive environment, we also observed speedups of 13.3% for dynamic asymmetry and 24.1% for static asymmetry. We compared the impact of two task selection policies, FIFO (first in, first out) and cache. Compared with the default FIFO policy, the cache policy yielded a 5.3% speedup in overall execution time and a 4.7% decrease in the overall cache miss count.
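The abstract does not spell out how the capability balancer places subtasks. As a rough illustration only, the sketch below assumes that each CPU's capability has already been measured as a relative speed (for example 1.0 for an idle core and 0.5 for a core that loses half its time to I/O noise), and greedily assigns each subtask to the CPU whose projected finish time, assigned work divided by capability, is smallest. The function name `balance` and the capability values are hypothetical.

```python
from typing import Dict, List


def balance(task_costs: List[float], capability: Dict[str, float]) -> Dict[str, List[int]]:
    """Greedy capability-aware placement of subtasks (hypothetical sketch).

    capability[c] is CPU c's relative speed (assumed to be measured elsewhere).
    Each subtask goes to the CPU whose projected finish time,
    (assigned work + this task's work) / capability, is smallest.
    Larger subtasks are placed first to reduce the final imbalance.
    """
    load = {cpu: 0.0 for cpu in capability}
    placement: Dict[str, List[int]] = {cpu: [] for cpu in capability}
    for i in sorted(range(len(task_costs)), key=lambda k: -task_costs[k]):
        best = min(capability, key=lambda c: (load[c] + task_costs[i]) / capability[c])
        load[best] += task_costs[i]
        placement[best].append(i)
    return placement
```

With six equal subtasks and capabilities 1.0 and 0.5, the faster CPU receives four subtasks and the slower one two, matching the 2:1 capability ratio.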

The effectiveness of loop self-scheduling schemes has been demonstrated on traditional multiprocessors in the past and on computing clusters in recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, we propose a performance-based approach that takes these two characteristics into account when scheduling parallel loop iterations in grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload that can be scheduled statically, thereby alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach reduces the completion time of applications with both regular and irregular workloads. We therefore conclude that parallel loop scheduling can benefit applications in grid environments.
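One way to read the SWR parameter is sketched below, under assumptions the abstract does not state: a fraction `swr` of the iterations is split statically in proportion to per-node performance weights, and the remaining iterations are handed out dynamically in shrinking chunks using guided self-scheduling (chunk = ceil(remaining / P)), a classic self-scheduling rule swapped in here for illustration. The function `plan` and the performance weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def plan(n_iters: int, perf: Dict[str, float], swr: float) -> Tuple[Dict[str, range], List[int]]:
    """Split loop iterations into a static part and dynamic chunks (sketch).

    perf: relative performance weight of each node (hypothetical measure).
    swr:  fraction of iterations assigned statically, 0 <= swr <= 1.
    Returns each node's static iteration range and the sizes of the dynamic
    chunks, which would be handed at runtime to whichever node asks first.
    """
    static_total = int(n_iters * swr)
    total_perf = sum(perf.values())
    nodes = list(perf)
    static: Dict[str, range] = {}
    start = 0
    for k, node in enumerate(nodes):
        if k == len(nodes) - 1:
            share = static_total - start  # last node absorbs the rounding remainder
        else:
            share = int(static_total * perf[node] / total_perf)
        static[node] = range(start, start + share)
        start += share
    # Guided self-scheduling for the rest: each chunk is ceil(remaining / P).
    chunks: List[int] = []
    remaining = n_iters - static_total
    while remaining > 0:
        c = math.ceil(remaining / len(nodes))
        chunks.append(c)
        remaining -= c
    return static, chunks
```

For 100 iterations, weights 3:1 and `swr = 0.5`, the nodes get 37 and 13 static iterations, and the other 50 are dispatched in chunks of 25, 13, 6, 3, 2 and 1.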

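Large-scale data analysis environments often contain tasks of short duration and high parallelism, and scheduling such concurrent jobs with low-latency requirements is an active research topic. In existing cluster resource management frameworks, centralized schedulers cannot meet low-latency requirements because of the master-node bottleneck, while some distributed schedulers achieve low-latency task scheduling but fall short in optimal resource allocation and in handling resource allocation conflicts. Starting from the requirements of large-scale real-time jobs, we design and implement a distributed cluster resource scheduling framework that meets the low-latency requirements of large-scale data processing. First, we propose a two-phase scheduling framework and an optimized two-phase multi-way scheduling framework. Then, to address the resource conflicts that arise during two-phase multi-way scheduling, we propose a load-balancing-based task transfer mechanism that resolves load imbalance among the computing nodes. Finally, we simulate and validate the task scheduling framework for large-scale clusters using real workloads and a simulated scheduler. For the real workloads, the scheduling latency of the proposed framework stays within 12% of that of ideal scheduling; in the simulated environment, the framework reduces the latency of short tasks by more than 40% compared with a centralized scheduler.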
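The abstract mentions a load-balancing-based task transfer mechanism but gives no details. The hypothetical sketch below shows one simple form of the idea: repeatedly move a queued task from the most loaded node to the least loaded node, but only when the move narrows the gap between them. Queue entries are estimated task durations (assumed positive), and `rebalance` and `threshold` are illustrative names, not the framework's API.

```python
from typing import Dict, List


def rebalance(queues: Dict[str, List[float]], threshold: float) -> int:
    """Transfer queued tasks from the most to the least loaded node (sketch).

    queues maps a node id to the estimated durations of its queued tasks.
    Stops when the load gap is within threshold, or when moving the most
    recently queued task would no longer shrink the gap. Returns the number
    of transferred tasks; queues are modified in place.
    """
    moves = 0
    while True:
        load = {n: sum(q) for n, q in queues.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        gap = load[src] - load[dst]
        if gap <= threshold or not queues[src]:
            return moves
        task = queues[src][-1]  # candidate: the most recently queued task
        if task >= gap:  # moving it would not reduce the imbalance
            return moves
        queues[dst].append(queues[src].pop())
        moves += 1
```

Moving a task of length t across a gap g with 0 < t < g changes the sum of squared node loads by 2t(t - g) < 0, so the loop always terminates.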

The mapping of tasks of a parallel program onto nodes of a parallel computing system has a remarkable impact on application performance. In this paper, we propose an optimization framework for the mapping problem that takes into account the communication matrix of the application and a cost matrix that depends on the topology of the parallel system. This cost matrix is usually a distance matrix (the classic approach), but we propose a novel definition of the cost criterion, applicable to torus networks, that tries to distribute traffic evenly over the different axes; we call this the Traffic Distribution (TD) criterion. Because the mapping problem can be seen as a particular instance of the Quadratic Assignment Problem (QAP), any QAP solver can be applied to it; in particular, we use a greedy randomized algorithm. Using simulation, we evaluate the performance of the optimization-based mappings and compare it with that of trivial mappings (consecutive, random) in two environments: single application (one application uses all system resources all the time) and space sharing (several applications run simultaneously on different system partitions), using systems with 2D and 3D topologies and real application traffic. Experimental results show that some applications do not benefit from optimization-based mappings: those whose virtual topology matches the physical topology, and those that perform massive all-to-all communication. In other cases, optimization-based mappings with the TD criterion provide excellent performance.
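The QAP view can be made concrete with a small sketch. Assuming as many tasks as nodes, a mapping assigns task i to node `mapping[i]`, and its cost is the sum over task pairs of traffic(i, j) × cost(node(i), node(j)). The code below builds mappings with a GRASP-style greedy randomized construction (a restricted candidate list of the cheapest free nodes) followed by pairwise-swap local search. This is a generic technique, not necessarily the paper's solver, and the function names, the `alpha` parameter and the use of a plain hop-distance matrix (rather than the TD criterion) are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def qap_cost(comm: Matrix, cost: Matrix, mapping: List[int]) -> float:
    """Sum over task pairs of traffic(i, j) * cost(node(i), node(j))."""
    n = len(mapping)
    return sum(comm[i][j] * cost[mapping[i]][mapping[j]] for i in range(n) for j in range(n))


def greedy_randomized(comm: Matrix, cost: Matrix, alpha: float, rng: random.Random) -> List[int]:
    """Place tasks one by one, choosing randomly among the cheapest free nodes."""
    n = len(comm)
    order = sorted(range(n), key=lambda t: -sum(comm[t]))  # heaviest communicators first
    mapping = [-1] * n
    free = set(range(n))
    placed: List[int] = []
    for t in order:
        def delta(node: int) -> float:
            # Extra cost of putting task t on node, given the tasks placed so far.
            return sum(comm[t][u] * cost[node][mapping[u]] + comm[u][t] * cost[mapping[u]][node]
                       for u in placed)
        ranked = sorted(free, key=delta)
        rcl = ranked[: max(1, int(alpha * len(ranked)))]  # restricted candidate list
        node = rng.choice(rcl)
        mapping[t] = node
        free.remove(node)
        placed.append(t)
    return mapping


def swap_local_search(comm: Matrix, cost: Matrix, mapping: List[int]) -> Tuple[List[int], float]:
    """First-improvement local search over pairwise swaps of node assignments."""
    best = qap_cost(comm, cost, mapping)
    improved = True
    while improved:
        improved = False
        for a in range(len(mapping)):
            for b in range(a + 1, len(mapping)):
                mapping[a], mapping[b] = mapping[b], mapping[a]
                c = qap_cost(comm, cost, mapping)
                if c < best:
                    best, improved = c, True
                else:
                    mapping[a], mapping[b] = mapping[b], mapping[a]  # undo the swap
    return mapping, best


def grasp(comm: Matrix, cost: Matrix, iters: int = 20, alpha: float = 0.3,
          seed: int = 0) -> Tuple[List[int], float]:
    """Repeat construction + local search and keep the cheapest mapping."""
    rng = random.Random(seed)
    best_map: List[int] = []
    best_cost = float("inf")
    for _ in range(iters):
        mapping, c = swap_local_search(comm, cost, greedy_randomized(comm, cost, alpha, rng))
        if c < best_cost:
            best_map, best_cost = list(mapping), c
    return best_map, best_cost
```

For a four-task chain mapped onto a four-node ring with hop distances as costs, the best mapping lays the chain along the ring, with cost 6 (three neighbouring pairs, each counted in both directions, at distance 1).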

This paper proposes a hybrid resource management approach for efficient parallel distributed computing on the Grid. It operates at both the application and system levels, combining user-level job scheduling with a dynamic workload balancing algorithm that automatically adapts a parallel application to heterogeneous resources, based on actual resource parameters and the estimated requirements of the application. We describe the hybrid environment and the automated load balancing algorithm, measure the influence of the level of resource heterogeneity, and demonstrate the speedup achieved with this technique for different types of applications and resources.

Although various strategies have been developed for scheduling parallel applications with independent tasks, very little work exists on scheduling tightly coupled parallel applications in cluster environments. In this paper, we compare four strategies, based on performance models of tightly coupled parallel applications, for scheduling such applications on clusters. In addition to algorithms based on existing popular optimization techniques, we propose a new algorithm, called Box Elimination, that searches the space of performance model parameters to determine the best schedule of machines. Using real and simulated experiments, we evaluate the algorithms on single-cluster and multi-cluster setups. We show that our Box Elimination algorithm generates schedules that are up to 80% more efficient than those of the other algorithms, and that the execution times of its schedules are more robust against performance modeling errors. Copyright © 2009 John Wiley & Sons, Ltd.

Advances in network technologies and the emergence of Grid computing have both increased the need for, and provided the infrastructure for, computation- and data-intensive applications to run over collections of heterogeneous and autonomous nodes. In the context of database query processing, existing parallelisation techniques do not operate well in Grid environments, because the way they select machines and allocate tasks compromises partitioned parallelism. The main contribution of this paper is a low-complexity, practical resource selection and scheduling algorithm that enables queries to employ partitioned parallelism in order to achieve better performance in a Grid setting. The evaluation results show that the proposed scheduler outperforms current techniques without sacrificing the efficiency of resource utilisation. Recommended by: Ioannis Vlahavas

Grids consist of both dedicated and non-dedicated clusters. To map parallel applications effectively onto grid resources, a grid metascheduler has to evaluate different sets of resources in terms of the predicted execution times of the applications on those sets. In this work, we have developed a comprehensive set of performance modeling strategies for predicting the execution times of parallel applications in both dedicated and non-dedicated environments. Our strategies adapt to changing network and CPU loads on the grid resources. We evaluated our strategies on 8-, 16-, 24- and 32-node clusters with random loads and with load traces from a grid system. Our strategies give average percentage prediction errors below 30% in all cases, which, to our knowledge, is the best reported for non-dedicated environments. We also found that grid scheduling using execution-time predictions from our performance modeling techniques leads to perfect mapping of applications to resources in many cases.

Scheduling message passing for synchronous communication is found to be equivalent to colouring the edges of a graph without conflict. The graph edge-colouring problem, which has other applications, is studied, and an algorithm that colours the graph with no more than deg + 1 colours, where deg is the maximum degree of the graph, is implemented. The problem of minimising the sum, over all colours, of the largest edge weight in each colour is also investigated, and an algorithm is suggested. These algorithms are used to organise the communication in a finite element Euler solver, and different communication schemes and their effect on the performance of the flow solver are compared.
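To illustrate why colour classes correspond to communication steps, the sketch below colours edges greedily: each edge gets the smallest colour not already used at either endpoint. Each colour class is then a set of pairwise exchanges in which no processor appears twice, so it can run as one synchronous round. This simple greedy rule guarantees at most 2·deg - 1 colours, not the deg + 1 bound (Vizing's bound) achieved by the algorithm described in the abstract. The function names are hypothetical, and the input is assumed to be a simple graph without self-loops.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # a pairwise message exchange between two processors


def greedy_edge_colouring(edges: List[Edge]) -> Dict[Edge, int]:
    """Give each edge the smallest colour not yet used at either endpoint."""
    used: Dict[int, Set[int]] = {}
    colour: Dict[Edge, int] = {}
    for u, v in edges:
        busy = used.setdefault(u, set()) | used.setdefault(v, set())
        c = 0
        while c in busy:
            c += 1
        colour[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return colour


def rounds(edges: List[Edge]) -> List[List[Edge]]:
    """Group exchanges by colour: each group is one conflict-free round."""
    col = greedy_edge_colouring(edges)
    out: List[List[Edge]] = [[] for _ in range(max(col.values(), default=-1) + 1)]
    for e, c in col.items():
        out[c].append(e)
    return out
```

For four processors in a ring, `rounds([(0, 1), (1, 2), (2, 3), (3, 0)])` yields two rounds, `[(0, 1), (2, 3)]` followed by `[(1, 2), (3, 0)]`.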

An efficient parallel algorithm to obtain maximum matchings in convex bipartite graphs is developed. This algorithm can be used to obtain efficient parallel algorithms for several scheduling problems. Some examples are: job scheduling with release times and deadlines; scheduling to minimize maximum cost; and preemptive scheduling to minimize maximum completion time.
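The paper's contribution is a parallel algorithm. As a sequential point of reference, the sketch below applies Glover's classic greedy rule for convex bipartite graphs to the first scheduling example, unit-time jobs with release times and deadlines. Each job can occupy any integer time slot in [release, deadline], which is an interval, so the job/slot graph is convex. Sweeping the slots in order and matching each slot to the available job with the earliest deadline yields a maximum matching. The function name and input format are assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_unit_jobs(jobs: List[Tuple[int, int]], horizon: int) -> Dict[int, int]:
    """Maximum matching of unit jobs to time slots 0..horizon-1 (Glover's rule).

    jobs[j] = (release, deadline): job j may run in any integer slot t with
    release <= t <= deadline. Returns {job: slot}.
    """
    by_release = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    heap: List[Tuple[int, int]] = []  # (deadline, job) of released, unmatched jobs
    k = 0
    assignment: Dict[int, int] = {}
    for t in range(horizon):
        while k < len(by_release) and jobs[by_release[k]][0] <= t:
            j = by_release[k]
            heapq.heappush(heap, (jobs[j][1], j))
            k += 1
        while heap and heap[0][0] < t:  # deadline passed: job can never be matched
            heapq.heappop(heap)
        if heap:
            _, j = heapq.heappop(heap)  # earliest deadline first
            assignment[j] = t
    return assignment
```

With jobs `[(0, 1), (0, 0), (1, 1), (0, 2)]` and three slots, three jobs are scheduled (the maximum possible), and job 2 is the one left out.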

Contemporary parallel debuggers allow users to control more than one processing thread while supporting the same examination and visualisation operations as sequential debuggers. This approach limits the usefulness of parallel debuggers for large-scale scientific applications running across hundreds of thousands of compute cores. First, manually observing the runtime data to detect errors becomes impractical because the data are too large. Second, performing expensive but useful debugging operations becomes infeasible as computational codes become more complex, involving larger data structures, and as machines become larger. This study explores a data-centric debugging approach that could make parallel debuggers more powerful. It discusses the use of ad hoc debug-time assertions that allow a user to reason about the state of a parallel computation. These assertions support the verification and validation of program state at runtime as a whole, rather than focusing on the state of a single process. Furthermore, the debugger's performance can be improved by exploiting the underlying parallel platform, because the available compute cores can execute parallel debugging functions while a program is idling at a breakpoint. We demonstrate the system with several case studies and evaluate the performance of the tool on a 20 000-core Cray XE6. Copyright © 2013 John Wiley & Sons, Ltd.

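In large-scale Hadoop clusters, a good task scheduling strategy has an important influence on improving data locality, reducing network transfer overhead, shortening job execution time, and increasing cluster job throughput. To address the low data locality of Reduce tasks in the Hadoop architecture, this paper proposes a Reduce task scheduling optimization algorithm based on the delay scheduling strategy, which shortens job execution time and increases job throughput by improving the data locality of Reduce tasks. During the Early Shuffle phase of the Hadoop architecture, the algorithm uses a multi-level delay scheduling strategy to improve the data locality of Reduce tasks. Finally, we implemented the algorithm by rewriting the code of the native Fair Scheduler and compared it experimentally with the native Fair Scheduler. The results show that the algorithm significantly reduces job execution time and increases the cluster's job throughput.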
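Multi-level delay scheduling can be sketched as follows, following the general delay scheduling idea of letting a task skip a bounded number of scheduling offers while it waits for a better-located slot. The locality levels (node-local, rack-local, any), the delay thresholds, and the `ReduceJob` class are hypothetical. In particular, the sketch assumes the scheduler already knows which nodes and racks hold most of a reducer's map output during Early Shuffle.

```python
from dataclasses import dataclass
from typing import Optional, Set

NODE, RACK, ANY = 0, 1, 2  # locality levels, best first


@dataclass
class ReduceJob:
    preferred_nodes: Set[str]  # nodes assumed to hold most of this reducer's input
    preferred_racks: Set[str]
    node_delay: int = 3        # skipped offers before rack-local slots are accepted
    rack_delay: int = 3        # further skips before any slot is accepted
    skipped: int = 0

    def allowed_level(self) -> int:
        """Worst locality level the job currently accepts, given its wait so far."""
        if self.skipped < self.node_delay:
            return NODE
        if self.skipped < self.node_delay + self.rack_delay:
            return RACK
        return ANY

    def offer(self, node: str, rack: str) -> Optional[int]:
        """Called when a free reduce slot on (node, rack) is offered to this job.

        Returns the locality level if the job launches a task there, else None.
        """
        if node in self.preferred_nodes:
            level = NODE
        elif rack in self.preferred_racks:
            level = RACK
        else:
            level = ANY
        if level <= self.allowed_level():
            self.skipped = 0  # reset the wait after a launch
            return level
        self.skipped += 1
        return None
```

After three rejected off-rack offers, the job accepts a rack-local slot; after six consecutive rejections, it accepts any slot.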

Scheduling large-scale applications in heterogeneous grid systems is a fundamental NP-complete problem that is critical to achieving good performance and low execution cost. Achieving high performance in a grid system requires effective task partitioning, resource management and load balancing. The heterogeneous and dynamic nature of a grid, as well as the diverse demands of the applications running on it, makes grid scheduling a major challenge. Existing schedulers in wide-area heterogeneous systems require a large amount of information about the application and the grid environment to produce reasonable schedules. However, this information may not be available, may be too expensive to collect, or may increase the runtime overhead of the scheduler to the point that it becomes ineffective. We believe that no single scheduler is appropriate for all grid systems and applications. Data parallel applications, in which the data can be partitioned further, can benefit from efficient resource management, smart resource selection and load balancing, whereas in functional (non-divisible-task) parallel applications such partitioning is impossible, difficult, or expensive in terms of performance. In this paper, we propose a scheduler for data parallel applications (SDPA) that offers an efficient task partitioning and load balancing strategy for data parallel applications in grid environments. SDPA offers two major features: it maintains job priority even when an insufficient number of free resources is available, and it pre-assigns tasks to cut the idle time of nodes. SDPA selects nodes according to the nature of the task and the availability of the nodes' resources. Simulation results show that SDPA improves on the strategies reported in the reviewed literature in terms of execution time, throughput and waiting time.

Data-intensive Grid applications need access to large data sets that may each be replicated on different resources. Minimizing the overhead of transferring these data sets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple data sets that are each replicated on multiple resources. We break this problem into two parts: (1) matching each task (or job) to one compute resource for executing the job and to one storage resource for accessing each data set the job requires, and (2) assigning the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known SCP heuristic to match jobs to resources. We tackle the second part by extending the existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we experimentally compare the SCP-based matching heuristic with other heuristics in conjunction with the task scheduling algorithms, and present the results.
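For reference, the core MinMin heuristic that the second part extends is sketched below. Among the unassigned tasks, it computes each task's minimum completion time over all machines, assigns the task whose minimum is smallest, updates that machine's ready time, and repeats. The sketch assumes an expected-time-to-compute matrix `etc` in which each entry already includes the cost of transferring the task's data sets from the storage resources chosen in the matching phase. The SCP matching itself and the Sufferage variant are not shown.

```python
from typing import Dict, List, Optional, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], List[float]]:
    """MinMin list scheduling.

    etc[t][m] is the expected time of task t on machine m (assumed to include
    data transfer time). Returns {task: machine} and the final ready times.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    unassigned = set(range(len(etc)))
    mapping: Dict[int, int] = {}
    while unassigned:
        best: Optional[Tuple[float, int, int]] = None
        for t in sorted(unassigned):
            # Machine giving this task its earliest completion time.
            m = min(range(n_machines), key=lambda mm: ready[mm] + etc[t][mm])
            ct = ready[m] + etc[t][m]
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        mapping[t] = m
        ready[m] = ct
        unassigned.remove(t)
    return mapping, ready
```

With `etc = [[4, 6], [2, 3], [5, 1]]`, MinMin first places task 2 on machine 1, then task 1 and task 0 on machine 0, giving a makespan of 6.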

In this work, we tackle the problem of scheduling a set of jobs on a set of unrelated parallel machines with the objective of minimising the total weighted completion time. The iterated greedy metaheuristic generates a sequence of solutions by iterating over a constructive heuristic using destruction and construction phases. In the last few years, iterated greedy has been applied to a considerable number of problems, because it is based on a very simple principle, is easy to implement, and often performs excellently. Moreover, scalability to high-dimensional problems has become an essential requirement for modern optimisation algorithms. This paper proposes an iterated greedy algorithm for the above-mentioned scheduling problem that can tackle large instances. Experiments show the benefits of our proposal in comparison with existing metaheuristics from the literature.
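A compact iterated greedy sketch for this problem follows, under several simplifying assumptions. The objective is re-evaluated from scratch after every tentative insertion. The destruction phase removes `d` random jobs, and the construction phase reinserts each one at the machine and position with the lowest total weighted completion time. A new solution is accepted only if it is no worse than the current one; many published variants use a simulated-annealing-like acceptance rule instead. Names and parameters are illustrative, not the paper's.

```python
import random
from typing import List, Sequence, Tuple

Schedule = List[List[int]]  # one job sequence per machine


def twc(schedule: Schedule, p: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Total weighted completion time; p[j][m] is job j's processing time on machine m."""
    total = 0.0
    for m, jobs in enumerate(schedule):
        t = 0.0
        for j in jobs:
            t += p[j][m]
            total += w[j] * t
    return total


def best_insertion(schedule: Schedule, job: int, p, w) -> None:
    """Insert job, in place, at the (machine, position) with the lowest objective."""
    best_cost, best_pos = float("inf"), (0, 0)
    for m in range(len(schedule)):
        for k in range(len(schedule[m]) + 1):
            schedule[m].insert(k, job)
            c = twc(schedule, p, w)
            schedule[m].pop(k)
            if c < best_cost:
                best_cost, best_pos = c, (m, k)
    m, k = best_pos
    schedule[m].insert(k, job)


def iterated_greedy(p, w, n_machines: int, d: int = 2, iters: int = 200,
                    seed: int = 0) -> Tuple[Schedule, float]:
    rng = random.Random(seed)
    current: Schedule = [[] for _ in range(n_machines)]
    # Initial construction: jobs with high weight per fastest processing time first.
    for j in sorted(range(len(w)), key=lambda j: -w[j] / min(p[j])):
        best_insertion(current, j, p, w)
    cur_cost = twc(current, p, w)
    best, best_cost = [list(s) for s in current], cur_cost
    for _ in range(iters):
        cand = [list(s) for s in current]
        # Destruction: remove d randomly chosen jobs.
        removed = rng.sample([j for s in cand for j in s], min(d, len(w)))
        for s in cand:
            s[:] = [j for j in s if j not in removed]
        # Construction: greedily reinsert the removed jobs one by one.
        for j in removed:
            best_insertion(cand, j, p, w)
        c = twc(cand, p, w)
        if c <= cur_cost:  # simple acceptance: keep the candidate if not worse
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = [list(s) for s in cand], c
    return best, best_cost
```

For four unit jobs of equal weight on two identical machines, the sketch returns the optimal value 6 (two jobs per machine, completion times 1 and 2 on each).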

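To overcome the performance bottlenecks of spectral clustering on large-scale datasets, namely long computation times and, in some cases, the inability to cluster at all, we propose a Spark-based parallel spectral clustering algorithm for large-scale datasets. First, construction of the similarity matrix is optimized through one-directional loop iteration, which avoids repeated computation. Then, construction and normalization of the Laplacian matrix are optimized through position transformation and substitution with scalar multiplication, which reduces storage requirements. Finally, approximate eigenvector computation is used to further reduce the amount of computation. Experimental results on different test datasets show that, as dataset size increases, the running times of the one-directional loop iteration and of the approximate eigenvalue computation grow linearly and slowly; that approximate eigenvector computation achieves clustering quality similar to exact eigenvector computation; and that the algorithm scales well on large datasets. While maintaining good spectral clustering quality, the improved algorithm runs more efficiently and effectively alleviates the long computation times and clustering failures of spectral clustering.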
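The three steps can be illustrated on a single machine with NumPy, without Spark. The sketch below fills only the upper triangle of a Gaussian similarity matrix and mirrors it. It then normalizes the affinity matrix as D^(-1/2) W D^(-1/2) by scaling each entry with two scalars rather than forming a diagonal degree matrix, which is one plausible reading of the scalar-multiplication step. Finally, it clusters the row-normalized top-k eigenvectors with a small k-means. It uses an exact dense eigendecomposition where the paper uses an approximate one. The kernel width `sigma`, the k-means initialisation, and the function name are assumptions, and the sketch assumes no point is isolated (every degree is positive).

```python
import numpy as np


def spectral_clustering(X: np.ndarray, k: int, sigma: float = 1.0, iters: int = 20) -> np.ndarray:
    """Normalized spectral clustering (single-machine sketch, exact eigenvectors)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # Step 1: one-directional loop; each pair (i, j) with j > i is computed once and mirrored.
    for i in range(n):
        for j in range(i + 1, n):
            s = np.exp(-np.sum((X[i] - X[j]) ** 2) / (2.0 * sigma ** 2))
            W[i, j] = W[j, i] = s
    # Step 2: D^(-1/2) W D^(-1/2) via per-entry scalar scaling, no diagonal matrix built.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes every degree is positive
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 3: top-k eigenvectors of M (the bottom-k of the normalized Laplacian I - M).
    _, vecs = np.linalg.eigh(M)  # eigenvalues returned in ascending order
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # k-means on the embedded rows, with farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    C = np.array(centers)
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        C = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                      for c in range(k)])
    return labels
```

On two well-separated groups of three points, the embedded rows of each group collapse onto one of two orthogonal unit vectors, so k-means recovers the groups.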

In this paper, we investigate parallel searches on m concurrent rays for a point target t located at some unknown distance along one of the rays. A group of p agents or robots moving at unit speed searches for t. The search succeeds when an agent reaches the point t. Given a strategy S, the competitive ratio is the worst-case ratio between the time the agents need to find t using S and the time they would need if the location of t were known in advance. We provide a strategy with competitive ratio $1 + 2\left(\frac{m}{p} - 1\right)\left(\frac{m}{m-p}\right)^{m/p}$ and prove that it is optimal. This problem has applications in multiple heuristic searches in AI as well as in robot motion planning. The case p = 1 is known in the literature as the cow path problem.
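The optimal ratio stated in this abstract is easy to evaluate numerically. The sketch below (function name hypothetical) computes it for 0 < p < m. With p = 1 and m = 2, it gives 9, the classical ratio of the doubling strategy for the two-ray cow path problem. Since m/(m - p) = (m/p)/(m/p - 1), the value depends only on the ratio m/p.

```python
def competitive_ratio(m: int, p: int) -> float:
    """Optimal competitive ratio for p unit-speed searchers on m rays, 0 < p < m."""
    if not 0 < p < m:
        raise ValueError("the formula applies only for 0 < p < m")
    q = m / p
    return 1 + 2 * (q - 1) * (m / (m - p)) ** q
```

For example, `competitive_ratio(3, 1)` evaluates to 14.5, which agrees with the single-searcher closed form 1 + 2m^m/(m - 1)^(m - 1) for m = 3.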

