1.

Requirement-aware strategies for scheduling real-time divisible loads on clusters

Menglan Hu Bharadwaj Veeravalli 《Journal of Parallel and Distributed Computing》2013

This paper investigates the real-time scheduling problem for handling heterogeneous divisible loads on cluster systems. Divisible load applications occur in many fields of science and engineering. Such applications can be easily parallelized in a master–worker fashion, but pose several scheduling challenges. We consider divisible loads associated with deadlines to enhance quality-of-service (QoS) and provide performance guarantees in distributed computing environments. In addition, since the divisible loads to be performed may widely vary in terms of their required hardware and software, we capture the loads’ various processing requirements in our load distribution strategies, a unique feature that is applicable for running proprietary applications only on certain eligible processing nodes. Thus in our problem formulation each load can only be processed by certain processors as both the loads and processors are heterogeneous. We propose scheduling algorithms referred to as Requirements-Aware Real-Time Scheduling (RARTS) algorithms, which consist of a novel scheduling policy, referred to as Minimum Slack Capacity First (MSCF), and two multi-round load distribution strategies, referred to as All Eligible Processors (AEP) and Least Capability First (LCF). We perform rigorous performance evaluation studies to quantify the performance of our strategies on a variety of scenarios.

2.

Scheduling divisible loads on heterogeneous linear daisy chain networks with arbitrary processor release times

Veeravalli B. Wong Han Min 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(3):273-288

The problem of distributing and processing a divisible load in a heterogeneous linear network of processors with arbitrary processor release times is considered. A divisible load is very large in size and has computationally intensive CPU requirements. Further, it has the property that the load can be partitioned arbitrarily into any number of portions and can be scheduled onto processors independently for computation. The load is assumed to arrive at one of the farthest end processors, referred to as boundary processors, for processing. The processors in the network are assumed to have nonzero release times, i.e., the time instants from which the processors are available for processing the divisible load. Our objective is to design a load distribution strategy by taking into account the release times of the processors in such a way that the entire processing time of the load is a minimum. We consider two generic cases: one in which all processors have identical release times and one in which all processors have arbitrary release times. We adopt both the single-installment and multi-installment strategies proposed in the divisible load scheduling literature in our design of load distribution strategies, wherever necessary, to achieve a minimum processing time. Finally, when optimal strategies cannot be realized, we propose two heuristic strategies, one for the identical release times case and the other for the nonidentical release times case. Several conditions are derived to determine whether or not an optimal load distribution exists, and illustrative examples are provided for ease of understanding.

3.

A new load distribution strategy for linear network with communication delays

S. Suresh V. Mani S.N. Omkar H.J. Kim N. Sundararajan 《Mathematics and computers in simulation》2009

In this paper, we propose a new load distribution strategy called ‘send-and-receive’ for scheduling divisible loads in a linear network of processors with communication delay. This strategy is designed to optimally utilize the network resources and thereby minimize the processing time of the entire processing load. Closed-form expressions for the optimal size of the load fractions and for the processing time are derived for the cases in which the processing load originates at a processor located at the boundary and in the interior of the network. A condition on processor and link speed is also derived to ensure that the processors are continuously engaged in load distribution. This paper also presents a parallel implementation of the ‘digital watermarking problem’ on a personal computer-based Pentium Linear Network (PLN) topology. Experiments are carried out to study the performance of the proposed strategy, and the results are compared with other strategies found in the literature.

4.

Optimizing parallel numerical combustion simulation: a dynamic load balancing method for adaptive unstructured meshes

WANG Shu WANG Xiaoge YANG Guangwen 《Computer Engineering and Applications》2013,(21):220-225

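Numerical combustion simulations usually model the computational domain with unstructured meshes. When such simulations run in parallel on unstructured meshes, mesh adaptation makes the computational load on the different processes change frequently and differ widely, which lowers parallel efficiency. An effective way to improve parallel efficiency is to use dynamic load balancing. This paper proposes a dynamic load balancing method driven by the chemical reaction state of the combustion. The method uses different strategies to predict the computational load on each process during the different stages of the chemical reaction and, based on the predictions, evens out the computational tasks among the processes to achieve load balance. Experimental analysis shows that the method effectively reduces the load imbalance among processes and lowers the overall run time of the simulation by 10%.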

5.

Competitive Analysis of Network Load Balancing

Xiaotie Deng Hai-Ning Liu JunSheng Long Bing Xiao 《Journal of Parallel and Distributed Computing》1997,40(2):360

This paper presents a theoretical analysis of the Load Balancing Problem (LBP) in a network of processing units. The performance objective is to minimize the makespan, i.e., the time spent to finish all jobs in a network of processing units. Because of the communication delay that results from the network topology, it is impossible to have a strategy which obtains the exact optimum under all load distributions. Instead, we measure the information efficiency of a load balancing policy by the worst case ratio of the solution (for each load distribution) of a load balancing policy to the optimal solution (for the same load distribution) assuming that processors have complete information about the load distribution over the network. This ratio is called the competitive ratio of this strategy [17, 24, 34]. In particular, a policy is called competitive if this ratio is bounded by a constant. As a first step, we discuss the centralized LBP, where all the processors have complete information of the load distribution over a network. Its solution serves as a benchmark to compare with realistic strategies, both in theoretical analysis, and experimental and simulational studies of distributed algorithms. We show that when jobs have different sizes, even with preemptive scheduling, LBP is NP-complete. When the jobs are of the same size, we give a polynomial algorithm, using network-flow techniques, which extends to approximate solutions for jobs of different sizes. We apply this benchmark solution in order to analyze the competitiveness for three network topologies: completely connected graphs, rings, and hierarchical complete k-ary trees. The constant competitive ratio results for complete network and hierarchical complete k-ary trees are applied to a study on the issues of network designs suitable for the LBP. We further discuss the problem for general networks with jobs of different sizes for slightly weaker results than those for the constant competitive ratio requirement. Finally, we comment on the related issues of job partitioning over parallel/distributed systems.

6.

A Heuristic Algorithm for Task Scheduling Based on Mean Load on Grid

Li-Na Ni Jin-Quan Zhang Chun-Gang Yan and Chang-Jun Jiang 《Journal of Computer Science and Technology》2006,21(4):559-564

Efficient task scheduling is critical to achieving high performance in grid computing environments. In this paper, task scheduling on the grid is studied as an optimization problem. A heuristic task scheduling algorithm that satisfies resource load balancing in a grid environment is presented. The algorithm schedules tasks by employing the mean load, based on predicted task execution time, as heuristic information to obtain an initial scheduling strategy. An optimal scheduling strategy is then achieved by selecting two machines that satisfy a given condition and changing their loads by reassigning their tasks under the heuristic of their mean load. Methods of selecting machines and tasks are given in this paper to increase the throughput of the system and reduce the total waiting time. The efficiency of the algorithm is analyzed and the performance of the proposed algorithm is evaluated via extensive simulation experiments. Experimental results show that the heuristic algorithm performs well, ensuring a high degree of load balance and achieving an optimal scheduling strategy almost all the time. Furthermore, results show that our algorithm is highly efficient in terms of time complexity.

7.

Performance limits of divisible load processing in systems with limited communication buffers

《Journal of Parallel and Distributed Computing》2004,64(8):960-973

In this work, we study the influence of limited communication buffer size on the efficiency of divisible load processing. Divisible loads are computations which can be divided into parts of arbitrary sizes, and these parts can be processed in parallel. To finish processing in the shortest possible time, an optimum distribution of the load must be calculated. The method of determining the load distribution must take into account not only computing speed, but also interconnection system topology, communication medium speed and startup time. In this work, we include one more parameter: communication buffer size. We propose a general method of studying the influence of the communication buffer size on the interaction between communication and computation. Three archetypal interconnection topologies are examined: stars, ordinary trees, and binomial trees. The results of modeling the performance of parallel systems show that the influence of communication buffer size is indirect and qualitative in nature. Buffer size affects the performance by causing message fragmentation, or by changing the load balance among the processors. We analyze the performance of several communication algorithms and their interaction with the computations. The simulations show that these classic algorithms are limited.
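The idea of calculating an optimum load distribution can be illustrated with a much simpler model than the one studied in the paper. The sketch below assumes a star in which the master sends load to the workers one after another, each worker starts computing only after its whole share has arrived, and there is no startup time and no buffer limit. Under those assumptions all workers should finish at the same instant, which gives the recurrence alpha[i+1] = alpha[i] * w[i] / (z[i+1] + w[i+1]). The names `star_fractions`, `finish_times`, `w` and `z` are illustrative and do not come from the paper.

```python
def star_fractions(w, z):
    """Load fractions for workers served sequentially by a master.

    w[i]: time for worker i to compute one unit of load (assumed name)
    z[i]: time to send one unit of load to worker i (assumed name)
    Simplified textbook model: no startup time, no buffer limit.
    """
    if len(w) != len(z) or not w:
        raise ValueError("w and z must be non-empty and of equal length")
    raw = [1.0]
    for i in range(1, len(w)):
        # Equal finish times: alpha[i-1] * w[i-1] == alpha[i] * (z[i] + w[i])
        raw.append(raw[-1] * w[i - 1] / (z[i] + w[i]))
    total = sum(raw)
    # Normalize so that the fractions sum to one.
    return [r / total for r in raw]


def finish_times(alpha, w, z):
    """Finish time of each worker for a unit-size load."""
    times, sent = [], 0.0
    for a, wi, zi in zip(alpha, w, z):
        sent += a * zi            # the master is busy sending until here
        times.append(sent + a * wi)
    return times


if __name__ == "__main__":
    alpha = star_fractions([2.0, 3.0, 5.0], [1.0, 0.5, 2.0])
    print(alpha, finish_times(alpha, [2.0, 3.0, 5.0], [1.0, 0.5, 2.0]))
```

Adding startup times or a bounded buffer, as the abstract describes, breaks this closed form because shares may have to be fragmented into several messages.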

8.

Task Allocation in a Multi-Server System

Sem Borst Onno Boxma Jan Friso Groote Sjouke Mauw 《Journal of Scheduling》2003,6(5):423-436

We consider a slotted queueing system with C servers (processors) that can handle tasks (jobs). Tasks arrive in batches of random size at the start of every slot. Any task can be executed by any server in one slot with success probability . If a task execution fails, then the task must be handled in some later time slot until it has been completed successfully. Tasks may be processed by several servers simultaneously. In that case, the task is completed successfully if the task execution is successful on at least one of the servers. We examine the impact of various allocation strategies on the mean number of tasks in the system and the mean response time of tasks. It is proven that both these performance measures are minimized by the strategy which always distributes the tasks over the servers as evenly as possible. Subsequently, we determine the distribution of the number of tasks in the system for a broad class of task allocation strategies, which includes the above optimal strategy as a special case. Some numerical experiments are performed to illustrate the performance characteristics of the various strategies.
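A small calculation gives some intuition for why spreading servers evenly helps. The sketch below assumes that server attempts succeed independently with a per-attempt probability written here as `p` (a placeholder name, since the symbol is missing from the abstract), so a task served by k servers completes in the slot with probability 1 - (1 - p)^k. It only compares the expected number of completions in a single slot. That is a simpler quantity than the long-run measures the paper analyzes, and the function names are illustrative.

```python
def even_allocation(num_servers, num_tasks):
    """Servers per task, spread as evenly as possible."""
    base, extra = divmod(num_servers, num_tasks)
    return [base + 1 if i < extra else base for i in range(num_tasks)]


def expected_completions(servers_per_task, p):
    """Expected number of tasks completed in one slot.

    Assumes independent attempts, each succeeding with probability p
    (hypothetical symbol), so k servers succeed with 1 - (1 - p)**k.
    """
    return sum(1.0 - (1.0 - p) ** k for k in servers_per_task)


if __name__ == "__main__":
    p = 0.5
    print(expected_completions(even_allocation(4, 2), p))  # even split
    print(expected_completions([3, 1], p))                 # skewed split
```

Because 1 - (1 - p)^k grows more slowly with each extra server, moving a server from a heavily served task to a lightly served one never lowers this one-slot expectation.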

9.

Handling biological sequence alignments on networked computing systems: A divide-and-conquer approach

Veeravalli Han Min 《Journal of Parallel and Distributed Computing》2009,69(10):854-865

In this paper, we address the biological sequence alignment problem, which is one of the most commonly used steps in several bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm that is suitable for handling large-scale processing on network-based systems to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy in which we carefully partition the computation work load among the processors in the system so as to minimize the overall computation time of determining the maximum similarity between the DNA/protein sequences. We consider handling such a computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to the processors according to computation and communication link speeds along the chain. We consider two cases of sequence alignment where post-processes, i.e., trace-back processes that are required to determine an optimal alignment, may or may not be done at individual processors in the system. We derive some critical conditions to determine if our strategies are able to yield an optimal processing time. We apply three different heuristic strategies proposed in the literature to generate sub-optimal solutions for processing time when the above conditions cannot be satisfied. To test the proposed schemes, we use real-life DNA samples of house mouse mitochondrion and the DNA of human mitochondrion obtained from the public database GenBank [GenBank, http://www.ncbi.nlm.nih.gov] in our simulation experiments. By this study, we conclusively demonstrate the applicability and potential of the DLT paradigm to such biological sequence related computational problems.

10.

A beam search heuristics to solve the parcel hub scheduling problem

Douglas L. McWilliams Maranda E. McBride 《Computers & Industrial Engineering》2012

In this paper, a beam search scheduling heuristic (BSSH) is presented to solve the parcel hub scheduling problem (PHSP), which is a scheduling problem that is common in the parcel delivery industry (PDI). Companies in the PDI include the United States Postal Service, United Parcel Services, Federal Express, and Deutsche Post. Together, these companies move more than one trillion dollars of the United States’ Gross Domestic Product. The PHSP involves scheduling a set of inbound trailers each containing a set of heterogeneous parcels to a set of unload docks. At the unload docks, the parcels are unloaded, sorted, and moved to the appropriate outbound trailers at the load docks. At the load docks, the parcels are loaded onto the outbound trailers. The objective is to minimize the timespan of the transfer operation at the transshipment terminal. The BSSH is compared to various scheduling approaches: random scheduling algorithm (RSA), genetic-based scheduling algorithm (GBSA), and simulation-based scheduling algorithm (SBSA). While GBSA and SBSA offer solutions that are superior to BSSH for smaller size problems, BSSH outperforms these algorithms on larger size problems while requiring much less computational time. The results show that for larger size problems the BSSH is able to produce solutions that are from 4% to 8% of the known optimum solutions. In contrast, GBSA and SBSA, respectively, offer solutions from 23% to 38% and from 6% to 47% of the known optimum solutions. The contribution of this paper is a scheduling heuristic to solve the PHSP.

11.

On the Performance of Randomized Embedding of Reproduction Trees in Static Networks

Keqin Li 《International journal of parallel programming》2003,31(5):393-406

High performance computing requires high quality load distribution of processes of a parallel application over processors in a parallel computer at runtime such that both the maximum load and dilation are minimized. The performance of a simple randomized tree embedding algorithm that dynamically supports tree-structured parallel computations on arbitrary static networks is analyzed in this paper. The algorithm spreads newly created tree nodes to neighboring processors, which actually provides randomized dilation-1 tree embedding in static networks. We develop a linear system of equations that characterizes expected loads on all processors under the reproduction tree model, which can generate trees of arbitrary size and shape. It is shown that as the tree size becomes large, the asymptotic performance ratio of the randomized tree embedding algorithm is the ratio of the maximum processor degree to the average processor degree. This implies that the simple randomized tree embedding algorithm is able to generate high quality load distributions on virtually all static networks commonly employed in parallel and distributed computing.

12.

Scheduling algorithms for heterogeneous batch processors with incompatible job-families

M. Mathirajan A. I. Sivakumar V. Chandru 《Journal of Intelligent Manufacturing》2004,15(6):787-803

We consider the problem of scheduling heterogeneous batch processors (i.e., batch processors with different capacity) with incompatible job-families and non-identical job sizes to maximize the utilization of the batch processors. We analyzed the computational complexity of this problem, showed that it is NP-hard, and proposed eight variants of a fast greedy heuristic. A series of computational experiments were carried out to compare the performance of the heuristics and showed that the heuristics are capable of consistently obtaining near (estimated) optimal solutions with very low computational burden for large-scale problems. We also carried out a study to find the effect of family processing time changes on the performance of the heuristics. This sensitivity analysis indicated that the processing time set of job-families influences the performance of the heuristic algorithms.

13.

Improved methods for scheduling flexible manufacturing systems based on Petri nets and heuristic search (total citations: 2; self-citations: 0; citations by others: 2)

Bo HUANG Yamin SUN 《Control Theory and Applications (English Edition)》2005,3(2):139-144

This paper proposes and evaluates two improved Petri net (PN)-based hybrid search strategies and their applications to flexible manufacturing system (FMS) scheduling. The algorithms proposed in some previous papers, which combine PN simulation capabilities with A* heuristic search within the PN reachability graph, may not find an optimum solution even with an admissible heuristic function. To remedy these defects, an improved heuristic search strategy is proposed, which adopts a different method for selecting the promising markings and preserves the admissibility of the algorithm. To speed up the search process, another algorithm is also proposed which invokes faster termination conditions and still guarantees that the solution found is optimum. The scheduling results of our algorithms and the previous methods are compared on a simple FMS. The algorithms are also applied and evaluated on a set of randomly generated FMSs with such characteristics as multiple resources and alternative routes.

14.

Improved methods for scheduling flexible manufacturing systems based on Petri nets and heuristic search

Bo HUANG Yamin SUN 《Control Theory and Applications》2005,3(2):139-144

This paper proposes and evaluates two improved Petri net (PN)-based hybrid search strategies and their applications to flexible manufacturing system (FMS) scheduling. The algorithms proposed in some previous papers, which combine PN simulation capabilities with A* heuristic search within the PN reachability graph, may not find an optimum solution even with an admissible heuristic function. To remedy these defects, an improved heuristic search strategy is proposed, which adopts a different method for selecting the promising markings and preserves the admissibility of the algorithm. To speed up the search process, another algorithm is also proposed which invokes faster termination conditions and still guarantees that the solution found is optimum. The scheduling results of our algorithms and the previous methods are compared on a simple FMS. The algorithms are also applied and evaluated on a set of randomly generated FMSs with such characteristics as multiple resources and alternative routes.

15.

Comparing Processor Allocation Strategies in Multiprogrammed Shared-Memory Multiprocessors

Kelvin K. Yue David J. Lilja 《Journal of Parallel and Distributed Computing》1998,49(2):183

Small-scale shared-memory multiprocessors are commonly used in a workgroup environment where multiple applications, both parallel and sequential, are executed concurrently while sharing the processors and other system resources. To utilize the processors efficiently, an effective allocation strategy is required. In this paper, we use performance data obtained from an SGI multiprocessor to evaluate several processor allocation strategies when running two parallel programs simultaneously. We examine gang scheduling (coscheduling), static space-sharing (space partitioning), and a dynamic allocation scheme called loop-level process control (LLPC) with three different dynamic allocation heuristics. We use regression analysis to quantify the measured data and thereby explore the relationship between the degree of parallelism of the application, specific system parameters (such as the size of the system), the processor allocation strategy, and the resulting performance. This study shows that dynamically partitioning the system using LLPC or similar heuristics provides better performance for applications with a high degree of parallelism than either gang scheduling or static space-sharing.

16.

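An efficient one-dimensional dynamic load balancing method: the multilevel averaging weight method (total citations: 6; self-citations: 0; citations by others: 6)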

MO Zeyao 《Chinese Journal of Computers》2001,24(2):183-190

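This paper proposes an efficient one-dimensional dynamic load balancing method, the multilevel averaging weight method, which suits both homogeneous and heterogeneous parallel computing environments. The method successfully resolves the dynamic load imbalance that arises in parallel numerical simulation of multi-material unsteady fluid dynamics with the Lagrangian method. The paper gives a detailed theoretical analysis, together with parallel numerical experiments on two parallel machines based on a practical physics problem.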

17.

An asynchronous and iterative load balancing algorithm for discrete load model (total citations: 1; self-citations: 0; citations by others: 1)

A. Cortés A. Ripoll F. Cedó M. A. Senar E. Luque 《Journal of Parallel and Distributed Computing》2002,62(12)

Diffusion algorithms are some of the most popular algorithms for dynamic load balancing in which loads move from heavily loaded processors to lightly loaded neighbor processors. To achieve a global load balance in a parallel computer, the algorithm is iterated until the load difference between any two processors is smaller than a specified value. Therefore, one fundamental property to be studied is algorithm convergence. Several analytical works on the convergence of different diffusion load balancing algorithms have been carried out, but they treat loads as non-negative real quantities. In this paper, we describe the Diffusion Algorithm Searching Unbalanced Domains (DASUD) algorithm, which uses loads as non-negative integer values and, unlike existing algorithms, reaches a local balance situation where the maximum load difference between any two processors in the set of neighbor processors for each processor is one load unit. The convergence property of an asynchronous implementation of DASUD using integer loads is proven theoretically.
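To make the difference between real-valued and integer loads concrete, the following sketch implements a plain synchronous diffusion step on integer loads. It is not DASUD: it simply rounds each transfer down, with the diffusion factor 1 / (max_degree + 1) chosen here as an assumption so that no processor can send more than it holds. The function names and the adjacency-list representation are illustrative.

```python
def diffusion_step(loads, neighbors):
    """One synchronous diffusion step on integer loads (not DASUD).

    loads: list of non-negative integers, one per processor
    neighbors: dict mapping processor index -> list of neighbor indices
               (assumed symmetric)
    """
    max_degree = max(len(nbrs) for nbrs in neighbors.values())
    new_loads = list(loads)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if loads[i] > loads[j]:
                # Rounded-down share of the difference moves from i to j.
                amount = (loads[i] - loads[j]) // (max_degree + 1)
                new_loads[i] -= amount
                new_loads[j] += amount
    return new_loads


def diffuse_until_stable(loads, neighbors, max_steps=1000):
    """Repeat diffusion steps until the loads stop changing."""
    for _ in range(max_steps):
        nxt = diffusion_step(loads, neighbors)
        if nxt == loads:
            break
        loads = nxt
    return loads


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(diffuse_until_stable([12, 0, 0, 0], ring))
```

With rounding, this naive scheme can stop early. On a four-processor path with loads [6, 4, 2, 0], every neighbor difference of 2 rounds down to a transfer of zero, so nothing moves even though the two ends differ by 6. Stalls of this kind are one reason integer-aware diffusion algorithms are studied separately from the real-valued case.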

18.

Optimizing array-intensive applications for on-chip multiprocessors

Kadayif I. Kandemir M. Chen G. Ozturk O. Karakoy M. Sezer U. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(5):396-411

With energy consumption becoming one of the first-class optimization parameters in computer system design, compilation techniques that consider performance and energy simultaneously are expected to play a central role. In particular, compiling a given application code under performance and energy constraints is becoming an important problem. In this paper, we focus on an on-chip multiprocessor architecture and present a set of code optimization strategies. We first evaluate an adaptive loop parallelization strategy (i.e., a strategy that allows each loop nest to execute using a different number of processors if doing so is beneficial) and measure the potential energy savings when unused processors during execution of a nested loop are shut down (i.e., placed into a power-down or sleep state). Our results show that shutting down unused processors can lead to as much as 67 percent energy savings at the expense of up to 17 percent performance loss in a set of array-intensive applications. To eliminate this performance penalty, we also discuss and evaluate a processor preactivation strategy based on compile-time analysis of nested loops. Based on our experiments, we conclude that an adaptive loop parallelization strategy combined with idle processor shut down and preactivation can be very effective in reducing energy consumption without increasing execution time. We then generalize our strategy and present an application parallelization strategy based on integer linear programming (ILP). Given an array-intensive application, our optimization strategy determines the number of processors to be used in executing each loop nest based on the objective function and additional compilation constraints provided by the user/programmer. Our initial experience with this constraint-based optimization strategy shows that it is very successful in optimizing array-intensive applications on on-chip multiprocessors under multiple energy and performance constraints.

19.

Parallel processing in the elastic nonlinear analysis of high-rise frameworks

C.M. Foley S. Vinnakota 《Computers & Structures》1994,52(6):1169-1179

The method of substructuring and the parallel processing technique of multitasking are applied in the analysis of high-rise structural frameworks. The performance of the proposed method is measured in both wall-clock time and connect time when run in a batch environment on a Cray Y-MP C90 supercomputer. An attempt is made to quantify the optimum number of processors that should be used in the analysis of rectangular frameworks based on the partitioning algorithm employed. Several high-rise planar structures are analyzed and the load-deformation response when subject to proportional and nonproportional loads is given.

20.

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

Leonid Oliker Rupak Biswas 《Journal of Parallel and Distributed Computing》1998,52(2):75

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper describes the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.