期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

罗红兵张宝印曹立强《计算机工程与应用》2007,43(24):41-46

结合回填的FCFS策略是超级计算机上使用最为普遍的调度策略,针对该策略在响应时间和系统利用率等方面的不足,提出了改进其性能的DGA方法。该方法利用并行作业的可塑性,通过调度时对作业平均响应时间的预测来选择适合的作业请求规模,并利用遗传算法来解决最优作业资源请求的搜索问题。模拟器上实际作业流的模拟结果表明：该方法可以显著地改进结合回填的FCFS策略的调度效果,也优于已有的可塑性作业调度策略。相似文献

2.

基于Backfilling调度算法的“扩履适足”改进算法 总被引：2，自引：0，他引：2

下载免费PDF全文

付云虹白树仁方俊《计算机工程与科学》2006,28(9):94-96

在众多的并行作业调度算法中,Backfilling通常被广泛认为是有效提高CPU利用率的一种算法。该算法是在FCFS算法的基础上,将队列中较小的作业回填（Backfill）到空闲 CPU,以提高CPU利用率。但是,当空闲CPU数量仍然无法满足Backfilling算法中小作业的回填要求时,系统仍有部分CPU闲置,因而也难以达到更好地提高CPU利用率的目的。。对于共享内存体系结构的并行计算机系统,本文提出了基于Backfilling算法的“扩履适足”的改进算法。该算法以正在运行的作业的CPU利用率为依据,通过动态调整正在运行作业的CPU数,扩大可供回填（backfill）的CPU空间,使得Backfilling算法无法回填的作业得到运行,弥补了Backfilling算法的不足,大大提高了共享内存体系结构并
并行计算机系统的CPU利用率。相似文献

3.

LOMARC: Lookahead Matchmaking for Multiresource Coscheduling on Hyperthreaded CPUs

Sodan A.C. Lei Lan 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(11):1360-1375

Job scheduling typically focuses on the CPU with little work existing to include I/O or memory. Time-shared execution provides the chance to hide I/O and long-communication latencies though potentially creating a memory conflict. Hyperthreaded CPUs support coscheduling without any context switches and provide additional options for CPU-internal resource sharing. We present an approach that includes all possible resources into the schedule optimization and improves utilization by coscheduling two jobs if feasible. Our LOMARC approach partially reorders the queue by lookahead to increase the potential to find good matches. In simulations based on the workload model of Lublin and Feitelson, we have obtained improvements between 30 percent and 50 percent in both response times and relative bounded response times on hyperthreaded CPUs (i.e., cut times to two third or to half) 相似文献

4.

Utilization, predictability, workloads, and user runtime estimatesin scheduling the IBM SP2 with backfilling 总被引：1，自引：0，他引：1

Mu'alem A.W. Feitelson D.G. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(6):529-543

Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler which uses aggressive backfilling: Small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach in which small jobs move ahead only if they do not delay any job in the queue and show that the relative performance of the two schemes depends on the workload. For workloads typical on SP2 systems, the aggressive approach is indeed better, but, for other workloads, both algorithms are similar. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a very surprising result. Backfilling actually works better when users overestimate the runtime by a substantial factor 相似文献

5.

A new window-based job scheduling scheme for 2D mesh multicomputers

Ismail Ababneh Saad Bani-Mohammad 《Simulation Modelling Practice and Theory》2011,19(1):482-493

Allocating submeshes to jobs in mesh-connected multicomputers in a FCFS fashion can lead to poor system performance (e.g., long job waiting delays) because the job at the head of the waiting queue can prevent the allocation of free submeshes to other waiting jobs with smaller submesh requirements. However, serving jobs aggressively out-of-order can lead to excessive waiting delays for jobs with large allocation requests. In this paper, we propose a scheduling scheme that uses a window of consecutive jobs from which it selects jobs for allocation and execution. This window starts with the current oldest waiting job and corresponds to the lookahead of the scheduler. The performance of the proposed window-based scheme has been compared to that of FCFS and other previous job scheduling schemes. Extensive simulation results based on synthetic workloads and real workload traces indicate that the new scheduling strategy exhibits good performance when the scheduling window size is large. In particular, it is substantially superior to FCFS in terms of system utilization, average job turnaround times, and maximum waiting delays under medium to heavy system loads. Also, it is superior to aggressive out-of-order scheduling in terms of maximum job waiting delays. Window-based job scheduling can improve both overall system performance and fairness (i.e., maximum job waiting delays) by adopting large lookahead job scheduling windows. 相似文献

6.

Scheduling parallel jobs on multicore clusters using CPU oversubscription

Gladys Utrera Julita Corbalan Jesús Labarta 《The Journal of supercomputing》2014,68(3):1113-1140

相似文献

7.

Qespera: an adaptive framework for prediction of queue waiting times in supercomputer systems

Prakash Murali Sathish Vadhiyar 《Concurrency and Computation》2016,28(9):2685-2710

Production parallel systems are space‐shared, and resource allocation on such systems is usually performed using a batch queue scheduler. Jobs submitted to the batch queue experience a variable delay before the requested resources are granted. Predicting this delay can assist users in planning experiment time‐frames and choosing sites with less turnaround times and can also help meta‐schedulers make scheduling decisions. In this paper, we present an integrated adaptive framework, Qespera, for prediction of queue waiting times on parallel systems. We propose a novel algorithm based on spatial clustering for predictions using history of job submissions and executions. The framework uses adaptive set of strategies for choosing either distributions or summary of features to represent the system state and to compare with history jobs, varying the weights associated with the features for each job prediction, and selecting a particular algorithm dynamically for performing the prediction depending on the characteristics of the target and history jobs. Our experiments with real workload traces from different production systems demonstrate up to 22% reduction in average absolute error and up to 56% reduction in percentage prediction error over existing techniques. We also report prediction errors of less than 1 h for a majority of the jobs. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

8.

Adaptive parallel job scheduling with flexible coscheduling

Frachtenberg E. Feitelson G. Petrini F. Fernandez J. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(11):1066-1077

Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified, and are not coscheduled; instead, these processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than other algorithms (up to 54 percent higher in some cases), and displays lower response times and slowdown than the other algorithms in nearly all scenarios. 相似文献

9.

面积最大优先调度的预约回填算法 总被引：2，自引：1，他引：1

刘素芹孟令芬硕珺李兴盛《微计算机应用》2008,29(12)

传统backfilling算法是在先来先服务基础上,将小作业回填到空闲CPU,以提高CPU利用率。该算法偏向小作业,大作业也会因为长期等待出现饥饿现象。当空闲CPU数无法满足算法中小作业回填要求时,系统仍有部分CPU闲置,难以更好地提高CPU利用率。本文中提出的算法以作业所需CPU数及预估运行时间构成的二维面积作为优先调度的条件,引入二级优先级和预约算法消除大作业的饥饿现象,减少回填作业CPU数,相应增加预估运行时间,更好提高CPU利用率。实验证明,该算法比传统backfilling算法在保证用户公平性,缩短作业平均响应时间及CPU利用率方面有所提高。相似文献

10.

Asymptotic convergence of scheduling policies with respect to slowdown

Mor Karl Adam 《Performance Evaluation》2002,49(1-4):241-256

We explore the performance of an M/GI/1 queue under various scheduling policies from the perspective of a new metric: the slowdown experienced by the largest jobs. We consider scheduling policies that bias against large jobs, towards large jobs, and those that are fair, e.g., processor-sharing (PS). We prove that as job size increases to infinity, all work conserving policies converge almost surely with respect to this metric to no more than 1/(1−ρ), where ρ denotes the load. We also find that the expected slowdown under any work conserving policy can be made arbitrarily close to that under PS, for all job sizes that are sufficiently large. 相似文献

11.

Using a sparse promoting method in linear programming approximations to schedule parallel jobs

Stphane Chrtien Jean‐Marc Nicod Laurent Philippe Veronika Rehn‐Sonigo Lamiel Toch 《Concurrency and Computation》2015,27(14):3561-3586

In this paper, we tackle the well‐known problem of scheduling a collection of parallel jobs on a set of processors either in a cluster or in a multiprocessor computer. For the makespan objective, that is, the completion time of the last job, this problem has been shown to be NP‐hard, and several heuristics have already been proposed to minimize the execution time. In this paper, we consider both rigid and moldable jobs. Our main contribution is the introduction of a new approach to the scheduling problem, based on the recent discoveries in the field of compressed sensing. In the proposed approach, all possible positions and shapes of the jobs are encoded into a matrix, and the scheduling is performed by selecting the best columns under natural constraints. Thus, the solution to the new scheduling formulation is naturally sparse, and we may use appropriate relaxations to achieve the optimization task in the quickest possible way. Among many possible relaxation strategies, we choose to minimize the ℓ_p‐quasi‐norm for p∈(0,1). Minimization of the ℓ_p‐quasi‐norm is implemented via a successive linear programming approximation heuristic. We propose several new algorithms based on this approach, and we assess their efficiency through simulations. The experiments show that the scheme outperforms the classic Largest Task First list based algorithm for scheduling small to medium instances but needs improvements to compete on larger numbers of jobs. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

12.

Exploiting fine-grained idle periods in networks of workstations

Kyung Dong Ryu Hollingsworth J.K. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(7):683-698

Studies have shown that for a significant fraction of the time, workstations are idle. In this paper, we present a new scheduling policy called Linger-Longer that exploits the fine-grained availability of workstations to run sequential and parallel jobs. We present a two-level workload characterization study and use it to simulate a cluster of workstations running our new policy. We compare two variations of our policy to two previous policies: Immediate-Eviction and Pause-and-Migrate. Our study shows that the Linger-Longer policy can improve the throughput of foreign jobs on a cluster by 60 percent with only a 0.5 percent slowdown of local jobs. For parallel computing, we show that the Linger-Longer policy outperforms reconfiguration strategies when the processor utilization by the local process is 20 percent or less in both synthetic bulk synchronous and real data-parallel applications 相似文献

13.

Queueing analysis of manufacturing systems with set ups

PETER B. LUH DEBRA J. HOITOMT 《International journal of systems science》2013,44(4):699-714

A manufacturing system with one server (machine), two classes of jobs, finite buffer sizes, and non-negligible set-up times is analysed. A cycling service discipline with a ’wait and see‘ set-up policy is invoked by the server. The state of the machine, whether in set up or in processing, is explicitly considered in the model. Utilization, mean queue length, and cycle time are derived using markovian analysis, first under a fixed lot size environment and then with random lot sizes. With lot sizes fixed, increasing arrival rates, set-up times, and service times generally increase the utilization, cycle time, and queue lengths. When lot sizes are random, there are negligible variations in the cycle time result, with somewhat smaller queue lengths compared to the fixed lot size case. An approximate mean value analysis is also developed and results are compared to those of the markovian analysis. Numerical studies show the robustness of the mean value analysis for most of the system parameters tested. 相似文献

14.

An exact algorithm for parallel machine scheduling with conflicts

Daniel Kowalczyk Roel Leus 《Journal of Scheduling》2017,20(4):355-372

We consider an extension of classic parallel machine scheduling where a set of jobs is scheduled on identical parallel machines and an undirected conflict graph is part of the input. Each node in the graph represents a job, and an edge implies that its two jobs are conflicting, meaning that they cannot be scheduled on the same machine. The goal is to find an assignment of the jobs to the machines such that the maximum completion time (makespan) is minimized. We present an exact algorithm based on branch and price that combines methods from bin packing, scheduling, and graph coloring, with appropriate modifications. The algorithm has a good computational performance even for parallel machine scheduling without conflicting jobs. 相似文献

15.

A simulation comparison of process and cellular layouts in a dual resource constrained environment 总被引：1，自引：0，他引：1

John S. Morris Richard J. Tersine 《Computers & Industrial Engineering》1994,26(4):733-741

Cellular manufacturing is an integral part of a comprehensive Group Technology program designed to improved the productivity of batch production systems. It has been suggested that shorter throughput times and accompanying reductions in work-in-process inventory are possible due to the inherent flexibility of cellular layouts with respect to worker scheduling. This research examines the impact of a dual resource (labor and equipment) constrained shop on the relative performance of cell layouts vis-a-vis process layouts. In addition, three operator scheduling rules are tested in the cellular layout. The first rule assigns operators on a first come-first served basis to jobs competing for an operator's services. The second rule requires operators in a cell to select a job from the machine queue with the longest queue of jobs. The third rule has operators remaining at a machine queue until it is empty. In the process layout, operators are assinged to waiting jobs on a first come-first served basis. In the initial experiment, the process layout outperformed the cellular layout on both work-in-process levels and throughout time. Additional experiments investigated the sensitivity of the initial results to changes in shop congestion. The process layout outperformed the cellular layout in all of the experiments. The results may be attributed to lower machine and labor utilization in the cellular layout from the dedication of equipment to limited part families. 相似文献

16.

Design and Potential Performance of Goal-Oriented Job Scheduling Policies for Parallel Computer Workloads 总被引：1，自引：0，他引：1

Chiang Su-Hui Vasupongayya Sangsuree 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(12):1642-1656

To balance multiple scheduling performance requirements on parallel computer systems, traditional job schedulers use many parameters that can be configured to define job or queue priorities. Offering many parameters seems flexible, but in reality tuning the values for the parameters is highly challenging. To simplify the task of resource management, we propose goal-oriented policies, which allow system administrators to specify high-level performance objectives, rather than tuning low-level scheduling parameters. We study the design of goal-oriented policies, including (1) appropriate multi-objective models for specifying trade-offs between objectives, (2) efficient search algorithms for searching the best schedule at each scheduling decision point, and (3) appropriate performance measures to be optimized in the objectives with respect to two common performance requirements: preventing starvation and favoring shorter jobs. We compare goal-oriented policies with widely used backfill policies. Policies are evaluated by simulation using ten monthly workloads that ran on a Linux cluster (IA-64) from NCSA. Our results show that by automatically optimizing performance according to the given objectives through search, goal-oriented policies can simultaneously outperform FCFS-backfill and LXF-backfill, which are designed in favor of the maximum wait and average slowdown, respectively. 相似文献

17.

Provably Efficient Online Nonclairvoyant Adaptive Scheduling

Yuxiong He Wen-Jing Hsu Leiserson C.E. 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(9):1263-1279

Multiprocessor scheduling in a shared multiprogramming environment can be structured in two levels, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler maps the ready threads of a job onto the allotted processors. We present two provably-efficient two-level scheduling schemes called G-RAD and S-RAD respectively. Both schemes use the same job scheduler RAD for the processor allotments that ensures fair allocation under all levels of workload. In G-RAD, RAD is combined with a greedy thread scheduler suitable for centralized scheduling; in S-RAD, RAD is combined with a work-stealing thread scheduler more suitable for distributed settings. Both G-RAD and S-RAD are non-clairvoyant. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. We also analyze the competitiveness of both G-RAD and S-RAD with respect to an optimal clairvoyant scheduler. In terms of makespan, both schemes can achieve O(1)-competitiveness for any set of jobs with arbitrary release time. In terms of mean response time, both schemes are O(1)-competitive for arbitrary batched jobs. To the best of our knowledge, G-RAD and S-RAD are the first non-clairvoyant scheduling algorithms that guarantee provable efficiency, fairness and minimal overhead. 相似文献

18.

PRNS加法中先行进位算法的研究

祝明腾虹《计算机研究与发展》1997,34(11):817-821

ＰＲＮＳ算法是运算器内部的并行机制。在ＰＲＮＳ数母全加器的研制中，找到了一种新的先行进位算法，提出了ＰＲＮＳ先行进行产生器的逻辑结构。相似文献

19.

A two-class parallel system with general service times of the parallel class

D. Filippopoulos H. Karatza 《Journal of Computer and System Sciences》2008,74(6):942-964

A parallel system with two job classes is analyzed. Type-1 jobs require one server for their execution and have exponentially distributed service times while type-2 jobs need two servers and have general service times. The model consists of a single queue served by two servers that may work either independently or in parallel. It is assumed that all jobs are rigid and share the servers according to pure space sharing. We provide closed-form expressions, exact as well as approximate, for various performance measures of interest. The approximate formula is found to be extremely accurate for various distributions of the service times of type-2 jobs. Furthermore, the maximal occupancy of the servers as well as the maximal throughput are examined. Finally, numerical results to investigate the impact of each parameter on system performance are conducted. 相似文献

20.

A new heuristic algorithm for cuboids packing with no orientation constraints

Wenqi Huang Kun He 《Computers & Operations Research》2009

The three-dimensional cuboids packing is NP-hard and finds many applications in the transportation industry. The problem is to pack a subset of cuboid boxes into a big cuboid container such that the total volume of the packed boxes is maximized. The boxes have no orientation constraints, i.e. they can be rotated by 90^°

90^{°}

in any direction. A new heuristic algorithm is presented that defines a conception of caving degree to judge how close a packing box is to those boxes already packed into the container, and always chooses a packing with the largest caving degree to do. The performance is evaluated on all the 47 related benchmarks from the OR-Library. Experiments on a personal computer show a high average volume utilization of 94.6% with an average computation time of 23 min for the strengthened A1 algorithm, which improves current best records by 3.6%. In addition, the top-10 A2 algorithm achieved an average volume utilization of 91.9% with an average computation time of 55 s, which also got higher utilization than current best records reported in the literature. 相似文献