Similar Documents
1.
The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper we present a new asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S₁, we show that the expected space requirement is S₁ + O(K·p·D) on p processors. Here, K is a user-adjustable runtime parameter that provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and evaluated its performance using a set of C-based benchmarks that have dynamic or irregular parallelism. We compare the performance of our scheduler with that of two previous schedulers: the thread library's original scheduler (which uses a FIFO queue), and a provably space-efficient depth-first scheduler. At a fine thread granularity, our scheduler outperforms both of these previous schedulers, but requires marginally more memory than the depth-first scheduler. We also present simulation results on synthetic benchmarks to compare our scheduler with space-efficient versions of both a work-stealing scheduler and a depth-first scheduler. The results indicate that, unlike these previous approaches, the new algorithm covers a range of scheduling granularities and space requirements, and allows the user to trade the space requirement of a program against the scheduling granularity. Received June 11, 2000, and in revised form March 19, 2001, and in final form August 7, 2001. Online publication April 5, 2002.
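The space bound hinges on the memory quantum K: a processor may run ahead of the depth-first order only until it has allocated roughly K bytes, which caps the extra space at O(K·p·D). The following is a minimal, self-contained Python toy of that idea only; the class names, the random allocation sizes, and the single shared pool are illustrative assumptions, not the paper's Pthreads implementation.

```python
import random
from collections import deque

class ToyThread:
    """A stand-in for a lightweight user-level thread (illustrative only)."""
    def __init__(self, depth):
        self.depth = depth                        # nesting depth in the computation
        self.remaining = random.randint(1, 5)     # quanta left before completion

    def step(self):
        """Run one quantum; return the number of bytes it 'allocated'."""
        self.remaining -= 1
        return random.randint(100, 1000)

    @property
    def done(self):
        return self.remaining <= 0

def worker_loop(pool, K=4096, max_steps=1000):
    """One worker: run depth-first locally, but after allocating more than K
    bytes hand outstanding work back to the depth-first-ordered shared pool."""
    local = deque()
    allocated = 0
    for _ in range(max_steps):
        if not local:
            if not pool:
                return                            # no work anywhere
            local.append(pool.pop(0))             # take the shallowest ready thread
        t = local.pop()                           # depth-first: newest first
        allocated += t.step()
        if t.done:
            continue
        if allocated > K:                         # memory quantum exhausted:
            pool.append(t)                        # return the work to the pool,
            pool.sort(key=lambda th: th.depth)    # kept ordered by depth
            allocated = 0
        else:
            local.append(t)

pool = [ToyThread(depth=d) for d in range(8)]
worker_loop(pool, K=2048)
```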

2.
Latency-rate (LR) schedulers have shown their ability to provide fair and weighted sharing of bandwidth with an upper bound on the delivery latency of packets, while earliest deadline first (EDF) schedulers have shown their ability to provide LR-decoupled service, in which the delivery latency of packets is not bounded by the reserved rate. However, EDF schedulers require traffic shapers to ensure flow protection. We propose quantum-based earliest deadline first scheduling (QEDF), a quantum-based scheduler that provides flow protection, throughput guarantees and delay-bound guarantees for flows that require LR-coupled as well as LR-decoupled types of reservation. It classifies flows into time-critical (TC), jitter-sensitive (JS), and rate-based (RB) classes and uses a quality-of-service forwarding rule to determine the next packet to be serviced by the scheduler. It provides nonpreemptive priority service to TC queues, which allows LR-decoupled reservations for flows that have a low rate but cannot tolerate delay. Packets from JS queues can be delayed by other packets if forwarding the latter will not cause the former to miss their deadline. As a quantum-based scheduler, the QEDF scheduler provides throughput guarantees for RB queues. We present both analytical and simulation results for QEDF, evaluating its deployment as a single-class as well as a multiservice scheduler.
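A heavily simplified sketch of a class-based forwarding rule in this spirit (not the exact QEDF rule; the queue layout, deadline fields, and link-rate arithmetic are assumptions made for illustration): TC packets get non-preemptive priority, and an RB packet is forwarded only if transmitting it first would not make the head JS packet miss its deadline.

```python
from collections import deque

def next_packet(tc, js, rb, now, link_rate):
    """Pick the next packet from time-critical (tc), jitter-sensitive (js) and
    rate-based (rb) queues. Each packet is a dict with 'size' (bits) and, for JS
    packets, a 'deadline' (seconds). Purely illustrative, not the QEDF spec."""
    if tc:                                   # non-preemptive priority for TC
        return tc.popleft()
    if js and rb:
        rb_time = rb[0]['size'] / link_rate  # time to forward the RB packet first
        js_time = js[0]['size'] / link_rate
        if now + rb_time + js_time <= js[0]['deadline']:
            return rb.popleft()              # JS head can still meet its deadline
        return js.popleft()
    if js:
        return js.popleft()
    if rb:
        return rb.popleft()
    return None

# Example: a JS packet with a tight deadline pre-empts the RB queue's turn.
tc = deque()
js = deque([{'size': 12000, 'deadline': 0.002}])
rb = deque([{'size': 12000}])
print(next_packet(tc, js, rb, now=0.0, link_rate=10e6))
```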

3.
Cloud computing, an important source of computing power for the scientific community, requires enhanced tools for the efficient use of resources. Current solutions for workflow execution lack frameworks that deeply analyze applications and consider realistic execution times as well as computation costs. In this study, we propose cloud user–provider affiliation (CUPA) to guide workflow owners in identifying the tools required to get their applications running. Additionally, we develop PSO-DS, a specialized scheduling algorithm based on particle swarm optimization. CUPA encompasses the interaction of the cloud resources, the workflow management system and the scheduling algorithm. Its featured scheduler, PSO-DS, converges to a strategic distribution of tasks among resources that efficiently optimizes makespan and monetary cost. We compared the performance of PSO-DS against four well-known scientific workflow schedulers. In a test bed based on VMware vSphere, the schedulers mapped five up-to-date benchmarks representing different scientific areas. PSO-DS proved its efficiency by reducing the makespan and monetary cost of the tested workflows by 75 and 78%, respectively, compared with the other algorithms. CUPA, with its featured scheduler PSO-DS, opens the path to developing a full system in which scientific cloud users can run their computationally expensive experiments.
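As a rough illustration of the kind of search such a particle swarm scheduler performs (the runtime matrix, VM prices, and weights below are made-up assumptions, and a real scheduler like PSO-DS also models data transfers and workflow dependencies), a particle encodes a task-to-VM mapping and the fitness blends makespan with monetary cost:

```python
import random

# Made-up instance: runtime[t][v] = execution time of task t on VM v,
# price[v] = cost per time unit of VM v.
runtime = [[6, 3, 2], [5, 4, 3], [4, 8, 6], [7, 2, 5], [3, 3, 9]]
price = [0.5, 1.0, 2.0]
n_tasks, n_vms = len(runtime), len(price)

def fitness(mapping, w_time=0.5, w_cost=0.5):
    """Weighted sum of makespan and monetary cost for a task -> VM mapping."""
    load = [0.0] * n_vms
    cost = 0.0
    for t, v in enumerate(mapping):
        load[v] += runtime[t][v]
        cost += runtime[t][v] * price[v]
    return w_time * max(load) + w_cost * cost

def pso(n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5):
    # Continuous positions in [0, n_vms); rounding/clamping yields a discrete mapping.
    decode = lambda p: [min(n_vms - 1, max(0, int(round(x)))) for x in p]
    pos = [[random.uniform(0, n_vms - 1) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = pbest_fit.index(min(pbest_fit))
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(decode(pos[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit

print(pso())   # best task -> VM mapping found and its fitness
```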

4.
Several studies have shown that Asymmetric Multicore Processor (AMP) systems, which are composed of processors with different hardware characteristics, deliver better performance and power than homogeneous systems. With Moore's law still holding, core-count growth leads to the now-typical non-uniform memory access (NUMA) organization. Existing schedulers assume that the underlying architecture is homogeneous and, as a consequence, may not be well suited to AMP and NUMA systems: they neither properly exploit the asymmetry of the hardware elements nor improve memory utilization by avoiding data starvation among processes. In this paper we propose a new scheduler, the NUMA-aware Scheduler, to accommodate the next generation of AMP architectures with respect to both architectural asymmetry and process starvation. Experimental results on the PARSEC benchmarks show an average speedup of 1.36 over the default Linux scheduler, demonstrating that the proposed technique is promising when compared with prior studies.

5.
Rapid advances in sensing and communication technologies connect isolated manufacturing units, generating large amounts of data. The new trend of mass customization brings a higher level of disturbance and uncertainty to production planning. Traditional manufacturing systems analyze data and schedule orders in a centralized architecture, which is inefficient and unreliable because of the overdependence on central controllers and limited communication channels. Internet of things (IoT) and cloud technologies make it possible to build distributed manufacturing architectures such as the multi-agent system (MAS). Recently, artificial intelligence (AI) methods have been used to solve scheduling problems in manufacturing settings. However, it is difficult for scheduling algorithms to process high-dimensional data in a distributed system with heterogeneous manufacturing units. Therefore, this paper presents a new cyber-physical integration in smart factories for the online scheduling of low-volume, high-mix orders. First, manufacturing units are interconnected with each other through the cyber-physical system (CPS) by IoT technologies. Attributes of machining operations are stored and transmitted by radio frequency identification (RFID) tags. Second, we propose an AI scheduler with novel neural networks for each unit (e.g., warehouse, machine) to schedule dynamic operations with real-time sensor data. Each AI scheduler can collaborate with other schedulers by learning from their scheduling experiences. Third, we design new reward functions to improve the decision-making abilities of the multiple AI schedulers based on reinforcement learning (RL). The proposed methodology is evaluated and validated in a smart factory through real-world case studies. Experimental results show that the new architecture for smart factories not only improves the learning and scheduling efficiency of multiple AI schedulers but also effectively deals with unexpected events such as rush orders and machine failures.
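A bare-bones sketch of the reinforcement-learning loop behind such an AI scheduler (the state encoding, the negative-delay reward, and the tabular Q-learning update are simplifying assumptions; the paper uses neural networks over real-time sensor data):

```python
import random
from collections import defaultdict

# Tabular Q-learning for dispatching: state = (queue-length bucket, machine-busy flag),
# action = which machine receives the next operation. Illustrative only.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_MACHINES = 2
Q = defaultdict(lambda: [0.0] * N_MACHINES)

def choose_action(state):
    if random.random() < EPSILON:                               # explore
        return random.randrange(N_MACHINES)
    return max(range(N_MACHINES), key=lambda a: Q[state][a])    # exploit

def update(state, action, reward, next_state):
    """Standard Q-learning update; the reward is, e.g., negative tardiness or
    the negative increase in makespan, so the agent learns to minimize it."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# Toy interaction: dispatch one operation and learn from the observed delay.
state = (3, 1)                    # e.g. 3 jobs waiting, machine 0 busy
action = choose_action(state)
observed_delay = 2.5              # measured after execution (made-up number)
update(state, action, reward=-observed_delay, next_state=(2, 0))
```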

6.
The use of virtualization technology (VT) has become widespread in modern datacenters and Clouds in recent years. In spite of its many advantages, such as the provisioning of isolated execution environments and migration, current implementations of VT do not provide effective performance isolation between virtual machines (VMs) running on a physical machine (PM), owing to workload interference between VMs. Generally, this interference is due to contention on physical resources and affects performance in different workload configurations. To investigate the impact of this interference, we formalize the concept of interference for a consolidated multi-tenant virtual environment. This formulation, expressed as a mathematical model, can be used by schedulers to estimate the interference of a consolidated virtual environment in terms of the processing and networking workloads of the running VMs and the number of consolidated VMs. Based on the proposed model, we present a novel batch scheduler that reduces the interference of running tenant VMs by pausing the VMs that have the greatest impact on the proliferation of interference. The scheduler achieves this by selecting, with a 0–1 knapsack problem solver, the set of VMs that produces the least interference; the selected VMs are allowed to run and the other VMs are paused. Users are not troubled by the brief pausing and resumption of VMs because the scheduler is designed for the execution of batch-type applications such as scientific applications. Evaluation of the makespan of VMs executed under the control of our scheduler shows nearly 33% improvement in the best case and 7% improvement in the worst case compared with running all VMs concurrently. In addition, the results show that our scheduling algorithm outperforms serial and random scheduling of VMs.
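The VM-selection step can be framed as a 0–1 knapsack: each VM has an estimated interference contribution (from the model) and a value (e.g., its priority or expected progress), and the scheduler keeps the most valuable subset whose total interference stays under a budget. Below is a minimal dynamic-programming sketch with made-up scores; the paper's model derives interference from the VMs' processing and networking loads, which this toy does not reproduce.

```python
def select_vms(vms, interference_budget):
    """0-1 knapsack: vms is a list of (name, interference, value); return the
    subset with maximum total value whose total interference <= budget.
    Interference scores are assumed to be scaled to integers for the DP table."""
    n, W = len(vms), interference_budget
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i, (_, w, v) in enumerate(vms, start=1):
        for c in range(W + 1):
            dp[i][c] = dp[i - 1][c]
            if w <= c:
                dp[i][c] = max(dp[i][c], dp[i - 1][c - w] + v)
    # Backtrack to recover which VMs run; the rest are paused.
    chosen, c = [], W
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            name, w, _ = vms[i - 1]
            chosen.append(name)
            c -= w
    return chosen

# Made-up interference scores and values; a budget of 12 "interference units".
vms = [("vm1", 4, 7), ("vm2", 6, 6), ("vm3", 5, 9), ("vm4", 3, 4)]
print(select_vms(vms, interference_budget=12))   # VMs allowed to run now
```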

7.
While high-performance architectures have included some Instruction-Level Parallelism (ILP) for at least 25 years, recent computer designs have exploited ILP to a significant degree. Although a local scheduler is not sufficient for the generation of excellent ILP code, it is necessary, as many global scheduling and software pipelining techniques rely on a local scheduler. Global scheduling techniques are well documented, yet practical discussions of local schedulers are notably absent. This paper strives to remedy that disparity by describing a list scheduling framework and several important practical details that, taken together, allow implementation of an efficient local instruction scheduler that is easily retargetable to ILP architectures. The foundation of our machine-independent instruction scheduler is a timing model that allows easy retargeting to a wide range of architectures. In addition to describing how a general list scheduler can be implemented within the framework of our timing model, experimental results indicate that lookahead scheduling can profoundly improve a scheduler's ability to produce a legal schedule. Further experimental data show that the decision to schedule a data dependence DAG (DDD) in forward or reverse order depends significantly upon the target architecture, suggesting the possibility of scheduling in each direction and using the better of the two schedules. In contrast, experiments demonstrate little difference in code quality between schedules generated by instruction-driven and operation-driven schedulers; the inherent flexibility of operation-driven methods thus suggests including that approach in a retargetable instruction scheduler. List scheduling is, of course, a heuristic scheduling method, and a variety of scheduling heuristics are presented. In addition, the paper describes a method, using a genetic algorithm search, to 'fine-tune' the weights of twenty-four individual heuristics to form a DDD-node heuristic tuned to a specific architecture. © 1998 John Wiley & Sons, Ltd.
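For readers unfamiliar with the basic machinery the paper builds on, here is a minimal cycle-driven list scheduler over a data dependence DAG. The single-issue machine, fixed per-node latencies, and critical-path-height priority are simplifying assumptions; the paper's framework is retargetable via a full timing model and a tunable mix of heuristics.

```python
def critical_height(dag, latency, node, memo=None):
    """Length of the longest latency-weighted path from node to any leaf."""
    memo = {} if memo is None else memo
    if node not in memo:
        succs = dag.get(node, [])
        memo[node] = latency[node] + (max(critical_height(dag, latency, s, memo)
                                          for s in succs) if succs else 0)
    return memo[node]

def list_schedule(dag, latency):
    """dag: node -> list of successors; latency: node -> cycles.
    Returns (issue cycle per node, schedule length) for a single-issue machine."""
    preds = {n: set() for n in latency}
    for n, succs in dag.items():
        for s in succs:
            preds[s].add(n)
    heights = {n: critical_height(dag, latency, n) for n in latency}
    issue, done_at, cycle = {}, {}, 0
    while len(issue) < len(latency):
        # Ready = all predecessors issued and their results available this cycle.
        ready = [n for n in latency if n not in issue
                 and all(p in issue and done_at[p] <= cycle for p in preds[n])]
        if ready:
            pick = max(ready, key=lambda n: heights[n])   # greatest critical height first
            issue[pick] = cycle
            done_at[pick] = cycle + latency[pick]
        cycle += 1                                        # single-issue: one slot per cycle
    return issue, max(done_at.values())

# Tiny example: a load feeding an add feeding a store, plus an independent add.
dag = {"load": ["add1"], "add1": ["store"], "add2": [], "store": []}
latency = {"load": 3, "add1": 1, "add2": 1, "store": 1}
print(list_schedule(dag, latency))
```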

8.
We present an approach to designing cellular automata-based multiprocessor scheduling algorithms in which knowledge about the scheduling process is extracted. We consider the simplest case, in which the multiprocessor system is limited to two processors. To design cellular automata corresponding to a given program graph, we propose a generic definition of program graph neighborhood that is transparent to the various kinds, sizes, and shapes of program graphs. The cellular automata-based scheduler works in two modes: learning mode and operation mode. Discovered rules are typically suitable for sequential cellular automata working as a scheduler, while the most interesting and promising feature of cellular automata is their massive parallelism. To overcome the difficulties of evolving parallel cellular automata rules, we propose using a coevolutionary genetic algorithm. Rules discovered in this way enable us to design effective parallel schedulers. We present a number of experimental results for both sequential and parallel scheduling algorithms discovered in the context of a cellular automata-based scheduling system.

9.
We present a framework, called the meta scheduler, for implementing real-time scheduling algorithms. The meta scheduler is a portable middleware-layer component designed for implementation over POSIX-compliant operating systems. It accommodates pluggable real-time scheduling algorithms while offering the flexibility of platform independence: the single underlying OS requirement is the now nearly ubiquitous POSIX compliance. The versatility of pluggable schedulers positions the meta scheduler for deployment in an interoperable, heterogeneous real-time environment. We present the design of the meta scheduler and outline its implementation. Furthermore, we present a mechanized correctness verification using the UPPAAL model checker. A prototype implementation of the meta scheduler over the QNX Neutrino real-time operating system demonstrates high performance and a small footprint.
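The core idea of a pluggable middleware scheduler can be pictured as a small dispatch interface. The sketch below is a Python analogy under assumed names (the actual meta scheduler sits on POSIX threads and priorities, which are not modeled here): the meta layer owns the ready queue and delegates the ordering policy to whichever plugin is loaded.

```python
from abc import ABC, abstractmethod

class SchedulingPolicy(ABC):
    """Pluggable ordering policy; the meta layer never looks inside it."""
    @abstractmethod
    def pick_next(self, ready):
        """Return the task from 'ready' that should run next."""

class EDFPolicy(SchedulingPolicy):
    def pick_next(self, ready):
        return min(ready, key=lambda t: t["deadline"])    # earliest deadline first

class RateMonotonicPolicy(SchedulingPolicy):
    def pick_next(self, ready):
        return min(ready, key=lambda t: t["period"])      # shortest period first

class MetaScheduler:
    """Platform-independent layer: owns the ready queue, delegates ordering."""
    def __init__(self, policy: SchedulingPolicy):
        self.policy = policy
        self.ready = []

    def release(self, task):
        self.ready.append(task)

    def dispatch(self):
        if not self.ready:
            return None
        task = self.policy.pick_next(self.ready)
        self.ready.remove(task)
        return task      # in a real system: boost this thread's OS priority

sched = MetaScheduler(EDFPolicy())
sched.release({"name": "sensor", "deadline": 10, "period": 20})
sched.release({"name": "control", "deadline": 5, "period": 50})
print(sched.dispatch()["name"])    # -> "control"
```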

10.
It is customary to use open-system, trace-driven simulations to evaluate the performance of parallel-system schedulers. As a consequence, all schedulers have evolved to optimize the packing of jobs in the schedule as a means to improve a number of performance metrics that are conjectured to be correlated with user satisfaction, with the premise that this will result in higher productivity in reality. We argue that these simulations suffer from severe limitations that lead to suboptimal scheduler designs and even to dismissing potentially good design alternatives. We propose an alternative simulation methodology called site-level simulation, in which the workload for the evaluation is generated dynamically by user models that interact with the system. We present a novel scheduler called CREASY that exploits knowledge of user behavior to directly improve user satisfaction, and compare its performance to the original packing-based EASY scheduler. We show that user productivity improves by up to 50 percent under the user-aware design, while according to the conventional metrics performance may actually degrade.

11.
An algorithm has been developed to dynamically schedule heterogeneous tasks on heterogeneous processors in a distributed system. The scheduler operates in an environment with dynamically changing resources and adapts to variable system resources. It operates in a batch fashion and utilises a genetic algorithm to minimise the total execution time. We have compared our scheduler to six other schedulers: three batch-mode and three immediate-mode schedulers. Experiments show that the algorithm outperforms each of the others and can achieve near-optimal efficiency, with up to 100,000 tasks being scheduled.

12.
In this paper, we present the Cello disk scheduling framework for meeting the diverse service requirements of applications. Cello employs a two-level disk scheduling architecture, consisting of a class-independent scheduler and a set of class-specific schedulers. The two levels of the framework allocate disk bandwidth at two time scales: the class-independent scheduler governs the coarse-grain allocation of bandwidth to application classes, while the class-specific schedulers control the fine-grain interleaving of requests. The two levels of the architecture separate application-independent mechanisms from application-specific scheduling policies, and thereby facilitate the coexistence of multiple class-specific schedulers. We demonstrate that Cello is suitable for next-generation operating systems since: (i) it aligns the service provided with the application requirements, (ii) it protects application classes from one another, (iii) it is work-conserving and can adapt to changes in workload, (iv) it minimizes the seek time and rotational latency overhead incurred during access, and (v) it is computationally efficient.

13.
The selection of the right I/O scheduler for a given workload can significantly improve I/O performance. However, this is not an easy task because several factors should be considered, and even the "best" scheduler can change over time, especially if the workload's characteristics change too. To address this problem, we present a Dynamic and Automatic Disk Scheduling framework (DADS) that simultaneously compares two different Linux I/O schedulers and dynamically selects the one that achieves the best I/O performance for any workload at any time. The comparison is made by running two instances of a disk simulator inside the Linux kernel. Results show that, by using DADS, the performance achieved is always close to that obtained by the best scheduler. Thus, system administrators are exempted from selecting a suboptimal scheduler that provides good performance for some workloads but may degrade system throughput when the workloads change.

14.
Proportionate fair schedulers provide an effective methodology for scheduling recurrent real-time tasks on multiprocessors. However, a drawback of these schedulers is that they ignore a task's affinity for the processor on which it was executed last, causing frequent inter-processor task migrations that ultimately result in increased execution times. This paper presents the Partition Oriented Frame Based Fair Scheduler (POFBFS), an efficient proportional fair scheduler for periodic firm and soft real-time tasks that ensures a bounded number of task migrations. Experimental results reveal that POFBFS achieves a 3- to 100-fold reduction in the number of migrations with respect to the General-ERfair algorithm (for sets of 25 to 100 tasks running on 2 to 8 processors) while simultaneously maintaining high fairness accuracy.

15.
To efficiently execute tasks without prior knowledge in big-data analysis systems such as Hadoop and Spark, we design CRWScheduler, a task scheduler based on cumulative resource work (CRW). The scheduler switches jobs between a low-weight queue and a high-weight queue according to their CRW; when allocating resources to a job, it considers both the queue the job currently resides in and its instantaneous resource usage, significantly improving system performance without requiring any prior knowledge of the job. A prototype of CRWScheduler was implemented on Apache Hadoop YARN. Experiments on a 28-node benchmark cluster show that, compared with YARN's fair scheduler, the average job flow time (JFT) is reduced by 21% and the 95th-percentile JFT by up to 35%, with further performance gains obtained when cooperating with task-level schedulers.
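A stripped-down sketch of the CRW idea (the threshold, queue weights and usage discount below are invented for illustration; the actual CRWScheduler is implemented inside YARN): every job starts in the high-weight queue and is demoted once its cumulative resource work exceeds a threshold, so small jobs finish quickly without any prior knowledge of job sizes.

```python
HIGH, LOW = "high", "low"
CRW_THRESHOLD = 100.0            # cumulative-work threshold (invented units)
QUEUE_WEIGHT = {HIGH: 3.0, LOW: 1.0}

class Job:
    def __init__(self, name):
        self.name = name
        self.crw = 0.0           # cumulative resource work so far
        self.queue = HIGH        # every job starts in the high-weight queue
        self.usage = 0.0         # instantaneous resource occupancy

    def account(self, work):
        """Charge 'work' (e.g. container-seconds) and demote if CRW grows too large."""
        self.crw += work
        if self.queue == HIGH and self.crw > CRW_THRESHOLD:
            self.queue = LOW

def share(jobs):
    """Allocate the next unit of resources in proportion to queue weight,
    discounted by how much each job is already holding."""
    scores = {j.name: QUEUE_WEIGHT[j.queue] / (1.0 + j.usage) for j in jobs}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

a, b = Job("small-query"), Job("batch-analytics")
b.account(500.0)                 # the big job has already done a lot of work
print(share([a, b]))             # the small job now gets the larger share
```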

16.
Decay usage scheduling is a priority- and usage-based approach to CPU allocation in which preference is given to processes that have consumed little CPU in the recent past. The author develops an analytic model for decay usage schedulers running compute-bound workloads, such as those found in many engineering and scientific environments; the model is validated against measurements of a Unix system. The model is used in two ways. First, ways to parameterize decay usage schedulers are studied so as to achieve a wide range of service rates; doing so requires a fine granularity of control and a large range of control. The results show that, for a fixed representation of process priorities, a larger range of control makes the granularity of control coarser, and a finer granularity of control decreases the range of control. A second use of the analytic model is to construct a low-overhead algorithm for achieving service-rate objectives. Existing approaches require adding a feedback loop to the scheduler; this overhead is avoided by exploiting the feedback already present in decay usage schedulers. Using both empirical and analytical techniques, it is shown that the algorithm is effective and that it provides fairness when the system is over- or under-loaded.
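For concreteness, here is a simplified decay usage rule of the classic 4.3BSD flavor (the constants and exact decay factor are illustrative assumptions, not the paper's calibrated model): CPU usage accumulates while a process runs, is decayed periodically, and feeds directly into the priority value, so recent heavy users sink in priority.

```python
def charge_tick(usage, running):
    """Called every clock tick: the running process accumulates CPU usage."""
    return usage + 1 if running else usage

def decay(usage, load=1.0, nice=0):
    """Called once per second: geometrically decay past usage (BSD-style factor)."""
    return usage * (2.0 * load) / (2.0 * load + 1.0) + nice

def priority(usage, base=50, nice=0):
    """Lower value = higher priority; heavy recent CPU users sink."""
    return base + usage / 4.0 + 2 * nice

# Two processes over one second: one ran for every tick, the other slept.
hog, idle = 0.0, 0.0
for _ in range(100):                   # 100 ticks in the second
    hog = charge_tick(hog, running=True)
    idle = charge_tick(idle, running=False)
hog, idle = decay(hog), decay(idle)
print(priority(hog), priority(idle))   # the hog now has the worse (larger) value
```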

17.
In recent years, scheduling at the granularity of coflows has become a new focus for improving data center networks. However, existing information-agnostic coflow schedulers have difficulty inferring task-level information quickly, so small tasks are not scheduled in time and the average task completion time cannot be minimized. Data center networks therefore need more efficient inference models to improve the accuracy and sensitivity of coflow size estimation. We propose a machine-learning-based coflow size inference model (MLcoflow)...

18.
Each job scheduler in a large decentralized load-balancing system generally must consider whether it is advantageous to offload jobs to remote computation servers when the local load is too high. Although processing power may appear to be available at a very distant server, two problems arise due to the transmission delay between the scheduler and the server. Predictably, the response time of the job is adversely affected as the job spends valuable time in transit, but a more subtle problem involves the value, or reliability, of the state information regarding job queues. The longer the delay between scheduler and server, the less a scheduler should value the state information of that server (given that the state changes over time). We examine the performance of schedulers in topologies with different average proximity and present a probabilistic algorithm that allows schedulers to dynamically form efficient clusters in the network.

19.
Based on an analysis of existing resource scheduling schemes and models, a hierarchical three-layer scheduling model for grid resources is proposed. It consists of a main scheduler, secondary schedulers, and compute nodes. According to the nature and requirements of the tasks, and taking the execution status of the lower-level secondary schedulers into account, the main scheduler distributes a portion of the tasks to the secondary schedulers, enabling the main scheduler and the secondary schedulers to work in parallel. A round-robin task distribution strategy is proposed on top of this model. Analysis and simulation show that the proposed resource scheduling model and task distribution strategy clearly outperform centralized scheduling schemes in scheduling performance.
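A minimal sketch of the three-layer structure and the round-robin distribution strategy (treating only the queue as the "execution status" signal is an assumption made for illustration): the main scheduler hands tasks to secondary schedulers in turn, and each secondary scheduler forwards them to its own compute nodes, so the two levels can work in parallel.

```python
from collections import deque
from itertools import cycle

class SecondaryScheduler:
    def __init__(self, name, nodes):
        self.name = name
        self.queue = deque()
        self.nodes = cycle(nodes)          # its own compute nodes, also served in turn

    def submit(self, task):
        self.queue.append(task)

    def dispatch(self):
        """Forward queued tasks to compute nodes (can run in parallel with the main scheduler)."""
        while self.queue:
            task = self.queue.popleft()
            print(f"{self.name}: task {task} -> node {next(self.nodes)}")

class MainScheduler:
    def __init__(self, secondaries):
        self.turn = cycle(secondaries)     # round-robin distribution strategy

    def distribute(self, tasks):
        for task in tasks:
            # A fuller model would also weigh task requirements and the execution
            # status (e.g. queue length) reported by each secondary scheduler.
            next(self.turn).submit(task)

s1 = SecondaryScheduler("site-A", ["a1", "a2"])
s2 = SecondaryScheduler("site-B", ["b1", "b2", "b3"])
main = MainScheduler([s1, s2])
main.distribute(range(6))
s1.dispatch(); s2.dispatch()
```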

20.
Providing QoS with the Deficit Table Scheduler
A key component of networks with Quality of Service (QoS) support is the egress link scheduling algorithm. An ideal scheduling algorithm implemented in a high-performance network with QoS support should satisfy two main properties: good end-to-end delay and implementation simplicity. Table-based schedulers try to offer a simple implementation and good latency bounds, and some of the latest proposals for network technologies, such as Advanced Switching and InfiniBand, include one of these schedulers in their specifications. However, these table-based schedulers do not work properly with variable packet sizes, which are the usual case in current network technologies. We have proposed a new table-based scheduler, the Deficit Table (DTable) scheduler, that works properly with variable packet sizes. Moreover, we have proposed a methodology to configure this table-based scheduler in such a way that the bandwidth and latency assignments are decoupled. In this paper, we thoroughly review the provision of QoS with the DTable scheduler and our configuration methodology, and evaluate the performance of our proposals in a multimedia scenario. Simulation results show that our proposals are able to provide latency performance similar to that of more complex scheduling algorithms. Moreover, we show the advantages of our decoupling configuration methodology over the usual ways of configuring this kind of table-based scheduler.
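To make the "deficit" mechanism concrete, here is a plain deficit-round-robin-style pass over a table of flows. This is a sketch only: the real DTable scheduler arranges table entries and weights so as to decouple bandwidth and latency, which this toy does not capture. Each visit adds the entry's quantum to its deficit, and a packet is sent only when the accumulated deficit covers its size, which is what makes variable packet sizes safe.

```python
from collections import deque

def dtable_round(table, send):
    """One pass over the arbitration table. Each entry is a dict with a packet
    'queue' (packet sizes in bytes), a per-visit 'quantum', and a running 'deficit'."""
    for entry in table:
        if not entry["queue"]:
            entry["deficit"] = 0                 # empty queues do not bank credit
            continue
        entry["deficit"] += entry["quantum"]
        while entry["queue"] and entry["queue"][0] <= entry["deficit"]:
            size = entry["queue"].popleft()
            entry["deficit"] -= size
            send(entry["name"], size)

table = [
    {"name": "voice", "quantum": 300,  "deficit": 0, "queue": deque([200, 200])},
    {"name": "video", "quantum": 1500, "deficit": 0, "queue": deque([1400, 900])},
    {"name": "bulk",  "quantum": 500,  "deficit": 0, "queue": deque([1500, 1500])},
]
for _ in range(3):   # three table rounds
    dtable_round(table, send=lambda flow, size: print(f"sent {size}B from {flow}"))
```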
