
1.

Data‐aware task scheduling on heterogeneous hybrid memory multiprocessor systems

Junjie Chen Kenli Li Zhuo Tang Chubo Liu Yan Wang Keqin Li 《Concurrency and Computation》2016,28(17):4443-4459

In this paper, we propose a method for task scheduling and data assignment on heterogeneous hybrid memory multiprocessor systems for real‐time applications. In such a system, an important problem is how to schedule real‐time application tasks to processors and assign data to hybrid memories. The hybrid memory consists of dynamic random access memory and solid state drives, and the performance of the solid state drives is taken into account in the scheduling policy. To solve this problem, we propose two heuristic algorithms, called the improvement greedy algorithm and the data assignment according to the task scheduling algorithm, which generate a near‐optimal solution for real‐time applications in polynomial time. We evaluate the performance of our algorithms by comparing them with a greedy algorithm that is commonly used to solve the heterogeneous task scheduling problem. Based on our extensive simulation study, we observe that our algorithms exhibit excellent performance and demonstrate that considering data allocation in task scheduling is significant for saving energy. We conduct experiments on two heterogeneous multiprocessor systems. Copyright © 2016 John Wiley & Sons, Ltd.

2.

Space efficient execution of deterministic parallel programs

Simpson D.J. Burton F.W. 《IEEE transactions on pattern analysis and machine intelligence》1999,25(6):870-882

We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule “space efficient” if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism.
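The greedy property mentioned in this abstract (no processor stays idle while a task is ready) can be illustrated with a minimal sketch. This is plain greedy list scheduling, not the authors' locally depth-first algorithm, and it ignores memory entirely; the function name `greedy_schedule`, the task names, and the durations are all hypothetical.

```python
import heapq

def greedy_schedule(durations, preds, num_procs):
    """Greedy list scheduling of a task DAG (illustrative sketch only).

    durations: dict task -> execution time
    preds:     dict task -> list of tasks that must finish first
    Returns (start_times, makespan).
    """
    # Count unfinished predecessors and build successor lists.
    remaining = {t: len(preds.get(t, ())) for t in durations}
    succs = {t: [] for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []          # min-heap of (finish_time, task)
    start, time = {}, 0
    while ready or running:
        # Greedy rule: fill every free processor while tasks are ready.
        while ready and len(running) < num_procs:
            t = ready.pop(0)
            start[t] = time
            heapq.heappush(running, (time + durations[t], t))
        # Advance to the next completion and release its successors.
        time, done = heapq.heappop(running)
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return start, time

durations = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"c": ["a"], "d": ["a", "b"]}
start, makespan = greedy_schedule(durations, preds, num_procs=2)
```

In this small instance the critical path b → d has length 5, so no schedule can finish earlier than the one the sketch produces.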

3.

A new task scheduling method for distributed programs that require memory management

Hiroshi Koide Yuji Oie 《Concurrency and Computation》2006,18(9):941-958

In parallel and distributed applications, it is very likely that object‐oriented languages, such as Java and Ruby, and large‐scale semistructured data written in XML will be employed. However, because of their inherent dynamic memory management, parallel and distributed applications must sometimes suspend the execution of all tasks running on the processors. This adversely affects their execution on the parallel and distributed platform. In this paper, we propose a new task scheduling method called CP/MM (Critical Path/Memory Management) which can efficiently schedule tasks for applications requiring memory management. The underlying concept is to consider the cost due to memory management when the task scheduling system allocates ready (executable) coarse‐grain tasks, or macro‐tasks, to processors. We have developed three task scheduling modules, including CP/MM, for a task scheduling system which is implemented on a Java RMI (Remote Method Invocation) communication infrastructure. Our experimental results show that CP/MM can successfully prevent high‐priority macro‐tasks from being affected by the garbage collection arising from memory management, so that CP/MM can efficiently schedule distributed programs whose critical paths are relatively long. Copyright © 2005 John Wiley & Sons, Ltd.

4.

Minimizing write operation for multi-dimensional DSP applications via a two-level partition technique with complete memory latency hiding

《Journal of Systems Architecture》2015,61(2):112-126

Most scientific and digital signal processing (DSP) applications are recursive or iterative. The execution of these applications on a chip multiprocessor (CMP) encounters two challenges. First, as most DSP applications are both computation intensive and data intensive, an inefficient scheduling scheme may generate a huge number of write operations, cost a lot of time, and consume a significant amount of energy. Second, because CPU speed has increased dramatically compared with memory speed, the slowness of memory hinders overall system performance. In this paper, we develop a Two-Level Partition (TLP) algorithm that can minimize write operations while achieving full parallelism for multi-dimensional DSP applications running on CMPs which employ scratchpad memory (SPM) as on-chip memory (e.g., the IBM Cell processor). Experiments on DSP benchmarks demonstrate the effectiveness and efficiency of the TLP algorithm: it can completely hide memory latencies to achieve full parallelism, and it generates the fewest write operations to main memory compared with previous approaches. Experimental results show that our proposed algorithm is superior to all known methods, including the list scheduling, rotation scheduling, Partition Scheduling with Prefetching (PSP), and Iterational Retiming with Partitioning (IRP) algorithms. Furthermore, the TLP scheduling algorithm can reduce write operations to main memory by 45.35% and reduce the schedule length by 23.7% on average compared with the IRP scheduling algorithm, the best known algorithm.

5.

CPU scheduling and memory management for interactive real-time applications

Shinpei Kato Yutaka Ishikawa Ragunathan Rajkumar 《Real-Time Systems》2011,47(5):454-488

In this paper, we propose, design, implement, and evaluate a CPU scheduler and a memory management scheme for interactive soft real-time applications. Our CPU scheduler provides a new CPU reservation algorithm that is based on the well-known Constant Bandwidth Server (CBS) algorithm but is more flexible in allocating the CPU time to multiple concurrently-executing real-time applications. Our CPU scheduler also employs a new multicore scheduling algorithm, extending the Earliest Deadline First to yield Window-constraint Migrations (EDF-WM) algorithm, to improve the absolute CPU bandwidth available in reservation-based systems. Furthermore, we propose a memory reservation mechanism incorporating a new paging algorithm, called Private-Shared-Anonymous Paging (PSAP). This PSAP algorithm allows interactive real-time applications to be responsive under memory pressure without wasting and compromising the memory resource available for contending best-effort applications. Our evaluation demonstrates that our CPU scheduler enables the simultaneous playback of multiple movies at more stable frame rates than existing real-time CPU schedulers, while also improving the ratio of hard-deadline guarantees for randomly generated task sets. Furthermore, we show that our memory management scheme can protect the simultaneous playback of multiple movies from the interference introduced by memory pressure, whereas these movies can become unresponsive under existing memory management schemes.

6.

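The time-sharing EDF algorithm and its application in multimedia operating systems (cited 2 times in total: 0 self-citations, 2 by others)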

张怡 张拥军 彭宇行 陈福接 《计算机学报》2001,24(3):315-320

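We propose a new CPU scheduling algorithm, the time-sharing EDF (Earliest Deadline First) algorithm, which guarantees that hard real-time tasks do not miss their deadlines and is easy to implement in time-sharing systems. Building on time-sharing EDF, we propose a new hierarchical CPU scheduling algorithm, HRFSFQ, which guarantees the QoS of each class of task when used in a multimedia operating system. Finally, extensive experiments demonstrate the effectiveness and correctness of these algorithms.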
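As background for the EDF-based entries in this list, the sketch below simulates plain preemptive Earliest Deadline First on a single CPU in unit time steps. It is the textbook baseline only, not the time-sharing EDF or HRFSFQ algorithms of this paper; the function name `edf_schedule` and the job set are hypothetical.

```python
def edf_schedule(jobs):
    """Simulate preemptive EDF on one CPU with integer time steps.

    jobs: list of (name, release_time, execution_time, absolute_deadline)
    Returns (timeline, missed): the job run in each time slot (None = idle)
    and the set of jobs that finished after their deadline.
    """
    remaining = {name: wcet for name, _, wcet, _ in jobs}
    info = {name: (release, deadline) for name, release, _, deadline in jobs}
    timeline, missed, t = [], set(), 0
    while any(remaining.values()):
        # Jobs that have been released and still need CPU time.
        ready = [n for n, r in remaining.items() if r > 0 and info[n][0] <= t]
        if ready:
            # EDF rule: run the ready job with the earliest deadline.
            n = min(ready, key=lambda x: (info[x][1], x))
            remaining[n] -= 1
            timeline.append(n)
            if remaining[n] == 0 and t + 1 > info[n][1]:
                missed.add(n)
        else:
            timeline.append(None)
        t += 1
    return timeline, missed

jobs = [("A", 0, 3, 7), ("B", 1, 2, 4), ("C", 2, 1, 9)]
timeline, missed = edf_schedule(jobs)
```

Here job B preempts A at time 1 because its deadline (4) is earlier than A's (7), and all three jobs meet their deadlines.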

7.

Discovering Dispatching Rules Using Data Mining (cited 1 time in total: 0 self-citations, 1 by others)

Xiaonan Li Sigurdur Olafsson 《Journal of Scheduling》2005,8(6):515-527

This paper introduces a novel methodology for generating scheduling rules using a data-driven approach. We show how to use data mining to discover previously unknown dispatching rules by applying the learning algorithms directly to production data. This approach involves preprocessing of historic scheduling data into an appropriate data file, discovery of key scheduling concepts, and representation of the data mining results in a way that enables its use for job scheduling. We also consider how by using this new approach unexpected knowledge and insights can be obtained, in a manner that would not be possible if an explicit model of the system or the basic scheduling rules had to be obtained beforehand. All of our results are illustrated via numerical examples and experiments on simulated data.

8.

Compact DAG Representation and Its Dynamic Scheduling

Michel Cosnard Emmanuel Jeannot 《Journal of Parallel and Distributed Computing》1999,58(3):143

Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) How to schedule a task graph when it is too large to fit into memory? (2) How to build a generic program such that parameter values of a task graph can be given at run-time? Our answers feature the parameterized task graph (PTG), which is a symbolic representation of the task graph. We propose a dynamic scheduling algorithm which takes a PTG as input and allows us to generate a generic program. We present a theoretical study which shows that our algorithm finds good schedules for coarse-grain task graphs and has a low memory cost and a low computational complexity. When the average number of operations of each task is large enough, we prove that the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach using several compute-intensive kernels found in numerical scientific applications.

9.

Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU

Kai-Cheng Wei Xue Sun Hsun Chu Chao-Chin Wu 《The Journal of supercomputing》2017,73(11):4711-4738

General-purpose computing on graphics processing units (GPGPU) has been adopted to accelerate applications that require long execution times in various problem domains. Tabu Search, a meta-heuristic optimization method, has been used to find suboptimal solutions to NP-hard problems within a more reasonable time. In this paper, we investigate how to improve the performance of the Tabu Search algorithm on GPGPU, taking the permutation flow shop scheduling problem (PFSP) as the example for our study. In a recently proposed approach for solving PFSP by Tabu Search on GPU, all the job permutations are stored in global memory to eliminate branch divergence. Nevertheless, that algorithm requires a large amount of global memory space, and the resulting large number of global memory accesses degrades system performance. We propose a new approach to address the problem. The main contribution of this paper is an efficient multiple-loop struct that generates most of the permutation on the fly, which decreases the size of the permutation table and significantly reduces the number of global memory accesses. Computational experiments on problems from a benchmark suite for PFSP show that the best performance improvement of our approach is about 100% compared with the previous work.
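For readers unfamiliar with the PFSP, the sketch below shows the objective that a Tabu Search evaluates for each candidate permutation (the makespan) and one step over a swap neighbourhood. It is a CPU-side illustration under assumed data, with no tabu list and none of the paper's GPU permutation-table technique; `makespan`, `best_swap_neighbor`, and the processing times are hypothetical.

```python
from itertools import combinations

def makespan(perm, proc):
    """Completion time of the last job on the last machine.

    proc[j][m] is the processing time of job j on machine m; every job
    visits machines 0..M-1 in order, and all machines use the same job order.
    """
    num_machines = len(proc[0])
    finish = [0] * num_machines  # finish[m]: when machine m becomes free
    for j in perm:
        for m in range(num_machines):
            # Job j starts on machine m once the machine is free and the
            # job has left machine m-1.
            prev_stage = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], prev_stage) + proc[j][m]
    return finish[-1]

def best_swap_neighbor(perm, proc):
    """Best permutation reachable by swapping two positions (one search step)."""
    best, best_cost = None, None
    for a, b in combinations(range(len(perm)), 2):
        cand = list(perm)
        cand[a], cand[b] = cand[b], cand[a]
        cost = makespan(cand, proc)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

proc = [[3, 2], [1, 4], [2, 1]]   # 3 jobs, 2 machines (made-up data)
start_cost = makespan([0, 1, 2], proc)
neighbor, neighbor_cost = best_swap_neighbor([0, 1, 2], proc)
```

A full Tabu Search would repeat the neighbourhood step while forbidding recently used moves; that bookkeeping is omitted here.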

10.

Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems

《Parallel Computing》2015

We introduce a new parallelization framework for scientific computing based on BDSC, an efficient automatic scheduling algorithm for parallel programs in the presence of resource constraints on the number of processors and their local memory size. BDSC extends Yang and Gerasoulis’s Dominant Sequence Clustering (DSC) algorithm; it uses sophisticated cost models and addresses both shared and distributed parallel memory architectures. We describe BDSC, its integration within the PIPS compiler infrastructure and its application to the parallelization of four well-known scientific applications: Harris, ABF, equake and IS. Our experiments suggest that BDSC’s focus on efficient resource management leads to significant parallelization speedups on both shared and distributed memory systems, improving upon DSC results, as shown by the comparison of the sequential and parallelized versions of these four applications running on both OpenMP and MPI frameworks.

11.

A Palmer-based continuous fuzzy flexible flow-shop scheduling algorithm

T.-P. Hong T.-T. Wang S.-L. Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2001,5(6):426-433

Flexible flow shops can be thought of as generalizations of simple flow shops. In the past, the processing time for each job was usually assumed to be known exactly, but in many real-world applications, processing times may vary dynamically due to human factors or operating faults. In the past, we demonstrated how discrete fuzzy concepts could easily be used in the Palmer algorithm for managing uncertain flexible-flow-shop scheduling. In this paper, we generalize it to continuous fuzzy domains. We use triangular membership functions for flexible flow shops with more than two machine centers to examine processing-time uncertainties and to make scheduling more suitable for real applications. We first use the triangular fuzzy LPT algorithm to allocate jobs, and then use the triangular fuzzy Palmer algorithm to deal with sequencing the tasks. The proposed method thus provides a more flexible way of scheduling jobs than conventional scheduling methods.
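The crisp (non-fuzzy) Palmer heuristic that this abstract builds on can be sketched as follows: each job gets a slope index that weights later machines positively and earlier machines negatively, and jobs are sequenced by decreasing index. The triangular fuzzy arithmetic of the paper is not reproduced; `palmer_order` and the processing times are hypothetical.

```python
def palmer_order(proc):
    """Order jobs by Palmer's slope index (crisp processing times).

    proc[j][i] is the processing time of job j on machine i (0-based).
    With m machines, machine i (1-based) gets weight 2*i - m - 1, so jobs
    whose work grows toward later machines are scheduled first.
    """
    m = len(proc[0])

    def slope(times):
        return sum((2 * i - m - 1) * p for i, p in enumerate(times, start=1))

    # Decreasing slope index; sorted() is stable, so ties keep input order.
    return sorted(range(len(proc)), key=lambda j: -slope(proc[j]))

proc = [[5, 3, 1],   # job 0: work concentrated on early machines
        [1, 3, 5],   # job 1: work concentrated on late machines
        [2, 2, 2]]   # job 2: evenly spread
order = palmer_order(proc)
```

With three machines the weights are -2, 0, and 2, giving slope indices of -8, 8, and 0 for jobs 0, 1, and 2.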

12.

An optimized scheduling algorithm for parallel task graphs

李于锋 莫则尧 肖永浩 熊敏 《计算机工程与科学》2019,41(6):955-962

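Many complex application problems in scientific and engineering computing require scientific workflow technology. In supercomputing, scientific workflows are often modeled as parallel task graphs, and scheduling these graphs effectively is important for efficient application execution. We give a scheduling model for parallel task graphs under resource constraints; we give several optimal scheduling results for Fork-Join parallel task graphs; and for general parallel task graphs we propose a new scheduling algorithm that accounts for the effect of data communication overhead on resource allocation and scheduling performance and that improves on the existing CPA algorithm in specific cases. Experimental comparison with the commonly used CPR and CPA algorithms verifies that the new algorithm achieves good scheduling results. The proposed scheduling algorithm and the optimal scheduling results are a useful reference for developing high-performance scheduling functions in workflow application systems.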

13.

Parallel machine scheduling with precedence constraints and setup times

Bernat Gacias Christian Artigues Pierre Lopez 《Computers & Operations Research》2010,37(12):2141-2151

This paper presents different methods for solving parallel machine scheduling problems with precedence constraints and setup times between the jobs. These problems are strongly NP-hard and it is even conjectured that no list scheduling algorithm can be defined without explicitly considering jointly scheduling and resource allocation. We propose dominance conditions based on the analysis of the problem structure and an extension to setup times of the energetic reasoning constraint propagation algorithm. An exact branch-and-bound procedure and a climbing discrepancy search (CDS) heuristic based on these components are defined. We show how the proposed dominance rules can still be valid in the CDS scheme. The proposed methods are evaluated on a set of randomly generated instances and compared with previous results from the literature and those obtained with an efficient commercial solver. We conclude that our propositions are quite competitive and our results even outperform other approaches in most cases.

14.

Optimal scheduling across public and private clouds in complex hybrid cloud environment

Li Chunlin Li LaYuan 《Information Systems Frontiers》2017,19(1):1-12

The hybrid cloud extends the private cloud model by using both local and remote resources. The private cloud relies on resources leased from public cloud providers to execute private cloud applications. The paper presents optimal scheduling across public and private clouds in a complex hybrid cloud environment. The contributions of this paper have three aspects. 1) The proposed hybrid cloud scheduling policy considers the benefits of both private cloud applications and the public cloud provider, and it can adapt to changes in the system to find the scheduling optimization. The scheduling optimization is decomposed and conducted across the private cloud and public cloud. 2) The paper describes negotiations in the hybrid cloud marketplace and gives an example to explain how these rules are resolved by the cloud marketplace. 3) The paper proposes an optimal scheduling algorithm across public and private clouds. In the simulations, the profit of the public cloud provider and the resource utilization of the proposed algorithm are better than those of other related works.

15.

Multi-objective job-shop scheduling based on dispatching rules and an immune algorithm

龙田 王俊佳 《信息与控制》2016,45(3):278-286

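A dynamic online scheduling method is used to study job shops in dynamic environments. Priority dispatching rules are applied to a large number of scheduling cases and, for each of 7 scheduling objectives, the best-performing rule for that single objective is selected from a candidate rule set. To select dispatching rules dynamically for multi-objective scheduling, an immune scheduling algorithm is designed based on the idiotypic network theory of the immune system. For this algorithm, effective antibody and antigen structures are defined, and dynamic control of the dispatching rules is achieved through key steps such as computing the affinity between antibodies, computing antibody concentration, and selecting antibodies. Simulation test data show that the immune scheduling algorithm can quickly select different dispatching rules for different shop conditions to satisfy multiple scheduling objectives, effectively solving the multi-objective job-shop scheduling problem.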

16.

Heavy traffic optimal resource allocation algorithms for cloud computing clusters

《Performance Evaluation》2014

Cloud computing is emerging as an important platform for business, personal and mobile computing applications. In this paper, we study a stochastic model of cloud computing, where jobs arrive according to a stochastic process and request resources like CPU, memory and storage space. We consider a model where the resource allocation problem can be separated into a routing or load balancing problem and a scheduling problem. We study the join-the-shortest-queue routing and power-of-two-choices routing algorithms with the MaxWeight scheduling algorithm. It was known that these algorithms are throughput optimal. In this paper, we show that these algorithms are queue length optimal in the heavy traffic limit.
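The two routing rules named in this abstract can be sketched in a few lines. This shows only the routing decision on a list of queue lengths; the MaxWeight scheduling stage and the paper's stochastic model are not reproduced, and the function names are hypothetical.

```python
import random

def join_shortest_queue(queues):
    """Route to the server with the shortest queue (needs all queue lengths)."""
    return min(range(len(queues)), key=lambda i: queues[i])

def power_of_two_choices(queues, rng):
    """Sample two distinct servers at random and route to the shorter queue.

    Only two queue lengths are inspected per arrival, which is the point of
    the rule when polling every server is too expensive.
    """
    a, b = rng.sample(range(len(queues)), 2)
    return a if queues[a] <= queues[b] else b

queues = [3, 1, 2]
jsq_choice = join_shortest_queue(queues)

rng = random.Random(0)          # seeded only to make the example repeatable
po2_choice = power_of_two_choices(queues, rng)
queues[po2_choice] += 1         # the arriving job joins the chosen queue
```

The power-of-two-choices result depends on which two servers are sampled, so it is never the longest queue but is not always the shortest one.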

17.

FIT: A Flexible, Lightweight, and Real-Time Scheduling System for Wireless Sensor Platforms

Dong Wei Chen Chun Liu Xue Zheng Kougen Chu Rui Bu Jiajun 《Parallel and Distributed Systems, IEEE Transactions on》2010,21(1):126-138

We propose FIT, a flexible, lightweight, and real-time scheduling system for wireless sensor platforms. There are three salient features of FIT. First, its two-tier hierarchical framework supports customizable application-specific scheduling policies; hence, FIT is very flexible. Second, FIT is lightweight in that it minimizes the number of threads to reduce preemptions and memory consumption while still ensuring system schedulability. We propose a novel Minimum Thread Scheduling Policy (MTSP) exploration algorithm within FIT to achieve this goal. Finally, FIT provides a detailed real-time schedulability analysis method to help check whether an application's temporal requirements can be met. We implemented FIT on MicaZ motes and carried out extensive evaluations. Results demonstrate that FIT is indeed flexible and lightweight for implementing real-time applications and that the schedulability analysis it provides can predict the real-time behavior. FIT is a promising scheduling system for implementing complex real-time applications in sensor networks.

18.

Loop scheduling and bank type assignment for heterogeneous multi-bank memory

Meikang Qiu Minyi Guo Meiqin Liu Chun Jason Xue Laurence T. Yang Edwin H.-M. Sha 《Journal of Parallel and Distributed Computing》2009

Many high-performance DSP processors employ multi-bank on-chip memory to improve performance and energy consumption. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multi-bank memory remains difficult, considering the combined effect of performance and energy requirements. This paper studies the scheduling and assignment problem of how to minimize the total energy consumption while satisfying the timing constraint with heterogeneous multi-bank memory for applications with loops. An algorithm, TASL (Type Assignment and Scheduling for Loops), is proposed. The algorithm uses bank type assignment with the consideration of variable partition to find the best configuration for both memory and ALU. The experimental results show that the average improvement in energy saving is significant when using TASL.

19.

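Research on production scheduling for semiconductor assembly and test production lines (cited 1 time in total: 0 self-citations, 1 by others)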

姚丽丽 史海波 刘昶 《自动化学报》2014,40(5):892-900

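Taking a semiconductor assembly and test manufacturing (ATM) enterprise as the research background, we analyze and summarize the ATM production process and propose a new capacity-limit flexible flow-shop (CLFFS) model as the scheduling model for ATM production lines. By studying the special logic handling, scheduling methods, and scheduling rules of ATM, we propose a heuristic forward scheduling algorithm with two-layer optimization control, combining logic constraints and dispatching rules, as the overall ATM scheduling method. For the production stage with batch setup and single-item processing, we also propose a new optimized scheduling method based on predictive machine start-up control. Finally, combining the CLFFS model with the proposed strategies, we give an application example and comparison of ATM scheduling. The results show that the overall scheduling method is feasible in ATM and makes it convenient to embed business logic, and that the new predictive start-up control method shortens the production cycle and improves production efficiency.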

20.

A faster branch-and-bound algorithm for the earliness-tardiness scheduling problem

Francis Sourd Safia Kedad-Sidhoum 《Journal of Scheduling》2008,11(1):49-58

This paper addresses the one-machine scheduling problem with earliness-tardiness penalties. We propose a new branch-and-bound algorithm that can solve instances with up to 50 jobs and that can solve problems with even more general non-convex cost functions. The algorithm is based on the combination of a Lagrangean relaxation of resource constraints and new dominance rules.
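For orientation, the standard linear earliness-tardiness objective on one machine can be written as below. The notation is an assumption introduced here (completion time $C_j$, due date $d_j$, unit penalties $\alpha_j$ and $\beta_j$), not taken from the paper, whose method also covers more general cost functions.

```latex
% Earliness and tardiness of job j with completion time C_j and due date d_j
E_j = \max(0,\; d_j - C_j), \qquad T_j = \max(0,\; C_j - d_j)

% Total weighted earliness-tardiness cost over n jobs
\min \; \sum_{j=1}^{n} \left( \alpha_j E_j + \beta_j T_j \right)
```

At most one of $E_j$ and $T_j$ is non-zero for any job, so each job pays either an earliness or a tardiness penalty, never both.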