1.

A general framework for dynamic and automatic I/O scheduling in hard and solid-state drives

Pilar González-Férez Juan Piernas Toni Cortes 《Journal of Parallel and Distributed Computing》2014

The selection of the right I/O scheduler for a given workload can significantly improve the I/O performance. However, this is not an easy task because several factors should be considered, and even the “best” scheduler can change over time, especially if the workload’s characteristics change too. To address this problem, we present a Dynamic and Automatic Disk Scheduling framework (DADS) that simultaneously compares two different Linux I/O schedulers, and dynamically selects the one that achieves the best I/O performance for any workload at any time. The comparison is made by running two instances of a disk simulator inside the Linux kernel. Results show that, by using DADS, the performance achieved is always close to that obtained by the best scheduler. Thus, system administrators are exempted from selecting a suboptimal scheduler which can provide good performance for some workloads, but may downgrade the system throughput when the workloads change.

2.

Series production in a basic re-entrant shop to minimize makespan or total flow time

Feng Chu Chengbin Chu Caroline Desprez 《Computers & Industrial Engineering》2010,58(2):257-268

This paper addresses a real-life shop scheduling problem in a manufacturing company. In this problem, a set of n identical jobs are to be processed on two machines. Every job visits one of the machines more than once. This is therefore a re-entrant shop. Due to the fact that the jobs are identical, the decision version of this problem is not even in the class NP. We give an optimal schedule to minimize the makespan. Since the total flow time depends on the relations between the processing times, we decompose this problem into sub-problems according to the relations between the processing times. We prove various properties of optimal solutions and, based on these properties, we provide an optimal solution for all the sub-problems except one of them. For the sole remaining sub-problem, we prove a dominance property which allows us to consider only a subset of schedules to find an optimal one.

3.

Exploring optimal combination of a file system and an I/O scheduler for underlying solid state disks

Hui Sun Xiao Qin Chang-sheng Xie 《Journal of Zhejiang University SCIENCE C》2014,15(8):607-621

Performance and energy consumption of a solid state disk (SSD) highly depend on file systems and I/O schedulers in operating systems. To find an optimal combination of a file system and an I/O scheduler for SSDs, we use a metric called the aggregative indicator (AI), which is the ratio of SSD performance value (e.g., data transfer rate in MB/s or throughput in IOPS) to that of energy consumption for an SSD. This metric aims to evaluate SSD performance per energy consumption and to study the SSD which delivers high performance at low energy consumption in a combination of a file system and an I/O scheduler. We also propose a metric called Cemp to study the changes of energy consumption and mean performance for an Intel SSD (SSD-I) when it provides the largest AI, lowest power, and highest performance, respectively. Using Cemp, we attempt to find the combination of a file system and an I/O scheduler to make SSD-I deliver a smooth change in energy consumption. We employ Filebench as a workload generator to simulate a wide range of workloads (i.e., varmail, fileserver, and webserver), and explore optimal combinations of file systems and I/O schedulers (i.e., optimal values of AI) for tested SSDs under different workloads. Experimental results reveal that the proposed aggregative indicator is comprehensive for exploring the optimal combination of a file system and an I/O scheduler for SSDs, compared with an individual metric.
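
As a rough illustration of the aggregative indicator described in the abstract above, the following Python sketch computes AI as a performance value divided by energy consumption for a few file system and I/O scheduler combinations and picks the largest. The function names are hypothetical, and the combinations and numbers are made-up placeholders rather than measurements from the paper.

```python
def aggregative_indicator(performance, energy):
    """Ratio of a performance value (e.g. MB/s or IOPS) to energy consumption."""
    if energy <= 0:
        raise ValueError("energy consumption must be positive")
    return performance / energy


def best_combination(measurements):
    """Return the (file system, scheduler) key with the largest AI."""
    return max(
        measurements,
        key=lambda combo: aggregative_indicator(*measurements[combo]),
    )


# Hypothetical (throughput in MB/s, energy in joules) per combination.
sample = {
    ("ext4", "noop"): (200.0, 50.0),
    ("ext4", "cfq"): (180.0, 40.0),
    ("btrfs", "deadline"): (210.0, 70.0),
}

if __name__ == "__main__":
    print(best_combination(sample))
```

Note that the combination with the highest raw throughput in the placeholder data is not the one with the largest AI, which is the kind of trade-off a ratio metric is meant to expose.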

4.

A task migration strategy for Storm, a big data stream computing framework

Lu Liang, Yu Jiong, Bian Chen, Liu Yuechao, Liao Bin, Li Huijuan 《Journal of Computer Research and Development》2018,55(1):71-92

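Storm is one of the most representative platforms for stream computing, but its default round-robin scheduling mechanism does not take into account the differences in performance and load among worker nodes in a heterogeneous environment, nor the network transfer overhead between worker nodes and the inter-process and inter-thread communication overhead within a node, so it cannot fully exploit the cluster's performance. To minimize communication overhead under various resource constraints, we establish and justify a resource constraint model, an optimal communication overhead model, and a task migration model for Storm, and on that basis propose a task migration strategy for heterogeneous Storm clusters (TMSH-Storm), which comprises a source node selection algorithm and a task migration algorithm. The source node selection algorithm adds nodes that exceed a threshold to the source node set, based on the CPU, memory, and network bandwidth load of each worker node in the cluster and the priority order of these resources. The task migration algorithm weighs migration overhead, communication overhead, node resource constraints, and node and task load, and asynchronously migrates the pending tasks on the source nodes to destination nodes one after another. Experiments show that, compared with existing work, TMSH-Storm effectively reduces latency and inter-node communication overhead, with low execution overhead.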

5.

Cilk: An Efficient Multithreaded Runtime System

《Journal of Parallel and Distributed Computing》1996,37(1):55-69

Cilk (pronounced “silk”) is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk runtime system currently runs on the Connection Machine CM5 MPP, the Intel Paragon MPP, the Sun Sparcstation SMP, and the Cilk-NOW network of workstations. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the Socrates chess program, which won second prize in the 1995 ICCA World Computer Chess Championship.

6.

Joint optimization of overlapping phases in MapReduce

Minghong Lin Li Zhang Adam Wierman Jian Tan 《Performance Evaluation》2013

MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. There are a variety of scheduling challenges within the MapReduce architecture, and this paper studies the challenges that result from the overlapping of the “map” and “shuffle” phases. We propose a new, general model for this scheduling problem, and validate this model using cluster experiments. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate (i.e., in the resource augmentation framework). Finally, we validate the algorithms using a workload trace from a Google cluster and show that the algorithms are near optimal in practical settings.

7.

Sensor-centric energy-constrained reliable query routing for wireless sensor networks

《Journal of Parallel and Distributed Computing》2004,64(7):839-852

Standard wireless sensor network models emphasize energy efficiency and distributed decision-making by considering untethered and unattended sensors. To this we add two constraints: the possibility of sensor failure and the fact that each sensor must trade off its own resource consumption with overall network objectives. In this paper, we develop an analytical model of energy-constrained, reliable, data-centric information routing in sensor networks under all the above constraints. Unlike existing techniques, we use game theory to model intelligent sensors, thereby making our approach sensor-centric. Sensors behave as rational players in an N-player routing game, where they trade off individual communication and other costs with network-wide benefits. The outcome of the sensor behavior is a sequence of communication link establishments, resulting in routing paths from reporting to querying sensors. We show that the optimal routing architecture is the Nash equilibrium of the N-player routing game and that computing the optimal paths (which maximize the payoffs of the individual sensors) is NP-hard with and without data aggregation. We develop a game-theoretic metric called path weakness to measure the qualitative performance of different routing mechanisms. This sensor-centric concept, which is based on the contribution of individual sensors to the overall routing objective, is used to define the quality of routing (QoR) paths. Analytical results on computing paths of bounded weakness are derived and game-theoretic heuristics for finding approximately optimal paths are presented. Simulation results are used to compare the QoR of different routing paths derived using various energy-constrained routing algorithms.

8.

Distributed fair DRAM scheduling in network-on-chips architecture

《Journal of Systems Architecture》2013,59(7):543-550

Memory access scheduling is an effective way to improve the performance of Chip Multi-Processors (CMPs) by taking advantage of the timing characteristics of a DRAM. A memory access scheduler can subdivide resource utilization (banks and rows) to increase throughput by accessing different DRAM banks in parallel. However, different threads running on different cores may exhibit different performance. One thread may experience starvation while the others are serviced normally. Therefore, designing a scheduler which reduces the unfairness in the DRAM system, while also improving system throughput on a variety of workloads and systems, is necessary. In this paper, a distributed fair DRAM scheduling scheme for two-dimensional mesh network-on-chips (NoCs), called DFDS, is presented. The key design points in DFDS are: (i) assessing the total waiting cycles of a memory request in the NoC and considering it as a metric in arbitration. For this purpose, the waiting cycles of a memory request are put in an additional flit in a packet and are updated while traversing the NoC, and (ii) proposing a semi-dynamic virtual channel allocation to provide in-order memory requests to memory controllers (MCs). Consequently, we use a simple scheduling algorithm in MCs, instead of complex algorithms. To validate our approach, we apply synthetic and real workloads from the PARSEC benchmark suite. The results show the effectiveness of our approach, as we reduce the waiting time of memory requests by up to 15%.

9.

Adaptive holistic scheduling for query processing in sensor networks

Hejun Wu Qiong Luo 《Journal of Parallel and Distributed Computing》2010

We observe two deficiencies of current query processing and scheduling techniques for sensor networks: (1) A query execution plan does not adapt to the hardware characteristics of sensing devices; and (2) the data communication schedule of each node is not adapted to the query runtime workload. Both cause time and energy waste in query processing in sensor networks. To address this problem, we propose an adaptive holistic scheduler, AHS, to run on each node in a wireless sensor network. AHS schedules both the query evaluation and the wireless communication operations, and is able to adapt the schedule to the runtime dynamics of these operations on each node. We have implemented AHS and tested it on real motes as well as in simulation. Our results show that AHS improves the performance of query processing in various dynamic settings.

10.

Scheduling unit length jobs on parallel machines with lookahead information

Marvin Mandelbaum Dvir Shabtay 《Journal of Scheduling》2011,14(4):335-350

This paper studies two closely related online-list scheduling problems of a set of n jobs with unit processing times on a set of m multipurpose machines. It is assumed that there are k different job types, where each job type can be processed on a unique subset of machines. In the classical definition of online-list scheduling, the scheduler has all the information about the next job to be scheduled in the list while there is uncertainty about all the other jobs in the list not yet scheduled. We extend this classical definition to include lookahead abilities, i.e., at each decision point, in addition to the information about the next job in the list, the scheduler has all the information about the next h jobs beyond the current one in the list. We show that for the problem of minimizing the makespan there exists an optimal (1-competitive) algorithm for the online problem when there are two job types. That is, the online algorithm gives the same minimal makespan as the optimal offline algorithm for any instance of the problem. Furthermore, we show that for more than two job types no such online algorithm exists. We also develop several dynamic programming algorithms to solve a stochastic version of the problem, where the probability distribution of the job types is known and the objective is to minimize the expected makespan.

11.

An 802.11e HCCA scheduler with an end-to-end quality aware territory method

Jorge Navarro-Ortiz Pablo Ameigeiras Juan J. Ramos-Munoz Juan M. Lopez-Soler 《Computer Communications》2009,32(11):1281-1297

In this paper we present a solution for the IEEE 802.11e HCCA (Hybrid coordination function Controlled Channel Access) mechanism which aims both at supporting strict real-time traffic requirements and, simultaneously, at handling TCP applications efficiently. Our proposal combines a packet scheduler and a dynamic resource allocation algorithm. The scheduling discipline is based on the Monolithic Shaper-Scheduler, which is a modification of a General Processor Sharing (GPS) related scheduler. It supports minimum-bandwidth and delay guarantees and, at the same time, it achieves the optimal latency of all the GPS-related schedulers. In addition, our innovative resource allocation procedure, called the territory method, aims at prioritizing real time services and at improving the performance of TCP applications. For this purpose, it splits the wireless channel capacity (in terms of transmission opportunities) into different territories for the different types of traffic, taking into account the end-to-end network dynamics. In order to give support to the desired applications, we consider the following traffic classes: conversational, streaming, interactive and best-effort. The so called territories shrink or expand depending on the current quality experienced by the corresponding traffic class. We evaluated the performance of our solution through extensive simulations in a heterogeneous wired-cum-wireless scenario under different traffic conditions. Additionally, we compare our proposal to other HCCA scheduling algorithms, the HCCA reference scheduler and Fair Hybrid Coordination Function (FHCF). The results show that the combination of the MSS and the territory method obtains higher system capacity for VoIP traffic (up to 32 users) in the simulated scenario, compared to FHCF and the HCCA reference scheduler (13 users). 
In addition, the MSS with the territory method also improves the throughput of TCP sources (one FTP application achieves between 6.1 Mbps without VoIP traffic and 2.1 Mbps with 20 VoIP users) compared to the reference scheduler (at most 388 kbps) and FHCF (with a maximum FTP throughput of 4.8 Mbps).

12.

Multi-criteria and satisfaction oriented scheduling for hybrid distributed computing infrastructures

《Future Generation Computer Systems》2016

Assembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design for a fault-tolerant and trust-aware scheduler, which allows executing Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with the multi-criteria Promethee decision-making algorithm. Because multi-criteria scheduling leads to the multiplication of the possible scheduling strategies, we propose SOFT, a methodology that allows finding the optimal scheduling strategies given a set of application requirements. The validation of this method is performed with a simulator that fully implements the Promethee scheduler and recreates a hybrid DCI environment including Internet Desktop Grid, Cloud and Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed according to three distinct criteria: price, expected completion time and trust, while maximizing the useful employment of the infrastructure from the resource owner's point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making the possible integration of the scheduler into a wide range of resource management software realistic.

13.

A workload-adaptive I/O scheduling algorithm

Xu Weixia, Li Qiong, Jiang Yanhuang 《Computer Engineering & Science》2009,31(11):1-3

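I/O scheduling algorithms have a critical impact on the performance of disk arrays (RAID). Although many typical I/O scheduling algorithms perform well under particular workloads, hardly any single algorithm performs well under all workloads. This paper proposes an intelligent RAID control model that combines the C4.5 decision tree with the AdaBoost algorithm to classify workloads automatically, and dynamically adjusts the I/O scheduling policy according to workload changes and performance feedback, achieving autonomous scheduling driven by application requirements. Simulation results show that the adaptive scheduling algorithm adapts well, outperforms existing I/O scheduling algorithms under a variety of workloads, and is particularly suited to optimizing I/O performance in multithreaded mixed-workload environments.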

14.

Optimal job packing, a backfill scheduling optimization for a cluster of workstations

Syed Munir Hussain Shah Kalim Qureshi Haroon Rasheed 《The Journal of supercomputing》2010,54(3):381-399

In this paper, we have proposed two backfill scheduling optimizations, i.e., Shortest Width First Backfill (SWFBF) and Shortest Area First Backfill (SAFBF). A near-optimal, simple but effective job packing algorithm called the Select-Replace algorithm has also been presented to minimize external fragmentation. Proof of the concept has been given with the help of a simulation study. Five workloads which were derived from a clean version of the parallel workload archive (CTC, LANL, and SDSC. NASA) have been used to evaluate and compare the proposed heuristics with previous techniques. With the simple but effective optimizations, a significant (56.1%) performance improvement has been achieved compared to the EASY scheduler.

15.

Towards automated HPC scheduler configuration tuning

Diwakar Krishnamurthy Mehrnoush Alemzadeh Mahmood Moussavi 《Concurrency and Computation》2011,23(15):1723-1748

High performance computing (HPC) systems allow researchers and businesses to harness large amounts of computing power needed for solving complex problems. In such systems a job scheduler prioritizes the execution of jobs belonging to users of the system in a manner that allows the system to satisfy performance objectives for various groups of users while simultaneously making efficient use of available resources. Typically, system administrators have the responsibility of manually configuring or tuning the job scheduler such that the performance objectives of user groups as well as system‐level performance objectives are met. Modern job schedulers used in production systems are quite complex. Through detailed trace‐driven simulations, we show that manually tuning the configuration of production schedulers in an environment characterized by multiple performance objectives is very challenging and may not be feasible. To alleviate this problem, this paper describes a toolset that can help a system administrator to automatically configure a scheduler such that the performance objectives for various classes of users in the system as well as other system‐level performance objectives can be satisfied. A unique aspect of this work that differentiates it from the existing work on scheduler tuning is that it has been implemented to work with a widely used production scheduler. Furthermore, in contrast to the existing work it considers the challenging real‐world problem of delivering different levels of performance to different classes of users. System administrators can exploit the toolset to react quickly to changes in performance objectives and workload conditions. Case studies using synthetic and real HPC workloads demonstrate the effectiveness of the technique. Copyright © 2011 John Wiley & Sons, Ltd.

16.

Jcluster: an efficient Java parallel environment on a large‐scale heterogeneous cluster

Bao‐Yin Zhang Guang‐Wen Yang Wei‐Min Zheng 《Concurrency and Computation》2006,18(12):1541-1557

In this paper, we present Jcluster, an efficient Java parallel environment that provides some critical services, in particular automatic load balancing and high‐performance communication, for developing parallel applications in Java on a large‐scale heterogeneous cluster. In the Jcluster environment, we implement a task scheduler based on a transitive random stealing (TRS) algorithm. Performance evaluations show that the scheduler based on TRS can make any idle node obtain a task from another node with much fewer stealing times than random stealing (RS), which is a well‐known dynamic load‐balancing algorithm, on a large‐scale cluster. In the performance aspects of communication, with the method of asynchronously multithreaded transmission, we implement a high‐performance PVM‐like and MPI‐like message‐passing interface in pure Java. The evaluation of the communication performance is conducted among the Jcluster environment, LAM‐MPI and mpiJava on LAM‐MPI based on the Java Grande Forum's pingpong benchmark. Copyright © 2005 John Wiley & Sons, Ltd.

17.

Interaction-aware scheduling of report-generation workloads

Mumtaz Ahmad Ashraf Aboulnaga Shivnath Babu Kamesh Munagala 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(4):589-615

The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made online. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently outperform (up to 4x) query schedulers in current database systems.

18.

TCP is Competitive with Resource Augmentation

Jeff Edmonds Suprakash Datta Patrick Dymond 《Theory of Computing Systems》2010,47(1):137-161

The well-known Transport Control Protocol (TCP) is a crucial component of the TCP/IP architecture on which the Internet is built, and is a de facto standard for reliable communication on the Internet. At the heart of the TCP protocol is its congestion control algorithm. While most practitioners believe that the TCP congestion control algorithm performs very well, a complete analysis of the congestion control algorithm is yet to be done. A lot of effort has, therefore, gone into the evaluation of different performance metrics like throughput and average latency under TCP. In this paper, we approach the problem from a different perspective and use the competitive analysis framework to provide some answers to the question “how good is the TCP/IP congestion control algorithm?” We describe how the TCP congestion control algorithm can be viewed as an online, distributed scheduling algorithm. We observe that existing lower bounds for non-clairvoyant scheduling algorithms imply that no online, distributed, non-clairvoyant algorithm can be competitive with an optimal offline algorithm if both algorithms were given the same resources. Therefore, in order to evaluate TCP using competitive analysis, we must limit the power of the adversary, or equivalently, allow TCP to have extra resources compared to an optimal, offline algorithm for the same problem. In this paper, we show that TCP is competitive to an optimal, offline algorithm provided the former is given more resources. Specifically, we prove first that for networks with a single bottleneck (or point of congestion), TCP is $\mathcal{O}(1)$-competitive to an optimal centralized (global) algorithm in minimizing the user-perceived latency or flow time of the sessions, provided we allow TCP $\mathcal{O}(1)$ times as much bandwidth and $\mathcal{O}(1)$ extra time per session. Second, we show that TCP is fair by proving that the bandwidths allocated to sessions quickly converge to fair sharing of network bandwidth.
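
The fairness claim at the end of the abstract above can be made more concrete with a toy simulation of additive-increase/multiplicative-decrease (AIMD), the rule behind TCP congestion avoidance. This is only a sketch under simplifying assumptions that are not taken from the paper: a single bottleneck, all sessions seeing a loss in the same round, and hypothetical parameter values. Under these assumptions the gap between two sessions is unchanged by each additive step and halved by each decrease, so the rates drift toward an equal share.

```python
def aimd(rates, capacity, rounds, increase=1.0, decrease=0.5):
    """Simulate synchronized AIMD on one bottleneck link.

    Every round each session adds `increase` to its rate; if the total
    then exceeds `capacity`, every session multiplies its rate by
    `decrease` (a simplified, synchronized loss model).
    """
    rates = list(rates)
    for _ in range(rounds):
        rates = [r + increase for r in rates]
        if sum(rates) > capacity:
            rates = [r * decrease for r in rates]
    return rates


if __name__ == "__main__":
    # Two sessions starting from very unequal rates (hypothetical values).
    final = aimd([1.0, 40.0], capacity=50.0, rounds=200)
    print(final, abs(final[0] - final[1]))
```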

19.

G-PaMeLA: A divide-and-conquer approach for joint channel assignment and routing in multi-radio multi-channel wireless mesh networks

Vanessa Gardellin Sajal K. Das Luciano Lenzini Claudio Cicconetti Enzo Mingozzi 《Journal of Parallel and Distributed Computing》2011,71(3):381-396

The performance of Multi-Radio Multi-Channel Wireless Mesh Networks (MRMC-WMNs) based on the IEEE 802.11 technology depends significantly on how the channels are assigned to the radios and how traffic is routed between the access points and the gateways. In this paper we propose an algorithmic approach to this problem, for which, as far as we know, no optimal polynomial time solutions have been put forward in the literature. The core of our scheme consists of a sequential divide-and-conquer technique which divides the overall Joint Channel Assignment and Routing (JCAR) problem into a number of local optimization sub-problems that are executed sequentially. We propose a generalized scheme called Generalized Partitioned Mesh network traffic and interference aware channeL Assignment (G-PaMeLA), where the number of sub-problems is equal to the maximum number of hops to the gateway, and a customized version which takes advantage of the knowledge of the topology. In both cases each sub-problem is formulated as an Integer Linear Programming (ILP) optimization problem. An optimal solution for each sub-problem can be found by using a branch-and-cut method. The final solution is obtained after a post-processing phase, which improves network connectivity. The divide-and-conquer technique significantly reduces the execution time and makes our solution feasible for an operational WMN. With the help of a detailed packet level simulation, the G-PaMeLA technique is compared with several state-of-the-art JCAR algorithms. Our results highlight that G-PaMeLA performs much better than the others in terms of packet loss rate, collision probability and fairness among traffic flows.

20.

Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems

Chao-Chin Wu Lien-Fu Lai Liang-Tsung Huang MingLung Chen 《The Journal of supercomputing》2012,62(1):399-430

Previously we proposed a Layered Self-Scheduling (LSS) approach, a hybrid MPI and OpenMP based loop self-scheduling approach for dealing with the heterogeneity problem on a cluster system consisting of multi-core compute nodes, where the allocation functions of several well-known schemes have been modified for better performance. Though LSS provides better performance than the conventional self-scheduling schemes, we found after comprehensive experiments and analyses that the performance can be improved further. The newly proposed task scheduling strategy, called Enhanced Layered Self-Scheduling (ELSS), aims to utilize the compute power of multiple processor cores more efficiently in the master compute node and to schedule tasks so as to obtain more stable performance improvements. We have evaluated the new task scheduling strategy with three benchmark applications: Matrix Multiplication, Monte Carlo Integration, and Mandelbrot Set Computation. It is recommended that the global scheduler adopt Guided Self-Scheduling (GSS) in all cases, and that the local scheduler adopt the static scheme for applications with regular workload distribution but any scheme for applications with irregular workload distribution. Experimental results show the best speedups obtained by ELSS for the three benchmark programs are 1.373, 13.34 and 2.4, respectively, compared with that scheduled by LSS.
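
For readers unfamiliar with Guided Self-Scheduling, which the abstract above recommends for the global scheduler, the following Python sketch shows the textbook allocation rule: each request receives the remaining iterations divided by the number of workers, rounded up, so chunks shrink as the loop drains. This is the classic GSS formula, not the modified allocation functions used in LSS or ELSS, and the function name is hypothetical.

```python
import math


def gss_chunks(total_iterations, num_workers):
    """Yield successive chunk sizes under textbook Guided Self-Scheduling."""
    if num_workers < 1:
        raise ValueError("need at least one worker")
    remaining = total_iterations
    while remaining > 0:
        # Each request takes ceil(remaining / workers) iterations.
        chunk = math.ceil(remaining / num_workers)
        yield chunk
        remaining -= chunk


if __name__ == "__main__":
    print(list(gss_chunks(100, 4)))
```

Large early chunks keep scheduling overhead low, while the small chunks at the end help even out finishing times across workers.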