期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Dynamic scheduling of task graphs in multi-FPGA systems using critical path

Ramezani Reza 《The Journal of supercomputing》2021,77(1):597-618

SRAM-based FPGAs feature high performance and flexibility. Thus, they have found many applications in modern high-performance computing (HPC) systems. These systems suffer from the limitation of the computing resources problem for running HPC applications. Therefore, multi-FPGA systems have been emerged to alleviate such resource limitations. In this regard, efficient scheduling strategies are required to dynamically steer the execution of applications—represented as task graphs—on a set of connected FPGAs. In this paper, a heuristic-based dynamic critical path-aware scheduling technique named CPA is presented to schedule task graphs on multi-FPGA systems. The proposed technique, by considering the computation and communication capabilities of FPGAs, dynamically assigns priority to tasks in different steps in order to achieve better makespans. The proposed technique has been evaluated by conducting several experiments on real-world and three different shapes of random task graphs with different number of tasks, and its efficiency has been compared with that of three task graph scheduling approaches. The obtained results demonstrate that the proposed CPA technique outperforms well-known heuristic scheduling strategies and improves their makespan by 13.47% on average. In addition, the experiments show that the proposed technique generates the schedules in the order of milliseconds and the average of its yielded makespans is 12.05% longer than that of an optimum schedule.

相似文献

2.

Group-based Parallel Multi-scheduler for Grid computing

《Future Generation Computer Systems》2015

The Group-based Parallel Multi-scheduler (GPMS), introduced in this paper, is aimed at effectively exploiting the benefits of multicore systems for Grid scheduling by splitting jobs and machines into paired groups and independently scheduling jobs in parallel from those groups. We implemented two job grouping methods; Execution Time Balanced (ETB) and Execution Time Sorted then Balanced (ETSB), and two machine grouping methods,; Evenly Distributed (EvenDist) and Similar Together (SimTog). For each method, we varied the number of groups between 2, 4, 8 and 16. We then executed the MinMin Grid scheduling algorithm independently within the groups. We demonstrated that by sharing jobs and machines into groups before scheduling, the computation time for the scheduling process drastically improved by magnitudes of 85% over the ordinary MinMin algorithm when implemented on a HPC system. We also found that our balanced group based approach achieved better results than our previous Priority based grouping approach. 相似文献

3.

Hadoop下基于统计最优的资源调度算法

邓传华范通让高峰《计算机应用研究》2013,30(2):417-419

云计算集群中的资源存在异构和节点稳定性问题.异构资源的计算能力不同会导致较突出的作业任务同步问题,而某个节点的不稳定状态会使运行于该节点的任务大量备份或重新计算.针对上述两问题将严重影响集群作业的执行进度,在Hadoop平台下利用统计方法,提出一种资源调度算法,对计算资源较少的节点和不稳定状态的节点进行标志并降权,让集群尽可能调度资源较好的稳定节点.实验结果表明,该算法能够在一定程度上减少作业的周转时间,提高集群的效率和吞吐量. 相似文献

4.

基于判例构造的并行作业性能预测

张伟哲张宏莉张元竞《软件学报》2010,21(Z1):238-250

针对基于MPI 的并行作业性能预测问题,鉴于历史预测与建模分析方法在异构网络计算环境中性能预测的局限,提出了基于判例构造的并行作业性能预测方法.在MPI 库PMPI 接口中插入封套函数,获取通信日志,并设计了日志规整和合并算法.将最核心的日志循环收缩问题,转化为字符串循环子串收缩问题,提出了一种基于后缀数组算法,在理论和实际的性能方面均优于已有算法;判例程序自动构建阶段,解决了计算时间与通信时间等比例缩放问题,设计了自动构建可执行判例程序的方法.同构与异构机群环境实验结果表明,判例预测方法能够比较准确地预估计算作业的运行时间,对于同构机群误差不超过3%,异构机群误差不超过10%,与同类算法相比,具有较好的综合性能. 相似文献

5.

Neighborhood search-based job scheduling for IoT big data real-time processing in distributed edge-cloud computing environment

Li Chunlin Zhang YiHan Luo Youlong 《The Journal of supercomputing》2021,77(2):1853-1878

Cloud-edge collaboration architecture, which combines edge processing and centralized cloud processing, is suitable for placement and caching of streaming media. A cache-aware scheduling model based on neighborhood search is proposed. The model is divided into four sub-problems: job classification, node resource allocation, node clustering, and cache-aware job scheduling. Firstly, jobs are categorized into three categories, and then different resources are allocated to nodes according to different job execution conditions. Secondly, the nodes with similar capabilities are clustered, and the jobs are cached by delay-waiting. For jobs that do not satisfy the data locality, the jobs are scheduled to the nodes with similar capabilities according to the neighborhood search results. Meanwhile, a cache-aware scheduling algorithm based on neighborhood search is proposed. Experiments show that the proposed algorithm can effectively minimize the delay of content transmission and the cost of content placement, the job execution time is shortened and the processing capacity of the cloud data center is improved.

相似文献

6.

clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters

Raca Valon Mehofer Eduard 《The Journal of supercomputing》2020,76(12):9976-10008

Heterogeneous cluster systems consisting of CPUs and different kinds of accelerators have become mainstream in HPC. Programming such systems is a difficult task and requires addressing manifold challenges that stem from the intricate composition of such systems and peculiarities of scientific applications. A broad range of obstacles preventing efficient execution have to be considered and dealt with properly. In this paper, we propose a systematic approach and a framework that is capable of providing comprehensive support for running data-parallel applications in heterogeneous asymmetric clusters. Our implementation provides work partitioning and distribution by ensuring workload balance in the cluster while handling of partitioning-induced communication and synchronization in a transparent way. In our experimental section, we choose 11 representative scientific applications from different domains to evaluate our approach. Experimental results show a strong speedup and workload balance for different cluster configurations.

相似文献

7.

基于调度历史数据在线预测作业执行时间

许伦凡熊敏肖永浩《计算机应用研究》2020,37(3):763-767

传统基于用户预估的执行时间通常准确性较差。结合分类和基于实例的学习方法,综合使用模板相似和数值相似方法,在历史调度数据中获取当前作业的相似作业,并使用其历史信息预测当前作业执行时间。使用调度历史中的用户名、分组名、队列名、应用名、用户请求处理器数、用户请求（预估）执行时间和用户请求内存量等属性进行训练和预测,算法中涉及的参数使用遗传算法确定。数值实验表明,相较于已有文献,本方法在使用更少参数的前提下得到了与文献结果中相近的低估率,并获得了更低的平均绝对误差。在HPC2N04和HPC2N05日志数据集上,平均绝对误差分别降低了43%和77%。研究了使用在线预测替换用户估计对作业调度的影响,对结果进行了初步分析并指出了今后的改进方向。相似文献

8.

大型高能物理计算集群资源管理方法的评测

孙震宇石京燕姜晓巍邹佳恒杜然《计算机科学》2017,44(10):85-90

高能物理数据由物理事例组成,事例之间没有相关性。可以通过大量作业同时处理大量不同的数据文件,从而实现高能物理计算任务的并行化,因此高能物理计算是典型的高吞吐量计算场景。高能所计算集群使用开源的TORQUE/Maui进行资源管理及作业调度,并通过将集群资源划分成不同队列以及限制用户最大运行作业数来保证公平性,然而这也导致了集群整体资源利用率非常低下。SLURM和HTCondor都是近年来流行的开源资源管理系统,前者拥有丰富的作业调度策略,后者非常适合高吞吐量计算,二者都能够替代老旧、缺乏维护的TORQUE/Maui,都是管理计算集群资源的可行方案。在SLURM和HTCondor测试集群上模拟大亚湾实验用户的作业提交行为,对SLURM和HTCondor的资源分配行为和效率进行了测试,并与相同作业在高能物理研究所TORQUE/Maui集群上的实际调度结果进行了对比,分析了SLURM及HTCondor的优势和不足,探讨了使用SLURM或HTCondor管理高能物理研究所计算集群的可行性。相似文献

9.

EMM: Extended matching market based scheduling for big data platform hadoop

Singh Balraj Verma Harsh K 《Multimedia Tools and Applications》2022,81(24):34823-34847

Hadoop has emerged as a popular choice for processing Big data. Its cluster is used to process large scale jobs. The performance of a cluster is largely dependent upon the different kind of scheduling policies employed for job processing. However, a single type of scheduling policy may not be suitable for different kind of jobs. Inefficient performance of a cluster is an apparent outcome of inappropriate scheduling policies. These policies are either too complex or they are too elementary to understand the diverse jobs and their needs. Most of them follow a fixed pattern, which cannot be considered as a common solution for different jobs. The effect of such a non-fitting mechanism is lower resource utilization and poor cluster performance. In this paper, a pluggable scheduling mechanism is proposed for efficient and adaptive processing of the jobs. It utilizes the Matching Market concept for the allocation and further adaptively accommodates the diverse needs of the multiple jobs by understanding the varying requirements of the tasks. The experimental results reveal an enhanced resource utilization and improved cluster performance with an overall reduction in makespan. In certain instances, we have seen resource utilization improved up to 80% and performance improvement up to 60% with the proposed technique. Cluster efficiency is increased up of 31%. The evaluation and comparisons were conducted on various scheduling policies using different benchmarks of Hadoop with the same data and identical configurations. The proposed system has shown significant improvement in cluster efficiency.

相似文献

10.

基于模板的高性能计算应用封装方法

下载免费PDF全文

卢大勇陆琪姜恺《计算机工程》2011,37(11):34-36

提出一种基于模板的高性能计算应用封装方法——HPC-APT。该方法具有平台无关性,并且语法简单、易于维护和扩展、用户界面友好。HPC-APT可应用于现今主流的网格中间件或云计算平台上,支持Web和Windows应用。HPC用户可以通过Web或Windows应用界面向远程HPC集群提交作业,从而降低使用HPC的准入门槛,提高HPC集群的可用性。相似文献

11.

面向MapReduce计算的大规模集群通信优化

曹云鹏王海峰刘海涛何淑庆《计算机应用研究》2020,37(4):1174-1178

为了优化大规模集群运行MapReduce作业时的通信效率和减少shuffle数据传输量,首先采用存储局部性换取通信局部性的策略建立一个分布式协同数据映射模型;其次通过随机抽样和机器学习方法来提取作业数据的局部性特征,实现map计算数据的有效部署;最后,利用软件定义网络的全局灵活控制能力,优选通信链路好的节点并将计算任务映射到该类节点中。实验表明对于中间数据混洗密集类作业有较好的优化效果,通信延迟降低了4.3%~5.8%。该方案能减少shuffle流量和数据迁移延迟,并且适合各种调度策略和网络拓扑结构。相似文献

12.

The triangle scheduling problem

Christoph Dürr Zdeněk Hanzálek Christian Konrad Yasmina Seddik René Sitters Óscar C. Vásquez Gerhard Woeginger 《Journal of Scheduling》2018,21(3):305-312

This paper introduces a novel scheduling problem, where jobs occupy a triangular shape on the time line. This problem is motivated by scheduling jobs with different criticality levels. A measure is introduced, namely the binary tree ratio. It is shown that the Greedy algorithm solves the problem to optimality when the binary tree ratio of the input instance is at most 2. We also show that the problem is unary NP-hard for instances with binary tree ratio strictly larger than 2 and provide a quasi-polynomial time approximation scheme. The approximation ratio of Greedy on general instances is shown to be between 1.5 and 1.05. 相似文献

13.

多租户集群中基于服务水平目标的调度机制

杜雄杰王旻汤学海张章《计算机应用》2015,35(4):944-949

针对多租户集群中无法保证作业服务水平目标(SLO)的问题,提出了一种多租户场景下基于SLO的调度机制,其中包括优先调度算法和资源抢占算法。优先调度算法区别考虑超额使用资源的租户和未超额使用资源的租户,赋予后者的作业更高的优先级,在此前提下选择紧急度最高的作业,优先为其分配资源;资源抢占算法在资源受限的情况下,选择紧急度超过阈值的作业实施资源抢占,并根据租户的资源使用情况,在相应的运行作业范围内选择紧急度最低的作业,抢占其资源。实验结果表明,与现有保证公平的多租户调度器Capacity Scheduler相比,该调度机制可以在兼顾作业执行效率和租户间公平的前提下,显著提高作业的截止时间保证率,从而保证业务的服务水平目标。相似文献

14.

计算密集型与数据密集型混合网格作业调度算法

郝永生卢俊文刘冠峰温娜《计算机工程与科学》2014,36(8):1423-1429

针对计算密集型作业与数据密集型作业混合情况,在一个作业有时间限制的动态环境中,对传统的网格作业调度方法进行扩展,提出了三种网格作业调度启发式算法:Emin min、Ebest、Esufferage。并在一个由多个Cluster组成的、通过高速网络连接的网格模型上,对三种算法进行验证。与Min min算法的比较结果显示：三种算法均优于Min min算法。与ASJS算法比较结果显示：Emin min减少了等待时间与作业的makespan; Esufferage算法以减少作业完成量为代价,减少了作业的等待时间及makespan; Ebest在完成作业数量上与ASJS基本保持一致,但却增加了作业的等待时间与makespan。总体上,Emin min具有比较大的优势。相似文献

15.

Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directions

Zhengxiong HOU Hong SHEN Xingshe ZHOU Jianhua GU Yunlan WANG Tianhai ZHAO 《Frontiers of Computer Science》2022,16(5):165107

Nowadays, high-performance computing (HPC) clusters are increasingly popular. Large volumes of job logs recording many years of operation traces have been accumulated. In the same time, the HPC cloud makes it possible to access HPC services remotely. For executing applications, both HPC end-users and cloud users need to request specific resources for different workloads by themselves. As users are usually not familiar with the hardware details and software layers, as well as the performance behavior of the underlying HPC systems. It is hard for them to select optimal resource configurations in terms of performance, cost, and energy efficiency. Hence, how to provide on-demand services with intelligent resource allocation is a critical issue in the HPC community. Prediction of job characteristics plays a key role for intelligent resource allocation. This paper presents a survey of the existing work and future directions for prediction of job characteristics for intelligent resource allocation in HPC systems. We first review the existing techniques in obtaining performance and energy consumption data of jobs. Then we survey the techniques for single-objective oriented predictions on runtime, queue time, power and energy consumption, cost and optimal resource configuration for input jobs, as well as multi-objective oriented predictions. We conclude after discussing future trends, research challenges and possible solutions towards intelligent resource allocation in HPC systems. 相似文献

16.

Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach

Gandomi Abolfazl Movaghar Ali Reshadi Midia Khademzadeh Ahmad 《The Journal of supercomputing》2020,76(9):7177-7203

MapReduce framework is an effective method for big data parallel processing. Enhancing the performance of MapReduce clusters, along with reducing their job execution time, is a fundamental challenge to this approach. In fact, one is faced with two challenges here: how to maximize the execution overlap between jobs and how to create an optimum job scheduling. Accordingly, one of the most critical challenges to achieving these goals is developing a precise model to estimate the job execution time due to the large number and high volume of the submitted jobs, limited consumable resources, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. Moreover, a novel heuristic method is designed, which significantly reduces the makespan of the jobs. In this method, first by providing the job profiling tool, we obtain the execution details of the MapReduce phases through log analysis. Then, using machine learning methods and statistical analysis, we propose a relevant model to predict runtime. Finally, another tool called job submission and monitoring tool is used for calculating makespan. Different experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the average makespan speedup for the proposed method was higher than an unoptimized case.

相似文献

17.

Job-shop scheduling in a body shop

Joachim Schauer Cornelius Schwarz 《Journal of Scheduling》2013,16(2):215-229

We study a generalized job-shop problem called the body shop scheduling problem (BSSP). This problem arises from the industrial application of welding in a car body production line, where possible collisions between industrial robots have to be taken into account. BSSP corresponds to a job-shop problem where the operations of a job have to follow alternating routes on the machines, certain operations of different jobs are not allowed to be processed at the same time and after processing an operation of a certain job a machine might be unavailable for a given time for operations of other jobs. As main results we will show that for three jobs and four machines the special case where only one machine is used by more than one job is already $\mathcal NP $ -hard. This also implies that the single machine scheduling problem that asks for a makespan minimal schedule of three chains of operations with delays between the operations of a chain is $\mathcal NP $ -hard. On the positive side, we present a polynomial algorithm for the two job case and a pseudo-polynomial algorithm together with an FPTAS for an arbitrary but constant number of jobs. Hence for a constant number of jobs we fully settle the complexity status of the problem. 相似文献

18.

Hadoop下资源匹配最大集作业调度算法

朱洁李雯睿赵红李滢《计算机应用》2015,35(12):3383-3386

针对目前层级队列作业调度算法中资源占比高的作业执行效率低的问题,提出一种资源匹配最大集算法。该算法分析作业特征,引入完成度、等待时间、优先级、重调度次数为紧迫值因子,优先考虑资源占比高或等待时间长的作业,以改善作业公平性;采用双队列结构在可用资源总量内优先选择高紧迫值作业,在不同资源占比作业集比较中选择作业数最大集,以实现调度平衡。在与最大最小公平(Max-min fairness)算法的实例对比中发现,该算法可降低作业集平均等待时间、提高资源利用率。实验对比结果表明,该算法可将不同资源占比的单一类型作业集执行时间缩短18.73%,其中资源占比高的作业执行时间缩短27.26%;在混合型作业集中对应的执行时间可分别缩短22.36%与30.28%。所提算法能有效减少资源占比高作业的等待,提高作业整体执行效率。相似文献

19.

Cluster scheduling for real-time systems: utilization bounds and run-time overhead 总被引：1，自引：1，他引：0

Xuan?Qi Dakai?Zhu Email author Hakan?Aydin 《Real-Time Systems》2011,47(3):253-284

Cluster scheduling, where processors are grouped into clusters and the tasks that are allocated to one cluster are scheduled by a global scheduler, has attracted attention in multiprocessor real-time systems research recently. In this paper, assuming that an optimal global scheduler is adopted within each cluster, we investigate the worst-case utilization bounds for cluster scheduling with different task allocation/partitioning heuristics. First, we develop a lower limit on the utilization bounds for cluster scheduling with any reasonable task allocation scheme. Then, the lower limit is shown to be the exact utilization bound for cluster scheduling with the worst-fit task allocation scheme. For other task allocation heuristics (such as first-fit, best-fit, first-fit decreasing, best-fit decreasing and worst-fit decreasing), higher utilization bounds are derived for systems with both homogeneous clusters (where each cluster has the same number of processors) and heterogeneous clusters (where clusters have different number of processors). In addition, focusing on an efficient optimal global scheduler, namely the boundary-fair (Bfair) algorithm, we propose a period-aware task allocation heuristic with the goal of reducing the scheduling overhead (e.g., the number of scheduling points, context switches and task migrations). Simulation results indicate that the percentage of task sets that can be scheduled is significantly improved under cluster scheduling even for small-size clusters, compared to that of the partitioned scheduling. Moreover, when comparing to the simple generic task allocation scheme (e.g., first-fit), the proposed period-aware task allocation heuristic markedly reduces the scheduling overhead of cluster scheduling with the Bfair scheduler. 相似文献

20.

Randomized mechanism design for decentralized network scheduling

Jian Sun Dachuan Xu Deren Han Wenjing Hou 《Optimization methods & software》2020,35(4):722-740

ABSTRACT

In the network scheduling, jobs (tasks) must be scheduled on uniform machines (processors) connected by a complete graph so as to minimize the total weighted completion time. This setting can be applied in distributed multi-processor computing environments and also in operations research. In this paper, we study the design of randomized decentralized mechanism in the setting where a set of non-preemptive jobs select randomly a machine from a set of uniform machines to be processed on, and each machine can process at most one job at a time. We introduce a new concept of myopic Bayes–Nash incentive compatibility which weakens the classical Bayes–Nash incentive compatibility and derive a randomized decentralized mechanism under the assumption that each job is a rational and selfish agent. We show that our mechanism can induce jobs to report truthfully their private information referred to myopic Bayes–Nash implementability by using a graph theoretic interpretation of the incentive compatibility constraints. Furthermore, we prove that the performance of this mechanism is asymptotically optimal. 相似文献