Similar literature
 Found 20 similar documents; search took 609 ms
1.
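Resources in grid computing are dynamic and heterogeneous, so conventional static job scheduling methods are ill-suited to grid environments. For one class of parallel computations on the grid, efficient execution depends on effectively matching grid resources (CPU, network bandwidth, etc.) to jobs. This paper proposes a strategy that schedules jobs based on resource prediction results. It first presents the IAR model, a research result on grid host load prediction, and proposes a tool for predicting network bandwidth, the network performance plane. Using the resource predictions, it constructs a feedback job scheduling model and runs experiments on a class of time-balancing jobs. The results show that, compared with many other methods, the model achieves shorter execution times and better stability.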
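The time-balancing idea in this abstract can be illustrated with a small sketch. It is not the paper's feedback model: the function name `time_balanced_split`, the equal CPU sharing rule, and the sample numbers are all assumptions made for illustration. Work is split in proportion to each host's predicted effective speed so that all hosts are expected to finish together.

```python
def time_balanced_split(total_work, speeds, predicted_loads):
    """Split total_work so every host is expected to finish at the same time.

    speeds[i]: work units per second of host i when idle (assumed known).
    predicted_loads[i]: predicted number of competing processes on host i.
    Assumption: a host shares its CPU equally, so our job gets 1 / (1 + load).
    """
    effective = [s / (1.0 + l) for s, l in zip(speeds, predicted_loads)]
    total_rate = sum(effective)
    shares = [total_work * e / total_rate for e in effective]
    finish_time = total_work / total_rate
    return shares, finish_time


# Hypothetical example: three hosts, the fastest one is also the busiest.
shares, finish = time_balanced_split(1200.0, [10.0, 10.0, 20.0], [0.0, 1.0, 3.0])
print(shares, finish)
```

A better load prediction directly changes the shares, which is why the quality of the predictor matters for this class of jobs.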

2.
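Effective resource prediction in grid computing can substantially improve task allocation and job scheduling strategies and raise their execution efficiency. Host load prediction, the core of grid resource prediction, is therefore particularly important. This paper proposes an improved host load prediction model based on AR. It retains the AR model's low computational cost and stable predictive performance, and it also addresses the AR model's limitation of predicting load only for a fixed future time interval, so that host load can be predicted dynamically according to a job's predicted execution time. The improved model also fully reflects the self-similarity and long-range dependence of host load variation. Experimental results show that the model achieves the expected effect.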
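As a rough illustration of tying the prediction horizon to a job's expected runtime, the sketch below fits a plain AR(1) model by least squares and averages its forecasts over a chosen horizon. This is a standard AR(1), not the improved model from the paper; the function names and the sample series are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] - mu = phi * (x[t-1] - mu)."""
    mu = sum(series) / len(series)
    centered = [x - mu for x in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    phi = num / den if den else 0.0  # constant series: no dynamics to fit
    return mu, phi


def mean_load_over_horizon(series, horizon):
    """Average predicted load over the next `horizon` steps.

    `horizon` would come from the job's predicted execution time
    (an assumption of this sketch), rather than being a fixed interval.
    """
    mu, phi = fit_ar1(series)
    deviation = series[-1] - mu
    forecasts = []
    for _ in range(horizon):
        deviation = phi * deviation  # each step decays toward the mean
        forecasts.append(mu + deviation)
    return sum(forecasts) / horizon


recent_load = [1.0, 2.0, 3.0, 4.0]  # hypothetical load samples
short_job = mean_load_over_horizon(recent_load, 1)
long_job = mean_load_over_horizon(recent_load, 10)
print(short_job, long_job)
```

For a short job the forecast stays near the latest observation, while for a long job it drifts toward the historical mean.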

3.
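Execution times estimated by users are usually inaccurate. Combining classification with instance-based learning, and using both template similarity and numerical similarity, this method retrieves jobs similar to the current job from historical scheduling data and uses their history to predict the current job's execution time. Training and prediction use attributes from the scheduling history: user name, group name, queue name, application name, number of processors requested, user-requested (estimated) execution time, and requested memory. The parameters of the algorithm are determined with a genetic algorithm. Numerical experiments show that, compared with the existing literature, the method uses fewer parameters while achieving an underestimation rate close to that reported in the literature, together with a lower mean absolute error. On the HPC2N04 and HPC2N05 log datasets, the mean absolute error is reduced by 43% and 77%, respectively. The effect on job scheduling of replacing user estimates with online predictions is studied, the results are given a preliminary analysis, and directions for future improvement are identified.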
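The combination of template similarity and numerical similarity can be sketched as a two-stage nearest-neighbor lookup. The field names, the distance formula, and the fallback to the user's estimate are assumptions for illustration; the paper tunes its parameters with a genetic algorithm, which is omitted here.

```python
def predict_runtime(history, job, k=3):
    """Predict runtime as the mean actual runtime of the k most similar past jobs.

    Template similarity: past jobs must match on user, app and queue.
    Numerical similarity: among those, prefer the closest requested time
    and processor count. Falls back to the user's own estimate when no
    past job matches the template.
    """
    template = ("user", "app", "queue")
    matches = [h for h in history if all(h[f] == job[f] for f in template)]
    if not matches:
        return job["req_time"]

    def distance(h):
        dt = abs(h["req_time"] - job["req_time"]) / max(job["req_time"], 1)
        dp = abs(h["procs"] - job["procs"]) / max(job["procs"], 1)
        return dt + dp

    nearest = sorted(matches, key=distance)[:k]
    return sum(h["runtime"] for h in nearest) / len(nearest)


# Hypothetical scheduling history (times in seconds).
history = [
    {"user": "a", "app": "x", "queue": "q", "procs": 4, "req_time": 3600, "runtime": 600},
    {"user": "a", "app": "x", "queue": "q", "procs": 4, "req_time": 3600, "runtime": 800},
    {"user": "a", "app": "x", "queue": "q", "procs": 64, "req_time": 86400, "runtime": 50000},
    {"user": "b", "app": "y", "queue": "q", "procs": 4, "req_time": 3600, "runtime": 100},
]
new_job = {"user": "a", "app": "x", "queue": "q", "procs": 4, "req_time": 3600}
print(predict_runtime(history, new_job, k=2))
```

In this example the user requested one hour, but the two closest past jobs suggest a much shorter runtime.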

4.
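To address the problem that jobs on the Hadoop cloud platform cannot meet time constraints, a job scheduling algorithm based on resource estimation is proposed. A resource estimation model is built to compute the resources a job needs; the completion time is then assessed in light of resource competition among jobs; finally, the delay scheduling strategy is improved according to the data locality of jobs. Experimental results show that the algorithm meets jobs' time constraints and improves system resource utilization.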

5.
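Job scheduling for service grids based on ant colony optimization   Total citations: 9, self-citations: 0, citations by others: 9
A method that uses the ant colony algorithm to optimize the job scheduling system of a service grid is proposed, together with a two-layer job scheduling model. The model can predict job execution time in the grid's dynamic and heterogeneous environment and then, based on the predicted execution times, uses ant colony optimization to minimize the fitness function and thereby obtain an optimized job schedule. Experiments on a purpose-built campus grid testbed show that the method can optimize the performance of the service grid, reduce the average job execution time, and increase system throughput.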

6.
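The growing number of applications deployed in the cloud causes the power consumption of cloud data centers to fluctuate sharply, which leads to unbalanced resource utilization in those data centers; efficient load prediction is a key technique for solving this problem. To address the low accuracy and long prediction time of current load prediction models, a combined prediction model, GRU-LSTM, is built from gated recurrent unit (GRU) and long short-term memory (LSTM) networks. The model's network has three layers: the first uses a GRU, exploiting its small parameter count and easy convergence to cut training time; the second and third use LSTMs, exploiting their larger parameter count to improve prediction accuracy. On this basis, the dataset undergoes missing-value handling and standardization, and a random forest algorithm performs feature selection on the original series to obtain a new series, which is fed to the GRU-LSTM model to predict cloud computing resources efficiently. In experiments on the public cluster dataset Cluster-trace-v2018, compared with the traditional single models ARIMA, LSTM, and GRU and the existing combined models ARIMA-LSTM and Refined LSTM, GRU-LSTM reduces the mean squared error of predictions by 6 to 9 and shortens prediction time by about 10% on average.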

7.
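To meet users' quality of service (QoS) requirements in cloud computing and improve utilization of idle time slots on virtual resources, a multi-dimensional QoS task scheduling strategy based on task duplication is proposed. First, a cloud resource model and a user QoS model are built; virtual machines are then evaluated according to virtual resource utilization and QoS satisfaction, and the virtual resources with higher overall performance are selected for task assignment. To shorten task completion time, the scheduling process duplicates parent tasks in idle time slots. In simulation, the algorithm is compared with HEFT and CPOP. When users prefer reliability, its average reliability is higher than that of HEFT and CPOP; when users prefer completion time and cost, its average completion time is lower than that of HEFT and CPOP; when users have no preference, both its average completion time and average cost are lower than those of HEFT and CPOP. The results show that the algorithm effectively improves resource utilization and user satisfaction.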

8.
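To address the shortcomings of the built-in scheduling and allocation methods of current Hadoop clusters in heterogeneous resource environments, an adaptive scheduling algorithm based on node capacity, NCAS (node capacity adaptive scheduling), is proposed. First, NCAS computes a scheduling factor from node performance and task characteristics; then the scheduling factor determines the amount of data and the number of task slots each node should receive; finally, more data and tasks are assigned to fast nodes and fewer to slow nodes. Experimental results show that, compared with traditional scheduling algorithms, NCAS substantially reduces the number of backup tasks launched, markedly shortens job completion time, and improves task execution efficiency.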

9.
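This work studies in depth the scheduling of big data analytics jobs that use the MapReduce model and analyzes the shortcomings of existing task scheduling algorithms: they consider neither the effect of resource allocation on job deadlines nor the differing deadline sensitivity of different job types. Because a job's completion time changes with the resources allocated to it, such jobs are called elastic jobs; deadline sensitivity refers to how strictly different job types must meet their deadlines. To address these problems, a deadline-aware elastic job scheduling algorithm (DA) is proposed. The algorithm classifies jobs by deadline sensitivity and, based on prediction of overall job execution time, changes job completion times by adjusting resource allocation strategies. It also combines users' deadline requirements with the benefit of job pre-execution to plan resource allocation and scheduling order in advance so that overall benefit is maximized. The algorithm is tested on a simulated cluster of 210 physical nodes; experiments show that it satisfies the deadline constraints and raises the overall job benefit by 2.37 times on average.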

10.
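To address the inability to balance task response speed against task completion time when scheduling tasks in a shared cluster, a deadline-based adaptive scheduling algorithm is proposed. Taking the user-submitted deadline as its basis, the algorithm adaptively allocates appropriate computing resources according to task progress. Unlike traditional scheduling, in which users submit fixed resource parameters, the algorithm preemptively schedules high-priority tasks when resources are constrained in order to guarantee quality of service (QoS), and after preemption ends it allocates extra resources to compensate the preempted tasks. Task scheduling experiments on the Spark platform show that, compared with the scheduling algorithm under the Yet Another Resource Negotiator (YARN) framework, the proposed algorithm strictly controls the response speed of short tasks and shortens the completion time of long jobs by 35%.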
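The core of progress-driven allocation can be sketched as a recomputation of how many cores a task needs to meet its deadline. This is a simplified illustration, not the paper's algorithm: it assumes a constant per-core processing rate, uses hypothetical parameter names, and leaves out preemption and compensation entirely.

```python
import math


def cores_needed(total_work, done_work, elapsed, deadline, rate_per_core, max_cores):
    """Cores required to finish the remaining work before the deadline.

    rate_per_core: work units one core completes per second (assumed constant).
    Returns 0 when nothing is left, and max_cores when the deadline has passed.
    """
    remaining = max(total_work - done_work, 0.0)
    if remaining == 0:
        return 0
    time_left = deadline - elapsed
    if time_left <= 0:
        return max_cores
    return min(max_cores, math.ceil(remaining / (rate_per_core * time_left)))


# Hypothetical task: 1000 work units, deadline at t = 100 s, 2 units/s per core.
on_track = cores_needed(1000, 400, 40, 100, 2.0, 16)
behind = cores_needed(1000, 100, 80, 100, 2.0, 16)
print(on_track, behind)
```

Calling the function periodically with updated progress lets the allocation grow when a task falls behind and shrink when it runs ahead.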

11.
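As supercomputers keep growing in scale and scientific applications grow more complex, many jobs on supercomputers fail. Job failures waste resources and lengthen the waiting time of queued jobs, seriously harming system efficiency. If job failures can be predicted in advance, the necessary measures can be taken to improve resource utilization and execution efficiency, which is crucial for future exascale supercomputers. To this end, this work studies predicting job failure from known traditional features and constructed features, and identifies features, and ways of processing them, that reflect users' working behavior patterns and submission behavior patterns. Combining behavioral and traditional features, it proposes a comprehensive framework based on tree-structured models to predict job failure. Experimental results show that its predictions outperform other related methods.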

12.
The effectiveness of distributed execution of computationally intensive applications (jobs) largely depends on the quality of the applied scheduling approach. However, most of the existing non-trivial scheduling algorithms rely on prior knowledge or on prediction of application parameters, such as execution time, size of input and output, dependencies, etc., to assign applications to the available computational resources. A major issue is that these parameters are hard to determine in advance, especially if the end user does not possess an extensive history of previous application runs. In this work we propose an online method for execution time prediction of applications, for which execution progress can be collected at run-time. Using dynamic progress information, the total job execution time can be predicted using extrapolation. However, the predictions achieved by extrapolation are far from precise and often vary over time as a result of changing application dynamics and varying resource load. Therefore, to compute the actual job execution time we match a number of predefined prediction evolution models against the consecutive extrapolations, by adopting nonlinear curve-fitting. The "best-fit" coefficients allow for more accurate execution time prediction. The predictions made are used to enhance a dynamic scheduling algorithm for workflows introduced in our earlier work. The scheduling algorithm is run with and without curve-fitting, showing a performance improvement of up to 15% in the former case.
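The extrapolation step described in this abstract can be sketched in a few lines. Only the plain linear extrapolation is shown; the function name and the sample progress reports are assumptions, and the curve-fitting stage that the authors apply to the resulting sequence of estimates is not reproduced here.

```python
def extrapolate_total_time(elapsed, progress):
    """Linear extrapolation: total time is roughly elapsed / fraction done."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed / progress


# Hypothetical (elapsed seconds, fraction complete) reports from one job.
samples = [(10.0, 0.0625), (20.0, 0.25), (30.0, 0.5), (40.0, 0.625)]
estimates = [extrapolate_total_time(t, p) for t, p in samples]
print(estimates)
```

The successive estimates swing widely as the job's progress rate changes, which is the imprecision that motivates fitting a model to the sequence instead of trusting any single extrapolation.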

13.

In recent years, various studies on OpenStack-based high-performance computing have been conducted. OpenStack combines off-the-shelf physical computing devices and creates a resource pool of logical computing. The configuration of the logical computing resource pool provides computing infrastructure according to the user's request and can be applied to infrastructure as a service (IaaS), which is a cloud computing service model. OpenStack-based cloud computing can provide various computing services for users using a virtual machine (VM). However, intensive computing service requests from a large number of users during large-scale computing jobs may delay job execution. Moreover, idle VM resources may occur and computing resources are wasted if users do not employ the cloud computing resources. To resolve the computing job delay and waste of computing resources, a variety of studies are required, including computing task allocation, job scheduling, utilization of idle VM resources, and improvements in overall job execution speed as computing service requests increase. Thus, this paper proposes an efficient job management of computing service (EJM-CS) by which idle VM resources are utilized in OpenStack and users' computing services are processed in a distributed manner. EJM-CS logically integrates idle VM resources, which have different performances, for computing services. EJM-CS reduces resource waste by utilizing idle VM resources. EJM-CS takes multiple computing services rather than a single computing service into consideration. EJM-CS determines the job execution order considering workloads and waiting time according to the job priority of the computing service requester and the computing service type, thereby providing improved performance of overall job execution when computing service requests increase.


14.
As the mean-time-between-failures (MTBF) continues to decline with the increasing number of components on large-scale high performance computing (HPC) systems, program failures might occur during the execution period with high probability. Ensuring successful execution of HPC programs has become an issue that unprivileged users must be concerned with. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking due to execution control by the job management system as well as their limited privileges. Currently, automated tools for supporting user-level failure detection and autorecovery of parallel programs in HPC systems are missing. This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experiment results show that ARL can detect execution failures effectively on the Tianhe-2 system. In addition, the communication and performance overhead caused by ARL is negligible. The good scalability of ARL makes it applicable to large-scale HPC systems.

15.
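In a grid environment, a job execution support system lets users remotely submit job tasks to grid resources, run scientific computing applications, and manage running jobs. It addresses key problems such as preparing the execution environment, monitoring and reporting status, runtime control, and I/O support. The main existing grid middleware systems all provide job execution and management tools that handle several of the main problems well, but they do not fully meet users' needs and require further improvement.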

16.
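The emerging distributed computing framework Apache Flink supports running large-scale iterative programs on a cluster, but its default static resource allocation mechanism prevents resources from being configured sensibly so that iterative jobs finish on time. To address this problem, the system should rely on users to actively express performance constraints rather than passively reserving resources; accordingly, a dynamic resource allocation strategy based on runtime prediction is proposed, RABORP (resource allocation...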

17.

MapReduce framework is an effective method for big data parallel processing. Enhancing the performance of MapReduce clusters, along with reducing their job execution time, is a fundamental challenge to this approach. In fact, one is faced with two challenges here: how to maximize the execution overlap between jobs and how to create an optimum job scheduling. Accordingly, one of the most critical challenges to achieving these goals is developing a precise model to estimate the job execution time due to the large number and high volume of the submitted jobs, limited consumable resources, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. Moreover, a novel heuristic method is designed, which significantly reduces the makespan of the jobs. In this method, first by providing the job profiling tool, we obtain the execution details of the MapReduce phases through log analysis. Then, using machine learning methods and statistical analysis, we propose a relevant model to predict runtime. Finally, another tool called job submission and monitoring tool is used for calculating makespan. Different experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the average makespan speedup for the proposed method was higher than an unoptimized case.


18.
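This work studies management modes for multiple clusters, designs two scheduling modes that support multiple clusters (user decision and system decision), and proposes MCJSS, a PBS-based multi-cluster job scheduling architecture. Using the extension interfaces of PBS, it implements the core modules of MCJSS, which manage multi-cluster job submission, job forwarding, and load information collection. A scheduling strategy for forwarding between clusters based on the predicted lightest load is given. A two-layer load information collection strategy serves the forwarding mechanism, and thresholds on a combination of individual load metrics determine when to forward and to which cluster. Experiments show that, for the same jobs on systems of the same scale but different organization, the multi-cluster organization achieves higher system throughput than the single-cluster organization.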
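A threshold-based forwarding decision of the kind described here might look like the following sketch. The two metrics (CPU utilization and queue length), the threshold values, the scoring formula, and the function names are all assumptions; the abstract does not say which load metrics MCJSS combines or how.

```python
def overloaded(cluster, cpu_threshold, queue_threshold):
    """A cluster counts as overloaded if either single metric exceeds its threshold."""
    return cluster["cpu"] > cpu_threshold or cluster["queue"] > queue_threshold


def choose_target(local, remotes, cpu_threshold=0.8, queue_threshold=10):
    """Return the name of the cluster to forward a job to, or None to run locally.

    Forwarding happens only when the local cluster is overloaded and at
    least one remote cluster is not; among those, the one with the lightest
    predicted load (a combined score, assumed here) is chosen.
    """
    if not overloaded(local, cpu_threshold, queue_threshold):
        return None
    candidates = [c for c in remotes if not overloaded(c, cpu_threshold, queue_threshold)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c["cpu"] + c["queue"] / queue_threshold)["name"]


# Hypothetical predicted load per cluster.
local = {"name": "A", "cpu": 0.95, "queue": 12}
remotes = [
    {"name": "B", "cpu": 0.5, "queue": 3},
    {"name": "C", "cpu": 0.3, "queue": 8},
    {"name": "D", "cpu": 0.9, "queue": 1},
]
print(choose_target(local, remotes))
```

Keeping the "when to forward" test separate from the "where to forward" ranking mirrors the two decisions the abstract attributes to the thresholds.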

19.
A resource broker with a user-friendly interface for job submission developed on a platform constructed using the Globus toolkit is proposed. The broker employs a domain-based network information model and dynamic version to measure network statuses, and also monitors and collects resource statuses and network-related information as the basis of its brokerage. A network bandwidth-aware job scheduling algorithm for brokering suitable Grid resources to communication-intensive jobs based on improving and preserving the advantages of our previously developed network information model is also proposed. Using timely information, the resource broker effectively matches Grid resources and user requests, thus improving job execution efficiency.

20.
In recent years, the demand for real-time data processing has been increasing, and various stream processing systems have emerged. When the amount of data input to the stream processing system fluctuates, the computing resources required by the stream processing job also change. The resources used by stream processing jobs need to be adjusted according to load changes to avoid wasting computing resources. Existing works adjust stream processing jobs based on the assumption that there is a linear relationship between operator parallelism and operator performance (e.g., throughput), which causes significant deviation when the operator parallelism increases. This paper proposes a nonlinear model to represent operator performance. We divide the operator performance into three stages: the Non-competition stage, the Non-full competition stage, and the Full competition stage. Using our proposed performance model, given the parallelism of the operator, we can accurately predict the CPU utilization and operator throughput. Evaluated with actual experiments, the prediction error of our model is below 5%. We also propose a quick accurate auto-scaling (QAAS) method that uses the operator performance model to implement auto-scaling of the operator parallelism of a Flink job. Compared to previous work, QAAS is able to maintain stable job performance under load changes, minimizing the number of job adjustments and reducing data backlogs by 50%.
