Similar Articles
20 similar articles found (search time: 31 ms)
1.
The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly target balancing CPU load among server nodes. With the rapid advancement of CPU chips, memory and disk access speed improvements significantly lag behind CPU speed, increasing the penalty for data movement, such as page faults and I/O operations, relative to normal CPU operations. Aiming at reducing the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective use of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable, and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect several types of workload execution traces and capture dynamic memory access patterns. Through several groups of trace-driven simulations, we show that our proposed policies can effectively improve overall job execution performance by making good use of both CPU and memory resources, with known and unknown memory demands.
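The core idea of the abstract, preferring a node whose free memory covers a job's demand before falling back to plain CPU load balancing, can be sketched as below. All field names and the selection heuristic are illustrative assumptions, not the authors' actual policy.

```python
# Hypothetical sketch of a memory-aware load sharing policy: avoid page
# faults by preferring nodes whose free memory fits the job, and only then
# balance CPU load; otherwise minimize paging pressure.

def select_node(nodes, job_mem):
    """nodes: list of dicts with 'cpu_load' (runnable tasks) and 'free_mem' (MB)."""
    # Nodes that can hold the job entirely in memory avoid page faults.
    fitting = [n for n in nodes if n["free_mem"] >= job_mem]
    if fitting:
        # Among memory-fitting nodes, balance CPU load as usual.
        return min(fitting, key=lambda n: n["cpu_load"])
    # Otherwise minimize paging: choose the node with the most free memory.
    return max(nodes, key=lambda n: n["free_mem"])
```

A CPU-only balancer would always pick the least-loaded node; this sketch demotes that criterion to a tie-breaker among memory-fitting nodes.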

2.
Virtualization, as a new resource management technique, is being adopted ever more widely in high-energy physics. Static virtual machine clusters can no longer satisfy the dynamic computing-resource demands of multiple job queues. We therefore implemented an elastic computing-resource management system for multiple job queues in a cloud environment. The system runs jobs through the high-throughput computing system HTCondor and manages virtual compute nodes with the open-source cloud platform OpenStack. It combines a virtual-resource quota service with a two-threshold elastic resource management algorithm to scale the resource pool as a whole, and adds a two-level buffer pool to improve scaling efficiency. The system has been deployed on IHEPCloud, the public service cloud of the Institute of High Energy Physics. Production results show that the system dynamically adjusts the number of virtual compute nodes per queue as resource demand changes, and that CPU utilization improves significantly over traditional resource management.
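A two-threshold elastic scaling rule of the kind the abstract describes might look like the following. The thresholds, step size, and field names are assumptions for illustration; the paper's actual algorithm also involves quotas and a two-level buffer pool not modeled here.

```python
# Illustrative two-threshold scaling decision: grow the virtual-node pool
# when queue pressure is high, shrink it when nodes sit idle, and do
# nothing inside the hysteresis band between the thresholds.

def scale_decision(queued_jobs, idle_nodes, total_nodes,
                   high=0.8, low=0.2, step=2):
    """Return the signed number of virtual nodes to add (+) or remove (-)."""
    if total_nodes == 0:
        return step if queued_jobs > 0 else 0
    busy_ratio = (total_nodes - idle_nodes) / total_nodes
    if queued_jobs > 0 and busy_ratio >= high:
        return step                        # expand: above the upper threshold
    if queued_jobs == 0 and busy_ratio <= low:
        return -min(step, idle_nodes)      # shrink: below the lower threshold
    return 0                               # in between: leave the pool alone
```

The dead band between `low` and `high` prevents the pool from oscillating when load hovers near a single threshold.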

3.
The performance of cluster computing depends on how concurrent jobs share multiple data center resource types such as CPU, RAM and disk storage. Recent research has discussed efficiency and fairness requirements and identified a number of desirable scheduling objectives including so-called dominant resource fairness (DRF). We argue here that proportional fairness (PF), long recognized as a desirable objective in sharing network bandwidth between ongoing data transfers, is preferable to DRF. The superiority of PF is manifest under the realistic modeling assumption that the population of jobs in progress is a stochastic process. In random traffic the strategy-proof property of DRF proves unimportant while PF is shown by analysis and simulation to offer a significantly better efficiency–fairness tradeoff.
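For readers unfamiliar with DRF, the baseline the abstract argues against, here is a minimal progressive-filling sketch: repeatedly grant one task to the user with the smallest dominant share. The stopping rule (break when the chosen user no longer fits) is a simplification.

```python
# Minimal sketch of dominant resource fairness (DRF) allocation: a user's
# dominant share is its largest fraction of any single resource, and the
# scheduler always serves the user with the smallest dominant share next.

def drf_allocate(capacity, demands, rounds):
    """capacity: {resource: total}; demands: {user: {resource: per-task need}}."""
    used = {u: {r: 0.0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    for _ in range(rounds):
        def dominant(u):
            return max(used[u][r] / capacity[r] for r in capacity)
        u = min(demands, key=dominant)  # most underserved user goes next
        fits = all(sum(used[v][r] for v in used) + demands[u][r] <= capacity[r]
                   for r in capacity)
        if not fits:
            break  # simplification: stop once the chosen user no longer fits
        for r in capacity:
            used[u][r] += demands[u][r]
        tasks[u] += 1
    return tasks
```

On the canonical example (9 CPUs, 18 GB; user A needs ⟨1 CPU, 4 GB⟩ per task, user B ⟨3 CPU, 1 GB⟩), this yields 3 tasks for A and 2 for B, equalizing both users' dominant shares at 2/3.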

4.
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users. Such resources are generally provided voluntarily and their availability fluctuates widely. Guest jobs may fail unexpectedly as resources become unavailable. To improve this situation, we consider methods to predict resource availability. This paper presents empirical studies on resource availability in FGCS systems and a prediction method. From studies of resource contention among guest jobs and local users, we derive a multi-state availability model. The model enables us to detect resource unavailability in a non-intrusive way. We analyzed the traces collected from a production FGCS system over 3 months. The results suggest the feasibility of predicting resource availability, and motivate our method of applying semi-Markov Process models for the prediction. We describe the prediction framework and its implementation in a production FGCS system named iShare. Through experiments on an iShare testbed, we demonstrate that the prediction achieves an accuracy of 86% on average and outperforms linear time series models, while its computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource availability. We tested the effectiveness of the prediction in a proactive scheduler. Initial results show that applying availability prediction to job scheduling reduces the number of jobs failed due to resource unavailability. This work was supported, in part, by the National Science Foundation under Grants No. 0103582-EIA, 0429535-CCF, and 0650016-CNS. We thank Ruben Torres for his help with the reference prediction algorithms used in our experiments.
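The first step of such a multi-state model, estimating transition probabilities between availability states from an observed trace, can be sketched as below. This is a plain discrete-time Markov simplification; the paper's semi-Markov models additionally account for how long the host stays in each state, and the state names here are made up.

```python
# Estimate empirical state-transition probabilities from a trace of
# observed host states (e.g. "free", "busy", "unavail"). Each probability
# is the count of a transition divided by all transitions out of a state.

def transition_probs(trace):
    """trace: list of state labels observed at successive time steps."""
    counts = {}
    for cur, nxt in zip(trace, trace[1:]):
        counts.setdefault(cur, {})
        counts[cur][nxt] = counts[cur].get(nxt, 0) + 1
    return {s: {t: c / sum(outs.values()) for t, c in outs.items()}
            for s, outs in counts.items()}
```

A predictor can then score the chance that a host stays out of the unavailable state over a scheduling window by walking this matrix forward.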

5.
Cooperative resource sharing enables distinct organizations to form a federation of computing resources. The motivation behind cooperation is that organizations are likely to serve each other by trading unused CPU cycles, given the irregular usage patterns of their local resources. In this way, resource sharing enables organizations to purchase resources at a feasible level while meeting peak computational throughput requirements. This federation results in a community grid that must be managed. A functional broker is deployed to facilitate remote resource access within the community grid. A major issue is the problem of correlations in job arrivals caused by seasonal usage and/or coincident resource usage demand patterns. These correlations incur high levels of burstiness in job arrivals, causing the broker's job queue to grow to such an extent that its performance becomes severely impaired. Since job arrivals cannot be controlled, management strategies must be employed to admit jobs in a manner that sustains a fair level of resource allocation performance at all participating organizations in the community. In this paper, we present a theoretical analysis of the effect of job traffic burstiness on resource allocation performance in order to elicit the general job management strategies to be employed. Based on the analysis, we define and justify job management strategies for the resource broker to cope with overload conditions caused by job arrival correlations.

6.
Contention in performance-critical shared resources affects performance and quality-of-service (QoS) significantly. While this issue has been studied recently in CMP architectures, the same problem exists in SoC architectures where the challenge is even more severe due to the contention of shared resources between programmable cores and fixed-function IP blocks. In the SoC environment, efficient resource sharing and a guarantee of a certain level of QoS are highly desirable. Researchers have proposed different techniques to support QoS, but most existing works focus on only one individual resource. Coordinated management of multiple QoS-aware shared resources remains an open problem. In this paper, we propose a class-of-service based QoS architecture (CoQoS), which can jointly manage three performance-critical resources (cache, NoC, and memory) in a NoC-based SoC platform. We evaluate the interaction between the QoS-aware allocation of shared resources in a trace-driven platform simulator consisting of detailed NoC and cache/memory models. Our simulations show that the class-of-service based approach provides a low-cost flexible solution for SoCs. We show that assigning the same class-of-service to multiple resources is not as effective as tuning the class-of-service of each resource while observing the joint interactions. This demonstrates the importance of overall QoS support and the coordination of QoS-aware shared resources.

7.
Cloud computing is emerging as an important platform for business, personal and mobile computing applications. In this paper, we study a stochastic model of cloud computing, where jobs arrive according to a stochastic process and request resources like CPU, memory and storage space. We consider a model where the resource allocation problem can be separated into a routing or load balancing problem and a scheduling problem. We study the join-the-shortest-queue routing and power-of-two-choices routing algorithms with the MaxWeight scheduling algorithm. It was known that these algorithms are throughput optimal. In this paper, we show that these algorithms are queue length optimal in the heavy traffic limit.
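The two routing rules the abstract analyzes are simple enough to state directly. This is a generic sketch of the standard algorithms, not the paper's model: queue lengths are plain integers and ties break toward the lower index.

```python
import random

def jsq(queues):
    """Join-the-shortest-queue: route to the globally shortest queue."""
    return min(range(len(queues)), key=lambda i: queues[i])

def power_of_two(queues, rng=random):
    """Power-of-two-choices: sample two servers uniformly at random and
    route to the shorter of the two, avoiding a full scan of all queues."""
    i, j = rng.sample(range(len(queues)), 2)
    return i if queues[i] <= queues[j] else j
```

JSQ needs global queue-length information on every arrival; power-of-two-choices trades a little queueing performance for only two samples per arrival, which is why it scales to large server pools.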

8.
To avoid resource contention among multiple applications, Spark adopts FIFO and FAIR job scheduling policies, complemented by the SpreadOut and non-SpreadOut resource scheduling algorithms, but these algorithms do not adequately consider the relationship between user job types and cluster node capabilities. ATNPA, a resource scheduling algorithm aware of user job type and node performance bias, addresses this problem. ATNPA classifies user jobs as CPU-intensive or memory-intensive according to the amount of memory and number of CPU cores they require. A node's performance bias is determined by its static and dynamic factors: static factors include CPU speed, memory size, number of CPU cores, and disk capacity; dynamic factors include remaining CPU, memory, and disk ratios, and disk read/write speed. When allocating resources, ATNPA assigns each job to the node best suited to its type. Simulations show that, compared with algorithms that ignore job-node matching, ATNPA effectively shortens job execution time and improves cluster performance.
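The classify-then-match idea can be sketched as below. The threshold, field names, and scoring rule are illustrative assumptions; ATNPA's actual decision combines many more static and dynamic factors than this.

```python
# Hypothetical sketch of ATNPA-style matching: label each job by its
# dominant demand, then prefer the node with the most headroom in that
# resource.

def classify_job(mem_mb, cpu_cores, mem_per_core=2048):
    """Label a job CPU- or memory-intensive by memory demand per core."""
    return "memory-intensive" if mem_mb / cpu_cores > mem_per_core else "cpu-intensive"

def best_node(job_type, nodes):
    """Pick the node with the most free capacity in the job's dominant resource."""
    key = "free_mem" if job_type == "memory-intensive" else "free_cpu"
    return max(nodes, key=lambda n: n[key])
```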

9.
汤小春  赵全  符莹  朱紫钰  丁朝  胡小雪  李战怀 《软件学报》2022,33(12):4704-4726
The Dataflow model unifies batch and stream processing in big-data computation. Existing cluster resource scheduling frameworks for big data, however, target either stream processing or batch processing alone, and are ill-suited to batch and stream jobs sharing cluster resources. Moreover, when GPUs are used for big-data analytics, the lack of an effective way to decouple CPU and GPU resources lowers resource efficiency. Building on an analysis of existing cluster resource scheduling frameworks, we design and implement HRM, a hybrid resource scheduling framework aware of batch and stream applications. Based on a shared-state architecture, it combines optimistic and pessimistic locking protocols to satisfy the differing resource requirements of stream and batch jobs. On compute nodes it provides flexible CPU-GPU binding and uses queue stacking, which not only meets the real-time requirements of stream jobs but also reduces feedback delay and enables GPU sharing. In simulations of large-scale job scheduling, HRM's scheduling delay is only about 75% of that of a centralized scheduling framework. Under real workloads with batch and stream jobs sharing the cluster, HRM raises CPU utilization by more than 25%; with fine-grained job scheduling, GPU utilization more than doubles and job completion time drops by about 50%.

10.
In enterprise grid computing environments, users have access to multiple resources that may be distributed geographically. Thus, resource allocation and scheduling is a fundamental issue in achieving high performance in enterprise grid computing. Most current job scheduling systems for enterprise grid computing provide batch queuing support and focus solely on the allocation of processors to jobs. However, since I/O is also a critical resource for many jobs, the allocation of processor and I/O resources must be coordinated to allow the system to operate most effectively. To this end, we present a hierarchical scheduling policy paying special attention to the I/O and service demands of parallel jobs in homogeneous and heterogeneous systems with background workload. The performance of the proposed scheduling policy is studied under various system and workload parameters through simulation. We also compare the performance of the proposed policy with a static space–time sharing policy. The results show that the proposed policy performs substantially better than the static space–time sharing policy.

11.
Addressing the characteristics of resource allocation in volunteer computing, and drawing on the dynamic analysis theory of new classical economics, we propose a resource allocation method for volunteer computing based on inframarginal analysis, using comparative advantage to adjust the allocation. The model suits volunteer computing environments in which multiple compute nodes each provide several services, and it responds dynamically to changing network conditions by adjusting the allocation strategy.

12.
Traditional packet scheduling schemes achieve fairness for a single resource, but they cannot be applied directly to active nodes in active networks, because an active node contains multiple interrelated resources. We propose a packet scheduling algorithm that allocates resources fairly within an active node. Simulation results show that the algorithm achieves effective and fair allocation of CPU and bandwidth resources among active flows.

13.
Job scheduling typically focuses on the CPU, with little existing work including I/O or memory. Time-shared execution provides the chance to hide I/O and long-communication latencies, though it can potentially create memory conflicts. Hyperthreaded CPUs support coscheduling without any context switches and provide additional options for CPU-internal resource sharing. We present an approach that includes all possible resources in the schedule optimization and improves utilization by coscheduling two jobs when feasible. Our LOMARC approach partially reorders the queue by lookahead to increase the potential to find good matches. In simulations based on the workload model of Lublin and Feitelson, we obtained improvements between 30 percent and 50 percent in both response times and relative bounded response times on hyperthreaded CPUs (i.e., times cut to two thirds or to half).
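The lookahead reordering idea can be sketched as a bounded scan for a complementary partner job. The window size, job representation, and pairing criterion are illustrative assumptions; LOMARC's actual matching considers all resources, not a single label.

```python
# Hypothetical sketch of lookahead matching: scan a bounded window of the
# queue for a job whose resource profile complements the head job (e.g.
# pair a CPU-bound job with an I/O-bound one for coscheduling).

def find_coschedule_match(job, queue, lookahead=5):
    """Return the first complementary job within the lookahead window, or None."""
    for cand in queue[:lookahead]:
        if cand["bound"] != job["bound"]:
            return cand
    return None
```

Bounding the lookahead keeps the partial reordering cheap and limits how far a queued job can be overtaken.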

14.
Cluster computing is an attractive approach to provide high‐performance computing for solving large‐scale applications. Owing to advances in processor and networking technology, expanding clusters have resulted in system heterogeneity; thus, it is crucial to dispatch jobs to heterogeneous computing resources for better resource utilization. In this paper, we propose a new job allocation system for heterogeneous multi‐cluster environments named the Adaptive Job Allocation Strategy (AJAS), in which a self‐scheduling scheme is applied in the scheduler to dispatch jobs to the most appropriate computing resources. Our strategy focuses on increasing resource utility by dispatching jobs to computing nodes with similar performance capacities. By doing so, execution times among all nodes can be equalized. The experimental results show that AJAS can improve system performance. Copyright © 2011 John Wiley & Sons, Ltd.

15.
A new grid environment model: TGrid
After analyzing the shortcomings of existing grid environments, we propose TGrid, a new grid environment model: a tree-structured grid architecture and environment supporting high-performance computing, subject-oriented resource sharing, and next-generation requirements modeling. It organizes grid nodes and integrates diverse resources in a tree structure, realizing bottom-up, multi-level, demand-oriented resource abstraction and the fusion of multiple resource types. The tree structure matches natural hierarchical organization, making hierarchical management of the grid system easy to implement; it helps lighten the load on central nodes, supports load balancing for large-scale applications, and improves resource lookup efficiency. TGrid shares grid resources in the form of virtual resources: a distributed JVM (TJVM) virtualizes the CPU and main-memory resources on grid nodes, a multi-database middleware (TDOD) integrates and shares database-level resources, and Globus grid services (GService) share other software and data resources. This tree-shaped grid provides a new solution for the ever-growing demands of grid applications.

16.
With cloud and utility computing models gaining significant momentum, data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on a chip-multiprocessor (CMP) server. In such environments, contention for shared platform resources (CPU cores, shared cache space, shared memory bandwidth, etc.) can have a significant effect on each virtual machine's performance. In this paper, we investigate the shared resource contention problem for virtual machines by: (a) measuring the effects of shared platform resources on virtual machine performance, (b) proposing a model for estimating shared resource contention effects, and (c) proposing a transition from a virtual machine (VM) to a virtual platform architecture (VPA) that enables transparent shared resource management through architectural mechanisms for monitoring and enforcement. Our measurement and modeling experiments are based on a consolidation benchmark (vConsolidate) running on a state-of-the-art CMP server. Our virtual platform architecture experiments are based on detailed simulations of consolidation scenarios. Through detailed measurements and simulations, we show that shared resource contention affects virtual machine performance significantly and emphasize that a virtual platform architecture is a must for future virtualized datacenters.

17.
Many visual tasks in modern personal devices such as smartphones rely heavily on graphics processing units (GPUs) for their fluent user experiences. Because most GPUs for embedded systems are non-preemptive by nature, it is important to schedule GPU resources efficiently across multiple GPU tasks. We present a novel spatial resource sharing (SRS) technique for GPU tasks, called budget-reservation spatial resource sharing (BR-SRS) scheduling, which limits the number of GPU processing cores for a job based on the priority of the job. Such priority-driven resource assignment can prevent a high-priority foreground GPU task from being delayed by background GPU tasks. The BR-SRS scheduler is invoked only twice, at the arrival and completion of jobs, so the scheduling overhead is minimized as well. We evaluated the performance of our scheduling scheme on an Android-based smartphone and found that the proposed technique significantly improved the performance of high-priority tasks in comparison to previous temporal budget-based multi-task scheduling.
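A priority-weighted core budget of the kind BR-SRS reserves might be computed as below. The proportional-split rule and names are illustrative assumptions; the paper's budgets need not be proportional to priority.

```python
# Hypothetical sketch of priority-driven spatial sharing: carve a fixed
# pool of GPU cores into per-job budgets proportional to job priority,
# guaranteeing every job at least one core.

def core_budgets(total_cores, jobs):
    """jobs: list of (name, priority); higher priority reserves more cores."""
    weight = sum(p for _, p in jobs)
    return {name: max(1, total_cores * p // weight) for name, p in jobs}
```

Because the split is spatial rather than temporal, a background job can keep running on its small budget without ever delaying the foreground job's cores.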

18.
Loosely coordinated (implicit/dynamic) coscheduling is a time‐sharing approach that originates from network-of-workstations environments with mixed parallel/serial workloads and limited software support. It is meant to be an easy‐to‐implement and scalable approach. Considering that the percentage of clusters in parallel computing is increasing and easily portable software is needed, loosely coordinated coscheduling becomes an attractive approach for dedicated machines. Loose coordination offers attractive features as a dynamic approach. Static approaches for local job scheduling assign resources exclusively and non‐preemptively; such approaches still fall short of desirable resource utilization and average response times. Conversely, approaches for dynamic scheduling of jobs can preempt resources and/or adapt their allocation, and typically provide better resource utilization and response times. Existing dynamic approaches are full preemption with checkpointing, dynamic adaptation of node/CPU allocation, and time sharing via gang or loosely coordinated coscheduling. This survey presents and compares the different approaches, particularly focusing on the less well‐explored loosely coordinated time sharing. The discussion concentrates on the implementation problems, in terms of modification of standard operating systems, the runtime system and the communication libraries. Copyright © 2005 John Wiley & Sons, Ltd.

19.
An educational resource sharing network architecture and its key strategies
To address the problem of sharing educational resources among primary and secondary schools, we design an educational resource sharing network architecture and define the functions of the nodes at each layer. We examine methods and measures for reducing network latency and raising bandwidth utilization from the perspectives of network topology and communication efficiency. First, a logical RP(k) network is built among the management nodes; then, based on the RP(k) network, we present a series of key strategies for improving the system's quality of service, including node join/leave strategies, a distributed resource retrieval strategy, and a node data coordination strategy. Theoretical analysis and comparison confirm the advantages of the RP(k) interconnection network and the effectiveness of the related strategies.

20.
High-energy physics data consist of physics events with no correlations between events, so high-energy physics computation can be parallelized by running large numbers of jobs that simultaneously process many different data files; it is thus a typical high-throughput computing scenario. The IHEP computing cluster uses the open-source TORQUE/Maui for resource management and job scheduling, and ensures fairness by dividing cluster resources into queues and capping each user's maximum number of running jobs, which, however, leaves overall cluster resource utilization very low. SLURM and HTCondor are both popular open-source resource management systems of recent years: the former offers rich job scheduling policies, while the latter is well suited to high-throughput computing. Either could replace the aging, poorly maintained TORQUE/Maui and is a viable option for managing the cluster's computing resources. We simulated the job submission behavior of Daya Bay experiment users on SLURM and HTCondor test clusters, measured the resource allocation behavior and efficiency of SLURM and HTCondor, compared the results with the actual scheduling of the same jobs on the IHEP TORQUE/Maui cluster, analyzed the strengths and weaknesses of SLURM and HTCondor, and discussed the feasibility of using either system to manage the IHEP computing cluster.

