Similar literature
 20 similar documents found (search time: 125 ms)
1.
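Distributed big-data computing engines are indispensable tools for research institutions, Internet companies, and government departments that process large-scale data; their use and adoption have driven rapid progress in many fields and contributed greatly to social progress. In multi-job workloads, however, current mainstream engines still have shortcomings in resource allocation and job scheduling: they typically split memory evenly among jobs and schedule jobs first-in, first-out (FIFO), and such simple resource partitioning and scheduling cannot make full use of system performance. To address this problem, improvements are made at the job level of the computing engine. For resource partitioning, job features are extracted to estimate each job's task volume, the gap between that task volume and the job's pre-allocated resources is assessed, and jobs that waste a large share of cluster resources are merged so that computing resources are fully used. For job scheduling, features are extracted from the jobs in the job pool, a multi-way K-means algorithm clusters the jobs, and a self-balancing round-robin scheduling algorithm then schedules the jobs based on the clustering result to achieve load balancing. To verify the proposed algorithms, comparative experiments were run on large-scale text datasets in a distributed cluster environment. The results show that the proposed job-merging and multi-job scheduling algorithms reduce job running time by 5% to 23%, raise system throughput by 7.5% to 29%, and in the best case reduce the number of threads started by 40%.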
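
As a rough illustration of the scheduling idea in this abstract (group jobs by cluster, then take turns across groups), the sketch below interleaves jobs from pre-computed groups in plain round-robin order. It is not the paper's self-balancing algorithm: the cluster labels are assumed to come from some earlier clustering step, and the function name `interleave_clusters` and the example group names are hypothetical.

```python
from collections import deque


def interleave_clusters(groups: dict[str, list[str]]) -> list[str]:
    """Return a submission order that cycles through the job groups.

    `groups` maps a cluster label to the job IDs assigned to it
    (assumed to be the output of a prior clustering step).
    """
    # One FIFO queue per non-empty group, in insertion order.
    queues = [deque(jobs) for jobs in groups.values() if jobs]
    order: list[str] = []
    while queues:
        # Take one job from each group per round.
        for queue in queues:
            order.append(queue.popleft())
        # Drop groups that have run out of jobs.
        queues = [queue for queue in queues if queue]
    return order
```

For example, with groups `{"cpu": ["a", "b", "c"], "io": ["x"], "mem": ["m", "n"]}` the jobs alternate between groups until each group is drained, so jobs with similar resource profiles are not submitted back to back.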

2.
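The HTCondor and SLURM computing clusters in the high-energy physics computing platform provide data-processing services for multiple high-energy physics experiments. However, HTCondor schedules parallel jobs inefficiently, SLURM copes poorly with large numbers of serial jobs, and the platform's overall resource management and scheduling policy is too simple. To meet the demands of high-load operation, a job management layer is added on top of the traditional job schedulers to form a two-layer job scheduling system. The system schedules serial and parallel jobs efficiently while keeping resource use fair across experiment groups, and gives users fine-grained control over their jobs. Test results show that the two-layer system supports fast submission of large batches of high-energy physics jobs, makes full use of the platform's overall resources, and delivers good job scheduling performance.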

3.
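This paper outlines the design and implementation of a distributed wireless trunking dispatch and monitoring system. The system uses RS232 for communication between the PC and the trunking controller, and performs data acquisition, status display, and control for each base station. Remote and heterogeneous trunking systems communicate through a dedicated gateway to achieve real-time remote dispatch, which makes the system practical.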

4.
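Multi-cluster environments have heterogeneous and geographically dispersed resources, unreliable networks, and user-driven requirements. For such environments, this paper proposes a multi-cluster job management scheme built on a message model. The scheme uses a hierarchical global-local scheduling method and a publish-subscribe message model, and schedules and manages jobs in a unified way according to current network conditions, the resource requirements of user jobs, and each cluster's own load. Practice shows that a multi-cluster job management system designed and implemented with this scheme provides resource monitoring, resource management, job scheduling, job control, and data management in a multi-cluster environment, effectively solves the system stability problem under heterogeneous resources and unreliable networks, and significantly improves the job throughput of the multi-cluster system.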

5.
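Storm on YARN is currently a mainstream distributed resource scheduling framework, but it requires manual intervention and cannot adjust system resources in real time according to resource availability. In this work, system load is computed from the real-time latency of stream data processing, and a distributed resource scheduling and co-allocation system is designed on the Storm platform based on YARN. A two-layer scheduling model comprising a system layer and a task layer is built: the system layer predicts resource allocation by monitoring the stream-processing load in real time, and the task layer performs dynamic resource management using the efficient cluster resource management capabilities of ZooKeeper and YARN. Experimental results show that the system adjusts the distribution of cluster resources in real time and effectively reduces system latency.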

6.
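Multi-cluster management modes are studied, two multi-cluster scheduling modes (user-decided and system-decided) are designed, and a PBS-based multi-cluster job scheduling architecture, MCJSS, is proposed. Using the extension interfaces of PBS, the core modules of MCJSS are implemented to manage multi-cluster job submission, job forwarding, and load information collection. A scheduling policy is given that forwards jobs to the cluster predicted to have the lightest load. A two-level load information collection strategy serves the forwarding mechanism, and a combination of thresholds on individual load metrics determines when to forward and to which cluster. Experiments show that, for the same jobs on systems of the same scale but organized differently, the multi-cluster organization achieves higher system throughput than the single-cluster organization.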
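
The threshold-based forwarding decision described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: the metrics (`cpu`, `memory`, `queue_len`), the threshold values, and the combined score are all hypothetical, and current load snapshots stand in for the predicted loads that MCJSS uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterLoad:
    name: str
    cpu: float       # fraction of CPU in use, 0.0 to 1.0
    memory: float    # fraction of memory in use, 0.0 to 1.0
    queue_len: int   # number of jobs waiting in the queue


# Hypothetical per-metric thresholds; real values would be tuned per site.
CPU_MAX = 0.85
MEM_MAX = 0.90
QUEUE_MAX = 20


def overloaded(load: ClusterLoad) -> bool:
    """A cluster counts as overloaded if any single metric exceeds its threshold."""
    return load.cpu > CPU_MAX or load.memory > MEM_MAX or load.queue_len > QUEUE_MAX


def combined_score(load: ClusterLoad) -> float:
    """Illustrative combined load: lower means lighter."""
    return load.cpu + load.memory + load.queue_len / QUEUE_MAX


def pick_forward_target(local: ClusterLoad, remotes: list[ClusterLoad]) -> Optional[ClusterLoad]:
    """Return the lightest non-overloaded remote cluster, or None to run locally."""
    if not overloaded(local):
        return None  # no need to forward
    candidates = [remote for remote in remotes if not overloaded(remote)]
    if not candidates:
        return None  # every remote cluster is busy as well
    return min(candidates, key=combined_score)
```

Separating "when to forward" (`overloaded`) from "where to forward" (`combined_score`) mirrors the two decisions the abstract attributes to the threshold combination.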

7.
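Research on a distributed collaborative production planning mode for the aviation manufacturing industry*   Total citations: 3, self-citations: 0, citations by others: 3
The shortcomings of existing production planning management modes are analyzed, and an optimization model is proposed that, based on a collaborative system concept, integrates the distributed resources within an aviation manufacturing enterprise. The model can describe the complex supply-demand relationships in the enterprise's production processes and perform detailed logistics and load balancing. While optimizing task combination and allocation, distributed production planning, and job scheduling, it coordinates conflicts among the optimization modules and maximizes the enterprise's overall benefit while balancing the local interests of each module. Key technologies of the model are discussed: load-balancing-oriented collaborative task planning, integrated optimization of distributed collaborative production planning and job scheduling, and the negotiation and coordination mechanism.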

8.
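Software (《软件》), 2017, (1): 77-80
Job scheduling has long been a hot topic in big-data research, and discussion of scheduling optimization on distributed clusters has never stopped. This paper compares static allocation scheduling, uniform allocation scheduling, resource-aware scheduling, and locality-based scheduling algorithms, proposes a differentiated job scheduling management technique, and applies it to the distributed real-time processing system Storm. Experiments verify that the scheduling algorithm can manage the different job tasks in a Storm cluster in a differentiated way.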

9.
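Each type of big-data computing framework has its own dedicated management method. In heterogeneous environments, traditional monitoring and scheduling services are limited because they cannot obtain the running state of the cluster as a whole, and they cannot combine runtime resource states at multiple granularities to schedule different computing jobs. This wastes available cluster resources and increases the waiting time of computing jobs. To address these two problems, an integrated monitoring and dynamic scheduling management service for heterogeneous big-data computing frameworks is proposed. The service automatically adapts to and monitors multiple types of big-data computing frameworks and computing jobs, and provides integrated scheduling for jobs of multiple types. A prototype system was implemented for the Hadoop and Storm frameworks and evaluated experimentally. The results show that, for big-data computing frameworks in heterogeneous environments, the proposed service reduces the complexity of manual operation and improves job scheduling efficiency.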

10.
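Design and implementation of GridPBS, a job scheduling system for computational grids   Total citations: 3, self-citations: 0, citations by others: 3
Sun Shuai, Yang Fan, Li Wancheng, Dong Xiaoshe. Computer Engineering (《计算机工程》), 2006, 32(9): 107-108, 111
By wrapping and extending PBS, a currently popular cluster scheduling system, a computational grid job scheduling system called GridPBS is designed and implemented. The system extends the use of PBS to the whole computational grid environment and overcomes the limitation that existing PBS systems can run only on a single cluster node. User jobs are scheduled and allocated according to the running state of the cluster resources in the computational grid, so that the computing power of the grid's nodes is effectively integrated and used.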

11.
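After comparing several current cluster task scheduling algorithms, trapezoid self-scheduling (TSS), a dynamic scheduling algorithm based on a centralized queue, is selected to implement GTS, a cluster-based task scheduling system. GTS is a user-level task scheduling system built on Linux and is mainly responsible for scheduling user tasks. Tests show that GTS can handle task scheduling on a cluster of workstations and substantially improves application performance when the number of nodes is large.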
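
For readers unfamiliar with trapezoid self-scheduling, the sketch below computes the chunk sizes that a central queue would hand out: the first chunk is about N/(2P) tasks for N tasks and P workers, and later chunks shrink linearly toward a minimum size. This is a generic illustration of TSS, not the GTS implementation; the function name `tss_chunks`, the default minimum chunk of 1, and the rounding choices are assumptions.

```python
import math


def tss_chunks(total_tasks: int, workers: int, last: int = 1) -> list[int]:
    """Chunk sizes for trapezoid self-scheduling (illustrative sketch)."""
    if total_tasks <= 0 or workers <= 0:
        return []
    # Conventional TSS starting chunk: half of an even per-worker share.
    first = max(last, math.ceil(total_tasks / (2 * workers)))
    # Number of chunks if sizes fall linearly from `first` to `last`.
    n_chunks = math.ceil(2 * total_tasks / (first + last))
    step = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0

    chunks: list[int] = []
    remaining = total_tasks
    index = 0
    while remaining > 0:
        # Linearly decreasing size, never below `last` or above what is left.
        size = max(last, round(first - index * step))
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
        index += 1
    return chunks
```

Large early chunks keep queue contention low, while small late chunks let workers finish at nearly the same time, which is why this family of algorithms suits a centralized queue.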

12.
Computational grids that couple geographically distributed resources such as PCs, workstations, clusters, and scientific instruments, have emerged as a next generation computing platform for solving large-scale problems in science, engineering, and commerce. However, application development, resource management, and scheduling in these environments continue to be a complex undertaking. In this article, we discuss our efforts in developing a resource management system for scheduling computations on resources distributed across the world with varying quality of service (QoS). Our service-oriented grid computing system called Nimrod-G manages all operations associated with remote execution including resource discovery, trading, scheduling based on economic principles and a user-defined QoS requirement. The Nimrod-G resource broker is implemented by leveraging existing technologies such as Globus, and provides new services that are essential for constructing industrial-strength grids. We present the results of experiments using the Nimrod-G resource broker for scheduling parametric computations on the World Wide Grid (WWG) resources that span five continents.

13.
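Grid task scheduling is an NP-hard problem and an indispensable part of parallel and distributed computing; task scheduling is especially complex in grid computing environments. A grid task scheduling strategy based on the artificial fish swarm algorithm is proposed, which schedules grid tasks effectively through fish-swarm behaviors such as foraging, swarming, and following.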

14.
We introduce a middleware infrastructure that provides software services for developing and deploying high-performance parallel programming models and distributed applications on clusters and networked heterogeneous systems. This middleware infrastructure utilizes distributed agents residing on the participating machines and communicating with one another to perform the required functions. An intensive study of the parallel programming models in Java has helped identify the common requirements for a runtime support environment, which we used to define the middleware functionality. A Java-based prototype, based on this architecture, has been developed along with a Java object-passing interface (JOPI) class library. Since this system is written completely in Java, it is portable and allows executing programs in parallel across multiple heterogeneous platforms. With the middleware infrastructure, users need not deal with the mechanisms of deploying and loading user classes on the heterogeneous system. Moreover, details of scheduling, controlling, monitoring, and executing user jobs are hidden, while the management of system resources is made transparent to the user. Such uniform services are essential for facilitating the development and deployment of scalable high-performance Java applications on clusters and heterogeneous systems. An initial deployment of a parallel Java programming model over a heterogeneous, distributed system shows good performance results. In addition, a framework for the agents' startup mechanism and organization is introduced to provide scalable deployment and communication among the agents.

15.
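A dynamic task scheduling method is proposed for deep learning research and development platforms on GPU clusters that is aware of user quality of service (QoS). An offline evaluation module benchmarks deep learning tasks offline and builds a computing performance prediction model. Based on this model and each task's expected QoS, an online scheduling module jointly decides task placement and task execution order. Experiments on a distributed GPU cluster instance show that the method achieves a higher QoS guarantee rate and higher cluster resource utilization than other baseline strategies.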

16.
We present a framework for a parallel programming model by remote procedure calls, which bridge large-scale computing resource pools managed by multiple Grid-enabled job scheduling systems. With this system, the user can exploit not only remote servers and clusters, but also the computing resources provided by Grid-enabled job scheduling systems located on different sites. This framework requires a Grid remote procedure call (RPC) system to decouple the computation in a remote node from the Grid RPC mechanism and uses document-based communication rather than connection-based communication. We implemented the proposed framework as an extension of the OmniRPC system, which is a Grid RPC system for parallel programming. We designed a general interface to easily adapt the OmniRPC system to various Grid-enabled job scheduling systems, including XtremWeb, CyberGRIP, Condor and Grid Engine. We show the preliminary performance of these implementations using a phylogenetic application. We found that the proposed system can achieve approximately the same performance as OmniRPC and can handle interruptions in worker programs on remote nodes. Yoshihiro Nakajima is a Research Fellow of the Japan Society for the Promotion of Science.

17.
In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple resources that are distributed. These sites are heterogeneous in nature and performance of different tasks in a workflow varies from one site to another. Additionally, users typically have a limited resource allocation at each site capped by administrative policies. In such cases, a judicious scheduling strategy is required in order to map tasks in the workflow to resources so that the workload is balanced among sites and the overhead is minimized in data transfer. Most existing systems either run the entire workflow in a single site or use naïve approaches to distribute the tasks across sites or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss in productivity. We propose a multi-site workflow scheduling technique that uses performance models to predict the execution time on resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach on real-world applications using the Swift parallel and distributed execution framework. We use two distinct computational environments: geographically distributed multiple clusters and multiple clouds. We show that our approach improves the resource utilization and reduces execution time when compared to the default schedule.
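
One simple way to combine the two signals named in this abstract (a predicted execution time per site and a probed network throughput) is to pick the site with the smallest estimated total of compute time plus input transfer time. The sketch below is only that: a hypothetical cost function, not the scheduling technique evaluated in the paper. The name `pick_site` and all parameters are assumptions.

```python
def pick_site(
    predicted_runtime_s: dict[str, float],
    throughput_bytes_per_s: dict[str, float],
    input_bytes: float,
) -> str:
    """Choose the site with the lowest estimated runtime plus transfer time."""

    def estimated_cost(site: str) -> float:
        # Time to move the input to the site at the probed throughput.
        transfer_s = input_bytes / throughput_bytes_per_s[site]
        return predicted_runtime_s[site] + transfer_s

    return min(predicted_runtime_s, key=estimated_cost)
```

With a large input, a slower site behind a fast link can win over a faster site behind a slow link, which is the trade-off between load balance and data-transfer overhead that the abstract points to.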

18.
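Li Lei, Li Ling. Journal of Graphics (《图学学报》), 2018, 39(1): 30
To meet the requirements of low-cost video streaming and real-time playback in a peer-to-peer architecture, a distributed P2P video-on-demand scheduling algorithm based on request-descending overlay selection is proposed. First, technical metrics for P2P video on demand are constructed using overlay techniques; three groups of optimization metrics (incoming neighbor nodes, outgoing neighbor nodes, and media server load) are taken into account to build the overlay architecture and the distributed algorithm flow. Second, a request-descending strategy is used to improve the selection of sending nodes and serving nodes, addressing possible low bandwidth utilization and ineffective video playback. Finally, the effectiveness of the proposed algorithm is verified on a BitTorrent video-on-demand system.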

19.
To maintain quality of service, some heavily trafficked Web sites use multiple servers, which share information through a shared file system or data space. The Andrew file system (AFS) and distributed file system (DFS), for example, can facilitate this sharing. In other sites, each server might have its own independent file system. Although scheduling algorithms for traditional distributed systems do not address the special needs of Web server clusters well, a significant evolution in the computational approach to artificial intelligence and cognitive engineering shows promise for Web request scheduling. Not only is this transformation - from discrete symbolic reasoning to massively parallel and connectionist neural modeling - of compelling scientific interest, but also of considerable practical value. Our novel application of connectionist neural modeling to map Web page requests to Web server caches maximizes hit ratio while load balancing among caches. In particular, we have developed a new learning algorithm for fast Web page allocation on a server using the self-organizing properties of the neural network (NN).

20.
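Research on a P2P-based distributed VOD system   Total citations: 4, self-citations: 0, citations by others: 4
This paper builds an application model for a P2P-based distributed VOD system and, with JXTA and Windows Media as the supporting technologies, proposes solutions and implementation methods for the key problems in the system, aiming to provide QoS-guaranteed distributed VOD service with as little resource consumption as possible.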
