
1.

Qespera: an adaptive framework for prediction of queue waiting times in supercomputer systems

Prakash Murali Sathish Vadhiyar 《Concurrency and Computation》2016,28(9):2685-2710

Production parallel systems are space-shared, and resource allocation on such systems is usually performed using a batch queue scheduler. Jobs submitted to the batch queue experience a variable delay before the requested resources are granted. Predicting this delay can assist users in planning experiment time-frames and choosing sites with shorter turnaround times and can also help meta-schedulers make scheduling decisions. In this paper, we present an integrated adaptive framework, Qespera, for prediction of queue waiting times on parallel systems. We propose a novel algorithm based on spatial clustering for predictions using history of job submissions and executions. The framework uses an adaptive set of strategies for choosing either distributions or summary of features to represent the system state and to compare with history jobs, varying the weights associated with the features for each job prediction, and selecting a particular algorithm dynamically for performing the prediction depending on the characteristics of the target and history jobs. Our experiments with real workload traces from different production systems demonstrate up to 22% reduction in average absolute error and up to 56% reduction in percentage prediction error over existing techniques. We also report prediction errors of less than 1 h for a majority of the jobs. Copyright © 2015 John Wiley & Sons, Ltd.

2.

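Implementation of a Multi-Resource Fair Scheduler in Hadoop

Ma Xiaoyan, Hong Jue 《Journal of Integration Technology》2012,1(3):66-71

Current Hadoop job scheduling algorithms abstract the multiple resource types in the system into a single resource: each job is allocated a fixed-size portion of a node's resources, called a slot. Such slot-based algorithms ignore the differences among resource types and the differing resource demands of different kinds of jobs, which leads to poor throughput and average job completion time. This paper studies the implementation of fair scheduling in Hadoop under a multi-resource environment and designs a multi-resource fair scheduler, MFS (Multi-resource Fair Scheduler). MFS adopts the DRF (Dominant Resource Fairness) scheduling approach: it uses a demand vector to describe a job's demand for each resource type and allocates resources to the job according to the size of each resource in that vector. MFS uses the system's resources more fully and effectively and meets the differing resource demands of different kinds of jobs. Experiments show that, compared with the slot-based Fair Scheduler and Capacity Scheduler, MFS increases system throughput and reduces average job completion time.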
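As a rough illustration of the DRF (Dominant Resource Fairness) idea named in this abstract, the following Python sketch repeatedly grants one task to the job whose dominant share is currently smallest. It is not the MFS implementation; the function name `drf_allocate`, the list-based demand vectors, and the rule of dropping a job once its next task no longer fits are all assumptions made for the example.

```python
from typing import Dict, List


def drf_allocate(capacity: List[float],
                 demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Hypothetical DRF sketch: returns the number of tasks granted per job.

    capacity[i] is the total amount of resource i (assumed positive);
    demands[job][i] is the per-task demand of `job` for resource i.
    """
    for job, need in demands.items():
        if not any(n > 0 for n in need):
            raise ValueError(f"job {job!r} must demand some resource")

    used = [0.0] * len(capacity)
    tasks = {job: 0 for job in demands}
    active = set(demands)

    def dominant_share(job: str) -> float:
        # Largest fraction of any single resource currently held by the job.
        return max(tasks[job] * d / c
                   for d, c in zip(demands[job], capacity))

    while active:
        # Pick the job with the smallest dominant share (ties: by name).
        job = min(sorted(active), key=dominant_share)
        need = demands[job]
        if all(u + n <= c for u, n, c in zip(used, need, capacity)):
            used = [u + n for u, n in zip(used, need)]
            tasks[job] += 1
        else:
            # Assumption: a job whose next task does not fit stops competing.
            active.discard(job)
    return tasks
```

With 9 CPUs and 18 GB of memory, a job needing 1 CPU and 4 GB per task and another needing 3 CPUs and 1 GB per task end up with 3 and 2 tasks respectively, so both hold a dominant share of two thirds.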

3.

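Design and Implementation of an Optimized Job Scheduling Model for High-Performance Computing Environments (Total citations: 1; self-citations: 0; citations by others: 1)

Wang Xiaoning, Xiao Haili, Cao Rongqiang 《Computer Engineering & Science》2017,39(4):619-626

A high-performance computing environment aggregates multiple HPC resources distributed across different regions and organizations, offers users a unified access point and usage model, and relies on system middleware to match each user job request to a suitable HPC resource. As the environment's application programming interfaces have been opened up and the number of job requests has grown sharply, the immediate scheduling model currently in use fails to process a certain number of requests under highly concurrent job submission, for reasons such as network problems, and it also lacks flexibility. To address this problem, the authors optimize the environment's job scheduling model: they introduce an environment-level job queue, refine the system-level job states, make the job scheduling policy more configurable, and implement a system prototype based on the environment middleware SCE. In tests under a workload in which a single core service handled nearly 200 job submission requests per minute, no job submission errors caused by the system or the network occurred; of 1,000 jobs in total, nearly 500 job submission command requests completed within 0.3 s and more than 800 completed within 0.5 s.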

4.

Cello: A Disk Scheduling Framework for Next Generation Operating Systems

Shenoy Prashant Vin Harrick M. 《Real-Time Systems》2002,22(1-2):9-48

In this paper, we present the Cello disk scheduling framework for meeting the diverse service requirements of applications. Cello employs a two-level disk scheduling architecture, consisting of a class-independent scheduler and a set of class-specific schedulers. The two levels of the framework allocate disk bandwidth at two time-scales: the class-independent scheduler governs the coarse-grain allocation of bandwidth to application classes, while the class-specific schedulers control the fine-grain interleaving of requests. The two levels of the architecture separate application-independent mechanisms from application-specific scheduling policies, and thereby facilitate the co-existence of multiple class-specific schedulers. We demonstrate that Cello is suitable for next generation operating systems since: (i) it aligns the service provided with the application requirements, (ii) it protects application classes from one another, (iii) it is work-conserving and can adapt to changes in work-load, (iv) it minimizes the seek time and rotational latency overhead incurred during access, and (v) it is computationally efficient.

5.

Predictability of Fixed-Job Priority schedulers on heterogeneous multiprocessor real-time systems

Liliana Cucu-Grosjean 《Information Processing Letters》2010,110(10):399-402

The multiprocessor Fixed-Job Priority (FJP) scheduling of real-time systems is studied. An important property for the schedulability analysis, the predictability (regardless of the execution times), is studied for heterogeneous multiprocessor platforms. Our main contribution is to show that any FJP schedulers are predictable on unrelated platforms. A convenient consequence is the fact that any FJP schedulers are predictable on uniform multiprocessors.

6.

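A Grid Scheduling Method Based on a Hierarchical Scheduling Strategy and Dynamic Data Replication (Total citations: 2; self-citations: 0; citations by others: 2)

Lai Jinhui, Liang Song 《Application Research of Computers》2014,31(2):412-416

To address how to perform task scheduling and data replication effectively in a grid so as to reduce task execution time, this paper proposes a task scheduling algorithm (ISS) and an optimized dynamic data replication algorithm (ODHRA) and constructs a scheme that combines the two. The ISS algorithm jointly considers the number of tasks in the waiting queue, the location of the data a task requires, and the computing capacity of each site; it schedules hierarchically according to the network structure and computes a combined task cost with appropriate weight coefficients to find the best region of compute nodes. The ODHRA algorithm analyzes data transfer time, storage access latency, the replica requests waiting in the storage queue, and the distance between nodes to select the best replica location among the many replicas; combined with replica placement and replica management, this reduces file access time. Simulation results show that the proposed scheme achieves a better average task execution time than other algorithms.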

7.

Priority scheduling service for E-commerce web servers

Muhammad Younas Irfan Awan Kuo-Ming Chao Jen-Yao Chung 《Information Systems and E-Business Management》2008,6(1):69-82

Service scheduling is one of the crucial issues in E-commerce environment. E-commerce web servers often get overloaded as they have to deal with a large number of customers’ requests, for example browse, search, and pay, in order to make purchases or to get product information from E-commerce web sites. In this paper, we propose a new approach in order to effectively handle high traffic load and to improve web server’s performance. Our solution is to exploit networking techniques and to classify customers’ requests into different classes such that some requests are prioritised over others. We contend that such classification is financially beneficial to E-commerce services as in these services some requests are more valuable than others. For instance, the processing of “browse” request should get less priority than “payment” request as the latter is considered to be more valuable to the service provider. Our approach analyses the arrival process of distinct requests and employs a priority scheduling service at the network nodes that gives preferential treatment to high priority requests. The proposed approach is tested through various experiments which show significant decrease in the response time of high priority requests. This also reduces the probability of dropping high priority requests by a web server and thus enabling service providers to generate more revenue.
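The class-based prioritisation described in this abstract can be pictured with a small priority queue. The sketch below is only an illustration under stated assumptions, not the authors' network-node scheduler: the class `PriorityRequestQueue`, the `PRIORITY` table, and the middle rank given to "search" are hypothetical; the abstract itself only states that "payment" should outrank "browse".

```python
import heapq
import itertools

# Hypothetical priority classes; a lower number is served first.
# Only "payment before browse" comes from the abstract.
PRIORITY = {"payment": 0, "search": 1, "browse": 2}


class PriorityRequestQueue:
    """Serves higher-priority request classes first, FIFO within a class."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority requests stay FIFO.
        self._counter = itertools.count()

    def submit(self, kind, payload):
        entry = (PRIORITY[kind], next(self._counter), kind, payload)
        heapq.heappush(self._heap, entry)

    def next_request(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)
```

A server loop would call `next_request()` whenever a worker becomes free, so payment requests queued behind browse requests are still handled first.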

8.

DCSACA: distributed constraint service-aware collaborative access algorithm based on large-scale access to the Internet of Things

Yi Meng Chen QingKui 《The Journal of supercomputing》2018,74(12):6408-6427

With the rapid development of the smart city and the Internet Plus-themed multi-network applications, it is becoming increasingly difficult for the data access center for the Internet of Things (DACIOT) to meet large-scale users’ service requirements with low latency and high quality while sending service access requests. This paper first converts the problem of a large number of access requests to DACIOT into a distributed constraint optimization problem. Then, in order to address the optimization problem, a dynamic multi-constraint service-aware collaborative access algorithm is proposed based on dynamic load feedback from the access nodes, which can effectively reduce network congestion through load feedback and improve access performance. The algorithm first defines the dynamic context load sensing model, which is able to detect the load metrics of access clusters and assist access servers to work together to improve the availability of DACIOT. It then uses a heuristic falling search algorithm to search for the optimal resource on the basis of this model, after which the convergence of the access algorithm is analyzed. Experimental results show that the algorithm can effectively improve the rate of success, lower the network delay of access requests and reduce network jitter when accessing DACIOT.

9.

Dynamic cluster resource allocations for jobs with known and unknown memory demands

Li Xiao Songqing Chen Xiaodong Zhang 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(3):223-240

The cluster system we consider for load sharing is a compute farm which is a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O active jobs in a batch mode. Existing resource management systems mainly aim at balancing CPU load among server nodes. With the rapid advancement of CPU chips, memory and disk access speed improvements significantly lag behind advancement of CPU speed, increasing the penalty for data movement, such as page faults and I/O operations, relative to normal CPU operations. Aiming at reducing the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies by considering effective usage of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) Memory demands are known in advance or are predictable and 2) memory demands are unknown and dynamically changed during execution. Besides using workload traces with known memory demands, we have also made kernel instrumentation to collect different types of workload execution traces to capture dynamic memory access patterns. Conducting different groups of trace-driven simulations, we show that our proposed policies can effectively improve overall job execution performance by well utilizing both CPU and memory resources with known and unknown memory demands.

10.

A NEW DISTRIBUTED JOB SCHEDULING ALGORITHM FOR GRID SYSTEMS

Javad Akbari Torkestani 《Cybernetics and Systems》2013,44(1):77-93

Job scheduling is one of the key issues in the design of grid environments. The performance of the grid system severely degrades if a method does not exist to efficiently schedule the user jobs. In this article, a fully distributed, learning automata-based job scheduling algorithm is proposed for grid environments. The proposed method is composed of two types of procedures: in the first, a procedure is run at the grid nodes and in the second, the procedure is run at the schedulers. The proposed algorithm synchronizes the performance of the schedulers by the learning automata that select their actions using the pseudo-random number generators with the same seed. In this method, the grid computational capacity that is allocated to each scheduler is proportional to its workload. To show the efficiency of the proposed method, several simulation experiments were conducted under different grid scenarios. The obtained results show that the proposed algorithm outperforms several well-known methods in terms of makespan, flow time, and load balancing.

11.

A performance study of robust load sharing strategies for distributed heterogeneous Web server systems

Colajanni M. Yu P.S. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(2):398-414

Replication of information across multiple servers is becoming a common approach to support popular Web sites. A distributed architecture with some mechanisms to assign client requests to Web servers is more scalable than any centralized or mirrored architecture. In this paper, we consider distributed systems in which the Authoritative Domain Name Server (ADNS) of the Web site takes the request dispatcher role by mapping the URL hostname into the IP address of a visible node, that is, a Web server or a Web cluster interface. This architecture can support local and geographical distribution of the Web servers. However, the ADNS controls only a very small fraction of the requests reaching the Web site because the address mapping is not requested for each client access. Indeed, to reduce Internet traffic, address resolution is cached at various name servers for a time-to-live (TTL) period. This opens an entirely new set of problems that traditional centralized schedulers of parallel/distributed systems do not have to face. The heterogeneity assumption on Web node capacity, which is much more likely in practice, increases the order of complexity of the request assignment problem and severely affects the applicability and performance of the existing load sharing algorithms. We propose new assignment strategies, namely adaptive TTL schemes, which tailor the TTL value for each address mapping instead of using a fixed value for all mapping requests. The adaptive TTL schemes are able to address both the nonuniformity of client requests and the heterogeneous capacity of Web server nodes. Extensive simulations show that the proposed algorithms are very effective in avoiding node overload, even for high levels of heterogeneity and limited ADNS control.

12.

Grid Resource Availability Prediction-Based Scheduling and Task Replication

Brent Rood Michael J. Lewis 《Journal of Grid Computing》2009,7(4):479-500

The frequent and volatile unavailability of volunteer-based Grid computing resources challenges Grid schedulers to make effective job placements. The manner in which host resources become unavailable will have different effects on different jobs, depending on their runtime and their ability to be checkpointed or replicated. A multi-state availability model can help improve scheduling performance by capturing the various ways a resource may be available or unavailable to the Grid. This paper uses a multi-state model and analyzes a machine availability trace in terms of that model. Several prediction techniques then forecast resource transitions into the model’s states. We analyze the accuracy of our predictors, which outperform existing approaches. We also propose and study several classes of schedulers that utilize the predictions, and a method for combining scheduling factors. We characterize the inherent tradeoff between job makespan and the number of evictions due to failure, and demonstrate how our schedulers can navigate this tradeoff under various scenarios. Lastly, we propose job replication techniques, which our schedulers utilize to replicate those jobs that are most likely to fail. Our replication strategies outperform others, as measured by improved makespan and fewer redundant operations. In particular, we define a new metric for replication efficiency, and demonstrate that our multi-state availability predictor can provide information that allows our schedulers to be more efficient than others that blindly replicate all jobs or some static percentage of jobs.

13.

Autonomous network-based integration architecture for multi-agent systems under dynamic and heterogeneous environment

Hujun Li XiaoZhi Li Yi Wan 《Automatic Control and Computer Sciences》2016,50(5):347-360

Multi-agent systems fit nicely into domains that are naturally distributed and require artificial intelligence technology. An autonomous network-based information services integration architecture has been designed to support rapidly changing environments and needs. However, a substantial increase in user requests and their redirection may cause unbalanced loading and partial overloading of the system. This paper proposes an integrated access method that reduces the number of Pull Mobile Agents (Pull-MAs), and thereby the total load of the system, in order to achieve autonomous load distribution. In addition, the information structure of the integrated service area effectively improves the satisfaction ratio of Pull-MAs with joint requests on one node. Simulation tests show that the system can guarantee that service requests and related service requests are uniformly distributed across the nodes of the system, ensuring that the system load is balanced.

14.

A multicriteria approach to two-level hierarchy scheduling in grids

Krzysztof Kurowski Jarek Nabrzyski Ariel Oleksiak Jan Węglarz 《Journal of Scheduling》2008,11(5):371-379

In this paper we address a multicriteria scheduling problem for computational Grid systems. We focus on the two-level hierarchical Grid scheduling problem, in which at the first level (the Grid level) a Grid broker makes scheduling decisions and allocates jobs to Grid nodes. Jobs are then sent to the Grid nodes, where local schedulers generate local schedules for each node accordingly. A general approach is presented taking into account preferences of all the stakeholders of Grid scheduling (end-users, Grid administrators, and local resource providers) and assuming a lack of knowledge about job time characteristics. A single-stakeholder, single-criterion version of the approach has been compared experimentally with the existing approaches.

15.

Advanced Authentication Mechanisms for Identity and Access Management in Cloud Computing

Amjad Alsirhani Mohamed Ezz Ayman Mohamed Mostafa 《Computer Systems Science and Engineering》2022,43(3):967-984

Identity management is based on the creation and management of user identities for granting access to the cloud resources based on the user attributes. The cloud identity and access management (IAM) grants the authorization to the end-users to perform different actions on the specified cloud resources. The authorizations in the IAM are grouped into roles instead of granting them directly to the end-users. Due to the multiplicity of cloud locations where data resides and due to the lack of a centralized user authority for granting or denying cloud user requests, there must be several security strategies and models to overcome these issues. Another major concern in IAM services is the excessive or insufficient access level given to different users with previously granted authorizations. This paper presents a comprehensive review of security services and threats. Based on the presented services and threats, advanced frameworks for IAM are proposed that provide authentication mechanisms in public and private cloud platforms. A threat model has been applied to validate the proposed authentication frameworks with different security threats. The proposed models proved highly efficient in protecting cloud platforms from insider attacks, single sign-on failure, brute force attacks, denial of service, user privacy threats, and data privacy threats.

16.

Balancing throughput and response time in online scientific Clouds via Ant Colony Optimization (SP2013/2013/00006)

《Advances in Engineering Software》2015

The Cloud Computing paradigm focuses on the provisioning of reliable and scalable infrastructures (Clouds) delivering execution and storage services. The paradigm, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. The goal of this work is to study private Clouds to execute scientific experiments coming from multiple users, i.e., our work focuses on the Infrastructure as a Service (IaaS) model where custom Virtual Machines (VM) are launched in appropriate hosts available in a Cloud. Correctly scheduling Cloud hosts is therefore very important, and it is necessary to develop efficient scheduling strategies to appropriately allocate VMs to physical resources. The job scheduling problem is however NP-complete, and therefore many heuristics have been developed. In this work, we describe and evaluate a Cloud scheduler based on Ant Colony Optimization (ACO). The main performance metrics to study are the number of serviced users by the Cloud and the total number of created VMs in online (non-batch) scheduling scenarios. In addition, the number of intra-Cloud network messages sent is evaluated. Simulated experiments performed using CloudSim and job data from real scientific problems show that our scheduler succeeds in balancing the studied metrics compared to schedulers based on Random assignment and Genetic Algorithms.

17.

APEX: adaptive disk scheduling framework with QoS support

Ketil Lund Vera Goebel Thomas Plagemann 《Multimedia Systems》2005,11(1):45-59

APEX is an adaptive disk scheduling framework with Quality-of-Service (QoS) support designed for environments with highly varying disk bandwidth usage. APEX is based on a three-layer scheduling architecture: (1) the upper layer realizes different service classes using a set of queues; (2) the mid-layer distributes available disk bandwidth among these queues; and (3) the lower layer is handled by the disk itself, which does the final ordering of disk requests. We demonstrate the use of APEX in an example scenario, a Learning-on-Demand (LoD) application supported by a multimedia system, where students can search for and playback multimedia-based learning material. In this paper, we present the scheduling concepts of APEX which are based on an extended token bucket algorithm. The disk requests scheduled for service are assembled into batches in order to exploit the intelligence of modern disks. Combined with a specialized work-conservation scheme, this enables APEX to apply bandwidth where it is needed, without the loss of efficiency. We demonstrate, through simulations, that APEX provides both higher throughput and lower response times than other mixed-media disk schedulers while still avoiding deadline violations for real-time requests. We also show its robustness with respect to misaligned bandwidth allocation. The work was conducted while Ketil Lund was an employee at UniK – University Graduate Center, Kjeller, Norway.
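For readers unfamiliar with the token bucket that APEX extends, here is a minimal Python sketch of the plain algorithm only. It does not reproduce the APEX extensions, batching, or work-conservation scheme; the class name `TokenBucket`, its parameters, and the choice to pass the clock in explicitly as `now` are assumptions made so the example is deterministic.

```python
class TokenBucket:
    """Plain token bucket: `rate` tokens per second, at most `capacity` stored."""

    def __init__(self, rate, capacity):
        self.rate = rate          # refill rate in tokens per second
        self.capacity = capacity  # maximum burst size in tokens
        self.tokens = capacity    # assumption: the bucket starts full
        self.last = 0.0           # time of the previous refill

    def try_consume(self, now, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A queue guarded by such a bucket can issue a burst of up to `capacity` requests and is then limited to `rate` requests per second, which is the basic mechanism for dividing bandwidth among service classes.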

18.

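A Low-Latency Cluster Scheduling Framework for Large-Scale Short-Duration Tasks

Zhao Quan, Tang Xiaochun, Zhu Ziyu, Mao Anqi, Li Zhanhuai 《Journal of Computer Applications》2021,41(8):2396-2405

Large-scale data analytics environments often contain tasks that are short in duration but highly parallel. How to schedule these concurrent jobs with low-latency requirements is an active research topic. In existing cluster resource management frameworks, centralized schedulers cannot meet low-latency requirements because the master node is a bottleneck, while some distributed schedulers achieve low-latency task scheduling but fall short on optimal resource allocation and on resource allocation conflicts. Starting from the needs of large-scale real-time jobs, this work designs and implements a distributed cluster resource scheduling framework that meets the low-latency requirements of large-scale data processing. First, a two-stage scheduling framework and an optimized two-stage multi-path scheduling framework are proposed. Then, to address resource conflicts that arise during two-stage multi-path scheduling, a task transfer mechanism based on load balancing is proposed, which resolves load imbalance across compute nodes. Finally, the task scheduling framework for large-scale clusters is simulated and validated using real workloads and a simulated scheduler. On real workloads, the scheduling delay of the proposed framework stays within 12% of ideal scheduling; in the simulated environment, the framework reduces the latency of short-duration tasks by more than 40% compared with a centralized scheduler.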

19.

Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid

Frédéric Desprez Antoine Vernois 《Journal of Grid Computing》2006,4(1):19-31

Managing large datasets has become one major application of Grids. Life science applications usually manage large databases that should be replicated to scale applications. The growing number of users and the simple access to Internet-based applications have stressed Grid middleware. Such environments are thus asked to manage data and schedule computation tasks at the same time. These two important operations have to be tightly coupled. This paper presents an algorithm (Scheduling and Replication Algorithm, SRA) that combines data management and scheduling using a steady-state approach. Using a model of the platform, the number of requests as well as their distribution, the number and size of databases, we define a linear program to satisfy all the constraints at every level of the platform in steady-state. The solution of this linear program will give us a placement for the databases on the servers as well as providing, for each kind of job, the server on which they should be executed. Our theoretical results are validated using simulation and logs from a large life science application. This work was supported in part by the ACI GRID and Grid5000 projects of the French Department of Research.

20.

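A Multi-Workflow Offloading Method for Edge Computing Environments Based on Deep Reinforcement Learning and Probabilistic Performance Awareness

Ma Yuyin, Zheng Wanbo, Ma Yong, Liu Hang, Xia Yunni, Guo Kunyin, Chen Peng, Liu Chengwu 《Computer Science》2021,48(1):40-48

Mobile edge computing is an emerging distributed and ubiquitous computing paradigm that moves compute-intensive and latency-sensitive tasks to nearby edge servers, effectively easing the resource shortage of mobile terminals and significantly reducing the communication overhead between users and processing nodes. However, when many users issue compute-intensive task requests at the same time, especially process-oriented workflow task requests, an edge computing environment often struggles to respond effectively, and task congestion results. In addition, affected by...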