Executing heterogeneous workloads with different priorities, resource demands and performance objectives is one of the key operations for today’s data centers to increase resource as well as energy efficiency. In order to meet the performance objectives of diverse workloads, schedulers rely on evictions, even though these waste resources through the lost execution of evicted tasks. It is not straightforward to design priority schedulers that capture key aspects of workloads and systems and also strike a balance in the tradeoff between resource (in)efficiency and application performance. To explore the large design space of such schedulers, we propose a trace-driven cluster management framework that models a comprehensive set of system configurations and general priority-based scheduling policies. In particular, we focus on the impact of task evictions on resource inefficiency and task response times of multiple priority classes, driven by a Google production cluster trace. Moreover, we propose a system design as a use case exploiting workload heterogeneity and introducing workload-awareness into the system configuration and task assignment.

In recent years, the energy efficiency of computing infrastructures has gained great attention. For this reason, proper estimation and evaluation of the energy required to execute data center workloads has become an important research problem. In this paper we present a Data Center Workload and Resource Management Simulator (DCworms) which enables modeling and simulation of computing infrastructures to estimate their performance, energy consumption, and energy-efficiency metrics for diverse workloads and management policies. We discuss methods of power usage modeling available in the simulator. To this end, we compare results of simulations to measurements of real servers. To demonstrate DCworms capabilities we evaluate the impact of several resource management policies on the overall energy efficiency of specific workloads executed on heterogeneous resources.

In manycore systems, eviction decisions related to caches and memory coherence greatly impact system performance, thereby emphasizing their importance. Extensive research has produced numerous standalone eviction policies such as LRU, LFU, FIFO, etc. all aiming to attain the Bélády optimum solution. Standalone eviction policies optimize for a single attribute (recency, frequency, etc.), limiting their impact on applications exhibiting non-uniform memory access patterns. The Hybrid Voting-based Eviction Policy (HyVE) extends multiple standalone eviction policies with a ranking system and evaluates them using concepts from the voting theory domain. The goal of HyVE is to make better replacement decisions by creating a consensus among its constituent eviction policies. With its inherent voting properties, HyVE takes different replacement decisions compared to its standalone counterparts, making it a unique and new eviction policy. We deploy and evaluate HyVE as part of two case-studies: last-level cache replacement decisions in a generic manycore environment, and sparse directory eviction decisions on a tile-based distributed shared memory (DSM) architecture. We explore different variants of HyVE, and evaluate them using workloads from the PARSEC and SPLASH-2 benchmark suites in a simulation environment. We also compare HyVE to state-of-the-art set-dueling and learning-based eviction policies. For last-level cache replacement decisions, HyVE reduces cache misses by 7.4% compared to the LRU policy, whereas DRRIP and Hawkeye reduce cache misses by 5.5% and 9.2% respectively compared to the LRU policy. Though Hawkeye exhibits better performance on average, HyVE offers a unique advantage for certain workloads by using a voting-based approach to solve the replacement problem. For sparse directory eviction decisions, results show that HyVE reduces coherence traffic and execution time by up to 11% compared to the LRU policy. We have synthesized HyVE on an FPGA prototype. 
Hardware analysis results show that HyVE’s constituent policies contribute the most to its overheads, while HyVE’s ranking and voting extensions do not add significant overheads. Timing analysis results show that HyVE’s logic delay is comparable to that of standalone eviction policies. Lastly, we evaluate HyVE on the FPGA prototype using characteristic micro-benchmarks that further emphasize HyVE’s ability to remain agnostic to varying data access patterns.
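The idea of reaching a consensus among standalone eviction policies can be illustrated with a small sketch. The abstract does not say which voting rule HyVE uses, so the Borda count below is an assumption chosen for illustration, and the `Line` record and `borda_victim` function are hypothetical names rather than part of HyVE.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical cache-line record holding the metadata each policy needs."""
    tag: str
    last_used: int  # time of most recent access (used by LRU)
    hits: int       # access count (used by LFU)
    inserted: int   # time of insertion (used by FIFO)


def rank(lines, key):
    """Return tags ordered from most preferred eviction candidate to least."""
    return [line.tag for line in sorted(lines, key=key)]


def borda_victim(lines):
    """Choose a victim by Borda count over three standalone rankings.

    Assumption: Borda count stands in for whatever voting rule HyVE applies.
    """
    rankings = [
        rank(lines, lambda l: l.last_used),  # LRU: least recently used first
        rank(lines, lambda l: l.hits),       # LFU: least frequently used first
        rank(lines, lambda l: l.inserted),   # FIFO: oldest insertion first
    ]
    n = len(lines)
    scores = {line.tag: 0 for line in lines}
    for ranking in rankings:
        for pos, tag in enumerate(ranking):
            scores[tag] += n - 1 - pos  # top choice earns n-1 points
    # Highest total wins; iterating sorted tags makes ties deterministic.
    return max(sorted(scores), key=lambda t: scores[t])


lines = [
    Line("A", last_used=1, hits=9, inserted=1),
    Line("B", last_used=5, hits=1, inserted=2),
    Line("C", last_used=3, hits=2, inserted=0),
]
print(borda_victim(lines))  # C
```

In this toy set, LRU alone would evict A and LFU alone would evict B, while the vote settles on C, which every policy ranks first or second.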

The paper presents a performance case study of parallel jobs executing in real multi-user workloads. The study is based on a measurement-based model capable of predicting the completion time distribution of the jobs executing under real workloads. The model constructed is also capable of predicting the effects of system design changes on application performance. The model is a finite-state, discrete-time Markov model with rewards and costs associated with each state. The Markov states are defined from real measurements and represent system/workload states in which the machine has operated. The paper places special emphasis on choosing the correct number of states to represent the workload measured. Specifically, the performance of computationally bound, parallel applications executing in real workloads on an Alliant FX/80 is evaluated. The constructed model is used to evaluate scheduling policies, the performance effects of multiprogramming overhead, and the scalability of the Alliant FX/80 in real workloads. The model identifies a number of available scheduling policies which would improve the response time of parallel jobs. In addition, the model predicts that doubling the number of processors in the current configuration would only improve response time for a typical parallel application by 25%. The model recommends a different processor configuration to more fully utilize extra processors. The paper also presents empirical results which validate the model created.

Digital archives protect important data collections from failures by making multiple copies at other archives, so that there are always several good copies of a collection. In a cooperative replication network, sites "trade" space, so that each site contributes storage resources to the system and uses storage resources at other sites. Here, we examine bid trading: a mechanism where sites conduct auctions to determine who to trade with. A local site wishing to make a copy of a collection announces how much remote space is needed, and accepts bids for how much of its own space the local site must "pay" to acquire that remote space. We define a spectrum of trading scenarios, ranging from a network of archives and digital libraries that trust each other, to a scenario where sites do as they please, including breaking the rules. Then, we focus on developing techniques for the scenarios where sites trust each other, although we discuss issues that may arise if sites are greedy or malicious. We examine the best policies for determining when to call auctions and how much to bid, as well as the effects of "maverick" sites that behave differently than other sites. Simulations of auction and trading sessions indicate that bid trading can allow sites to achieve higher reliability than the alternative: a system where sites trade equal amounts of space without bidding.

Deadline assignment in a distributed soft real-time system   Total citations: 3, self-citations: 0, citations by others: 3
In a distributed environment, tasks often have processing demands at multiple different sites. A distributed task is usually divided into several subtasks, each to be executed in order at some site. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. In this paper, we present and analyze techniques for automatically translating the overall deadline into deadlines for the individual subtasks.
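The abstract does not name the specific techniques it analyzes. As one possible illustration of translating an overall deadline into subtask deadlines, the sketch below splits the available time window among sequential subtasks in proportion to their estimated execution times. The function name `assign_subtask_deadlines` and the proportional rule are assumptions made for this example, not the paper's method.

```python
def assign_subtask_deadlines(start, overall_deadline, exec_times):
    """Give each sequential subtask a deadline proportional to its share of work.

    start            -- time at which the distributed task is released
    overall_deadline -- designer-specified deadline for the whole task
    exec_times       -- estimated execution time of each subtask, in order
    Assumption: a proportional split; other rules are equally plausible.
    """
    total = sum(exec_times)
    if total <= 0:
        raise ValueError("execution times must sum to a positive value")
    window = overall_deadline - start
    deadlines = []
    elapsed = 0.0
    for t in exec_times:
        elapsed += t
        # Each subtask must finish once its cumulative share of the window is used.
        deadlines.append(start + window * elapsed / total)
    return deadlines


print(assign_subtask_deadlines(0, 100, [10, 30, 10]))  # [20.0, 80.0, 100.0]
```

The last subtask always inherits the overall deadline, and any slack in the window is distributed in proportion to each subtask's length.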

The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made online. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently outperform (up to 4x) query schedulers in current database systems.

Job scheduling on production supercomputers is complicated by diverse demands of system administrators and amorphous characteristics of workloads. Specifically, various scheduling goals such as queuing efficiency and system utilization are usually conflicting and thus need to be balanced. Also, changing workload characteristics often impact the effectiveness of the deployed scheduling policies. Thus it is challenging to design a versatile scheduling policy that is effective in all circumstances. In this paper, we propose a novel job scheduling strategy to balance diverse scheduling goals and mitigate the impact of workload characteristics. First, we introduce metric-aware scheduling, which enables the scheduler to balance competing scheduling goals represented by different metrics such as job waiting time, fairness, and system utilization. Second, we design a scheme to dynamically adjust scheduling policies based on feedback information of monitored metrics at runtime. We evaluate our design using real workloads from supercomputer centers. The results demonstrate that our scheduling mechanism can significantly improve system performance in a balanced, sustainable fashion.

Online auction sites have very specific workloads and user behavior characteristics. Previous studies on workload characterization conducted by the authors showed that (1) bidding activity on auctions increases considerably after 90% of an auction’s lifetime has elapsed, (2) a very large percentage of auctions have a relatively low number of bids and bidders and a very small percentage of auctions have a high number of bids and bidders, (3) prices rise very fast after an auction has lasted more than 90% of its lifetime. Thus, if bidders are not able to successfully bid at the very last moments of an auction because of site overload, the final price may not be as high as it could be and sellers, and consequently the auction site, may lose revenue. In this paper, we propose server-side caching strategies in which cache placement and replacement policies are based on auction-related parameters such as number of bids placed or percent remaining time till closing time. A main-memory auction cache at the application server can be used to reduce accesses to the back-end database server. Trace-based simulations were used to evaluate these caching strategies in terms of cache hit ratio and cache efficiency. The performance characteristics of the best policies were then evaluated through experiments conducted on a benchmark online auction system.

In this paper, we claim that memory migration mechanisms are a useful approach to improve the execution of parallel applications in dynamic execution environments, but that their performance depends on related system components such as the processor scheduling. To show that, we evaluate the automatic memory migration mechanism provided by IRIX in Origin systems, under different dynamic processor allocation policies when executing OpenMP parallel multiprogrammed workloads. We have focused the evaluation on the effects of the page migration mechanism on the CPU time consumed by each application, the processor allocation received, and the speedup. Results demonstrate that, if the processor scheduler is memory conscious, that is, it keeps the system as stable as possible, the automatic memory page migration mechanism provided by IRIX improves the CPU time consumed by OpenMP applications.

Resizable caches can trade off capacity for access speed to dynamically match the needs of the workload. In single-threaded cores, resizable caches have demonstrated their ability to improve processor performance by adapting to the phases of the running application. In Simultaneous Multi-Threaded (SMT) cores, the caching needs can vary greatly across the number of threads and their characteristics, thus offering even more opportunities to dynamically adjust cache resources to the workload. In this paper, we demonstrate that the preferred control methodology for data cache reconfiguring in an SMT core changes as the number of running threads increases. In workloads with one or two threads, the resizable cache control algorithm should optimize for cache miss behavior because misses typically form the critical path. In contrast, with several independent threads running, we show that optimizing for cache hit behavior has more impact, since large SMT workloads have other threads to run during a cache miss. Moreover, we demonstrate that these seemingly diametrically opposed policies are closely related mathematically; the former minimizes the arithmetic mean cache access time (which we will call AMAT), while the latter minimizes its harmonic mean. We introduce an algorithm (HAMAT) that smoothly and naturally adjusts between the two strategies with the degree of multi-threading. We extend a previously proposed Globally Asynchronous, Locally Synchronous (GALS) processor core with SMT support and dynamically resizable caches. We show that the HAMAT algorithm significantly outperforms the AMAT algorithm on four-thread workloads while matching its performance on one- and two-thread workloads. Moreover, HAMAT achieves overall performance improvements of 18.7%, 10.1%, and 14.2% on one-, two-, and four-thread workloads, respectively, over the best fixed-configuration cache design.
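To see why the arithmetic and harmonic means of cache access time can favor different cache configurations, consider the following sketch. It is not the HAMAT algorithm, which the abstract does not specify. The two configurations, their latencies and miss rates, and the function names are all made-up assumptions for illustration.

```python
def arithmetic_mean_access_time(hit_time, miss_time, miss_rate):
    """Weighted arithmetic mean of access time; long misses dominate it."""
    return (1 - miss_rate) * hit_time + miss_rate * miss_time


def harmonic_mean_access_time(hit_time, miss_time, miss_rate):
    """Weighted harmonic mean of access time; the frequent fast hits dominate it."""
    return 1.0 / ((1 - miss_rate) / hit_time + miss_rate / miss_time)


# Hypothetical configurations: (hit time in cycles, miss time in cycles, miss rate).
configs = {
    "small_fast": (1.0, 100.0, 0.10),
    "large_slow": (2.0, 100.0, 0.05),
}

best_arithmetic = min(configs, key=lambda c: arithmetic_mean_access_time(*configs[c]))
best_harmonic = min(configs, key=lambda c: harmonic_mean_access_time(*configs[c]))

print(best_arithmetic)  # large_slow
print(best_harmonic)    # small_fast
```

With these numbers the arithmetic mean rewards the lower miss rate of the larger cache, whereas the harmonic mean rewards the faster hits of the smaller one, matching the miss-oriented versus hit-oriented contrast described above.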

The consolidation of multiple workloads and servers enables the efficient use of server and power resources in shared resource pools. We employ a trace-based workload placement controller that uses historical information to periodically and proactively reassign workloads to servers subject to their quality of service objectives. A reactive migration controller is introduced that detects server overload and underload conditions. It initiates the migration of workloads when the demand for resources exceeds supply. Furthermore, it dynamically adds and removes servers to maintain a balance of supply and demand for capacity while minimizing power usage. A host load simulation environment is used to evaluate several different management policies for the controllers in a time-effective manner. A case study involving three months of data for 138 SAP applications compares three integrated controller approaches with the use of each controller separately. The study considers trade-offs between: (i) required capacity and power usage, (ii) resource access quality of service for CPU and memory resources, and (iii) the number of migrations. Our study sheds light on the question of whether a reactive controller or proactive workload placement controller alone is adequate for resource pool management. The results show that the most tightly integrated controller approach offers the best results in terms of capacity and quality but requires more migrations per hour than the other strategies.

Computer systems are now powerful enough to run multiple virtual machines (VMs), each one running a separate operating system (OS) instance. In such an environment, direct and centralized energy management employed by a single OS is infeasible. Accurately predicting idle intervals is one of the major approaches to saving energy in disk drives. However, for intensive workloads, it is difficult to find long idle intervals. Even if long idle intervals exist, it is very difficult for a predictor to catch the idle spikes in the workloads. This paper proposes to divide the workloads into buckets of equal time length, and to predict the number of forthcoming requests in each bucket instead of the length of the idle periods. By doing so, the bucket method makes the converted workload more predictable. The method also squeezes the execution time of each request to the end of its respective bucket, thus extending the idle length. By deliberately reshaping the workloads such that the crests and troughs of each workload become aligned, we can aggregate the peaks and the idle periods of the workloads. Due to the extended idle length caused by this aggregation, energy can be conserved. Furthermore, as a result of aligning the peaks, resource utilization is improved when the system is active. A trace-driven simulator is designed to evaluate the idea. Three traces are employed to represent the workloads issued by three web servers residing in three VMs. The experimental results show that our method can save significant amounts of energy by sacrificing a small amount of quality of service.
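The bucket method described above can be sketched in a few lines: count the requests that fall into each equal-length bucket, then defer every request to the end of its bucket. This is a simplified illustration that ignores service times. The function names and the sample arrival times are assumptions, not taken from the paper.

```python
def bucket_counts(arrivals, bucket_len):
    """Count requests per bucket; these counts are what a predictor would forecast."""
    counts = {}
    for t in arrivals:
        idx = int(t // bucket_len)
        counts[idx] = counts.get(idx, 0) + 1
    return counts


def defer_to_bucket_end(arrivals, bucket_len):
    """Move each request to the end of the bucket in which it arrived."""
    return [(int(t // bucket_len) + 1) * bucket_len for t in arrivals]


def longest_gap(times):
    """Longest interval between consecutive requests, i.e. the longest idle period."""
    ordered = sorted(times)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


arrivals = [0.5, 1.2, 1.9, 7.3]  # hypothetical request arrival times in seconds
deferred = defer_to_bucket_end(arrivals, bucket_len=2.0)

print(bucket_counts(arrivals, 2.0))  # {0: 3, 3: 1}
print(deferred)                      # [2.0, 2.0, 2.0, 8.0]
```

Deferring the requests clusters them at bucket boundaries, so the longest idle period in this example grows from about 5.4 seconds to 6.0 seconds, at the cost of added latency for each deferred request.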

Data grids are middleware systems that offer secure shared storage of massive scientific datasets over wide area networks. The main challenge in their design is to provide reliable storage, search, and transfer of numerous or large files over geographically dispersed heterogeneous platforms. The Storage Resource Broker (SRB) is an example of a system that provides these services and that has been deployed in multiple high-performance scientific projects during the past few years. In this paper, we take a detailed look at several of its functional features and examine its scalability using synthetic and trace-based workloads. Unlike traditional file systems, SRB uses a commodity database to manage both system- and user-defined metadata. We quantitatively evaluate this decision and draw insightful conclusions about its implications to the system architecture and performance characteristics. We find that the bulk transfer facilities of SRB demonstrate good scalability properties, and we identify the bottleneck resources across different data search and transfer tasks. We examine the sensitivity to several configuration parameters and provide details about how different internal operations contribute to the overall performance.

This work examines scheduling for a real-time multiprocessor (MAFT) in which both hard deadlines and fault-tolerance are necessary system components. A workload for this system consists of a set of concurrent dependent tasks, each with some execution frequency; tasks are also fully ordered by priority. Fault tolerance mechanisms include hardware-supported voting on computation results as well as on task starts, task completions, and branch conditions. The distributed agreement mechanism used on system-level decisions adds a variable threading delay to the run time of each copy of a task. These delays make current schedule verification techniques inapplicable. In the most general execution profile, each processor in the system runs a subset of the tasks, with different tasks possibly having different frequencies. In this work, however, we restrict attention to a special class of workloads, termed uni-schedule, in which each processor executes the entire task set, using the multiple processors to implement full redundancy. In addition, all tasks are assumed to have the same periodicity. Given these restrictions, we produce stable schedules consistent with the initial workload specifications. Algorithms are first given for uni-schedule workloads with no run-time branches, and then for uni-schedule workloads with branches.

A distributed multiserver Web site can provide the scalability necessary to keep up with growing client demand at popular sites. Load balancing of these distributed Web-server systems, consisting of multiple, homogeneous Web servers for document retrieval and a Domain Name Server (DNS) for address resolution, opens interesting new problems. In this paper, we investigate the effects of using a more active DNS which, as an atypical centralized scheduler, applies some scheduling strategy in routing the requests to the most suitable Web server. Unlike traditional parallel/distributed systems in which a centralized scheduler has full control of the system, the DNS controls only a very small fraction of the requests reaching the multiserver Web site. This peculiarity, especially in the presence of highly skewed load, makes it very difficult to achieve acceptable load balancing and avoid overloading some Web servers. This paper adapts traditional scheduling algorithms to the DNS, proposes new policies, and examines their impact under different scenarios. Extensive simulation results show the advantage of strategies that make scheduling decisions on the basis of the domain that originates the client requests and limited server state information (e.g., whether a server is overloaded or not). An initially unexpected result is that using detailed server information, especially based on history, does not seem useful in predicting the future load and can often lead to degraded performance.

As semiconductor manufacturing technology continues to improve, it is possible to integrate more and more transistors onto a single processor. Many-core processor design has resulted in part from the search to utilize this enormous transistor real estate. The Single-Chip Cloud Computer (SCC) is an experimental many-core processor created by Intel Labs. In this paper we present a study in which we analyze this innovative many-core system by running several workloads with distinctive parallelism characteristics. We investigate the effect on system performance by monitoring specific hardware performance counters. Then, we experiment with varying hardware configuration parameters such as number of cores, clock frequency and voltage levels. We execute the chosen workloads and collect the timing, power consumption and energy consumption information on such a many-core research platform. Thus, we can comprehensively analyze the behavior and scalability of the Intel SCC system with the introduced workload in terms of performance and energy consumption. Our results show that the profiled parallel workload execution has a communication bottleneck on the Intel SCC system. Moreover, our results indicate that we should carefully choose the number of cores to execute different workloads in order to yield a balance between execution performance and energy efficiency for different applications.

This paper addresses an evaluation of new heuristic solution procedures for the location of cross-docks and distribution centers in supply chain network design. The model is characterized by multiple product families, a central manufacturing plant site, multiple cross-docking and distribution center sites, and retail outlets which demand multiple units of several commodities. This paper describes two heuristics that generate globally feasible, near-optimal distribution system design and utilization strategies utilizing the simulated annealing (SA) methodology. This study makes two important contributions. First, we continue the study of location planning for the cross-dock and distribution center supply chain network design problem. Second, we systematically evaluate the computational performance of this network design location model under more sophisticated heuristic control parameter settings to better understand interaction effects among the various factors comprising our experimental design, and present convergence results. The central idea of the paper is to evaluate the impact of a geometric control mechanism vis-a-vis more sophisticated ones on solution time, quality, and convergence for two new heuristics. Our results suggest that integrating traditional simulated annealing with tabu search is recommended for this supply chain network design and location problem.
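For readers unfamiliar with simulated annealing, the sketch below shows a generic annealing loop driven by a geometric temperature schedule, where the temperature is multiplied by a constant factor at each step. Reading the "geometric control mechanism" as this kind of cooling schedule is an assumption. The toy one-dimensional objective, the parameter values, and the function names are illustrative only and are unrelated to the paper's supply chain model.

```python
import math
import random


def geometric_schedule(t0, alpha, steps):
    """Temperatures t0, t0*alpha, t0*alpha^2, ... for a cooling factor 0 < alpha < 1."""
    return [t0 * alpha ** k for k in range(steps)]


def anneal(cost, neighbor, x0, t0=10.0, alpha=0.9, steps=200, seed=0):
    """Generic simulated annealing with geometric cooling (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    x, best = x0, x0
    for temp in geometric_schedule(t0, alpha, steps):
        cand = neighbor(x, rng)
        delta = cost(cand) - cost(x)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = cand
        if cost(x) < cost(best):
            best = x
    return best


def toy_cost(x):
    """Hypothetical objective with its minimum at x = 3."""
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    """Move one step left or right at random."""
    return x + rng.choice([-1, 1])


result = anneal(toy_cost, toy_neighbor, x0=20)
print(geometric_schedule(10.0, 0.5, 3))  # [10.0, 5.0, 2.5]
```

A slower cooling factor (alpha closer to 1) spends more iterations accepting uphill moves, which generally trades solution time for solution quality.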

To balance multiple scheduling performance requirements on parallel computer systems, traditional job schedulers use many parameters that can be configured to define job or queue priorities. Offering many parameters seems flexible, but in reality tuning the values for the parameters is highly challenging. To simplify the task of resource management, we propose goal-oriented policies, which allow system administrators to specify high-level performance objectives, rather than tuning low-level scheduling parameters. We study the design of goal-oriented policies, including (1) appropriate multi-objective models for specifying trade-offs between objectives, (2) efficient search algorithms for searching the best schedule at each scheduling decision point, and (3) appropriate performance measures to be optimized in the objectives with respect to two common performance requirements: preventing starvation and favoring shorter jobs. We compare goal-oriented policies with widely used backfill policies. Policies are evaluated by simulation using ten monthly workloads that ran on a Linux cluster (IA-64) from NCSA. Our results show that by automatically optimizing performance according to the given objectives through search, goal-oriented policies can simultaneously outperform FCFS-backfill and LXF-backfill, which are designed in favor of the maximum wait and average slowdown, respectively.

Wei Xing, Hu Huiqi, Duan Huichao, Qian Weining, Zhou Aoying. World Wide Web, 2019, 22(6): 2561-2587

To support large-scale analytics for Web applications, the backend distributed data management system must provide a service for accessing massive data. Thus, the scan operation becomes a critical step. To improve the performance of the scan operation, modern data management systems usually rely on simple partitioned parallelism. Under partitioned parallelism, tables consist of several partitions, and each scan operation can access multiple partitions separately. It is a simple and effective solution for a single scan operation. In this paper, we consider managing multiple scan operations together, where the situation is no longer straightforward. To address the problem, we propose a parallel strategy to schedule batched scan operations together beyond simple partitioned parallelism. For the sake of performance, first, we utilize replication to increase the parallelism and propose an effective load balancing strategy over replication nodes based on linear programming. Second, we propose an effective chunk-based scheduling algorithm for multi-threading parallelism on each node to guarantee that all threads have even workloads under a qualified cost model. Finally, we integrate our parallel scan strategy into an open-source distributed data management system. Experimental evaluation shows that our parallel scan strategy significantly improves the performance of the scan operation.

