Similar Documents
1.
Modern datacenter servers hosting popular Internet services face significant and multi-faceted challenges in performance and power control. User-perceived performance is the result of a complex interaction of complex workloads in a very complex underlying system. Highly dynamic and bursty workloads of Internet services fluctuate over multiple time scales, which has a significant impact on the processing and power demands of datacenter servers. High-density servers apply virtualization technology for capacity planning and system manageability. Such virtualized computer systems are increasingly large and complex. This paper surveys representative approaches to autonomic performance and power control on virtualized servers, which control the quality of service provided by virtualized resources, improve the energy efficiency of the underlying system, and relieve human operators of the burden of complex system management. It then presents three self-adaptive resource management techniques, based on machine learning and control theory, for percentile-based response time assurance, non-intrusive energy-efficient performance isolation, and joint performance and power guarantees on virtualized servers. The techniques were implemented and evaluated in a testbed of virtualized servers hosting benchmark applications. Finally, two research trends are identified and discussed for sustainable cloud computing in green datacenters.
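
To illustrate the control-theoretic flavor of such techniques, the following is a minimal sketch of a feedback controller that adjusts a VM's CPU cap to keep a percentile response time near a target. The names (`cpu_cap`, `target_ms`) and the simple integral-control law are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

class PercentileRTController:
    """Integral feedback controller: adjusts a VM's CPU cap so that the
    95th-percentile response time tracks a target (hypothetical design)."""

    def __init__(self, target_ms, gain=0.05, cap_min=0.1, cap_max=1.0):
        self.target_ms = target_ms    # desired 95th-percentile response time
        self.gain = gain              # integral gain (tuning parameter)
        self.cap = 0.5                # current CPU cap, fraction of a core
        self.cap_min, self.cap_max = cap_min, cap_max

    def update(self, response_times_ms):
        """Called once per control interval with the interval's samples."""
        p95 = np.percentile(response_times_ms, 95)
        error = (p95 - self.target_ms) / self.target_ms  # normalized error
        # If responses are too slow, grant more CPU; if too fast, reclaim it.
        self.cap = float(np.clip(self.cap + self.gain * error,
                                 self.cap_min, self.cap_max))
        return self.cap

# One control interval: feed in observed samples, then actuate the new cap.
ctl = PercentileRTController(target_ms=200.0)
new_cap = ctl.update([120.0, 180.0, 250.0, 400.0, 150.0])
print(f"next CPU cap: {new_cap:.2f}")
```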

2.
Resource planning is becoming an increasingly important and timely problem for cloud users. As more Web services are moved to the cloud, minimizing network usage is often a key driver of cost control. Most existing approaches focus on resources such as CPU, memory, and disk I/O. In particular, CPU receives the most attention from researchers, while bandwidth is somewhat neglected. It is challenging to predict the network throughput of modern Web services, owing to diverse and complex responses, evolving Web services, and complex network transport. In this paper, we propose a what-if analysis methodology, named Log2Sim, to plan the bandwidth resource of Web services. Log2Sim uses a lightweight workload model to describe user behavior, an automated mining approach to obtain characteristics of workloads and responses from massive Web logs, and traffic-aware simulations to predict the impact on bandwidth consumption and response time in changing contexts. We use a real-life Web system and a classic benchmark to evaluate Log2Sim in multiple scenarios. The evaluation results show that Log2Sim predicts bandwidth consumption well: the average relative error is 2% for the benchmark and 8% for the real-life system. As for response time, Log2Sim cannot produce accurate predictions for every single service request, but the simulation results always show similar trends in average response time as workloads increase in different changing contexts. It can provide sufficient information for the system administrator in proactive bandwidth planning.
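
The following toy sketch illustrates the general what-if idea: mine per-request-type response sizes from Web logs, then scale the arrival rate to estimate bandwidth in a changed context. The log format, field names, and the linear traffic model are assumptions for illustration, not Log2Sim's actual pipeline.

```python
from collections import defaultdict

def mine_response_profile(log_lines):
    """Mine the mean response size per request type from access-log lines
    of the form '<timestamp> <request_type> <response_bytes>' (assumed)."""
    totals, counts = defaultdict(int), defaultdict(int)
    for line in log_lines:
        _, req_type, size = line.split()
        totals[req_type] += int(size)
        counts[req_type] += 1
    return {t: totals[t] / counts[t] for t in totals}

def predict_bandwidth(profile, base_rates, scale):
    """What-if estimate: scale per-type request rates (req/s) and sum the
    implied traffic, in bytes per second."""
    return sum(base_rates[t] * scale * mean_size
               for t, mean_size in profile.items())

logs = ["1000 search 20480", "1001 search 18432", "1002 browse 4096"]
profile = mine_response_profile(logs)
# Predicted consumption if the workload doubles (scale = 2.0).
bw = predict_bandwidth(profile, {"search": 50.0, "browse": 200.0}, scale=2.0)
print(f"predicted bandwidth: {bw / 1e6:.2f} MB/s")
```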

3.
Previous research has shown that the Distributed Coordination Function (DCF) access mode of IEEE 802.11 performs poorly in heavy-contention environments. Based on an in-depth analysis of IEEE 802.11 DCF, NSAD (New Self-adapt DCF-based protocol) was proposed to improve system saturation throughput under heavy contention. The initial contention window tuning algorithm of NSAD is proved effective in an error-free environment. However, problems concerning the exchange of the initial contention window arise in error-prone environments. Based on an analysis of NSAD's performance in error-prone environments, RSAD is proposed to further enhance performance. Simulations in a more realistic shadowing error-prone environment compare the performance of NSAD and RSAD, and the results show that RSAD achieves the expected further performance improvement over NSAD in the error-prone environment (i.e., better goodput and fairness index).
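
For context, the sketch below shows DCF-style binary exponential backoff with a tunable initial contention window, the parameter this family of protocols adapts. The adaptation rule shown (scaling the initial window with the estimated number of contenders) is an illustrative stand-in, not NSAD's or RSAD's actual algorithm.

```python
import random

def backoff_slots(cw_min, cw_max, retry):
    """Standard DCF-style backoff: the contention window doubles on each
    retry, capped at cw_max; the slot count is drawn uniformly."""
    cw = min((cw_min + 1) * (2 ** retry) - 1, cw_max)
    return random.randint(0, cw)

def tuned_cw_min(n_contenders, base_cw=31):
    """Illustrative self-adaptation: widen the initial contention window as
    the estimated number of contenders grows, to cut collisions under heavy
    load (a stand-in rule for the protocol's tuning algorithm)."""
    return max(base_cw, 8 * n_contenders - 1)

# Under heavy contention (40 stations), a wider initial window is used.
cw0 = tuned_cw_min(n_contenders=40)
print(f"initial CW: {cw0}, first backoff: {backoff_slots(cw0, 1023, 0)} slots")
```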

4.
Improving network interface performance is driven by the demands of applications with high communication requirements (for example, some multimedia, real-time, and high-performance computing applications) and by the availability of network links providing multiple gigabits per second of bandwidth, which can require many processor cycles for communication tasks. Multicore architectures, the current trend in microprocessor development to cope with the difficulty of further increasing clock frequencies and microarchitecture efficiency, provide new opportunities to exploit the parallelism available in the nodes for designing efficient communication architectures. Nevertheless, although present OS network stacks include multiple threads that make it possible to execute network tasks concurrently in the kernel, implementations of packet-based or connection-based parallelism are not trivial, as they have to take into account issues related to the cost of synchronization in access to shared resources and the efficient use of caches. Therefore, a common trend in much recent research on this topic is to assign network interrupts and the corresponding protocol and network application processing to the same core, as with this affinity scheduling it is possible to reduce contention for shared resources and cache misses. In this paper we propose and analyze several configurations to distribute the network interface among the different cores available in the server. These alternatives have been devised according to the affinity of the corresponding communication tasks with the location (proximity to the memories where the different data structures are stored) and characteristics of the processing core. As this approach uses several cores to accelerate the communication path of a given connection, it can be seen as complementary to approaches that use several cores to simultaneously process packets belonging to either the same or different connections. Message Passing Interface (MPI) workloads and dynamic web servers have been considered as applications to evaluate and compare the communication performance of these alternatives. In our experiments, performed by full-system simulation, improvements of up to 35% in throughput and up to 23% in latency have been observed for MPI workloads, and up to 100% in throughput, up to 500% in response time, and up to 82% in requests attended per second have been measured for dynamic web servers.
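
Affinity scheduling of this kind can be expressed at user level on Linux; the sketch below pins a protocol-processing thread to the same core as the application thread handling a connection, so they share that core's cache. The core numbering and pairing policy are illustrative assumptions; the paper's configurations were evaluated by full-system simulation, not with this API.

```python
import os
import threading

def pin_current_thread(core_id):
    """Restrict the caller to one core (on Linux, pid 0 applies to the
    calling thread)."""
    os.sched_setaffinity(0, {core_id})

def protocol_worker(core_id, conn_id):
    pin_current_thread(core_id)
    # ... protocol processing for conn_id now runs on the pinned core,
    # co-located with the application thread that consumes its data ...
    print(f"conn {conn_id}: protocol work on core {os.sched_getcpu()}")

# Pair each connection's protocol processing with the core of its
# application thread (here, a fixed illustrative mapping).
for conn_id, core in [(1, 0), (2, 1)]:
    threading.Thread(target=protocol_worker, args=(core, conn_id)).start()
```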

5.
Two approximate computation methods are proposed to obtain the optimal transmission probability, and further to find the optimal contention window setting, by analyzing the influence of the priority-based service differentiation of IEEE 802.11 Enhanced Distributed Channel Access (EDCA) on throughput. The optimal contention window setting yields the maximum aggregate throughput while maintaining weighted proportional bandwidth differentiation among traffic classes. Validation by numerical evaluation and simulation shows that the results are very close to the theoretical values, and that the optimal contention window setting can effectively optimize the throughput performance of the whole network.

6.
In each of the information processing apparatuses connected to each other via a network, a quality of service (QOS) table is arranged, to which the apparatus's functions and performance are registered. When an information processing apparatus is additionally linked to the network, its QOS table is automatically registered to a local directory of the network, and an agent converts the contents of the QOS table into service information supplied via a user interface to the user. Through this operation, information about the functions and performance of each information processing apparatus connected to the network is converted into service information for the user. Consequently, the user can receive necessary services much more directly.

7.
The system capacity of future mobile communication needs to be increased to fulfill the emerging requirements of mobile services and innumerable applications. The cellular topology has long been regarded as the most promising way to provide the required increase in capacity. However, with the increasing densification of cell deployments, the traditional cellular structure limits resource efficiency, and coordination between different types of base stations is more complicated and entails heavy cost. Consequently, this study proposes a frameless network architecture (FNA) that releases the cell boundaries, enabling the topology needed to implement the FNA resource allocation strategy. This strategy is based on resource pooling that incorporates a new resource dimension: the antenna/antenna array. Within this architecture, an adaptive resource allocation method based on a genetic algorithm is proposed to find the optimal solution to the multi-dimensional resource allocation problem. Maximum-throughput and proportional-fair resource allocation criteria are considered. The simulation results show that the proposed architecture and resource allocation method achieve performance gains for both criteria with relatively low complexity compared to existing schemes.
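
A genetic algorithm for this kind of multi-dimensional allocation can be sketched as follows. The chromosome encoding (one serving antenna index per user), the fitness function (sum of per-user rates, i.e., the maximum-throughput criterion), and the GA parameters are illustrative assumptions, not the paper's exact formulation.

```python
import random

N_USERS, N_ANTENNAS = 8, 4
# rate[u][a]: achievable rate if user u is served by antenna a (toy numbers).
rate = [[random.uniform(1.0, 10.0) for _ in range(N_ANTENNAS)]
        for _ in range(N_USERS)]

def fitness(chrom):
    """Maximum-throughput criterion: total rate of the assignment."""
    return sum(rate[u][a] for u, a in enumerate(chrom))

def evolve(pop_size=30, generations=100, mut_rate=0.1):
    pop = [[random.randrange(N_ANTENNAS) for _ in range(N_USERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, N_USERS)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(N_USERS):             # per-gene mutation
                if random.random() < mut_rate:
                    child[i] = random.randrange(N_ANTENNAS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print("best assignment:", best, "throughput:", round(fitness(best), 2))
```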

8.
Software-as-a-Service (SaaS) is a new software delivery model with a Multi-Tenancy Architecture (MTA). An SaaS system is often mission critical, as it typically supports a large number of tenants, and each tenant supports a large number of users. This paper proposes a scalable index management algorithm based on a B+ tree, but with automated redundancy and recovery management, as the tree maintains two copies of the data. Redundancy and recovery management is done at the SaaS level, where data are duplicated with tenant information, rather than at the PaaS level, where data are duplicated in chunks. Using this approach, an SaaS system can scale out or in based on the dynamic workload. The paper also uses tenant similarity measures to cluster tenants in a multi-level scalability architecture, where similar tenants can be grouped together for efficient processing. The scalability mechanism also includes automated migration strategies to enhance SaaS performance. The proposed scheme with automated recovery and scalability has been simulated; the results show that the proposed algorithm scales well with increasing workloads.
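
One way to realize the tenant-similarity idea is sketched below: represent each tenant by a feature vector (here, a toy workload mix), score pairs by cosine similarity, and greedily group tenants above a threshold. The features, the metric, and the greedy threshold clustering are assumptions for illustration, not the paper's actual measures.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_tenants(features, threshold=0.9):
    """Greedy clustering: a tenant joins the first cluster whose
    representative it resembles closely enough, else starts a new one."""
    clusters = []  # list of (representative_vector, [tenant_ids])
    for tid, vec in features.items():
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(tid)
                break
        else:
            clusters.append((vec, [tid]))
    return [members for _, members in clusters]

# Toy feature vectors: [reads/s, writes/s, #indexed fields] per tenant.
tenants = {"t1": [100, 10, 5], "t2": [95, 12, 5], "t3": [5, 80, 12]}
print(cluster_tenants(tenants))  # t1 and t2 group together; t3 stands alone
```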

9.
In network service systems, satisfying quality of service (QoS) requirements is a main objective. Admission control and resource allocation strategies can be used to guarantee QoS. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video-on-demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. A policy gradient algorithm can be used to solve POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy outperforms a complete admission control strategy.

10.
A key issue in dynamic load balancing in a loosely coupled distributed system is selecting appropriate jobs to transfer. In this paper, a job selection policy based on online prediction of job behavior is proposed. Tracing is used at the beginning of a job's execution to predict its approximate execution time and resource requirements, so as to make a correct decision about whether transferring the job is worthwhile. A dynamic load balancer using this job selection policy has been implemented. Experimental measurements show that the proposed policy substantially improves the mean response time of jobs and the resource utilization of the system.
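
The transfer decision can be sketched as a simple cost model: transfer only if the predicted remaining work dwarfs the migration overhead. The linear extrapolation from an initial tracing window and the cost constants below are illustrative assumptions, not the paper's prediction model.

```python
def predict_total_time(traced_fraction, traced_time):
    """Extrapolate total execution time from an initial tracing window,
    assuming roughly linear progress (an illustrative model)."""
    return traced_time / traced_fraction

def worth_transferring(traced_fraction, traced_time, transfer_cost,
                       local_slowdown=2.0):
    """Transfer if the time saved on a lightly loaded remote node exceeds
    the cost of moving the job (toy decision rule)."""
    remaining = predict_total_time(traced_fraction, traced_time) - traced_time
    local_est = remaining * local_slowdown   # remaining time on loaded node
    remote_est = remaining + transfer_cost   # remaining time after transfer
    return remote_est < local_est

# A job traced for 5 s through 10% of its work, with a 20 s transfer cost:
print(worth_transferring(0.10, 5.0, transfer_cost=20.0))  # True: long job
```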

11.
The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly target balancing CPU load among server nodes. With the rapid advancement of CPU chips, improvements in memory and disk access speed significantly lag behind improvements in CPU speed, increasing the penalty of data movement, such as page faults and I/O operations, relative to normal CPU operations. Aiming to reduce the memory resource contention caused by page faults and I/O activity, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands that are known in advance or are predictable, and 2) memory demands that are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces that capture dynamic memory access patterns. Through different groups of trace-driven simulations, we show that our proposed policies can effectively improve overall job execution performance by utilizing both CPU and memory resources well, with both known and unknown memory demands.
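
A CPU-and-memory-aware placement rule can be sketched as follows: prefer nodes whose free memory covers the job's demand (avoiding page faults), and break ties on CPU load. The scoring is an illustrative stand-in for the paper's policies.

```python
def pick_node(nodes, job_mem_demand):
    """nodes: {name: (cpu_load, free_mem_mb)}. Prefer nodes that can hold
    the job's working set in memory; among those, pick the least loaded.
    If no node has enough free memory, fall back to the one with the most
    free memory (minimizing, though not avoiding, paging)."""
    fits = {n: v for n, v in nodes.items() if v[1] >= job_mem_demand}
    if fits:
        return min(fits, key=lambda n: fits[n][0])   # CPU-load tiebreak
    return max(nodes, key=lambda n: nodes[n][1])     # least paging harm

cluster = {"a": (0.9, 4096), "b": (0.3, 1024), "c": (0.5, 8192)}
print(pick_node(cluster, job_mem_demand=2048))  # 'c': fits, lighter than 'a'
```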

12.
Cloud computing allows the execution and deployment of different types of applications, such as interactive databases or web-based services, which require distinct types of resources. These applications lease cloud resources for a considerably long period and usually occupy various resources to maintain a high quality of service (QoS). On the other hand, general big-data batch processing workloads are less QoS-sensitive and require massively parallel cloud resources for short periods. Despite the elasticity of cloud computing, the fine-scale characteristics of cloud-based applications may cause temporarily low resource utilization in cloud computing systems, while process-intensive, highly utilized workloads suffer from performance issues. Utilization-efficient scheduling of heterogeneous workloads is therefore a challenging issue for cloud owners. In this paper, to address the impact of heterogeneity on low utilization of cloud computing systems, a joint resource allocation scheme for cloud applications and processing jobs is presented to enhance cloud utilization. The main idea is to schedule processing jobs and cloud applications jointly, in a preemptive way. However, utilization-efficient resource allocation requires exact modeling of workloads. So, first, a novel methodology to model processing jobs and other cloud applications is proposed. Such jobs are modeled as a collection of parallel and sequential tasks in a Markovian process. This enables us to analyze and calculate the resources required to serve the tasks efficiently. The next step uses the proposed model to develop a preemptive scheduling algorithm for the processing jobs, in order to improve resource utilization and its associated costs in the cloud computing system. Accordingly, a preemption-based resource allocation architecture is proposed to utilize the idle reserved resources for the processing jobs effectively and efficiently in cloud paradigms. Then, performance metrics such as service time for the processing jobs are investigated. The accuracy of the proposed analytical model and scheduling analysis is verified through simulations and experimental results. The simulation and experimental results also shed light on the achievable QoS level for the preemptively allocated processing jobs.

13.
Network contention hotspots can limit network throughput for parallel disk I/O, even when the interconnection network appears to be sufficiently provisioned. We studied I/O hotspots in mesh networks as a function of the spatial layout of an application's compute nodes relative to the I/O nodes. Our analytical modeling and dynamic simulations show that when I/O nodes are configured on one side of a two-dimensional mesh, realizable I/O throughput is at best bounded by four times the network bandwidth per link. Maximal performance depends on the spatial layout of jobs and cannot be further improved by adding I/O nodes. Applying these results, we devised a new parallel layout allocation strategy (PLAS) that minimizes I/O hotspots and approaches the theoretical best case for parallel I/O throughput. Our I/O performance analysis and processor allocation strategy are applicable to a wide range of contemporary and emerging high-performance computing systems.

14.
In the Big Data era, the gap between storage performance and applications' I/O requirements is increasing. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms performance. Conventional approaches either focus on optimizing an application's access pattern individually or handle I/O requests at a low-level storage layer without any knowledge from the upper-level applications. In this paper, we present a novel I/O-aware bandwidth allocation framework to coordinate ongoing I/O requests on petascale computing systems. The motivation behind this innovation is that the resource management system has a holistic view of both the system state and jobs' activities and can dynamically control the jobs' status or allocate resources on the fly during their execution. We treat a job's I/O requests as periodic sub-jobs within its lifecycle and thereby transform the I/O congestion issue into a classical scheduling problem. Based on this model, we propose a bandwidth management mechanism as an extension to the existing scheduling system. We design several bandwidth allocation policies with different optimization objectives, on either user-oriented metrics or system performance. We conduct extensive trace-based simulations using real job traces and I/O traces from a production IBM Blue Gene/Q system at Argonne National Laboratory. Experimental results demonstrate that our new design can improve job performance by more than 30%, as well as increasing system performance.
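
Treating I/O bursts as periodic sub-jobs makes them schedulable with textbook policies; the sketch below defers I/O phases that would oversubscribe a shared bandwidth cap. The phase parameters and this greedy time-shifting rule are illustrative assumptions, not the framework's actual policies.

```python
import heapq

def schedule_io_phases(requests, link_bw):
    """requests: list of (release_t, duration, demand_bw, job_id), handled
    in release order. A phase whose demand would oversubscribe the link is
    deferred until the earliest in-flight phase finishes (a toy
    time-shifting policy under a shared bandwidth cap)."""
    schedule = []
    in_flight = []  # heap of (finish_t, bw)
    for release, duration, bw, job in sorted(requests):
        start = release
        while True:
            # Drop phases already finished by the candidate start time.
            while in_flight and in_flight[0][0] <= start:
                heapq.heappop(in_flight)
            if sum(b for _, b in in_flight) + bw <= link_bw:
                break
            start = in_flight[0][0]   # wait for a phase to finish
        heapq.heappush(in_flight, (start + duration, bw))
        schedule.append((job, start))
    return schedule

reqs = [(0, 10, 6, "A"), (1, 5, 5, "B"), (2, 4, 4, "C")]
print(schedule_io_phases(reqs, link_bw=10))  # B is shifted to t=10; C fits
```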

15.
As modern Internet datacenters grow ever larger, they face challenges in energy consumption, reliability, manageability, and scalability. At the same time, datacenters host diverse services, including both online Web services and offline batch-processing jobs. Online tasks require low latency, while offline tasks require high throughput. To improve server utilization and reduce datacenter energy consumption, datacenters now commonly co-locate online and offline tasks in the same compute cluster. In such co-location scenarios, a key challenge is how to satisfy the different requirements of online and offline tasks simultaneously. This paper analyzes eight days of log data (cluster-trace-v2018) from a co-located compute cluster of 4,034 servers, released by Alibaba in 2018. Starting from static configuration information, dynamic co-location runtime state, and the DAG dependency structure of offline batch jobs, it reveals the workload characteristics, including the correlation between task skew and container placement, and proposes task scheduling optimization strategies based on task dependencies and the critical path.
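
Critical-path-driven scheduling of batch DAGs can be sketched as follows: compute each task's longest remaining path to a sink and prioritize tasks on the longest chains. The task durations and DAG shape are toy values; the trace's actual DAG format differs.

```python
from functools import lru_cache

# Toy batch-job DAG: task -> (duration_s, successor tasks).
dag = {
    "m1": (10, ["r1"]), "m2": (30, ["r1"]),
    "r1": (20, ["j1"]), "j1": (5, []),
}

@lru_cache(maxsize=None)
def critical_length(task):
    """Longest remaining path (in seconds) from task to any sink."""
    dur, succs = dag[task]
    return dur + max((critical_length(s) for s in succs), default=0)

# Schedule ready tasks in decreasing critical-path length, so the chain
# that dominates job completion time (m2 -> r1 -> j1) starts first.
ready = ["m1", "m2"]
print(sorted(ready, key=critical_length, reverse=True))  # ['m2', 'm1']
```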

16.
In this paper, we address energy-aware online scheduling of jobs with resource contention. We propose an optimization model and present a new approach to resource allocation with job consolidation that takes into account the types of applications in heterogeneous workloads, which may include CPU-intensive, disk-intensive, I/O-intensive, memory-intensive, network-intensive, and other applications. When jobs of one type are allocated to the same resource, they may create a bottleneck and resource contention in the CPU, memory, disk, or network, which may degrade system performance and increase energy consumption. We focus on the energy characteristics of applications and show that an intelligent allocation strategy can further improve energy consumption compared with traditional approaches. We propose heterogeneous job consolidation algorithms and validate them through a performance evaluation study using the CloudSim toolkit under different scenarios and real data. We analyze several scheduling algorithms depending on the type and amount of information they require.
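
A type-aware consolidation rule can be sketched as follows: place a job on the active host with the least contention for the job's dominant resource type, and power on a new host only when necessary. The per-type capacity threshold and contention score are illustrative assumptions, not the paper's algorithms.

```python
def place_job(hosts, job_type, capacity=2):
    """hosts: list of dicts counting jobs per type on each active host.
    Prefer the active host with the fewest jobs of the job's type (least
    bottleneck risk for its dominant resource); power on a new host only
    when every active host already runs `capacity` jobs of this type."""
    candidates = [(h.get(job_type, 0), i) for i, h in enumerate(hosts)]
    if candidates:
        load, idx = min(candidates)
        if load < capacity:
            hosts[idx][job_type] = load + 1
            return idx
    hosts.append({job_type: 1})      # open a fresh host
    return len(hosts) - 1

hosts = []
for jt in ["cpu", "cpu", "disk", "cpu", "net", "cpu"]:
    print(jt, "-> host", place_job(hosts, jt))  # mixes types on few hosts
```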

17.
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and use the entire parallel machine. However, these conditions are rarely met, and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified and are not coscheduled; instead, their processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor Alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes the implementation of FCS in detail, along with its performance evaluation on a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than the other algorithms (up to 54 percent higher in some cases) and displays lower response times and slowdowns than the other algorithms in nearly all scenarios.

18.
In enterprise grid computing environments, users have access to multiple resources that may be distributed geographically. Thus, resource allocation and scheduling is a fundamental issue in achieving high performance in enterprise grid computing. Most current job scheduling systems for enterprise grid computing provide batch queuing support and focus solely on the allocation of processors to jobs. However, since I/O is also a critical resource for many jobs, the allocation of processor and I/O resources must be coordinated to allow the system to operate most effectively. To this end, we present a hierarchical scheduling policy that pays special attention to the I/O and service demands of parallel jobs in homogeneous and heterogeneous systems with background workload. The performance of the proposed scheduling policy is studied under various system and workload parameters through simulation. We also compare the performance of the proposed policy with a static space-time sharing policy. The results show that the proposed policy performs substantially better than the static space-time sharing policy.

19.
Network processors are designed to handle the inherently parallel nature of network processing applications. However, partitioning and scheduling of application tasks and data allocation to reduce memory contention remain major challenges in realizing the full performance potential of a given network processor. The large variety of processor architectures in use and the increasing complexity of network applications further aggravate the problem. This work proposes a novel framework, called FEADS, for automating the task of application partitioning and scheduling for network processors. FEADS uses simulated annealing to perform design space exploration of application mapping onto processor resources. Further, it uses cyclic and r-periodic scheduling to achieve higher-throughput schedules. To evaluate dynamic performance metrics such as throughput and resource utilization under realistic workloads, FEADS automatically generates a Petri net (PN) that models the application, architectural resources, mapping, the constructed schedule, and their interaction. The throughput obtained by schedules constructed by FEADS is comparable to that obtained by manual scheduling for linear task flow graphs; for more complicated task graphs, FEADS' schedules have a throughput up to 2.5 times higher than the manual schedules. Further, static scheduling of tasks increases throughput by up to 30% compared to an implementation of the same mapping without task scheduling.
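
Design-space exploration by simulated annealing can be sketched as follows; the cost function (makespan of a task-to-processing-element mapping), task costs, and cooling schedule are illustrative assumptions, not FEADS' actual model, which evaluates candidate mappings via Petri net simulation.

```python
import math, random

TASK_COST = [4, 7, 3, 6, 2, 5]   # per-task processing cost (toy numbers)
N_PES = 3                        # processing elements on the NP

def makespan(mapping):
    """Cost of a mapping: load of the most loaded processing element."""
    load = [0] * N_PES
    for task, pe in enumerate(mapping):
        load[pe] += TASK_COST[task]
    return max(load)

def anneal(steps=5000, t0=10.0, alpha=0.999):
    cur = [random.randrange(N_PES) for _ in TASK_COST]
    cur_cost, temp = makespan(cur), t0
    best, best_cost = cur[:], cur_cost
    for _ in range(steps):
        cand = cur[:]
        cand[random.randrange(len(cand))] = random.randrange(N_PES)
        delta = makespan(cand) - cur_cost
        # Accept improvements always; accept regressions with probability
        # exp(-delta / temp), which shrinks as the temperature cools.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            cur, cur_cost = cand, makespan(cand)
            if cur_cost < best_cost:
                best, best_cost = cur[:], cur_cost
        temp *= alpha
    return best, best_cost

mapping, cost = anneal()
print("task -> PE mapping:", mapping, "makespan:", cost)
```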

20.
Parallel Computing, 2014, 40(10): 722-737
The MapReduce programming model, in which the data nodes perform both data storing and computation, was introduced for big-data processing. Thus, we need to understand the different resource requirements of data storing and computation tasks and schedule them efficiently over multi-core processors. In particular, the provision of high-performance data storing has become more critical because of the continuously increasing volume of data uploaded to distributed file systems and database servers. However, analyzing the performance characteristics of the processes that store upstream data is very intricate, because both network and disk inputs/outputs (I/O) are heavily involved in their operations. In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel approach for dynamic core affinity for high-throughput file upload. We consider the dynamic changes in processor load and the intensiveness of the file upload at run time, and accordingly decide the core affinity for service threads, with the objective of maximizing parallelism, data locality, and resource efficiency. We apply the dynamic core affinity approach to the Hadoop Distributed File System (HDFS). Measurement results show that our implementation improves the file upload throughput of end applications by more than 30% compared with the default HDFS, and provides better scalability.
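
A dynamic affinity decision of this kind can be sketched as follows: at run time, bind each new upload-service thread to the least loaded core in the NUMA domain of its network interface, favoring locality first and load balance second. The domain layout and the toy load counters are illustrative assumptions, not the paper's HDFS implementation.

```python
import os

# Illustrative NUMA layout: cores grouped by the memory domain closest
# to each network interface (assumed, not probed from the system).
NIC_DOMAIN = {"eth0": [0, 1, 2, 3], "eth1": [4, 5, 6, 7]}
core_load = {c: 0 for cores in NIC_DOMAIN.values() for c in cores}

def pick_core(nic):
    """Choose the least loaded core near the NIC handling this upload."""
    core = min(NIC_DOMAIN[nic], key=lambda c: core_load[c])
    core_load[core] += 1          # run-time load bookkeeping (toy counter)
    return core

def bind_service_thread(nic):
    core = pick_core(nic)
    os.sched_setaffinity(0, {core})   # pin the calling thread (Linux)
    return core

# Each incoming file-upload stream binds its service thread dynamically.
print([pick_core("eth0") for _ in range(3)])  # spreads over cores 0, 1, 2
```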
