
1.

Power optimization for dynamic configuration in heterogeneous web server clusters

Luciano Bertini Julius C.B. Leite 《Journal of Systems and Software》2010,83(4):585-598

To reduce environmental impact, it is essential to make data centers green by turning off servers and tuning their speeds to the instantaneous offered load, that is, by determining the dynamic configuration of web server clusters. We model the problem of selecting the servers that will be on and finding their speeds through mixed integer programming; we also show how to combine such solutions with control theory. As a proof of concept, we implemented this dynamic configuration scheme in a web server cluster running Linux, with soft real-time requirements and QoS control, in order to guarantee both energy efficiency and a good user experience. In this paper, we show the performance of our scheme compared to other schemes, a comparison of a centralized and a distributed approach for QoS control, and a comparison of schemes for choosing speeds of servers.
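As a rough illustration of the configuration problem described in this abstract, the sketch below enumerates on/off states and speeds for a tiny cluster and keeps the lowest-power configuration that still serves the offered load. The server table, the power figures, and the function name `best_configuration` are hypothetical, and exhaustive search merely stands in for the mixed integer programming formulation the abstract mentions; it is not the authors' model.

```python
from itertools import product

# Hypothetical servers. Each entry lists the selectable operating points as
# (capacity in requests/s, power in watts); the first point means "off".
SERVERS = {
    "a": [(0, 0.0), (100, 60.0), (180, 95.0)],
    "b": [(0, 0.0), (150, 80.0), (260, 130.0)],
    "c": [(0, 0.0), (80, 40.0), (120, 70.0)],
}


def best_configuration(load, servers=SERVERS):
    """Return (total_power, {server: operating_point}) with minimum power
    such that total capacity >= load, or None if the load cannot be served."""
    names = sorted(servers)
    best = None
    # Try every combination of one operating point per server.
    for combo in product(*(servers[name] for name in names)):
        capacity = sum(cap for cap, _ in combo)
        power = sum(watts for _, watts in combo)
        if capacity >= load and (best is None or power < best[0]):
            best = (power, dict(zip(names, combo)))
    return best
```

Calling `best_configuration(200)` on this table turns server `a` off and runs `b` and `c` at their lower speeds. Enumeration grows exponentially with the number of servers, which is one reason a real system would hand the same constraints to an integer programming solver instead.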

2.

Adaptive energy-efficient scheduling for real-time tasks on DVS-enabled heterogeneous clusters

Xiaomin Zhu Chuan He Kenli Li Xiao Qin 《Journal of Parallel and Distributed Computing》2012

Developing energy-efficient clusters can not only reduce electricity costs but also improve system reliability. Existing scheduling strategies developed for energy-efficient clusters conserve energy at the cost of performance. The performance problem becomes especially apparent when cluster computing systems are heavily loaded. To address this issue, we propose in this paper a novel scheduling strategy, adaptive energy-efficient scheduling (AEES), for aperiodic and independent real-time tasks on heterogeneous clusters with dynamic voltage scaling. The AEES scheme aims to adaptively adjust voltages according to the workload conditions of a cluster, thereby making the best trade-offs between energy conservation and schedulability. When the cluster is heavily loaded, AEES considers voltage levels of both new tasks and running tasks to meet tasks’ deadlines. Under light load, AEES aggressively reduces the voltage levels to conserve energy while maintaining higher guarantee ratios. We conducted extensive experiments to compare AEES with an existing algorithm, MEG, as well as two baseline algorithms, MELV and MEHV. Experimental results show that AEES significantly improves scheduling quality over MELV, MEHV, and MEG.

3.

Energy saving strategies for parallel applications with point-to-point communication phases

Vaibhav Sundriyal Masha Sosonkina Alexander Gaenko Zhao Zhang 《Journal of Parallel and Distributed Computing》2013

Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, network interconnects, such as InfiniBand, may be exploited to maximize energy savings, while the application performance loss and frequency switching overheads must be carefully balanced. This paper advocates for a runtime assessment of such overheads by characterizing point-to-point communications into phases and then analyzing the time gaps between the communication calls. Certain communication and architectural parameters are taken into consideration in the three proposed frequency scaling strategies, which differ in their treatment of the time gaps. The experimental results are presented for NAS parallel benchmark problems as well as for the realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. For the latter, three different process-to-core mappings were studied as to their energy savings under the proposed frequency scaling strategies and under the existing state-of-the-art techniques. Close-to-maximum energy savings were obtained with a low performance loss of 2% on the given platform.
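The trade-off between time gaps and frequency switching overhead can be pictured with a generic threshold rule. The sketch below is not one of the three strategies from the paper; the function name `plan_frequency`, the `min_ratio` parameter, and the two-level "low"/"high" frequency choice are assumptions made only for illustration.

```python
def plan_frequency(gaps_us, switch_overhead_us, min_ratio=2.0):
    """Decide, for each gap between communication calls (in microseconds),
    whether lowering the CPU frequency is worthwhile.

    A gap qualifies only if it is at least `min_ratio` times the cost of
    switching down before the gap and back up after it.
    """
    # A round trip costs two switches: down before the gap, up after it.
    threshold_us = min_ratio * 2 * switch_overhead_us
    return ["low" if gap >= threshold_us else "high" for gap in gaps_us]
```

With a 100 µs switching overhead and the default ratio, only gaps of 400 µs or more are marked "low"; shorter gaps keep the high frequency so the switching cost does not turn into performance loss.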

4.

How many cores do we need to run a parallel workload: A test drive of the Intel SCC platform?

Chen Liu Pollawat Thanarungroj Jean-Luc Gaudiot 《Journal of Parallel and Distributed Computing》2014

As semiconductor manufacturing technology continues to improve, it is possible to integrate more and more transistors onto a single processor. Many-core processor design has resulted in part from the search to utilize this enormous transistor real estate. The Single-Chip Cloud Computer (SCC) is an experimental many-core processor created by Intel Labs. In this paper we present a study in which we analyze this innovative many-core system by running several workloads with distinctive parallelism characteristics. We investigate the effect on system performance by monitoring specific hardware performance counters. Then, we experiment with varying hardware configuration parameters such as the number of cores, clock frequency, and voltage levels. We execute the chosen workloads and collect timing, power consumption, and energy consumption information on this many-core research platform. Thus, we can comprehensively analyze the behavior and scalability of the Intel SCC system under the chosen workloads in terms of performance and energy consumption. Our results show that the profiled parallel workload execution has a communication bottleneck on the Intel SCC system. Moreover, our results indicate that the number of cores should be chosen carefully for each workload in order to balance execution performance and energy efficiency across applications.

5.

Slack computation for DVS algorithms in fixed-priority real-time systems using fluid slack analysis

Da-Ren Chen 《Journal of Systems Architecture》2011,57(9):850-865

This work presents a scheduling algorithm to reduce the energy of hard real-time tasks with fixed priorities assigned by a rate-monotonic policy. Sets of independent tasks running periodically on a processor with dynamic voltage scaling (DVS) are considered as well. The proposed online approach can cooperate with many slack-time analysis methods based on low-power work demand analysis (lpWDA) without increasing the computational complexity of DVS algorithms. The proposed approach introduces a novel technique called low-power fluid slack analysis (lpFSA) that extends the analysis interval produced by its cooperative methods and computes the available slack in the extended interval. The lpFSA regards the additional slack as fluid and computes its length, such that it can be moved to the current job. Therefore, the proposed approach provides the cooperative methods with additional slack. Experimental results show that the proposed approach combined with lpWDA-based algorithms achieves greater energy reductions than the original algorithms alone.

6.

Model-guided multi-dimensional low-power optimization method for GPU software

Wang Guibin 《Chinese Journal of Computers》2012,35(5):979-989

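As a typical representative of many-core architectures, the GPU (Graphics Processing Unit) integrates a large number of parallel processing cores. Its power consumption has grown accordingly, and it is becoming one of the most power-hungry components of a computer system; software-level low-power optimization is an effective way to reduce chip power. This paper proposes a model-guided, multi-dimensional low-power optimization technique that combines dynamic voltage/frequency scaling with dynamic core shutdown to reduce GPU power without affecting performance. First, based on the characteristics of the GPU multithreaded execution model, a power optimization model for memory-bound programs is established. Then, using this model, the effects of dynamic voltage/frequency scaling and dynamic core shutdown on program execution time and energy consumption are analyzed separately, and the power optimization problem is formulated as a general integer programming problem. Finally, an evaluation on nine typical GPU programs and a comparison with existing methods verify that the proposed technique effectively reduces chip power without affecting performance.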
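To make the idea of combining frequency scaling with core shutdown for a memory-bound program concrete, here is a toy sketch. Everything in it is an assumption for illustration: the execution-time model `max(work / (cores * freq), mem_time)`, the per-core power model `p_static + k_dyn * freq**3`, and the function name `pick_gpu_setting`. The paper's actual model is not given in the abstract, and plain enumeration stands in for its integer programming formulation.

```python
def pick_gpu_setting(work, mem_time, core_counts, freqs,
                     p_static=1.0, k_dyn=1.0):
    """Return (energy, cores, freq) with minimum energy among settings that
    are no slower than running all cores at the highest frequency."""

    def exec_time(cores, freq):
        # Memory-bound assumption: run time cannot drop below mem_time,
        # no matter how much compute throughput is available.
        return max(work / (cores * freq), mem_time)

    baseline = exec_time(max(core_counts), max(freqs))
    best = None
    for cores in core_counts:
        for freq in freqs:
            t = exec_time(cores, freq)
            if t > baseline:
                continue  # would hurt performance, so it is not allowed
            energy = cores * (p_static + k_dyn * freq ** 3) * t
            if best is None or energy < best[0]:
                best = (energy, cores, freq)
    return best
```

In this toy model, high static power favors shutting cores down and keeping the rest at full speed, while low static power favors keeping all cores on at a reduced frequency; either way the memory-bound run time is unchanged.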

7.

Dynamic slack allocation algorithms for energy minimization on parallel machines (cited by: 1 total, 0 self-citations, 1 by others)

Jaeyeon Kang Sanjay Ranka 《Journal of Parallel and Distributed Computing》2010

We explore novel algorithms for DVS (Dynamic Voltage Scaling) based energy minimization of DAG (Directed Acyclic Graph) based applications on parallel and distributed machines in dynamic environments. Static DVS algorithms for DAG execution use the estimated execution time. In practice, this estimate is either too high or too low, so many tasks complete earlier or later than expected during actual execution. For overestimation, the extra available slack can be added to future tasks so that energy requirements can be reduced. For underestimation, the increased time may cause the application to miss the deadline; slack can be reduced for future tasks to lower the possibility of missing the deadline. In this paper, we present novel dynamic scheduling algorithms for reallocating the slack for future tasks to reduce energy and/or satisfy deadline constraints. Experimental results show that our algorithms are comparable to static algorithms applied at runtime in terms of energy minimization and deadline satisfaction, but require considerably smaller computational overhead.
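The slack reallocation idea can be sketched for the simplest case, a linear chain of tasks on one processor. This is a deliberately simplified illustration, not the authors' algorithm: the function name `run_chain`, the normalized frequency range, the `f_min` floor, and the energy model (energy proportional to cycles times frequency squared) are all assumptions.

```python
def run_chain(estimates, actual_fraction, slots, f_min=0.2):
    """Simulate a chain of tasks with slack passed to the next task.

    estimates[i]       estimated run time of task i at full speed (f = 1.0)
    actual_fraction[i] actual work divided by estimated work for task i
    slots[i]           time budget statically assigned to task i
    Returns (frequencies, total_energy).
    """
    slack = 0.0
    freqs, energy = [], 0.0
    for est, frac, slot in zip(estimates, actual_fraction, slots):
        budget = slot + slack
        if budget <= 0:
            freq = 1.0  # already late: run as fast as possible
        else:
            # Slow down just enough to fit the estimate into the budget.
            freq = min(1.0, max(f_min, est / budget))
        actual_cycles = est * frac
        elapsed = actual_cycles / freq
        # Positive slack: finished early. Negative slack: overran the budget,
        # so the next task gets less time and must run faster.
        slack = budget - elapsed
        energy += actual_cycles * freq ** 2  # assumed energy model
        freqs.append(freq)
    return freqs, energy
```

For example, if the first of two equal tasks needs only half its estimated time, the second task inherits the unused time and can run at two thirds of full speed, which lowers its energy under the assumed model.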

8.

Techniques for compiling programs on distributed memory multicomputers

PeiZong Lee 《Parallel Computing》1995,21(12):1895-1923

It is widely accepted that distributed memory parallel computers will play an important role in solving computation-intensive problems. However, the design of an algorithm in a distributed memory system is time-consuming and error-prone, because a programmer is forced to manage both parallelism and communication. In this paper, we present techniques for compiling programs on distributed memory parallel computers. We will study the storage management of data arrays and the execution schedule arrangement of Do-loop programs on distributed memory parallel computers. First, we introduce formulas for representing data distribution of specific data arrays across processors. Then, we define communication cost for some message-passing communication operations. Next, we derive a dynamic programming algorithm for data distribution. After that, we show how to improve the communication time by pipelining data, and illustrate how to use data-dependence information for pipelining data. Jacobi's iterative algorithm and the Gauss elimination algorithm for linear systems are used to illustrate our method. We also present experimental results on a 32-node nCUBE-2 computer.
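The abstract does not reproduce the paper's distribution formulas. As background, the two standard one-dimensional layouts, block and cyclic, can be written as owner functions; the helper names below are chosen for this sketch only.

```python
def block_owner(i, n, p):
    """Processor owning element i when n elements are split into
    contiguous blocks of size ceil(n / p) across p processors."""
    block = -(-n // p)  # ceiling division without floating point
    return i // block


def cyclic_owner(i, p):
    """Processor owning element i when elements are dealt out round-robin."""
    return i % p


def local_indices(n, p, rank, owner):
    """Global indices of the elements stored on processor `rank`.
    `owner` maps a global index to its owning processor."""
    return [i for i in range(n) if owner(i) == rank]
```

For `n = 10` and `p = 4`, `local_indices(10, 4, 1, lambda i: block_owner(i, 10, 4))` gives the contiguous run `[3, 4, 5]`, while `local_indices(10, 4, 1, lambda i: cyclic_owner(i, 4))` gives the strided set `[1, 5, 9]`. Block layouts tend to suit nearest-neighbor updates such as Jacobi iteration, whereas cyclic layouts balance load when the active part of the array shrinks, as in elimination methods.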