Dynamic voltage and frequency scaling (DVFS) is a ubiquitous technique for CPU power management in modern computing systems. Reducing processor frequency/voltage leads to a decrease in CPU power consumption and an increase in execution time. In this paper, we analyze which application/platform characteristics are necessary for a successful energy-performance trade-off of large-scale parallel applications. We present a model that gives an upper bound on performance loss due to frequency scaling using the application parallel efficiency. The model was validated with performance measurements of large-scale parallel applications. Then we track how application sensitivity to frequency scaling evolved over the last decade for different cluster generations. Finally, we study how cluster power consumption characteristics together with application sensitivity to frequency scaling determine the energy effectiveness of the DVFS technique.
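The trade-off described in the abstract above can be illustrated with a rough sketch. It does not reproduce the paper's model, which is not given here. It assumes a commonly used approximation in which only a fraction `beta` of the run time scales with CPU frequency, and a hypothetical power model in which dynamic power grows with the cube of frequency on top of a constant static share. The function names and the parameters `beta` and `static_fraction` are illustrative assumptions.

```python
def relative_time(beta: float, f: float, f_max: float) -> float:
    """Run time at frequency f relative to run time at f_max.

    Assumption: a fraction `beta` of the time is CPU-bound and scales with
    f_max / f; the remainder (memory, network, I/O) does not change.
    """
    return beta * (f_max / f - 1.0) + 1.0


def relative_power(f: float, f_max: float, static_fraction: float) -> float:
    """Power at frequency f relative to power at f_max.

    Assumption (hypothetical model): a constant static share plus a dynamic
    share proportional to f**3, i.e. voltage scaled linearly with frequency.
    """
    return static_fraction + (1.0 - static_fraction) * (f / f_max) ** 3


def relative_energy(beta: float, f: float, f_max: float,
                    static_fraction: float) -> float:
    """Energy at frequency f relative to energy at f_max (power x time)."""
    return relative_power(f, f_max, static_fraction) * relative_time(beta, f, f_max)


if __name__ == "__main__":
    # Compare a weakly and a strongly frequency-sensitive application.
    for beta in (0.2, 0.9):
        t = relative_time(beta, f=1.6, f_max=2.4)
        e = relative_energy(beta, f=1.6, f_max=2.4, static_fraction=0.5)
        print(f"beta={beta}: time x{t:.2f}, energy x{e:.2f}")
```

Under these assumptions, an application with a small `beta` loses little performance when the frequency is lowered and therefore saves more energy than one whose run time is dominated by CPU-bound work.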

We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores at large scale in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology, but the first to use the Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can significantly affect achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. We also examine the differences in performance variability observed on each system and quantify the lost performance using a combination of empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

Redundant array of independent SSDs (RAIS) is generally based on the traditional RAID design and implementation. The random small write problem is a serious challenge for RAIS. Random small writes in parity-based RAIS systems generate significantly more pre-reads and writes, which can degrade RAIS performance and shorten SSD lifetime. To overcome the well-known write-penalty problem in parity-based RAID5 storage systems, several logging techniques such as Parity Logging and Data Logging have been put forward. However, these techniques were originally designed around the mechanical characteristics of HDDs and ignore the properties of flash memory. In this article, we first propose RAISL, a flash-aware logging method that improves the small write performance of RAIS storage systems. RAISL writes only the new data to the log SSD, rather than both the new data and the pre-read data, by making full use of the invalid pages on the SSDs of the RAIS. RAISL does not need to perform pre-read operations, so the original characteristics of workloads are kept. Second, we propose AGCRL on the basis of RAISL to further boost performance. AGCRL combines RAISL with access characteristics to guide the regulation of read and write costs and improve the performance of RAIS storage systems. Our experiments demonstrate that RAISL significantly improves write performance and that AGCRL improves both write and read performance. AGCRL on average outperforms RAIS5 and RAISL by 39.15% and 16.59% respectively.
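The write penalty mentioned in the abstract above comes from the standard RAID5 read-modify-write parity update: before a small write, the old data block and the old parity block must both be read. The sketch below shows only that classic update, not RAISL or AGCRL, whose internals the abstract does not give. The helper names and the sample stripe are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Classic RAID5 read-modify-write parity update.

    The two pre-reads (old_data and old_parity) are the cost that
    logging schemes try to defer or avoid.
    """
    return xor_blocks(old_parity, old_data, new_data)


# Illustrative stripe with three data blocks and one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(*stripe)

# Overwrite block 1 and update parity without touching blocks 0 and 2.
new_block = b"\xaa\x55"
updated_parity = small_write_parity(stripe[1], new_block, parity)
stripe[1] = new_block

# The incrementally updated parity matches a full recomputation.
assert updated_parity == xor_blocks(*stripe)
```

Each small write therefore costs two reads and two writes in a plain parity-based array, which is why avoiding the pre-reads matters for both performance and flash wear.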

Scientific workflow orchestration interoperating HTC and HPC resources
In this work we describe our developments towards the provision of a unified access method to different types of computing infrastructures at the interoperation level. To that end, we have developed a middleware suite that bridges two non-interoperable middleware stacks used for building distributed computing infrastructures, UNICORE and gLite. Our solution allows users to transparently access and operate on HPC and HTC resources from a single interface. Using Kepler as the workflow manager, we provide users with the integration of codes needed to create scientific workflows that access both types of infrastructure.

The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. HPC users need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based on what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high energy cost, which reduces the profit margin of Cloud providers, but also to high carbon emissions, which are not environmentally sustainable. Hence, there is an urgent need for energy-efficient solutions that can address the sharp increase in energy consumption from the perspective of not only the Cloud provider but also the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider. We consider a number of energy efficiency factors (such as energy cost, carbon emission rate, workload, and CPU power efficiency) which change across different data centers depending on their location, architectural design, and management system. Our carbon/energy based scheduling policies are able to achieve on average up to 25% energy savings in comparison to profit based scheduling policies, leading to higher profit and lower carbon emissions.

In this paper we propose a methodology for developing system-wide energy consumption models for servers, based on the analysis of performance counters. It makes it possible to estimate the power usage of a machine under any load at runtime. By clustering applications we extract groups of programs with similar characteristics. This allows us to create more specialized and accurate power usage models. By using decision trees it is possible to automatically select an appropriate model for the current system load. Training and test sets of programs were used to test the estimates. The presented models are accurate within an error of 4%, as verified on servers from different vendors, including the latest pre-production one.

The advent of unprecedentedly scalable yet energy-hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. However, the details of how these approaches affect energy consumption have not been studied yet. Therefore, this paper aims to explore how much energy a supercomputer consumes while running scientific simulations when adopting various I/O management approaches. In particular, we closely examine three radically different I/O schemes: time partitioning, dedicated cores, and dedicated nodes. To do so, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results obtained on the French Grid’5000 platform highlight the differences among these three approaches and illustrate how various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of an HPC simulation under different I/O approaches. Our proposed model gives hints for pre-selecting the most energy-efficient I/O approach for a particular simulation on a particular HPC system and therefore provides a step towards energy-efficient HPC simulations in Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.

In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three can successfully detect an attack in about one fifth of the time required to complete it. We observed no false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization we are confident these systems can be used in real-world scenarios.

Low carbon footprint energy sources such as solar and wind power typically suffer from unpredictable or limited availability. By globally distributing a number of these renewable sources, these effects can largely be compensated for. We look at the feasibility of this approach for powering already distributed data centers in order to operate at a reduced total carbon footprint. Our study shows that carbon footprint reductions are possible, but that they are highly dependent on the approach and parameters involved. The manufacturing footprint and the geographical region are especially critical parameters to consider. Deploying additional data centers can help in reducing the total carbon footprint, but substantial reductions can be achieved when data centers with nominal capacity well below maximum capacity redistribute processing to sites based on renewable energy availability.

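Green computing is an advanced computing technology that aims to use advanced ideas, techniques, and methods to reduce the energy consumption of computing systems and thereby lessen their impact on people and the environment. Embedded systems now account for the vast majority of all computing systems, so they too need green computing techniques that lower their energy consumption without affecting their performance. This paper first surveys the state of research on green computing. It then defines green embedded systems and explores what the concept entails. Finally, it discusses the green evaluation of green embedded systems and examines the topics that remain to be studied. The main contribution is to apply green computing ideas to propose the concept of green embedded systems, study the related issues, and point out the open research topics and directions for green embedded systems.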

The exponential growth of the Internet during the last decade calls for more effort in researching and developing sustainable Web servers in order to decrease global energy demand. In this paper, as a first step, we review the literature on energy efficiency research in Web server systems to depict the state of the art and to plan further contributions to research on sustainable Web systems. We also propose and implement an energy metric that makes it possible to establish a relation between the Quality of Service (QoS) obtained by the system and the power it consumes.

A complete and efficient CUDA-sharing solution for HPC clusters
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes, and permits a single node to exploit the whole set of GPUs installed in the cluster. In our proposal, CUDA applications can seamlessly interact with any GPU in the cluster, independently of its physical location. Thus, GPUs can be either distributed among compute nodes or concentrated in dedicated GPGPU servers, depending on the cluster administrator’s policy. This proposal leads to savings not only in space but also in energy, acquisition, and maintenance costs. The performance evaluation in this paper with a series of benchmarks and a production application clearly demonstrates the viability of this proposal. Concretely, experiments with the matrix–matrix product reveal excellent performance compared with regular executions on the local GPU; on a much more complex application, the GPU-accelerated LAMMPS, we attain up to 11x speedup employing 8 remote accelerators from a single node with respect to a 12-core CPU-only execution. GPGPU service interaction in compute nodes, remote acceleration in dedicated GPGPU servers, and data transfer performance of similar GPU virtualization frameworks are also evaluated.

A key characteristic of cloud computing is elasticity: automatically adjusting system resources to an application's workload. Reactive and horizontal approaches are the traditional means of offering this capability, in which rule-condition-action statements and upper and lower thresholds are used to instantiate or consolidate compute nodes and virtual machines. Although elasticity can be beneficial for many HPC (high-performance computing) scenarios, it also imposes significant challenges in the development of applications. In addition to issues related to how we can incorporate this new feature in such applications, there is a problem associated with the performance and resource pair and, consequently, with energy consumption. Further exploring this last difficulty, we must be capable of analyzing elasticity effectiveness as a function of the employed thresholds, with clear metrics to compare elastic and non-elastic executions properly. In this context, this article explores elasticity metrics in two ways: (i) the use of a cost function that combines application time with different energy models; (ii) the extension of speedup and efficiency metrics, commonly used to evaluate parallel systems, to cover cloud elasticity. To accomplish (i) and (ii), we developed an elasticity model known as AutoElastic, which reorganizes resources automatically across synchronous parallel applications. The results, obtained with the AutoElastic prototype using the OpenNebula middleware, are encouraging. Considering a CPU-bound application, an upper threshold close to 70% was the best option for obtaining good performance with a non-prohibitive elasticity cost. In addition, the value of 90% for this threshold was the best option when we plan an efficiency-driven execution. Copyright © 2015 John Wiley & Sons, Ltd.
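For reference, the classic speedup and efficiency metrics that the abstract above says it extends can be sketched as follows. Only the standard fixed-resource definitions are shown; the elastic extension and the cost function are not specified in the abstract and are not reproduced. The function and parameter names are illustrative assumptions.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_resources: int) -> float:
    """Classic parallel efficiency: speedup divided by the resource count."""
    return speedup(t_serial, t_parallel) / n_resources


if __name__ == "__main__":
    # Hypothetical timings: 100 s on one node, 25 s on eight nodes.
    print(speedup(100.0, 25.0), efficiency(100.0, 25.0, 8))
```

These definitions assume a fixed number of resources for the whole run, which is exactly what stops holding once nodes or virtual machines are added and removed during execution.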

Energy efficiency is a major concern in modern high performance computing (HPC) systems, and a power-aware scheduling approach is a promising way to achieve it. While there are a number of studies on power-aware scheduling by means of dynamic power management (DPM) and/or dynamic voltage and frequency scaling (DVFS) techniques, most of them only consider scheduling at a steady state. However, HPC applications like scientific visualization often need deadline constraints to guarantee timely completion. In this paper we present power-aware scheduling algorithms with deadline constraints for heterogeneous systems. We formulate the problem by extending traditional multiprocessor scheduling and design approximation algorithms with an analysis of the worst-case performance. We also present a pricing scheme in which the price of a task varies with its energy usage and depends largely on the tightness of its deadline. Lastly, we extend the proposed algorithm to the control dependence graph and to the online case, which is more realistic. Through extensive experiments, we demonstrate that the proposed algorithm achieves near-optimal energy efficiency, on average 16.4% better for synthetic workload and 12.9% better for realistic workload than the EDD (Earliest Due Date)-based algorithm. The extended online algorithm also outperforms the EDF (Earliest Deadline First)-based algorithm with an average up to 26% of energy saving and 22% of deadline satisfaction. It is also shown experimentally that the pricing scheme provides a flexible trade-off between deadline tightness and price.

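In systems such as spaceborne parallel computers, system energy consumption has an upper limit within a given period. Building on the concept of high-productivity computing, and addressing the job management requirement of maximizing performance under such an energy constraint, this paper proposes a job allocation model. Based on the performance and energy parameters of the compute nodes and jobs, the model allocates jobs dynamically so as to obtain as much computing performance as possible while respecting the energy cap. Simulation experiments verify the effectiveness of the algorithm.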

In recent years, we have witnessed a growing interest in high performance computing (HPC) using a cluster of workstations. This growth made it affordable for individuals to have exclusive access to their own supercomputers. However, one of the challenges in a clustered environment is to keep system failures to a minimum and to achieve the highest possible level of system availability. High-Availability (HA) computing attempts to avoid the problems of unexpected failures through active redundancy and preemptive measures. Since the prices of hardware components are dropping significantly, we propose to combine HPC and HA concepts and lay out the design of an HA-HPC cluster, considering all possible measures. In particular, we explore the hardware and management layers of the HA-HPC cluster design, and present a more focused study of the parallel-applications layer (i.e. FT-MPI implementations). Our findings show that combining HPC and HA architectures is feasible and yields an HA cluster that can be used for High Performance Computing.

Simulation has become an indispensable tool for researchers to explore systems without having recourse to real experiments. Depending on the characteristics of the modeled system, the methods used to represent the system may vary. Multi-agent systems are often used to model and simulate complex systems. In any case, increasing the size and the precision of the model increases the amount of computation, requiring the use of parallel systems when it becomes too large. In this paper, we focus on parallel platforms that support multi-agent simulations and their execution on high performance resources such as parallel clusters. Our contribution is a survey of existing platforms and their evaluation in the context of high performance computing. We present a qualitative analysis of several multi-agent platforms, their tests in high performance computing execution environments, and the performance results for the only two platforms that fulfill the high performance computing constraints.

As the mean-time-between-failures (MTBF) continues to decline with the increasing number of components in large-scale high performance computing (HPC) systems, program failures are likely to occur during execution. Ensuring successful execution of HPC programs has become an issue that unprivileged users should be concerned about. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking because execution is controlled by the job management system and because of their limited privileges. Currently, automated tools for supporting user-level failure detection and autorecovery of parallel programs in HPC systems are missing. This paper proposes an innovative method for the unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experimental results show that ARL can detect execution failures effectively on the Tianhe-2 system. In addition, the communication and performance overhead caused by ARL is negligible. The good scalability of ARL makes it applicable to large-scale HPC systems.

We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithm-Based Fault Tolerance technique [K. Huang, J. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Transactions on Computers (Spec. Issue Reliable & Fault-Tolerant Comp.) 33 (1984) 518–528] to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit-flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix–matrix multiplication subroutine and we propose some models to predict its running time. Our parallel fault-tolerant matrix–matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when one process failure occurs. This represents 65% of the machine peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly.
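The checksum idea behind Algorithm-Based Fault Tolerance, cited in the abstract above, can be sketched on a single node. This is the classic Huang and Abraham scheme for detecting and correcting one corrupted element of a matrix product. It is not the paper's parallel adaptation, which also handles process failures and is not described in detail here. All function names are illustrative assumptions, and integer matrices are used to keep the checks exact.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def locate_and_correct(cf):
    """Detect and fix one corrupted data element of a full-checksum matrix.

    Returns the (row, column) that was repaired, or None if all checks pass.
    The matrix is modified in place.
    """
    n_rows, n_cols = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(cf[i][:n_cols]) != cf[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(cf[i][j] for i in range(n_rows)) != cf[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern not correctable by this sketch")
    i, j = bad_rows[0], bad_cols[0]
    # The row checksum tells us how far off the corrupted element is.
    cf[i][j] += cf[i][n_cols] - sum(cf[i][:n_cols])
    return (i, j)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Multiplying the encoded operands yields the product with both checksums.
c_full = matmul(add_column_checksum(a), add_row_checksum(b))

c_full[0][1] ^= 1 << 4           # simulate a bit-flip in one result element
repaired = locate_and_correct(c_full)

# Strip the checksum row and column to recover the corrected product.
c = [row[:-1] for row in c_full[:-1]]
assert repaired == (0, 1)
assert c == matmul(a, b)
```

The mismatching row checksum and column checksum intersect at the corrupted element, so a single error can be located and repaired without recomputing the product.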
