Similar Literature
20 similar documents retrieved (search time: 109 ms)
1.
We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure, termed SAGE (Percipient StorAGe for Exascale Data Centric Computing), as we head towards the era of Exascale computing. The SAGE system will be capable of storing and processing immense volumes of data in the Exascale regime and will allow Exascale-class applications to use such a storage infrastructure. SAGE addresses the increasing overlap between Big Data analysis and HPC in an era of next-generation data-centric computing, driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data must be processed, analysed and integrated into simulations to derive scientific and innovative insights. In particular, Exascale I/O, a problem that has not been sufficiently addressed for simulation codes, is a central concern of the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and to present early results obtained with some of its key methodologies as the system continues to evolve.

2.
The power consumption of modern high-performance computing (HPC) systems built from power-hungry commodity servers is one of the major hurdles to achieving Exascale computation. Several efforts have been made by the HPC community to encourage the use of low-power system-on-chip (SoC) embedded processors in large-scale HPC systems. These initiatives have successfully demonstrated the use of ARM SoCs in HPC systems, but the viability of these systems for HPC platforms still needs to be analyzed before a case can be made for Exascale computation. The major shortcomings of current ARM-HPC evaluations are the lack of detailed insight into performance on distributed multicore systems and into benchmark performance for large-scale HPC applications. In this paper, we present a comprehensive evaluation covering the major aspects of server and HPC benchmarking for ARM-based SoCs. For the experiments, we built an unconventional cluster of ARM Cortex-A9s, referred to as Weiser, and ran single-node benchmarks (STREAM, Sysbench, and PARSEC) and multi-node scientific benchmarks (High-Performance Linpack (HPL), the NASA Advanced Supercomputing (NAS) Parallel Benchmarks, and Gadget-2) to provide a baseline for the performance limitations of the system. Based on the experimental results, we claim that the performance of ARM SoCs depends heavily on memory bandwidth, network latency, application class, workload type, and support for compiler optimizations. During server-based benchmarking, we observed that for memory-intensive database transaction benchmarks, x86 performed 12% better in multithreaded query processing. However, ARM delivered a four times better performance-to-power ratio on a single core and 2.6 times better on four cores. We noticed that emulated double-precision floating point in Java resulted in three to four times slower performance compared with C for CPU-bound benchmarks. Even though Intel x86 performed slightly better in computation-oriented applications, ARM showed better scalability in I/O-bound applications for shared-memory benchmarks. We incorporated support for ARM into the MPJ-Express runtime and performed a comparative analysis of two widely used message passing libraries. We obtained similar results for network bandwidth, large-scale application scaling, floating-point performance, and cluster energy efficiency in the message passing evaluations (NPB and Gadget-2 with MPJ-Express and MPICH). Our findings can be used to evaluate the energy efficiency of ARM-based clusters for server and scientific workloads and to provide a guideline for building energy-efficient HPC clusters. Copyright © 2015 John Wiley & Sons, Ltd.

3.
The energy consumption of High Performance Computing (HPC) systems, which are the key technology for many modern computation-intensive applications, is rapidly increasing in parallel with their performance improvements. This increase leads HPC data centers to focus on three major challenges: the reduction of overall environmental impacts, which is driven by policy makers; the reduction of operating costs, which are increasing due to rising system density and electrical energy costs; and the 20 MW power consumption boundary for Exascale computing systems, which represent the next thousandfold increase in computing capability beyond the currently existing petascale systems. Energy efficiency improvements will play a major part in addressing these challenges. This paper presents a toolset, called Power Data Aggregation Monitor (PowerDAM), which collects and evaluates data from all aspects of the HPC data center (e.g. environmental information, site infrastructure, information technology systems, resource management systems, and applications). The aim of PowerDAM is not to improve the HPC data center's energy efficiency directly, but to collect the energy-relevant data without which energy efficiency improvements would be non-trivial and incomplete. Thus, PowerDAM represents a first step towards a truly unified energy efficiency evaluation toolset needed for improving the overall energy efficiency of HPC data centers.

4.
The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to maintain the current pace of performance improvements while keeping power consumption under a strict power budget, HPC systems must be more energy efficient than ever. As a response to this need, energy-efficient and low-power processors began to make their way into the market. In this paper we employ a novel low-power processor, the MPPA-256 manycore, to perform seismic wave propagation simulations. It has 256 cores connected by a NoC, no cache coherence, and only a limited amount of on-chip memory. We describe how its particular architectural characteristics influenced our solution for an energy-efficient implementation. As a counterpoint to the low-power MPPA-256 architecture, we employ the Xeon Phi, a performance-centric manycore. Although both processors share some architectural similarities, the challenges of implementing an efficient seismic wave propagation kernel on these platforms are very different. In this work we compare the performance and energy efficiency of our implementations for these processors with proven and optimized solutions for other hardware platforms such as general-purpose processors and a GPU. Our experimental results show that the MPPA-256 has the best energy efficiency, consuming at least 77% less energy than the other evaluated platforms, while the performance of our solution for the Xeon Phi is on par with a state-of-the-art solution for GPUs.

5.
As an alternative to traditional computing architectures, cloud computing is now growing rapidly, although it is generally based on models such as cluster computing. Supercomputers are becoming ever more powerful, helping scientists gain a deeper understanding of the world. At the same time, clusters of commodity servers have become mainstream in the IT industry, powering not only large Internet services but also a growing number of data-intensive scientific applications, such as MPI-based deep learning applications. To reduce energy costs, more and more effort is going into reducing the energy consumption of HPC systems. Because I/O accesses account for a large portion of the execution time of data-intensive applications, it is critical to design energy-aware parallel I/O functions to address HPC energy-efficiency challenges. As the de facto standard for designing parallel applications in cluster environments, the Message Passing Interface (MPI) is widely used in high performance computing; obtaining the energy consumption of MPI applications is therefore critical for improving the energy efficiency of HPC systems. In this work we first present our energy measurement tool, a software framework that eases energy data collection in cluster environments. We then present an approach that optimises the energy efficiency of parallel I/O operations. The energy scheduling algorithm is evaluated on a cluster.

6.
Due to the explosive increase of data from both the cyber and physical worlds, the demand for database support in embedded systems is growing. Databases for embedded systems, or embedded databases, are expected to provide timely in situ data services under various resource constraints, such as limited energy. However, traditional buffer cache management schemes, in which the primary goal is to minimize the number of I/O operations, are problematic since they do not consider the constraints of modern embedded devices, such as limited energy and distinctive underlying storage. In particular, due to the asymmetric read/write characteristics of the flash memory-based storage of modern embedded devices, minimizing buffer cache misses coincides neither with minimum power consumption nor with minimum I/O deadline misses. In this paper we propose a novel power- and time-aware buffer cache management scheme for embedded databases. A novel multi-dimensional feedback control architecture is proposed, and the characteristics of the underlying storage of modern embedded devices are exploited to simultaneously support the desired I/O power consumption and I/O deadline miss ratio. We show through extensive simulation that our approach satisfies both power and timing requirements for I/O operations under a variety of workloads while consuming significantly less buffer space than baseline approaches. A minimal illustration of the feedback idea appears below.
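The sketch below is only an illustration of a multi-input feedback loop of the kind the abstract describes, not the paper's controller: two integral-style controllers observe measured I/O power and deadline-miss ratio and nudge the buffer-cache size toward the larger of the two demands. All names, gains, and measurements are hypothetical.

```python
# Illustrative sketch of a feedback loop that resizes a buffer cache so that
# both an I/O power target and an I/O deadline-miss target are met.
# The measurements and gains are stand-ins, not values from the paper.

def control_step(buf_pages, power, power_target, miss_ratio, miss_target,
                 k_power=50.0, k_miss=2000.0, min_pages=64, max_pages=4096):
    """Return a new buffer size from the two tracking errors."""
    # Positive error => measurement above target => grow the cache to cut device I/O.
    err_power = power - power_target
    err_miss = miss_ratio - miss_target
    # Apply the more demanding of the two corrections.
    delta = max(k_power * err_power, k_miss * err_miss)
    return int(min(max_pages, max(min_pages, buf_pages + delta)))

if __name__ == "__main__":
    pages = 256
    # Fake (watts, deadline-miss ratio) samples for three control periods.
    samples = [(1.8, 0.04), (1.4, 0.02), (1.1, 0.009)]
    for watts, miss in samples:
        pages = control_step(pages, watts, power_target=1.0,
                             miss_ratio=miss, miss_target=0.01)
        print(f"power={watts}W miss={miss:.3f} -> buffer={pages} pages")
```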

7.
Dynamic power management (DPM) and dynamic voltage scaling (DVS) are crucial techniques for reducing the energy consumption of embedded real-time systems. Many previous studies have focused on the energy consumption of either the processor or the I/O devices. In this paper, we focus on energy management integrating DVS and DPM techniques for periodic embedded real-time applications under the rate monotonic (RM) policy, and present a system-level fixed-priority energy-efficient scheduling (SLFPEES) algorithm. The SLFPEES algorithm consists of I/O device scheduling and job scheduling. I/O device scheduling is based on dynamic power management with the rate monotonic policy (DPM-RM), which puts devices into the sleep state when the idle interval is larger than the device's break-even time (see the sketch below). Job scheduling is based on the RM policy and uses the stack resource protocol (SRP) to guarantee exclusive access to shared resources. For energy efficiency, the SLFPEES algorithm schedules tasks using a lower speed and a higher speed. The experimental results show that the SLFPEES algorithm yields significant energy savings with respect to existing techniques.
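As described, DPM-RM shuts a device down only when the predicted idle interval exceeds its break-even time. A minimal sketch of that classic test follows; the device numbers and function names are illustrative, not taken from the paper.

```python
# Sketch of the dynamic-power-management break-even test used by DPM-style policies:
# sleeping only pays off if the idle interval is at least the break-even time.

def break_even_time(p_active, p_sleep, e_transition, t_transition):
    """Shortest idle interval (s) for which sleeping saves energy."""
    # Energy saved while sleeping must cover the state-transition energy overhead.
    t_be = (e_transition - p_sleep * t_transition) / (p_active - p_sleep)
    return max(t_be, t_transition)

def should_sleep(predicted_idle, p_active, p_sleep, e_transition, t_transition):
    return predicted_idle >= break_even_time(p_active, p_sleep,
                                             e_transition, t_transition)

if __name__ == "__main__":
    # Hypothetical disk: 2.0 W active-idle, 0.2 W asleep, 12 J and 1.5 s to cycle.
    print(should_sleep(5.0, 2.0, 0.2, 12.0, 1.5))   # False: interval too short
    print(should_sleep(10.0, 2.0, 0.2, 12.0, 1.5))  # True: sleep saves energy
```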

8.
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters in terms of topology and routing, and supports routers' on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate the network behavior with different network structures and low-power modes. The results are likewise consistent with the theoretical analyses.

9.
RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computations that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches to maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE. We address it through hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of application latency and hardware reliability, obtained respectively through timing analysis and through modeling the thermal properties and mean time to failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case.
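The abstract notes that the accuracy of reliability predictions affects checkpointing overhead. One standard way to see that link (not RECIPE's own policy) is Young's first-order formula for the optimal checkpoint interval; the sketch below uses hypothetical numbers to show how an error in the predicted MTBF propagates into wasted time.

```python
# Young's first-order approximation for the optimal checkpoint interval:
#   t_opt ~= sqrt(2 * C * MTBF), with C the cost of writing one checkpoint.
# Used here only to illustrate the sensitivity to MTBF prediction accuracy.
import math

def optimal_interval(checkpoint_cost, mtbf):
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def waste_fraction(interval, checkpoint_cost, mtbf):
    """Approximate fraction of time lost to checkpoints plus expected rework."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)

if __name__ == "__main__":
    C = 60.0               # seconds to write a checkpoint (hypothetical)
    true_mtbf = 8 * 3600   # the system's real mean time between failures (s)
    for predicted in (true_mtbf, 0.5 * true_mtbf, 2.0 * true_mtbf):
        t = optimal_interval(C, predicted)
        print(f"predicted MTBF {predicted / 3600:.1f} h -> interval {t:.0f} s, "
              f"waste {100 * waste_fraction(t, C, true_mtbf):.2f}%")
```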

10.
In this paper, we present two heuristic energy-aware scheduling algorithms (EGMS and EGMSIV) for scheduling task precedence graphs in an embedded multiprocessor system having processing elements with dynamic voltage scaling capabilities. Unlike most energy-aware scheduling algorithms, which consider task ordering and voltage scaling separately from task mapping, our algorithms consider them in an integrated way. EGMS uses the concept of an energy gradient to select tasks to be mapped onto new processors and voltage levels. EGMSIV extends EGMS by introducing intra-task voltage scaling using a Linear Programming (LP) formulation to further reduce the energy consumption. Through rigorous simulations, we compare the performance of our proposed algorithms with several approaches presented in the literature. The results demonstrate that our algorithms are capable of obtaining energy-efficient schedules using less optimization time. On average, our algorithms produce schedules that consume 10% less energy with more than a 47% reduction in optimization time compared to other approaches in the literature. In particular, our algorithms perform better in generating energy-efficient schedules for larger task graphs. Our results show a reduction of up to 57% in energy consumption for larger task graphs compared to other approaches.
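A hedged reading of the energy-gradient idea: among candidate (task, processor, voltage) moves, pick the one with the best energy saving per unit of added schedule length, subject to the remaining deadline slack. The sketch below is our own illustration of that greedy step; the data structures and numbers are invented, and this is not the published EGMS algorithm.

```python
# Illustrative greedy step in the spirit of an "energy gradient" heuristic:
# each candidate remapping is scored by energy saved per unit of schedule-length
# increase, and the steepest-descent feasible candidate is applied first.
from dataclasses import dataclass

@dataclass
class Candidate:
    task: str
    target: str              # processor / voltage level to move the task to
    energy_saving: float     # joules saved by the move
    makespan_penalty: float  # seconds added to the schedule

def pick_next_move(candidates, deadline_slack):
    feasible = [c for c in candidates if c.makespan_penalty <= deadline_slack]
    if not feasible:
        return None
    # Energy gradient: saving per unit of schedule-length increase.
    return max(feasible,
               key=lambda c: c.energy_saving / max(c.makespan_penalty, 1e-9))

if __name__ == "__main__":
    moves = [
        Candidate("t1", "P2@0.9V", energy_saving=3.0, makespan_penalty=0.4),
        Candidate("t2", "P1@0.8V", energy_saving=5.0, makespan_penalty=2.5),
        Candidate("t3", "P3@1.0V", energy_saving=1.0, makespan_penalty=0.1),
    ]
    print(pick_next_move(moves, deadline_slack=1.0))  # picks t3 (steepest gradient)
```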

11.
Achieving high sustained simulation performance is the most important concern in the HPC community. To this end, many kinds of HPC system architectures have been proposed, and the diversity of HPC systems is growing rapidly. Under these circumstances, the vector-parallel supercomputer SX-ACE has been designed to achieve high sustained performance on memory-intensive applications by providing a high memory bandwidth commensurate with its high computational capability. This paper examines the potential of the modern vector-parallel supercomputer through the performance evaluation of SX-ACE using practical engineering and scientific applications. To improve the sustained simulation performance of practical applications, SX-ACE adopts an advanced memory subsystem with several new architectural features. This paper discusses how these features, such as MSHRs, a large on-chip memory, and novel vector processing mechanisms, are beneficial to achieving high sustained performance for large-scale engineering and scientific simulations. Evaluation results clearly indicate that the high sustained memory performance per core enables the modern vector supercomputer to achieve outstanding performance that is unreachable by simply increasing the number of fine-grain scalar processor cores. This paper also discusses the performance of the HPCG benchmark to evaluate the potential of supercomputers with balanced memory and computational performance against heterogeneous and cutting-edge scalar parallel systems.

12.
The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. HPC users need rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers, so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based on what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high energy cost, which reduces the profit margin of Cloud providers, but also to high carbon emissions, which are not environmentally sustainable. Hence, there is an urgent need for energy-efficient solutions that address the rising energy consumption from the perspective of both the Cloud provider and the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider. We consider a number of energy-efficiency factors (such as energy cost, carbon emission rate, workload, and CPU power efficiency) that vary across data centers depending on their location, architectural design, and management system. Our carbon/energy-based scheduling policies achieve on average up to 25% energy savings compared to profit-based scheduling policies, leading to higher profit and lower carbon emissions.
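A minimal sketch of the kind of carbon/energy-based placement the abstract describes: rank data centers by the carbon emitted (and cost incurred) for a unit of work, using each site's emission rate, energy price, and power efficiency. The scoring rule and all site figures below are illustrative, not the paper's policies.

```python
# Illustrative carbon/energy-aware placement: choose the data center that emits
# the least CO2 (cost as tie-breaker) for a given job. Site parameters are hypothetical.

def placement_score(site, job_cpu_hours):
    # Energy drawn for the job, inflated by the site's power (in)efficiency.
    energy_kwh = job_cpu_hours * site["kwh_per_cpu_hour"] / site["cpu_power_efficiency"]
    carbon = energy_kwh * site["carbon_kg_per_kwh"]
    cost = energy_kwh * site["energy_price_per_kwh"]
    return (carbon, cost)

def choose_site(sites, job_cpu_hours):
    return min(sites, key=lambda s: placement_score(s, job_cpu_hours))

if __name__ == "__main__":
    sites = [
        {"name": "dc-hydro", "carbon_kg_per_kwh": 0.02, "energy_price_per_kwh": 0.09,
         "kwh_per_cpu_hour": 0.12, "cpu_power_efficiency": 0.9},
        {"name": "dc-coal", "carbon_kg_per_kwh": 0.80, "energy_price_per_kwh": 0.05,
         "kwh_per_cpu_hour": 0.12, "cpu_power_efficiency": 1.0},
    ]
    best = choose_site(sites, job_cpu_hours=500)
    print("place job at", best["name"])
```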

13.
Facing the severe energy-consumption and environmental problems brought about by big data, building energy-saving green database systems has become a key requirement and an important challenge. Existing database systems mainly target performance optimization and lack awareness of, and optimization for, energy consumption. To address this, an energy-awareness model based on database workloads is proposed and applied to a solid-state-drive (SSD) based database system. First, the consumption of the main system resources (CPU and SSD) during workload execution is decomposed into time cost and power cost, and time-cost and power-cost models are built from the basic I/O types of SSD database workloads, yielding an energy-awareness model with a unified unit of resource cost for the database. Then, the model is solved using multiple linear regression, and its accuracy in estimating the energy consumption of database workloads with different I/O types is verified in both dedicated and contended environments. Finally, the experimental results are analyzed and the factors affecting the model's accuracy are discussed. Experiments show that the model is highly accurate: when the DBMS has exclusive use of system resources, the average error is 5.15% and the absolute error does not exceed 9.8%; the accuracy decreases somewhat in the contended environment, but the average error remains below 12.21%, so the model can effectively support building an energy-aware green database system.
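A hedged sketch of the modelling step described above: per-I/O-type counts from a workload are regressed against measured energy via ordinary least squares, giving unit costs that can then predict the energy of a new workload. The I/O categories, counts, and measurements below are synthetic placeholders, not data from the paper.

```python
# Sketch of an energy-awareness model fitted by multiple linear regression:
# energy ~= sum over basic I/O types of (count of that type) * (fitted unit cost).
import numpy as np

# Columns: sequential reads, random reads, sequential writes, random writes (counts).
X = np.array([
    [1000,  200,  300,  50],
    [ 400, 1500,  100, 400],
    [2000,  100, 1200,  80],
    [ 300,  900,  700, 600],
    [1500,  400,  200, 100],
], dtype=float)
measured_energy_j = np.array([5.1, 9.8, 8.9, 10.4, 6.0])   # synthetic measurements

# Least-squares fit of per-operation energy costs.
unit_cost, *_ = np.linalg.lstsq(X, measured_energy_j, rcond=None)
print("fitted J per op:", np.round(unit_cost, 5))

# Predict a new workload and report the relative error against a fake measurement.
new_workload = np.array([800, 600, 400, 200], dtype=float)
predicted = float(new_workload @ unit_cost)
measured = 7.3  # hypothetical
print(f"predicted {predicted:.2f} J, relative error "
      f"{abs(predicted - measured) / measured * 100:.1f}%")
```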

14.
Exascale computing is one of the major challenges of this decade, and several studies have shown that communication is becoming one of the bottlenecks for scaling parallel applications. Analysis of the characteristics of communications can effectively aid in improving the performance of scientific applications. In this paper, we focus on the statistical regularity of time-dimension communication characteristics for representative scientific applications on supercomputer systems, and show that the distribution of communication-event intervals has a power-law decay, which commonly appears in scientific and human activities. We verify that the distribution of communication-event intervals indeed has a power-law decay on the Tianhe-2 supercomputer, as well as on six other parallel systems with three different network topologies and two routing policies. To study the power-law distribution quantitatively, we exploit two groups of statistics: burstiness vs. memory and periodicity vs. dispersion. Our results indicate that the communication events show a “strong-bursty and weak-memory” characteristic, and that the communication-event intervals show periodicity and dispersion. Finally, our research provides insight into the relationship between communication optimizations and time-dimension communication characteristics.
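The "burstiness vs. memory" statistics mentioned above have a standard quantitative form (in the sense of Goh and Barabási): burstiness B = (σ − μ)/(σ + μ) of the inter-event intervals, and memory M = the correlation between consecutive intervals. The sketch below computes both on synthetic power-law-distributed intervals, not on real MPI traces.

```python
# Burstiness and memory coefficients of an inter-event interval sequence,
# evaluated on synthetic Pareto (power-law) intervals as a stand-in for traces.
import numpy as np

def burstiness(intervals):
    mu, sigma = np.mean(intervals), np.std(intervals)
    return (sigma - mu) / (sigma + mu)   # -1 regular, ~0 Poisson-like, +1 bursty

def memory(intervals):
    a, b = intervals[:-1], intervals[1:]
    return float(np.corrcoef(a, b)[0, 1])  # correlation of consecutive intervals

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    intervals = rng.pareto(1.5, size=10000) + 1.0   # heavy-tailed intervals
    print(f"burstiness B = {burstiness(intervals):.3f}")
    print(f"memory     M = {memory(intervals):.3f}")
```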

15.
With the rapid improvement of computation capability in high performance supercomputer systems, the imbalance between the computation subsystem and the storage subsystem has become more and more serious, especially as applications produce big data ranging from tens of gigabytes up to terabytes. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer system, with a peak performance of 54.9 Pflops, clearly has this kind of requirement for its storage system. This paper introduces the storage system of the MilkyWay-2 supercomputer, including the hardware architecture and the parallel file system. The storage system exploits a novel hybrid hierarchical storage architecture to enable high scalability of I/O clients, I/O bandwidth, and storage capacity. To fit this architecture, a user-level virtualized file system named H2FS is designed and implemented, which combines local storage and shared storage into a dynamic single namespace to optimize I/O performance in I/O-intensive applications. The evaluation results show that the storage system of the MilkyWay-2 supercomputer can satisfy the critical requirements of large-scale supercomputers, such as performance and scalability.

16.
Object-based parallel file systems have emerged as promising storage solutions for high-performance computing (HPC) systems. Although object storage provides a flexible interface, scheduling highly concurrent I/O requests that access a large number of objects remains a challenging problem, especially when stragglers (storage servers that are significantly slower than others) exist in the system. An efficient I/O scheduler needs to avoid possible stragglers to achieve low latency and high throughput. In this paper, we introduce log-assisted straggler-aware I/O scheduling to mitigate the impact of storage server stragglers. The contribution of this study is threefold. First, we introduce a client-side, log-assisted, straggler-aware I/O scheduler architecture to tackle the storage straggler issue in HPC systems. Second, we present three scheduling algorithms that, based on this architecture, make efficient scheduling decisions while avoiding stragglers. Third, we evaluate the proposed I/O scheduler using simulations, and the results confirm the promise of the newly introduced straggler-aware I/O scheduler.
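A minimal sketch of the client-side, log-assisted idea as we read it: each client keeps a small log of recent per-server latencies and routes the next request to the replica whose recent history looks least straggler-like. The server names, window size, and scoring rule are our own illustration, not the paper's three algorithms.

```python
# Sketch of a client-side, log-assisted straggler-aware request router:
# keep a sliding window of observed latencies per storage server and send the
# next request to the replica with the lowest recent high-percentile latency.
from collections import defaultdict, deque

class StragglerAwareRouter:
    def __init__(self, window=32):
        self.log = defaultdict(lambda: deque(maxlen=window))

    def record(self, server, latency_ms):
        self.log[server].append(latency_ms)

    def _score(self, server):
        hist = sorted(self.log[server])
        if not hist:
            return 0.0                           # no history: assume fast
        return hist[int(0.9 * (len(hist) - 1))]  # ~90th percentile latency

    def pick(self, replicas):
        return min(replicas, key=self._score)

if __name__ == "__main__":
    r = StragglerAwareRouter()
    for lat in (5, 6, 7, 5):
        r.record("oss1", lat)
    for lat in (5, 6, 300, 280):      # oss2 has started to straggle
        r.record("oss2", lat)
    print(r.pick(["oss1", "oss2"]))   # -> oss1
```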

17.
The increasing data demands of applications from various domains and the decreasing relative power cost of CPU computation have gradually exposed data movement as the prominent factor in the energy consumption of computing systems. The traditional organization of system software into a layered stack, while providing straightforward modularity, poses a significant challenge for the global optimization of data movement in particular and thus of energy efficiency in general. Optimizing the energy efficiency of data movement in large-scale systems is a difficult task because it depends on a complex interplay of factors at different system layers. In this work, we address the challenge of optimizing the data movement of the storage I/O stack in a holistic manner. Our approach consists of a model-based system driver that determines the current I/O power regime and adapts the CPU frequency level accordingly. On the one hand, to simplify the understanding of the relation between data movement and energy efficiency, this paper proposes novel energy prediction models for data movement based on series of runtime metrics from several I/O stack layers. We provide an in-depth study of the energy consumption along the data path, including the identification and analysis of power and performance regimes that synthesize the energy consumption patterns in a cross-layer approach. On the other hand, we propose and prototype a kernel driver that exploits data movement awareness to improve current CPU-centric energy management.
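A hedged sketch of the driver loop described above: classify the current I/O power regime from a few runtime metrics and pick a CPU frequency for it, lowering the clock when data movement dominates. The thresholds, regime names, and frequency table are invented placeholders, not the paper's models or the actual kernel driver.

```python
# Sketch of a model-based governor: infer the current I/O power regime from
# runtime counters and select a CPU frequency level for it. All thresholds and
# frequency steps are hypothetical.

FREQ_TABLE_MHZ = {"cpu_bound": 3000, "mixed": 2200, "io_bound": 1400}

def classify_regime(io_bytes_per_s, cpu_util):
    """Very rough regime classifier built from two runtime metrics."""
    if io_bytes_per_s > 500e6 and cpu_util < 0.4:
        return "io_bound"      # data movement dominates; the CPU mostly waits
    if io_bytes_per_s > 100e6:
        return "mixed"
    return "cpu_bound"

def pick_frequency(io_bytes_per_s, cpu_util):
    return FREQ_TABLE_MHZ[classify_regime(io_bytes_per_s, cpu_util)]

if __name__ == "__main__":
    samples = [(900e6, 0.25), (150e6, 0.7), (10e6, 0.95)]
    for bw, util in samples:
        print(f"bw={bw / 1e6:.0f} MB/s util={util:.2f} "
              f"-> {pick_frequency(bw, util)} MHz")
```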

18.
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops of computation and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development effort. We evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs and the Cell/B.E. The correlator is a streaming, real-time application and is much more I/O intensive than the applications typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general. The processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, this is only 16% on ATI GPUs and 32% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. and NVIDIA GPUs are the most energy-efficient solutions: they run the correlator at least 4 times more energy efficiently than the Blue Gene/P. The research presented is an important pathfinder for next-generation telescopes.
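The correlator kernel itself is compact: for every pair of antenna streams, accumulate the product of one stream with the complex conjugate of the other over an integration window, which is where the quadratic growth comes from. A small NumPy sketch on synthetic data follows; it is a generic illustration, not the LOFAR production code or any telescope's data format.

```python
# Minimal cross-correlator sketch: X[i, j] = sum over time of s_i(t) * conj(s_j(t)).
# The quadratic number of antenna pairs is what makes real correlators so costly.
import numpy as np

def correlate(samples):
    """samples: complex array of shape (n_antennas, n_times) for one channel."""
    # One matrix product produces all antenna-pair visibilities at once.
    return samples @ samples.conj().T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_ant, n_time = 64, 4096
    s = rng.standard_normal((n_ant, n_time)) + 1j * rng.standard_normal((n_ant, n_time))
    vis = correlate(s)                                 # (64, 64) visibility matrix
    print(vis.shape, np.allclose(vis, vis.conj().T))   # Hermitian by construction
```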

19.
Virtualized datacenters and clouds are being increasingly considered for traditional High-Performance Computing (HPC) workloads that have typically targeted Grids and conventional HPC platforms. However, maximizing the energy efficiency and utilization of datacenter resources, and minimizing undesired thermal behavior, while ensuring application performance and other Quality of Service (QoS) guarantees for HPC applications, requires careful consideration of important and extremely challenging tradeoffs. Virtual Machine (VM) migration is one of the most common techniques used to alleviate thermal anomalies (i.e., hotspots) in cloud datacenter servers, as it reduces load and, hence, server utilization. In this article, the benefits of using other techniques, such as voltage scaling and pinning (traditionally used for reducing energy consumption), for thermal management instead of VM migration are studied in detail. As no single technique is the most efficient for meeting temperature/performance optimization goals in all situations, an autonomic approach that performs energy-efficient thermal management while ensuring the QoS delivered to the users is proposed. To address the problem of VM allocation that arises during VM migrations, an innovative application-centric energy-aware strategy for VM allocation is proposed. The proposed strategy ensures high resource utilization and energy efficiency through VM consolidation while satisfying application QoS, by exploiting knowledge obtained through application profiling along multiple dimensions (CPU, memory, and network bandwidth utilization). To support our arguments, we present results from an experimental evaluation on real hardware using HPC workloads under different scenarios.
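A hedged sketch of profile-driven, multi-dimensional VM placement in the spirit described above: score each host by how tightly the VM's profiled CPU/memory/network demand fills its remaining capacity, preferring consolidation while keeping a headroom margin for QoS. The fit rule, headroom value, and all numbers are illustrative only, not the paper's strategy.

```python
# Illustrative application-profile-driven VM placement: pack the VM onto the
# host it fills most tightly across CPU, memory, and network, with a QoS headroom.

HEADROOM = 0.9   # never fill any dimension beyond 90% (hypothetical margin)
DIMS = ("cpu", "mem", "net")

def fits(vm, host):
    return all(host["used"][d] + vm[d] <= HEADROOM * host["cap"][d] for d in DIMS)

def tightness(vm, host):
    # Higher means less slack left -> better consolidation.
    return sum((host["used"][d] + vm[d]) / host["cap"][d] for d in DIMS)

def place(vm, hosts):
    feasible = [h for h in hosts if fits(vm, h)]
    return max(feasible, key=lambda h: tightness(vm, h)) if feasible else None

if __name__ == "__main__":
    vm = {"cpu": 2, "mem": 4, "net": 100}   # profiled demand (cores, GB, Mbit/s)
    hosts = [
        {"name": "h1", "cap": {"cpu": 16, "mem": 64, "net": 1000},
         "used": {"cpu": 12, "mem": 40, "net": 600}},
        {"name": "h2", "cap": {"cpu": 16, "mem": 64, "net": 1000},
         "used": {"cpu": 4, "mem": 8, "net": 100}},
    ]
    target = place(vm, hosts)
    print(target["name"] if target else "no feasible host")   # -> h1 (consolidates)
```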

20.
Efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects dedicated to large-scale distributed computing systems have designed and developed resource allocation mechanisms with a variety of architectures and services. In this study, we report a comprehensive survey describing resource allocation in various HPC systems. The aim of the work is to aggregate existing HPC solutions under a joint framework and to provide a thorough analysis and characterization of resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role in improving the performance of all classes of HPC systems. Therefore, a comprehensive discussion of widely used resource allocation strategies deployed in HPC environments is required, which is one of the motivations of this survey. Moreover, we classify HPC systems into three broad categories, namely: (a) cluster, (b) grid, and (c) cloud systems, and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify the approaches followed in the implementation of existing resource allocation strategies that are widely presented in the literature.
