
1.

A survey of the research on power management techniques for high‐performance systems

Yongpeng Liu, Hong Zhu 《Software》2010,40(11):943-964

This paper surveys research on power management techniques for high‐performance systems, including both commercial high‐performance clusters and scientific high‐performance computing (HPC) systems. Power consumption has rapidly risen to an intolerable scale. This results in both high operating costs and high failure rates, so it is now a major concern and poses new challenges for the development of high‐performance systems. In this paper, we first review the basic mechanisms that underlie power management techniques. Then we survey two fundamental techniques for power management: metrics and profiling. After that, we review the research for the two major types of high‐performance systems: commercial clusters and supercomputers. Based on this, we discuss the new opportunities and problems presented by the recent adoption of virtualization techniques and present the most recent research in this area. Finally, we summarize and discuss future research directions. Copyright © 2010 John Wiley & Sons, Ltd.

2.

The FEAST indices—realistic evaluation of modern software components and processor technologies

《Computers & Mathematics with Applications》2001,41(10-11):1431-1464

We examine the computational efficiency of linear algebra components in iterative solvers for grid-oriented simulations of PDEs. While the standard sparse matrix-vector (MV) techniques show significant losses of performance, especially on modern processors, our sparse banded components have the potential to exploit today's high computing power. We explain the major concepts of the FEAST software, which contains such highly tuned numerical linear algebra basic components (Sparse Banded Blas) up to complete multigrid solvers, all being optimized with respect to the actual hardware platform. Based on algorithmic and computational studies, we present the FEAST indices, which are indicators for the true performance of many modern processors, depending on the underlying FEM space, the problem size and the implementation style. These indices allow a new rating of the various hardware platforms with regard to different mathematical solution strategies, for academic and realistic numerical problems and ranging from ‘low cost’ PCs up to supercomputers.
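The abstract attributes FEAST's speed to sparse banded components rather than general sparse matrix-vector techniques. FEAST's own Sparse Banded Blas routines are not shown in the source. The code below is a generic, hypothetical diagonal-storage (DIA) matrix-vector product in Python. It assumes the matrix comes from a structured grid, so its nonzeros lie on a few fixed diagonals. Each band is traversed with unit stride and without the per-entry column indices that a compressed-row (CSR) format needs. Banded kernels exploit exactly this property.

```python
from typing import Dict, List


def banded_matvec(diags: Dict[int, List[float]], x: List[float]) -> List[float]:
    """Compute y = A @ x for an n-by-n matrix A stored by diagonals (DIA format).

    diags maps a diagonal offset k (0 = main, k > 0 = above, k < 0 = below)
    to a length-n list whose entry i holds A[i][i + k]; entries that would
    fall outside the matrix are ignored.
    """
    n = len(x)
    y = [0.0] * n
    for k, band in diags.items():
        # Rows i for which column i + k lies inside [0, n).
        lo, hi = max(0, -k), min(n, n - k)
        for i in range(lo, hi):
            # Unit-stride access to band and x: no column-index array as in CSR.
            y[i] += band[i] * x[i + k]
    return y


# Hypothetical example: 1D Poisson stencil [-1, 2, -1] on 4 unknowns.
n = 4
stencil = {-1: [-1.0] * n, 0: [2.0] * n, 1: [-1.0] * n}
print(banded_matvec(stencil, [1.0, 1.0, 1.0, 1.0]))  # [1.0, 0.0, 0.0, 1.0]
```

On a 2D or 3D tensor-product grid, the same idea applies with more bands at fixed offsets (for example ±1 and ±nx for a 5-point stencil).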

3.

A new rule-based power-aware job scheduler for supercomputers

Jun Wang, Dezhi Han, Ruijun Wang 《The Journal of Supercomputing》2018,74(6):2508-2527

The fast processing speeds of the current generation of supercomputers are a great convenience to scientists dealing with extremely large data sets. The next generation of exascale supercomputers could, for the very first time, provide accurate simulation results for the automobile industry, the aerospace industry, and even nuclear fusion reactors. However, the energy cost of supercomputing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers have become critical in recent years. Many researchers have studied this problem and are trying to conserve energy by incorporating the dynamic voltage and frequency scaling (DVFS) technique into their methods. However, this approach is limited, especially when the workload is high. In this paper, we develop a power-aware job scheduler that applies a rule-based control method and takes real-world power and speedup profiles into account, improving power efficiency while adhering to predetermined power constraints. Extensive simulation results show that, compared with baseline scheduling algorithms, our method achieves the maximum utilization of computing resources while keeping the energy cost under the threshold. Moreover, by introducing a power performance factor based on the real-world power and speedup profiles, we increase power efficiency by up to 75%.
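The abstract does not list the scheduler's rules. The Python sketch below only illustrates the general shape of a rule-based, power-capped scheduler, under these assumptions:

- Each job has a hypothetical profile that maps node counts to (speedup, power in kW).
- The facility imposes a fixed power cap.
- The "power performance factor" is taken to be speedup per kW.

All names, rules, and numbers are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    name: str
    # Hypothetical measured profile: node count -> (speedup, power draw in kW).
    profile: Dict[int, Tuple[float, float]]


def power_performance_factor(speedup: float, power_kw: float) -> float:
    """Hypothetical PPF: speedup gained per kW drawn."""
    return speedup / power_kw


def schedule(queue: List[Job], free_nodes: int, power_cap_kw: float):
    """Start jobs from a FIFO queue without exceeding node or power limits."""
    started: List[Tuple[str, int]] = []
    power_used = 0.0
    for job in queue:
        # Rule 1: keep only configurations that fit the free nodes and power budget.
        options = [
            (nodes, s, p)
            for nodes, (s, p) in job.profile.items()
            if nodes <= free_nodes and power_used + p <= power_cap_kw
        ]
        # Rule 2: if nothing fits, skip this job and let later jobs backfill.
        if not options:
            continue
        # Rule 3: among feasible sizes, pick the best speedup per kW.
        nodes, _, p = max(options, key=lambda o: power_performance_factor(o[1], o[2]))
        free_nodes -= nodes
        power_used += p
        started.append((job.name, nodes))
    return started, power_used


queue = [
    Job("A", {4: (3.5, 10.0), 8: (6.0, 20.0)}),
    Job("B", {8: (7.0, 18.0)}),
    Job("C", {2: (1.9, 5.0)}),
]
print(schedule(queue, free_nodes=16, power_cap_kw=25.0))  # ([('A', 4), ('C', 2)], 15.0)
```

Job B is skipped because starting it would push the draw to 28 kW, above the 25 kW cap. Job C then backfills within the remaining budget.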

4.

Energy-efficient deadline scheduling for heterogeneous systems

Yan Ma, Bin Gong, Ryo Sugihara, Rajesh Gupta 《Journal of Parallel and Distributed Computing》2012

Energy efficiency is a major concern in modern high-performance computing (HPC) systems, and power-aware scheduling is a promising way to achieve it. While a number of studies address power-aware scheduling by means of dynamic power management (DPM) and/or dynamic voltage and frequency scaling (DVFS) techniques, most of them only consider scheduling at a steady state. However, HPC applications such as scientific visualization often need deadline constraints to guarantee timely completion. In this paper we present power-aware scheduling algorithms with deadline constraints for heterogeneous systems. We formulate the problem by extending traditional multiprocessor scheduling, and design approximation algorithms with an analysis of their worst-case performance. We also present a pricing scheme in which the price of a task varies with its energy usage and depends largely on the tightness of its deadline. Finally, we extend the proposed algorithm to control dependence graphs and to the more realistic online case. Through extensive experiments, we demonstrate that the proposed algorithm achieves near-optimal energy efficiency: on average 16.4% better for synthetic workloads and 12.9% better for realistic workloads than the EDD (Earliest Due Date)-based algorithm. The extended online algorithm also outperforms the EDF (Earliest Deadline First)-based algorithm, with average improvements of up to 26% in energy saving and 22% in deadline satisfaction. Experiments also show that the pricing scheme provides a flexible trade-off between deadline tightness and price.
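To make the deadline/DVFS trade-off concrete, here is a deliberately simplified Python sketch. It is not the paper's approximation algorithm. It assumes:

- a single processor;
- independent tasks, all released at time zero;
- discrete frequency levels;
- supply voltage that scales with frequency, so dynamic energy per cycle grows roughly with f².

Under these assumptions, the code checks EDF feasibility at a uniform frequency and picks the lowest feasible level. Task sizes, deadlines, and levels are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (work in 10^9 cycles, absolute deadline in seconds); all tasks released at t = 0.
Task = Tuple[float, float]


def edf_feasible(tasks: Sequence[Task], freq_ghz: float) -> bool:
    """Run tasks back-to-back in Earliest-Deadline-First order at one frequency."""
    t = 0.0
    for gcycles, deadline in sorted(tasks, key=lambda task: task[1]):
        t += gcycles / freq_ghz  # seconds spent on this task
        if t > deadline + 1e-12:
            return False
    return True


def lowest_feasible_frequency(tasks: Sequence[Task],
                              levels_ghz: Sequence[float]) -> Optional[float]:
    """Lowest available level at which every deadline is met, or None."""
    for f in sorted(levels_ghz):
        if edf_feasible(tasks, f):
            return f
    return None


def dynamic_energy(tasks: Sequence[Task], freq_ghz: float, k: float = 1.0) -> float:
    """Relative dynamic energy, assuming V scales with f so energy/cycle ~ k * f**2."""
    return k * freq_ghz ** 2 * sum(g for g, _ in tasks)


tasks = [(2.0, 2.0), (1.0, 4.0)]
print(lowest_feasible_frequency(tasks, [1.0, 1.5, 2.0]))  # 1.0
```

Running these tasks at 1.0 GHz instead of 2.0 GHz cuts the modeled dynamic energy by a factor of four. The heterogeneous, multiprocessor, and online cases in the paper are far harder than this sketch.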

5.

A new hybrid solver with two‐level parallel computing for large‐scale structural analysis

Xinqiang Miao, Xianlong Jin, Junhong Ding 《Concurrency and Computation》2015,27(14):3661-3675

With the advancement of new processor and memory architectures, multicore, multinode supercomputers have become general tools for large‐scale engineering and scientific simulations. However, the nonuniform latencies of intranode and internode communication on these machines introduce new challenges that must be addressed to achieve optimal performance. In this paper, a novel hybrid solver designed specifically for multicore, multinode supercomputers is proposed. The new hybrid solver is characterized by a two‐level parallel computing approach based on the strategies of two‐level partitioning and two‐level condensation. It distinguishes intranode from internode communication to minimize communication overhead. Moreover, it reduces the size of the interface equation system to improve its convergence rate. Three numerical experiments of linear static structural analysis were conducted on the DAWNING‐5000A supercomputer to demonstrate the validity and efficiency of the proposed method. Test results show that the proposed approach outperforms the conventional Schur complement method. Copyright © 2014 John Wiley & Sons, Ltd.
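For reference, the conventional Schur complement (static condensation) method, which the abstract uses as its baseline, eliminates the subdomain-interior unknowns and then solves a smaller interface system. With interior unknowns u_I and interface unknowns u_Γ, the standard relations are shown below. The abstract gives no details of how the paper nests this elimination in its two-level scheme. Applying it once within a node and once across nodes is one plausible reading, and it is only an assumption here.

```latex
\begin{aligned}
&\begin{bmatrix} K_{II} & K_{I\Gamma} \\ K_{\Gamma I} & K_{\Gamma\Gamma} \end{bmatrix}
 \begin{bmatrix} u_I \\ u_\Gamma \end{bmatrix}
 = \begin{bmatrix} f_I \\ f_\Gamma \end{bmatrix}
 \;\Longrightarrow\;
 S\,u_\Gamma = g, \qquad
 u_I = K_{II}^{-1}\left(f_I - K_{I\Gamma}\,u_\Gamma\right), \\[4pt]
&S = K_{\Gamma\Gamma} - K_{\Gamma I} K_{II}^{-1} K_{I\Gamma}, \qquad
 g = f_\Gamma - K_{\Gamma I} K_{II}^{-1} f_I, \\[4pt]
&S = \sum_{s=1}^{N} R_s^{\mathsf T}\!\left(K^{(s)}_{\Gamma\Gamma}
   - K^{(s)}_{\Gamma I}\big(K^{(s)}_{II}\big)^{-1}K^{(s)}_{I\Gamma}\right) R_s, \qquad
 g = \sum_{s=1}^{N} R_s^{\mathsf T}\!\left(f^{(s)}_{\Gamma}
   - K^{(s)}_{\Gamma I}\big(K^{(s)}_{II}\big)^{-1} f^{(s)}_{I}\right).
\end{aligned}
```

In the last line, R_s restricts the global interface vector to subdomain s. Because K_II is block diagonal over subdomains, each local term can be formed independently and in parallel. Only the interface system S u_Γ = g couples the subdomains, and shrinking that system is what the abstract credits for faster convergence.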

6.

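High-efficiency computing for artificial intelligence and big data

Li Kenli, Yang Wangdong, Chen Cen, Chen Jianguo, Ding Yan 《Frontiers of Data & Computing》2020,2(1):27-37

[Objective] This paper analyzes the main challenges that the rapidly growing data scale of artificial intelligence (AI) and big data applications poses to computer systems. In light of trends in computer system development, it proposes several research directions in high-efficiency computing for AI and big data that urgently need to be addressed. [Scope] The paper extensively reviews recent research, in China and abroad, on performing big data and AI computation on supercomputing and high-performance computing platforms, together with the challenging problems this research addresses. [Methods] Big data provides increasingly rich training datasets for AI, but it also places higher demands on the computing power of computer systems. In recent years, Chinese supercomputers have ranked among the world's leaders, providing strong platform support for large-scale big data and AI applications. [Results] Most current high-performance computing platforms, typified by supercomputers, are heterogeneous parallel systems built from CPUs plus accelerators, whose large numbers of compute cores can provide powerful computing capability for AI and big data applications. [Limitations] Because these architectures are complex, fully exploiting their computing power and improving computational efficiency remain major challenges. Improving parallel efficiency is especially difficult for AI and big data workloads, which differ from traditional scientific computing. [Conclusions] Research on high-efficiency computing is therefore needed at every level: low-level resource management, task scheduling, fundamental algorithm design, and communication optimization, up to model parallelization and parallel programming. Such research would comprehensively improve the computational efficiency of AI and big data applications on high-performance computing platforms.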

7.

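A survey of energy efficiency in job scheduling for high-performance computing

Zheng Wenxu, Pan Xiaodong, Ma Di, Wang Hao 《Computer Engineering & Science》2019,41(9):1526-1533

As demand for high-performance computing from scientific research and commercial applications keeps growing, the performance and scale of HPC systems have developed rapidly. However, sharply rising power consumption severely constrains the design and use of HPC systems, making low-power techniques a key technology in the HPC field. The job scheduling system is a core component of the whole system: it allocates the limited system resources to the jobs users submit. Its energy efficiency therefore plays a crucial role in controlling and regulating the energy consumption of the entire HPC system. This paper first introduces the main energy-efficiency techniques and commonly used job scheduling strategies. It then analyzes the current state of energy efficiency in HPC job scheduling and discusses the challenges it faces and future directions.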

8.

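Research on high-resolution numerical computation

Zhang Xiaoxia, Hao Yizheng, Shao Jingyun, Yuan Guoxing 《Computer Engineering & Science》2011,33(6):102

High-resolution computation is an extremely important and complex research problem in high-confidence computing. Compared with traditional numerical computation, high-resolution computation places very high demands on both the computer system and the application program (physical modeling, parameters, computational methods, algorithms, etc.). The development of parallel computers has made large-scale scientific computing, and in particular higher numerical resolution, possible. At the same time, higher resolution imposes new and more stringent requirements on computing power, computational methods, physical modeling, and parameters. Using a two-dimensional hydrodynamics code that computes a planar detonation problem as an example, this paper studies the problems that arise as resolution increases. These problems involve the initial detonation region, the time step, mesh construction, artificial viscosity, computer simulation error, and growth in computational cost. The paper proposes corresponding solutions that improve computational accuracy.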
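The abstract lists growth in computational cost among the problems that arise as resolution increases. A standard back-of-the-envelope estimate follows. It assumes, although the abstract does not state this, an explicit time-stepping scheme limited by the CFL condition on a uniform 2D mesh with spacing h, flow speed |u|, sound speed c, and a fixed simulated end time T. Refining the mesh by a factor r (h → h/r) then gives:

```latex
\Delta t \le C_{\mathrm{CFL}}\,\frac{h}{\max\left(|u| + c\right)}, \qquad
N_{\text{cells}} \propto h^{-2}, \qquad
N_{\text{steps}} = \frac{T}{\Delta t} \propto h^{-1}, \qquad
W \propto N_{\text{cells}}\,N_{\text{steps}} \propto h^{-3}
\;\Longrightarrow\;
\frac{W(h/r)}{W(h)} = r^{3}.
```

For example, halving the mesh spacing (r = 2) multiplies the work by about 8 and the memory for cell data by about 4. This is one reason the abstract treats higher resolution as a demand on both computing power and numerical methods.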

9.

Comparing system level power management policies

Yung-Hsiang Lu, G. De Micheli 《IEEE Design & Test of Computers》2001,18(2):10-19

Reducing power consumption is a challenge to system designers. Portable systems, such as laptop computers and personal digital assistants (PDAs), draw power from batteries, so reducing power consumption extends their operating times. For desktop computers or servers, high power consumption raises temperature and deteriorates performance and reliability. Soaring energy prices and rising concern about the environmental impact of electronics systems further highlight the importance of low power consumption. Power reduction techniques can be classified as static and dynamic. Static techniques, such as synthesis and compilation for low power, are applied at design time. In contrast, dynamic techniques use runtime behavior to reduce power when systems are serving light workloads or are idle. These techniques are known as dynamic power management (DPM). DPM can be achieved in different ways; for example, dynamic voltage scaling (DVS) changes supply voltage at runtime as a method of power management. Here, we use DPM specifically for shutting down unused I/O devices. We built an experimental environment on a laptop computer running Microsoft Windows. We implemented existing power management policies and quantitatively compared their effects on power saving and performance degradation.
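A fixed timeout is one classic shutdown policy for DPM. The Python sketch below uses hypothetical power and transition-energy values to compute two quantities:

- the break-even idle time, below which sleeping cannot pay back the shutdown and wake-up energy;
- the energy and wake-up count of a timeout policy over a trace of idle periods.

The sketch lumps the transition cost into a single energy term and ignores transition latency. The article also measures performance degradation, which this sketch approximates only by the number of wake-ups.

```python
from typing import Iterable, Tuple


def break_even_time(p_on: float, p_sleep: float, e_transition: float) -> float:
    """Shortest idle period (s) for which sleeping saves energy (latency ignored)."""
    return e_transition / (p_on - p_sleep)


def timeout_policy(idle_periods: Iterable[float], timeout: float, p_on: float,
                   p_sleep: float, e_transition: float) -> Tuple[float, int]:
    """Energy (J) and number of wake-ups for a fixed-timeout shutdown policy."""
    energy, wakeups = 0.0, 0
    for t in idle_periods:
        if t <= timeout:
            energy += p_on * t  # device never sleeps during this idle period
        else:
            # Idle at full power until the timeout fires, pay the shutdown and
            # wake-up energy, then sleep for the rest of the period.
            energy += p_on * timeout + e_transition + p_sleep * (t - timeout)
            wakeups += 1  # each wake-up is a potential response-time penalty
    return energy, wakeups


# Hypothetical device: 2.5 W idle-on, 0.5 W asleep, 10 J per sleep/wake cycle.
print(break_even_time(2.5, 0.5, 10.0))                      # 5.0
print(timeout_policy([1.0, 10.0], 2.0, 2.5, 0.5, 10.0))     # (21.5, 1)
```

For comparison, staying on through the same two idle periods would cost 2.5 W × 11 s = 27.5 J. A shorter timeout saves more energy on long idle periods but wakes the device more often.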

10.

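A green heterogeneous scheduling algorithm based on the deep integration of software and hardware energy-saving principles

Wang Jinglian, Gong Bin, Liu Hong, Li Shaohui 《Journal of Software》2021,32(12):3768-3781

The evolution of virtualized cloud computing from high-performance toward high-efficiency computing is an urgent need for environmental protection and sustainable human development. At present, however, two problems stand out. First, the physical energy-saving headroom at the hardware level needs to be moderately extended. Second, most metaheuristic scheduling middleware, typified by genetic or artificial immune algorithms, lacks sufficient evolutionary drive, which makes the conflict between convergence and distribution (diversity) of solutions hard to balance. In fact, every candidate solution (scheduling scheme) carries a certain physical feedback effect. The nonlinearity and heterogeneity of the resources to be allocated mean that the real-time, dynamic energy-efficiency feedback differs greatly between schemes. The approach of this paper is therefore to respect these physical laws and leverage hardware energy-saving principles to inject new momentum into algorithm optimization, further strengthening the leading role of software methods in saving energy. On this basis, the paper proposes a new green heterogeneous scheduling algorithm (GHSA_di/Ⅱ) centered on the deep integration of software and hardware energy-saving principles. The algorithm strengthens, from multiple angles, the internal driving force of the co-evolutionary simulation in metaheuristic algorithms. Extensive simulation results show that, for both data-intensive and compute-intensive instances, GHSA_di/Ⅱ has clear advantages over three other metaheuristic heterogeneous scheduling algorithms in overall performance, energy saving, and scalability.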

11.

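Energy management of virtualized cloud computing platforms

Ye Kejiang, Wu Zhaohui, Jiang Xiaohong, He Qinming 《Chinese Journal of Computers》2012,35(6):1262-1285

The high energy consumption of data centers is a pressing problem. In recent years, virtualization technology and the cloud computing model have developed rapidly. Because they offer high resource utilization, flexible management, and good scalability, future data centers will widely adopt virtualization and cloud computing. Combining traditional energy management techniques with virtualization offers a new approach to energy management in cloud data centers and is an important research direction. This paper reviews the latest research on energy management of virtualized cloud computing platforms from four aspects: energy measurement, energy modeling, energy management mechanisms, and energy management optimization algorithms. The paper:

- analyzes the operational and energy management problems that virtualized cloud platforms face, and identifies the difficulties of monitoring and measuring energy on such platforms;
- introduces the steps of energy monitoring and methods for energy profiling;
- presents an overall energy model for virtual machine systems, together with energy models for the two key techniques themselves, server consolidation and live migration;
- summarizes progress in energy management mechanisms at two levels, the virtualization layer and the cloud platform layer;
- classifies and compares energy management algorithms.

Finally, it summarizes the work and proposes ten future directions worthy of further research.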

12.

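Energy consumption measurement and management methods for cloud computing environments

Lin Weiwei, Wu Wentai 《Journal of Software》2016,27(4):1026-1041

Cloud computing has brought about a major transformation in computer science, but it has also inevitably created an increasingly prominent energy consumption problem. Energy management for cloud computing has therefore become a research hotspot in recent years. Measuring and managing the energy consumption of cloud systems bears directly on the sustainable development of cloud computing. Energy data are essential for building energy models and are also the basis for evaluating cloud resource scheduling algorithms. Based on a broad study of existing measurement methods, this paper identifies four categories of energy measurement methods for current cloud environments:

- direct measurement based on software or hardware;
- estimation based on energy models;
- measurement based on virtualization technology;
- simulation-based energy evaluation.

The paper analyzes and compares their strengths, weaknesses, and applicable environments. On this basis, it identifies important future research trends in cloud energy management:

- intelligent host power modules;
- energy models for different types of applications;
- energy models for mixed task workloads;
- efficient, dynamically manageable cloud simulation tools;
- energy management for dynamic heterogeneous distributed clusters;
- energy-saving methods for big data analytics and task scheduling;
- energy-saving planning in environments powered by renewable energy.

These trends point out directions for research on energy saving in cloud computing.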
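Of the four measurement approaches listed, model-based estimation is the simplest to illustrate. A widely used assumption is that server power grows linearly with CPU utilization between an idle value and a peak value. The Python sketch below applies that linear model to sampled utilization. For the virtualization-based case, it also splits host power among VMs with a hypothetical attribution rule: idle power is shared equally, and dynamic power is assigned in proportion to each VM's utilization. The model, the rule, and the wattages are assumptions and are not taken from the paper.

```python
from typing import Dict, Sequence


def linear_power(u: float, p_idle: float, p_max: float) -> float:
    """Estimated host power (W) at CPU utilization u in [0, 1] (linear model)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return p_idle + (p_max - p_idle) * u


def estimate_energy(samples: Sequence[float], interval_s: float,
                    p_idle: float, p_max: float) -> float:
    """Energy (J) from utilization sampled every interval_s seconds (rectangle rule)."""
    return sum(linear_power(u, p_idle, p_max) for u in samples) * interval_s


def attribute_to_vms(vm_util: Dict[str, float], p_idle: float,
                     p_max: float) -> Dict[str, float]:
    """Hypothetical split: idle power shared equally, dynamic power by VM utilization.

    Each value in vm_util is that VM's share of host CPU, so they should sum to <= 1.
    """
    idle_share = p_idle / len(vm_util)
    return {vm: idle_share + (p_max - p_idle) * u for vm, u in vm_util.items()}


# Hypothetical server: 100 W idle, 200 W at full load, sampled once per second.
print(estimate_energy([0.0, 0.5, 1.0], 1.0, 100.0, 200.0))  # 450.0
```

The VM shares add up to the host estimate at the combined utilization (here 70 W + 80 W = 150 W at u = 0.5). A real measurement study would calibrate p_idle and p_max per machine.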

13.

An Energy-Oriented Evaluation of Buffer Cache Algorithms Using Parallel I/O Workloads

Jianhui Yue, Yifeng Zhu, Zhao Cai 《IEEE Transactions on Parallel and Distributed Systems》2008,19(11):1565-1578

Power consumption is an important issue for cluster supercomputers as it directly affects running cost and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states and enable free rides by overlapping multiple DMA transfers from different I/O buses to the same memory device. To achieve maximum energy saving, the memory management on data servers needs to judiciously utilize these energy-aware devices. As we explore different management schemes under five real-world parallel I/O workloads, we find that the memory energy behavior is determined by a complex interaction among four important factors: (1) cache hit rates that may directly translate performance gain into energy saving, (2) cache populating schemes that perform buffer allocation and affect access locality at the chip level, (3) request clustering that aims to temporally align memory transfers from different buses into the same memory chips, and (4) access patterns in workloads that affect the first three factors.

14.

Power saving-aware prefetching for SSD-based systems

Laura Prada, Javier Garcia, J. Daniel Garcia, Jesus Carretero 《The Journal of Supercomputing》2011,58(3):323-331

Energy saving for computing systems has recently become an important and pressing need. Energy demand has been increasing in many systems, especially in data centers and supercomputers. This article considers the problem of saving energy in storage systems by taking advantage of SSD drives. SSDs and magnetic disks have different power characteristics: SSDs consume much less power than conventional magnetic disk drives.

15.

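Evaluation and analysis of processor power characteristics in high-performance computing

Liu Yongpeng, Lu Kai, Liu Yongyan, Wu Linping, Chen Juan 《Computer Engineering & Science》2009,31(11)

The system architecture and application patterns of high-performance computing systems differ greatly from those of single-machine systems or commercial cluster servers, and understanding their power characteristics is a prerequisite for improving energy efficiency. This paper divides the low-power techniques that support power management into two classes, dynamic resource sleeping and dynamic rate scaling, and evaluates how these two processor mechanisms work in high-performance computing. The evaluation verifies the effectiveness of power management in HPC and quantitatively analyzes processor power characteristics. The paper also points out shortcomings of current management schemes and proposes improvements, which offer important guidance for further energy optimization.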

16.

Accelerating geoscience and engineering system simulations on graphics hardware

Stuart D.C. Walsh, Martin O. Saar, Peter Bailey, David J. Lilja 《Computers & Geosciences》2009,35(12):2353-2364

Many complex natural systems studied in the geosciences are characterized by simple local-scale interactions that result in complex emergent behavior. Simulations of these systems, often implemented in parallel using standard central processing unit (CPU) clusters, may be better suited to parallel processing environments with large numbers of simple processors. Such an environment is found in graphics processing units (GPUs) on graphics cards. This paper discusses GPU implementations of three example applications from computational fluid dynamics, seismic wave propagation, and rock magnetism. These candidate applications involve important numerical modeling techniques, widely employed in physical system simulations, that are themselves examples of distinct computing classes identified as fundamental to scientific and engineering computing. The presented numerical methods (and the respective computing classes they belong to) are: (1) a lattice-Boltzmann code for geofluid dynamics (structured grid class); (2) a spectral-finite-element code for seismic wave propagation simulations (sparse linear algebra class); and (3) a least-squares minimization code for interpreting magnetic force microscopy data (dense linear algebra class). Significant performance increases (between 10× and 30× in most cases) are seen in all three applications, demonstrating the power of GPU implementations for these types of simulations and, more generally, their associated computing classes.

17.

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting

Jack Dongarra, Mathieu Faverge, Hatem Ltaief, Piotr Luszczek 《Concurrency and Computation》2014,26(7):1408-1431

The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering and is a characteristic of many dense linear algebra computations. For example, it has become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected by the TOP500 website. Multicore processors continue to present challenges to the development of fast and robust numerical software due to the increasing levels of hardware parallelism and widening gap between core and memory speeds. In this context, the difficulty in developing new algorithms for the scientific community resides in the combination of two goals: achieving high performance while maintaining the accuracy of the numerical algorithm. This paper proposes a new approach for computing the LU factorization in parallel on multicore architectures, which not only improves the overall performance but also sustains the numerical quality of the standard LU factorization algorithm with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization due to its memory‐bound characteristic as well as the atomicity of selecting the appropriate pivots. Our approach uses a parallel fine‐grained recursive formulation of the panel factorization step and implements the update of the trailing submatrix with the tile algorithm. Based on conflict‐free partitioning of the data and lockless synchronization mechanisms, our implementation lets the overall computation flow naturally without contention. The dynamic runtime system called QUARK is then able to schedule tasks with heterogeneous granularities and to transparently introduce algorithmic lookahead. The performance results of our implementation are competitive compared to the currently available software packages and libraries. 
For example, it is up to 40% faster than the equivalent Intel MKL routine and up to three times faster than LAPACK with multithreaded Intel MKL BLAS. Copyright © 2013 John Wiley & Sons, Ltd.
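The abstract contrasts the memory-bound panel step, which also selects pivots, with the compute-intensive, highly parallel trailing update. As a point of reference only, here is the textbook unblocked, right-looking LU factorization with partial pivoting in plain Python. It is not the paper's recursive tile algorithm or its QUARK-scheduled implementation. The comments mark where the pivot search (the serializing step) and the trailing update occur.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_partial_pivot(a: Matrix) -> Tuple[List[int], Matrix]:
    """Factor A with partial pivoting.

    Returns (perm, lu) such that row i of L @ U equals A[perm[i]], where the
    unit lower-triangular L and upper-triangular U are packed together in lu.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pivot search: a reduction over column k that every later step waits on.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in the L part
            for j in range(k + 1, n):     # trailing-submatrix update
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu


def multiply_lu(lu: Matrix) -> Matrix:
    """Rebuild L @ U from the packed factors (useful for checking)."""
    n = len(lu)
    return [[sum((1.0 if k == i else lu[i][k]) * lu[k][j]
                 for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]


perm, lu = lu_partial_pivot([[1.0, 2.0], [3.0, 4.0]])
print(perm)  # [1, 0]
```

Partial pivoting keeps every multiplier at or below 1 in magnitude, which is the source of the numerical robustness that the paper preserves. Tile and recursive variants reorganize these same loops into tasks over blocks.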

18.

Automatic runtime frequency-scaling system for energy savings in parallel applications

Vaibhav Sundriyal, Masha Sosonkina, Zhao Zhang 《The Journal of Supercomputing》2014,68(2):777-797

Although high-performance computing has always been about efficient application execution, both energy and power consumption have become critical concerns owing to their effect on the operating costs and failure rates of large-scale computing platforms. Modern processors provide techniques such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (called throttling) to improve energy efficiency on the fly. Without careful application, however, DVFS and throttling may cause a significant performance loss due to system overhead. This paper proposes a novel runtime system that maximizes energy saving by selecting appropriate values for DVFS and throttling in parallel applications. Specifically, the system automatically predicts communication phases in parallel applications and applies frequency scaling, taking into account both the CPU offload provided by the network-interface card and the architectural stalls during computation. Experiments on the NAS parallel benchmarks and on real-world applications in molecular dynamics and linear system solution show that the proposed runtime system achieves energy savings of as much as 14% with a low performance loss of about 2%.
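The idea of slowing the CPU during communication phases can be illustrated with a simple model. The model is an assumption and is not the paper's runtime system:

- A phase's time splits into a CPU-bound part, which stretches by f_max/f, and a part (communication and stalls) that does not depend on frequency.
- Power is a static term plus a dynamic term proportional to f³.

Under this model, the Python sketch picks, for each phase, the lowest-energy frequency level whose slowdown stays within a bound. Throttling and the phase-prediction step are not modeled, and all numbers are hypothetical.

```python
from typing import Sequence


def phase_time(t_cpu: float, t_other: float, f: float, f_max: float) -> float:
    """Phase duration at frequency f: only the CPU-bound part stretches as f drops."""
    return t_cpu * f_max / f + t_other


def phase_energy(t_cpu: float, t_other: float, f: float, f_max: float,
                 p_static: float, k: float) -> float:
    """Energy = (static + dynamic power) * time, with dynamic power ~ k * f**3."""
    return (p_static + k * f ** 3) * phase_time(t_cpu, t_other, f, f_max)


def choose_frequency(t_cpu: float, t_other: float, levels: Sequence[float],
                     p_static: float, k: float, max_slowdown: float) -> float:
    """Lowest-energy level whose slowdown relative to f_max stays within max_slowdown."""
    f_max = max(levels)
    budget = (1.0 + max_slowdown) * phase_time(t_cpu, t_other, f_max, f_max)
    feasible = [f for f in levels
                if phase_time(t_cpu, t_other, f, f_max) <= budget + 1e-12]
    return min(feasible,
               key=lambda f: phase_energy(t_cpu, t_other, f, f_max, p_static, k))


levels = [1.2, 1.8, 2.4]  # hypothetical GHz levels
# Communication-dominated phase: frequency barely affects its duration.
print(choose_frequency(0.0, 1.0, levels, p_static=20.0, k=5.0, max_slowdown=0.02))  # 1.2
# CPU-bound phase: any reduction would exceed the 2% slowdown bound.
print(choose_frequency(1.0, 0.0, levels, p_static=20.0, k=5.0, max_slowdown=0.02))  # 2.4
```

Because of the static-power term, the lowest frequency is not automatically the cheapest. Stretching a CPU-bound phase also stretches the static energy it consumes, so the energy comparison has to include both terms.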

19.

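Research on energy-saving algorithms for data centers in cloud computing systems

Zhang Xiaoqing, He Zhongtang, Li Chunlin, Zhang Hengxi, Qian Qiongfen 《Application Research of Computers》2013,30(4):961-964

This paper briefly introduces the definition and characteristics of cloud computing and focuses on the high energy consumption of cloud data centers. It classifies current energy-saving algorithms and reviews three groups in detail: DVFS-based algorithms, virtualization-based algorithms, and algorithms based on switching hosts off and on. It also compares their advantages, disadvantages, and applicable environments. Finally, it summarizes open research problems in energy management for cloud data centers.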

20.

Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures

Azzam Haidar, Hatem Ltaief, Asim YarKhan, Jack Dongarra 《Concurrency and Computation》2012,24(3):305-321

The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared‐memory, multicore architectures. Current numerical libraries (e.g., the Linear Algebra Package, LAPACK) show clear limitations on such emerging systems, mainly because of their coarse-grained tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of multicore platforms. The Parallel Linear Algebra for Scalable Multicore Architectures (PLASMA) library, developed at the University of Tennessee, tackles this challenge by using tile algorithms to achieve finer task granularity. These tile algorithms can be represented as directed acyclic graphs (DAGs), in which nodes are tasks and edges are the dependencies between tasks. The key to achieving high performance is a runtime environment that efficiently schedules the execution of the DAG across the multicore platform. This paper studies how several parameters affect overall performance, both at the level of the scheduler (e.g., window size and locality) and at the level of the algorithms (e.g., left‐looking and right‐looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited. Copyright © 2011 John Wiley & Sons, Ltd.
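A tile algorithm's DAG can be derived mechanically from the tiles each task reads and writes. As an example, the Python sketch below does this for a right-looking tile Cholesky factorization on a p×p grid of tiles. The task names POTRF, TRSM, SYRK, and GEMM follow the usual BLAS/LAPACK kernels. The sketch adds an edge for every read-after-write, write-after-write, and write-after-read hazard. It then computes each task's earliest "wave", meaning the step at which an idealized dynamic scheduler with unlimited cores could run it. It does not use PLASMA's or QUARK's API, and scheduler parameters such as window size and locality are not modeled.

```python
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]


def tile_cholesky_dag(p: int) -> Dict[str, Set[str]]:
    """Map each task of a right-looking p x p tile Cholesky to its predecessors."""
    deps: Dict[str, Set[str]] = {}
    last_writer: Dict[Tile, str] = {}
    readers: Dict[Tile, Set[str]] = {}

    def submit(name: str, reads: List[Tile], writes: List[Tile]) -> None:
        # 'writes' are read-modify-write accesses, as in the BLAS kernels.
        d: Set[str] = set()
        for tile in reads + writes:       # read-after-write and write-after-write
            if tile in last_writer:
                d.add(last_writer[tile])
        for tile in writes:               # write-after-read
            d |= readers.get(tile, set())
        deps[name] = d
        for tile in reads:
            readers.setdefault(tile, set()).add(name)
        for tile in writes:
            last_writer[tile] = name
            readers[tile] = set()

    for k in range(p):
        submit(f"POTRF({k})", [], [(k, k)])
        for i in range(k + 1, p):
            submit(f"TRSM({i},{k})", [(k, k)], [(i, k)])
        for i in range(k + 1, p):
            submit(f"SYRK({i},{k})", [(i, k)], [(i, i)])
            for j in range(k + 1, i):
                submit(f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)])
    return deps


def asap_levels(deps: Dict[str, Set[str]]) -> Dict[str, int]:
    """Earliest wave per task; tasks were submitted in a valid sequential order."""
    level: Dict[str, int] = {}
    for name, d in deps.items():
        level[name] = 1 + max((level[x] for x in d), default=-1)
    return level


deps = tile_cholesky_dag(3)
levels = asap_levels(deps)
print(len(deps), max(levels.values()) + 1)  # 10 7
```

Tasks that share a wave can run concurrently. A real runtime discovers this concurrency on the fly from a sliding window of submitted tasks rather than building the whole DAG up front, which is why the paper examines the window size.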