System-Level Resource Monitoring in High-Performance Computing Environments
Authors: Sandip Agarwala, Christian Poellabauer, Jiantao Kong, Karsten Schwan, Matthew Wolf
Affiliation: (1) College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: sandip@cc.gatech.edu
Abstract: Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, utilizing the familiar /proc virtual filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availability (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality, and (c) by performing monitoring at the kernel level, the information captured enables decision making that takes into account the multiple resources used by applications.
Keywords: cluster computing; customizability; distributed systems; dynamic adaptation; high performance computing; kernel-level resource management; monitoring
This article is indexed in SpringerLink and other databases.
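
The dynamic filtering idea in point (b) of the abstract can be illustrated with a small user-space sketch. This is not dproc or KECho code, and it does not reflect their actual interfaces: `EventChannel`, `change_filter`, and the `cpu_load` metric are hypothetical names chosen for illustration. The sketch only assumes that a subscriber attaches a filter to an event channel and can swap that filter at run time to trade monitoring detail for lower overhead.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]          # hypothetical event shape, e.g. {"cpu_load": 0.7}
Filter = Callable[[Event], bool]  # returns True if the event should be delivered


class EventChannel:
    """Toy user-space stand-in for a publish/subscribe event channel."""

    def __init__(self) -> None:
        self._subscribers: List[dict] = []

    def subscribe(self, sink: List[Event], event_filter: Filter) -> dict:
        # The returned record lets the subscriber replace its filter later.
        sub = {"sink": sink, "filter": event_filter}
        self._subscribers.append(sub)
        return sub

    def publish(self, event: Event) -> int:
        """Deliver the event to every subscriber whose filter accepts it."""
        delivered = 0
        for sub in self._subscribers:
            if sub["filter"](event):
                sub["sink"].append(event)
                delivered += 1
        return delivered


def change_filter(metric: str, min_delta: float) -> Filter:
    """Accept an event only if `metric` moved by at least `min_delta`
    since the last accepted value (the first event is always accepted)."""
    last = {"value": None}

    def accept(event: Event) -> bool:
        value = event[metric]
        if last["value"] is None or abs(value - last["value"]) >= min_delta:
            last["value"] = value
            return True
        return False

    return accept


if __name__ == "__main__":
    channel = EventChannel()
    inbox: List[Event] = []
    sub = channel.subscribe(inbox, change_filter("cpu_load", 0.10))

    for load in [0.50, 0.52, 0.65, 0.66, 0.90]:
        channel.publish({"cpu_load": load})
    print(len(inbox), "events delivered with a 0.10 threshold")

    # Coarsen the filter at run time to reduce monitoring traffic.
    sub["filter"] = change_filter("cpu_load", 0.30)
```

Raising `min_delta` suppresses more updates and so lowers monitoring overhead, at the cost of a staler view for the subscriber. That is the balance the abstract describes, although in dproc the filtering is deployed inside remote kernels rather than in a user-space loop like this one.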