System-Level Resource Monitoring in High-Performance Computing Environments
Authors: Sandip Agarwala, Christian Poellabauer, Jiantao Kong, Karsten Schwan, Matthew Wolf
Affiliation: (1) College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: sandip@cc.gatech.edu
Abstract: Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, it extends the familiar /proc virtual filesystem with resource information collected from both local and remote hosts. Second, to capture and distribute monitoring information predictably, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, dproc offers run-time customizability of resource monitoring, including the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availability (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overhead and application quality, and (c) by performing monitoring at kernel level, the information captured enables decisions that take into account the multiple resources used by applications.
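The idea of dynamic filtering, trading monitoring overhead against the quality of the information delivered, can be illustrated with a minimal sketch. Everything below is hypothetical and chosen only for illustration: the `DeltaFilter` class, its `threshold`, `budget`, and `window` parameters, and the doubling/halving adaptation rule are assumptions, not dproc's or KECho's actual interface. A sample is forwarded to remote subscribers only if it differs from the last forwarded value by more than `threshold`, and the threshold is periodically coarsened or refined depending on how many events were sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaFilter:
    """Hypothetical dynamic filter for a monitored metric (not dproc's API).

    A sample is forwarded only if it differs from the last forwarded value
    by more than `threshold`. After every `window` samples, the threshold
    is adapted so that roughly `budget` events are sent per window.
    """
    threshold: float
    budget: int        # assumed knob: target max forwarded events per window
    window: int        # assumed knob: samples per adaptation window
    min_threshold: float = 0.01
    last_sent: Optional[float] = None
    _seen: int = 0
    _sent_in_window: int = 0

    def offer(self, value: float) -> bool:
        """Return True if `value` should be sent to remote subscribers."""
        send = self.last_sent is None or abs(value - self.last_sent) > self.threshold
        if send:
            self.last_sent = value
            self._sent_in_window += 1
        self._seen += 1
        if self._seen == self.window:
            self._adapt()
        return send

    def _adapt(self) -> None:
        if self._sent_in_window > self.budget:
            # Too many events: coarsen the filter to cut monitoring overhead.
            self.threshold *= 2
        elif self._sent_in_window < self.budget // 2:
            # Well under budget: refine the filter to improve information quality.
            self.threshold = max(self.min_threshold, self.threshold / 2)
        # Start a new adaptation window.
        self._seen = 0
        self._sent_in_window = 0


f = DeltaFilter(threshold=0.5, budget=2, window=4)
samples = [1.0, 1.2, 2.0, 2.1, 3.0, 4.0, 5.0, 6.0]
sent = [s for s in samples if f.offer(s)]
print(sent, f.threshold)
```

In this run, small fluctuations (1.2, 2.1) are suppressed, and because the second window forwards four events against a budget of two, the filter doubles its threshold to reduce future traffic. A real kernel-level implementation would also have to consider per-client requirements and the cost of the filter itself, which this sketch ignores.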
Keywords: cluster computing, customizability, distributed systems, dynamic adaptation, high performance computing, kernel-level resource management, monitoring
This article is indexed in SpringerLink and other databases.