Improving prediction of computational job execution times with machine learning |
| |
Authors: | Bartosz Balis, Tomasz Lelek, Jakub Bodera, Michal Grabowski, Costin Grigoras |
| |
Affiliation: | 1. Institute of Computer Science, AGH University of Science and Technology, Kraków, Poland; 2. CERN, European Organization for Nuclear Research, Geneva, Switzerland |
| |
Abstract: | Predicting the resource consumption and run time of computational workloads is crucial for efficient resource allocation and for cost and energy optimization. In this paper, we evaluate various machine learning techniques for predicting the execution time of computational jobs. For the experiments, we use datasets from two application areas: scientific workflow management and data processing in the ALICE experiment at CERN. We apply a two-stage prediction method and evaluate its performance. We also evaluate: (1) the performance of global (per-workflow) versus specialized (per-job) models; (2) the impact of prediction granularity in the first stage of the two-stage method; (3) the use of various feature sets, feature selection, and feature importance analysis; and (4) the application of symbolic regression in addition to classical regressors. Our results provide new insights into using machine learning techniques to predict the runtime behavior of computational jobs. |
| |
Keywords: | ALICE experiment; job run-time prediction; machine learning; scientific workflows |
|
|