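Cluster job runtime prediction based on NR-Transformer
Citation: CHEN Feng-xian. Cluster job runtime prediction based on NR-Transformer[J]. Computer Engineering & Science, 2022, 44(7): 1181-1190.
Author: CHEN Feng-xian (陈奉贤)
Affiliation: Office of Network Security and Information, Lanzhou University, Lanzhou, Gansu 730000, China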
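Abstract: Job scheduling on high-performance clusters is usually handled by a job scheduling system, and filling in accurate job runtimes can substantially improve scheduling efficiency. Existing studies typically predict runtimes with machine learning, but there is still room for improvement in both prediction accuracy and practicality. To further improve the accuracy of cluster job runtime prediction, the cluster job logs are first clustered and the resulting job category information is added to the job features; an attention-based NR-Transformer network is then used to model the job log data and make predictions. For data processing, 7-dimensional features are selected from the historical log dataset according to their correlation with the prediction target, feature completeness, and data validity. The jobs are then divided into multiple job sets by runtime length, and a model is trained and evaluated on each job set separately. Experimental results show that, compared with traditional machine learning and a BP neural network, time-series neural network structures achieve better prediction performance; among them, NR-Transformer performs well on every job set.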

Keywords: high-performance computing; parallel job scheduling; user clustering; time-series neural network; attention mechanism
Received: 2021-04-02
Revised: 2021-09-14
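The abstract outlines a data-processing pipeline: cluster the job logs, append the resulting category to each job's features, split jobs into sets by runtime length, and model each set with an attention-based network. The Python sketch below illustrates those steps on toy data under stated assumptions. The abstract names neither the clustering algorithm, the runtime cut points, nor the 7 selected features, so `kmeans` (plain Lloyd's k-means), the `bounds` values, and the two-column toy features are all hypothetical, as are the function names. `attention` is the standard scaled dot-product attention of the original Transformer, shown only as the building block that attention-based models share; it is not the NR-Transformer itself, whose modifications are not described here.

```python
import bisect
import math
from typing import List, Sequence

Matrix = List[List[float]]


def kmeans(points: Matrix, k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means with a deterministic init (the first k points).

    Assumption: the abstract does not name the clustering algorithm;
    k-means is used here purely for illustration.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_category(features: Matrix, labels: Sequence[int]) -> Matrix:
    """Append each job's cluster label to its feature vector as one extra column."""
    return [row + [float(lbl)] for row, lbl in zip(features, labels)]


def split_by_runtime(runtimes: Sequence[float], bounds: Sequence[float]) -> List[List[int]]:
    """Group job indices into len(bounds) + 1 job sets by runtime length.

    `bounds` must be ascending. Set 0 holds runtimes below bounds[0], set i
    holds runtimes in [bounds[i-1], bounds[i]), and the last set holds
    runtimes >= bounds[-1]. The paper's actual cut points are not given.
    """
    job_sets: List[List[int]] = [[] for _ in range(len(bounds) + 1)]
    for i, r in enumerate(runtimes):
        job_sets[bisect.bisect_right(bounds, r)].append(i)
    return job_sets


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Generic scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    This is the standard Transformer building block, not the NR-Transformer
    variant, whose changes the abstract does not describe.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output row is the attention-weighted average of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Toy data: two placeholder features per job (not the paper's 7 features).
    feats = [[1.0, 1.0], [1.1, 0.9], [8.0, 8.0], [8.2, 7.9]]
    runtimes = [120.0, 5000.0, 90000.0, 3600.0]  # seconds
    labels = kmeans(feats, k=2)
    print(labels)  # [0, 0, 1, 1]
    print(add_category(feats, labels))
    print(split_by_runtime(runtimes, bounds=[3600.0, 86400.0]))  # [[0], [1, 3], [2]]
```

Because `bisect_right` is used, a job whose runtime equals a cut point (3600 s above) falls into the longer job set. Each returned index list would then select the rows used to train and evaluate a separate model, mirroring the per-job-set training the abstract describes.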