
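Spark task performance analysis method based on regression model
Cite this article: KAN Zhongliang, LI Jianzhong. Spark task performance analysis method based on regression model[J]. Journal of Harbin Institute of Technology, 2018, 50(3): 192-198
Authors: KAN Zhongliang  LI Jianzhong
Affiliation: School of Computer Science and Technology, Heilongjiang University
Abstract: To address the evaluation and improvement of performance while Spark tasks run, this paper proposes a Spark performance evaluation and analysis method based on a heuristic algorithm and a support vector machine regression model. The paper first proposes a heuristic performance evaluation algorithm: Ganglia collects and processes the cluster resource consumption data of running Spark tasks, the k-means algorithm divides the tasks into types, and the task type determines the evaluation indices and initial weights of the heuristic algorithm. Task running-efficiency data is then collected and processed from the Spark history server and, together with the cluster resource consumption data, serves as the run-time state data of the Spark task. Finally, the final weights of the heuristic algorithm are determined iteratively from the state data, which establishes a Spark performance evaluation regression model. The paper then proposes a Spark performance analysis method based on the support vector machine regression algorithm (SVR). This method builds a regression model between Spark configuration parameters and overall performance, then performs sensitivity analysis on that model to find the important parameters that affect Spark performance. Experimental results show that the heuristic performance evaluation algorithm can quantify the resource consumption, running efficiency, and other aspects of Spark task performance, and evaluates the overall performance of a task fairly comprehensively. The SVR-based performance analysis method can be applied fairly effectively to the analysis of real Spark tasks and yields preliminary tuning suggestions for Spark task performance.

Keywords: Spark  performance evaluation  regression model  sensitivity analysis
Received: 2017-10-09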
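
The first stage described in the abstract (cluster resource data, k-means task typing, type-dependent initial weights) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the four metrics, the synthetic utilisation values, the rule that names a task type after the dominant metric of its cluster centroid, and the 0.4 weight given to that metric. The abstract does not say which Ganglia metrics or weighting rule the authors use, and the iterative refinement of the weights is not shown.

```python
import math

# Hypothetical metric set; the abstract does not list the Ganglia metrics used.
METRICS = ["cpu", "memory", "disk_io", "network"]

def kmeans(points, k, iters=20):
    """Plain k-means. Initial centroids are the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def dominant_metric(centroid):
    """Assumed rule: name the task type after the most heavily used resource."""
    return METRICS[max(range(len(centroid)), key=lambda i: centroid[i])]

def initial_weights(task_type, dominant_share=0.4):
    """Assumed rule: the dominant metric gets dominant_share, the rest split evenly."""
    rest = (1.0 - dominant_share) / (len(METRICS) - 1)
    return {m: (dominant_share if m == task_type else rest) for m in METRICS}

# Synthetic, normalised utilisation per task: (cpu, memory, disk_io, network).
usage = [
    (0.90, 0.30, 0.10, 0.10),
    (0.30, 0.90, 0.10, 0.20),
    (0.20, 0.30, 0.90, 0.60),
    (0.85, 0.35, 0.15, 0.10),
    (0.35, 0.85, 0.20, 0.15),
    (0.25, 0.20, 0.85, 0.70),
]

labels, centroids = kmeans(usage, k=3)
task_types = [dominant_metric(c) for c in centroids]
weights_per_type = {t: initial_weights(t) for t in task_types}
```

In this sketch the weights are only a starting point; the abstract states that the final weights are determined iteratively from the combined resource and efficiency data.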

Spark task performance analysis method based on regression model
KAN Zhongliang and LI Jianzhong. Spark task performance analysis method based on regression model[J]. Journal of Harbin Institute of Technology, 2018, 50(3): 192-198
Authors: KAN Zhongliang and LI Jianzhong
Affiliation: School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
Abstract: To address performance evaluation and improvement for running Spark tasks, this paper proposes a Spark performance evaluation and analysis method based on a heuristic algorithm and a support vector machine regression model. First, a heuristic performance evaluation algorithm is proposed. It uses Ganglia to collect and process cluster resource consumption data while Spark tasks run, classifies the tasks by type with the k-means algorithm, and determines the evaluation indices and initial weights of the heuristic algorithm from the task type. Task efficiency data is then collected and processed from the Spark history server and, together with the cluster resource consumption data, forms the run-time state data of a Spark task. The final weights of the heuristic algorithm are determined iteratively from this state data, which yields the Spark performance evaluation regression model. Next, a Spark performance analysis method based on support vector regression (SVR) is proposed. It builds a regression model relating Spark configuration parameters to overall performance, then performs sensitivity analysis on that model to find the parameters that most affect Spark performance. Experimental results show that the heuristic performance evaluation algorithm quantifies the resource consumption, running efficiency, and other aspects of Spark task performance, and evaluates overall task performance fairly comprehensively. The SVR-based analysis method can be applied fairly effectively to the analysis of real Spark tasks and produces preliminary performance tuning suggestions.
Keywords: Spark   performance evaluation   regression   sensitivity analysis
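
The second stage (a regression model from configuration parameters to overall performance, followed by sensitivity analysis) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the SVR kernel, the training data, or the sensitivity procedure, so the sketch uses a linear epsilon-insensitive SVR trained by subgradient descent, synthetic data, two hypothetical normalised parameters `param_a` and `param_b`, and one-at-a-time perturbation around a baseline configuration.

```python
def train_linear_svr(X, y, epsilon=0.05, lam=1e-3, lr=0.01, epochs=1000):
    """Fit a linear epsilon-insensitive SVR with per-sample subgradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            # Subgradient of the epsilon-insensitive loss: zero inside the tube.
            g = 1.0 if err > epsilon else (-1.0 if err < -epsilon else 0.0)
            # lam * wj is the gradient of the L2 regularisation term.
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def sensitivity(model, baseline, names, delta=0.1):
    """One-at-a-time analysis: move one parameter by +/- delta, hold the others."""
    scores = {}
    for j, name in enumerate(names):
        up = list(baseline)
        down = list(baseline)
        up[j] += delta
        down[j] -= delta
        scores[name] = abs(predict(model, up) - predict(model, down))
    # Most influential parameter first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic data: performance score depends strongly on param_a, weakly on param_b.
# Both parameters are hypothetical and already scaled to [0, 1].
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
X = [[p, q] for p in levels for q in levels]
y = [3.0 * p + 0.2 * q for p, q in X]
names = ["param_a", "param_b"]

model = train_linear_svr(X, y)
ranking = sensitivity(model, baseline=[0.5, 0.5], names=names)
```

With real measurements, `X` would hold scaled Spark configuration values and `y` the overall performance score from the evaluation stage; a nonlinear kernel would also call for perturbing around several baselines rather than one.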
This article is indexed in CNKI and other databases.