Research on Optimization for Iteration-Intensive Applications on Spark
Cite as:WEI Zhanchen,LIU Xiaoyu,HUANG Qiulan,SUN Gongxing.Research on Optimization for Iteration-Intensive Applications on Spark[J].Computer Engineering and Applications,2020,56(23):68-73.
Authors:WEI Zhanchen  LIU Xiaoyu  HUANG Qiulan  SUN Gongxing
Affiliation:1.Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China 2.University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:Spark is a popular and widely applicable big data processing framework with good usability and scalability. However, some problems remain in practical use. For example, in some iterative computing scenarios the speedup obtained is not ideal, because using a distributed system such as Spark introduces substantial additional overhead. To analyze and reduce this overhead accurately, this paper proposes a Spark efficiency formula in which the additional overhead is measured by the distributed calculation cost and execution efficiency is measured by the effective calculation ratio. Based on this formula, the paper also designs and implements an optimization strategy for iteration-intensive applications on Spark. Test results show that both the effective calculation ratio and execution performance improve substantially: the effective calculation ratio increases by about 0.373 and the execution time is reduced by about 68.2%.
Keywords:Spark  optimization for iteration-intensive applications  distributed calculation cost  effective calculation ratio  
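
To put the reported figures in context, the following is a minimal sketch, not the paper's formula. The abstract does not give the definition of the effective calculation ratio, so `effective_calculation_ratio` below assumes a simple illustrative form (useful compute time divided by total execution time); both function names are hypothetical. The speedup conversion, by contrast, is plain arithmetic on the reported 68.2% reduction in execution time.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional reduction in execution time."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def effective_calculation_ratio(effective_time: float, total_time: float) -> float:
    """ASSUMED definition, for illustration only: the share of total
    execution time spent on useful computation rather than on
    distributed overhead. The paper's actual formula may differ."""
    if total_time <= 0 or not 0 <= effective_time <= total_time:
        raise ValueError("need 0 <= effective_time <= total_time and total_time > 0")
    return effective_time / total_time


if __name__ == "__main__":
    # A 68.2% shorter run time corresponds to roughly a 3.14x speedup.
    print(round(speedup_from_time_reduction(0.682), 2))
```

Under this assumed definition, an increase of 0.373 would mean that an additional 37.3 percentage points of run time go to useful computation; the paper's own formula should be consulted for the exact meaning.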
This article is indexed in Wanfang Data and other databases.