Survey on Performance Optimization Technologies for Spark
Cite this article: LIAO Hu-sheng, HUANG Shan-shan, XU Jun-gang, LIU Ren-feng. Survey on Performance Optimization Technologies for Spark[J]. Computer Science, 2018, 45(7): 7-15, 37.
Authors: LIAO Hu-sheng, HUANG Shan-shan, XU Jun-gang, LIU Ren-feng
Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China (LIAO Hu-sheng, HUANG Shan-shan); School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China (XU Jun-gang, LIU Ren-feng)
Funding: Supported by the National Natural Science Foundation of China project "Research on Performance Analysis Methods for Parallel Programs in the Cloud" (61372171).
Abstract: In recent years, with the advent of the big data era, big data processing platforms have developed rapidly. Many such platforms have emerged, including Hadoop, Spark, and Storm, among which Apache Spark is the most prominent. Although Spark is widely used in China and abroad, many of its performance problems remain unsolved. Because Spark's underlying execution mechanism is very complex, ordinary users find it difficult to locate performance bottlenecks, let alone optimize further. To address these problems, this paper summarizes and analyzes current Spark performance optimization technologies from five aspects: development principle optimization, memory optimization, configuration parameter optimization, scheduling optimization, and shuffle process optimization. Finally, it summarizes the key open problems of current Spark optimization technologies and proposes the main directions for future research.
Keywords: Spark; Development principle optimization; Configuration parameter optimization; Memory optimization; Scheduling optimization; Shuffle process optimization
Received: 2017/7/1
Revised: 2017/8/15