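Memory optimization of Spark parallel computing framework
Cite this article: LIAO Wang-jian, HUANG Yong-feng, BAO Cong-kai. Memory optimization of Spark parallel computing framework[J]. Computer Engineering & Science, 2018, 40(4): 587-593.
Authors: LIAO Wang-jian  HUANG Yong-feng  BAO Cong-kai
Affiliation: (1. Department of Electronic Engineering, Tsinghua University, Beijing 100084; 2. National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084)
Funding: National Key Technology R&D Program of China (2014BAH41B00); National Natural Science Foundation of China (U1405254, U1536207)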
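Abstract: Cluster parallel computing frameworks, of which Spark is representative, are widely used in big data and cloud computing, and optimizing their runtime performance is key to applying them. To improve runtime performance, this paper analyzes the execution flow and memory management mechanism of the Spark framework and, drawing on the characteristics of memory management at both the Spark and JVM levels, proposes three optimization strategies: (1) reducing the size of cached data through serialization and compression, which lowers GC overhead and improves performance; (2) reducing the running memory size within a certain range and replacing caching with recomputation, which can improve performance; (3) configuring appropriate memory allocation parameters, such as the ratio of the JVM young generation to the old generation and the ratio of Spark computation space to cache space, which can improve performance considerably. Experimental results show that serialization and compression reduce the space occupied by the cache by 42%; that performance increases by 21% when the submitted running memory is reduced from 1 000 MB to 800 MB; and that optimizing the memory ratios improves performance by 10% to 30% over the default parameters.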

Keywords: Spark  performance optimization  heap memory
Received: 2016-11-16
Revised: 2018-04-25

Memory optimization of Spark parallel computing framework
LIAO Wang-jian, HUANG Yong-feng, BAO Cong-kai. Memory optimization of Spark parallel computing framework[J]. Computer Engineering & Science, 2018, 40(4): 587-593.
Authors: LIAO Wang-jian  HUANG Yong-feng  BAO Cong-kai
Affiliation: (1. Department of Electronic Engineering, Tsinghua University, Beijing 100084; 2. National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China)
Abstract: Cluster parallel computing frameworks such as Spark are widely used in big data and cloud computing, and optimizing their performance is key to applying them. This paper analyzes the execution process and memory management mechanism of the Spark framework. Combining the characteristics of Spark and JVM memory management, it proposes three strategies: (1) using serialization and compression to reduce the size of cached data and the memory it occupies, which lowers GC overhead and thus improves performance; (2) reducing the running memory size within a certain range and replacing caching with recomputation, which improves performance; (3) adjusting memory allocation parameters, such as the ratio of the JVM old generation to the new generation and the ratio of Spark computing space to cache space, which can improve performance considerably. Experiments show that serialization and compression reduce cache space by 42%, that performance increases by 21% when the submitted memory is reduced from 1 000 MB to 800 MB, and that optimizing the memory ratios improves performance by 10% to 30%.
Keywords:Spark  performance optimization  heap memory  
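The three strategies in the abstract correspond to settings that Spark and the JVM expose as configuration. The abstract does not list the parameters the authors tuned, so the following is only a minimal sketch under stated assumptions: it uses the property names of Spark's unified memory manager (`spark.memory.fraction`, `spark.memory.storageFraction`), which may differ from those in the Spark version used in the paper. The helper names `memory_tuning_conf` and `to_spark_submit_args`, the application file `app.py`, and all numeric values except the 800 MB figure are hypothetical placeholders, not the paper's tuned values.

```python
import shlex


def memory_tuning_conf(executor_memory_mb=800, new_ratio=2,
                       memory_fraction=0.6, storage_fraction=0.5):
    """Build Spark properties touching the three strategies (illustrative values)."""
    for name, value in (("memory_fraction", memory_fraction),
                        ("storage_fraction", storage_fraction)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    if executor_memory_mb <= 0 or new_ratio <= 0:
        raise ValueError("executor_memory_mb and new_ratio must be positive")
    return {
        # Strategy 1: compact serialization plus compression of serialized
        # cached partitions, to shrink the cache and ease GC pressure.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.rdd.compress": "true",
        # Strategy 2: a smaller executor heap (the abstract reports a gain
        # when going from 1 000 MB to 800 MB in its experiments).
        "spark.executor.memory": f"{executor_memory_mb}m",
        # Strategy 3: split between computation and cache space, and the
        # JVM old-to-young generation ratio (HotSpot -XX:NewRatio).
        "spark.memory.fraction": str(memory_fraction),
        "spark.memory.storageFraction": str(storage_fraction),
        "spark.executor.extraJavaOptions": f"-XX:NewRatio={new_ratio}",
    }


def to_spark_submit_args(conf):
    """Turn a property dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # app.py is a placeholder application name.
    command = ["spark-submit", *to_spark_submit_args(memory_tuning_conf()), "app.py"]
    print(" ".join(shlex.quote(part) for part in command))
```

Keeping the settings in one dictionary makes it easy to vary a single parameter between runs and compare timings, which is the kind of experiment the abstract describes. Suitable values depend on the workload and have to be measured.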