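Big Data Hybrid Computing Model Based on Spark
Cite this article: HU Jun, HU Xian-De, CHENG Jia-Xing. Big Data Hybrid Computing Model Based on Spark[J]. Computer Systems & Applications, 2015, 24(4): 214-218.
Authors: HU Jun  HU Xian-De  CHENG Jia-Xing
Affiliations: 1. School of Information Engineering, Anhui Xinhua University, Hefei 230088
2. School of Information Engineering, Anhui Xinhua University, Hefei 230088; College of Computer Science and Technology, Anhui University, Hefei 230031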
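Abstract: Real-world big data applications are complex and diverse, and a single application may involve data and computations with different characteristics. In such cases a single computing mode can rarely meet the needs of the whole application, so a mix of different computing modes must be considered. The most comprehensive hybrid computing system is Spark, from the UC Berkeley AMPLab, which covers almost all typical big data computing modes, including iterative computing, batch processing, in-memory computing, stream computing (Spark Streaming), data query and analysis (Shark), and graph computing (GraphX). Spark provides a powerful in-memory computing engine that delivers excellent performance while remaining compatible with the Hadoop platform. Therefore, as the system continues to stabilize and mature, Spark is expected to become a new-generation big data processing system and platform that coexists with Hadoop. This paper studies and analyzes the Spark ecosystem in detail, establishes a hybrid computing model architecture based on the Spark platform, and shows that the Spark ecosystem can effectively meet the needs of applications that mix big data computing modes.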

Keywords: big data  hybrid computing mode  Spark  resilient distributed dataset
Received: 2014/7/19
Revised: 2014/8/25

Big Data Hybrid Computing Mode Based on Spark
HU Jun, HU Xian-De and CHENG Jia-Xing. Big Data Hybrid Computing Mode Based on Spark[J]. Computer Systems & Applications, 2015, 24(4): 214-218.
Authors: HU Jun  HU Xian-De and CHENG Jia-Xing
Affiliation: School of Information Engineering, Anhui Xinhua University, Hefei 230088, China; College of Computer Science and Technology, Anhui University, Hefei 230031, China
Abstract: Big data applications in the real world are complex and diverse, and a single application may involve data and computations with different characteristics. In such cases a single computing mode can rarely meet the application's requirements, so a mix of computing modes must be considered. The most comprehensive hybrid computing system is Spark, developed by the UC Berkeley AMPLab. It covers almost all typical big data computing modes, including iterative computing, batch computing, in-memory computing, stream computing (Spark Streaming), data query and analysis (Shark), and graph computing (GraphX). Spark provides a powerful in-memory computing engine that delivers excellent performance while remaining compatible with the Hadoop platform. Therefore, as it continues to stabilize and mature, Spark is expected to coexist with Hadoop as a new generation of big data processing system and platform. This paper studies and analyzes the Spark ecosystem in detail, establishes a hybrid computing model architecture based on the Spark platform, and shows that the Spark ecosystem can effectively meet the needs of applications that require hybrid big data computing modes.
Keywords: big data  hybrid computing mode  Spark  resilient distributed dataset
This paper is indexed in CNKI, Wanfang Data, and other databases.