Distributed Stream Processing: A Survey
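Cite this article: Cui Xingcan, Yu Xiaohui, Liu Yang, Lü Zhaoyang. Distributed Stream Processing: A Survey[J]. Journal of Computer Research and Development, 2015, 52(2): 318-332. DOI: 10.7544/issn1000-1239.2015.20140268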
Authors: Cui Xingcan, Yu Xiaohui, Liu Yang, Lü Zhaoyang
Affiliation: School of Computer Science and Technology, Shandong University, Jinan 250101 (xccui@mail.sdu.edu.cn)
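Funding: National Natural Science Foundation of China; Natural Science Foundation of Shandong Province; Shandong Province Science and Technology Development Plan; National Basic Research Program of China (973 Program); Shandong University Independent Innovation Foundation; Taishan Scholars Program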
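Abstract: With the rapid development of computing and networking technologies and the increasingly rich means of data acquisition, a growing number of fields need to process massive, high-velocity data in real time. Because such requirements often exceed the capabilities of traditional data processing technologies, the distributed stream processing paradigm has emerged. This survey first reviews the background from which distributed stream processing arose and how the technology has evolved, and then contrasts it with other related big data processing technologies to delimit the scope of distributed stream processing. It then analyzes in depth the main issues that distributed stream processing must address, namely data models, system models, storage management, semantic guarantees, load control, and fault tolerance, and points out the strengths and weaknesses of existing solutions. Next, it introduces several representative distributed stream processing systems, including S4, Storm, and Spark Streaming, and compares them systematically. Finally, it presents several typical applications of distributed stream processing in areas such as social media processing and discusses directions for further research in the field.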

Keywords: big data; data stream; distributed stream processing; real-time processing; distributed system

This article is indexed in CNKI, Wanfang Data, and other databases.