
Real-Time Processing for High Speed Data Stream over Large Scale Data

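Cite this article:	QI Kai-Yuan, ZHAO Zhuo-Feng, FANG Jun, MA Qiang. Real-Time Processing for High Speed Data Stream over Large Scale Data[J]. Chinese Journal of Computers, 2012, 35(3): 477-490.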

Authors:	QI Kai-Yuan, ZHAO Zhuo-Feng, FANG Jun, MA Qiang

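Affiliations:	1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Graduate University of Chinese Academy of Sciences, Beijing 100190. 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; College of Information Engineering, North China University of Technology, Beijing 100144.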

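Abstract:	Computing needs built on real-time sensor data and historical sensing data are becoming central to current Internet of Things (IoT) applications, and real-time computation over high-speed data streams combined with large-scale historical data is a new challenge for the data processing field. Existing MapReduce technology for large-scale data processing works in batch mode and struggles to meet the real-time requirements of such computation. Drawing on an application for the real-time collection and processing of urban vehicle data, and on theoretical and practical analysis, this paper proposes a real-time method for processing large-scale data under high-speed data streams, and improves key technical bottlenecks in the method such as the local staged pipeline and the intermediate result cache. The staged pipeline is controlled according to system parameters so that the CPU is utilized fully and effectively; the in-memory and on-disk data structures, the read/write strategy, and the replacement algorithm are redesigned to optimize highly concurrent reads and writes of local intermediate results. Experiments show that the method can significantly improve the real-time performance and scalability of data stream processing over large-scale historical data.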
Keywords:	data stream processing; large-scale data processing; MapReduce; Internet of Things; big data; cloud computing
Real-Time Processing for High Speed Data Stream over Large Scale Data

QI Kai-Yuan, ZHAO Zhuo-Feng, FANG Jun, MA Qiang. Real-Time Processing for High Speed Data Stream over Large Scale Data[J]. Chinese Journal of Computers, 2012, 35(3): 477-490.

Authors:	QI Kai-Yuan, ZHAO Zhuo-Feng, FANG Jun, MA Qiang

Affiliation:	1) Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2) Graduate University of Chinese Academy of Sciences, Beijing 100190; 3) College of Information Engineering, North China University of Technology, Beijing 100144

Abstract:	With the development of the Internet of Things (IoT), computing based on real-time and historical sensor data has become key to IoT applications, and supporting real-time processing of high-speed data streams over large-scale data poses a new challenge. However, existing large-scale data processing technology based on the MapReduce model is designed for batch processing and cannot satisfy this real-time requirement. Based on theoretical and practical analysis, this paper proposes a method for large-scale data processing under high-speed data streams and addresses technical bottlenecks such as the local staged pipeline and intermediate result storage. We tune the configuration of the staged pipeline dynamically using system information to utilize the CPU efficiently, and we design the data structure, read/write strategy, and replacement algorithm to optimize highly concurrent access to local intermediate results. Experiments show that this method can improve the real-time performance and scalability of data stream processing over large-scale historical data.

Keywords:	data stream processing; large scale data processing; MapReduce; Internet of Things; big data; cloud computing
This article is indexed in CNKI, Wanfang Data, and other databases.
