Research on storage and query processing for massive NetFlow data

Cite this article:	陈重韬, 王伟平, 孟丹, 崔甲, 胡斌. Research on storage and query processing for massive NetFlow data [J]. 高技术通讯 (Chinese High Technology Letters), 2016(6): 534-541.

Authors:	陈重韬, 王伟平, 孟丹, 崔甲, 胡斌

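Affiliations:	1. Computer Application Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; 3. China Information Technology Security Evaluation Center, Beijing 100085; 4. Computer Application Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049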

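Funding:	National Key Technology R&D Program of China (2012BAH46B03); National Natural Science Foundation of China (61402473); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2013ZX01039-002-001-001); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)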

Abstract:	NetFlow data on China's national backbone network arrives at high speed and in large volume, and the stored data is subject to frequent multidimensional queries. To address these characteristics, this study proposes a multidimensional attributes clustering storage (MACS) model. Based on the characteristics of queries in the actual application environment, the model partitions the data spatially and stores it in a parallel, pipelined manner. In addition, a hyper-polyhedron query mode is proposed for NetFlow data. Experiments in a real environment show that a system implemented with the MACS model reaches a single-node real-time storage rate of 2.7 million records per second, far faster than other data analysis systems, and that its multidimensional attribute queries are faster than those of Hive and Impala.

Keywords:	NetFlow; multidimensional attributes clustering storage (MACS) model; real-time data storage; hyper-polyhedron
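
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of clustering records by several attributes and pruning partitions at query time. It is not the MACS model itself. The record fields, the partition widths (TIME_BUCKET, IP_BUCKET), and the function names are all hypothetical, and an axis-aligned box query stands in for the more general hyper-polyhedron query named above.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    ts: int        # flow start time, in seconds
    src_ip: int    # IPv4 source address as a 32-bit integer
    dst_port: int  # destination port
    nbytes: int    # bytes carried by the flow


# Hypothetical partition widths; these values are not taken from the paper.
TIME_BUCKET = 60      # one cell per minute
IP_BUCKET = 1 << 24   # one cell per /8 source prefix


def cell_of(flow):
    """Map a record to the (time, source-prefix) cell that stores it."""
    return (flow.ts // TIME_BUCKET, flow.src_ip // IP_BUCKET)


def build_store(flows):
    """Group records so each cell holds flows that are close in both dimensions."""
    store = defaultdict(list)
    for f in flows:
        store[cell_of(f)].append(f)
    return store


def box_query(store, ts_range, ip_range, port_range):
    """Return (hits, scanned) for an axis-aligned box; all ranges are inclusive.

    Only cells that can intersect the box are visited, so `scanned` counts
    the records actually examined after partition pruning.
    """
    (t_lo, t_hi), (ip_lo, ip_hi), (p_lo, p_hi) = ts_range, ip_range, port_range
    hits = []
    scanned = 0
    for t_cell in range(t_lo // TIME_BUCKET, t_hi // TIME_BUCKET + 1):
        for ip_cell in range(ip_lo // IP_BUCKET, ip_hi // IP_BUCKET + 1):
            # .get avoids creating empty cells in the defaultdict.
            for f in store.get((t_cell, ip_cell), ()):
                scanned += 1
                if (t_lo <= f.ts <= t_hi
                        and ip_lo <= f.src_ip <= ip_hi
                        and p_lo <= f.dst_port <= p_hi):
                    hits.append(f)
    return hits, scanned
```

In this sketch, a query restricted to one minute and one /8 prefix touches a single cell regardless of how many records the store holds in total. Dimensions that are not part of the cell key, such as the destination port here, are still filtered record by record.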
This article is indexed in CNKI, Wanfang Data, and other databases.
