Index structure of unified access in big data of multi-format
Cite this article: FENG Ya-li, DING Liang-kui, LIU Yong-jiang, WANG Xing-zhao. Index structure of unified access in big data of multi-format[J]. Application Research of Computers, 2013, 30(6): 1664-1667.
Authors: FENG Ya-li  DING Liang-kui  LIU Yong-jiang  WANG Xing-zhao
Affiliation: 1. College of Computer & Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China; 2. Geophysical Laboratory, Dept. of Technology Research, CNOOC Research Center, Beijing 100027, China; 3. Information Center, China Petroleum Pipeline Company, Langfang, Hebei 065000, China
Funding: National Science and Technology Major Project of China (2011ZX05023-005-012)
Abstract: To improve the efficiency of unified access to massive multi-format data, this paper proposes a Hadoop-based distributed data reading mode. Building on a study of non-primary-key index structures for massive data and on the descriptive concept of unified access, it proposes an HDFS-based hierarchical index structure applicable to B-trees, R-trees, and their variants, which overcomes the weakness of plain key-value storage for non-primary-key indexing. A Hadoop buffering strategy, a new data transfer model based on random reads, and a corresponding query processing strategy further reduce data transfer overhead. Experiments show that these methods improve random access efficiency under unified access, reduce query response time and data transfer overhead, and improve the overall performance of unified access to massive multi-format data.

Keywords: R-tree  indexes  big data  query processing
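
The abstract names the ingredients (a hierarchical index over a non-primary-key attribute, random reads of index blocks, and a buffer that avoids repeated transfers) but gives no implementation details. The following is only a minimal in-memory sketch of that general idea, not the paper's actual structure: the names `BlockStore`, `LRUBuffer`, `build_index`, `range_query`, and the `depth` attribute are all hypothetical, a sorted two-level layout stands in for a B-tree, an LRU policy stands in for the unspecified Hadoop buffering strategy, and a Python list stands in for blocks stored on HDFS.

```python
import bisect
from collections import OrderedDict


class BlockStore:
    """Hypothetical stand-in for leaf index blocks kept on a distributed file system."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0  # counts simulated random reads (data transfers)

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class LRUBuffer:
    """Assumed buffering policy (LRU): serve repeated block requests from memory."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        block = self.store.read(block_id)
        self.cache[block_id] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used block
        return block


def build_index(records, attr, block_size):
    """Build a two-level index on a non-primary-key attribute.

    Returns (first_keys, blocks): the upper level holds the smallest attribute
    value of each leaf block; each leaf holds sorted (value, primary_key) pairs.
    """
    entries = sorted((rec[attr], pk) for pk, rec in records.items())
    blocks = [entries[i:i + block_size] for i in range(0, len(entries), block_size)]
    first_keys = [block[0][0] for block in blocks]
    return first_keys, blocks


def range_query(first_keys, buffer, lo, hi):
    """Return primary keys whose indexed attribute lies in [lo, hi]."""
    # The block just before the first block whose smallest value is >= lo
    # may still contain values >= lo, so start one block earlier.
    start = max(bisect.bisect_left(first_keys, lo) - 1, 0)
    result = []
    for block_id in range(start, len(first_keys)):
        if first_keys[block_id] > hi:
            break  # all later blocks hold only larger values
        for value, pk in buffer.get(block_id):
            if lo <= value <= hi:
                result.append(pk)
    return result


# Illustrative data: primary key -> record with a hypothetical "depth" attribute.
records = {pk: {"depth": pk * 10} for pk in range(10)}
first_keys, blocks = build_index(records, "depth", block_size=3)
store = BlockStore(blocks)
buffer = LRUBuffer(store, capacity=4)

print(range_query(first_keys, buffer, 25, 65))
print("block reads after first query:", store.reads)
print(range_query(first_keys, buffer, 25, 65))
print("block reads after repeated query:", store.reads)
```

In this sketch a query on `depth` touches only the leaf blocks whose value range can overlap the query, rather than scanning every key-value record, and repeating the query is served from the buffer without further block reads.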
This article is indexed in CNKI, Wanfang Data, and other databases.