
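Massive Structured Data Oriented Storage and Retrieve System
Cite this article: Wu Guangjun, Wang Shupeng, Chen Ming, Li Chao. Massive Structured Data Oriented Storage and Retrieve System[J]. Journal of Computer Research and Development, 2012(Z1): 1-5.
Authors: Wu Guangjun, Wang Shupeng, Chen Ming, Li Chao
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Beijing University of Posts and Telecommunications, Beijing 100876; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
Funding: National Natural Science Foundation of China (61003260); National High-Tech Research and Development Program of China (863 Program) (2009AA01A403, 2007AA010501, 2007AA01Z467, 2007AA01Z474)
Abstract: Big Data is a new type of data that has emerged in cloud computing in recent years, and traditional relational database systems can no longer meet its demands for storage scale and retrieval efficiency. Current distributed No-SQL (not only SQL) databases provide a distributed storage environment, but they cannot support multi-column queries. We design and implement MDSS, a distributed storage and retrieval system for massive structured data. MDSS uses a column-based storage structure and combines a global distributed B+ tree index with local indexes to improve retrieval efficiency. On this basis, we discuss a task decomposition mechanism for complex query conditions that supports multi-attribute queries, fuzzy queries, and statistical analysis over Big Data. Experimental results show that the proposed distributed structured data management techniques and query decomposition mechanism significantly improve query efficiency on large data sets in a distributed environment. MDSS is therefore suitable for storing massive structured data such as log data and streaming records.

Keywords: big data; Hadoop; data query; No-SQL DB; massive storage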
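The abstract only outlines how MDSS pairs a global index with local indexes and splits complex queries into per-node tasks. The Python sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. Its assumptions: a sorted list of per-node key boundaries searched with `bisect` stands in for the global distributed B+ tree; a per-column map from value to row ids stands in for the local index; fuzzy conditions are shell-style wildcards matched with `fnmatch`; and the names `Node`, `Cluster`, `route`, `select`, and `count` are invented for illustration.

```python
import bisect
import fnmatch
from collections import defaultdict


class Node:
    """Hypothetical storage node: one horizontal shard kept column by column."""

    def __init__(self, schema):
        self.schema = schema
        # Column-oriented layout: column name -> list of values, aligned by row id.
        self.columns = {c: [] for c in schema}
        # Local index (stand-in): column name -> value -> set of local row ids.
        self.local_index = {c: defaultdict(set) for c in schema}

    def append(self, row):
        rid = len(self.columns[self.schema[0]])
        for c in self.schema:
            self.columns[c].append(row[c])
            self.local_index[c][row[c]].add(rid)

    def run_subtask(self, exact, fuzzy, key, lo, hi):
        """Evaluate one sub-query on this node and return matching row ids."""
        rids = set(range(len(self.columns[key])))
        for col, val in exact.items():
            # Equality conditions are answered from the local index.
            rids &= self.local_index[col].get(val, set())
        for col, pattern in fuzzy.items():
            # Wildcard conditions scan only the single column they name.
            values = self.columns[col]
            rids = {r for r in rids if fnmatch.fnmatchcase(str(values[r]), pattern)}
        keys = self.columns[key]
        return {r for r in rids if lo <= keys[r] <= hi}


class Cluster:
    """Hypothetical coordinator holding the global index over the partition key."""

    def __init__(self, schema, key, boundaries):
        # boundaries[i] is the smallest key assigned to node i (ascending order);
        # a sorted list plus bisect stands in for the distributed B+ tree.
        self.key = key
        self.boundaries = boundaries
        self.nodes = [Node(schema) for _ in boundaries]

    def _node_for(self, k):
        return max(bisect.bisect_right(self.boundaries, k) - 1, 0)

    def insert(self, row):
        self.nodes[self._node_for(row[self.key])].append(row)

    def route(self, lo, hi):
        """Global-index step: nodes whose key range may overlap [lo, hi]."""
        return range(self._node_for(lo), self._node_for(hi) + 1)

    def select(self, lo, hi, exact=None, fuzzy=None):
        """Split the query into per-node subtasks, then merge matching rows."""
        out = []
        for i in self.route(lo, hi):
            node = self.nodes[i]
            for r in sorted(node.run_subtask(exact or {}, fuzzy or {}, self.key, lo, hi)):
                out.append({c: node.columns[c][r] for c in node.schema})
        return out

    def count(self, lo, hi, exact=None, fuzzy=None):
        """Statistics query: each node counts locally and the coordinator sums."""
        return sum(len(self.nodes[i].run_subtask(exact or {}, fuzzy or {}, self.key, lo, hi))
                   for i in self.route(lo, hi))


cluster = Cluster(["ts", "src", "proto"], key="ts", boundaries=[0, 100, 200])
for ts, src, proto in [(10, "10.0.0.1", "tcp"), (120, "10.0.3.7", "udp"),
                       (150, "192.168.1.2", "tcp"), (250, "10.0.9.9", "tcp")]:
    cluster.insert({"ts": ts, "src": src, "proto": proto})

print(cluster.count(0, 199, exact={"proto": "tcp"}, fuzzy={"src": "10.0.*"}))  # prints 1
print([r["ts"] for r in cluster.select(0, 300, exact={"proto": "tcp"})])       # prints [10, 150, 250]
```

In this sketch, routing on the partition key prunes whole nodes before any local work is done, so the query over `ts` in [0, 199] never touches the node holding `ts = 250`. Column storage means a fuzzy condition reads only the one column it names. Pushing the count down to each node keeps only small partial results flowing back to the coordinator.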

This article is indexed in CNKI and other databases.