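ArchDB: A High Reliable and High Performance Large-Scale Archived Stream Database
Citation: Du Kai, Fu Wei, Wang Huaimin, Yang Shuqiang. ArchDB: A High Reliable and High Performance Large-Scale Archived Stream Database[J]. Journal of Computer Research and Development, 2009, 46(Z2)
Authors: Du Kai, Fu Wei, Wang Huaimin, Yang Shuqiang
Affiliation: College of Computer, National University of Defense Technology, Changsha 410073
Funding: National Basic Research Program of China (973 Program); National Science Fund for Distinguished Young Scholars; Program for New Century Excellent Talents in University, Ministry of Education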
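Abstract: In many domains, such as scientific experiments, website security, and intranet auditing, monitoring online transactions or tracking user behavior generates large-scale archived stream data. These archival systems can reach the petabyte scale (10^15 bytes). Storing and analyzing structured data at this scale raises at least three challenges: 1) data reliability; 2) efficiently storing and analyzing high-rate, continuously generated stream data; and 3) the conflict between the goals of high performance and high reliability. Based on an analysis of the characteristics of archived stream data, we propose ArchDB, a new highly reliable, log-free database architecture. ArchDB consists of two parts: one loads and queries the smaller-scale current data, and the other stores and queries the large-scale historical archived data. ArchDB manages large-scale data efficiently and reliably by optimizing its data placement policy, its data block size and the timing of archiving, and a pipelined, parallel storage and archiving procedure. Experimental results show that ArchDB doubles data loading (insertion) performance and speeds up recovery by a factor that depends on the degree of recovery parallelism.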
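The abstract does not describe ArchDB's internal data structures, so the following is only a minimal Python sketch of the general two-part idea under stated assumptions. It assumes that records are (timestamp, payload) pairs, that a block is sealed and archived as soon as it holds `block_size` records, and that archived blocks can be restored independently and in equal time. `TwoTierStore`, `block_size`, and `recovery_rounds` are hypothetical names for illustration, not ArchDB's actual design or API.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List, Tuple

# Hypothetical stream record: (timestamp, payload).
Record = Tuple[int, str]


@dataclass
class TwoTierStore:
    """Hypothetical two-part layout: a small current store for loading and
    querying recent data, plus an append-only archive of sealed blocks."""
    block_size: int  # records per archived block (assumed tunable parameter)
    current: List[Record] = field(default_factory=list)
    archive: List[List[Record]] = field(default_factory=list)

    def insert(self, record: Record) -> None:
        # New stream records always land in the small current store.
        self.current.append(record)
        # Assumed archiving trigger: once a full block has accumulated,
        # seal it, move it to the historical archive, and start a new block.
        if len(self.current) >= self.block_size:
            self.archive.append(self.current)
            self.current = []

    def query(self, t_start: int, t_end: int) -> List[Record]:
        # A time-range query consults the archived blocks, then the current data.
        hits = [r for block in self.archive for r in block if t_start <= r[0] <= t_end]
        hits += [r for r in self.current if t_start <= r[0] <= t_end]
        return hits


def recovery_rounds(num_blocks: int, parallelism: int) -> int:
    """Idealized model: rounds needed to restore `num_blocks` independent,
    equal-cost blocks when `parallelism` blocks are restored concurrently."""
    return ceil(num_blocks / parallelism)
```

With `block_size=4`, inserting ten records leaves two sealed blocks in the archive and two records in the current store. Under the idealized recovery model, restoring 64 blocks takes 64 rounds serially but 8 rounds with a parallelism of 8, which illustrates the abstract's claim that the recovery speedup tracks the degree of recovery parallelism.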

Keywords: data reliability; archived stream; large-scale database
