HDFS optimization program based on GE coding
Cite this article: ZHU Yuanyuan, WANG Xiaojing. HDFS optimization program based on GE coding[J]. Journal of Computer Applications, 2013, 33(3): 730-733. DOI: 10.3724/SP.J.1087.2013.00730
Authors: ZHU Yuanyuan, WANG Xiaojing
Affiliation: Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
Funding: National 863 Program of China (2008AAO1Z402).
Abstract: To address the data disaster-recovery efficiency and small-file problems of the Hadoop Distributed File System (HDFS), a solution based on erasure coding was proposed. The scheme introduces the encoding and decoding modules of a new erasure code (the GE code): files in HDFS are encoded and split into a large number of slices, which are distributed uniformly at random across the cluster, replacing the multi-replica disaster-recovery strategy of the original HDFS. The method introduces the new concept of the slice: slices are classified and merged into blocks, and a two-level index over the slices is built to solve the small-file problem. The three-replica mechanism is abandoned; instead, when nodes in the cluster fail, the original data are recovered by collecting roughly any 70% of the slices related to the affected files. Experimental results on a cluster show that the method greatly improves HDFS in terms of disaster-recovery efficiency, the small-file problem, storage cost, and security.

Keywords: Hadoop Distributed File System (HDFS)   erasure code   data disaster recovery   two-level index
Received: 2012-09-17
Revised: 2012-10-26

HDFS optimization program based on GE coding
ZHU Yuanyuan, WANG Xiaojing. HDFS optimization program based on GE coding[J]. Journal of Computer Applications, 2013, 33(3): 730-733. DOI: 10.3724/SP.J.1087.2013.00730
Authors: ZHU Yuanyuan, WANG Xiaojing
Affiliation: Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China
Abstract: Concerning the data disaster-recovery efficiency and small-file problems of the Hadoop Distributed File System (HDFS), this paper presented an improved solution based on erasure coding, which introduced the encoding and decoding modules of the GE erasure code into HDFS. Unlike the multiple-replication strategy adopted by the original system, the modules encoded HDFS files into a great number of slices and spread them evenly across the storage cluster. The solution introduced the new concept of the slice: slices were classified and merged into blocks, and a secondary index over the slices was established to solve the small-file issue. In the case of node failures in the cluster, the original data could be recovered via decoding by collecting any 70% of the related slices. The solution also introduced dynamic replication strategies, dynamically creating and deleting replicas to keep the whole cluster well load-balanced and to settle hotspot issues. Experiments on the clustered storage system show the feasibility and advantages of the proposed solution.
Keywords:Hadoop Distributed File System (HDFS)   erasure code   data disaster recovery   secondary index
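Neither abstract specifies the internals of the GE code, only that files are encoded into many slices and that roughly any 70% of them suffice to rebuild the original data. The following is a minimal, self-contained sketch of that (k, n)-threshold behaviour using a textbook Reed-Solomon-style construction over the prime field GF(257); the function names and the choice k = 7, n = 10 are illustrative assumptions, not the paper's implementation.

```python
# Sketch of (k, n) erasure-coded slices with a ~70% recovery threshold.
# Assumption: the GE code's internals are not given in the abstract, so a
# textbook Reed-Solomon-style code over the prime field GF(257) stands in.
# Data bytes are the values of a degree-(k-1) polynomial at x = 0..k-1; the
# n slices hold its values at x = 0..n-1, so any k slices rebuild the data.

P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _interpolate(xs, ys, x0):
    """Value at x0 of the unique degree < len(xs) polynomial through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # divide via Fermat inverse
    return total


def encode(data, k, n):
    """Encode data into n slices; slice x holds the polynomial's values at point x."""
    padded = data + bytes((-len(data)) % k)              # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for g in range(0, len(padded), k):
        group = list(padded[g:g + k])                    # data bytes = values at x = 0..k-1
        for x in range(n):
            slices[x].append(group[x] if x < k else _interpolate(range(k), group, x))
    return slices


def decode(available, k, length):
    """Rebuild the original bytes from any k surviving slices, given as {x: values}."""
    xs = sorted(available)[:k]
    out = bytearray()
    for ys in zip(*(available[x] for x in xs)):          # one set of k points per data group
        for x in range(k):                               # original bytes were the values at x = 0..k-1
            out.append(_interpolate(xs, ys, x))
    return bytes(out[:length])


if __name__ == "__main__":
    data = b"a small HDFS file"
    slices = encode(data, k=7, n=10)                             # k/n = 70%, as in the abstract
    survivors = {x: slices[x] for x in (0, 2, 3, 5, 6, 8, 9)}    # any 7 of the 10 slices
    assert decode(survivors, k=7, length=len(data)) == data
```

With k = 7 and n = 10, any seven surviving slices (70% of them) reconstruct the file, matching the recovery threshold quoted in the abstracts.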
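The abstracts also state that slices are classified and merged into blocks, with a two-level (secondary) index over the slices relieving the small-file problem. The exact layout is not given, so the sketch below assumes a plausible one: a first-level map from file name to slice IDs and a second-level map from slice ID to (block, offset, length) inside a large packed block, so that many small files share one block instead of each occupying its own.

```python
# Sketch of a two-level (secondary) slice index for the small-file problem.
# Assumption: the paper's exact layout is not specified here; this sketch keeps a
# first-level map from file name to slice IDs and a second-level map from slice
# ID to (block_id, offset, length) inside a packed block, so metadata grows with
# the number of blocks rather than with the number of small files.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024        # pack slices into HDFS-sized 64 MB blocks


@dataclass
class SliceLocation:
    block_id: int
    offset: int
    length: int


@dataclass
class TwoLevelSliceIndex:
    file_index: Dict[str, List[int]] = field(default_factory=dict)       # level 1: file -> slice IDs
    slice_index: Dict[int, SliceLocation] = field(default_factory=dict)  # level 2: slice ID -> location
    _next_slice_id: int = 0
    _current_block: int = 0
    _block_fill: int = 0

    def add_file(self, name: str, slice_lengths: List[int]) -> None:
        """Register an encoded (possibly small) file whose slices are packed into blocks."""
        ids = []
        for length in slice_lengths:
            if self._block_fill + length > BLOCK_SIZE:   # current block full: open a new one
                self._current_block += 1
                self._block_fill = 0
            self.slice_index[self._next_slice_id] = SliceLocation(
                self._current_block, self._block_fill, length)
            ids.append(self._next_slice_id)
            self._next_slice_id += 1
            self._block_fill += length
        self.file_index[name] = ids

    def locate(self, name: str) -> List[Tuple[int, int, int]]:
        """Resolve a file to (block_id, offset, length) triples via the two lookups."""
        return [(loc.block_id, loc.offset, loc.length)
                for loc in (self.slice_index[s] for s in self.file_index[name])]


if __name__ == "__main__":
    index = TwoLevelSliceIndex()
    index.add_file("small-1.log", [4096] * 10)   # ten 4 KB slices from one small file
    print(index.locate("small-1.log")[0])        # -> (0, 0, 4096)
```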