
Study on Predictive Erasure Codes in Distributed Storage System
Citation: ZHANG Hang, TANG Dan, CAI Hong-liang. Study on Predictive Erasure Codes in Distributed Storage System[J]. Computer Science, 2021, 48(5): 130-139.
Authors: ZHANG Hang, TANG Dan, CAI Hong-liang
Affiliation: School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
Funding: Sichuan Science and Technology Program (20ZDYF1156); Major Special Project on Artificial Intelligence (2018GZDZX0030); Sichuan Science and Technology Achievement Transfer and Transformation Demonstration Project (2018CC0093).
Abstract: Erasure codes consume relatively little storage space while providing high data reliability, so they are widely adopted in distributed storage systems. However, the high cost of repairing data limits their application. To reduce this repair cost, researchers have studied block codes and regenerating codes extensively. Both are passive fault-tolerance approaches, however; for nodes that are prone to failure, active fault tolerance can reduce repair cost more effectively and better maintain system reliability. This paper therefore proposes a predictive erasure code with active fault tolerance, the Proactive basic-Pyramid (PPyramid) code. The PPyramid code uses a hard disk failure prediction method to adjust the association between redundant blocks and data blocks in the basic-Pyramid code, placing the hard disks that are predicted to fail soon in the same small group. During data repair, all read operations are then performed within that group, which reduces the number of data blocks read and lowers the repair cost. In a distributed storage system built on Ceph, the PPyramid code is compared with other commonly used erasure codes when repairing multiple hard disk failures. Experimental results show that, compared with the basic-Pyramid code, the PPyramid code reduces repair cost by 6.3%~34.9% and repair time by 7.6%~63.6%; compared with the LRC, pLRC, SHEC, and DLRC codes, it reduces repair cost by 8.6%~52% and repair time by 10.8%~52.4%. The PPyramid code is also flexible to construct and has strong practical application value.

Keywords: Distributed storage system; Hard disk failure; Data repair; Erasure codes; Failure prediction
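
The grouping idea in the abstract can be illustrated with a toy read-count model. The sketch below is not the paper's construction: it assumes one XOR local parity per group, ignores the global parity blocks that Pyramid-style codes also carry, and assumes failures are repaired one at a time while the rest of the group is still readable. All names (`xor_blocks`, `local_parities`, `repair`, `total_reads`) and the example layout are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def local_parities(data, groups):
    """One XOR parity per local group (a simplifying assumption)."""
    return [xor_blocks([data[i] for i in g]) for g in groups]

def repair(lost, data, groups, parities):
    """Rebuild one lost data block from its own group.

    Returns the rebuilt block and the number of blocks read.
    """
    gi = next(n for n, g in enumerate(groups) if lost in g)
    survivors = [data[i] for i in groups[gi] if i != lost]
    block = xor_blocks(survivors + [parities[gi]])
    return block, len(survivors) + 1  # surviving data blocks + local parity

def total_reads(failures, data, groups):
    """Total blocks read when the given disks fail and are repaired in turn."""
    parities = local_parities(data, groups)
    reads = 0
    for lost in failures:
        block, n = repair(lost, data, groups, parities)
        assert block == data[lost]  # the repair really recovers the block
        reads += n
    return reads

# Eight data blocks; disks 2 and 5 are assumed to be flagged by a predictor.
data = [bytes([i] * 4) for i in range(8)]
predicted = [2, 5]

uniform = [[0, 1, 2, 3], [4, 5, 6, 7]]   # grouping that ignores the prediction
predictive = [[2, 5], [0, 1, 3, 4, 6, 7]]  # predicted disks share a small group

print("uniform grouping reads:   ", total_reads(predicted, data, uniform))
print("predictive grouping reads:", total_reads(predicted, data, predictive))
```

Both layouts store two local parity blocks, so the storage overhead in this toy model is the same; only the number of blocks read per repair changes (8 versus 4 here). The figures reported in the abstract come from the authors' Ceph experiments and are not reproduced by this model.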
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).