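Analysis of Related Parameters of the Similarity-Locality Method
Cite this article: ZHANG Xing-yu, ZHANG Jian, XIN Ming-jun. Analysis of Related Parameters of the Similarity-Locality Method[J]. Computer Technology and Development, 2014(11):47-50.
Authors: ZHANG Xing-yu  ZHANG Jian  XIN Ming-jun
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Funding: Supported by the National Natural Science Foundation of China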
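Abstract: With the arrival of the big data era, the growing volume of backup data poses new challenges for storage space. Data deduplication is becoming increasingly popular in backup storage systems, but the large number of data accesses places a heavy load on the disk. To address the disk bottleneck in block index lookup that deduplication suffers from, this paper proposes combining file similarity with data stream locality to improve disk I/O performance. The method exploits the strengths of both: similarity optimizes index lookup and can detect duplicate data that identical-data detection cannot recognize, while data locality preserves the order of the data stream, which raises the cache hit rate and reduces the number of disk accesses. Storing the block index in a Bloom filter saves considerable lookup time and space overhead. The important parameters of the proposed solution, such as block size and segment size, and their effect on the false positive rate are analyzed in depth. Through experimental evaluation and performance analysis, the experimental data and results provide an important data basis for further system performance optimization.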

Keywords: data deduplication  similarity and locality  Bloom filter  disk bottleneck

Analysis of Related Parameters Based on Similarity-locality Approach
ZHANG Xing-yu, ZHANG Jian, XIN Ming-jun. Analysis of Related Parameters Based on Similarity-locality Approach[J]. Computer Technology and Development, 2014(11):47-50.
Authors: ZHANG Xing-yu  ZHANG Jian  XIN Ming-jun
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Abstract: With the arrival of the big data era, the growth of backup data brings new challenges to storage space. Data deduplication is becoming increasingly popular in backup storage systems, but the large number of data accesses places a heavy burden on the disk. To address the disk bottleneck of block index lookup, this paper proposes combining file similarity with data stream locality to improve disk I/O performance, an approach that exploits the advantages of both. Similarity optimizes index lookup and detects duplicate data that identical-data detection cannot recognize. Locality preserves the order of the data stream, which improves the cache hit rate and reduces disk accesses. Storing the block index in a Bloom filter saves considerable lookup time and space. The key parameters of the solution, such as block size and segment size, and their influence on the false positive rate are analyzed in depth. Through experimental evaluation and performance analysis, the experimental data and results provide an important basis for further system performance optimization.
Keywords: data deduplication technique  similarity-locality  Bloom filter  disk bottleneck
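
The abstract relates block size to the Bloom filter's false positive rate without giving figures. As a rough illustration only, the sketch below uses the standard Bloom filter approximation p = (1 - e^(-kn/m))^k to show why smaller blocks (more index entries n for the same filter size m) raise the false positive rate. The data volume, filter size, hash count, and all function names are hypothetical and are not taken from the paper.

```python
import math

def bloom_false_positive_rate(n_items: int, m_bits: int, k_hashes: int) -> float:
    """Standard approximation p = (1 - exp(-k*n/m))**k for a Bloom filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def blocks_in_backup(total_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed to cover the data (ceiling division)."""
    return -(-total_bytes // block_bytes)

# Hypothetical values chosen for illustration, not reported in the paper.
TOTAL_BYTES = 1 << 40        # 1 TiB of unique backup data
M_BITS = 8 * (1 << 30)       # Bloom filter occupying 1 GiB of memory
K_HASHES = 4                 # number of hash functions

def rate_for_block_size(block_bytes: int) -> float:
    """False positive rate when every block fingerprint is inserted once."""
    n = blocks_in_backup(TOTAL_BYTES, block_bytes)
    return bloom_false_positive_rate(n, M_BITS, K_HASHES)

if __name__ == "__main__":
    for kib in (4, 8, 16, 32):
        rate = rate_for_block_size(kib * 1024)
        print(f"block size {kib:>2} KiB -> false positive rate {rate:.3e}")
```

Under these assumptions, halving the block size doubles the number of fingerprints stored in the filter, so the false positive rate rises unless the filter is enlarged accordingly.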
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).