
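Fingerprint Search Optimization for Deduplication on Emerging Storage Devices
Cite this article: He Kewen, Zhang Jiachen, Liu Xiaoguang, Wang Gang. Fingerprint Search Optimization for Deduplication on Emerging Storage Devices[J]. Journal of Computer Research and Development, 2020, 57(2): 269-280. DOI: 10.7544/issn1000-1239.2020.20190543
Authors: He Kewen  Zhang Jiachen  Liu Xiaoguang  Wang Gang
Affiliation: 1.(College of Computer Science, Nankai University, Tianjin 300350) (Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350) (hekw@nbjl.nankai.edu.cn)
Funding: Natural Science Foundation of Tianjin; National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities; National Science and Technology Major Project
Abstract: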
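Fingerprint lookup is an I/O-intensive workload, so the performance of the external storage device is the bottleneck of fingerprint lookup. This paper therefore focuses on the fingerprint lookup stage of deduplication systems. It compares the traditional eager fingerprint lookup algorithm with the lazy fingerprint lookup algorithm, which aims to reduce the number of disk accesses, analyzes how the two methods perform on two emerging storage devices, the Optane solid state drive (Optane SSD) and persistent memory (PM), and gives optimization suggestions. By modeling the time taken by the eager and lazy fingerprint lookup algorithms, the analysis yields three optimization conclusions for fingerprint lookup on emerging storage devices: 1) the number of fingerprints looked up in one batch should be reduced; 2) on faster devices the locality ring size used in lazy fingerprint lookup should be reduced, and an optimal locality ring size exists; 3) on fast devices, eager fingerprint lookup outperforms lazy fingerprint lookup. Finally, experiments on a real hard disk drive (HDD), an Optane SSD, and a PM emulator verify the correctness of the model. The results show that fingerprint lookup time on the fast devices is more than 90% lower than on the HDD, that the eager algorithm outperforms the lazy algorithm there, and that the optimal locality ring size shifts toward smaller values, which agrees with the theoretical results of the model.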

Keywords: deduplication  persistent memory  fingerprint index  emerging storage devices  data spatial locality

Fingerprint Search Optimization for Deduplication on Emerging Storage Devices
He Kewen, Zhang Jiachen, Liu Xiaoguang, Wang Gang. Fingerprint Search Optimization for Deduplication on Emerging Storage Devices[J]. Journal of Computer Research and Development, 2020, 57(2): 269-280. DOI: 10.7544/issn1000-1239.2020.20190543
Authors:He Kewen  Zhang Jiachen  Liu Xiaoguang  Wang Gang
Affiliation:1.(College of Computer Science, Nankai University, Tianjin 300350) (Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350)
Abstract:
Fingerprint search is I/O intensive, so the performance of the external storage device is its bottleneck. This paper therefore focuses on the fingerprint search stage of data deduplication systems. It compares the traditional eager fingerprint search algorithm with the lazy fingerprint search algorithm, which reduces the number of disk accesses, studies both on two emerging storage devices, Optane SSD and persistent memory, and gives optimization suggestions. We model the fingerprint search delay of the eager and lazy algorithms, and the model yields three conclusions for emerging storage devices: 1) the number of fingerprints searched in one batch should be reduced; 2) the locality ring size should be reduced on faster devices, and an optimal locality ring size exists; 3) on fast devices, eager fingerprint search outperforms lazy fingerprint search. Finally, experiments on a real HDD, an Optane SSD, and emulated persistent memory verify the correctness of the model. On the emerging storage devices the eager algorithm outperforms the lazy algorithm, and the optimal locality ring size shifts toward smaller values, which is broadly consistent with the conclusions of the proposed model.
Keywords:deduplication  persistent memory  fingerprint index  emerging storage device  data spatial locality
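
The difference between eager and lazy fingerprint search described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the bucketed on-device index, the names `OnDeviceIndex`, `eager_lookup`, `lazy_lookup`, and the parameters `NUM_BUCKETS` and `batch_size` are all assumptions made for illustration, and the locality ring mechanism is omitted. The sketch only counts simulated device reads; it does not model device latency.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical number of on-device index buckets


def fingerprint(chunk: bytes) -> bytes:
    """SHA-1 digest used as the chunk fingerprint (an assumption)."""
    return hashlib.sha1(chunk).digest()


def bucket_of(fp: bytes) -> int:
    """Map a fingerprint to an index bucket using its first 4 bytes."""
    return int.from_bytes(fp[:4], "big") % NUM_BUCKETS


class OnDeviceIndex:
    """Toy stand-in for a fingerprint index on external storage.

    Every read_bucket call counts as one device access.
    """

    def __init__(self) -> None:
        self.buckets = defaultdict(set)
        self.reads = 0

    def insert(self, fp: bytes) -> None:
        self.buckets[bucket_of(fp)].add(fp)

    def read_bucket(self, bucket: int) -> set:
        self.reads += 1
        return self.buckets[bucket]


def eager_lookup(index: OnDeviceIndex, fps: list) -> list:
    """Look up each fingerprint as it arrives: one bucket read each."""
    return [fp in index.read_bucket(bucket_of(fp)) for fp in fps]


def lazy_lookup(index: OnDeviceIndex, fps: list, batch_size: int) -> list:
    """Buffer batch_size fingerprints, then read each needed bucket once."""
    results = [False] * len(fps)
    for start in range(0, len(fps), batch_size):
        pending = defaultdict(list)  # bucket -> positions waiting on it
        for pos in range(start, min(start + batch_size, len(fps))):
            pending[bucket_of(fps[pos])].append(pos)
        for bucket, positions in pending.items():
            stored = index.read_bucket(bucket)  # one read serves the group
            for pos in positions:
                results[pos] = fps[pos] in stored
    return results


def build_index() -> OnDeviceIndex:
    """Index holding the fingerprints of synthetic chunks 0..99."""
    index = OnDeviceIndex()
    for i in range(100):
        index.insert(fingerprint(f"chunk-{i}".encode()))
    return index


# Query chunks 50..149: the first half is already stored, the rest is new.
queries = [fingerprint(f"chunk-{i}".encode()) for i in range(50, 150)]

eager_index = build_index()
eager_hits = eager_lookup(eager_index, queries)

lazy_index = build_index()
lazy_hits = lazy_lookup(lazy_index, queries, batch_size=100)

print("eager device reads:", eager_index.reads)
print("lazy device reads:", lazy_index.reads)
```

Both functions return the same duplicate flags; they differ only in how many device reads they issue. Batching pays off when each read is expensive, as on an HDD. The abstract's conclusions suggest that this advantage shrinks on Optane SSD and PM, where a read is cheap.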
This article is indexed by Wanfang Data and other databases.