
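Deduplication model based on file-similarity clustering
Cite this article: WANG Can, QIN Zhi-guang, WANG Juan, CAI Bo. Deduplication model based on file-similarity clustering[J]. Application Research of Computers (计算机应用研究), 2012, 29(5): 1684-1689.
Authors: WANG Can (王灿)  QIN Zhi-guang (秦志光)  WANG Juan (王娟)  CAI Bo (蔡博)
Affiliations: 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731; Network and Data Security Key Laboratory of Sichuan Province, Chengdu 611731
2. School of Network Engineering, Chengdu University of Information Technology, Chengdu 610225
Funding: Cultivation Fund of the Ministry of Education (708078); National Natural Science Foundation of China (60873075, 60973118)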
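Abstract: To address the locality dependence and multi-node dependence of existing methods for improving the throughput of deduplication systems, a deduplication model based on file-similarity clustering is proposed. The model extends the traditional flat index structure into a spatial structure and, following Broder's theorem, keeps only a small number of the most representative index entries resident in memory; it also partitions the index horizontally and distributes the partitions across multiple fully autonomous nodes. Experimental results show that the method effectively improves deduplication performance and average throughput in large-scale cloud storage environments and that the data load is balanced across nodes, so the model is highly scalable.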

Keywords: cloud storage  deduplication  throughput  file-similarity clustering  load balancing
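
For reference, the result usually cited as Broder's theorem can be written as below. The notation is introduced here for illustration only, since the abstract defines no symbols: $S_1$ and $S_2$ are assumed to be the sets of chunk fingerprints of two files, and $h$ is assumed to be drawn from a min-wise independent family of hash functions.

```latex
\Pr\bigl[\min h(S_1) = \min h(S_2)\bigr] \;=\; \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}
```

Under these assumptions, two highly similar files are likely to share the same minimum fingerprint, which is one way to read the abstract's claim that a few representative index entries can stand in for the full index.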

Deduplication model based on file-similarity clustering
WANG Can, QIN Zhi-guang, WANG Juan, CAI Bo. Deduplication model based on file-similarity clustering[J]. Application Research of Computers, 2012, 29(5): 1684-1689.
Authors:WANG Can  QIN Zhi-guang  WANG Juan  CAI Bo
Affiliation:1. School of Computer Science & Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China; 2. Network & Data Security Key Laboratory of Sichuan Province, Chengdu 611731, China; 3. School of Network Engineering, Chengdu University of Information Technology, Chengdu 610225, China
Abstract: To resolve the locality-dependence and multi-node-dependence problems of current methods for improving the throughput of deduplication systems, this paper proposes a deduplication model based on file-similarity clustering. The model extends the traditional flat index structure into a spatial structure and, according to Broder's theorem, keeps only a small number of the most representative indices in RAM. It also partitions the index horizontally and distributes the partitions across several fully autonomous storage nodes. The experimental results indicate that the model effectively improves deduplication performance and average throughput in large-scale cloud storage environments, and that the data load is balanced across nodes. The model is therefore highly scalable.
Keywords:cloud-storage  deduplication  throughput  file-similarity clustering  load balancing
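
The two ideas named in the abstract, choosing a few representative fingerprints via Broder's theorem and partitioning the index horizontally across nodes, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: fixed-size chunking, SHA-1 fingerprints, the modulo routing rule, and every function name here are assumptions made for the example.

```python
import hashlib


def chunk_fingerprints(data, chunk_size=8):
    """Split data into fixed-size chunks and return the set of SHA-1 hex digests.

    Fixed-size chunking is an assumption; the abstract does not say how files are chunked.
    """
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two non-empty fingerprint sets."""
    return len(a & b) / len(a | b)


def representative_ids(fingerprints, k=1):
    """Keep only the k smallest fingerprints as hypothetical in-memory representatives."""
    return sorted(fingerprints)[:k]


def route_to_node(rep_id, num_nodes):
    """Map a representative fingerprint to one of num_nodes index partitions.

    Modulo routing is an assumed placement rule, chosen only for illustration.
    """
    return int(rep_id, 16) % num_nodes


def minhash_match_rate(a, b, trials=200):
    """Fraction of salted hash functions under which a and b share the same minimum.

    By Broder's theorem this fraction estimates the Jaccard similarity of a and b.
    """
    def min_hash(fps, salt):
        return min(hashlib.sha1(f"{salt}:{fp}".encode()).hexdigest() for fp in fps)

    hits = sum(min_hash(a, s) == min_hash(b, s) for s in range(trials))
    return hits / trials


if __name__ == "__main__":
    file_a = b"".join(bytes([i]) * 8 for i in range(20))
    file_b = file_a[:-8] + bytes([255]) * 8  # same as file_a except the last chunk
    fa, fb = chunk_fingerprints(file_a), chunk_fingerprints(file_b)
    print("Jaccard similarity:", jaccard(fa, fb))
    print("Estimated match rate:", minhash_match_rate(fa, fb))
    print("Node for file_a:", route_to_node(representative_ids(fa)[0], 4))
```

Because routing depends only on the representative fingerprint, each node can look up its own partition without consulting the others, which is one plausible reading of "fully autonomous" nodes; the paper itself may use a different placement rule.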
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.