Rapid Replica Copy Algorithm Based on Popularity in Hadoop
Citation: ZHANG Qian, ZHENG Quan, WANG Song. Rapid Replica Copy Algorithm Based on Popularity in Hadoop[J]. Computer Systems & Applications, 2015, 24(9): 146-151.
Authors: ZHANG Qian, ZHENG Quan, WANG Song
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Foundation item: National Natural Science Foundation of China (61174062)
Abstract: In a cloud storage center, the loss of file block replicas caused by node failures degrades not only system reliability but also the efficiency of concurrent access to files. The default replica copy method in Hadoop has several shortcomings: data transfer concentrates on a few DataNodes, the workload is unbalanced, and disk I/O throughput is low. To address these problems, this paper proposes a rapid replica copy algorithm based on popularity. The algorithm replicates the most popular (hottest) blocks first and selects the source and destination DataNodes of each copy so as to spread the load. Simulation results show that the algorithm balances the system workload, improves disk I/O throughput, and significantly reduces the average response time of user requests.

Keywords: cloud storage; node failure; Hadoop; replica copy; popularity
Received: 2014-12-30    Revised: 2015-02-02
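
The core idea in the abstract (replicate the hottest lost blocks first, and pick lightly loaded source and destination DataNodes for each copy) can be sketched as a small scheduling routine. The Python sketch below is illustrative only and is not the paper's implementation; the class names, fields, and the max_transfers_per_node limit are assumptions introduced for the example.

# Illustrative sketch (not the paper's algorithm as published): schedule
# replica copies for lost blocks, hottest blocks first, while spreading
# transfers across lightly loaded DataNodes. All names are assumptions.
import heapq
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    active_transfers: int = 0      # copy tasks currently assigned to this node
    free_space: int = 0            # bytes available
    blocks: set = field(default_factory=set)


@dataclass(order=True)
class LostBlock:
    neg_popularity: int            # negated access count, so the hottest block pops first
    block_id: str = field(compare=False)
    sources: list = field(compare=False, default_factory=list)  # surviving replica holders


def schedule_copies(lost_blocks, datanodes, max_transfers_per_node=2):
    """Return (block_id, source, destination) copy tasks, hottest blocks first."""
    heap = list(lost_blocks)
    heapq.heapify(heap)            # min-heap on neg_popularity == max-heap on popularity
    tasks = []
    while heap:
        blk = heapq.heappop(heap)
        # Source: a surviving replica holder with the fewest active transfers.
        srcs = [dn for dn in blk.sources
                if dn.active_transfers < max_transfers_per_node]
        if not srcs:
            continue               # defer this block to a later scheduling round
        src = min(srcs, key=lambda dn: dn.active_transfers)
        # Destination: a node without this block, lightly loaded, with the most free space.
        dests = [dn for dn in datanodes
                 if blk.block_id not in dn.blocks
                 and dn.active_transfers < max_transfers_per_node]
        if not dests:
            continue
        dst = min(dests, key=lambda dn: (dn.active_transfers, -dn.free_space))
        src.active_transfers += 1  # charge both endpoints so work cannot pile up
        dst.active_transfers += 1
        tasks.append((blk.block_id, src.name, dst.name))
    return tasks

In this sketch the hottest under-replicated block is always dispatched first, and both endpoints of a transfer are charged one unit of load, which is what keeps copy traffic from concentrating on a few DataNodes.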
