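A small-file access optimization method for the Hadoop Distributed File System
Cite this article:LI Tie, YAN Cairong, HUANG Yongfeng, SONG Yalong. A small-file access optimization method for the Hadoop Distributed File System[J]. Journal of Computer Applications (计算机应用), 2014, 34(11): 3091-3095.
Authors:LI Tie  YAN Cairong  HUANG Yongfeng  SONG Yalong
Affiliation:School of Computer Science and Technology, Donghua University, Shanghai 201620
Funding:National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities; Natural Science Foundation of Shanghai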
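Abstract:To improve the efficiency with which the Hadoop Distributed File System (HDFS) handles small files, this paper proposes SmartFS, an intelligent small-file access optimization method for HDFS. SmartFS analyzes small-file access logs to learn user access behavior and builds a file-association probability model. A merging algorithm based on these file associations then assembles small files into large files before they are stored in HDFS. When a file is read from HDFS, a prefetching algorithm based on the same associations improves access efficiency, and a prefetch-based cache replacement algorithm manages the cache space, which raises the file hit rate. Experimental results show that SmartFS effectively reduces the metadata space used by the NameNode in HDFS, reduces the number of interactions between users and HDFS, and improves the storage efficiency and access speed of small files.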

Keywords:Hadoop Distributed File System  small file  file association  prefetching  caching
Received:2014-07-18
Revised:2014-07-30

Optimization of small files storage and accessing on Hadoop distributed file system
LI Tie, YAN Cairong, HUANG Yongfeng, SONG Yalong. Optimization of small files storage and accessing on Hadoop distributed file system[J]. Journal of Computer Applications, 2014, 34(11): 3091-3095.
Authors:LI Tie  YAN Cairong  HUANG Yongfeng  SONG Yalong
Affiliation:School of Computer Science and Technology, Donghua University, Shanghai 201620, China
Abstract:To improve the efficiency of processing small files in the Hadoop Distributed File System (HDFS), an approach named SmartFS was proposed. SmartFS analyzed the file access log to obtain users' access behavior and established a probability model of file associations. A merging algorithm used this model to merge related small files into large files, which were then stored on HDFS. When a file was accessed, SmartFS prefetched its related files according to a prefetching algorithm to speed up access. To guarantee enough cache space, a cache replacement algorithm was put forward. The experimental results show that SmartFS reduces the metadata space used by the NameNode in HDFS, reduces the number of interactions between users and HDFS, and improves the storage and access efficiency of small files on HDFS.
Keywords:Hadoop Distributed File System (HDFS)  small file  file relation  prefetching  caching
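
The abstract names three steps (an association model built from access logs, association-based merging, and association-based prefetching) but gives no formulas. The following is only a minimal sketch of how such a pipeline could look, not the SmartFS algorithms themselves. It assumes that association is estimated as the first-order frequency with which one file is accessed immediately after another, and that a fixed threshold decides what to merge or prefetch. All function names, the threshold value, and the sample log are hypothetical.

```python
from collections import Counter, defaultdict


def association_probabilities(sessions):
    """Estimate P(b is accessed right after a) from per-user access sequences.

    Assumption: a first-order "follows" frequency stands in for the paper's
    file-association probability model, which the abstract does not define.
    """
    follow = defaultdict(Counter)  # follow[a][b] = times b came right after a
    total = Counter()              # total[a] = times a was followed by another file
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                follow[a][b] += 1
                total[a] += 1
    return {a: {b: n / total[a] for b, n in nxt.items()}
            for a, nxt in follow.items()}


def prefetch_candidates(probs, name, threshold=0.5):
    """Files whose estimated probability of following `name` meets the threshold,
    most likely first (ties broken by file name)."""
    nxt = probs.get(name, {})
    return sorted((b for b, p in nxt.items() if p >= threshold),
                  key=lambda b: (-nxt[b], b))


def merge_groups(probs, files, threshold=0.5):
    """Group files connected by an association >= threshold (union-find).

    Each group is a candidate set of small files to pack into one large file.
    """
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, nxt in probs.items():
        for b, p in nxt.items():
            if p >= threshold and a in parent and b in parent:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in files:
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical access log: one list of file names per user session.
sessions = [
    ["a.jpg", "b.jpg", "c.jpg"],
    ["a.jpg", "b.jpg"],
    ["a.jpg", "d.jpg"],
    ["c.jpg", "d.jpg"],
]
probs = association_probabilities(sessions)
print(prefetch_candidates(probs, "a.jpg"))
print(merge_groups(probs, ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]))
```

Grouping by connected components is the simplest choice here. A real system would also need to cap the merged file size and to maintain an index from each small file to its offset in the merged file, neither of which is modeled in this sketch.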
This article is indexed in CNKI, Wanfang Data, and other databases.