
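An Access Optimization Method for Massive Small Files in HDFS*
Cite this article: Sun Yu-qiang, Wang Wen-wen, Gu Yu-wan. An Access Optimization Method for Massive Small Files in HDFS*[J]. Application Research of Computers, 2017, 34(8)
Authors: Sun Yu-qiang  Wang Wen-wen  Gu Yu-wan
Affiliation: Changzhou University, Changzhou University, Changzhou University
Funding: National Natural Science Foundation (11271057); Jiangsu Province Graduate Research and Innovation Program for Regular Higher Education Institutions (SCZ1412800004).
Abstract: To address problems such as the NameNode memory bottleneck that HDFS (Hadoop Distributed File System) encounters when storing massive numbers of small files, and to improve the efficiency with which HDFS handles them, this paper proposes an access optimization scheme based on small-file merging and prefetching. First, a large volume of historical access logs for small files is analyzed to obtain the correlations among the files. Correlated small files are then merged into large files according to these correlations before being stored in HDFS. When data is read from HDFS, the files the user is most likely to access next are prefetched according to the correlations among files, which reduces the number of client requests to the NameNode and improves the file hit rate and processing speed. Experimental results show that the method effectively improves Hadoop's efficiency in storing and accessing small files and reduces the memory usage of the NameNode.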

Keywords: massive small files  file correlation  merging  prefetching
Received: 2016-08-19
Revised: 2017-04-14
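
The correlation analysis and merging steps summarized in the abstract can be illustrated with a small sketch. The abstract does not state how correlation is measured or how merge groups are formed, so everything here is an assumption made for illustration: correlation is approximated by counting how often two files appear in the same access session, pairs at or above a hypothetical `min_count` threshold are linked, and linked files are grouped with union-find. The function names and the sample log are hypothetical, and no HDFS API is involved.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(sessions):
    """Count how often each unordered pair of files shares a session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def merge_groups(sessions, min_count=2):
    """Group files linked by a co-access count of at least min_count.

    Assumption: a simple threshold plus union-find; the paper's actual
    correlation measure and grouping rule are not given in the abstract.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every file so uncorrelated files form singleton groups.
    for session in sessions:
        for name in session:
            find(name)

    for (a, b), n in co_access_counts(sessions).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical access log, already split into per-user sessions.
sessions = [
    ["a.txt", "b.txt", "c.txt"],
    ["a.txt", "b.txt"],
    ["c.txt", "d.txt"],
    ["d.txt", "c.txt", "e.txt"],
]
groups = merge_groups(sessions, min_count=2)
print(groups)
```

Each resulting group would be a candidate for merging into one large file, so the NameNode would track one entry per group instead of one per small file.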

An optimization of massive small files storage and accessing on HDFS
Sun Yu-qiang,Wang Wen-wen and Gu Yu-wan. An optimization of massive small files storage and accessing on HDFS[J]. Application Research of Computers, 2017, 34(8)
Authors:Sun Yu-qiang  Wang Wen-wen  Gu Yu-wan
Affiliation:School of Information Science Engineering, ChangZhou University
Abstract:To solve the NameNode memory bottleneck that arises when HDFS (Hadoop Distributed File System) stores a massive number of small files, and to improve the efficiency of HDFS, this paper proposes an optimization of storage and access for massive small files on HDFS. First, the relationships between small files are obtained by analyzing a large number of historical access logs, and the correlated small files are then merged into a large file that is stored on HDFS. When the client reads data from HDFS, the system prefetches the related files that are most likely to be visited next, according to the relevance of the small files, which reduces the number of requests to the NameNode and thereby increases the hit rate and processing speed. The experimental results show that this method effectively improves the efficiency of storing and accessing massive small files on HDFS and reduces the memory utilization of the NameNode.
Keywords:massive small files   relationship between files   merge   prefetch
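
The prefetching idea in the abstract can likewise be sketched in a few lines. This is a toy model under stated assumptions, not the paper's implementation: `store` is a plain dictionary standing in for HDFS, `related` is a hypothetical precomputed map from each file to its correlated files, `top_k` is an illustrative parameter, and `remote_requests` merely counts simulated round trips that a real client would make to the NameNode.

```python
class PrefetchingReader:
    """Toy client-side cache; `store` stands in for HDFS, not a real client."""

    def __init__(self, store, related, top_k=2):
        self.store = store        # file name -> bytes (hypothetical backing store)
        self.related = related    # file name -> related names, most likely first
        self.top_k = top_k        # how many related files to prefetch per miss
        self.cache = {}
        self.remote_requests = 0  # simulated round trips to the NameNode

    def read(self, name):
        if name in self.cache:
            return self.cache[name]  # hit: no remote request needed
        # Miss: one simulated request fetches the file together with its
        # top-k related files.
        self.remote_requests += 1
        wanted = [name] + self.related.get(name, [])[: self.top_k]
        for candidate in wanted:
            if candidate in self.store:
                self.cache[candidate] = self.store[candidate]
        return self.cache[name]  # raises KeyError if the file does not exist


store = {"a.txt": b"A", "b.txt": b"B", "c.txt": b"C"}
related = {"a.txt": ["b.txt"], "b.txt": ["a.txt"]}
reader = PrefetchingReader(store, related)
data = [reader.read(n) for n in ["a.txt", "b.txt", "c.txt", "a.txt"]]
print(reader.remote_requests)
```

In this example the four reads cost two simulated requests instead of three, because reading `a.txt` also brings `b.txt` into the cache.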