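Optimization Strategy for Small-File Storage and Reading Based on HDFS
Cite this article: ZHANG Hai, MA Jian-Hong. Optimization strategy for small-file storage and reading based on HDFS[J]. Computer Systems & Applications, 2014, 23(5): 167-171.
Authors: ZHANG Hai  MA Jian-Hong
Affiliation: School of Computer Science and Software, Hebei University of Technology, Tianjin 300401, China; School of Computer Science and Software, Hebei University of Technology, Tianjin 300401, China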
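Abstract: This paper studies the HDFS distributed file system in depth. HDFS is highly efficient when large files are accessed as streams, but it is comparatively inefficient at storing and retrieving massive numbers of small files. To address this problem, the paper proposes a small-file merging strategy based on a relational database. First, a user file is created for each user. Second, when a user uploads a small file, the file's metadata is stored in the relational database and the file itself is appended to that user's file. Finally, when the user reads a small file, it is read directly as a stream using the stored metadata. In addition, when a user reads a file smaller than one file block, a DataNode load-balancing strategy is applied: the DataNode that stores the data sends it directly to the client, which reduces pressure on the main server and improves transfer efficiency. Experimental results show that the scheme effectively addresses HDFS's weak support for storing and retrieving large numbers of small files and improves HDFS read and write performance on massive small files. The scheme is suited to cloud storage systems that hold massive numbers of small files; it lowers NameNode memory consumption and improves file read and write efficiency.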
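The merge strategy in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a local append-only file stands in for the per-user file on HDFS, an in-memory SQLite table stands in for the relational database, and the class name `SmallFileStore`, the table `file_meta`, and its column names are all hypothetical. It only shows the idea of recording an offset and length at upload time and using them to read a small file back with a single seek.

```python
import os
import sqlite3
import tempfile


class SmallFileStore:
    """Toy model: one append-only merged file per user plus a metadata table.

    Assumption: local files and SQLite replace HDFS and the paper's
    relational database; all names here are illustrative only.
    """

    def __init__(self, root, db_path=":memory:"):
        self.root = root
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS file_meta ("
            " owner TEXT, file_name TEXT,"
            " start_offset INTEGER, byte_length INTEGER,"
            " PRIMARY KEY (owner, file_name))"
        )

    def _user_file(self, owner):
        # Each user gets exactly one merged file.
        return os.path.join(self.root, owner + ".merged")

    def upload(self, owner, file_name, data):
        # Append the small file to the user's merged file and remember
        # where it starts. Duplicate names and crash recovery are not handled.
        with open(self._user_file(owner), "ab") as f:
            start = f.seek(0, os.SEEK_END)
            f.write(data)
        with self.db:  # commits the metadata row on success
            self.db.execute(
                "INSERT INTO file_meta VALUES (?, ?, ?, ?)",
                (owner, file_name, start, len(data)),
            )
        return start

    def read(self, owner, file_name):
        # Look up (offset, length) and read just that byte range.
        row = self.db.execute(
            "SELECT start_offset, byte_length FROM file_meta"
            " WHERE owner = ? AND file_name = ?",
            (owner, file_name),
        ).fetchone()
        if row is None:
            raise FileNotFoundError(file_name)
        start, length = row
        with open(self._user_file(owner), "rb") as f:
            f.seek(start)
            return f.read(length)
```

In this sketch the file system holds one object per user regardless of how many small files that user uploads, which is the property the abstract credits with lowering NameNode memory consumption.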

Keywords: HDFS  small-file optimization  file merging  load balancing  cloud storage
Received: 2013-10-04
Revised: 2013-10-29

Optimizational Strategy of Small Files Stored and Readed on HDFS
ZHANG Hai and MA Jian-Hong. Optimizational Strategy of Small Files Stored and Readed on HDFS[J]. Computer Systems & Applications, 2014, 23(5): 167-171.
Authors: ZHANG Hai and MA Jian-Hong
Affiliation: Computer Science and Software Engineering, Hebei University of Technology, Tianjin 300401, China; Computer Science and Software Engineering, Hebei University of Technology, Tianjin 300401, China
Abstract: This paper presents an in-depth study of the HDFS distributed file system. Streaming reads and writes of large files are very efficient in HDFS, but reading and writing massive numbers of small files is relatively inefficient. To address this problem, the paper presents a small-file consolidation strategy based on a relational database. First, a user file is created for each user. Then, when a user uploads a small file, the file's metadata is saved to the relational database and the file's contents are written to the user file. Finally, the user reads a small file in streaming mode according to its metadata. When a user reads a file smaller than the file block size, a DataNode load-balancing strategy is applied: the DataNode that stores the data transfers it directly, which reduces the pressure on the main server and improves file transfer efficiency. The experimental results show that the scheme overcomes HDFS's shortcomings in reading and writing small files and improves the read and write performance of the HDFS file system on massive numbers of small files. The scheme is applicable to cloud storage systems with massive numbers of small files, and it reduces NameNode memory consumption while improving file read and write efficiency.
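The abstract does not spell out how the DataNode is chosen, so the following is only one plausible reading, sketched under stated assumptions. Given a small file's offset and length inside the merged user file, it works out which blocks the byte range touches and, among the nodes holding a replica, picks the one serving the fewest reads. The function names, the `active_reads` load map, and the node labels are hypothetical; the 64 MiB block size is used only as an illustrative value (it was the Hadoop 1.x default).

```python
def blocks_spanned(offset, length, block_size):
    """Return the indices of the blocks covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def pick_datanode(replicas, active_reads):
    """Pick the replica holder with the fewest reads in progress.

    Assumption: `active_reads` maps a node name to its current read count;
    nodes missing from the map count as idle. Ties go to the first replica.
    """
    return min(replicas, key=lambda node: active_reads.get(node, 0))


BLOCK_SIZE = 64 * 1024 * 1024  # illustrative value, not taken from the paper
```

A small file that lies entirely inside one block yields a single index from `blocks_spanned`, so a single DataNode can serve the whole read; a file appended near a block boundary can still straddle two blocks even though it is smaller than one block.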
Keywords: HDFS  optimization of small files  merge files  load balance  cloud storage
This article is indexed in CNKI and other databases.