
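Efficiency of Storaging Small Files in HDFS Based on MapFile
Citation: HONG Xu-Sheng, LIN Shi-Ping. Efficiency of Storaging Small Files in HDFS Based on MapFile [J]. Computer Systems & Applications (计算机系统应用), 2012, 21(11): 179-182.
Authors: HONG Xu-Sheng, LIN Shi-Ping
Affiliation: School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
Abstract: HDFS was originally developed for streaming access to large files, and it stores large numbers of small files inefficiently. To address this problem, the paper uses MapFile to design a scheme for storing small files in HDFS. The main idea is to add a file-type judgment module to the HDFS upload path and to maintain a queue of small files. The queued files are serialized into a MapFile container and merged into one large file, and a corresponding index file is built, which reduces the number of files and improves access efficiency. In an experimental comparison with the existing Hadoop Archives (HAR files) approach to the small-file problem, the MapFile-based scheme improves small-file storage performance more effectively and reduces node memory consumption in the HDFS file system.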

Keywords: HDFS  small files  MapFile  SequenceFile  cloud storage
Received: 2012/3/28
Revised: 2012/5/1
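
The abstract's claim about node memory can be made concrete with a rough estimate. The sketch below assumes that "node" refers to the NameNode, which keeps metadata for every file and block in memory, and it uses the commonly quoted rule of thumb of roughly 150 bytes per file or block object. That figure, the 64 MB block size, and the workload of one million 10 KB files are illustrative assumptions, not values taken from the paper.

```python
import math

# Assumed rule-of-thumb cost of one file or block object in NameNode memory.
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files, num_blocks, bytes_per_object=BYTES_PER_OBJECT):
    """Rough NameNode memory estimate: one object per file plus one per block."""
    return (num_files + num_blocks) * bytes_per_object


# Hypothetical workload: one million files of 10 KB each, 64 MB blocks.
num_small_files = 1_000_000
small_file_size = 10 * 1024
block_size = 64 * 1024 * 1024

# Stored separately, every small file occupies its own block.
separate = namenode_bytes(num_small_files, num_small_files)

# Merged into one container: a data file spanning several blocks plus an
# index file assumed to fit in a single block.
data_blocks = math.ceil(num_small_files * small_file_size / block_size)
merged = namenode_bytes(2, data_blocks + 1)

print(f"separate: {separate} bytes, merged: {merged} bytes")
```

Under these assumptions the estimate drops from hundreds of megabytes to tens of kilobytes, which is the kind of saving the abstract attributes to merging small files.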

Efficiency of Storaging Small Files in HDFS Based on MapFile
Citation: HONG Xu-Sheng and LIN Shi-Ping. Efficiency of Storaging Small Files in HDFS Based on MapFile [J]. Computer Systems & Applications, 2012, 21(11): 179-182.
Authors: HONG Xu-Sheng and LIN Shi-Ping
Affiliation: School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
Abstract: The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on MapFile is proposed to improve the storage efficiency of small files in HDFS. The main idea is to add a file-type judgment module to the upload path, create a queue of small files, serialize the queued files into a MapFile container, and build the corresponding index file. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).
Keywords:HDFS  small file  MapFile  sequence file  cloud storage
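
The upload path described in the abstract (judge the file type, queue small files, serialize them into one container, build an index) can be sketched in a few lines. This is a minimal in-memory illustration, not the authors' implementation and not the Hadoop MapFile API: the names `is_small`, `pack`, `lookup`, and `upload`, the size threshold, and the index interval are all hypothetical. It mimics only the general MapFile idea of a data file sorted by key plus a sparse index of byte offsets.

```python
import io
import struct
from bisect import bisect_right
from collections import deque

SMALL_FILE_THRESHOLD = 1024 * 1024  # hypothetical cutoff in bytes; the abstract gives none
INDEX_INTERVAL = 2                  # illustrative: index every 2nd record


def is_small(size_bytes, threshold=SMALL_FILE_THRESHOLD):
    """File-type judgment step: should this file go to the small-file queue?"""
    return size_bytes < threshold


def pack(small_files, interval=INDEX_INTERVAL):
    """Merge (name, payload) pairs into one data blob plus a sparse index.

    Records are written in sorted key order, and every `interval`-th key
    is recorded in the index together with its byte offset.
    """
    data = io.BytesIO()
    index = []
    for i, (name, payload) in enumerate(sorted(small_files, key=lambda f: f[0])):
        key = name.encode("utf-8")
        if i % interval == 0:
            index.append((name, data.tell()))
        # Record layout: key length, value length, key bytes, value bytes.
        data.write(struct.pack(">II", len(key), len(payload)))
        data.write(key)
        data.write(payload)
    return data.getvalue(), index


def lookup(data, index, name):
    """Binary-search the sparse index, then scan forward in the data blob."""
    pos = bisect_right([k for k, _ in index], name) - 1
    if pos < 0:
        return None
    offset = index[pos][1]
    while offset < len(data):
        klen, vlen = struct.unpack_from(">II", data, offset)
        offset += 8
        key = data[offset:offset + klen].decode("utf-8")
        offset += klen
        if key == name:
            return data[offset:offset + vlen]
        if key > name:  # keys are sorted, so the name cannot appear later
            return None
        offset += vlen
    return None


def upload(files, threshold=SMALL_FILE_THRESHOLD):
    """Route small files into the queue and merge them; report large files."""
    queue, large = deque(), []
    for name, payload in files:
        if is_small(len(payload), threshold):
            queue.append((name, payload))
        else:
            large.append(name)
    data, index = pack(queue)
    return data, index, large
```

For example, calling `upload` on a mix of files returns the merged blob, its index, and the names of files that were too large to queue; `lookup(data, index, name)` then retrieves one small file without reading the whole container.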
Indexed in: VIP (维普) and other databases.