
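SQL-DFS: A Massive Small File Storage System Based on HDFS
Cite this article: MA Zhiqiang, YANG Shuangtao, YAN Rui, ZHANG Zeguang. SQL-DFS: A Massive Small File Storage System Based on HDFS[J]. Journal of Beijing Polytechnic University (北京工业大学学报), 2016, 42(1): 134-141.
Authors: MA Zhiqiang  YANG Shuangtao  YAN Rui  ZHANG Zeguang
Affiliation: School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080
Funding: National Natural Science Foundation of China; Natural Science Foundation of Inner Mongolia Autonomous Region; Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region
Abstract: To address the high NameNode memory occupancy that arises when the Hadoop distributed file system (HDFS) stores small files, this paper analyzes the basic architecture of HDFS and proposes SQL-DFS, a file system built on a metadata storage cluster. A small-file processing module added to the NameNode migrates small-file metadata from NameNode memory to the metadata storage cluster. A relational database cluster provides fast reads and writes of that metadata, and the small-file read path is optimized to reduce the number of requests a file client sends to the NameNode. Handing part of the DataNode file-block verification work to the metadata storage cluster further lowers the load on the NameNode. Finally, HDFS and SQL-DFS experimental platforms were built, and small-file read/write comparison tests were run on the two architectures. The results show that SQL-DFS is clearly better than the original HDFS architecture in both file average cost (FAC) and memory occupancy rate, has better small-file storage capability, and can be used to store massive numbers of small files.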

Keywords: Hadoop distributed file system (HDFS)  metadata storage cluster  small files  metadata  memory occupancy rate
Received: 2015-06-12

SQL-DFS: A Massive Small File Storage System Based on HDFS
MA Zhiqiang, YANG Shuangtao, YAN Rui, ZHANG Zeguang. SQL-DFS: A Massive Small File Storage System Based on HDFS[J]. Journal of Beijing Polytechnic University, 2016, 42(1): 134-141.
Authors: MA Zhiqiang  YANG Shuangtao  YAN Rui  ZHANG Zeguang
Affiliation: School of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China
Abstract: To reduce the high NameNode memory occupancy that occurs when the Hadoop distributed file system (HDFS) stores massive numbers of small files, this paper analyzes the HDFS storage structure and presents SQL-DFS, a file system based on a metadata storage cluster. In SQL-DFS, a small-file processing module added to the NameNode moves small-file metadata from NameNode memory to the metadata storage cluster. A relational database cluster is used to speed up reading and writing of the metadata, and the small-file read process is optimized to reduce the number of requests sent to the NameNode. To further reduce the load on the NameNode, part of the verification of DataNode file blocks is handled by the metadata storage cluster. Finally, comparison experiments were carried out on HDFS and SQL-DFS experimental platforms. The results show that SQL-DFS is significantly better than the original HDFS architecture in both file average cost (FAC) and memory occupancy rate, and that it has better small-file storage capability. It can be used to store massive numbers of small files.
Keywords: Hadoop distributed file system (HDFS)  metadata storage cluster  small files  metadata  memory occupancy rate
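
The core idea in the abstract, keeping small-file metadata in a relational store instead of NameNode memory, can be illustrated with a toy model. The sketch below is not the authors' implementation: the class `MetadataRouter`, the table `small_file_meta`, its columns, and the 1 MiB threshold are all hypothetical, and an in-memory SQLite database merely stands in for the relational database cluster.

```python
import sqlite3

# Hypothetical cutoff for "small" files; the abstract does not state a value.
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024  # 1 MiB


class MetadataRouter:
    """Toy model: small-file metadata goes to a SQL store, the rest stays in memory."""

    def __init__(self):
        # In-memory SQLite stands in for the relational database cluster (assumption).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE small_file_meta ("
            "path TEXT PRIMARY KEY, size INTEGER, block_id TEXT, datanode TEXT)"
        )
        # Plain dict stands in for the NameNode's in-memory namespace (assumption).
        self.namenode_memory = {}

    def register(self, path, size, block_id, datanode):
        """Record where a file lives, routing by file size."""
        if size < SMALL_FILE_THRESHOLD:
            self.db.execute(
                "INSERT OR REPLACE INTO small_file_meta VALUES (?, ?, ?, ?)",
                (path, size, block_id, datanode),
            )
            self.db.commit()
        else:
            self.namenode_memory[path] = (size, block_id, datanode)

    def locate(self, path):
        """Return (size, block_id, datanode), checking the SQL store first."""
        row = self.db.execute(
            "SELECT size, block_id, datanode FROM small_file_meta WHERE path = ?",
            (path,),
        ).fetchone()
        if row is not None:
            return row
        return self.namenode_memory.get(path)


router = MetadataRouter()
router.register("/logs/a.txt", 4096, "blk_1", "dn1")
router.register("/video/b.mp4", 512 * 1024 * 1024, "blk_2", "dn2")
print(router.locate("/logs/a.txt"))
print(len(router.namenode_memory))
```

In this sketch only the large file occupies the in-memory dictionary, which mirrors the memory saving the abstract reports; the request-count optimization and block verification it mentions are not modeled here.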
This article is indexed in Wanfang Data (万方数据) and other databases.