
Funding: National Key Research and Development Program of China; Beijing Natural Science Foundation
Received: 2020-05-13
Revised: 2020-06-19

Performance optimization of distributed file system based on new type storage devices
Citation: DONG Cong, ZHANG Xiao, CHENG Wendi, SHI Jia. Performance optimization of distributed file system based on new type storage devices[J]. Journal of Computer Applications, 2020, 40(12): 3594-3603.
Authors: DONG Cong (董聪), ZHANG Xiao (张晓), CHENG Wendi (程文迪), SHI Jia (石佳)
Affiliation: 1. School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; 2. Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi'an, Shaanxi 710129, China; 3. College of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China
Abstract: The I/O performance of new storage devices is usually an order of magnitude higher than that of traditional solid-state drives (SSDs). However, a distributed file system running on new storage devices does not perform significantly better than one running on SSDs, which indicates that current distributed file systems cannot fully exploit the performance of these devices. To address this problem, the data write path and transmission process of Hadoop Distributed File System (HDFS) were analyzed quantitatively. Measuring the time spent in each stage of the HDFS write process showed that data transmission between nodes accounts for a large share of the total. An optimization scheme was therefore proposed: asynchronous writes parallelize data transmission and processing, so that the processing stages of different data packets overlap, which shortens the overall packet processing time and improves HDFS write performance. Experimental results show that the proposed scheme improves HDFS write throughput by 15% to 24% and reduces overall write execution time by 28% to 36%.
Keywords: distributed file system; Hadoop Distributed File System (HDFS); non-volatile memory; performance optimization; asynchronous write
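
The abstract attributes the speedup to overlapping the transmission and processing stages of different packets. The following is a minimal timing model of that idea, not the authors' implementation and not HDFS code. The function names and the per-packet costs (3 time units to transfer, 1 to process) are hypothetical values chosen only for illustration.

```python
def sequential_time(num_packets, transfer, process):
    """Total time when each packet is fully transferred and processed
    before the next packet starts (no overlap between stages)."""
    return num_packets * (transfer + process)


def pipelined_time(num_packets, transfer, process):
    """Total time when the transfer of the next packet may overlap
    with the processing of the current one (two-stage pipeline)."""
    transfer_done = 0  # time at which the latest packet finished transferring
    process_done = 0   # time at which the latest packet finished processing
    for _ in range(num_packets):
        transfer_done += transfer
        # Processing starts once the packet has arrived and the
        # processing stage is free, whichever happens later.
        process_done = max(process_done, transfer_done) + process
    return process_done


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units.
    n, t, p = 100, 3, 1
    print(sequential_time(n, t, p))  # 400
    print(pipelined_time(n, t, p))   # 301
```

In this model the pipelined total approaches the cost of the slower stage alone as the number of packets grows, which is why overlap helps most when one stage (here, transfer) dominates. The gains reported in the abstract come from measurements on real hardware and are not reproduced by this toy model.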