Overcoming data locality: An in-memory runtime file system with symmetrical data distribution
Affiliation: 1. Delft University of Technology, The Netherlands; 2. University of Zurich, Switzerland; 1. Department of DIETI, University of Naples Federico II, Italy; 2. Department of Computer Science, University of Salerno, Italy; 3. Department of Computer Science, University College London, United Kingdom; 1. University of Cambridge, United Kingdom; 2. Harvard Business School, United States; 3. University of California, Berkeley, United States; 4. Facebook, United States
Abstract: In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files, so application performance depends strongly on the file system in use. State-of-the-art runtime systems provide in-memory file storage designed for data locality: files are placed on the nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, which slows applications down and, worse, causes prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes based on a distributed hash function. Our cluster experiments with the Montage and BLAST workflows, using up to 512 cores, show that MemFS achieves both better performance and better scalability than AMFS, the state-of-the-art locality-based file system. Furthermore, our evaluation on a public commercial cloud validates our cluster results: on this platform, MemFS shows excellent scalability up to 1024 cores and saturates the 10G Ethernet bandwidth when running BLAST and Montage.
Keywords: Many-task computing; In-memory file system; Distributed hashing; Scalability
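
The abstract describes striping files across all compute nodes with a distributed hash function. As a rough illustration of that idea only, the sketch below splits a file into fixed-size stripes and maps each stripe to a node by hashing a per-stripe key. The stripe size, the key format, the use of SHA-1 with modulo placement, and the names `STRIPE_SIZE`, `stripe_key`, `node_for_key`, and `place_file` are all assumptions made for this example; the abstract does not say how MemFS implements any of them.

```python
import hashlib
from collections import Counter

# Hypothetical stripe size; the abstract does not state the value MemFS uses.
STRIPE_SIZE = 512 * 1024


def stripe_key(path, index):
    """Build a per-stripe key (assumed format: '<path>:<stripe index>')."""
    return f"{path}:{index}"


def node_for_key(key, nodes):
    """Pick a node by hashing the key (assumed scheme: SHA-1, then modulo)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place_file(path, size, nodes, stripe_size=STRIPE_SIZE):
    """Return the node that would hold each stripe of a file of `size` bytes."""
    # Ceiling division; an empty file still occupies one (empty) stripe here.
    n_stripes = max(1, -(-size // stripe_size))
    return [node_for_key(stripe_key(path, i), nodes) for i in range(n_stripes)]


# Example: a 64 MiB intermediate file spread over 8 hypothetical nodes.
nodes = [f"node{i:02d}" for i in range(8)]
placement = place_file("/montage/proj_001.fits", 64 * 1024 * 1024, nodes)
load = Counter(placement)  # stripes per node
print(dict(load))
```

Because placement depends only on the key and the node list, any node can compute where a stripe lives without consulting the node that wrote it, which is the property that separates this kind of scheme from locality-based placement.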
This article is indexed in ScienceDirect and other databases.