Improving Data Availability for Better Access Performance: A Study on Caching Scientific Data on Distributed Desktop Workstations
Authors: Xiaosong Ma, Sudharshan S. Vazhkudai, Zhe Zhang
Affiliation: 1. North Carolina State University, Raleigh, NC, USA
2. Oak Ridge National Laboratory, Oak Ridge, TN, USA
Abstract: Client-side data caching is an effective mechanism for storing and analyzing rapidly growing scientific data. This motivates distributed, client-side caches built from unreliable desktop storage contributions to store and access large scientific datasets. Such caches offer several desirable properties, such as performance impedance matching, improved space utilization, and high parallel I/O bandwidth. In this context, we face two key challenges: (1) the finite amount of contributed cache space is stretched by ever-increasing scientific dataset sizes, and (2) the transient nature of volunteered storage nodes impacts data availability. In this article, we address these challenges by exploiting the existence of external, primary copies of datasets. We propose a novel combination of prefix caching, collective download, and remote partial data recovery (RPDR) to achieve optimal cache space consumption and to handle storage node volatility. Our evaluation, performed on our FreeLoader prototype, indicates that prefix caching can significantly improve the cache hit rate and that partial data recovery is better than (or comparable to) many persistent-data availability techniques.
This article is indexed in SpringerLink and other databases.