Generating request streams on Big Data using clustered renewal processes
Authors: Cristina L. Abad, Mindi Yuan, Chris X. Cai, Yi Lu, Nathan Roberts, Roy H. Campbell
Affiliation: 1. University of Illinois at Urbana-Champaign, United States; 2. Yahoo! Inc., United States; 3. Escuela Superior Politécnica del Litoral, Ecuador
Abstract: The performance evaluation of large file systems, such as storage and media streaming, motivates scalable generation of representative traces. We focus on two key characteristics of traces: popularity and temporal locality. The common practice of using a system-wide distribution obscures per-object behavior, which is important for system evaluation. We propose a model based on delayed renewal processes which, by sampling interarrival times for each object, accurately reproduces the popularity and temporal locality of the trace. A lightweight version reduces the dimension of the model with statistical clustering. The model is workload-agnostic and object type-aware, making it suitable for testing emerging workloads and 'what-if' scenarios. We implemented a synthetic trace generator and validated it using: (1) a Big Data storage (HDFS) workload from Yahoo!, (2) a trace from a feature animation company, and (3) a streaming media workload. Two case studies in caching and replicated distributed storage systems show that our traces produce application-level results similar to the real workload. The trace generator is fast and readily scales to a system of 4.3 million files. It outperforms existing models in terms of accurately reproducing the characteristics of the real trace.
Keywords: Big Data; Workload generation; HDFS; Popularity; Temporal locality; Storage
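
The per-object idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' generator: the function name `generate_trace`, its parameters, and the use of exponential distributions are assumptions made only for illustration, and the clustering step of the lightweight version is not shown. Each object gets its own renewal process whose first arrival is drawn from a separate distribution (the "delayed" part), and the per-object streams are merged in time order.

```python
import heapq
import random


def generate_trace(objects, horizon, seed=0):
    """Merge per-object delayed renewal processes into one request stream.

    objects: dict mapping object id -> (first_delay_mean, interarrival_mean).
             Both means parameterize exponential distributions, which is an
             assumption of this sketch, not something stated in the abstract.
    horizon: only events with timestamp < horizon are generated.
    Returns a list of (timestamp, object_id) tuples sorted by timestamp.
    """
    rng = random.Random(seed)
    heap = []

    # Delayed renewal: the first arrival of each object is sampled from its
    # own distribution, separate from the later interarrival times.
    for obj_id, (first_delay_mean, _) in objects.items():
        t = rng.expovariate(1.0 / first_delay_mean)
        if t < horizon:
            heapq.heappush(heap, (t, obj_id))

    trace = []
    while heap:
        # Pop the globally earliest pending request.
        t, obj_id = heapq.heappop(heap)
        trace.append((t, obj_id))

        # Schedule the next request for the same object by sampling its
        # own interarrival time.
        interarrival_mean = objects[obj_id][1]
        nxt = t + rng.expovariate(1.0 / interarrival_mean)
        if nxt < horizon:
            heapq.heappush(heap, (nxt, obj_id))

    return trace
```

For example, `generate_trace({"hot": (1.0, 0.5), "cold": (5.0, 10.0)}, horizon=100.0)` yields a time-ordered stream in which the hypothetical object `hot` is requested far more often than `cold`, so popularity and the spacing of repeat accesses both follow from the per-object parameters rather than from a single system-wide distribution.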
This article is indexed in ScienceDirect and other databases.