HC-Store: putting MapReduce's foot in two camps
Authors: Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG
Affiliations: [1] DEKE Lab, Renmin University of China, Ministry of Education, Beijing 100872, China; [2] School of Information, Renmin University of China, Beijing 100872, China; [3] EMC Labs China, Beijing 100084, China; [4] School of Computing, National University of Singapore, Singapore 117417, Singapore
Funding: This work was sponsored by the National Key Basic Research Program of China (973 Program) (2014CB340403) and the National Natural Science Foundation of China (Grant Nos. 61170013, 61272138, and 61232007).
Abstract: MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce's performance, some recent work has applied different storage models, such as column-store and PAX-store, to MapReduce platforms. However, the data access patterns of different queries vary widely, and no single storage model achieves optimal performance for all of them. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid columnstore (HC-store). Based on the characteristics of each incoming MapReduce task, HC-store determines whether to access the underlying pure column-store or PAX-store. We study the properties of the different storage models and build a cost model that decides the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store can outperform both PAX-store and column-store, especially on diverse workloads.
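The abstract does not give the cost model itself, so the following is only a minimal sketch of the *kind* of runtime decision it describes, not HC-store's actual model. It assumes, hypothetically, that a task is characterized solely by the set of columns it reads; that a pure column-store reads only those columns but pays a fixed overhead per column file (open, seek, possible remote fetch, tuple reconstruction); and that a PAX-store scans whole blocks containing every column with a single fixed overhead. All names and parameter values (`CostParams`, `scan_mb_per_s`, `column_open_s`, `choose_store`) are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    # Hypothetical, made-up parameters; not taken from the HC-store paper.
    scan_mb_per_s: float = 100.0  # sequential read bandwidth in MB/s
    column_open_s: float = 0.5    # fixed per-file overhead in seconds


def column_store_cost(col_sizes_mb: dict[str, float], accessed: list[str],
                      p: CostParams) -> float:
    """Estimated seconds to read only the accessed columns, one file per column."""
    read_mb = sum(col_sizes_mb[c] for c in accessed)
    return read_mb / p.scan_mb_per_s + len(accessed) * p.column_open_s


def pax_store_cost(col_sizes_mb: dict[str, float], p: CostParams) -> float:
    """Estimated seconds to scan whole PAX blocks, which hold every column."""
    read_mb = sum(col_sizes_mb.values())
    return read_mb / p.scan_mb_per_s + p.column_open_s


def choose_store(col_sizes_mb: dict[str, float], accessed: list[str],
                 p: CostParams = CostParams()) -> str:
    """Pick the storage layout with the lower estimated cost for this task."""
    col = column_store_cost(col_sizes_mb, accessed, p)
    pax = pax_store_cost(col_sizes_mb, p)
    return "column-store" if col < pax else "PAX-store"


if __name__ == "__main__":
    # A hypothetical table with 20 columns of 100 MB each.
    sizes = {f"c{i}": 100.0 for i in range(20)}
    print(choose_store(sizes, ["c0", "c1"]))  # narrow projection
    print(choose_store(sizes, list(sizes)))   # reads every column
```

With these made-up numbers, reading k columns from the column-store costs 1.5k seconds versus 20.5 seconds for a full PAX scan, so the column-store wins for narrow projections (up to 13 columns) and the PAX-store wins for wide ones. This illustrates why a mixed workload can favor keeping both layouts and choosing per task.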

Keywords: store; data access pattern; storage model; optimal performance; data analysis; storage scheme; storage system; cost model
This article is indexed in VIP and other databases.