Single‐scan: a fast star‐join query processing algorithm
Authors: Vasile Purdilă, Ştefan‐Gheorghe Pentiuc
Affiliation: Stefan cel Mare University of Suceava, Suceava, Romania
Abstract: A data warehouse can store very large amounts of data that should be processed in parallel in order to achieve reasonable query execution times. The MapReduce programming model is a very convenient way to process large amounts of data in parallel on commodity hardware clusters. A very popular query used in data warehouses is star‐join. In this paper, we present a fast and efficient star‐join query execution algorithm built on top of a MapReduce framework called Hadoop. By using dynamic filters against dimension tables, the algorithm needs a single scan of the fact table, which means a significant reduction of input/output operations and computational complexity. Also, the algorithm requires only two MapReduce iterations in total: one to build the filters against dimension tables and one to scan the fact table. Our experiments show that the proposed algorithm performs much better than the existing solutions in terms of execution time and input/output. Copyright © 2014 John Wiley & Sons, Ltd.
Keywords: algorithm; data warehouse; dimension table; fact table; Hadoop; MapReduce; parallel processing; Bloom filter; star‐join
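
The two-phase idea described in the abstract can be illustrated with a small, single-process sketch. This is not the authors' Hadoop implementation: it is plain Python with no MapReduce framework, and the `BloomFilter` class, the `build_filters` and `scan_fact` functions, the `<dimension>_id` foreign-key naming, and the sample tables are all hypothetical names chosen for illustration. It also assumes that the filtered dimension rows are small enough to keep in an in-memory lookup, which is used to discard Bloom-filter false positives; the abstract does not say how the paper handles that step.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; one byte per bit is used for readability, not compactness."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def build_filters(dimensions, predicates):
    """Phase 1 (assumed): filter each dimension table and record surviving keys."""
    filters, lookups = {}, {}
    for name, rows in dimensions.items():
        bloom, kept = BloomFilter(), {}
        for row in rows:
            if predicates[name](row):
                bloom.add(row["id"])
                kept[row["id"]] = row
        filters[name], lookups[name] = bloom, kept
    return filters, lookups


def scan_fact(fact_rows, filters, lookups):
    """Phase 2 (assumed): one pass over the fact table, probing every filter."""
    for fact in fact_rows:
        # Cheap probabilistic rejection first: most non-matching rows stop here.
        if not all(fact[f"{name}_id"] in bloom for name, bloom in filters.items()):
            continue
        # Exact lookup removes Bloom-filter false positives and attaches dimension rows.
        matched = {name: lookups[name].get(fact[f"{name}_id"]) for name in lookups}
        if all(row is not None for row in matched.values()):
            yield fact, matched


# Hypothetical sample data: two dimension tables and a three-row fact table.
dimensions = {
    "date": [{"id": 1, "year": 2013}, {"id": 2, "year": 2014}],
    "store": [{"id": 10, "region": "EU"}, {"id": 11, "region": "US"}],
}
predicates = {
    "date": lambda row: row["year"] == 2014,
    "store": lambda row: row["region"] == "EU",
}
facts = [
    {"date_id": 1, "store_id": 10, "amount": 5},
    {"date_id": 2, "store_id": 10, "amount": 7},
    {"date_id": 2, "store_id": 11, "amount": 9},
]

filters, lookups = build_filters(dimensions, predicates)
result = [fact["amount"] for fact, _ in scan_fact(facts, filters, lookups)]
```

In this sketch the fact table is read exactly once, and each fact row is tested against all dimension filters before any exact join work is done. In a distributed setting the filters would presumably be shipped to every fact-table mapper, but that detail is an assumption rather than something stated in the abstract.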