Star join algorithm based on multi-dimensional Bloom filter in Spark
Citation: ZHOU Guoliang, SA Churila, ZHU Yongli. Star join algorithm based on multi-dimensional Bloom filter in Spark[J]. Journal of Computer Applications, 2016, 36(2): 353-357.
Authors: ZHOU Guoliang  SA Churila  ZHU Yongli
Affiliation: 1. Skill Training Center, State Grid Jibei Electric Power Company, Baoding Hebei 071051, China; 2. School of Control and Computer Engineering, North China Electric Power University, Baoding Hebei 071003, China
Funding: Fundamental Research Funds for the Central Universities (13MS103); Natural Science Foundation of Hebei Province (F2014502069).
Received: 2015-09-15
Revised: 2015-09-23

Abstract: To meet the growing demand for high-performance analysis of real-time data in On-Line Analytical Processing (OLAP) systems, a star join algorithm for the Spark platform based on the Multi-Dimensional Bloom Filter (MDBF) is proposed, named SMDBFSJ (Spark Multi-Dimensional Bloom Filter Star Join). First, an MDBF is built from the dimension tables and, because it is small, broadcast to all nodes. Then the fact table is filtered locally on each node, so no fact table data moves between nodes. Finally, the filtered fact table is joined with the dimension tables using repartition join to produce the final result. SMDBFSJ avoids moving fact table data, uses the MDBF to reduce the amount of data that must be broadcast, and combines the advantages of broadcast join and repartition join. Experimental results demonstrate the effectiveness of the algorithm: in both stand-alone and cluster environments, SMDBFSJ achieves roughly a threefold performance improvement over repartition join in Spark.
Keywords: Bloom filter; star join; On-Line Analytical Processing (OLAP); Spark
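
The three steps named in the abstract (build and broadcast the filter, filter the fact table locally, then repartition-join the survivors) can be illustrated with a minimal single-process sketch. It is not the authors' implementation and does not use Spark: the abstract does not describe the internal layout of the MDBF, so the sketch assumes one plain Bloom filter per dimension key, and every name in it (BloomFilter, build_mdbf, filter_partition, repartition_join, star_join, the sample tables) is hypothetical. In a real Spark job the filter would presumably be shipped to executors with a broadcast variable; here "broadcast" is simulated by passing the same object to every partition.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_mdbf(dimensions):
    """Assumed MDBF layout: one Bloom filter per fact-table foreign-key column."""
    mdbf = {}
    for fk, dim_rows in dimensions.items():
        bf = BloomFilter()
        for key in dim_rows:
            bf.add(key)
        mdbf[fk] = bf
    return mdbf


def filter_partition(partition, mdbf):
    """Local step: keep fact rows whose every foreign key may exist in its dimension."""
    return [row for row in partition
            if all(row[fk] in bf for fk, bf in mdbf.items())]


def repartition_join(fact_rows, dim_rows, fk, num_partitions):
    """Hash-partition both sides on the join key, then join partition by partition."""
    fact_parts = [[] for _ in range(num_partitions)]
    dim_parts = [{} for _ in range(num_partitions)]
    for row in fact_rows:
        fact_parts[hash(row[fk]) % num_partitions].append(row)
    for key, dim_row in dim_rows.items():
        dim_parts[hash(key) % num_partitions][key] = dim_row
    joined = []
    for facts, dims in zip(fact_parts, dim_parts):
        for row in facts:
            if row[fk] in dims:  # exact match discards Bloom false positives
                joined.append({**row, **dims[row[fk]]})
    return joined


def star_join(fact_partitions, dimensions, num_partitions=2):
    mdbf = build_mdbf(dimensions)                    # step 1: build (broadcast simulated)
    survivors = [row for part in fact_partitions     # step 2: filter each partition locally
                 for row in filter_partition(part, mdbf)]
    for fk, dim_rows in dimensions.items():          # step 3: repartition join per dimension
        survivors = repartition_join(survivors, dim_rows, fk, num_partitions)
    return survivors


# Hypothetical sample data: dimension rows that already satisfy the query predicates.
dimensions = {
    "date_id": {1: {"year": 2015}},
    "store_id": {10: {"city": "Baoding"}},
}
fact_partitions = [
    [{"date_id": 1, "store_id": 10, "sales": 5},
     {"date_id": 2, "store_id": 10, "sales": 7}],
    [{"date_id": 1, "store_id": 11, "sales": 3},
     {"date_id": 1, "store_id": 10, "sales": 9}],
]
result = star_join(fact_partitions, dimensions)
```

Only the two fact rows whose keys match both dimensions survive the join. A Bloom filter can return false positives but never false negatives, so the local filter may pass a few extra rows; the exact key comparison in the repartition join removes them, which is why the filter affects performance and not correctness.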