首页 | 本学科首页   官方微博 | 高级检索  
     

基于Spark的并行FP-Growth算法优化及实现
引用本文:顾军华,武君艳,许馨匀,谢志坚,张素琪.基于Spark的并行FP-Growth算法优化及实现[J].计算机应用,2018,38(11):3069-3074.
作者姓名:顾军华  武君艳  许馨匀  谢志坚  张素琪
作者单位:1. 河北工业大学 人工智能与数据科学学院, 天津 300401;2. 河北省大数据计算重点实验室(河北工业大学), 天津 300401;3. 天津商业大学 信息工程学院, 天津 300134
基金项目:河北省科技计划项目(17210305D);天津市科技计划项目(16ZXHLSF0023);天津市自然科学基金资助项目(15JCQNJC00600);天津市科技计划项目(15ZXHLGX00130)。
摘    要:为了进一步提高在Spark平台上的频繁模式增长(FP-Growth)算法执行效率,提出一种新的基于Spark的并行FP-Growth算法——BFPG。首先,从频繁模式树(FP-Tree)规模大小和分区计算量对F-List分组策略进行改进,保证每个分区负载总和近似相等;然后,通过创建列表P-List对数据集划分策略进行优化,减少遍历次数,降低时间复杂度。实验结果表明,BFPG算法提高了并行FP-Growth算法挖掘效率,且算法具有良好的扩展性。

关 键 词:大数据平台  关联规则  频繁项集  频繁模式增长算法  Spark  
收稿时间:2018-04-30
修稿时间:2018-05-31

Optimization and implementation of parallel FP-Growth algorithm based on Spark
GU Junhua,WU Junyan,XU Xinyun,XIE Zhijian,ZHANG Suqi.Optimization and implementation of parallel FP-Growth algorithm based on Spark[J].journal of Computer Applications,2018,38(11):3069-3074.
Authors:GU Junhua  WU Junyan  XU Xinyun  XIE Zhijian  ZHANG Suqi
Affiliation:1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China;2. Hebei Province Key Laboratory of Big Data Computing(Hebei University of Technology), Tianjin 300401, China;3. School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China
Abstract:In order to further improve the execution efficiency of Frequent Pattern-Growth (FP-Growth) algorithm on Spark platform, a new parallel FP-Growth algorithm based on Spark, named BFPG (Better Frequent Pattern-Growth), was presented. Firstly, the grouping strategy F-List was improved in the size of the Frequent Pattern-Tree (FP-Tree) and the amount of partition calculation to ensure that the load sum of each partition was approximately equal. Secondly, the data set partitioning strategy was optimized by creating a list P-List, and then the time complexity was reduced by reducing the traversal times. The experimental results show that the BFPG algorithm improves the mining efficiency of the parallel FP-Growth algorithm, and the algorithm has good scalability.
Keywords:big data platform                                                                                                                        association rule                                                                                                                        frequent itemset                                                                                                                        Frequent Pattern-Growth (FP-Growth) algorithm                                                                                                                        Spark
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号