首页 | 本学科首页   官方微博 | 高级检索  
     

基于负载均衡和冗余剪枝的并行FP-Growth算法
引用本文:刘祥哲 刘培玉任敏 伊静 高钊. 基于负载均衡和冗余剪枝的并行FP-Growth算法[J]. 数据采集与处理, 2016, 31(1): 223-230
作者姓名:刘祥哲 刘培玉任敏 伊静 高钊
作者单位:1.山东师范大学信息科学与工程学院,济南,250014;2.山东省分布式计算机软件新技术重点实验室,济南,250014;3.山东财经大学数学与数量经济学院,济南,250014; 4.山东建筑大学计算机科学与技术学院,济南,250101
摘    要:针对现有的并行FP-Growth算法在数据并行分组时存在数据冗余和负载不均的问题,提出了基于负载估算和冗余剪枝的优化算法。首先,在采用高频策略分组时,引入节点任务估算方法,把每个分组中最大模式树的最长路径和支持度作为该分组的估计值,将估计值远大于其他节点的分组进行分割,平均到其他分组中,并且对不同分组中重复的列表元素进行截断,去除冗余数据。实验表明,本文提出的算法能够有效防止并行化的数据倾斜,减少数据冗余,在时间和空间复杂度上要低于以前的并行化FP-Growth算法。

关 键 词:关联规则;MapReduce;冗余剪枝;FP-Growth算法

Parallel FP-Growth Algorithm Based on Load Balancing and Redundancy Pruning
Liu Xiangzhe,Liu Peiyu,Ren Min,Yi Jing,Gao Zhao. Parallel FP-Growth Algorithm Based on Load Balancing and Redundancy Pruning[J]. Journal of Data Acquisition & Processing, 2016, 31(1): 223-230
Authors:Liu Xiangzhe  Liu Peiyu  Ren Min  Yi Jing  Gao Zhao
Abstract:Focusing on the data redundancy and load-unbalancing problems in the existing parallel FP-Growth algorithm, an improved algorithm for redundancy pruning and load balancing is proposed. Firstly, the improved algorithm introduces group task estimation method when using high-frequency strategies group. The longest path in the maximum pattern tree and the highest frequency are used as estimation. The group task will be averaged to others again when the group estimated is much larger than the value of other group. Then, the repetitive elements are removed in the list of different groups. Experimental results show that the improved algorithm avoides the data skew in the MapReduce and it is superior to the original one due to its high execution efficiency and speedup.
Keywords:association rules   MapReduce   redundancy pruning   FP-Growth algorithm
点击此处可从《数据采集与处理》浏览原始摘要信息
点击此处可从《数据采集与处理》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号