
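Parallel high utility pattern mining algorithm based on cluster partition
Cite this article: XING Shuning, LIU Fang'ai, ZHAO Xiaohui. Parallel high utility pattern mining algorithm based on cluster partition[J]. Journal of Computer Applications, 2016, 36(8): 2202-2206.
Authors: XING Shuning  LIU Fang'ai  ZHAO Xiaohui
Affiliations: 1. College of Information Science and Engineering, Shandong Normal University, Jinan 250014; 2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology (Shandong Normal University), Jinan 250014
Funding: National Natural Science Foundation of China (90612003, 61572301).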
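Abstract: Mining high utility patterns in large-scale databases generates a large number of in-memory utility pattern trees, which consumes considerable memory and can cause some high utility itemsets to be lost. To address this, PUCP, a parallel high utility pattern mining algorithm based on cluster partition, is proposed for the Hadoop distributed computing platform. First, a clustering method partitions similar transactions in the database into several data subsets. Next, the subsets are distributed to the nodes of the Hadoop platform, where utility pattern trees are constructed. Finally, the conditional pattern bases of the same item from all nodes are assigned to a single node for mining, which reduces the number of cross-node operations. Experimental results and theoretical analysis show that, without affecting the reliability of the mining results, PUCP improves mining efficiency by 61.2% over UP-Growth (Utility Pattern Growth), a mainstream serial high utility pattern mining algorithm, and by 16.6% over PHUI-Growth, an existing parallel high utility pattern mining algorithm. Because it runs on the Hadoop platform, it also effectively relieves the memory pressure of mining large-scale data.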

Keywords: big data; high utility pattern mining; clustering; parallel computing; Hadoop
Received: 2016-01-11
Revised: 2016-02-27

Parallel high utility pattern mining algorithm based on cluster partition
XING Shuning, LIU Fang'ai, ZHAO Xiaohui. Parallel high utility pattern mining algorithm based on cluster partition[J]. Journal of Computer Applications, 2016, 36(8): 2202-2206.
Authors:XING Shuning  LIU Fang'ai  ZHAO Xiaohui
Affiliation:1. College of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China;2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology(Shandong Normal University), Jinan Shandong 250014, China
Abstract: Existing algorithms generate many in-memory utility pattern trees when mining high utility patterns in large-scale databases, which occupies considerable memory and can lose some high utility itemsets. To address this, a parallel high utility pattern mining algorithm based on cluster partition, named PUCP, was proposed on the Hadoop platform. Firstly, a clustering method was used to divide the transaction database into several sub-datasets of similar transactions. Secondly, the sub-datasets were allocated to the nodes of Hadoop, where utility pattern trees were constructed. Finally, the conditional pattern bases of the same item, generated from the utility pattern trees on different nodes, were allocated to the same node, which reduced the number of cross-node operations. Theoretical analysis and experimental results show that, without affecting the reliability of the mining results, the mining efficiency of PUCP is 61.2% higher than that of the mainstream serial algorithm UP-Growth (Utility Pattern Growth) and 16.6% higher than that of the parallel algorithm PHUI-Growth (Parallel mining High Utility Itemsets by pattern-Growth). In addition, using the Hadoop platform effectively relieves the memory pressure of mining large-scale data.
Keywords: big data; high utility pattern mining; clustering; parallel computing; Hadoop
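
The abstract names three stages but gives no implementation detail. The following is a minimal single-process sketch of the first and third stages only, written under stated assumptions: the Jaccard similarity measure, the 0.5 threshold, the greedy one-pass clustering, the function names and the sample data are all illustrative and are not taken from the paper. On Hadoop, the grouping in the third stage would typically be performed by the MapReduce shuffle keyed on the item rather than by an in-memory dictionary.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two item sets (an assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_by_similarity(transactions, threshold=0.5):
    """Greedy one-pass clustering: a transaction joins the first cluster whose
    seed (first member) is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for t in transactions:
        items = set(t)
        for cluster in clusters:
            if jaccard(items, set(cluster[0])) >= threshold:
                cluster.append(t)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([t])
    return clusters


def group_by_item(node_outputs):
    """Collect (item, prefix_path, utility) records from all nodes so that
    every record for a given item ends up in the same bucket."""
    grouped = defaultdict(list)
    for records in node_outputs:
        for item, prefix_path, utility in records:
            grouped[item].append((prefix_path, utility))
    return dict(grouped)


# Hypothetical transactions (item names only; utilities omitted for brevity).
transactions = [("a", "b", "c"), ("a", "b"), ("d", "e"), ("d", "e", "f")]
clusters = partition_by_similarity(transactions)

# Hypothetical conditional pattern bases emitted by two nodes.
node_a = [("c", ("a", "b"), 12), ("b", ("a",), 7)]
node_b = [("c", ("a",), 5), ("d", ("a", "c"), 9)]
grouped = group_by_item([node_a, node_b])
```

Here the four sample transactions fall into two clusters, and both records for item `c` land in one bucket regardless of which node produced them, which is the property the third stage relies on.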