
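Improved parallel association rules incremental mining algorithm
Cite this article: MAO Yimin, DENG Qianhu, DENG Xiaohong, LIU Wei. Improved parallel association rules incremental mining algorithm[J]. Application Research of Computers, 2021, 38(10): 2974-2980.
Authors: MAO Yimin  DENG Qianhu  DENG Xiaohong  LIU Wei
Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; School of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000
Funding: National Key R&D Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019, 61762046); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ209407)
Abstract: In big data environments, incremental association rule algorithms based on the Can-tree (canonical order tree) suffer from excessive space consumption by the tree structure, inefficient frequent pattern mining, and insufficient parallel performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning), an improved parallel incremental association rule mining algorithm based on rough sets and merge pruning. First, a rough-set-based similar item merge strategy, RS-SIM (rough set based similar item merge), merges similar items in the dataset, and the Can-tree is built from the merged data, which reduces the space occupied by the tree structure. Second, a merge pruning strategy, MPS (merge pruning strategy), prunes and merges the propagation paths in the tree structure, compressing the frequent pattern search space to speed up frequent item mining. Finally, a dynamic scheduling strategy, DSS (dynamic scheduling strategy), dynamically schedules the computing tasks in the heterogeneous MapReduce cluster, achieving load balancing and effectively improving the cluster's parallel computing capability. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.

Keywords: Can-tree  rough set  merge pruning  big data  incremental mining
Received: 2021/3/18
Revised: 2021/9/13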

Improved parallel association rules incremental mining algorithm
MAO Yimin,DENG Qianhu,DENG Xiaohong and LIU Wei.Improved parallel association rules incremental mining algorithm[J].Application Research of Computers,2021,38(10):2974-2980.
Authors:MAO Yimin  DENG Qianhu  DENG Xiaohong and LIU Wei
Affiliation:School of Information Engineering,Jiangxi University of Science and Technology
Abstract:In big data environments, Can-tree (canonical order tree) based incremental association rule algorithms suffer from excessive space occupation by the tree structure, poor frequent pattern mining efficiency, and insufficient parallelization performance on MapReduce clusters. To address these problems, this paper proposes MR-PARIRM, a MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning. First, it designs a rough set based similar item merge strategy (RS-SIM) to merge similar items in the dataset and constructs the Can-tree from the merged data, thereby reducing the space occupied by the tree structure. Second, it proposes a merge pruning strategy (MPS) to prune and merge the propagation paths in the tree structure, compressing the frequent pattern search space to speed up frequent item mining. Finally, MR-PARIRM uses a dynamic scheduling strategy (DSS) to dynamically schedule the computing tasks in the heterogeneous MapReduce cluster, achieving load balance and effectively improving the cluster's parallel computing capability. Experimental simulation results show that MR-PARIRM performs relatively well in the big data environment and is suitable for parallel processing of large-scale data.
Keywords:Can-tree  rough set  merge pruning  big data  incremental mining
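
For readers unfamiliar with the underlying data structure, the following is a minimal sketch of a plain Can-tree with incremental insertion. It is not the MR-PARIRM algorithm: it does not implement RS-SIM, MPS, or DSS, whose details the abstract does not give. The class and method names (`CanTreeNode`, `CanTree`, `insert_batch`, `frequent_items`) are hypothetical, and lexicographic order is assumed as the canonical item order.

```python
from collections import defaultdict


class CanTreeNode:
    """One node of the prefix tree: an item, its path count, and its children."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class CanTree:
    """Illustrative Can-tree; canonical order is assumed to be lexicographic."""

    def __init__(self):
        self.root = CanTreeNode()
        self.item_totals = defaultdict(int)  # global support count per item

    def insert(self, transaction):
        # Items are sorted in a fixed canonical order that does not depend on
        # item frequency, so later batches never force a tree reordering.
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = CanTreeNode(item, node)
                node.children[item] = child
            child.count += 1
            self.item_totals[item] += 1
            node = child

    def insert_batch(self, transactions):
        # An incremental update is just more insertions into the same tree.
        for transaction in transactions:
            self.insert(transaction)

    def node_count(self):
        # Number of item nodes (root excluded), a rough proxy for tree size.
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total

    def frequent_items(self, min_count):
        return {i: c for i, c in self.item_totals.items() if c >= min_count}


tree = CanTree()
tree.insert_batch([["b", "a", "c"], ["a", "b"], ["a", "d"]])  # original data
tree.insert_batch([["c", "a", "b"], ["d", "a"]])              # increment
print(tree.node_count(), tree.frequent_items(3))
```

Because every transaction is sorted the same way, the increment reuses the existing paths, and the tree size depends only on how many distinct prefixes the data contains. This is the space cost that the abstract's similar-item merging aims to reduce.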
This article is indexed in Wanfang Data and other databases.