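A Distributed Stage-Adaptive Association Rule Mining Algorithm on the Spark Platform
Cite this article: SHI Hui, CHEN En. Distributed Stage Adaptive Association Rules Mining Algorithm Based on Spark[J]. Computer and Modernization, 2019, 0(12): 31.
Authors: SHI Hui, CHEN En
Affiliations: Department of Information Engineering, Shanwei Vocational and Technical College, Shanwei, Guangdong 516600; Huawei Technologies Co., Ltd., Shenzhen, Guangdong 518129
Funding: 2016 Research Planning Project of the Association of Fundamental Computing Education in Chinese Universities (2016GHB02005); 2019 Education Research Project of the Guangdong Higher Vocational College Cloud Computing and Big Data Professional Committee (GDYJSKT19-02)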
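Abstract: To meet the growing demand for mining massive data sets, a distributed association rule mining algorithm that can run across multiple machines is urgently needed. A highly iterative algorithm such as Apriori performs a large number of disk I/O operations in every iteration when run on the Hadoop platform, which severely limits its efficiency. Taking advantage of Spark's built-in support for distributed computing, this paper designs and implements a distributed association rule mining algorithm on the Spark platform, called Staged Adaptive Apriori. The algorithm mines frequent itemsets efficiently using an adaptive strategy that processes part of the data set; before each iteration it makes a preliminary estimate of the execution time and adopts the more suitable method to reduce time and space complexity. It is thus an adaptive association rule mining algorithm driven by the properties of the data set. Experimental results demonstrate the effectiveness of the algorithm.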

Keywords: association rule mining; Apriori algorithm; MapReduce; Spark
Received: 2019-12-11

Distributed Stage Adaptive Association Rules Mining Algorithm Based on Spark
SHI Hui, CHEN En. Distributed Stage Adaptive Association Rules Mining Algorithm Based on Spark[J]. Computer and Modernization, 2019, 0(12): 31.
Authors: SHI Hui, CHEN En
Abstract: To meet the growing demand for mining massive data sets, a distributed association rule mining algorithm that can run on multiple machines is urgently needed. Apriori is a highly iterative algorithm that performs a large number of disk I/O operations in every iteration when run on the Hadoop platform, which severely limits its efficiency. This paper exploits Spark's built-in support for distributed in-memory computation to design and implement a distributed association rule mining algorithm on the Spark platform, called Staged Adaptive Apriori. The algorithm mines frequent itemsets efficiently using an adaptive strategy that processes part of the data set. Before each iteration it makes a preliminary estimate of the execution time and adopts the more suitable method to reduce time and space complexity. It is an adaptive association rule mining algorithm driven by the properties of the data set. The experimental results demonstrate the effectiveness of the algorithm.
Keywords: association rule mining; Apriori; MapReduce; Spark
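
To make the iterative structure that the abstract refers to concrete, the following is a minimal single-machine sketch of classic level-wise Apriori in Python. It is not the paper's Staged Adaptive Apriori, whose time-estimation and method-selection details are not given in the abstract, and it does not use Spark. The function names (`count_partition`, `generate_candidates`, `apriori`) and the idea of passing the data as a list of partitions are assumptions made for illustration only; the per-partition count followed by a merge merely mimics how a distributed platform would split the counting pass.

```python
from collections import Counter
from itertools import combinations


def count_partition(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, pruning by the Apriori property."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            # Keep the union only if every (k-1)-subset is itself frequent.
            if len(union) == k and all(
                frozenset(subset) in frequent
                for subset in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(partitions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    partitions = [[frozenset(t) for t in p] for p in partitions]
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    result = {}
    k = 1
    while candidates:
        # One full pass over the data per iteration: this is the step that
        # incurs repeated disk I/O when each pass is a separate Hadoop job.
        total = Counter()
        for partition in partitions:
            total.update(count_partition(partition, candidates))
        frequent = {c for c, n in total.items() if n >= min_support}
        result.update({c: total[c] for c in frequent})
        k += 1
        candidates = generate_candidates(frequent, k)
    return result
```

Called on two toy partitions such as `[[{"bread", "milk"}, {"bread", "butter"}], [{"bread", "milk", "butter"}, {"milk"}]]` with `min_support=2`, the sketch keeps `{bread, milk}` and `{bread, butter}` and drops `{milk, butter}`, so no 3-itemset candidate survives pruning. An adaptive variant would presumably decide, before each `while` iteration, which counting strategy to use; that choice is left out here because the abstract does not specify it.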
This article is indexed in Wanfang Data and other databases.