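Distributed frequent itemset mining algorithm based on item encoding
Cite this article:Zheng Jingyi, Deng Xiaoheng. Distributed frequent itemset mining algorithm based on item encoding [J]. Application Research of Computers, 2019, 36(4).
Authors:Zheng Jingyi  Deng Xiaoheng
Affiliation:College of Software, Central South University, Changsha 410075; College of Software, Central South University, Changsha 410075
Funding:Central South University Graduate Research and Innovation Project (2017zzts612)
Abstract:Apriori is one of the most commonly used algorithms for frequent itemset mining, but it scans the complete dataset in every iteration, which severely limits its efficiency and makes it hard to parallelize. The problem grows worse as data volumes keep increasing. To address it, this paper proposes IEBDA, a parallel Apriori method based on item encoding and the Spark computing framework. Item encoding preserves the full itemset information, so frequent itemset mining can be completed without repeatedly scanning the complete dataset, and Spark broadcast variables are used to parallelize the computation. In performance comparisons with other distributed Apriori algorithms on datasets of different sizes, IEBDA shows a clear speedup after the first iteration. The results show that the algorithm improves the efficiency of multi-iteration frequent itemset mining in big data environments.

Keywords:frequent itemset mining  Apriori algorithm  big data  distributed computing
Received:2017/11/30
Revised:2018/4/19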

Novel distributed itemset mining algorithm based on item encoding
Zheng Jingyi and Deng Xiaoheng. Novel distributed itemset mining algorithm based on item encoding [J]. Application Research of Computers, 2019, 36(4).
Authors:Zheng Jingyi and Deng Xiaoheng
Affiliation:College of Software,Central South University,
Abstract:Apriori is one of the most widely used algorithms for discovering frequent patterns. However, scanning the entire dataset in every iteration makes the algorithm inefficient and hard to parallelize. As datasets continue to grow, this problem becomes increasingly serious. Therefore, a novel algorithm called IEBDA is proposed, which parallelizes Apriori based on item encoding and the Spark framework. Item encoding preserves the information of each itemset, so frequent itemset mining can be completed without repeatedly scanning the whole dataset, and Spark's broadcast variables enable the algorithm to run in parallel. Compared with other distributed Apriori algorithms on datasets of different sizes, IEBDA shows a clear speedup after the first iteration. The results show that the algorithm improves the efficiency of multi-iteration frequent itemset mining in big data environments.
Keywords:frequent itemset mining  Apriori algorithm  big data  distributed computation
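
The abstract does not spell out the encoding scheme, so the following is only an illustrative sketch of the general idea, not the IEBDA algorithm itself. It assumes a simple bitmask encoding: the raw transactions are read once, each item is mapped to one bit, and every later Apriori iteration counts support against the compact encoded table instead of the raw data. All function names are hypothetical, and the sketch runs on a single machine; in a Spark setting, a table like `encoded` is the kind of read-only data one might share through a broadcast variable.

```python
from itertools import combinations


def encode_transactions(transactions):
    """Read the raw data once: map each item to a bit, each transaction to a bitmask."""
    items = sorted({item for t in transactions for item in t})
    bit_of = {item: 1 << i for i, item in enumerate(items)}
    encoded = [sum(bit_of[item] for item in set(t)) for t in transactions]
    return bit_of, encoded


def support(mask, encoded):
    """Count the encoded transactions that contain every item in `mask`."""
    return sum(1 for t in encoded if (t & mask) == mask)


def single_bits(mask):
    """Yield each set bit of `mask` as its own one-bit mask."""
    while mask:
        low = mask & -mask
        yield low
        mask ^= low


def frequent_itemsets(transactions, min_support):
    """Apriori over bitmask-encoded transactions (assumed encoding, see lead-in)."""
    bit_of, encoded = encode_transactions(transactions)
    item_of = {bit: item for item, bit in bit_of.items()}

    # Level 1: frequent single items.
    level = {}
    for bit in bit_of.values():
        count = support(bit, encoded)
        if count >= min_support:
            level[bit] = count
    found = dict(level)

    k = 1
    while level:
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if bin(union).count("1") != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all((union ^ bit) in level for bit in single_bits(union)):
                candidates.add(union)
        # Count step: uses only the encoded table, never the raw transactions.
        level = {}
        for mask in candidates:
            count = support(mask, encoded)
            if count >= min_support:
                level[mask] = count
        found.update(level)

    # Decode bitmasks back into sets of items.
    return {
        frozenset(item_of[bit] for bit in single_bits(mask)): count
        for mask, count in found.items()
    }
```

Note that this sketch still iterates over the encoded table in each round; it only avoids re-reading and re-parsing the raw dataset, which may differ from how the paper's encoding avoids repeated scans.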
This article is indexed in Wanfang Data and other databases.