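IABS: An improved Apriori algorithm based on Spark
Cite this article:Yan Mengjie,Luo Jun,Liu Jianying,Hou Chuanwang.IABS: An improved Apriori algorithm based on Spark[J].Application Research of Computers,2017,34(8).
Authors:Yan Mengjie  Luo Jun  Liu Jianying  Hou Chuanwang
Affiliation:School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology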
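Abstract:Apriori is one of the most classic algorithms in association rule mining, and its core problem is obtaining frequent itemsets. The classical Apriori algorithm must traverse the transaction database many times and must generate candidate itemsets. To address these problems, the algorithm is first optimized by transforming the storage structure and eliminating the candidate-set generation process. In addition, with the arrival of the big data era, data volumes grow daily and traditional algorithms face great challenges. The optimized Apriori is therefore combined with Spark, taking full advantage of Spark's in-memory computing and resilient distributed datasets, to propose IABS (Improved Apriori algorithm based on Spark). Comparison with existing algorithms of the same kind verifies the data scalability and node scalability of IABS, which achieves an average performance improvement of 23.88% across multiple datasets. The improvement becomes more pronounced as the data volume grows.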

Keywords:Apriori algorithm  frequent itemset  storage structure transformation  Spark  in-memory computing
Received:2016/5/19
Revised:2017/4/21

IABS: A parallel improved Apriori algorithm based on Spark
Yan Mengjie,Luo Jun,Liu Jianying and Hou Chuanwang.IABS: A parallel improved Apriori algorithm based on Spark[J].Application Research of Computers,2017,34(8).
Authors:Yan Mengjie  Luo Jun  Liu Jianying and Hou Chuanwang
Affiliation:School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology,School of Computer,National University of Defense Technology
Abstract:Apriori is one of the most classical algorithms in association rule mining, and its core problem is the generation of frequent itemsets. The classical Apriori algorithm has to scan the entire transaction database several times and has to generate candidate itemsets. To address these problems, we first optimize it by transforming the storage structure and eliminating candidate itemset generation. Then, because data volumes keep growing in the era of big data and the classical Apriori algorithm faces severe challenges, we combine the improved algorithm with the Spark platform and propose IABS (Improved Apriori algorithm Based on Spark), which makes full use of Spark features such as in-memory computation and resilient distributed datasets. Comparison with existing similar algorithms validates the sizeup and node scalability of IABS, and IABS achieves a 23.88% performance improvement on average across various benchmarks. The improvement becomes more pronounced as the data grows.
Keywords:Apriori algorithm  frequent itemset  storage structure transformation  Spark  in-memory computation
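
The abstract does not spell out the storage structure that IABS uses. As a rough illustration only, the sketch below assumes a vertical layout (each item mapped to the set of ids of the transactions containing it), which is one common way to avoid rescanning the transaction database: the support of a larger itemset is obtained by intersecting the id sets of two smaller ones. The function names `to_vertical` and `frequent_itemsets` and the sample data are hypothetical, and the code runs on a single machine without Spark.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction ids (tidset) containing it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    vertical = to_vertical(transactions)  # the only pass over the raw data
    # Frequent 1-itemsets together with their tidsets.
    level = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    result = {}
    while level:
        result.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            union = a | b
            if len(union) != len(a) + 1 or union in next_level:
                continue
            tids = tids_a & tids_b  # support by intersection, no rescan
            if len(tids) >= min_support:
                next_level[union] = tids
        level = next_level
    return result


# Hypothetical sample data: five small market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In a distributed setting the per-item id sets could be held in memory across the cluster, which is the kind of workload Spark's resilient distributed datasets are designed for. How IABS actually partitions and caches its data is described in the full paper, not in this abstract.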