Survey of Spark-Based Parallel Association Rules Mining Algorithm (基于Spark的并行关联规则挖掘算法研究综述)
Cite this article: LIU Liping, ZHANG Xinyou, NIU Xiaolu, GUO Yongkun, DING Liang. Survey of Spark-Based Parallel Association Rules Mining Algorithm[J]. Computer Engineering and Applications, 2019, 55(9): 1-9.
Authors: LIU Liping (刘莉萍)  ZHANG Xinyou (章新友)  NIU Xiaolu (牛晓录)  GUO Yongkun (郭永坤)  DING Liang (丁亮)
Affiliation: 1. School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China; 2. School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
Abstract: Association rule mining is an important branch of data mining. As data volumes grow rapidly, however, traditional association rule mining algorithms cannot keep up with the demands of big data, and a breakthrough has to be sought on distributed, parallel computing platforms. Spark is a parallel computing model designed for big data processing and well suited to iterative computation. Compared with MapReduce, it is more efficient, makes fuller use of memory, and is better suited to iterative computation and interactive processing. This paper classifies and reviews the existing Spark-based parallel association rule mining algorithms and summarizes the strengths, weaknesses, and applicable scope of each, providing a reference for further research.

Keywords: Spark  parallel  association rule mining  Apriori  FP-Growth
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).