
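Sequential Pattern Mining Algorithm Based on Map Reduce
Cite this article: LIU Dong, WEI Yong-qing, XUE Wen-juan. Sequential Pattern Mining Algorithm Based on Map Reduce[J]. Computer Engineering, 2012, 38(15): 43-45.
Authors: LIU Dong, WEI Yong-qing, XUE Wen-juan
Affiliations: 1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014
2. Basic Education Department, Shandong Police College, Jinan 250014
Funding: National Natural Science Foundation of China; Natural Science Foundation of Shandong Province
Abstract: Traditional data mining algorithms have limited computing capacity when processing massive datasets. To address this problem, this paper proposes MR-PrefixSpan, a distributed sequential pattern mining algorithm based on MapReduce. Building on the PrefixSpan algorithm, MR-PrefixSpan partitions the pattern mining task: the Map function processes the sequential patterns obtained from different prefixes and constructs the projected databases in parallel, which improves mining efficiency and simplifies the search space. The Reduce function then aggregates the intermediate results to obtain the global set of sequential patterns. Experimental results on a Hadoop cluster show that MR-PrefixSpan reduces database scanning time and achieves a relatively high parallel speedup and good scalability.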
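The prefix-partitioning idea in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it assumes every sequence element is a single item (full PrefixSpan also handles itemsets), it simulates parallel map tasks with a thread pool rather than Hadoop, and the names `project`, `map_task`, `reduce_task` and `mr_prefixspan` are hypothetical. Each map task owns one frequent 1-item prefix, builds that prefix's projected database and mines it recursively; the reduce step merges the per-prefix results into the global pattern set.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def project(db, item):
    """Return the item-projected database: the suffix after the first
    occurrence of `item` in every sequence that contains it."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def prefixspan(prefix, projected_db, min_sup, out):
    """Recursive PrefixSpan over single-item elements (an assumption):
    count each item once per suffix, extend the prefix with every
    frequent item, and recurse on the new projected database."""
    counts = defaultdict(int)
    for suffix in projected_db:
        for item in set(suffix):
            counts[item] += 1
    for item in sorted(counts):
        if counts[item] >= min_sup:
            pattern = prefix + (item,)
            out.append((pattern, counts[item]))
            prefixspan(pattern, project(projected_db, item), min_sup, out)


def map_task(item, support, db, min_sup):
    """Hypothetical map task: mine every pattern whose first item is `item`."""
    out = [((item,), support)]
    prefixspan((item,), project(db, item), min_sup, out)
    return out


def reduce_task(partials):
    """Hypothetical reduce step: merge per-prefix results into one dict.
    Prefix partitions are disjoint, so no pattern arrives twice."""
    merged = {}
    for pairs in partials:
        merged.update(pairs)
    return merged


def mr_prefixspan(db, min_sup, workers=4):
    """Driver: find frequent 1-items, run one map task per item, then merge."""
    db = [tuple(seq) for seq in db]
    # Frequent 1-items define the independent mining tasks.
    item_sup = defaultdict(int)
    for seq in db:
        for item in set(seq):
            item_sup[item] += 1
    tasks = [(i, s) for i, s in sorted(item_sup.items()) if s >= min_sup]
    # Threads stand in for parallel map tasks on a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda t: map_task(t[0], t[1], db, min_sup), tasks))
    return reduce_task(partials)
```

For example, on the four sequences `[["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]` with `min_sup=2`, the sketch returns the three single items (support 3) and the patterns ⟨a b⟩, ⟨a c⟩, ⟨b c⟩ (support 2), while ⟨a b c⟩ (support 1) is pruned. Because every pattern starts with exactly one frequent item, the map tasks never need to coordinate; in a real Hadoop job each task would additionally need access to the sequence database, which this sketch sidesteps by sharing an in-memory list.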

Keywords: cloud computing; parallel processing; MapReduce model; PrefixSpan algorithm; sequential pattern; Hadoop platform
Received: 2011-10-11

Sequential Pattern Mining Algorithm Based on Map Reduce
LIU Dong, WEI Yong-qing, XUE Wen-juan. Sequential Pattern Mining Algorithm Based on Map Reduce[J]. Computer Engineering, 2012, 38(15): 43-45.
Authors: LIU Dong, WEI Yong-qing, XUE Wen-juan
Affiliations: 1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China; 2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014, China; 3. Basic Education Department, Shandong Police College, Jinan 250014, China
Abstract: Traditional data mining algorithms lack the computing power needed to process massive datasets. To address this problem, a distributed sequential pattern mining algorithm based on the MapReduce programming model, named MR-PrefixSpan, is proposed. The mining task is decomposed into many subtasks: the Map function mines the sequential patterns of each prefix-projected database, and the projected databases are constructed in parallel. This simplifies the search space and yields higher mining efficiency. The intermediate values are then passed to a Reduce function, which merges them to produce a possibly smaller set of values. Experimental results on a Hadoop cluster show that MR-PrefixSpan reduces database scanning time and achieves a higher parallel speedup ratio and better scalability.
Keywords: cloud computing; parallel processing; MapReduce model; PrefixSpan algorithm; sequential pattern; Hadoop platform
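The abstract reports a "parallel speedup ratio" without defining it on this page. Assuming the paper uses the conventional definition (an assumption, not confirmed here), the speedup on p nodes is

$$S_p = \frac{T_1}{T_p}$$

where T_1 is the running time of MR-PrefixSpan on a single node and T_p is its running time on p nodes; ideal (linear) speedup corresponds to S_p = p, and values closer to p indicate better parallel scalability.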
This article is indexed in CNKI, VIP, Wanfang Data and other databases.