Web Log Mining Based on MapReduce (基于MapReduce的Web日志挖掘)
Citation: LI Bin, LIU Lili. Weblog mining based on MapReduce[J]. Computer Engineering and Applications (计算机工程与应用), 2012, 48(22): 95-98.
Authors: LI Bin (李彬), LIU Lili (刘莉莉)
Affiliation: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
Abstract: Web data mining systems that run on a single CPU node hit a computational bottleneck when mining massive Web data sources. To address this, the paper combines the distributed processing and virtualization strengths of cloud computing with the inherent parallelism of the ant colony algorithm, and designs a Web log mining algorithm based on the Map/Reduce framework. To verify the algorithm's efficiency, the authors build a Hadoop platform and use the algorithm to mine users' preferred access paths from Web logs. The experimental results show that using the distributed computing capacity of the cluster to process large numbers of Web log files can significantly improve the efficiency of Web data mining.

Keywords: cloud computing, Map/Reduce, Hadoop platform, Web log mining, ant colony algorithm
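
The abstract does not give the details of the algorithm, so the following is only a minimal single-process sketch of the general idea of mining preferred access paths in a map/reduce style. It is not the authors' ant colony method: it replaces pheromone updates with plain transition counts, and the record layout `(user_id, timestamp, url)`, the function names, and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical log records: (user_id, timestamp, url). This layout is an assumption.
LOGS = [
    ("u1", 1, "/home"), ("u1", 2, "/news"), ("u1", 3, "/sports"),
    ("u2", 1, "/home"), ("u2", 2, "/news"),
    ("u3", 1, "/home"), ("u3", 2, "/shop"),
]

def map_phase(records):
    """Map: emit ((from_page, to_page), 1) for each consecutive click of a user.

    On a real cluster the per-user grouping would itself be done by the
    framework's shuffle step; here it is simulated with sort + groupby.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for _, clicks in groupby(ordered, key=lambda r: r[0]):
        pages = [url for _, _, url in clicks]
        for src, dst in zip(pages, pages[1:]):
            yield (src, dst), 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each (from_page, to_page) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def preferred_next(totals, min_share=0.5):
    """Keep transitions whose share of all exits from the source page is >= min_share."""
    out_of = defaultdict(int)
    for (src, _), n in totals.items():
        out_of[src] += n
    shares = {edge: n / out_of[edge[0]] for edge, n in totals.items()}
    return {edge: s for edge, s in shares.items() if s >= min_share}

totals = reduce_phase(map_phase(LOGS))
preferred = preferred_next(totals)
```

In this toy data, two of the three users who leave `/home` go to `/news`, so that transition survives the threshold while `/home` to `/shop` does not. Counting is associative, which is why this kind of job splits cleanly across map and reduce tasks on a cluster.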
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).