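Web Document Clustering Based on Web Log Mining
Cite this article:GAO Zhe, WEI Hai-ping, WANG Fu-wei, ZHAO Xiao-bi. Web document clustering based on web-log mining[J]. Computer Engineering and Design, 2008, 29(18).
Authors:GAO Zhe  WEI Hai-ping  WANG Fu-wei  ZHAO Xiao-bi
Affiliation:School of Computer and Communication Engineering, Liaoning Shihua University, Fushun, Liaoning 113001
Abstract:Web log mining is one form of web mining. The paper introduces the general process of web log mining, studies the k-means clustering algorithm, and analyzes its shortcomings. In every iteration, k-means must compute the distance from each data object to the cluster centroids, which makes clustering inefficient. To address this problem, an improved k-means algorithm is proposed that avoids repeatedly computing the distances from data objects to cluster centroids. Both algorithms are used to cluster Web documents. Experimental results show that the improved algorithm increases clustering efficiency.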

Keywords:log mining  Web log  document clustering  log preprocessing

Web document clustering based on web-log mining
GAO Zhe,WEI Hai-ping,WANG Fu-wei,ZHAO Xiao-bi.Web document clustering based on web-log mining[J].Computer Engineering and Design,2008,29(18).
Authors:GAO Zhe  WEI Hai-ping  WANG Fu-wei  ZHAO Xiao-bi
Affiliation:School of Computer and Communication Engineering, Liaoning Shihua University, Fushun 113001, China
Abstract:Web log mining is one form of web mining. The general process of web log mining and the k-means algorithm are introduced, and the shortcomings of the k-means algorithm are analyzed. In every iteration, the k-means algorithm needs to compute the distance between every data object and the cluster centroids, which lowers its efficiency. To address this problem, an enhanced k-means algorithm is put forward, which avoids recomputing the distance between every data object and the cluster centroids. Web document clustering is implemented with ...
Keywords:k-means
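
The abstract does not say how the improved algorithm avoids repeated distance computations, so the following is only a minimal sketch of one possible approach and not the authors' method. It assumes a simple caching rule: the distances from all points to a centroid are recomputed only if that centroid moved in the last update step. The function name `kmeans_cached`, its parameters, and the returned distance counter are hypothetical and chosen for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cached(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means that reuses cached point-to-centroid distances.

    Assumption (not from the paper): a distance is recomputed only when
    the corresponding centroid changed in the previous update step.
    Returns (labels, centroids, number_of_distance_computations).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    n = len(points)

    # dist[i][j] caches the squared distance from point i to centroid j.
    dist = [[squared_distance(p, c) for c in centroids] for p in points]
    computed = n * k
    labels = [min(range(k), key=row.__getitem__) for row in dist]

    for _ in range(max_iter):
        # Update step: move each non-empty cluster's centroid to its mean.
        moved = []
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if not members:
                continue  # an empty cluster keeps its old centroid
            new_c = [sum(col) / len(members) for col in zip(*members)]
            if new_c != centroids[j]:
                centroids[j] = new_c
                moved.append(j)

        if not moved:
            break  # no centroid changed, so the assignment is stable

        # Assignment step: refresh only the columns of moved centroids.
        for i, p in enumerate(points):
            for j in moved:
                dist[i][j] = squared_distance(p, centroids[j])
            labels[i] = min(range(k), key=dist[i].__getitem__)
        computed += n * len(moved)

    return labels, centroids, computed


if __name__ == "__main__":
    # Toy 2-D vectors standing in for Web document feature vectors.
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids, computed = kmeans_cached(docs, k=2)
    print(labels, centroids, computed)
```

A plain k-means implementation would perform `n * k` distance computations in every iteration; here the counter grows by only `n * len(moved)` per iteration, so the saving depends on how many centroids stop moving before convergence.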
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.