Research and Implementation of a New Web Log Clustering Algorithm (一种新的Web日志聚类算法的研究与实现)

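Cite this article:	WANG Yuting, XU Weixiang, ZHANG Yi, LI Huahua. Research and Implementation of a New Web Log Clustering Algorithm [J]. Modern Electronic Technique, 2007, 30(24): 139-142.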

Authors:	WANG Yuting, XU Weixiang, ZHANG Yi, LI Huahua

Affiliation:	School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044

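Abstract:	Most traditional algorithms for clustering Web logs require the user to specify the number of clusters. This paper proposes a new adaptive clustering algorithm and applies it to clustering user sessions in Web logs. The algorithm draws on both agglomerative and partitional clustering. It uses the dissimilarity between every pair of sessions in the initial data set as the distance measure, merges pairs of sessions whose distance falls below a given threshold to form the initial clusters, and then dynamically merges the closest session clusters or sessions according to certain rules, so that natural clusters emerge. Finally, the effectiveness of the algorithm is verified by comparing the intra-cluster distance of the session clusters with the inter-cluster distance. The main advantage of the algorithm is that it forms clusters automatically, without requiring the user to specify the number of clusters in advance, and that it identifies outliers effectively. Experiments show that the algorithm produces relatively high-quality clusters.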
Keywords:	dissimilarity; agglomerative clustering algorithm; adaptive clustering algorithm; user session
Article ID:	1004-373X(2007)24-139-04
Received:	2007-06-04
Revised:	2007-06-04
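
The procedure outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the dissimilarity measure or the merging rules, so the sketch assumes Jaccard distance over the sets of pages visited in each session, seeds clusters by joining every pair of sessions closer than a hypothetical `init_threshold`, and then keeps merging the closest pair of clusters by average distance until that distance reaches a hypothetical `merge_threshold`. Sessions left on their own are reported as outliers.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Dissimilarity of two sessions given as sets of page URLs (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_distance(c1, c2, dist):
    """Average pairwise distance between the members of two clusters."""
    total = sum(dist[i][j] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def adaptive_cluster(sessions, init_threshold, merge_threshold):
    """Cluster sessions without a preset cluster count.

    Returns (clusters, outliers), both expressed as session indices.
    Both thresholds are hypothetical parameters of this sketch.
    """
    n = len(sessions)
    dist = [[jaccard_distance(sessions[i], sessions[j]) for j in range(n)]
            for i in range(n)]

    # Step 1: seed clusters by joining every pair closer than init_threshold
    # (union-find keeps track of which sessions have been joined).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] < init_threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())

    # Step 2: repeatedly merge the closest pair of clusters (a lone session
    # counts as a one-member cluster) until none is closer than merge_threshold.
    while len(clusters) > 1:
        d, a, b = min(
            (average_distance(clusters[a], clusters[b], dist), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d >= merge_threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations guarantees a < b

    # Sessions that never merged with anything are treated as outliers.
    outliers = [c[0] for c in clusters if len(c) == 1]
    clusters = [sorted(c) for c in clusters if len(c) > 1]
    return clusters, outliers


# Toy sessions (made-up page paths, for illustration only).
sessions = [
    {"/a", "/b", "/c"},
    {"/a", "/b"},
    {"/a", "/b", "/d"},
    {"/x", "/y"},
    {"/x", "/y", "/z"},
    {"/q"},
]
clusters, outliers = adaptive_cluster(sessions, init_threshold=0.4, merge_threshold=0.6)
print(clusters, outliers)
```

The number of clusters is never passed in; it falls out of the two thresholds, which is the property the abstract emphasises. How those thresholds should be chosen is not stated in the source.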
Research and Realization on a New Clustering Algorithm for Web Log

WANG Yuting, XU Weixiang, ZHANG Yi, LI Huahua. Research and Realization on a New Clustering Algorithm for Web Log [J]. Modern Electronic Technique, 2007, 30(24): 139-142.

Authors:	WANG Yuting, XU Weixiang, ZHANG Yi, LI Huahua

Abstract:	In most Web log clustering methods, the number of clusters is predefined, and the resulting clusters depend heavily on the initial choice of elements that represent them. In this paper, we propose an adaptive clustering algorithm and apply it to clustering user sessions from Web logs. The algorithm is based on agglomerative and partitional clustering: it uses the degree of dissimilitude as the distance between two user sessions, dynamically merges two clusters, or a session and a cluster, according to certain rules, and finally produces natural clusters. The algorithm is shown to be effective by comparing the average inner distance of a cluster with the outer distances among clusters. Its advantages are that it clusters without requiring the number of clusters to be specified in advance and that it identifies outliers effectively.

Keywords:	degree of dissimilitude; agglomerative clustering; adaptive clustering; user session
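
The validation idea in the abstract, comparing the average inner distance of a cluster with the outer distances among clusters, might look something like the sketch below. The abstract does not give the exact criterion, so the acceptance rule used here (every cluster must be tighter internally than its average distance to each other cluster) and the function names are assumptions made for illustration. The input is a precomputed symmetric distance matrix plus clusters given as lists of session indices.

```python
from itertools import combinations


def mean_intra_distance(cluster, dist):
    """Mean pairwise distance among the members of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single-member cluster has no internal pairs
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def mean_inter_distance(c1, c2, dist):
    """Mean distance between the members of two different clusters."""
    return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def is_well_separated(clusters, dist):
    """True if each cluster is tighter than its distance to every other cluster.

    This acceptance rule is an assumption of the sketch, not the paper's.
    """
    for a, b in combinations(range(len(clusters)), 2):
        inter = mean_inter_distance(clusters[a], clusters[b], dist)
        if (mean_intra_distance(clusters[a], dist) >= inter
                or mean_intra_distance(clusters[b], dist) >= inter):
            return False
    return True


# Made-up distance matrix for four sessions, for illustration only.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
good = is_well_separated([[0, 1], [2, 3]], dist)
bad = is_well_separated([[0, 2], [1, 3]], dist)
print(good, bad)
```

A grouping that keeps similar sessions together passes the check, while one that splits them fails it, which is the contrast the inner versus outer distance comparison is meant to expose.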
This article is indexed in databases including CNKI, VIP, and Wanfang Data.
