
Web Document Clustering Based on Web-Log Mining

Cite as:	SU Zhong, MA Shao-ping, YANG Qiang, ZHANG Hong-jiang. Web Document Clustering Based on Web-Log Mining[J]. Journal of Software, 2002, 13(1): 99-104.

Author names:	SU Zhong, MA Shao-ping, YANG Qiang, ZHANG Hong-jiang

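Affiliations:	1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084 2. Simon Fraser University, Canada 3. Microsoft Research China, Beijing 100080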

Funding:	National Key Basic Research Development Program of China (973 Program), grant G1998030509

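Abstract (translated from the Chinese):	Speed and clustering quality are the two main challenges facing clustering algorithms. DBSCAN (density based spatial clustering of applications with noise) is a typical density-based clustering method, and clustering experiments on large databases have demonstrated its advantage in speed. This paper proposes a recursive density-based clustering algorithm (RDBC) that can intelligently and dynamically modify its density parameters. RDBC is an improvement on DBSCAN and has the same computational complexity as DBSCAN. Clustering experiments on Web documents show that RDBC not only retains the high speed of DBSCAN but also produces clustering results considerably better than those of DBSCAN.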
Keywords (translated from the Chinese):	database clustering; Web mining; data mining
Article ID:	1000-9825/2002/13(01)0099-06
Received:	4/3/2000
Revised:	April 3, 2000
Document Clustering Based on Web-Log Mining

SU Zhong, MA Shao-ping, YANG Qiang and ZHANG Hong-jiang. Document Clustering Based on Web-Log Mining[J]. Journal of Software, 2002, 13(1): 99-104.

Authors:	SU Zhong, MA Shao-ping, YANG Qiang and ZHANG Hong-jiang

Abstract:	Effectiveness and efficiency are two central problems for clustering algorithms. DBSCAN is a typical density-based clustering algorithm that is very efficient on large databases. This paper presents a recursive density-based clustering algorithm, RDBC, that adaptively changes its parameters. RDBC is based on DBSCAN, and it can be shown that RDBC has the same time complexity as DBSCAN. In addition, it is demonstrated both analytically and experimentally that RDBC yields results superior to those of DBSCAN.
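
For orientation, the density parameters that the abstract refers to can be illustrated with a minimal sketch of standard DBSCAN. This is not the authors' RDBC, whose recursive parameter adjustment is not described in this record. The names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative assumptions, and the linear-scan neighborhood query is chosen for brevity rather than speed.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN sketch (not RDBC); returns one label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may later be claimed as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue  # already assigned to some cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself a core point, so the cluster keeps growing.
                queue.extend(j_neighbors)
    return labels


# Toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Here `eps` and `min_pts` are fixed for the whole run, which is the limitation that an algorithm adjusting its density parameters would aim to address. With a linear scan per query this sketch takes quadratic time; a spatial index would be needed for large databases.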

Keywords:	databases clustering web mining data mining
This article is indexed in databases including CNKI, VIP, and Wanfang Data.