首页 | 本学科首页   官方微博 | 高级检索  
     

改进的共享最近邻聚类算法
引用本文:李霞,蒋盛益. 改进的共享最近邻聚类算法[J]. 计算机工程与应用, 2011, 47(8): 138-142. DOI: 10.3778/j.issn.1002-8331.2011.08.041
作者姓名:李霞  蒋盛益
作者单位:广东外语外贸大学 思科信息学院,广州 510006
摘    要:聚类是一种无监督的机器学习方法,其任务是发现数据中的自然簇。共享最近邻聚类算法(SNN)在处理大小不同、形状不同以及密度不同的数据集上具有很好的聚类效果,但该算法还存在以下不足:(1)时间复杂度为O(n2),不适合处理大规模数据集;(2)没有明确给出参数阈值的简单指导性操作方法;(3)只能处理数值型属性数据集。对共享最近邻算法进行改进,使其能够处理混合属性数据集,并给出参数阈值的简单选择方法,改进后算法运行时间与数据集大小成近似线性关系,适用于大规模高维数据集。在真实数据集和人造数据集上的实验结果表明,提出的改进算法是有效可行的。

关 键 词:共享最近邻聚类算法  一趟聚类算法  大规模数据集  
修稿时间: 

Improved shared nearest neighbor clustering algorithm
LI Xia,JIANG Shengyi. Improved shared nearest neighbor clustering algorithm[J]. Computer Engineering and Applications, 2011, 47(8): 138-142. DOI: 10.3778/j.issn.1002-8331.2011.08.041
Authors:LI Xia  JIANG Shengyi
Affiliation:Cisco School of Informatics,Guangdong University of Foreign Studies,Guangzhou 510006,China
Abstract:Clustering is a method of unsupervised learning in machine learning,the typical task of which is to discovery “natural” clusters present in the data.The shared nearest neighbor algorithm is one of the most efficient clustering algorithm which can handle datasets of different sizes,shapes and densities.But there are still some shortages about the algorithm.SNN can’t handle large dataset because of its high complexity.There are no definite methods about threshold of the algorithm.SNN can not process databases with mixture attributes.This paper improves the SNN algorithm to process the data with categorical attributes,gives a simple definite method to select threshold of the algorithm.The time complexity of the improved algorithm is nearly linear with the size of dataset and can be used to large dataset.The experimental results on real datasets and synthetic datasets show that the improved algorithm is effective and practicable.
Keywords:shared nearest neighbor clustering algorithm  one-pass clustering algorithm  large dataset
本文献已被 CNKI 维普 万方数据 等数据库收录!
点击此处可从《计算机工程与应用》浏览原始摘要信息
点击此处可从《计算机工程与应用》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号