首页 | 官方网站   微博 | 高级检索  
     

基于聚类准则函数的改进K-means算法
引用本文:张雪凤,张桂珍,刘鹏.基于聚类准则函数的改进K-means算法[J].计算机工程与应用,2011,47(11):123-127.
作者姓名:张雪凤  张桂珍  刘鹏
作者单位:1.上海财经大学 信息管理与工程学院,上海 200433 2.上海财经大学 继续教育学院,上海 200080
基金项目:上海财经大学"211工程"三期重点学科建设项目
摘    要:K-means算法所使用的聚类准则函数是将数据集中各个簇的误差平方值直接相加而得到的,不能有效处理簇的密度不均且大小差异较大的数据集。为此,将K-means算法的聚类准则函数定义为加权的簇内标准差之和,权重为簇内数据对象数占总数目的比例。同时,调整了传统K-means算法将数据对象重新分配给簇的方法,采用一个数据对象到中心点的加权距离代替传统K-means算法中的距离,将数据对象分配给使加权距离最小的中心点所在的簇。实验结果表明,针对模拟数据集的聚类,改进K-means算法可以明显减少大而稀的簇中数据对象被错误地分配到相邻的小而密簇的可能性,改善了聚类的质量;针对UCI数据集的聚类,改进算法使得各个簇更为紧凑,从而验证了改进K-means算法的有效性。

关 键 词:K-means算法    聚类准则函数  
修稿时间: 

Improved K-means algorithm based on clustering criterion function
ZHANG Xuefeng,ZHANG Guizhen,LIU Peng.Improved K-means algorithm based on clustering criterion function[J].Computer Engineering and Applications,2011,47(11):123-127.
Authors:ZHANG Xuefeng  ZHANG Guizhen  LIU Peng
Affiliation:1.School of Information Management and Engineering,Shanghai University of Finance & Economics,Shanghai 200433,China 2.School of Continuing Education,Shanghai University of Finance & Economics,Shanghai 200080,China
Abstract:The criterion function used in K-means algorithm is the sum of the squared error,which may not work well for dataset containing clusters with different sizes and densities.In this study,the criterion function is improved by being defined as the sum of the weighted standard deviation,and the weight is the ratio of the number of points in each cluster to the whole points.The way each point being assigned to the centroid in the K-means algorithm is also modified:Instead of being assigned to the closest centroid,each point is assigned to the centroid which has minimum weighted distance.Experiments on simulation datasets show that the improved K-means algorithm significantly enhances the clustering quality by reducing the probability of misclassifying the points of big sparse clusters to its neighboring compact clusters.Experiments on UCI datasets show that the improved algorithm can obtain more compact cluster.Therefore,the improved K-means algorithm is effective.
Keywords:K-means algorithm  cluster  clustering criterion function
本文献已被 CNKI 维普 万方数据 等数据库收录!
点击此处可从《计算机工程与应用》浏览原始摘要信息
点击此处可从《计算机工程与应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号