Improved K-means clustering algorithm for selecting initial clustering centers based on dissimilarity measure
Cite as: LIAO Ji-yong, WU Sheng, LIU Ai-lian. Improved K-means clustering algorithm for selecting initial clustering centers based on dissimilarity measure[J]. Control and Decision (控制与决策), 2021, 36(12): 3083-3090
Authors: LIAO Ji-yong (廖纪勇), WU Sheng (吴晟), LIU Ai-lian (刘爱莲)
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Abstract: Selecting reasonable initial cluster centers is a prerequisite for correct clustering. Existing K-means algorithms select the initial cluster centers at random and cannot handle outliers. To address these problems, an improved K-means clustering algorithm that selects the initial cluster centers based on a dissimilarity measure is proposed. The algorithm constructs a dissimilarity matrix from the pairwise dissimilarities between data objects and defines two criteria, mean dissimilarity and total dissimilarity. The initial cluster centers are then determined according to these criteria, and in the subsequent iterations each cluster center is updated with the median of the data points in its cluster instead of the mean, so as to eliminate the effect of outliers on clustering accuracy. In addition, the proposed algorithm produces the same result on every run and is robust with respect to both initialization and outliers. Finally, experiments on synthetic datasets and UCI datasets show that the proposed algorithm achieves better clustering performance than three classical clustering algorithms and two improved K-means algorithms that optimize the initial cluster centers.

Keywords: cluster analysis; K-means algorithm; initial clustering centers; outliers; dissimilarity measure; robustness
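
The pipeline described in the abstract (dissimilarity matrix, deterministic initial centers, median-based updates) can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration rather than the authors' procedure: Euclidean distance is used as the dissimilarity, the mean dissimilarity of a point is taken as its average distance to all other points, the total dissimilarity is taken as the average over all pairs, and a candidate is accepted as a center only if it lies farther than the total dissimilarity from every center already chosen. The function names `dissimilarity_matrix`, `select_initial_centers`, `assign`, and `median_kmeans` are hypothetical.

```python
import numpy as np


def dissimilarity_matrix(X):
    """Pairwise Euclidean distances (assumed dissimilarity measure)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def select_initial_centers(X, k):
    """Deterministically pick k row indices of X as initial centers.

    Assumed rule, not taken from the paper: visit points in order of
    increasing mean dissimilarity and accept a point only if it is
    farther than the total dissimilarity from every accepted center.
    """
    D = dissimilarity_matrix(X)
    n = len(X)
    mean_dis = D.sum(axis=1) / (n - 1)    # per-point average distance
    total_dis = D.sum() / (n * (n - 1))   # average over all ordered pairs
    chosen = []
    for idx in np.argsort(mean_dis, kind="stable"):
        if all(D[idx, c] > total_dis for c in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    # Fallback: if too few points passed the threshold, add the point
    # farthest from its nearest already-chosen center.
    while len(chosen) < k:
        chosen.append(int(D[:, chosen].min(axis=1).argmax()))
    return chosen


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def median_kmeans(X, k, max_iter=100):
    """K-means-style loop that updates centers with per-cluster medians."""
    X = np.asarray(X, dtype=float)
    centers = X[select_initial_centers(X, k)].copy()
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if empty
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers
```

Because no step draws random numbers, calling `median_kmeans(X, k)` twice on the same data returns identical labels and centers, which matches the run-to-run consistency claimed in the abstract. The coordinate-wise median also keeps a center close to the bulk of its cluster when a single distant point is assigned to it, whereas the mean would be pulled toward that point.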
This article is indexed in Wanfang Data (万方数据) and other databases.