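K-means clustering algorithm based on cluster degree and distance equilibrium optimization
Citation: WANG Rihong, CUI Xingmei. K-means clustering algorithm based on cluster degree and distance equilibrium optimization[J]. Journal of Computer Applications, 2018, 38(1): 104-109.
Authors: WANG Rihong, CUI Xingmei
Affiliation: College of Computer Engineering, Qingdao University of Technology, Qingdao Shandong 266033, China
Funding: National Natural Science Foundation of China (61502262); Shandong Province Graduate Education Innovation Program (SDYY16023).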
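Abstract: The traditional K-means algorithm is sensitive to the choice of initial cluster centers. To address this, a K-means clustering algorithm that combines cluster degree with distance equilibrium optimization for selecting initial centers (K-MCD) was proposed. First, initial cluster centers were selected based on the idea of "cluster degree". Then, following a selection strategy that balances and optimizes the sum of distances among all cluster centers, the final initial cluster centers were obtained. Finally, the text set was vectorized, the text cluster centers were reselected with the optimized algorithm, evaluation criteria for clustering quality were chosen, and text clustering analysis was performed. Simulation experiments on text data sets were analyzed in terms of accuracy and stability. Compared with the K-means algorithm, K-MCD improved clustering accuracy on the four text sets by 18.6, 17.5, 24.3 and 24.6 percentage points respectively, and reduced the variance of the average number of evolution generations by 36.99 percentage points. The simulation results show that K-MCD can effectively improve text clustering accuracy and has good stability.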

Keywords: initial clustering center; K-means algorithm; cluster degree; distance equilibrium optimization; text clustering
Received: 2017-07-17
Revised: 2017-09-04
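The abstract does not define "cluster degree" or the distance-equilibrium criterion precisely, so the following is only a minimal sketch under stated assumptions, not the authors' K-MCD. It assumes that cluster degree is the number of other points within a radius equal to the mean pairwise distance. It restricts later centers to points whose degree is at least the median, so that sparse outliers are skipped. It models "distance equilibrium" with a hypothetical score: the total distance to the centers chosen so far minus the spread of those distances. Standard Lloyd K-means iterations then run from the selected centers. The inputs are assumed to be already-vectorized documents. All function names (`cluster_degree`, `balance_score`, `select_initial_centers`, `kmeans`) are illustrative.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cluster_degree(points: Sequence[Point], radius: float) -> List[int]:
    """Hypothetical "cluster degree": count of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def balance_score(points: Sequence[Point], i: int, chosen: List[int]) -> float:
    """Hypothetical distance-equilibrium score: large total distance to the
    centers chosen so far, penalized by how uneven those distances are."""
    d = [math.dist(points[i], points[c]) for c in chosen]
    return sum(d) - (max(d) - min(d))


def select_initial_centers(points: Sequence[Point], k: int) -> List[Point]:
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")
    # Assumed radius: mean pairwise distance (a choice made for this sketch only).
    pair = [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]
    radius = sum(pair) / len(pair) if pair else 0.0
    degree = cluster_degree(points, radius)
    # Step 1: the point with the highest cluster degree becomes the first center.
    chosen = [max(range(n), key=lambda i: degree[i])]
    # Only points with at least median degree are candidates (assumption).
    cutoff = median(degree)
    candidates = [i for i in range(n) if degree[i] >= cutoff]
    # Step 2: add the remaining centers one at a time by the balance score.
    while len(chosen) < k:
        pool = [i for i in candidates if i not in chosen]
        if not pool:  # fall back to all remaining points if candidates run out
            pool = [i for i in range(n) if i not in chosen]
        chosen.append(max(pool, key=lambda i: balance_score(points, i, chosen)))
    return [points[i] for i in chosen]


def kmeans(points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100):
    """Standard Lloyd iterations; returns (labels, centers, iterations)."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for it in range(1, max_iter + 1):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # assignments stable, centers unchanged
            return labels, centers, it
        centers = new_centers
    return labels, centers, max_iter


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (30, 30)]
    init = select_initial_centers(pts, 2)
    labels, centers, iters = kmeans(pts, init)
    print(init, labels, iters)
```

In this toy example, the isolated point (30, 30) has a cluster degree of 0 under the assumed radius. It is therefore never picked as an initial center. A plain random or farthest-point initialization could pick it, which is the kind of sensitivity to initial centers that the abstract targets.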
