首页 | 本学科首页   官方微博 | 高级检索  
     

基于模拟退火与贪心策略的平衡聚类算法
引用本文:唐海波,林煜明,李优,蔡国永.基于模拟退火与贪心策略的平衡聚类算法[J].计算机应用,2018,38(11):3132-3138.
作者姓名:唐海波  林煜明  李优  蔡国永
作者单位:1. 广西可信软件重点实验室(桂林电子科技大学), 广西 桂林 541004;2. 广西自动检测技术与仪器重点实验室(桂林电子科技大学), 广西 桂林 541004
基金项目:国家自然科学基金资助项目(61562014,61763007,U1711263);广西自然科学基金资助项目(2015GXNSFAA139303);广西可信软件重点实验室研究课题;广西自动检测技术与仪器重点实验室主任基金立项项目(YQ17111)。
摘    要:针对现实应用通常要求聚类的结果相对平衡的问题,提出了一种基于模拟退火与贪心策略的平衡聚类算法(BCSG),该算法包括基于模拟退火的初始点选择算法(SACI)与基于贪心策略的平衡聚类算法(BCGS)2个步骤,以提高平衡聚类算法的聚类效果与时间性能。首先基于模拟退火在数据集中快速定位出K个合适的数据点作为平衡聚类初始点,然后每个中心点分阶段贪婪地将距离其最近的数据点加入簇中直至达到簇规模上限。在6个UCI真实数据集与2个公开图像数据集上进行的聚类对比实验结果表明:在簇数目较大时相比Fuzzy C-Means聚类结果平衡度最高提升了50%以上;聚类结果的准确率相比Balanced K-Means、BCLS两个表现较好的算法平均提高了8个百分点;算法时间复杂度也更低,在较大规模的数据集上运行时间比Balanced K-Means最高减少了近40%。实验结果表明BCSG具有更佳的聚类效果和时间性能。

关 键 词:平衡聚类  贪心算法  模拟退火  近似算法  数据挖掘  
收稿时间:2018-04-29
修稿时间:2018-06-21

Balanced clustering based on simulated annealing and greedy strategy
TANG Haibo,LIN Yuming,LI You,CAI Guoyong.Balanced clustering based on simulated annealing and greedy strategy[J].journal of Computer Applications,2018,38(11):3132-3138.
Authors:TANG Haibo  LIN Yuming  LI You  CAI Guoyong
Affiliation:1. Guangxi Key Laboratory of Trusted Software(Guilin University of Electronic Technology), Guilin Gunangxi 541004, China;2. Guangxi Key Laboratory of Automatic Testing and Instrument(Guilin University of Electronic Technology), Guilin Gunangxi 541004, China
Abstract:Concerning the problem that clustering results are usually required to be balanced in practical applications, a Balanced Clustering algorithm based on Simulated annealing and Greedy strategy (BCSG) was proposed. The algorithm includes two steps:Simulated Annealing Clustering Initialization (SACI) and Balanced Clustering based on Greedy Strategy (BCGS) to improve clustering effectiveness with less time cost. First of all, K suitable data points of data set were located based on simulated annealing as the initial point of balanced clustering, and the nearest data points to each center point were added into the cluster where it belongs in stages greedily until the cluster size reach the upper limit. A series of experiments carried on six UCI real datasets and two public image datasets show that the balance degree can be increased by more than 50 percentage points compared with Fuzzy C-Means when the number of clusters is large, and the accuracy of clustering result is increased by 8 percentage points compared with Balanced K-Means and BCLS (Balanced Clustering with Least Square regression) which have good balanced clustering performance. Meanwhile, the time complexity of the BCSG is also lower, the running time is decreased by nearly 40 percentage points on large datasets compared with Balanced K-Means. BCSG has better clustering effectiveness with less time cost than other balanced clustering algorithms.
Keywords:balanced clustering                                                                                                                        greedy algorithm                                                                                                                        simulated annealing                                                                                                                        approximate algorithm                                                                                                                        data mining
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号