
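Genetic K-Means Clustering Algorithm Based on Variable Chromosome Length
Cite this article: YAN Yu-ping, XIAO Jing. Genetic K-means clustering algorithm based on variable chromosome length [J]. Computer Engineering and Design, 2008, 29(14).
Authors: YAN Yu-ping  XIAO Jing
Affiliations: 1. School of Software, Sun Yat-sen University, Guangzhou, Guangdong 510275
2. Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou, Guangdong 510275; Department of Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275
Funding: Sun Yat-sen University Young Teachers Fund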
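Abstract: Traditional K-means clustering requires the number of clusters to be fixed in advance and is sensitive to the choice of initial centroids, so it easily becomes trapped in local optima. To address these shortcomings, a genetic algorithm with variable chromosome encoding length is used to improve traditional K-means clustering. Without fixing K in advance, the algorithm applies repeated selection, crossover, and mutation operations and finally obtains the optimal number of clusters together with the optimal set of initial centroids. Experimental results on the Reuters data set show that the clustering produced by this algorithm is clearly better than that of traditional K-means, and also better than that of K-means driven by a genetic algorithm with fixed chromosome encoding length.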

Keywords: text clustering  K-means algorithm  genetic algorithm  variable chromosome length encoding  Reuters data set
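
The abstract describes the approach only at a high level, so the following is a minimal Python sketch of one way a variable-length chromosome GA can wrap K-means, not the authors' implementation. Assumptions not taken from the source: each chromosome is a list of centroids whose length is the candidate K; fitness is the inverse Davies-Bouldin index; crossover cuts each parent at an independent point; mutation replaces, adds, or deletes one centroid; and documents are already represented as numeric vectors (tuples of floats) compared by Euclidean distance. All function and parameter names (`evolve`, `k_min`, `k_max`, `p_mut`, and so on) are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Group points by nearest centroid; returns one list per centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def kmeans_step(points, centroids):
    """One K-means update; centroids that attract no points are dropped."""
    clusters = [c for c in assign(points, centroids) if c]
    return [tuple(sum(x) / len(c) for x in zip(*c)) for c in clusters]


def fitness(points, centroids):
    """Inverse Davies-Bouldin index (an assumed criterion; higher is better)."""
    live = [(c, m) for c, m in zip(centroids, assign(points, centroids)) if m]
    if len(live) < 2:
        return 0.0
    scatter = [sum(math.dist(p, c) for p in m) / len(m) for c, m in live]
    total = 0.0
    for i, (ci, _) in enumerate(live):
        total += max((scatter[i] + scatter[j]) / (math.dist(ci, cj) or 1e-12)
                     for j, (cj, _) in enumerate(live) if j != i)
    return 1.0 / (total / len(live) + 1e-12)


def crossover(a, b, k_min, k_max, rng):
    """Cut each parent at its own point, so the child length can differ from both."""
    child = a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b)):]
    child = list(dict.fromkeys(child))  # drop duplicate centroids, keep order
    return child[:k_max] if len(child) >= k_min else list(a)


def mutate(ch, points, k_min, k_max, rng):
    """Replace, add, or delete one centroid (assumed operators)."""
    ch = list(ch)
    op = rng.choice(("replace", "add", "delete"))
    if op == "add" and len(ch) < k_max:
        ch.append(rng.choice(points))
    elif op == "delete" and len(ch) > k_min:
        ch.pop(rng.randrange(len(ch)))
    else:
        ch[rng.randrange(len(ch))] = rng.choice(points)
    return ch


def evolve(points, k_min=2, k_max=8, pop_size=20, generations=30,
           p_mut=0.2, seed=0):
    """Return the fittest centroid list found; needs len(points) >= k_max."""
    rng = random.Random(seed)

    def fresh():
        # Random chromosome: a random number of data points used as centroids.
        return rng.sample(points, rng.randint(k_min, k_max))

    def pick(scored):
        # Binary tournament: the fitter of two random individuals wins.
        return max(rng.sample(scored, 2), key=lambda t: t[0])[1]

    pop = [fresh() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(points, ch), ch) for ch in pop]
        nxt = [max(scored, key=lambda t: t[0])[1]]  # elitism
        while len(nxt) < pop_size:
            child = crossover(pick(scored), pick(scored), k_min, k_max, rng)
            if rng.random() < p_mut:
                child = mutate(child, points, k_min, k_max, rng)
            child = kmeans_step(points, child)  # local refinement
            nxt.append(child if len(child) >= k_min else fresh())
        pop = nxt
    return max(pop, key=lambda ch: fitness(points, ch))
```

Called as `evolve(vectors, k_min=2, k_max=10)`, the function returns a list of centroids whose length is the selected K. The Davies-Bouldin criterion is used here only because it allows chromosomes of different lengths to be compared; the paper may use a different fitness function.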

K-Means text clustering algorithm based on modified variable string length genetic algorithm
YAN Yu-ping, XIAO Jing. K-Means text clustering algorithm based on modified variable string length genetic algorithm [J]. Computer Engineering and Design, 2008, 29(14).
Authors: YAN Yu-ping  XIAO Jing
Affiliation: YAN Yu-ping (1), XIAO Jing (2, 3). 1. School of Software Engineering, Sun Yat-Sen University, Guangzhou 510275, China; 2. Guangdong Key Laboratory of Information Security Technology, Sun Yat-Sen University, Guangzhou 510275, China; 3. Department of Computer Science, Sun Yat-Sen University, Guangzhou 510275, China
Abstract: The traditional K-Means clustering algorithm has two drawbacks. First, the number of clusters must be known in advance. Second, the clustering result is sensitive to the selection of initial cluster centroids, which can cause the algorithm to converge to a local optimum. To address these drawbacks, an improved K-Means based on a modified variable string length genetic algorithm (KMMVGA) is used. Without knowing the exact number of clusters, and after several iterations of GA selection, GA crossover and GA mutation, the algorithm ca...
Keywords:text clustering  K-Means  genetic algorithm  modified variable string length  Reuters data set  
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.