首页 | 本学科首页   官方微博 | 高级检索  
     

基于双重遗传的聚类分析算法研究
引用本文:文静,曹妍,张琳,牟向伟.基于双重遗传的聚类分析算法研究[J].计算机工程与科学,2017,39(12):2320-2325.
作者姓名:文静  曹妍  张琳  牟向伟
作者单位:;1.大连海事大学交通运输管理学院
基金项目:辽宁省创新团队项目(LT2011007);辽宁省教育厅科技研究项目(L2014203);辽宁省社会科学规划基金(L14BGL012);中央高校基本科研业务费专项资金(3132015049,3132015050);中国博士后科学基金(2014M551063,2015M571292)
摘    要:针对影响k-means聚类效果的聚类数目和初始中心点两大因素,提出了基于双重遗传的kmeans算法。它用外层遗传算法控制聚类数目,用内层遗传算法控制聚类的初始中心点,并采用类间距离和类内距离以及二者之间的比值来评价聚类结果的好坏,在算法终止后,可同时求得较优的聚类数目和某聚类数目下的较优初始中心点。此外,根据内外层遗传算法的特殊性,采用不同的编码策略适应算法需求,为保留优质个体,采用精英个体保留策略。通过UCI数据集测试实例证明此算法有很好的实用性,对数据挖掘技术有一定参考价值。

关 键 词:双重遗传  聚类分析  k-means算法  分层编码  精英保留
收稿时间:2015-12-17
修稿时间:2017-12-25

A clustering analysis algorithm based on double genetic algorithm
WEN Jing,CAO Yan,ZHANG Lin,MU Xiang-wei.A clustering analysis algorithm based on double genetic algorithm[J].Computer Engineering & Science,2017,39(12):2320-2325.
Authors:WEN Jing  CAO Yan  ZHANG Lin  MU Xiang-wei
Affiliation:(College of Transportation Management,Dalian Maritime University,Dalian 116026,China)
Abstract:There are two major factors that affect the k-means clustering effect: the number of clustering and the initial choice of the centroids. We put forward an improved k-means algorithm based on the double genetic algorithm, which uses the outer sub-genetic algorithm to control the number of clustering, and the inner sub-genetic algorithm to control the initial choice of cluster centroids, and utilizes the intra-class distance and inter-lass distance as well as the ratio between them to evaluate the clustering results. We therefore can get both the optimal number of clustering and the corresponding optimal initial cluster centroids by this improved k-means method. In addition, given the specificity of the inner and outer sub-generic algorithms, the improved k-means algorithm uses two different encoding strategies, and in order to preserve excellent individuals, it also uses the elite individuals reserved strategy. Experiments on the UCI data set verify the effectiveness of the improved k-means algorithm and it has a reference value for data mining.
Keywords:double genetic  cluster analysis  k-means algorithm  layered coding  elitism preservation  
点击此处可从《计算机工程与科学》浏览原始摘要信息
点击此处可从《计算机工程与科学》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号