Clustering ensemble algorithms based on improved genetic algorithm in cloud computing
Cite this article: XU Zhanyang, ZHENG Kezhang. Clustering ensemble algorithms based on improved genetic algorithm in cloud computing[J]. Journal of Computer Applications, 2018, 38(2): 458-463. DOI: 10.11772/j.issn.1001-9081.2017071749
Authors: XU Zhanyang, ZHENG Kezhang
Affiliation: School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing Jiangsu 210044, China
Foundation item: National Natural Science Foundation of China (61572259).
Received: 2017-07-18
Revised: 2017-09-10
Abstract: To address three problems, namely that unsupervised clustering lacks a priori information such as data class labels, that the accuracy of base clusterings depends on the clustering algorithm used, and that typical clustering ensemble algorithms have high space complexity, a Clustering Ensemble algorithm based on Improved Genetic Algorithm (CEIGA) was proposed. Because traditional clustering ensemble algorithms can no longer meet the running-time requirements of large-scale data processing, a Parallel Clustering Ensemble algorithm based on Improved Genetic Algorithm (PCEIGA), running on the Hadoop platform for cloud computing, was also proposed. First, the base clustering partitions produced by the base clustering generation mechanism underwent cluster label conversion and were then gene-encoded as the initial population of the improved Genetic Algorithm (GA). Second, the selection operator of the GA was improved to ensure the diversity of the base clusterings; guided by this improved selection operator, crossover and mutation were applied to the chromosomes, and an elitist strategy was used to obtain the next generation, ensuring the accuracy of the base clusterings. Iterating this cycle lets the final clustering ensemble result reach the global optimum and improves the accuracy of the algorithm. To improve efficiency, two MapReduce processes were designed and a Combine process was added to reduce communication among nodes. Finally, CEIGA, PCEIGA and four state-of-the-art clustering ensemble algorithms were compared on UCI data sets. The experimental results show that CEIGA outperforms the other state-of-the-art clustering ensemble algorithms, and that PCEIGA significantly reduces running time and improves efficiency without decreasing the accuracy of the clustering results.
Keywords: cloud computing; Genetic Algorithm (GA); clustering ensemble; selection operator; parallel
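The GA-based consensus step summarized in the abstract can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' CEIGA: the function names, the label alignment by exhaustive permutation search, the agreement-count fitness, binary tournament selection (a stand-in for the paper's improved selection operator, whose details are not given in the abstract), one-point crossover and per-gene mutation are all assumptions chosen for illustration. It only mirrors the order of steps described above: convert cluster labels, encode each base partition as a chromosome with one gene per object, then iterate selection, crossover, mutation and elitism. The MapReduce/Combine parallelization used by PCEIGA is not modeled.

```python
"""Hypothetical sketch of a GA-based clustering ensemble (not the paper's CEIGA).

All function names, the fitness definition and the operators are illustrative
assumptions; the paper's improved selection operator is replaced here by a
plain binary tournament.
"""
import itertools
import random
from collections import Counter


def relabel(reference, partition):
    """Cluster label conversion: rename the labels of `partition` so they agree
    as much as possible with `reference` (exhaustive search, small k only)."""
    k = max(max(reference), max(partition)) + 1
    # overlap[(a, b)] = number of objects labelled a in `partition` and b in `reference`
    overlap = Counter(zip(partition, reference))
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(overlap[(a, perm[a])] for a in range(k)))
    return [best[label] for label in partition]


def fitness(chrom, partitions):
    """Assumed fitness: count of (object, base partition) label agreements."""
    return sum(c == p for part in partitions for c, p in zip(chrom, part))


def evolve(partitions, k, generations=50, p_cx=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Align every base partition to the first one, then use the aligned label
    # vectors (one gene per object) as the initial population.
    aligned = [relabel(partitions[0], p) for p in partitions]
    population = [list(p) for p in aligned]
    n = len(aligned[0])

    def select():
        # Binary tournament: a stand-in for the paper's improved selection operator.
        a, b = rng.sample(population, 2)
        return a if fitness(a, aligned) >= fitness(b, aligned) else b

    for _ in range(generations):
        elite = max(population, key=lambda c: fitness(c, aligned))
        children = [list(elite)]  # elitist strategy: best chromosome survives unchanged
        while len(children) < len(population):
            p1, p2 = select(), select()
            if rng.random() < p_cx:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Per-gene mutation: reassign an object to a random cluster.
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda c: fitness(c, aligned))


if __name__ == "__main__":
    base = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
    print(evolve(base, k=2, generations=20))  # → [0, 0, 0, 1, 1, 1]
```

In this toy setup the agreement fitness is maximized by a per-object majority vote over the aligned partitions, so the example has a known optimum and elitism guarantees the best chromosome never gets worse between generations. The exhaustive relabeling costs O(k!) and is only practical for a handful of clusters; the Hungarian algorithm is the usual replacement for larger k.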