Improved BIRCH clustering algorithm based on connectivity distance and intensity
Citation: FAN Zhongxin, WANG Xing, MIAO Chunsheng. Improved BIRCH clustering algorithm based on connectivity distance and intensity[J]. Journal of Computer Applications, 2019, 39(4): 1027-1031.
Authors: FAN Zhongxin  WANG Xing  MIAO Chunsheng
Affiliation: 1. Experimental Teaching Center of Atmosphere and Environment, Nanjing University of Information Science & Technology, Nanjing Jiangsu 210044, China; 2. Nanjing Xinda Meteorological Science and Technology Company Limited, Nanjing Jiangsu 210044, China
Abstract: The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm has three shortcomings: its clustering results depend on the order in which data objects are added, it performs poorly on non-convex clusters, and its cluster diameter threshold restricts every cluster to a similar number of data objects. To address these issues, an improved BIRCH algorithm was proposed. In this algorithm, the cluster diameter threshold is replaced by connectivity distance and connectivity intensity thresholds, which describe the connectivity between individual data objects, and a cluster merging step is added to the generation of the cluster feature tree. Experimental results on custom, iris, wine and pendigits datasets show that the proposed algorithm achieves higher clustering accuracy than existing improved algorithms such as multi-threshold BIRCH and density-improved BIRCH; on large datasets in particular, its accuracy is 6 percentage points higher and its running time is 61% lower than those of density-improved BIRCH. These results indicate that the proposed algorithm is applicable to online real-time incremental data, can identify non-convex clusters and clusters of uneven volume, has a denoising function, and significantly reduces time complexity and space complexity.

Keywords: hierarchical clustering  on-line algorithm  Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)  Cluster Feature (CF)  Cluster Feature Tree (CF Tree)
Received: 2018-08-30
Revised: 2018-10-27
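
For context, the cluster diameter threshold that the proposed algorithm replaces can be illustrated with the standard BIRCH cluster feature, a triple (N, LS, SS) holding the point count, the linear sum and the sum of squared norms. The following is a minimal sketch of that baseline test only. It does not implement the connectivity distance or connectivity intensity of the paper, whose definitions are not given in this abstract. The names CF, from_point, merged, diameter and can_absorb are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    """Standard BIRCH cluster feature (N, LS, SS). Illustrative only."""
    n: int            # number of points summarized
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "CF":
        return cls(1, list(p), sum(x * x for x in p))

    def merged(self, other: "CF") -> "CF":
        # Cluster features are additive, so merging is component-wise addition.
        return CF(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def diameter(self) -> float:
        # Root of the mean squared distance over all ordered pairs of
        # distinct points: D^2 = (2*N*SS - 2*|LS|^2) / (N*(N-1)).
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))  # guard against rounding below zero


def can_absorb(cf: CF, point: Sequence[float], threshold: float) -> bool:
    """Baseline BIRCH test: absorb the point only if the merged
    diameter stays within the threshold."""
    return cf.merged(CF.from_point(point)).diameter() <= threshold
```

Under this test, a cluster holding the single point (0, 0) absorbs (2, 0) at a threshold of 2.0 but not at 1.5, since the merged diameter is 2. Because every cluster is bounded by the same diameter, clusters tend toward similar, roughly spherical extents, which is the limitation the abstract describes.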
This article is indexed in databases including VIP and Wanfang Data.