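A Cluster Validity Function Based on Geometric Probability
Cite this article: Li Xiaowen, Mao Zhengyuan, Li Jianwei. A Cluster Validity Function Based on Geometric Probability[J]. Journal of Image and Graphics, 2008, 13(12): 2351-2356
Authors: Li Xiaowen  Mao Zhengyuan  Li Jianwei
Affiliations: Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Center of Fujian Province, Fuzhou University; Institute of Computer Image and Graphics, Fuzhou University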
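Abstract: Cluster validity is a fundamental unsolved problem in cluster analysis, and determining the optimal number of clusters is its main research topic. Taking geometric probability as the theoretical basis, this paper proposes a new cluster validity function for two-dimensional datasets, used to determine the optimal number of clusters. The function exploits the correspondence between a two-dimensional dataset and a two-dimensional discrete point set, and measures the cluster structure of the dataset from the distribution of the point set in feature space; the idea is intuitive and easy to understand. During measurement, the points are connected pairwise to generate a set of line segments that preserves the structural information of the point set, and the cluster validity function is constructed by comparing the direction values of these segments with those expected under complete randomness. Experimental results show that, for a given sample dataset, plotting the curve of this function and examining its shape effectively determines the optimal number of clusters of a two-dimensional dataset and can guide the design of clustering algorithms.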

Keywords: Cluster validity  Geometric probability  Cluster analysis  Optimal number of clusters
Received: 2007-09-04
Revised: 2007-06-06

A Cluster Validity Function Based on Geometric Probability
LI Xiaowen, MAO Zhengyuan, LI Jianwei. A Cluster Validity Function Based on Geometric Probability[J]. Journal of Image and Graphics, 2008, 13(12): 2351-2356
Authors:LI Xiaowen  MAO Zhengyuan  LI Jianwei
Abstract:Determining the optimum cluster number is a key research topic in cluster validity, a fundamental unsolved problem in cluster analysis. To determine the optimum cluster number, this article proposes a new cluster validity function for two-dimensional datasets, with geometric probability as its theoretical basis. The function uses the correspondence between a two-dimensional dataset and the corresponding two-dimensional discrete point set to measure the cluster structure of the dataset from the distribution of the point set in feature space. The design is intuitive and easy to understand. During measurement, the structural information of the point set is stored in a set of line segments generated by connecting each pair of points in the point set. The cluster validity function is formed by comparing the direction values of the segments in this set with those expected under complete randomness. The case study shows that the shape of the function curve generated from a given example dataset effectively determines the optimum cluster number of the dataset and supports the design of clustering algorithms.
Keywords:Cluster validity  Geometric probability  Cluster analysis  Optimum cluster number
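
The abstract describes the measurement only in outline, so the following Python sketch is an illustration rather than the authors' function. It joins every pair of points, takes each segment's direction in [0, π), and sums the absolute gap between the observed direction histogram and a flat one. The names `segment_directions` and `direction_deviation`, the 12-bin histogram, and the use of a uniform direction distribution as the "completely random" reference are all assumptions not stated in the source.

```python
import math
from itertools import combinations


def segment_directions(points):
    """Direction in [0, pi) of the segment joining each pair of distinct points."""
    directions = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1, y1) == (x2, y2):
            continue  # coincident points define no direction
        # A segment has no orientation, so fold the angle into [0, pi).
        directions.append(math.atan2(y2 - y1, x2 - x1) % math.pi)
    return directions


def direction_deviation(points, bins=12):
    """Hypothetical statistic (not the paper's function): summed absolute gap
    between the observed share of segments per direction bin and the share
    1/bins assumed here for a completely random pattern."""
    directions = segment_directions(points)
    if not directions:
        return 0.0
    counts = [0] * bins
    for theta in directions:
        # min() guards against a value that rounds up to pi itself.
        counts[min(int(theta / math.pi * bins), bins - 1)] += 1
    total = len(directions)
    return sum(abs(c / total - 1.0 / bins) for c in counts)
```

Under these assumptions, perfectly collinear points put every segment in one bin and the statistic reaches its maximum of 2(1 - 1/bins), while values near 0 indicate directions spread evenly over the bins.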
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.