首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
K-means算法最佳聚类数确定方法   总被引:10,自引:0,他引:10  
K-means聚类算法是以确定的类数k为前提对数据集进行聚类的,通常聚类数事先无法确定。从样本几何结构的角度设计了一种新的聚类有效性指标,在此基础上提出了一种新的确定K-means算法最佳聚类数的方法。理论研究和实验结果验证了以上算法方案的有效性和良好性能。  相似文献   

2.
新的K-均值算法最佳聚类数确定方法   总被引:8,自引:0,他引:8       下载免费PDF全文
K-均值聚类算法是以确定的类数k和随机选定的初始聚类中心为前提对数据集进行聚类的。通常聚类数k事先无法确定,随机选定的初始聚类中心容易使聚类结果不稳定。提出了一种新的确定K-均值聚类算法的最佳聚类数方法,通过设定AP算法的参数,将AP算法产生的聚类数作为聚类数搜索范围的上界kmax,并通过选择合适的有效性指标Silhouette指标,以及基于最大最小距离算法思想设定初始聚类中心,分析聚类效果,确定最佳聚类数。仿真实验和分析验证了以上算法方案的可行性。  相似文献   

3.
The upper bound of the optimal number of clusters in fuzzy clustering   总被引:7,自引:0,他引:7  
The upper bound of the optimal number of clusters in clustering algorithm is studied in this paper. A new method is proposed to solve this issue. This method shows that the rule cmax≤n~(1/n), which is popular in current papers, is reasonable in some sense. The above conclusion is tested and analyzed by some typical examples in the literature, which demonstrates the validity of the new method.  相似文献   

4.
一种基于近邻传播算法的最佳聚类数确定方法   总被引:2,自引:0,他引:2  
在聚类分析中,决定聚类质量的关键是确定最佳聚类数,对此,从样本几何结构的角度定义了样本聚类距离和样本聚类离差距离,设计了一种新的聚类有效性指标.在此基础上,提出一种基于近邻传播算法确定样本最佳聚类数的方法.理论研究和实验结果表明,所提出的指标和方法能够有效地对聚类结果进行评估,适合于确定样本的最佳聚类数.  相似文献   

5.
基于近邻传播算法的最佳聚类数确定方法比较研究   总被引:2,自引:0,他引:2  
在聚类分析中,决定聚类质量的关键是确定最佳聚类数.提出采用聚类效果较好的近邻传播聚类算法对样本进行聚类,运用6种聚类有效性指标分别对聚类结果进行有效性分析,以确定最佳聚类数.具体分析了这些有效性指标,并改进了IGP指标确定最佳聚类数的方法.针对8个数据集,通过实验比较这些指标的性能.分析和实验结果表明,基于近邻传播聚类...  相似文献   

6.
针对模糊C-均值聚类算法偏好发现球形簇,以及对孤立点非常敏感的问题,提出了密集簇中心二次模糊聚类算法,其中引入聚类有效性度量函数,并进行了有效的孤立点处理,最终的模糊簇由多个代表点共同表示,故算法可有效发现数据集中的自然簇数目,对簇的大小和形状没有偏好性,且在孤立点的处理上具有较好的健壮性.另外,随机采样过程方便地实现了上述算法在大型数据集上的扩展;与模糊C-均值聚类算法的实验结果比较也表明了该算法的优越性.  相似文献   

7.
An important goal in cluster analysis is the internal validation of results using an objective criterion. Of particular relevance in this respect is the estimation of the optimum number of clusters capturing the intrinsic structure of your data. This paper proposes a method to determine this optimum number based on the evaluation of fuzzy partition stability under bootstrap resampling. The method is first characterized on synthetic data with respect to hyper-parameters, like the fuzzifier, and spatial clustering parameters, such as feature space dimensionality, clusters degree of overlap, and number of clusters. The method is then validated on experimental datasets. Furthermore, the performance of the proposed method is compared to that obtained using a number of traditional fuzzy validity rules based on the cluster compactness-to-separation criteria. The proposed method provides accurate and reliable results, and offers better generalization capabilities than the classical approaches.  相似文献   

8.
Estimation of the number of clusters and influence zones   总被引:2,自引:0,他引:2  
Whereas estimating the number of clusters is directly involved in the first steps of unsupervised classification procedures, the problem still remains topical. In our attempt to propose a solution, we focalize on procedures that do not make any assumptions on the cluster shapes. Indeed the classification approach we use is based on the estimation of the probability density function (PDF) using the Parzen–Rosenblatt method. The modes of the PDF lead to the construction of influence zones which are intrinsically related to the number of clusters. In this paper, using different sizes of kernel and different samplings of the data set, we study the effects they imply on the relation between influence zones and the number of clusters. This ends up in a proposal of a method for counting the clusters. It is illustrated in simulated conditions and then applied on experimental results chosen from the field of multi-component image segmentation.  相似文献   

9.
Selection of the number of clusters via the bootstrap method   总被引:1,自引:0,他引:1  
Here the problem of selecting the number of clusters in cluster analysis is considered. Recently, the concept of clustering stability, which measures the robustness of any given clustering algorithm, has been utilized in Wang (2010) for selecting the number of clusters through cross validation. In this paper, an estimation scheme for clustering instability is developed based on the bootstrap, and then the number of clusters is selected so that the corresponding estimated clustering instability is minimized. The proposed selection criterion’s effectiveness is demonstrated on simulations and real examples.  相似文献   

10.
模糊聚类是模式识别、机器学习和图像处理等领域的重要研究内容。模糊C-均值聚类算法是最常用的模糊聚类实现算法,该算法需要预先给定聚类数才能对数据集进行聚类。提出了一种新的聚类有效性指标,对聚类结果进行有效性验证。该指标从划分熵、隶属度、几何结构角度,定义了紧凑度、分离度、重叠度三个重要特征测量。在此基础上,提出了一种最佳聚类数确定方法。将新聚类有效性指标和传统有效性指标在6个人工数据集和3个真实数据集进行实验验证。实验结果表明,所提出的指标和方法能够有效地对聚类结果进行评估,适合确定样本的最佳聚类数。  相似文献   

11.
We recently introduced the negentropy increment, a validity index for crisp clustering that quantifies the average normality of the clustering partitions using the negentropy. This index can satisfactorily deal with clusters with heterogeneous orientations, scales and densities. One of the main advantages of the index is the simplicity of its calculation, which only requires the computation of the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results which are in general better than those from other classic cluster validity indices. However, when the number of data points in a partition region is small, the quality in the estimation of the log-determinant of the covariance matrix can be very poor. This affects the proper quantification of the index and therefore the quality of the clustering, so additional requirements such as limitations on the minimum number of points in each region are needed. Although this kind of constraints can provide good results, they need to be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and show how to correct this error in order to obtain an unbiased estimator of the negentropy increment. We also quantify the amount of uncertainty in the estimation. As we show, both for 2D synthetic problems and multidimensional real benchmark problems, these results can be used to validate clustering partitions with a substantial improvement.  相似文献   

12.
利用FCM求解最佳聚类数的算法   总被引:2,自引:0,他引:2  
利用FCM求解最佳聚类数的算法中,每次应用FCM算法都要重新初始化类中心,而FCM算法对初始类中心敏感,这样使得利用FCM求解最佳聚类数的算法很不稳定。对该算法进行了改进,提出了一个合并函数,使得(c-1)类的类中心依赖于类的类中心。仿真实验表明:新的算法稳定性好,且运算速度明显比旧的算法要快。  相似文献   

13.
一种新的基于动态最优簇数目的WSN分簇协议   总被引:3,自引:1,他引:3  
何国圆  陈涤 《计算机应用》2008,28(11):2778-2780
针对低功耗自适应分簇(LEACH)协议不足,提出一种新的分簇协议,称为动态最优簇数目(DONC)分簇协议。在分簇阶段,它能够根据网络中剩余节点个数来确定最优簇数目而不是固定值,并在簇首选择中充分考虑节点能量和地理位置因素;在传输阶段,采用改进的簇首链式转发。仿真表明,协议能保证簇数目始终保持最优状态,并且簇首在网络中均匀分布,有效延长网络的生存期。  相似文献   

14.
目前说话人聚类时将说话人分割后的语音段作为初始类,直接对这些数量庞大语音段进行聚类的计算量非常大。为了降低说话人聚类时的计算量,提出一种面向说话人聚类的初始类生成方法。提取说话人分割后语音段的特征参数及特征参数的质心,结合层次聚类法和贝叶斯信息准则,对语音段进行具有宽松停止准则的“预聚类”,生成初始类。与直接对说话人分割后的语音段进行聚类的方法相比,该方法能在保持原有聚类性能的情况下,减少40.04%的计算时间;在允许聚类性能略有下降的情形下,减少60.03%以上的计算时间。  相似文献   

15.
确定数据集的最佳聚类数是聚类研究中的一个重要难题。为了更有效地确定数据集的最佳聚类数,该文提出了通过改进K-means算法并结合一个不依赖于具体算法的有效性指标Q(c)对数据集的最佳聚类数进行确定的方法。理论分析和实验结果证明了该方法具有良好的性能和有效性。  相似文献   

16.
In this paper a fuzzy point symmetry based genetic clustering technique (Fuzzy-VGAPS) is proposed which can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape or convexity as long as they possess the property of symmetry. Here the membership values of points to different clusters are computed using the newly proposed point symmetry based distance. A variable number of cluster centers are encoded in the chromosomes. A new fuzzy symmetry based cluster validity index, FSym-index is first proposed here and thereafter it is utilized to measure the fitness of the chromosomes. The proposed index can detect non-convex, as well as convex-non-hyperspherical partitioning with variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function: the Dunn’s index, for which the condition of uniqueness has already been established. The results of the Fuzzy-VGAPS are compared with those obtained by seven other algorithms including both fuzzy and crisp methods on four artificial and four real-life data sets. Some real-life applications of Fuzzy-VGAPS to automatically cluster the gene expression data as well as segmenting the magnetic resonance brain image with multiple sclerosis lesions are also demonstrated.  相似文献   

17.
Competitive learning approaches with individual penalization or cooperation mechanisms have the attractive ability of automatic cluster number selection in unsupervised data clustering. In this paper, we further study these two mechanisms and propose a novel learning algorithm called Cooperative and Penalized Competitive Learning (CPCL), which implements the cooperation and penalization mechanisms simultaneously in a single competitive learning process. The integration of these two different kinds of competition mechanisms enables the CPCL to locate the cluster centers more quickly and be insensitive to the number of seed points and their initial positions. Additionally, to handle nonlinearly separable clusters, we further introduce the proposed competition mechanism into kernel clustering framework. Correspondingly, a new kernel-based competitive learning algorithm which can conduct nonlinear partition without knowing the true cluster number is presented. The promising experimental results on real data sets demonstrate the superiority of the proposed methods.  相似文献   

18.
We propose an internal cluster validity index for a fuzzy c-means algorithm which combines a mathematical model for the fuzzy c-partition and a heuristic search for the number of clusters in the data. Our index resorts to information theoretic principles, and aims to assess the congruence between such a model and the data that have been observed. The optimal cluster solution represents a trade-off between discrepancy and the complexity of the underlying fuzzy c-partition. We begin by testing the effectiveness of the proposed index using two sets of synthetic data, one comprising a well-defined cluster structure and the other containing only noise. Then we use datasets arising from real life problems. Our results are compared to those provided by several available indices and their goodness is judged by an external measure of similarity. We find substantial evidence supporting our index as a credible alternative to the cluster validation problem, especially when it concerns structureless data.  相似文献   

19.
核模糊C均值算法的聚类有效性研究   总被引:12,自引:0,他引:12  
针对核模糊C均值聚类(Kemelized Fuzzy C-Means,KFCM)算法的有效性评价,以核非线性映射为工具,将原空间中的六个著名有效性指标推广到高维特征空间,得到其对应的核化形式,并通过数值比较实验考察这些核化指标的性能及其对高斯核宽度β和模糊指数m的敏感特性。结果表明,在所考察的指标中,著名的Xie-Beni指标VXB及其改进指标VK的核化版本具有最好的性能和可靠性,可优先作为KFCM聚类算法的有效性准则。  相似文献   

20.
Hierarchical clustering algorithms provide a set of nested partitions called a cluster hierarchy. Since the hierarchy is usually too complex it is reduced to a single partition by using cluster validity indices. We show that the classical method is often not useful and we propose SEP, a new method that efficiently searches in an extended partition set. Furthermore, we propose a new cluster validity index, COP, since many of the commonly used indices cannot be used with SEP. Experiments performed with 80 synthetic and 7 real datasets confirm that SEP/COP is superior to the method currently used and furthermore, it is less sensitive to noise.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号