Similar Literature
20 similar documents retrieved (search time: 31 ms).
1.
Robust clustering by pruning outliers
In many applications of C-means clustering, the given data set often contains noisy points. These noisy points will affect the resulting clusters, especially if they lie far from the rest of the data. In this paper, we develop a pruning approach for robust C-means clustering. This approach identifies and prunes the outliers based on the sizes and shapes of the clusters so that the resulting clusters are least affected by the outliers. The pruning approach is general, and it can improve the robustness of many existing C-means clustering methods. In particular, we apply the pruning approach to improve the robustness of hard C-means clustering, fuzzy C-means clustering, and deterministic-annealing C-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. In addition, we integrate the pruning approach with the fuzzy approach and the possibilistic approach to design two new algorithms for robust C-means clustering. The numerical results demonstrate that the pruning approach can achieve good robustness.
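A minimal sketch of the general prune-then-recluster idea described above, not the paper's exact pruning rule: points are scored by their distance to their assigned center relative to that cluster's spread (a crude proxy for "size and shape"), the worst-scoring fraction is pruned, and the data are re-clustered. Hard C-means (scikit-learn's KMeans) stands in for the various C-means variants; `prune_frac` and the number of rounds are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_pruning(X, c=3, prune_frac=0.05, rounds=2):
    """Iteratively prune the most outlying points, then re-cluster the rest."""
    keep = np.ones(len(X), dtype=bool)
    for _ in range(rounds):
        km = KMeans(n_clusters=c, n_init=10).fit(X[keep])
        d = np.linalg.norm(X[keep] - km.cluster_centers_[km.labels_], axis=1)
        # scale each distance by the spread of its own cluster ("size/shape" proxy)
        spread = np.array([d[km.labels_ == j].std() + 1e-9 for j in range(c)])
        score = d / spread[km.labels_]
        cutoff = np.quantile(score, 1 - prune_frac)
        kept_idx = np.flatnonzero(keep)
        keep[kept_idx[score > cutoff]] = False   # mark the most outlying points as pruned
    return KMeans(n_clusters=c, n_init=10).fit(X[keep]), keep
```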

2.
Traditionally, prototype-based fuzzy clustering algorithms such as the Fuzzy C Means (FCM) algorithm have been used to find “compact” or “filled” clusters. Recently, there have been attempts to generalize such algorithms to the case of hollow or “shell-like” clusters, i.e., clusters that lie in subspaces of feature space. The shell clustering approach provides a powerful means to solve the hitherto unsolved problem of simultaneously fitting multiple curves/surfaces to unsegmented, scattered and sparse data. In this paper, we present several fuzzy and possibilistic algorithms to detect linear and quadric shell clusters. We also introduce generalizations of these algorithms in which the prototypes represent sets of higher-order polynomial functions. The suggested algorithms provide a good trade-off between computational complexity and performance; however, because the objective function used in these algorithms is a sum of squared distances, the clustering is sensitive to noise and outliers. We show that by using a possibilistic approach to clustering, one can make the proposed algorithms robust.

3.
Most traditional fast clustering algorithms are based on the Fuzzy C-means (FCM) algorithm, but FCM is sensitive to the initial cluster centers and to noisy data and easily converges to local minima, so its clustering accuracy is limited. Possibilistic C-means clustering largely resolves FCM's sensitivity to noise, but it tends to produce coincident clusters. Algorithms that combine FCM with possibilistic C-means clustering deal with the coincident-cluster problem reasonably well. To further improve convergence speed and robustness, a kernel-based fast possibilistic clustering algorithm is proposed. The method introduces the idea of kernel clustering and uses the sample variance to optimize the parameter η in the objective function. Experimental results on standard and synthetic data sets show that this kernel-based fast possibilistic clustering algorithm improves clustering accuracy and speeds up convergence.
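For context on the parameter η mentioned above: in standard possibilistic C-means (Krishnapuram and Keller), the per-cluster scale η_i is commonly estimated from the weighted within-cluster scatter as

\[
\eta_i \;=\; K\,\frac{\sum_{k=1}^{n} u_{ik}^{m}\,\lVert x_k - v_i\rVert^{2}}{\sum_{k=1}^{n} u_{ik}^{m}},\qquad K>0\ (\text{typically }K=1).
\]

The abstract's contribution is to drive this estimate from the sample variance instead (and to work in a kernel-induced space); the exact form it uses is not given here.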

4.
Fuzzy clustering algorithms are becoming the major technique in cluster analysis. In this paper, we consider fuzzy clustering based on objective functions. These can be divided into two categories, possibilistic and probabilistic approaches, leading to two different function families depending on the conditions required to state that the fuzzy clusters are a fuzzy c-partition of the input data. Recently, we presented in Menard and Eboueya (Fuzzy Sets and Systems, 27, to be published) an axiomatic derivation of the Possibilistic and Maximum Entropy Inference (MEI) clustering approaches, based upon a unifying principle of physics, that of extreme physical information (EPI) defined by Frieden (Physics from Fisher information, A unification, Cambridge University Press, Cambridge, 1999). Here, using the same formalism, we explicitly give a new criterion in order to provide a theoretical justification of the objective functions, constraint terms, membership functions and weighting exponent m used in probabilistic and possibilistic fuzzy clustering. Moreover, we propose a unified framework including the two procedures. This approach is inspired by the work of Frieden and Plastino and Plastino and Miller (Physics A 235, 577) extending the principle of extremal information within the framework of non-extensive thermostatistics. Then, we show how, with the help of EPI, one can propose extensions of the FCM and possibilistic algorithms.

5.
A Possibilistic Fuzzy c-Means Clustering Algorithm
In 1997, we proposed the fuzzy-possibilistic c-means (FPCM) model and algorithm that generated both membership and typicality values when clustering unlabeled data. FPCM constrains the typicality values so that the sum over all data points of typicalities to a cluster is one. The row sum constraint produces unrealistic typicality values for large data sets. In this paper, we propose a new model called possibilistic-fuzzy c-means (PFCM) model. PFCM produces memberships and possibilities simultaneously, along with the usual point prototypes or cluster centers for each cluster. PFCM is a hybridization of possibilistic c-means (PCM) and fuzzy c-means (FCM) that often avoids various problems of PCM, FCM and FPCM. PFCM solves the noise sensitivity defect of FCM, overcomes the coincident clusters problem of PCM and eliminates the row sum constraints of FPCM. We derive the first-order necessary conditions for extrema of the PFCM objective function, and use them as the basis for a standard alternating optimization approach to finding local minima of the PFCM objective functional. Several numerical examples are given that compare FCM and PCM to PFCM. Our examples show that PFCM compares favorably to both of the previous models. Since PFCM prototypes are less sensitive to outliers and can avoid coincident clusters, PFCM is a strong candidate for fuzzy rule-based system identification.
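For reference, the PFCM objective in its commonly cited form (Pal, Pal, Keller and Bezdek): memberships u_{ik} are constrained to sum to one over the clusters, typicalities t_{ik} are unconstrained, a, b > 0 weight the two terms, and γ_i > 0 are the possibilistic penalties,

\[
J_{\mathrm{PFCM}} \;=\; \sum_{i=1}^{c}\sum_{k=1}^{n}\bigl(a\,u_{ik}^{m} + b\,t_{ik}^{\eta}\bigr)\lVert x_k - v_i\rVert^{2}
\;+\; \sum_{i=1}^{c}\gamma_i\sum_{k=1}^{n}\bigl(1 - t_{ik}\bigr)^{\eta},
\qquad \text{s.t. } \sum_{i=1}^{c} u_{ik} = 1 .
\]

Setting b = 0 recovers FCM-like behavior and a = 0 recovers PCM-like behavior, which is why the model is described as a hybrid of the two.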

6.
In fuzzy clustering, the fuzzy c-means (FCM) clustering algorithm is the best known and most used method. Since the FCM memberships do not always explain the degrees of belonging for the data well, Krishnapuram and Keller proposed a possibilistic approach to clustering to correct this weakness of FCM. However, the performance of Krishnapuram and Keller's approach depends heavily on its parameters. In this paper, we propose another possibilistic clustering algorithm (PCA) which is based on the FCM objective function and the partition coefficient (PC) and partition entropy (PE) validity indexes. The resulting membership function is exponential, so that it is robust to noise and outliers. The parameters in PCA can be easily handled. Also, the PCA objective function can be considered as a potential function, or a mountain function, so that the prototypes of PCA correspond to the peaks of the estimated function. To validate the clustering results obtained through a PCA, we generalized the validity indexes of FCM. This generalization makes each validity index workable in both fuzzy and possibilistic clustering models. By combining these generalized validity indexes, an unsupervised possibilistic clustering is proposed. Numerical examples and implementations on real data based on the proposed PCA and generalized validity indexes show their effectiveness and accuracy.
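As a hedged illustration of the kind of exponential possibilistic membership the abstract refers to (the exact normalization used in the paper may differ), with β an estimate of the overall sample dispersion:

\[
u_{ik} \;=\; \exp\!\Bigl(-\,\frac{c\,\lVert x_k - v_i\rVert^{2}}{\beta}\Bigr),
\qquad
\beta \;=\; \frac{1}{n}\sum_{k=1}^{n}\lVert x_k - \bar{x}\rVert^{2}.
\]

Because the membership decays exponentially with distance, far-away noise points receive memberships close to zero in every cluster, which is the robustness property claimed above.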

7.
Fuzzy clustering is a widely applied method for extracting the underlying models within data. It has been applied successfully in many real-world applications. Fuzzy c-means is one of the most popular fuzzy clustering methods because it produces reasonable results and its implementation is straightforward. One problem with all fuzzy clustering algorithms such as fuzzy c-means is that some data points are assigned to clusters with low membership values; that is, many samples may be assigned to a cluster with low confidence. In this paper, an efficient and noise-aware implementation of support vector machines, namely relaxed constraints support vector machines, is used to solve this problem and improve the performance of the fuzzy c-means algorithm. First, fuzzy c-means partitions the data into appropriate clusters. Then, the samples with high membership values in each cluster are selected for training a multi-class relaxed constraints support vector machine classifier. Finally, the class labels of the remaining data points are predicted by the latter classifier. The performance of the proposed clustering method is evaluated by quantitative measures such as cluster entropy and Minkowski scores. Experimental results on real-life data sets show the superiority of the proposed method.
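A hedged sketch of the two-stage pipeline described above: (1) fuzzy c-means partitions the data, (2) only high-membership samples train a classifier, (3) that classifier relabels the low-confidence points. A minimal NumPy FCM and a plain scikit-learn SVC stand in for the paper's relaxed-constraints SVM; the 0.9 membership threshold is an illustrative choice.

```python
import numpy as np
from sklearn.svm import SVC

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal FCM: returns cluster centers and the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))        # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated synthetic blobs
X = np.vstack([np.random.randn(100, 2) + [0, 0],
               np.random.randn(100, 2) + [4, 4]])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)

confident = U.max(axis=1) > 0.9                         # high-membership "core" samples
clf = SVC(kernel="rbf").fit(X[confident], labels[confident])
final_labels = labels.copy()
final_labels[~confident] = clf.predict(X[~confident])   # relabel low-confidence points
```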

8.
The fuzzy C-means (FCM) algorithm is the principal algorithm for data clustering analysis. In noisy environments, however, its accuracy drops as the number of clusters of unequal sample size increases; this drawback can be addressed by the Gaussian kernel mapping of the alternative FCM (AFCM). To remedy the remaining shortcomings of AFCM, a generalized Lorentzian kernel function for fuzzy C-means clustering is proposed. The algorithm is used to cluster the Iris database into three classes: Iris setosa, Iris versicolor and Iris virginica. Experimental results show that the generalized Lorentzian fuzzy C-means (GLFCM) can classify data containing outlier clusters and clusters of unequal size, with results superior to the K-means, FCM, alternative C-means (AFCM), Gustafson-Kessel (GK) and Gath-Geva (GG) methods; it converges in fewer iterations than AFCM, and its partition index (SC) is also better than that of the other methods.

9.
In this paper, we make an effort to overcome the sensitivity of traditional clustering algorithms to noisy data points (noise and outliers). A novel pruning method, in terms of information theory, is therefore proposed to phase out noisy points for robust data clustering. This approach identifies and prunes the noisy points based on the maximization of mutual information against input data distributions such that the resulting clusters are least affected by noise and outliers, where the degree of robustness is controlled through a separate parameter to make a trade-off between rejection of noisy points and optimal clustered data. The pruning approach is general, and it can improve the robustness of many existing traditional clustering methods. In particular, we apply the pruning approach to improve the robustness of fuzzy c-means clustering and its extensions, e.g., fuzzy c-spherical shells clustering and kernel-based fuzzy c-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. The effectiveness of the proposed pruning approach is supported by experimental results.

10.
An improved possibilistic fuzzy clustering algorithm
By analyzing popular clustering algorithms such as FCM, PCM, IPCM and PFCM and the problems they face in noisy environments, a new possibilistic fuzzy clustering algorithm (SWPFCM) is proposed. The algorithm combines sample weighting with a cluster-center initialization method suited to noisy environments and can effectively eliminate the influence of noise on the clustering results. Experiments show that SWPFCM can handle large amounts of noisy data; when there is little or no noise the improvement is not obvious, but when noise appears in the target sample set, clustering with SWPFCM yields satisfactory results.

11.
The fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms have been utilized in a wide variety of fields and applications. Although many methods are derived from the FCM and PCM for clustering various types of spatial data, relational clustering has received much less attention. Most fuzzy clustering methods can only process spatial data (e.g., in Euclidean space) rather than nonspatial data (e.g., where Pearson's correlation coefficient is used as the similarity measure). In this paper, we propose a novel clustering method, similarity-based PCM (SPCM), which is suited to clustering nonspatial data without requiring users to specify the number of clusters. The main idea behind the SPCM is to extend the PCM for similarity-based clustering applications by integration with the mountain method. The SPCM has the merit that it can automatically generate clustering results without requiring users to specify the number of clusters. Through performance evaluation on real and synthetic data sets, the SPCM method is shown to perform excellently for similarity-based clustering in terms of clustering quality, even in a noisy environment with outliers. This complements the deficiency of other fuzzy clustering methods when applied to similarity-based clustering applications.

12.
The support vector machine algorithm is sensitive to noisy points and outliers. To address this, fuzzy support vector machines were proposed, but their fuzzy membership functions must be set manually. A support vector machine classifier based on fuzzy partitioning is proposed. In this algorithm, the positive and negative classes of the training set are first clustered separately with fuzzy c-means, with the number of clusters chosen by a cluster validity criterion; then the c closest cluster pairs are selected to form c binary classification problems; finally, the outputs of the c binary classifiers are combined by a weighted-average strategy to obtain the final classification. To verify the effectiveness of the proposed algorithm, numerical experiments were carried out on three UCI data sets; the results show that the algorithm effectively improves prediction accuracy on data sets containing noisy points and outliers.

13.
In the fuzzy c-means (FCM) clustering algorithm, almost none of the data points have a membership value of 1. Moreover, noise and outliers may cause difficulties in obtaining appropriate clustering results from the FCM algorithm. The embedding of FCM into switching regressions, called the fuzzy c-regressions (FCRs), still has the same drawbacks as FCM. In this paper, we propose the alpha-cut implemented fuzzy clustering algorithms, referred to as FCMalpha, which allow data points to belong completely to one cluster. The proposed FCMalpha algorithms can form a cluster core for each cluster, where data points inside a cluster core have a membership value of 1, which resolves this drawback of FCM. On the other hand, the fuzziness index m plays different roles for FCM and FCMalpha. We find that the clustering results obtained by FCMalpha are more robust to noise and outliers than FCM when a larger m is used. Moreover, the cluster cores generated by FCMalpha are workable for various data shape clusters, so that FCMalpha is very suitable for embedding into switching regressions. The embedding of FCMalpha into switching regressions is called FCRalpha. The proposed FCRalpha provides better results than FCR for environments with noise or outliers. Numerical examples show the robustness and the superiority of our proposed methods.
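A hedged sketch of the alpha-cut idea described above, not the paper's exact update equations: memberships at or above a chosen threshold alpha are hardened to 1 (the point lies in that cluster's core) and the point's remaining memberships are set to 0 before the next center update; points outside every core keep their fuzzy memberships.

```python
import numpy as np

def alpha_cut_memberships(U, alpha=0.8):
    """U: (n_samples, n_clusters) fuzzy membership matrix; returns a hardened copy."""
    U = U.copy()
    core = U >= alpha                  # entries where a point falls inside a cluster core
    rows = core.any(axis=1)            # points that belong to some core
    U[rows] = 0.0
    U[core] = 1.0                      # full membership inside the core
    return U

U = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
print(alpha_cut_memberships(U, alpha=0.8))
# rows 0 and 2 are hardened to one-hot memberships; row 1 keeps its fuzzy memberships
```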

14.
Objective: To further improve the noise robustness and accuracy of noisy image segmentation, an improved possibilistic clustering algorithm that combines intra-class and inter-class distances is proposed and applied to image segmentation. Method: Instead of using only the distance from a sample point to the cluster center as the measure, as traditional possibilistic clustering segmentation algorithms do, the algorithm combines intra-class and inter-class distances as a new measure, accounting both for within-cluster compactness and for between-cluster separation, which gives stronger stability for different cluster structures and better noise resistance. In addition, the image histogram is incorporated into the possibilistic fuzzy clustering segmentation algorithm to obtain a fast possibilistic fuzzy clustering segmentation algorithm, so that relatively complex images can be segmented in real time. Results: Segmentation tests on synthetic images and real remote-sensing images show that the improved possibilistic clustering algorithm is effective: segmentation contours are clear, classification is accurate with little noise, the misclassification rate is at least two percentage points lower than that of the other algorithms, and more satisfactory segmentation results are obtained. Conclusion: Against the weakness of fuzzy C-means and possibilistic clustering segmentation algorithms in classifying images whose background and target colors are similar, combining intra-class and inter-class distances as the measure effectively solves the assignment problem in image segmentation, and the histogram-based fast possibilistic fuzzy clustering segmentation algorithm also makes the method applicable to large, complex images.

15.
Compared with the k-means algorithm, fuzzy C-means (FCM) introduces fuzzy memberships and takes the interaction between different data clusters into account, thereby avoiding the problem of cluster centers converging to the same point. However, the fuzzy membership function exhibits trailing-tail and upturned-tail structural characteristics, which makes FCM sensitive to noisy points and outliers; moreover, because FCM tends to divide the data into equally sized clusters, it is also sensitive to cluster size and performs poorly on imbalanced clusters. To address these prob…

16.
A collaborative possibilistic fuzzy clustering algorithm
Fuzzy C-means clustering (FCM) is sensitive to noisy data, and possibilistic C-means clustering (PCM) is very sensitive to the initial centers, which easily leads to coincident clusters. Collaborative clustering exploits the collaborative relationships between different feature subsets and can be combined with other algorithms to improve their clustering performance. Accordingly, a collaborative possibilistic C-means fuzzy clustering algorithm (C-FCM) is proposed by combining the possibilistic C-means clustering algorithm (PCM) with collaborative clustering. Built on an improved PCM, the algorithm improves the clustering quality on data sets. Tests on the Wine and Iris data sets show that the method outperforms PCM, demonstrating its effectiveness.

17.
A generalized form of the Possibilistic Fuzzy C-Means (PFCM) algorithm (GPFCM) is presented for clustering noisy data. A function of the distance is used instead of the distance itself to damp noise contributions. It is shown that when the data are highly noisy, GPFCM finds accurate cluster centers but the FCM (Fuzzy C-Means), PCM (Possibilistic C-Means), and PFCM algorithms fail. FCM, PCM, and PFCM yield inaccurate cluster centers when clusters are not of the same size or when a covariance norm is used, whereas GPFCM performs well in both cases even when the data are noisy. It is shown that the generalized forms of FCM and PCM (GFCM and GPCM) are also more accurate than FCM and PCM. A measure is defined to evaluate the performance of the clustering algorithms. It shows that the average errors of GPFCM and its simplified forms are about 80% smaller than those of FCM, PCM, and PFCM. However, GPFCM demands higher computational costs due to its nonlinear updating equations. Three cluster validity indices are introduced to determine the number of clusters in clean and noisy datasets. One of them considers compactness of the clusters; another considers separation of the clusters, and the third considers both separation and compactness. The performance of these indices is confirmed to be satisfactory using various examples of noisy datasets.
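As a hedged illustration of the "function of distance" idea (not necessarily the damping used in this paper), one common robust choice replaces the squared distance \(d_{ik}^{2}=\lVert x_k - v_i\rVert^{2}\) in the objective with a bounded function such as

\[
f\bigl(d_{ik}^{2}\bigr) \;=\; 1 - \exp\!\bigl(-\,d_{ik}^{2}/\sigma^{2}\bigr),
\]

which behaves like \(d_{ik}^{2}/\sigma^{2}\) for small distances but saturates at 1, so distant noise points contribute only a bounded amount to the objective and cannot drag the cluster centers away.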

18.
A possibilistic approach was initially proposed for c-means clustering. Although the possibilistic approach is sound, the algorithm tends to find identical clusters. To overcome this shortcoming, a possibilistic fuzzy c-means algorithm (PFCM) was proposed which produces memberships and possibilities simultaneously, along with the cluster centers. PFCM addresses the noise sensitivity defect of fuzzy c-means (FCM) and overcomes the coincident cluster problem of possibilistic c-means (PCM). Here we propose a new model called kernel-based hybrid c-means clustering (KPFCM), where PFCM is extended by adopting a kernel-induced metric in the data space to replace the original Euclidean norm metric. Use of a kernel function makes it possible to cluster data that are linearly non-separable in the original space into homogeneous groups in the transformed high-dimensional space. From our experiments, we found that different kernels with different kernel widths lead to different clustering results; thus a key point is to choose an appropriate kernel width. We have also proposed a simple approach to determine appropriate values for the kernel width. The performance of the proposed method has been extensively compared with a few state-of-the-art clustering techniques over a test suite of several artificial and real-life data sets. Based on computer simulations, we have shown that our model gives better results than the previous models.
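For reference, the kernel-induced distance that replaces the Euclidean norm in such kernelized c-means models: for a feature map \(\varphi\) with kernel \(K(x,y)=\langle\varphi(x),\varphi(y)\rangle\),

\[
\lVert \varphi(x_k) - \varphi(v_i) \rVert^{2}
\;=\; K(x_k,x_k) \;-\; 2\,K(x_k,v_i) \;+\; K(v_i,v_i),
\]

which for the Gaussian kernel \(K(x,y)=\exp(-\lVert x-y\rVert^{2}/\sigma^{2})\) simplifies to \(2\bigl(1-K(x_k,v_i)\bigr)\). The kernel width \(\sigma\) is the quantity whose selection the abstract discusses.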

19.
Generative mixture models (MMs) provide one of the most popular methodologies for unsupervised data clustering. MMs are formulated on the basis of the assumption that each observation derives from (belongs to) a single cluster. However, in many applications, data may intuitively belong to multiple classes, thus rendering the single-cluster assignment assumptions of MMs irrelevant. Furthermore, even in applications where a single-cluster data assignment is required, the induced multinomial allocation of the modeled data points to the clusters derived by a MM, imposing the constraint that the membership probabilities of a data point across clusters sum to one, makes MMs very vulnerable to the presence of outliers in the clustered data sets, and renders them ineffective in discriminating between cases of equal evidence or ignorance. To resolve these issues, in this paper we introduce a possibilistic formulation of MMs. Possibilistic clustering is a methodology that yields possibilistic data partitions, with the obtained membership values being interpreted as degrees of possibility (compatibilities) of the data points with respect to the various clusters. We provide an efficient maximum-likelihood fitting algorithm for the proposed model, and we conduct an objective evaluation of its efficacy using benchmark data.

20.
Fuzzy c-means clustering with spatial constraints is considered a suitable algorithm for data clustering and data analysis. However, FCM still lacks sufficient robustness to noisy data, because its objective function uses the Euclidean distance to measure the relationship between objects. It is effective only in clustering “spherical” clusters, and it may not give reasonable clustering results for “non-compactly filled” spherical data such as “annular-shaped” data. This paper examines these drawbacks of the general fuzzy c-means algorithm and introduces an extended Gaussian version of fuzzy c-means by replacing the Euclidean distance in the original objective function of FCM. It first proposes an initial kernel version of fuzzy c-means aimed at simplifying the computation, and then extends it to an extended Gaussian kernel version of fuzzy c-means. It derives an effective method to construct the membership matrix for objects, and a robust method for updating centers from the extended Gaussian version of fuzzy c-means. Furthermore, this paper proposes a new prototype learning method that obtains initial cluster centers using a new mathematical initialization for the effective objective function of fuzzy c-means, so as to minimize the number of iterations needed to obtain more accurate results. An initial experiment is carried out on artificially generated data to show how effectively the proposed Gaussian version of fuzzy c-means obtains clusters, and the proposed methods are then applied to cluster the Wisconsin breast cancer database into two clusters for the classes benign and malignant. To demonstrate the performance of the proposed fuzzy c-means with the new initialization of cluster centers, this work compares the results with those of a recent fuzzy c-means algorithm; in addition, it uses the Silhouette method to validate the clusters obtained from the breast cancer data sets.

