Similar literature
 20 similar documents found
1.
Abnormal points (noise and outliers) generally reduce the accuracy of cluster analysis, especially in fuzzy clustering. Such points not only end up inside clusters but also pull the centroids away from their true positions. Traditional fuzzy clustering methods such as Fuzzy C-Means (FCM) always assign every data point to all clusters, which is not reasonable in some circumstances. By reformulating the objective function as an exponential equation, exponential fuzzy clustering selects data into the clusters more aggressively. However, noisy data and outliers still cannot be handled properly by the clustering process: they are forced into a cluster by the probabilistic constraint that the membership degrees of each point across all clusters sum to one. To address this weakness, the possibilistic approach relaxes this constraint to improve membership assignment. Nevertheless, possibilistic clustering algorithms generally suffer from coincident clusters because their membership equations ignore the distance to other clusters. Although some possibilistic clustering approaches do not generate coincident clusters, most of them require the right combination of multiple parameters to work. In this paper, we theoretically study Possibilistic Exponential Fuzzy Clustering (PXFCM), which integrates the possibilistic approach with exponential fuzzy clustering. PXFCM has only one parameter; it not only partitions the data but also filters noisy data or detects them as outliers. Comprehensive experiments show that PXFCM achieves high accuracy in both clustering and outlier detection without generating coincident clusters.

2.
Robust clustering by pruning outliers   Total citations: 1, self-citations: 0, citations by others: 1
In many applications of C-means clustering, the given data set often contains noisy points. These noisy points will affect the resulting clusters, especially if they are far away from the data points. In this paper, we develop a pruning approach for robust C-means clustering. This approach identifies and prunes the outliers based on the sizes and shapes of the clusters so that the resulting clusters are least affected by the outliers. The pruning approach is general, and it can improve the robustness of many existing C-means clustering methods. In particular, we apply the pruning approach to improve the robustness of hard C-means clustering, fuzzy C-means clustering, and deterministic-annealing C-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. In addition, we integrate the pruning approach with the fuzzy approach and the possibilistic approach to design two new algorithms for robust C-means clustering. The numerical results demonstrate that the pruning approach can achieve good robustness.

3.
Fuzzy clustering algorithms are becoming the major technique in cluster analysis. In this paper, we consider fuzzy clustering based on objective functions. These methods can be divided into two categories, possibilistic and probabilistic approaches, which lead to two different function families depending on the conditions required to state that fuzzy clusters are a fuzzy c-partition of the input data. Recently, we presented in Menard and Eboueya (Fuzzy Sets and Systems, 27, to be published) an axiomatic derivation of the Possibilistic and Maximum Entropy Inference (MEI) clustering approaches, based upon a unifying principle of physics, that of extreme physical information (EPI) defined by Frieden (Physics from Fisher information, A unification, Cambridge University Press, Cambridge, 1999). Here, using the same formalism, we explicitly give a new criterion in order to provide a theoretical justification of the objective functions, constraint terms, membership functions and weighting exponent m used in probabilistic and possibilistic fuzzy clustering. Moreover, we propose a unified framework including the two procedures. This approach is inspired by the work of Frieden and Plastino and Plastino and Miller (Physics A 235, 577) extending the principle of extremal information in the framework of non-extensive thermostatistics. Then, we show how, with the help of EPI, one can propose extensions of the FcM and Possibilistic algorithms.

4.
A possibilistic approach to clustering   Total citations: 27, self-citations: 0, citations by others: 27
The clustering problem is cast in the framework of possibility theory. The approach differs from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values can be interpreted as degrees of possibility of the points belonging to the classes, i.e., the compatibilities of the points with the class prototypes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data is constructed, and the membership and prototype update equations are derived from necessary conditions for minimization of the criterion function. The advantages of the resulting family of possibilistic algorithms are illustrated by several examples.
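To make the idea of a possibilistic membership concrete, here is a minimal Python sketch; it is not the authors' code. It assumes the commonly cited form of the possibilistic c-means membership update, u = 1 / (1 + (d^2 / eta)^(1/(m-1))), which the abstract itself does not spell out. The function name `pcm_typicality`, the reference distance `eta = 4.0`, the fuzzifier `m = 2.0`, and the sample points are illustrative assumptions.

```python
def pcm_typicality(point, center, eta, m=2.0):
    """Possibilistic membership of one point in one cluster (assumed PCM form)."""
    # Squared Euclidean distance between the point and the cluster prototype.
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    # The value depends only on this cluster's prototype, not on the other clusters.
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


# Hypothetical prototypes and reference distance, chosen only for illustration.
centers = [(0.0, 0.0), (10.0, 0.0)]
eta = 4.0

typical = [pcm_typicality((1.0, 0.0), c, eta) for c in centers]
outlier = [pcm_typicality((5.0, 1000.0), c, eta) for c in centers]

print(typical)  # high compatibility with the first prototype only
print(outlier)  # very low compatibility with both prototypes
```

Under these assumptions the memberships of a point need not sum to one, so a distant outlier can receive a near-zero value in every cluster instead of being forced into one of them.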

5.
Many clustering models define good clusters as extrema of objective functions. Optimization of these models is often done using an alternating optimization (AO) algorithm driven by necessary conditions for local extrema. We abandon the objective function model in favor of a generalized model called alternating cluster estimation (ACE). ACE uses an alternating iteration architecture, but membership and prototype functions are selected directly by the user. Virtually every clustering model can be realized as an instance of ACE. Out of a large variety of possible instances of non-AO models, we present two examples: 1) an algorithm with a dynamically changing prototype function that extracts representative data and 2) a computationally efficient algorithm with hyperconic membership functions that allows easy extraction of membership functions. We illustrate these non-AO instances on three problems: a) simple clustering of plane data, where we show that creating an unmatched ACE algorithm overcomes some problems of fuzzy c-means (FCM-AO) and possibilistic c-means (PCM-AO); b) functional approximation by clustering on a simple artificial data set; and c) functional approximation on a 12-input, 1-output real-world data set. ACE models work well in all three cases.

6.
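Objective: To further improve the noise robustness and accuracy of noisy-image segmentation, an improved possibilistic clustering algorithm that combines intra-class and inter-class distances is proposed and applied to image segmentation. Method: Traditional possibilistic clustering segmentation algorithms use only the distance from a sample point to the cluster center as their measure. The proposed algorithm instead combines the intra-class distance with the inter-class distance as a new measure, taking into account both the compactness within classes and the separation between classes, so that it is more stable across different cluster structures and more robust to noise. In addition, the histogram is incorporated into the possibilistic fuzzy clustering segmentation algorithm to obtain a fast possibilistic fuzzy clustering segmentation algorithm that can segment a variety of fairly complex images in real time. Results: Segmentation tests on synthetic images and real remote sensing images show that the improved possibilistic clustering algorithm is effective: the segmented contours are clear, the classification is accurate, and little noise remains. Its misclassification rate is at least 2 percentage points lower than that of the other algorithms, and it yields more satisfactory segmentation results. Conclusion: Fuzzy C-means and possibilistic clustering segmentation algorithms classify inaccurately when the background and the target have similar colors. Using the combined intra-class and inter-class distance as the measure effectively solves this classification problem in image segmentation, and the histogram-based fast possibilistic fuzzy clustering segmentation algorithm also makes the approach applicable to large, complex images.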

7.
8.
Traditionally, prototype-based fuzzy clustering algorithms such as the Fuzzy C Means (FCM) algorithm have been used to find “compact” or “filled” clusters. Recently, there have been attempts to generalize such algorithms to the case of hollow or “shell-like” clusters, i.e., clusters that lie in subspaces of feature space. The shell clustering approach provides a powerful means to solve the hitherto unsolved problem of simultaneously fitting multiple curves/surfaces to unsegmented, scattered and sparse data. In this paper, we present several fuzzy and possibilistic algorithms to detect linear and quadric shell clusters. We also introduce generalizations of these algorithms in which the prototypes represent sets of higher-order polynomial functions. The suggested algorithms provide a good trade-off between computational complexity and performance. However, since the objective function used in these algorithms is the sum of squared distances, the clustering is sensitive to noise and outliers. We show that by using a possibilistic approach to clustering, one can make the proposed algorithms robust.

9.
Uncertain Fuzzy Clustering: Interval Type-2 Fuzzy Approach to C-Means   Total citations: 1, self-citations: 0, citations by others: 1
In many pattern recognition applications, it is often impossible to obtain perfect knowledge or information for a given pattern set. Uncertain information can create imperfect expressions for pattern sets in various pattern recognition algorithms. Therefore, various types of uncertainty may be taken into account when applying pattern recognition methods. When one performs clustering with fuzzy sets, fuzzy membership values express the assignment availability of patterns for clusters. However, when one assigns fuzzy memberships to a pattern set, imperfect information for the pattern set introduces uncertainty in the various parameters that are used in fuzzy membership assignment. In fuzzy clustering, fuzzy membership design includes various uncertainties (e.g., distance measure, fuzzifier, prototypes, etc.). In this paper, we focus on the uncertainty associated with the fuzzifier parameter m that controls the amount of fuzziness of the final C-partition in the fuzzy C-means (FCM) algorithm. To design and manage uncertainty for the fuzzifier m, we extend a pattern set to interval type-2 fuzzy sets using two fuzzifiers m1 and m2, which create a footprint of uncertainty (FOU) for the fuzzifier m. Then, we incorporate this interval type-2 fuzzy set into FCM to observe the effect of managing uncertainty from the two fuzzifiers. We also provide some solutions to type-reduction and defuzzification (i.e., cluster center updating and hard-partitioning) in FCM. Several experimental results are given to show the validity of our method.

10.
A Possibilistic Fuzzy c-Means Clustering Algorithm   Total citations: 20, self-citations: 0, citations by others: 20
In 1997, we proposed the fuzzy-possibilistic c-means (FPCM) model and algorithm that generated both membership and typicality values when clustering unlabeled data. FPCM constrains the typicality values so that the sum over all data points of typicalities to a cluster is one. The row sum constraint produces unrealistic typicality values for large data sets. In this paper, we propose a new model called possibilistic-fuzzy c-means (PFCM) model. PFCM produces memberships and possibilities simultaneously, along with the usual point prototypes or cluster centers for each cluster. PFCM is a hybridization of possibilistic c-means (PCM) and fuzzy c-means (FCM) that often avoids various problems of PCM, FCM and FPCM. PFCM solves the noise sensitivity defect of FCM, overcomes the coincident clusters problem of PCM and eliminates the row sum constraints of FPCM. We derive the first-order necessary conditions for extrema of the PFCM objective function, and use them as the basis for a standard alternating optimization approach to finding local minima of the PFCM objective functional. Several numerical examples are given that compare FCM and PCM to PFCM. Our examples show that PFCM compares favorably to both of the previous models. Since PFCM prototypes are less sensitive to outliers and can avoid coincident clusters, PFCM is a strong candidate for fuzzy rule-based system identification.

11.
Fuzzy clustering is a widely applied method for extracting the underlying models within data. It has been applied successfully in many real-world applications. Fuzzy c-means is one of the most popular fuzzy clustering methods because it produces reasonable results and its implementation is straightforward. One problem with all fuzzy clustering algorithms such as fuzzy c-means is that some data points are assigned to clusters with low membership values, so many samples may be assigned to a cluster with low confidence. In this paper, an efficient and noise-aware implementation of support vector machines, namely relaxed constraints support vector machines, is used to solve this problem and improve the performance of the fuzzy c-means algorithm. First, fuzzy c-means partitions the data into appropriate clusters. Then, the samples with high membership values in each cluster are selected for training a multi-class relaxed constraints support vector machine classifier. Finally, the class labels of the remaining data points are predicted by this classifier. The performance of the proposed clustering method is evaluated by quantitative measures such as cluster entropy and Minkowski scores. Experimental results on real-life data sets show the superiority of the proposed method.

12.
13.
The Fuzzy k-Means clustering model (FkM) is a powerful tool for classifying objects into a set of k homogeneous clusters by means of the membership degrees of an object in a cluster. In FkM, for each object, the sum of the membership degrees in the clusters must be equal to one. Such a constraint may cause meaningless results, especially when noise is present. To avoid this drawback, it is possible to relax the constraint, leading to the so-called Possibilistic k-Means clustering model (PkM). In particular, attention is paid to the case in which the empirical information is affected by imprecision or vagueness. This is handled by means of LR fuzzy numbers. An FkM model for LR fuzzy data is firstly developed and a PkM model for the same type of data is then proposed. The results of a simulation experiment and of two applications to real world fuzzy data confirm the validity of both models, while providing indications as to some advantages connected with the use of the possibilistic approach.
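The effect of the sum-to-one constraint can be illustrated with a small Python sketch. It uses the standard fuzzy c-means membership formula for crisp points with Euclidean distance, not the LR fuzzy data model of this paper; the function name `fcm_memberships`, the fuzzifier `m = 2.0`, and the sample points are assumptions made for illustration.

```python
def fcm_memberships(point, centers, m=2.0):
    """Probabilistic (FCM-style) memberships of one point; they always sum to 1."""
    # Squared Euclidean distance from the point to every cluster center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # A point lying exactly on one or more centers is shared among those centers.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    # u_i is proportional to (1 / d_i^2)^(1/(m-1)), normalised over all clusters.
    inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical cluster centers, chosen only for illustration.
centers = [(0.0, 0.0), (10.0, 0.0)]

typical = fcm_memberships((1.0, 0.0), centers)
outlier = fcm_memberships((5.0, 1000.0), centers)

print(typical)  # dominated by the first cluster
print(outlier)  # split evenly, although the point is far from both centers
```

In this sketch the far-away point is equidistant from both centers, so the normalisation gives it a membership of one half in each cluster, which is the kind of meaningless result the abstract attributes to the constraint.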

14.
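Using the probabilistic constraint relations and possibility distributions of the feature weights of data points, two improved possibilistic fuzzy clustering models are proposed, built on probabilistic and possibilistic feature weighting respectively. The improved PCM algorithm built on the possibilistic constraint extends the original algorithm and is more widely applicable. Experimental results show that the algorithms can perform fuzzy clustering under different probabilistic weights or possibility-distribution features. Compared with PCM and its improved variants, the clustering results are noticeably better.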

15.
In fuzzy clustering, the fuzzy c-means (FCM) clustering algorithm is the best-known and most widely used method. Since the FCM memberships do not always explain the degrees of belonging for the data well, Krishnapuram and Keller proposed a possibilistic approach to clustering to correct this weakness of FCM. However, the performance of Krishnapuram and Keller's approach depends heavily on the parameters. In this paper, we propose another possibilistic clustering algorithm (PCA) which is based on the FCM objective function and the partition coefficient (PC) and partition entropy (PE) validity indexes. The resulting membership becomes an exponential function, so that it is robust to noise and outliers. The parameters in PCA can be easily handled. Also, the PCA objective function can be considered as a potential function, or a mountain function, so that the prototypes of PCA correspond to the peaks of the estimated function. To validate the clustering results obtained through PCA, we generalized the validity indexes of FCM. This generalization makes each validity index workable in both fuzzy and possibilistic clustering models. By combining these generalized validity indexes, an unsupervised possibilistic clustering is proposed. Some numerical examples and real data implementations on the basis of the proposed PCA and generalized validity indexes show their effectiveness and accuracy.
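As background for the two validity indexes named above, the following Python sketch computes the standard partition coefficient and partition entropy of a fuzzy partition. These are the usual fuzzy definitions, not the generalized versions proposed in the paper, which the abstract does not give. The layout of `U` (one row of memberships per data point) and the function names are assumptions made for illustration.

```python
import math


def partition_coefficient(U):
    """Mean of squared memberships; equals 1 for a crisp partition."""
    n = len(U)
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U):
    """Mean of -u * ln(u) over all memberships; equals 0 for a crisp partition."""
    n = len(U)
    # Terms with u == 0 contribute nothing, following the convention 0 * ln(0) = 0.
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


# Two toy partitions of two points into two clusters (illustrative values).
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]

print(partition_coefficient(crisp), partition_entropy(crisp))
print(partition_coefficient(fuzzy), partition_entropy(fuzzy))
```

A higher partition coefficient and a lower partition entropy both indicate a less ambiguous partition; the maximally fuzzy two-cluster case gives a coefficient of one half and an entropy of ln 2.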

16.
A possibilistic approach was initially proposed for c-means clustering. Although the possibilistic approach is sound, this algorithm tends to find identical clusters. To overcome this shortcoming, a possibilistic fuzzy c-means algorithm (PFCM) was proposed, which produces memberships and possibilities simultaneously, along with the cluster centers. PFCM addresses the noise sensitivity defect of fuzzy c-means (FCM) and overcomes the coincident cluster problem of possibilistic c-means (PCM). Here we propose a new model called kernel-based hybrid c-means clustering (KPFCM), in which PFCM is extended by adopting a kernel-induced metric in the data space to replace the original Euclidean norm metric. Use of a kernel function makes it possible to cluster data that is linearly non-separable in the original space into homogeneous groups in the transformed high-dimensional space. From our experiments, we found that different kernels with different kernel widths lead to different clustering results. Thus a key point is to choose an appropriate kernel width. We have also proposed a simple approach to determine appropriate values for the kernel width. The performance of the proposed method has been extensively compared with a few state-of-the-art clustering techniques over a test suite of several artificial and real-life data sets. Based on computer simulations, we have shown that our model gives better results than the previous models.
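The notion of a kernel-induced metric, and its dependence on the kernel width, can be sketched in Python as follows. The abstract does not state which kernel or parametrisation the authors use; this sketch assumes a Gaussian kernel K(x, v) = exp(-||x - v||^2 / (2 * width^2)) and the usual feature-space identity ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v). The function names and sample values are illustrative.

```python
import math


def gaussian_kernel(x, v, width):
    """Gaussian kernel with an assumed 2 * width^2 scaling in the exponent."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / (2.0 * width ** 2))


def kernel_distance_sq(x, v, width):
    """Squared distance between x and v in the feature space induced by the kernel."""
    return (gaussian_kernel(x, x, width)
            - 2.0 * gaussian_kernel(x, v, width)
            + gaussian_kernel(v, v, width))


x, v = (0.0, 0.0), (3.0, 4.0)  # Euclidean distance 5
narrow = kernel_distance_sq(x, v, width=1.0)
wide = kernel_distance_sq(x, v, width=5.0)

print(narrow)  # close to the upper bound of 2
print(wide)    # noticeably smaller for the wider kernel
```

For a Gaussian kernel this distance is bounded by 2, and the same pair of points looks far apart under a narrow kernel but moderately close under a wide one, which is consistent with the abstract's remark that the kernel width changes the clustering result.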

17.
In the fuzzy clustering literature, two main types of membership are usually considered: a relative type, termed probabilistic, and an absolute or possibilistic type, indicating the strength of the attribution to any cluster independently of the rest. There are works addressing the unification of the two schemes. Here, we focus on providing a model for the transition from one scheme to the other, to exploit the dual information given by the two schemes, and to add flexibility for the interpretation of results. We apply an uncertainty model based on interval values to memberships in the clustering framework, obtaining a framework that we term graded possibility. We outline a basic example of a graded possibilistic clustering algorithm and add some practical remarks about its implementation. The experimental demonstrations presented highlight the different properties attainable through appropriate implementation of a suitable graded possibilistic model. An interesting application is found in automated segmentation of diagnostic medical images, where the model provides an interactive visualization tool for this task.

18.
In this paper, a new approach for fault detection and isolation that is based on the possibilistic clustering algorithm is proposed. Fault detection and isolation (FDI) is shown here to be a pattern classification problem, which can be solved using clustering and classification techniques. A possibilistic clustering based approach is proposed here to address some of the shortcomings of the fuzzy c-means (FCM) algorithm. The probabilistic constraint imposed on the membership value in the FCM algorithm is relaxed in the possibilistic clustering algorithm. Because of this relaxation, the possibilistic approach is shown in this paper to give more consistent results in the context of FDI tasks. The possibilistic clustering approach has also been used to detect novel fault scenarios, for which data was not available during training. Fault signatures that change as a function of the fault intensities are represented as fault lines, which have been shown to be useful for classifying faults that can manifest with different intensities. The proposed approach has been validated here through simulations involving a benchmark quadruple tank process and also through experimental case studies on the same setup. For large-scale systems, it is proposed to use the possibilistic clustering based approach in the lower-dimensional approximations generated by algorithms such as PCA. Towards this end, we finally demonstrate the key merits of the algorithm for a plant-wide monitoring study using a simulation of the benchmark Tennessee Eastman problem.

19.
We propose new support vector machines (SVMs) that incorporate the geometric distribution of an input data set by associating each data point with a possibilistic membership, which measures the relative strength of the self class membership. By using a possibilistic distance measure based on the possibilistic membership, we reformulate conventional SVMs in three ways. The proposed methods are shown to have better classification performance than conventional SVMs in various tests.

20.
Over the years, data clustering algorithms have been used for image segmentation. Due to the presence of uncertainty in real-life datasets, several uncertainty-based data clustering algorithms have been developed. The c-means clustering algorithms form one such family of algorithms. Starting with fuzzy c-means (FCM), a subfamily of this family comprises rough c-means (RCM), intuitionistic fuzzy c-means (IFCM) and their hybrids such as rough fuzzy c-means (RFCM) and rough intuitionistic fuzzy c-means (RIFCM). In the basic subfamily of this family of algorithms, the Euclidean distance was used to measure the similarity of data. However, the subfamily of algorithms obtained by replacing the Euclidean distance with kernel-based similarities produced better results. In particular, these algorithms were useful in handling clusters of data points that are linearly inseparable in the original input space. During this period, Krishnapuram and Keller inferred that the membership constraints in some rudimentary uncertainty-based clustering techniques such as fuzzy c-means impart a probabilistic nature to them, and hence suggested a possibilistic version. In fact, all the other member algorithms from the basic subfamily have been extended to incorporate this new notion. Currently, the use of image data is growing vigorously and constantly, amounting to huge figures and leading to big data. Moreover, since image segmentation happens to be one of the most time-consuming processes, industries need algorithms that can solve this problem rapidly and with high accuracy. In this paper, we propose to combine the notions of the kernel and the possibilistic approach in a distributed environment provided by Apache™ Hadoop. We integrate this combined notion with the map-reduce paradigm of Hadoop and put forth three novel algorithms: Hadoop based possibilistic kernelized rough c-means (HPKRCM), Hadoop based possibilistic kernelized rough fuzzy c-means (HPKRFCM) and Hadoop based possibilistic kernelized rough intuitionistic fuzzy c-means (HPKRIFCM), and study their efficiency in image segmentation. We compare their running times and analyze their efficiencies against the corresponding algorithms from the other three subfamilies on four different types of images, three different kernels and six different efficiency measures: the Davies-Bouldin index (DB), Dunn index (D), alpha index (α), rho index (ρ), alpha star index (α*) and gamma index (γ). Our analysis shows that the hyper-tangent kernel with Hadoop based possibilistic kernelized rough intuitionistic fuzzy c-means is the best one for image segmentation among all these clustering algorithms. Also, the times taken to render segmented images by the proposed algorithms are drastically lower than those of the other algorithms. The algorithms were implemented in Java, and for the proposed algorithms we used the Hadoop framework installed on CentOS. For statistical plotting we used matplotlib (a Python library).
