Similar literature
20 similar documents found (search time: 31 ms)
1.

The fuzzy c-means (FCM) algorithm computes the membership degree of each data point with respect to each cluster center. This requires the matrix of distances between the cluster centers and the data points, and computing the membership matrix for all data points is the main bottleneck of FCM. This work presents a new clustering method, bdrFCM (boundary data reduction fuzzy c-means). Our algorithm is based on the original FCM proposal, adapted to detect and remove the boundary regions of clusters. Our implementation effort has two aims: processing large datasets in less time, and reducing the data volume while maintaining the quality of the clusters. The method was applied to a large volume of real data (> 10^6 records), and we found that the bdrFCM implementation scales well to datasets with millions of data points.
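
The membership computation that the abstract identifies as the bottleneck can be sketched as follows. This is a minimal illustration of the textbook FCM membership update, not the bdrFCM method itself; the function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the `eps` guard are assumptions made for the example.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Return u[k][i], the membership of point k in cluster i.

    Textbook FCM update: u_ik is proportional to (1 / d_ik^2)^(1 / (m - 1)),
    normalised so that each point's memberships sum to 1.
    """
    power = 1.0 / (m - 1.0)
    u = []
    for x in points:
        # Squared Euclidean distance from this point to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        # eps (an assumed guard) avoids division by zero when a point
        # coincides with a center.
        inv = [(1.0 / max(d, eps)) ** power for d in d2]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


if __name__ == "__main__":
    pts = [(0.5, 0.0), (0.0, 0.0)]
    ctrs = [(0.0, 0.0), (1.0, 0.0)]
    for row in fcm_memberships(pts, ctrs):
        print(row)
```

Every point is compared against every center, so one pass costs on the order of n × c distance evaluations; this is the cost that data reduction schemes aim to shrink.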

2.
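Compared with the k-means algorithm, fuzzy C-means (FCM) introduces fuzzy membership degrees and takes the interaction between different data clusters into account, thereby avoiding the problem of cluster centers converging to the same location. However, fuzzy membership has trailing and upturned tails, which makes the FCM algorithm sensitive to noise points and outliers. In addition, because FCM tends to split the data into clusters of equal size, the algorithm is also sensitive to cluster size and performs poorly on imbalanced clusters. To address these problems...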

3.
A generalized form of the Possibilistic Fuzzy C-Means (PFCM) algorithm (GPFCM) is presented for clustering noisy data. A function of the distance is used instead of the distance itself to damp noise contributions. It is shown that when the data are highly noisy, GPFCM finds accurate cluster centers whereas the FCM (Fuzzy C-Means), PCM (Possibilistic C-Means), and PFCM algorithms fail. FCM, PCM, and PFCM yield inaccurate cluster centers when the clusters are not of the same size or when a covariance norm is used, whereas GPFCM performs well in both cases even when the data are noisy. It is shown that the generalized forms of FCM and PCM (GFCM and GPCM) are also more accurate than FCM and PCM. A measure is defined to evaluate the performance of the clustering algorithms. It shows that the average error of GPFCM and its simplified forms is about 80% smaller than that of FCM, PCM, and PFCM. However, GPFCM has a higher computational cost because of its nonlinear update equations. Three cluster validity indices are introduced to determine the number of clusters in clean and noisy datasets. The first considers the compactness of the clusters, the second considers their separation, and the third considers both separation and compactness. The performance of these indices is confirmed to be satisfactory on various examples of noisy datasets.

4.
Fuzzy c-means (FCM) clustering with spatial constraints is considered a suitable algorithm for data clustering and data analysis. However, FCM still lacks robustness on noisy data, because its objective function relies on the Euclidean distance to measure the relationship between objects. It is effective only for "spherical" clusters, and it may not give reasonable results for "non-compactly filled" spherical data such as "annular-shaped" data. This paper addresses these drawbacks of the general fuzzy c-means algorithm by introducing an extended Gaussian version of fuzzy c-means that replaces the Euclidean distance in the original FCM objective function. The paper first proposes an initial kernel version of fuzzy c-means, intended to simplify its computation, and then extends it to an extended Gaussian kernel version. From the extended Gaussian version it derives an effective method for constructing the membership matrix of the objects and a robust method for updating the centers. Furthermore, the paper proposes a new prototype learning method that obtains the initial cluster centers through a new mathematical initialization for the new objective function, with the aim of reducing the number of iterations while obtaining more accurate results. Initial experiments on artificially generated data show how effectively the proposed Gaussian version of fuzzy c-means finds clusters; the proposed methods are then applied to cluster the Wisconsin breast cancer database into two clusters, for the benign and malignant classes. To demonstrate the performance of the proposed fuzzy c-means with the new initialization of cluster centers, this work compares its results with those of recent fuzzy c-means algorithms; in addition, it uses the Silhouette method to validate the clusters obtained from the breast cancer datasets.

5.
Fuzzy C-means (FCM) clustering has been used successfully in many real-world applications. However, the FCM algorithm is sensitive to the initial prototypes, and it cannot handle non-traditional curved clusters. In this paper, a multi-center fuzzy C-means algorithm based on transitive closure and spectral clustering (MFCM-TCSC) is provided. The algorithm does not require initial guesses of the cluster center locations or of the membership values. Multiple centers are used to represent the non-spherical shape of clusters, so the algorithm can handle non-traditional curved clusters. The novel algorithm contains three phases. First, the dataset is partitioned into subclusters by the FCM algorithm with multiple centers. Then, the subclusters are merged by spectral clustering. Finally, the final results are obtained from these two clustering results. When merging subclusters, we adopt the lattice similarity method as the distance between two subclusters, which has an explicit form when the fuzzy membership values of the subclusters are used as features. Experimental results on two artificial datasets, a UCI dataset, and real image segmentation show that the proposed method clearly outperforms the traditional FCM algorithm and spectral clustering in efficiency and robustness.

6.
One of the simple techniques for data clustering is fuzzy C-means (FCM) clustering, which describes how strongly each data point belongs to a cluster by a fuzzy membership function instead of a crisp value. However, the results of fuzzy clustering depend heavily on the choice of initial state, and there is a high risk of not reaching the best results when the datasets are large. In this paper, we present a hybrid algorithm based on FCM and a modified stem cells algorithm, which we call the SC-FCM algorithm, for optimum clustering of a dataset into K clusters. Experimental results obtained with the new algorithm on several well-known datasets, compared with those obtained by the K-means algorithm, FCM, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and the Artificial Bee Colony (ABC) algorithm, demonstrate the better performance of the new algorithm.

7.
Data clustering usually requires extensive computation of similarity measures between dataset members and cluster centers, especially for large datasets. Image clustering can be an intermediate step in image retrieval or segmentation, where a fast process is critical for large image databases. This paper introduces a new multi-agent approach for fuzzy image clustering (MAFIC) to reduce the time cost of the sequential fuzzy \(c\)-means algorithm (FCM). The distinguishing feature of the approach is that it distributes the computation of the cluster centers and the membership function among several parallel agents, where each agent works independently on a different sub-image of an image. An implementation of MAFIC, based on the Java Agent Development Framework platform, is tested on large 24-bit images. The experimental results show that MAFIC outperforms the sequential FCM algorithm in time performance by at least a factor of four, and thus reduces the time needed for the clustering process.

8.
Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur “naturally” in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.

9.
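The FCM (Fuzzy C-Means) algorithm is sensitive to the initial cluster centers and is suited only to finding spherical clusters. To address these shortcomings, this work proposes initializing with redundant cluster centers to reduce the algorithm's dependence on the initial centers. Large or elongated clusters are first temporarily split into several small classes, and adjacent small classes are then merged into larger ones using the information provided by the membership matrix, which improves the FCM algorithm. Experimental results show that the improved FCM algorithm can identify irregular clusters to some extent and reduces the dependence of FCM on the initial cluster centers.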

10.
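Kernel fuzzy C-means clustering (KFCM) is an algorithm that uses a kernel function to map data into a high-dimensional space and clusters the data by computing the membership degrees between data points and cluster centers. It is efficient and fast, and is therefore widely used in many fields. However, KFCM has two limitations: it is sensitive to the initial values of the cluster centers, and it cannot adaptively determine the number of clusters. To address these two problems, a local-search adaptive kernel fuzzy clustering method is proposed. The method introduces the kernel method to improve the separability of the data, constructs a kernel-based evaluation function to determine the optimal number of clusters, and uses part of the sample data for a local search to find the initial cluster centers. Experimental results on artificial data and UCI datasets verify the effectiveness of the algorithm.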
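
As an illustration of how a kernel enters the membership computation, the sketch below uses the commonly described Gaussian-kernel form of KFCM, in which the Euclidean distance is replaced by the kernel-induced quantity 1 - K(x, v). It is an assumption-laden example rather than the method of this abstract: the names `gaussian_kernel` and `kfcm_memberships`, the bandwidth `sigma`, and the `eps` guard are all hypothetical, and the adaptive cluster-number selection and local search are not shown.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def kfcm_memberships(points, centers, m=2.0, sigma=1.0, eps=1e-12):
    """Memberships with the kernel-induced distance 1 - K(x, v).

    For a Gaussian kernel, the squared feature-space distance between
    x and v is 2 * (1 - K(x, v)); the constant factor cancels when the
    memberships are normalised.
    """
    power = 1.0 / (m - 1.0)
    u = []
    for x in points:
        # eps (an assumed guard) avoids division by zero when x equals a center.
        dist = [max(1.0 - gaussian_kernel(x, v, sigma), eps) for v in centers]
        inv = [(1.0 / d) ** power for d in dist]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


if __name__ == "__main__":
    for row in kfcm_memberships([(0.5,), (0.1,)], [(0.0,), (1.0,)]):
        print(row)
```

Because 1 - K(x, v) saturates at 1 for distant points, far-away outliers influence the memberships less than they do under the plain squared Euclidean distance.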

11.
A Possibilistic Fuzzy c-Means Clustering Algorithm   Total citations: 20 (self-citations: 0, citations by others: 20)
In 1997, we proposed the fuzzy-possibilistic c-means (FPCM) model and algorithm that generated both membership and typicality values when clustering unlabeled data. FPCM constrains the typicality values so that the sum over all data points of typicalities to a cluster is one. The row sum constraint produces unrealistic typicality values for large data sets. In this paper, we propose a new model called possibilistic-fuzzy c-means (PFCM) model. PFCM produces memberships and possibilities simultaneously, along with the usual point prototypes or cluster centers for each cluster. PFCM is a hybridization of possibilistic c-means (PCM) and fuzzy c-means (FCM) that often avoids various problems of PCM, FCM and FPCM. PFCM solves the noise sensitivity defect of FCM, overcomes the coincident clusters problem of PCM and eliminates the row sum constraints of FPCM. We derive the first-order necessary conditions for extrema of the PFCM objective function, and use them as the basis for a standard alternating optimization approach to finding local minima of the PFCM objective functional. Several numerical examples are given that compare FCM and PCM to PFCM. Our examples show that PFCM compares favorably to both of the previous models. Since PFCM prototypes are less sensitive to outliers and can avoid coincident clusters, PFCM is a strong candidate for fuzzy rule-based system identification.

12.
This paper introduces a mechanism for testing multivariable models employed by model-based controllers. Although external excitation is not necessary, the data collection includes a stage where the controller is switched to open-loop operation (manual mode). The main idea is to measure a certain “distance” between the closed-loop and the open-loop signals, and then trigger a flag if this “distance” is larger than a threshold level. Moreover, a provision is made for accommodating model uncertainty. Since no hard bounds are assumed with respect to the noise amplitude, the model invalidation mechanism works in a probabilistic framework.

13.
How can we find a natural clustering of a “complex” dataset, which may contain an unknown number of overlapping clusters of arbitrary shape and be contaminated by noise? This paper proposes a tree-structured framework that purifies such clusters by exploring the structural role of each data point. In practice, each object within the internal organization of the data has its own specific role (“centroid”, hub, or outlier) arising from its distinctive associations with its neighbors. Adjacent centroids interact with each other and serve as intermediate nodes of one tree, as members of the same cluster. Hubs close to a centroid become leaf nodes, which terminate the growth of trees. Outliers that are only weakly connected to any centroid are discarded from all trees as global noise. All the data can thus be labeled by a specified criterion of “centroids”-connected structural consistency (CCSC). Because it requires no domain-specific information, our framework with CCSC can be adapted to many clustering-related applications. Theoretical and experimental results both confirm that our framework is easy to interpret and implement, and efficient and effective in “complex” clustering.

14.
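An automatic FCM image segmentation method combined with k-means   Total citations: 1 (self-citations: 0, citations by others: 1)
In image segmentation, the fuzzy C-means (FCM) algorithm cannot determine the cluster centers automatically and ignores pixel neighborhood information. To address these problems, an automatic FCM image segmentation method combined with k-means is proposed. The method first determines the number of clusters from the gray-level histogram of the image, and then uses an improved fast FCM method to generate the initial cluster centers: a one-step k-means update is applied to the fuzzy cluster centers for gray levels with large membership, while the fast FCM membership update is applied only to gray levels with small membership, and the initial cluster centers are obtained after iteration. The final clustering is performed by an FCM algorithm with improved membership. Experiments show that the initial cluster centers obtained by the method are close to their final values, that image segmentation is accelerated, and that the method is robust to noise to a certain degree.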

15.
In this paper we introduce a class of fuzzy clusterwise regression models with an LR fuzzy response variable and numeric explanatory variables, which embeds fuzzy clustering into a fuzzy regression framework. The model bypasses the heterogeneity problem that could arise in fuzzy regression by subdividing the dataset into homogeneous clusters and performing a separate fuzzy regression on each cluster. Integrating the clustering model into the regression framework allows us to estimate the regression parameters and the membership degree of each observation in each cluster simultaneously, by optimizing a single objective function. The class of models proposed here includes, as special cases, the fuzzy clusterwise linear regression model and the fuzzy clusterwise polynomial regression model. We also introduce a set of goodness-of-fit indices to evaluate the fit of the regression model within each cluster as well as in the whole dataset. Finally, we consider some cluster validity criteria that are useful in identifying the “optimal” number of clusters. Several applications are provided to illustrate the approach.

16.
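Rough K-means clustering and its derived algorithms require the number of clusters to be specified manually in advance, and their random selection of initial cluster centers leads to low accuracy when partitioning data in the overlapping regions between clusters. To address these problems, this paper proposes a rough fuzzy K-means clustering algorithm based on a hybrid measure and adaptive cluster adjustment. When computing the degree to which data objects in the boundary region belong to different clusters, the algorithm uses a hybrid measure that combines local density and distance, and it adopts a strategy of adaptively adjusting the number of clusters to obtain the best number. The midpoint of the two closest samples in a dense region of data objects is chosen as an initial cluster center; nearby objects whose local density is above the average density are assigned to that cluster, and the remaining initial cluster centers are then selected, which makes the choice of initial centers more reasonable. Experiments on artificial datasets and UCI benchmark datasets show that the algorithm is adaptive and achieves better clustering accuracy on spherical-cluster datasets whose clusters overlap heavily.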

17.
A fuzzy clustering problem consists of assigning a set of patterns to a given number of clusters with respect to some criteria, such that each pattern may belong to more than one cluster with different degrees of membership. To solve it, we first propose a new local search heuristic, called Fuzzy J-Means, where the neighbourhood is defined by all possible centroid-to-pattern relocations. The “integer” solution is then moved to a continuous one by an alternate step, i.e., by finding centroids and membership degrees for all patterns and clusters. To alleviate the difficulty of being stuck in local minima of poor value, this local search is then embedded into the Variable Neighbourhood Search metaheuristic. Results on five standard test problems from the literature are reported and compared with those obtained with the well-known Fuzzy C-Means heuristic. The proposed methods appear to obtain solutions of substantially better quality than Fuzzy C-Means.

18.
Reducing the time complexity of the fuzzy c-means algorithm   Total citations: 13 (self-citations: 0, citations by others: 13)
In this paper, we present an efficient implementation of the fuzzy c-means clustering algorithm. The original algorithm alternates between estimating the centers of the clusters and the fuzzy membership of the data points. The size of the membership matrix is on the order of the original data set, a prohibitive size if this technique is to be applied to very large data sets with many clusters. Our implementation eliminates the storage of this data structure by combining the two updates into a single update of the cluster centers. This change significantly affects the asymptotic runtime, as the new algorithm is linear with respect to the number of clusters, while the original is quadratic. Elimination of the membership matrix also reduces the overhead associated with repeatedly accessing a large data structure. Empirical evidence is presented to quantify the savings achieved by this new method.
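
The idea of folding the membership update into the center update can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' implementation: the function name `fcm_center_update`, the fuzzifier default `m=2.0`, and the `eps` guard are hypothetical. Each point's memberships are computed on the fly from one shared normaliser and accumulated directly into the new centers, so no n × c membership matrix is stored.

```python
def fcm_center_update(points, centers, m=2.0, eps=1e-12):
    """One FCM iteration that returns new centers without storing memberships."""
    c, dim = len(centers), len(centers[0])
    power = 1.0 / (m - 1.0)
    num = [[0.0] * dim for _ in range(c)]  # running sums of u^m * x
    den = [0.0] * c                        # running sums of u^m
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        inv = [(1.0 / max(d, eps)) ** power for d in d2]
        total = sum(inv)  # computed once per point and shared by all clusters
        for i in range(c):
            w = (inv[i] / total) ** m  # u_ik^m, used and then discarded
            den[i] += w
            for j in range(dim):
                num[i][j] += w * x[j]
    return [[num[i][j] / den[i] for j in range(dim)] for i in range(c)]


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (9.0,), (10.0,)]
    print(fcm_center_update(data, [(0.5,), (9.5,)]))
```

Sharing the per-point normaliser `total` makes the work per point proportional to the number of clusters, whereas evaluating the ratio-sum form of the membership formula separately for every cluster costs a factor of c more.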

19.
A learning machine, called a clustering interpreting probabilistic associative memory (CIPAM), is proposed. CIPAM consists of a clusterer and an interpreter. The clusterer is a recurrent hierarchical neural network of unsupervised processing units (UPUs). The interpreter is a number of supervised processing units (SPUs) that branch out from the clusterer. Each processing unit (PU), UPU or SPU, comprises “dendritic encoders” for encoding inputs to the PU, “synapses” for storing resultant codes, a “nonspiking neuron” for generating inhibitory graded signals to modulate neighboring spiking neurons, “spiking neurons” for computing the subjective probability distribution (SPD) or the membership function, in the sense of fuzzy logic, of the label of said inputs to the PU and generating spike trains with the SPD or membership function as the firing rates, and a masking matrix for maximizing generalization. While UPUs employ unsupervised covariance learning mechanisms, SPUs employ supervised ones. They both also have unsupervised accumulation learning mechanisms. The clusterer of CIPAM clusters temporal and spatial data. The interpreter interprets the resultant clusters, effecting detection and recognition of temporal and hierarchical causes.

20.
Fuzzy c-means (FCM) is an important and popular unsupervised partitioning algorithm used in several application domains, such as pattern recognition, machine learning, and data mining. Although FCM has shown good performance in detecting clusters, the membership values computed for each individual in each cluster cannot indicate how well the individuals are classified. In this paper, a new approach to handling the memberships, based on the inherent information in each feature, is presented. The algorithm produces a membership matrix for each individual. The membership values lie between zero and one and measure the similarity of the individual to the center of each cluster according to each feature. These values can change at each iteration of the algorithm, and they differ from one feature to another and from one cluster to another, in order to increase the performance of the fuzzy c-means clustering algorithm. To obtain a fuzzy partition of the input data set by class, a way to compute the class membership values is also proposed in this work. Experiments with synthetic and real data sets show that the proposed approach produces clusterings of good quality.
