Similar literature (20 entries)
1.
In this paper the problem of automatically clustering a data set is posed as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. The proposed multiobjective clustering technique utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Here a variable number of cluster centers is encoded in the string. The number of clusters present in different strings varies over a range. The points are assigned to different clusters based on the newly developed point symmetry based distance rather than the existing Euclidean distance. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. Thus the proposed clustering technique is able to detect both the proper number of clusters and the appropriate partitioning from data sets having either hyperspherical clusters or point symmetric clusters. A new semi-supervised method is also proposed in the present paper to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, and by two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.
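As a rough illustration of the Euclidean distance based XB-index mentioned in this abstract, the sketch below computes a crisp (hard assignment) Xie-Beni value: total squared distance of points to their nearest center, divided by the number of points times the smallest squared distance between two centers. The function names `squared_distance` and `xb_index` are hypothetical, and the nearest-center assignment is an assumption made for simplicity; the paper itself assigns points with a point symmetry based distance, which this sketch does not reproduce.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def xb_index(points, centers):
    """Crisp Xie-Beni index (lower is better).

    Assumption: every point belongs to its nearest center in the
    Euclidean sense. This is an illustrative simplification, not the
    assignment rule used in the abstract above.
    """
    if len(centers) < 2:
        raise ValueError("the XB-index needs at least two cluster centers")
    # Compactness: squared distance of each point to its nearest center.
    compactness = sum(
        min(squared_distance(p, c) for c in centers) for p in points
    )
    # Separation: smallest squared distance between any two centers.
    separation = min(
        squared_distance(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
two_clusters = xb_index(points, [(0, 1), (10, 1)])
three_clusters = xb_index(points, [(0, 0), (0, 2), (10, 1)])
print(two_clusters, three_clusters)
```

On this toy data the two-center partition gets the smaller value, which is the behaviour one would hope for when the index is used to pick the number of clusters.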

2.
In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set. A newly developed multiobjective simulated annealing based optimization technique, namely archived multiobjective simulated annealing (AMOSA), is used as the background strategy for optimization. Here features and cluster centers are encoded in the form of a string. Three objective functions are utilized: two internal cluster validity indices measuring the goodness of the obtained partitioning using Euclidean distance and point symmetry based distance, respectively, and a count of the number of features. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, the appropriate number of clusters and the appropriate partitioning. Points are allocated to different clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a solution from the given set. The effectiveness of the proposed FeaClusMOO is shown for seven higher dimensional real-life data sets in comparison with other clustering techniques: its Euclidean distance based version, where Euclidean distance is used for cluster assignment; a genetic algorithm based automatic clustering technique (VGAPS-clustering) using point symmetry based distance with all the features; and the K-means clustering technique with all features.

3.
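An improved ant clustering algorithm based on point symmetry distance is proposed. Instead of using the Euclidean distance to compute the similarity of objects within a cluster, the algorithm uses a new point symmetry distance; when handling data sets with symmetric structure, it can effectively identify the number of clusters and a suitable partitioning of the given data set. In the algorithm, artificial ants represent data objects, and the most suitable partitioning is sought according to the clustering rules specified by the algorithm. Finally, the proposed algorithm and the standard ant clustering algorithm are each applied to different data sets in clustering experiments. The experimental results confirm the effectiveness of the algorithm.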

4.
Identification of the correct number of clusters and of the appropriate partitioning technique are important considerations in clustering, for which several cluster validity indices, primarily utilizing the Euclidean distance, have been used in the literature. In this paper a new measure of connectivity is incorporated in the definitions of seven cluster validity indices, namely DB-index, Dunn-index, Generalized Dunn-index, PS-index, I-index, XB-index and SV-index, thereby yielding seven new cluster validity indices which are able to automatically detect clusters of any shape, size or convexity as long as they are well-separated. Here connectivity is measured using a novel approach following the concept of the relative neighborhood graph. It is empirically established that incorporation of the property of connectivity significantly improves the capabilities of these indices in identifying the appropriate number of clusters. The well-known single linkage and K-means clustering techniques are used as the underlying partitioning algorithms. Results on eight artificially generated and three real-life data sets show that the connectivity based Dunn-index performs the best compared with the other six indices. Comparisons are made with the original versions of these seven cluster validity indices.

5.
In this article, a new symmetry based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters is done based on a point symmetry based distance rather than the Euclidean distance. A newly proposed point symmetry based cluster validity index, Sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is therefore able to detect both convex and non-convex clusters irrespective of their sizes and shapes as long as they possess the point symmetry property. Kd-tree based nearest neighbor search is used to reduce the complexity of computing the point symmetry based distance. A proof of the convergence property of the variable string length GA with point symmetry based distance clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length Genetic K-means algorithm (GCUK-clustering) and one recently developed weighted sum validity function based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.
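The abstract does not state the formula for the point symmetry based distance, so the following is only a sketch under an assumed formulation: reflect the point through the candidate center, average the distances from the reflected point to its `knear` nearest data points, and multiply by the Euclidean distance between the point and the center. The names `ps_distance` and `knear` are illustrative, and a brute-force neighbor search stands in for the Kd-tree search mentioned above.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ps_distance(x, center, data, knear=2):
    """Assumed point symmetry based distance of x with respect to center.

    Small values mean that the mirror image of x through the center
    lands close to actual data points, i.e. the data look symmetric
    about that center near x.
    """
    if len(data) < knear:
        raise ValueError("need at least knear data points")
    # Mirror image of x through the center: x* = 2c - x.
    reflected = tuple(2 * c - xi for xi, c in zip(x, center))
    # Brute-force search for the knear nearest neighbors of x*
    # (a Kd-tree would replace this step for large data sets).
    nearest = sorted(euclidean(reflected, p) for p in data)[:knear]
    symmetry_term = sum(nearest) / knear
    return symmetry_term * euclidean(x, center)


ring = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(ps_distance((1, 0), (0, 0), ring))  # center of symmetry
print(ps_distance((1, 0), (1, 1), ring))  # off-center candidate
```

With the four points placed symmetrically about the origin, the origin yields the smaller distance for the point (1, 0), so that point would be assigned to a cluster centered there rather than to the off-center candidate.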

6.
Cluster validity indices are used to validate the results of clustering and to find the set of clusters that best fits the natural partitions of a given data set. Most previous validity indices depend considerably on the number of data objects in clusters, on cluster centroids and on average values. They tend to ignore small clusters and clusters with low density. Two cluster validity indices are proposed for efficient validation of partitions containing clusters that differ widely in size and density. The first proposed index exploits a compactness measure and a separation measure, and the second index is based on an overlap measure and a separation measure. The compactness and overlap measures are calculated from a few data objects of a cluster, while the separation measure uses all data objects. The compactness measure is calculated only from data objects of a cluster that are far enough away from the cluster centroids, while the overlap measure is calculated from data objects that are near enough to one or more other clusters. A good partition is expected to have a low degree of overlap and a larger separation distance and compactness. The maximum value of the ratio of compactness to separation and the minimum value of the ratio of overlap to separation indicate the optimal partition. Testing of both proposed indices on some artificial and three well-known real data sets showed the effectiveness and reliability of the proposed indices.

7.
A New Line Symmetry Distance and Its Application to Data Clustering   Cited by: 2 (self-citations: 0; citations by others: 2)
In this paper, at first a new line-symmetry-based distance is proposed. The properties of the proposed distance are then elaborately described. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the proposed line-symmetry-based distance. Thereafter an evolutionary clustering technique is developed that uses the new line-symmetry-based distance measure for assigning points to different clusters. Adaptive mutation and crossover probabilities are used to accelerate the proposed c...

8.
Classifying the pixels of satellite images into homogeneous regions is a very challenging task as different regions have different types of land cover. Some land covers contain more regions, while some contain relatively smaller regions (e.g., bridges, roads). In satellite image segmentation, no prior information is available about the number of clusters. In this paper, we solve this problem using the concepts of semi-supervised clustering, which utilizes the properties of both unsupervised and supervised classification. Three cluster validity indices are utilized, which are simultaneously optimized using AMOSA, a modern multiobjective optimization technique based on the concepts of simulated annealing. The first two cluster validity indices, the symmetry distance based Sym-index and the Euclidean distance based I-index, are based on unsupervised properties. The last one is a supervised information based cluster validity index, the Minkowski index. To obtain the supervised information, the fuzzy C-means clustering technique is used initially. Thereafter, based on the highest membership values of the data points to their respective clusters, 10% of the data points, together with their class labels, are chosen at random. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on three satellite image data sets of different cities of India. Results are also compared with existing clustering techniques.

9.
Cluster validity indices are used for estimating the quality of partitions produced by clustering algorithms and for determining the number of clusters in data. Cluster validation is a difficult task, because for the same data set several partitions exist, at different levels of detail, that fit the natural groupings of the data. Even though several cluster validity indices exist, they are inefficient when clusters differ widely in density or size. We propose a clustering validity index that addresses these issues. It is based on compactness and overlap measures. The overlap measure, which indicates the degree of overlap between fuzzy clusters, is obtained by calculating the overlap rate of all data objects that belong strongly enough to two or more clusters. The compactness measure, which indicates the degree of similarity of data objects in a cluster, is calculated from the membership values of data objects that are strongly enough associated with one cluster. We propose ratio and summation types of the index using the same compactness and overlap measures. The maximal value of the index denotes the optimal fuzzy partition, which is expected to have a high compactness and a low degree of overlap among clusters. Testing many well-known previously formulated indices and the proposed indices on well-known data sets showed the superior reliability and effectiveness of the proposed index in comparison to other indices, especially when evaluating partitions with clusters that differ widely in size or density.

10.
Most clustering algorithms operate by optimizing (either implicitly or explicitly) a single measure of cluster solution quality. Such methods may perform well on some data sets but lack robustness with respect to variations in cluster shape, proximity, evenness and so forth. In this paper, we have proposed a multiobjective clustering technique which optimizes simultaneously two objectives, one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses a recently developed simulated annealing-based multiobjective optimization technique, named AMOSA, as the underlying optimization strategy. Here, points are assigned to different clusters based on a newly defined point symmetry-based distance rather than the Euclidean distance. Results on several artificial and real-life data sets in comparison with another multiobjective clustering technique, MOCK, three single objective genetic algorithm-based automatic clustering techniques, VGAPS clustering, GCUK clustering and HNGA clustering, and several hybrid methods of determining the appropriate number of clusters from data sets show that the proposed technique is well suited to detect automatically the appropriate number of clusters as well as the appropriate partitioning from data sets having point symmetric clusters. The performance of AMOSA as the underlying optimization technique in the proposed clustering algorithm is also compared with PESA-II, another evolutionary multiobjective optimization technique.

11.
A cluster operator takes a set of data points and partitions the points into clusters (subsets). As with any scientific model, the scientific content of a cluster operator lies in its ability to predict results. This ability is measured by its error rate relative to cluster formation. To estimate the error of a cluster operator, a sample of point sets is generated, the algorithm is applied to each point set and the clusters evaluated relative to the known partition according to the distributions, and then the errors are averaged over the point sets composing the sample. Many validity measures have been proposed for evaluating clustering results based on a single realization of the random-point-set process. In this paper we consider a number of proposed validity measures and we examine how well they correlate with error rates across a number of clustering algorithms and random-point-set models. Validity measures fall broadly into three classes: internal validation is based on calculating properties of the resulting clusters; relative validation is based on comparisons of partitions generated by the same algorithm with different parameters or different subsets of the data; and external validation compares the partition generated by the clustering algorithm and a given partition of the data. To quantify the degree of similarity between the validation indices and the clustering errors, we use Kendall's rank correlation between their values. Our results indicate that, overall, the performance of validity indices is highly variable. For complex models or when a clustering algorithm yields complex clusters, both the internal and relative indices fail to predict the error of the algorithm. Some external indices appear to perform well, whereas others do not. 
We conclude that one should not put much faith in a validity score unless there is evidence, either in terms of sufficient data for model estimation or prior model knowledge, that a validity measure is well-correlated to the error rate of the clustering algorithm.
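To make the comparison described in this abstract concrete, the sketch below computes Kendall's rank correlation (the simple tau-a variant, where tied pairs count as neither concordant nor discordant) between a list of validity index values and a list of clustering error rates. The function name `kendall_tau` and the sample numbers are illustrative assumptions; the abstract does not say which tau variant the authors used.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length sequences.

    Counts pairs ordered the same way in both sequences (concordant)
    and pairs ordered oppositely (discordant), then normalizes by the
    total number of pairs. Tied pairs contribute zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical values: one validity score and one error rate per run.
validity_scores = [0.10, 0.25, 0.40, 0.80]
error_rates = [0.02, 0.05, 0.04, 0.30]
print(kendall_tau(validity_scores, error_rates))
```

A value near 1 would mean the index ranks clusterings in the same order as their true error, a value near 0 that the index carries little information about the error.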

12.
A cluster validity index for fuzzy clustering   Cited by: 1 (self-citations: 0; citations by others: 1)
A new cluster validity index is proposed for the validation of partitions of object data produced by the fuzzy c-means algorithm. The proposed validity index uses a variation measure and a separation measure between two fuzzy clusters. A good fuzzy partition is expected to have a low degree of variation and a large separation distance. Testing of the proposed index and nine previously formulated indices on well-known data sets shows the superior effectiveness and reliability of the proposed index in comparison to other indices and the robustness of the proposed index in noisy environments.

13.
A new method of partitive clustering is developed in the framework of shadowed sets. The core and exclusion regions of the generated shadowed partitions result in a reduction in computations as compared to conventional fuzzy clustering. Unlike rough clustering, here the choice of threshold parameter is fully automated. The number of clusters is optimized in terms of various validity indices. It is observed that shadowed clustering can efficiently handle overlapping among clusters as well as model uncertainty in class boundaries. The algorithm is robust in the presence of outliers. A comparative study is made with related partitive approaches. Experimental results on synthetic as well as real data sets demonstrate the superiority of the proposed approach.

14.
In this paper a new multiobjective (MO) clustering technique (GenClustMOO) is proposed which can automatically partition the data into an appropriate number of clusters. Each cluster is divided into several small hyperspherical subclusters and the centers of all these small sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, these sub-clusters are merged appropriately to form a variable number of global clusters. Three objective functions, one reflecting the total compactness of the partitioning based on the Euclidean distance, the other reflecting the total symmetry of the clusters, and the last reflecting the cluster connectedness, are considered here. These are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method, in order to detect the appropriate number of clusters as well as the appropriate partitioning. The symmetry present in a partitioning is measured using a newly developed point symmetry based distance. Connectedness present in a partitioning is measured using the relative neighborhood graph concept. Since AMOSA, as well as any other MO optimization technique, provides a set of Pareto-optimal solutions, a new method is also developed to determine a single solution from this set. Thus the proposed GenClustMOO is able to detect the appropriate number of clusters and the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. 
The effectiveness of the proposed GenClustMOO in comparison with another recent multiobjective clustering technique (MOCK), a single objective genetic algorithm based automatic clustering technique (VGAPS-clustering), K-means and single linkage clustering techniques is comprehensively demonstrated for nineteen artificial and seven real-life data sets of varying complexities. In a part of the experiment the effectiveness of AMOSA as the underlying optimization technique in GenClustMOO is also demonstrated in comparison to another evolutionary MO algorithm, PESA2.
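Several of these abstracts rely on the fact that a multiobjective optimizer returns a set of Pareto-optimal solutions rather than a single one. The sketch below shows only that notion, not AMOSA itself: it filters a list of objective vectors down to the non-dominated ones. It assumes every objective is to be minimized (an objective that should be maximized would be negated first), and the names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b under minimization:
    a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(solutions):
    """Return the solutions that no other solution dominates,
    preserving their original order."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical objective vectors, e.g. (compactness, negated symmetry).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))
```

Here (3, 3) is dropped because (2, 2) is better on both objectives, while the other three remain as mutually incomparable trade-offs; choosing one of them is the job of the final selection step the abstracts describe.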

15.
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster density, we propose that the clustering validity index be used not only globally to find optimal partitions of input data, but also locally to determine which two neighboring clusters are to be merged in a hierarchical clustering of a Self-Organizing Map (SOM). A new two-level SOM-based clustering algorithm using the clustering validity index is also proposed. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data in a better way than classical clustering algorithms on an SOM.

16.
In this paper, the automatic segmentation of a multispectral magnetic resonance image of the brain is posed as a clustering problem in the intensity space. The automatic clustering problem is thereafter modelled as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. A multiobjective clustering technique, named MCMOClust, is used to solve this problem. MCMOClust utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Each cluster is divided into several small hyperspherical sub-clusters and the centers of all these small sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, these sub-clusters are merged appropriately to form a variable number of global clusters. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously to automatically evolve the appropriate number of clusters present in MR brain images. A semi-supervised method is used to select a single solution from the final Pareto optimal front of MCMOClust. The present method is applied to several simulated T1-weighted, T2-weighted and proton density normal and MS lesion magnetic resonance brain images. The superiority of the present method over the Fuzzy C-means and Expectation Maximization clustering algorithms and a newly developed symmetry based fuzzy genetic clustering technique (Fuzzy-VGAPS) is demonstrated quantitatively. The automatic segmentation obtained by the multiseed based multiobjective clustering technique (MCMOClust) is also compared with the available ground truth information.

17.
Cluster analysis is used to explore structure in unlabeled batch data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept cannot be directly extended to the online setting because streaming algorithms neither retain the data nor maintain a partition of it, both of which are needed by batch cluster validity indices. In this paper, we develop two incremental versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin validity indices, and use them to monitor and control two streaming clustering algorithms (sk-means and online ellipsoidal clustering). In this context, our new incremental validity indices are more accurately viewed as performance monitoring functions. We also show that incremental cluster validity indices can send a distress signal to online monitors when evolving structure leads an algorithm astray. Our numerical examples indicate that the incremental Xie-Beni index with a forgetting factor is superior to the other three indices tested.

18.
In this paper a fuzzy point symmetry based genetic clustering technique (Fuzzy-VGAPS) is proposed which can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape or convexity as long as they possess the property of symmetry. Here the membership values of points to different clusters are computed using the newly proposed point symmetry based distance. A variable number of cluster centers is encoded in the chromosomes. A new fuzzy symmetry based cluster validity index, FSym-index, is first proposed here and thereafter utilized to measure the fitness of the chromosomes. The proposed index can detect non-convex as well as convex non-hyperspherical partitionings with a variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function, Dunn's index, for which the condition of uniqueness has already been established. The results of Fuzzy-VGAPS are compared with those obtained by seven other algorithms, including both fuzzy and crisp methods, on four artificial and four real-life data sets. Some real-life applications of Fuzzy-VGAPS to automatically clustering gene expression data and to segmenting magnetic resonance brain images with multiple sclerosis lesions are also demonstrated.

19.
This paper presents the design of a new clustering algorithm for images having a wide range of variations in appearance and shape. The major tasks of the clustering process are creating the partitions, reassigning the elements of the partitions and identifying the compact clusters obtained. The clusters are created from various low-dimensional spaces of the data set. Hierarchically related eigenspaces are employed to reassign the elements of the clusters. The clusters obtained from the proposed clustering scheme are used to form the learning set of the classification module. The quality of the clusters generated is evaluated from the classification results. The clustering performance is compared with that of the well-known K-means and nearest neighbor-based clustering techniques. The reported results show excellent performance of the proposed clustering scheme. Benchmark data sets of objects and faces, containing images with large pose variations, are used to illustrate the efficiency and effectiveness of the proposed scheme.

20.
Performance evaluation of some clustering algorithms and validity indices   Cited by: 16 (self-citations: 0; citations by others: 16)
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and Dunn's index, a lower bound on the value of the former is theoretically estimated in order to get a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
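Dunn's index, one of the four indices evaluated in this abstract, can be sketched in its basic form as the smallest distance between points of different clusters divided by the largest distance between points of the same cluster. The code below is a brute-force illustration under that common definition; the function name `dunn_index` and the toy data are assumptions, and generalized variants of the index are not covered.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dunn_index(clusters):
    """Basic Dunn's index for a hard partition (higher is better).

    clusters is a list of clusters, each a list of points.
    """
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        euclidean(p, q)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
        for p in clusters[i]
        for q in clusters[j]
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        euclidean(p, q) for c in clusters for p in c for q in c
    )
    if max_diameter == 0:
        raise ValueError("all clusters are single points; index undefined")
    return min_separation / max_diameter


good = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
bad = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
print(dunn_index(good), dunn_index(bad))
```

The partition that groups nearby points scores higher than the one that pairs distant points, which is why the index can be scanned over candidate numbers of clusters and the maximum taken.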
