Similar Documents
20 similar documents found.
1.
Wang  Shuqin  Chen  Yongyong  Yi  Shuang  Chao  Guoqing 《Applied Intelligence》2022,52(13):14935-14948

Graph learning methods have been widely used for multi-view clustering. However, such methods face two challenges: (1) they usually perform a simple fusion of fixed similarity graph matrices, ignoring the essential structure of the graphs; (2) they are sensitive to noise and outliers because they usually learn the similarity matrix from the raw features. To solve these problems, we propose a novel multi-view subspace clustering method named Frobenius norm-regularized robust graph learning (RGL), which inherits desirable advantages (noise robustness and local information preservation) from subspace clustering and manifold learning. Specifically, RGL uses a Frobenius norm constraint and adjacency similarity learning to simultaneously explore the global information and local similarity of the views. Furthermore, the l2,1 norm is imposed on the error matrix to remove the disturbance of noise and outliers. An efficient iterative algorithm based on the alternating direction method of multipliers is designed to solve the RGL model. Extensive experiments on nine benchmark databases show the clear advantage of the proposed method over fifteen state-of-the-art clustering methods.
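The abstract does not give the update equations; as a hedged illustration, the l2,1-norm term in ADMM solvers of this kind is typically handled by a column-wise shrinkage (proximal) step, sketched below. Function names are ours, not from the paper:

```python
import numpy as np

def l21_norm(E):
    # sum of the Euclidean norms of the columns of E
    return np.linalg.norm(E, axis=0).sum()

def prox_l21(B, tau):
    # proximal operator of tau * ||.||_{2,1}: shrink each column of B
    # toward zero by tau; columns with norm <= tau vanish entirely
    norms = np.linalg.norm(B, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return B * scale
```

Within an ADMM iteration, a step of this form suppresses columns of the error matrix dominated by noise while leaving strong columns nearly intact.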


2.
In this paper, we introduce new algorithms that perform clustering and feature weighting simultaneously and in an unsupervised manner. The proposed algorithms are simple to compute and implement, and learn a different set of feature weights for each identified cluster. The cluster-dependent feature weights offer two advantages. First, they guide the clustering process to partition the data set into more meaningful clusters. Second, they can be used in the subsequent steps of a learning system to improve its learning behavior. An extension of the algorithm to deal with an unknown number of clusters is also proposed. The extension is based on competitive agglomeration, whereby the number of clusters is over-specified, and adjacent clusters are allowed to compete for data points in a manner that causes clusters which lose the competition to gradually become depleted and vanish. We illustrate the performance of the proposed approach by using it to segment color images and to build a nearest prototype classifier.
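The papers' exact weighting scheme is not reproduced in the abstract; a minimal sketch of the idea, with weights proportional to the inverse per-feature dispersion within each cluster, is our own simplification:

```python
import numpy as np

def cluster_feature_weights(X, labels, k, eps=1e-9):
    # one weight vector per cluster; features with small within-cluster
    # dispersion get large weights, so each cluster emphasizes the
    # features that make it compact
    W = np.zeros((k, X.shape[1]))
    for j in range(k):
        pts = X[labels == j]
        disp = ((pts - pts.mean(axis=0)) ** 2).sum(axis=0) + eps
        inv = 1.0 / disp
        W[j] = inv / inv.sum()  # normalize so weights sum to 1 per cluster
    return W
```

Such weights can re-enter the assignment step as a weighted distance, which is how they "guide the clustering process" toward more meaningful partitions.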

3.
Seriation is a useful statistical method for visualizing clusters in a dataset. However, when the data are noisy or unbalanced, visualizing the data structure becomes challenging. To alleviate this limitation, we introduce a novel metric based on common neighborhood to evaluate the degree of sparsity in a dataset. A family of matrices is derived for different levels of sparsity, and the matrices are permuted by a branch-and-bound algorithm. The matrix with the best block-diagonal form is then selected by a compactness criterion. The selected matrix reveals the intrinsic structure of the data by excluding noisy data or outliers. This seriation algorithm is applicable even if the number of clusters is unknown or the clusters are imbalanced. However, if the metric introduces too much sparsity into the data, small sub-sampled groups of data may be discarded. To resolve this problem, a multi-scale approach combining different levels of sparsity is proposed. The capability of the proposed seriation method is examined both on toy problems and in the context of spike sorting.

4.
《Image and vision computing》2001,19(9-10):639-648
In this paper, a new learning algorithm is proposed for texture segmentation. The algorithm is a competitive clustering scheme with two specific features: elliptical clustering is accomplished by incorporating the Mahalanobis distance measure into the learning rules, and under-utilization of smaller clusters is avoided by incorporating a frequency-sensitive term. An efficient learning rule that incorporates these features is elaborated. In the experimental section, several experiments demonstrate the usefulness of the proposed technique for the segmentation of textured images. Gabor filters were applied to compositions of textured images to generate texture features. The segmentation performance is compared to k-means clustering with and without the Mahalanobis distance, and to the ordinary competitive learning scheme. It is demonstrated that the proposed algorithm outperforms the others. A fuzzy version of the technique is introduced and experimentally compared with fuzzy versions of the k-means and competitive clustering algorithms; the same conclusions as in the hard clustering case hold.
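A hedged sketch of a frequency-sensitive competitive update (Euclidean distance here for brevity; the paper uses the Mahalanobis distance):

```python
import numpy as np

def fscl_step(x, prototypes, counts, lr=0.1):
    # frequency-sensitive competition: each prototype's distance is scaled
    # by its win count, so frequently winning prototypes become handicapped
    # and smaller clusters are not starved
    d = np.linalg.norm(prototypes - x, axis=1) * counts
    w = int(np.argmin(d))
    prototypes[w] += lr * (x - prototypes[w])  # move the winner toward x
    counts[w] += 1
    return w
```

The count-scaled distance is what prevents the "under-utilization of smaller clusters" the abstract mentions: a prototype that has already won many samples is less likely to win the next one.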

5.
Competitive learning approaches with individual penalization or cooperation mechanisms have the attractive ability of automatic cluster number selection in unsupervised data clustering. In this paper, we further study these two mechanisms and propose a novel learning algorithm called Cooperative and Penalized Competitive Learning (CPCL), which implements the cooperation and penalization mechanisms simultaneously in a single competitive learning process. The integration of these two different kinds of competition mechanisms enables CPCL to locate the cluster centers more quickly and to be insensitive to the number of seed points and their initial positions. Additionally, to handle nonlinearly separable clusters, we further introduce the proposed competition mechanism into a kernel clustering framework. Correspondingly, a new kernel-based competitive learning algorithm which can conduct nonlinear partitioning without knowing the true cluster number is presented. Promising experimental results on real data sets demonstrate the superiority of the proposed methods.
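The CPCL update rules are not given in the abstract; as a reference point, the classical rival-penalized flavor of such penalization mechanisms can be sketched as follows (parameter names and rates are ours):

```python
import numpy as np

def penalized_step(x, protos, lr=0.05, penalty=0.01):
    # winner-take-all with rival penalization: the nearest prototype learns,
    # the second-nearest (the rival) is pushed away, so superfluous seed
    # points are gradually driven out of the data region
    d = np.linalg.norm(protos - x, axis=1)
    order = np.argsort(d)
    w, r = int(order[0]), int(order[1])      # winner and rival
    protos[w] += lr * (x - protos[w])        # winner learns
    protos[r] -= penalty * (x - protos[r])   # rival is penalized
    return w, r
```

Penalization of this kind is what makes the result insensitive to an over-specified number of seed points: extra prototypes are repeatedly penalized until they leave the data region.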

6.
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed robust competitive agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights to each data point: the first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set; the second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find an unknown number of clusters of various shapes in noisy data sets, as well as to fit an unknown number of parametric models simultaneously. Several examples, such as clustering/mixture decomposition, line/plane fitting, segmentation of range images, and estimation of motion parameters of multiple objects, are shown.

7.
Statistical clustering criteria with free scale parameters and unknown cluster sizes are inclined to create small, spurious clusters. To mitigate this tendency, a statistical model for cardinality-constrained clustering of data with gross outliers is established, its maximum likelihood and maximum a posteriori clustering criteria are derived, and their consistency and robustness are analyzed. The criteria lead to constrained optimization problems that can be solved by using iterative, alternating trimming algorithms of k-means type. Each step in the algorithms requires the solution of a λ-assignment problem known from combinatorial optimization. The method allows one to estimate the numbers of clusters and outliers. It is illustrated with a synthetic data set and a real one.
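A minimal sketch of the k-means-type trimming idea (without the λ-assignment machinery, and with a naive deterministic initialization purely for illustration):

```python
import numpy as np

def trimmed_kmeans(X, k, n_out, iters=20):
    # alternate: assign points, trim the n_out worst-fit points as outliers,
    # then update centers from the retained points only
    C = X[:k].astype(float)  # naive deterministic init (a sketch)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - C[None], axis=2)  # n x k distances
        lab = d.argmin(1)
        keep = np.argsort(d.min(1))[: len(X) - n_out]  # trim worst-fit points
        for j in range(k):
            pts = X[keep][lab[keep] == j]
            if len(pts):
                C[j] = pts.mean(0)
    return C, lab, set(keep.tolist())
```

Because the trimmed points never enter the center updates, a single gross outlier cannot drag a prototype away from its cluster.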

8.
In this paper, we discuss the influence of the contributions of feature vectors at each learning time t on a sequential-type competitive learning algorithm. We then give a learning rate annealing schedule to improve the unsupervised learning vector quantization (ULVQ) algorithm, which uses the winner-take-all competitive learning principle of the self-organizing map (SOM). We also discuss the noise and outlier problems of sequential competitive learning algorithms and propose an alternative learning formula to make sequential competitive learning robust to noise and outliers. Combining the proposed learning rate annealing schedule and the alternative learning formula, we propose an alternative learning vector quantization (ALVQ) algorithm. Discussion and experimental results comparing ALVQ with ULVQ show the superiority of the proposed method.
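The exact annealing schedule and robust formula are not in the abstract; below is a hedged sketch of the combined idea, with a 1/(1+t) schedule and a Gaussian robustness weight as assumed forms:

```python
import numpy as np

def alvq_step(x, protos, t, lr0=0.5, sigma=1.0):
    # annealed, robustified winner-take-all update: the learning rate decays
    # with time t, and a Gaussian weight down-weights far-away (outlying)
    # samples so they barely move the winning prototype
    d = np.linalg.norm(protos - x, axis=1)
    w = int(np.argmin(d))
    lr = lr0 / (1 + t)                           # annealing schedule (assumed)
    rho = np.exp(-d[w] ** 2 / (2 * sigma ** 2))  # robust weight (assumed)
    protos[w] += lr * rho * (x - protos[w])
    return w
```

With this weighting, a gross outlier wins some prototype but leaves it essentially unchanged, while nearby samples update it almost at the full annealed rate.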

9.
《Information Systems》2001,26(1):35-58
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size. CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction. Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes and the shrinking helps to dampen the effects of outliers. To handle large databases, CURE employs a combination of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters. Our experimental results confirm that the quality of clusters produced by CURE is much better than those found by existing algorithms. Furthermore, they demonstrate that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
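The representative-point shrinking at the heart of CURE is simple to sketch directly (alpha is the shrinking fraction; the sampling and partitioning machinery is omitted):

```python
import numpy as np

def shrink_representatives(reps, alpha=0.3):
    # CURE-style step: move each well-scattered representative point a
    # fraction alpha of the way toward the cluster centroid, which damps
    # the effect of outliers sitting among the representatives
    center = reps.mean(axis=0)
    return reps + alpha * (center - reps)
```

With alpha near 0 the representatives trace the full cluster geometry (non-spherical shapes); with alpha near 1 the cluster degenerates to its centroid, recovering centroid-based behavior.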

10.
Major problems exist in both crisp and fuzzy clustering algorithms. The fuzzy c-means type of algorithms use weights determined by a power m of inverse distances that remains fixed over all iterations and over all clusters, even though smaller clusters should have a larger m. Our method uses a different "distance" for each cluster that changes over the early iterations to fit the clusters. Comparisons show improved results. We also address other perplexing problems in clustering: (i) finding the optimal number K of clusters; (ii) assessing the validity of a given clustering; (iii) preventing the selection of seed vectors as initial prototypes from affecting the clustering; (iv) preventing the order of merging from affecting the clustering; and (v) permitting the clusters to form more natural shapes rather than forcing them into normed balls of the distance function. We employ a relatively large number K of uniformly randomly distributed seeds and then thin them to leave fewer uniformly distributed seeds. Next, the main loop iterates by assigning the feature vectors and computing new fuzzy prototypes. Our fuzzy merging then merges any clusters that are too close to each other. We use a modified Xie-Beni validity measure as the goodness-of-clustering measure for multiple values of K in a user-interaction approach where the user selects two parameters (for eliminating clusters and merging clusters after viewing the results thus far). The algorithm is compared with the fuzzy c-means on the iris data and on the Wisconsin breast cancer data.
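The paper's modification is not specified in the abstract; the standard Xie-Beni index it builds on (compactness over separation; smaller is better) can be sketched as:

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    # XB = fuzzy within-cluster scatter / (n * min squared center gap)
    # X: n x d data, U: n x k memberships, V: k x d cluster centers
    d2 = ((X[:, None] - V[None]) ** 2).sum(2)   # n x k squared distances
    compact = ((U ** m) * d2).sum()
    sep = min(((V[i] - V[j]) ** 2).sum()
              for i in range(len(V)) for j in range(len(V)) if i != j)
    return compact / (len(X) * sep)
```

Scanning this index over candidate values of K and picking the minimum is the usual way such validity measures drive the choice of the number of clusters.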

11.
12.
Extracting the different clusters of a given data set is an appealing topic in swarm intelligence applications. This paper introduces two data clustering approaches based on particle swarm optimization, namely single-swarm and multiple cooperative swarms clustering. A stability analysis is then introduced to determine the model order of the underlying data using multiple cooperative swarms clustering. The proposed approach is assessed on different data sets, and its performance is compared with that of k-means, k-harmonic means, fuzzy c-means and single-swarm clustering techniques. The obtained results indicate that the proposed approach outperforms the other clustering approaches in terms of different cluster validity measures.

13.
An Adaptive Fuzzy Clustering Model Based on the Intrinsic Relationships among Data
Tang Chenglong, Wang Shigang. Acta Automatica Sinica (《自动化学报》), 2010, 36(11):1544-1556
A new fuzzy clustering model, called adaptive fuzzy C-means (AFCM), is proposed as an extension of fuzzy C-means (FCM). Unlike most existing fuzzy clustering methods, AFCM takes into account the intrinsic relationships among all the data in a dataset by introducing an adaptiveness vector W and an adaptive exponent p into the model, where W is adapted during the iterations and p is a given parameter; together, W and p regulate the clustering process. AFCM outputs three groups of parameters: the fuzzy membership set U, the adaptiveness vector W, and the cluster prototype set V. Two groups of experiments validate AFCM. The first examines its clustering performance with FCM as the baseline; it shows that AFCM achieves better clustering quality and, with a suitably chosen adaptive exponent p, stays at the same level of time complexity as FCM. The second examines its outlier mining performance against the widely used density-based LOF method; it shows that AFCM has a large advantage in computational efficiency and that the outliers it finds are global, reflecting the relationship between each outlier and the whole dataset and thus carrying richer information. The paper points out that AFCM is particularly well suited to mining outliers in large or real-time datasets, to obtaining high-quality clustering results, and especially to applications that require clustering and outlier mining at the same time.
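For reference, the standard FCM membership update that AFCM extends can be sketched as follows (the adaptiveness vector W and exponent p are AFCM-specific and not reproduced here):

```python
import numpy as np

def fcm_memberships(X, V, m=2.0, eps=1e-12):
    # standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)),
    # computed here as normalized inverse-power distances
    d = np.linalg.norm(X[:, None] - V[None], axis=2) + eps
    inv = d ** (-2.0 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)
```

Each row of the result is a fuzzy membership vector over the cluster prototypes; alternating this update with prototype re-estimation is the FCM iteration the paper compares against.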

14.
This paper develops theory and algorithms concerning a new metric for clustering data. The metric minimizes the total volume of clusters, where the volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. This metric is scale-invariant, that is, the optimal clusters are invariant under an affine transformation of the data space. We introduce the concept of outliers in the new metric and show that the proposed method of treating outliers asymptotically recovers the data distribution when the data comes from a single multivariate Gaussian distribution. Two heuristic algorithms are presented that attempt to optimize the new metric. On a series of empirical studies with Gaussian distributed simulated data, we show that volume-based clustering outperforms well-known clustering methods such as k-means, Ward's method, SOM, and model-based clustering.

15.
The implementation of fuzzy clustering in the design process of vector quantizers faces three challenges. The first is the high computational cost. The second arises because a vector quantizer is required to assign each training sample to only one cluster; such an aggressive interpretation of fuzzy clustering results in a crisp partition of inferior quality. The third is the dependence on initialization. In this paper we develop a fuzzy clustering-based vector quantization algorithm that deals with these problems. The algorithm utilizes a specialized objective function that combines the c-means and the fuzzy c-means with a competitive agglomeration term. The joint effect is a learning process in which the number of codewords (i.e. cluster centers) affected by a specific training sample gradually shrinks, so the number of distance calculations, and thus the computational cost, also drops. In addition, the partition is smoothly transferred from fuzzy to crisp conditions, so there is no need for any aggressive interpretation of fuzzy clustering. The competitive agglomeration term separates large clusters from small and spurious ones. Contrary to the classical competitive agglomeration method, we do not discard the small clusters but instead migrate them close to large clusters, rendering them more competitive. The codeword migration process thus uses the net effect of competitive agglomeration and acts to further reduce the dependence on initialization, in order to obtain a better local minimum. The algorithm is applied to grayscale image compression.
The main simulation findings can be summarized as follows: (a) a comparison between the proposed method and related approaches shows its statistically significant superiority; (b) the algorithm is fast; (c) the algorithm is insensitive to its design parameters; and (d) the reconstructed images maintain high quality, quantified in terms of the distortion measure.

16.
In this paper the problem of automatically clustering a data set is posed as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. The proposed multiobjective clustering technique uses a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. A variable number of cluster centers is encoded in each string, so the number of clusters present in different strings varies over a range. Points are assigned to clusters based on the newly developed point-symmetry-based distance rather than the usual Euclidean distance. Two cluster validity indices, one based on the Euclidean distance (the XB-index) and a recently developed point-symmetry-distance-based index (the Sym-index), are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. The proposed clustering technique is thus able to detect both the proper number of clusters and the appropriate partitioning for data sets with either hyperspherical or point-symmetric clusters. A new semi-supervised method is also proposed for selecting a single solution from the final Pareto-optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown on seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, and by two single-objective genetic-algorithm-based automatic clustering techniques, VGAPS clustering and GCUK clustering.
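The point-symmetry-based distance can be sketched in its basic form (the Sym-index actually uses a k-nearest-neighbor refinement of the symmetry term, not shown here):

```python
import numpy as np

def point_symmetry_distance(x, c, X):
    # reflect x through the candidate center c; if the reflection lands
    # near an actual data point, x is "symmetric" about c and the distance
    # is small; scale by the Euclidean distance to c
    reflected = 2 * c - x
    d_sym = np.linalg.norm(X - reflected, axis=1).min()
    return d_sym * np.linalg.norm(x - c)
```

Unlike the Euclidean distance, this measure is small for any point whose mirror image about the center exists in the data, which is why it can capture ring- or symmetric-shaped clusters that are not hyperspherical.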

17.
Continually advancing technology has made it feasible to capture data online for onward transmission as a steady flow of newly generated data points, termed a data stream. The continuity and unboundedness of data streams make storing the data and scanning it multiple times impractical for the purpose of knowledge discovery. The need to learn structures from data in a streaming environment has made clustering a popular technique for knowledge discovery from data streams. The continuous nature of streaming data makes it infeasible to look up point membership among the clusters discovered so far, necessitating a synopsis structure to consolidate incoming data points. This synopsis is exploited to build a clustering scheme that meets subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. The algorithm deploys a fixed-granularity grid structure as the synopsis and performs clustering by coalescing dense regions of the grid. Speed-based pruning is applied to the synopsis prior to clustering to ensure the currency of the discovered clusters. Extensive experimentation demonstrates that the algorithm is robust, identifies succinct outliers on the fly, and adapts to changes in the data distribution. ExCC is further evaluated for performance and compared with other contemporary algorithms.
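A toy sketch of a fixed-granularity grid synopsis and dense-region detection (ExCC's speed-based pruning, region coalescing, and mixed-attribute handling are omitted):

```python
import numpy as np
from collections import Counter

def grid_synopsis(X, cell=1.0):
    # map each incoming point to its grid cell; the per-cell counts act
    # as the synopsis, so raw points need not be stored
    cells = [tuple(np.floor(x / cell).astype(int)) for x in X]
    return Counter(cells)

def dense_cells(counts, min_pts=2):
    # cells below the density threshold hold outliers/noise
    return {c for c, n in counts.items() if n >= min_pts}
```

Clustering then amounts to coalescing adjacent dense cells, and any point landing in a sparse cell can be flagged as an outlier on the fly.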

18.
This study proposes a hybrid robust approach for constructing Takagi–Sugeno–Kang (TSK) fuzzy models from data with outliers. The approach consists of a robust fuzzy C-regression model (RFCRM) clustering algorithm in the coarse-tuning phase and an annealing robust back-propagation (ARBP) learning algorithm in the fine-tuning phase. The RFCRM clustering algorithm modifies the fuzzy C-regression models (FCRM) clustering algorithm by incorporating a robust mechanism and by taking the input data distribution and a robust similarity measure into account. Owing to the robust mechanism and the consideration of the input data distribution, the fuzzy subspaces and the parameters of the functions in the consequent parts are identified simultaneously by the RFCRM clustering algorithm, and the obtained model is not significantly affected by outliers. Furthermore, the robust similarity measure is used in the clustering process to reduce redundant clusters. Consequently, the RFCRM clustering algorithm generates a better initialization for the TSK fuzzy model in the coarse-tuning phase. An ARBP algorithm is then employed to obtain a more precise model in the fine-tuning phase. Simulation results show that the proposed robust TSK fuzzy modeling approach is superior to existing approaches in both learning speed and approximation accuracy.

19.
This paper describes work that adapts group technology and integrates it with fuzzy c-means, genetic algorithms, and tabu search to realize a fuzzy c-means based hybrid evolutionary approach to the clustering of supply chains. The proposed hybrid approach is able to organise supply chain units, transportation modes and work orders into different unit-transportation-work order families. It determines the optimal clustering parameters, namely the number of clusters, c, and the weighting exponent, m, dynamically, eliminating the need to pre-define suitable values for these parameters. A new fuzzy c-means validity index that takes into account inter-cluster transportation and group efficiency is formulated. It is employed to determine the promise level, which estimates how good a set of clustering parameters is. The capability of the proposed hybrid approach is illustrated through three experiments and comparative studies. The results show that the proposed hybrid approach suggests suitable clustering parameters, and near-optimal supply chain clusters can be obtained readily.

20.
Traditional pattern recognition generally involves two tasks: unsupervised clustering and supervised classification. When class information is available, fusing the advantages of both clustering learning and classification learning into a single framework is an important problem worthy of study. To date, most algorithms treat clustering learning and classification learning in a sequential or two-step manner: first execute clustering learning to explore the structure in the data, then perform classification learning on top of the obtained structural information. However, such sequential algorithms cannot guarantee simultaneous optimality for both clustering and classification learning; the clustering step merely aids the subsequent classification step and does not benefit from it. To overcome this problem, a simultaneous learning framework for clustering and classification (SCC) is presented in this paper. SCC aims to achieve three goals: (1) acquiring robust classification and clustering simultaneously; (2) designing an effective and transparent classification mechanism; (3) revealing the underlying relationship between clusters and classes. To this end, using Bayesian theory and the cluster posterior probabilities of classes, we define a single objective function into which the clustering process is directly embedded. By optimizing this objective function, effective and robust clustering and classification results are achieved simultaneously. Experimental results on both synthetic and real-life datasets show that SCC achieves promising classification and clustering results at the same time.
