20 similar documents found
1.
Multi-objective design of hierarchical consensus functions for clustering ensembles via genetic programming (total citations: 1; self-citations: 0; citations by others: 1)
André L.V. Coelho, Everlândio Fernandes, Katti Faceli. Decision Support Systems, 2011, 51(4): 794-809
This paper investigates a genetic programming (GP) approach aimed at the multi-objective design of hierarchical consensus functions for clustering ensembles. By this means, data partitions obtained via different clustering techniques can be continuously refined (via selection and merging) by a population of fusion hierarchies having complementary validation indices as objective functions. To assess the potential of the novel framework in terms of efficiency and effectiveness, a series of systematic experiments, involving eleven variants of the proposed GP-based algorithm and a comparison with basic as well as advanced clustering methods (of which some are clustering ensembles and/or multi-objective in nature), has been conducted on a number of artificial, benchmark and bioinformatics datasets. Overall, the results corroborate the perspective that having fusion hierarchies operating on well-chosen subsets of data partitions is a fine strategy that may yield significant gains in terms of clustering robustness.
2.
On the cluster consensus of discrete-time multi-agent systems (total citations: 1; self-citations: 0; citations by others: 1)
Nowadays, multi-agent systems (MAS) are ubiquitous in the real world. Consensus is a fundamental natural phenomenon. Over the past decade, consensus of MAS has received increasing attention from various disciplines. This paper further investigates a novel kind of cluster consensus of MAS with several different subgroups. Based on Markov chains and nonnegative matrix analysis, two novel cluster consensus criteria are obtained for MAS with fixed and switching topologies, respectively. Furthermore, numerical simulations are given to validate the effectiveness of the proposed criteria. The proposed cluster consensus criteria have potential applications in real-world engineering systems.
3.
Prodip Hore, Dmitry B. Goldgof. Pattern Recognition, 2009, 42(5): 676-1901
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may be distributed at different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.
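The abstract does not spell out the two merging algorithms. As a rough illustration only, one simple way to combine partitions that are each summarised by cluster centres is to pool the centres from all subsets and run a weighted k-means over them, weighting each centre by the number of points it represents. The function name `merge_centroids`, the weighting scheme and the deterministic initialisation are assumptions for this sketch, not the authors' method.

```python
from typing import List, Sequence


def merge_centroids(
    centroids: Sequence[Sequence[float]],
    weights: Sequence[float],
    k: int,
    iters: int = 20,
) -> List[List[float]]:
    """Hypothetical helper: weighted k-means over pooled cluster centres.

    centroids: centres collected from the partitions of all data subsets.
    weights:   number of points each centre represents (assumed to be known).
    """
    # Deterministic initialisation with the first k pooled centres (an assumption).
    centers = [list(c) for c in centroids[:k]]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for c, w in zip(centroids, weights):
            # Assign each pooled centre to its nearest merged centre.
            j = min(range(k), key=lambda m: sum((a - b) ** 2 for a, b in zip(c, centers[m])))
            totals[j] += w
            for d, v in enumerate(c):
                sums[j][d] += w * v
        # Move each merged centre to the weighted mean of the centres assigned to it.
        for j in range(k):
            if totals[j] > 0:
                centers[j] = [s / totals[j] for s in sums[j]]
    return centers


if __name__ == "__main__":
    # Two subsets, each clustered into two groups of five points.
    pooled = [(0, 0), (10, 10), (1, 1), (11, 11)]
    print(merge_centroids(pooled, [5, 5, 5, 5], k=2))  # [[0.5, 0.5], [10.5, 10.5]]
```

Because only the centres and their counts are processed, the merge step never touches the raw data, which is the kind of saving the abstract attributes to centroid-based merging.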
4.
Clustering is one of the most important unsupervised learning problems; it consists of finding a common structure in a collection of unlabeled data. However, due to the ill-posed nature of the problem, different runs of the same clustering algorithm applied to the same data set usually produce different solutions. In this scenario, choosing a single solution is quite arbitrary. On the other hand, in many applications the problem of multiple solutions becomes intractable, hence it is often more desirable to provide a limited group of “good” clusterings rather than a single solution. In the present paper we propose least squares consensus clustering. This technique makes it possible to extract a small number of different clustering solutions from an initial (large) ensemble obtained by applying any clustering algorithm to a given data set. We also define a measure of quality and present a graphical visualization of each consensus clustering that makes the strength of the consensus immediately interpretable. We have carried out several numerical experiments on both synthetic and real data sets to illustrate the proposed methodology.
5.
Daniel Hernández-Lobato, Gonzalo Martínez-Muñoz. Pattern Recognition, 2011, 44(7): 1426-1434
In this paper we introduce a framework for making statistical inference on the asymptotic prediction of parallel classification ensembles. The analysis is valid under fairly general conditions: it only requires that the individual classifiers are generated in independent executions of some randomized learning algorithm, and that the final ensemble prediction is made via majority voting. Given an unlabeled test instance, the predictions of the classifiers in the ensemble are obtained sequentially. As the individual predictions become known, Bayes' theorem is used to update an estimate of the probability that the class predicted by the current ensemble coincides with the classification of the corresponding ensemble of infinite size. Using this estimate, the voting process can be halted when the confidence in the asymptotic prediction is sufficiently high. An empirical investigation on several benchmark classification problems shows that most of the test instances require querying only a small number of classifiers to converge to the infinite ensemble prediction with a high degree of confidence. For these instances, the difference between the generalization error of the finite ensemble and the infinite ensemble limit is very small, often negligible.
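The halting rule can be illustrated for the two-class case. This sketch assumes a uniform Beta(1, 1) prior on the fraction p of an infinite ensemble that votes for class 1. After t votes, t1 of them for class 1, the posterior is Beta(1 + t1, 1 + t - t1), and the probability that the infinite ensemble agrees with the current majority is P(p > 1/2) or P(p < 1/2). The prior, the binary restriction and the function names are assumptions for illustration; the paper's framework is more general.

```python
from math import comb
from typing import Iterable, Tuple


def beta_cdf_half(a: int, b: int) -> float:
    """P(p <= 1/2) for p ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a) at x = 1/2.
    """
    n = a + b - 1
    return sum(comb(n, j) for j in range(a, n + 1)) / 2 ** n


def vote_until_confident(votes: Iterable[int], confidence: float = 0.99) -> Tuple[int, int]:
    """Hypothetical helper: read 0/1 votes one at a time and stop once the
    current majority is likely (under the assumed uniform prior) to match
    the infinite-ensemble prediction. Returns (predicted_class, votes_used).
    """
    t = t1 = 0
    for v in votes:
        t += 1
        t1 += v
        p_le_half = beta_cdf_half(1 + t1, 1 + t - t1)
        if 2 * t1 > t and 1.0 - p_le_half >= confidence:
            return 1, t
        if 2 * t1 < t and p_le_half >= confidence:
            return 0, t
    # Classifiers exhausted: fall back to the plain majority of the votes seen
    # (ties go to class 0 here, an arbitrary choice).
    return (1 if 2 * t1 > t else 0), t


if __name__ == "__main__":
    # Unanimous votes: P(p > 1/2) = 1 - 0.5 ** (t + 1) first reaches 0.99 at t = 6.
    print(vote_until_confident([1] * 20))  # (1, 6)
```

With unanimous votes only six of twenty classifiers are queried, which mirrors the abstract's observation that most instances need only a few queries.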
6.
Cluster ensembles in collaborative filtering recommendation (total citations: 1; self-citations: 0; citations by others: 1)
Recommender systems have been developed to recommend items of information that are likely to be of interest to users and to filter out less favored items. Collaborative filtering is a widely used recommendation technique. It is based on the assumption that people who share the same preferences on some items tend to share the same preferences on other items. Clustering techniques are commonly used for collaborative filtering recommendation. While cluster ensembles have been shown to outperform many single clustering techniques in the literature, their performance for recommendation has not been fully examined. Thus, the aim of this paper is to assess the applicability of cluster ensembles to collaborative filtering recommendation. In particular, two well-known clustering techniques (self-organizing maps (SOM) and k-means) and three ensemble methods (the cluster-based similarity partitioning algorithm (CSPA), the hypergraph partitioning algorithm (HGPA), and majority voting) are used. The experimental results on the MovieLens dataset show that cluster ensembles can provide better recommendation performance than single clustering techniques in terms of recommendation accuracy and precision. In addition, there are no statistically significant differences either among the three SOM ensembles or among the three k-means ensembles. Either the SOM or the k-means ensembles could be considered in the future as the baseline collaborative filtering technique.
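The CSPA ensemble mentioned above starts from a pairwise similarity matrix whose entry (i, j) is the fraction of base clusterings that place objects i and j in the same cluster; that similarity graph is then partitioned (Strehl and Ghosh's original formulation uses the METIS graph partitioner). The sketch below computes only the similarity matrix from a list of label vectors, for example user clusters from several SOM or k-means runs. The function name is hypothetical and the graph-partitioning step is omitted.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hypothetical helper: fraction of base clusterings that co-cluster each pair.

    labelings: one label vector per base clustering, all over the same n objects.
    Label values do not need to be aligned across clusterings, because only
    "same cluster or not" is compared within each labeling.
    """
    n = len(labelings[0])
    r = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / r for c in row] for row in counts]


if __name__ == "__main__":
    # Two base clusterings of three users.
    print(co_association([[0, 0, 1], [0, 1, 1]]))
    # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```

Working with co-membership rather than raw labels is what lets such ensembles sidestep the label-correspondence problem between independent clustering runs.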
7.
We discuss approaches to incrementally constructing an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost's while using fewer classifiers.
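One of the incremental strategies described above, forward selection by accuracy with fusion by voting, can be sketched as follows: starting from an empty ensemble, repeatedly add the base classifier whose inclusion most improves majority-vote accuracy on a validation set, and stop when no candidate improves it. The sketch works on precomputed validation predictions. The function names, the tie-breaking rules and the strict-improvement stopping criterion are assumptions; the paper's other criteria (diversity, correlation, significance tests) are not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def vote(preds: Sequence[Sequence[int]], i: int) -> int:
    """Majority vote on instance i; ties go to the smallest label (an assumption)."""
    counts = Counter(p[i] for p in preds)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def vote_accuracy(preds: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    """Validation accuracy of the majority vote over the given prediction vectors."""
    return sum(vote(preds, i) == yi for i, yi in enumerate(y)) / len(y)


def forward_select(preds: Sequence[Sequence[int]], y: Sequence[int]) -> Tuple[List[int], float]:
    """Hypothetical greedy forward selection; preds[k] holds classifier k's
    predictions on the validation set. Returns (selected indices, accuracy)."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < len(preds):
        candidates = [k for k in range(len(preds)) if k not in selected]
        # Score each candidate addition; on equal accuracy prefer the lower index.
        scored = [(vote_accuracy([preds[j] for j in selected + [k]], y), -k) for k in candidates]
        acc, neg_k = max(scored)
        if acc <= best:  # stop when no candidate strictly improves accuracy
            break
        selected.append(-neg_k)
        best = acc
    return selected, best


if __name__ == "__main__":
    preds = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
    print(forward_select(preds, [1, 1, 0, 0]))  # ([0, 1], 1.0)
```

The search stops as soon as accuracy plateaus, which is one way an incremental method ends up with a small ensemble; a backward or floating search direction would explore differently.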
8.
Giuliano Galimberti. Computational Statistics & Data Analysis, 2007, 52(1): 520-536
There is an interest in the problem of identifying different partitions of a given set of units obtained according to different subsets of the observed variables (multiple cluster structures). A model-based procedure has been previously developed for detecting multiple cluster structures from independent subsets of variables. The method relies on model-based clustering methods and on a comparison among mixture models using the Bayesian Information Criterion. A generalization of this method which allows the use of any model-selection criterion is considered. A new approach combining the generalized model-based procedure with variable-clustering methods is proposed. The usefulness of the new method is shown using simulated and real examples. Monte Carlo methods are employed to evaluate the performance of various approaches. Data matrices with two cluster structures are analyzed taking into account the separation of clusters, the heterogeneity within clusters and the dependence of cluster structures.
9.
Christian Hennig. Computational Statistics & Data Analysis, 2007, 52(1): 258-271
Stability in cluster analysis is strongly dependent on the data set, especially on how well separated and how homogeneous the clusters are. In the same clustering, some clusters may be very stable and others may be extremely unstable. The Jaccard coefficient, a similarity measure between sets, is used as a cluster-wise measure of cluster stability, which is assessed by the bootstrap distribution of the Jaccard coefficient for every single cluster of a clustering compared to the most similar cluster in the bootstrapped data sets. This can be applied to very general cluster analysis methods. Some alternative resampling methods are investigated as well, namely subsetting, jittering the data points and replacing some data points by artificial noise points. The different methods are compared by means of a simulation study. A data example illustrates the use of the cluster-wise stability assessment to distinguish between meaningful stable and spurious clusters, but it is also shown that clusters are sometimes only stable because of the inflexibility of certain clustering methods.
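The cluster-wise stability statistic can be sketched as follows. For one bootstrap replicate, an original cluster is restricted to the objects that appear in the resample and compared, via the Jaccard coefficient |A ∩ B| / |A ∪ B|, with the most similar cluster found on that resample; averaging these maxima over many replicates gives the cluster's stability score. The function names are hypothetical, and drawing the resamples and re-running the clustering method are left to the caller.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (taken as 0 for two empty sets here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(cluster: Set[int], sample: Iterable[int], boot_clusters: Sequence[Set[int]]) -> float:
    """Max Jaccard between the original cluster, restricted to the objects drawn
    in the resample, and any cluster computed on that resample."""
    restricted = cluster & set(sample)  # duplicates from resampling collapse here
    return max((jaccard(restricted, c) for c in boot_clusters), default=0.0)


def clusterwise_stability(
    cluster: Set[int],
    replicates: Sequence[Tuple[Sequence[int], Sequence[Set[int]]]],
) -> float:
    """Hypothetical helper: mean best-match Jaccard over (sample, clusters) pairs."""
    scores: List[float] = [best_match(cluster, s, bc) for s, bc in replicates]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    reps = [
        ([0, 1, 1, 2, 5, 6], [{0, 1, 2}, {5, 6}]),   # cluster recovered exactly: 1.0
        ([0, 2, 3, 4, 5], [{0, 2}, {3, 4, 5}]),      # best match {0, 2}: 2/3
    ]
    print(clusterwise_stability({0, 1, 2, 3}, reps))  # about 0.833
```

A cluster that keeps reappearing across resamples scores close to 1, while a spurious one is repeatedly split or merged and scores low.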
10.
The combination of multiple clustering results (clustering ensemble) has emerged as an important procedure to improve the quality of clustering solutions. In this paper we propose a new cluster ensemble method based on kernel functions, which introduces a Partition Relevance Analysis step. The goal of this step is to analyze the set of partitions in the cluster ensemble and extract valuable information that can improve the quality of the combination process. In addition, we propose a new similarity measure between partitions and prove that it is a kernel function. A new consensus function is introduced that uses this similarity measure and is based on the idea of finding the median partition. For this consensus function, we prove some theoretical results that support the suitability of our methods. Finally, we conduct numerical experiments on several databases to show the behavior of our method, comparing it with simple clustering algorithms as well as with other cluster ensemble methods.
11.
A clustering ensemble combines, in a consensus function, the partitions generated by a set of independent base clusterers. This study investigates the use of both particle swarm clustering (PSC) and ensemble pruning (i.e., selective reduction of base partitions) with evolutionary techniques in the design of the consensus function. In the proposed ensemble, PSC plays two roles. First, it is used as a base clusterer. Second, it is employed in the consensus function, arguably the most challenging element of the ensemble. The proposed consensus function exploits a representation for the base partitions that makes cluster alignment unnecessary, allows for the combination of partitions with different numbers of clusters, and supports both disjoint and overlapping (fuzzy, probabilistic, and possibilistic) partitions. Results on both synthetic and real-world data sets show that the proposed ensemble can produce partitions that are statistically significantly better, in terms of the validity indices used, than the best base partition available in the ensemble. In general, a small number of selected base partitions (below 20% of the total) yields the best results. Moreover, the results produced by the proposed ensemble compare favorably to those of state-of-the-art clustering algorithms, especially swarm-based clustering ensemble algorithms.
12.
Richard J. Hathaway, Jacalyn M. Huband. Pattern Recognition, 2006, 39(7): 1315-1324
The problem of determining whether clusters are present in a data set (i.e., assessment of cluster tendency) is an important first step in cluster analysis. The visual assessment of cluster tendency (VAT) tool has been successful in determining potential cluster structure of various data sets, but it can be computationally expensive for large data sets. In this article, we present a new scalable, sample-based version of VAT, which is feasible for large data sets. We include analysis and numerical examples that demonstrate the new scalable VAT algorithm.
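For reference, the ordering step of the original (non-scaled) VAT algorithm works like Prim's minimum spanning tree algorithm: start from an object involved in the largest dissimilarity, then repeatedly append the unselected object closest to any already-selected object. Displaying the reordered dissimilarity matrix as a grey-scale image then shows clusters as dark diagonal blocks. The sketch covers only this reordering on a full dissimilarity matrix; the sampling scheme that makes the scalable version feasible for large data sets is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def vat_order(R: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: VAT ordering of a symmetric dissimilarity matrix R."""
    n = len(R)
    # Start from a row that contains the largest dissimilarity (first such row on ties).
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # dist[j]: smallest dissimilarity from j to any object already in the ordering.
    dist = {j: R[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=lambda j: (dist[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            dist[j] = min(dist[j], R[nxt][j])
    return order


def reorder(R: Sequence[Sequence[float]], order: Sequence[int]) -> List[List[float]]:
    """Permute rows and columns of R into the given order for display."""
    return [[R[i][j] for j in order] for i in order]


if __name__ == "__main__":
    xs = [0, 10, 1, 11]  # two obvious groups on a line
    R = [[abs(a - b) for b in xs] for a in xs]
    print(vat_order(R))  # [0, 2, 1, 3]: each group becomes contiguous
```

This ordering costs O(n²) time and memory on the full matrix, which is the expense the scalable, sample-based version is designed to avoid.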
13.
14.
Optimal resampling and classifier prototype selection in classifier ensembles using genetic algorithms (total citations: 2; self-citations: 0; citations by others: 2)
Ensembles of classifiers that are trained on different parts of the input space provide good results in general. AdaBoost, a popular boosting technique, is an iterative, gradient-based deterministic method used for this purpose in which an exponential loss function is minimized. Bagging is a random-search-based ensemble creation technique in which the training set of each classifier is selected at random. In this paper, a genetic-algorithm-based ensemble creation approach is proposed in which both resampled training sets and classifier prototypes evolve so as to maximize the combined accuracy. The objective-function-based random search procedure of the resulting system, guided by both ensemble accuracy and diversity, can be considered to share the basic properties of bagging and boosting. Experimental results show that the proposed approach achieves better combined accuracies than AdaBoost while using fewer classifiers.
15.
Cluster ensembles have become a general technique for combining multiple clustering partitions, and various cluster ensemble methods are used in real applications. Recently, Zhang et al. (2012) considered a generalized adjusted Rand index for cluster ensembles, using a consensus matrix to evaluate its values. However, Zhang's method cannot handle fuzzy partitions or fuzzy cluster ensembles. In this paper we propose evaluation measures for cluster ensembles based on a proposed fuzzy generalized Rand index. We first use a graph and relation matrices to convert a membership matrix into a sign relation matrix, and use the trace of the matrix product to calculate similarity measures. We then broaden the scope of the index to further scenarios, so that it can handle the following situations: (1) between a fuzzy cluster ensemble and a crisp partition, (2) between a fuzzy cluster ensemble and a cluster ensemble, (3) between a fuzzy cluster ensemble and a fuzzy partition, (4) between two fuzzy cluster ensembles, and (5) between two different object data sets with the same cardinality and the same partition method. Finally, numerical comparisons and experimental results demonstrate the key properties, rationality, and practicality of the proposed method.
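The trace-based computation mentioned above has a simple crisp counterpart. For hard partitions encoded as 0/1 co-membership matrices A and B (A_ij = 1 when objects i and j share a cluster), the number of pairs grouped together in both partitions is (tr(AB) - n)/2, and the classical Rand index follows from these pair counts. The sketch shows only this crisp baseline; it is not the fuzzy generalized Rand index proposed in the paper, and the function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def co_membership(labels: Sequence[int]) -> List[List[int]]:
    """0/1 matrix whose (i, j) entry is 1 when objects i and j share a cluster."""
    return [[int(a == b) for b in labels] for a in labels]


def trace_of_product(A: Sequence[Sequence[int]], B: Sequence[Sequence[int]]) -> int:
    """tr(AB) = sum over i, k of A[i][k] * B[k][i]."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))


def rand_index(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Hypothetical helper: crisp Rand index computed from co-membership matrices."""
    n = len(labels_a)
    A, B = co_membership(labels_a), co_membership(labels_b)
    pairs = comb(n, 2)
    # Subtract the n diagonal ones and halve, since each unordered pair appears twice.
    together_both = (trace_of_product(A, B) - n) // 2
    together_a = (sum(map(sum, A)) - n) // 2
    together_b = (sum(map(sum, B)) - n) // 2
    # Pairs split in both partitions, by inclusion-exclusion.
    apart_both = pairs - together_a - together_b + together_both
    return (together_both + apart_both) / pairs


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, relabeled)
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.333... (2 of 6 pairs agree)
```

Replacing the 0/1 matrices with relation matrices derived from fuzzy memberships is the step where a fuzzy generalization departs from this crisp version.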
16.
In this paper, we present two bounded-cost algorithms that solve multivalued consensus using binary consensus instances. Our first algorithm uses log₂ n binary consensus instances, where n is the number of processes, while the number of binary consensus instances used by our second algorithm is bounded in terms of the maximum length of the binary representation of all proposed values in the run. Both algorithms are significant improvements over the previous algorithm in [A. Mostefaoui, M. Raynal, F. Tronel, From binary consensus to multivalued consensus in asynchronous message-passing systems, Information Processing Letters 73 (5–6) (2000) 207–212], in which the number of binary consensus instances needed to solve one multivalued consensus is unbounded.
17.
This paper investigates cluster consensus control for generic linear multi-agent systems (MASs) under a directed interaction topology via a distributed feedback controller. The paper focuses in particular on the following problem, which is of both theoretical and practical interest but has not been considered in the existing literature: under what kind of interaction among the clusters can cluster consensus be achieved regardless of the magnitudes of the coupling strengths among the agents? A directed acyclic interaction topology among the clusters is proved to have this property. In contrast to the algebraic conditions provided in the existing literature, the conditions for guaranteeing cluster consensus in this paper are stated purely in terms of the graph topology and are thus very easy to verify.
18.
Many validity measures have been proposed for evaluating clustering results. Most of these popular measures do not work well for clusters with different densities and/or sizes; they tend to ignore clusters with low densities. In this paper, we propose a new validity measure that can deal with this situation. In addition, we propose a modified K-means algorithm that can assign more cluster centres to low-density areas of the data than the conventional K-means algorithm does. First, several artificial data sets are used to test the performance of the proposed measure. Then the proposed measure and the modified K-means algorithm are applied to reduce edge degradation in vector quantisation for image compression.
19.
An expert system model based on neural network ensembles (total citations: 9; self-citations: 3; citations by others: 9)
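An expert system model based on neural network ensembles is proposed, together with an algorithm for constructing the neural network ensemble. In this model, the neural network ensemble serves as an embedded module of the expert system and is used for knowledge acquisition, overcoming the knowledge-acquisition "bottleneck" of traditional expert systems. The model is applied to a library book-weeding system, and a preliminary prototype of a book-weeding expert system based on neural network ensembles has been built.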
20.
Kuo-Lung Wu, Miin-Shen Yang. Pattern Recognition, 2009, 42(11): 2541-74
Cluster validity indexes can be used to evaluate the fitness of data partitions produced by a clustering algorithm. Validity indexes are usually independent of clustering algorithms. However, their values may be heavily influenced by noise and outliers: noise and outliers may not affect the results of a clustering algorithm, but they may affect the values of validity indexes. In the literature, there is little discussion of the robustness of cluster validity indexes. In this paper, we analyze the robustness of a validity index using the ψ function of M-estimation and then propose several robust-type validity indexes. First, we discuss the validity measure on a single data point and focus on validity indexes that can be categorized as mean-type validity indexes. We then propose median-type validity indexes that are robust to noise and outliers. Comparative examples with numerical and real data sets show that the proposed median-type validity indexes work better than the mean-type validity indexes.
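The contrast between mean-type and median-type indexes can be illustrated with a deliberately simplified compactness measure: each point contributes its squared distance to its assigned cluster centre, and the index aggregates these per-point values with either the mean or the median. This is not one of the indexes proposed in the paper; the per-point measure and the function names are assumptions chosen only to show why a median aggregate is less sensitive to a single outlier.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def per_point_scores(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    centres: Sequence[Sequence[float]],
) -> List[float]:
    """Squared distance of each point to its assigned centre (illustrative measure)."""
    return [sum((p - c) ** 2 for p, c in zip(x, centres[k])) for x, k in zip(points, labels)]


def compactness(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    centres: Sequence[Sequence[float]],
    aggregate: Callable[[List[float]], float] = mean,
) -> float:
    """Hypothetical index: aggregate per-point scores with mean (default) or median."""
    return aggregate(per_point_scores(points, labels, centres))


if __name__ == "__main__":
    pts = [(0,), (1,), (2,), (10,), (11,), (12,), (50,)]  # (50,) is an outlier
    labs = [0, 0, 0, 1, 1, 1, 1]
    cents = [(1,), (11,)]
    print(compactness(pts, labs, cents))                    # about 217.9, driven by the outlier
    print(compactness(pts, labs, cents, aggregate=median))  # 1
```

Here the clustering itself is unaffected by the outlier, yet the mean-type value grows by two orders of magnitude while the median-type value does not move, which is the situation the abstract describes.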