Similar literature
 20 similar documents found (search time: 31 ms)
1.
Multiple clusterings are produced for various needs and reasons in both distributed and local environments. Combining multiple clusterings into a final clustering which has better overall quality has gained importance recently. It is also expected that the final clustering is novel, robust, and scalable. In order to solve this challenging problem we introduce a new graph-based method. Our method uses the evidence accumulated in the previously obtained clusterings, and produces a very good quality final clustering. The number of clusters in the final clustering is obtained automatically; this is another important advantage of our technique. Experimental test results on real and synthetically generated data sets demonstrate the effectiveness of our new method.

2.
This paper presents a fast simulated annealing framework for combining multiple clusterings based on agreement measures between partitions, which were originally used to evaluate clustering algorithms. Although we can follow a greedy strategy to optimize these measures as the objective functions of a clustering ensemble, it may suffer from local convergence and simultaneously incur too large a computational cost. To avoid local optima, we consider a simulated annealing optimization scheme that operates through single label changes. Moreover, for the measures between partitions based on the relationship (joined or separated) of pairs of objects, we can update them incrementally for each label change, which ensures that our optimization scheme is computationally feasible. The experimental evaluations demonstrate that the proposed framework can achieve promising results.
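The incremental update mentioned above can be illustrated with a minimal sketch. It assumes the Rand index as the pair-based agreement measure (the abstract does not name a specific one), and the function names are hypothetical. Changing one object's label only affects the pairs that contain that object, so the change in the agreement count can be computed in O(n) rather than O(n^2).

```python
from itertools import combinations


def agreeing_pairs(a, b):
    """Count object pairs on which partitions a and b agree
    (both join the pair or both separate it)."""
    return sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def rand_index(a, b):
    """Fraction of agreeing pairs (assumes at least two objects)."""
    n = len(a)
    return agreeing_pairs(a, b) / (n * (n - 1) / 2)


def delta_after_relabel(a, b, i, new_label):
    """Change in agreeing_pairs(a, b) if a[i] were set to new_label.
    Only the n - 1 pairs containing object i are inspected."""
    delta = 0
    for j in range(len(a)):
        if j == i:
            continue
        joined_in_b = b[i] == b[j]
        before = (a[i] == a[j]) == joined_in_b
        after = (new_label == a[j]) == joined_in_b
        delta += after - before
    return delta
```

A simulated annealing loop could call `delta_after_relabel` for each proposed single-label change and accept or reject the move from the resulting change in the objective, without recomputing the measure from scratch.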

3.
The graph determines the performance of graph-based semi-supervised classification. In this paper, we investigate how to construct a graph from multiple clusterings and propose a method called Semi-Supervised Classification using Multiple Clusterings (SSCMC for short). SSCMC first projects the original samples into different random subspaces and performs clustering on the projected samples. Then, for each clustering, it constructs a graph by setting an edge between two samples if these two samples are placed in the same cluster. Next, it combines these graphs into a composite graph and incorporates the resulting composite graph into a graph-based semi-supervised classifier based on local and global consistency. Our experimental results on two publicly available facial image data sets show that SSCMC not only achieves higher accuracy than other related methods, but is also robust to input parameters.
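The graph construction step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the random subspace projection and the clustering itself are omitted (label vectors are taken as given), the per-clustering graphs are combined by simple averaging, and the function names are hypothetical.

```python
def cluster_graph(labels):
    """Adjacency matrix with an edge (weight 1.0) between two distinct
    samples that share a cluster label."""
    n = len(labels)
    return [
        [1.0 if i != j and labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def composite_graph(clusterings):
    """Average the per-clustering graphs into one weighted graph.
    Averaging is an assumed combination rule."""
    graphs = [cluster_graph(labels) for labels in clusterings]
    n = len(clusterings[0])
    m = len(graphs)
    return [
        [sum(g[i][j] for g in graphs) / m for j in range(n)]
        for i in range(n)
    ]
```

With this rule, an edge weight is the fraction of clusterings that place the two samples together, so pairs that are co-clustered consistently receive the strongest edges.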

4.
Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: how to model the quality of clusterings and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on the Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method, MSC. An advantage of our method compared to the state-of-the-art multi-clustering methods is that it can provide users with a feature subspace to understand each clustering solution. Another advantage is that MSC does not need users to specify the number of clusters and the number of alternative clusterings, which is usually difficult for users without any guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.

5.
6.
Dang, Xuan Hong; Bailey, James. Machine Learning, 2015, 98(1-2): 7-30
Machine Learning - Clustering is often referred to as unsupervised learning which aims at uncovering hidden structures from data. Unfortunately, though widely being used as one of the principal...

7.
Clustering has a long and rich history in a variety of scientific fields. Finding natural groupings of a data set is a hard task as attested by hundreds of clustering algorithms in the literature. Each clustering technique makes some assumptions about the underlying data set. If the assumptions hold, good clusterings can be expected. It is hard, in some cases impossible, to satisfy all the assumptions. Therefore, it is beneficial to apply different clustering methods on the same data set, or the same method with varying input parameters or both. Then, the clusterings obtained can be combined into a final clustering having better overall quality. Combining multiple clusterings into a final clustering which has better overall quality has gained significant importance recently. Our contributions are a novel method for combining a collection of clusterings into a final clustering which is based on cliques, and a novel output-sensitive clique finding algorithm which works on large and dense graphs and produces output in a short amount of time. Extensive experimental studies on real and artificial data sets demonstrate the effectiveness of our contributions.

8.
Clustering analysis is important for exploring complex datasets. Alternative clustering analysis is an emerging subfield involving techniques for the generation of multiple different clusterings, allowing the data to be viewed from different perspectives. We present two new algorithms for alternative clustering generation. A distinctive feature of our algorithms is their principled formulation of an objective function, facilitating the discovery of a subspace satisfying natural quality and orthogonality criteria. The first algorithm is a regularization of the Principal Component Analysis method, whereas the second is a regularization of graph-based dimension reduction. In both cases, we demonstrate that a globally optimal subspace solution can be computed. Experimental evaluation shows our techniques are able to equal or outperform a range of existing methods.

9.
On constructing an optimal consensus clustering from multiple clusterings   Total citations: 1, self-citations: 0, citations by others: 1
Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k>1 partitions (clusters) of the same data set each containing the same number of sets and the goal is to align the sets in each partition to minimize a similarity measure. For k=2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k=3 even if each partition in each cluster contains no more than 2 elements and provide a -approximation algorithm for the problem for any k.

10.
Machine Learning - Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence...

11.
Image retrieval using multiple evidence ranking   Total citations: 5, self-citations: 0, citations by others: 5
The World Wide Web is the largest publicly available image repository and a natural source of attention. An immediate consequence is that searching for images on the Web has become a current and important task. To search for images of interest, the most direct approach is keyword-based searching. However, since images on the Web are poorly labeled, direct application of standard keyword-based image searching techniques frequently yields poor results. We propose a comprehensive solution to this problem. In our approach, multiple sources of evidence related to the images are considered. To allow combining these distinct sources of evidence, we introduce an image retrieval model based on Bayesian belief networks. To evaluate our approach, we perform experiments on a reference collection composed of 54000 Web images. Our results indicate that retrieval using the text passages surrounding an image is as effective as standard retrieval based on HTML tags. This is an interesting result because current image search engines on the Web usually do not take text passages into consideration. Most important, according to our results, the combination of information derived from text passages with information derived from HTML tags leads to improved retrieval, with relative gains in average precision figures of roughly 50 percent, when compared to the results obtained by the use of each source of evidence in isolation.

12.
On combining multiple clusterings: an overview and a new perspective   Total citations: 1, self-citations: 0, citations by others: 1
Many problems can be reduced to the problem of combining multiple clusterings. In this paper, we first summarize different application scenarios of combining multiple clusterings and provide a new perspective of viewing the problem as a categorical clustering problem. We then show the connections between various consensus and clustering criteria and discuss the complexity results of the problem. Finally we propose a new method to determine the final clustering. Experiments on kinship terms and clustering popular music from heterogeneous feature sets show the effectiveness of combining multiple clusterings.

13.
The improvement of many applications such as web search, latency reduction, and personalization/recommendation systems depends on surfing prediction. Predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper, we combine two classification techniques, namely, the Markov model and Support Vector Machines (SVM), to resolve prediction using Dempster’s rule. Such fusion overcomes the inability of the Markov model to predict unseen data as well as the problem of multiclassification in the case of SVM, especially when dealing with a large number of classes. We apply feature extraction to increase the power of discrimination of SVM. In addition, during prediction we employ domain knowledge to reduce the number of classifiers for the improvement of accuracy and the reduction of prediction time. We demonstrate the effectiveness of our hybrid approach by comparing our results with widely used techniques, namely, SVM, the Markov model, and association rule mining.

14.
Comparing subspace clusterings   Total citations: 5, self-citations: 0, citations by others: 5
We present the first framework for comparing subspace clusterings. We propose several distance measures for subspace clusterings, including generalizations of well-known distance measures for ordinary clusterings. We describe a set of important properties for any measure for comparing subspace clusterings and give a systematic comparison of our proposed measures in terms of these properties. We validate the usefulness of our subspace clustering distance measures by comparing clusterings produced by the algorithms FastDOC, HARP, PROCLUS, ORCLUS, and SSPC. We show that our distance measures can also be used to compare partial clusterings, overlapping clusterings, and patterns in binary data matrices.

15.
This paper proposes a novel face detection method using local gradient patterns (LGP), in which each bit of the LGP is assigned the value 1 if the neighboring gradient of a given pixel is greater than the average of the eight neighboring gradients, and 0 otherwise. The LGP representation is insensitive to global intensity variations, like other representations such as local binary patterns (LBP) and modified census transform (MCT), and also to local intensity variations along the edge components. We show that LGP has a higher discriminant power than LBP in both the difference between face histogram and non-face histogram and the detection error based on the face/face distance and face/non-face distance. We also greatly reduce the false positive detection error by accumulating evidence from multi-scale detection results with negligible extra computation time. In experiments using the MIT+CMU and FDDB databases, the proposed LGP-based face detection followed by the evidence accumulation method provides a face detection rate that is 5–27% better than those of existing methods, and greatly reduces the number of false positives.
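The bit assignment rule described above can be sketched for a single pixel. The abstract does not define the gradient or the bit ordering, so this sketch assumes the gradient is the absolute intensity difference between a neighbor and the center pixel, and that bits are ordered clockwise from the top-left neighbor; both choices and the function name are illustrative assumptions.

```python
# Neighbor offsets (row, col), clockwise from the top-left corner.
# This ordering is an assumption; the abstract does not specify one.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lgp_code(image, r, c):
    """LGP code of the interior pixel at (r, c) of a 2-D list of intensities."""
    center = image[r][c]
    # Assumed gradient: absolute difference between neighbor and center.
    gradients = [abs(image[r + dr][c + dc] - center) for dr, dc in NEIGHBOURS]
    mean_gradient = sum(gradients) / 8
    # Bit k is 1 when the k-th gradient exceeds the average gradient.
    bits = [1 if g > mean_gradient else 0 for g in gradients]
    return sum(bit << k for k, bit in enumerate(bits))
```

Because the code depends only on intensity differences, adding a constant to every pixel leaves it unchanged, which matches the insensitivity to global intensity variations claimed in the abstract.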

16.
Combining multiple knowledge bases   Total citations: 2, self-citations: 0, citations by others: 2
Combining knowledge present in multiple knowledge base systems into a single knowledge base is discussed. A knowledge based system can be considered an extension of a deductive database in that it permits function symbols as part of the theory. Alternative knowledge bases that deal with the same subject matter are considered. The authors define the concept of combining knowledge present in a set of knowledge bases and present algorithms to maximally combine them so that the combination is consistent with respect to the integrity constraints associated with the knowledge bases. For this, the authors define the concept of maximality and prove that the algorithms presented combine the knowledge bases to generate a maximal theory. The authors also discuss the relationships between combining multiple knowledge bases and the view update problem.

17.
Machine Learning - We deploy a recently proposed framework for mining subjectively interesting patterns from data to the problem of alternative clustering, where patterns are sets of clusters...

18.
Dempster’s combination rule can only be applied to independent bodies of evidence. One occurrence of dependence between two bodies of evidence is when they result from a common source. This paper proposes an improved method for combining dependent bodies of evidence which takes the significance of the common information sources into consideration. The method is based on the significance weighting operation and the “decombination” operation. A numerical example is illustrated to show the use and effectiveness of the proposed method.
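For context, the standard Dempster combination rule that this paper starts from can be sketched as below. This is the classical rule for independent bodies of evidence, not the improved method for dependent evidence that the abstract proposes. Mass functions are represented here as dictionaries from `frozenset` focal elements to masses, which is an illustrative choice.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.
    m1 and m2 map frozenset focal elements to masses that sum to 1."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                # Mass assigned to the empty set measures the conflict.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}
```

For example, combining masses {A: 0.6, {A, B}: 0.4} with {B: 0.5, {A, B}: 0.5} yields a conflict of 0.3, and the remaining products are divided by 0.7.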

19.
Information Fusion, 2009, 10(2): 124-136
Video target tracking is the process of estimating the current state, and predicting the future state, of a target from a sequence of video sensor measurements. Multitarget video tracking is complicated by the fact that targets can occlude one another, affecting video feature measurements in a highly non-linear and difficult to model fashion. In this paper, we apply a multisensory fusion approach to the problem of multitarget video tracking with occlusion. The approach is based on a data-driven method (CFA) for selecting the features and fusion operations that improve a performance criterion. Each sensory cue is treated as a scoring system. Scoring behavior is characterized by a rank–score function. A diversity measure, based on the variation in rank–score functions, is used to dynamically select the scoring systems and fusion operations that produce the best tracking performance. The relationship between the diversity measure and the tracking accuracy of two fusion operations, a linear score combination and an average rank combination, is evaluated on a set of 12 video sequences. These results demonstrate that using the rank–score characteristic as a diversity measure is an effective method to dynamically select scoring systems and fusion operations that improve the performance of multitarget video tracking with occlusions.
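The two fusion operations named above can be sketched for a handful of candidates scored by several systems. The sketch assumes equal weights in the linear score combination, scores already normalized to a common range, and ties in rank broken by index order; none of these details is given in the abstract, and the function names are hypothetical.

```python
def ranks(scores):
    """Rank candidates so that rank 1 is the highest score.
    Ties are broken by index order (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def score_combination(systems):
    """Equal-weight linear combination of per-system scores."""
    n = len(systems[0])
    return [sum(s[i] for s in systems) / len(systems) for i in range(n)]


def rank_combination(systems):
    """Average of per-system ranks; lower is better."""
    all_ranks = [ranks(s) for s in systems]
    n = len(systems[0])
    return [sum(r[i] for r in all_ranks) / len(systems) for i in range(n)]
```

Score combination picks the candidate with the highest fused score, while rank combination picks the one with the lowest fused rank; the two can disagree when the systems' score distributions differ, which is where a diversity measure becomes useful for choosing between them.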

20.
Finding centroid clusterings with entropy-based criteria   Total citations: 1, self-citations: 3, citations by others: 1
We investigate the following problem: Given a set of candidate clusterings for a common set of objects, find a centroid clustering that is most compatible with the input set. First, we propose a series of entropy-based distance functions for comparing various clusterings. Such functions enable us to directly select the local centroid from the candidate set. Second, we present two combining methods for the global centroid. The selected/combined centroid clustering is likely to be a good choice, i.e., top or middle ranked in terms of closeness to the true clustering. Finally, we evaluate their effectiveness on both artificial and real data sets.
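Selecting a local centroid with an entropy-based distance can be sketched as follows. The abstract does not list its distance functions, so this sketch assumes the variation of information, VI(A, B) = 2H(A, B) - H(A) - H(B), as one well-known entropy-based distance; the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def variation_of_information(a, b):
    """VI(A, B) = 2 H(A, B) - H(A) - H(B); zero when the two
    partitions are identical up to relabeling."""
    joint = entropy(list(zip(a, b)))
    return 2 * joint - entropy(a) - entropy(b)


def local_centroid(candidates):
    """Candidate clustering with the smallest total distance
    to all candidates in the set."""
    return min(
        candidates,
        key=lambda c: sum(variation_of_information(c, o) for o in candidates),
    )
```

The local centroid is restricted to the candidate set, which keeps the search linear in the number of candidates per evaluation; a global centroid would instead have to be constructed by combining candidates.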
