Similar articles
 20 similar articles found
1.
Semi-supervised graph clustering: a kernel approach   Total citations: 6, self-citations: 0, citations by others: 6
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
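The reduction described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: constraint pairs add a reward `w` (must-link) or a penalty `w` (cannot-link) to a similarity matrix, a diagonal shift turns the result into a valid (positive semidefinite) kernel, and unit-weight kernel k-means then clusters in feature space. The paper uses the weighted variant of kernel k-means; the function names and the single constant `w` are assumptions.

```python
import numpy as np

def constrained_kernel(S, must_link, cannot_link, w=1.0):
    """Add constraint rewards/penalties to a similarity matrix S, then shift
    the diagonal so that the result is positive semidefinite."""
    K = np.array(S, dtype=float)
    for i, j in must_link:      # reward pairs that should share a cluster
        K[i, j] += w
        K[j, i] += w
    for i, j in cannot_link:    # penalise pairs that should be separated
        K[i, j] -= w
        K[j, i] -= w
    lam_min = np.linalg.eigvalsh(K).min()
    if lam_min < 0:             # K + sigma*I is PSD once sigma >= -lam_min
        K += -lam_min * np.eye(len(K))
    return K

def kernel_kmeans(K, labels, n_iter=50):
    """Unit-weight kernel k-means, started from an initial labelling."""
    labels = np.asarray(labels).copy()
    n, k = len(K), labels.max() + 1
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            # ||phi(x_i) - m_c||^2 written purely in terms of kernel entries
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Starting from a poor labelling on a block-structured kernel, `kernel_kmeans(np.kron(np.eye(2), np.ones((3, 3))), [0, 0, 1, 1, 1, 0])` recovers the two blocks `[0, 0, 0, 1, 1, 1]`. Because every distance is computed from kernel entries only, the same loop works whether `S` comes from vectors or from a graph's adjacency matrix.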

2.
3.
Efficient join-index-based spatial-join processing: a clustering approach   Total citations: 2, self-citations: 0, citations by others: 2
A join-index is a data structure used for processing join queries in databases. Join-indices use precomputation techniques to speed up online query processing and are useful for data sets which are updated infrequently. The I/O cost of join computation using a join-index with limited buffer space depends primarily on the page-access sequence used to fetch the pages of the base relations. Given a join-index, we introduce a suite of methods based on clustering to compute the joins. We derive upper bounds on the length of the page-access sequences. Experimental results with Sequoia 2000 data sets show that the clustering method outperforms existing methods based on sorting and online-clustering heuristics.
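Why the page-access sequence drives the I/O cost can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's clustering method: it assumes the join-index is a list of (R-page, S-page) entries and the buffer uses LRU replacement, and it simply counts disk reads for a given ordering of the entries.

```python
from collections import OrderedDict

def count_page_fetches(access_sequence, buffer_size):
    """Count disk reads for a page-access sequence under an LRU buffer."""
    buffer = OrderedDict()
    fetches = 0
    for page in access_sequence:
        if page in buffer:
            buffer.move_to_end(page)        # hit: mark as most recently used
        else:
            fetches += 1                    # miss: read the page from disk
            buffer[page] = True
            if len(buffer) > buffer_size:
                buffer.popitem(last=False)  # evict the least recently used page
    return fetches

def sequence_from_join_index(pairs):
    """Turn ordered (R-page, S-page) join-index entries into page requests."""
    seq = []
    for r_page, s_page in pairs:
        seq.extend([("R", r_page), ("S", s_page)])
    return seq
```

With a two-page buffer, the interleaved order `[(1, 1), (2, 2), (1, 3), (2, 4)]` costs 8 reads, while grouping the entries that share an R-page, `[(1, 1), (1, 3), (2, 2), (2, 4)]`, costs 6. Ordering the work so that related pages are fetched together is the intuition behind clustering-based sequences.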

4.
Software and Systems Modeling - In recent years, there has been a growing interest in the use of reference conceptual models to capture information about complex and sensitive business domains...

5.
Evolutionary algorithms (EAs) are often well-suited for optimization problems involving several, often conflicting objectives. Since 1985, various evolutionary approaches to multiobjective optimization have been developed that are capable of searching for multiple solutions concurrently in a single run. However, the few comparative studies of different methods presented up to now remain mostly qualitative and are often restricted to a few approaches. In this paper, four multiobjective EAs are compared quantitatively, using an extended 0/1 knapsack problem as the basis. Furthermore, we introduce a new evolutionary approach to multicriteria optimization, the strength Pareto EA (SPEA), that combines several features of previous multiobjective EAs in a unique manner. It is characterized by (a) storing nondominated solutions externally in a second, continuously updated population, (b) evaluating an individual's fitness depending on the number of external nondominated points that dominate it, (c) preserving population diversity using the Pareto dominance relationship, and (d) incorporating a clustering procedure in order to reduce the nondominated set without destroying its characteristics. The proof-of-principle results obtained on two artificial problems as well as a larger problem, the synthesis of a digital hardware-software multiprocessor system, suggest that SPEA can be very effective in sampling along the entire Pareto-optimal front and distributing the generated solutions over the tradeoff surface. Moreover, SPEA clearly outperforms the other four multiobjective EAs on the 0/1 knapsack problem.
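Feature (b) can be sketched as follows, assuming maximisation (as in the 0/1 knapsack problem) and the usual SPEA conventions: each external nondominated point receives a strength equal to the number of population members it covers (weakly dominates) divided by N + 1, and each population member's fitness is 1 plus the summed strengths of the external points that cover it, with lower fitness being better. This is an illustrative sketch; the clustering-based reduction of the external set (feature d) is omitted.

```python
def covers(a, b):
    """True if objective vector a weakly dominates b (maximisation)."""
    return all(x >= y for x, y in zip(a, b))

def spea_fitness(population, external):
    """Return (strengths of external points, fitness of population members);
    lower fitness is better."""
    n = len(population)
    strengths = [sum(covers(e, p) for p in population) / (n + 1)
                 for e in external]
    fitness = [1.0 + sum(s for e, s in zip(external, strengths) if covers(e, p))
               for p in population]
    return strengths, fitness
```

For a population `[(1, 1), (2, 0), (0, 2)]` and external set `[(3, 1), (1, 3)]`, both external points get strength 0.5 and the population fitness values are `[2.0, 1.5, 1.5]`: the member covered by both external points is penalised most, which pushes selection toward less crowded regions of the front.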

6.
This paper discusses new approaches to unsupervised fuzzy classification of multidimensional data. In the developed clustering models, patterns are considered to belong to some but not necessarily all clusters. Accordingly, such algorithms are called ‘semi-fuzzy’ or ‘soft’ clustering techniques. Several models to achieve this goal are investigated and corresponding implementation algorithms are developed. Experimental results are reported.

7.
8.
In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights into the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace estimated by means of a principal component analysis (PCA). The proposed algorithm, Predictive Subspace Clustering (PSC), partitions the data into clusters while simultaneously estimating cluster-wise PCA parameters. The algorithm minimises an objective function that depends upon a new measure of influence for PCA models. A penalised version of the algorithm is also described for carrying out simultaneous subspace clustering and variable selection. The convergence of PSC is discussed in detail, and extensive simulation results and comparisons to competing methods are presented. The comparative performance of PSC has been assessed on six real gene expression data sets, for which PSC often provides state-of-the-art results.
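The idea of fitting one PCA model per cluster can be illustrated with a k-subspaces style loop. This is a hedged stand-in, not PSC itself: it reassigns each point to the cluster whose PCA subspace reconstructs it best (squared residual), whereas PSC uses its own predictive influence measure and an optional variable-selection penalty. The function names and the subspace dimension `d` are assumptions.

```python
import numpy as np

def fit_pca(X, d):
    """Return (mean, basis) of a d-dimensional PCA model for the rows of X."""
    mu = X.mean(axis=0)
    # rows of Vt are principal directions, sorted by singular value
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d]

def residual(X, mu, V):
    """Squared distance from each row of X to the affine subspace mu + span(V)."""
    Z = X - mu
    return ((Z - Z @ V.T @ V) ** 2).sum(axis=1)

def k_subspaces(X, labels, d=1, n_iter=20):
    """Alternate between fitting cluster-wise PCA and reassigning points."""
    labels = np.asarray(labels).copy()
    k = labels.max() + 1
    for _ in range(n_iter):
        models = [fit_pca(X[labels == c], d) for c in range(k)]
        errs = np.column_stack([residual(X, mu, V) for mu, V in models])
        new = errs.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        if np.bincount(new, minlength=k).min() <= d:
            break  # a cluster would be too small to fit a d-dim PCA model
        labels = new
    return labels
```

For points on the line y = 0 and on the line x = 10, one wrongly labelled point (3, 0) is moved back to the y = 0 cluster after a single reassignment, because its residual there is zero.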

9.
Inferring dependencies from relations: a conceptual clustering approach   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we consider two related types of data dependencies that can hold in a relation: conjunctive implication rules between attribute-value pairs, and functional dependencies. We present a conceptual clustering approach that can be used, with some small modifications, for inferring a cover for both types of dependencies. The approach consists of two steps. First, a particular clustered representation of the relation, called a concept (or Galois) lattice, is built. Then, a cover is extracted from the lattice built in the first step. Our main emphasis is on the second step. We study the computational complexity of the proposed approach and present an experimental comparison with other methods that confirms its validity. The results of the experiments show that our algorithm for extracting implication rules from concept lattices clearly outperforms an earlier algorithm, and suggest that the overall lattice-based approach to inferring functional dependencies from relations can be seen as an alternative to traditional methods.
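To make the two dependency types concrete, the snippet below checks whether a given functional dependency or conjunctive implication rule holds in a small relation. It is a naive, hypothetical verification helper for illustration only; the paper's contribution, extracting a cover of all such dependencies from the concept lattice, is not reproduced.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in `rows`,
    a list of dicts mapping attribute names to values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # the same lhs value must always map to the same rhs value
        if seen.setdefault(key, val) != val:
            return False
    return True

def implication_holds(rows, premise, conclusion):
    """Conjunctive implication rule: every row matching all (attribute, value)
    pairs in `premise` also matches all pairs in `conclusion`."""
    return all(all(r[a] == v for a, v in conclusion)
               for r in rows if all(r[a] == v for a, v in premise))
```

A functional dependency constrains every value of its left-hand side at once, while an implication rule speaks about one particular combination of attribute values; this is why the two need slightly different extraction procedures from the same lattice.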

10.
In this research, we address the query clustering problem, which involves determining globally optimal execution strategies for a set of queries. The need to process a set of queries together often arises in deductive database systems, scientific database systems, large bibliographic retrieval systems and several other database applications. We address the optimization problem from the perspective of overlaps in data requirements, and model the batched operations using a set-partitioning approach. In this model, we first consider the case of m queries, each involving a two-way join operation. We develop a recursive methodology to determine all the processing strategies in this case. Next, we establish certain dominance properties among the strategies, and develop exact as well as heuristic algorithms for selecting an appropriate strategy. We extend this analysis to a clustering approach, and outline a framework for optimizing multiway joins. The results show that the proposed approach is viable and efficient, and can easily be incorporated into the query processing component of most database systems.
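A toy version of the set-partitioning view can be written by brute force. The cost model is an assumption made for illustration, not the paper's: each batch reads the union of its queries' data blocks once, and a batch is feasible only if that union fits in a hypothetical buffer `capacity`. Exhaustive enumeration grows with the Bell numbers, which is why dominance properties and heuristics matter for realistic m.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):          # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or into a block of its own

def best_batching(requirements, capacity):
    """requirements: dict query -> set of data blocks it reads.
    Cost of a batch = number of distinct blocks it reads."""
    best, best_cost = None, float("inf")
    for part in set_partitions(sorted(requirements)):
        unions = [set().union(*(requirements[q] for q in block)) for block in part]
        if any(len(u) > capacity for u in unions):
            continue                        # batch does not fit in the buffer
        cost = sum(len(u) for u in unions)
        if cost < best_cost:
            best, best_cost = part, cost
    return best, best_cost
```

With `{"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"x", "y"}}` and capacity 3, the cheapest feasible plan batches `q1` with `q2` (they share block `b`) and runs `q3` alone, for a cost of 5 block reads instead of 6.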

11.
We address the issue of clustering examples by integrating multiple data sources, particularly numerical vectors and nodes in a network. We propose a new, efficient spectral approach, which integrates the two costs for clustering numerical vectors and clustering nodes in a network into a matrix trace, reducing the issue to a trace optimization problem which can be solved by an eigenvalue decomposition. We empirically demonstrate the performance of the proposed approach through a variety of experiments, including both synthetic and real biological datasets.
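The reduction to an eigenvalue problem can be sketched under assumptions not stated in the abstract: the vector cost is taken to be the Laplacian of a Gaussian affinity on the features, the network cost is the Laplacian of the adjacency matrix `A`, and the two are combined with a hypothetical weight `lam`. Minimising tr(H^T M H) subject to H^T H = I is solved by the eigenvectors of the k smallest eigenvalues of M; the rows of H can then be clustered, for example with k-means.

```python
import numpy as np

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian similarity between the rows of X, zero on the diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W

def integrated_embedding(X, A, k, lam=1.0, sigma=1.0):
    """Minimise tr(H^T (L_X + lam * L_A) H) subject to H^T H = I."""
    M = laplacian(gaussian_affinity(X, sigma)) + lam * laplacian(A)
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]
```

When the combined graph splits into two components, the two smallest eigenvalues are zero and every row of `H` is constant within a component, so points from the same component map to the same embedding.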

12.
To obtain an accurate clustering result that matches the user's intent in practical applications, one option is to use additional pairwise constraints that indicate the relationship between two samples, that is, whether or not they belong to the same cluster. In this paper, we put forward a discriminative learning approach that incorporates pairwise constraints into the recently proposed two-class maximum margin clustering framework. In particular, we propose a set of pairwise loss functions that robustly detect and penalize violations of the pairwise constraints. Consequently, the proposed method can directly find a partitioning hyperplane that separates the data into two groups while satisfying the given pairwise constraints as far as possible. In this way, it makes fewer assumptions than popular existing constrained clustering algorithms about the distance metric or similarity matrix of the data, which may be complicated in practice. Finally, an iterative updating algorithm is proposed for the resulting optimization problem. Experiments on a number of real-world data sets demonstrate that the proposed pairwise constrained two-class clustering algorithm outperforms several representative pairwise constrained clustering counterparts in the literature.
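One way to picture penalizing constraint violations for a two-class hyperplane is a hinge-style loss on the product of decision values. This is a hypothetical illustration, not the paper's loss functions: a must-link pair is penalised unless both points fall on the same side of f(x) = w^T x + b with some margin, and a cannot-link pair is penalised unless they fall on opposite sides.

```python
def pairwise_constraint_loss(w, b, X, must_link, cannot_link, margin=1.0):
    """Hinge-style penalty on the decision values f(x) = w^T x + b."""
    f = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
    loss = 0.0
    for i, j in must_link:      # want f_i and f_j to share a sign
        loss += max(0.0, margin - f[i] * f[j])
    for i, j in cannot_link:    # want f_i and f_j to have opposite signs
        loss += max(0.0, margin + f[i] * f[j])
    return loss
```

For `X = [[2, 0], [3, 0], [-2, 0]]` and the hyperplane `w = [1, 0], b = 0`, the constraints must-link (0, 1) and cannot-link (0, 2) are satisfied with margin, so the loss is 0; swapping them to must-link (0, 2) and cannot-link (0, 1) gives a loss of 12. A loss of this shape depends only on the hyperplane, not on a distance metric between samples.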

13.
The aggregation of objectives is one of the simplest and most widely used approaches in multiple criteria programming. However, it is well known that this technique sometimes fails, in several respects, to determine the Pareto frontier. This paper proposes a new approach for multicriteria optimization, which aggregates the objective functions and uses a line search method to locate an approximate efficient point. Once the first Pareto solution is obtained, a simplified version of the same procedure is used in the context of Pareto dominance to obtain a set of efficient points, which ensures a thorough distribution of solutions on the Pareto frontier. In its current form, the proposed technique is well suited to problems with multiple objectives (it is not limited to bi-objective problems) and requires the functions to be twice continuously differentiable. To assess the effectiveness of this approach, experiments were performed and compared with two recent, well-known population-based metaheuristics, ParEGO and NSGA II. Compared with ParEGO and NSGA II, the proposed approach not only ensures better convergence to the Pareto frontier but also yields a good distribution of solutions. From a computational point of view, both stages of the line search converge quickly (on average about 150 ms for the first stage and about 20 ms for the second stage). In addition, the proposed technique is very simple, easy to implement and easy to use for solving multiobjective problems.
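The first stage (aggregate the objectives, then minimise with a line search) and a Pareto-dominance filter can be sketched generically. The choices here are assumptions for illustration: a weighted sum as the aggregation, gradient descent with a backtracking (Armijo) line search as the minimiser, and two quadratic objectives as a toy problem. The paper's specific aggregation and second-stage procedure are not reproduced.

```python
def armijo_descent(f, grad, x, max_iter=500, beta=0.5, c=1e-4, tol=1e-12):
    """Gradient descent with a backtracking (Armijo) line search."""
    for _ in range(max_iter):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg < tol:
            break
        fx, alpha = f(x), 1.0
        # shrink the step until it yields sufficient decrease
        while f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - c * alpha * gg:
            alpha *= beta
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

def dominates(a, b):
    """a Pareto-dominates b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_filter(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy bi-objective problem: f1 = |x|^2, f2 = |x - (2, 0)|^2
f1 = lambda x: x[0] ** 2 + x[1] ** 2
f2 = lambda x: (x[0] - 2) ** 2 + x[1] ** 2

def solve_weighted(w, x0=(3.0, 1.0)):
    """Minimise the weighted sum w*f1 + (1 - w)*f2 from the start point x0."""
    f = lambda x: w * f1(x) + (1 - w) * f2(x)
    grad = lambda x: [2 * w * x[0] + 2 * (1 - w) * (x[0] - 2), 2 * x[1]]
    return armijo_descent(f, grad, list(x0))
```

For this toy problem the weighted-sum minimiser is x = (2(1 - w), 0), so `solve_weighted(0.5)` returns approximately `[1.0, 0.0]`. Sweeping `w` traces the front only because it is convex here; on non-convex fronts the weighted sum misses points, which is one of the failures the abstract alludes to.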

14.
Machine Learning - State-of-the-art clustering algorithms provide little insight into the rationale for cluster membership, limiting their interpretability. In complex real-world applications, the...

15.
Spectral clustering: A semi-supervised approach   Total citations: 2, self-citations: 0, citations by others: 2
Graph-based spectral clustering algorithms have developed rapidly in recent years. They formulate clustering as a discrete combinatorial optimization problem and solve it approximately by relaxing it into a tractable eigenvalue decomposition problem. In this paper, we first review existing spectral clustering algorithms within a unified framework and give a straightforward explanation of spectral clustering. We also present a novel model that generalizes unsupervised spectral clustering to semi-supervised spectral clustering. Under this model, prior information given by instance-level constraints can be generalized to space-level constraints. We find that the (undirected) graph built on the enlarged prior information is more meaningful, so the cluster boundaries are more accurate. Experimental results on toy data, real-world data and image segmentation demonstrate the advantages of the proposed model.
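A simple way to inject instance-level constraints into spectral clustering is to edit the affinity matrix before the eigendecomposition. The sketch below does this in the style of spectral learning (must-link affinities set to 1, cannot-link affinities set to 0, assuming affinities lie in [0, 1]) and bipartitions the data with the second eigenvector of the symmetric normalised Laplacian. It is an illustration under those assumptions and does not implement this paper's generalisation from instance-level to space-level constraints.

```python
import numpy as np

def constrained_affinity(W, must_link, cannot_link):
    """Overwrite affinities of constrained pairs (affinities assumed in [0, 1])."""
    W = np.array(W, dtype=float)
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    return W

def spectral_bipartition(W):
    """Split nodes in two using the second eigenvector of the symmetric
    normalised Laplacian I - D^{-1/2} W D^{-1/2} (every degree must be > 0)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    # rescaling by D^{-1/2} would not change the signs, so threshold directly
    return (vecs[:, 1] > 0).astype(int)
```

On a structureless affinity (every off-diagonal entry 0.5), the constraints must-link (0, 1), (2, 3) and cannot-link (0, 2), (1, 3) alone determine the split `{0, 1} | {2, 3}`; the cluster ids themselves may come out as 0/1 or 1/0 because eigenvector signs are arbitrary.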

16.
A gravity-based clustering approach   Total citations: 8, self-citations: 1, citations by others: 8
Shengyi Jiang, Qinghua Li. Journal of Computer Applications (计算机应用), 2005, 25(2): 286-288, 300
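This paper introduces the idea of universal gravitation into cluster analysis and proposes a gravity-based clustering method, GCA (Gravity-based Clustering Approach), together with a simple and effective method for computing the clustering threshold. GCA has approximately linear time complexity with respect to the size of the database and the number of attributes, which gives it good scalability. Experimental results show that GCA can produce high-quality clustering results.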

17.
Neural Computing and Applications - This paper focuses on using feature salience to evaluate the quality of a partition when dealing with hard clustering. It is based on the hypothesis that a good...

18.

With recent advances in Internet-based computing models, the use of cloud-based applications to facilitate daily activities is increasing significantly and is expected to grow further. Because the workloads that users submit to cloud-based applications differ in their quality of service (QoS) metrics, providing an efficient resource provisioning solution, one of the challenging issues to be addressed, requires analyzing and identifying these heterogeneous cloud workloads. In this study, we present an efficient resource provisioning solution that uses a metaheuristic-based clustering mechanism to analyze cloud workloads. The proposed workload clustering approach uses a combination of the genetic algorithm and the fuzzy C-means technique to find similar clusters according to users' QoS requirements. Then, we use a gray wolf optimizer technique to make appropriate scaling decisions when provisioning cloud resources to serve cloud workloads. In addition, we design an extended framework to show the interaction between users, cloud providers, and the resource provisioning broker in the workload clustering process. Simulation results obtained under real workloads indicate that the proposed approach is efficient in terms of CPU utilization, elasticity, and response time compared with the other approaches.
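Of the three components named above, the fuzzy C-means step is the easiest to show in isolation. The sketch below is the textbook FCM iteration with fuzzifier `m` and random initial memberships; the genetic-algorithm search and the gray wolf optimizer used for scaling decisions are not included, and the parameter defaults are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, memberships U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)       # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)         # avoid division by zero
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        inv = dist ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Unlike hard k-means, each workload keeps a graded membership in every cluster, which is what lets a downstream provisioning step weigh workloads that sit between two QoS profiles.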


19.
Semi-supervised fuzzy clustering: A kernel-based approach   Total citations: 1, self-citations: 0, citations by others: 1
Huaxiang Zhang, Jing Lu. Knowledge, 2009, 22(6): 477-481
Semi-supervised clustering algorithms aim to improve clustering accuracy under the supervision of a limited amount of labeled data. Since kernel-based approaches, such as the kernel-based fuzzy c-means algorithm (KFCM), have been used successfully in classification and clustering problems, in this paper we propose a novel semi-supervised clustering approach built on KFCM, which we call the semi-supervised kernel fuzzy c-means algorithm (SSKFCM). The objective function of SSKFCM is defined by adding the classification errors of both the labeled and the unlabeled data, and its global optimum is obtained by repeatedly updating the fuzzy memberships and the optimized kernel parameter. Because the objective function may have more than one local optimum, we employ a function transformation technique to reformulate the objective function after a local minimum has been obtained, and select the best optimum as the solution. Experimental results on both artificial and several real data sets show that SSKFCM performs better than its conventional counterparts and achieves the most accurate clustering results when the parameter is optimized.
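The KFCM core that SSKFCM builds on can be sketched with a Gaussian kernel K(x, v) = exp(-|x - v|^2 / sigma^2), for which the kernel-induced distance is proportional to 1 - K(x, v). Semi-supervision is added here in a deliberately simple, hypothetical way: prototypes start at the means of the labeled points, and the memberships of labeled points are clamped to one-hot vectors. SSKFCM's actual objective, which adds classification errors and optimizes the kernel parameter, is not reproduced.

```python
import numpy as np

def ss_kernel_fcm(X, c, labeled, m=2.0, sigma=2.0, n_iter=100):
    """Gaussian-kernel fuzzy c-means with clamped labelled memberships.
    labeled: dict mapping point index -> cluster id (each cluster needs >= 1)."""
    X = np.asarray(X, dtype=float)
    # start each prototype at the mean of its labelled points
    V = np.array([X[[i for i, k in labeled.items() if k == cl]].mean(axis=0)
                  for cl in range(c)])
    for _ in range(n_iter):
        sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-sq / sigma**2)
        dist = np.fmax(1.0 - K, 1e-12)     # kernel-induced distance, up to a factor of 2
        inv = dist ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
        for i, cl in labeled.items():      # supervision: fix labelled memberships
            U[i] = 0.0
            U[i, cl] = 1.0
        W = (U ** m) * K                   # KFCM prototype weights u^m * K(x, v)
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return V, U
```

The prototype update weights each point by u^m K(x, v), so points far from a prototype contribute almost nothing to it; this is one reason kernel variants are less sensitive to outliers than plain fuzzy c-means.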

20.
In this paper, we investigate the use of a self-adaptive Pareto evolutionary multi-objective optimization (EMO) approach for evolving the controllers of virtual embodied organisms. The objective of this paper is to demonstrate the trade-off between solution quality and computational cost. We show empirically that evolving controllers with the proposed algorithm incurs significantly less computational cost than a self-adaptive weighted-sum EMO algorithm, a self-adaptive single-objective evolutionary algorithm (EA) and a hand-tuned Pareto EMO algorithm. The main contribution of the self-adaptive Pareto EMO approach is its ability to produce sufficiently good controllers with different locomotion capabilities in a single run, thereby reducing the evolutionary computational cost and allowing the designer to explore the space of good solutions simultaneously. Our results also show that self-adaptation is highly beneficial in reducing redundancy compared with the other algorithms. Moreover, genetic diversity was maintained naturally by virtue of the system's inherent multi-objectivity.
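Self-adaptation usually means that each individual carries its own mutation step size, which is itself mutated and inherited along with the genes. A minimal sketch of the common log-normal scheme follows; it is a generic evolution-strategy operator assumed for illustration, not necessarily the operator used in the paper, and the learning rate tau = 1/sqrt(n) is a conventional default.

```python
import math
import random

def self_adaptive_mutate(genes, sigma, rng, tau=None):
    """Log-normal self-adaptation: mutate the step size first, then use the
    new step size to perturb every gene. Returns (child_genes, child_sigma)."""
    if tau is None:
        tau = 1.0 / math.sqrt(len(genes))
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))  # always positive
    child = [g + new_sigma * rng.gauss(0.0, 1.0) for g in genes]
    return child, new_sigma
```

Because the step size travels with the individual, selection indirectly favours step sizes that produce good offspring, which removes the need to hand-tune the mutation rate (the baseline the abstract calls hand-tuned).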
