Similar literature
 20 similar articles found (search time: 390 ms)
1.
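Based on the cluster assumption, a new graph-based semi-supervised learning algorithm, called density-sensitive semi-supervised clustering, is proposed. The algorithm introduces a density-sensitive distance measure that reflects the cluster assumption well and fully exploits the complex intrinsic structure of the data set. Combined with graph-based semi-supervised learning, this measure significantly improves clustering performance. Simulation experiments further show that the algorithm is superior in specific image applications.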

2.
Blog clustering is an important approach for online public opinion analysis. Traditional clustering methods usually group blogs by keywords, stories, and timeline, and therefore ignore the opinions and emotions expressed in the blog articles. In this paper, an integrated graph-based model for clustering Chinese blogs by embedded sentiments is proposed. A novel graph-based representation and the corresponding clustering algorithm are applied to Chinese blog search results. The proposed model, SoB-graph, considers not only sentiment words but also structural information in blogs. Experimental results show that, compared with the traditional graph-based document representation model and the vector space document representation model, the proposed SoB-graph model achieves better performance in clustering sentiments in Chinese blog documents.

3.
There is no doubt that clustering is one of the most studied data mining tasks. Nevertheless, it remains a challenging problem despite the many proposed clustering approaches. Graph-based approaches solve the clustering task as a global optimization problem, while many other works are based on local methods. In this paper, we propose a novel graph-based algorithm, "GBR", that relaxes a well-defined method, improving its accuracy while keeping it simple. The primary motivation for relaxing the objective is to allow the reformulated objective to find well-distributed cluster indicators for complicated data instances. This relaxation yields an analytical solution that avoids the approximate iterative methods adopted in many other graph-based approaches. Experiments on synthetic and real data sets show that our relaxation achieves excellent clustering results. Our key contributions are: (1) an analytical solution to the global clustering task, as opposed to approximate iterative approaches; (2) a very simple implementation using existing optimization packages; (3) an algorithm whose computation time, relative to the number of data instances to cluster, is lower than that of other well-defined methods in the literature.

4.
The graph determines the performance of graph-based semi-supervised classification. In this paper, we investigate how to construct a graph from multiple clusterings and propose a method called Semi-Supervised Classification using Multiple Clusterings (SSCMC for short). SSCMC first projects the original samples into different random subspaces and performs clustering on the projected samples. Then, for each clustering, it constructs a graph by setting an edge between two samples if they are assigned to the same cluster. Next, it combines these graphs into a composite graph and incorporates the composite graph into a graph-based semi-supervised classifier based on local and global consistency. Our experimental results on two publicly available facial image data sets show that SSCMC not only achieves higher accuracy than other related methods but is also robust to input parameters.
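
The graph-construction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the clusterings are assumed to be given already as lists of cluster labels (the random-subspace projection and the clustering step are omitted), and averaging the per-clustering graphs is an assumed combination rule, since the abstract does not say how the graphs are combined. The function names are hypothetical.

```python
from itertools import combinations


def co_cluster_graph(labels):
    """Adjacency matrix with an edge (1) between samples that share a cluster label."""
    n = len(labels)
    graph = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:
            graph[i][j] = graph[j][i] = 1
    return graph


def composite_graph(clusterings):
    """Combine the per-clustering graphs by averaging (an assumed combination rule)."""
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        graph = co_cluster_graph(labels)
        for i in range(n):
            for j in range(n):
                counts[i][j] += graph[i][j]
    m = len(clusterings)
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


# Three hypothetical clusterings of four samples.
clusterings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
W = composite_graph(clusterings)
```

Each entry of `W` is the fraction of clusterings in which the two samples fall into the same cluster, so pairs that are grouped together consistently receive stronger edges.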

5.
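Multi-view clustering improves clustering performance by exploiting the complementary and consistent information across multiple views, and it has received increasing attention in recent years. To keep abreast of the current state of research and the latest techniques in graph-based multi-view clustering algorithms, a large body of recent work on multi-view graph clustering is surveyed, organized, classified, and summarized. According to the algorithmic mechanisms and mathematical principles involved in multi-view clustering, the methods are further divided into graph-based, network-based, and spectral-based clustering methods. Not only is each category described in detail...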

6.
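With the rapid growth of available large-scale protein-protein interaction data, it has become possible to understand the basic components and structure of cellular mechanisms at the system level. The greatest challenge today is how to analyze such complex interaction data to reveal the principles of cellular organization, processes, and functions. Graph-theoretic clustering methods are an effective means of analyzing protein interaction data. This paper describes recent progress in the clustering analysis of protein-protein interaction networks (PPI networks) in terms of graph models of PPI networks, clustering algorithms, evaluation methods, and applications. Finally, the challenges facing this line of research and directions for further work are discussed.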

7.
The self-organizing map (SOM) is an artificial neural network tool that is trained using unsupervised learning to produce a low-dimensional representation of the input space, called a map. This map is generally the object of a clustering analysis step that aims to partition the referent vectors (map neurons) into compact and well-separated groups. In this paper, we consider the problem of clustering the SOM using different techniques: partitioning, hierarchical, and graph-coloring-based. Unlike traditional SOM clustering techniques, which use k-means or hierarchical clustering, the graph-based approaches have the advantage of partitioning the SOM by simultaneously using the dissimilarities and the neighborhood relations provided by the map. We present the experimental results of several comparisons between these different ways of clustering.

8.
In recent years there has been growing interest in clustering methods stemming from the spectral decomposition of the data affinity matrix, which have been shown to give good results in a wide variety of situations. However, a complete theoretical understanding of these methods in terms of data distributions is still lacking. In this paper, we propose a spectral-clustering-based mode merging method for mean shift as a theoretically well-founded approach that enables a probabilistic interpretation of affinity-based clustering through kernel density estimation. This connection also allows principled kernel optimization and enables the use of anisotropic variable-size kernels to match local data structures. We demonstrate the proposed algorithm's performance on image segmentation applications and compare its clustering results with those of the well-known Mean Shift and Normalized Cut algorithms.

9.
Although graph-based relaxed clustering (GRC) is a straightforward and self-adaptive spectral clustering algorithm, it is sensitive to the parameters of the adopted similarity measure and has a high time complexity of O(N^3), which severely limits its usefulness for large data sets. To overcome these shortcomings, certain constraints are introduced into GRC, giving an enhanced version, constrained GRC (CGRC), that is more robust to the parameters of the adopted similarity measure. Based on CGRC, a novel algorithm called fast GRC (FGRC) is then developed in this paper by using the core-set-based minimal enclosing ball approximation. A distinctive advantage of FGRC is that its asymptotic time complexity is linear in the data set size N. At the same time, FGRC inherits the straightforwardness and self-adaptability of GRC, making it a fast and effective clustering algorithm for large data sets. The advantages of FGRC are validated on various benchmark and real data sets.
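
To give a feel for the minimal enclosing ball (MEB) approximation mentioned above, here is a minimal sketch of the simple iterative scheme due to Bădoiu and Clarkson. It is a generic MEB approximation, not the FGRC algorithm itself, and how FGRC maps clustering onto an MEB problem is not described in the abstract. The function name `approx_meb` and the parameter `eps` are chosen here for illustration.

```python
import math


def approx_meb(points, eps=0.1):
    """Return (center, radius) of a ball enclosing all points.

    Starting from an arbitrary point, the center is repeatedly pulled toward
    the current farthest point with a shrinking step 1 / (t + 1). After about
    1 / eps**2 steps the center lies within eps * R of the optimal center,
    where R is the optimal radius.
    """
    center = list(points[0])
    iterations = math.ceil(1.0 / eps ** 2)
    for t in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, farthest)]
    # Use the true farthest distance so the returned ball encloses every point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Four points whose exact minimal enclosing ball has center (1, 0) and radius 1.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
center, radius = approx_meb(points, eps=0.05)
```

The number of iterations depends only on `eps`, not on the number of points, and each iteration is a single pass over the data. This is the kind of property that makes MEB approximations attractive for large data sets.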

10.
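Existing graph-based manifold ranking algorithms for salient object detection rely on an overly idealized background prior, which leads to poor results on images with complex backgrounds. To address this problem, an improved algorithm based on affinity propagation clustering and manifold ranking is proposed. First, the background is extracted according to the color contrast of the set of superpixels located on the image boundary. Then, in the saliency computation for background estimation and foreground estimation, the affinity propagation algorithm adaptively clusters the extracted background by color, and saliency is computed for each cluster with the classical manifold ranking algorithm. Finally, the ranking results are merged and multi-scale saliency values are fused to obtain the final saliency map. On the commonly used public ASD, ECSSD, DUTOMRON, and SED2 data sets, the algorithm is compared with nine popular algorithms in terms of precision, recall, F-measure, PR curve, and AUC, as well as by visual inspection of the detection results, demonstrating the effectiveness of the proposed algorithm.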
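
As background for the manifold ranking step, the following is a minimal sketch of the classical iteration f ← αSf + (1 − α)y on a symmetrically normalized affinity matrix S. It is a generic illustration on a toy graph, not the saliency pipeline of the paper: the superpixel graph, the affinity propagation step, and the multi-scale fusion are all omitted, and the function name and the parameter values are assumptions.

```python
import math


def manifold_ranking(W, y, alpha=0.99, iterations=200):
    """Rank graph nodes by relevance to the query nodes marked in y.

    W is a symmetric nonnegative affinity matrix (list of lists) and y is the
    indicator vector of query nodes. Returns the ranking scores f.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; zero entries stay zero.
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iterations):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# A chain graph 0 - 1 - 2 - 3 with node 0 as the only query.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
scores = manifold_ranking(W, [1.0, 0.0, 0.0, 0.0], alpha=0.5)
```

On this chain the scores decrease with graph distance from the query node, which is the behavior a background or foreground query relies on when ranking the remaining superpixels.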

11.
Clustering analysis is important for exploring complex data sets. Alternative clustering analysis is an emerging subfield involving techniques for generating multiple different clusterings, allowing the data to be viewed from different perspectives. We present two new algorithms for alternative clustering generation. A distinctive feature of our algorithms is their principled formulation of an objective function, which facilitates the discovery of a subspace satisfying natural quality and orthogonality criteria. The first algorithm is a regularization of principal component analysis, whereas the second is a regularization of graph-based dimension reduction. In both cases, we demonstrate that a globally optimal subspace solution can be computed. Experimental evaluation shows that our techniques equal or outperform a range of existing methods.

12.
Pattern Analysis and Applications - A similarity graph represents the local characteristics of a data set, and it is used as input to various clustering methods including spectral, graph-based, and...

13.
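High-dimensional data are ubiquitous in the real world, but they often contain a large amount of redundant and noisy information, which prevents many traditional clustering algorithms from performing well on them. In practice, the cluster structure of high-dimensional data is often found to be embedded in a lower-dimensional subspace, so dimensionality reduction has become a key technique for uncovering that structure. Among the many dimensionality reduction methods, graph-based methods are a research hotspot. However, most graph-based dimensionality reduction algorithms have two problems: (1) they need to compute or learn an adjacency graph, which has high computational complexity; (2) the reduction process does not take into account how the reduced data will be used. To address these two problems, a fast unsupervised dimensionality reduction algorithm based on maximum entropy, MEDR, is proposed. MEDR combines linear projection with a maximum-entropy clustering model and uses an effective iterative optimization algorithm to find the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR does not require an adjacency graph as input, and its time complexity is linear in the number of samples. Experimental results on real data sets show that, compared with traditional dimensionality reduction methods, MEDR finds a better projection matrix for mapping high-dimensional data into a low-dimensional subspace, so that the projected data are better suited to clustering.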

14.
In this paper, we address the problem of comparing and classifying protein surfaces with graph-based methods. Comparison relies on matching surface graphs, extracted from the surfaces by considering concave and convex patches, through a kernelized version of the Softassign graph-matching algorithm. On the other hand, classification is performed by clustering the surface graphs with an EM-like algorithm, also relying on kernelized Softassign, and then calculating the distance of an input surface graph to the closest prototype. We present experiments showing the suitability of kernelized Softassign for both comparing and classifying surface graphs.

15.
We propose an algorithm that provides an abstract representation of any polygonal object O in terms of spheres. The result is a graph-based skeleton capturing the general shape of O and its inner structure (the respective positions of convex parts and their thickness). We define a first-order logic language that expresses the required notions (distance, size, and angle) in a qualitative way. Finally, we propose methods for comparing shapes using this graph-based skeleton of objects.

16.
The results of traditional clustering methods are usually unreliable because there is no guidance from data labels, whereas class labels can be predicted more reliably by semisupervised learning if the labels of part of the data are given. In this paper, we propose an active self-training clustering method, in which samples are actively selected as a training set to minimize an estimated Bayes error, and semisupervised learning is then used to perform clustering. Traditional graph-based semisupervised learning methods are not convenient for estimating the Bayes error, so we develop a specific regularization framework on graphs for semisupervised learning in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method in both the unsupervised and the semisupervised settings. It is worth noting that the proposed clustering method is free of initialization, while traditional clustering methods usually depend on initialization.

17.
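Symmetric nonnegative matrix factorization (SNMF), a graph-based clustering algorithm, captures the cluster structure embedded in a graph representation more naturally and obtains better clustering results on both linear and nonlinear manifolds, but it is sensitive to the initialization of its variables. In addition, the standard SNMF algorithm measures factorization quality by the sum of squared errors, which is sensitive to noise and outliers. To address these problems, a robust self-adaptive symmetric nonnegative matrix factorization clustering algorithm, RS3NMF (robust self-adaptived symmetric nonnegative matrix factorization), is proposed from an ensemble learning perspective. The RS3NMF model, based on the L2,1 norm, mitigates the influence of noise and outliers, preserves feature rotation invariance, and improves robustness. Meanwhile, without relying on any additional information, it exploits the sensitivity of SNMF to initialization characteristics to progressively enhance clustering performance. The model is optimized by alternating iterations, and convergence of the objective function value is guaranteed. Extensive experimental results show that the proposed RS3NMF algorithm outperforms other state-of-the-art algorithms and is highly robust.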
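
For orientation, the standard SNMF objective and the L2,1 norm referred to above can be written as follows. The symbols are notation assumed here: A is the n × n similarity matrix, H is the nonnegative n × k cluster indicator matrix, and x_i is the i-th row of a matrix X. The abstract does not give the exact RS3NMF objective, so only the standard definitions are shown.

```latex
\min_{H \ge 0} \; \lVert A - H H^{\top} \rVert_F^{2},
\qquad
\lVert X \rVert_{2,1} = \sum_{i=1}^{n} \lVert x_i \rVert_2
                      = \sum_{i=1}^{n} \sqrt{\sum_{j} X_{ij}^{2}}
```

Because the L2,1 norm sums unsquared row norms, a single large residual row contributes linearly rather than quadratically, which is the usual reason it is less sensitive to outliers than the squared Frobenius norm.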

18.
Semi-supervised graph clustering: a kernel approach   Total citations: 6, self-citations: 0, citations by others: 6
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
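
The core computation in weighted kernel k-means is the squared feature-space distance from a point to a weighted cluster mean, which can be evaluated from the kernel matrix alone. The sketch below shows only that computation. The constraint penalties and the construction of the kernel from pairwise constraints are not covered, and the function name and argument layout are assumptions made for illustration.

```python
def kernel_distance_sq(K, w, i, members):
    """Squared distance in feature space from point i to a weighted cluster mean.

    K is the kernel (Gram) matrix, w the point weights, and members the indices
    of the points in the cluster. With mean m = sum_j w_j phi(a_j) / sum_j w_j:
        ||phi(a_i) - m||^2 = K_ii - 2 * sum_j w_j K_ij / s
                             + sum_{j,l} w_j w_l K_jl / s^2,   s = sum_j w_j.
    """
    total_w = sum(w[j] for j in members)
    cross = sum(w[j] * K[i][j] for j in members) / total_w
    within = sum(w[j] * w[l] * K[j][l]
                 for j in members for l in members) / total_w ** 2
    return K[i][i] - 2.0 * cross + within


# Linear kernel on the 1-D points 0, 2, 10, so distances can be checked by hand.
x = [0.0, 2.0, 10.0]
K = [[a * b for b in x] for a in x]
d_far = kernel_distance_sq(K, [1.0, 1.0, 1.0], 2, [0, 1])   # mean of {0, 2} is 1
d_near = kernel_distance_sq(K, [1.0, 1.0, 1.0], 0, [0, 1])
```

In an assignment step, each point would be moved to the cluster that minimizes this distance. Because only entries of `K` are used, the same code applies whether `K` comes from vector data or from a graph.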

19.
Clustering ensembles: models of consensus and weak partitions   Total citations: 4, self-citations: 0, citations by others: 4
Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial, or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum-likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intraclass variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world data sets.

20.
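Based on an analysis of the problems in existing clustering algorithms for mixed-attribute data, the graph-based relaxed clustering algorithm is chosen as the "cornerstone" of the solution. A Gaussian kernel parameter computation step based on the "Local Scale" idea is introduced to make the graph-based relaxed clustering algorithm self-adaptive, and its pairwise distance computation is extended with a measure for mixed attributes. Building on these two improvements and combining them with clustering ensemble techniques, a new clustering algorithm for mixed-attribute data is proposed and validated on examples. The results show that the proposed algorithm has strong robustness to parameters and high clustering accuracy.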
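
The "Local Scale" idea for setting Gaussian kernel parameters can be sketched as follows, assuming it refers to the common local scaling scheme in which each point i gets its own scale σ_i (the distance to its k-th nearest neighbor) and the affinity is exp(−d_ij² / (σ_i σ_j)). The example uses plain Euclidean distance on numeric points. The mixed-attribute distance of the paper is not specified in the abstract and is not reproduced here, and the function name and default `k` are assumptions.

```python
import math


def local_scale_affinity(points, k=7):
    """Gaussian affinity matrix with a per-point scale instead of one global sigma.

    sigma_i is the distance from point i to its k-th nearest neighbor, so dense
    and sparse regions each get an appropriate kernel width. Requires k < n and
    no more than k - 1 duplicates of any point (otherwise sigma_i would be 0).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the zero distance from the point to itself.
    sigma = [sorted(row)[k] for row in dist]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return affinity


# Two well-separated groups of three points each.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
A = local_scale_affinity(pts, k=2)
```

Because the scales are derived from the data, no global kernel width has to be tuned by hand, which is consistent with the parameter robustness reported above.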


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号