首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 203 毫秒
1.
基于成对约束的判别型半监督聚类分析   总被引:10,自引:1,他引:9  
尹学松  胡恩良  陈松灿 《软件学报》2008,19(11):2791-2802
现有一些典型的半监督聚类方法一方面难以有效地解决成对约束的违反问题,另一方面未能同时处理高维数据.通过提出一种基于成对约束的判别型半监督聚类分析方法来同时解决上述问题.该方法有效地利用了监督信息集成数据降维和聚类,即在投影空间中使用基于成对约束的K均值算法对数据聚类,再利用聚类结果选择投影空间.同时,该算法降低了基于约束的半监督聚类算法的计算复杂度,并解决了聚类过程中成对约束的违反问题.在一组真实数据集上的实验结果表明,与现有相关半监督聚类算法相比,新方法不仅能够处理高维数据,还有效地提高了聚类性能.  相似文献   

2.
The problem of clustering with side information has received much recent attention and metric learning has been considered as a powerful approach to this problem. Until now, various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints and also the intrinsic topological structure of data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered as an efficient kernel learning method that yields an explicit non-linear transformation and thus shows out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks.  相似文献   

3.
Composite kernels for semi-supervised clustering   总被引:3,自引:2,他引:1  
A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.  相似文献   

4.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs nonlinear transformation globally but linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Through some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting the image retrieval performance.  相似文献   

5.
王亮  王士同 《计算机工程》2012,38(1):148-150
针对样本间的不均衡性,提出一种基于成对约束的动态加权半监督模糊核聚类算法。在传统模糊聚类算法中加入半监督学习机制,通过Mercer核将原数据空间映射到特征空间,为特征空间中的每个向量分配一个动态权值,由此得到新的目标函数,并结合一种简单的核参数选择方法实现数据分类。理论分析和实验结果表明,与模糊核聚类算法及成对约束的竞争群算法相比,该算法具有更好的聚类效果。  相似文献   

6.
最大间隔聚类是近来聚类分析的一个研究热点,为进一步提高其聚类准确性,提出一种基于成对约束的半监督最大间隔聚类算法.该算法在最大间隔聚类的目标函数中添加针对成对约束的损失项,从而对违反给定约束条件的分界面进行惩罚.对所得到的非凸优化问题,本文提出一种基于约束凹凸过程的迭代算法来进行高效求解.实验表明,本文提出的算法能极大地提高最大间隔聚类的准确性,其聚类性能也明显优于其他两种半监督聚类算法.  相似文献   

7.
Recently, integrating new knowledge sources such as pairwise constraints into various classification tasks with insufficient training data has been actively studied in machine learning. In this paper, we propose a novel semi-supervised classification approach, called semi-supervised classification with enhanced spectral kernel, which can simultaneously handle both sparse labeled data and additional pairwise constraints together with unlabeled data. Specifically, we first design a non-parameter spectral kernel learning model based on the squared loss function. Then we develop an efficient semi-supervised classification algorithm which takes advantage of Laplacian spectral regularization: semi-supervised classification with enhanced spectral kernel under the squared loss (ESKS). Finally, we conduct many experiments on a variety of synthetic and real-world data sets to demonstrate the effectiveness of the proposed ESKS algorithm.  相似文献   

8.
双层随机游走半监督聚类   总被引:3,自引:0,他引:3  
何萍  徐晓华  陆林  陈崚 《软件学报》2014,25(5):997-1013
半监督聚类旨在根据用户给出的必连和不连约束,把所有数据点划分到不同的簇中,从而获得更准确、更加符合用户要求的聚类结果.目前的半监督聚类算法大多数通过修改已有的聚类算法或者结合度规学习,使聚类结果与点对约束尽可能地保持一致,却很少考虑点对约束对周围无约束数据的显式影响程度.提出一种由在顶点上的低层随机游走和在组件上的高层随机游走两部分构成的双层随机游走半监督聚类算法,其中,低层随机游走主要负责计算选出的约束顶点对其他顶点的影响范围和影响程度,称为组件;高层随机游走则进一步将各个点对约束以自适应的强度在组件上进行约束传播,把它们在每个顶点上的影响综合在一个簇指示矩阵中.UCI数据集和大型真实数据集上的实验结果表明,双层随机游走半监督聚类算法比其他半监督聚类算法更准确,也比较高效.  相似文献   

9.
成对约束的属性加权半监督模糊核聚类算法   总被引:1,自引:0,他引:1  
在机器学习和数据挖掘中,带约束的半监督聚类是一个活跃的研究领域。为了利用约束条件获得表现更优异的聚类效果,提出了一种成对约束的属性加权半监督聚类算法,该方法充分考虑了属性间的不平衡性,在传统模糊聚类算法中融合半监督学习机制并通过Mercer核把原始的观察空间映射到高维特征空间。实验结果表明,该算法优于相似的成对约束的竞争群算法(PCCA)。  相似文献   

10.
K-Hub聚类算法是一种有效的高维数据聚类算法,但是它对初始聚类中心的选择非常敏感,并且对于靠近类边界的实例往往不能正确聚类.为了解决这些问题,提出一种结合主动学习和半监督聚类的K-Hub聚类算法.运用主动学习策略学习部分实例的关联限制,然后利用这些关联限制指导K-Hub的聚类过程.实验结果表明,基于主动学习的K-Hub聚类算法能有效提升K-Hub的聚类准确率.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号