首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
In this article, we address the problem of automatic constraint selection to improve the performance of constraint-based clustering algorithms. To this aim we propose a novel active learning algorithm that relies on a k-nearest neighbors graph and a new constraint utility function to generate queries to the human expert. This mechanism is paired with propagation and refinement processes that limit the number of constraint candidates and introduce a minimal diversity in the proposed constraints. Existing constraint selection heuristics are based on a random selection or on a min–max criterion and thus are either inefficient or more adapted to spherical clusters. Contrary to these approaches, our method is designed to be beneficial for all constraint-based clustering algorithms. Comparative experiments conducted on real datasets and with two distinct representative constraint-based clustering algorithms show that our approach significantly improves clustering quality while minimizing the number of human expert solicitations.  相似文献   

2.
一种结合主动学习的半监督文档聚类算法   总被引:1,自引:0,他引:1  
半监督文档聚类,即利用少量具有监督信息的数据来辅助无监督文档聚类,近几年来逐渐成为机器学习和数据挖掘领域研究的热点问题.由于获取大量监督信息费时费力,因此,国内外学者考虑如何获得少量但对聚类性能提高显著的监督信息.提出一种结合主动学习的半监督文档聚类算法,通过引入成对约束信息指导DBSCAN的聚类过程来提高聚类性能,得到一种半监督文档聚类算法Cons-DBSCAN.通过对约束集中所含信息量的衡量和对DBSCAN算法本身的分析,提出了一种启发式的主动学习算法,能够选取含信息量大的成对约束集,从而能够更高效地辅助半监督文档聚类.实验结果表明,所提出的算法能够高效地进行文档聚类.通过主动学习算法获得的成对约束集,能够显著地提高聚类性能.并且,算法的性能优于两个代表性的结合主动学习的半监督聚类算法.  相似文献   

3.
Clustering requires the user to define a distance metric, select a clustering algorithm, and set the hyperparameters of that algorithm. Getting these right, so that a clustering is obtained that meets the users subjective criteria, can be difficult and tedious. Semi-supervised clustering methods make this easier by letting the user provide must-link or cannot-link constraints. These are then used to automatically tune the similarity measure and/or the optimization criterion. In this paper, we investigate a complementary way of using the constraints: they are used to select an unsupervised clustering method and tune its hyperparameters. It turns out that this very simple approach outperforms all existing semi-supervised methods. This implies that choosing the right algorithm and hyperparameter values is more important than modifying an individual algorithm to take constraints into account. In addition, the proposed approach allows for active constraint selection in a more effective manner than other methods.  相似文献   

4.
Semi-supervised dimensionality reduction has attracted an increasing amount of attention in this big-data era. Many algorithms have been developed with a small number of pairwise constraints to achieve performances comparable to those of fully supervised methods. However, one challenging problem with semi-supervised approaches is the appropriate choice of the constraint set, including the cardinality and the composition of the constraint set which, to a large extent, affects the performance of the resulting algorithm. In this work, we address the problem by incorporating ensemble subspaces and active learning into dimensionality reduction and propose a new global and local scatter based semi-supervised dimensionality reduction method with active constraints selection. Unlike traditional methods that select the supervised information in one subspace, we pick up pairwise constraints in ensemble subspaces, where a novel active learning algorithm is designed with both exploration and filtering to generate informative pairwise constraints. The automatic constraint selection approach proposed in this paper can be generalized to be used with all constraint-based semi-supervised learning algorithms. Comparative experiments are conducted on four face database and the results validate the effectiveness of the proposed method.  相似文献   

5.
Constraint Score is a recently proposed method for feature selection by using pairwise constraints which specify whether a pair of instances belongs to the same class or not. It has been shown that the Constraint Score, with only a small amount of pairwise constraints, achieves comparable performance to those fully supervised feature selection methods such as Fisher Score. However, one major disadvantage of the Constraint Score is that its performance is dependent on a good selection on the composition and cardinality of constraint set, which is very challenging in practice. In this work, we address the problem by importing Bagging into Constraint Score and a new method called Bagging Constraint Score (BCS) is proposed. Instead of seeking one appropriate constraint set for single Constraint Score, in BCS we perform multiple Constraint Score, each of which uses a bootstrapped subset of original given constraint set. Diversity analysis on individuals of ensemble shows that resampling pairwise constraints is helpful for simultaneously improving accuracy and diversity of individuals. We conduct extensive experiments on a series of high-dimensional datasets from UCI repository and gene databases, and the experimental results validate the effectiveness of the proposed method.  相似文献   

6.
半监督聚类中基于密度的约束扩展方法   总被引:1,自引:0,他引:1       下载免费PDF全文
张亮  李敏强 《计算机工程》2008,34(10):13-15
现有的半监督聚类方法较少利用数据集空间结构信息,限制了聚类算法的性能。该文提出一种基于密度的约束扩展方法(DCE),将数据集以图的形式表达,定义一种基于密度的图形相似度。根据样本点间的距离和相似度关系,对已知约束集进行扩展,扩展后的约束集可用于各种半监督聚类算法。以约束完全连接聚类和成对约束K均值方法为例,说明了约束扩展方法的应用。实验表明,DCE能够有效地提升半监督聚类算法的性能。  相似文献   

7.
Selecting correct dimensions is very important to subspace clustering and is a challenging issue. This paper studies semi-supervised approach to the problem. In this setting, limited domain knowledge in the form of space level pair-wise constraints, i.e., must-links and cannot-links, are available. We propose a semi-supervised subspace clustering (S3C) algorithm that exploits constraint inconsistence for dimension selection. Our algorithm firstly correlates globally inconsistent constraints to dimensions in which they are consistent, then unites constraints with common correlating dimensions, and finally forms the subspaces according to the constraint unions. Experimental results show that S3C is superior to the typical unsupervised subspace clustering algorithm FINDIT, and the other constraint based semi-supervised subspace clustering algorithm SC-MINER.  相似文献   

8.
双层随机游走半监督聚类   总被引:3,自引:0,他引:3  
何萍  徐晓华  陆林  陈崚 《软件学报》2014,25(5):997-1013
半监督聚类旨在根据用户给出的必连和不连约束,把所有数据点划分到不同的簇中,从而获得更准确、更加符合用户要求的聚类结果.目前的半监督聚类算法大多数通过修改已有的聚类算法或者结合度规学习,使聚类结果与点对约束尽可能地保持一致,却很少考虑点对约束对周围无约束数据的显式影响程度.提出一种由在顶点上的低层随机游走和在组件上的高层随机游走两部分构成的双层随机游走半监督聚类算法,其中,低层随机游走主要负责计算选出的约束顶点对其他顶点的影响范围和影响程度,称为组件;高层随机游走则进一步将各个点对约束以自适应的强度在组件上进行约束传播,把它们在每个顶点上的影响综合在一个簇指示矩阵中.UCI数据集和大型真实数据集上的实验结果表明,双层随机游走半监督聚类算法比其他半监督聚类算法更准确,也比较高效.  相似文献   

9.
针对现有稀疏子空间聚类算法获取的系数矩阵不能准确反应高维空间中数据分布的稀疏性的不足,提出一种分式函数约束的稀疏子空间聚类模型,并利用交替方向迭代方法给出该模型的解。在无噪声情形下,证明了该方法获取的系数矩阵具有块对角结构,这为其准确获取数据结构提供了理论保证;在含噪声情形下,对异常点噪声同样采用分式函数约束作为正则项,提高了模型的鲁棒性。在人工数据集、Extended Yale B库和Hopkins155数据集上的实验结果表明,基于分式函数约束的稀疏子空间聚类方法不仅提高了聚类结果的准确率,而且对异常点噪声具有更好的鲁棒性。  相似文献   

10.
一种基于同类约束的半监督近邻反射传播聚类方法   总被引:1,自引:0,他引:1  
以近邻反射传播 (Affinity propagation, AP) 聚类算法为基础, 提出了一种基于同类约束的半监督近邻反射传播聚类方法 (Semi-supervised affinity propagation clustering method with homogeneity constraints, HCSAP).该方法在聚类目标函数中引入同类约束项, 以保证聚类结果与同类集先验信息一致.利用最大和信任传播 (Max-sum belief propagation) 优化过程对目标函数进行求解, 导出同类约束下的吸引度 (Responsibility) 和归属度 (Availability) 的迭代方程.人工数据集和真实数据集上的实验结果表明本文所提方法的有效性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号