Similar literature
 20 similar articles found (search time: 455 ms)
1.
Semi-supervised fuzzy co-clustering algorithm for document categorization   Cited by: 1 (self-citations: 1, citations by others: 0)
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm called SS-FCC for categorization of large web documents. In this new approach, the clustering process is carried out by incorporating some prior domain knowledge of a dataset, in the form of pairwise constraints provided by users, into the fuzzy co-clustering framework. Each constraint specifies whether a pair of objects “must” or “cannot” be clustered together. With the help of those constraints, the clustering problem is formulated as the problem of maximizing a competitive agglomeration cost function with fuzzy terms, taking into account the provided domain knowledge. The update rules for fuzzy memberships are derived, and an iterative algorithm is designed for the soft co-clustering process. Our experimental studies show that the quality of clustering results can be improved significantly with the proposed approach. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability and operating time, compared with some of the existing semi-supervised algorithms.

2.
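To address the high risk and high concealment of sensitive-information leakage caused by inference across aggregated documents, a method for inferring sensitive information in documents based on semi-supervised clustering is proposed. First, to obtain high-quality constraint information at a small time cost, a novel second-order constraint active learning algorithm is designed, which generates the most informative constraint closure by selecting the sample points with the greatest uncertainty. Then, by introducing constraint information and combining it with DBSCAN, a new semi-supervised clustering algorithm is proposed that effectively resolves the blurred-boundary problem of DBSCAN and improves document clustering accuracy. Finally, based on the semi-supervised clustering results, a possibility measure of sensitive information is computed for similar documents. Experiments show that the semi-supervised clustering algorithm markedly improves accuracy and that the inference method can effectively infer sensitive information.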

3.
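Attribute-weighted semi-supervised fuzzy kernel clustering algorithm with pairwise constraints   Cited by: 1 (self-citations: 0, citations by others: 1)
In machine learning and data mining, constrained semi-supervised clustering is an active research area. To use constraints to obtain better clustering results, an attribute-weighted semi-supervised clustering algorithm with pairwise constraints is proposed. The method takes full account of the imbalance among attributes, integrates a semi-supervised learning mechanism into the traditional fuzzy clustering algorithm, and uses a Mercer kernel to map the original observation space into a high-dimensional feature space. Experimental results show that the algorithm outperforms the similar pairwise-constrained competitive agglomeration algorithm (PCCA).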

4.
Clustering requires the user to define a distance metric, select a clustering algorithm, and set the hyperparameters of that algorithm. Getting these right, so that a clustering is obtained that meets the user's subjective criteria, can be difficult and tedious. Semi-supervised clustering methods make this easier by letting the user provide must-link or cannot-link constraints. These are then used to automatically tune the similarity measure and/or the optimization criterion. In this paper, we investigate a complementary way of using the constraints: they are used to select an unsupervised clustering method and tune its hyperparameters. It turns out that this very simple approach outperforms all existing semi-supervised methods. This implies that choosing the right algorithm and hyperparameter values is more important than modifying an individual algorithm to take constraints into account. In addition, the proposed approach allows for active constraint selection in a more effective manner than other methods.
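The selection idea in this abstract can be illustrated with a minimal Python sketch. It assumes that each candidate clustering (one per algorithm and hyperparameter setting) has already been computed and is given as a list of cluster labels, and that a candidate is scored by the fraction of pairwise constraints it satisfies. The function names `constraint_score` and `select_clustering`, the candidate names, and the scoring rule are illustrative assumptions, not details taken from the paper.

```python
def constraint_score(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that a labelling satisfies (assumed scoring rule)."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 0.0


def select_clustering(candidates, must_link, cannot_link):
    """Return the (name, labels) pair with the highest constraint score."""
    return max(
        candidates.items(),
        key=lambda item: constraint_score(item[1], must_link, cannot_link),
    )


# Hypothetical candidates: labellings produced by different unsupervised runs.
candidates = {
    "kmeans_k2": [0, 0, 1, 1],
    "kmeans_k3": [0, 1, 2, 2],
}
must_link = [(0, 1)]      # objects 0 and 1 should share a cluster
cannot_link = [(1, 2)]    # objects 1 and 2 should be separated

best_name, best_labels = select_clustering(candidates, must_link, cannot_link)
```

Here `kmeans_k2` satisfies both constraints while `kmeans_k3` breaks the must-link pair, so the first candidate is selected. In practice the candidates would come from running real clustering algorithms over a grid of hyperparameters.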

5.
Over the last decade there has been an increasing interest in semi-supervised clustering. Several studies have suggested that even a small amount of supervised information can significantly improve the results of unsupervised learning. One popular method of incorporating partial supervised information is through pair-wise constraints indicating whether a certain pair of patterns should belong to the same (Must-link) or different (Don't-link) clusters. In this study we propose a novel semi-supervised fuzzy clustering algorithm (SSFCA). The supervised information is incorporated via a method quantifying Must-link and/or Don't-link constraints. Additionally, we present an extension of SSFCA that allows the algorithm to automatically detect the number of clusters in the data. We apply SSFCA to the intrinsic problem of gene expression profile clustering. The advantageous properties of fuzzy logic, inherited by SSFCA, allow genes to belong to more than one group, thereby revealing deeper information about their multiple functional roles. Finally, we investigate the incorporation of prior biological knowledge derived from the Gene Ontology in the process of selecting pair-wise constraints. Simulations on artificial and real-life datasets showed that the proposed SSFCA significantly outperformed other standard and semi-supervised clustering methods.

6.
In this work we consider a fuzzy-set-based approach to the issue of discovery in databases (database mining). The concept of linguistic summaries is described and shown to be a user-friendly way to present information contained in a database. We discuss methods for measuring the amount of information provided by a linguistic summary. The issue of conjecturing, that is, how to decide which summaries may be informative, is discussed. We suggest two approaches to help us focus on relevant summaries. The first method, called the template method, makes use of linguistic concepts related to the domain of the attributes involved in the summaries. The second approach uses the mountain clustering method to help focus our summaries. © 1996 John Wiley & Sons, Inc.

7.
Extracting fuzzy classification rules from partially labeled data   Cited by: 1 (self-citations: 1, citations by others: 0)
The interpretability and flexibility of fuzzy if-then rules make them a popular basis for classifiers. It is common to extract them from a database of examples. However, the data available in many practical applications are often unlabeled, and must be labeled manually by the user or by expensive analyses. The idea of semi-supervised learning is to use as much labeled data as is available and to additionally exploit the information in the unlabeled data. In this paper we describe an approach to learning fuzzy classification rules from partially labeled datasets.

8.
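王亮 (Wang Liang), 王士同 (Wang Shitong). 《计算机工程》 (Computer Engineering), 2012, 38(1): 148-150
To address the imbalance among samples, a dynamically weighted semi-supervised fuzzy kernel clustering algorithm based on pairwise constraints is proposed. A semi-supervised learning mechanism is added to the traditional fuzzy clustering algorithm, the original data space is mapped to a feature space through a Mercer kernel, and each vector in the feature space is assigned a dynamic weight, which yields a new objective function. Data classification is then performed in combination with a simple kernel-parameter selection method. Theoretical analysis and experimental results show that the algorithm achieves better clustering than the fuzzy kernel clustering algorithm and the pairwise-constrained competitive agglomeration algorithm.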

9.
Semi-supervised document clustering, which takes into account limited supervised data to group unlabeled documents into clusters, has received significant interest recently. Because obtaining supervised data may be expensive, it is important to obtain the most informative knowledge to improve clustering performance. This paper presents a semi-supervised document clustering algorithm and a new method for actively selecting informative instance-level constraints to improve clustering performance. The semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN) algorithm, which incorporates instance-level constraints to guide the clustering process in DBSCAN. An active learning approach is proposed to select informative document pairs for obtaining user feedback. Experimental results show that Cons-DBSCAN with our proposed active learning approach can improve clustering performance significantly when given a relatively small number of constraints.

10.
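To make effective use of user-supplied prior information and to consider the image segmentation problem from multiple perspectives, this paper proposes a semi-supervised multi-objective evolutionary fuzzy clustering algorithm for color image segmentation. First, a semi-supervised method is introduced into the multi-objective evolutionary clustering algorithm, so that a small amount of supervision information guides the clustering process. Second, maximum-entropy regularization is introduced into the objective function with supervision information, giving the objective function a clear physical meaning. Finally, the supervision information is used to construct a validity index based on a similarity measure, which selects one optimal solution from the non-dominated solution set. Experimental results show that the algorithm is more flexible and accurate than the traditional multi-objective evolutionary clustering algorithm and the semi-supervised fuzzy clustering algorithm.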

11.
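To address the unsatisfactory image segmentation results that traditional clustering produces for lack of effective guidance, a semi-supervised method is introduced into the multi-objective evolutionary fuzzy clustering algorithm, yielding a semi-supervised multi-objective evolutionary fuzzy clustering method. The image segmentation algorithm constructs a semi-supervised intra-class compactness function and an inter-class separation function, and uses supervision information to guide the clustering process toward a non-dominated solution set. To select one optimal solution from that set, the supervision information is used to construct a validity index based on a similarity measure. Experimental results show that the proposed method is clearly superior to unsupervised clustering methods in segmentation accuracy and visual quality.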

12.
CrossClus: user-guided multi-relational clustering   Cited by: 2 (self-citations: 0, citations by others: 2)
Most structured data in real-life applications are stored in relational databases containing multiple semantically linked relations. Unlike clustering in a single table, when clustering objects in relational databases there are usually a large number of features conveying very different semantic information, and using all features indiscriminately is unlikely to generate meaningful results. Because the user knows her goal of clustering, we propose a new approach called CrossClus, which performs multi-relational clustering under the user’s guidance. Unlike semi-supervised clustering, which requires the user to provide a training set, we minimize the user’s effort by using a very simple form of user guidance. The user is only required to select one or a small set of features that are pertinent to the clustering goal, and CrossClus searches for other pertinent features in multiple relations. Each feature is evaluated by whether it clusters objects in a way similar to the user-specified features. We design efficient and accurate approaches for both feature selection and object clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of CrossClus. The work was supported in part by the U.S. National Science Foundation NSF IIS-03-13678 and NSF BDI-05-15813, and an IBM Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect views of the funding agencies.

13.
This paper presents a new semi-supervised fuzzy c-means clustering for data with clusterwise tolerance by opposite criteria. In semi-supervised clustering, pairwise constraints, that is, must-link and cannot-link, are frequently used in order to improve clustering performance. From the viewpoint of handling pairwise constraints, a new semi-supervised fuzzy c-means clustering is proposed by introducing clusterwise tolerance-based pairwise constraints. First, a concept of clusterwise tolerance-based pairwise constraints is introduced. Second, the optimization problems of the proposed method are formulated. In particular, must-link and cannot-link are handled by opposite criteria in our proposed method. Third, a new clustering algorithm is constructed based on the above discussions. Finally, the effectiveness of the proposed algorithm is verified through numerical examples.

14.
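Existing semi-supervised clustering ensemble methods can use prior information to improve the accuracy, robustness, and stability of the ensemble. However, when pairwise constraint information is added at the ensemble stage, they consider only the given constraints and ignore the relationship between a constrained point and the neighbors of the point it is constrained with. To address this, a semi-supervised fuzzy clustering ensemble method based on data correlation is proposed. The method first uses a semi-supervised fuzzy clustering algorithm to build an ensemble information matrix and converts it into a similarity matrix. It then modifies the similarity matrix using the known constraints and the relationships between constrained points and the neighbors of their constrained partners. Finally, a graph partitioning algorithm produces the final clustering. Experiments on real data show that the proposed method effectively improves clustering quality.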
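A rough Python sketch of the "similarity matrix, then constraint adjustment" steps in this abstract follows. It is a simplification under stated assumptions: the ensemble members are hard labellings rather than fuzzy memberships, the similarity is a plain co-association frequency, and only the given constraints are written into the matrix. The neighbor-based adjustment that the abstract emphasizes is not shown, because the abstract does not specify it. The names `co_association` and `apply_constraints` are hypothetical.

```python
def co_association(labelings):
    """Similarity matrix: fraction of ensemble members that co-cluster each pair."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    members = len(labelings)
    return [[value / members for value in row] for row in counts]


def apply_constraints(sim, must_link, cannot_link):
    """Return a copy of sim with constrained pairs forced to 1.0 or 0.0 (assumed rule)."""
    adjusted = [row[:] for row in sim]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 1.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    return adjusted


# Two hypothetical base clusterings of three objects.
labelings = [[0, 0, 1], [0, 1, 1]]
sim = co_association(labelings)
adjusted = apply_constraints(sim, must_link=[(0, 2)], cannot_link=[(0, 1)])
```

The adjusted matrix would then be handed to a graph partitioning step, which is outside the scope of this sketch.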

15.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research has sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs a nonlinear transformation globally but a linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Through some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting image retrieval performance.

16.
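A semi-supervised document clustering algorithm combined with active learning   Cited by: 1 (self-citations: 0, citations by others: 1)
Semi-supervised document clustering, which uses a small amount of data carrying supervision information to assist unsupervised document clustering, has in recent years become a hot topic in machine learning and data mining. Because acquiring a large amount of supervision information is time-consuming and laborious, researchers in China and abroad have considered how to obtain a small amount of supervision information that nonetheless improves clustering performance significantly. A semi-supervised document clustering algorithm combined with active learning is proposed: pairwise constraints are introduced to guide the DBSCAN clustering process and improve clustering performance, giving the semi-supervised document clustering algorithm Cons-DBSCAN. By measuring the amount of information contained in a constraint set and analyzing the DBSCAN algorithm itself, a heuristic active learning algorithm is proposed that selects highly informative sets of pairwise constraints and thus assists semi-supervised document clustering more efficiently. Experimental results show that the proposed algorithm clusters documents efficiently, that the pairwise constraint sets obtained through active learning significantly improve clustering performance, and that the algorithm outperforms two representative semi-supervised clustering algorithms that incorporate active learning.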

17.
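The distance metric has a decisive influence on the clustering results of the fuzzy clustering algorithm FCM. In practice, the dataset to be clustered sometimes comes with a certain amount of side information in the form of a labeled set of pairwise constraints. To make full use of this side information, a hybrid-distance learning method is first proposed, which uses the side information to learn a distance metric for the dataset. Then a robust fuzzy C-means clustering algorithm based on hybrid-distance learning (the HR-FCM algorithm) is proposed; it is a semi-supervised clustering algorithm. HR-FCM retains the robustness and other properties of the GIFP-FCM (Generalized FCM algorithm with improved fuzzy partitions) algorithm and, because it uses a more suitable distance metric, achieves better clustering performance. Experimental results demonstrate the effectiveness of the proposed algorithm.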

18.
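For the proposed semi-supervised agglomerative hierarchical clustering algorithm based on pairwise constraints, the main purpose of applying semi-supervision to the clusters is to improve learning performance over the unsupervised setting through appropriate use of sample supervision information. With current techniques, constraint relations based on must-link and cannot-link are widely used in cluster analysis centered on semi-supervised clustering. From this perspective, to describe the distances between clusters more faithfully and precisely, pairwise constraints should be applied comprehensively to adjust and optimize inter-cluster distances.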

19.
In recent years feedback approaches have been used in relating low-level image features with concepts to overcome the subjective nature of human image interpretation. Generally, in these systems when the user starts with a new query, the entire prior experience of the system is lost. In this paper, we address the problem of incorporating prior experience of the retrieval system to improve the performance on future queries. We propose a semi-supervised fuzzy clustering method to learn class distribution (meta knowledge) in the sense of high-level concepts from retrieval experience. Using fuzzy rules, we incorporate the meta knowledge into a probabilistic feature relevance feedback approach to improve the retrieval performance. Results on synthetic and real databases show that our approach provides better retrieval precision compared to the case when no retrieval experience is used.

20.
In Gaussian mixture modeling, it is crucial to select the number of Gaussians for a sample set, which becomes much more difficult when the overlap in the mixture is larger. Under regularization theory, we aim to solve this problem using a semi-supervised learning algorithm that incorporates pairwise constraints into entropy regularized likelihood (ERL) learning, which can perform automatic model selection for Gaussian mixtures. The simulation experiments further demonstrate that the presented semi-supervised learning algorithm (i.e., the constrained ERL learning algorithm) can automatically detect the number of Gaussians with a good parameter estimation, even when two or more actual Gaussians in the mixture overlap to a high degree. Moreover, the constrained ERL learning algorithm leads to some promising results when applied to iris data classification and image database categorization.
