首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The combination of multiple clustering results (clustering ensemble) has emerged as an important procedure to improve the quality of clustering solutions. In this paper we propose a new cluster ensemble method based on kernel functions, which introduces the Partition Relevance Analysis step. This step has the goal of analyzing the set of partition in the cluster ensemble and extract valuable information that can improve the quality of the combination process. Besides, we propose a new similarity measure between partitions proving that it is a kernel function. A new consensus function is introduced using this similarity measure and based on the idea of finding the median partition. Related to this consensus function, some theoretical results that endorse the suitability of our methods are proven. Finally, we conduct a numerical experimentation to show the behavior of our method on several databases by making a comparison with simple clustering algorithms as well as to other cluster ensemble methods.  相似文献   

2.
为了改善单一聚类算法的聚类性能,提出一种基于量子遗传算法的XML文档聚类集成解决方法。该方法首先利用KNN分类算法将XML文档划分成k个差异性的聚类成员;其次根据聚类成员的关系获得内联相似度矩阵,并通过多次分割、向下、向上、双向收缩的QR算法分解特征值对应的特征向量来实现矩阵的维数缩减;然后在映射空间上用量子遗传算法实现聚类集成,把每一个样本判别到最优的聚类类别中。这样减少了数据差异性对聚类结果的影响,提高了聚类质量。实验结果表明,在真实的数据集上,该聚类集成算法比其他聚类集成算法具有更好的效果。  相似文献   

3.
罗会兰  危辉 《计算机科学》2010,37(11):234-238
提出了一种基于集成技术和谱聚类技术的混合数据聚类算法CBEST。它利用聚类集成技术产生混合数据间的相似性,这种相似性度量没有对数据特征值分布模型做任何的假设。基于此相似性度量得到的待聚类数据的相似性矩阵,应用谱聚类算法得到混合数据聚类结果。大量真实和人工数据上的实验结果验证了CBEST的有效性和它对噪声的鲁棒性。与其它混合数据聚类算法的比较研究也证明了CBEST的优越性能。CBEST还能有效融合先验知识,通过参数的调节来设置不同属性在聚类中的权重。  相似文献   

4.
Multi-view clustering has become an important extension of ensemble clustering. In multi-view clustering, we apply clustering algorithms on different views of the data to obtain different cluster labels for the same set of objects. These results are then combined in such a manner that the final clustering gives better result than individual clustering of each multi-view data. Multi view clustering can be applied at various stages of the clustering paradigm. This paper proposes a novel multi-view clustering algorithm that combines different ensemble techniques. Our approach is based on computing different similarity matrices on the individual datasets and aggregates these to form a combined similarity matrix, which is then used to obtain the final clustering. We tested our approach on several datasets and perform a comparison with other state-of-the-art algorithms. Our results show that the proposed algorithm outperforms several other methods in terms of accuracy while maintaining the overall complexity of the individual approaches.  相似文献   

5.
基于谱聚类的聚类集成算法   总被引:6,自引:7,他引:6  
周林  平西建  徐森  张涛 《自动化学报》2012,38(8):1335-1342
谱聚类是近年来出现的一类性能优越的聚类算法,能对任意形状的数据进行聚类, 但算法对尺度参数比较敏感,利用聚类集成良好的鲁棒性和泛化能力,本文提出了基于谱聚类的聚类集成算法.该算法首先利用谱聚类算法的内在特性构造多样性的聚类成员; 然后,采用连接三元组算法计算相似度矩阵,扩充了数据点之间的相似性信息;最后,对相似度矩阵使用谱聚类算法得到最终的集成结果. 为了使算法能扩展到大规模应用,利用Nystrm采样算法只计算随机采样数据点之间以及随机采样数据点与剩余数据点之间的相似度矩阵,从而有效降低了算法的计算复杂度. 本文算法既利用了谱聚类算法的优越性能,同时又避免了精确选择尺度参数的问题.实验结果表明:较之其他常见的聚类集成算法,本文算法更优越、更有效,能较好地解决数据聚类、图像分割等问题.  相似文献   

6.
标签传播算法(LPA)是一种高效地处理大规模网络的社区发现算法,由于其近乎线性的时间复杂度而受到广泛关注。然而,该算法每个节点的标签依赖于其邻居节点,其迭代速度和聚类有效性对标签信息的更新顺序非常敏感,影响了社区发现结果的准确性和稳定性。基于该问题,提出了一种基于加权聚类集成的标签传播算法。该算法利用多次标签传播算法的结果作为基聚类集,并用模块度评估每个基聚类的重要性,使其作为节点相似性度量的权值形成加权相似性矩阵,最后通过层次聚类得出最终的社区划分结果。在实验分析中,该算法和其他5个具有代表性的标签传播算法的改进算法在真实数据集上进行了比较,展示了新算法能有效地提高标签传播算法的社区发现精度。  相似文献   

7.
An improved spectral clustering algorithm based on random walk   总被引:2,自引:0,他引:2  
The construction process for a similarity matrix has an important impact on the performance of spectral clustering algorithms. In this paper, we propose a random walk based approach to process the Gaussian kernel similarity matrix. In this method, the pair-wise similarity between two data points is not only related to the two points, but also related to their neighbors. As a result, the new similarity matrix is closer to the ideal matrix which can provide the best clustering result. We give a theoretical analysis of the similarity matrix and apply this similarity matrix to spectral clustering. We also propose a method to handle noisy items which may cause deterioration of clustering performance. Experimental results on real-world data sets show that the proposed spectral clustering algorithm significantly outperforms existing algorithms.  相似文献   

8.
谱聚类将数据聚类问题转化成图划分问题,通过寻找最优的子图,对数据点进行聚类。谱聚类的关键是构造合适的相似矩阵,将数据集的内在结构真实地描述出来。针对传统的谱聚类算法采用高斯核函数来构造相似矩阵时对尺度参数的选择很敏感,而且在聚类阶段需要随机确定初始的聚类中心,聚类性能也不稳定等问题,本文提出了基于消息传递的谱聚类算法。该算法采用密度自适应的相似性度量方法,可以更好地描述数据点之间的关系,然后利用近邻传播(Affinity propagation,AP)聚类中“消息传递”机制获得高质量的聚类中心,提高了谱聚类算法的性能。实验表明,新算法可以有效地处理多尺度数据集的聚类问题,其聚类性能非常稳定,聚类质量也优于传统的谱聚类算法和k-means算法。  相似文献   

9.
现有的半监督聚类集成方法能利用先验信息,使集成的准确性、鲁棒性和稳定性得到提高,但在集成阶段加入成对约束信息时,只考虑了给定的约束信息而忽视了约束点与被约束点的邻域点之间的关系.针对此问题,提出了一种基于数据相关性的半监督模糊聚类集成方法.该方法首先利用半监督模糊聚类算法建立集成信息矩阵,并将其转换为相似性矩阵;然后,利用已知的约束信息及约束点与被约束点的邻域点之间的关系来修改相似性矩阵;最后,利用图划分算法得到最终的聚类结果.真实数据上的实验结果表明,提出的方法可以有效提高聚类质量.  相似文献   

10.
杜航原  张晶  王文剑   《智能系统学报》2020,15(6):1113-1120
针对聚类集成中一致性函数设计问题,本文提出一种深度自监督聚类集成算法。该算法首先根据基聚类划分结果采用加权连通三元组算法计算样本之间的相似度矩阵,基于相似度矩阵表达邻接关系,将基聚类由特征空间中的数据表示变换至图数据表示;在此基础上,基聚类的一致性集成问题被转化为对基聚类图数据表示的图聚类问题。为此,本文利用图神经网络构造自监督聚类集成模型,一方面采用图自动编码器学习图的低维嵌入,依据低维嵌入似然分布估计聚类集成的目标分布;另一方面利用聚类集成目标对低维嵌入过程进行指导,确保模型获得的图低维嵌入与聚类集成结果是一致最优的。在大量数据集上进行了仿真实验,结果表明本文算法相比HGPA、CSPA和MCLA等算法可以进一步提高聚类集成结果的准确性。  相似文献   

11.
数据挖掘中聚类算法研究进展   总被引:6,自引:0,他引:6  
聚类分析是数据挖掘中重要的研究内容之一,对聚类准则进行了总结,对五类传统的聚类算法的研究现状和进展进行了较为全面的总结,就一些新的聚类算法进行了梳理,根据样本归属关系、样本数据预处理、样本的相似性度量、样本的更新策略、样本的高维性和与其他学科的融合等六个方面对聚类中近20多个新算法,如粒度聚类、不确定聚类、量子聚类、核聚类、谱聚类、聚类集成、概念聚类、球壳聚类、仿射聚类、数据流聚类等,分别进行了详细的概括。这对聚类是一个很好的总结,对聚类的发展具有积极意义。  相似文献   

12.
Clustering ensemble is a popular approach for identifying data clusters that combines the clustering results from multiple base clustering algorithms to produce more accurate and robust data clusters. However, the performance of clustering ensemble algorithms is highly dependent on the quality of clustering members. To address this problem, this paper proposes a member enhancement-based clustering ensemble (MECE) algorithm that selects the ensemble members by considering their distribution consistency. MECE has two main components, called heterocluster splitting and homocluster merging. The first component estimates two probability density functions (p.d.f.s) estimated on the sample points of an heterocluster and represents them using a Gaussian distribution and a Gaussian mixture model. If the random numbers generated by these two p.d.f.s have different probability distributions, the heterocluster is then split into smaller clusters. The second component merges the clusters that have high neighborhood densities into a homocluster, where the neighborhood density is measured using a novel evaluation criterion. In addition, a co-association matrix is presented, which serves as a summary for the ensemble of diverse clusters. A series of experiments were conducted to evaluate the feasibility and effectiveness of the proposed ensemble member generation algorithm. Results show that the proposed MECE algorithm can select high quality ensemble members and as a result yield the better clusterings than six state-of-the-art ensemble clustering algorithms, that is, cluster-based similarity partitioning algorithm (CSPA), meta-clustering algorithm (MCLA), hybrid bipartite graph formulation (HBGF), evidence accumulation clustering (EAC), locally weighted evidence accumulation (LWEA), and locally weighted graph partition (LWGP). Specifically, MECE algorithm has the nearly 23% higher average NMI, 27% higher average ARI, 15% higher average FMI, and 10% higher average purity than CSPA, MCLA, HBGF, EAC, LWEA, and LWGA algorithms. The experimental results demonstrate that MECE algorithm is a valid approach to deal with the clustering ensemble problems.  相似文献   

13.
为了解决谱聚类方法中大规模的相似性矩阵的存储和特征分解困难的问题,利用权核K-均值算法的目标函数和图谱划分准则的等价性,将图谱划分准则作为免疫克隆选择优化算法的亲和度函数,提出一种利用免疫克隆选择优化算法求解图谱划分问题的新方法——免疫克隆选择图划分方法。该方法在免疫克隆选择操作的过程中引入了一个个体修正算子,使得个体以更快的速度向更优的个体进化。此外,在新方法中还引入了流形距离测度来构造相似性矩阵,使得新算法可以有效处理具有复杂结构的数据。采用人工数据集、USPS手写体数字识别和UMIST人脸识别的仿真实验验证了新方法的有效性和鲁棒性。  相似文献   

14.
近年来谱聚类算法在模式识别和计算机视觉领域被广泛应用,而相似性矩阵的构造是谱聚类算法的关键步骤。针对传统谱聚类算法计算复杂度高难以应用到大规模图像分割处理的问题,提出了区间模糊谱聚类图像分割方法。该方法首先利用灰度直方图和区间模糊理论得到图像灰度间的区间模糊隶属度,然后利用该隶属度构造基于灰度的区间模糊相似性测度,最后利用该相似性测度构造相似性矩阵并通过规范切图谱划分准则对图像进行划分,得到最终的图像分割结果。由于区间模糊理论的引入,提高了传统谱聚类的分割性能,对比实验也表明该方法在分割效果和计算复杂度上都有较大的改善。  相似文献   

15.
Spectral clustering with fuzzy similarity measure   总被引:1,自引:0,他引:1  
Spectral clustering algorithms have been successfully used in the field of pattern recognition and computer vision. The widely used similarity measure for spectral clustering is Gaussian kernel function which measures the similarity between data points. However, it is difficult for spectral clustering to choose the suitable scaling parameter in Gaussian kernel similarity measure. In this paper, utilizing the prototypes and partition matrix obtained by fuzzy c-means clustering algorithm, we develop a fuzzy similarity measure for spectral clustering (FSSC). Furthermore, we introduce the K-nearest neighbor sparse strategy into FSSC and apply the sparse FSSC to texture image segmentation. In our experiments, we firstly perform some experiments on artificial data to verify the efficiency of the proposed fuzzy similarity measure. Then we analyze the parameters sensitivity of our method. Finally, we take self-tuning spectral clustering and Nyström methods for baseline comparisons, and apply these three methods to the synthetic texture and remote sensing image segmentation. The experimental results show that the proposed method is significantly effective and stable.  相似文献   

16.
Combining multiple clusterings using evidence accumulation   总被引:2,自引:0,他引:2  
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble - a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as an independent evidence of data organization, individual data partitions being combined, based on a voting mechanism, to generate a new n /spl times/ n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the k-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.  相似文献   

17.
李鹏清  李扬定  邓雪莲  李永钢  方月 《计算机科学》2018,45(Z11):458-461, 467
传统的谱聚类算法在建立相似度矩阵时仅考虑数据点与点的距离,忽略了数据点之间隐含的内在联系。针对这一问题,提出了一种基于SimRank的谱聚类算法。该算法首先用无向图数据建立邻接矩阵,并计算出基于SimRank的相似度矩阵;然后根据相似度矩阵建立拉普拉斯矩阵表达式,对其进行归一化后再进行谱分解;最后对分解得到的特征向量进行k-means聚类。在Zoo等UCI标准数据集上的实验结果表明,所提算法在聚类精确度、标准互信息和纯度3个评价指标上均优于现有的LRR(Low Rank Rrepresentation)等基于距离相似度的谱聚类算法。  相似文献   

18.
In recent years, spectral clustering has become one of the most popular clustering algorithms in areas of pattern analysis and recognition. This algorithm uses the eigenvalues and eigenvectors of a normalized similarity matrix to partition the data, and is simple to implement. However, when the image is corrupted by noise, spectral clustering cannot obtain satisfying segmentation performance. In order to overcome the noise sensitivity of the standard spectral clustering algorithm, a novel fuzzy spectral clustering algorithm with robust spatial information for image segmentation (FSC_RS) is proposed in this paper. Firstly, a non-local-weighted sum image of the original image is generated by utilizing the pixels with a similar configuration of each pixel. Then a robust gray-based fuzzy similarity measure is defined by using the fuzzy membership values among gray values in the new generated image. Thus, the similarity matrix obtained by this measure is only dependent on the number of the gray-levels and can be easily stored. Finally, the spectral graph partitioning method can be applied to this similarity matrix to group the gray values of the new generated image and then the corresponding pixels in the image are reclassified to obtain the final segmentation result. Some segmentation experiments on synthetic and real images show that the proposed method outperforms traditional spectral clustering methods and spatial fuzzy clustering in efficiency and robustness.  相似文献   

19.
Searching and mining biomedical literature databases are common ways of generating scientific hypotheses by biomedical researchers. Clustering can assist researchers to form hypotheses by seeking valuable information from grouped documents effectively. Although a large number of clustering algorithms are available, this paper attempts to answer the question as to which algorithm is best suited to accurately cluster biomedical documents. Non-negative matrix factorization (NMF) has been widely applied to clustering general text documents. However, the clustering results are sensitive to the initial values of the parameters of NMF. In order to overcome this drawback, we present the ensemble NMF for clustering biomedical documents in this paper. The performance of ensemble NMF was evaluated on numerous datasets generated from the TREC Genomics track dataset. With respect to most datasets, the experimental results have demonstrated that the ensemble NMF significantly outperforms classical clustering algorithms of bisecting K-means, and hierarchical clustering. We compared four different methods for constructing an ensemble NMF. For clustering biomedical documents, this research is the first to compare ensemble NMF with typical classical clustering algorithms, and validates ensemble NMF constructed from different graph-based ensemble algorithms. This is also the first work on ensemble NMF with Hybrid Bipartite Graph Formulation for clustering biomedical documents.  相似文献   

20.
The clustering ensemble has emerged as a prominent method for improving robustness, stability, and accuracy of unsupervised classification solutions. It combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known as methods with high ability to solve optimization problems including clustering. To date, significant progress has been contributed to find consensus clustering that will yield better results than existing clustering. This paper presents a survey of genetic algorithms designed for clustering ensembles. It begins with the introduction of clustering ensembles and clustering ensemble algorithms. Subsequently, this paper describes a number of suggested genetic-guided clustering ensemble algorithms, in particular the genotypes, fitness functions, and genetic operations. Next, clustering accuracies among the genetic-guided clustering ensemble algorithms is compared. This paper concludes that using genetic algorithms in clustering ensemble improves the clustering accuracy and addresses open questions subject to future research.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号