首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
杜航原  张晶  王文剑   《智能系统学报》2020,15(6):1113-1120
针对聚类集成中一致性函数设计问题,本文提出一种深度自监督聚类集成算法。该算法首先根据基聚类划分结果采用加权连通三元组算法计算样本之间的相似度矩阵,基于相似度矩阵表达邻接关系,将基聚类由特征空间中的数据表示变换至图数据表示;在此基础上,基聚类的一致性集成问题被转化为对基聚类图数据表示的图聚类问题。为此,本文利用图神经网络构造自监督聚类集成模型,一方面采用图自动编码器学习图的低维嵌入,依据低维嵌入似然分布估计聚类集成的目标分布;另一方面利用聚类集成目标对低维嵌入过程进行指导,确保模型获得的图低维嵌入与聚类集成结果是一致最优的。在大量数据集上进行了仿真实验,结果表明本文算法相比HGPA、CSPA和MCLA等算法可以进一步提高聚类集成结果的准确性。  相似文献   

2.
Evolving clusters in gene-expression data   总被引:1,自引:0,他引:1  
Clustering is a useful exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, there are only a few works that deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters to be either specified in advance or selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based upon numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs using six gene-expression datasets in which the right clusters are known a priori. The results illustrate that all the proposed algorithms perform well in gene-expression data, although statistical comparisons in terms of the computational efficiency of each algorithm point out that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k.  相似文献   

3.
鉴于计算代价高昂的谱聚类无法满足海量网络社区发现的需求,提出一种用于网络重叠社区发现的谱聚类集成算法(SCEA).首先,利用高效的近似谱聚类(KASP)算法生成个体聚类集合;然后,引入个体聚类选择机制对个体聚类进行优选,并对优选后的个体聚类建立簇相似图;最后,进行层次软聚类,得到网络节点的软划分.实验结果表明,与代表性算法(CPM,Link,COPRA,SSDE)相比较,SCEA能够挖掘出具有更高规范化互信息(NMI)的网络重叠社区结构,且具有相对较好的鲁棒性.  相似文献   

4.
Combining multiple clusterings using evidence accumulation   总被引:2,自引:0,他引:2  
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble - a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as an independent evidence of data organization, individual data partitions being combined, based on a voting mechanism, to generate a new n /spl times/ n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the k-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.  相似文献   

5.
针对传统的聚类集成算法难以高效地处理海量数据的聚类分析问题,提出一种基于MapReduce的并行FCM聚类集成算法。算法利用随机初始聚心来获取具有差异化的聚类成员,通过建立聚类成员簇间OVERLAP矩阵来寻找逻辑等价簇,最后利用投票法共享聚类成员中数据对象的分类情况得出最终的聚类结果。实验证明,该算法具有良好的精确度,加速比和扩展性,具有处理较大规模数据集的能力。  相似文献   

6.
A two-leveled symbiotic evolutionary algorithm for clustering problems   总被引:3,自引:3,他引:0  
Because of its unsupervised nature, clustering is one of the most challenging problems, considered as a NP-hard grouping problem. Recently, several evolutionary algorithms (EAs) for clustering problems have been presented because of their efficiency for solving the NP-hard problems with high degree of complexity. Most previous EA-based algorithms, however, have dealt with the clustering problems given the number of clusters (K) in advance. Although some researchers have suggested the EA-based algorithms for unknown K clustering, they still have some drawbacks to search efficiently due to their huge search space. This paper proposes the two-leveled symbiotic evolutionary clustering algorithm (TSECA), which is a variant of coevolutionary algorithm for unknown K clustering problems. The clustering problems considered in this paper can be divided into two sub-problems: finding the number of clusters and grouping the data into these clusters. The two-leveled framework of TSECA and genetic elements suitable for each sub-problem are proposed. In addition, a neighborhood-based evolutionary strategy is employed to maintain the population diversity. The performance of the proposed algorithm is compared with some popular evolutionary algorithms using the real-life and simulated synthetic data sets. Experimental results show that TSECA produces more compact clusters as well as the accurate number of clusters.  相似文献   

7.
侯勇  郑雪峰 《计算机应用》2013,33(8):2204-2207
当前流行的聚类集成算法无法依据不同数据集的不同特点给出恰当的处理方案,为此提出一种新的基于数据集特点的增强聚类集成算法,该算法由基聚类器的生成、基聚类器的选择与共识函数构成。该算法依据数据集的特点,通过启发式方法,选出合适的基聚类器,构建最终的基聚类器集合,并产生最终聚类结果。实验中,对ecoli,leukaemia与Vehicle三个基准数据集进行了聚类,所提出算法的聚类误差分别是0.014,0.489,0.479,同基于Bagging的结构化集成(BSEA)、异构聚类集成(HCE)和基于聚类的集成分类(COEC)算法相比,所提出算法的聚类误差始终最低;而在增加候基聚类器的情况下,所提出算法的标准化互信息(NMI)值始终高于对比算法。实验结果表明,同对比的聚类集成算法相比,所提出算法的聚类精度最高,可伸缩性最强。  相似文献   

8.
聚类融合通过把具有一定差异性的聚类成员进行组合,能够得到比单一算法更为优越的结果,是近年来聚类算法研究领域的热点问题之一。提出了一种基于自适应最近邻的聚类融合算法ANNCE,能够根据数据分布密度的不同,为每一个数据点自动选择合适的最近邻选取范围。该算法与已有的基于KNN的算法相比,不仅解决了KNN算法中存在的过多参数需要实验确定的问题,还进一步提高了聚类效果。  相似文献   

9.
通过引入上、下近似的思想,粗糙K-means已成为一种处理聚类边界模糊问题的有效算法,粗糙模糊K-means、模糊粗糙K-means等作为粗糙K-means的衍生算法,进一步对聚类边界对象的不确定性进行了细化描述,改善了聚类的效果。然而,这些算法在中心均值迭代计算时没有充分考虑各簇的数据对象与均值中心的距离、邻近范围的数据分布疏密程度等因素对聚类精度的影响。针对这一问题提出了一种局部密度自适应度量的方法来描述簇内数据对象的空间特征,给出了一种基于局部密度自适应度量的粗糙K-means聚类算法,并通过实例计算分析验证了算法的有效性。  相似文献   

10.
王留正  何振峰 《计算机应用》2012,32(11):3005-3008
进化算法可以有效地克服K means对初始聚类中心敏感的缺陷,提高了聚类性能。在进化K means聚类算法 (F-EAC)的基础上,针对其变异操作——簇分裂算子的随机性与局部性,提出了两个全局性分裂算子。结合最大最小距离的思想,利用待分裂簇的周边簇信息来指导簇分裂初始点的选择,使簇的分裂更有利于全局划分,以进一步提高进化聚类的有效性。实验结果表明,基于全局性分裂算子的算法在类数发现及聚类精度方面均优于F EAC。  相似文献   

11.
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution for the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion of the reference clustering is defined based on maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within the cluster. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.  相似文献   

12.
To cluster web documents, all of which have the same name entities, we attempted to use existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, it turned out that these algorithms are not effective to cluster web documents. According to our intensive investigation, we found that clustering such web pages is more complicated because (1) the number of clusters (known as ground truth) is larger than two or three clusters as in general clustering problems and (2) clusters in the data set have extremely skewed distributions of cluster sizes. To overcome the aforementioned problem, in this paper, we propose an effective clustering algorithm to boost up the accuracy of K-means and spectral clustering algorithms. In particular, to deal with skewed distributions of cluster sizes, our algorithm performs both bisection and merge steps based on normalized cuts of the similarity graph G to correctly cluster web documents. Our experimental results show that our algorithm improves the performance by approximately 56% compared to spectral bisection and 36% compared to K-means.  相似文献   

13.
《Knowledge》2006,19(1):77-83
Ensemble methods that train multiple learners and then combine their predictions have been shown to be very effective in supervised learning. This paper explores ensemble methods for unsupervised learning. Here, an ensemble comprises multiple clusterers, each of which is trained by k-means algorithm with different initial points. The clusters discovered by different clusterers are aligned, i.e. similar clusters are assigned with the same label, by counting their overlapped data items. Then, four methods are developed to combine the aligned clusterers. Experiments show that clustering performance could be significantly improved by ensemble methods, where utilizing mutual information to select a subset of clusterers for weighted voting is a nice choice. Since the proposed methods work by analyzing the clustering results instead of the internal mechanisms of the component clusterers, they are applicable to diverse kinds of clustering algorithms.  相似文献   

14.
基于投票机制的融合聚类算法   总被引:1,自引:0,他引:1  
以一趟聚类算法作为划分数据的基本算法,讨论聚类融合问题.通过重复使用一趟聚类算法划分数据,并随机选择阈值和数据输入顺序,得到不同的聚类结果,将这些聚类结果映射为模式间的关联矩阵,在关联矩阵上使用投票机制获得最终的数据划分.在真实数据集和人造数据集上检验了提出的聚类融合算法,并与相关聚类算法进行了对比,实验结果表明,文中提出的算法是有效可行的.  相似文献   

15.
罗会兰  危辉 《计算机科学》2010,37(8):214-218
提出了基于数学形态学的聚类集成算法CEOMM.它利用不同的结构元素的探针作用,对不同的结构元素探测出来的簇核心图进行集成,在集成所得到的簇核心基础上聚类.实验结果表明,算法CEOMM对有复杂类形状的数据集进行聚类时,效果比传统聚类算法更好,且能确定聚类数.而且由于采用了不同的结构元素进行探测,对于由不同形状的类构成的数据集其聚类效果很理想.  相似文献   

16.
解决文本聚类集成问题的两个谱算法   总被引:8,自引:0,他引:8  
徐森  卢志茂  顾国昌 《自动化学报》2009,35(7):997-1002
聚类集成中的关键问题是如何根据不同的聚类器组合为最终的更好的聚类结果. 本文引入谱聚类思想解决文本聚类集成问题, 然而谱聚类算法需要计算大规模矩阵的特征值分解问题来获得文本的低维嵌入, 并用于后续聚类. 本文首先提出了一个集成算法, 该算法使用代数变换将大规模矩阵的特征值分解问题转化为等价的奇异值分解问题, 并继续转化为规模更小的特征值分解问题; 然后进一步研究了谱聚类算法的特性, 提出了另一个集成算法, 该算法通过求解超边的低维嵌入, 间接得到文本的低维嵌入. 在TREC和Reuters文本数据集上的实验结果表明, 本文提出的两个谱聚类算法比其他基于图划分的集成算法鲁棒, 是解决文本聚类集成问题行之有效的方法.  相似文献   

17.
基于先验信息和谱分析的聚类融合算法   总被引:1,自引:0,他引:1  
在聚类过程中利用先验信息能显著提高聚类算法的性能,但已存在的聚类融合算法很少考虑到数据集的先验信息。基于先验信息和谱分析,提出一种聚类融合算法,将成对限制信息引入到谱聚类算法中,用受限的谱聚类算法产生聚类成员,再采用基于互联合矩阵的集成方法生成最后的聚类结果。实验结果表明,利用先验信息能有效提高聚类的效果。  相似文献   

18.
混合数据聚类是聚类分析中一个重要的问题。现有的混合数据聚类算法主要是在全体样本的相似性度量的基础上进行聚类,因此对大规模数据进行聚类时,算法效率不高。基于此,设计了一种新的抽样策略,在此基础上,提出了一种基于抽样的大规模混合数据聚类集成算法。该算法对利用新的抽样策略得到的多个样本子集分别进行聚类,并将结果集成得到最终聚类结果。实验证明,与改进的K-prototypes算法相比,该算法的效率有了显著提高,同时聚类有效性指标基本相同。  相似文献   

19.
为了改善单一聚类算法的聚类性能,提出一种基于量子遗传算法的XML文档聚类集成解决方法。该方法首先利用KNN分类算法将XML文档划分成k个差异性的聚类成员;其次根据聚类成员的关系获得内联相似度矩阵,并通过多次分割、向下、向上、双向收缩的QR算法分解特征值对应的特征向量来实现矩阵的维数缩减;然后在映射空间上用量子遗传算法实现聚类集成,把每一个样本判别到最优的聚类类别中。这样减少了数据差异性对聚类结果的影响,提高了聚类质量。实验结果表明,在真实的数据集上,该聚类集成算法比其他聚类集成算法具有更好的效果。  相似文献   

20.
Clustering is a popular data analysis and data mining technique. A popular technique for clustering is based on k-means such that the data is partitioned into K clusters. However, the k-means algorithm highly depends on the initial state and converges to local optimum solution. This paper presents a new hybrid evolutionary algorithm to solve nonlinear partitional clustering problem. The proposed hybrid evolutionary algorithm is the combination of FAPSO (fuzzy adaptive particle swarm optimization), ACO (ant colony optimization) and k-means algorithms, called FAPSO-ACO–K, which can find better cluster partition. The performance of the proposed algorithm is evaluated through several benchmark data sets. The simulation results show that the performance of the proposed algorithm is better than other algorithms such as PSO, ACO, simulated annealing (SA), combination of PSO and SA (PSO–SA), combination of ACO and SA (ACO–SA), combination of PSO and ACO (PSO–ACO), genetic algorithm (GA), Tabu search (TS), honey bee mating optimization (HBMO) and k-means for partitional clustering problem.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号