Similar literature
20 similar documents found (search time: 31 ms)
1.
Clustering ensemble is a popular approach for identifying data clusters: it combines the results of multiple base clustering algorithms to produce more accurate and robust clusters. However, the performance of clustering ensemble algorithms depends heavily on the quality of the ensemble members. To address this problem, this paper proposes a member enhancement-based clustering ensemble (MECE) algorithm that selects the ensemble members by considering their distribution consistency. MECE has two main components, heterocluster splitting and homocluster merging. The first component estimates two probability density functions (p.d.f.s) on the sample points of a heterocluster, representing them with a Gaussian distribution and a Gaussian mixture model. If the random numbers generated by these two p.d.f.s have different probability distributions, the heterocluster is split into smaller clusters. The second component merges clusters that have high neighborhood densities into a homocluster, where the neighborhood density is measured using a novel evaluation criterion. In addition, a co-association matrix is presented, which serves as a summary of the ensemble of diverse clusters. A series of experiments was conducted to evaluate the feasibility and effectiveness of the proposed ensemble member generation algorithm. The results show that MECE selects high-quality ensemble members and consequently yields better clusterings than six state-of-the-art ensemble clustering algorithms: the cluster-based similarity partitioning algorithm (CSPA), meta-clustering algorithm (MCLA), hybrid bipartite graph formulation (HBGF), evidence accumulation clustering (EAC), locally weighted evidence accumulation (LWEA), and locally weighted graph partition (LWGP). Specifically, MECE achieves nearly 23% higher average NMI, 27% higher average ARI, 15% higher average FMI, and 10% higher average purity than CSPA, MCLA, HBGF, EAC, LWEA, and LWGP. The experimental results demonstrate that MECE is a valid approach to clustering ensemble problems.
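The co-association matrix mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give MECE's exact construction, so the code below assumes the common evidence-accumulation definition: entry (i, j) is the fraction of base partitions that place samples i and j in the same cluster. The function name `co_association` and the toy partitions are hypothetical.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base partitions that put samples i and j in the same cluster.

    Assumption: this is the generic evidence-accumulation definition,
    not necessarily the exact matrix used by MECE.
    """
    if not partitions:
        raise ValueError("at least one base partition is required")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same number of samples")

    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0

    m = len(partitions)
    return [[value / m for value in row] for row in counts]


# Three hypothetical base partitions of four samples.
# Cluster ids need not agree across partitions; only co-membership matters.
base_partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(base_partitions)
```

Because only co-membership is counted, no alignment of cluster labels between base partitions is needed, which is one reason this summary is widely used as the input to a consensus function.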

2.
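Hou Yong (侯勇), Zheng Xuefeng (郑雪峰). Journal of Computer Applications (《计算机应用》), 2013, 33(8): 2204-2207
Popular clustering ensemble algorithms cannot provide appropriate handling for the differing characteristics of different datasets. To address this, a new enhanced clustering ensemble algorithm based on dataset characteristics is proposed, consisting of base clusterer generation, base clusterer selection, and a consensus function. Based on the characteristics of the dataset, the algorithm uses a heuristic method to select suitable base clusterers, builds the final set of base clusterers, and produces the final clustering result. In the experiments, three benchmark datasets (ecoli, leukaemia, and Vehicle) were clustered; the clustering errors of the proposed algorithm were 0.014, 0.489, and 0.479, respectively. Compared with the Bagging-based structured ensemble (BSEA), heterogeneous clustering ensemble (HCE), and cluster-based ensemble classification (COEC) algorithms, the proposed algorithm always had the lowest clustering error; and as candidate base clusterers were added, its normalized mutual information (NMI) was always higher than that of the compared algorithms. The experimental results show that, among the clustering ensemble algorithms compared, the proposed algorithm has the highest clustering accuracy and the strongest scalability.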

3.
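To address the difficulty of labeling Internet traffic and the weak generalization ability of a single clusterer, a selective clustering ensemble method based on mutual information (MI) theory is proposed to improve the accuracy of traffic classification. First, the normalized mutual information (NMI) is computed between the K-means clustering results for different initial numbers of clusters K and the true distribution of traffic protocols in the training set. Then, based on the NMI values, the sequence of K values for the K-means base clusterers used in the ensemble is selected. Finally, a consensus function based on quadratic mutual information (QMI) generates the consensus clustering, and a semi-supervised method is used to label the clusters. Experiments compared the overall classification accuracy of the clustering ensemble method and of single clustering algorithms on four different test sets. The results show that the overall traffic classification accuracy of the clustering ensemble method can reach 90%. The proposed method applies the clustering ensemble model to network traffic classification, improving both classification accuracy and classification stability across datasets.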
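The NMI-based selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: NMI is normalized by the arithmetic mean of the two entropies (the abstract does not say which normalization is used), the helper names `entropy`, `nmi`, and `candidates` are hypothetical, and the toy labelings stand in for real K-means outputs.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information, arithmetic-mean normalization (assumed)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster, hence identical
    return mi / ((h_a + h_b) / 2.0)


# Hypothetical ground-truth protocol labels and clusterings for two K values.
truth = [0, 0, 1, 1, 2, 2]
candidates: Dict[int, Sequence[int]] = {
    2: [0, 0, 1, 1, 1, 1],
    3: [2, 2, 0, 0, 1, 1],  # same grouping as truth, with permuted cluster ids
}
scores = {k: nmi(labels, truth) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

In the method described, a sequence of K values would be kept rather than a single best one; ranking `scores` and taking the top entries would be the natural extension of this sketch.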

4.
Categorical data clustering is a challenging task because categorical attributes have no natural order. This study therefore proposes a two-stage method for categorical data, named the partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA). In the first stage, PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters. Then, the merge stage selects two of the clusters generated in the first stage based on their inter-cluster distances and merges them into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, particular instances in each cluster are considered for reassignment to other clusters based on the intra-cluster distances. The proposed PM-FGCA is applied to ten categorical datasets from the UCI machine learning repository. To evaluate clustering performance, PM-FGCA is compared with existing methods such as the k-modes algorithm, fuzzy k-modes algorithm, genetic fuzzy k-modes algorithm, and non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and the Davies–Bouldin (DB) index are selected as clustering validation indices, covering both external indices (ARI and NMI) and an internal index (DB). The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.
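The merge stage described in this abstract can be sketched as a greedy loop. The abstract does not define the inter-cluster distance, so the code below assumes the Hamming distance between cluster modes (the per-attribute most frequent categories), a natural choice for k-modes-style methods; the names `mode`, `hamming`, and `merge_until` are hypothetical, and the reassignment step that follows merging is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mode(cluster: Sequence[Record]) -> Record:
    """Per-attribute most frequent category of a cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def hamming(u: Record, v: Record) -> int:
    """Number of attributes on which two records differ."""
    return sum(a != b for a, b in zip(u, v))


def merge_until(clusters: Sequence[Sequence[Record]], target: int) -> List[List[Record]]:
    """Repeatedly merge the two closest clusters until `target` clusters remain.

    Assumption: closeness is the Hamming distance between cluster modes.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    merged = [list(c) for c in clusters]
    while len(merged) > target:
        # combinations() yields pairs with i < j, so deleting index j is safe.
        i, j = min(
            combinations(range(len(merged)), 2),
            key=lambda pair: hamming(mode(merged[pair[0]]), mode(merged[pair[1]])),
        )
        merged[i].extend(merged[j])
        del merged[j]
    return merged


# Hypothetical over-partitioned result of the first stage (four clusters).
initial = [
    [("red", "small"), ("red", "small")],
    [("red", "medium")],
    [("blue", "large"), ("blue", "large")],
    [("blue", "large"), ("green", "large")],
]
final = merge_until(initial, target=2)
```

Recomputing every mode on each pass keeps the sketch short; a real implementation would cache modes and update only the merged cluster.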

5.
An unsupervised learning algorithm, named soft spectral clustering ensemble (SSCE), is proposed in this paper. Many ensemble algorithms proposed to date cannot be used on image data: even images of a mere 256 × 256 pixels are too expensive in computational cost and storage. The proposed method is suitable for image segmentation and can, to some degree, solve some open problems of spectral clustering (SC). In this paper, a random scaling parameter and the Nyström approximation are applied to generate the individual spectral clusterings for ensemble learning. We slightly modify the standard SC algorithm to acquire a soft partition and then map it via a centralized logcontrast transform to relax the constraint on probability data, whose components sum to one. All mapped data are concatenated to form the new features for each instance. Principal component analysis (PCA) is used to reduce the dimension of the new features. The final aggregated result is obtained by clustering the dimension-reduced data. Experimental results on UCI data and different image types show that the proposed algorithm is more efficient than some existing consensus functions.
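The mapping step can be illustrated with a short sketch. It assumes that the "centralized logcontrast transform" of the abstract corresponds to the centered log-ratio (CLR) transform from compositional data analysis, which subtracts the mean log from each log-probability; the abstract does not give the formula, so this identification, the function name `clr`, and the small floor `eps` for zero memberships are all assumptions.

```python
import math
from typing import List, Sequence


def clr(memberships: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Centered log-ratio transform of one soft-membership row.

    Assumptions: the row is a probability vector; zeros are floored at
    `eps` (a hypothetical choice) so that the logarithm is defined.
    """
    logs = [math.log(max(p, eps)) for p in memberships]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical soft partition row for one instance over three clusters.
soft_row = [0.7, 0.2, 0.1]
mapped = clr(soft_row)
```

The transformed values sum to zero rather than one, so the sum-to-one constraint is replaced by a zero-sum constraint in log space; concatenating the mapped rows across ensemble members and applying PCA, as the abstract describes, would operate on features produced this way.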

6.
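He Na (贺娜), Ma Yingcang (马盈仓). Computer Engineering (《计算机工程》), 2022, 48(7): 114-121+150
Existing multi-view fuzzy C-means (FCM) clustering algorithms usually decompose a multi-view dataset into several single views for processing, which lowers the clustering accuracy on each view and thereby affects the global data partition. To cluster high-dimensional and multi-view data efficiently, a multi-view self-weighted fuzzy clustering algorithm based on KL information is proposed. The information from multiple views and their weights is fitted and incorporated into the standard FCM algorithm to solve for multiple membership matrices and centroid matrices. On this basis, KL information is added as a fuzzy regularization term to further correct the consensus membership matrix and keep the weight distribution smooth. Here the KL information is the ratio of a view's membership to its consensus membership; minimizing it biases each view's membership toward the consensus membership, giving better clustering results. Experimental results show that the algorithm achieves better clustering and faster convergence than traditional clustering algorithms; in particular, on the 3-Sources dataset, its clustering accuracy, normalized mutual information, and purity improve on the MVASM algorithm by 7.46, 15.34, and 5.48 percentage points, respectively.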

7.
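A Web search result clustering method based on K-center and information gain   Total citations: 1, self-citations: 0, citations by others: 1
Ding Zhenguo (丁振国), Meng Xing (孟星). Application Research of Computers (《计算机应用研究》), 2008, 25(10): 3125-3127
Based on the concepts of K-center and information gain, an improved FPF (furthest-point-first) algorithm is applied to Web search result clustering. A cluster labeling method is proposed that makes the presented clusters easier for users to understand, and a model for evaluating clustering quality is given. The algorithm is compared with the Lingo and K-means algorithms; the results show that it balances clustering quality and speed well and is better suited to clustering Web search results.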
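For orientation, the baseline FPF heuristic for the K-center problem can be sketched as below. This is the standard greedy procedure, not the improved variant the abstract refers to, whose details are not given. The sketch assumes points in Euclidean space, whereas search results would need a text-based distance; the function name `furthest_point_first` and the choice of the first point as the initial center are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def furthest_point_first(points: Sequence[Point], k: int) -> Tuple[List[int], List[int]]:
    """Greedy K-center: pick each new center as the point furthest from all
    centers chosen so far, then assign every point to its nearest center.

    Returns (center indices, index of the assigned center for each point).
    Assumption: Euclidean distance and an arbitrary first center (index 0).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]
    # dist[i] holds the distance from point i to its nearest chosen center.
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))

    assignment = [
        min(centers, key=lambda c: math.dist(p, points[c])) for p in points
    ]
    return centers, assignment


# Hypothetical 2-D points forming three loose groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0)]
centers, assignment = furthest_point_first(pts, k=3)
```

Each round costs one pass over the points, so the whole procedure needs about n times k distance evaluations, which is consistent with the speed advantage the abstract reports over iterative methods such as K-means.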

8.
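Mining multi-view consistency is the key to improving multi-view clustering performance. To better learn consistent representations from multi-view data, a new multi-view clustering algorithm, OMTSC, is proposed. OMTSC simultaneously learns a cluster assignment matrix and a feature embedding for each view, and decomposes the cluster assignment matrix into a shared orthogonal basis matrix and a cluster encoding matrix. The orthogonal basis matrix captures and stores multi-view consistency information to form latent cluster centers, and the weighted fusion of the multi-view cluster encoding matrices better balances the quality differences among views. Bipartite-graph-based co-clustering is introduced so that knowledge is transferred among the three matrices (orthogonal basis, cluster encoding, and feature embedding), improving the consistency and diversity of the multi-view data; the diversity of the feature embeddings is used to maximize multi-view consistency and learn the optimal latent cluster centers, thereby improving multi-view clustering performance. In addition, feature embedding under a group sparsity constraint effectively removes noise from multi-view data and improves robustness. Experiments on the WikipediaArticles, COIL20, and ORL datasets show that, compared with advanced multi-view clustering algorithms such as SC-Best and Co-Reg, OMTSC achieves the best overall values on the three evaluation metrics ACC, NMI, and ARI, with NMI above 0.9 on both COIL20 and ORL.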

9.
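Du Hangyuan (杜航原), Zhang Jing (张晶), Wang Wenjian (王文剑). CAAI Transactions on Intelligent Systems (《智能系统学报》), 2020, 15(6): 1113-1120
To address the design of the consensus function in clustering ensembles, this paper proposes a deep self-supervised clustering ensemble algorithm. The algorithm first computes a similarity matrix between samples from the base clustering partitions using the weighted connected-triple algorithm, and uses the similarity matrix to express adjacency, transforming the base clusterings from a data representation in feature space into a graph representation. On this basis, the consensus ensemble problem is converted into a graph clustering problem on the graph representation of the base clusterings. To this end, a graph neural network is used to build a self-supervised clustering ensemble model. On the one hand, a graph autoencoder learns a low-dimensional embedding of the graph, and the target distribution of the clustering ensemble is estimated from the likelihood distribution of the embedding. On the other hand, the clustering ensemble target guides the embedding process, ensuring that the learned embedding and the ensemble result are jointly optimal. Simulation experiments on a large number of datasets show that the proposed algorithm further improves the accuracy of clustering ensemble results compared with algorithms such as HGPA, CSPA, and MCLA.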

10.
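The performance of semi-supervised clustering algorithms is limited by the number of pairwise constraints, and most existing studies depend on the number of original pairwise constraints. Therefore, an initialization algorithm of pair constraints based on grey relational analysis (PCIG) is first proposed. The algorithm computes the similarity between data objects using the balanced closeness degree, determines a confidence interval from the similarity values, and then borrows a network-structure initialization method to expand the pairwise relations between data objects. Finally, it is applied to a label propagation clustering algorithm. In experiments on five benchmark datasets, the label propagation clustering algorithm based on the improved pairwise-constraint expansion achieved higher NMI and ARI values than other methods. The experimental results demonstrate that the improved pairwise-constraint expansion effectively improves the clustering performance of the label propagation algorithm.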

11.
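Because computationally expensive spectral clustering cannot meet the needs of community detection in massive networks, a spectral clustering ensemble algorithm (SCEA) for overlapping community detection in networks is proposed. First, the efficient approximate spectral clustering (KASP) algorithm is used to generate a set of individual clusterings. Then, an individual clustering selection mechanism is introduced to select the best individual clusterings, and a cluster similarity graph is built on the selected clusterings. Finally, hierarchical soft clustering is performed to obtain a soft partition of the network nodes. Experimental results show that, compared with representative algorithms (CPM, Link, COPRA, SSDE), SCEA discovers overlapping community structures with higher normalized mutual information (NMI) and has relatively good robustness.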

12.
A clustering ensemble combines, in a consensus function, the partitions generated by a set of independent base clusterers. This study investigates both the use of particle swarm clustering (PSC) and ensemble pruning (i.e., selective reduction of base partitions) using evolutionary techniques in the design of the consensus function. In the proposed ensemble, PSC plays two roles. First, it is used as a base clusterer. Second, it is employed in the consensus function, arguably the most challenging element of the ensemble. The proposed consensus function exploits a representation for the base partitions that makes cluster alignment unnecessary, allows for the combination of partitions with different numbers of clusters, and supports both disjoint and overlapping (fuzzy, probabilistic, and possibilistic) partitions. Results on both synthetic and real-world data sets show that the proposed ensemble can produce statistically significantly better partitions, in terms of the validity indices used, than the best base partition available in the ensemble. In general, a small number of selected base partitions (below 20% of the total) yields the best results. Moreover, results produced by the proposed ensemble compare favorably to those of state-of-the-art clustering algorithms, and especially to swarm-based clustering ensemble algorithms.

13.
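Research progress on selective clustering ensembles   Total citations: 1, self-citations: 0, citations by others: 1
Traditional clustering ensemble methods usually fuse all generated clustering members to obtain the final clustering result. In supervised learning, selective classifier ensemble methods obtain better results; inspired by this, applying the same approach to clustering ensembles is defined as selective clustering ensemble. This paper surveys the key techniques of selective clustering ensembles and discusses future research directions.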

14.
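The traditional DBSCAN algorithm cannot correctly cluster datasets of non-uniform density, and its results are strongly affected by the neighborhood threshold and density threshold parameters. A new density clustering algorithm with optimized initial points and an adaptive radius is proposed. Reverse nearest neighbors and a similarity matrix are used to find the data sample with the currently highest global density; the density distribution around that sample is analyzed, the neighborhood threshold of the current cluster is computed adaptively, and DBSCAN is then used for clustering. Tests on synthetic datasets and UCI datasets show that, compared with the classical DBSCAN, OPTICS, and RNN-DBSCAN algorithms, the proposed algorithm achieves the best overall values on five evaluation metrics (ARI, NMI, Homogeneity, Completeness, and V-measure), reaching 1.0 on datasets such as Compound and Jain, and has high clustering efficiency and accuracy.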

15.
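A Bagging-based ensemble clustering method is proposed. A new dataset sampling technique is used to generate data subsets, preserving the diversity and maximal relevance of the subsamples as far as possible. An improved k-means clustering algorithm is then applied to generate the individual learners, and the different clustering results of the dataset are processed according to mutual information. Finally, disputed data objects are reassigned to the new clustering result by computing their distances to each cluster center. Experimental results on several UCI benchmark datasets show that the method effectively improves clustering quality.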

16.
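Qiu Baozhi (邱保志), Tang Yamin (唐雅敏). Journal of Computer Applications (《计算机应用》), 2017, 37(12): 3482-3486
To find density skeletons quickly and improve clustering accuracy on high-dimensional data, a clustering algorithm based on fast identification of high-density skeletons (ECLUB) is proposed. First, building on a definition of the local density of an object, high-density skeletons are quickly identified from mutual k-nearest-neighbor consistency and the local density relations of neighboring points. Then, unassigned low-density points are assigned according to their neighborhood relations to obtain the final clustering. Experiments on synthetic and real datasets verify the effectiveness of the proposed algorithm; on the Olivetti Face dataset, ECLUB achieves an adjusted Rand index (ARI) of 0.8779 and a normalized mutual information (NMI) of 0.9622. Compared with the classical density-based clustering algorithm (DBSCAN), the density-center clustering algorithm (CFDP), and the density skeleton clustering algorithm (CLUB), ECLUB is more efficient and clusters high-dimensional data more accurately.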

17.
In this research, a data clustering algorithm named non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC) is proposed to improve clustering quality on categorical data. It is based on the K-modes method and combines a fuzzy genetic algorithm with multi-objective optimization. The proposed method uses fuzzy membership values as chromosomes. Owing to this chromosome setting, the algorithm also integrates a more efficient solution selection technique, which selects a solution from the non-dominated Pareto front based on the largest fuzzy membership. Two objective functions, fuzzy compactness within a cluster (π) and separation among clusters (sep), are used to optimize clustering quality. A series of experiments on three UCI categorical datasets was conducted to compare the clustering results of NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indices for comparison. The experimental results show that the proposed method obtains better clustering quality in terms of ARI, π, and sep simultaneously, with shorter computation time.

18.
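To address the fact that multi-kernel subspace spectral clustering algorithms do not consider noise or the structure of the affinity graph, a new joint low-rank and sparse multi-kernel subspace clustering algorithm (JLSMKC) is proposed. First, subspace learning is performed through joint low-rank and sparse representation, giving the affinity graph both low-rank and sparse structure. Second, a robust multi-kernel low-rank sparse constraint model is built to reduce the effect of noise on the affinity graph and to handle the nonlinear structure of the data. Finally, the multi-kernel approach makes full use of the consensus kernel matrix to improve the quality of the affinity graph. Experimental results on 7 datasets show that JLSMKC outperforms 5 popular multi-kernel clustering algorithms in clustering accuracy (ACC), normalized mutual information (NMI), and purity, while reducing clustering time and improving the block-diagonal quality of the affinity graph. The algorithm has a considerable advantage in clustering performance.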

19.
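Luo Xiaohui (罗晓慧), Li Fanzhang (李凡长), Zhang Li (张莉), Gao Jiajun (高家俊). Journal of Software (《软件学报》), 2020, 31(4): 991-1001
Manifold learning is one of the most important research directions today. The choice of reduced dimension affects the performance of manifold learning methods. When the reduced dimension equals the intrinsic dimension, the intrinsic properties of the original data are easier to discover. However, intrinsic dimension estimation remains a difficult problem in manifold learning. On this basis, a new unsupervised method is proposed, the similar manifold learning algorithm based on selective cluster ensemble (SML-SCE), which avoids estimating the intrinsic dimension and performs well. SML-SCE uses an improved hierarchical balanced K-means (MBKHK) method to generate representative anchors and construct the similarity matrix efficiently. Similar low-dimensional embeddings are then computed in several different dimensions; these embeddings are different representations of the original data, and the diversity among them benefits ensemble learning. SML-SCE therefore adopts a selective clustering ensemble as its combination strategy. For the clusterings obtained by applying K-means to the similar low-dimensional embeddings, the normalized mutual information (NMI) between clusterings is used to measure their weights. Finally, clusterings with low weights are discarded, and a weight-based selective voting scheme produces the final clustering result. Extensive experiments on multiple datasets demonstrate the effectiveness of the method.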

20.
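Li Bin (李斌), Di Lan (狄岚), Wang Shaohua (王少华), Yu Xiaotong (于晓瞳). Journal of Computer Applications (《计算机应用》), 2016, 36(7): 1981-1987
Traditional kernel clustering considers only the relations among elements within a class and ignores the relations between classes; when clustering datasets with fuzzy boundaries or with noise points at the boundaries, boundary points are misclassified. To solve this problem, an improved kernel fuzzy C-means clustering algorithm with between-class maximization (MKFCM) is proposed, building on the kernel fuzzy C-means (KFCM) clustering algorithm. The algorithm considers both within-class and between-class relations and introduces a between-class maximization penalty term in the high-dimensional feature space together with a regulating factor, enlarging the distances between class centers so that samples at the boundaries are partitioned better. In experiments on simulated datasets, the offset distance of the class centers was clearly lower for this algorithm than for the other algorithms. In experiments on synthetic Gaussian datasets, the accuracy (ACC), normalized mutual information (NMI), and Rand index (RI) of the algorithm rose to 0.9132, 0.7575, and 0.9138, respectively.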
