1.
2.
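A clustering ensemble algorithm based on a voting strategy  Total citations: 1, self-citations: 0, citations by others: 1
Ensemble methods are increasingly widely used in classification algorithms and regression models, but in unsupervised machine learning they cannot be applied directly to clustering algorithms because prior knowledge of the dataset is lacking. This paper proposes and implements a clustering ensemble algorithm based on a voting strategy. The algorithm exploits the fact that k-means produces a different partition of the samples on each run, because it selects the initial cluster centers at random, and merges the clustering results of multiple runs by voting to obtain the final result. Experiments on a series of real and synthetic datasets show that this method improves clustering accuracy more effectively than a single clustering algorithm. Building on this, to reduce the computational complexity on high-dimensional data, random partitioning of the attributes into subspaces is applied to the ensemble algorithm; experiments show that the method can also obtain good clustering results on an attribute subspace.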
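A minimal sketch of the voting idea described in this abstract, assuming plain k-means as the base clusterer, label alignment by brute-force permutation against the first run, and a per-sample majority vote. The function names (`kmeans`, `align`, `voting_ensemble`) and the toy data are illustrative and not taken from the paper, whose exact voting scheme the abstract does not specify.

```python
import random
from collections import Counter
from itertools import permutations


def kmeans(points, k, rng, iters=20):
    """Plain k-means with randomly chosen initial centers; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def align(labels, reference, k):
    """Relabel `labels` with the permutation that agrees most with `reference`.

    Brute force over k! permutations, so only suitable for small k (assumption).
    """
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]


def voting_ensemble(points, k, runs=10, seed=0):
    """Run k-means `runs` times, align every run to the first, then majority-vote."""
    rng = random.Random(seed)
    partitions = [kmeans(points, k, rng) for _ in range(runs)]
    reference = partitions[0]
    aligned = [align(p, reference, k) for p in partitions]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = voting_ensemble(points, k=2)
```

Cluster labels are arbitrary across runs, which is why the runs are aligned to a common reference before voting; for larger k, an assignment solver would replace the permutation search.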
3.
4.
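In recent years spectral clustering has been widely studied in the classification field, with path-based and density-based algorithms as two important research directions. Although both achieve good classification results on some datasets, they cannot accurately classify certain special datasets. This paper combines the advantages of the two methods: it searches for paths under multi-level density constraints and builds a new similarity matrix from the paths found. To strengthen robustness to noise, a robustness coefficient derived from local information of the dataset is added, yielding a robust spectral clustering algorithm based on path and density. Experimental results show that the method achieves satisfactory classification results on synthetic datasets and handwriting datasets.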
5.
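When facing the imbalanced-data classification problems that are widespread in practice, most traditional classification algorithms assume that the class distribution is balanced, so their results are biased toward the majority class and perform poorly. To address this, an improved AdaBoost classification algorithm based on clustering-ensemble undersampling is proposed. The algorithm first performs clustering fusion and then, according to sample weights, draws a certain proportion of the majority-class samples from each cluster and combines them with all minority-class samples to form a balanced dataset. Within the AdaBoost framework, misclassifications of the majority class and of the minority class receive different weight adjustments, and the few base classifiers with the best classification performance are selectively ensembled. Experimental results show that the algorithm has certain advantages in handling imbalanced data classification.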
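The undersampling step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: cluster assignments are taken as given, and "according to sample weights" is interpreted as keeping the highest-weight majority samples in each cluster, which the abstract does not confirm. The function name `cluster_undersample` and all data values are hypothetical.

```python
import math


def cluster_undersample(labels, clusters, weights, minority_label, ratio):
    """Return sorted indices of a rebalanced subset.

    Keeps every minority sample plus, within each cluster, the top `ratio`
    fraction of majority samples ranked by weight (selection rule is an assumption).
    """
    keep = [i for i, y in enumerate(labels) if y == minority_label]
    by_cluster = {}
    for i, (y, c) in enumerate(zip(labels, clusters)):
        if y != minority_label:
            by_cluster.setdefault(c, []).append(i)
    for members in by_cluster.values():
        n_keep = math.ceil(ratio * len(members))
        members.sort(key=lambda i: weights[i], reverse=True)
        keep.extend(members[:n_keep])
    return sorted(keep)


# Eight majority samples (label 0) in two clusters, two minority samples (label 1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
weights = [0.1, 0.4, 0.2, 0.3, 0.05, 0.25, 0.15, 0.35, 0.1, 0.1]

selected = cluster_undersample(labels, clusters, weights, minority_label=1, ratio=0.5)
# selected == [1, 3, 5, 7, 8, 9]: two majority samples per cluster plus both minority samples
```

Sampling per cluster rather than globally keeps every region of the majority class represented in the reduced set; the class-dependent AdaBoost weight updates described in the abstract are not shown here.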
6.
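Technical methods of cluster analysis in data mining  Total citations: 1, self-citations: 0, citations by others: 1
Data mining has been a very popular research direction in the information industry in recent years, and cluster analysis is a core technique in data mining. This paper categorizes the various clustering algorithms, analyzes representative algorithms in detail, and compares them from several aspects, providing a reference for studying these algorithms and using them in different fields. It also describes applications of cluster analysis in data mining.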
7.
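Clustering of mixed-attribute data has been a research hotspot in recent years. A clustering algorithm for mixed-attribute data must handle both numeric and categorical attributes well, yet many existing algorithms do not balance the two and therefore fail to produce satisfactory clustering results. For mixed attributes, this paper proposes an intersection-based clustering ensemble algorithm: a relative-density-based algorithm handles the numeric attributes alone, an information-entropy-based algorithm handles the categorical attributes, and an intersection-based fusion algorithm then merges the two clustering members to produce the final clustering. The algorithm is validated on the UCI Zoo dataset and compared with the existing k-prototypes and EM algorithms; it outperforms both in clustering accuracy. The effect of the intersection element ratio used in the fusion algorithm on the results is also discussed.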
8.
9.
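An improved K-means clustering algorithm based on feature weighting  Total citations: 2, self-citations: 0, citations by others: 2
Cluster analysis is one of the key problems in data mining and machine learning. In recent years, to improve clustering quality, the ideas of feature selection and feature weighting from the classification field have been borrowed and introduced, and several feature-weighted clustering algorithms have been proposed. Building on this work, this paper proposes a density-based algorithm for selecting initial centers and, drawing on the feature weighting method proposed in [1], gives an improved feature-weighted K-means algorithm. Experiments show that the algorithm obtains high-quality clustering results fairly stably.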
10.
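Network anomaly detection is a very important topic in network management and has therefore been widely studied in recent years. Many advanced methods for detecting network traffic anomalies have been proposed, but automatically and accurately classifying and identifying network traffic in order to find anomalous traffic remains a very challenging problem. This paper proposes an anomaly detection method based on multidimensional cluster mining, which works in two stages. The first stage automatically clusters network traffic along multiple dimensions using a multidimensional cluster mining algorithm; the second stage detects anomalies by computing the degree of anomaly of each multidimensional cluster. With this method, anomalous traffic is automatically grouped into distinct, meaningful clusters, and analyzing these clusters reveals anomalous behavior in the network. Finally, experiments validate the algorithm, and the results show that the method effectively detects anomalous network traffic.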
11.
Hybrid ensemble approach for classification  Total citations: 1, self-citations: 1, citations by others: 0
This paper presents a novel hybrid ensemble approach for classification in medical databases. The proposed approach is formulated
to cluster extracted features from medical databases into soft clusters using unsupervised learning strategies and fuse the
decisions using parallel data fusion techniques. The idea is to observe associations in the features and fuse the decisions
made by learning algorithms to find the strong clusters which can make impact on overall classification accuracy. The novel
techniques such as parallel neural-based strong clusters fusion and parallel neural network based data fusion are proposed
that allow integration of various clustering algorithms for hybrid ensemble approach. The proposed approach has been implemented
and evaluated on the benchmark databases such as Digital Database for Screening Mammograms, Wisconsin Breast Cancer, and Pima
Indian Diabetics. A comparative performance analysis of the proposed approach with other existing approaches for knowledge
extraction and classification is presented. The experimental results demonstrate the effectiveness of the proposed approach
in terms of improved classification accuracy on benchmark medical databases.
12.
Reza Ghaemi Nasir bin Sulaiman Hamidah Ibrahim Norwati Mustapha 《Artificial Intelligence Review》2011,35(4):287-318
The clustering ensemble has emerged as a prominent method for improving robustness, stability, and accuracy of unsupervised
classification solutions. It combines multiple partitions generated by different clustering algorithms into a single clustering
solution. Genetic algorithms are known as methods with high ability to solve optimization problems including clustering. To
date, significant progress has been contributed to find consensus clustering that will yield better results than existing
clustering. This paper presents a survey of genetic algorithms designed for clustering ensembles. It begins with the introduction
of clustering ensembles and clustering ensemble algorithms. Subsequently, this paper describes a number of suggested genetic-guided
clustering ensemble algorithms, in particular the genotypes, fitness functions, and genetic operations. Next, clustering accuracies
among the genetic-guided clustering ensemble algorithms are compared. This paper concludes that using genetic algorithms in
clustering ensemble improves the clustering accuracy and addresses open questions subject to future research.
13.
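Open Relation Extraction (OpenRE) aims to extract relational facts from open-domain corpora. Most OpenRE methods are limited to unsupervised extraction of relation patterns between named entities, followed by clustering semantically equivalent patterns into a relation cluster; the lack of supervision and the low clustering accuracy hurt the final relation extraction performance. To further improve clustering performance, this paper proposes an Unsupervised Ensemble Clustering (UEC) framework, which combines unsupervised ensemble learning with a multi-step clustering algorithm based on information measures to create high-quality pseudo-labels autonomously. These pseudo-labels serve as supervision to improve the learning of relation features, which in turn guides the clustering process and yields better label quality; finally, the relation types in the text are discovered through multiple rounds of iterative clustering. Experiments on the FewRel and NYT-FB datasets show that the method outperforms other mainstream baseline OpenRE models, reaching F1 scores of 65.2% and 67.1%, respectively.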
14.
This paper discusses new approaches to unsupervised fuzzy classification of multidimensional data. In the developed clustering models, patterns are considered to belong to some but not necessarily all clusters. Accordingly, such algorithms are called ‘semi-fuzzy’ or ‘soft’ clustering techniques. Several models to achieve this goal are investigated and corresponding implementation algorithms are developed. Experimental results are reported.
15.
Carlos Valle Francisco Saravia Héctor Allende Raúl Monge César Fernández 《Neural Processing Letters》2010,32(3):277-291
Ensemble learning has gained considerable attention in different tasks including regression, classification and clustering. Adaboost and Bagging are two popular approaches used to train these models. The former provides accurate estimations in regression settings but is computationally expensive because of its inherently sequential structure, while the latter is less accurate but highly efficient. One of the drawbacks of the ensemble algorithms is the high computational cost of the training stage. To address this issue, we propose a parallel implementation of the Resampling Local Negative Correlation (RLNC) algorithm for training a neural network ensemble in order to acquire a competitive accuracy like that of Adaboost and an efficiency comparable to that of Bagging. We test our approach on both synthetic and real datasets from the UCI and Statlib repositories for the regression task. In particular, our fine-grained parallel approach allows us to achieve a satisfactory balance between accuracy and parallel efficiency.
16.
This paper describes a novel feature selection algorithm for unsupervised clustering, that combines the clustering ensembles method and the population based incremental learning algorithm. The main idea of the proposed unsupervised feature selection algorithm is to search for a subset of all features such that the clustering algorithm trained on this feature subset can achieve the most similar clustering solution to the one obtained by an ensemble learning algorithm. In particular, a clustering solution is firstly achieved by a clustering ensembles method, then the population based incremental learning algorithm is adopted to find the feature subset that best fits the obtained clustering solution. One advantage of the proposed unsupervised feature selection algorithm is that it is dimensionality-unbiased. In addition, the proposed unsupervised feature selection algorithm leverages the consensus across multiple clustering solutions. Experimental results on several real data sets demonstrate that the proposed unsupervised feature selection algorithm is often able to obtain a better feature subset when compared with other existing unsupervised feature selection algorithms.
17.
This paper describes a general fuzzy min-max (GFMM) neural network which is a generalization and extension of the fuzzy min-max clustering and classification algorithms of Simpson (1992, 1993). The GFMM method combines supervised and unsupervised learning in a single training algorithm. The fusion of clustering and classification resulted in an algorithm that can be used as pure clustering, pure classification, or hybrid clustering classification. It exhibits a property of finding decision boundaries between classes while clustering patterns that cannot be said to belong to any of existing classes. Similarly to the original algorithms, the hyperbox fuzzy sets are used as a representation of clusters and classes. Learning is usually completed in a few passes and consists of placing and adjusting the hyperboxes in the pattern space; this is an expansion-contraction process. The classification results can be crisp or fuzzy. New data can be included without the need for retraining. While retaining all the interesting features of the original algorithms, a number of modifications to their definition have been made in order to accommodate fuzzy input patterns in the form of lower and upper bounds, combine the supervised and unsupervised learning, and improve the effectiveness of operations. A detailed account of the GFMM neural network, its comparison with Simpson's fuzzy min-max neural networks, a set of examples, and an application to leakage detection and identification in water distribution systems are given.
18.
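To cope effectively with concept drift in data streams, a data stream classification algorithm that incorporates unsupervised learning is proposed. Built on ensemble classification, the algorithm introduces attribute reduction into the classification process, clusters the data with a clustering algorithm, and determines whether concept drift has occurred by comparing the accuracy of the classification and clustering results. Experiments show that the algorithm achieves good results in terms of combined time cost and accuracy.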
19.
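For the unsupervised multi-instance problem, in which the training bags carry no labels, this paper proposes a multi-instance prediction algorithm that combines clustering and classification. A multi-instance clustering algorithm first performs the clustering task of unsupervised multi-instance learning and, based on the clustering result, each bag in each cluster is converted into a corresponding k-dimensional feature vector. Experiments on the standard multi-instance prediction model and the generalized multi-instance prediction model yield high prediction accuracy, and the algorithm performs well compared with other multi-instance prediction algorithms.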