Similar Literature
20 similar documents found
1.
Ensemble diversity is regarded as a key factor in ensemble learning. In cluster ensemble research there are many ways to generate the set of base clusterings, but few methods are devoted specifically to generating highly diverse ones. This paper therefore proposes CEAN and ICEAN, two methods for generating highly diverse cluster ensembles that inject artificial data points into the algorithm to increase the diversity of the ensemble. Experiments compare CEAN and ICEAN with the ensemble-generation methods commonly used in the literature and show that CEAN and ICEAN do increase the diversity of the generated ensembles, yielding better cluster ensemble results at comparable average member accuracy.
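The abstract does not spell out the algorithm, but the core idea, injecting artificial points so that repeated runs of a base clusterer fall into different solutions, can be sketched as follows. The function name, the uniform noise generator and all parameter values are illustrative assumptions, not the authors' CEAN/ICEAN definitions.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_diverse_members(X, k, n_members, n_artificial, seed=0):
    """Sketch: generate diverse base clusterings by adding random
    artificial points before each k-means run, then keeping only the
    labels of the real points (illustrative, not the CEAN algorithm)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    members = []
    for _ in range(n_members):
        fake = rng.uniform(lo, hi, size=(n_artificial, X.shape[1]))
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=rng.integers(1 << 31)).fit_predict(
            np.vstack([X, fake]))
        members.append(labels[: len(X)])  # drop the artificial points
    return np.array(members)

X = np.random.default_rng(1).normal(size=(200, 2))
members = make_diverse_members(X, k=3, n_members=10, n_artificial=50)
print(members.shape)  # (10, 200): one labelling per ensemble member
```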

2.
Preprocessing the base clustering members is an important step in cluster ensemble algorithms. Many studies have shown that the diversity of the set of base clusterings affects the performance of cluster ensemble algorithms. Current cluster ensemble research centres on generating base clusterings and optimizing the consensus strategy, while research on measuring and optimizing the diversity of base clustering members remains incomplete. This paper proposes a diversity index for base clustering members based on Jaccard similarity and, combining it with the idea of three-way decisions, a three-way filtering method for base clustering members. The method first sets initial three-way decision thresholds α(0) and β(0), then computes each member's diversity index and makes a three-way decision: if a member's diversity index is below the threshold α(0), the member is deleted; if the index is above the threshold β(0), the member is retained; if the index lies between α(0) and β(0), the member is assigned to the three-way decision boundary region to await further judgment. After each round, the algorithm recomputes the thresholds α(1) and β(1) and applies the three-way decision again to the previous boundary region, until no member falls into the boundary region or a specified number of iterations is reached. Comparative experiments show that this diversity-based three-way filtering of base clusterings effectively improves cluster ensemble performance.
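A minimal sketch of the pipeline described above: pairwise Jaccard similarity over co-clustered point pairs, a per-member diversity index, and the iterative three-way filter. The abstract does not give the threshold-update rule, so the mean/standard-deviation update below is an assumption.

```python
import numpy as np
from itertools import combinations

def co_matrix(labels):
    """Boolean co-membership matrix of one partition."""
    return labels[:, None] == labels[None, :]

def jaccard(a, b):
    """Jaccard similarity between two partitions over co-clustered pairs."""
    ca, cb = co_matrix(a), co_matrix(b)
    iu = np.triu_indices(len(a), k=1)          # each point pair once
    inter = np.sum(ca[iu] & cb[iu])
    union = np.sum(ca[iu] | cb[iu])
    return inter / union if union else 1.0

def diversity(members):
    """Diversity index per member: 1 - mean Jaccard to the others."""
    n = len(members)
    sim = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        sim[i, j] = sim[j, i] = jaccard(members[i], members[j])
    return 1 - sim.sum(axis=1) / (n - 1)

def three_way_filter(members, alpha, beta, max_iter=5):
    """Sketch of the three-way filter: drop members below alpha, keep
    members above beta, re-decide the boundary region with recomputed
    thresholds (the threshold-update rule here is an assumption)."""
    keep, pool = [], list(range(len(members)))
    for _ in range(max_iter):
        if len(pool) < 2:
            keep += pool
            break
        div = diversity([members[i] for i in pool])
        boundary = []
        for idx, d in zip(pool, div):
            if d > beta:
                keep.append(idx)               # retain
            elif d >= alpha:
                boundary.append(idx)           # defer to next round
        if not boundary:
            break
        pool = boundary
        alpha, beta = float(np.mean(div) - np.std(div)), float(np.mean(div))
    return keep
```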

3.
An Improved Adaptive Cluster Ensemble Selection Method
徐森  皋军  花小朋  李先锋  徐静 《自动化学报》2018,44(11):2103-2112
The Adaptive Cluster Ensemble Selection method (ACES) judges ensemble stability in a way that is not objective and selects members in a way that is not entirely reasonable; to address these problems, this paper proposes an improved method, IACES. IACES judges the stability of a cluster ensemble by its overall average normalized mutual information (NMI): if the ensemble is stable, members of high quality and moderate diversity are selected; otherwise, members of high quality are selected. Experimental results on multiple benchmark datasets verify the effectiveness of IACES: 1) IACES judges ensemble stability accurately, whereas ACES misjudges some unstable ensembles as stable; 2) compared with other member selection methods, ensembles built from the members selected by IACES achieve better clustering results in most cases and better average results on all datasets.
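A compact sketch of the selection rule, assuming NMI-based member quality in the style of ACES-like methods; the stability cutoff of 0.5, the trade-off weight and the "moderate diversity" scoring are illustrative assumptions rather than the published IACES formulas.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score as nmi

def iaces_select(members, n_select, stable_threshold=0.5):
    """Sketch of the IACES selection rule. Stability is judged by the
    ensemble's overall average pairwise NMI, as in the paper; the 0.5
    cutoff and the quality/diversity trade-off below are assumptions."""
    m = len(members)
    sim = np.zeros((m, m))
    for i, j in combinations(range(m), 2):
        sim[i, j] = sim[j, i] = nmi(members[i], members[j])
    quality = sim.sum(axis=1) / (m - 1)        # avg NMI with the others
    if quality.mean() >= stable_threshold:     # stable ensemble
        diversity = 1 - quality
        moderate = np.abs(diversity - np.median(diversity))
        score = quality - 0.5 * moderate       # prefer moderate diversity
    else:                                      # unstable: quality only
        score = quality
    return np.argsort(score)[-n_select:]       # indices of chosen members
```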

4.
Diversity among learners is a key factor in the performance of ensemble learning. Research on classifier ensembles is plentiful, while research on cluster ensembles is comparatively scarce. Based on the intrinsic characteristics of the clustering problem, this paper proposes a new cluster ensemble learning method: a clustering validity index measures the performance differences among the clustering results, each result is assigned a weight according to its validity score, and a weighted-voting decision determines the consensus clustering and the best number of clusters. Theoretical analysis and experimental results demonstrate the feasibility and efficiency of the new method.
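A sketch of the weighted-voting consensus, with the silhouette score standing in for the (unnamed) clustering validity index and an average-linkage cut of the weighted co-association matrix as the decision step; both choices are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def weighted_consensus(X, candidate_ks, final_k):
    """Sketch: weight each base clustering by its silhouette score and
    fuse via a weighted co-association matrix. Silhouette as the validity
    index and the hierarchical cut are illustrative assumptions."""
    n = len(X)
    co, total = np.zeros((n, n)), 0.0
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=k).fit_predict(X)
        w = max(silhouette_score(X, labels), 1e-6)   # validity weight
        co += w * (labels[:, None] == labels[None, :])
        total += w
    co /= total
    tree = linkage(squareform(1 - co, checks=False), method="average")
    return fcluster(tree, t=final_k, criterion="maxclust")
```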

5.
Research on a Neural Network Classifier Ensemble Method Based on k-means Clustering
Since diversity is a necessary condition for ensemble learning, this paper studies a method of improving the diversity of a neural network classifier ensemble based on k-means clustering. Many classifier models are trained on the training set with a neural network learning algorithm, and each classifier's predictions on a validation set are taken as a data object for clustering. k-means is then applied to cluster these objects, and one representative classifier is selected from each resulting cluster to form the members of the ensemble. Finally, with voting as the combination method, experiments evaluate the performance of this diversity-enhancing approach and compare it with the common ensemble learning methods bagging and adaboost.
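The procedure above maps directly onto a few lines of scikit-learn. A minimal end-to-end sketch follows; the synthetic dataset, the 20 candidate networks and the ensemble size of 5 are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# train many candidate neural network classifiers on the training set
models = [MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                        random_state=s).fit(X_tr, y_tr) for s in range(20)]
# each classifier's validation predictions become one clustering object
preds = np.array([m.predict(X_val) for m in models])

# cluster the classifiers and keep one representative per cluster
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(preds)
ensemble = []
for c in range(5):
    idx = np.where(km.labels_ == c)[0]
    d = np.linalg.norm(preds[idx] - km.cluster_centers_[c], axis=1)
    ensemble.append(models[idx[d.argmin()]])   # closest to cluster centre

# combine the selected members by simple majority voting
votes = np.array([m.predict(X_val) for m in ensemble]).mean(axis=0)
print("ensemble accuracy:", ((votes > 0.5).astype(int) == y_val).mean())
```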

6.
Research on Cluster Ensemble Methods Based on Hierarchical Clustering
A cluster ensemble is more robust and accurate than any single clustering method. It consists of two main parts: the generation of the individual members and the fusion of their results. Here the individual members are obtained with the k-means algorithm and then fused with the single-linkage, complete-linkage and average-linkage methods of hierarchical clustering. The Adjusted Rand Index (ARI) is used to evaluate the performance of the cluster ensemble methods. Experimental results show that the clustering performance of average-linkage fusion is better than that of single linkage and complete linkage. The relationship between the fusion methods' clustering accuracy and the ensemble size is also studied and discussed.
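A minimal end-to-end sketch of this pipeline: k-means members, a co-association matrix, fusion with the three linkage rules, and ARI scoring. Synthetic blobs stand in for the experimental datasets.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# individual members: k-means runs with different random seeds
members = [KMeans(n_clusters=3, n_init=1, random_state=s).fit_predict(X)
           for s in range(10)]
# co-association matrix: how often each pair is clustered together
co = np.mean([m[:, None] == m[None, :] for m in members], axis=0)
dist = squareform(1 - co, checks=False)

# fuse with the three hierarchical linkage rules, score against truth
for method in ("single", "complete", "average"):
    labels = fcluster(linkage(dist, method=method), t=3, criterion="maxclust")
    print(method, adjusted_rand_score(y, labels))
```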

7.
罗会兰  危辉 《计算机科学》2010,37(11):234-238
This paper proposes CBEST, a mixed-data clustering algorithm based on ensemble and spectral clustering techniques. It uses a cluster ensemble to produce a similarity between mixed-type data objects; this similarity measure makes no assumptions about the distribution of feature values. Spectral clustering is then applied to the resulting similarity matrix to obtain the clustering of the mixed data. Experiments on a large number of real and synthetic datasets verify the effectiveness of CBEST and its robustness to noise. Comparative studies against other mixed-data clustering algorithms also demonstrate its superior performance. In addition, CBEST can effectively incorporate prior knowledge, with parameters that set the weights of different attributes in the clustering.
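A sketch of the CBEST idea: build an ensemble over random feature subsets of the (one-hot encoded) mixed data, take the co-association frequency as the similarity, and spectral-cluster it. The random subspace generator is an assumption, not CBEST's exact ensemble generator.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans, SpectralClustering

def cbest_sketch(df, k, n_members=20, seed=0):
    """Sketch: ensemble-derived similarity for mixed data, then spectral
    clustering on it (the subspace sampling scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    num = df.select_dtypes(include="number")
    cat = pd.get_dummies(df.select_dtypes(exclude="number"))
    Z = np.hstack([num.to_numpy(float), cat.to_numpy(float)])
    n = len(df)
    co = np.zeros((n, n))
    for i in range(n_members):
        cols = rng.choice(Z.shape[1], size=max(1, Z.shape[1] // 2),
                          replace=False)
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=i).fit_predict(Z[:, cols])
        co += labels[:, None] == labels[None, :]
    return SpectralClustering(n_clusters=k, affinity="precomputed",
                              random_state=seed).fit_predict(co / n_members)
```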

8.
Distance and dissimilarity measures are basic concepts in cluster analysis and form the core of many clustering algorithms. In classical cluster analysis, the dissimilarity index is a simple function of distance. For datasets with mixed attributes, this paper proposes two distance definitions that generalize the dissimilarity measure into a multivariate function of distance, cluster size and other factors, so that clustering algorithms originally applicable only to numerical or categorical data can be applied to mixed-attribute data. Experimental results show that the new distance definitions and dissimilarity measures improve clustering quality.
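A Gower-style sketch of the basic idea of a mixed-attribute dissimilarity; the paper's actual definitions also fold in factors such as cluster size, which this toy function omits.

```python
import numpy as np

def mixed_distance(x_num, y_num, x_cat, y_cat, ranges, w_num=1.0, w_cat=1.0):
    """Gower-style distance for mixed attributes: range-normalised
    numeric differences plus categorical mismatches. A sketch of the
    general idea only, not the paper's two definitions."""
    d_num = np.abs(x_num - y_num) / ranges          # each in [0, 1]
    d_cat = (x_cat != y_cat).astype(float)          # 0/1 mismatch
    total_w = w_num * len(d_num) + w_cat * len(d_cat)
    return (w_num * d_num.sum() + w_cat * d_cat.sum()) / total_w

# toy usage: two records with two numeric and two categorical attributes
ranges = np.array([10.0, 5.0])
print(mixed_distance(np.array([3.0, 1.0]), np.array([7.0, 2.0]),
                     np.array(["red", "A"]), np.array(["blue", "A"]),
                     ranges))
```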

9.
Since diversity is an important condition for ensemble learning, this paper studies a method of improving the diversity of a neural network ensemble based on fuzzy clustering. The weights and thresholds of a large number of weak classifiers are extracted and used as the data objects for fuzzy clustering; the clustering results then serve as the weights and thresholds of the individual networks in the ensemble. Simulation experiments on standard datasets confirm the effectiveness of the method.
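A sketch under a strong assumption about the paper's method: the weak networks' flattened weight/threshold vectors are clustered with a minimal fuzzy c-means, and the resulting cluster centres are reused as the weight vectors of the ensemble's individual networks.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means returning cluster centres; a stand-in for
    the paper's fuzzy clustering of network weight/threshold vectors."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                      # membership columns sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = Um @ X / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=0)
    return centers

# Sketch: flatten each weak network's weights and thresholds into one
# vector, cluster them, and use the c centres as the ensemble's weights.
weights = np.random.default_rng(1).normal(size=(50, 120))  # 50 weak nets
centers = fuzzy_cmeans(weights, c=5)
print(centers.shape)  # (5, 120): weight vectors for a 5-member ensemble
```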

10.
Model Clustering and Its Application in Ensemble Learning
Clustering is an important data analysis tool with broad application prospects in data mining, pattern recognition and other fields. Usually the objects handled by a clustering algorithm are conventional data collections, which can be represented as points in Euclidean space. In some tasks, however, the objects to be clustered are not explicit data points but abstract models, such as neural networks, decision trees and support vector machines. By defining a generalized distance (whose concrete definition may differ from task to task), this paper studies clustering methods whose objects are general models and proposes a general clustering algorithm framework for model objects. As one application of model clustering, a method of improving ensemble learning diversity by clustering neural network models is studied, and experiments investigate the relationships among the number of clusters, the ensemble size, and ensemble performance.
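A sketch of the framework with one concrete choice of generalized distance, the disagreement rate of two models on a probe set; that choice, and the decision trees used as the models, are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# one possible generalized distance between models: disagreement on a
# probe set (an illustrative assumption, not the paper's only option)
X, y = make_classification(n_samples=400, random_state=0)
probe = X[:100]

models = [DecisionTreeClassifier(max_depth=3, random_state=s).fit(X, y)
          for s in range(12)]
preds = np.array([m.predict(probe) for m in models])

n = len(models)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = np.mean(preds[i] != preds[j])

labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=4, criterion="maxclust")
print(labels)  # cluster id per model; pick one model per cluster
```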

11.
Research Progress on Selective Cluster Ensembles
Traditional cluster ensemble methods fuse all of the generated clustering members to obtain the final clustering result. In supervised learning, selective classifier fusion achieves better results; taking inspiration from selective classifier fusion, applying the same idea to cluster ensembles is defined as selective cluster ensemble (selective clustering fusion). This paper surveys the key techniques of selective cluster ensembles and discusses future research directions.

12.
Ensemble systems are classification structures that apply a two-level decision-making process, in which the first level produces the outputs of the individual classifiers and the second level produces the output of the combination method (final output). Although ensemble systems have been proven to be efficient for pattern recognition tasks, their design is not an easy task. This article investigates the influence of two diversity measures when used explicitly to guide the design of ensemble systems. These diversity measures were proposed recently, and they proved to be very interesting for the diversity-accuracy dilemma. To perform this investigation, we use two well-known optimization techniques, genetic algorithms and tabu search, in their mono-objective and multiobjective versions. As objectives of the optimization techniques, we use the error rate and the two diversity measures, as well as all possible combinations of these three objectives. In this article, we aim to analyze which set of objectives generates the most accurate ensembles. In addition, we analyze whether the diversity measures (good and bad diversities) have a positive effect on the design of ensemble systems, mainly whether they can replace the error rate as an optimization objective without incurring significant losses in the accuracy of the generated ensembles.
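As a sketch of how error and diversity enter such an optimization, here is a generic pairwise-disagreement diversity measure and a weighted mono-objective fitness; the paper's actual good/bad diversity measures and the GA/tabu search machinery are not reproduced.

```python
import numpy as np

def disagreement(preds):
    """Average pairwise disagreement between member predictions; a generic
    diversity measure standing in for the paper's good/bad diversities."""
    m = len(preds)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return float(np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]))

def fitness(preds, y, lam=0.5):
    """Mono-objective fitness for a GA/tabu candidate ensemble: majority-vote
    error minus a diversity bonus (binary labels and lam are assumptions)."""
    votes = (np.mean(preds, axis=0) > 0.5).astype(int)
    return float(np.mean(votes != y)) - lam * disagreement(preds)
```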

13.
Cluster ensembles have been shown to outperform any single standard clustering algorithm in accuracy and robustness across different data collections. This meta-learning formalism also helps users overcome the dilemma of selecting an appropriate technique and the corresponding parameters for a given set of data. Almost two decades after the first publication of its kind, the method has proven effective for many problem domains, especially microarray data analysis and its downstream applications. Recently, it has been greatly extended both in theoretical modelling and in deployment to problem solving. This survey attempts to match this emerging attention with a provision of the fundamental basis and theoretical details of state-of-the-art methods found in the present literature. It covers the range of ensemble generation strategies, the summarization and representation of ensemble members, and the topic of consensus clustering. The review also includes different applications and extensions of cluster ensembles, with several research issues and challenges highlighted.

14.
New Progress in Clustering Combination Research
As an emerging research hotspot in cluster analysis, clustering combination methods integrate two or more clustering methods to improve performance. This paper surveys the latest progress from two aspects, clustering diversity and consensus functions, and explores applying the idea of neural network combination to clustering combination. Finally, possible future research directions are pointed out.

15.
Ke, Minlong, Fernanda L., Xin 《Neurocomputing》2009, 72(13-15):2796
Negative correlation learning (NCL) is a successful approach to constructing neural network ensembles. In batch learning mode, NCL outperforms many other ensemble learning approaches. Recently, NCL has also been shown to be a potentially powerful approach to incremental learning, although the advantages of NCL have not yet been fully exploited. In this paper, we propose a selective NCL (SNCL) algorithm for incremental learning. Concretely, every time a new training data set is presented, the previously trained neural network ensemble is cloned and the cloned ensemble is trained on the new data set. The new ensemble is then combined with the previous ensemble, and a selection process prunes the whole ensemble to a fixed size. This paper is an extended version of our preliminary paper on SNCL. Compared to the previous work, it presents a deeper investigation into SNCL, considering different objective functions for the selection process and comparing SNCL with other NCL-based incremental learning algorithms on two more real-world bioinformatics data sets. Experimental results demonstrate the advantage of SNCL. Further, comparisons between SNCL and other existing incremental learning algorithms, such as Learn++ and ARTMAP, are also presented.
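One SNCL increment can be sketched as below, with plain warm-started back-propagation standing in for NCL training and greedy validation accuracy standing in for the selection objective (the paper compares several objectives). Binary labels and majority voting are further assumptions.

```python
import copy
import numpy as np
from sklearn.neural_network import MLPClassifier

def sncl_step(ensemble, X_new, y_new, X_val, y_val, size):
    """One SNCL increment (sketch): clone the ensemble, continue training
    the clones on the new data, merge old and new, then greedily prune
    back to a fixed size by validation accuracy (binary labels assumed)."""
    clones = []
    for m in ensemble:
        c = copy.deepcopy(m).set_params(warm_start=True)
        clones.append(c.fit(X_new, y_new))    # continue from current weights
    pool, chosen = ensemble + clones, []
    while len(chosen) < size and pool:
        best, best_acc = None, -1.0
        for m in pool:                        # greedy forward selection
            votes = np.mean([c.predict(X_val) for c in chosen + [m]], axis=0)
            acc = float(np.mean((votes > 0.5).astype(int) == y_val))
            if acc > best_acc:
                best, best_acc = m, acc
        chosen.append(best)
        pool.remove(best)
    return chosen

# usage sketch: ensemble = [MLPClassifier(max_iter=300, random_state=s)
#                           .fit(X0, y0) for s in range(5)]
```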

16.
In order to improve the generalisation ability of the maritime surveillance radar, a novel ensemble selection technique, termed Optimisation and Dynamic Selection (ODS), is proposed. During the optimisation phase, the non-dominated sorting genetic algorithm II (NSGA-II) for multi-objective optimisation is used to find the Pareto front, i.e. a set of classifier ensembles representing different trade-offs between classification error and diversity. During the dynamic selection phase, a meta-learning method is used to predict whether a candidate ensemble is competent enough to classify a query instance, based on three different aspects: the feature space, the decision space and the extent of consensus. The classification performance and time complexity of ODS are compared against nine other ensemble methods using a self-built fully polarimetric high-resolution range profile dataset. The experimental results clearly show the effectiveness of ODS. In addition, the influence of the choice of diversity measures is studied concurrently.
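A sketch of the consensus-extent criterion only, one of the three aspects ODS feeds to its meta-learner; the NSGA-II optimisation phase and the other meta-features are not reproduced.

```python
import numpy as np

def pick_by_consensus(candidate_ensembles, x):
    """For a query x, choose the Pareto-front ensemble whose members agree
    most on x; a simplified stand-in for ODS's competence prediction."""
    best, best_score = None, -1.0
    for ens in candidate_ensembles:
        votes = np.array([m.predict(x.reshape(1, -1))[0] for m in ens])
        _, counts = np.unique(votes, return_counts=True)
        consensus = counts.max() / len(ens)    # fraction in the majority
        if consensus > best_score:
            best, best_score = ens, consensus
    return best
```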

17.
Selective ensemble learning is a current research hotspot in machine learning. Because selective ensemble is an NP-hard problem, heuristic methods are typically used to transform it into other problems and obtain approximately optimal solutions; since the starting points and perspectives of the various algorithms differ, the large body of existing selective ensemble algorithms appears complicated and unsystematic. To help researchers quickly grasp and apply the latest progress in this field, this paper classifies selective ensemble algorithms into four categories according to the core strategy of the selection process: iterative optimization, ranking, clustering, and pattern mining. Twenty commonly used datasets from the UCI repository are then used to compare typical algorithms experimentally in terms of predictive performance, selection time, and the size of the resulting ensemble classifier. Finally, the advantages and disadvantages of each category are summarized and future research priorities for selective ensembles are outlined.

18.
Decision trees are a kind of off-the-shelf predictive model, and they have been successfully used as the base learners in ensemble learning. To construct a strong classifier ensemble, the individual classifiers should be accurate and diverse. However, diversity measurement remains a mystery, although there have been many attempts. We conjecture that a deficiency of previous diversity measures lies in the fact that they consider only behavioral diversity, i.e., how the classifiers behave when making predictions, neglecting the fact that classifiers may be potentially different even when they make the same predictions. Based on this recognition, in this paper, we advocate considering structural diversity in addition to behavioral diversity, and propose the TMD (tree matching diversity) measure for decision trees. To investigate the usefulness of TMD, we empirically evaluate the performance of selective ensemble approaches with decision forests by incorporating different diversity measures. Our results validate that by considering structural and behavioral diversity together, stronger ensembles can be constructed. This may raise a new direction for designing better diversity measures and ensemble methods.
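A much-simplified structural comparison in the spirit of TMD for scikit-learn trees: internal nodes match when they split on the same feature, and matching recurses into the children. The real TMD measure is more elaborate; this only illustrates structural (rather than behavioral) diversity.

```python
from sklearn.tree import DecisionTreeClassifier

def tree_match(t1, n1, t2, n2):
    """Recursively count matched internal nodes: a node pair matches when
    both split on the same feature; matching then recurses into children.
    A simplified stand-in for the paper's TMD tree-matching measure."""
    leaf1 = t1.children_left[n1] == -1
    leaf2 = t2.children_left[n2] == -1
    if leaf1 or leaf2 or t1.feature[n1] != t2.feature[n2]:
        return 0
    return (1 + tree_match(t1, t1.children_left[n1], t2, t2.children_left[n2])
              + tree_match(t1, t1.children_right[n1], t2, t2.children_right[n2]))

def structural_diversity(clf1, clf2):
    """1 - matched nodes / max internal nodes: higher means more diverse."""
    t1, t2 = clf1.tree_, clf2.tree_
    internal = max((t1.children_left != -1).sum(),
                   (t2.children_left != -1).sum())
    return 1 - tree_match(t1, 0, t2, 0) / max(int(internal), 1)
```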

19.
Clustering ensembles combine multiple partitions of data into a single clustering solution of better quality. Inspired by the success of supervised bagging and boosting algorithms, we propose non-adaptive and adaptive resampling schemes for the integration of multiple independent and dependent clusterings. We investigate the effectiveness of bagging techniques, comparing the efficacy of sampling with and without replacement, in conjunction with several consensus algorithms. In our adaptive approach, individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given dataset. The sampling probability for each data point dynamically depends on the consistency of its previous assignments in the ensemble. New subsamples are then drawn to focus increasingly on the problematic regions of the input feature space. A measure of data point clustering consistency is therefore defined to guide this adaptation. Experimental results show improved stability and accuracy for clustering structures obtained via bootstrapping, subsampling, and adaptive techniques. A meaningful consensus partition for an entire set of data points emerges from multiple clusterings of bootstraps and subsamples. Subsamples of small size can reduce computational cost and measurement complexity for many unsupervised data mining tasks with distributed sources of data. This empirical study also compares the performance of adaptive and non-adaptive clustering ensembles using different consensus functions on a number of datasets. By focusing attention on the data points with the least consistent clustering assignments, one can better approximate the inter-cluster boundaries, or at least create diversity in the boundaries, which improves clustering accuracy and convergence speed as a function of the number of partitions in the ensemble. The comparison of adaptive and non-adaptive approaches is a new avenue for research, and this study helps to pave the way for the useful application of distributed data mining methods.
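A sketch of the adaptive loop: each round clusters a subsample drawn with point-wise probabilities that favour the least consistently assigned points. The certainty formula derived from co-association rates is an illustrative proxy for the paper's consistency measure.

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_members(X, k, n_members=10, frac=0.7, seed=0):
    """Sketch of the adaptive scheme: after each clustering, score each
    point's consistency from the co-association rates observed so far and
    oversample the least consistent points next round."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co, cnt = np.zeros((n, n)), np.zeros((n, n))
    prob = np.full(n, 1.0 / n)
    members = []
    for t in range(n_members):
        idx = rng.choice(n, size=int(frac * n), replace=False, p=prob)
        labels = np.full(n, -1)                 # -1 marks unsampled points
        labels[idx] = KMeans(n_clusters=k, n_init=5,
                             random_state=t).fit_predict(X[idx])
        members.append(labels)
        seen = labels != -1
        co[np.ix_(seen, seen)] += labels[seen, None] == labels[None, seen]
        cnt[np.ix_(seen, seen)] += 1
        rate = np.divide(co, cnt, out=np.full((n, n), 0.5), where=cnt > 0)
        certainty = (np.abs(rate - 0.5) * 2).mean(axis=1)
        prob = (1 - certainty) + 1e-6           # focus on inconsistent points
        prob /= prob.sum()
    return members
```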

20.
This paper addresses the problem of object category classification by committees or ensembles of classifiers, each based on one diverse codebook. Two methods of constructing visual codebook ensembles are proposed. The first introduces diverse individual visual codebooks using different clustering algorithms. The second uses visual codebooks of different sizes to construct an ensemble with high diversity. Codebook ensembles are trained to capture and convey image properties from different aspects. Based on these codebook ensembles, different types of image representation can be acquired, and a classifier ensemble can be trained on the different representations derived from the same training image set. Using this classifier ensemble to categorize new images leads to improved performance. Detailed experimental analysis on a Pascal VOC challenge dataset reveals that the present ensemble approach performs well, consistently improves the performance of visual object classifiers, and achieves state-of-the-art categorization performance.
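The second scheme (codebooks of different vocabulary sizes) can be sketched as follows; LinearSVC as the per-codebook classifier, the vocabulary sizes and the descriptor subsampling are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bow_histogram(desc, codebook):
    """Quantise one image's local descriptors against one codebook."""
    words = codebook.predict(desc)
    return np.bincount(words, minlength=codebook.n_clusters) / len(desc)

def train_codebook_ensemble(train_descs, y, sizes=(64, 128, 256), seed=0):
    """One codebook per vocabulary size, one classifier per codebook."""
    rng = np.random.default_rng(seed)
    pool = np.vstack(train_descs)
    sample = pool[rng.choice(len(pool), size=min(5000, len(pool)),
                             replace=False)]
    pairs = []
    for i, k in enumerate(sizes):
        cb = KMeans(n_clusters=k, n_init=3, random_state=i).fit(sample)
        H = np.array([bow_histogram(d, cb) for d in train_descs])
        pairs.append((cb, LinearSVC().fit(H, y)))
    return pairs

def classify(pairs, desc):
    """Majority vote of the per-codebook classifiers for one image."""
    votes = [clf.predict(bow_histogram(desc, cb).reshape(1, -1))[0]
             for cb, clf in pairs]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[counts.argmax()]
```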
