首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
为了提高分类器集成性能,提出了一种基于聚类算法与排序修剪结合的分类器集成方法。首先将混淆矩阵作为量化基分类器间差异度的工具,通过聚类将分类器划分为若干子集;然后提出一种排序修剪算法,以距离聚类中心最近的分类器为起点,根据分类器的距离对差异度矩阵动态加权,以加权差异度作为排序标准对子集中的分类器进行按比例修剪;最后使用投票法对选出的基分类器进行集成。同时与多种集成方法在UCI数据库中的10组数据集上进行对比与分析,实验结果表明基于聚类与排序修剪的分类器选择方法有效提升了集成系统的分类能力。  相似文献   

2.
为改进SVM对不均衡数据的分类性能,提出一种基于拆分集成的不均衡数据分类算法,该算法对多数类样本依据类别之间的比例通过聚类划分为多个子集,各子集分别与少数类合并成多个训练子集,通过对各训练子集进行学习获得多个分类器,利用WE集成分类器方法对多个分类器进行集成,获得最终分类器,以此改进在不均衡数据下的分类性能.在UCI数据集上的实验结果表明,该算法的有效性,特别是对少数类样本的分类性能.  相似文献   

3.
向欣  陆歌皓 《计算机应用研究》2021,38(12):3604-3610
针对现实信用评估业务中样本类别不平衡和代价敏感的情况,为降低信用风险评估的误分类损失,提出一种基于DESMID-AD动态选择的信用评估集成模型,根据每一个测试样本的特点动态地选择合适的基分类器对其进行信用预测.为提高模型对信用差客户(小类)的识别能力,在基分类器训练前使用过采样的方法对训练数据作类别平衡,采用元学习的方式基于多个指标进行基分类器的性能评估并在此阶段设计权重机制增强小类的影响.在三个公开信用评估数据集上,以AUC、一型、二型错误率以及误分类代价作为评价指标,与九种信用评估常用模型做比较,证明了该方法在信用评估领域的有效性和可行性.  相似文献   

4.
《Information Fusion》2003,4(2):87-100
A popular method for creating an accurate classifier from a set of training data is to build several classifiers, and then to combine their predictions. The ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method. In this case, the ensemble consists of multiple classifiers constructed by randomly selecting feature subsets, that is, classifiers constructed in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random subspaces. The EFS_SBC algorithm includes a hill-climbing-based refinement cycle, which tries to improve the accuracy and diversity of the base classifiers built on random feature subsets. We conduct a number of experiments on a collection of 21 real-world and synthetic data sets, comparing the EFS_SBC ensembles with the single simple Bayes, and with the boosted simple Bayes. In many cases the EFS_SBC ensembles have higher accuracy than the single simple Bayesian classifier, and than the boosted Bayesian ensemble. We find that the ensembles produced focusing on diversity have lower generalization error, and that the degree of importance of diversity in building the ensembles is different for different data sets. We propose several methods for the integration of simple Bayesian classifiers in the ensembles. In a number of cases the techniques for dynamic integration of classifiers have significantly better classification accuracy than their simple static analogues. We suggest that a reason for that is that the dynamic integration better utilizes the ensemble coverage than the static integration.  相似文献   

5.
为解决垃圾网页检测过程中的“维数灾难”和不平衡分类问题,提出一种基于免疫克隆特征选择和欠采样(US)集成的二元分类器算法。首先,使用欠采样技术将训练样本集大类抽样成多个与小类样本数相近的样本集,再将其分别与小类样本合并构成多个平衡的子训练样本集;然后,设计一种免疫克隆算法遴选出多个最优的特征子集;基于最优特征子集对平衡的子样本集进行投影操作,生成平衡数据集的多个视图;最后,用随机森林(RF)分类器对测试样本进行分类,采用简单投票法确定测试样本的最终类别。在WEBSPAM UK-2006数据集上的实验结果表明,该集成分类器算法应用于垃圾网页检测:与随机森林算法及其Bagging和AdaBoost集成分类器算法相比,准确率、F1测度、AUC等指标均提高11%以上;与其他最优的研究结果相比,该集成分类器算法在F1测度上提高2%,在AUC上达到最优。  相似文献   

6.
During the last few years there has been marked attention towards hybrid and ensemble systems development, having proved their ability to be more accurate than single classifier models. However, among the hybrid and ensemble models developed in the literature there has been little consideration given to: 1) combining data filtering and feature selection methods 2) combining classifiers of different algorithms; and 3) exploring different classifier output combination techniques other than the traditional ones found in the literature. In this paper, the aim is to improve predictive performance by presenting a new hybrid ensemble credit scoring model through the combination of two data pre-processing methods based on Gabriel Neighbourhood Graph editing (GNG) and Multivariate Adaptive Regression Splines (MARS) in the hybrid modelling phase. In addition, a new classifier combination rule based on the consensus approach (ConsA) of different classification algorithms during the ensemble modelling phase is proposed. Several comparisons will be carried out in this paper, as follows: 1) Comparison of individual base classifiers with the GNG and MARS methods applied separately and combined in order to choose the best results for the ensemble modelling phase; 2) Comparison of the proposed approach with all the base classifiers and ensemble classifiers with the traditional combination methods; and 3) Comparison of the proposed approach with recent related studies in the literature. Five of the well-known base classifiers are used, namely, neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT), and naïve Bayes (NB). The experimental results, analysis and statistical tests prove the ability of the proposed approach to improve prediction performance against all the base classifiers, hybrid and the traditional combination methods in terms of average accuracy, the area under the curve (AUC) H-measure and the Brier Score. The model was validated over seven real world credit datasets.  相似文献   

7.
Non-parametric classification procedures based on a certainty measure and nearest neighbour rule for motor unit potential classification (MUP) during electromyographic (EMG) signal decomposition were explored. A diversity-based classifier fusion approach is developed and evaluated to achieve improved classification performance. The developed system allows the construction of a set of non-parametric base classifiers and then automatically chooses, from the pool of base classifiers, subsets of classifiers to form candidate classifier ensembles. The system selects the classifier ensemble members by exploiting a diversity measure for selecting classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between base classifier outputs, i.e., to measure the degree of decision similarity between base classifiers. The pool of base classifiers consists of two kinds of classifiers: adaptive certainty-based classifiers (ACCs) and adaptive fuzzy k-NN classifiers (AFNNCs) and both utilize different types of features. Once the patterns are assigned to their classes, by the classifier fusion system, firing pattern consistency statistics for each class are calculated to detect classification errors in an adaptive fashion. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers and the performance of the fixed ensemble containing the full set of base classifiers. Across the EMG signal data sets used, the diversity-based classifier fusion approach had better average classification performance overall, especially in terms of reducing classification errors.  相似文献   

8.
基于粗集理论的选择性支持向量机集成   总被引:1,自引:0,他引:1       下载免费PDF全文
集成分类器的性能很大程度决定于各成员分类器的构造和对各成员分类器的组合方法。提出一种基于粗集理论的选择性支持向量机集成算法,该算法首先利用粗集技术产生一个属性约简集合,然后以各约简集为样本属性空间构造各成员分类器,其次通过对各成员分类器精度与差异度的计算,选择既满足个体的精度要求,又满足个体差异性要求的成员分类器进行集成。最后通过对UCI上一组实验数据的测试,证实该方法能够有效提高支持向量机的推广性能。  相似文献   

9.
曹鹏  李博  栗伟  赵大哲 《计算机应用》2013,33(2):550-553
针对大规模数据的分类准确率低且效率下降的问题,提出一种结合X-means聚类的自适应随机子空间组合分类算法。首先使用X-means聚类方法,保持原有数据结构的同时,把复杂的数据空间自动分解为多个样本子空间进行分治学习;而自适应随机子空间组合分类器,提升了基分类器的差异性并自动确定基分类器数量,提升了组合分类器的鲁棒性及分类准确性。该算法在人工和UCI数据集上进行了测试,并与传统单分类和组合分类算法进行了比较。实验结果表明,对于大规模数据集,该方法具有更好的分类精度和健壮性,并提升了整体算法的效率。  相似文献   

10.
Recent researches in fault classification have shown the importance of accurately selecting the features that have to be used as inputs to the diagnostic model. In this work, a multi-objective genetic algorithm (MOGA) is considered for the feature selection phase. Then, two different techniques for using the selected features to develop the fault classification model are compared: a single classifier based on the feature subset with the best classification performance and an ensemble of classifiers working on different feature subsets. The motivation for developing ensembles of classifiers is that they can achieve higher accuracies than single classifiers. An important issue for an ensemble to be effective is the diversity in the predictions of the base classifiers which constitute it, i.e. their capability of erring on different sub-regions of the pattern space. In order to show the benefits of having diverse base classifiers in the ensemble, two different ensembles have been developed: in the first, the base classifiers are constructed on feature subsets found by MOGAs aimed at maximizing the fault classification performance and at minimizing the number of features of the subsets; in the second, diversity among classifiers is added to the MOGA search as the third objective function to maximize. In both cases, a voting technique is used to effectively combine the predictions of the base classifiers to construct the ensemble output. For verification, some numerical experiments are conducted on a case of multiple-fault classification in rotating machinery and the results achieved by the two ensembles are compared with those obtained by a single optimal classifier.  相似文献   

11.
A data driven ensemble classifier for credit scoring analysis   总被引:2,自引:0,他引:2  
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.  相似文献   

12.
尹光  朱玉全  陈耿 《计算机工程》2012,38(8):167-169
为提高集成分类器系统的分类性能,提出一种分类器选择集成算法MCC-SCEN。该算法选取基分类器集中具有最大互信息差异性的子集和最大个体分类能力的子集,以确定待扩展分类器集,选择具有较大混合分类能力的基分类器加入到待扩展集中,构成集成系统,进行加权投票并产生结果。实验结果表明,该方法优于经典的AdaBoost和Bagging方法,具有较高的分类准确率。  相似文献   

13.
This paper presents cluster‐based ensemble classifier – an approach toward generating ensemble of classifiers using multiple clusters within classified data. Clustering is incorporated to partition data set into multiple clusters of highly correlated data that are difficult to separate otherwise and different base classifiers are used to learn class boundaries within the clusters. As the different base classifiers engage on different difficult‐to‐classify subsets of the data, the learning of the base classifiers is more focussed and accurate. A selection rather than fusion approach achieves the final verdict on patterns of unknown classes. The impact of clustering on the learning parameters and accuracy of a number of learning algorithms including neural network, support vector machine, decision tree and k‐NN classifier is investigated. A number of benchmark data sets from the UCI machine learning repository were used to evaluate the cluster‐based ensemble classifier and the experimental results demonstrate its superiority over bagging and boosting.  相似文献   

14.
In machine learning, a combination of classifiers, known as an ensemble classifier, often outperforms individual ones. While many ensemble approaches exist, it remains, however, a difficult task to find a suitable ensemble configuration for a particular dataset. This paper proposes a novel ensemble construction method that uses PSO generated weights to create ensemble of classifiers with better accuracy for intrusion detection. Local unimodal sampling (LUS) method is used as a meta-optimizer to find better behavioral parameters for PSO. For our empirical study, we took five random subsets from the well-known KDD99 dataset. Ensemble classifiers are created using the new approaches as well as the weighted majority algorithm (WMA) approach. Our experimental results suggest that the new approach can generate ensembles that outperform WMA in terms of classification accuracy.  相似文献   

15.
基分类器之间的差异性和单个基分类器自身的准确性是影响集成系统泛化性能的两个重要因素,针对差异性和准确性难以平衡的问题,提出了一种基于差异性和准确性的加权调和平均(D-A-WHA)度量基因表达数据的选择性集成算法。以核超限学习机(KELM)作为基分类器,通过D-A-WHA度量调节基分类器之间的差异性和准确性,最后选择一组准确性较高并且与其他基分类器差异性较大的基分类器组合进行集成。通过在UCI基因数据集上进行仿真实验,实验结果表明,与传统的Bagging、Adaboost等集成算法相比,基于D-A-WHA度量的选择性集成算法分类精度和稳定性都有显著的提高,且能有效应用于癌症基因数据的分类中。  相似文献   

16.
Breast cancer is the most commonly occurring form of cancer in women. While mammography is the standard modality for diagnosis, thermal imaging provides an interesting alternative as it can identify tumors of smaller size and hence lead to earlier detection. In this paper, we present an approach to analysing breast thermograms based on image features and a hybrid multiple classifier system. The employed image features provide indications of asymmetry between left and right breast regions that are encountered when a tumor is locally recruiting blood vessels on one side, leading to a change in the captured temperature distribution. The presented multiple classifier system is based on a hybridisation of three computational intelligence techniques: neural networks or support vector machines as base classifiers, a neural fuser to combine the individual classifiers, and a fuzzy measure for assessing the diversity of the ensemble and removal of individual classifiers from the ensemble. In addition, we address the problem of class imbalance that often occurs in medical data analysis, by training base classifiers on balanced object subspaces. Our experimental evaluation, on a large dataset of about 150 breast thermograms, convincingly shows our approach not only to provide excellent classification accuracy and sensitivity but also to outperform both canonical classification approaches as well as other classifier ensembles designed for imbalanced datasets.  相似文献   

17.
Financial distress prediction is very important to financial institutions who must be able to make critical decisions regarding customer loans. Bankruptcy prediction and credit scoring are the two main aspects considered in financial distress prediction. To assist in this determination, thereby lowering the risk borne by the financial institution, it is necessary to develop effective prediction models for prediction of the likelihood of bankruptcy and estimation of credit risk. A number of financial distress prediction models have been constructed, which utilize various machine learning techniques, such as single classifiers and classifier ensembles, but improving the prediction accuracy is the major research issue. In addition, aside from improving the prediction accuracy, there have been very few studies that specifically consider lowering the Type I error. In practice, Type I errors need to receive careful consideration during model construction because they can affect the cost to the financial institution. In this study, we introduce a classifier ensemble approach designed to reduce the misclassification cost. The outputs produced by multiple classifiers are combined by utilizing the unanimous voting (UV) method to find the final prediction result. Experimental results obtained based on four relevant datasets show that our UV ensemble approach outperforms the baseline single classifiers and classifier ensembles. Specifically, the UV ensemble not only provides relatively good prediction accuracy and minimizes Type I/II errors, but also produces the smallest misclassification cost.  相似文献   

18.
面向分布式数据流大数据分类的多变量决策树   总被引:1,自引:0,他引:1  
张宇  包研科  邵良杉  刘威 《自动化学报》2018,44(6):1115-1127
分布式数据流大数据中的类别边界不规则且易变,因此基于单变量决策树的集成分类器需要较大数量的基分类器才能准确地近似表达类别边界,这将降低集成分类器的学习与分类性能.因而,本文提出了基于几何轮廓相似度的多变量决策树.在最优基准向量的引导下将n维空间样本点投影到一维空间以建立有序投影点集合,然后通过类别投影边界将有序投影点集合划分为多个子集,接着分别对不同类别集合的交集递归投影分裂,最终生成决策树.实验表明,本文提出的多变量决策树GODT具有很高的分类精度和较低的训练时间,有效结合了单变量决策树学习效率高与多变量决策树表示能力强的优点.  相似文献   

19.
Ensembles of classifiers that are trained on different parts of the input space provide good results in general. As a popular boosting technique, AdaBoost is an iterative and gradient based deterministic method used for this purpose where an exponential loss function is minimized. Bagging is a random search based ensemble creation technique where the training set of each classifier is arbitrarily selected. In this paper, a genetic algorithm based ensemble creation approach is proposed where both resampled training sets and classifier prototypes evolve so as to maximize the combined accuracy. The objective function based random search procedure of the resultant system guided by both ensemble accuracy and diversity can be considered to share the basic properties of bagging and boosting. Experimental results have shown that the proposed approach provides better combined accuracies using a fewer number of classifiers than AdaBoost.  相似文献   

20.
丁要军 《计算机应用》2015,35(12):3348-3351
针对不平衡网络流量分类精度不高的问题,在旋转森林算法的基础上结合Bagging算法的Bootstrap抽样和基于分类精度排序的基分类器选择算法,提出一种改进的旋转森林算法。首先,对原始训练集按特征进行子集划分并分别使用Bagging进行样本抽样,通过主成分分析(PCA)生成主成分系数矩阵;然后,在原始训练集和主成分系数矩阵的基础上进行特征转换,生成新的训练子集,再次使用Bagging对子集进行抽样,提升训练集的差异性,并使用训练子集训练C4.5基分类器;最后,使用测试集评价基分类器,依据总体分类精度进行排序筛选,保留分类精度较高的分类器并生成一致分类结果。在不平衡网络流量数据集上进行测试实验,依据准确率和召回率两个标准对C4.5、Bagging、旋转森林和改进的旋转森林四种算法评价,依据模型训练时间和测试时间评价四种算法的时间效率。实验结果表明改进的旋转森林算法对万维网(WWW)协议、Mail协议、Attack协议、对等网(P2P)协议的分类准确度达到99.5%以上,召回率也高于旋转森林、Bagging、C4.5三种算法,可用于网络入侵取证、维护网络安全、提升网络服务质量。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号