共查询到20条相似文献,搜索用时 15 毫秒
1.
A dynamic classifier ensemble selection approach for noise data 总被引:2,自引:0,他引:2
Dynamic classifier ensemble selection (DCES) plays a strategic role in the field of multiple classifier systems. The real data to be classified often include a large amount of noise, so it is important to study the noise-immunity ability of various DCES strategies. This paper introduces a group method of data handling (GMDH) to DCES, and proposes a novel dynamic classifier ensemble selection strategy GDES-AD. It considers both accuracy and diversity in the process of ensemble selection. We experimentally test GDES-AD and six other ensemble strategies over 30 UCI data sets in three cases: the data sets do not include artificial noise, include class noise, and include attribute noise. Statistical analysis results show that GDES-AD has stronger noise-immunity ability than other strategies. In addition, we find out that Random Subspace is more suitable for GDES-AD compared with Bagging. Further, the bias-variance decomposition experiments for the classification errors of various strategies show that the stronger noise-immunity ability of GDES-AD is mainly due to the fact that it can reduce the bias in classification error better. 相似文献
2.
提出了一种新的基于边缘分类能力排序准则,用于基于排序聚集(ordered aggregation,OA)的分类器选择算法.为了表征分类器的分类能力,使用随机参考分类器对原分类器进行模拟,从而获得分类能力的概率模型.为了提高分类器集成性能,将提出的基于边缘分类能力的排序准则与动态集成选择算法相结合,首先将特征空间划分成不同能力的区域,然后在每个划分内构造最优的分类器集成,最后使用动态集成选择算法对未知样本进行分类.在UCI数据集上进行的实验表明,对比现有的排序准则,边缘分类能力的排序准则效果更好,进一步实验表明,基于边缘分类能力的动态集成选择算法较现有分类器集成算法具有分类正确率更高、集成规模更小、分类时间更短的优势. 相似文献
3.
Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit
card fraud protection and sensor networking. One popular solution is to separate stream data into chunks, learn a base classifier
from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic
classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm
dynamically selects a single “best” classifier to classify each test instance at run time. Our scheme uses statistical information
from attribute values, and uses each attribute to partition the evaluation set into disjoint subsets, followed by a procedure
that evaluates the classification accuracy of each base classifier on these subsets. Given a test instance, its attribute
values determine the subsets that the similar instances in the evaluation set have constructed, and the classifier with the
highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative
studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears to be promising in mining data streams
with dramatic concept drifting or with a significant amount of noise, where the base classifiers are likely conflictive or
have low confidence.
A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining,
pp 305–312, Brighton, UK
Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with
Microsoft Research Asia, Beijing, China, where he was working on content-based image retrieval with relevance feedback. From
2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN.
He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT.
His research interests include Data mining, machine learning, data quality, multimedia computing, and information retrieval.
Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings.
Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial
Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems,
and Web information exploration. He has published extensively in these areas in various journals and conferences, including
IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the
Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on
Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP).
He is the 2004 ACM SIGKDD Service Award winner.
Ying Yang received her Ph.D. in Computer Science from Monash University, Australia in 2003. Following academic appointments at the
University of Vermont, USA, she currently holds a Research Fellow at Monash University, Australia. Dr. Yang is recognized
for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters
on adaptive learning, proactive mining, noise cleansing and discretization. Contact her at yyang@mail.csse.monash.edu.au. 相似文献
4.
Dynamic classifier selection (DCS) plays a strategic role in the field of multiple classifier systems (MCS). This paper proposes a study on the performances of DCS by Local Accuracy estimation (DCS-LA). To this end, upper bounds against which the performances can be evaluated are proposed. The experimental results on five datasets clearly show the effectiveness of the selection methods based on local accuracy estimates. 相似文献
5.
自适应随机森林分类器在每个基础分类器上分别设置了警告探测器和漂移探测器,实例训练时常常会同时触发多个警告探测器,引起多棵背景树同步训练,使得运行所需的内存大、时间长。针对此问题,提出了一种改进的自适应随机森林集成分类算法,将概念漂移探测器设置在集成学习器端,移除各基础树端的漂移探测器,并根据集成器预测准确率确定需要训练的背景树的数量。用改进后的算法对较平衡的数据流进行分类,在保证分类性能的前提下,与改进前的算法相比,运行时间有所降低,消耗内存有所减少,能更快适应数据流中出现的概念漂移。 相似文献
6.
针对基于约束得分的特征选择容易受成对约束的组成和基数影响的问题, 提出了一种基于约束得分的动态集成选择算法(dynamic ensemble selection based on bagging constraint score, BCS-DES)。该算法将bagging约束得分(bagging constraint score, BCS)引入动态集成选择算法, 通过将样本空间划分为不同的区域, 使用多种群并行遗传算法为不同测试样本选择局部最优的分类集成, 达到提高分类精度的目的。在UCI实验数据集上进行的实验表明, BCS-DES算法较现有的特征选择算法受成对约束组成和基数影响更小, 效果更好。 相似文献
7.
Diwakar Tripathi Damodar Reddy Edla Ramalingaswamy Cheruku Venkatanareshbabu Kuppili 《Computational Intelligence》2019,35(2):371-394
Credit scoring focuses on the development of empirical models to support the financial decision‐making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may consist of redundant and noisy features that affect the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is modeled in three phases. The initial phase constitutes preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. Finally, in the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, as the classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real‐world credit scoring datasets, namely, Australian, Japanese, German‐categorical, and German‐numerical datasets. 相似文献
8.
Different classifiers with different characteristics and methodologies can complement each other and cover their internal weaknesses; so classifier ensemble is an important approach to handle the weakness of single classifier based systems. In this article we explore an automatic and fast function to approximate the accuracy of a given classifier on a typical dataset. Then employing the function, we can convert the ensemble learning to an optimisation problem. So, in this article, the target is to achieve a model to approximate the performance of a predetermined classifier over each arbitrary dataset. According to this model, an optimisation problem is designed and a genetic algorithm is employed as an optimiser to explore the best classifier set in each subspace. The proposed ensemble methodology is called classifier ensemble based on subspace learning (CEBSL). CEBSL is examined on some datasets and it shows considerable improvements. 相似文献
9.
为解决垃圾网页检测过程中的不平衡分类和"维数灾难"问题,提出一种基于随机森林(RF)和欠采样集成的二元分类器算法。首先使用欠采样技术将训练样本集大类抽样成多个子样本集,再将其分别与小类样本集合并构成多个平衡的子训练样本集;然后基于各个子训练样本集训练出多个随机森林分类器;最后用多个随机森林分类器对测试样本集进行分类,采用投票法确定测试样本的最终所属类别。在WEBSPAM UK-2006数据集上的实验表明,该集成分类器算法应用于垃圾网页检测比随机森林算法及其Bagging和Adaboost集成分类器算法效果更好,准确率、F1测度、ROC曲线下面积(AUC)等指标提高至少14%,13%和11%。与Web spam challenge 2007 优胜团队的竞赛结果相比,该集成分类器算法在F1测度上提高至少1%,在AUC上达到最优结果。 相似文献
10.
This paper presents cluster‐based ensemble classifier – an approach toward generating ensemble of classifiers using multiple clusters within classified data. Clustering is incorporated to partition data set into multiple clusters of highly correlated data that are difficult to separate otherwise and different base classifiers are used to learn class boundaries within the clusters. As the different base classifiers engage on different difficult‐to‐classify subsets of the data, the learning of the base classifiers is more focussed and accurate. A selection rather than fusion approach achieves the final verdict on patterns of unknown classes. The impact of clustering on the learning parameters and accuracy of a number of learning algorithms including neural network, support vector machine, decision tree and k‐NN classifier is investigated. A number of benchmark data sets from the UCI machine learning repository were used to evaluate the cluster‐based ensemble classifier and the experimental results demonstrate its superiority over bagging and boosting. 相似文献
11.
Neural network ensemble based on rough sets reduct is proposed to decrease the computational complexity of conventional ensemble feature selection algorithm. First, a dynamic reduction technology combining genetic algorithm with resampling method is adopted to obtain reducts with good generalization ability. Second, Multiple BP neural networks based on different reducts are built as base classifiers. According to the idea of selective ensemble, the neural network ensemble with best generalization ability can be found by search strategies. Finally, classification based on neural network ensemble is implemented by combining the predictions of component networks with voting. The method has been verified in the experiment of remote sensing image and five UCI datasets classification. Compared with conventional ensemble feature selection algorithms, it costs less time and lower computing complexity, and the classification accuracy is satisfactory. 相似文献
12.
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classifiers to integrate data preprocessing and dynamic ensemble selection methods for imbalanced data stream classification. The proposed approach has been evaluated based on computer experiments carried out on 135 artificially generated data streams with various imbalance ratios, label noise levels, and types of concept drift as well as on two selected real streams. Four preprocessing techniques and two dynamic selection methods, used on both bagging classifiers and base estimators levels, were considered. Experimentation results showed that, for highly imbalanced data streams, dynamic ensemble selection coupled with data preprocessing could outperform online and chunk-based state-of-art methods. 相似文献
13.
传统的分类算法大都建立在平衡数据集的基础上,当样本数据不平衡时,这些学习算法的性能往往会明显下降.对于非平衡数据分类问题,提出了一种优化的支持向量机(SVM)集成分类器模型,采用KSMOTE和Bootstrap对非平衡数据进行预处理,生成相应的SVM模型并用复合形算法优化模型参数,最后利用优化的参数并行生成SVM集成分类器模型,采用投票机制得到分类结果.对5组UCI标准数据集进行实验,结果表明采用优化的SVM集成分类器模型较SVM模型、优化的SVM模型等分类精度有了明显的提升,同时验证了不同的bootNum取值对分类器性能效果的影响. 相似文献
14.
根据免疫否定选择原理,设计了基于掩码分段匹配的否定选择分类器,用于实现规则匹配分类。给出了适用于免疫优化的分类规则编码及分类信息分的评价标准,通过免疫进化对其进行群体优化以生成更为简洁、便于理解的数据规则集。该方法使得免疫优化的各种优良特性在数据分类中得到充分的运用,避免了传统分类算法缺乏全局优化能力的缺点,提高了对样本的识别能力。实验结果表明,这种免疫分类器及优化方法是一种有效、可行的分类器设计方案,提高了数据分类的准确性。 相似文献
15.
选择性聚类融合研究进展 总被引:1,自引:0,他引:1
传统的聚类融合方法通常是将所有产生的聚类成员融合以获得最终的聚类结果。在监督学习中,选择分类融合方法会获得更好的结果,从选择分类融合中得到启示,在聚类融合中应用这种方法被定义为选择性聚类融合。对选择性聚类融合关键技术进行了综述,讨论了未来的研究方向。 相似文献
16.
Nima Hatami 《Expert systems with applications》2012,39(1):936-947
Error-correcting output coding (ECOC) is a strategy to create classifier ensembles which reduces a multi-class problem into some binary sub-problems. A key issue in designing any ECOC classifier refers to defining optimal codematrix having maximum discrimination power and minimum number of columns. This paper proposes a heuristic method for application-dependent design of optimal ECOC matrix based on a thinning algorithm. The main idea of the proposed Thinned-ECOC method is to successively remove some redundant and unnecessary columns of any initial codematrix based on a metric defined for each column. As a result, computational cost of the ensemble is reduced while preserving its accuracy. Proposed method has been validated using the UCI machine learning database and further applied to a couple of real-world pattern recognition problems (the face recognition and gene expression based cancer classification). Experimental results emphasize the robustness of Thinned-ECOC in comparison with existing state-of-the-art code generation methods. 相似文献
17.
动态集成选择算法中,待测样本的能力区域由固定样本组成,这会影响分类器选择,因此提出一种基于动态能力区域策略的DES-DCR-CIER算法。首先采用异构分类器生成基分类器池,解决同构集成分类器差异性较小和异构集成分类器数目较少的问题;然后采用相互自适应K近邻算法、逼近样本集距离中心和剔除类别边缘样本三个步骤得到待测样本的动态能力区域,基于整体互补性指数选择一组互补性强的分类器;最后通过ER规则对分类器组进行合成。在安徽合肥某三甲医院的八位超声科医生乳腺肿块诊断数据集和美国威斯康辛州乳腺癌诊断公开数据集上的实验表明,基于DES-DCR-CIER算法的诊断模型精度更优。 相似文献
18.
为提高数据分类的性能,提出了一种基于信息熵[1]的多分类器动态组合方法(EMDA)。此方法在多个UCI标准数据集上进行了测试,并与由集成学习算法—AdaBoost,训练出的各个基分类器的分类效果进行比较,证明了该算法的有效性。 相似文献
19.
针对既有历史数据又有流特征的全新应用场景,提出了一种基于组特征选择和流特征的在线特征选择算法。在对历史数据的组特征选择阶段,为了弥补单一聚类算法的不足,引入聚类集成的思想。先利用k-means方法通过多次聚类得到一个聚类集体,在集成阶段再利用层次聚类算法对聚类集体进行集成得到最终的结果。在对流特征数据的在线特征选择阶段,对组构造产生的特征组通过探讨特征间的相关性来更新特征组,最终通过组变换获得特征子集。实验结果表明,所提算法能有效应对全新场景下的在线特征选择问题,并且有很好的分类性能。 相似文献
20.
传统集成分类算法中,一般将集成数目设置为固定值,这可能会导致较低分类准确率。针对这一问题,提出了准确率爬坡集成分类算法(C-ECA)。首先,该算法不再用一些基分类器去替换相同数量的表现最差的基分类器,而是基于准确率对基分类器进行更新,然后确定最佳集成数目。其次,在C-ECA的基础上提出了基于爬坡的动态加权集成分类算法(C-DWECA)。该算法提出了一个加权函数,其在具有不同特征的数据流上训练基分类器时,可以获得基分类器的最佳权值,从而提升集成分类器的性能。最后,为了能更早地检测到概念漂移并提高最终精度,采用了快速霍夫丁漂移检测方法(FHDDM)。实验结果表明C-DWECA的准确率最高可达到97.44%,并且该算法的平均准确率比自适应多样性的在线增强(ADOB)算法提升了40%左右,也优于杠杆装袋(LevBag)、自适应随机森林(ARF)等其他对比算法。 相似文献