1.
A new ranking criterion based on margin classification capability is proposed for ordered aggregation (OA) based classifier selection. To characterize a classifier's classification capability, a randomized reference classifier is used to simulate the original classifier, yielding a probabilistic model of that capability. To improve ensemble performance, the proposed margin-based ranking criterion is combined with a dynamic ensemble selection algorithm: the feature space is first partitioned into regions of differing competence, an optimal classifier ensemble is then constructed within each partition, and finally a dynamic ensemble selection algorithm classifies unknown samples. Experiments on UCI data sets show that the margin-based ranking criterion outperforms existing ranking criteria, and further experiments show that the resulting dynamic ensemble selection algorithm achieves higher classification accuracy, smaller ensemble size, and shorter classification time than existing ensemble algorithms.
2.
In handwritten pattern recognition, the multiple classifier system has been shown to be useful for improving recognition rates. One of the most important tasks in optimizing a multiple classifier system is to select a group of adequate classifiers, known as an Ensemble of Classifiers (EoC), from a pool of classifiers. Static selection schemes select an EoC for all test patterns, and dynamic selection schemes select different classifiers for different test patterns. Nevertheless, it has been shown that traditional dynamic selection performs no better than static selection. We propose four new dynamic selection schemes which explore the properties of the oracle concept. Our results suggest that the proposed schemes, using the majority voting rule for combining classifiers, perform better than the static selection method.
3.
In dynamic ensemble selection (DES) techniques, only the most competent classifiers, for the classification of a specific test sample, are selected to predict the sample’s class labels. The key in DES techniques is estimating the competence of the base classifiers for the classification of each specific test sample. The classifiers’ competence is usually estimated according to a given criterion, which is computed over the neighborhood of the test sample defined on the validation data, called the region of competence. A problem arises when there is a high degree of noise in the validation data, causing the samples belonging to the region of competence to not represent the query sample. In such cases, the dynamic selection technique might select the base classifier that overfitted the local region rather than the one with the best generalization performance. In this paper, we propose two modifications in order to improve the generalization performance of any DES technique. First, a prototype selection technique is applied over the validation data to reduce the amount of overlap between the classes, producing smoother decision borders. During generalization, a local adaptive K-Nearest Neighbor algorithm is used to minimize the influence of noisy samples in the region of competence. Thus, DES techniques can better estimate the classifiers’ competence. Experiments are conducted using 10 state-of-the-art DES techniques over 30 classification problems. The results demonstrate that the proposed scheme significantly improves the classification accuracy of dynamic selection techniques.
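A minimal sketch of the region-of-competence idea described above, assuming a k-NN neighborhood over validation data and local accuracy as the competence criterion (the function names and the toy problem are illustrative, not the paper's exact procedure):

```python
import math
from collections import Counter

def knn_region(query, validation, k=3):
    """Return the k validation samples (x, y) nearest to the query."""
    return sorted(validation, key=lambda s: math.dist(s[0], query))[:k]

def des_predict(query, classifiers, validation, k=3):
    """Select the classifiers most competent on the query's region of
    competence, then combine the selected ones by majority voting."""
    region = knn_region(query, validation, k)
    # Competence = local accuracy over the region of competence.
    competence = {c: sum(c(x) == y for x, y in region) / k for c in classifiers}
    best = max(competence.values())
    selected = [c for c in classifiers if competence[c] == best]
    votes = Counter(c(query) for c in selected)
    return votes.most_common(1)[0][0]

# Toy 1-D problem: the true label is 1 iff x > 0.
validation = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
good = lambda x: int(x[0] > 0)   # generalizes well
biased = lambda x: 0             # always predicts class 0
print(des_predict((1.5,), [good, biased], validation))  # -> 1
```

Noise in `validation` distorts the competence estimates, which is exactly the failure mode the prototype-selection step above is meant to mitigate.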
4.
Dynamic classifier ensemble selection (DCES) plays a strategic role in the field of multiple classifier systems. Real data to be classified often include a large amount of noise, so it is important to study the noise-immunity ability of various DCES strategies. This paper introduces the group method of data handling (GMDH) to DCES and proposes a novel dynamic classifier ensemble selection strategy, GDES-AD, which considers both accuracy and diversity in the process of ensemble selection. We experimentally test GDES-AD and six other ensemble strategies over 30 UCI data sets in three cases: data sets without artificial noise, with class noise, and with attribute noise. Statistical analysis shows that GDES-AD has stronger noise-immunity ability than the other strategies. In addition, we find that Random Subspace is more suitable for GDES-AD than Bagging. Further, bias-variance decomposition experiments on the classification errors of the various strategies show that the stronger noise immunity of GDES-AD is mainly due to its better reduction of the bias term of the classification error.
5.
To address the high dimensionality of telecom customer churn data sets and the weak predictive performance of single classifiers, a churn prediction model is proposed that combines a two-step feature selection method, based on the Fisher ratio and a prediction risk criterion, with a classifier ensemble, drawing on the complementary advantages of filter and wrapper feature selection and the higher predictive power of combined classifiers. First, features with strong discriminative ability are extracted from the original feature set using the Fisher ratio; next, the prediction risk criterion further selects the features that most affect the model's predictions; finally, ensembles based on averaged and weighted probability outputs are constructed to further improve churn prediction. Experimental results show that, compared with single-step feature extraction and single-classifier models, the method improves customer churn prediction.
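The first step above ranks features by their Fisher ratio: between-class mean separation over within-class spread. A minimal two-class sketch, computed per feature (the function names and toy data are illustrative):

```python
from statistics import mean, variance

def fisher_ratio(pos, neg):
    """Fisher ratio of one feature for a two-class problem:
    (difference of class means)^2 / (sum of class variances)."""
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))

def rank_features(X, y, top=2):
    """Return the indices of the `top` features with highest Fisher ratio."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((fisher_ratio(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:top]]

# Feature 0 separates the classes; feature 1 is pure noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.8], [1.0, 1.2]]
y = [0, 0, 1, 1]
print(rank_features(X, y, top=1))  # -> [0]
```

A filter step like this is cheap because it never trains a classifier; the wrapper-style prediction-risk step above is what ties the selection to the final model.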
6.
Different classifiers with different characteristics and methodologies can complement each other and cover each other's weaknesses, so classifier ensembles are an important approach to handling the weakness of single-classifier systems. In this article we explore an automatic and fast function to approximate the accuracy of a given classifier on a typical dataset. Employing this function, ensemble learning can be converted into an optimisation problem: the goal is a model that approximates the performance of a predetermined classifier over any dataset. Based on this model, an optimisation problem is designed and a genetic algorithm is employed as the optimiser to explore the best classifier set in each subspace. The proposed ensemble methodology is called classifier ensemble based on subspace learning (CEBSL). CEBSL is examined on several datasets and shows considerable improvements.
7.
The concept of a classifier competence is fundamental to multiple classifier systems (MCSs). In this study, a method for calculating the classifier competence is developed using a probabilistic model. In the method, first a randomised reference classifier (RRC) whose class supports are realisations of random variables with beta probability distributions is constructed. The parameters of the distributions are chosen in such a way that, for each feature vector in a validation set, the expected values of the class supports produced by the RRC equal the class supports produced by the modelled classifier. This allows the probability of correct classification of the RRC to be used as the competence of the modelled classifier. The competences calculated for the validation set are then generalised to the entire feature space by constructing a competence function based on a potential function model or regression. Three systems based on dynamic classifier selection and dynamic ensemble selection (DES) were constructed using the method developed. The DES-based system had a statistically significantly higher average rank than those of eight benchmark MCSs over 22 data sets and a heterogeneous ensemble. The results obtained indicate that the full vector of class supports should be used for evaluating classifier competence, as this potentially improves the performance of MCSs.
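A minimal sketch of the RRC idea: each class support of the reference classifier is drawn from a beta distribution whose mean matches the modelled classifier's support, and competence is the probability that the correct class receives the highest support, estimated here by Monte Carlo (the beta parameterisation via a single `precision` constant is an illustrative choice, not the paper's exact fit):

```python
import random

def rrc_competence(supports, true_class, precision=10.0, n_draws=5000, seed=0):
    """Estimate P(correct classification) for a randomised reference
    classifier whose class supports are beta random variables with
    means equal to `supports` (the modelled classifier's supports)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # Beta(m*c, (1-m)*c) has mean m; c controls concentration.
        draws = [rng.betavariate(max(s * precision, 1e-6),
                                 max((1 - s) * precision, 1e-6))
                 for s in supports]
        if max(range(len(draws)), key=draws.__getitem__) == true_class:
            wins += 1
    return wins / n_draws

# A confident, correct classifier should get high competence...
print(rrc_competence([0.9, 0.1], true_class=0))
# ...and an uncertain one a competence near chance level.
print(rrc_competence([0.5, 0.5], true_class=0))
```

This illustrates why the full support vector matters: two classifiers with the same predicted class but different support margins get different competences.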
8.
Given the small sample size and high dimensionality of tumor gene expression profiles, an ensemble classifier algorithm is proposed for extracting informative genes and recognizing tumor subtypes. The algorithm builds a candidate subset from the genes' Fisher ratio values, then uses two measures, correlation coefficient and mutual information, to construct feature subsets reflecting gene co-expression behavior and regulatory relationships, respectively. Particle swarm optimization is combined with SVM and KNN to form two base classifiers, which extract informative genes from the candidate subset and classify tumor subtypes; the base classifiers' results are finally combined by absolute majority voting. Experimental results on G. Gordon lung cancer subtype recognition demonstrate the feasibility and effectiveness of the algorithm.
9.
This paper presents several criteria for partitioning classes in support vector machine based hierarchical classification. Our clustering algorithm combines a support vector machine with a binary tree: it is a divisive (top-down) approach in which a set of classes is automatically separated into two smaller groups at each node of the hierarchy, splitting the classes with the normalized cuts clustering algorithm. The algorithm considers the involved classes rather than individual data samples. In the proposed measures, similarity between classes is determined by boundary complexity, using concepts such as the upper bound of error and Kolmogorov complexity. We report results on several data sets and five distance/similarity measures. Experimental results demonstrate the superiority of the proposed measures compared to other measures; the new criteria perform well even on nonlinearly separable data.
13.
To reduce the computational complexity of intra prediction, and exploiting the fact that different modes occur in macroblocks with different probabilities, the five most probable modes among the 4×4 luma intra prediction modes are selected as priority candidates. Based on the texture characteristics of the pixel block, the non-directional DC mode is used with a threshold to terminate mode selection early. The proposed formulation reduces the candidate modes for 4×4 luma blocks from 9 to between 1 and 4, and for 16×16 luma blocks from 4 to between 1 and 3. Experimental results show that, with essentially unchanged image quality, the algorithm saves 12.26% to 48.25% of encoding time with only a slight increase in bit rate.
14.
In recent years, heuristic algorithms have been successfully applied to solve clustering and classification problems. In this paper, the gravitational search algorithm (GSA), one of the newest swarm-based heuristic algorithms, is used to build a prototype classifier for multi-class data sets. The proposed method employs GSA as a global searcher to find the best positions of the representatives (prototypes). The proposed GSA-based classifier is applied to several well-known benchmark sets, and its performance is compared with the artificial bee colony (ABC), particle swarm optimization (PSO), and nine other classifiers from the literature. Experimental results on twelve data sets from the UCI machine learning repository confirm that GSA can successfully be applied as a classifier.
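Once the search has placed the prototypes, classification reduces to a nearest-prototype rule. A minimal sketch of that final step (the prototypes here are hand-placed, standing in for what the GSA search would output):

```python
import math

def nearest_prototype(x, prototypes):
    """Classify x with the label of its nearest prototype.
    `prototypes` is a list of (position, label) pairs; in the paper's
    setting these positions would be the best ones found by GSA."""
    return min(prototypes, key=lambda p: math.dist(p[0], x))[1]

# Two hand-placed prototypes standing in for a search result.
prototypes = [((0.0, 0.0), "A"), ((4.0, 4.0), "B")]
print(nearest_prototype((0.5, 0.2), prototypes))  # -> A
print(nearest_prototype((3.6, 4.1), prototypes))  # -> B
```

The heuristic search only has to optimize the prototype positions against training accuracy; the decision rule itself stays this simple.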
15.
To address the premature convergence of binary particle swarm optimization (BPSO) when applied to SVM ensemble selection, a multi-population cooperative algorithm within the cultural algorithm framework (Ca-MultiPop) is proposed. It combines the fast evolution of BPSO with a genetic algorithm (GA) to increase population diversity, and uses different fitness functions in the two evolutionary algorithms to balance ensemble accuracy against diversity among base classifiers. Simulation results show that the algorithm improves accuracy over BPSO on the SVM ensemble selection problem.
16.
This paper presents a cooperative evolutionary approach to instance selection for instance-based learning. The model takes advantage of a recent paradigm in the field of evolutionary computation, cooperative coevolution, which follows a philosophy similar to divide and conquer. In our method, the training set is divided into several subsets that are searched independently, while a population of global solutions relates the search across subsets and keeps track of the best combinations obtained. The proposed model has the advantage over standard methods that it does not rely on any specific distance metric or classifier algorithm. Additionally, the fitness function of the individuals considers both storage requirements and classification accuracy, and the user can balance the two objectives depending on his/her specific needs, assigning different weights to each of these two terms. The method also shows good scalability when applied to large datasets.
The proposed model compares favorably with some of the most successful standard algorithms, IB3, ICF, and DROP3, with a genetic algorithm using the CHC method, and with four recent instance selection methods, MSS, entropy-based instance selection, IMOEA, and LVQPRU. The comparison shows a clear advantage of the proposed algorithm in terms of storage requirements, while it is at least as good as any of the other methods in terms of testing error. A large set of 50 problems from the UCI Machine Learning Repository is used for the comparison. Additionally, a study of the effect of instance label noise shows the robustness of the proposed algorithm.
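The weighted fitness described above can be sketched as a simple convex combination of accuracy and storage reduction (the exact weighting scheme and names are illustrative; the paper's formulation may differ):

```python
def fitness(accuracy, kept, total, w_acc=0.5, w_storage=0.5):
    """Fitness of an instance-selection individual: a weighted sum of
    classification accuracy and storage reduction (the fraction of the
    training set that was discarded)."""
    reduction = 1 - kept / total
    return w_acc * accuracy + w_storage * reduction

# Keeping fewer instances at equal accuracy scores higher...
print(fitness(accuracy=0.9, kept=20, total=100))   # -> 0.85
# ...and the weights let the user favour accuracy instead.
print(fitness(accuracy=0.9, kept=20, total=100, w_acc=0.9, w_storage=0.1))
```

Raising `w_acc` biases the evolutionary search toward larger but more accurate subsets, which is the user-controlled trade-off the abstract describes.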
17.
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also by using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.
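The rotation step described above can be sketched as follows, assuming NumPy: the feature indices are randomly split into K subsets, the principal axes of each subset are computed (all components kept), and the per-subset axes are assembled into one block rotation applied to the full feature matrix. This is a sketch of the rotation only, not the full Rotation Forest training loop (no bootstrap sampling, no tree training):

```python
import numpy as np

def rotation_matrix(X, K, rng):
    """Build a Rotation Forest style rotation: split the features into K
    random subsets, run PCA on each subset (keeping ALL components), and
    place each subset's principal axes into one block rotation matrix."""
    n_features = X.shape[1]
    idx = rng.permutation(n_features)
    R = np.zeros((n_features, n_features))
    for subset in np.array_split(idx, K):
        cov = np.atleast_2d(np.cov(X[:, subset], rowvar=False))
        _, axes = np.linalg.eigh(cov)       # orthonormal PCA axes
        R[np.ix_(subset, subset)] = axes
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
R = rotation_matrix(X, K=3, rng=rng)
X_rot = X @ R          # rotated training data for one base classifier
print(X_rot.shape)     # -> (50, 6)
```

Because every block is orthonormal, R is itself a rotation: dimensionality is preserved and no variance is discarded, which is the "keep all components" accuracy argument above.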
18.
The Internet contains a huge amount of text, and text classification systems can automatically sort texts into given categories, helping people mine useful information. A text classification algorithm based on an ensemble of term-frequency classifiers is introduced; it is computationally cheap and achieves high recall, but relatively low precision. After analyzing the causes of the low precision, an improved algorithm based on the ensemble of term-frequency classifiers is proposed, which adjusts the parameters used in updating term weights and thereby significantly improves precision. Experiments verify the performance of the improved algorithm, showing that it not only improves classification accuracy but also exhibits good stability.
19.
In multi-instance learning, the training set is composed of labeled bags, each consisting of many unlabeled instances; that is, an object is represented by a set of feature vectors instead of a single feature vector. Most current multi-instance learning algorithms work by adapting single-instance learning algorithms to the multi-instance representation, while this paper proposes a solution that goes the opposite way: adapting the multi-instance representation to single-instance learning algorithms. In detail, the instances of all the bags are first collected together and clustered into d groups. Each bag is then re-represented by d binary features, where the value of the ith feature is set to one if the bag has instances falling into the ith group and zero otherwise. Thus, each bag is represented by one feature vector, so that single-instance classifiers can be used to distinguish different classes of bags. By repeating the above process with different values of d, many classifiers can be generated and then combined into an ensemble for prediction. Experiments show that the proposed method works well on standard as well as generalized multi-instance problems.
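The re-representation step described above can be sketched as follows, with hand-placed cluster centres standing in for a real clustering of all instances (names and toy data are illustrative):

```python
import math

def bag_to_vector(bag, centres):
    """Re-represent a bag (a set of instance vectors) as d binary
    features: feature i is 1 iff some instance of the bag falls
    nearest to centre i."""
    d = len(centres)
    feats = [0] * d
    for inst in bag:
        i = min(range(d), key=lambda j: math.dist(inst, centres[j]))
        feats[i] = 1
    return feats

# Three hand-placed centres standing in for clustering into d = 3 groups.
centres = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
bag_a = [(0.2, 0.1), (4.9, 5.2)]        # touches groups 0 and 1
bag_b = [(9.8, 0.3)]                    # touches group 2 only
print(bag_to_vector(bag_a, centres))    # -> [1, 1, 0]
print(bag_to_vector(bag_b, centres))    # -> [0, 0, 1]
```

After this step any ordinary single-instance classifier can consume the fixed-length vectors, and varying d yields the diverse ensemble members the abstract describes.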
Zhi-Hua Zhou is currently Professor in the Department of Computer Science & Technology and head of the LAMDA group at Nanjing University. His main research interests include machine learning, data mining, information retrieval, and pattern recognition. He is associate editor of Knowledge and Information Systems and on the editorial boards of Artificial Intelligence in Medicine, International Journal of Data Warehousing and Mining, Journal of Computer Science & Technology, and Journal of Software. He has also been involved in various conferences.
Min-Ling Zhang received his B.Sc. and M.Sc. degrees in computer science from Nanjing University, China, in 2001 and 2004, respectively. Currently he is a Ph.D. candidate in the Department of Computer Science & Technology at Nanjing University and a member of the LAMDA group. His main research interests include machine learning and data mining, especially multi-instance learning and multi-label learning.
20.
In this paper, we present a new nonparametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered an extension of BBQ (Naeini et al., in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002). ENIR is designed to address the key limitation of IsoRegC, namely its monotonicity assumption on the predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in \(O(N \log N)\) time, where N is the number of samples.
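IsoRegC, the baseline that ENIR extends, fits a monotone mapping from classifier scores to calibrated probabilities via the pool-adjacent-violators (PAV) algorithm. A minimal PAV sketch (this is the isotonic baseline, not ENIR itself, which relaxes the monotonicity constraint via near-isotonic regression):

```python
def pav(scores, labels):
    """Pool-adjacent-violators: given classifier scores and 0/1 labels,
    return calibrated probabilities that are non-decreasing in score."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    # Each block holds [sum of labels, count]; adjacent violators merge.
    blocks = []
    for i in order:
        blocks.append([float(labels[i]), 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    calibrated = [None] * len(scores)
    pos = 0
    for s, n in blocks:
        for _ in range(n):
            calibrated[order[pos]] = s / n   # block mean = calibrated prob
            pos += 1
    return calibrated

# Raw scores from some binary classifier and the true labels.
scores = [0.1, 0.3, 0.4, 0.8, 0.9]
labels = [0,   1,   0,   1,   1]
probs = pav(scores, labels)
print(probs)  # non-decreasing in the score order
```

The middle pair of scores gets pooled to a shared probability because their labels violate monotonicity; ENIR's near-isotonic relaxation penalizes, rather than forbids, such violations.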