Similar Documents
10 similar documents retrieved.
1.
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble of N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where a single model is perturbed in order to obtain variants that can be used as an ensemble. As base classifiers we use sets of classification association rules. The two jittering ensembling methods we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set in a single run and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single-model association rule classifier. A bias–variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can offer a substantial speed-up over multiple-model ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency.
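As a rough illustration of the Post Bagging idea, the sketch below keeps the conditions of a single learned rule set fixed and only re-estimates each rule's confidence on bootstrap resamples of the training data, yielding ensemble variants without relearning. The representation (a rule as a (condition, class, confidence) triple) and the voting scheme are illustrative assumptions, not the authors' implementation.

```python
import random

def post_bagging(rules, train_set, n_models=10, seed=0):
    """Build ensemble variants of one rule set without relearning.

    Each variant keeps the rule conditions fixed and only re-estimates
    rule confidences on a bootstrap resample of the training data
    (a sketch of the Post Bagging idea; details are illustrative).
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_models):
        sample = [rng.choice(train_set) for _ in train_set]  # bootstrap resample
        variant = []
        for cond, cls, _ in rules:
            covered = [x for x, y in sample if cond(x)]
            hits = sum(1 for x, y in sample if cond(x) and y == cls)
            conf = hits / len(covered) if covered else 0.0
            variant.append((cond, cls, conf))
        variants.append(variant)
    return variants

def predict(variants, x):
    """Vote: each variant predicts with its highest-confidence firing rule."""
    votes = {}
    for rules in variants:
        firing = [(conf, cls) for cond, cls, conf in rules if cond(x)]
        if firing:
            _, cls = max(firing, key=lambda t: t[0])
            votes[cls] = votes.get(cls, 0) + 1
    return max(votes, key=votes.get) if votes else None
```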

2.
Discretization techniques have played an important role in machine learning and data mining, as most methods in these areas require that the training data set contain only discrete attributes. Data discretization unification (DDU), one of the state-of-the-art discretization techniques, trades off classification errors against the number of discretized intervals, and unifies existing discretization criteria. However, it suffers from two deficiencies. First, the efficiency of DDU is very low, as it searches over a large number of parameters to find good results, and even so is not guaranteed to obtain an optimal solution. Second, DDU does not take into account the number of inconsistent records produced by discretization, which leads to unnecessary information loss. To overcome these deficiencies, this paper presents a Universal Discretization technique, namely UniDis. We first develop a non-parametric normalized discretization criterion that avoids the effect of the relatively large difference between classification errors and the number of discretized intervals on discretization results. In addition, we define a new entropy-based measure of inconsistency for multi-dimensional variables to effectively control information loss while producing a concise summarization of continuous variables. Finally, we propose a heuristic algorithm that guarantees better discretization based on the non-parametric normalized criterion and the entropy-based inconsistency. Besides theoretical analysis, experimental results with the J4.8 decision tree and the Naive Bayes classifier demonstrate that our approach is statistically comparable to DDU under a popular statistical test, and that it yields a discretization scheme which improves classification accuracy significantly more than all other known discretization methods except DDU.
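The entropy-based inconsistency idea can be sketched as follows: rows that share the same discretized attribute pattern but carry different class labels represent lost information. The function below measures this as the size-weighted class entropy within each pattern group; the paper's exact multi-dimensional measure may differ, so treat this as a hedged approximation.

```python
from collections import Counter, defaultdict
from math import log2

def inconsistency_entropy(discretized_rows, labels):
    """Entropy-style inconsistency of a discretization scheme.

    Groups rows by their discretized attribute pattern and returns the
    average class entropy within each group, weighted by group size.
    Zero means every pattern maps to a single class (no inconsistency).
    """
    groups = defaultdict(list)
    for row, y in zip(discretized_rows, labels):
        groups[tuple(row)].append(y)
    n = len(labels)
    total = 0.0
    for ys in groups.values():
        counts = Counter(ys)
        h = -sum((c / len(ys)) * log2(c / len(ys)) for c in counts.values())
        total += (len(ys) / n) * h   # weight each group by its share of rows
    return total
```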

3.
Objective: Fine-grained classification has attracted increasing attention from researchers in recent years; its difficulty lies in the very small differences between target categories. To address this, a classification-error-guided hierarchical bilinear convolutional neural network model is proposed. Method: The core idea of the model is to separately retrain and reclassify the classes that the bilinear CNN (B-CNN) tends to misclassify and confuse. First, to identify the error-prone classes, a classification-error-guided clustering algorithm is proposed. The algorithm is based on the Constrained Laplacian Rank (CLR) clustering model, whose core affinity matrix is constructed from the classification-error matrix. Second, a new hierarchical B-CNN model is built on the clustering result. Result: The classification-error-guided hierarchical B-CNN model was evaluated on three standard datasets, CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars. Compared with the single-layer B-CNN model, classification accuracy improved from 84.35%, 83.56%, and 89.45% to 84.67%, 84.11%, and 89.78% respectively, validating the effectiveness of the proposed algorithm. Conclusion: This paper proposes reclassification through clustering guided by the classification-error matrix. Compared with affinity matrices built from feature similarity, the classification-error matrix directly targets the classification problem and effectively improves accuracy on easily confused classes. By grouping easily misclassified and confused targets and then retraining and reclassifying them, the method performs well on very similar targets and is well suited to fine-grained classification problems.
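A minimal sketch of the error-guided grouping step: an affinity matrix is built from the off-diagonal entries of the confusion matrix (the classification-error matrix), and confusable classes are clustered. The paper uses Constrained Laplacian Rank (CLR) clustering; this sketch substitutes scikit-learn's spectral clustering as a simpler stand-in.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def confusable_class_groups(confusion, n_groups):
    """Group classes that a trained B-CNN confuses with each other.

    The affinity between classes i and j is the symmetrized rate at
    which one is misclassified as the other, taken directly from the
    confusion matrix rather than from feature similarity.
    """
    errors = confusion.astype(float).copy()
    np.fill_diagonal(errors, 0.0)          # keep only misclassifications
    affinity = errors + errors.T           # symmetric class-affinity matrix
    labels = SpectralClustering(
        n_clusters=n_groups, affinity="precomputed", random_state=0
    ).fit_predict(affinity + 1e-9)         # small epsilon keeps the graph connected
    return labels                          # cluster id per class
```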

4.
There are several data-based methods in the field of artificial intelligence that are nowadays frequently used for analyzing classification problems in medical applications. As we show in this paper, applying enhanced evolutionary computation techniques to classification problems has the potential to evolve classifiers of even higher quality than those trained by standard machine learning methods. On the basis of five medical benchmark classification problems taken from the UCI repository, as well as the Melanoma data set (prepared by members of the Department of Dermatology of the Medical University Vienna), we document that the enhanced genetic programming approach presented here is able to produce comparable or even better results than linear modeling methods, artificial neural networks, kNN classification, support vector machines, and various other genetic programming approaches.
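For readers unfamiliar with the approach, the following is a bare-bones genetic programming classifier sketch: expression trees over the input features are evolved, and the tree's output is thresholded at zero for binary prediction. The paper's enhanced GP (its selection schemes and operator strategies in particular) is considerably more sophisticated; everything below is an illustrative minimum, not the authors' system.

```python
import random, operator

OPS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]

def rand_tree(n_features, depth, rng):
    """Grow a random expression tree: tuples for ops, ints (feature
    indices) or floats (constants) for leaves."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features) if rng.random() < 0.7 else rng.uniform(-1, 1)
    op, arity = rng.choice(OPS)
    return (op,) + tuple(rand_tree(n_features, depth - 1, rng) for _ in range(arity))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        return tree[0](*(evaluate(t, x) for t in tree[1:]))
    return x[tree] if isinstance(tree, int) else tree

def fitness(tree, X, y):
    """Accuracy of the thresholded expression as a binary classifier."""
    preds = [1 if evaluate(tree, x) > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return rand_tree(n_features, 2, rng)
    i = rng.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i], n_features, rng),) + tree[i + 1:]

def evolve(X, y, pop_size=60, gens=40, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [rand_tree(n_features, 3, rng) for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=lambda t: fitness(t, X, y), reverse=True)
        parents = scored[: pop_size // 2]          # simple truncation selection
        pop = parents + [mutate(rng.choice(parents), n_features, rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=lambda t: fitness(t, X, y))
```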

5.
We propose a two-layer decision fusion technique, called Fuzzy Stacked Generalization (FSG), which establishes a hierarchical distance learning architecture. At the base layer of an FSG, fuzzy k-NN classifiers receive different feature sets, each extracted from the same dataset to gain multiple views of the dataset. At the meta layer, a fusion space is first constructed by aggregating the decision spaces of all the base-layer classifiers; a fuzzy k-NN classifier is then trained in the fusion space by minimizing the difference between the large-sample and N-sample classification error. In order to measure the degree of collaboration among the base-layer classifiers and the diversity of the feature spaces, a new measure, called shareability, is introduced. Shareability is defined as the number of samples that are correctly classified by at least one of the base-layer classifiers in FSG. In the experiments, we observe that FSG performs better than popular distance learning and ensemble learning algorithms when the shareability measure is large enough that most of the samples are correctly classified by at least one of the base-layer classifiers. The relationship between the proposed and state-of-the-art diversity measures is experimentally analyzed. Tests performed on a variety of artificial and real-world benchmark datasets show that the classification performance of FSG increases, compared with that of state-of-the-art ensemble learning and distance learning methods, as the number of classes increases.
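The shareability measure as defined above is straightforward to compute; the sketch below counts the samples that at least one base-layer classifier gets right, assuming the predictions are collected in an (n_classifiers, n_samples) array. Names are illustrative.

```python
import numpy as np

def shareability(base_predictions, y_true):
    """Number of samples correctly classified by >= 1 base classifier.

    `base_predictions` has shape (n_classifiers, n_samples).  A high
    value indicates the base-layer feature views are complementary
    enough for the meta-layer classifier to recover most samples.
    """
    correct_any = (np.asarray(base_predictions) == np.asarray(y_true)).any(axis=0)
    return int(correct_any.sum())
```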

6.
Regression via classification (RvC) is a method in which a regression problem is converted into a classification problem. A discretization process is used to convert the continuous target values into classes, and the discretized data can then be used with classifiers as a classification problem. In this paper we use a discretization method, Extreme Randomized Discretization (ERD), in which bin boundaries are created randomly, to build ensembles. We present two ensemble methods for RvC problems. We show theoretically that the proposed ensembles for RvC perform better than RvC with the equal-width discretization method, and we also show the superiority of the proposed ensemble methods experimentally. Experimental results suggest that the proposed ensembles perform competitively with methods developed specifically for regression problems.
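A hedged sketch of an ERD-based RvC ensemble: each member discretizes the target with randomly placed bin boundaries, trains a classifier on the resulting bin labels, and predicts a representative value of the predicted bin; member predictions are averaged. The base learner, bin count, and bin representatives below are illustrative choices, not the paper's exact setup.

```python
import random
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def erd_rvc_ensemble(X, y, X_test, n_members=25, n_bins=8, seed=0):
    """RvC ensemble with Extreme Randomized Discretization (illustrative)."""
    rng = random.Random(seed)
    y = np.asarray(y, dtype=float)
    lo, hi = float(y.min()), float(y.max())
    preds = np.zeros(len(X_test))
    for _ in range(n_members):
        # random bin boundaries: the ERD step
        cuts = sorted(rng.uniform(lo, hi) for _ in range(n_bins - 1))
        edges = np.array([lo] + cuts + [hi])
        labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
        # representative value per bin: mean of the training targets in it
        reps = np.array([y[labels == b].mean() if np.any(labels == b)
                         else (edges[b] + edges[b + 1]) / 2.0
                         for b in range(n_bins)])
        member = DecisionTreeClassifier(random_state=0).fit(X, labels)
        preds += reps[member.predict(X_test)]
    return preds / n_members
```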

7.
A discretization algorithm based on maximizing class-attribute interdependence
To address the problem that existing discretization algorithms struggle to balance computational speed and solution quality, a new supervised discretization algorithm based on maximizing class-attribute interdependence is proposed. The algorithm takes into account the spatial distribution of classes and attribute values, and constructs the discretization scheme from the intrinsic relationship between classes and attributes, so that the interdependence between classes and attributes is maximized after discretization. Experimental results show that, while maintaining computational speed, the algorithm effectively improves classification accuracy and reduces the number of classification rules.
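As a concrete stand-in for such a criterion, the sketch below greedily adds the cut point that most improves a CAIM-style class-attribute interdependence score (intervals dominated by a single class score higher). The paper's actual criterion and search procedure may differ; this only shows the general shape of the algorithm, and assumes integer class labels in NumPy arrays.

```python
import numpy as np

def interdependence_score(values, labels, boundaries):
    """A CAIM-style class-attribute interdependence score (proxy).

    For each interval, max_r is the count of its dominant class and
    m_r the interval size; intervals dominated by one class contribute
    more, so a higher score means stronger class-attribute coupling.
    """
    edges = [-np.inf] + sorted(boundaries) + [np.inf]
    score, n_intervals = 0.0, len(edges) - 1
    for r in range(n_intervals):
        mask = (values > edges[r]) & (values <= edges[r + 1])
        m_r = mask.sum()
        if m_r:
            max_r = np.max(np.bincount(labels[mask]))
            score += (max_r ** 2) / m_r
    return score / n_intervals

def greedy_discretize(values, labels, max_cuts=10):
    """Greedily add the candidate boundary that most improves the score."""
    candidates = np.unique(values)[:-1]      # simplistic candidate cut points
    chosen = []
    best = interdependence_score(values, labels, chosen)
    for _ in range(max_cuts):
        gains = [(interdependence_score(values, labels, chosen + [c]), c)
                 for c in candidates if c not in chosen]
        if not gains:
            break
        top, cut = max(gains)
        if top <= best:                      # stop when no boundary helps
            break
        best, chosen = top, chosen + [cut]
    return sorted(chosen)
```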

8.
Tang Shiqi, Wen Yimin, Qin Yixiu. Journal of Software (软件学报), 2017, 28(11): 2940-2960.
Transfer learning has received growing attention in recent years. Existing online transfer learning algorithms generally transfer knowledge from a single source domain; however, when the similarity between the source and target domains is low, effective transfer is difficult. To address this, a multi-source online transfer learning method based on local classification accuracy, LC-MSOTL, is proposed. LC-MSOTL stores multiple source-domain classifiers; for each newly arriving sample, it computes the distances to the samples already seen in the target domain and the classification accuracy of each source-domain classifier on the sample's nearest neighbors, then selects the source classifier with the highest local accuracy and combines it with the target-domain classifier by weighting, thereby transferring knowledge from multiple source domains to the target domain. Experimental results on artificial and real-world datasets show that LC-MSOTL achieves effective selective transfer from multiple source domains, and attains higher classification accuracy than the single-source online transfer learning algorithm OTL.
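A minimal sketch of one LC-MSOTL-style prediction step, assuming scikit-learn-style source and target classifiers and a stored memory of labeled target samples as NumPy arrays: the source classifier with the highest accuracy on the query's k nearest stored samples is selected and combined with the target classifier by a weighted vote. The fixed weight `alpha` is an illustrative simplification; the paper adapts the combination weights online.

```python
import numpy as np

def lc_msotl_predict(x, sources, target_clf, memory_X, memory_y, k=5, alpha=0.5):
    """One prediction step in the spirit of LC-MSOTL (illustrative)."""
    dists = np.linalg.norm(memory_X - x, axis=1)
    nn = np.argsort(dists)[:k]                     # k nearest stored target samples
    # local accuracy of every source classifier on those neighbors
    local_acc = [np.mean(clf.predict(memory_X[nn]) == memory_y[nn])
                 for clf in sources]
    best_src = sources[int(np.argmax(local_acc))]
    # weighted vote between the locally best source and the target classifier
    votes = {}
    for clf, w in ((best_src, alpha), (target_clf, 1.0 - alpha)):
        pred = clf.predict(x.reshape(1, -1))[0]
        votes[pred] = votes.get(pred, 0.0) + w
    return max(votes, key=votes.get)
```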

9.
We consider an automated agent that needs to coordinate with a human partner when communication between them is not possible or is undesirable (tacit coordination games). Specifically, we examine situations where an agent and a human attempt to coordinate their choices among several alternatives with equivalent utilities. We use machine learning algorithms to help the agent predict human choices in these tacit coordination domains. Experiments have shown that humans are often able to coordinate with one another in communication-free games by using focal points, "prominent" solutions to coordination problems. We integrate focal point rules into the machine learning process by transforming raw domain data into a new hypothesis space. We present extensive empirical results from three different tacit coordination domains. The Focal Point Learning approach results in classifiers with a 40–80% higher correct classification rate and shorter training time than regular classifiers, and a 35% higher correct classification rate than classical focal point techniques without learning. In addition, integrating focal points into learning algorithms yields agents that are more robust to changes in the environment. We also present several results describing various biases that can arise in focal-point-based coordination.
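The hypothesis-space transformation can be pictured as follows: each raw option is replaced by hand-crafted focal-point features before a standard classifier is trained. The concrete feature definitions below (firstness, centrality, extremeness, singularity) are simplified stand-ins inspired by the focal point literature, not necessarily the paper's exact rules.

```python
import numpy as np

def focal_point_features(options):
    """Map raw coordination options into a focal-point hypothesis space.

    `options` is a 1-D array of the candidate values presented to the
    coordinating parties; each option is described by focal properties
    rather than by its raw encoding.
    """
    options = np.asarray(options, dtype=float)
    n = len(options)
    feats = []
    for i, v in enumerate(options):
        firstness = 1.0 if i == 0 else 0.0                 # first listed option
        centrality = 1.0 - abs(i - (n - 1) / 2) / max(n - 1, 1)
        extremeness = 1.0 if v in (options.min(), options.max()) else 0.0
        singularity = float(np.sum(options == v) == 1)     # value appears once
        feats.append([firstness, centrality, extremeness, singularity])
    return np.array(feats)                                  # feed to any classifier
```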

10.
For learning a Bayesian network classifier, continuous attributes usually need to be discretized. But the discretization of continuous attributes may cause information loss, introduce noise, and reduce sensitivity to changes in the attributes with respect to the class variables. In this paper, we use a Gaussian kernel function with a smoothing parameter to estimate the density of the attributes. A Bayesian network classifier with continuous attributes is established by the dependency extension of Naive Bayes classifiers. We also analyze the information each attribute provides about the class as a basis for this dependency extension. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function provide good classification accuracy compared with other approaches when dealing with continuous attributes.
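A compact sketch of the kernel-density idea, restricted to the naive (independence) case: for each class and attribute, p(x_j | c) is estimated with a Gaussian kernel with smoothing parameter h over the training values, and prediction maximizes the log posterior. The paper additionally extends the dependency structure beyond Naive Bayes, which this sketch omits; the class name and fixed bandwidth are illustrative.

```python
import numpy as np

class KdeNaiveBayes:
    """Naive Bayes with per-class, per-attribute Gaussian kernel densities."""

    def fit(self, X, y, bandwidth=0.5):
        X, y = np.asarray(X, float), np.asarray(y)
        self.h = bandwidth                       # kernel smoothing parameter
        self.classes = np.unique(y)
        self.prior = {c: np.mean(y == c) for c in self.classes}
        self.vals = {c: X[y == c] for c in self.classes}  # training values per class
        return self

    def _log_density(self, c, x):
        # sum_j log( (1/n_c) sum_i N((x_j - v_ij) / h) / h ), Gaussian kernel
        v = self.vals[c]
        k = np.exp(-0.5 * ((x - v) / self.h) ** 2) / (self.h * np.sqrt(2 * np.pi))
        return np.sum(np.log(k.mean(axis=0) + 1e-300))   # floor avoids log(0)

    def predict(self, X):
        X = np.asarray(X, float)
        return np.array([max(self.classes, key=lambda c:
                             np.log(self.prior[c]) + self._log_density(c, x))
                         for x in X])
```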
