Similar literature
 19 similar documents found (search time: 875 ms)
1.
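An improved post-pruning algorithm for decision trees   Total citations: 1, self-citations: 0, citations by others: 1
Once its depth and node count exceed a certain scale, a decision tree's classification accuracy on unseen instances gradually declines as the tree grows, so a pruning algorithm is needed to reduce the size of the tree while preserving classification accuracy. Building on an analysis of the strengths and weaknesses of existing decision tree pruning algorithms, the paper proposes an improved post-pruning algorithm that jointly considers classification accuracy, classification stability, and tree size. Experiments show that the algorithm effectively reduces the size of the decision tree while maintaining the model's discrimination accuracy and stability, making the final automatic discrimination model more concise.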

2.
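This paper discusses the basic theory and implementation of decision tree algorithms, a family of classification algorithms in data-warehouse-based data mining, and analyzes and improves the decision tree algorithm for classification. In the decision tree pre-pruning algorithm, the ratio of the information entropy of the parent node to that of the current node is used as the criterion for deciding whether to stop expanding the tree.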
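
The entropy-ratio stopping test mentioned in this abstract can be sketched roughly as follows. The abstract does not say how the ratio is thresholded, so the function names, the direction of the comparison, and the default `threshold` value are all assumptions made for illustration, not the paper's actual rule.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(parent_labels, node_labels):
    """Ratio H(parent) / H(current node); infinite when the node is pure."""
    h_node = entropy(node_labels)
    if h_node == 0.0:
        return math.inf
    return entropy(parent_labels) / h_node


def should_stop(parent_labels, node_labels, threshold=4.0):
    """Hypothetical pre-pruning rule: stop expanding once the node's entropy
    is small relative to its parent's (ratio at or above the threshold).
    Both the direction and the default threshold are assumptions."""
    return entropy_ratio(parent_labels, node_labels) >= threshold
```

With a balanced two-class parent, a pure child gives an infinite ratio and stops expansion, while a child that is still mixed (ratio close to 1) keeps growing.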

3.
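When applied to classification decision problems, traditional association rule mining tends to miss infrequent rules and to yield low prediction accuracy. To obtain correct, reasonable, and more complete rules, an improved method, DT-AR (decision tree-association rule algorithm), is proposed, which uses a decision tree pruning strategy to supplement the association rule set. The method uses the FP-Growth (frequent pattern growth) algorithm to obtain the association rule set and the C4.5 algorithm to build a post-pruned decision tree and extract classification rules. After iterative confidence-based filtering, the classification rules are merged with the association rule set by set union as a correction, and classification is performed by voting with confidence as the weight coefficient. Experimental results show that, compared with traditional association rule mining and decision tree pruning methods, the rules obtained by this method classify the data sets more accurately.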
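
The final step of this abstract, voting with confidence as the weight, can be illustrated with a small sketch. The rule representation (an antecedent dictionary, a class label, and a confidence) and the `classify` function are hypothetical; the mining of rules with FP-Growth and C4.5 is not shown.

```python
from collections import defaultdict


def classify(instance, rules, default=None):
    """Confidence-weighted voting over the rules that match an instance.

    rules: iterable of (antecedent, label, confidence), where antecedent is a
    dict of attribute -> required value (a hypothetical representation).
    """
    votes = defaultdict(float)
    for antecedent, label, confidence in rules:
        # A rule fires only if every attribute in its antecedent matches.
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            votes[label] += confidence
    if not votes:
        return default  # no rule fired
    return max(votes, key=votes.get)
```

For example, if one rule with confidence 0.6 votes for one class and two rules with confidences 0.5 and 0.4 vote for another, the second class wins with a total weight of 0.9.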

4.
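Decision tree pruning can improve the classification accuracy of a decision tree. Common pruning algorithms such as cost-complexity pruning (CCP) all prune on the basis of reducing the tree's misclassification rate. The Akaike information criterion (AIC) is introduced to evaluate the quality of a decision tree, and an AIC-based decision tree pruning algorithm is proposed that prunes on the basis of a combined evaluation of the probability of correct classification and complexity. A worked example shows that the AIC-based pruning algorithm yields a decision tree with high classification accuracy and exhibits neither overfitting nor insufficient pruning.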
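
To make the idea of an AIC-based pruning decision concrete, here is a minimal sketch using the standard definition AIC = 2k - 2 ln L. The abstract does not give the paper's formulation, so two choices here are assumptions: k is taken as the number of leaves, and the likelihood is the multinomial likelihood of the training labels under each leaf's class frequencies. The function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(leaves):
    """Log-likelihood of the training labels; each leaf is a dict of
    class -> count, and predicts its own empirical class frequencies."""
    ll = 0.0
    for counts in leaves:
        n = sum(counts.values())
        for c in counts.values():
            if c > 0:
                ll += c * math.log(c / n)
    return ll


def aic(leaves):
    """AIC = 2k - 2 ln L, with k assumed to be the number of leaves."""
    return 2 * len(leaves) - 2 * log_likelihood(leaves)


def should_prune(subtree_leaves):
    """Collapse a subtree into one leaf when that does not increase AIC."""
    merged = Counter()
    for counts in subtree_leaves:
        merged.update(counts)
    return aic([dict(merged)]) <= aic(subtree_leaves)
```

Under these assumptions, a split whose children have the same class mix as the parent is pruned (the extra leaf costs 2 and buys no likelihood), while a split that separates the classes cleanly is kept.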

5.
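Analysis of decision tree methods based on data mining   Total citations: 1, self-citations: 0, citations by others: 1
Because they are simple, intuitive, and accurate, decision tree methods are widely used in data mining and data analysis. After introducing the general background of decision tree methods, the paper analyzes decision tree generation algorithms and models in depth and discusses the pruning process.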

6.
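In a multi-infrared flame detection system, a decision-tree-based fire recognition algorithm is proposed. Experimental data are collected according to the requirements of the national standard tests for special-type fire detectors. The algorithm first preprocesses the data obtained from five infrared flame detectors using multiple overlapping, interleaved windows, then extracts six fire features as the classification attributes of the decision tree, trains and prunes the tree, and finally obtains an optimal decision tree model for fire recognition. The recognition model is applied to online fire recognition. Experimental results show that the decision tree classifier achieves an accuracy of up to 95.2% with a recognition time under 2 s, giving higher accuracy and faster recognition than other classification algorithms, and that it is highly practical.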

7.
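In a multi-infrared flame detection system, a decision-tree-based fire recognition algorithm is proposed. Experimental data are collected according to the requirements of the national standard tests for special-type fire detectors. The algorithm first preprocesses the data obtained from five infrared flame detectors using multiple overlapping, interleaved windows, then extracts six fire features as the classification attributes of the decision tree, trains and prunes the tree, and finally obtains an optimal decision tree model for fire recognition. The recognition model is applied to online fire recognition. Experimental results show that the decision tree classifier achieves an accuracy of up to 95.2% with a recognition time under 2 s, giving higher accuracy and faster recognition than other classification algorithms, and that it is highly practical.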

8.
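Wu Tong, Cheng Hui. Computer Science (计算机科学), 2013, 40(Z11): 278-280, 295
Decision trees are an effective classification method, but overfitting often occurs when building a decision tree model. A decision tree pruning algorithm based on a BP neural network (BP-Pruning) is used for soft pruning; then, to address some shortcomings of BP-Pruning, an improved algorithm, called GBP-Pruning, is proposed. It introduces a genetic algorithm to train the weights and thresholds of the BP-Pruning model, thereby overcoming the shortcomings of BP-Pruning. Finally, the feasibility of GBP-Pruning is verified.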

9.
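Decision trees are an important type of classifier in data mining. After introducing several typical decision tree classification algorithms, the article studies a decision tree classifier based on a correlation measure. The main idea is to use an attribute correlation measure during tree construction to determine the order in which condition attributes are used for splitting; setting and handling thresholds simplifies the pruning and optimization of the tree and avoids the improper splits caused by using information entropy. The article describes the execution of the algorithm in detail, together with a proof of correctness and a time complexity analysis.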

10.
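A decision tree that determines condition attributes by correlation   Total citations: 5, self-citations: 1, citations by others: 5
Han Jiaxin, Wang Jiahua. Microcomputer Development (微机发展), 2003, 13(5): 38-39, 42
Decision trees are an important type of classifier in data mining. After introducing several typical decision tree classification algorithms, the article studies a decision tree classifier based on a correlation measure. The main idea is to use an attribute correlation measure during tree construction to determine the order in which condition attributes are used for splitting; setting and handling thresholds simplifies the pruning and optimization of the tree and avoids the improper splits caused by using information entropy. The article describes the execution of the algorithm in detail, together with a proof of correctness and a time complexity analysis.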

11.
A generalisation of bottom-up pruning is proposed as a model level combination method for a decision tree ensemble. Bottom-up pruning on a single tree involves choosing between a subtree rooted at a node and a leaf, dependent on a pruning criterion. A natural extension to an ensemble of trees is to allow subtrees from other ensemble trees to be grafted onto a node in addition to the operations of pruning to a leaf and leaving the existing subtree intact. Suitable pruning criteria are proposed and tested for this multi-tree pruning context. Gains in both performance and in particular compactness over individually pruned trees are observed in tests performed on a number of datasets from the UCI database. The method is further illustrated on a churn prediction problem in the telecommunications domain.
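
The single-tree case described in this abstract, choosing at each node between the existing subtree and a leaf, can be sketched as follows. The pruning criterion used here is error count on held-out data (reduced-error pruning), which is a stand-in chosen for illustration and not necessarily one of the criteria the paper proposes; the `Node` class is hypothetical, and the grafting of subtrees from other ensemble trees is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    majority: str                                   # majority training class at this node
    feature: Optional[str] = None                   # None marks a leaf
    children: dict = field(default_factory=dict)    # feature value -> Node


def predict(node, x):
    """Follow the tree; fall back to the majority class on unseen values."""
    while node.feature is not None:
        child = node.children.get(x.get(node.feature))
        if child is None:
            return node.majority
        node = child
    return node.majority


def errors(node, data):
    """Number of (x, y) pairs in data that the subtree misclassifies."""
    return sum(predict(node, x) != y for x, y in data)


def prune(node, data):
    """Bottom-up pruning: prune the children first, then keep whichever of
    {subtree, leaf} makes fewer errors on the held-out data reaching the node."""
    if node.feature is None:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in data if x.get(node.feature) == value]
        node.children[value] = prune(child, subset)
    leaf = Node(majority=node.majority)
    return leaf if errors(leaf, data) <= errors(node, data) else node
```

Ties are resolved in favour of the leaf, so a node that no held-out example reaches is collapsed; that is a design choice of this sketch rather than something stated in the abstract.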

12.
Trading Accuracy for Simplicity in Decision Trees   Total citations: 4, self-citations: 0, citations by others: 4
Marko Bohanec, Ivan Bratko. Machine Learning, 1994, 15(3): 223-250
When communicating concepts, it is often convenient or even necessary to define a concept approximately. A simple, although only approximately accurate concept definition may be more useful than a completely accurate definition which involves a lot of detail. This paper addresses the problem: given a completely accurate, but complex, definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept sufficiently well. Concepts are represented by decision trees, and the method of simplification is tree pruning. Given a decision tree that accurately specifies a concept, the problem is to find a smallest pruned tree that still represents the concept within some specified accuracy. A pruning algorithm is presented that finds an optimal solution by generating a dense sequence of pruned trees, decreasing in size, such that each tree has the highest accuracy among all the possible pruned trees of the same size. An efficient implementation of the algorithm, based on dynamic programming, is presented and empirically compared with three progressive pruning algorithms using both artificial and real-world data. An interesting empirical finding is that the real-world data generally allow significantly greater simplification at equal loss of accuracy.
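
The dynamic-programming idea in this abstract (for every tree size, the most accurate pruned tree of that size) can be illustrated with a small sketch. It is not the paper's algorithm as published: tree size is assumed to mean the number of leaves, each node is assumed to store the number of errors it would make if turned into a leaf (`leaf_errors`), and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    leaf_errors: int                               # errors if this node becomes a leaf
    children: list = field(default_factory=list)   # empty for an original leaf


def best_errors(node):
    """Return {leaf_count: minimum errors} over all prunings of this subtree."""
    table = {1: node.leaf_errors}                  # option 1: prune here
    if node.children:
        combined = {0: 0}                          # option 2: keep the split
        for child in node.children:
            child_table = best_errors(child)
            merged = {}
            for s1, e1 in combined.items():
                for s2, e2 in child_table.items():
                    s, e = s1 + s2, e1 + e2
                    if e < merged.get(s, float("inf")):
                        merged[s] = e
            combined = merged
        for s, e in combined.items():
            if e < table.get(s, float("inf")):
                table[s] = e
    return table


def smallest_size(table, max_errors):
    """Smallest leaf count whose best pruning stays within max_errors."""
    return min((s for s, e in table.items() if e <= max_errors), default=None)
```

The table returned for the root plays the role of the sequence of pruned trees in the abstract: reading it by increasing size gives the best achievable error at each size, and `smallest_size` picks the smallest tree within a given error budget.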

13.
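A cost-sensitive decision tree is a decision tree that aims to minimize misclassification cost and test cost. As data volumes grow rapidly, low-quality (dirty) data appear more and more often. When a cost-sensitive decision tree is built, dirty data in the training set affect the choice of splitting attributes and the partitioning of tree nodes, so the data need to be cleaned before classification. In practice, however, data cleaning is often expensive in both time and money, and many users specify the maximum cleaning cost they can accept and require that the cleaning cost stay within this threshold. Besides misclassification cost and test cost, the cost of cleaning dirty data is therefore also an important factor in building a cost-sensitive decision tree. Existing research on building cost-sensitive decision trees, however, does not consider data quality. To fill this gap, this work studies the construction of cost-sensitive decision trees on dirty data. Three methods for building cost-sensitive decision trees that integrate data cleaning algorithms are proposed, and experiments demonstrate their effectiveness.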

14.
A novel pruning approach using expert knowledge for data-specific pruning   Total citations: 1, self-citations: 0, citations by others: 1
Classification is an important data mining task that discovers hidden knowledge from labeled datasets. Most approaches to pruning assume that all datasets are equally uniform and equally important, so they apply equal pruning to all datasets. However, in real-world classification problems, not all datasets are equal, and applying an equal pruning rate tends to generate a decision tree with large size and high misclassification rate. We approach the problem by first investigating the properties of each dataset and then deriving a data-specific pruning value using expert knowledge, which is used to design pruning techniques that prune decision trees close to perfection. An efficient pruning algorithm dubbed EKBP is proposed; it is very general, as any learning algorithm can be used as the base classifier. We have implemented the proposed solution and experimentally verified its effectiveness on forty real-world benchmark datasets from the UCI machine learning repository. In all these experiments, the proposed approach dramatically reduces the tree size while enhancing or retaining the level of accuracy.

15.
16.
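PVM-based networked parallel search of game trees   Total citations: 1, self-citations: 0, citations by others: 1
Wang Jinghui, Qiao Weimin. Computer Engineering (计算机工程), 2005, 31(9): 29-30, 126
By analyzing game theory and the alpha-beta (α-β) pruning search process, the paper proposes using PVM to build a parallel search network, and designs and implements a PVM-based parallel search procedure for game trees. The constructed network and a divide-and-conquer strategy distribute the search across multiple computers that run simultaneously; in the search at the leaf computer nodes, alpha-beta pruning cuts off a large number of search nodes. Combining global parallel search with local pruning speeds up the search and solves the problem that search on a single computer is infeasible in speed and time. The parallel game search model is applicable to general game tree search problems.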
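
For reference, the local alpha-beta pruning step that each worker would run can be sketched as below. This is the textbook sequential algorithm on a toy tree representation (a leaf is a number, an internal node is a list of children); the representation is an assumption for illustration, and the PVM-based distribution of subtrees across machines described in the abstract is not shown.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of a game tree with alpha-beta cutoffs.

    node: a number (leaf evaluation) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:      # the minimizer above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:          # the maximizer above already has a better option
            break
    return value
```

On the tree `[[3, 5], [2, 9], [0, 1]]` with the maximizer to move, the leaves 9 and 1 are never evaluated: once the second and third subtrees produce a value no better than the 3 already secured, the remaining siblings are cut off.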

17.
An Empirical Comparison of Pruning Methods for Decision Tree Induction   Total citations: 5, self-citations: 0, citations by others: 5
This paper compares five methods for pruning decision trees, developed from sets of examples. When used with uncertain rather than deterministic data, decision-tree induction involves three main stages: creating a complete tree able to classify all the training examples, pruning this tree to give statistical reliability, and processing the pruned tree to improve understandability. This paper concerns the second stage, pruning. It presents empirical comparisons of the five methods across several domains. The results show that three methods (critical value, error complexity and reduced error) perform well, while the other two may cause problems. They also show that there is no significant interaction between the creation and pruning methods.

18.
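Xiang Jing, Ren Jie. Computer Engineering and Design (计算机工程与设计), 2006, 27(15): 2905-2908
In recent years, the number of gene expression techniques calling for in-depth study of cancer cells has been growing. Machine learning algorithms are widely used in many fields today but are rarely applied in bioinformatics. The paper systematically studies the principles and algorithms of decision tree generation and pruning, along with other decision-tree-related issues. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) data sets provided by CAMDA2000 (critical assessment of microarray data analysis), it designs and implements a decision tree classifier based on the ID3 algorithm and simplifies the tree with a post-pruning algorithm. Finally, experiments verify the effectiveness of the algorithm: the results show that the classifier achieves high classification accuracy in discriminant analysis of leukemia microarray data, demonstrating that decision tree algorithms have broad application prospects in medical data mining.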

19.
This paper deals with the decision rules of a tree classifier for performing the classification at each nonterminal node, under the assumption of complete probabilistic information. For given tree structure and feature subsets to be used, the optimal decision rules (strategy) are derived which minimize the overall probability of misclassification. The primary result is illustrated by an example.
