
A Robust and Effective Improved Decision Tree Model
Citation: Liu Peng. A Robust and Effective Improved Decision Tree Model [J]. Computer Engineering and Applications, 2005, 41(33): 172-175.
Author: Liu Peng
Affiliation: Department of Economic Information Management, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract: This paper proposes a robust and effective improved decision tree model, R-C4.5, together with a simplified version. The model is based on the well-known C4.5 decision tree but improves its attribute selection and branching strategy. For each candidate attribute, the entropy of every corresponding sample subset and the average of those entropies are computed, and the subsets whose entropy is not less than the average are merged into a temporary composite subset; that is, the branches with poor classification performance are merged. A modified information gain for the node is then computed from the entropy of the temporary composite subset and the entropies of the unmerged subsets, and the attribute with the highest modified information gain is chosen as the test attribute for the current node. Its branches correspond to the unmerged subsets and the composite subset. The simplified version of the model is carried out in the data-preprocessing stage. The R-C4.5 model markedly improves the interpretability of the test-attribute selection measure and reduces empty branches, meaningless branches, and overfitting.

Keywords: decision tree model; C4.5; R-C4.5; classifier; data mining
Article ID: 1002-8331(2005)33-0172-04
Received: 2005-02
Revised: 2005-02

A Robust and Effective Improved Decision Tree Model
Liu Peng. A Robust and Effective Improved Decision Tree Model [J]. Computer Engineering and Applications, 2005, 41(33): 172-175.
Authors:Liu Peng
Affiliation: Department of Information Systems, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract: This paper introduces a robust and effective improved decision tree model, R-C4.5, and its simplified version. The model is based on C4.5 but improves its attribute selection and partitioning rules. In R-C4.5, branches that perform poorly in classification are merged. First, for each attribute, the entropy of every sample subset and the average subset entropy are computed; the subsets whose entropy is not less than the average are then merged into a temporary composite subset. Next, the modified information gain of the node is computed from the temporary composite subset and the remaining subsets, and the attribute with the highest modified information gain is chosen as the test attribute for the current node. The simplified version of R-C4.5 is implemented in data preprocessing. R-C4.5 enhances the interpretability of test-attribute selection, reduces the number of empty or insignificant branches, and mitigates overfitting.
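The merging and modified-gain procedure described in the abstract can be sketched in Python as follows. This is an illustrative reconstruction based only on the abstract, not the paper's actual implementation; the function names and data layout (`subsets` as one list of class labels per branch) are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def modified_gain(parent_labels, subsets):
    """R-C4.5-style modified information gain (sketch from the abstract).

    `subsets` holds the class labels of each branch induced by one
    candidate attribute.  Branches whose entropy is not less than the
    average subset entropy (the poorly classifying branches) are merged
    into a single temporary composite subset before the gain is computed.
    """
    ents = [entropy(s) for s in subsets]
    avg = sum(ents) / len(ents)
    # Keep the well-classifying branches; merge the rest into one subset.
    kept = [s for s, e in zip(subsets, ents) if e < avg]
    merged = [x for s, e in zip(subsets, ents) if e >= avg for x in s]
    branches = kept + ([merged] if merged else [])
    # Modified gain = parent entropy minus weighted entropy of the
    # unmerged subsets plus the composite subset.
    n = len(parent_labels)
    weighted = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(parent_labels) - weighted
```

The attribute with the highest `modified_gain` would then be selected as the test attribute, with one branch per kept subset plus one branch for the composite subset.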
Keywords: decision tree; C4.5; R-C4.5; classifier; data mining
This article is indexed in the CNKI, VIP (Weipu), Wanfang Data, and other databases.