Merging-Branches Impact on Decision Tree Induction
Cite this article: WANG Xi-Zhao, YANG Chen-Xiao. Merging-Branches Impact on Decision Tree Induction[J]. Chinese Journal of Computers, 2007, 30(8): 1251-1258.
Authors: WANG Xi-Zhao  YANG Chen-Xiao
Affiliation: College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
Abstract: In traditional decision tree construction, the inductive bias involved in selecting expansion attributes causes attributes with many values to be preferred, which leads to overly large trees and reduced generalization ability; the tree therefore needs to be simplified. Pruning is one form of simplification and is divided into pre-pruning and post-pruning. This paper focuses on merging branches, a form of pre-pruning. It studies the impact of merging branches on decision tree induction, and specifically discusses whether applying an appropriate branch-merging strategy during tree growth can improve the comprehensibility of the tree, reduce its complexity, and raise its generalization accuracy. Based on information gain, the paper analyzes the complexity of the decision tree after branches are merged, and designs and implements two branch-merging algorithms: SSID, based on the proportion of positive examples, and MCID, based on maximum gain compensation. Experimental results show that the decision trees produced by SSID and MCID are clearly superior to those of See5 in both comprehensibility and generalization accuracy.

Keywords: decision tree induction  induction bias  pruning  merging branches  information gain  gain compensation
Revised: 2006-10-23

Merging-Branches Impact on Decision Tree Induction
WANG Xi-Zhao, YANG Chen-Xiao. Merging-Branches Impact on Decision Tree Induction[J]. Chinese Journal of Computers, 2007, 30(8): 1251-1258.
Authors:WANG Xi-Zhao  YANG Chen-Xiao
Affiliation:College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
Abstract: Since inductive bias exists in the selection of expanded attributes, attributes with more values are usually preferred. This results in a decision tree of large scale and poor generalization capability, so the tree needs to be simplified; pruning, divided into pre-pruning and post-pruning, is one such simplification. This paper focuses on pre-pruning. A new pre-pruning strategy is given: during tree growth, two or more branches from the same node are merged into one branch, and the growth process then continues. This paper investigates the impact of merging branches on decision tree induction. The main concerns are whether the comprehensibility, the size, and the generalization accuracy of a decision tree can be improved if an appropriate merging strategy is selected and applied. Based on information gain, the paper analyzes the complexity of a decision tree before and after merging branches, and designs two branch-merging algorithms: SSID, based on the proportion of positive samples, and MCID, based on maximum gain compensation. Experimental results show that, with respect to comprehensibility and generalization capability, both SSID and MCID are significantly superior to the widely used See5 system (the improved version of C4.5).
Keywords:decision tree induction  induction bias  pruning  merging branches  information gain  gain compensation
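The merging-branches idea described in the abstract can be illustrated with a small, self-contained sketch. The Python code below is a minimal illustration, not the paper's SSID or MCID: it computes the information gain of a categorical split and then greedily merges the pair of branches whose union sacrifices the least gain, a simplified stand-in for the gain-compensation idea. The function names (entropy, info_gain, merge_least_costly_pair) and the toy class counts are assumptions introduced here purely for illustration.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a class-count mapping."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def info_gain(parent_counts, branches):
    """Information gain of splitting parent_counts into the given branches.

    branches is a list of Counters, one per attribute value, whose totals
    sum to the parent's total.
    """
    total = sum(parent_counts.values())
    weighted = sum(sum(b.values()) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted

def merge_least_costly_pair(parent_counts, branches):
    """Greedily merge the pair of branches whose union loses the least gain."""
    base = info_gain(parent_counts, branches)
    best_branches, best_loss = None, None
    for i in range(len(branches)):
        for j in range(i + 1, len(branches)):
            rest = [b for k, b in enumerate(branches) if k not in (i, j)]
            merged = rest + [branches[i] + branches[j]]  # Counter + Counter adds counts
            loss = base - info_gain(parent_counts, merged)
            if best_loss is None or loss < best_loss:
                best_branches, best_loss = merged, loss
    return best_branches, best_loss

# Toy example: a 4-valued attribute splitting 20 samples of classes '+' and '-'.
parent = Counter({'+': 12, '-': 8})
branches = [Counter({'+': 5, '-': 1}), Counter({'+': 4, '-': 2}),
            Counter({'+': 2, '-': 3}), Counter({'+': 1, '-': 2})]
merged, loss = merge_least_costly_pair(parent, branches)
print(f"branches: {len(branches)} -> {len(merged)}, information gain lost: {loss:.4f}")
```

In a full tree-growing loop, such a merge step would be applied at a node before recursing into the (now fewer) branches, which is what shrinks the tree and can improve comprehensibility.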
Indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.