Merging-Branches Impact on Decision Tree Induction
Cite this article: WANG Xi-Zhao, YANG Chen-Xiao. Merging-Branches Impact on Decision Tree Induction[J]. Chinese Journal of Computers, 2007, 30(8): 1251-1258.
Authors: WANG Xi-Zhao  YANG Chen-Xiao
Affiliation: College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
Abstract: In traditional decision tree construction, the inductive bias involved in selecting expansion attributes causes attributes with many values to be preferred, which leads to overly large trees and reduced generalization ability; the tree therefore needs to be simplified. Pruning is one form of simplification and is divided into pre-pruning and post-pruning. This paper focuses on merging branches, a form of pre-pruning. It studies the impact of merging branches on decision tree induction, and specifically discusses whether applying an appropriate branch-merging strategy during tree growth can improve the comprehensibility of the tree, reduce its complexity, and raise its generalization accuracy. Based on information gain, the paper analyzes the complexity of the decision tree after branches are merged, and designs and implements two branch-merging algorithms: SSID, based on the proportion of positive examples, and MCID, based on maximum gain compensation. Experimental results show that the decision trees produced by SSID and MCID are clearly superior to those of See5 in both comprehensibility and generalization accuracy.

Keywords: decision tree induction  induction bias  pruning  merging branches  information gain  gain compensation
Revised: 2006-10-23

Merging-Branches Impact on Decision Tree Induction
WANG Xi-Zhao, YANG Chen-Xiao. Merging-Branches Impact on Decision Tree Induction[J]. Chinese Journal of Computers, 2007, 30(8): 1251-1258.
Authors:WANG Xi-Zhao  YANG Chen-Xiao
Affiliation:College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
Abstract: Since inductive bias exists in the selection of expanded attributes, attributes with more values are usually preferred. This results in a decision tree of large scale and poor generalization capability, so the tree needs to be simplified; pruning, divided into pre-pruning and post-pruning, is one such simplification. This paper focuses on pre-pruning. A new pre-pruning strategy is given: during tree growth, two or more branches from the same node are merged into one branch, and the growth process then continues. This paper investigates the impact of merging branches on decision tree induction. The main concerns are whether the comprehensibility, the size, and the generalization accuracy of a decision tree can be improved if an appropriate merging strategy is selected and applied. Based on information gain, the paper analyzes the complexity of a decision tree before and after merging branches, and designs two branch-merging algorithms: SSID, based on the proportion of positive samples, and MCID, based on maximum gain compensation. Experimental results show that, with respect to comprehensibility and generalization capability, both SSID and MCID are significantly superior to the widely used See5 system (the improved version of C4.5).
Keywords:decision tree induction  induction bias  pruning  merging branches  information gain  gain compensation
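The merging-branches idea described in the abstract can be illustrated with a small, self-contained sketch. The Python code below is a minimal illustration, not the paper's SSID or MCID: it computes the information gain of a categorical split and then greedily merges the pair of branches whose union sacrifices the least gain, a simplified stand-in for the gain-compensation idea. The function names (entropy, info_gain, merge_least_costly_pair) and the toy class counts are assumptions introduced here purely for illustration.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a class-count mapping."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def info_gain(parent_counts, branches):
    """Information gain of splitting parent_counts into the given branches.

    branches is a list of Counters, one per attribute value, whose totals
    sum to the parent's total.
    """
    total = sum(parent_counts.values())
    weighted = sum(sum(b.values()) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted

def merge_least_costly_pair(parent_counts, branches):
    """Greedily merge the pair of branches whose union loses the least gain."""
    base = info_gain(parent_counts, branches)
    best_branches, best_loss = None, None
    for i in range(len(branches)):
        for j in range(i + 1, len(branches)):
            rest = [b for k, b in enumerate(branches) if k not in (i, j)]
            merged = rest + [branches[i] + branches[j]]  # Counter + Counter adds counts
            loss = base - info_gain(parent_counts, merged)
            if best_loss is None or loss < best_loss:
                best_branches, best_loss = merged, loss
    return best_branches, best_loss

# Toy example: a 4-valued attribute splitting 20 samples of classes '+' and '-'.
parent = Counter({'+': 12, '-': 8})
branches = [Counter({'+': 5, '-': 1}), Counter({'+': 4, '-': 2}),
            Counter({'+': 2, '-': 3}), Counter({'+': 1, '-': 2})]
merged, loss = merge_least_costly_pair(parent, branches)
print(f"branches: {len(branches)} -> {len(merged)}, information gain lost: {loss:.4f}")
```

In a full tree-growing loop, such a merge step would be applied at a node before recursing into the (now fewer) branches, which is what shrinks the tree and can improve comprehensibility.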
Indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.