
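An optimized decision tree algorithm based on correlation coefficients
Cite this article: DONG Yue-hua, LIU Li. An optimized decision tree algorithm based on correlation coefficients[J]. Computer Engineering & Science, 2015, 37(9): 1783-1793
Authors: DONG Yue-hua  LIU Li
Affiliation: School of Information Engineering, Jiangxi University of Science and Technology
Abstract: Based on an analysis of the basic principle of the ID3 algorithm and its multi-value bias problem, this paper proposes an optimized decision tree algorithm based on correlation coefficients. First, correlation coefficients are introduced to improve the ID3 algorithm and overcome its multi-value bias. Then the properties of the Taylor and Maclaurin formulas are used to approximate and simplify the information gain formula. A worked example on concrete data shows that the optimized ID3 algorithm resolves the multi-value bias problem. Experiments on standard UCI data sets show that, when building decision trees, the optimized algorithm improves average classification accuracy and reduces the complexity of tree construction, which in turn shortens tree generation time. When the number of samples in the data set is large, the efficiency of the optimized ID3 algorithm improves markedly.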
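For orientation, the quantities the abstract refers to can be written out as follows. The first line is the standard ID3 definition of entropy and information gain; the second is the textbook Maclaurin series for the logarithm, which is the usual starting point for approximating entropy terms. The abstract does not state which expansion or truncation the authors use, so the second line is an assumption given for illustration only, and the symbols S, A, S_v, and p_i are notation chosen here rather than taken from the paper.

```latex
\[
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i ,
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
\]
\[
\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad (-1 < x \le 1),
\qquad
\ln(1 + x) \approx x \ \text{ for small } |x|
\]
```

Here S is the training set, p_i is the fraction of samples in class i, and S_v is the subset of S in which attribute A takes the value v. An attribute with many distinct values tends to produce small, pure subsets S_v, which drives the weighted sum toward zero and inflates the gain; this is the multi-value bias the abstract describes.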

Keywords: ID3 algorithm  correlation coefficient  decision tree  Taylor formula  information gain
Received: 2014-08-25
Revised: 2015-09-25

An optimized algorithm of decision tree based on correlation coefficients
DONG Yue-hua, LIU Li. An optimized algorithm of decision tree based on correlation coefficients[J]. Computer Engineering & Science, 2015, 37(9): 1783-1793
Authors: DONG Yue-hua  LIU Li
Affiliation: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
Abstract: To address the multi-value bias problem of the ID3 algorithm, we propose an optimized decision tree algorithm based on correlation coefficients. First, correlation coefficients between attributes are introduced to improve the ID3 algorithm and overcome its multi-value bias. Then the properties of the Taylor and Maclaurin formulas are used to simplify the information gain formula. A worked example on concrete data shows that the optimized ID3 algorithm overcomes the multi-value bias problem. Experiments on standard UCI data sets show that the optimized algorithm improves average classification accuracy and reduces the complexity of building decision trees, which in turn shortens tree generation time. The efficiency gain of the optimized ID3 algorithm is most pronounced on large samples.
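The multi-value bias mentioned in the abstract can be reproduced with a few lines of plain ID3 information gain. The sketch below is a minimal illustration, not the authors' algorithm: the toy data, the attribute names `windy` and `record_id`, and the helper functions are all hypothetical, and the correlation-coefficient correction itself is not implemented because the abstract does not specify it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Standard ID3 information gain of one attribute with respect to the labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets produced by splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 8 samples, balanced binary class.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
windy = ["t", "t", "f", "f", "t", "f", "f", "t"]  # 2 distinct values, some signal
record_id = list(range(8))                        # 8 distinct values, no real signal

gain_windy = information_gain(windy, labels)
gain_id = information_gain(record_id, labels)

print(f"gain(windy)     = {gain_windy:.4f}")
print(f"gain(record_id) = {gain_id:.4f}")
```

Splitting on `record_id` puts every sample in its own pure subset, so its gain equals the full entropy of the labels even though the attribute cannot generalize. Plain ID3 would therefore prefer it over `windy`, which is the behavior the proposed correction is meant to suppress.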
Keywords: ID3 algorithm  correlation coefficient  decision tree  Taylor formula  information gain
This article is indexed in Wanfang Data and other databases.