
Improvement of C4.5 Algorithm with Free Noise Capacity
Citation: WANG Wei, LI Lei, ZHANG Zhi-hong. Improvement of C4.5 Algorithm with Free Noise Capacity[J]. Computer Science, 2015, 42(12): 268-271, 287.
Authors: WANG Wei  LI Lei  ZHANG Zhi-hong
Affiliation: School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China (all three authors)
Funding: Supported by the Science Research and Technology Development Project of the Henan Provincial Tobacco Monopoly Bureau (HYKJM201335)
Received: 2014/12/22
Revised: 2015/3/13
Abstract: Noisy high-dimensional data lowers the prediction accuracy of decision trees. To address this problem, the paper applies the idea of the noise-free principal component analysis (NFPCA) algorithm to the C4.5 algorithm, yielding the NFPCA-in-C4.5 algorithm. On the one hand, the algorithm recasts noise control in high-dimensional data as an optimization problem that combines fitting the data with controlling smoothness, and solving it yields the principal component space. On the other hand, each time a new node is built during the top-down construction of the decision tree, the principal component space is mapped back to the original data space, so that attribute information is not permanently lost during dimensionality reduction. Experimental results show that NFPCA-in-C4.5 provides both dimensionality reduction and noise tolerance, and avoids the sharp drop in prediction accuracy that information loss and residual noise cause during dimensionality reduction.
Keywords: High-dimensional data with noise  Noise tolerance  Principal component analysis  C4.5 algorithm
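
The two steps named in the abstract (obtain a principal component space, then map back to the original attribute space before a node is split) can be illustrated with a minimal sketch. The abstract does not state the NFPCA objective, so the sketch swaps in plain PCA computed by SVD, with no smoothness term; it is not the authors' method. The function names `pca_denoise`, `entropy`, `gain_ratio` and `best_split` are hypothetical, and the split criterion is the standard C4.5 gain ratio for a binary threshold on a numeric attribute.

```python
import numpy as np

def pca_denoise(X, k):
    """Project X onto its top-k principal components, then map back.

    Assumption: plain PCA stands in for NFPCA, whose objective
    (data fit plus smoothness control) is not given in the abstract.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                # (d, k) projection matrix
    Z = Xc @ W                  # scores in the principal component space
    X_hat = Z @ W.T + mean      # reconstruction in the original space
    return Z, X_hat

def entropy(y):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, threshold):
    """C4.5 gain ratio of the binary split x <= threshold."""
    left = x <= threshold
    n, n_left = len(y), int(left.sum())
    if n_left == 0 or n_left == n:
        return 0.0              # degenerate split carries no information
    p_left = n_left / n
    info_gain = (entropy(y)
                 - p_left * entropy(y[left])
                 - (1 - p_left) * entropy(y[~left]))
    split_info = -(p_left * np.log2(p_left)
                   + (1 - p_left) * np.log2(1 - p_left))
    return info_gain / split_info

def best_split(X, y):
    """Return (attribute index, threshold, gain ratio) of the best split."""
    best = (-1, None, 0.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            gr = gain_ratio(X[:, j], y, t)
            if gr > best[2]:
                best = (j, t, gr)
    return best
```

In this sketch a node would call `pca_denoise` on the rows that reach it and pass the reconstruction `X_hat` to `best_split`, so that splits are still expressed in terms of the original attributes rather than principal components.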