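Noise Detection Algorithm Based on Gain Score
Cite this article: Sun Weiwei, Tian Xuhong, Liu Caixing. Noise Detection Algorithm Based on Gain Score[J]. Computer Engineering and Applications, 2005, 41(21): 186-188, 205.
Authors: Sun Weiwei  Tian Xuhong  Liu Caixing
Affiliation: College of Informatics, South China Agricultural University, Guangzhou 510642 (all three authors)
Funding: National Natural Science Foundation of China (No. 6037500)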
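Abstract: Noise is unavoidable in real-world datasets, and detecting and removing it is an important research topic in data mining. This paper proposes a gain-based scoring algorithm for detecting noise. To test the algorithm's effectiveness, a decision tree is used as the tool: before the tree is built, the algorithm removes noise from the training set so that the noise does not lead to an oversized tree and overfitting. The algorithm was applied to denoise 12 UCI datasets, after which C4.5 was used to generate decision trees. The experimental results show that, compared with decision trees generated without denoising, classification accuracy improves and tree size is markedly reduced.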

Keywords: machine learning  decision tree  noise
Article ID: 1002-8331-(2005)21-0186-03

Scoring Based on Gain for Detecting Outlier
Sun Weiwei, Tian Xuhong, Liu Caixing. Scoring Based on Gain for Detecting Outlier[J]. Computer Engineering and Applications, 2005, 41(21): 186-188, 205.
Authors:Sun Weiwei  Tian Xuhong  Liu Caixing
Abstract: Outliers are unavoidable in real datasets, and detecting and eliminating them is important in data mining. This paper proposes a gain-based scoring algorithm for detecting outliers. To evaluate its effectiveness, we use decision trees as a tool: before constructing a decision tree, we eliminate the outliers in the training set with this algorithm, to avoid the larger tree and the overfitting of the training set that outliers cause. After eliminating outliers from 12 UCI datasets, we construct decision trees with C4.5. The results show that, compared with the decision trees built without outlier removal, classification accuracy improves and tree size is markedly reduced.
Keywords:machine learning  decision tree  outlier
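
The abstract does not define the gain-based score itself. As background only, the sketch below computes information gain, the entropy-based quantity used by C4.5-style decision tree learners. It is an assumption that the paper's "gain" refers to this quantity; the function names and the toy data are hypothetical, and this is not the authors' scoring algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the attribute at index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, invented for illustration: attribute 0 determines the class,
# attribute 1 is unrelated to it.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
gains = [information_gain(rows, labels, i) for i in range(2)]
```

On this toy data the first attribute separates the classes completely and the second carries no information about them, so the two gains sit at opposite ends of the possible range. How the paper turns such gains into a per-instance noise score is not stated in the abstract.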
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.