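Algorithms for Accurate Decision Tree Generation on Inconsistent Data
Cite this article: WANG He-Peng, WANG Hong-Zhi, LI Jian-Zhong, GAO Hong. Algorithms for Accurate Decision Tree Generation on Inconsistent Data[J]. Journal of Software, 2017, 28(11): 2814-2824
Authors: WANG He-Peng  WANG Hong-Zhi  LI Jian-Zhong  GAO Hong
Affiliation: Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150006, China (all four authors)
Funding: National Natural Science Foundation of China (U1509216, 61472099); National Key Technology R&D Program of China (2015BAH10F01)
Abstract: In recent years, as the volume of real-world data has kept growing, inconsistent data has appeared more and more often, making manual repair of inconsistent data increasingly time-consuming. Manual repair is also subject to unavoidable human error, so it is no longer a feasible approach. The core question of this paper is how to classify inconsistent data directly, without repairing it beforehand. The objective function of the decision tree generation algorithm is modified so that it can classify inconsistent data directly and still obtain good classification results. The influence on the classification result of the features that appear in the constraints is measured from several aspects, and the influence factor of each such feature is adjusted accordingly, so that the decision tree splits nodes more precisely and classifies more accurately.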

Keywords: inconsistent data  decision tree  classification  massive data
Received: 2017-04-15
Revised: 2017-06-16

Algorithms for Accurate Decision Tree Generation on Inconsistent Data
WANG He-Peng,WANG Hong-Zhi,LI Jian-Zhong and GAO Hong. Algorithms for Accurate Decision Tree Generation on Inconsistent Data[J]. Journal of Software, 2017, 28(11): 2814-2824
Authors:WANG He-Peng  WANG Hong-Zhi  LI Jian-Zhong  GAO Hong
Affiliation: Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China (all four authors)
Abstract: In recent years, as the amount of real-world data has grown, inconsistent data has become more frequent, which makes manual correction of inconsistent data more time-consuming. Moreover, manual correction is itself prone to human error, so it is no longer a feasible approach. The core question of this paper is how to perform classification directly on inconsistent data without correcting the data beforehand. The objective function of the decision tree generation algorithm is modified so that it can classify inconsistent data directly and still achieve good results. The influence on the classification result of the features that appear in the constraints is measured from several aspects, and the influence factor of each such feature is adjusted accordingly, so that the nodes of the decision tree are split more accurately and classification is more effective.
Keywords:inconsistent data  decision tree  classification  massive data