A Method for Monotonic Classification Based on Decision Forest
Citation: Xu Hang, Wang Wenjian, Ren Lifang. A Method for Monotonic Classification Based on Decision Forest[J]. Journal of Computer Research and Development, 2017, 54(7): 1477-1487. DOI: 10.7544/issn1000-1239.2017.20160154
Authors: Xu Hang, Wang Wenjian, Ren Lifang
Affiliations: 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006; 3. School of Applied Mathematics, Shanxi University of Finance and Economics, Taiyuan 030006 (xuh102@126.com)
Funding: National Natural Science Foundation of China (61673249, 61503229); Shanxi Scholarship Council of China Research Project for Returned Overseas Scholars (2016-004); Shanxi Provincial Graduate Education Innovation Project (2016BY003)
Abstract: Monotonic classification is an ordinal classification problem in which a monotonic constraint holds between the features and the class. Existing methods handle monotonic classification on nominal data well, but on numeric data their classification accuracy and running efficiency are limited. This paper proposes a monotonic classification method based on decision forest (MCDF). A sampling strategy is designed for constructing the decision trees: it keeps the distribution of each sampled training subset consistent with that of the original training set, and it uses sample weights to suppress the influence of non-monotonic noise data, which improves running efficiency while maintaining high classification accuracy. The strategy also determines the number of trees in the decision forest automatically. A conflict-resolution rule is given for the case in which different trees assign different classes to a sample. The proposed method handles both nominal and numeric data. Experiments on artificial, UCI, and real-world datasets show that the method improves monotonic classification performance and running efficiency, shortens classification rules, and scales to monotonic classification problems on large datasets.

Keywords: monotonic classification  decision tree  monotonic consistency  decision forest  ensemble learning
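The monotonic constraint described in the abstract can be stated concretely: if one sample is less than or equal to another on every feature, its class label must not be greater. The sketch below (an illustrative check, not the paper's MCDF algorithm; the function names are my own) finds pairs of samples that violate this constraint, i.e. the "non-monotonic noise data" the sampling strategy down-weights.

```python
from itertools import combinations

def dominates(a, b):
    """True if sample a is <= sample b on every feature."""
    return all(x <= y for x, y in zip(a, b))

def nonmonotonic_pairs(X, y):
    """Return index pairs (i, j) violating the monotonic constraint:
    X[i] <= X[j] featurewise but y[i] > y[j]."""
    bad = []
    for i, j in combinations(range(len(X)), 2):
        if dominates(X[i], X[j]) and y[i] > y[j]:
            bad.append((i, j))
        elif dominates(X[j], X[i]) and y[j] > y[i]:
            bad.append((j, i))
    return bad

# Toy ordinal dataset: two numeric features, ordered class labels.
# Sample 2 = (2, 2) has class 2, yet it is dominated by samples 1 and 3,
# which carry the lower class 1 -- both pairs violate monotonicity.
X = [(1, 2), (2, 3), (2, 2), (3, 4)]
y = [0, 1, 2, 1]
print(nonmonotonic_pairs(X, y))  # -> [(2, 1), (2, 3)]
```

The pairwise scan is quadratic in the number of samples; it is meant only to make the constraint tangible, not to reflect how MCDF detects or handles such samples internally.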

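The abstract notes that the forest needs a rule for resolving conflicts when its trees disagree on a sample's class. The paper defines its own rule; as a generic illustration of why ordinal ensembles need something other than plurality voting, the sketch below (my own example, not MCDF's rule) combines tree predictions by taking the lower median, which respects the class ordering and breaks ties deterministically.

```python
def ensemble_predict(tree_preds):
    """Combine ordinal class predictions from several trees.
    The lower median respects the class ordering (unlike a plurality
    vote, which ignores it) and breaks ties toward the lower class."""
    ranked = sorted(tree_preds)
    return ranked[(len(ranked) - 1) // 2]

# Three trees disagree: 1, 3, 2 -- the median class 2 is returned,
# whereas a plurality vote would have no majority to fall back on.
print(ensemble_predict([1, 3, 2]))     # -> 2
print(ensemble_predict([0, 2, 2, 3]))  # -> 2
```

A median-style combiner never outputs a class more extreme than what some tree predicted, which is a natural property for ordinal targets; MCDF's actual conflict-resolution method should be taken from the paper itself.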
