Cost Sensitive Random Forest Classification Algorithm for Highly Unbalanced Data
Cite this article: PING Rui, ZHOU Shuisheng, LI Dong. Cost Sensitive Random Forest Classification Algorithm for Highly Unbalanced Data[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(3): 249-257. DOI: 10.16451/j.cnki.issn1003-6059.202003006
Authors: PING Rui  ZHOU Shuisheng  LI Dong
Affiliation: 1. School of Mathematics and Statistics, Xidian University, Xi'an 710126
Funding: Supported by the National Natural Science Foundation of China (No. 61772020).
Abstract: On highly unbalanced data, the traditional cost sensitive random forest algorithm has two weaknesses: bootstrap sampling leaves the minority class samples insufficiently learned, and the large proportion of majority class samples easily weakens the cost sensitive mechanism. A clustering-based weak balance cost sensitive random forest algorithm is therefore proposed. The majority class samples are first clustered, and each cluster is then undersampled repeatedly according to a weak balance criterion. The selected majority class samples are merged with the minority class samples of the original training set to generate multiple new unbalanced datasets, each used to train a cost sensitive decision tree. The proposed algorithm lets the minority class samples be fully learned and, by reducing the number of majority class samples, keeps the cost sensitive mechanism from being weakened by them. Experiments show that the proposed algorithm performs better on highly unbalanced datasets.

Keywords: Imbalanced Data  Cluster Sampling  Cost Sensitive Learning  Random Forest
Received: 2019-08-19
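
The sampling stage described in the abstract (cluster the majority class, undersample each cluster several times, merge with all minority samples) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not define the weak balance criterion, so the `ratio` parameter (majority samples kept per minority sample) is an assumption, as are the plain k-means clustering, the proportional per-cluster quota, and every function and parameter name. Training the cost sensitive decision trees on the resulting datasets is left out.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [c for c in clusters if c]


def weakly_balanced_sets(majority, minority, k=3, ratio=2.0, n_sets=5, seed=0):
    """Build n_sets training sets of (sample, label) pairs.

    Each set holds every minority sample (label 1) plus a cluster-wise
    subsample of the majority class (label 0) of roughly
    ratio * len(minority) points. `ratio` is an assumed stand-in for the
    weak balance criterion, which the abstract does not specify.
    """
    rng = random.Random(seed)
    clusters = kmeans(majority, k, seed=seed)
    target = min(len(majority), int(ratio * len(minority)))
    datasets = []
    for _ in range(n_sets):
        picked = []
        for members in clusters:
            # Each cluster contributes in proportion to its size, at least one point.
            quota = max(1, round(target * len(members) / len(majority)))
            picked.extend(rng.sample(members, min(quota, len(members))))
        datasets.append([(x, 0) for x in picked] + [(x, 1) for x in minority])
    return datasets
```

With `ratio` above 1, each generated dataset stays unbalanced but far less so than the original training set, and every dataset sees all of the minority samples, which is the property the abstract attributes to the sampling scheme.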
This article is indexed in VIP, Wanfang Data, and other databases.