
An Improved Random Forest Algorithm Based on Feature Reduction
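Cite this article: WANG Cheng, GAO Rui. An Improved Random Forest Algorithm Based on Feature Reduction[J]. Computer Technology and Development, 2020(3): 40-45.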
Authors: WANG Cheng, GAO Rui
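Affiliation: School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China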
Funding: China Postdoctoral Science Foundation (SBH18028).
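Abstract: The random forest (RF) algorithm is widely used and classifies with high accuracy, but its performance degrades severely on data that are both high-dimensional and class-imbalanced. High-dimensional data usually contain many irrelevant and redundant features. To address this problem, we propose an improved random forest algorithm, RW_RF (ReliefF & wrapper random forest), which combines weight ranking with recursive feature selection. First, the ReliefF algorithm assigns every feature in the dataset a weight according to its ability to separate the positive and negative classes. Redundant low-weight features are then removed recursively, and the feature subset with the best classification performance is used to build the random forest. In addition, the sampling scheme of ReliefF is modified to reduce the impact of imbalanced data on the classification model. Experimental results show that on datasets with a large number of features, the improved algorithm scores higher than the original algorithm on every evaluation metric. This indicates that RW_RF effectively streamlines the feature subset and reduces the effect of redundant features on classification accuracy, and that the improved algorithm also brings some improvement when handling imbalanced data.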

Keywords: random forest; weight sorting; feature reduction; sampling method; RW_RF algorithm
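The abstract describes two stages: ReliefF feature weighting, followed by a wrapper that recursively drops low-weight features and keeps the subset that classifies best. The Python sketch below shows one possible reading of those stages. It is not the authors' code. The paper's modified ReliefF sampling is not specified on this page, so class-balanced sampling of reference instances is used here as a stand-in assumption. The function and parameter names (`relieff_weights`, `recursive_elimination`, `evaluate`, `n_per_class`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def relieff_weights(X: Sequence[Sequence[float]], y: Sequence[int],
                    k: int = 3, n_per_class: int = 0, seed: int = 0) -> List[float]:
    """ReliefF feature weights (higher = separates the classes better).

    Assumption (illustrative, not necessarily the paper's scheme): reference
    instances are drawn class-balanced, i.e. the same number from every class,
    so minority-class instances are not outnumbered during weighting.
    """
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(n_feat)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(n_feat))

    classes = sorted(set(y))
    members = {c: [i for i, lab in enumerate(y) if lab == c] for c in classes}
    prior = {c: len(members[c]) / len(y) for c in classes}

    # Class-balanced reference sample: at most the size of the smallest class.
    per_class = min(len(v) for v in members.values())
    if n_per_class > 0:
        per_class = min(per_class, n_per_class)
    refs = [i for c in classes for i in rng.sample(members[c], per_class)]
    m = len(refs)

    w = [0.0] * n_feat
    for i in refs:
        xi, ci = X[i], y[i]
        for c in classes:
            # k nearest neighbours of xi in class c (near hits if c == ci, else near misses).
            cands = [t for t in members[c] if t != i]
            nearest = sorted(cands, key=lambda t: dist(xi, X[t]))[:k]
            if not nearest:
                continue
            # Hits lower a feature's weight; misses raise it, scaled by the class prior.
            scale = -1.0 if c == ci else prior[c] / (1.0 - prior[ci])
            for j in range(n_feat):
                w[j] += scale * sum(diff(j, xi, X[t]) for t in nearest) / (m * len(nearest))
    return w


def recursive_elimination(weights: Sequence[float],
                          evaluate: Callable[[List[int]], float],
                          step: int = 1, min_features: int = 1) -> Tuple[List[int], float]:
    """Wrapper step: repeatedly drop the `step` lowest-weight features and keep
    the subset with the highest `evaluate(feature_indices)` score.
    Ties go to the smaller subset, since the goal is a compact feature set."""
    if step < 1:
        raise ValueError("step must be at least 1")
    # Feature indices ordered from highest to lowest weight.
    current = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    best, best_score = list(current), evaluate(sorted(current))
    while len(current) - step >= min_features:
        current = current[:-step]  # remove the lowest-weight features
        score = evaluate(sorted(current))
        if score >= best_score:
            best, best_score = list(current), score
    return sorted(best), best_score
```

In practice, `evaluate` would train and score a random forest on the selected columns. With scikit-learn and a NumPy array `X`, for example, it could be `lambda cols: cross_val_score(RandomForestClassifier(random_state=0), X[:, cols], y, cv=5, scoring="f1").mean()`. This pure-Python ReliefF compares every reference instance with every other instance, so it is only suitable for small illustrative datasets.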
This article is indexed in the VIP (Weipu) database, among others.