
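Parallel random forest algorithm combining gain ratio and stacked auto encoders
Cite this article: LIU Weiming, CHEN Weida, MAO Yimin, CHEN Zhigang. Parallel random forest algorithm combining gain ratio and stacked auto encoders[J]. Application Research of Computers, 2023, 40(3): 750-759+765.
Authors: LIU Weiming, CHEN Weida, MAO Yimin, CHEN Zhigang
Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi (LIU Weiming, CHEN Weida, MAO Yimin); School of Computer Science, Central South University, Changsha, Hunan (CHEN Zhigang)
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, 2020 (2020AAA0109605); National Natural Science Foundation of China (41562019)
Abstract: To address the problems of the random forest algorithm in big data environments, namely too many redundant and irrelevant features, insufficient information content in feature subspaces, and low parallelization efficiency, this paper proposes PRFGRSAE (parallel random forest algorithm combining gain ratio and stacked auto encoders). First, it proposes a dimension reduction strategy, DRNGRSAE (dimension reduction combining nonlinear normalization gain ratio and stacked auto encoders), which filters redundant and irrelevant features out of the feature set and then extracts features with stacked autoencoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes a subspace selection strategy combining Latin hypercube sampling with normalized correlation, SSLF (subspace selection strategy combining Latin hypercube sampling and feature class correlation), which performs multi-layer partitioned sampling of the feature set to form feature subspaces with high spatial expressiveness...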

Keywords: big data; MapReduce; parallel random forest; gain ratio; stacked autoencoder
Received: 2022/8/1
Revised: 2023/2/8

Parallel random forest algorithm combining gain ratio and stacked auto encoders
LIU Weiming, CHEN Weida, MAO Yimin and CHEN Zhigang. Parallel random forest algorithm combining gain ratio and stacked auto encoders[J]. Application Research of Computers, 2023, 40(3): 750-759+765.
Authors: LIU Weiming, CHEN Weida, MAO Yimin and CHEN Zhigang
Affiliation: School of Information Engineering, Jiangxi University of Science and Technology
Abstract: In big data environments, the random forest algorithm suffers from too many redundant and irrelevant features, insufficient information content in feature subspaces, and low parallelization efficiency. To resolve these issues, this paper presents PRFGRSAE, a parallel random forest algorithm combining gain ratio and stacked auto-encoders. First, it proposes DRNGRSAE, a dimensionality reduction strategy combining nonlinear normalized gain ratio and stacked auto-encoders, which filters redundant and irrelevant features out of the feature set and then extracts features with stacked auto-encoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes SSLF, a subspace selection strategy combining Latin hypercube sampling and normalized correlation degree, which forms feature subspaces with high spatial expressiveness by performing multi-layer partitioned sampling on the feature set, thereby ensuring the information content of each feature subspace. Finally, it proposes DSVLA, a reducer allocation strategy combining variable action learning automata, which distributes clusters evenly across reducers and thereby improves parallelization efficiency. Experimental results show that, compared with the IMRF, KSMRF, and GAPRF algorithms, PRFGRSAE achieves significantly higher speedup and accuracy. The algorithm therefore attains higher accuracy and parallel efficiency on large data sets, especially those with many features.
Keywords: big data; MapReduce; parallel random forest; gain ratio; stacked autoencoder
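
The filtering step named in the abstract builds on the gain ratio. As a rough illustration only, the sketch below computes the standard (C4.5-style) gain ratio for discrete features and drops those below a cutoff. It is not the DRNGRSAE strategy itself: the paper's nonlinear normalization and the stacked auto-encoder stage are not described in this abstract and are left out, and the function names and the `threshold` parameter are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Standard gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Information gain = H(labels) - H(labels | feature).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def filter_features(columns, labels, threshold):
    """Return names of features whose gain ratio is at least `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration; the abstract
    does not state how the paper selects it.
    """
    return [name for name, col in columns.items()
            if gain_ratio(col, labels) >= threshold]
```

For example, with labels `["y", "y", "n", "n"]`, a feature `["a", "a", "b", "b"]` separates the classes perfectly and is kept, while `["a", "b", "a", "b"]` is independent of the labels and is filtered out at any positive threshold.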