Parallel random forest algorithm combining gain ratio and stacked auto encoders

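Cite this article:	LIU Weiming, CHEN Weida, MAO Yimin, CHEN Zhigang. Parallel random forest algorithm combining gain ratio and stacked auto encoders[J]. Application Research of Computers, 2023, 40(3): 750-759+765.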

Authors:	LIU Weiming, CHEN Weida, MAO Yimin, CHEN Zhigang

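Affiliations:	School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi (first three authors); School of Computer Science, Central South University, Changsha, Hunan (fourth author)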

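Funding:	Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, 2020 (2020AAA0109605); National Natural Science Foundation of China (41562019)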

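Abstract:	To address the problems that the random forest algorithm faces in big data environments, namely too many redundant and irrelevant features, insufficient information content in feature subspaces, and low parallelization efficiency, this paper proposes PRFGRSAE (parallel random forest algorithm combining gain ratio and stacked auto encoders). First, it proposes DRNGRSAE (dimension reduction combining nonlinear normalization gain ratio and stacked auto encoders), a dimension reduction strategy that filters redundant and irrelevant features out of the feature set and then extracts features with stacked auto encoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes SSLF (subspace selection strategy combining Latin hypercube sampling and feature class correlation), a subspace selection strategy combining Latin hypercube sampling with normalized correlation degree, which performs multi-layer partitioned sampling on the feature set, forming feature subspaces with high spatial expression...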
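
The filtering step that the abstract describes ranks features by gain ratio before any auto encoder is applied. The following is a minimal sketch of the standard gain ratio (information gain divided by split information) for discrete features. It does not reproduce the paper's nonlinear normalization, which the abstract does not specify, and the function names and the `threshold` cutoff are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


def filter_features(columns, labels, threshold):
    """Return names of features whose gain ratio reaches `threshold` (assumed cutoff)."""
    return [name for name, col in columns.items() if gain_ratio(col, labels) >= threshold]
```

A feature that separates the classes perfectly scores 1.0 under this definition, while a feature that is independent of the class, or constant, scores 0.0 and would be dropped by any positive threshold.
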
Keywords:	big data; MapReduce; parallel random forest; gain ratio; stacked autoencoder
Received:	2022/8/1
Revised:	2023/2/8
Parallel random forest algorithm combining gain ratio and stacked auto encoders

LIU Weiming, CHEN Weida, MAO Yimin and CHEN Zhigang. Parallel random forest algorithm combining gain ratio and stacked auto encoders[J]. Application Research of Computers, 2023, 40(3): 750-759+765.

Authors:	LIU Weiming, CHEN Weida, MAO Yimin and CHEN Zhigang

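Affiliation:	School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi (first three authors); School of Computer Science, Central South University, Changsha, Hunan (fourth author)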

Abstract:	In big data environments, the random forest algorithm suffers from too many redundant and irrelevant features, insufficient information content in feature subspaces, and low parallelization efficiency. To resolve these issues, this paper presents PRFGRSAE, a parallel random forest algorithm combining gain ratio and stacked auto-encoders. First, it proposes a dimensionality reduction strategy, DRNGRSAE, combining nonlinear normalization gain ratio with stacked auto-encoders: the strategy filters redundant and irrelevant features out of the feature set and then extracts features with stacked auto-encoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes a subspace selection strategy, SSLF, combining Latin hypercube sampling with normalized correlation degree: multi-layer partitioned sampling of the feature set forms feature subspaces with high spatial expression and ensures their information content. Finally, it proposes a reducer allocation strategy, DSVLA, combined with variable action learning automata, which distributes clusters evenly across reducers and improves parallelization efficiency. Experimental results show that, compared with the IMRF, KSMRF, and GAPRF algorithms, PRFGRSAE significantly improves both speedup ratio and accuracy. The algorithm therefore achieves higher accuracy and parallel efficiency on large data sets, especially those with many features.
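
The goal of the reducer allocation step, spreading clusters evenly across reducers, can be illustrated with a simple baseline. The sketch below uses a greedy longest-processing-time heuristic, not the learning-automata approach of DSVLA, whose details the abstract does not give. The function name `assign_clusters` and the use of cluster size as the load measure are assumptions made for illustration.

```python
import heapq


def assign_clusters(cluster_sizes, n_reducers):
    """Greedy longest-processing-time assignment of clusters to reducers.

    Returns a list of (load, reducer_id, cluster_ids), ordered by reducer_id.
    This is a stand-in baseline, not the DSVLA strategy itself.
    """
    # Min-heap keyed by current load, so the lightest reducer is always on top.
    heap = [(0, r, []) for r in range(n_reducers)]
    heapq.heapify(heap)
    # Place the largest clusters first; ties are broken by cluster id.
    for cid, size in sorted(cluster_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, r, assigned = heapq.heappop(heap)
        assigned.append(cid)
        heapq.heappush(heap, (load + size, r, assigned))
    return sorted(heap, key=lambda item: item[1])


if __name__ == "__main__":
    sizes = {"c1": 7, "c2": 5, "c3": 4, "c4": 3, "c5": 1}
    for load, reducer, clusters in assign_clusters(sizes, 2):
        print(reducer, load, clusters)
```

With the five example clusters and two reducers, both reducers end up with a total load of 10, which is the kind of balance an allocation strategy aims for. A learning-automata scheme would instead adapt the assignment from feedback rather than from known sizes.
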

Keywords:	big data; MapReduce; parallel random forest; gain ratio; stacked autoencoder
