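Discretized Random Forest Parallel Classification Algorithm Based on Weak-Correlation Feature Subspace Selection
Cite this article: CHEN Min-cheng, YUAN Jing-ling, WANG Xiao-yan, ZHU Sai. Discretized Random Forest Parallel Classification Algorithm Based on Weak-Correlation Feature Subspace Selection[J]. 计算机科学 (Computer Science), 2016, 43(6): 55-58, 90.
Authors: CHEN Min-cheng  YUAN Jing-ling  WANG Xiao-yan  ZHU Sai
Affiliation: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070 (all four authors)
Funding: This work was supported by the National Natural Science Foundation of China (61303029), the Natural Science Foundation of Hubei Province (2014CFB836), and the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education ([2012]1707).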
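Abstract: With the arrival of the big data era, the volume of data is growing geometrically, and traditional classification algorithms face great challenges. To improve the efficiency of classification, a discretized random forest parallel classification algorithm based on weak-correlation feature subspace selection is proposed. In the data preprocessing stage, the algorithm discretizes the continuous attributes in the data set. In the stage where the random forest draws feature subspaces, it uses an attribute vector space model to compute the correlation between attributes and constructs weak-correlation feature subspaces, which lowers the correlation among the decision trees that are built and thereby improves the classification performance of the random forest. In addition, by studying parallelization strategies for random forests and combining them with the MapReduce framework, a double parallelization of the random forest model construction process is designed and implemented, which further improves the computational efficiency of the algorithm.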

Keywords: Random forest  Discretization  Weak-correlation feature subspaces  Parallel classification
Received: 2015-07-13
Revised: 2015-09-01

Parallelization of Random Forest Algorithm Based on Discretization and Selection of Weak-correlation Feature Subspaces
CHEN Min-cheng, YUAN Jing-ling, WANG Xiao-yan, ZHU Sai. Parallelization of Random Forest Algorithm Based on Discretization and Selection of Weak-correlation Feature Subspaces[J]. Computer Science, 2016, 43(6): 55-58, 90.
Authors: CHEN Min-cheng  YUAN Jing-ling  WANG Xiao-yan  ZHU Sai
Affiliation: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China (all four authors)
Abstract: With the arrival of the big data era, the volume of data is growing exponentially, and traditional classification algorithms face great challenges. To improve the efficiency of classification, this paper proposes a parallel random forest algorithm based on discretization and the selection of weak-correlation feature subspaces. The algorithm discretizes continuous attributes in the data preprocessing phase. When selecting feature subspaces for growing decision trees, it uses a vector space model of attributes to calculate the correlation between attributes and then constructs weak-correlation feature subspaces. This reduces the correlation among the decision trees and thereby improves the classification performance of the random forest. A double parallel method for building the random forest model is also designed and implemented on the MapReduce framework, which further improves the computational efficiency of the algorithm.
Keywords:Random forest  Discretization  Weak-correlation feature subspaces  Parallel classification
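
The two preprocessing ideas named in the abstract, discretizing continuous attributes and drawing feature subspaces whose attributes are only weakly correlated, can be illustrated with a small Python sketch. The abstract does not state which discretization scheme, correlation measure, or selection rule the paper uses, so everything concrete here is an assumption made for illustration: equal-width binning, the absolute cosine similarity of mean-centred attribute vectors (which equals the absolute Pearson correlation), a greedy filter with a hypothetical `threshold` parameter, and all function names. The MapReduce parallelization is not shown.

```python
import math
import random


def discretize_equal_width(column, n_bins=4):
    """Map continuous values to bin indices 0..n_bins-1 (assumed scheme: equal width)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)  # constant attribute: a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def abs_cosine(u, v):
    """Absolute cosine similarity of two mean-centred attribute vectors (assumed measure)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cu = [a - mu for a in u]
    cv = [b - mv for b in v]
    norm = math.sqrt(sum(a * a for a in cu)) * math.sqrt(sum(b * b for b in cv))
    if norm == 0:
        return 0.0  # a constant attribute is treated as uncorrelated
    return abs(sum(a * b for a, b in zip(cu, cv)) / norm)


def weak_correlation_subspace(columns, size, threshold=0.8, rng=None):
    """Greedily pick up to `size` attribute indices in random order, skipping any
    candidate whose similarity to an already chosen attribute exceeds `threshold`."""
    rng = rng or random.Random()
    order = list(range(len(columns)))
    rng.shuffle(order)
    chosen = []
    for idx in order:
        if len(chosen) == size:
            break
        if all(abs_cosine(columns[idx], columns[j]) <= threshold for j in chosen):
            chosen.append(idx)
    return chosen


# Toy data: one list per attribute (column-major layout).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # attribute 0
    [2.1, 4.0, 6.2, 8.1, 9.9, 12.0],  # attribute 1: nearly a multiple of attribute 0
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # attribute 2
]
binned = [discretize_equal_width(c, n_bins=3) for c in columns]
subspace = weak_correlation_subspace(binned, size=2, rng=random.Random(0))
print(subspace)
```

On this toy data, attributes 0 and 1 fall into identical bins after discretization, so the greedy filter never places both in the same subspace; each tree would then be grown on a subspace such as attribute 2 plus one of the other two. In a random forest, one such subspace would be drawn per tree with a different random order.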