Research on Interpretable Visual Analysis Method of Random Forest
Original title: 随机森林的可解释性可视分析方法研究
Cite this article: YANG Yemin, ZHANG Huijun, ZHANG Xiaolong. Research on Interpretable Visual Analysis Method of Random Forest[J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(6): 168-175. DOI: 10.3778/j.issn.1002-8331.1911-0185
Authors: YANG Yemin (杨晔民), ZHANG Huijun (张慧军), ZHANG Xiaolong (张小龙)
Affiliations: 1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; 2. College of Media Technology, Communication University of Shanxi, Jinzhong, Shanxi 030619, China
Abstract: Random forests are typically applied as a black box: parameter tuning, training, and even the details of the final model are hidden from users. This makes the model poorly interpretable, which to some extent hinders its use in fields that demand transparent and explainable predictions, such as medical diagnosis, justice, and security. The interpretability challenges stem mainly from the randomness of feature selection and of the data. Moreover, a random forest contains many decision trees, so it is difficult for users to understand and compare the structures and properties of all of them. To address these issues, the interactive visual analytics system FORESTVis is designed and implemented. It comprises several interactive visualization components, including a tree view, partial dependence plots, a t-SNE projection, and a feature view. With this system, researchers and practitioners can intuitively understand the basic structure and working mechanism of random forests, and the system assists users in evaluating model performance. A case study on public Kaggle data verifies the feasibility and effectiveness of the method.

Keywords: random forests; visual analysis; interaction design; interpretable machine learning
This article is indexed in Wanfang Data and other databases.