Research on Interpretable Visual Analysis Method of Random Forest

Cite this article:	YANG Yemin, ZHANG Huijun, ZHANG Xiaolong. Research on Interpretable Visual Analysis Method of Random Forest[J]. Computer Engineering and Applications, 2021, 57(6): 168-175. DOI: 10.3778/j.issn.1002-8331.1911-0185

Authors:	YANG Yemin, ZHANG Huijun, ZHANG Xiaolong

Affiliation:	1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; 2. College of Media Technology, Communication University of Shanxi, Jinzhong, Shanxi 030619, China

Abstract:	Random forests are usually applied as a black box: the details of parameter tuning, training, and even the final constructed model are hidden from users. This makes the model hard to interpret, which to some extent hinders its use in fields that require transparent and explainable predictions, such as medical diagnosis, justice, and security. The interpretability challenges stem mainly from the randomness of feature selection and of the data. In addition, a random forest contains many decision trees, so it is difficult for users to understand and compare the structures and properties of all of them. To tackle these issues, an interactive visual analytics system, FORESTVis, is designed and implemented. It comprises a tree view, partial dependence plots, a t-SNE projection, a feature view, and other interactive visual components. With this system, researchers and practitioners can intuitively understand the basic structure and working mechanism of random forests, and the system helps users evaluate model performance through interactive exploration. Finally, a case study on a public Kaggle dataset shows that the method is feasible and effective.

Keywords:	random forests; visual analysis; interaction design; interpretable machine learning
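
The abstract names partial dependence plots as one of the FORESTVis components but does not describe how the system computes them. As a rough illustration of the general technique only, the sketch below computes a one-feature partial dependence curve: for each grid value, the chosen feature is overwritten in every row and the model output is averaged over the dataset. The three hand-written threshold rules, the names `TOY_FOREST`, `forest_predict`, and `partial_dependence`, and the data are all hypothetical stand-ins for a trained forest, not the paper's implementation.

```python
from statistics import mean

# Hypothetical stand-in for a trained random forest: three hand-written
# "trees" (threshold rules) whose votes are averaged. Not FORESTVis code.
def tree_a(row):
    return 1.0 if row[0] > 5.0 else 0.0

def tree_b(row):
    return 1.0 if row[0] > 3.0 and row[1] < 2.0 else 0.0

def tree_c(row):
    return 1.0 if row[1] < 1.0 else 0.0

TOY_FOREST = [tree_a, tree_b, tree_c]

def forest_predict(row):
    """Average the votes of all trees (score for the positive class)."""
    return mean(tree(row) for tree in TOY_FOREST)

def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite one feature in every row and
    average the model output over the whole dataset."""
    curve = []
    for value in grid:
        outputs = []
        for row in rows:
            modified = list(row)       # copy so the dataset is not mutated
            modified[feature] = value  # force the feature to the grid value
            outputs.append(predict(modified))
        curve.append(mean(outputs))
    return curve

# Illustrative data: two features per row (values are made up).
ROWS = [[1.0, 0.5], [4.0, 1.5], [6.0, 2.5], [7.0, 0.5]]
GRID = [2.0, 4.0, 6.0]
PD_CURVE = partial_dependence(forest_predict, ROWS, feature=0, grid=GRID)

for value, score in zip(GRID, PD_CURVE):
    print(f"feature 0 = {value}: average prediction {score:.3f}")
```

Plotting `PD_CURVE` against `GRID` gives the kind of curve a partial dependence view displays. With a real model, `forest_predict` would be replaced by the trained forest's prediction function.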
This article is indexed in Wanfang Data and other databases.