
High-dimensional data visualization based on random forest
Cite this article: LYV Bing, WANG Huazhen. High-dimensional data visualization based on random forest[J]. Journal of Computer Applications, 2014, 34(6): 1613-1617.
Authors: LYV Bing (吕兵), WANG Huazhen (王华珍)
Affiliation: College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
Funding: Natural Science Foundation of Fujian Province; Huaqiao University Scientific Research Start-up Fund for High-Level Talents
Abstract: Most current methods for mining high-dimensional data rely on mathematical theory rather than visual intuition. To make high-dimensional data easier to analyze and evaluate visually, Random Forest (RF) is introduced for data visualization. First, RF is trained by supervised learning to obtain a proximity (similarity) measure between samples, and principal coordinate analysis is applied to this measure for dimension reduction, mapping the relationships in the high-dimensional data into a low-dimensional space. Then scatter plots are used to visualize the data in the low-dimensional space. Experiments on high-dimensional gene datasets show that visualization based on RF supervised dimension reduction reveals the class distribution of high-dimensional data well and outperforms visualization after traditional unsupervised dimension reduction.

Keywords: visualization; random forest; supervised dimension reduction; coordinate scaling; scatter plot
Received: 2013-12-23
Revised: 2014-02-06
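
The pipeline outlined in the abstract (supervised RF proximity, then principal coordinate analysis, then a scatter plot) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the function names `rf_proximity`, `pcoa`, and `plot_embedding`, the tree count, and the choice of sqrt(1 - proximity) as the dissimilarity are all assumptions made here for the example. Proximity is taken to be the fraction of trees in which two samples fall in the same leaf, which is the usual definition but is not spelled out in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(X, y, n_trees=200, seed=0):
    """Fit a supervised random forest and return an n x n proximity matrix.

    proximity[i, j] is the fraction of trees in which samples i and j land
    in the same leaf (assumed definition; fine for small n only, since the
    intermediate array has n * n * n_trees entries).
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def pcoa(dissimilarity, n_components=2):
    """Principal coordinate analysis (classical MDS) of a dissimilarity matrix."""
    d2 = dissimilarity ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering       # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


def plot_embedding(coords, y):
    """Scatter plot of the 2-D embedding, colored by numeric class label."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=y, cmap="coolwarm", s=20)
    ax.set_xlabel("PCoA axis 1")
    ax.set_ylabel("PCoA axis 2")
    return fig
```

With a feature matrix `X` and numeric class labels `y`, one would call `prox = rf_proximity(X, y)`, then `coords = pcoa(np.sqrt(1.0 - prox))`, then `plot_embedding(coords, y)`. Because the forest is trained on the labels, samples of the same class tend to share leaves, which is what makes this a supervised reduction in contrast to unsupervised methods.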
This article is indexed in CNKI and other databases.