Similar Articles
18 similar articles found.
1.
LÜ Bing, WANG Huazhen. Journal of Computer Applications, 2014, 34(6): 1613-1617
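Most current methods for mining high-dimensional data rely on mathematical theory rather than visual intuition. To support intuitive analysis and evaluation of high-dimensional data, this paper introduces the random forest (RF) method for visualizing such data. First, RF is trained with supervision to obtain a similarity measure between samples, and principal coordinate analysis is applied to this measure to reduce dimensionality, transforming the relational information in the high-dimensional data into a low-dimensional space. The data are then visualized in the low-dimensional space as a scatter plot. Experiments on high-dimensional gene datasets show that visualization based on RF supervised dimensionality reduction reveals the class distribution of high-dimensional data well and outperforms visualization after conventional unsupervised dimensionality reduction.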
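The supervised step above ends in a sample-by-sample proximity matrix, which principal coordinate analysis (PCoA) turns into 2D scatter-plot coordinates. The sketch below covers only the PCoA step. It assumes a random-forest proximity matrix has already been computed: entry (i, j) is taken as the fraction of trees in which samples i and j land in the same leaf, a common definition. It treats 1 - proximity as a squared dissimilarity, which is one convention among several, and it assumes the leading eigenvalues are positive. The name `pcoa` is hypothetical, and eigenvectors come from plain power iteration to keep the sketch dependency-free. This is not the authors' code.

```python
import math
import random

def pcoa(proximity, k=2, iters=500, seed=0):
    """Principal coordinate analysis (classical MDS) of a proximity matrix.

    Assumes proximity[i][j] lies in [0, 1] with 1 on the diagonal and treats
    1 - proximity as a squared dissimilarity (an assumed convention).
    """
    n = len(proximity)
    d2 = [[1.0 - proximity[i][j] for j in range(n)] for i in range(n)]
    # Double centering: B = -1/2 * J D2 J, with J = I - (1/n) * ones
    row = [sum(r) / n for r in d2]
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    b = [[-0.5 * (d2[i][j] - row[i] - col[j] + tot) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the dominant eigenpair of the (deflated) B
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next-largest eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Plotting the two returned columns as x and y gives the kind of scatter plot the abstract describes.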

2.
Visualization of High-Dimensional Data Based on PCA and Parallel Coordinates (cited 1 time: 0 self-citations, 1 by others)
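When parallel coordinates are used to visualize high-dimensional data, too many dimensions cause visual clutter. To address this problem, a data visualization method called PPCP is proposed, which combines principal component analysis (PCA) with parallel coordinates. PCA first reduces the dimensionality of the high-dimensional data, and the reduced data are then displayed with parallel coordinates. Experimental results show that the method effectively reveals relationships within high-dimensional data.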
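Below is a rough sketch of the PPCP pipeline as summarized above. It projects the data onto its leading principal components, then lays out each projected row as one parallel-coordinates polyline. Several details are assumptions for illustration and are not given in the abstract: the function names `pca` and `parallel_coordinates`, the power-iteration eigen solver, and the per-axis min-max scaling.

```python
import math
import random

def pca(data, k, iters=300, seed=0):
    """Project rows of `data` onto the top-k principal components
    (power iteration with deflation on the sample covariance matrix)."""
    n, d = len(data), len(data[0])
    mean = [sum(r[j] for r in data) / n for j in range(d)]
    x = [[r[j] - mean[j] for j in range(d)] for r in data]
    cov = [[sum(row[a] * row[b] for row in x) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Remove the found component so the next pass finds the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(row[j] * c[j] for j in range(d)) for c in comps] for row in x]

def parallel_coordinates(points):
    """Min-max scale each axis to [0, 1]; each row becomes one polyline,
    given as its vertical positions on the successive axes."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(p[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(len(p))]
            for p in points]
```

Each inner list returned by `parallel_coordinates` holds the heights at which one record crosses the principal-component axes. Drawing far fewer axes than the original dimension count is what reduces the clutter.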

3.
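Combining the nonlinear dimensionality reduction algorithm isometric feature mapping (Isomap) with the speed of radial basis function (RBF) neural networks, an image recognition method based on Isomap and an RBF neural network is proposed. The dimensionality reduction replaces the traditional Euclidean distance with geodesic distance, which helps uncover the intrinsic structure of high-dimensional data. The RBF neural network can rapidly model the target dataset and distinguish genuine from fake images. The method also uses spectral analysis to preprocess the original images, which reduces computation. Experimental results show that the method identifies genuine and fake images quickly and improves the recognition rate.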
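The geodesic-distance idea behind Isomap can be sketched as follows. Each point is linked to its k nearest Euclidean neighbours, and distance is measured along shortest paths in that graph. The sketch covers only the distance step; Isomap then embeds these distances, typically with classical MDS. The RBF network and the spectral preprocessing are not shown. The name `geodesic_distances` and the choice of k are illustrative assumptions.

```python
import math

def geodesic_distances(points, k):
    """Approximate geodesic distances as in Isomap: connect each point to its
    k nearest neighbours (Euclidean), then take shortest paths in that graph."""
    n = len(points)
    inf = float("inf")
    eu = [[math.dist(p, q) for q in points] for p in points]
    g = [[inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        # Skip index 0 of the sorted list, which is the point itself
        for j in sorted(range(n), key=lambda j: eu[i][j])[1:k + 1]:
            g[i][j] = g[j][i] = eu[i][j]  # symmetrise the kNN graph
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g
```

Take the points (0,0), (0,1), (0,2), (1,2), (2,2), (2,1), (2,0), which trace a U shape, with k=1. The endpoints are 2 apart in a straight line but 6 apart along the graph. That gap is the distinction the abstract draws between Euclidean and geodesic distance.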

4.
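Existing radial-layout visualization methods cannot effectively capture the nonlinear structure of high-dimensional data. This paper therefore proposes a circle-like mapping visual clustering method based on dimension expansion and reordering. The affinity propagation clustering algorithm and a multi-objective clustering-visualization evaluation index are used to expand the dimensions of the high-dimensional data. The expanded data are then reordered by dimension correlation. Finally, the circle-like mapping mechanism reduces the data to a two-dimensional visual space, achieving effective visual clustering of high-dimensional data. Experiments show that the proposed dimension expansion and reordering strategies effectively improve the clustering performance of the circle-like mapping visualization method. The dimension expansion strategy alone also significantly improves the clustering performance of other radial-layout visualization methods, indicating good generalization. In addition, compared with similar methods, the proposed method shows clear advantages in visual clustering accuracy, topology preservation, Dunn index, and visual quality.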
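The abstract does not spell out its circle-like mapping, so the sketch below assumes standard RadViz, a common radial layout. In RadViz, dimension anchors sit evenly on the unit circle, and each record is pulled toward the anchors in proportion to its min-max normalized values. The `order` argument marks where a correlation-based dimension reordering, as described above, would plug in. The dimension expansion via affinity propagation is not shown, and the function name is hypothetical.

```python
import math

def radviz(data, order=None):
    """Standard RadViz projection (an assumed stand-in for the paper's
    circle-like mapping). Anchors sit evenly on the unit circle in the
    given dimension order; each point is the weighted mean of the anchors."""
    d = len(data[0])
    order = list(range(d)) if order is None else order
    anchors = {dim: (math.cos(2 * math.pi * pos / d), math.sin(2 * math.pi * pos / d))
               for pos, dim in enumerate(order)}
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    out = []
    for row in data:
        # Min-max normalize each attribute so it acts as a non-negative weight
        w = [(row[j] - lo[j]) / span[j] for j in range(d)]
        total = sum(w)
        if total == 0:
            out.append((0.0, 0.0))  # all-zero rows map to the centre
            continue
        x = sum(w[j] * anchors[j][0] for j in range(d)) / total
        y = sum(w[j] * anchors[j][1] for j in range(d)) / total
        out.append((x, y))
    return out
```

Changing `order` moves the anchors, which changes which records overlap. This is why the dimension order matters for radial layouts.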

5.
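An Image Dataset Visualization Method Based on Deep Features and Nonlinear Dimensionality Reduction (cited 1 time: 0 self-citations, 1 by others)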
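To reduce the information loss of conventional dimensionality-reduction visualization of high-dimensional image data and to improve visualization quality, an image dataset visualization method combining deep features with nonlinear dimensionality reduction is proposed. First, a convolutional neural network model is designed and trained; on the MNIST handwritten digit dataset it achieves the highest recognition accuracy among single models. Second, this high-accuracy model is used to extract deep intermediate-layer features of the image data, which serve as an effective representation of the images. Finally, a nonlinear dimensionality reduction method reduces the deep features to two dimensions for visualization. Experimental results show that the method effectively reduces the error caused by dimensionality-reduction loss in conventional image visualization methods, with a clearly visible improvement in the visualization.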

6.
QIN Caiyun. Computer Systems & Applications, 2011, 20(6): 196-199, 168
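High-dimensional, mutually dependent sample feature sets degrade classification quality. A feature weight computation method is proposed and applied to feature weighting and feature selection. Feature weights are computed from a similarity measure function between features, and features of low importance are removed according to the weight ranking, which addresses feature dimensionality reduction for high-dimensional sample sets. The feature selection results agree with those of principal component analysis. A cloud-model classifier based on the weighted retained features is then built and applied to the iris dataset and to complex ore images, with good results.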

7.
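In pattern classification, the Fisher criterion and the K-L (Karhunen-Loève) transform map sample data from a high-dimensional feature space to a low-dimensional one to extract features. The SVM (support vector machine), by contrast, uses the implicit mapping of a kernel function to map samples from a low-dimensional feature space into a high-dimensional one, where classification is performed. This paper runs simulation experiments with all three methods on classifying the iris dataset and then analyzes and compares the results. It describes the similarities, differences, and intrinsic connections among the three methods in pattern classification.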
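Of the three methods compared, the Fisher criterion is the easiest to show compactly. For two classes, it yields the projection direction w = S_w^{-1}(m_a - m_b), where m_a and m_b are the class means and S_w is the pooled within-class scatter matrix. The sketch assumes S_w is nonsingular and uses a small Gauss-Jordan solver in place of a linear-algebra library. `fisher_direction` is a hypothetical name, and the K-L transform and SVM are not shown.

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant: w = S_w^{-1} (m_a - m_b), where S_w is
    the within-class scatter matrix (assumed nonsingular)."""
    d = len(class_a[0])

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]

    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class scatter: sum of outer products of centered samples
    sw = [[0.0] * d for _ in range(d)]
    for rows, m in ((class_a, ma), (class_b, mb)):
        for r in rows:
            for i in range(d):
                for j in range(d):
                    sw[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    # Solve S_w w = (m_a - m_b) by Gauss-Jordan elimination with partial pivoting
    aug = [sw[i] + [ma[i] - mb[i]] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(d):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [aug[i][d] for i in range(d)]
```

Suppose class a sits at the corners of the unit square and class b is the same square shifted by 3 along x. The function then returns [-1.5, 0.0], a direction along the axis that separates the classes. Projecting each sample onto w reduces it to a single discriminating feature.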

8.
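To address the difficulty of making sense of massive high-dimensional data in network security, a visualization method based on improved regular 2k-gon coordinates is proposed. Dimensions are reordered using visualization-quality evaluation indices and correlation coefficients, which improves the display of high-dimensional data after the dimensionality-reduction mapping. Experimental results show that the method retains the advantage of making the distribution of each dimension easy to see. It also overcomes the clutter and limited information content of scatter plots and discovers cluster information more effectively, making it easier for administrators to analyze and mine threats further.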

9.
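Visual analysis of high-dimensional data is an active research topic in data analysis and visualization. The low-dimensional spaces produced by conventional dimensionality reduction are often hard to interpret, which hinders visual analysis and exploration of high-dimensional data. A new visualization explainer method (Explainer) is proposed. It introduces L1 sparse regularization for feature selection into the visualization pipeline for high-dimensional data, linking high-level semantic labels to a small number of key features. Visualization designs and experiments verify that the method effectively improves visual analysis of high-dimensional data.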
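The abstract names L1 sparse regularization as the feature-selection step but gives no formulation. As an assumption, the sketch below uses plain Lasso regression, with a semantic label encoded as a numeric target. It minimizes (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent with soft thresholding. Features whose weights stay at exactly zero are dropped, and the survivors form the small set of key features. `lasso`, `soft_threshold`, and the standardized-column assumption are illustrative choices, not the Explainer method itself.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso(X, y, lam, iters=200):
    """Minimise (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.
    Assumes columns of X are standardised and y is centred (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # r = y - Xw, with w = 0 initially
    for _ in range(iters):
        for j in range(d):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0:
                continue
            # rho = (1/n) x_j^T (r + x_j w_j): correlation with the partial residual
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            if new != w[j]:
                resid = [r - c * (new - w[j]) for c, r in zip(col, resid)]
                w[j] = new
    return w
```

Take orthogonal columns X = [[1, 1], [-1, 1], [1, -1], [-1, -1]], target y = [2, -2, 2, -2], and lam = 0.5. The weights converge to [1.5, 0.0]: the irrelevant second feature is eliminated, and the relevant one is shrunk.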

10.
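High-dimensional electrocardiogram (ECG) data contain many irrelevant features, which makes it difficult for supervised machine learning techniques to achieve high sensitivity and high specificity at the same time. The ECG data are first preprocessed by correcting baseline wander, removing high-frequency noise, and fitting polynomial features. On that basis, a classification model built on supervised multiple correspondence analysis (MCA) dimensionality reduction is proposed for automatic heartbeat classification. The method discretizes continuous ECG data into categorical data and develops a supervised MCA dimensionality reduction technique to extract the key features of the ECG data. Various classification algorithms are then applied to classify the heartbeats automatically. Tests on ECG datasets from the PTB Diagnostic database show that the classification algorithms within the supervised MCA framework achieve higher sensitivity and specificity than several supervised machine learning classification techniques.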

11.
High-dimensional data visualization is receiving increasing interest because of the growing abundance of high-dimensional datasets. To understand such datasets, visualization of the structures present in the data, such as clusters, can be an invaluable tool. Structures may be present in the full high-dimensional space as well as in its subspaces. Two widely used methods to visualize high-dimensional data are the scatter plot matrix (SPM) and the parallel coordinate plot (PCP). SPM allows a quick overview of the structures present in pairwise combinations of dimensions. PCP, on the other hand, has the potential to visualize not only bi-dimensional structures but also higher-dimensional ones. A problem with SPM is that it suffers from crowding and clutter, which makes interpretation hard. The literature offers approaches that reduce clutter by changing the order of the dimensions; however, this reordering usually has a high computational complexity. For effective visualization of high-dimensional structures, PCP also requires a proper ordering of the dimensions. In this paper, we propose methods for reordering dimensions in PCP in such a way that high-dimensional structures (if present) become easier to perceive. We also present a method for dimension reordering in SPM that yields results comparable to those of existing approaches, but at a much lower computational cost. Our approach is based on finding relevant subspaces for clustering using a quality criterion and cluster information. The quality computation and cluster detection are done in image space, using connected morphological operators. We demonstrate the potential of our approach on synthetic and astronomical datasets and show that our method compares favorably with a number of existing approaches.

12.
Visualization methods could significantly improve the outcome of automated knowledge discovery systems by involving human judgment. Star Coordinates is a visualization technique that maps k-dimensional data onto a circle using a set of axes sharing the same origin at the center of the circle. It lets users adjust this mapping, by scaling and rotating the axes, until no mapped point clouds (clusters) overlap one another. In this state, similar groups of data are easily detectable. However, an effective adjustment can be a difficult or even impossible task for the user in high dimensions, especially when the input space dimension is about 50 or more. In this paper, we propose a novel method for automatic axis adjustment for high-dimensional data in the Star Coordinates visualization method. The method uses label information to find the best two-dimensional viewpoint, one that minimizes intra-cluster distances while keeping inter-cluster distances as large as possible. We call this viewpoint a discernible visualization, since in it the clusters are easily detectable by the human eye. The label information can be provided by the user or can come from running a conventional clustering method on the input data. The proposed approach optimizes the Star Coordinates representation by formulating the problem as the maximization of a Fisher discriminant. The problem therefore has a unique global solution and polynomial time complexity. We also prove that manipulating the scaling factors alone is sufficient to create any given visualization mapping. Moreover, we show that k-dimensional data visualization can be modeled as an eigenvalue problem. With this approach, an optimal axis adjustment in the Star Coordinates method for high-dimensional data can be achieved without any user intervention. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and performance.
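The Star Coordinates mapping described above is linear. Axis j is a vector with direction theta_j and length s_j, and a record maps to the sum of its normalized attribute values times those vectors. The sketch shows only this forward mapping with adjustable scale factors. The Fisher-discriminant optimization of the scales that the paper proposes is not reproduced. Function and parameter names are hypothetical, and inputs are assumed to be already normalized.

```python
import math

def star_coordinates(data, scales=None, angles=None):
    """Star Coordinates: axis j is a vector from the origin with direction
    angles[j] and length scales[j]; a point maps to the sum of its (already
    normalised) attribute values times those axis vectors."""
    d = len(data[0])
    scales = [1.0] * d if scales is None else scales
    # Default: axes spread evenly around the circle
    angles = [2 * math.pi * j / d for j in range(d)] if angles is None else angles
    axes = [(s * math.cos(a), s * math.sin(a)) for s, a in zip(scales, angles)]
    return [(sum(v * ax[0] for v, ax in zip(row, axes)),
             sum(v * ax[1] for v, ax in zip(row, axes))) for row in data]
```

Because the mapping is a weighted sum, changing a single entry of `scales` stretches every point along that axis. This is the knob the paper's optimization adjusts automatically instead of leaving it to the user.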

13.
Researchers and analysts in modern industrial and academic environments face a daunting amount of multi-dimensional data. Although data mining and knowledge discovery have developed significantly, improved visualizations and generic solutions are still needed. The state of the art in visual analytics and exploratory data visualization is to incorporate more profound analysis methods while focusing on fast interaction. The common trend in these scenarios is either to visualize an abstraction of the data set or to make better use of screen space. This paper presents a novel technique that combines clustering, dimension reduction and multi-dimensional data representation to form a multivariate data visualization that incorporates both detail and overview. This amalgamation counters the individual drawbacks of common projection and multi-dimensional data visualization techniques, namely ambiguity and clutter. A specific clustering criterion is used to decompose a multi-dimensional data set into a hierarchical tree structure. This decomposition is embedded in a novel Dimensional Anchor visualization through the use of a weighted linear dimension reduction technique. The resulting Structural Decomposition Tree (SDT) provides not only insight into the data set's inherent structure but also detailed coordinate value information. Further, fast and intuitive interaction techniques are explored to guide the user in highlighting, brushing, and filtering the data.

14.
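When multidimensional data have too many dimensions and too many records, parallel-coordinate plots suffer from dense, overlapping polylines, and patterns in the data become hard to extract. To address this, a parallel coordinate visualization method based on principal component analysis and K-means clustering is proposed (PCAKP, principal component analysis and k-means clustering parallel coordinate). The method first reduces the dimensionality of the multidimensional data with principal component analysis, then clusters the reduced data with K-means, and finally displays the clustered data with parallel coordinates. The method was tested on data published on a statistics bureau website and compared with traditional parallel-coordinate plots. The comparison verified the practicality and effectiveness of PCAKP.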
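The middle step of PCAKP, clustering the PCA-reduced data, can be sketched with Lloyd's k-means algorithm. The resulting labels would then color the parallel-coordinates polylines. Random initialization with a fixed seed, the iteration cap, and the name `kmeans` are assumptions for illustration. The abstract does not specify how k or the initial centroids are chosen.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's centroid where it was
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break  # converged: centroids no longer move
        centroids = new
    return labels, centroids
```

In a PCAKP-style pipeline, `points` would be the rows output by the PCA step. Each label then selects a color for the corresponding polyline.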

15.
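Network security log data have grown explosively in recent years, but existing visualization techniques struggle to support thorough visual analysis of high-dimensional, multi-granularity Netflow logs. This paper therefore proposes a new network security visualization framework. A three-dimensional bar chart displays the traffic time series of the Netflow logs, helping users quickly spot anomalous moments in the network. An information entropy algorithm processes the dimension data on the parallel coordinate axes to make the multidimensional graphics easier to understand. Matrix plots, bubble charts, and traffic time-series plots support detailed analysis. Finally, the system is used to analyze network anomaly cases involving DDoS attacks and port-scan attacks. The study shows that the system's rich visualizations and simple, easy-to-use coordinated interactions support network security staff well throughout the whole process: analyzing the overall network state, locating anomalous moments, and monitoring the details of network behavior.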
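The abstract says information entropy is applied to the parallel-coordinate dimensions but does not give the formula. One plausible reading, assumed here, is the Shannon entropy of each dimension after equal-width binning, which can then be used to order the axes. The bin count, the low-to-high ordering, and both function names are hypothetical choices, not the paper's.

```python
import math
from collections import Counter

def column_entropy(values, bins=10):
    """Shannon entropy (bits) of one dimension after equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant columns fall into a single bin
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def order_axes_by_entropy(rows, bins=10):
    """Return dimension indices sorted from lowest to highest entropy."""
    cols = list(zip(*rows))
    return sorted(range(len(cols)), key=lambda j: column_entropy(cols[j], bins))
```

Low-entropy axes (for example, a port field dominated by a few values) carry concentrated structure. Placing them next to each other is one way such a score could guide the axis layout.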

16.
VizCluster and its Application on Classifying Gene Expression Data (cited 1 time: 0 self-citations, 1 by others)
Visualization enables us to find structures, features, patterns, and relationships in a dataset by presenting the data in various graphical forms with possible interactions. A visualization can provide a qualitative overview of large and complex datasets, can summarize data, and can help identify regions of interest and appropriate parameters for focused quantitative analysis. DNA microarray technology provides a broad snapshot of the state of the cell by measuring the expression levels of thousands of genes simultaneously. Such information can be used to compare samples by their gene expression profiles. The technology has already had a significant impact on bioinformatics and requires innovative techniques to extract, analyze, and visualize these fast-growing data efficiently and effectively. In this paper, we present a dynamic interactive visualization environment, VizCluster, and its application to classifying gene expression data. VizCluster takes advantage of graphical visualization methods to reveal underlying data patterns. It combines the merits of the high-dimensional projection scatter plot and the parallel coordinate plot. At its core lies a nonlinear projection that maps n-dimensional vectors onto two-dimensional points. Parallel coordinate plots typically become messy because of overlapping lines. To reduce this problem while preserving information at different scales, a zip zooming viewing method is proposed. Integrated with other features, VizCluster gives a simple, fast, intuitive, and yet powerful view of the data set. Its primary applications are the classification of samples and the evaluation of gene clusters for microarray datasets. Three gene expression datasets are used to illustrate the approach. We demonstrate that the VizCluster approach is promising for analyzing and visualizing microarray data sets and that further development is worthwhile.

17.
Visual Cluster Discovery in High-Dimensional Data Based on Nearest-Neighbor Methods (cited 2 times: 0 self-citations, 2 by others)
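A novel nearest-neighbor-based visual clustering method for high-dimensional data is proposed, and a nearest-neighbor visual cluster discovery system, VisNN, is implemented. Existing approaches to visual clustering of high-dimensional data mainly apply dimensionality reduction to project the high-dimensional data onto a two- or three-dimensional space, which can then be visualized.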

18.
A Time Series Data Mining Model Based on Wavelet Analysis (cited 2 times: 0 self-citations, 2 by others)
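This paper proposes TSMiner, a time series mining model based on wavelet analysis that supports the entire time series data mining process. The model has five components: visualization of the raw data, data preprocessing, data reduction, pattern discovery, and visualization of the resulting patterns. It uses wavelets for multi-level visual representation of the data, for data reduction, and for multi-scale pattern discovery. The model helps users observe high-dimensional data, understand intermediate results, and interpret the discovered patterns.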
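The data-reduction and multi-scale ideas in TSMiner can be illustrated with a multi-level Haar wavelet transform, the simplest wavelet. The abstract does not say which wavelet TSMiner uses, so Haar, the orthonormal scaling, and the function names below are assumptions. Keeping only the coarse approximation coefficients and zeroing the details yields a reduced, smoothed view of the series.

```python
import math

def haar_decompose(series, levels):
    """Multi-level orthonormal Haar transform. Returns the final approximation
    and the detail coefficients per level (finest first). The series length
    must be divisible by 2**levels."""
    approx = list(series)
    details = []
    s = 1 / math.sqrt(2)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])  # local differences
        approx = [(a + b) * s for a, b in pairs]          # local (scaled) averages
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose; zeroed detail lists give a smoothed series."""
    s = 1 / math.sqrt(2)
    out = list(approx)
    for det in reversed(details):  # rebuild from the coarsest level down
        nxt = []
        for a, d in zip(out, det):
            nxt.extend([(a + d) * s, (a - d) * s])
        out = nxt
    return out
```

For example, take [4, 6, 10, 12, 8, 8, 2, 0], decompose it two levels, and reconstruct with zeroed details. The result is [8, 8, 8, 8, 4.5, 4.5, 4.5, 4.5] up to rounding, the means of each block of four, stored as just two coefficients.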


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417