Similar literature
Found 20 similar documents (search time: 956 ms)
1.
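Objective: Parallel coordinates are a classic visualization method for multidimensional data, but when applied to geospatial multidimensional data they tend to lose spatial location information and leave spatial associations uncertain. To address this, this paper designs a visual analysis method for geospatial multidimensional data that effectively links parallel coordinates with a map. Method: Geospatial locations are clustered according to their multidimensional attributes, and a Voronoi diagram with a light-to-dark color mapping is introduced to mark each class of region on the map distinctly. Parallel coordinates present the multidimensional attribute information; mutual information is introduced to measure the correlation between the geospatial clusters and the attribute categories and is used to determine the order of the parallel coordinate axes dynamically. The binding positions of the data lines between the attribute axes and the map are then computed, and the line layout is optimized to reduce the clutter of the data lines running between the map and the parallel coordinate system. Results: Integrating these visualization designs and data analysis methods, a geospatial multidimensional data visual analysis system based on dynamically ordered parallel coordinate axes is designed and implemented, with convenient user interaction. Tests on two data sets with distinct geospatial multidimensional attribute characteristics verify the effectiveness and practicality of the method. Conclusion: The proposed visual analysis method and tool help users quickly analyze the spatial distribution characteristics and association patterns of geospatial multidimensional attributes, providing an effective means of exploring geospatial multidimensional data.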
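
The axis-ordering step in the abstract above can be illustrated with a small sketch. It assumes the attributes have already been binned into categories and that axes are simply sorted by decreasing mutual information with the spatial cluster labels; the abstract does not state the exact ordering rule, so `order_axes` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def order_axes(cluster_labels, attributes):
    """Sort attribute names by decreasing MI with the cluster labels.

    `attributes` maps an axis name to a list of categorical (binned) values.
    Sorting by descending MI is an assumption, not the paper's stated rule.
    """
    scores = {name: mutual_information(cluster_labels, values)
              for name, values in attributes.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: "income" follows the clusters exactly, "age" is independent of them.
clusters = [0, 0, 1, 1]
attrs = {"age": ["a", "b", "a", "b"], "income": ["lo", "lo", "hi", "hi"]}
print(order_axes(clusters, attrs))  # ['income', 'age']
```

Placing the most cluster-related axis first is only one plausible policy; a real system might instead order axes so that strongly related ones sit next to each other.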

2.
Existing network visualizations support hierarchical exploration that relies on user interactions to create and modify graph hierarchies based on patterns in the data attributes. It takes users a relatively long time to identify the impact of different attributes on the cluster structure. To address this problem, this paper proposes a visual analytics approach, called HybridVis, that creates an interactive layout to reveal clusters with distinct characteristics on one or more attributes at different scales. HybridVis can help people gain social insight and better understand the roles of attributes within a cluster. First, an approximately optimal graph hierarchy based on an energy model is created, considering both data attributes and relationships among data items. Then a layout algorithm and a level-dependent perceptual view for multi-scale graphs are proposed to show the attribute-driven graph hierarchy. Several views, which interact with each other, are designed in HybridVis, including a graphical view of the relationships among clusters; a cluster tree revealing the cluster scales and the details of attributes on parallel coordinates augmented with histograms and interactions. From the meaningful and globally approximately optimal abstraction, users can navigate a large multivariate graph with an overview+detail approach to explore and rapidly find potential correlations between the graph structure and the attributes of data items. Finally, experiments on two real-world data sets demonstrate the effectiveness of the methods.

3.
This paper presents a method for the unsupervised discovery of valid clusters using statistics on the modes of the probability density function in scale space. First, Gaussian scale-space theory is applied to kernel density estimation to derive the hierarchical relationships among the modes of the probability density function in scale space. The data points are classified into clusters according to the mode hierarchy. Second, the cluster discovery algorithm is presented. Valid clusters are discovered by testing whether each cluster is distinguishable from spurious clusters obtained from uniformly random points. The statistical hypothesis test for cluster discovery requires the distribution forms of the annihilation scales of the modes estimated from the uniformly random points. These distribution forms are shown experimentally to be unimodal. Finally, cluster discovery is demonstrated using synthetic data and benchmark data.
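
As a rough illustration of modes disappearing as the scale grows, the sketch below counts the local maxima of a one-dimensional Gaussian kernel density estimate at two bandwidths. It is a simplification under stated assumptions: the data are 1D, modes are detected on a fixed grid, and neither the mode hierarchy nor the hypothesis test from the abstract is reproduced. The names `kde` and `count_modes` are hypothetical.

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth (scale) h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / norm

def count_modes(data, h, grid_size=401):
    """Count strict local maxima of the KDE sampled on a regular grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    xs = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    ys = [kde(x, data, h) for x in xs]
    return sum(1 for i in range(1, grid_size - 1)
               if ys[i - 1] < ys[i] > ys[i + 1])

# Two well-separated groups of points (toy data).
data = [0.0, 0.2, 0.3, 5.0, 5.1, 5.4]
for h in (0.5, 5.0):
    # The small scale keeps both groups apart; the large scale merges them.
    print(h, count_modes(data, h))
```

Tracking the scale at which each mode vanishes, rather than only counting modes at fixed scales, is what would give the annihilation scales mentioned in the abstract.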

4.
Data sets resulting from physical simulations typically contain a multitude of physical variables. It is, therefore, desirable that visualization methods take into account the entire multi-field volume data rather than concentrating on one variable. We present a visualization approach based on surface extraction from multi-field particle volume data. The surfaces segment the data with respect to the underlying multi-variate function. Decisions on segmentation properties are based on the analysis of the multi-dimensional feature space. The feature space exploration is performed by an automated multi-dimensional hierarchical clustering method, whose resulting density clusters are shown in the form of density level sets in a 3D star coordinate layout. In the star coordinate layout, the user can select clusters of interest. A selected cluster in feature space corresponds to a segmenting surface in object space. Based on the segmentation property induced by the cluster membership, we extract a surface from the volume data. Our driving applications are Smoothed Particle Hydrodynamics (SPH) simulations, where each particle carries multiple properties. The data sets are given in the form of unstructured point-based volume data. We directly extract our surfaces from such data without prior resampling or grid generation. The surface extraction computes individual points on the surface, which is supported by an efficient neighborhood computation. The extracted surface points are rendered using point-based rendering operations. Our approach combines methods in scientific visualization for object-space operations with methods in information visualization for feature-space operations.

5.
This work proposes a novel data clustering algorithm based on the potential field model, with a hierarchical optimization mechanism. The algorithm has two stages. First, we build an edge-weighted tree based on the mutual distances between all data points and their hypothetical potential values derived from the data distribution. Using the tree structure, the dataset can be divided into an appropriate number of initial sub-clusters, with the cluster centers close to the local minima of the potential field. Then the sub-clusters are merged according to merging criteria that analyze their border potential values and the cluster average potential values. The proposed algorithm follows a hierarchical clustering mechanism and aims to optimize the initial sub-cluster results of the first stage. Because it relies on the merging criteria to merge the sub-clusters, it can stop the clustering process automatically without the number of clusters being designated in advance. The experimental results show that the proposed algorithm produces the most satisfactory clustering results in most cases compared with other existing methods, and can effectively identify data clusters of arbitrary shape, size and density.

6.
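Density-based clustering is one of the main approaches in data mining: it can discover clusters of arbitrary shape in a data set, reveals a concurrent and complete clustering structure, and is insensitive to noisy data. This paper discusses several commonly used density-based clustering algorithms and their improved variants and analyzes the strengths and weaknesses of each. Taking geographic information systems (GIS) as the application background, it proposes combining density-based clustering with GIS: by extracting the attribute features of multidimensional data, the approach is extended to multidimensional data processing, and it achieves efficient clustering results in the analysis of three-dimensional terrain data.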

7.
We introduce a method for organizing multivariate displays and for guiding interactive exploration through high-dimensional data. The method is based on nine characterizations of the 2D distributions of orthogonal pairwise projections on a set of points in multidimensional Euclidean space. These characterizations include such measures as density, skewness, shape, outliers, and texture. Statistical analysis of these measures leads to ways of 1) organizing 2D scatterplots of points for coherent viewing, 2) locating unusual (outlying) marginal 2D distributions of points for anomaly detection, and 3) sorting multivariate displays based on high-dimensional data, such as trees, parallel coordinates, and glyphs.

8.
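逄琳, 刘方爱. 《计算机应用》 (Journal of Computer Applications), 2016, 36(6): 1634-1638
Traditional clustering algorithms cluster a data set repeatedly and are computationally inefficient on large data sets. To address this, the paper proposes an algorithm based on hierarchical partitioning that determines the optimal number of clusters and the initial cluster centers: clustering optimization based on the density of hierarchical division (CODHD). Because the algorithm works on the hierarchical partition, it does not need to cluster the data set repeatedly. First, the data set is scanned to obtain the statistics of all clustering features. Second, data partitions at different levels are generated bottom-up; the density of the data points in each partition is computed, the point of maximum density is taken as the center, and the minimum distance from that center to a point of higher density is computed. The average of the summed products of center density and minimum distance serves as the validity index, and a clustering quality curve over the partition levels is built incrementally. Finally, the optimal number of clusters and the initial cluster centers are estimated from the partition that corresponds to the extremum of the curve. Experimental results show that, compared with the clustering optimization in the preprocessing stage (COPS) algorithm, CODHD improves clustering accuracy by 30% and clustering efficiency by at least 14.24%. The proposed algorithm is feasible and practical.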

9.
Clustering is one of the important data mining tasks. Nested clusters and clusters of multiple densities are very prevalent in data sets. In this paper, we develop a hierarchical clustering approach, a cluster tree, to determine such cluster structure and to understand the hidden information present in data sets with nested clusters or clusters of multiple densities. We embed the agglomerative k-means algorithm in the generation of the cluster tree to detect such clusters. Experimental results on both synthetic and real data sets are presented to illustrate the effectiveness of the proposed method. The proposed cluster tree approach performs better than a number of existing clustering algorithms (DBSCAN, X-means, BIRCH, CURE, NBC, OPTICS, Neural Gas, Tree-SOM, EnDBSCAN and LDBSCAN).

10.
Illustrative parallel coordinates (IPC) is a suite of artistic rendering techniques for augmenting and improving parallel coordinate (PC) visualizations. IPC techniques can be used to convey a large amount of information about a multidimensional dataset in a small area of the screen through the following approaches: (a) edge-bundling through splines; (b) visualization of "branched" clusters to reveal the distribution of the data; (c) opacity-based hints to show cluster density; (d) opacity and shading effects to illustrate local line density on the parallel axes; and (e) silhouettes, shadows and halos to help the eye distinguish between overlapping clusters. Thus, the primary goal of this work is to convey as much information as possible in a manner that is aesthetically pleasing and easy to understand for non-experts.

11.
Multidimensional projection-based visualization methods typically rely on clustering and attribute selection mechanisms to enable visual analysis of multidimensional data. Clustering is often employed to group similar instances according to their distance in the visual space. However, considering only distances in the visual space may be misleading because of projection errors and because nothing guarantees that distinct clusters contain instances with different content. Identifying clusters made up of a few elements is also an issue for most clustering methods. In this work we propose a novel multidimensional projection-based visualization technique that relies on representative instances to define clusters in the visual space. Representative instances are selected by a deterministic sampling scheme derived from matrix decomposition, which is sensitive to the variability of the data while still being able to handle classes with a small number of instances. Moreover, the sampling mechanism can easily be adapted to select relevant attributes from each cluster. Therefore, our methodology unifies sampling, clustering, and feature selection in a simple framework. A comprehensive set of experiments validates our methodology, showing that it outperforms most existing sampling and feature selection techniques. A case study shows the effectiveness of the proposed methodology as a visual data analysis tool.

12.
13.
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis.

14.
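Traditional minimum spanning tree clustering algorithms need to know the number of clusters in advance and use a static global criterion for partitioning, so when cluster densities differ greatly their effectiveness drops and their computational cost is high. To address these problems, an improved adaptive hierarchical minimum spanning tree clustering algorithm is proposed. Based on nearest-neighbor relationships, it automatically sets an independent threshold for each cluster, which lets it adapt to widely differing distribution densities and determine the number of clusters automatically. Experiments show that the algorithm performs well and, in particular, still obtains good clustering results when the data density is unevenly distributed.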

15.
This paper is concerned with a stepwise mode of objective function-based fuzzy clustering. A structure revealed in the data is refined successively, starting with the most dominant relationships and proceeding to a more detailed characterization. Technically, the proposed process develops a so-called hierarchy of clusters. Given the underlying clustering mechanism of fuzzy C-means (FCM), the resulting architecture is referred to as a hierarchical FCM or hierarchical FCM tree (HFCM tree). We discuss the design of the tree, demonstrating how its growth is guided by a certain mapping criterion. It is also shown how a structure at the higher level is effectively used to build clusters at the consecutive level by making use of the conditional FCM. Detailed investigations of computational complexity contrast a stepwise development of clusters with a single-step clustering completed for the equivalent number of clusters occurring in total at all final nodes of the HFCM tree. The analysis quantifies a significant reduction in computational cost achieved by the stepwise refinement of the clusters. Experimental studies include synthetic data as well as data from the machine learning repository.
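
For readers unfamiliar with the base mechanism, here is a minimal sketch of plain fuzzy C-means on one-dimensional data. It shows only the standard alternating updates of memberships and centers; the conditional FCM and the HFCM tree from the abstract are not reproduced, and the function name, initial centers, and fuzzifier value `m=2.0` are assumptions made for illustration.

```python
def fcm_1d(xs, centers, m=2.0, iters=50):
    """Plain fuzzy C-means on scalars; returns (centers, membership matrix U)."""
    power = 2.0 / (m - 1.0)
    U = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for x in xs:
            d = [abs(x - v) + 1e-12 for v in centers]   # guard against zero distance
            U.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
        # Center update: weighted mean with weights u_ik^m
        centers = [
            sum(U[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(U[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    return centers, U

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, U = fcm_1d(xs, centers=[0.0, 1.0])
print([round(c, 2) for c in centers])
```

A stepwise scheme in the spirit of the abstract would rerun a clustering of this kind inside each cluster found at the previous level instead of asking for all clusters at once.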

16.
This paper describes the location of 3D objects in either depth or intensity data using parallel pose clustering. A leader-based partitional algorithm is used that allows the number of clusters to be selected on the basis of the input data, which is important because the number of pose clusters cannot usually be determined in advance. In comparison with previous work, no assumptions are made about the number or distribution of data patterns, or that the processor topology should be matched to this distribution. After overcoming a parallel bottleneck, we show that our approach exhibits superlinear speedup, since the overall computation is reduced in the parallel system. Isolated pose estimates may be eliminated from the cluster space after an initial stage, which may be done with low probability of missing a true cluster. The algorithm has been tested using real and synthetic data on a transputer-based MIMD architecture.
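
The leader-based partitional idea can be sketched sequentially as follows. This is the textbook single-pass leader algorithm with a hypothetical distance `threshold`; it is not the paper's parallel implementation, and the 2D toy points stand in for pose estimates.

```python
import math

def leader_cluster(points, threshold):
    """Single-pass leader clustering: returns (leaders, label per point)."""
    leaders, labels = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(i)          # join the first leader that is close enough
                break
        else:
            leaders.append(p)             # no leader nearby: p starts a new cluster
            labels.append(len(leaders) - 1)
    return leaders, labels

pts = [(0, 0), (0.5, 0), (10, 10), (10.2, 10), (0.1, 0.1)]
leaders, labels = leader_cluster(pts, threshold=1.0)
print(leaders, labels)  # [(0, 0), (10, 10)] [0, 0, 1, 1, 0]
```

The number of clusters is a by-product of the threshold and the data, which matches the abstract's remark that it cannot usually be fixed in advance; the result does depend on the order in which points arrive.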

17.
In a graph theory model, clustering is the process of dividing vertices into groups, with a higher density of edges within groups than between them. In this paper, we introduce a new clustering method for detecting such groups and use it to analyse some classic social networks. The new method has two distinguishing features: a non-binary hierarchical tree and overlapping clustering. A non-binary hierarchical tree is much smaller than the binary trees constructed by most traditional methods and therefore clearly highlights meaningful clusters, which significantly reduces the manual effort needed for cluster selection. The method is tested on several benchmark data sets for which the community structure was known beforehand, and the results indicate that it is a sensitive and accurate method for extracting community structure from social networks.

18.
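Finding useful data types in large databases has become an active research topic, and identifying the clusters in a database is a widely studied problem in this field. This paper proposes a Hierarchical Adaptive Fast K-means (HAFKM) algorithm for clustering image databases. HAFKM builds an unbalanced clustering tree according to a proposed hierarchical strategy and uses an adaptive method, the Cluster Evaluation Criterion (CEC), to determine the number of branches of every subtree except the root. At each level of the tree, a proposed discriminant function (the cost-function) clusters directly on the color histogram according to color level, so clustering over the whole tree is fast. Experiments show that, by clustering level by level on the unbalanced tree and using CEC to determine the number of clusters accurately, HAFKM clusters the database quickly and efficiently.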

19.
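Existing hierarchical clustering algorithms have difficulty handling incomplete data sets. Taking into account, in addition, the uncertain relationship between samples and clusters, this paper proposes SPGCURE, a set-pair granular hierarchical clustering algorithm for incomplete data. First, missing values are handled using set-pair information granules: unlike earlier algorithms, which delete or impute missing attributes, the discrepancy degree of the set-pair connection degree is used to represent missing attribute values, and an improved set-pair information distance measure is proposed to assess how close incomplete data samples are to one another. Second, based on the improved set-pair distance, the average intra-cluster distance of each cluster is defined, yielding a set-pair granular hierarchical clustering expressed by the positive identity region Cs (the sample definitely belongs to the cluster), the boundary region Cu (the sample may belong to the cluster), and the negative contrary region Co (the sample does not belong to the cluster). SPGCURE applies to both complete and incomplete data. Finally, the algorithm is evaluated on five classic UCI data sets against commonly used classic and improved clustering algorithms; the results show that SPGCURE achieves good clustering performance in terms of accuracy, F-measure, adjusted Rand index, and normalized mutual information.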

20.
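The density peaks algorithm has high time complexity when selecting cluster centers, requires the cutoff distance to be chosen manually, and may produce several density peaks on manifold data, which lowers clustering accuracy. To address these problems, a new density peaks clustering algorithm is proposed; it is discussed and analyzed in terms of cluster center selection, outlier screening, and data point assignment, and the corresponding clustering algorithm is given. For cluster center selection, the density of each data point is computed following the idea of KNN. Outlier screening and pruning, as well as data point assignment, exploit the properties of the Voronoi diagram combined with the distribution of the data points. Finally, the idea of hierarchical clustering is applied to merge similar clusters and improve clustering accuracy. Experimental results show that the proposed algorithm achieves better clustering quality and accuracy than the algorithms it was compared with.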
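
A minimal sketch of the KNN-based density idea: each point's density is taken as the inverse of its mean distance to its k nearest neighbours, which removes the manually chosen cutoff distance, and candidate centers are the points with the largest product of density and distance to the nearest denser point. The exact density formula is an assumption (the abstract only says the KNN idea is used), and the Voronoi-based outlier and assignment steps are not shown.

```python
import math

def knn_density(points, k):
    """Density of each point as 1 / (mean distance to its k nearest neighbours)."""
    dens = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(d[:k]) / k + 1e-12))
    return dens

def peak_scores(points, k):
    """Score rho * delta per point; high scores suggest cluster centers."""
    rho = knn_density(points, k)
    scores = []
    for i, p in enumerate(points):
        # delta: distance to the nearest point of strictly higher density
        higher = [math.dist(p, q) for j, q in enumerate(points) if rho[j] > rho[i]]
        # the globally densest point(s) get the largest distance to any point
        delta = min(higher) if higher else max(math.dist(p, q) for q in points)
        scores.append(rho[i] * delta)
    return scores

pts = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
       (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
scores = peak_scores(pts, k=2)
centers = sorted(sorted(range(len(pts)), key=lambda i: scores[i], reverse=True)[:2])
print(centers)  # indices of the two highest-scoring points
```

This brute-force version is quadratic in the number of points; the abstract's use of Voronoi properties is aimed at the parts of the pipeline this sketch leaves out.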
