首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Spatial clustering analysis is an important issue that has been widely studied to extract the meaningful subgroups of geo-referenced data. Although many approaches have been developed in the literature, efficiently modeling the network constraint that objects (e.g. urban facility) are observed on or alongside a street network remains a challenging task for spatial clustering. Based on the techniques of mathematical morphology, this paper presents a new spatial clustering approach NMMSC designed for mining the grouping patterns of network-constrained point objects. NMMSC is essentially a hierarchical clustering approach, and it generally consists of two main steps: first, the original vector data is converted to raster data by utilizing basic linear unit of network as the pixel in network space; second, based on the specified 1-dimensional raster structure, an extended mathematical morphology operator (i.e. dilation) is iteratively performed to identify spatial point agglomerations with hierarchical structure snapped on a network. Compared to existing methods of network-constrained hierarchical clustering, our method is more efficient for cluster similarity computation with linear time complexity. The effectiveness and efficiency of our approach are verified through the experiments with real and synthetic data sets.  相似文献   

2.
随着信息技术的飞速发展和大数据时代的来临,数据呈现出高维性、非线性等复杂特征。对于高维数据来说,在全维空间上往往很难找到反映分布模式的特征区域,而大多数传统聚类算法仅对低维数据具有良好的扩展性。因此,传统聚类算法在处理高维数据的时候,产生的聚类结果可能无法满足现阶段的需求。而子空间聚类算法搜索存在于高维数据子空间中的簇,将数据的原始特征空间分为不同的特征子集,减少不相关特征的影响,保留原数据中的主要特征。通过子空间聚类方法可以发现高维数据中不易展现的信息,并通过可视化技术展现数据属性和维度的内在结构,为高维数据可视分析提供了有效手段。总结了近年来基于子空间聚类的高维数据可视分析方法研究进展,从基于特征选择、基于子空间探索、基于子空间聚类的3种不同方法进行阐述,并对其交互分析方法和应用进行分析,同时对高维数据可视分析方法的未来发展趋势进行了展望。  相似文献   

3.
数据聚类的可视分析方法利用可视化与交互技术帮助用户对聚类过程与结果进行 多角度分析,从而发现数据内部隐藏的结构和关系。但由于高维数据自身的“维度诅咒”问题 使得聚类分析面临着许多挑战,例如模型参数设定、数据特征捕捉、结果解释以及可视化展现 等。本文从高维数据聚类过程中遇到的问题出发,首先总结了高维数据聚类过程中常用的数据 处理方法并对其性能进行了比较,这些方法能够较好地解决“维度诅咒”问题,帮助用户挖掘 数据中存在的聚类模式。在分析和理解不同聚类结果中包含的数据内部结构和规律时,由于前 期采取的数据处理方法不同,因此需要采取不同的探索分析策略,所以本文将近10 年来高维数 据聚类的可视分析方法分为2 大类进行总结,即基于降维的聚类可视分析方法和基于子空间聚 类的可视分析方法。最后对该领域目前存在的机遇与挑战进行了讨论。  相似文献   

4.
A Multi-Resolution Content-Based Retrieval Approach for Geographic Images   总被引:7,自引:0,他引:7  
Current retrieval methods in geographic image databases use only pixel-by-pixel spectral information. Texture is an important property of geographical images that can improve retrieval effectiveness and efficiency. In this paper, we present a content-based retrieval approach that utilizes the texture features of geographical images. Various texture features are extracted using wavelet transforms. Based on the texture features, we design a hierarchical approach to cluster geographical images for effective and efficient retrieval, measuring distances between feature vectors in the feature space. Using wavelet-based multi-resolution decomposition, two different sets of texture features are formulated for clustering. For each feature set, different distance measurement techniques are designed and experimented for clustering images in a database. The experimental results demonstrate that the retrieval efficiency and effectiveness improve when our clustering approach is used.  相似文献   

5.
Geographic information (e.g., locations, networks, and nearest neighbors) are unique and different from other aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining and visualization to take into account both the geographic information and multiple aspatial variables in the detection of patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that can transform spatial relations into a one-dimensional ordering and encoding which preserves spatial locality as much possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable for processing together with aspatial variables by any existing (non-spatial) data mining methods. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical clustering based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with the complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations, e.g., x and y coordinates, but also a wide range of other spatial relations, e.g., network distances or arbitrarily weighted graphs.  相似文献   

6.
吴斐然  陈海东  黄劲  陈为 《软件学报》2014,25(S2):111-118
聚类是研究空间多变量数据的重要工具之一.但是自动聚类算法通常需要用户预设参数然后生成结果,缺乏一种有效的交互机制将用户介入到聚类的过程,使之动态改变参数并对结果进行调整和评估.为此提出一种面向空间多变量数据聚类的可视分析流程,首先运用自动聚类算法对原始三维空间进行聚类,针对三维空间不易交互的缺陷将数据点投影到二维平面进行交互选择和可视编码,设置多种视图使用户实时而全面地理解数据分布和模式,交互地修正聚类结果,并根据一些编码的统计信息来判断结果的合理性和正确性.整个流程是渐进式的,即用户通过迭代逐步细化结果,最终抽取兴趣域.案例分析表明,新的可视分析流程能够有效地提高空间自动聚类算法的精度,也极大地缩短了用户交互的时间.  相似文献   

7.
Spatial interactions (or flows), such as population migration and disease spread, naturally form a weighted location-to-location network (graph). Such geographically embedded networks (graphs) are usually very large. For example, the county-to-county migration data in the U.S. has thousands of counties and about a million migration paths. Moreover, many variables are associated with each flow, such as the number of migrants for different age groups, income levels, and occupations. It is a challenging task to visualize such data and discover network structures, multivariate relations, and their geographic patterns simultaneously. This paper addresses these challenges by developing an integrated interactive visualization framework that consists three coupled components: (1) a spatially constrained graph partitioning method that can construct a hierarchy of geographical regions (communities), where there are more flows or connections within regions than across regions; (2) a multivariate clustering and visualization method to detect and present multivariate patterns in the aggregated region-to-region flows; and (3) a highly interactive flow mapping component to map both flow and multivariate patterns in the geographic space, at different hierarchical levels. The proposed approach can process relatively large data sets and effectively discover and visualize major flow structures and multivariate relations at the same time. User interactions are supported to facilitate the understanding of both an overview and detailed patterns.  相似文献   

8.
现有的径向布局可视化方法无法有效捕获高维数据的非线性结构.因此,文中提出基于维度扩展和重排的类圆映射可视化聚类方法.利用近邻传播聚类算法和多目标聚类可视化评价指标对高维数据进行维度扩展,然后对扩展后的高维数据进行维度相关性重排,最后利用类圆映射机制降维至二维可视化空间,实现高维数据有效可视化聚类.实验表明,文中提出的维度扩展和重排策略能有效提高类圆映射可视化方法聚类效果,其中的维度扩展策略也能显著提高其它径向布局可视化方法聚类效果,泛化性能较好.此外,相比同类方法,文中方法在可视化聚类准确度、拓扑保持、Dunn指数及效果上优势明显  相似文献   

9.
This paper presents a new method that deals with a supervised learning task usually known as multiple regression. The main distinguishing feature of our technique is the use of a multistrategy approach to this learning task. We use a clustering method to form sub-sets of the training data before the actual regression modeling takes place. This pre-clustering stage creates several training sub-samples containing cases that are nearby to each other from the perspective of the multidimensional input space. Supervised learning within each of these sub-samples is easier and more accurate as our experiments show. We call the resulting method clustered partial linear regression. Predictions using these models are preceded by a cluster membership query for each test case. The cluster membership probability of a test case is used as a weight in an averaging process that calculates the final prediction. This averaging process involves the predictions of the regression models associated to the clusters for which the test case may belong. We have tested this general multistrategy approach using several regression techniques and we have observed significant accuracy gains in several data sets. We have also compared our method to bagging that also uses an averaging process to obtain predictions. This experiment showed that the two methods are significantly different. Finally, we present a comparison of our method with several state-of-the-art regression methods showing its competitiveness.  相似文献   

10.
Mirkin  Boris 《Machine Learning》1999,35(1):25-39
Based on a reinterpretation of the square-error criterion for classical clustering, a separate-and-conquer version of K-Means clustering is presented and a contribution weight is determined for each variable of every cluster. The weight is used to produce conjunctive concepts that describe clusters and to reduce or transform the variable (feature) space.  相似文献   

11.
Parallel coordinate plots (PCPs) are among the most useful techniques for the visualization and exploration of high-dimensional data spaces. They are especially useful for the representation of correlations among the dimensions, which identify relationships and interdependencies between variables. However, within these high-dimensional spaces, PCPs face difficulties in displaying the correlation between combinations of dimensions and generally require additional display space as the number of dimensions increases. In this paper, we present a new technique for high-dimensional data visualization in which a set of low-dimensional PCPs are interactively constructed by sampling user-selected subsets of the high-dimensional data space. In our technique, we first construct a graph visualization of sets of well-correlated dimensions. Users observe this graph and are able to interactively select the dimensions by sampling from its cliques, thereby dynamically specifying the most relevant lower dimensional data to be used for the construction of focused PCPs. Our interactive sampling overcomes the shortcomings of the PCPs by enabling the visualization of the most meaningful dimensions (i.e., the most relevant information) from high-dimensional spaces. We demonstrate the effectiveness of our technique through two case studies, where we show that the proposed interactive low-dimensional space constructions were pivotal for visualizing the high-dimensional data and discovering new patterns.  相似文献   

12.
Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization   总被引:21,自引:1,他引:20  
Web usage mining, possibly used in conjunction with standard approaches to personalization such as collaborative filtering, can help address some of the shortcomings of these techniques, including reliance on subjective user ratings, lack of scalability, and poor performance in the face of high-dimensional and sparse data. However, the discovery of patterns from usage data by itself is not sufficient for performing the personalization tasks. The critical step is the effective derivation of good quality and useful (i.e., actionable) aggregate usage profiles from these patterns. In this paper we present and experimentally evaluate two techniques, based on clustering of user transactions and clustering of pageviews, in order to discover overlapping aggregate profiles that can be effectively used by recommender systems for real-time Web personalization. We evaluate these techniques both in terms of the quality of the individual profiles generated, as well as in the context of providing recommendations as an integrated part of a personalization engine. In particular, our results indicate that using the generated aggregate profiles, we can achieve effective personalization at early stages of users' visits to a site, based only on anonymous clickstream data and without the benefit of explicit input by these users or deeper knowledge about them.  相似文献   

13.
刘芳 《计算机应用研究》2012,29(4):1300-1303
提出了用无监督的自组织映射方法对金融数据进行聚类,并用平行坐标和交互式的圆形平行坐标方法在二维平面上表示出来。用这种方法形成清晰的可视化聚类结果,不仅有效地总结了数据特征,还提高了聚类的可视效果,从而便于发现数据的变化趋势。  相似文献   

14.
钱宇 《软件学报》2008,19(8):1965-1979
可视化技术的发展极大地提高了传统数据挖掘技术的效率.通过结合人类识别模式的能力,计算机程序能够更有效的发现隐藏在数据中的规律和信息.作为聚类分析的重要步骤,噪音消除一直都是困绕数据挖掘研究者的问题,尤其对于不同领域的应用,由于噪音的模型和定义不同,单一的数据处理方法无法有效而准确地去除域相关的噪音.本文针对这一问题,提出了一个新型的可视化噪音处理方法CLEAN.CLEAN的独特之处在于它设计的噪音处理技术和提出的可视化方法有机地结合在一起.噪音处理算法为可视化模型生成所需数据,同时针对噪音处理算法选择可视化方法,从而达到提高整个数据处理系统性能的目的.这样不仅降低了噪音去除过程中主观因素的影响,还可以帮助数据挖掘程序去除领域相关的噪音.同时源数据的质量,算法参数的选择和不同噪音去除算法的精确性都可以在所使用的可视化模型中反映出来.实验表明CLEAN能够有效地帮助空间数据聚类算法在噪音环境下发现数据的自然聚类.  相似文献   

15.
对于揭示困难情况(高维且大类数)下的类间远近关系,现有的聚类结果可视化方法的效果均不理想。提出了以线珠模式和(非)线性坐标表示类间远近关系的可视化方法,其优点是在上述困难情况下也能准确地显示各个聚类之间的远近关系或距离。对模拟和真实数据集的实验结果表明,线珠模式方法十分有效,能准确地显示基于降维可视化方法无法正确显示的类间远近关系。  相似文献   

16.
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.  相似文献   

17.
Visual analytics of multidimensional multivariate data is a challenging task because of the difficulty in understanding metrics in attribute spaces with more than three dimensions. Frequently, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records with similar attribute values. A large number of (typically hierarchical) clustering algorithms have been developed to group individual records to clusters of statistical significance. However, only few visualization techniques exist for further exploring and understanding the clustering results. We propose visualization and interaction methods for analyzing individual clusters as well as cluster distribution within and across levels in the cluster hierarchy. We also provide a clustering method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional multivariate space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. To visually represent the cluster hierarchy, we present a 2D radial layout that supports an intuitive understanding of the distribution structure of the multidimensional multivariate data set. Individual clusters can be explored interactively using parallel coordinates when being selected in the cluster tree. Furthermore, we integrate circular parallel coordinates into the radial hierarchical cluster tree layout, which allows for the analysis of the overall cluster distribution. This visual representation supports the comprehension of the relations between clusters and the original attributes. The combination of the 2D radial layout and the circular parallel coordinates is used to overcome the overplotting problem of parallel coordinates when looking into data sets with many records. We apply an automatic coloring scheme based on the 2D radial layout of the hierarchical cluster tree encoding hue, saturation, and value of the HSV color space. The colors support linking the 2D radial layout to other views such as the standard parallel coordinates or, in case data is obtained from multidimensional spatial data, the distribution in object space.  相似文献   

18.
利用广义细胞自动机实现的智能数据聚类   总被引:2,自引:0,他引:2  
现有的数据聚类方法仍存在着各种不足,聚类速度和结果的质量不能满足大型、高维数据库上的聚类需求。本文提出了一种新的基于广义细胞自动机的数据聚类算法,利用细胞自动机的自组织能力对数据进行聚类分析。聚类结果的质量不受聚类大小和聚类形状的影响,可以通过随机抽样应用于大数据集。文章在细胞结构及细胞动力学规则中引入了细胞核的概念,让细胞自动机利用自身的演化找出数据中的聚类信息。文章通过分析证明了本文方法的有效性,并通过模拟软件对算法性能进行了详细的实验,证明了算法的实用性和高效性。  相似文献   

19.
提出一种基于主分量分析和系统成团法的快速聚类方法.通过构造主分量空间将分散在一组变量上的高维天体光谱投影到两个主分量上,每一个主分量都是原始变量的线性组合,主分量之间互为正交关系,在剔除冗余信息的同时,得到二维坐标;以此为输入,使用系统成团法进行聚类分析研究,实现高维天体光谱的快速自动分类处理.以上述方法为基础设计天体光谱自动分类软件,实现海量光谱的快速、准确分类.  相似文献   

20.
Unlike conventional unsupervised classification methods, such as K‐means and ISODATA, which are based on partitional clustering techniques, the methodology proposed in this work attempts to take advantage of the properties of Kohonen's self‐organizing map (SOM) together with agglomerative hierarchical clustering methods to perform the automatic classification of remotely sensed images. The key point of the proposed method is to execute the cluster analysis process by means of a set of SOM prototypes, instead of working directly with the original patterns of the image. This strategy significantly reduces the complexity of the data analysis, making it possible to use techniques that have not normally been considered viable in the processing of remotely sensed images, such as hierarchical clustering methods and cluster validation indices. Through the use of the SOM, the proposed method maps the original patterns of the image to a two‐dimensional neural grid, attempting to preserve the probability distribution and topology of the input space. Afterwards, an agglomerative hierarchical clustering method with restricted connectivity is applied to the trained neural grid, generating a simplified dendrogram for the image data. Utilizing SOM statistic properties, the method employs modified versions of cluster validation indices to automatically determine the ideal number of clusters for the image. The experimental results show examples of the application of the proposed methodology and compare its performance to the K‐means algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号