Similar literature
 20 similar documents found (search time: 421 ms)
1.
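Spatial clustering is an important method in spatial data mining. Building on an analysis of the shortcomings of the DBSCAN algorithm, this paper proposes an improved spatial clustering algorithm (AISCA). To handle large-scale spatial databases efficiently, the algorithm adopts a new sampling technique. In addition, by introducing the concept of a matching neighborhood, the algorithm considers both spatial and non-spatial attributes during clustering. Tests on two-dimensional spatial data show that the algorithm is feasible and effective.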
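The abstract does not define the matching neighborhood, so the following is only a minimal sketch under a stated assumption: a point belongs to another point's matching neighborhood when it lies within a spatial radius `eps` and its single non-spatial value differs by at most `delta`. The names `matching_neighbors`, `density_cluster`, `eps`, `delta`, and `min_pts` are hypothetical, the sampling step of AISCA is not covered, and the expansion loop is ordinary DBSCAN, not the paper's algorithm.

```python
import math


def matching_neighbors(points, i, eps, delta):
    """Indices of points in the (assumed) matching neighborhood of point i.

    Each point is (x, y, value). A point matches when its spatial distance
    to point i is <= eps AND its non-spatial value differs by <= delta.
    This definition is an assumption, not taken from the paper.
    """
    xi, yi, vi = points[i]
    return [
        j
        for j, (x, y, v) in enumerate(points)
        if math.hypot(x - xi, y - yi) <= eps and abs(v - vi) <= delta
    ]


def density_cluster(points, eps, delta, min_pts):
    """DBSCAN-style expansion over matching neighborhoods.

    Returns one label per point; -1 marks noise.
    """
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = matching_neighbors(points, i, eps, delta)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = matching_neighbors(points, j, eps, delta)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels
```

With this rule, a point that sits spatially inside a dense region but carries a very different non-spatial value is left out of the cluster, which a purely spatial DBSCAN would not do.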

2.
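Sun Zhiwei (孙志伟), Zhao Zheng (赵政). Journal of Computer Applications (《计算机应用》), 2005, 25(6): 1379-1381
Among the many effective clustering algorithms, DBSCAN performs very well on spatial data. Relying on a density-based definition of clusters, DBSCAN can discover clusters of arbitrary shape and runs efficiently. However, DBSCAN does not consider non-spatial attributes, which also play an important role in clustering results. Building on DBSCAN and drawing on the concepts of DBRS, this paper further considers the data types of non-spatial attributes, proposes a new clustering method that handles both spatial and non-spatial data, and presents the main algorithm.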

3.
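An improved density-based sampling clustering algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The density-based clustering algorithm DBSCAN is an effective spatial clustering algorithm that can discover clusters of arbitrary shape and handle noise effectively. However, DBSCAN also has shortcomings: (1) it considers only spatial attributes and ignores non-spatial attributes during clustering; (2) it requires substantial memory and I/O when clustering large-scale spatial databases. Building on an analysis of these shortcomings, this paper proposes an improved density-based spatial clustering algorithm with sampling (IDBSCAS), which handles large-scale spatial databases effectively and considers both spatial and non-spatial attributes. Tests on 2-dimensional spatial data show that the algorithm is feasible and effective.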

4.
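Attribute-space clustering methods consider only the attribute information of objects, while structural clustering methods consider only the relationship information between objects. To address both shortcomings, a clustering algorithm based on a combined attribute-relationship similarity is proposed. After constructing a weighted network based on attribute distance, the algorithm defines how to compute the combined similarity between objects and between clusters, and designs a corresponding bottom-up clustering strategy. Compared with attribute-space clustering and structural clustering methods, the algorithm, by taking both attribute and relationship information into account, has higher...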

5.
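To address the problem that the affinity propagation algorithm is ill-suited to data with multiple scales and arbitrary shapes, the MSAAP (multidimensional similarity adaptive affinity propagation) algorithm, based on a transformable multidimensional space, is proposed. First, the attribute weights of the data sample points are computed with the entropy method. Then, a new way of computing the similarity matrix is constructed from the attribute weights. Finally, according to the priority of the attribute weights, the space of sample points is divided into several spatial blocks, and the sum of the responsibility and availability of each block is computed to adjust the spatial distribution of the sample points. Comparative experiments on 13 UCI datasets of different shapes and 3 face databases, analyzed along three dimensions (accuracy, running time, and number of clusters), show that the proposed MSAAP algorithm achieves better clustering results.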
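The first step, computing attribute weights with the entropy method, can be sketched as follows. This is the common textbook formulation of the entropy weight method (column-wise proportions, normalized Shannon entropy, weights proportional to one minus entropy); the abstract does not say which variant MSAAP uses, so treat it as an assumption. The function name `entropy_weights` is hypothetical, and the input is assumed to be non-negative with at least two samples.

```python
import math


def entropy_weights(rows):
    """Entropy weight method over a samples x attributes table.

    Assumes non-negative values and at least two rows. Attributes whose
    values are spread unevenly across samples (low entropy) get higher weight.
    """
    n = len(rows)
    m = len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        total = sum(col)
        entropy = 0.0
        for x in col:
            if x > 0:  # 0 * log(0) is taken as 0
                p = x / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalize to [0, 1]
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:  # every attribute is uniform: fall back to equal weights
        return [1.0 / m] * m
    return [d / s for d in divergences]
```

For example, with rows `[[1, 1], [1, 3]]` the first attribute is identical across samples and receives a weight near zero, while the second receives nearly all of the weight.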

6.
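Lin Ziqiong (林姿琼). Fujian Computer (《福建电脑》), 2010, 26(2): 115-115, 118
Given the multi-attribute nature of spatial objects, the geographic location attributes and the non-spatial attributes of objects are combined in the similarity measure, making the clustering results more objective.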

7.
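Three-way concept analysis is an important research direction in artificial intelligence. Its greatest advantage is that it can simultaneously study the attributes that objects in a formal context "jointly possess" and "jointly do not possess". The new formal context generated by attribute clustering is strongly related to the original formal context, and the original three-way concepts are likewise closely related to the new three-way concepts obtained after attribute clustering. For this reason, a comparative study and analysis of three-way concepts under attribute clustering is carried out. First, based on attribute clustering, the notions of pessimistic, optimistic, and general attribute clustering are proposed, and the relationships among the three are studied. Then, by comparing the clustering process with the formation of three-way concepts, the differences between the original and new three-way concepts are studied; two minimum constraint indices are proposed, from the object-oriented and attribute-oriented perspectives respectively; and the influence of attribute clustering on the three-way concept lattice is explored. This further enriches the theory of three-way concept analysis and offers a feasible approach for visual data processing.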

8.
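A vertical and horizontal maintenance algorithm for concept lattices based on attribute linked lists   Total citations: 5, self-citations: 0, citations by others: 5
Concept lattice maintenance keeps the properties of an already-built concept lattice intact when objects are inserted, deleted, or modified, or when attributes are removed. This paper proposes a vertical and horizontal maintenance algorithm for concept lattices based on attribute linked lists, analyzes the algorithm, and shows that it achieves high time efficiency.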

9.
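DBSCAN is a density-based clustering algorithm. It partitions regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases with "noise". However, DBSCAN does not consider non-spatial attributes, and it must scan the ε-neighborhood of every point in the spatial database to find clusters, which limits its applicability. This paper proposes a DBSCAN-based algorithm that can handle non-spatial attributes while also speeding up clustering.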

10.
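Research on an improved DBSCAN-based spatial clustering algorithm   Total citations: 2, self-citations: 0, citations by others: 2
DBSCAN is a density-based clustering algorithm. It partitions regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases with "noise". However, DBSCAN does not consider non-spatial attributes, and it must scan the ε-neighborhood of every point in the spatial database to find clusters, which limits its applicability. This paper proposes a DBSCAN-based algorithm that can handle non-spatial attributes while also speeding up clustering.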

11.
Clustering sensor data discovers useful information hidden in sensor networks. In sensor networks, a sensor has two types of attributes: a geographic attribute (i.e., its spatial location) and non-geographic attributes (e.g., sensed readings). Sensor data are periodically collected and viewed as spatial data streams, where a spatial data stream consists of a sequence of data points exhibiting attributes in both the geographic and non-geographic domains. Previous studies have developed a dual clustering problem for spatial data by considering similarity-connected relationships in both geographic and non-geographic domains. However, the clustering processes in stream environments are time-sensitive because of frequently updated sensor data. For sensor data, the readings from one sensor stay similar for a period, a property referred to as temporal locality. Using the temporal locality features of the sensor data, this study proposes an incremental clustering (IC) algorithm to discover clusters efficiently. The IC algorithm comprises two phases: cluster prediction and cluster refinement. The first phase estimates the probability of two sensors belonging to a cluster from the previous clustering results. According to the estimation, a coarse clustering result is derived. The cluster refinement phase then refines the coarse result. This study evaluates the performance of the IC algorithm using synthetic and real datasets. Experimental results show that the IC algorithm outperforms existing approaches, confirming the scalability of the IC algorithm. In addition, the effect of temporal locality features on the IC algorithm is analyzed and thoroughly examined in the experiments.

12.
In this paper, a new approach for centralised and distributed learning from spatial heterogeneous databases is proposed. The centralised algorithm consists of a spatial clustering followed by local regression aimed at learning relationships between driving attributes and the target variable inside each region identified through clustering. For distributed learning, similar regions in multiple databases are first discovered by applying a spatial clustering algorithm independently on all sites, and then identifying corresponding clusters on participating sites. Local regression models are built on identified clusters and transferred among the sites for combining the models responsible for identified regions. Extensive experiments on spatial data sets with missing and irrelevant attributes, and with different levels of noise, resulted in a higher prediction accuracy of both centralised and distributed methods, as compared to using global models. In addition, experiments performed indicate that both methods are computationally more efficient than the global approach, due to the smaller data sets used for learning. Furthermore, the accuracy of the distributed method was comparable to the centralised approach, thus providing a viable alternative to moving all data to a central location.

13.
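Objective: Parallel coordinates are a classic method for visualizing multidimensional data, but when used to analyze geospatial multidimensional data they often suffer from missing spatial location information and uncertain spatial association analysis. To address this, this paper designs a visual analysis method for geospatial multidimensional data that effectively links parallel coordinates with a map. Method: Geographic locations are clustered according to their multidimensional attribute information; a Voronoi diagram and a color-lightness mapping are introduced to mark the regions of each class clearly; parallel coordinates present the geospatial multidimensional attribute information; mutual information is introduced to measure the correlation between the geospatial clusters and the attribute categories, dynamically determining the order of the parallel coordinate axes; the binding positions of the data lines between the attribute axes and the map are then computed, and the layout of the data lines is optimized to reduce clutter between the map and the parallel coordinate system. Results: The above visualization designs and data analysis methods are integrated to design and implement a geospatial multidimensional data visual analysis system based on dynamic ordering of parallel coordinate axes, with convenient user interaction. Tests on 2 datasets with pronounced geospatial multidimensional attribute characteristics verify the effectiveness and practicality of the method. Conclusion: The proposed visual analysis method and tool help users quickly analyze the spatial distribution characteristics of geospatial multidimensional attributes and their association patterns, providing an effective means for exploring geospatial multidimensional data.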
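The mutual-information step can be illustrated with a short sketch. It computes the mutual information between cluster labels and each attribute's category values, then orders the axes by that score. The descending ordering, the function names `mutual_information` and `order_axes`, and the assumption that attributes are already discretized into categories are all illustrative choices; the abstract does not state the paper's exact ordering rule.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length label sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def order_axes(cluster_labels, attributes):
    """Attribute names sorted by mutual information with the cluster labels.

    `attributes` maps an attribute name to its per-object category values.
    Placing the most informative axis first is an assumed ordering rule.
    """
    return sorted(
        attributes,
        key=lambda name: mutual_information(cluster_labels, attributes[name]),
        reverse=True,
    )
```

An attribute whose categories coincide with the clusters scores highest, while an attribute that is independent of the clusters scores zero and ends up last.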

14.
Spatial data objects that possess attributes in the optimization domain and the geographic domain are now widely available. For example, sensor data are one kind of spatial data object. The location of a sensor is an attribute in the geographic domain, while its reading is an attribute in the optimization domain. Previous studies discuss dual clustering problems that attempt to partition spatial data objects into several groups, such that objects in the same group have similar values in their optimization attributes and form a compact region in the geographic domain. However, previous studies do not clearly define compact regions. Therefore, this paper formulates a connective dual clustering problem with an explicit connected constraint given. Objects with a geographic distance smaller than or equal to the connected constraint are connected. The goal of the connective dual clustering problem is to derive clusters that contain objects with similar values in the optimization domain and are connected in the geographic domain. This study further proposes an algorithm, CLS (Clustering with Local Search), to derive clusters efficiently. This algorithm consists of two phases: the ConGraph (standing for Connective Graph) transformation phase and the clustering phase. In the ConGraph transformation phase, CLS first transforms the data objects into a ConGraph that captures geographic constraints among data objects and selects initial seeds for clustering. Then, in the clustering phase, the initial seeds select nearby data objects and form coarse clusters through local search. The coarse clusters are then merged and fine-tuned. Experiments show that the CLS algorithm is more efficient and scalable than existing methods.
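The connectivity rule stated in the abstract (two objects are connected when their geographic distance is at most the connected constraint) can be sketched as a graph construction plus a connectivity check. This covers only that rule; seed selection, local search, and the merge step of CLS are not described in enough detail to reproduce. The names `build_connection_graph`, `is_connected`, and `max_dist` are hypothetical, and the quadratic pairwise loop is a simplification.

```python
import math
from collections import deque


def build_connection_graph(locations, max_dist):
    """Adjacency sets linking objects whose geographic distance <= max_dist."""
    n = len(locations)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(locations[i], locations[j]) <= max_dist:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj, members):
    """True if `members` form one connected group using only edges among them."""
    members = set(members)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & members:  # ignore edges that leave the group
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members
```

A candidate cluster would satisfy the geographic side of the problem only if `is_connected` holds for its members; similarity in the optimization domain has to be checked separately.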

15.
Field geological observations have both spatial and non-spatial aspects and recording them directly on a personal computer using a digital mapping tool has become a practical and effective alternative to traditional methods of field data collection and mapping. This paper presents the design of a cost-effective, stand-alone digital field-mapping tool named GRDM that caters to special requirements of field-based studies concerned with spatial disposition of the statistics of field measurements. Such studies require recording multiple observations for individual attributes at each field location to capture the inter-site variability and automatic computation of their statistics. Field observations include directional data that are circular in nature. Therefore, computation of their exclusive statistics within the field system is also necessary. To meet these requirements, GRDM was designed for field personnel lacking expertise in customizing a GIS. Its design automatically accommodates a list of values for each non-spatial attribute attached to individual location points and generates statistics from the lists. The system treats the orientation values as a distinct numeric data type and computes circular statistics for them. It makes both the original data as well as their statistics simultaneously available for extraction of thematic information.
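Why orientation values need their own statistics is easiest to see in a small example. The sketch below computes the standard circular mean, mean resultant length, and circular variance for angles in degrees. It is not GRDM's implementation, and the function name `circular_stats` is hypothetical. It treats values as directions over the full 0 to 360 range; axial data (orientations without a sense, such as strike lines) would typically need the angles doubled first.

```python
import math


def circular_stats(angles_deg):
    """Circular mean (degrees in [0, 360)), mean resultant length, variance.

    Angles are treated as unit vectors; the mean direction is the direction
    of their average vector.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    resultant = math.hypot(c, s)  # 1 = all angles equal, 0 = fully dispersed
    return mean, resultant, 1.0 - resultant
```

For the two readings 350 and 10, the arithmetic mean is 180, which points the wrong way, while the circular mean lies at north (0, up to floating-point rounding).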

16.
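A clustering algorithm for uncertain data in obstacle space   Total citations: 2, self-citations: 0, citations by others: 2
In recent years, imprecise data collection and the inherent uncertainty of data have made uncertainty widespread in location data. Clustering uncertain data in obstacle space poses new challenges. This paper proposes the OBS-UK-means (obstacle uncertain K-means) algorithm for clustering uncertain data in obstacle space, along with two pruning strategies, based on the R-tree and the Voronoi diagram respectively, and the concept of a nearest-distance region, which greatly reduce the amount of computation. Experiments verify the efficiency and accuracy of OBS-UK-means and show that the pruning strategies effectively improve clustering efficiency without harming clustering validity.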

17.
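Research on local outlier mining algorithms   Total citations: 14, self-citations: 0, citations by others: 14
Outliers can be divided into global outliers and local outliers. In many cases, mining local outliers is more meaningful than mining global outliers. Existing outlier mining algorithms based on local outlier degree have limitations such as detection accuracy that depends on user-supplied parameters and high computational complexity. To overcome these limitations, this paper proposes dividing object attributes into intrinsic attributes and environmental attributes, using the environmental attributes to determine an object's neighborhood and the intrinsic attributes to compute its outlier degree. Taking spatial data as an example, spatial and non-spatial attributes are separated: spatial attributes determine the spatial neighborhood, non-spatial attributes are used to compute the spatial outlier degree, and a spatial outlier mining algorithm is designed accordingly. Experimental results show that the proposed algorithm depends little on the user and offers high detection accuracy, good scalability, and high computational efficiency.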
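The split between the two attribute kinds can be sketched as follows: spatial coordinates alone pick the neighborhood, and the non-spatial value alone yields the score. The abstract does not give the paper's outlier-degree formula, so this sketch assumes a simple stand-in: the absolute difference between a point's value and the mean value of its `k` nearest spatial neighbors. The name `spatial_outlier_scores` and the parameter `k` are hypothetical, and at least two points are assumed.

```python
import math


def spatial_outlier_scores(points, k):
    """Score each (x, y, value) point against its k nearest spatial neighbors.

    Neighborhoods use only (x, y); scores use only `value`. The score formula
    is an assumed stand-in, not the paper's outlier degree.
    """
    scores = []
    for i, (xi, yi, vi) in enumerate(points):
        others = [
            (math.hypot(x - xi, y - yi), v)
            for j, (x, y, v) in enumerate(points)
            if j != i
        ]
        others.sort(key=lambda t: t[0])  # nearest first
        nbr_vals = [v for _, v in others[:k]]
        scores.append(abs(vi - sum(nbr_vals) / len(nbr_vals)))
    return scores
```

In the sample data used to check this sketch, a point with value 50 sitting among neighbors valued around 10 gets the highest score, even though values near 100 exist elsewhere in the data set. That is the sense in which the outlier is local and not global.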

18.
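Zou Zhiwen (邹志文), Qin Cheng (秦程). Journal of Computer Applications (《计算机应用》), 2021, 41(3): 733-737
Existing R-tree spatial clustering techniques usually select cluster centers either by random assignment or by computing the Euclidean distance between spatial data items, without considering the topic relevance between spatial data items. As a result, the clustering result is affected by the initial value of k, and the association between spatial data items is based only on geographic location. To address this, a method for dynamically constructing a spatial topic R-tree (TR-tree) based on k-means++ is proposed. First, on top of the traditional k-means++ algorithm, k clusters are determined dynamically through a clustering measure function, and a latent Dirichlet allocation (LDA) model is introduced into the measure function to compute the topic probability of each spatial data text, strengthening the topic association between spatial data items. Second, the cluster centers with the highest topic probability are selected. Finally, the TR-tree is constructed, with spatial data allocated dynamically during construction. Experimental results show that, although R-tree construction time increases slightly, the method markedly improves indexing efficiency and inter-node relevance over algorithms that build the R-tree by clustering on geographic location alone.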

19.
A spatial query interface has been designed and implemented in the object-oriented paradigm for heterogeneous data sets. The object-oriented approach presented is shown to be highly suitable for querying typical multiple heterogeneous sources of spatial data. The spatial query model takes into consideration two common components of spatial data: spatial location and attributes. Spatial location allows users to specify an area or a region of interest, also known as a spatial range query. Also, the spatial query allows users to query spatial orientation and relationships (geometric and topological relationships) among other spatial data within the selected area or region. Queries on the properties and values of attributes provide more detailed non-spatial characteristics of spatial data. A query model specific to spatial data involves exploitation of both spatial and attribute components. This paper presents a conceptual spatial query model of heterogeneous data sets based on the object-oriented data model used in the geospatial information distribution system (GIDS).

20.
Spatial clustering has attracted a lot of research attention due to its various applications. In most conventional clustering problems, the similarity measurement mainly takes the geometric attributes into consideration. However, in many real applications, the nongeometric attributes are what users are concerned about. In conventional spatial clustering, the input data set is partitioned into several compact regions, and data points which are similar to one another in their nongeometric attributes may be scattered over different regions, thus making the corresponding objective difficult to achieve. To remedy this, we propose and explore in this paper a new clustering problem on two domains, called dual clustering, where one domain refers to the optimization domain and the other refers to the constraint domain. Attributes on the optimization domain are those involved in the optimization of the objective function, while those on the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification, abbreviated as ICC, to solve this problem. The proposed ICC algorithm combines the information in both domains and iteratively performs a clustering algorithm on the optimization domain and also a classification algorithm on the constraint domain to reach the target clustering effectively. The time and space complexities of the ICC algorithm are formally analyzed. Several experiments are conducted to provide insights into the dual clustering problem and the proposed algorithm.
