Similar Literature
20 similar documents found
1.
A density-based spatial clustering algorithm for handling obstacle constraints   (Cited: 1; self-citations: 0; other citations: 1)
杨杨  孙志伟  赵政 《计算机应用》2007,27(7):1688-1691
Building on the existing obstacle-constrained spatial clustering algorithms COD_CLARANS, DBCLuC, AUTOCLUST+ and DBRS+, a new density-based spatial clustering algorithm, DBCOD (density-based clustering with obstacle distance), is proposed. Extending DBCLuC, the algorithm replaces Euclidean distance with obstacle distance as the dissimilarity measure, and merges and simplifies obstacle polygons during preprocessing to make obstacle handling more efficient. Simulation results show that DBCOD retains the advantages of density-based clustering while producing results that are more reasonable and closer to reality than those of traditional obstacle-constrained density-based clustering algorithms.
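To make the core idea concrete, below is a minimal sketch of density-based clustering driven by an obstacle-aware distance. It is not the published DBCOD algorithm: obstacles are modeled as plain line segments, and a pair of points whose connecting segment crosses an obstacle is simply treated as unreachable, whereas DBCOD uses the length of the shortest obstacle-avoiding path and a polygon-simplification preprocessing step. All function names and the toy data are illustrative assumptions.

```python
import math

def _ccw(a, b, c):
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    # Standard orientation test for proper segment intersection.
    return (_ccw(p1, q1, q2) != _ccw(p2, q1, q2) and
            _ccw(p1, p2, q1) != _ccw(p1, p2, q2))

def obstacle_distance(p, q, obstacles):
    # Euclidean distance if the straight line p-q is unobstructed,
    # otherwise "infinite" (a simplification of the true obstacle distance).
    for o1, o2 in obstacles:
        if segments_intersect(p, q, o1, o2):
            return math.inf
    return math.dist(p, q)

def dbscan_with_obstacles(points, obstacles, eps, min_pts):
    # Classic DBSCAN expansion, but neighborhoods use the obstacle-aware
    # distance instead of plain Euclidean distance.
    labels = [None] * len(points)          # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j in range(len(points))
                if obstacle_distance(points[i], points[j], obstacles) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise (may become a border point)
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
        cluster += 1
    return labels

# Example: two groups of points separated by a vertical "river" obstacle.
pts = [(0, 0), (0, 1), (1, 0), (4, 0), (4, 1), (5, 0)]
river = [((2.5, -5.0), (2.5, 5.0))]
print(dbscan_with_obstacles(pts, river, eps=1.5, min_pts=2))   # [0, 0, 0, 1, 1, 1]
```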

2.
Clustering is one of the main methods of spatial data mining. Addressing obstacle constraints during clustering, this paper proposes an obstacle-constrained spatial clustering algorithm, DBCluOC (Density-Based Clustering with Obstacles Constraints). The paper first reviews the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) together with the concepts of direct density-reachability, density-connectivity and cluster. It then describes how the obstacle constraints are modeled and how obstacle polygons are simplified, and on this basis presents the DBCluOC algorithm. Finally, the algorithm is analyzed and its execution is described. Although the algorithm is implemented in two-dimensional space, with other factors taken into account it can be generalized to higher-dimensional…

3.
Traditional spatial clustering algorithms solve spatial clustering problems without obstacle constraints, yet real geographic space often contains obstructions such as rivers and mountains, so these algorithms are ill-suited to real spaces with obstacle-constrained data. After clarifying the concepts and definitions related to obstacle-constrained spatial clustering, this paper surveys obstacle-constrained spatial clustering algorithms, traces their research history and lineage, classifies them along seven dimensions into four major categories, analyzes the technical strengths and weaknesses of each category, and finally outlines future research directions for obstacle-constrained spatial clustering.

4.
Genetic K-medoids spatial clustering analysis with obstacle constraints   (Cited: 1; self-citations: 0; other citations: 1)
Spatial clustering analysis is an important research topic in spatial data mining. Traditional clustering algorithms ignore the many constraints present in the real world, yet such constraints affect how reasonable the clustering results are. This paper discusses obstacle-constrained spatial clustering, studies a clustering approach that combines genetic algorithms with partitioning, and designs a genetic K-medoids spatial clustering algorithm with obstacle constraints. Comparative experiments show that the method balances local and global convergence and accounts for the influence of real-world obstacles, making the results more meaningful; they are better than those of traditional K-medoids clustering or pure genetic clustering, its drawback being a relatively slow computation speed.
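As a hedged illustration of combining a genetic search with partitioning-based clustering, the sketch below evolves candidate sets of K medoids and scores them by the total within-cluster distance. It is not the paper's algorithm: the distance is plain Euclidean (an obstacle-aware distance such as the one sketched above could be substituted), and the GA operators are deliberately simple; all names, parameters and the toy data are assumptions.

```python
import math
import random

def total_cost(points, medoids, dist):
    # Sum over all points of the distance to the nearest medoid.
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def genetic_k_medoids(points, k, dist=math.dist, pop_size=30,
                      generations=100, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(points)
    # Each individual is a list of k distinct medoid indices.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]

    def fitness(ind):
        return -total_cost(points, ind, dist)          # higher is better

    for _ in range(generations):
        # Tournament selection, then crossover and mutation.
        parents = [max(rng.sample(population, 3), key=fitness)
                   for _ in range(pop_size)]
        children = []
        for i in range(pop_size):
            a, b = parents[i], parents[(i + 1) % pop_size]
            # Crossover: draw k distinct medoids from the parents' union.
            child = rng.sample(sorted(set(a) | set(b)), k)
            # Mutation: occasionally swap one medoid for a random non-medoid.
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(
                    [j for j in range(n) if j not in child])
            children.append(child)
        # Elitism: carry the best individual over to the next generation.
        children[0] = max(population, key=fitness)
        population = children

    best = max(population, key=fitness)
    labels = [min(range(k), key=lambda c: dist(p, points[best[c]]))
              for p in points]
    return [points[m] for m in best], labels

# Toy usage: three loose groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1)]
medoids, labels = genetic_k_medoids(pts, k=3)
print(medoids, labels)
```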

5.
The presence of spatial entities has an important influence on spatial clustering results. Traditional spatial clustering algorithms usually ignore the constraining effect of such entities, so the validity of their results is hard to guarantee. Targeting both obstacle constraints and facilitator constraints, this paper proposes an improved density-based clustering algorithm built on spatial topological adjacency, CD-DBSCAN. The algorithm makes full use of the topological adjacency between spatial objects, accounting both for the blocking effect of spatial obstacles and for the connecting effect of spatial facilitators. Experiments on the clustering results show that the algorithm can effectively discover the aggregation structure of data sets under such constraints.

6.
A clustering algorithm for handling obstacle constraints   (Cited: 1; self-citations: 0; other citations: 1)
Based on the characteristics of the obstacle-constrained spatial clustering problem and on graph theory, a staged graph-based clustering algorithm is proposed. First, a minimum-spanning-tree clustering algorithm groups the spatial objects without considering obstacles; then the obstacles are introduced to split the resulting clusters; finally, clusters separated by a small obstacle distance are merged to form the final result. Experiments verify the effectiveness of the algorithm, which also requires few input parameters and has low time complexity.
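The three stages described above can be illustrated with the following hedged sketch: build a minimum spanning tree over the points, cut edges that are long or cross an obstacle to obtain initial clusters, then re-merge clusters whose closest unobstructed pair of points is near enough. Obstacles are simplified to line segments, and the thresholds `cut_dist` and `merge_dist` are illustrative stand-ins for the paper's actual cutting and merging criteria.

```python
import math

def _ccw(a, b, c):
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def crosses(p, q, obstacles):
    # True if segment p-q properly intersects any obstacle segment.
    return any(_ccw(p, o1, o2) != _ccw(q, o1, o2) and
               _ccw(p, q, o1) != _ccw(p, q, o2)
               for o1, o2 in obstacles)

def mst_edges(points):
    # Prim's algorithm; returns the n-1 MST edges as (length, i, j).
    edges = []
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, len(points))}
    while best:
        j = min(best, key=lambda m: best[m][0])
        d, i = best.pop(j)
        edges.append((d, i, j))
        for m in best:
            d2 = math.dist(points[j], points[m])
            if d2 < best[m][0]:
                best[m] = (d2, j)
    return edges

def components(n, edges):
    # Union-find over the kept edges; returns a cluster label per point.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in edges:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def staged_obstacle_clustering(points, obstacles, cut_dist, merge_dist):
    # Stages 1-2: keep MST edges that are short and do not cross an obstacle.
    kept = [(d, i, j) for d, i, j in mst_edges(points)
            if d <= cut_dist and not crosses(points[i], points[j], obstacles)]
    labels = components(len(points), kept)
    # Stage 3: merge clusters whose closest unobstructed point pair is near.
    merge = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if (labels[i] != labels[j] and d <= merge_dist
                    and not crosses(points[i], points[j], obstacles)):
                merge.append((d, i, j))
    return components(len(points), kept + merge)

# Example: the wall at x = 4 keeps the rightmost point separate even though
# it is geometrically close to the middle group.
pts = [(0, 0), (0, 1), (1, 0), (3, 0), (3, 1), (4.5, 0)]
wall = [((4, -5), (4, 5))]
print(staged_obstacle_clustering(pts, wall, cut_dist=1.6, merge_dist=2.5))  # [0, 0, 0, 0, 0, 1]
```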

7.
Particle swarm K-Medoids spatial clustering analysis with obstacle constraints   (Cited: 1; self-citations: 0; other citations: 1)
Spatial clustering analysis is an important research topic in spatial data mining. Traditional clustering algorithms ignore the many constraints present in the real world, yet such constraints affect how reasonable the clustering results are. Building on an analysis of particle swarm optimization and partitioning algorithms, this paper studies an obstacle-constrained spatial clustering approach that combines the two and designs a particle swarm K-Medoids spatial clustering algorithm with obstacle constraints. Comparative experiments show that the method balances local and global convergence while fully accounting for the influence of real-world obstacles, making the results more meaningful. Compared with genetic K-Medoids obstacle-constrained spatial clustering, the method scales better and requires fewer input parameters, which makes it better suited to dynamically constrained settings that demand fast clustering.

8.
Building on a study and comparison of existing constraint-aware spatial clustering algorithms such as DBCluC and DBRS+, a new density-based spatial clustering algorithm for handling physical constraints, DBCluC+, is proposed. Extending DBCluC, the algorithm models facilitator (crossing) objects with a network topology and adds a width attribute to their access points, so that a constrained distance replaces the simple Euclidean distance or obstacle distance as the dissimilarity measure. Theoretical analysis and experimental results show that DBCluC+ not only retains the advantages of density-based clustering, but also, compared with traditional clustering algorithms that handle facilitator constraints, produces clustering results that are more…

9.
Obstacle-constrained spatial clustering analysis based on particle swarm optimization   (Cited: 1; self-citations: 0; other citations: 1)
Clustering analysis is one of the main methods of spatial data mining. Traditional clustering algorithms ignore the many constraints present in the real world, yet such constraints affect how reasonable the clustering results are. Noting that K-medoids clustering easily falls into local minima and is sensitive to initial values, this paper proposes a new clustering method: obstacle-constrained spatial clustering based on particle swarm optimization. Experimental results show that the method not only makes the clustering results more meaningful, but is also clearly better than K-medoids clustering in global search ability, with fast convergence.

10.
Discriminative semi-supervised clustering analysis with pairwise constraints   (Cited: 9; self-citations: 1; other citations: 9)
尹学松  胡恩良  陈松灿 《软件学报》2008,19(11):2791-2802
Existing representative semi-supervised clustering methods have difficulty handling violations of pairwise constraints effectively, and at the same time cannot cope with high-dimensional data. This paper proposes a discriminative semi-supervised clustering method based on pairwise constraints that addresses both problems at once. The method uses the supervision information to integrate dimensionality reduction with clustering: the data are clustered in a projected space with a pairwise-constrained K-means algorithm, and the clustering result is then used to choose the projection space. The algorithm also lowers the computational complexity of constraint-based semi-supervised clustering and resolves pairwise-constraint violations during clustering. Experiments on a collection of real data sets show that, compared with existing semi-supervised clustering algorithms, the new method can handle high-dimensional data and effectively improves clustering performance.
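The pairwise-constrained K-means step can be illustrated with a hedged sketch in the spirit of COP-KMeans: each point is assigned to the nearest cluster that does not violate a must-link or cannot-link constraint with an already-assigned point. The paper's projection/dimension-reduction step is omitted, and the constraint handling shown here (skip infeasible clusters, fall back to -1) is a simplification of how the paper resolves constraint violations; all names and the toy data are assumptions.

```python
import math
import random

def constrained_kmeans(points, k, must_link, cannot_link, iters=50, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    def violates(i, c):
        # Assigning point i to cluster c must not break any constraint
        # involving a point that has already been assigned this round.
        for a, b in must_link:
            j = b if a == i else a if b == i else None
            if j is not None and labels[j] is not None and labels[j] != c:
                return True
        for a, b in cannot_link:
            j = b if a == i else a if b == i else None
            if j is not None and labels[j] == c:
                return True
        return False

    for _ in range(iters):
        labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest, skipping infeasible ones;
            # give up (label -1) if no feasible cluster exists.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            labels[i] = next((c for c in order if not violates(i, c)), -1)
        # Recompute each center as the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

# Toy usage: a cannot-link forces two nearby points into different clusters.
pts = [(0, 0), (0.2, 0), (5, 5), (5.2, 5)]
labels, _ = constrained_kmeans(pts, k=2, must_link=[(0, 1)], cannot_link=[(2, 3)])
print(labels)
```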

11.
Determining the number of clusters is a basic yet challenging problem in clustering analysis. On the one hand, the optimal number of clusters may differ under different evaluation criteria, user preferences or requirements, so presenting clustering results for several candidate numbers of clusters to the user is meaningful. On the other hand, increasing the number of clusters makes each cluster more compact but weakens the separation between clusters, so choosing a suitable number of clusters is a trade-off between minimizing the number of clusters and maximizing within-cluster compactness or between-cluster separation…

12.
Chameleon: hierarchical clustering using dynamic modeling   (Cited: 8; self-citations: 0; other citations: 8)
Clustering is a discovery process in data mining. It groups a set of data in a way that maximizes the similarity within clusters and minimizes the similarity between two different clusters. Many advanced algorithms have difficulty dealing with highly variable clusters that do not follow a preconceived model. By basing its selections on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for these highly variable clusters. Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters. Another set of schemes (the ROCK algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters. By considering either interconnectivity or closeness only, these algorithms can select and merge the wrong pair of clusters. Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters. Chameleon finds the clusters in the data set by using a two-phase algorithm. During the first phase, Chameleon uses a graph partitioning algorithm to cluster the data items into several relatively small subclusters. During the second phase, it uses an algorithm to find the genuine clusters by repeatedly combining these subclusters.
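For reference, a commonly cited form of Chameleon's merge criterion (from the original paper by Karypis, Han, and Kumar) combines relative interconnectivity RI and relative closeness RC computed on a k-nearest-neighbor graph of the data; the exact normalization below should be checked against the paper.

```latex
RI(C_i, C_j) = \frac{\lvert EC_{\{C_i,C_j\}} \rvert}
                    {\tfrac{1}{2}\bigl(\lvert EC_{C_i} \rvert + \lvert EC_{C_j} \rvert\bigr)},
\qquad
RC(C_i, C_j) = \frac{\bar{S}_{EC_{\{C_i,C_j\}}}}
                    {\tfrac{\lvert C_i \rvert}{\lvert C_i \rvert + \lvert C_j \rvert}\,\bar{S}_{EC_{C_i}}
                     + \tfrac{\lvert C_j \rvert}{\lvert C_i \rvert + \lvert C_j \rvert}\,\bar{S}_{EC_{C_j}}}
```

Here |EC| denotes the total weight of the edges connecting the two clusters (or internally bisecting a single cluster), and S-bar the corresponding average edge weight; at each step the pair maximizing RI(C_i, C_j) * RC(C_i, C_j)^alpha is merged.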

13.
Many practical problems require not only the cluster labels produced by a clustering algorithm but also the ability to tell how near or far clusters are from one another. When there are many clusters and the data are high-dimensional, visualization methods based on dimensionality reduction often make clusters overlap, interleave, or appear artificially far apart, so that some inter-cluster relations cannot be read or are displayed incorrectly, while existing inter-cluster distance measures cannot reveal whether two clusters are far apart or close together. This paper proposes a dual-geometric-body model to describe the relation between two clusters and designs measures of inter-cluster nearness such as relative boundary distance, absolute boundary distance, and boundary-region density. The method considers both the absolute distance between the nearest sample sets of two clusters and the density of the cluster boundary regions; its advantage is that it can accurately reveal inter-cluster relations in high-dimensional space even in the difficult cases above. Experiments on real data sets show that the dual-geometric-body model can effectively identify inter-cluster relations that existing cluster visualization methods cannot distinguish.

14.
Clustering is a popular technique for analyzing microarray data sets, with n genes and m experimental conditions. As explored by biologists, there is a real need to identify coregulated gene clusters, which include both positively and negatively regulated gene clusters. The existing pattern-based and tendency-based clustering approaches cannot be applied directly to find such coregulated gene clusters, because they are designed for finding positively regulated gene clusters. In this paper, in order to cluster coregulated genes, we propose a coding scheme that allows us to cluster two genes into the same cluster if they have the same code, where two genes that have the same code can be either positively or negatively regulated. Based on the coding scheme, we propose a new algorithm for finding maximal subspace coregulated gene clusters with new pruning techniques. A maximal subspace coregulated gene cluster clusters a set of genes on a condition sequence such that the cluster is not included in any other subspace coregulated gene clusters. We conduct extensive experimental studies. Our approach can effectively and efficiently find maximal subspace coregulated gene clusters. In addition, our approach outperforms the existing approaches for finding positively regulated gene clusters.
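A hedged illustration of such a coding scheme is sketched below: each gene is coded by the sign pattern of its expression changes across the condition sequence, and a pattern and its exact mirror share one code, so positively and negatively co-regulated genes land in the same cluster. This is a plausible stand-in, not necessarily the exact coding used in the paper; all names and the toy data are assumptions.

```python
from collections import defaultdict

def tendency_code(expression):
    # Sign of the change between consecutive conditions: +1, -1 or 0.
    signs = tuple((v2 > v1) - (v2 < v1)
                  for v1, v2 in zip(expression, expression[1:]))
    negated = tuple(-s for s in signs)
    # Canonical form: a pattern and its mirror map to the same code, so
    # positively and negatively regulated genes receive the same code.
    return min(signs, negated)

def cluster_coregulated(genes):
    # genes: dict name -> expression vector over the same condition sequence.
    clusters = defaultdict(list)
    for name, expr in genes.items():
        clusters[tendency_code(expr)].append(name)
    return list(clusters.values())

genes = {
    "g1": [1, 3, 2, 5],    # up, down, up
    "g2": [2, 6, 4, 9],    # same tendency as g1  -> positive co-regulation
    "g3": [5, 1, 4, 0],    # opposite tendency    -> negative co-regulation
    "g4": [1, 1, 1, 1],    # flat, different code
}
print(cluster_coregulated(genes))   # [['g1', 'g2', 'g3'], ['g4']]
```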

15.
Robust projected clustering   (Cited: 4; self-citations: 2; other citations: 2)
Projected clustering partitions a data set into several disjoint clusters, plus outliers, so that each cluster exists in a subspace. Subspace clustering enumerates clusters of objects in all subspaces of a data set, and it tends to produce many overlapping clusters. Such algorithms have been extensively studied for numerical data, but only a few have been proposed for categorical data. Typical drawbacks of existing projected and subspace clustering algorithms for numerical or categorical data are that they rely on parameters whose appropriate values are difficult to set or that they are unable to identify projected clusters with few relevant attributes. We present P3C, a robust algorithm for projected clustering that can effectively discover projected clusters in the data while minimizing the number of required parameters. P3C does not need the number of projected clusters as input, and can discover, under very general conditions, the true number of projected clusters. P3C is effective in detecting very low-dimensional projected clusters embedded in high-dimensional spaces. P3C positions itself between projected and subspace clustering in that it can compute both disjoint and overlapping clusters. P3C is the first projected clustering algorithm for both numerical and categorical data.

16.
A major challenge in subspace clustering is that it may generate an explosive number of clusters with high computational complexity, which severely restricts its usage. The problem gets even worse as the data's dimensionality increases. In this paper, we propose to summarize the set of subspace clusters into k representative clusters to alleviate the problem. Typically, subspace clusters can be clustered further into k groups, and the set of representative clusters can be selected from each group. In this way, only the most representative subspace clusters are returned to the user. Unfortunately, when the size of the set of representative clusters is specified, the problem of finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The greatest advantage of our methods is that we only need a subset of subspace clusters as the input instead of the complete set of subspace clusters. Precisely, only the clusters in low-dimensional subspaces are computed and assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study shows both the effectiveness and efficiency of these methods.

17.
This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters’ members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.

18.
Cluster analysis is a useful method which reveals underlying structures and relations of items after grouping them into clusters. In the case of temporal data, clusters are defined over time intervals where they usually exhibit structural changes. Conventional cluster analysis does not provide sufficient methods to analyze these structural changes, which are, however, crucial in the interpretation and evaluation of temporal clusters. In this paper, we present two novel and interactive visualization techniques that enable users to explore and interpret the structural changes of temporal clusters. We introduce the temporal cluster view, which visualizes the structural quality of a number of temporal clusters, and temporal signatures, which represents the structure of clusters over time. We discuss how these views are utilized to understand the temporal evolution of clusters. We evaluate the proposed techniques in the cluster analysis of mixed lipid bilayers.

19.
20.
In this paper, we present an agglomerative fuzzy k-means clustering algorithm for numerical data, an extension of the standard fuzzy k-means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, a well-known problem in k-means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5000 objects and 3 to 7 clusters), the BIRCH two-dimensional data set of 20000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions and 3 clusters from UCI, have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
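To illustrate the general idea of penalizing the fuzzy k-means objective to reduce sensitivity to the initial centers, here is a hedged sketch that uses an entropy regularizer on the memberships. It is not the paper's agglomerative algorithm (which uses its own penalty and a cluster-merging mechanism); the parameter `lam`, the function name and the toy data are illustrative assumptions.

```python
import math
import random

def fuzzy_kmeans_penalized(points, k, lam=2.0, iters=100, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Minimizing  J = sum_i sum_c u_ic * ||x_i - z_c||^2
        #                 + lam * sum_i sum_c u_ic * log(u_ic)
        # subject to sum_c u_ic = 1 gives u_ic proportional to
        # exp(-||x_i - z_c||^2 / lam).
        u = []
        for p in points:
            w = [math.exp(-math.dist(p, c) ** 2 / lam) for c in centers]
            s = sum(w)
            u.append([x / s for x in w])
        # Center update: membership-weighted mean of the points.
        for c in range(k):
            total = sum(u[i][c] for i in range(len(points)))
            centers[c] = [sum(u[i][c] * points[i][d] for i in range(len(points)))
                          / total for d in range(dim)]
    labels = [max(range(k), key=lambda c: u[i][c]) for i in range(len(points))]
    return labels, centers

# Toy usage: two compact groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (6, 6), (6, 7), (7, 6)]
labels, centers = fuzzy_kmeans_penalized(pts, k=2)
print(labels, [[round(x, 2) for x in c] for c in centers])
```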
