首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 296 毫秒
1.
Most clustering algorithms become ineffective when provided with unsuitable parameters or applied to datasets which are composed of clusters with diverse shapes, sizes, and densities. To alleviate these deficiencies, we propose a novel split-and-merge hierarchical clustering method in which a minimum spanning tree (MST) and an MST-based graph are employed to guide the splitting and merging process. In the splitting process, vertices with high degrees in the MST-based graph are selected as initial prototypes, and K-means is used to split the dataset. In the merging process, subgroup pairs are filtered and only neighboring pairs are considered for merge. The proposed method requires no parameter except the number of clusters. Experimental results demonstrate its effectiveness both on synthetic and real datasets.  相似文献   

2.
一种处理障碍约束的聚类算法   总被引:1,自引:0,他引:1  
根据障碍约束空间聚类问题的特点,利用图论的相关知识,提出了一种分阶段的基于图的聚类的算法。首先,通过最小生成树聚类算法,在不考虑障碍约束的情况下对空间对象进行聚类;然后,引入障碍物对上一步的聚类结果进行分割;最后,根据被障碍物分割后形成的各个类之间的障碍距离,将距离较近的两个类合并,形成最终的聚类结果。最后通过实验验证了算法的效果,而且输入参数少,时间复杂度低。  相似文献   

3.
最小生成树用于基因表示数据的聚类算法   总被引:6,自引:0,他引:6  
在生物学研究中,需要对植物和动物分类,对基因进行分类,以获得对种群固有结构的认识.使用聚类分析方法,有效地鉴别基因表示数据的模式,将它们分组成为由类似对象组成的多个类,对研究基因的结构、功能以及不同种类基因之间的关系都具有重要意义.将图论的最小生成树理论引入分子生物学中基因表示数据的聚类分析方法,设计了生成树的表示和基于最小生成树的聚类算法,证明了该方法对于一些准则函数能够产生全局最优簇,并根据实验结果对算法进行了讨论和评价.  相似文献   

4.
聚类中心初始化的新方法   总被引:4,自引:1,他引:3  
k-均值聚类算法易受初始聚类中心的影响而陷入局部最优解.现有聚类中心初始化方法尚未得到广泛认可.本文依据每个类内至少有一个数据稠密区,且处于不同类的数据稠密区比处于同一类的数据稠密区相距更远的假设,在数据集合上构造一棵最小支撑树,应用根树原理在其上搜索数据稠密区并估计其密度,从中选出密度大且足够分离的数据稠密区,以其内的点作为初始聚类中心,得到了一个聚类中心初始化的新方法.将此方法与现有的方法进行比较,仿真实验表明,本文方法性能更优越.  相似文献   

5.
This paper presents two different versions of a new internal index for clustering validation using graphs. These graphs capture the structural characteristics of each cluster. In this way, the new index overcomes the limitations of traditional indices based on statistics measurements and it is effective on clusters of different shapes and sizes. These graphs are generated through an iterative process based on the principal component analysis, which partitions the clusters in a configurable number of “sub-clusters”. Then, a minimum spanning tree based on the centroids of each of these sub-clusters is built and used to estimate both the quality of the clusters and the distances between them. In particular, the quality of a cluster is defined in this paper as the level of “cohesion” among its sub-clusters. The difference between the two versions of the proposed index is how this level of "cohesion" is measured. Finally, a comparison of the performance of these two versions of the proposed index with a selected group of well-known internal indices is carried out. In these tests, the two versions of the index show a superior capacity to deal with datasets that present different configurations of variances, densities, geometries and levels of noise.  相似文献   

6.
针对目前已有的聚类算法不能很好地处理包含不同密度的簇数据,或者不能很好地区分相邻的密度相差不大的簇的问题,提出1种新的基于严格最近邻居和共享最近邻居的聚类算法.通过构造共享严格最近邻图,使样本点在密度一致的区域保持连接,而在密度不同的相邻区域断开连接,并尽可能去除噪声点和孤立点.该算法可以处理包含有不同密度的簇数据,而且在处理高维数据时具有较低的时间复杂度、实验结果证明,该算法能有效找出不同大小、形状和密度的聚类.  相似文献   

7.
XML结构聚类     
郝晓丽  冯志勇 《计算机应用》2005,25(6):1398-1400
针对当前XML文档结构聚类算法的一些不足,提出采用段匹配的概念来计算两棵XML文档树中的路径相似性,并在此基础上得出两棵树整体的相似度量。在整个聚类过程中,算法还把一组相关文档与一个XML聚类代表相关联,该聚类代表就包含了一个文档集合中所有文档的最相关的特征。为了构建聚类代表,算法通过构造最佳匹配树,合并树,修剪树三步来实现。通过比较聚类代表,发现新的聚类时更新聚类代表来完成文档聚类。实验结果就充分展现了算法的有效性。  相似文献   

8.
周世波  徐维祥 《控制与决策》2018,33(11):1921-1930
聚类是数据挖掘领域的一个重要研究方向,针对复杂数据集中存在的簇间密度不均匀、聚类形态多样、聚类中心的识别等问题,引入样本点k近邻信息计算样本点的相对密度,借鉴快速搜索和发现密度峰值聚类(CFSFDP)算法的簇中心点识别方法,提出一种基于相对密度和决策图的聚类算法,实现对任意分布形态数据集聚类中心快速、准确地识别和有效聚类.在7类典型测试数据集上的实验结果表明,所提出的聚类算法具有较好的适用性,与经典的DBSCAN算法和CFSFDP等算法相比,在没有显著提高时间复杂度的基础上,聚类效果更好,对不同类型数据集的适应性也更广.  相似文献   

9.
Clustering problems are applicable to several areas of science. Approaches and algorithms are as varied as the applications. From a graph theory perspective, clustering can be generated by partitioning an input graph into a vertex-disjoint union of cliques (clusters) through addition and deletion of edges. Finding the minimum number of edges additions and deletions required to cluster data that can be represented as graphs is a well-known problem in combinatorial optimization, often referred to as cluster editing problem. However, many real-world clustering applications are characterized by overlapping clusters, that is, clusters that are non-disjoint. In these situations cluster editing cannot be applied to these problems. Literature concerning a relaxation of the cluster editing, where clusters can overlap, is scarce. In this work, we propose the overlapping cluster editing problem, a variation of the cluster editing where the goal is to partition a graph, also by editing edges, into maximal cliques that are not necessarily disjoint. In addition, we also present three slightly different versions of a hybrid heuristic to solve this problem. Each hybrid heuristic is based on coupling two metaheuristicsthat, together, generate a set of clusters; and one of three mixed-integer linear programming models, also introduced in this paper, that uses these clusters as input. The objective with the metaheuristics is to limit the solution exploration space in the models’ resolution, therefore reducing its computational time.Tests results show that the all proposed hybrid heuristic versions are able to generate good-quality overlapping cluster editing solutions. In particular, one version of the hybrid heuristic achieved, at a low computational cost, the best results in 51 of 112 randomly-generated graphs. Although the other two hybrid heuristic versions have harder to solve models, they obtained reasonable results in medium-sized randomly-generated graphs. In addition, the hybrid heuristic achieved good results identifying labeled overlapping clusters in a supervised data set experiment. Furthermore, we also show that, with our new problem definition, clustering a vertex in more than one cluster can reduce the edges editing cost.  相似文献   

10.
针对传统最小生成树聚类算法需要事先知道聚类数目和使用静态全局分类依据,导致聚类密度相差较大时,算法有效性下降,计算复杂度大等问题,提出一种改进的最小生成树自适应分层聚类算法,根据最近邻关系,自动为每个聚类簇设定独立的阈值,使之适应分布密度相差较大的情况,并能自动确定聚类数目。实验表明,算法具有较好的性能,尤其对数据密度分布不均匀的情况也能得到较好的聚类结果。  相似文献   

11.
Robust clustering with applications in computer vision   总被引:3,自引:0,他引:3  
A clustering algorithm based on the minimum volume ellipsoid (MVE) robust estimator is proposed. The MVE estimator identifies the least volume region containing h percent of the data points. The clustering algorithm iteratively partitions the space into clusters without prior information about their number. At each iteration, the MVE estimator is applied several times with values of h decreasing from 0.5. A cluster is hypothesized for each ellipsoid. The shapes of these clusters are compared with shapes corresponding to a known unimodal distribution by the Kolmogorov-Smirnov test. The best fitting cluster is then removed from the space, and a new iteration starts. Constrained random sampling keeps the computation low. The clustering algorithm was successfully applied to several computer vision problems formulated in the feature space paradigm: multithresholding of gray level images, analysis of the Hough space, and range image segmentation  相似文献   

12.
Clustering became a classical problem in databases, data warehouses, pattern recognition, artificial intelligence, and computer graphics. Applications in large spatial databases, point-based graphics, etc., give rise to new requirements for the clustering algorithms: automatic discovering of arbitrary shaped and/or non-homogeneous clusters, discovering of clusters located in low-dimensional hyperspace, detecting cluster boundaries. On that account, a new clustering and boundary detecting algorithm, ADACLUS, is proposed. It is based on the specially constructed adaptive influence function, and therefore, discovers clusters of arbitrary shapes and diverse densities, adequately captures clusters boundaries, and it is robust to noise. Normally ADACLUS performs clustering purely automatically without any preliminary parameter settings. But it also gives the user an optional possibility to set three parameters with clear meaning in order to adjust clustering for special applications. The algorithm was tested on various two-dimensional data sets, and it exhibited its effectiveness in discovering clusters of complex shapes and diverse densities. Linear complexity of the ADACLUS gives it an advantage over some well-known algorithms.  相似文献   

13.
Clustering is an important technique in data mining. The innovative algorithm proposed in this paper obtains clusters by first identifying boundary points as opposed to existing methods that calculate core cluster points before expanding to the boundary points. To achieve this, an affine space-based boundary detection algorithm was employed to divide data points into cluster boundary and internal points. A connection matrix was then formed by establishing neighbor relationships between internal and boundary points to perform clustering. Our clustering algorithm with an affine space-based boundary detection algorithm accurately detected clusters in datasets with different densities, shapes, and sizes. The algorithm excelled at dealing with high-dimensional datasets.  相似文献   

14.
基于最小路由代价树的大规模显微图像拼接方法   总被引:1,自引:1,他引:0       下载免费PDF全文
为了对大规模显微图像进行高质量的拼接,首先提出拼接图的概念及获得高质量全景图像的3个原则,然后采用分块-空间聚类算法配准相邻图像,同时评估配准质量,并计算拼接图的边的权值;最后在此基础上,提出了一种基于最小路由代价生成树的图像拼接方法,该方法通过计算拼接图的最小路由代价生成树来确定所有图像的全局位置,并用来生成全景图像。实验结果表明,该方法可获得高质量的全景图像。  相似文献   

15.
Due to their ability to detect clusters with irregular boundaries, minimum spanning tree-based clustering algorithms have been widely used in practice. However, in such clustering algorithms, the search for nearest neighbor in the construction of minimum spanning trees is the main source of computation and the standard solutions take O(N^{2}) time. In this paper, we present a fast minimum spanning tree-inspired clustering algorithm, which, by using an efficient implementation of the cut and the cycle property of the minimum spanning trees, can have much better performance than O(N^{2}).  相似文献   

16.
针对传统K-means算法对初始聚类中心敏感的问题,提出了基于数据样本分布情况的动态选取初始聚类中心的改进K-means算法。该算法根据数据点的距离构造最小生成树,并对最小生成树进行剪枝得到K个初始数据集合,得到初始的聚类中心。由此得到的初始聚类中心非常地接近迭代聚类算法收敛的聚类中心。理论分析与实验表明,改进的K-means算法能改善算法的聚类性能,减少聚类的迭代次数,提高效率,并能得到稳定的聚类结果,取得较高的分类准确率。  相似文献   

17.
K-means算法所使用的聚类准则函数是将数据集中各个簇的误差平方值直接相加而得到的,不能有效处理簇的密度不均且大小差异较大的数据集。为此,将K-means算法的聚类准则函数定义为加权的簇内标准差之和,权重为簇内数据对象数占总数目的比例。同时,调整了传统K-means算法将数据对象重新分配给簇的方法,采用一个数据对象到中心点的加权距离代替传统K-means算法中的距离,将数据对象分配给使加权距离最小的中心点所在的簇。实验结果表明,针对模拟数据集的聚类,改进K-means算法可以明显减少大而稀的簇中数据对象被错误地分配到相邻的小而密簇的可能性,改善了聚类的质量;针对UCI数据集的聚类,改进算法使得各个簇更为紧凑,从而验证了改进K-means算法的有效性。  相似文献   

18.
面向复杂簇的聚类算法研究与实现   总被引:2,自引:0,他引:2  
有效聚类各种复杂的数据对象簇是聚类算法应用干事务对象划分、图像分割、机器学习等方面需要解决的关键技术.在分析与研究现有聚类算法的基础上,提出一种基于密度和自适应密度可达的改进算法.实验证明,该算法能够有效聚类任意分布形状、不同密度、不同尺度的簇;同时,算法的计算复杂度与传统基于密度的聚类算法相比有明显的降低.  相似文献   

19.
Finding the degree-constrained minimum spanning tree (d-MST) of a graph is a well-studied NP-hard problem of importance in communications network design and other network-related problems. In this paper we describe some previously proposed algorithms for solving the problem, and then introduce a novel tree construction algorithm called the randomized primal method (RPM) which builds degree-constrained trees of low cost from solution vectors taken as input. RPM is applied in three stochastic iterative search methods: simulated annealing, multistart hillclimbing, and a genetic algorithm. While other researchers have mainly concentrated on finding spanning trees in Euclidean graphs, we consider the more general case of random graph problems. We describe two random graph generators which produce particularly challenging d-MST problems. On these and other problems we find that the genetic algorithm employing RPM outperforms simulated annealing and multistart hillclimbing. Our experimental results provide strong evidence that the genetic algorithm employing RPM finds significantly lower-cost solutions to random graph d-MST problems than rival methods  相似文献   

20.
A topological and dynamical characterization of the cluster structures described by the support vector clustering is developed. It is shown that each cluster can be decomposed into its constituent basin level cells and can be naturally extended to an enlarged clustered domain, which serves as a basis for inductive clustering. A simplified weighted graph preserving the topological structure of the clusters is also constructed and is employed to develop a robust and inductive clustering algorithm. Simulation results are given to illustrate the robustness and effectiveness of the proposed method  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号