首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
This paper presents a method of the unsupervised discovery of valid clusters using statistics on the modes of the probability density function in scale space. First, a Gaussian scale-space theory is applied to the kernel density estimation to derive the hierarchical relationships among the modes of the probability density function in scale space. The data points are classified into clusters according to the mode hierarchy. Second, the algorithm of cluster discovery is presented. The valid clusters are discovered by testing whether each cluster is distinguishable from spurious clusters obtained from uniformly random points. The statistical hypothesis test for cluster discovery requires distribution forms of annihilation scales of the modes estimated from the uniformly random points. The distribution forms are experimentally shown to be unimodal. Finally, cluster discovery is demonstrated using synthetic data and benchmark data.  相似文献   

2.
Visual analytics of multidimensional multivariate data is a challenging task because of the difficulty in understanding metrics in attribute spaces with more than three dimensions. Frequently, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records with similar attribute values. A large number of (typically hierarchical) clustering algorithms have been developed to group individual records to clusters of statistical significance. However, only few visualization techniques exist for further exploring and understanding the clustering results. We propose visualization and interaction methods for analyzing individual clusters as well as cluster distribution within and across levels in the cluster hierarchy. We also provide a clustering method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional multivariate space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. To visually represent the cluster hierarchy, we present a 2D radial layout that supports an intuitive understanding of the distribution structure of the multidimensional multivariate data set. Individual clusters can be explored interactively using parallel coordinates when being selected in the cluster tree. Furthermore, we integrate circular parallel coordinates into the radial hierarchical cluster tree layout, which allows for the analysis of the overall cluster distribution. This visual representation supports the comprehension of the relations between clusters and the original attributes. The combination of the 2D radial layout and the circular parallel coordinates is used to overcome the overplotting problem of parallel coordinates when looking into data sets with many records. We apply an automatic coloring scheme based on the 2D radial layout of the hierarchical cluster tree encoding hue, saturation, and value of the HSV color space. The colors support linking the 2D radial layout to other views such as the standard parallel coordinates or, in case data is obtained from multidimensional spatial data, the distribution in object space.  相似文献   

3.
In this paper we propose an encoding scheme and ad hoc operators for a genetic approach to hierarchical graph clustering. Given a connected graph whose vertices correspond to points within a Euclidean space and a fitness function, a hierarchy of graphs in which each vertex corresponds to a connected subgraph of the graph below is generated. Both the number of clustering levels and the number of clusters on each level are not fixed a priori and are subject to optimization.  相似文献   

4.
5.
This paper is concerned with a stepwise mode of objective function-based fuzzy clustering. A revealed structure in data becomes refined in a successive manner by starting with the most dominant relationships and proceeding with its more detailed characterization. Technically, the proposed process develops a so-called hierarchy of clusters. Given the underlying clustering mechanism of the fuzzy C means (FCM), the produced architecture is referred to as a hierarchical FCM or hierarchical FCM tree (HFCM tree). We discuss the design of the tree demonstrating how its growth is guided by a certain mapping criterion. It is also shown how a structure at the higher level is effectively used to build clusters at the consecutive level by making use of the conditional FCM. Detailed investigations of computational complexity contrast a stepwise development of clusters with a single-step clustering completed for the equivalent number of clusters occurring in total at all final nodes of the HFCM tree. The analysis quantifies a significant reduction of the stepwise refinement of the clusters. Experimental studies include synthetic data as well as those coming from the machine learning repository.  相似文献   

6.
Recent advances in modeling tools enable non‐expert users to synthesize novel shapes by assembling parts extracted from model databases. A major challenge for these tools is to provide users with relevant parts, which is especially difficult for large repositories with significant geometric variations. In this paper we analyze unorganized collections of 3D models to facilitate explorative shape synthesis by providing high‐level feedback of possible synthesizable shapes. By jointly analyzing arrangements and shapes of parts across models, we hierarchically embed the models into low‐dimensional spaces. The user can then use the parameterization to explore the existing models by clicking in different areas or by selecting groups to zoom on specific shape clusters. More importantly, any point in the embedded space can be lifted to an arrangement of parts to provide an abstracted view of possible shape variations. The abstraction can further be realized by appropriately deforming parts from neighboring models to produce synthesized geometry. Our experiments show that users can rapidly generate plausible and diverse shapes using our system, which also performs favorably with respect to previous modeling tools.  相似文献   

7.
高阶异构数据层次联合聚类算法   总被引:1,自引:0,他引:1  
在实际应用中,包含多种特征空间信息的高阶异构数据广泛出现.由于高阶联合聚类算法能够有效融合多种特征空间信息提高聚类效果,近年来逐渐成为研究热点.目前高阶联合聚类算法多数为非层次聚类算法.然而,高阶异构数据内部往往隐藏着层次聚簇结构,为了更有效地挖掘数据内部隐藏的层次聚簇模式,提出了一种高阶层次联合聚类算法(high-order hierarchical co-clustering algorithm, HHCC).该算法利用变量相关性度量指标Goodman-Kruskal τ衡量对象变量和特征变量的相关性,将相关性较强的对象划分到同一个对象聚簇中,同时将相关性较强的特征划分到同一个特征聚簇中.HHCC算法采用自顶向下的分层聚类策略,利用指标Goodman-Kruskal τ评估每层对象和特征的聚类质量,利用局部搜索方法优化指标Goodman-Kruskal τ,自动确定聚簇数目,获得每层的聚类结果,最终形成树状聚簇结构.实验结果表明HHCC算法的聚类效果优于4种经典的同构层次聚类算法和5种已有的非层次高阶联合聚类算法.  相似文献   

8.
CURE算法是一种凝聚的层次聚类算法,它首先提出了使用多代表点描述簇的思想。本文通过对已有的基于多代表点的层次聚类算法特点的分析,提出了一种新的基于多代表点的层次聚类算法WRPC。它使用了基于影响因子的簇代表点选取机制和基于k-近邻方法的小簇合并机制,可以发现形状、尺寸更为复杂的簇。实验结果表明,该算法在保证执行效率的情况下取得了更好的聚类效果。  相似文献   

9.
In distributed data mining, adopting a flat node distribution model can affect scalability. To address the problem of modularity, flexibility and scalability, we propose a Hierarchically-distributed Peer-to-Peer (HP2PC) architecture and clustering algorithm. The architecture is based on a multi-layer overlay network of peer neighborhoods. Supernodes, which act as representatives of neighborhoods, are recursively grouped to form higher level neighborhoods. Within a certain level of the hierarchy, peers cooperate within their respective neighborhoods to perform P2P clustering. Using this model, we can partition the clustering problem in a modular way across neighborhoods, solve each part individually using a distributed K-means variant, then successively combine clusterings up the hierarchy where increasingly more global solutions are computed. In addition, for document clustering applications, we summarize the distributed document clusters using a distributed keyphrase extraction algorithm, thus providing interpretation of the clusters. Results show decent speedup, reaching 165 times faster than centralized clustering for a 250-node simulated network, with comparable clustering quality to the centralized approach. We also provide comparison to the P2P K-means algorithm and show that HP2PC accuracy is better for typical hierarchy heights. Results for distributed cluster summarization match those of their centralized counterparts with up to 88% accuracy.  相似文献   

10.
针对传统K均值聚类方法采用聚类前随机选择聚类个数K而导致的聚类结果不理想的问题,结合空间中的层次结构,提出一种改进的层次K均值聚类算法。该方法通过初步聚类,判断是否达到理想结果,从而决定是否继续进行更细层次的聚类,如此迭代执行,从而生成一棵层次型K均值聚类树,在该树形结构上可以自动地选择聚类的个数。标准数据集上的实验结果表明,与传统的K均值聚类方法相比,提出的改进的层次聚类方法的确能够取得较优秀的聚类效果。  相似文献   

11.
We introduce novel dissimilarity into a probabilistic clustering task to properly measure dissimilarity among multiple clusters when each cluster is characterized by a subpopulation in the mixture model. This measure of dissimilarity is called redundancy-based dissimilarity among probability distributions. From aspects of both source coding and a statistical hypothesis test, we shed light on several of the theoretical reasons for the redundancy-based dissimilarity among probability distributions being a reasonable measure of dissimilarity among clusters. We also elucidate a principle in common for the measures of redundancy-based dissimilarity and Ward's method in terms of hierarchical clustering criteria. Moreover, we show several related theorems that are significant for clustering tasks. In the experiments, properties of the measure of redundancy-based dissimilarity are examined in comparison with several other measures.  相似文献   

12.
13.
基于视觉系统的聚类算法   总被引:15,自引:0,他引:15  
人类对于结构的感知方式和产生数据的物理系统原理对于聚类分析而言具有同等的重要性。因此,在聚类算法的设计和分析中,模拟人类的主要器官-视觉系统将帮助我们解决这一领域的一些基本问题。从这一观点出发,文中提出一类基于初级视觉系统尺度空间理论的聚类算法,并通过引入显著性假设,将生物物理学中的Weber定律与聚类结构的有效性联系起来。由此产生的聚类算法简洁有效,并可部分地回答那些与人类感知数据结构相关联的聚类有效性问题。我们的数值试验表明这一方法具有广泛的应用前景。  相似文献   

14.
网络拓扑结构影响着传感器节点的负载均衡与生存周期,分簇结构是无线传感网络的一种有效地拓扑管理方式。根据血管网络特征以及对构建无线传感器网络拓扑结构的启示,提出了无线传感器网络非均匀等级分簇拓扑结构。分析血管网络结构特征,建立数学模型和网络拓扑结构,对具有压力差的网络节点进行等级标定。根据改进粒子群算法进行非等概率静态分簇,形成不同等级区域具有密度和规模不等的非均匀等级分簇拓扑结构。仿真分析表明,此算法能优化网络分簇,均衡节点能耗,延长网络生命期,避免网络能耗热点问题。  相似文献   

15.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure and also a clustering algorithm (LEGCIust) that builds layers of subgraphs based on this matrix and uses them and a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure, we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence on the superior performance of this new algorithm when compared with competing ones.  相似文献   

16.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure and also a clustering algorithm (LEGClust) that builds layers of subgraphs based on this matrix, and uses them and a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence on the superior performance of this new algorithm when compared with competing ones.  相似文献   

17.
依据信息论的思想,对基于层次的K-均值聚类算法(HKMA)过程进行了分析,该算法首先采用层次方法对文档进行初始聚类,得到的聚类总数作为k均值算法中的k值,在此基础上,通过k均值聚类对聚类结果进行修正。实验结果表明,HKMA执行时间整体上优于k-means算法,而且随着数据量的增大执行时间的增长幅度也较小。  相似文献   

18.
This paper proposes a modular approach to the design of hierarchical consensus protocols for the mobile ad hoc network with a static and known set of hosts. A two-layer hierarchy is imposed on the network by grouping mobile hosts into clusters, each with a clusterhead. The messages from and to the hosts in the same cluster are merged/unmerged by the clusterhead so as to reduce the message cost and improve the scalability. The proposed modular approach separates the concerns of clustering hosts from achieving consensus. A clustering function, called eventual clusterer (denoted as diamC), is designed for constructing and maintaining the two-layer hierarchy. Similar to unreliable failure detectors, diamC greatly facilitates the design of hierarchical protocols by providing the fault-tolerant clustering function transparently. We propose an implementation of diamC based on the failure detector diamS. Using diamC, we design a new hierarchical consensus protocol. As shown by the performance evaluation results, the proposed consensus protocol can save both message cost and time cost. Our proposed modular design is therefore effective and can lead to efficient solutions to achieving consensus in mobile ad hoc networks.  相似文献   

19.
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster density, we propose that the clustering validity index be used not only globally to find optimal partitions of input data, but also locally to determine which two neighboring clusters are to be merged in a hierarchical clustering of Self-Organizing Map (SOM). A new two-level SOM-based clustering algorithm using the clustering validity index is also proposed. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data in a better way than classical clustering algorithms on an SOM.  相似文献   

20.
本论文在对各种算法深入分析的基础上,尤其在对基于密度的聚类算法he基于层次的聚类算法深入研究的基础上,提出了一种全新的基于密度和层次的快速聚类算法。该算法保持了基于密度聚类算法发现任意形状簇的优点,而且具有近似线性的时间复杂性,因此该算法适合对大规模数据的挖掘。理论分析和实验结果也证明了基于密度和层次的聚类算法具有处理任意形状簇的聚类、对噪音数据不敏感的特点,并且其执行效率明显高于传统的DBSCAN算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号