Found 20 similar articles (search time: 98 ms)
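Clustering of mixed-type data usually evaluates samples separately according to the type of each attribute. However, dividing the attributes into different subspaces and measuring them separately breaks the original unity of the sample attributes and introduces an inconsistent measurement bias into the similarity evaluation of individual samples. To address this problem, a new clustering algorithm is proposed that encodes sample attributes in binary and then applies a unified measure, the Hamming difference, to the attribute codes. By measuring the similarity of mixed-type data within a unified framework, the new algorithm avoids splitting the sample attributes; on this basis it also assigns different weights to attributes according to their nature and uses these weights to evaluate the degree of similarity between individual samples. Experimental results show that the new algorithm clusters mixed-type data effectively and, compared with other existing clustering algorithms, achieves better clustering accuracy and stability.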
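
The idea of a single weighted Hamming measure over binary-encoded attributes can be sketched as follows. The abstract does not give the actual encoding or weighting scheme, so the one-hot code for categorical attributes, the thermometer code for numeric attributes, and all function names and parameters here are assumptions made for illustration only.

```python
from typing import Sequence


def encode_categorical(value: str, categories: Sequence[str]) -> list[int]:
    """One-hot code, one bit per category (an assumed encoding)."""
    return [1 if value == c else 0 for c in categories]


def encode_numeric(value: float, low: float, high: float, bits: int) -> list[int]:
    """Thermometer code (assumed): the number of leading 1-bits grows with the value."""
    if high <= low:
        raise ValueError("high must exceed low")
    ratio = min(max((value - low) / (high - low), 0.0), 1.0)
    k = round(ratio * bits)
    return [1] * k + [0] * (bits - k)


def weighted_hamming(codes_a, codes_b, weights) -> float:
    """Sum over attributes of weight * (fraction of differing bits)."""
    total = 0.0
    for a, b, w in zip(codes_a, codes_b, weights):
        differing = sum(x != y for x, y in zip(a, b))
        total += w * differing / len(a)
    return total
```

Because every attribute is compared bit by bit in the same way, categorical and numeric attributes contribute to one distance value, and the per-attribute weights control how much each contributes.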

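To obtain a dissimilarity measure that better fits the space of a mixed-attribute data point set, and thereby detect a more meaningful cluster distribution, a K-medoids clustering algorithm with progressively optimized feature weights is proposed. The algorithm is discussed, and its time complexity and convergence are analyzed. To implement the feature-weight optimization step, two different feature-weight optimization methods are given, together with several methods for adaptively optimizing the distance weight coefficients and the objective function coefficients. At a theoretical level, these methods address the adaptive optimization of the dissimilarity measure. Tests on several UCI benchmark data sets show that the algorithm can sometimes achieve better clustering quality, which indicates that the weighted clustering algorithm is effective to some extent. Several directions for further research are outlined.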

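Artificial immune semi-supervised clustering algorithm based on manifold distance   Total citations: 1, self-citations: 1, citations by others: 0
Manifold distance is used as the basic measure of similarity between samples; pairwise constraint information is added, and a new metric matrix is obtained through nearest-neighbor propagation. The clustering problem is transformed into an optimization model, which is solved with the clonal selection algorithm to obtain the final clustering result. Experiments on artificial data sets and UCI benchmark data sets show that the method achieves high accuracy.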

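吕佳 (Lü Jia). 《计算机应用》 (Journal of Computer Applications), 2009, 29(5): 1380-1384
To address the inability of the K-means clustering algorithm to correctly identify non-convex clusters, a clustering method based on a Delaunay triangulation density measure is proposed. It uses favorable properties of the Delaunay triangulation graph, such as proximity and adjacency, to reflect the characteristics of the data and to measure density, and it applies a chaos optimization method to globally optimize the clustering objective function and reach the global minimum. Experimental results show that the clustering algorithm based on the Delaunay triangulation density measure can discover clusters of arbitrary non-convex shape.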

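Current research on clustering ensembles focuses mainly on optimizing the ensemble strategy, while the measurement and optimization of base clustering quality has received little attention. Based on information entropy theory, a quality measure for base clusterings is proposed and combined with the idea of three-way decisions to construct a three-way screening method for base clusterings. First, the thresholds α and β for the three-way decision are preset; then the average quality of the clusters in each base clustering is computed and used as that base clustering's quality measure; finally, the three-way decision is applied. The decision strategy is: when the quality measure of a base clustering is less than β, delete it; when it is greater than or equal to α, retain it; when it is greater than or equal to β and less than α, recompute the quality of that base clustering and apply the three-way decision again, until no base clustering is deleted or the specified number of iterations is reached. Comparative experiments show that the three-way screening method effectively improves clustering ensemble results.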
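
The decision loop described above can be sketched roughly as follows. The entropy-based quality measure itself is not specified in the abstract, so `quality_fn` is a hypothetical callable supplied by the caller; keeping the members still in the boundary region when the loop stops is also an assumption, as are the function and parameter names.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def three_way_filter(
    members: Sequence[T],
    quality_fn: Callable[[T, list], float],  # hypothetical quality measure
    alpha: float,
    beta: float,
    max_iter: int = 10,
) -> list:
    """Retain if quality >= alpha, delete if quality < beta, otherwise re-evaluate."""
    if beta > alpha:
        raise ValueError("beta must not exceed alpha")
    accepted: list = []
    boundary = list(members)
    for _ in range(max_iter):
        pool = accepted + boundary  # members still under consideration this round
        next_boundary = []
        deleted = 0
        for m in boundary:
            q = quality_fn(m, pool)
            if q >= alpha:
                accepted.append(m)       # positive region: retain
            elif q < beta:
                deleted += 1             # negative region: delete
            else:
                next_boundary.append(m)  # boundary region: decide again next round
        boundary = next_boundary
        if deleted == 0:  # stop once a round deletes nothing
            break
    # Assumption: members still undecided at the end are kept.
    return accepted + boundary
```

Passing the current pool to `quality_fn` lets a quality measure that depends on the other base clusterings change after deletions, which is one plausible reading of "recompute the quality".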

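Spatial clustering analysis and evaluation methods   Total citations: 5, self-citations: 0, citations by others: 5
Spatial clustering is one of the key topics in spatial data mining research and is widely used in spatial data analysis. This paper briefly analyzes the complexity of spatial data, examines in depth the main ideas behind different spatial clustering algorithms, and lists the main representative algorithms. It describes methods for evaluating spatial clustering quality in terms of both external and internal measures, and discusses open problems in spatial clustering research and topics that need further study.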

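Directional similarity clustering method DSCM   Total citations: 10, self-citations: 2, citations by others: 10
A robust clustering method for directional data, DSCM, is proposed on the basis of a directional similarity measure. DSCM first constructs an objective function from the directional measure, then optimizes the objective function by fixed-point iteration to obtain the final stable state of each sample, and finally performs clustering on the set of final sample states using hierarchical clustering. The advantage of DSCM is that, when clustering directional data, it does not depend on specific initialization parameters and can find the optimal cluster partition in a self-organizing way, which makes it robust. Experiments confirm the effectiveness of DSCM and its superiority over two existing traditional directional clustering algorithms.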

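Clustering algorithms are widely used in data analysis, data mining, and many other fields. A new distance metric is introduced into the clustering method in place of the traditional Euclidean distance metric to improve robustness. On this basis, a clustering method based on Particle Swarm Optimization (PSO) and a clustering method based on Quantum-behaved Particle Swarm Optimization (QPSO) are proposed, and both are applied to image segmentation. Experimental results show that the QPSO-based clustering method outperforms the PSO-based one.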

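Preprocessing of base clustering members is an important step in clustering ensemble algorithms. Many studies have shown that the diversity of the set of base clustering members affects the performance of clustering ensemble algorithms. Current clustering ensemble research centers on generating base clusterings and optimizing the ensemble strategy, while research on measuring and optimizing the diversity of base clustering members remains incomplete. Based on Jaccard similarity, this paper proposes a diversity measure for base clustering members and, combining it with the idea of three-way decisions, a three-way filtering method for base clustering members based on diversity. The method first sets the initial three-way decision thresholds α(0) and β(0), then computes the diversity measure of each base clustering member, and then applies the three-way decision. The decision strategy is: when the diversity measure of a base clustering member is less than the threshold α(0), delete the member; when it is greater than the threshold β(0), retain the member; when it is greater than α(0) and less than β(0), assign the member to the boundary region of the three-way decision to await further judgment. After one round of three-way decisions, the algorithm recomputes the thresholds α(1) and β(1) and applies the three-way decision again to the boundary region of the previous round, until no base clustering member is assigned to the boundary region or the specified number of iterations is reached. Comparative experiments show that the three-way filtering method for base clusterings based on the diversity measure effectively improves clustering ensemble results.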
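
One way such a Jaccard-based diversity score and a single round of the three-way split might look is sketched below. The abstract does not define the measure, so the pair-counting Jaccard index, the "one minus mean similarity to the other members" diversity, the treatment of scores exactly equal to a threshold (sent to the boundary region), and all names are assumptions.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Set of index pairs that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def jaccard_similarity(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same samples."""
    pairs_a, pairs_b = co_clustered_pairs(labels_a), co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


def diversity(index, ensemble):
    """Assumed diversity: 1 - mean Jaccard similarity to the other members."""
    others = [m for k, m in enumerate(ensemble) if k != index]
    if not others:
        return 0.0
    mean_sim = sum(jaccard_similarity(ensemble[index], o) for o in others) / len(others)
    return 1.0 - mean_sim


def three_way_split(scores, alpha, beta):
    """Indices to delete (< alpha), to retain (> beta), and left in the boundary."""
    delete = [i for i, s in enumerate(scores) if s < alpha]
    retain = [i for i, s in enumerate(scores) if s > beta]
    boundary = [i for i, s in enumerate(scores) if alpha <= s <= beta]
    return delete, retain, boundary
```

Note that in this abstract α is the lower threshold and β the upper one. The recomputation of α(1) and β(1) between rounds is not described, so it is left out of the sketch.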

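For a small labeled sample set and a large unlabeled data set that share the same (or a similar) distribution in a mixed-attribute space, an MST-based semi-supervised clustering method with an adaptively optimized dissimilarity measure is proposed. The method first uses a decision tree to obtain the "rule clustering regions" of the small sample set, then adaptively optimizes the dissimilarity measure constructed in the mixed-attribute space according to the principle that "data points in the same cluster are closer to each other", and finally applies the optimized dissimilarity measure in an MST-based clustering algorithm to obtain more effective clustering results. Simulation results show that the method improves results on some data sets. The paper closes with a research outlook aimed at extending the method and exploring its practical value.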

We propose a graph model for the mutual-information-based clustering problem. This problem was originally formulated as a constrained optimization problem with respect to the conditional probability distribution of clusters. Based on the stationary distribution induced from the problem setting, we propose a function which measures the relevance among data objects under the problem setting. This function is utilized to capture the relation among data objects, and the entire set of objects is represented as an edge-weighted graph where pairs of objects are connected by edges weighted with their relevance. We show that, in hard assignment, the clustering problem can be approximated as a combinatorial problem over the proposed graph model when data is uniformly distributed. By representing the data objects as a graph based on our graph model, various graph-based algorithms can be utilized to solve the clustering problem over the graph. The proposed approach is evaluated on the text clustering problem over the 20 Newsgroup and TREC datasets. The results are encouraging and indicate the effectiveness of our approach.

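Affinity propagation clustering based on mutual distance for interval data   Total citations: 1, self-citations: 0, citations by others: 1
谢信喜 (Xie Xinxi), 王士同 (Wang Shitong). 《计算机应用》 (Journal of Computer Applications), 2008, 28(6): 1441-1443
Symbolic clustering is an important extension of traditional clustering, and interval data is a common kind of symbolic data. The symmetric measures used in traditional clustering are not necessarily suitable for interval data, and algorithm initialization has long been a serious source of interference in clustering. A measure suited to interval data, the mutual distance, is therefore proposed, and on the basis of this measure a new clustering method, affinity propagation clustering, is adopted, which removes the interference caused by initialization. The result is an affinity propagation clustering method based on mutual distance for interval data. Theoretical discussion and experimental comparison show that the algorithm performs better than the Euclidean-distance-based K-means algorithm.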

Cluster analysis deals with the problem of organization of a collection of objects into clusters based on a similarity measure, which can be defined using various distance functions. The use of different similarity measures allows one to find different cluster structures in a data set. In this article, an algorithm is developed to solve clustering problems where the similarity measure is defined using the L1-norm. The algorithm is designed using the nonsmooth optimization approach to the clustering problem. Smoothing techniques are applied to smooth both the clustering function and the L1-norm. The algorithm computes clusters sequentially and finds global or near global solutions to the clustering problem. Results of numerical experiments using 12 real-world data sets are reported, and the proposed algorithm is compared with two other clustering algorithms.
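
To illustrate what smoothing the L1-norm can mean, the sketch below replaces each |t| with sqrt(t² + ε²), which is differentiable everywhere. This is one common smoothing and is an assumption here: the abstract does not say which smoothing technique the article uses, and the function names and the parameter `eps` are hypothetical.

```python
import math


def l1_distance(x, y):
    """Exact L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def smoothed_l1_distance(x, y, eps=1e-3):
    """Smooth surrogate: each |t| is replaced by sqrt(t^2 + eps^2)."""
    return sum(math.sqrt((a - b) ** 2 + eps ** 2) for a, b in zip(x, y))


def smoothed_l1_gradient(x, y, eps=1e-3):
    """Gradient of the surrogate with respect to x; defined even where x_i == y_i."""
    return [(a - b) / math.sqrt((a - b) ** 2 + eps ** 2) for a, b in zip(x, y)]
```

For n coordinates the surrogate lies between the exact L1 distance and the exact distance plus n·ε, so a smaller ε gives a tighter approximation at the cost of a more sharply curved function near zero.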

We introduced a spectral clustering algorithm based on the bipartite graph model for the Manufacturing Cell Formation problem in [Oliveira S, Ribeiro JFF, Seok SC. A spectral clustering algorithm for manufacturing cell formation. Computers and Industrial Engineering. 2007 [submitted for publication]]. It constructs two similarity matrices: one for parts and one for machines. The algorithm executes a spectral clustering algorithm on each separately to find families of parts and cells of machines. The similarity measure in the approach utilized limited information between parts and between machines. This paper reviews several well-known similarity measures which have been used for Group Technology. Computational clustering results are compared by various performance measures.

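Cluster analysis is one of the most widely used mathematical methods, yet it is also regarded as a class of methods that is not mathematically rigorous, mainly because the clustering process and its results have no statistical criterion. This paper establishes a cluster analysis algorithm with a randomization statistical test, used to cluster a number of samples with significance labels. The algorithm consists of three parts: distance measure computation, randomization test, and hierarchical clustering. It offers a choice of 14 distance measures, three hierarchical clustering methods, and weighted or unweighted indicators. The distance between samples is defined as 1 minus the p-value of the randomization test; if the distance between two classes meets the p-value criterion, merging them into one class is statistically significant and acceptable, and otherwise it is not significant and not acceptable. The algorithm has two main features. First, it uses a randomization method for the significance test of differences, so that many distance measures can be tested rigorously; the randomization test requires no statistical preconditions or assumptions and is applicable to a wide range of statistical problems. Second, the randomization method used for the significance test requires the randomized values to be positive integers, which is too narrow a scope; synchronous shifting and translation of the values extend it to the real numbers. The algorithm is implemented as a network application in Java, comprising six classes and one HTML file, and can be shared over the network in a variety of Java-compatible browsers. Using survey data on invertebrate diversity in rice fields, the paper carries out a comparative analysis of the algorithm and discusses principles for choosing a distance measure and directions for further research.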
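
The definition "distance = 1 minus the p-value of a randomization test" can be sketched as below. The test statistic (absolute difference of means), the pooling-and-reshuffling scheme, the add-one p-value estimate, and all names are assumptions standing in for the paper's 14 distance measures and its own procedure, which the abstract does not detail. The sketch is in Python rather than the Java of the original implementation.

```python
import random
from statistics import mean


def randomization_p_value(a, b, n_perm=999, seed=0):
    """Two-sample randomization test on the absolute difference of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Add-one estimate keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def randomization_distance(a, b, n_perm=999, seed=0):
    """Distance between two samples, defined as 1 - p."""
    return 1.0 - randomization_p_value(a, b, n_perm=n_perm, seed=seed)
```

With this definition, samples that differ significantly (small p) are far apart, while samples that are statistically indistinguishable (p near 1) are close, so a hierarchical clustering step could use a significance level directly as a merge threshold.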

Minimum spanning tree partitioning algorithm for microaggregation   Total citations: 7, self-citations: 0, citations by others: 0
This paper presents a clustering algorithm for partitioning a minimum spanning tree with a constraint on minimum group size. The problem is motivated by microaggregation, a disclosure limitation technique in which similar records are aggregated into groups containing a minimum of k records. Heuristic clustering methods are needed since the minimum information loss microaggregation problem is NP-hard. Our MST partitioning algorithm for microaggregation is sufficiently efficient to be practical for large data sets and yields results that are comparable to the best available heuristic methods for microaggregation. For data that contain pronounced clustering effects, our method results in significantly lower information loss. Our algorithm is general enough to accommodate different measures of information loss and can be used for other clustering applications that have a constraint on minimum group size.

In this paper, we present a particle swarm optimizer (PSO) to solve the variable weighting problem in projected clustering of high-dimensional data. Many subspace clustering algorithms fail to yield good cluster quality because they do not employ an efficient search strategy. In this paper, we are interested in soft projected clustering. We design a suitable k-means objective weighting function, in which a change of variable weights is exponentially reflected. We also transform the original constrained variable weighting problem into a problem with bound constraints, using a normalized representation of variable weights, and we utilize a particle swarm optimizer to minimize the objective function in order to search for global optima to the variable weighting problem in clustering. Our experimental results on both synthetic and real data show that the proposed algorithm greatly improves cluster quality. In addition, the results of the new algorithm are much less dependent on the initial cluster centroids. In an application to text clustering, we show that the algorithm can be easily adapted to other similarity measures, such as the extended Jaccard coefficient for text data, and can be very effective.

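Traditional trajectory clustering methods suffer from problems such as the difficulty of defining trajectory similarity and the tendency to overlook trajectory details during clustering. Vector field based trajectory clustering (VFC) preserves the original motion characteristics of the trajectories and uses the geometric structure of the vector field to measure trajectory similarity well. A weighted fitting method is introduced to reduce the influence of noise on clustering and thus address the poor robustness of VFC. Hierarchical clustering is used to determine the number of clusters dynamically, which addresses the inability to adapt the number of clusters and improves clustering validity. Using Atlanta hurricane data as the raw trajectory data, experiments are carried out with classical vector field trajectory clustering, k-means clustering, k-medoids clustering, and the proposed method. The results demonstrate the effectiveness of the hierarchical clustering algorithm with weighted vector field fitting.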

In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters; this is known as the automatic clustering problem. Because of lack of prior domain knowledge, it is difficult to choose an appropriate number of clusters, especially when the data have many dimensions, when clusters differ widely in shape, size, and density, and when overlapping exists among groups. In the late 1990s, the automatic clustering problem gave rise to a new era in cluster analysis with the application of nature-inspired metaheuristics. Since then, researchers have developed several new algorithms in this field. This paper presents an up-to-date review of all major nature-inspired metaheuristic algorithms used thus far for automatic clustering. Also, the main components involved during the formulation of metaheuristics for automatic clustering are presented, such as encoding schemes, validity indices, and proximity measures. A total of 65 automatic clustering approaches are reviewed, which are based on single-solution, single-objective, and multiobjective metaheuristics, whose usage percentages are 3%, 69%, and 28%, respectively. Single-objective clustering algorithms are adequate to efficiently group linearly separable clusters. However, a strong tendency in using multiobjective algorithms is found nowadays to address non-linearly separable problems. Finally, a discussion and some emerging research directions are presented.

Measures of interestingness play a crucial role in association rule mining. An important methodological problem, on which several papers have appeared in the literature, is to provide a reasonable classification of the measures. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of clustering the measures. Unlike the existing studies, our method reveals overlapping clusters of interestingness measures. We argue that the overlap between clusters is a desired feature of natural groupings of measures and that, because formal concepts are used as factors in Boolean factor analysis, the resulting clusters have a clear meaning and are easy to interpret. We conduct three case studies on clustering of measures, provide interpretations of the resulting clusters, and compare the results to those of the previous approaches reported in the literature.
