Similar articles
19 similar articles found
1.
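Existing rough K-means clustering algorithms and their improved and derived variants all describe, from different angles, the uncertain data objects in the boundary regions of overlapping clusters, but they neglect the effect that imbalance in cluster size has on the clustering iterations and results. This paper introduces the concept of interval type-2 fuzzy sets to measure the data objects in cluster boundary regions and proposes a rough K-means clustering algorithm based on an interval type-2 fuzzy measure. First, membership intervals of boundary-region samples with respect to the overlapping clusters are generated from the data distribution of the clusters, reflecting the spatial distribution of the samples. Then, taking the sample size of each cluster into account, the influence coefficients of boundary-region samples on the overlapping clusters are adjusted adaptively on the basis of the membership intervals. The algorithm weakens the adverse effect of the boundary region on the iteration of the center mean of smaller clusters and improves clustering accuracy. Tests on synthetic data sets and UCI benchmark data sets verify the effectiveness of the algorithm.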
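The abstract does not state the update rule, so the following is only a minimal sketch of the conventional rough K-means centroid update that such algorithms build on, not the interval type-2 variant proposed in the paper. The function name `rough_centroid` and the fixed weight `w_lower` are assumptions made for illustration; the paper instead adapts the influence of boundary samples from membership intervals and cluster size.

```python
def rough_centroid(lower, boundary, w_lower=0.7):
    """Centroid of one cluster in conventional rough K-means.

    lower    -- points that certainly belong to the cluster (lower approximation)
    boundary -- points shared with other clusters (boundary region)
    w_lower  -- hypothetical fixed weight of the lower approximation

    Assumes at least one of the two lists is non-empty.
    """
    def mean(points):
        dim = len(points[0])
        return [sum(p[d] for p in points) / len(points) for d in range(dim)]

    if lower and boundary:
        m_lower, m_boundary = mean(lower), mean(boundary)
        # Weighted mix: boundary points pull the centroid less than core points.
        return [w_lower * a + (1.0 - w_lower) * b
                for a, b in zip(m_lower, m_boundary)]
    # Only one region is populated: fall back to its plain mean.
    return mean(lower) if lower else mean(boundary)
```

With a fixed weight, a small cluster that shares many boundary points with a large neighbor has its centroid pulled toward that neighbor, which is the imbalance effect the abstract sets out to reduce.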

2.
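A support vector classification algorithm that reduces large-scale data by clustering   Total citations: 3, self-citations: 0, citations by others: 3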
陈光喜  徐健  成彦 《计算机科学》2009,36(3):184-188
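To address the bottleneck of slow training of support vector classifiers on large-scale data sets, a method that reduces the data set by clustering is proposed. First, a sample center distance function is established and the proportional radius of each cluster set is computed. The cluster sets are then used to mirror-scan the sample points and determine the cluster classes; in cluster sets whose samples share the same class, only representative sample points are retained, and a deletion matrix for points of differing classes is built. The sample set is reduced by these steps. The cluster reduction algorithm is proved to have low time complexity, and experiments illustrate the value of retaining representative points. Finally, tests on random data and the UCI benchmark repository verify that the algorithm improves classification speed while maintaining classification accuracy.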

3.
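To address the tendency of soft subspace clustering algorithms to become trapped in local optima when searching for cluster centers, a fuzzy clustering algorithm is proposed that, within the soft subspace clustering framework, combines quantum-behaved particle swarm optimization (QPSO) with gradient descent to optimize the soft subspace clustering objective function. The global search capability of QPSO is used to find the globally optimal center points in the subspaces, and the fast convergence of gradient descent is used to solve for the fuzzy weights of the sample points and the membership matrix, ultimately yielding the optimal clustering of the sample points. Experiments on UCI data sets show that the algorithm improves clustering accuracy and the stability of the clustering results.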

4.
贺娜  马盈仓 《计算机工程》2022,48(7):114-121+150
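Existing multi-view fuzzy C-means (FCM) clustering algorithms usually decompose a multi-view data set into several single views for processing, which lowers the clustering accuracy of the view data and thus affects the global data partition. To cluster high-dimensional and multi-view data efficiently, a multi-view self-weighted fuzzy clustering algorithm based on KL information is proposed. The information from multiple views and their weights are fitted and incorporated into the standard FCM algorithm to solve for multiple membership matrices and centroid matrices. On this basis, KL information is added as a fuzzy regularization term to further correct the consensus membership matrix and keep the weight distribution smooth. The KL information is the ratio of a view's membership to its consensus membership; minimizing it biases each view's membership toward the consensus membership and gives better clustering results. Experimental results show that the algorithm achieves better clustering and faster convergence than traditional clustering algorithms; in particular, on the 3-Sources data set, its clustering accuracy, normalized mutual information, and purity are 7.46, 15.34, and 5.48 percentage points higher, respectively, than those of the MVASM algorithm.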
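The abstract describes the KL information only in words. A minimal sketch, assuming the usual Kullback-Leibler form (each membership multiplied by the logarithm of its ratio to the consensus membership), might look as follows. The function name `kl_information`, the argument names, and the `eps` guard are hypothetical, and the paper's full objective (view weights and the FCM term) is not reproduced.

```python
import math

def kl_information(view_u, consensus_u, eps=1e-12):
    """Sum of u * log(u / u_star) over all samples and clusters.

    view_u      -- membership matrix of one view (list of rows, one per sample)
    consensus_u -- consensus membership matrix of the same shape
    eps         -- assumed guard against division by zero
    """
    total = 0.0
    for row_view, row_consensus in zip(view_u, consensus_u):
        for u, u_star in zip(row_view, row_consensus):
            if u > 0.0:  # the term 0 * log(0 / q) is taken as 0
                total += u * math.log(u / max(u_star, eps))
    return total
```

Under this assumed form the term is zero when a view's memberships equal the consensus and grows as they diverge, which matches the abstract's statement that minimizing it pulls each view toward the consensus.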

5.
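The fuzzy C-means (FCM) clustering algorithm is currently one of the most popular methods for fuzzy partitioning of data sets. However, reasonably selecting and determining the number of clusters, that is, cluster validity analysis, remains an open problem for FCM. To this end, this paper combines the geometric structure of the data set with the fuzzy partition information of FCM to redefine the partition matrix, and then uses partition fuzziness to propose a new fuzzy clustering validity function. Experimental results show that the method is effective and robust.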

6.
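Rough K-means clustering and its derived algorithms require the number of clusters to be specified manually in advance, and their random selection of initial cluster centers leads to low partition accuracy for data in the overlapping regions of clusters. To address these problems, this paper proposes a rough fuzzy K-means clustering algorithm based on a hybrid measure and adaptive cluster adjustment. When computing the degree to which data objects in the boundary region belong to different clusters, a hybrid measure of local density and distance is considered, and a strategy of adaptively adjusting the number of clusters is used to obtain the best number of clusters. The midpoint of the two closest samples in a dense region of data objects is selected as an initial cluster center; nearby objects whose local density exceeds the average density are assigned to that cluster, and the remaining initial cluster centers are then selected, which makes the choice of initial centers more reasonable. Experiments on synthetic data sets and UCI benchmark data sets show that the algorithm is adaptive and achieves better clustering accuracy on spherical-cluster data sets with heavily overlapping clusters.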

7.
顾苏杭  王士同 《控制与决策》2020,35(9):2081-2093
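A fuzzy K-plane clustering algorithm constrained by two kinds of knowledge representation, incremental feature learning and data style information (ISF-KPC), is proposed. For better generalization, the original input features are incrementally expanded in dimension with a Gaussian kernel function before clustering. Because samples in a data set that come from the same cluster share the same style, the data style information is expressed as a matrix, and the style matrix of each cluster is determined iteratively. Extensive experimental results show that ISF-KPC achieves clustering performance competitive with the compared algorithms and performs especially well on data sets with typical styles.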

8.
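Fuzzy k-plane clustering algorithm   Total citations: 2, self-citations: 1, citations by others: 1
Building on the k-plane clustering (kPC) algorithm, a fuzzy k-plane clustering (FkPC) algorithm is proposed by introducing fuzzy membership. Like kPC, FkPC starts from the perspective of prototype selection and uses k hyperplanes, rather than the traditional points (cluster centers), as cluster prototypes. Because fuzzy membership is introduced, FkPC better reflects the membership relation between each sample point and its corresponding cluster plane. Experiments on synthetic and benchmark data sets confirm the clustering effectiveness of FkPC. They also show more clearly that, besides the similarity measure, prototype representation has a crucial influence on clustering results.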
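To illustrate what using hyperplanes as prototypes means, here is a minimal sketch that computes fuzzy memberships from point-to-plane distances. It assumes each plane is given as a pair `(w, b)` describing the set of points x with w·x = b, and it borrows the FCM-style membership formula with fuzzifier `m`; the abstract does not confirm that FkPC uses exactly this formula, and the function names are hypothetical.

```python
import math

def plane_distance(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x = b."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm

def plane_memberships(x, planes, m=2.0):
    """FCM-style memberships of x in clusters whose prototypes are planes."""
    d2 = [plane_distance(x, w, b) ** 2 for w, b in planes]
    if any(v == 0.0 for v in d2):
        # x lies exactly on one or more planes: share membership among them.
        hits = [1.0 if v == 0.0 else 0.0 for v in d2]
        return [h / sum(hits) for h in hits]
    inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
    total = sum(inv)
    return [v / total for v in inv]
```

For example, `plane_memberships([1.0, 3.0], [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)])` weights the point toward the first plane, because it is three times closer to it than to the second.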

9.
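To address the sensitivity of the fuzzy C-means (FCM) clustering algorithm to noise and its tendency to converge to local minima, a fuzzy clustering algorithm based on cross-entropy is proposed. Cross-entropy is introduced to redefine the objective function of the traditional FCM algorithm and to measure the differences among sample memberships, and the optimization of the objective function is solved with the Lagrangian method and the Lambert W function. In addition, the distribution of the sample partition matrix is analyzed, and noise samples are identified according to its distribution characteristics. Experimental results on synthetic data sets and on benchmark data sets with added noise show that the algorithm improves the noise resistance of the traditional FCM algorithm, is more robust, and identifies noise samples with high accuracy.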

10.
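Particle-swarm-based fuzzy clustering algorithms that encode memberships are sensitive to noise and perform poorly on data sets in which the number of samples is smaller than the sample dimension. To address these problems, an improved particle-swarm-based fuzzy clustering method is proposed by improving the way the fuzzy clustering constraint is enforced. When a sample's memberships across the clusters do not sum to 1, the new method further redistributes the memberships obtained by particle swarm optimization according to the distances between the sample and each cluster, so that the memberships satisfy the fuzzy clustering constraint. The new method significantly improves fuzzy clustering with particle swarms under membership encoding, as verified on typical data sets.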

11.
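Based on fuzzy C-means (FCM) and locally adaptive clustering (LAC), an online locally adaptive fuzzy C-means clustering algorithm (OLAFCM) for high-dimensional data is proposed. By assigning local weights to the attributes of each cluster, OLAFCM lets each cluster lie in a subspace spanned by a different combination of attributes, which effectively reduces the information loss caused by global dimensionality reduction and also suits the clustering of data streams. Finally, experiments on synthetic and real data sets verify that OLAFCM outperforms existing partition-based online clustering algorithms that rely on global dimensionality reduction.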

12.
Clustering categorical data sets using tabu search techniques   Total citations: 2, self-citations: 0, citations by others: 2
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is best suited for implementing this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use because data sets often contain categorical values. In this paper, we present a tabu search based clustering algorithm to extend the k-means paradigm to categorical domains and to domains with both numeric and categorical values. Using tabu search based techniques, our algorithm can explore the solution space beyond local optimality in order to aim at finding a global solution of the fuzzy clustering problem. It is found that the clustering results produced by the proposed algorithm are very high in accuracy.

13.
The first stage of organizing objects is to partition them into groups or clusters. The clustering is generally done on individual object data representing the entities, such as feature vectors, or on object relational data incorporated in a proximity matrix. This paper describes another method for finding a fuzzy membership matrix that provides cluster membership values for all the objects based strictly on the proximity matrix. This is generally referred to as relational data clustering. The fuzzy membership matrix is found by first finding a set of vectors that have approximately the same inter-vector Euclidean distances as the proximities that are provided. These vectors can be of very low dimension, such as 5 or less. Fuzzy c-means (FCM) is then applied to these vectors to obtain a fuzzy membership matrix. In addition, two-dimensional vectors are also created to provide a visual representation of the proximity matrix. This allows comparison of the result of automatic clustering to visual clustering. The method proposed here is compared to other relational clustering methods including NERFCM, Roubens' method and Windham's A-P method. Various clustering quality indices are also calculated for doing the comparison using various proximity matrices as input. Simulations show the method to be very effective and no more computationally expensive than other relational data clustering methods. The membership matrices that are produced by the proposed method are less crisp than those produced by NERFCM and more representative of the proximity matrix that is used as input to the clustering process.

14.
This article presents a multi-objective genetic algorithm which considers the problem of data clustering. A given dataset is automatically assigned into a number of groups in appropriate fuzzy partitions through the fuzzy c-means method. This work has tried to exploit the advantage of fuzzy properties which provide capability to handle overlapping clusters. However, most fuzzy methods are based on compactness and/or separation measures which use only centroid information. The calculation from centroid information only may not be sufficient to differentiate the geometric structures of clusters. The overlap-separation measure using an aggregation operation of fuzzy membership degrees is better equipped to handle this drawback. As another key consideration, we need a mechanism to identify appropriate fuzzy clusters without prior knowledge of the number of clusters. Given this requirement, an optimization with a single criterion may not be feasible for different cluster shapes. A multi-objective genetic algorithm is therefore appropriate to search for fuzzy partitions in this situation. Apart from the overlap-separation measure, the well-known fuzzy Jm index is also optimized through genetic operations. The algorithm simultaneously optimizes the two criteria to search for optimal clustering solutions. A string of real-coded values is encoded to represent cluster centers. A number of strings with different lengths varied over a range correspond to variable numbers of clusters. These real-coded values are optimized and the Pareto solutions corresponding to a tradeoff between the two objectives are finally produced. As shown in the experiments, the approach provides promising solutions in well-separated, hyperspherical and overlapping clusters from synthetic and real-life data sets. This is demonstrated by the comparison with existing single-objective and multi-objective clustering techniques.
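For reference, the fuzzy Jm index mentioned in this abstract is usually written as the standard fuzzy c-means objective shown below. The symbols are the conventional ones and are assumed here rather than taken from the article: x_i are the n data points, v_j the c cluster centers, u_ij the membership of point i in cluster j, and m > 1 the fuzzifier.

```latex
J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{\,m} \, \lVert x_i - v_j \rVert^{2},
\qquad \text{subject to } \sum_{j=1}^{c} u_{ij} = 1 \ \text{ for every } i .
```

Smaller values of J_m indicate more compact fuzzy clusters, which is why the index depends only on distances to the centers and is paired here with a second, membership-based criterion.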

15.
Non-uniform granularity clustering analysis based on quotient space   Total citations: 4, self-citations: 0, citations by others: 4
徐峰  张铃 《计算机工程》2005,31(3):26-28,53
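Fuzzy granularity clustering in quotient spaces is discussed by means of a distance metric space, and clustering results are synthesized at different granularities using information fusion; it is argued that clustering can describe a sample set with non-uniform granularity. On this basis, a fuzzy clustering algorithm (the FCluster algorithm) is proposed that uses a Gaussian-type function to define the distance function of the quotient space. The algorithm represents information granularity by distance, does not require defining a membership function or computing a similarity matrix, and does not require discussion of parameter selection. Simulation experiments show that the algorithm allows clustering results to be observed intuitively at different granularities (distances), greatly reduces computational and space complexity, and is suited to large sample sets, and that the distance defined by the Gaussian-type function works well on the test samples.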

16.
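Clustering errors take two main forms: assigning data that belong to different classes to the same cluster, and assigning data that belong to the same class to different clusters. This paper proposes two indices, intra-cluster inconsistency and inter-cluster overlap, to measure the extent of these two kinds of error, respectively. A good fuzzy partition should contain as few clustering errors as possible, and at the same time the cluster compactness should be as high as possible. Based on these two error indices and the compactness measure, a validity function is proposed for judging the validity of fuzzy clustering. Experimental results show that the proposed validity function effectively determines the best number of clusters and is robust.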

17.
侯晓凡  吴成茂 《计算机科学》2016,43(10):297-303
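The fuzzy local C-means clustering algorithm has high computational complexity and is extremely time-consuming when clustering large sample sets. To address this, a fast fuzzy local C-means clustering segmentation algorithm is proposed. The algorithm introduces the co-occurrence matrix formed by a target pixel and its neighboring pixels into the fuzzy local C-means algorithm, yielding new expressions for the cluster memberships and cluster centers. When classifying pixels, the memberships of neighboring pixels are used for filtering, which further improves the noise resistance of the algorithm. Experimental results show that the algorithm meets the requirements of effective image segmentation; compared with the fuzzy local C-means clustering algorithm, it offers better segmentation performance and real-time behavior and better satisfies the needs of image segmentation in practice.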

18.
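A data clustering technique based on fuzzy logic is introduced, and the fuzzy C-means clustering method is discussed. The fuzzy C-means algorithm uses fuzzy logic theory and the idea of clustering to assign each of n samples to one of c classes, so that similarity among objects assigned to the same cluster is maximized and similarity between different clusters is minimized.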
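As a concrete illustration, here is a minimal sketch of the textbook fuzzy C-means iteration in plain Python. It alternates between updating the cluster centers and updating the memberships. The function name `fcm`, the default fuzzifier `m=2.0`, the random initialization, and the stopping rule are assumptions made for the example and are not taken from the article.

```python
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update from squared distances to the centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                  for j in range(c)]
            if min(d2) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if v == 0.0 else 0.0 for v in d2]
                new = [h / sum(hits) for h in hits]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

A call such as `centers, u = fcm(data, c=2)` returns soft memberships; a hard assignment to one of the c classes, as the abstract describes, can be read off by taking the largest entry in each row of `u`.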

19.
Wireless sensor networks have become increasingly popular because of their ability to cater to multifaceted applications without much human intervention. However, because of their distributed deployment, these networks face certain challenges, namely, network coverage, continuous connectivity and bandwidth utilization. All of these correlated issues impact the network performance because they define the energy consumption model of the network and have therefore become a crucial subject of study. Well-managed energy usage of nodes can lead to an extended network lifetime. One way to achieve this is through clustering. Clustering of nodes minimizes the amount of data transmission, routing delay and redundant data in the network, thereby conserving network energy. In addition to these advantages, clustering also makes the network scalable for real world applications. However, clustering algorithms require careful planning and design so that balanced and uniformly distributed clusters are created in a way that the network lifetime is enhanced. In this work, we extend our previous algorithm, titled the zone-based energy efficient routing protocol for mobile sensor networks (ZEEP). The algorithm we propose optimizes the clustering and cluster head selection of ZEEP by using a genetic fuzzy system. The two-step clustering process of our algorithm uses a fuzzy inference system in the first step to select optimal nodes that can be a cluster head based on parameters such as energy, distance, density and mobility. In the second step, we use a genetic algorithm to make a final choice of cluster heads from the nominated candidates proposed by the fuzzy system so that the optimal solution generated is a uniformly distributed balanced set of clusters that aim at an enhanced network lifetime. We also study the impact and dominance of mobility with regard to the variables. However, before we arrived at a GFS-based solution, we also studied fuzzy-based clustering using different membership functions, and we present our understanding on the same. Simulations were carried out in MATLAB and ns2. The results obtained are compared with ZEEP.
