Similar Articles
1.
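The classic hard clustering algorithm HCM (hard c-means) relies entirely on Euclidean distance and therefore copes poorly when clusters differ greatly in size. To address this, an influence factor is added to each Euclidean distance term, turning the distance-based criterion into a more general angle-based criterion (the HCMef algorithm). The algorithm was tested on two-class data in two-dimensional space in which both classes have essentially the same distribution density, the sample-size ratios are 1000:1000, 1000:5000 and 1000:10000, the classes are normally distributed, and the class boundaries range from fairly blurred to fairly sharp. The results show that HCMef recovers the preset reference values of the cluster centers well, holds a clear advantage in every setting, and is highly stable. This demonstrates the feasibility of the method for the two-dimensional, two-class case and suggests that it merits further study.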
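
The abstract does not give the HCMef update rules, so the following is only a minimal sketch under stated assumptions: a per-cluster influence factor w_j multiplies the squared Euclidean distance in the assignment step, and w_j is recomputed from the current cluster sizes as (n_j / mean size)^(-alpha). Both this functional form and the exponent `alpha` are hypothetical, and the sketch does not reproduce the angle-based criterion the paper derives; with `alpha = 0` it reduces to plain HCM.

```python
import math


def hcm_influence(points, centers, alpha=1.0, max_iter=100):
    """Hard c-means with a per-cluster influence factor (hypothetical form).

    Assignment: x -> argmin_j  w_j * ||x - c_j||^2, where
    w_j = (n_j / mean_size) ** (-alpha). Larger clusters get w_j < 1.
    alpha = 0 gives plain HCM.
    """
    k = len(centers)
    dim = len(points[0])
    centers = [tuple(c) for c in centers]
    weights = [1.0] * k
    labels = None
    for _ in range(max_iter):
        # Assignment step with the influence-weighted squared distance.
        new_labels = [
            min(range(k), key=lambda j: weights[j] * math.dist(p, centers[j]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        sizes = [0] * k
        sums = [[0.0] * dim for _ in range(k)]
        for p, j in zip(points, labels):
            sizes[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        centers = [
            tuple(s / sizes[j] for s in sums[j]) if sizes[j] else centers[j]
            for j in range(k)
        ]
        # Recompute the (hypothetical) influence factors from cluster sizes.
        mean_size = len(points) / k
        weights = [(n / mean_size) ** (-alpha) if n else 1.0 for n in sizes]
    return labels, centers
```

Shrinking w_j for a large cluster lowers its effective distance, so the decision boundary moves toward the small cluster; this counters HCM's tendency to hand the outer points of a large cluster to a small neighbor.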

2.
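Traditional clustering methods cannot correctly cluster classes of different sizes. To address this, a hard clustering method with influence factors is investigated. Each class is assigned an influence factor, so that a sample's membership depends not only on distance but also on class size. Experiments on 18 data sets demonstrate the feasibility of the method; the paper also examines how the value of the influence factor affects the convergence process and the final results, and outlines directions for future work.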

3.
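Subtractive clustering has high time complexity on large data sets. To address this, a subtractive clustering method based on Nyström approximation of density values is proposed; it is particularly suited to large data sets and greatly reduces the time complexity of subtractive clustering. Building on Nyström approximation theory and on the way classic subtractive clustering computes sample densities, the method applies the Nyström technique to approximate the density-weight matrix among the unsampled samples, thereby approximating the density values of all samples; it then follows the classic subtractive-clustering procedure for revising density values to complete the clustering. The algorithm was tested on synthetic data, standard color images and UCI data sets. The paper details how a small number of sampled points is used to approximate the density weights and density values of the majority of unsampled points and how subtractive clustering then proceeds, and reports clustering accuracy, running time and speedup. Experimental results show that, compared with classic subtractive clustering, the algorithm significantly reduces time complexity on larger data sets without affecting the clustering results, greatly improving real-time performance.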
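
A minimal sketch of the idea, assuming the classic subtractive-clustering density D_i = sum_j exp(-||x_i - x_j||^2 / (r_a/2)^2) and uniform random sampling of m landmark points. It uses the standard Nyström factorization K ≈ C W^-1 C^T of the whole kernel matrix, so D = K·1 ≈ C (W^-1 (C^T 1)); the paper's exact construction for the unsampled block may differ. The helper names and the tiny `ridge` term are hypothetical.

```python
import math
import random


def kernel(x, y, ra):
    """Subtractive-clustering weight exp(-||x - y||^2 / (ra / 2)^2)."""
    return math.exp(-4.0 * math.dist(x, y) ** 2 / ra ** 2)


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y


def nystrom_density(points, ra, m, seed=0, ridge=1e-10):
    """Approximate every density D_i = sum_j K(x_i, x_j) from m sampled points.

    Nystrom: K ~= C W^-1 C^T, hence D = K 1 ~= C (W^-1 (C^T 1)).
    Cost is O(n m + m^3) instead of O(n^2).
    """
    n = len(points)
    idx = random.Random(seed).sample(range(n), m)
    marks = [points[i] for i in idx]
    C = [[kernel(p, q, ra) for q in marks] for p in points]  # n x m
    W = [[kernel(p, q, ra) + (ridge if a == b else 0.0)      # m x m
          for b, q in enumerate(marks)] for a, p in enumerate(marks)]
    ct1 = [sum(C[i][j] for i in range(n)) for j in range(m)]  # C^T 1
    y = solve(W, ct1)
    return [sum(cij * yj for cij, yj in zip(row, y)) for row in C]
```

The first center is then the point with the largest approximate density, and the classic revision step D_i <- D_i - D_c * exp(-||x_i - x_c||^2 / (r_b/2)^2) proceeds unchanged. For sampled points the approximation is exact up to the ridge term, and with m = n it reproduces the full density vector.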

4.
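Density peaks clustering (DPC) is a novel density-based clustering algorithm with a simple principle and high efficiency. However, its local density considers only the distances between samples and ignores each sample's surroundings, so it clusters data with uneven density distributions poorly; moreover, its sample-assignment step is prone to a chain reaction of assignment errors. To address these problems, a density peaks clustering algorithm based on relative density estimation and multi-cluster merging (DPC-RD-MCM) is proposed. Combining K-nearest neighbors with the idea of relative density, DPC-RD-MCM defines a relative K-nearest-neighbor local density, which reduces the influence of cluster sparsity on the choice of cluster centers and prevents sparse regions from being left without a center. It also redefines the similarity measure between micro-clusters and obtains the final result through a multi-cluster merging strategy, avoiding the chain reaction of assignment errors. DPC-RD-MCM was compared with DPC and its improved variants on data sets with uneven density, data sets with complex shapes, and UCI data sets. The experimental results show that DPC-RD-MCM achieves excellent clustering on unevenly dense data and outperforms the compared algorithms on the complex-shape and UCI data sets.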
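
The following sketch shows the DPC decision quantities with a kNN-based, relative local density. The kNN density rho_i = exp(-(1/k) * sum of squared distances to the k nearest neighbors) is a form used in some kNN variants of DPC; dividing it by the mean rho of i's neighbors is an assumed stand-in for DPC-RD-MCM's relative density, whose formula the abstract does not give. Centers are the points with the largest gamma = relative density × delta; the micro-cluster merging stage is not shown.

```python
import math


def knn(points, k):
    """Indices of the k nearest neighbours of each point (brute force)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def dpc_relative(points, k, n_centers):
    n = len(points)
    nbrs = knn(points, k)
    # kNN local density: exp(-mean squared distance to the k neighbours).
    rho = [math.exp(-sum(math.dist(points[i], points[j]) ** 2 for j in nbrs[i]) / k)
           for i in range(n)]
    # Relative density (assumed form): own density over the mean density of its
    # kNN, so the core of a sparse cluster can still rank high.
    rel = [rho[i] / (sum(rho[j] for j in nbrs[i]) / k) for i in range(n)]
    # delta: distance to the nearest point with higher relative density
    # (for the global maximum, its largest distance to any point).
    delta = []
    for i in range(n):
        higher = [math.dist(points[i], points[j]) for j in range(n) if rel[j] > rel[i]]
        delta.append(min(higher) if higher else max(math.dist(points[i], q) for q in points))
    # Decision value gamma = rel * delta; the largest values become centers.
    gamma = [rel[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
    return rel, delta, centers
```

On a tight cluster plus a ten-times sparser one, the relative density lets the sparse cluster's core compete with the dense one, so each cluster receives a center.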

5.
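Most ensemble clustering algorithms generate their base clusterings with K-means, which yields base clusterings of limited quality. When base clusterings are combined through a co-association matrix, differences in their diversity are usually ignored and all base clusterings are treated equally; moreover, the co-association matrix is built with individual samples as the unit of operation, so the computational burden grows markedly when the number of samples or the ensemble size is large. To address these problems, an ensemble clustering algorithm with weighted super-clusters (ECWSC) is proposed. The algorithm obtains landmark points by combining random selection with K-means selection, applies spectral clustering to the landmarks, and then maps each sample to its nearest landmark to generate base clusterings. On this basis, it computes the uncertainty of each base clustering from information entropy, assigns corresponding weights, builds a weighted co-association matrix of super-clusters, and applies hierarchical clustering to this matrix to obtain the ensemble result. Seven real and four synthetic data sets were used to evaluate accuracy, robustness and time complexity. Comparative experiments show that the algorithm effectively improves ensemble clustering performance.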
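
A minimal sketch of entropy-based weighting of a co-association matrix, assuming the base clusterings are given as label lists. Each cluster's uncertainty is taken as its average entropy with respect to the other base clusterings and mapped to a weight exp(-H/theta); this mapping and `theta` are hypothetical, not the ECWSC formulas. For clarity the sketch builds a per-sample n × n matrix, which is exactly the cost ECWSC avoids by working on super-clusters; landmark spectral clustering and the final hierarchical clustering are omitted.

```python
import math
from collections import Counter


def cluster_entropy(members, other_labels):
    """Entropy (bits) of how one cluster's members are split by another clustering."""
    counts = Counter(other_labels[i] for i in members)
    size = len(members)
    return -sum(c / size * math.log2(c / size) for c in counts.values())


def weighted_coassociation(base, theta=0.5):
    """base: M label lists over the same n samples -> n x n weighted co-association."""
    n, M = len(base[0]), len(base)
    A = [[0.0] * n for _ in range(n)]
    for m, labels in enumerate(base):
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)
        for members in clusters.values():
            # Uncertainty: mean entropy of this cluster w.r.t. the other clusterings.
            others = [o for o in range(M) if o != m]
            h = sum(cluster_entropy(members, base[o]) for o in others) / max(len(others), 1)
            w = math.exp(-h / theta)  # hypothetical: less uncertain -> larger weight
            for a in members:
                for b in members:
                    A[a][b] += w / M
    return A
```

A cluster that every other base clustering agrees with keeps weight 1, while one that the others split evenly is down-weighted; 1 - A[i][j] can then serve as the distance for an agglomerative clustering step.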

6.
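Data with uneven density distribution are data whose clusters differ in how densely their samples are distributed. On such data, density peaks clustering (DPC) tends to find cluster centers in high-density regions and readily assigns samples of sparse clusters to dense ones. To avoid these shortcomings, a nearest-neighbor-optimized density peaks clustering algorithm for unevenly dense data (DPC-NNO) is proposed. DPC-NNO combines reverse nearest neighbors and k-nearest neighbors to define a new local density that raises the local density of sparse samples, so that cluster centers are located more accurately. Its assignment strategy introduces shared nearest neighbors to compute sample similarities and build a similarity matrix, which ties samples of the same cluster more closely together and avoids misassignment. DPC-NNO was compared with IDPC-FA, DPCSA, FNDPC, FKNN-DPC and DPC. The experimental results show that DPC-NNO achieves excellent clustering on unevenly dense data and has better overall performance than the compared algorithms on complex and UCI data sets.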
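
The two neighborhood quantities named above can be sketched as follows: the reverse-k-nearest-neighbor count |RkNN(i)| (how many points list i among their own k nearest neighbors) and the shared-nearest-neighbor similarity |kNN(i) ∩ kNN(j)|. How DPC-NNO combines them into a density is not stated in the abstract; `density` below is a hypothetical combination, (1 + |RkNN(i)|) divided by the mean kNN distance, chosen only to show how reverse neighbors can lift a point that many others point to.

```python
import math


def knn(points, k):
    """k nearest neighbours of each point, by brute force (self excluded)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))[:k]
            for i in range(n)]


def reverse_counts(nbrs):
    """|RkNN(i)|: how many points list i among their own k nearest neighbours."""
    counts = [0] * len(nbrs)
    for row in nbrs:
        for j in row:
            counts[j] += 1
    return counts


def density(points, nbrs):
    """Hypothetical combined density: (1 + |RkNN(i)|) / mean kNN distance."""
    rc = reverse_counts(nbrs)
    out = []
    for i, row in enumerate(nbrs):
        mean_d = sum(math.dist(points[i], points[j]) for j in row) / len(row)
        out.append((1 + rc[i]) / max(mean_d, 1e-12))
    return out


def snn(nbrs, i, j):
    """Shared-nearest-neighbour similarity: size of the kNN overlap."""
    return len(set(nbrs[i]) & set(nbrs[j]))
```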

7.
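K-medoids clustering is sensitive to its initial cluster centers, and its results depend on them. To address this, a K-medoids algorithm with local-variance optimization is proposed, which aims to place the initial centers in different dense regions of the data so that the clustering converges as closely as possible to the global optimum. The algorithm introduces the concept of local variance, defined for each sample from the distribution of the samples around it; using the sample's local standard deviation as the neighborhood radius, it selects as initial K-medoids centers the samples that have the smallest local variance and lie in different regions, making full use of the distribution information that variance provides. Experiments on UCI data sets of various sizes and on synthetic data sets of various sizes with different proportions of noise, evaluated with six clustering performance indices, show that the algorithm clusters well, is robust to noise, and is suitable for large data sets. The proposed Num-nearest-neighbor variance-optimized K-medoids algorithm outperforms the fast K-medoids algorithm and the neighborhood-based improved K-medoids algorithm.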

8.
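The fast K-medoids algorithm suffers from complex, time-consuming density computation and from initial centers that may fall in the same cluster, while the neighborhood-based K-medoids algorithm requires a manually specified adjustment coefficient for its neighborhood radius, which makes it subjective. To address these shortcomings, two K-medoids algorithms with variance-optimized initial centers are proposed. They use, respectively, the mean pairwise distance between samples and the standard deviation of the corresponding sample as the neighborhood radius, use variance as the measure of how densely samples are distributed, and select as initial K-medoids centers the samples with the smallest variance whose mutual distances are no smaller than the neighborhood radius. Experiments on UCI and synthetic data sets, comparing a range of clustering indices, show that the algorithms need little clustering time, produce good clustering results, and are suitable for fairly large data sets.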
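
The initialization described above can be sketched as follows. The neighborhood radius R is the mean pairwise distance, as in the first of the two variants; the abstract does not define a sample's "variance", so the sketch assumes the mean squared distance from the sample to its neighbors within R (smaller means denser). After initialization it runs the simple alternating (Voronoi-iteration) form of K-medoids rather than PAM swaps; both choices are assumptions.

```python
import math
from itertools import combinations


def variance_init(points, k):
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]
    # Neighbourhood radius: mean of all pairwise distances.
    R = sum(D[i][j] for i, j in combinations(range(n), 2)) / (n * (n - 1) / 2)
    # Assumed local "variance": mean squared distance to neighbours within R.
    var = []
    for i in range(n):
        nb = [D[i][j] ** 2 for j in range(n) if j != i and D[i][j] <= R]
        var.append(sum(nb) / len(nb) if nb else math.inf)
    # Smallest variance first; accept a sample only if it is >= R from all chosen.
    centers = []
    for i in sorted(range(n), key=lambda s: var[s]):
        if all(D[i][c] >= R for c in centers):
            centers.append(i)
            if len(centers) == k:
                break
    return centers, D


def kmedoids(points, k, max_iter=100):
    medoids, D = variance_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest medoid.
        labels = [min(range(len(medoids)), key=lambda m: D[i][medoids[m]])
                  for i in range(len(points))]
        # New medoid: the member with the smallest total distance to its cluster.
        new = []
        for m in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == m]
            new.append(min(members, key=lambda c: sum(D[c][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, labels
```

Because the second center must lie at least R from the first, the two initial medoids cannot come from the same dense region when the clusters are well separated.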

9.
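Traditional collaborative filtering suffers from data sparsity and ignores both users' temporal context and how strongly they prefer the items they are interested in. To address this, this paper proposes a collaborative filtering recommendation algorithm based on spectral clustering and multi-factor fusion. First, FCM clustering is incorporated into a key step of spectral clustering, and the number of user clusters is optimized with a cluster validity index to reduce the time needed to find nearest neighbors. Next, a Salton factor, a time-decay factor and a user-preference factor are fused to improve the similarity measure. Finally, the system's current time is used to generate a recommendation list for the target user. Experiments on MovieLens show that the proposed algorithm substantially improves precision, coverage and novelty, raising recommendation performance.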
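
The abstract names three factors but not their formulas. The sketch below is a hypothetical fusion for two users: a Salton factor |I_u ∩ I_v| / sqrt(|I_u|·|I_v|) on the sets of rated items, a time-decay weight exp(-decay·|t_ui - t_vi|) on each co-rated item, and a user-preference factor taken as each rating divided by that user's mean rating, all combined in a weighted cosine. The parameter `decay` and every functional form here are assumptions; the spectral/FCM clustering step that limits the neighbor search is not shown.

```python
import math


def fused_similarity(ru, rv, tu, tv, decay=0.1):
    """ru/rv: item -> rating; tu/tv: item -> timestamp (e.g. in days)."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    # Salton factor: overlap of the two rated-item sets.
    salton = len(common) / math.sqrt(len(ru) * len(rv))
    # User-preference factor (hypothetical): rating relative to the user's mean.
    mu = sum(ru.values()) / len(ru)
    mv = sum(rv.values()) / len(rv)
    num = du = dv = 0.0
    for i in common:
        # Time-decay factor: co-ratings made far apart in time count less.
        w = math.exp(-decay * abs(tu[i] - tv[i]))
        a, b = ru[i] / mu, rv[i] / mv
        num += w * a * b
        du += w * a * a
        dv += w * b * b
    return salton * num / math.sqrt(du * dv)
```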

10.
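In GEP-Cluster, an automatic clustering algorithm based on gene expression programming (GEP), two key steps are inefficient: screening and merging cluster centers, and computing the distance from each data object to each cluster center. To address this, an improved automatic clustering algorithm based on the Compute Unified Device Architecture (CUDA) and GEP, CGEP-Cluster, is proposed. CGEP-Cluster improves the center screening and merging steps of GEP-Cluster with a gene-reading operator and parallelizes on CUDA the computation of distances from data objects to cluster centers. Experimental results show that, when the number of data objects is large, CGEP-Cluster achieves a speedup of about 8x. CGEP-Cluster is suitable for automatic clustering when the number of clusters is unknown and the number of data objects is large.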

11.
Novel Cluster Validity Index for FCM Algorithm   Total citations: 5 (self-citations: 0, by others: 5)
Determining an appropriate number of clusters is very important when applying a clustering algorithm such as c-means or fuzzy c-means (FCM). In the literature, most cluster validity indices are derived from the partition or from geometrical properties of the data set. In this paper, the authors develop a novel cluster validity index for FCM based on the optimality test of FCM. Unlike previous cluster validity indices, this index is inherent in FCM itself. Comparison experiments show that the stability index can be used as a cluster validity index for fuzzy c-means.
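
The abstract does not state the formula of its optimality-test-based index, so the sketch below does not implement it. It shows standard FCM (membership and center updates with fuzzifier m) together with a different, classical validity index, Bezdek's partition coefficient PC = (1/n) sum_i sum_k u_ik^2, to illustrate the general workflow of choosing the number of clusters: run FCM for several values of c and compare the index values. The random initialization from data points is an assumption.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, U) with U[i][k] = u_ik."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)
    dim = len(points[0])
    U = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        U = []
        for i in range(c):
            row = []
            for x in points:
                di = max(math.dist(x, centers[i]), 1e-12)
                s = sum((di / max(math.dist(x, cj), 1e-12)) ** (2 / (m - 1))
                        for cj in centers)
                row.append(1.0 / s)
            U.append(row)
        # Center update: mean of the data weighted by u_ik ** m.
        new = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            tw = sum(w)
            new.append(tuple(sum(wk * x[d] for wk, x in zip(w, points)) / tw
                             for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient (1/n) sum_i sum_k u_ik^2, in [1/c, 1]."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n
```

For example, `max(range(2, 6), key=lambda c: partition_coefficient(fcm(data, c)[1]))` selects the c with the highest PC. PC tends to decrease as c grows, so it is usually read together with other indices.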

12.
Clustering with a genetically optimized approach   Total citations: 5 (self-citations: 0, by others: 5)
This paper describes a genetically guided approach to optimizing the hard (J1) and fuzzy (Jm) c-means functionals used in cluster analysis. Our experiments show that a genetic algorithm (GA) can ameliorate the difficulty of choosing an initialization for the c-means clustering algorithms. Experiments use six data sets, including the Iris data, magnetic resonance images and color images. The GA approach is generally able to find the lowest known Jm value or a Jm associated with a partition very similar to that associated with the lowest Jm value. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can be represented in terms of cluster centers. In some domains, a series of random initializations of fuzzy/hard c-means, keeping the partition with the lowest Jm value, can produce a solution equivalent to that of the genetically guided approach given the same amount of processor time.

13.
In this paper we propose a new metric to replace the Euclidean norm in c-means clustering procedures. On the basis of robust statistics and the influence function, we claim that the proposed metric is more robust than the Euclidean norm. We then create two new clustering methods, the alternative hard c-means (AHCM) and alternative fuzzy c-means (AFCM) clustering algorithms. These alternative types of c-means clustering are more robust than c-means clustering. Numerical results show that AHCM performs better than HCM and AFCM better than FCM. We recommend AFCM for use in cluster analysis. Recently, this AFCM algorithm has been used successfully to segment ophthalmological magnetic resonance images, differentiating abnormal tissues from normal tissues.
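
The abstract does not spell out the metric. The sketch assumes the form under which AFCM is usually presented: d(x, c)^2 = 1 - exp(-beta·||x - c||^2), with beta the inverse of the mean squared distance of the data to their sample mean. Because each term is bounded by 1, a far-away outlier contributes at most 1 to the objective, and its weight exp(-beta·||x - c||^2) in the center update is close to 0.

```python
import math


def afcm(points, centers, m=2.0, max_iter=200, tol=1e-6):
    """Alternative fuzzy c-means under the assumed exp-based metric."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # beta: inverse of the mean squared distance to the sample mean.
    beta = n / sum(math.dist(p, mean) ** 2 for p in points)
    centers = [tuple(c) for c in centers]
    c = len(centers)
    for _ in range(max_iter):
        # Bounded "squared distance" 1 - exp(-beta ||x - c||^2), in [0, 1).
        D = [[max(1 - math.exp(-beta * math.dist(x, ctr) ** 2), 1e-12) for x in points]
             for ctr in centers]
        # FCM-style memberships on the new distance (already a squared quantity).
        U = [[1.0 / sum((D[i][k] / D[j][k]) ** (1 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]
        new = []
        for i, ctr in enumerate(centers):
            # Each point is also down-weighted by exp(-beta d^2): outliers barely count.
            w = [U[i][k] ** m * math.exp(-beta * math.dist(points[k], ctr) ** 2)
                 for k in range(n)]
            tw = sum(w)
            new.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / tw
                             for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers
```

With four points around (0.5, 0.5) and one outlier at (100, 100), the plain mean sits at (20.4, 20.4), while this center, started from the mean, settles near (0.55, 0.55).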

14.
A generalized hybrid unsupervised learning algorithm, termed rough-fuzzy possibilistic c-means (RFPCM), is proposed in this paper. It judiciously integrates the principles of rough and fuzzy sets. While the lower and upper approximations of rough sets deal with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. RFPCM incorporates both probabilistic and possibilistic memberships simultaneously to avoid the noise sensitivity of fuzzy c-means and the coincident clusters of PCM. The concept of a crisp lower bound and a fuzzy boundary of a class, introduced in RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of c-means algorithms can be derived from it as special cases. Several quantitative indices based on rough sets are introduced to evaluate the performance of the proposed c-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on a set of real-life data sets.

15.
In this study, hard k-means and fuzzy c-means algorithms are used to classify fine-grained soils in terms of shear strength and plasticity index parameters. To collect data, laboratory tests were performed on 120 undisturbed soil samples obtained from the Antalya region. In addition, to evaluate the generalization ability of the clustering analysis, 20 fine-grained soil samples collected from other regions of Turkey were classified with the same clustering algorithms. The fuzzy c-means algorithm exhibited better clustering performance than the hard k-means classifier. As expected, clustering analysis produced worse results for soils collected from different regions than for those obtained from a specific region. In addition to its precise classification ability, the fuzzy c-means approach can also handle the uncertainty in soil parameters. As a result, fuzzy c-means clustering can be successfully applied to classify regional fine-grained soils on the basis of shear strength and plasticity index parameters.

16.
Image Segmentation Based on Adaptive Cluster Prototype Estimation   Total citations: 8 (self-citations: 0, by others: 8)
An image segmentation algorithm based on adaptive fuzzy c-means (FCM) clustering is presented in this paper. In the conventional FCM clustering algorithm, cluster assignment is based solely on the distribution of pixel attributes in the feature space and does not take into account the spatial distribution of pixels in an image. By introducing a novel dissimilarity index into the modified FCM objective function, the new adaptive fuzzy clustering algorithm can use local contextual information to impose local spatial continuity, thus exploiting the high inter-pixel correlation inherent in most real-world images. Incorporating local spatial continuity suppresses noise and helps resolve classification ambiguity. To account for smooth intensity variation within each homogeneous region of an image, a multiplicative field is introduced into each of the fixed FCM cluster prototypes. The multiplicative field makes each fixed cluster prototype adaptive to slow, smooth within-cluster intensity variation, so that homogeneous regions with such variation can be segmented as a whole. Experimental results on synthetic and real color images show the effectiveness of the proposed algorithm.

17.
Clustering of symbolic data is a necessary process in many scientific disciplines, and fuzzy c-means clustering for interval data (IFCM) is one of the most popular algorithms for it. This paper presents an adaptive fuzzy c-means clustering algorithm for interval-valued data based on an interval-dividing technique. The method yields a fuzzy partition and a prototype for each fuzzy cluster by optimizing an objective function. The adaptive distance between a pattern and its cluster center changes at each iteration and may either differ from one cluster to another or be the same for all clusters. The novelty of this approach is that it takes into account every point in both intervals when computing the distance between a cluster and its representative. Experiments are conducted on synthetic data sets and a real data set. To compare the overall performance of the proposed method with four other existing methods, the corrected Rand index, the objective-function value and the number of iterations are used as evaluation criteria. The clustering results demonstrate that the proposed algorithm has remarkable advantages.

18.
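Classic fuzzy c-means clustering performs poorly on data sets whose clusters are non-spherical or ellipsoidal. Replacing the Euclidean distance in classic fuzzy c-means with the Mahalanobis distance, and exploiting the advantages of the Mahalanobis distance in incremental learning, a fuzzy incremental clustering learning algorithm based on the Mahalanobis distance is proposed. Experimental results show that the algorithm effectively overcomes these shortcomings of fuzzy clustering and improves training accuracy.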
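
A minimal sketch of the distance substitution described above, for 2-D data: each cluster gets a weighted covariance matrix S (in fuzzy clustering the weights would be u_ik^m), and ||x - c||^2 in FCM is replaced by (x - c)^T S^-1 (x - c). The incremental-learning part of the algorithm is not shown, and using the plain inverse covariance, without the determinant normalization that the Gustafson-Kessel algorithm applies, is an assumption of this sketch.

```python
def weighted_cov_2d(points, w):
    """Weighted mean and 2x2 covariance; in fuzzy clustering w_k = u_ik ** m."""
    tw = sum(w)
    mx = sum(wk * p[0] for wk, p in zip(w, points)) / tw
    my = sum(wk * p[1] for wk, p in zip(w, points)) / tw
    sxx = sum(wk * (p[0] - mx) ** 2 for wk, p in zip(w, points)) / tw
    syy = sum(wk * (p[1] - my) ** 2 for wk, p in zip(w, points)) / tw
    sxy = sum(wk * (p[0] - mx) * (p[1] - my) for wk, p in zip(w, points)) / tw
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(x, mean, cov):
    """(x - mean)^T cov^-1 (x - mean) for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # cov^-1 = [[d, -b], [-c, a]] / det
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
```

For a cluster stretched along the x-axis, (2, 0) and (0, 1) are at squared Euclidean distances 4 and 1 from the center but at the same Mahalanobis distance, which is what lets the fuzzy partition follow ellipsoidal clusters.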

19.
This article describes a multiobjective spatial fuzzy clustering algorithm for image segmentation. To obtain satisfactory segmentation performance on noisy images, the proposed method introduces non-local spatial information derived from the image into fitness functions that consider the global fuzzy compactness and the fuzzy separation among the clusters, respectively. After the set of non-dominated solutions is produced, the final clustering solution is chosen by a cluster validity index that uses the non-local spatial information. Moreover, to evolve the number of clusters automatically, a real-coded variable-string-length technique is used to encode the cluster centers in the chromosomes. The proposed method is applied to synthetic and real images contaminated by noise and compared with k-means, fuzzy c-means, two fuzzy c-means clustering algorithms with spatial information, and a multiobjective variable-string-length genetic fuzzy clustering algorithm. The experimental results show that the proposed method evolves the number of clusters well and achieves satisfactory performance in noisy image segmentation.

20.
Text Clustering Based on the Genetic FCM Algorithm   Total citations: 4 (self-citations: 1, by others: 3)
况夯 (Kuang Hang), 罗军 (Luo Jun). Journal of Computer Applications (《计算机应用》), 2009, 29(2): 558-560
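This paper proposes a text clustering method based on the genetic FCM algorithm. LSI is first used to reduce the dimensionality of the text features; cluster validity analysis then determines the number of text categories; finally, the texts are clustered with the genetic FCM algorithm. The method overcomes FCM's tendency to converge to local optima and its sensitivity to initial values. Experiments show that the proposed method achieves good clustering performance.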

