Similar literature
 Found 19 similar articles; search time 156 ms
1.
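A survey of K-Means clustering algorithms   Total citations: 2; self-citations: 0; citations by others: 2
The K-Means algorithm is a partition-based method in cluster analysis and an unsupervised learning algorithm. It is conceptually simple, effective, and easy to implement, and is widely used in machine learning and related fields. However, K-Means also has limitations, for example: the number of clusters K is hard to determine, the initial cluster centers must be chosen, outliers must be detected and removed, and a distance or similarity measure must be selected. This survey summarizes improvements to K-Means from several perspectives, compares them with the traditional K-Means algorithm, analyzes the strengths and weaknesses of the improved algorithms, and points out open problems. It closes with an outlook on development directions and trends for K-Means.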
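
For orientation, the traditional K-Means procedure that the survey takes as its baseline can be sketched as follows. This is a minimal illustration and not code from the surveyed paper: the function name `kmeans`, the random choice of initial centers, and the toy data are all assumptions made for the example. The random initialization and the fixed `k` are exactly the sensitive points the abstract lists.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initial centers: k distinct points sampled at random (assumed scheme).
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
```

On data this well separated the two groups are recovered whichever points are drawn as initial centers; on harder data the result can change with the seed, which is why initialization is a recurring research topic.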

2.
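Application of a dynamic K-means clustering algorithm to image retrieval   Total citations: 2; self-citations: 2; citations by others: 2
Cluster analysis has been widely applied to content-based image information mining, where it improves the speed and quality of image retrieval. K-means and the adaptive algorithm are two typical clustering algorithms, but K-means depends heavily on empirically set parameters and thresholds, while the adaptive algorithm produces too many clusters, so each cluster contains too few images and efficiency is low. Considering whether the initial cluster points are chosen deterministically, whether the number of iterations is excessive, and whether the number of clusters is appropriate, a new clustering algorithm, dynamic K-means, is proposed. Simulation results show that the algorithm achieves good accuracy and efficiency and substantially improves retrieval quality and speed.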

3.
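Cluster analysis is an unsupervised machine learning method. The clustering result depends entirely on the algorithm used, and different algorithms yield different results, so choosing a suitable algorithm for the data to be mined is important. Judging which algorithm is most suitable, or which result is best, requires clustering evaluation methods. This paper applies classic algorithms from each category of clustering algorithms to customer consumption data from an automobile 4S dealership, then evaluates the clustering results with two evaluation indices in order to select the best clustering algorithm.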

4.
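This paper analyzes the shortcomings of K-means clustering in image retrieval and proposes an image retrieval method based on an improved K-means clustering algorithm. It first computes the Euclidean distances between all color histogram features in the image feature database; then, following the principle that "the closer two objects are, the greater their similarity" [1], it finds feature vectors that satisfy the condition and uses them as the initial centroids for K-means clustering; finally, it performs image retrieval. Experimental results show that the algorithm achieves high retrieval accuracy.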

5.
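Liu Qiang, Xia Shixiong, Zhou Yong, Liu Bing. 《计算机应用研究》 (Application Research of Computers), 2011, 28(12): 4437-4439
Fuzzy clustering is a widely used unsupervised method for data analysis and modeling, but it is strongly affected by outliers and does not account for the differing contributions of individual feature dimensions to the clustering. To address these two problems, a clustering algorithm based on two kinds of weighting is proposed. It defines a new notion of sample weighting that reduces the interference of outliers, and assigns a weight to each feature dimension of the data samples to make the clustering more accurate. Simulation results verify the algorithm...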

6.
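To address the need for large amounts of labeled data when convolutional neural networks are applied to image classification, an unsupervised classification model combining convolutional neural networks and cluster analysis is proposed. It introduces an unsupervised algorithm into deep learning and applies the model to image classification to compensate for the shortcomings of existing classification approaches. First, the classic convolutional neural network AlexNet is optimized in terms of network structure and model training. Then an improved adaptive fast peak clustering algorithm guides the clustering process: the model clusters the convolutional output features while learning the parameters of the whole network, and the two processes alternate iteratively to achieve unsupervised image classification. To verify the feasibility and effectiveness of the model, classification experiments were run on four datasets commonly used in image classification, and the results were compared with several algorithms that have performed relatively well on unsupervised image classification in recent years. The results show that the proposed model outperforms these existing unsupervised methods on the different datasets.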

7.
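With the continuing development of satellite remote sensing, content-based retrieval of remote sensing images is attracting growing attention. Current research in this direction concentrates on extracting and fusing different features of remote sensing images, and these methods generally overlook one fact: different types of retrieval targets call for different features. The small-sample problem is another prominent issue in remote sensing image retrieval. With both considerations in mind, this paper proposes a new remote sensing image retrieval method based on feature selection and semi-supervised learning. The method has four main parts: 1) the number of clusters is determined automatically with the minimum description length criterion; 2) the features that best represent the retrieval target are selected by combining a clustering method with a suitable cluster validity index, and the original Davies-Bouldin index is modified to suit remote sensing image retrieval when the validity index is computed; 3) the weights between the optimal color feature and the optimal texture feature are determined dynamically; 4) the semi-supervised learning method is determined automatically from the weights of the optimal color and texture features, and the remote sensing images are then retrieved. Experimental results show that, compared with relevance feedback methods, the algorithm achieves similar retrieval performance for soil erosion areas and for other general land-cover targets, without requiring repeated user feedback.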
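
As background for the cluster validity index mentioned in part 2), the standard Davies-Bouldin index can be sketched as below. The abstract does not describe the paper's modified version, so this shows only the textbook form; the helper names and the toy partitions are assumptions for illustration. Lower values indicate compact, well-separated clusters.

```python
import math


def centroid(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def davies_bouldin(clusters):
    """Standard Davies-Bouldin index for a list of clusters (lists of tuples).

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k


# Hypothetical partitions of the same four points.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

Here `db_tight` is smaller than `db_loose`, reflecting that the first partition groups nearby points together while the second splits them across clusters.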

8.
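Zhang Yonglai, Zhou Yaojian. 《计算机应用》 (Journal of Computer Applications), 2019, 39(7): 1869-1882
In the era of big data, clustering as an unsupervised learning approach is especially prominent, and research on clustering algorithms has made considerable progress in recent years. This paper first summarizes the overall process of cluster analysis, similarity measures, a new taxonomy of clustering algorithms, and the evaluation of clustering results; it re-divides clustering algorithms into two broad classes, big-data clustering and small-data clustering, and gives a fairly systematic analysis and summary of big-data clustering in particular. It also surveys and analyzes research progress and applications of the various clustering algorithms, and discusses development trends in light of the research topic.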

9.
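K-means clustering is a widely used method. This paper proposes an improved algorithm based on K-means clustering and applies it to image segmentation. To counter the over-sensitivity of K-means to outliers, the method replaces center points and compares cost functions to improve the partition. Experimental results show that the method effectively improves the cluster centers and increases classification precision and accuracy.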
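
The idea of replacing center points and comparing cost functions resembles the swap step of PAM-style K-medoids. The sketch below illustrates that general technique only; the abstract does not give the paper's actual procedure, and the names `total_cost` and `swap_improve` as well as the toy data are hypothetical.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def swap_improve(points, medoids):
    """Greedy PAM-style refinement: accept any swap that lowers the cost."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with the candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best


# Hypothetical 1-D-like data with an outlying point at x = 20.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (20.0, 0.0)]
start = [(0.0, 0.0), (20.0, 0.0)]  # poor start: one center sits on the outlier
medoids, cost = swap_improve(points, start)
```

Because centers are restricted to actual data points and a swap is kept only when the total cost drops, a single distant point pulls the centers less than it would pull a mean. The search is greedy, so it reaches a local optimum rather than a guaranteed global one.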

10.
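Li Le, Wang Fei. 《计算机应用研究》 (Application Research of Computers), 2021, 38(5): 1387-1392
Existing K-means-based semi-supervised clustering algorithms share common problems: they are sensitive to outliers and perform poorly on non-convex and imbalanced datasets. To address these, a scattered-seed semi-supervised center-based clustering algorithm built on a hierarchical strategy is proposed. First, a sample edge factor based on the influence space divides the dataset into a core layer and an edge layer; next, an improved K-medoids algorithm clusters the core layer; finally, a progressive semi-supervised assignment strategy assigns the edge-layer samples to obtain the final clustering. The hierarchical strategy removes the interference of outliers, and the semi-supervised sub-cluster clustering and merging strategy enables effective clustering on datasets with different distributions. Comparative experiments against several semi-supervised clustering methods on synthetic and real datasets show that the algorithm resolves these problems and improves clustering performance and robustness.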

11.
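An effective improved clustering algorithm for case retrieval   Total citations: 8; self-citations: 0; citations by others: 8
To address the rapid loss of efficiency of traditional case retrieval algorithms as the number of cases grows, an effective unsupervised clustering learning algorithm is presented that combines the advantages of the selection-based CLARA clustering method and the NCL clustering algorithm. Experiments show that the algorithm can classify cases accurately without supervision, and that using it for case retrieval in CBR greatly improves retrieval speed and quality.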

12.
Clustering high dimensional data has become a challenge in data mining due to the curse of dimensionality. To solve this problem, subspace clustering has been defined as an extension of traditional clustering that seeks to find clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates the local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted by using a new unsupervised weighting method, as a means to minimize a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. For the purposes of capturing accurate subspace information, an additional outlier detection process is presented to identify the possible local outliers of subspace clusters, and is embedded between the E-step and M-step of the algorithm. The method has been evaluated in clustering real-world gene expression data and high dimensional artificial data with outliers, and the experimental results have shown its effectiveness.

13.
This paper introduces a new outlier detection approach and discusses and extends a new concept, class separation through variance. We show that even for balanced and concentric classes differing only in variance, accumulating information about the outlierness of points in multiple subspaces leads to a ranking in which the classes naturally tend to separate. Exploiting this leads to a highly effective and efficient unsupervised class separation approach. Unlike typical outlier detection algorithms, this method can be applied beyond the ‘rare classes’ case with great success. The new algorithm FASTOUT introduces a number of novel features. It employs sampling of subspaces points and is highly efficient. It handles arbitrarily sized subspaces and converges to an optimal subspace size through the use of an objective function. In addition, two approaches are presented for automatically deriving the class of the data points from the ranking. Experiments show that FASTOUT typically outperforms other state-of-the-art outlier detection methods on high-dimensional data such as Feature Bagging, SOE1, LOF, ORCA and Robust Mahalanobis Distance, and competes even with the leading supervised classification methods for separating classes.

14.
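Image retrieval by clustering on combined color and shape features   Total citations: 1; self-citations: 0; citations by others: 1
Zhang Yongku, Li Yunfeng, Sun Jinguang. 《计算机应用》 (Journal of Computer Applications), 2014, 34(12): 3549-3553
To improve the speed and accuracy of image retrieval, a new partition-clustering image retrieval method is proposed after analyzing the shortcomings of various clustering algorithms in image retrieval. First, the HSV model is quantized non-uniformly and an improved color coherence vector method extracts the color features of the image; then the global shape features are extracted with improved Hu invariant moments; finally, the images are clustered by contribution degree on the combined color and shape features, and a feature index library is built. Retrieval with this method was carried out on the Corel image library. Experimental results show that, compared with an image retrieval algorithm based on improved K-means, the proposed algorithm achieves considerably higher precision and recall.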

15.
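FCM image segmentation algorithm based on Mahalanobis distance   Total citations: 1; self-citations: 1; citations by others: 0
Image segmentation based on fuzzy C-means (FCM) clustering is one of the more widely used approaches, but most fuzzy C-means methods rely on Euclidean distance and suffer from problems such as long running times. A fuzzy C-means image segmentation algorithm based on Mahalanobis distance is proposed. Experimental analysis shows that the proposed algorithm markedly improves segmentation speed while maintaining segmentation quality, which demonstrates the effectiveness of the method.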
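
To illustrate how Mahalanobis distance differs from Euclidean distance, the sketch below computes it for 2-D points from a cluster's sample covariance. It is not the paper's segmentation algorithm and contains no FCM membership update; the function names and the data are assumptions for the example.

```python
import math


def mean_and_cov(points):
    """Mean vector and 2x2 sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from mean under a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (non-degenerate cluster)
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse of [[sxx, sxy], [sxy, syy]].
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Hypothetical cluster stretched along the x-axis.
cluster = [(-4.0, 0.0), (4.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
mean, cov = mean_and_cov(cluster)
d_along = mahalanobis((4.0, 0.0), mean, cov)   # along the long axis
d_across = mahalanobis((0.0, 4.0), mean, cov)  # across the short axis
```

Both query points lie at Euclidean distance 4 from the mean, yet `d_across` is larger than `d_along` because the cluster varies little in the y direction. This sensitivity to cluster shape is the usual motivation for replacing Euclidean distance in clustering.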

16.
The performance of many supervised and unsupervised learning algorithms is very sensitive to the choice of an appropriate distance metric. Previous work in metric learning and adaptation has mostly been focused on classification tasks by making use of class label information. In standard clustering tasks, however, class label information is not available. In order to adapt the metric to improve the clustering results, some background knowledge or side information is needed. One useful type of side information is in the form of pairwise similarity or dissimilarity information. Recently, some novel methods (e.g., the parametric method proposed by Xing et al.) for learning global metrics based on pairwise side information have been shown to demonstrate promising results. In this paper, we propose a nonparametric method, called relaxational metric adaptation (RMA), for the same metric adaptation problem. While RMA is local in the sense that it allows locally adaptive metrics, it is also global because even patterns not in the vicinity can have long-range effects on the metric adaptation process. Experimental results for semi-supervised clustering based on both simulated and real-world data sets show that RMA outperforms Xing et al.'s method under most situations. Besides applying RMA to semi-supervised learning, we have also used it to improve the performance of content-based image retrieval systems through metric adaptation. Experimental results based on two real-world image databases show that RMA significantly outperforms other methods in improving the image retrieval performance.

17.
An unsupervised competitive learning algorithm based on the classical k-means clustering algorithm is proposed. The proposed learning algorithm, called the centroid neural network (CNN), estimates centroids of the related cluster groups in training data. This paper also explains algorithmic relationships among the CNN and some of the conventional unsupervised competitive learning algorithms including Kohonen's self-organizing map and Kosko's differential competitive learning algorithm. The CNN algorithm requires neither a predetermined schedule for learning coefficient nor a total number of iterations for clustering. The simulation results on clustering problems and image compression problems show that CNN converges much faster than conventional algorithms with comparable clustering quality while other algorithms may give unstable results depending on the initial values of the learning coefficient and the total number of iterations.

18.
Auditory scenes are temporal audio segments with coherent semantic content. Automatically classifying and grouping auditory scenes with similar semantics into categories is beneficial for many multimedia applications, such as semantic event detection and indexing. For such semantic categorization, auditory scenes are first characterized with either low-level acoustic features or some mid-level representations like audio effects, and then supervised classifiers or unsupervised clustering algorithms are employed to group scene segments into various semantic categories. In this paper, we focus on the problem of automatically categorizing audio scenes in an unsupervised manner. To achieve more reasonable clustering results, we introduce the co-clustering scheme to exploit potential grouping trends among different dimensions of feature spaces (either low-level or mid-level feature spaces), and provide a more accurate similarity measure for comparing auditory scenes. Moreover, we also extend the co-clustering scheme with a strategy based on the Bayesian information criterion (BIC) to automatically estimate the numbers of clusters. Evaluation performed on 272 auditory scenes extracted from 12-h audio data shows very encouraging categorization results. Co-clustering achieved a better performance compared to some traditional one-way clustering algorithms, both based on the low-level acoustic features and on the mid-level audio effect representations. Finally, we present our vision regarding the applicability of this approach on general multimedia data, and also show some preliminary results on content-based image clustering.

19.
As a data mining method, clustering, which is one of the most important tools in information retrieval, organizes data based on unsupervised learning, which means that it does not require any training data. However, some text clustering algorithms cannot update existing clusters incrementally and instead have to recompute a new clustering from scratch. In view of the above, this paper presents a novel down-top incremental conceptual hierarchical text clustering approach using CFu-tree (ICHTC-CF) representation, which starts with each item as a separate cluster. Term-based feature extraction is used for summarizing a cluster in the process. The Comparison Variation measure criterion is also adopted for judging whether the closest pair of clusters can be merged or a previous cluster can be split. In addition, our incremental clustering method is not sensitive to the input data order. Experimental results show that our method outperforms k-means, CLIQUE, single linkage clustering and complete linkage clustering, which indicates that our new technique is efficient and feasible.
