Similar literature
19 similar articles found.
1.
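This paper introduces a multi-ant-colony clustering combination algorithm based on undirected hypergraphs. The algorithm combines the clustering results of single-ant-colony clustering algorithms into a multi-ant-colony clustering algorithm, represents them as an undirected hypergraph, and applies the hypergraph partitioning algorithm Hmetis to obtain the final clustering result. Experimental datasets and results are given, showing that the algorithm improves clustering quality and reduces the number of isolated points.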

2.
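A More Effective Algorithm for K-means Clustering   Total citations: 1 (self-citations: 0, citations by others: 1)
Cluster analysis is one of the key problems in data mining and machine learning. Because it is simple and practical, K-means is the most widely used scheme for partitional clustering. This paper proposes a new scheme for selecting the initial points in the traditional K-means algorithm, together with an improvement that uses the triangle inequality to accelerate the convergence computation. Experimental results show that the improved method produces a better cluster partition than traditional K-means.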

3.
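To address the sensitivity of the fuzzy text clustering algorithm (FCM) to input order and initial points, a fuzzy clustering algorithm optimized by ant colony (FACA) is proposed. The algorithm uses the ant colony clustering algorithm (ACA) to find the initial cluster centers, which resolves the sensitivity of fuzzy clustering to input order and initial points. The linear complexity of the fuzzy text clustering algorithm makes it easier to implement on a computer. In simulations on real datasets, compared with classical basic fuzzy clustering and ant colony clustering, the ant-colony-optimized fuzzy clustering algorithm (FACA) is more effective and better suited to large datasets.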

4.
Research on an Improved k-means Clustering Algorithm   Total citations: 2 (self-citations: 0, citations by others: 2)
孙士保  秦克云 《计算机工程》2007,33(13):200-201
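The quality of a clustering algorithm directly affects the clustering result. This paper discusses the classical k-means clustering algorithm and shows that it handles symbolic data poorly and is sensitive to noise and isolated points. It proposes a weighted, improved k-means clustering algorithm that overcomes these shortcomings, and analyzes the complexity of the algorithm theoretically. Experiments show that, compared with the traditional mean-based method, clustering with this method effectively improves the clustering result.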

5.
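The density peaks clustering algorithm (DPC) cannot determine cluster centers automatically, and cluster centers are not clearly separated from non-center points on the decision graph. To address this, a comparative density peaks clustering algorithm that determines cluster centers automatically (ACPC) is designed. The algorithm first replaces the original distance parameter with a comparison quantity of the distance, so that potential cluster centers stand out more in the decision graph. It then selects cluster centers automatically by a two-dimensional interval estimation method, which automates the clustering process. Simulation results show that ACPC achieves better clustering on 4 synthetic datasets. A comparison of the Accuracy metric on real datasets shows that on the Iris dataset ACPC reaches 94%, an improvement of 27.3% over the traditional DPC algorithm. ACPC thus removes the need to select cluster centers interactively.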
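As an illustration of the decision graph mentioned in this abstract, the following is a minimal sketch of the two quantities used by the standard DPC algorithm: the local density `rho` (cutoff kernel) and the distance `delta` to the nearest point of higher density. It is not the ACPC method; the comparison quantity and the interval estimation step are not specified in the abstract and are not reproduced here. The function name `decision_graph` and the cutoff parameter `dc` are assumptions made for this example.

```python
import math


def decision_graph(points, dc):
    """Return (rho, delta) for standard DPC with a cutoff kernel.

    rho[i]   = number of other points closer than dc to point i
    delta[i] = distance from point i to the nearest point with higher rho
               (for points with no denser neighbor: the largest distance
               from point i to any other point, by convention)
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Two small groups; the point in the middle of each group is the density peak.
pts = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
       (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
rho, delta = decision_graph(pts, dc=0.8)
# Candidate centers are the points where both rho and delta are large.
gamma = [r * dl for r, dl in zip(rho, delta)]
top_two = sorted(range(len(pts)), key=lambda i: gamma[i], reverse=True)[:2]
```

In interactive DPC the user picks the centers by eye from the plot of `delta` against `rho`; ranking by the product `rho * delta` is one common heuristic stand-in for that step.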

6.
樊仲欣  王兴  苗春生 《计算机应用》2019,39(4):1027-1031
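The clustering results of the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm depend on the order in which data objects are added, non-spherical clusters are clustered poorly, and the cluster-diameter threshold restricts each cluster to a similar number of data objects. To solve these problems, an improved BIRCH algorithm is proposed. The algorithm replaces the cluster-diameter threshold with a connectivity distance and a connectivity strength threshold, which describe the connectivity between individual data objects, and adds a cluster-merging step to the construction of the clustering feature tree. Experimental results on custom datasets and on the iris, wine, and pendigits datasets show that the algorithm achieves higher clustering accuracy than existing improved algorithms such as multi-threshold BIRCH and density-improved BIRCH. On large datasets in particular, its accuracy is 6 percentage points higher than that of density-improved BIRCH and its running time is 61% lower. This indicates that the algorithm is suitable for online, real-time incremental data, can identify non-spherical clusters and clusters of uneven volume, has a denoising capability, and has markedly lower time and space complexity.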

7.
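The K-means clustering algorithm is simple and fast and is very widely used, but its time efficiency still needs improvement when processing massive data. When a data point is far from a cluster, there is no need to compute the exact distance between the two in order to determine that the point does not belong to that cluster. The algorithm is improved by applying the triangle inequality, which avoids redundant distance computations. Experimental results show a large improvement in speed; the larger the data size, the more pronounced the improvement, and the clustering result retains the accuracy of the original algorithm.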
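The pruning idea in this abstract can be sketched as follows. By the triangle inequality, if the distance between the current best center and another center is at least twice the distance from the point to the current best center, the other center cannot be closer, so its distance need not be computed. This is a minimal sketch of that one test inside the assignment step, not the paper's full algorithm; the function name `assign_with_pruning` and the `skipped` counter are assumptions made for this example.

```python
import math


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels = []
    skipped = 0  # number of point-to-center distances avoided
    for p in points:
        best, best_d = 0, math.dist(p, centers[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best), so if
            # d(c_best, c_j) >= 2 * d(p, c_best) then d(p, c_j) >= d(p, c_best).
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            dist_j = math.dist(p, centers[j])
            if dist_j < best_d:
                best, best_d = j, dist_j
        labels.append(best)
    return labels, skipped


centers = [(0, 0), (10, 0), (0, 10)]
points = [(0.5, 0.2), (9, 1), (1, 9), (5, 5.1)]
labels, skipped = assign_with_pruning(points, centers)
```

The assignments are identical to a brute-force nearest-center search; only the number of distance evaluations changes. The saving grows as points settle close to their centers, which matches the abstract's observation that the benefit increases with data size.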

8.
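Machine learning algorithms struggle to complete their tasks within a reasonable time when faced with massive data, and struggle to handle high-dimensional, massive data effectively. To address this, a study on parallelizing the spectral clustering algorithm on the Hadoop distributed platform is presented. The traditional spectral clustering algorithm is rewritten using the MapReduce programming model, and the Canopy algorithm is used on the platform to preprocess the data in order to achieve better clustering. Experimental results show that the resulting distributed clustering algorithm performs well in terms of speedup and scales well with data size, so the improved algorithm is suitable for processing massive data.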

9.
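Multi-representative Feature Tree and Spatial Clustering Algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
Spatial data is massive, complex, continuous, and spatially autocorrelated, and contains missing values and errors. A spatial clustering algorithm must therefore be highly efficient, handle clusters of arbitrary complex shape, produce results independent of the spatial ordering of the data, and be robust to outliers. Existing algorithms have difficulty meeting all of these requirements at once. This paper proposes a data structure suited to massive, complex spatial data: the multi-representative feature tree. Based on it, the paper proposes CAMFT, a clustering algorithm for mining massive, complex spatial data. The algorithm uses the multi-representative feature tree to compress massive data and combines it with random sampling to further strengthen its ability to handle massive data. The multi-representative feature tree can also preserve the clustering features of complex shapes, which makes it suitable for complex spatial data. Experiments show that CAMFT quickly processes spatial data containing outliers and clusters of complex shape, that its results are independent of the spatial ordering of the objects, and that it is more efficient than the existing comparable clustering algorithms BIRCH and CURE.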

10.
An Implementation Based on the DBSCAN Clustering Algorithm   Total citations: 4 (self-citations: 0, citations by others: 4)
谭勇  荣秋生 《计算机工程》2004,30(13):119-121
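As one of the analysis methods among clustering algorithms in data mining, high-density clustering can find the regions where samples are relatively dense and summarize the classes in which samples are relatively concentrated. The paper analyzes traditional clustering algorithms and their limitations, and discusses the implementation of a high-density clustering algorithm that automatically discovers high-dimensional subspaces and handles high-dimensional data tables, achieving fast clustering speed and the best clustering result.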

11.
The automatic recognition of the modulation format of a detected signal, the intermediate step between signal detection and demodulation, is a major task of an intelligent receiver, with various civilian and military applications. With no knowledge of the transmitted data and many unknown parameters at the receiver, such as the signal power, carrier frequency and phase offsets, and timing information, blind identification of the modulation is a difficult task. It becomes even more challenging in the real world. In this paper I develop a novel algorithm that uses the Two Threshold Sequential Algorithmic Scheme (TTSAS) algorithm and pattern recognition to identify the modulation types of communication signals automatically. I have proposed and implemented a technique that casts modulation recognition into shape recognition. The constellation diagram is a traditional and powerful tool for the design and evaluation of digital modulations. In this paper, modulation classification is performed on the constellation of the received signal: fuzzy clustering followed by hierarchical clustering is used to classify Quadrature Amplitude Modulation (QAM) and Phase Shift Keying (PSK) modulations, and the symbol constellation of the modulated signal, clustered with the TTSAS algorithm and matched against standard templates, is used to classify QAM modulation. The TTSAS algorithm used here is implemented by a Hamming neural network. The simulation results show that this method classifies modulations with high accuracy and appropriate convergence in the presence of noise.

12.
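Compressing the set of frequent sequential patterns is one solution to the problem that the complete set of frequent sequential patterns is too large. To obtain high-quality compression, the frequent sequential patterns are first clustered, and representative sequential patterns are then selected from each cluster so that the number of representatives is as small as possible. A greedy algorithm and a fast candidate-set-based algorithm are effective algorithms for compressing the set of frequent sequential patterns. The set of representative sequential patterns is a subset of the frequent sequential patterns, and experimental results show that it achieves good compression.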

13.
This paper proposes a new method for estimating the true number of clusters and initial cluster centers in a dataset with many clusters. The observation points are assigned to the data space to observe the clusters through the distributions of the distances between the observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by recursively partitioning the dataset. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed to ensemble multiple GMM trees to handle high-dimensional data with many clusters. The GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm uses a sequential process to build a GMM forest. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperformed the existing popular methods Silhouette, Elbow and Gap Statistic, and the recent method I-nice, in estimating the true number of clusters from high-dimensional complex data.

14.
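Cluster analysis is an important data mining method, and the K-means clustering algorithm has significant application value in data mining. K-means requires the number of clusters to be set manually and easily falls into local optima. To address these shortcomings, a K-means clustering algorithm based on shared nearest neighbors (KSNN) is proposed. KSNN searches the dataset for center points and uses them to determine the number of clusters, which supplies the parameter for K-means clustering. This overcomes the need to set the number of clusters manually and also gives good global convergence. Experiments show that KSNN clusters better than K-means, particle swarm K-means (pso), and the multi-center clustering algorithm (MCA).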

15.
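In traditional fuzzy clustering algorithms such as fuzzy C-means (FCM), the user must specify the number of clusters C in advance, and the objective function converges too slowly. To address this, the principle of granularity analysis is applied to the FCM algorithm: a method for determining the number of clusters based on the granularity principle is proposed, and the cluster centers are initialized with a density function method. Experimental results show that the improved algorithm obtains a reasonable and effective number of clusters and that, compared with random initialization, the number of iterations is markedly reduced and convergence is markedly faster.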
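For context, one iteration of the standard FCM algorithm referred to in this abstract can be sketched as below: memberships are updated from the distances to the current centers, and each center is then recomputed as a membership-weighted mean. This is a minimal sketch of textbook FCM only. The granularity-based choice of C and the density-function initialization from the paper are not specified in the abstract and are not shown. The function name `fcm_step` and the default fuzzifier `m=2.0` are assumptions made for this example.

```python
import math


def fcm_step(points, centers, m=2.0):
    """One standard FCM iteration: update memberships, then centers.

    u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
    v_j     = sum_i u[i][j]**m * x_i / sum_i u[i][j]**m
    """
    c = len(centers)
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, v) for v in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with one or more centers:
            # split full membership among those centers.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            u.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                      for j in range(c)])
    dim = len(points[0])
    new_centers = []
    for j in range(c):
        w = [row[j] ** m for row in u]
        total = sum(w)  # assumed non-zero: some point has membership in cluster j
        new_centers.append(tuple(
            sum(wi * p[t] for wi, p in zip(w, points)) / total
            for t in range(dim)))
    return u, new_centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
u, new_centers = fcm_step(points, centers=[(1, 1), (9, 9)])
```

Repeating `fcm_step` until the centers stop moving gives the usual FCM loop. The number of iterations needed depends on the starting centers, which is the reason the abstract gives for replacing random initialization.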

16.
In cluster analysis, one of the most challenging and difficult problems is the determination of the number of clusters in a data set, which is a basic input parameter for most clustering algorithms. To solve this problem, many algorithms have been proposed for either numerical or categorical data sets. However, these algorithms are not very effective for a mixed data set containing both numerical attributes and categorical attributes. To overcome this deficiency, a generalized mechanism is presented in this paper by integrating Rényi entropy and complement entropy together. The mechanism is able to uniformly characterize within-cluster entropy and between-cluster entropy and to identify the worst cluster in a mixed data set. In order to evaluate the clustering results for mixed data, an effective cluster validity index is also defined in this paper. Furthermore, by introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set. The performance of the algorithm has been studied on several synthetic and real-world data sets. The comparisons with other clustering algorithms show that the proposed algorithm is more effective in detecting the optimal number of clusters and generates better clustering results.

17.
Clustering is an important research topic that has practical applications in many fields. It has been demonstrated that fuzzy clustering, using algorithms such as the fuzzy C-means (FCM), has clear advantages over crisp and probabilistic clustering methods. Like most clustering algorithms, however, FCM and its derivatives need the number of clusters in the given data set as one of their initializing parameters. The main goal of this paper is to develop an effective fuzzy algorithm for automatically determining the number of clusters. After a brief review of the relevant literature, we present a new algorithm for determining the number of clusters in a given data set and a new validity index for measuring the “goodness” of clustering. Experimental results and comparisons are given to illustrate the performance of the new algorithm.

18.
In this paper, we present an agglomerative fuzzy $k$-means clustering algorithm for numerical data, an extension of the standard fuzzy $k$-means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in $k$-means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5000 objects and 3 to 7 clusters), the BIRCH two-dimensional data set of 20000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions and 3 clusters from UCI, have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.

19.
王勇  唐靖  饶勤菲  袁巢燕 《计算机应用》2014,34(5):1331-1335
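The K-means clustering algorithm usually cannot determine the number of clusters in advance, and setting the initial number of clusters manually tends to make the clustering result unstable. To address this, a new, efficient algorithm for determining the optimal number of clusters for K-means is proposed. The algorithm obtains an upper bound on the search range for the number of clusters by stratifying the sample data, and designs a cluster validity index that evaluates the within-cluster and between-cluster similarity after clustering, so that the optimal number of clusters can be found within the search range. Simulation results show that the algorithm obtains the optimal number of clusters quickly and efficiently and clusters the datasets well.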
