首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 437 毫秒
1.
多层自动确定类别的谱聚类算法   总被引:1,自引:0,他引:1  
金慧珍  赵辽英 《计算机应用》2008,28(5):1229-1231
自动确定聚类数和海量数据的处理是谱聚类的关键问题。在自动确定聚类数谱聚类算法的基础上,提出了一种能处理大规模数据集的多层算法。该算法的核心思想是把大规模数据集根据一定的相关性逐级进行合并,使之成为小数据集,再对分组后的小数据集用自动确定类别的谱聚类算法聚类,最后逐层进行拆分并微调, 完成全部数据的聚类。实验证明该算法的聚类效果很好。  相似文献   

2.
聚类分组数的自动确定是谱聚类算法中一个亟待解决的问题.针对谱聚类算法聚类分组数的获取问题,提出一种基于人工免疫的自适应谱聚类算法.该算法通过模拟抗体的克隆选择机制和免疫系统的初次免疫应答、二次免疫应答机制,实现了数据样本聚类分组数的自动调整,解决了聚类算法需要人工输入聚类分组数的弊端.并分别在线性模拟数据、非凸模拟数据和UCI数据集上验证了算法的可行性、算法在非凸数据集上的优势以及算法的有效性.实验结果表明该算法可以自动获取正确的聚类分组数,提高聚类效果,减少达到全局最优解时的迭代次数,具有较高的稳定性.  相似文献   

3.
张妨妨  钱雪忠 《计算机应用》2012,32(9):2476-2479
针对传统GK聚类算法无法自动确定聚类数和对初始聚类中心比较敏感的缺陷,提出一种改进的GK聚类算法。该算法首先通过基于类间分离度和类内紧致性的权和的新有效性指标来确定最佳聚类数;然后,利用改进的熵聚类的思想来确定初始聚类中心;最后,根据判定出的聚类数和新的聚类中心进行聚类。实验结果表明,新指标能准确地判断出类间有交叠的数据集的最佳聚类数,且改进后的算法具有更高的聚类准确率。  相似文献   

4.
K-means算法最佳聚类数确定方法   总被引:10,自引:0,他引:10  
K-means聚类算法是以确定的类数k为前提对数据集进行聚类的,通常聚类数事先无法确定。从样本几何结构的角度设计了一种新的聚类有效性指标,在此基础上提出了一种新的确定K-means算法最佳聚类数的方法。理论研究和实验结果验证了以上算法方案的有效性和良好性能。  相似文献   

5.
《计算机科学与探索》2016,(11):1614-1622
密度峰聚类是一种新的基于密度的聚类算法,该算法不需要预先指定聚类数目,能够发现非球形簇。针对密度峰聚类算法需要人工确定聚类中心的缺陷,提出了一种自动确定聚类中心的密度峰聚类算法。首先,计算每个数据点的局部密度和该点到具有更高密度数据点的最短距离;其次,根据排序图自动确定聚类中心;最后,将剩下的每个数据点分配到比其密度更高且距其最近的数据点所属的类别,并根据边界密度识别噪声点,得到聚类结果。将新算法与原密度峰算法进行对比,在人工数据集和UCI数据集上的实验表明,新算法不仅能够自动确定聚类中心,而且具有更高的准确率。  相似文献   

6.
在传统确定数据集聚类数算法原理的基础上,提出一种新的算法——MHC算法。该算法采用自底向上的策略生成不同层次的数据集划分,计算每个层次的聚类划分质量,通过聚类质量选择最佳的聚类数。还设计一种新的有效性指标——BIP指标,用于衡量不同划分的聚类质量,该指标主要依托数据集的几何结构。实验结果表明,该算法能准确地确定多维数据集中的最佳聚类数。  相似文献   

7.
一种基于粗糙集理论的谱聚类算法   总被引:1,自引:1,他引:0  
谱聚类算法利用特征向量构造简化的数据空间,在降低数据维数的同时,使得数据在子空间中的分布结构更加明显.现有谱聚类算法的聚类结果多为精确集,而真实数据集中重叠现象广泛存在.基于粗糙集理论提出了一种新的谱聚类算法,其主要思想是对谱聚类算法进行粗糙集扩展,使得聚类结果成为具有下近似和上近似定义的、类与类之间存在重叠区域的结构.实验表明,该算法与现有的谱聚类算法相比,稳定性和准确率都有一定的提高.  相似文献   

8.
针对大数据环境下高维数据聚类速度慢、准确率低的问题,提出了一种面向大数据的快速自动聚类算法(FACABD)。FACABD聚类算法利用谱聚类算法对大数据集进行归一化和列降维,提出了一种新的快速区域进化的粒子群算法(FRE-PSO),并利用该算法进行行降维;然后在降维处理后的数据基础上,引入聚类模糊隶属度基数,自动发现簇的数目,根据类簇数目,采用FRE-PSO算法结合模糊聚类算法快速完成自动聚类。在人工生成数据集和UCI机器学习数据集上的实验结果表明,该算法能够在数据驱动下快速自动聚类,有效地提高了运行速度和精度。  相似文献   

9.
李小红  罗敏 《计算机科学》2012,39(9):162-165
提出了一种新的基于图划分的聚类算法——GAGPBCUK算法。该算法解决了谱聚类算法参数敏感和聚类结果不准确等问题。3组仿真实验结果表明,GAGPBCUK算法不仅在识别和学习数据集中的隐含聚类数方面具有很好的性能,而且能够得到比谱聚类算法(NJW算法)更加有效的聚类结果。  相似文献   

10.
谱聚类算法已得到机器学习领域的广泛关注,其算法思想来源于谱图理论,通过矩阵的特征分解获得数据的低维嵌入,并用于后续聚类中。介绍了谱聚类方法的基本原理和算法思想,指出现有的谱聚类算法中存在初始化敏感、如何自动确定聚类分组数以及如何降低问题复杂度等问题,并针对存在的问题提出了相应的解决方法。  相似文献   

11.
李斌  狄岚  王少华  于晓瞳 《计算机应用》2016,36(7):1981-1987
传统的核聚类仅考虑了类内元素的关系而忽略了类间的关系,对边界模糊或边界存在噪声点的数据集进行聚类分析时,会造成边界点的误分问题。为解决上述问题,在核模糊C均值(KFCM)聚类算法的基础上提出了一种基于改进核模糊C均值类间极大化聚类(MKFCM)算法。该算法考虑了类内元素和类间元素的联系,引入了高维特征空间的类间极大惩罚项和调控因子,拉大类中心间的距离,使得边界处的样本得到了较好的划分。在各模拟数据集的实验中,该算法在类中心的偏移距离相对其他算法均有明显降低。在人造高斯数据集的实验中,该算法的精度(ACC)、归一化互信息(NMI)、芮氏指标(RI)指标分别提升至0.9132,0.7575,0.9138。  相似文献   

12.
As we are in the big data age, graph data such as user networks in Facebook and Flickr becomes large. How to reduce the visual complexity of a graph layout is a challenging problem. Clustering graphs is regarded as one of effective ways to address this problem. Most of current graph visualization systems, however, directly use existing clustering algorithms that are not originally developed for the visualization purpose. For graph visualization, a clustering algorithm should meet specific requirements such as the sufficient size of clusters, and automatic determination of the number of clusters. After identifying the requirements of clustering graphs for visualization, in this paper we present a new clustering algorithm that is particularly designed for visualization so as to reduce the visual complexity of a layout, together with a strategy for improving the scalability of our algorithm. Experiments have demonstrated that our proposed algorithm is capable of detecting clusters in a way that is required in graph visualization.  相似文献   

13.
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster density, we propose that the clustering validity index be used not only globally to find optimal partitions of input data, but also locally to determine which two neighboring clusters are to be merged in a hierarchical clustering of Self-Organizing Map (SOM). A new two-level SOM-based clustering algorithm using the clustering validity index is also proposed. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data in a better way than classical clustering algorithms on an SOM.  相似文献   

14.
The self-organizing map (SOM) has been widely used in many industrial applications. Classical clustering methods based on the SOM often fail to deliver satisfactory results, specially when clusters have arbitrary shapes. In this paper, through some preprocessing techniques for filtering out noises and outliers, we propose a new two-level SOM-based clustering algorithm using a clustering validity index based on inter-cluster and intra-cluster density. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data better than the classical clustering algorithms based on the SOM, and find an optimal number of clusters.  相似文献   

15.
Discovering interesting patterns or substructures in data streams is an important challenge in data mining. Clustering algorithms are very often applied to identify single substructures although they are designed to partition a data set. Another problem of clustering algorithms is that most of them are not designed for data streams. This paper discusses a recently introduced procedure that deals with both problems. The procedure explores ideas from cluster analysis, but was designed to identify single clusters without the necessity to partition the whole data set into clusters. The new extended version of the algorithm is an incremental clustering approach applicable to stream data. It identifies new clusters formed by the incoming data and updates the data space partition. Clustering of artificial and real data sets illustrates the abilities of the proposed method.  相似文献   

16.
In this research, we propose two new clustering algorithms, the improved competitive learning network (ICLN) and the supervised improved competitive learning network (SICLN), for fraud detection and network intrusion detection. The ICLN is an unsupervised clustering algorithm, which applies new rules to the standard competitive learning neural network (SCLN). The network neurons in the ICLN are trained to represent the center of the data by a new reward-punishment update rule. This new update rule overcomes the instability of the SCLN. The SICLN is a supervised version of the ICLN. In the SICLN, the new supervised update rule uses the data labels to guide the training process to achieve a better clustering result. The SICLN can be applied to both labeled and unlabeled data and is highly tolerant to missing or delay labels. Furthermore, the SICLN is capable to reconstruct itself, thus is completely independent from the initial number of clusters.To assess the proposed algorithms, we have performed experimental comparisons on both research data and real-world data in fraud detection and network intrusion detection. The results demonstrate that both the ICLN and the SICLN achieve high performance, and the SICLN outperforms traditional unsupervised clustering algorithms.  相似文献   

17.
In this paper, a new approach for fault detection and isolation that is based on the possibilistic clustering algorithm is proposed. Fault detection and isolation (FDI) is shown here to be a pattern classification problem, which can be solved using clustering and classification techniques. A possibilistic clustering based approach is proposed here to address some of the shortcomings of the fuzzy c-means (FCM) algorithm. The probabilistic constraint imposed on the membership value in the FCM algorithm is relaxed in the possibilistic clustering algorithm. Because of this relaxation, the possibilistic approach is shown in this paper to give more consistent results in the context of the FDI tasks. The possibilistic clustering approach has also been used to detect novel fault scenarios, for which the data was not available while training. Fault signatures that change as a function of the fault intensities are represented as fault lines, which have been shown to be useful to classify faults that can manifest with different intensities. The proposed approach has been validated here through simulations involving a benchmark quadruple tank process and also through experimental case studies on the same setup. For large scale systems, it is proposed to use the possibilistic clustering based approach in the lower dimensional approximations generated by algorithms such as PCA. Towards this end, finally, we also demonstrate the key merits of the algorithm for plant wide monitoring study using a simulation of the benchmark Tennessee Eastman problem.  相似文献   

18.
半监督的自动聚类   总被引:1,自引:0,他引:1  
潘章明 《计算机应用》2010,30(10):2614-2617
基于进化算法的自动聚类方法在处理聚类结构比较松散的数据集时,存在聚类准确性不高、收敛速度慢的缺陷,为此提出一种半监督的自动聚类算法。该算法从调整染色体的解码过程入手,首先从染色体中分离出聚类数和所有的质心,然后使用最近邻规则滤去部分偏离数据集分布区域的无效质心,最后嵌入先验信息辅助K-均值方法对剩余的质心聚类,进一步优化染色体的解码结果。实验结果表明,该算法对聚类结构紧密或松散的数据集均可给出较精确的聚类结果。  相似文献   

19.
In this paper the problem of automatic clustering a data set is posed as solving a multiobjective optimization (MOO) problem, optimizing a set of cluster validity indices simultaneously. The proposed multiobjective clustering technique utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Here variable number of cluster centers is encoded in the string. The number of clusters present in different strings varies over a range. The points are assigned to different clusters based on the newly developed point symmetry based distance rather than the existing Euclidean distance. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. Thus the proposed clustering technique is able to detect both the proper number of clusters and the appropriate partitioning from data sets either having hyperspherical clusters or having point symmetric clusters. A new semi-supervised method is also proposed in the present paper to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号