首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 93 毫秒
1.
朱林  雷景生  毕忠勤  杨杰 《软件学报》2013,24(11):2610-2627
针对高维数据的聚类研究表明,样本在不同数据簇往往与某些特定的数据特征子集相对应.因此,子空间聚类技术越来越受到关注.然而,现有的软子空间聚类算法都是基于批处理技术的聚类算法,不能很好地应用于高维数据流或大规模数据的聚类研究中.为此,利用模糊可扩展聚类框架,与熵加权软子空间聚类算法相结合,提出了一种有效的熵加权流数据软子空间聚类算法——EWSSC(entropy-weighting streaming subspace clustering).该算法不仅保留了传统软子空间聚类算法的特性,而且利用了模糊可扩展聚类策略,将软子空间聚类算法应用于流数据的聚类分析中.实验结果表明,EWSSC 算法对于高维数据流可以得到与批处理软子空间聚类方法近似一致的实验结果.  相似文献   

2.
杨辉  彭晗  朱建勇  聂飞平 《计算机仿真》2021,38(8):328-332,343
谱聚类可以任意形状的数据进行聚类,在聚类集成中能够有效的提高基聚类的质量.以往的聚类集成算法中,聚类集成得到的结果并不是最终聚类结果,还需要利用聚类算法来获得最终聚类结果,在整个过程中会使得解由离散-连续-离散的转变.提出了一种基于谱聚类的双边聚类集成算法.算法首先在生成阶段使用谱聚类算法来获得基聚类,通过标准互信息来选取基聚类.将选出来基聚类和样本作为图的顶点,并对构建的图利用双边聚类算法对基聚类和样本同时聚类直接得到最终聚类结果.在实验中,将所提方法与一些聚类集成算法进行了比较,取得了较好的结果.  相似文献   

3.
分布式环境中聚类问题算法研究综述   总被引:1,自引:0,他引:1  
传统的集中式聚类是对集中存放在单个站点的数据集进行聚类,但不能解决数据分布存储环境下的聚类问题,而分布式聚类算法是从分布存储的数据集中提取分类模式,因此能满足此需求。针对分布式聚类算法进行综述和分析。首先对现有的分布式聚类算法进行了分类,然后对每类算法的基本思想和优缺点进行了比较,最后采用Iris和Wine两个数据集对几种分布式聚类算法从聚类精度和聚类时间两方面进行了比较。  相似文献   

4.
朱永红 《微机发展》2007,17(1):123-124
聚类算法是数据挖掘的核心技术。介绍了几类主要的传统聚类算法,给出了每类算法的基本概念、基本原理、各类表示聚类的算法以及这些算法的特征。然后再提出了一种新的聚类算法———覆盖聚类算法,给出了该算法的具体步骤,并对模糊聚类算法和该算法用实验的方式进行比较,证明了覆盖聚类算法的可行性和有效性。最后分析了当前聚类算法存在的问题和发展方向。  相似文献   

5.
聚类算法是数据挖掘的核心技术。介绍了几类主要的传统聚类算法,给出了每类算法的基本概念、基本原理、各类表示聚类的算法以及这些算法的特征。然后再提出了一种新的聚类算法——覆盖聚类算法,给出了该算法的具体步骤,并对模糊聚类算法和该算法用实验的方式进行比较,证明了覆盖聚类算法的可行性和有效性。最后分析了当前聚类算法存在的问题和发展方向。  相似文献   

6.
基于扩展和网格的多密度聚类算法   总被引:6,自引:1,他引:6  
邱保志  沈钧毅 《控制与决策》2006,21(9):1011-1014
提出了网格密度可达的聚类概念和边界处理技术,并在此基础上提出一种基于扩展的多密度网格聚类算法。该算法使用网格技术提高聚类的速度,使用边界处理技术提高聚类的精度,每次聚类均从最高的密度单元开始逐步向周围扩展形成聚类.实验结果表明,该算法能有效地对多密度数据集和均匀密度数据集进行聚类,具有聚类精度高等优点.  相似文献   

7.
聚类分析技术是数据挖据中的一种重要技术.本文介绍了数据挖掘对聚类的典型要求和聚类方法的分类,研究分析了聚类的主要算法,并从多个方面对这些算法的性能进行比较.  相似文献   

8.
传统的快速聚类算法大多基于模糊C均值算法(Fuzzy C-means,FCM),而FCM对初始聚类中心敏感,对噪音数据敏感并且容易收敛到局部极小值,因而聚类准确率不高。可能性C-均值聚类较好地解决了FCM对噪声敏感的问题,但容易产生一致性聚类。将FCM和可能性C-均值聚类结合的聚类算法较好地解决了一致性聚类问题。为进一步提高算法收敛速度和鲁棒性,提出一种基于核的快速可能性聚类算法。该方法引入核聚类的思想,同时使用样本方差对目标函数中参数η进行优化。标准数据集和人造数据集的实验结果表明这种基于核的快速可能性聚类算法提高了算法的聚类准确率,加快了收敛速度。  相似文献   

9.
章永来  周耀鉴 《计算机应用》2019,39(7):1869-1882
大数据时代,聚类这种无监督学习算法的地位尤为突出。近年来,对聚类算法的研究取得了长足的进步。首先,总结了聚类分析的全过程、相似性度量、聚类算法的新分类及其结果的评价等内容,将聚类算法重新划分为大数据聚类与小数据聚类两个大类,并特别对大数据聚类作了较为系统的分析与总结。此外,概述并分析了各类聚类算法的研究进展及其应用概况,并结合研究课题讨论了算法的发展趋势。  相似文献   

10.
一种改进的多视图聚类集成算法   总被引:1,自引:0,他引:1  
邓强  杨燕  王浩 《计算机科学》2017,44(1):65-70
近年来,针对大数据的数据挖掘技术和机器学习算法研究变得日趋重要。在聚类领域,随着多视图数据的大量出现,多视图聚类已经成为了一类重要的聚类方法。然而,大多数现有的多视图聚类算法受算法参数设置、数据样本等影响,具有聚类结果不稳定、参数需要反复调节等缺点。基于多视图K-means算法和聚类集成技术,提出了一种改进的多视图聚类集成算法,其提高了聚类的准确性、鲁棒性和稳定性。其次,由于单机环境下的多视图聚类算法难以对海量的数据进行处理,结合分布式处理技术,实现了一种分布式的多视图并行聚类算法。实验证明,并行算法在处理大数据时的时间效率有很大提升,适合于大数据环境下的多视图聚类分析。  相似文献   

11.
Clustering is the process of grouping objects that are similar, where similarity between objects is usually measured by a distance metric. The groups formed by a clustering method are referred as clusters. Clustering is a widely used activity with multiple applications ranging from biology to economics. Each clustering technique has some advantages and disadvantages. Some clustering algorithms may even require input parameters which strongly affect the result. In most cases, it is not possible to choose the best distance metric, the best clustering method, and the best input argument values for an input data set. Therefore, multiple clusterings can be obtained by several distance metrics, several clustering methods, and several input argument values. And, multiple clusterings can be combined into a new and better quality final clustering. We propose a family of combining multiple clustering algorithms that are memory efficient, scalable, robust, and intuitive. Our new algorithms offer tremendous speed gain and low memory requirements by working at cluster level, while producing very good quality final clusters. Extensive experimental evaluations on some very challenging artificially generated and real data sets from a diverse set of domains establish the usefulness of our methods.  相似文献   

12.
The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.  相似文献   

13.
聚类分析是数据挖掘领域中一个重要研究内容,谱聚类(Spectral Clustering, SC)由于具有计算简便,性能优越等特点,已经成为最流行的聚类算法之一。本文利用四类几何结构数据,对规范化割(Normalized Cut, NCUT)、稀疏子空间聚类(Sparse subspace clustering, SSC)和谱曲率聚类(Spectral Curvature Clustering, SCC)三种谱聚类算法进行了分析和比较。实验结果表明,针对本文实验数据三种算法的聚类结果各有差异,但每类数据都可以找到相对最有效的聚类算法,方便读者对算法的选择和使用。NCUT无法处理相交的数据,适用性较差,但对于不相交的二次曲线聚类精度较高,并且优于SSC和SCC算法;相比NCUT算法,SSC算法适用性较强,能够实现四类几何结构数据的聚类,但在聚类过程中常出现误分现象,导致聚类精度不高;与前两种算法相比,SCC算法具有适用性强,精度高等特点,能够实现四类几何结构数据有效聚类,尤其对于实验数据中“横”和“竖”两类点组成的十字,SCC算法能够得到较好的聚类结果,解决由于数据量大SSC算法无法处理的问题。此外,针对有数据间断的两条相交螺旋线聚类问题,本文在现有SCC算法基础上进行改进,结果表明,改进后算法能够有效地实现数据聚类,具有良好的实用性。最后,文章分析了现有SCC算法存在的不足,并指出进一步研究的方向。  相似文献   

14.
Clustering is an important data mining problem. However, most earlier work on clustering focused on numeric attributes which have a natural ordering to their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received more attention. A common issue in cluster analysis is that there is no single correct answer to the number of clusters, since cluster analysis involves human subjective judgement. Interactive visualization is one of the methods where users can decide a proper clustering parameters. In this paper, a new clustering approach called CDCS (Categorical Data Clustering with Subjective factors) is introduced, where a visualization tool for clustered categorical data is developed such that the result of adjusting parameters is instantly reflected. The experiment shows that CDCS generates high quality clusters compared to other typical algorithms.  相似文献   

15.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory by considering that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations that affect the outcome of many clustering algorithms. As an application of this algorithm, it is used to enhance the Hough transform technique.  相似文献   

16.
This paper presents a new partitioning algorithm, designated as the Adaptive C-Populations (ACP) clustering algorithm, capable of identifying natural subgroups and influential minor prototypes in an unlabeled dataset. In contrast to traditional Fuzzy C-Means clustering algorithms, which partition the whole dataset equally, adaptive clustering algorithms, such as that presented in this study, identify the natural subgroups in unlabeled datasets. In this paper, data points within a small, dense region located at a relatively large distance from any of the major cluster centers are considered to form a minor prototype. The aim of ACP is to adaptively separate these isolated minor clusters from the major clusters in the dataset. The study commences by introducing the mathematical model of the proposed ACP algorithm and demonstrates its convergence to a stable solution. The ability of ACP to detect minor prototypes is confirmed via its application to the clustering of three different datasets with different sizes and characteristics.  相似文献   

17.
聚类分析是数据挖掘中的一个重要研究内容。按照数据对象间的关系进行聚类在许多情况具有特殊的意义。提出一种相容关系数据对象的聚类算法。该算法首先对每个数据对象按字典排序,利用相容集的反单调性性质来产生极大相容簇,即通过相容集的连接产生更高层的相容集的候选,再通过剪枝的方法来得到更高层的相容集。该方法可以有效压缩算法的搜索空间,是现有相容关系聚类算法的有益改进和补充。  相似文献   

18.
In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters; this is known as the automatic clustering problem. Because of lack of prior domain knowledge, it is difficult to choose an appropriate number of clusters, especially when the data have many dimensions, when clusters differ widely in shape, size, and density, and when overlapping exists among groups. In the late 1990s, the automatic clustering problem gave rise to a new era in cluster analysis with the application of nature-inspired metaheuristics. Since then, researchers have developed several new algorithms in this field. This paper presents an up-to-date review of all major nature-inspired metaheuristic algorithms used thus far for automatic clustering. Also, the main components involved during the formulation of metaheuristics for automatic clustering are presented, such as encoding schemes, validity indices, and proximity measures. A total of 65 automatic clustering approaches are reviewed, which are based on single-solution, single-objective, and multiobjective metaheuristics, whose usage percentages are 3%, 69%, and 28%, respectively. Single-objective clustering algorithms are adequate to efficiently group linearly separable clusters. However, a strong tendency in using multiobjective algorithms is found nowadays to address non-linearly separable problems. Finally, a discussion and some emerging research directions are presented.  相似文献   

19.
层次聚类算法的改进及分析   总被引:2,自引:0,他引:2  
层次凝聚算法是一个非常有用的聚类算法,它在迭代地凝聚每次接近对直到所有的数据都属于同一个簇.但层次聚类也存在着几个缺点,如聚类时的时空复杂性高;聚类的簇效率低、误差较大等.经验研究表明,大部分HAC算法都有这样一个趋势:除了在谱系图的顶层,所有低层聚类的簇都是比较小的并且很接近于其他的簇,提出了一种改进算法能够减小时空复杂性并能验证其正确性,分析与实验都证明这种方法是非常有效的.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号