首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
朱林  雷景生  毕忠勤  杨杰 《软件学报》2013,24(11):2610-2627
针对高维数据的聚类研究表明,样本在不同数据簇往往与某些特定的数据特征子集相对应.因此,子空间聚类技术越来越受到关注.然而,现有的软子空间聚类算法都是基于批处理技术的聚类算法,不能很好地应用于高维数据流或大规模数据的聚类研究中.为此,利用模糊可扩展聚类框架,与熵加权软子空间聚类算法相结合,提出了一种有效的熵加权流数据软子空间聚类算法——EWSSC(entropy-weighting streaming subspace clustering).该算法不仅保留了传统软子空间聚类算法的特性,而且利用了模糊可扩展聚类策略,将软子空间聚类算法应用于流数据的聚类分析中.实验结果表明,EWSSC 算法对于高维数据流可以得到与批处理软子空间聚类方法近似一致的实验结果.  相似文献   

2.
杨辉  彭晗  朱建勇  聂飞平 《计算机仿真》2021,38(8):328-332,343
谱聚类可以任意形状的数据进行聚类,在聚类集成中能够有效的提高基聚类的质量.以往的聚类集成算法中,聚类集成得到的结果并不是最终聚类结果,还需要利用聚类算法来获得最终聚类结果,在整个过程中会使得解由离散-连续-离散的转变.提出了一种基于谱聚类的双边聚类集成算法.算法首先在生成阶段使用谱聚类算法来获得基聚类,通过标准互信息来选取基聚类.将选出来基聚类和样本作为图的顶点,并对构建的图利用双边聚类算法对基聚类和样本同时聚类直接得到最终聚类结果.在实验中,将所提方法与一些聚类集成算法进行了比较,取得了较好的结果.  相似文献   

3.
自适应仿射传播聚类   总被引:42,自引:4,他引:42  
王开军  张军英  李丹  张新娜  郭涛 《自动化学报》2007,33(12):1242-1246
适合处理大类数的仿射传播聚类有两个尚未解决的问题: 一是很难确定偏向参数取何值能够使算法产生最优的聚类结果; 另一个是当震荡发生后算法不能自动消除震荡并收敛. 为了解决这两个问题, 提出了自适应仿射传播聚类方法, 具体技术包括: 自适应扫描偏向参数空间来搜索聚类个数空间以寻找最优聚类结果、自适应调整阻尼因子来消除震荡以及当调整阻尼因子方法失效时的自适应逃离震荡技术. 与原算法相比, 自适应仿射传播聚类方法性能更优, 能够自动消除震荡和寻找最优聚类结果. 对模拟和真实数据集的实验结果表明, 自适应仿射传播聚类方法十分有效, 其聚类质量优于或不低于原算法.  相似文献   

4.
结合密度聚类和模糊聚类的特点,提出一种基于密度的模糊代表点聚类算法.首先利用密度对数据点成为候选聚类中心点的可能性进行处理,密度越高的点成为聚类中心点的可能性越大;然后利用模糊方法对聚类中心点进行确定;最后通过合并聚类中心点确定最终的聚类中心.所提出算法具有很好的自适应性,能够处理不同形状的聚类问题,无需提前规定聚类个数,能够自动确定真实存在的聚类中心点,可解释性好.通过结合不同聚类方法的优点,最终实现对数据的有效划分.此外,所提出的算法对于聚类数和初始化、处理不同形状的聚类问题以及应对异常值等方面具有较好的鲁棒性.通过在人工数据集和UCI真实数据集上进行实验,表明所提出算法具有较好的聚类性能和广泛的适用性.  相似文献   

5.
多视角聚类通过利用多视角之间的互补性和一致性信息来提高聚类的性能.近年来受到越来越多的关注.为了及时掌握目前基于图的多视角聚类算法的研究现状与最新技术,对大量的、最新的多视角图聚类进行调查、归纳整理、分类及总结.根据多视角聚类涉及的算法机制和数学原理,并进一步分为基于图、基于网络和基于谱的聚类方法.不仅详细介绍了每一类...  相似文献   

6.
分布式环境中聚类问题算法研究综述   总被引:1,自引:0,他引:1  
传统的集中式聚类是对集中存放在单个站点的数据集进行聚类,但不能解决数据分布存储环境下的聚类问题,而分布式聚类算法是从分布存储的数据集中提取分类模式,因此能满足此需求。针对分布式聚类算法进行综述和分析。首先对现有的分布式聚类算法进行了分类,然后对每类算法的基本思想和优缺点进行了比较,最后采用Iris和Wine两个数据集对几种分布式聚类算法从聚类精度和聚类时间两方面进行了比较。  相似文献   

7.
聚类是发现数据分布和隐含模式的一项重要技术,但单一聚类算法却很难达到预期的效果.在缺乏样本集先验知识的前提下,目前的分类融合技术很难应用到聚类技术中,导致聚类融合技术起步很晚.近几年的研究发现,聚类融合方法对提高聚类算法的稳定性和高效性发挥了重要的作用.文中对近年来聚类融合的方法和国内外研究现状进行了简单综述,并且以基于投票的聚类融合算法为例,实验证明了其比单一聚类算法的优越性,展望了聚类融合算法的未来.  相似文献   

8.
戈国华  肖海波  张敏 《福建电脑》2007,(4):89-89,126
聚类是数据挖掘中常用的数据分析技术.本文详细介绍了FCM聚类算法的理论和实现步骤.并用Matlab演示了FCM用于数据聚类.结果表明FCM算法是一种高效的数据聚类算法,有很广泛的应用.  相似文献   

9.
大规模数据集聚类中的数据分区及应用研究   总被引:1,自引:0,他引:1  
针对大型数据库提出了许多聚类方法,但是这些算法往往计算量较大、对主存的要求较高;而且当数据分布不均匀时,算法的聚类质量会受影响.因此为了提高聚类算法的效率和准确性,采用了数据分区技术首先对数据进行预处理,分区后的数据具有更少的数据量和更均匀的数据分布.  相似文献   

10.
研究文本聚类问题.传统的文本聚类算法存在着假设各特征词对聚类结果影响相同,聚类准确率较低的缺陷.还有一些算法通过加权的方法,能赋予重要特征词较大的权重,却造成了算法时间复杂度的增加.为解决上述问题,提出了一种新的属性加权模糊C均值文本聚类算法.算法能在迭代过程中标注出每一特征词的权重,却不影响算法的执行效率.使得类内距离之和较小的属性,权值较大;反之则权值较小.经多次仿真证明,提出的文本聚类算法在运算速度、准确率和标注不同属性的重要程度方面都有一定的优势.为文档自动文摘、数字图书馆服务和文档集合自动整理等系统的设计提供了可靠的依据.  相似文献   

11.
聚类分析技术是数据挖据中的一种重要技术。本文介绍了数据挖掘对聚类的典型要求和聚类方法的分类,研究分析了聚类的主要算法.并从多个方面对这些算法的性能进行比较。  相似文献   

12.
Clustering is the process of grouping objects that are similar, where similarity between objects is usually measured by a distance metric. The groups formed by a clustering method are referred as clusters. Clustering is a widely used activity with multiple applications ranging from biology to economics. Each clustering technique has some advantages and disadvantages. Some clustering algorithms may even require input parameters which strongly affect the result. In most cases, it is not possible to choose the best distance metric, the best clustering method, and the best input argument values for an input data set. Therefore, multiple clusterings can be obtained by several distance metrics, several clustering methods, and several input argument values. And, multiple clusterings can be combined into a new and better quality final clustering. We propose a family of combining multiple clustering algorithms that are memory efficient, scalable, robust, and intuitive. Our new algorithms offer tremendous speed gain and low memory requirements by working at cluster level, while producing very good quality final clusters. Extensive experimental evaluations on some very challenging artificially generated and real data sets from a diverse set of domains establish the usefulness of our methods.  相似文献   

13.
The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.  相似文献   

14.
聚类分析是数据挖掘领域中一个重要研究内容,谱聚类(Spectral Clustering, SC)由于具有计算简便,性能优越等特点,已经成为最流行的聚类算法之一。本文利用四类几何结构数据,对规范化割(Normalized Cut, NCUT)、稀疏子空间聚类(Sparse subspace clustering, SSC)和谱曲率聚类(Spectral Curvature Clustering, SCC)三种谱聚类算法进行了分析和比较。实验结果表明,针对本文实验数据三种算法的聚类结果各有差异,但每类数据都可以找到相对最有效的聚类算法,方便读者对算法的选择和使用。NCUT无法处理相交的数据,适用性较差,但对于不相交的二次曲线聚类精度较高,并且优于SSC和SCC算法;相比NCUT算法,SSC算法适用性较强,能够实现四类几何结构数据的聚类,但在聚类过程中常出现误分现象,导致聚类精度不高;与前两种算法相比,SCC算法具有适用性强,精度高等特点,能够实现四类几何结构数据有效聚类,尤其对于实验数据中“横”和“竖”两类点组成的十字,SCC算法能够得到较好的聚类结果,解决由于数据量大SSC算法无法处理的问题。此外,针对有数据间断的两条相交螺旋线聚类问题,本文在现有SCC算法基础上进行改进,结果表明,改进后算法能够有效地实现数据聚类,具有良好的实用性。最后,文章分析了现有SCC算法存在的不足,并指出进一步研究的方向。  相似文献   

15.
Clustering is an important data mining problem. However, most earlier work on clustering focused on numeric attributes which have a natural ordering to their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received more attention. A common issue in cluster analysis is that there is no single correct answer to the number of clusters, since cluster analysis involves human subjective judgement. Interactive visualization is one of the methods where users can decide a proper clustering parameters. In this paper, a new clustering approach called CDCS (Categorical Data Clustering with Subjective factors) is introduced, where a visualization tool for clustered categorical data is developed such that the result of adjusting parameters is instantly reflected. The experiment shows that CDCS generates high quality clusters compared to other typical algorithms.  相似文献   

16.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory by considering that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations that affect the outcome of many clustering algorithms. As an application of this algorithm, it is used to enhance the Hough transform technique.  相似文献   

17.
This paper presents a new partitioning algorithm, designated as the Adaptive C-Populations (ACP) clustering algorithm, capable of identifying natural subgroups and influential minor prototypes in an unlabeled dataset. In contrast to traditional Fuzzy C-Means clustering algorithms, which partition the whole dataset equally, adaptive clustering algorithms, such as that presented in this study, identify the natural subgroups in unlabeled datasets. In this paper, data points within a small, dense region located at a relatively large distance from any of the major cluster centers are considered to form a minor prototype. The aim of ACP is to adaptively separate these isolated minor clusters from the major clusters in the dataset. The study commences by introducing the mathematical model of the proposed ACP algorithm and demonstrates its convergence to a stable solution. The ability of ACP to detect minor prototypes is confirmed via its application to the clustering of three different datasets with different sizes and characteristics.  相似文献   

18.
聚类分析是数据挖掘中的一个重要研究内容。按照数据对象间的关系进行聚类在许多情况具有特殊的意义。提出一种相容关系数据对象的聚类算法。该算法首先对每个数据对象按字典排序,利用相容集的反单调性性质来产生极大相容簇,即通过相容集的连接产生更高层的相容集的候选,再通过剪枝的方法来得到更高层的相容集。该方法可以有效压缩算法的搜索空间,是现有相容关系聚类算法的有益改进和补充。  相似文献   

19.
In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters; this is known as the automatic clustering problem. Because of lack of prior domain knowledge, it is difficult to choose an appropriate number of clusters, especially when the data have many dimensions, when clusters differ widely in shape, size, and density, and when overlapping exists among groups. In the late 1990s, the automatic clustering problem gave rise to a new era in cluster analysis with the application of nature-inspired metaheuristics. Since then, researchers have developed several new algorithms in this field. This paper presents an up-to-date review of all major nature-inspired metaheuristic algorithms used thus far for automatic clustering. Also, the main components involved during the formulation of metaheuristics for automatic clustering are presented, such as encoding schemes, validity indices, and proximity measures. A total of 65 automatic clustering approaches are reviewed, which are based on single-solution, single-objective, and multiobjective metaheuristics, whose usage percentages are 3%, 69%, and 28%, respectively. Single-objective clustering algorithms are adequate to efficiently group linearly separable clusters. However, a strong tendency in using multiobjective algorithms is found nowadays to address non-linearly separable problems. Finally, a discussion and some emerging research directions are presented.  相似文献   

20.
层次聚类算法的改进及分析   总被引:2,自引:0,他引:2  
层次凝聚算法是一个非常有用的聚类算法,它在迭代地凝聚每次接近对直到所有的数据都属于同一个簇.但层次聚类也存在着几个缺点,如聚类时的时空复杂性高;聚类的簇效率低、误差较大等.经验研究表明,大部分HAC算法都有这样一个趋势:除了在谱系图的顶层,所有低层聚类的簇都是比较小的并且很接近于其他的簇,提出了一种改进算法能够减小时空复杂性并能验证其正确性,分析与实验都证明这种方法是非常有效的.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号