期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Fast agglomerative clustering using information of k-nearest neighbors

Chih-Tang Chang Jim Z.C. Lai M.D. JengAuthor vitae 《Pattern recognition》2010,43(12):3958-3968

In this paper, we develop a method to lower the computational complexity of pairwise nearest neighbor (PNN) algorithm. Our approach determines a set of candidate clusters being updated after each cluster merge. If the updating process is required for some of these clusters, k-nearest neighbors are found for them. The number of distance calculations for our method is O(N²), where N is the number of data points. To further reduce the computational complexity of the proposed algorithm, some available fast search approaches are used. Compared to available approaches, our proposed algorithm can reduce the computing time and number of distance calculations significantly. Compared to FPNN, our method can reduce the computing time by a factor of about 26.8 for the data set from a real image. Compared with PMLFPNN, our approach can reduce the computing time by a factor of about 3.8 for the same data set. 相似文献

2.

An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list

Jim Z.C. Lai 《Information Sciences》2011,181(9):1722-3410

In this paper, a new algorithm is developed to reduce the computational complexity of Ward’s method. The proposed approach uses a dynamic k-nearest-neighbor list to avoid the determination of a cluster’s nearest neighbor at some steps of the cluster merge. Double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm by obtaining an approximate solution of hierarchical agglomerative clustering. In this paper, we propose a method to resolve the problem of a non-optimal solution for DLA while keeping the corresponding advantage of low computational complexity. The computational complexity of the proposed method DKNNA + FS (dynamic k-nearest-neighbor algorithm with a fast search) in terms of the number of distance calculations is O(N²), where N is the number of data points. Compared to FPNN with a fast search (FPNN + FS), the proposed method using the same fast search algorithm (DKNNA + FS) can reduce the computing time by a factor of 1.90-2.18 for the data set from a real image. In comparison with FPNN + FS, DKNNA + FS can reduce the computing time by a factor of 1.92-2.02 using the data set generated from three images. Compared to DLA with a fast search (DLA + FS), DKNNA + FS can decrease the average mean square error by 1.26% for the same data set. 相似文献

3.

快速聚类算法在个性化服务中的应用

张剑李卫钟义信郭燕慧《计算机工程与应用》2004,40(12):10-11,219

许多实际应用已经证明,k-means算法能够有效地得到好的聚类结果。但是,k-means直接算法的时间复杂度和模式复杂度对数据量的大小非常敏感,无法满足一些高性能的应用场合,如个性化服务中对用户数据进行的群组分析。对此,笔者提出了一种新颖的基于k-d树的聚类算法。这种算法采用空间数据结构—k-d树组织所有的样本数据,可以高效地搜索到离某个给定的聚类中心最近的全部模式。实验结果表明,该方案可以显著提高k-means直接算法的运算速度,在距离运算量和总的运算时间上,可把性能提高1～2个数量级。相似文献

4.

Improvement of the k-means clustering filtering algorithm

Jim Z.C. Lai Author Vitae Author Vitae 《Pattern recognition》2008,41(12):3677-3681

In this paper, we present a modified filtering algorithm (MFA) by making use of center variations to speed up clustering process. Our method first divides clusters into static and active groups. We use the information of cluster displacements to reject unlikely cluster centers for all nodes in the kd-tree. We reduce the computational complexity of filtering algorithm (FA) through finding candidates for each node mainly from the set of active cluster centers. Two conditions for determining the set of candidate cluster centers for each node from active clusters are developed. Our approach is different from the major available algorithm, which passes no information from one stage of iteration to the next. Theoretical analysis shows that our method can reduce the computational complexity, in terms of the number of distance calculations, of FA at each stage of iteration by a factor of FC/AC, where FC and AC are the numbers of total clusters and active clusters, respectively. Compared with the FA, our algorithm can effectively reduce the computing time and number of distance calculations. It is noted that our proposed algorithm can generate the same clusters as that produced by hard k-means clustering. The superiority of our method is more remarkable when a larger data set with higher dimension is used. 相似文献

5.

Fast global k-means clustering using cluster membership and inequality

Jim Z.C. Lai Author Vitae Author Vitae 《Pattern recognition》2010,43(5):1954-1963

In this paper, we present a fast global k-means clustering algorithm by making use of the cluster membership and geometrical information of a data point. This algorithm is referred to as MFGKM. The algorithm uses a set of inequalities developed in this paper to determine a starting point for the jth cluster center of global k-means clustering. Adopting multiple cluster center selection (MCS) for MFGKM, we also develop another clustering algorithm called MFGKM+MCS. MCS determines more than one starting point for each step of cluster split; while the available fast and modified global k-means clustering algorithms select one starting point for each cluster split. Our proposed method MFGKM can obtain the least distortion; while MFGKM+MCS may give the least computing time. Compared to the modified global k-means clustering algorithm, our method MFGKM can reduce the computing time and number of distance calculations by a factor of 3.78-5.55 and 21.13-31.41, respectively, with the average distortion reduction of 5,487 for the Statlog data set. Compared to the fast global k-means clustering algorithm, our method MFGKM+MCS can reduce the computing time by a factor of 5.78-8.70 with the average reduction of distortion of 30,564 using the same data set. The performances of our proposed methods are more remarkable when a data set with higher dimension is divided into more clusters. 相似文献

6.

基于EK-medoids聚类和邻域距离的特征选择方法

孙印杰张新乐孙林《计算机应用研究》2019,36(8)

针对传统聚类算法中只注重数据间的距离关系,而忽视数据全局性分布结构的问题,提出一种基于EK-medoids聚类和邻域距离的特征选择方法。首先,用稀疏重构的方法计算数据样本之间的有效距离,构建基于有效距离的相似性矩阵;然后,将相似性矩阵应用到K-medoids聚类算法中,获取新的聚类中心,进而提出EK-medoids聚类算法,可有效对原始数据集进行聚类;最后,根据划分结果所构成簇的邻域距离给出确定数据集中的属性重要度定义,应用启发式搜索方法设计一种EK-medoids聚类和邻域距离的特征选择算法,降低了聚类算法的时间复杂度。实验结果表明,该算法不仅有效地提高了聚类结果的精度,而且也可选择出分类精度较高的特征子集。相似文献

7.

基于密度峰值和网格的自动选定聚类中心算法 总被引：1，自引：0，他引：1

夏庆亚《计算机科学》2017,44(Z11):403-406

针对快速搜索和发现密度峰值的聚类算法(DPC)中数据点之间计算复杂,最终聚类的中心个数需要通过决策图手动选取等问题,提出基于密度峰值和网格的自动选定聚类中心的改进算法GADPC。首先结合Clique网格聚类算法的思想,不再针对点对象进行操作,而是将点映射到网格,并将网格作为聚类对象,从而减少了DPC算法中对数据点之间的距离计算和聚类次数;其次通过改进后的聚类中心个数判定准则更精确地自动选定聚类中心个数;最后对网格边缘点和噪声点,采用网格内点对象和相邻网格间的相似度进行了处理。实验通过采用UEF(University of Eastern Finland)提供的数据挖掘使用的人工合成数据集和UCI自然数据集进行对比,其聚类评价指标(Rand Index)表明,改进的算法在计算大数据集时聚类质量不低于DPC和K-means算法,而且提高了DPC算法的处理效率。相似文献

8.

面向位置大数据的快速密度聚类算法 总被引：1，自引：0，他引：1

于彦伟贾召飞曹磊赵金东刘兆伟刘惊雷《软件学报》2018,29(8):2470-2484

本文面向位置大数据聚类,提出了一种简单但高效的快速密度聚类算法CBSCAN,以快速发现位置大数据中任意形状的聚类簇模式和噪声.首先,定义了Cell网格概念,并提出了基于Cell的距离分析理论,利用该距离分析,无需距离计算,可快速确定高密度区域的核心点和密度相连关系;其次,给出了网格簇定义,将基于位置点的密度簇映射成基于网格的密度簇,利用排他网格与相邻网格的密度关系,可快速确定网格簇的包含网格;第三,利用基于Cell的距离分析理论和网格簇概念,实现了一个快速密度聚类算法,将DBSCAN基于数据点的密度扩展聚类转换成基于Cell的密度扩展聚类,大大减少高密度区域的距离计算,利用位置数据的内在特性提高了聚类效率;最后,在基准测试数据上验证了所提算法的聚类效果,在位置大数据上的实验结果统计显示,相比DBSCAN、PR-Tree索引和Grid索引优化的DBSCAN,CBSCAN分别平均提升了525倍、30倍和11倍效率. 相似文献

9.

基于遗传算法聚类的变异体约简

曾凡平黄玉涵张美超潘能刚《计算机应用》2011,31(5):1314-1317

对导致变异测试高代价的原因之一--测试过程中容易产生数目庞大的变异体进行了研究,提出基于遗传算法聚类的变异体约简方法。把具有相似特征的变异体置于同一簇中,再从每个簇中随机选择一个作为代表,从而实现变异体的约简。实验表明:1)该方法可在不降低构造出的测试用例集的测试充分度的前提下,约简变异体;2)与K-means算法和凝聚型层次聚类算法相比,该方法能够在自动产生合适的聚类数目的同时,具有更优的约简效果。相似文献

10.

基于谱聚类的全局中心快速更新聚类算法

邹臣嵩 刘松 《计算机与现代化》2018,(10):6

针对高维数据在聚类过程中存在迭代次数多、运算耗时长等问题,提出一种改进的聚类算法,首先采用谱聚类对样本降维,再选取k个首尾相连且距离乘积最大的数据对象作为初始聚类中心,在簇中心更新过程中,选取与簇均值距离最近的数据对象作为簇中心,并将其他数据对象按最小距离划分至相应簇中,反复迭代,直至收敛。实验结果表明,新算法的Rand指数、Jaccard系数和Adjusted Rand Index等聚类指标全部优于K-means算法及其他3种改进聚类算法,在运行效率方面,新算法的聚类耗时更短、迭代次数更少。相似文献

11.

最大距离法选取初始簇中心的K-means文本聚类算法的研究 总被引：1，自引：0，他引：1

翟东海鱼江高飞于磊丁锋《计算机应用研究》2014,31(3):713-715

由于初始簇中心的随机选择, K-means算法在聚类时容易出现聚类结果局部最优、聚类结果不稳定、总迭代次数较多等问题。为了解决K-means算法所存在的以上问题, 提出了最大距离法选取初始簇中心的K-means文本聚类算法。该算法基于这样的事实：距离最远的样本点最不可能分到同一个簇中。为使该算法能应用于文本聚类, 构造了一种将文本相似度转换为文本距离的方法, 同时也重新构造了迭代中的簇中心计算公式和测度函数。在实例验证中, 对分属于五个类别的1 500篇文本组成的文本集进行了文本聚类分析, 其结果表明, 与原始的K-means聚类算法以及其他的两种改进的K-means聚类算法相比, 新提出的文本聚类算法在降低了聚类总耗时的同时, F度量值也有了明显提高。相似文献

12.

一种基于时间延迟机制的WSNs非均匀分簇算法

王志勇孙顺远徐保国《传感器与微系统》2014,(4):146-149,157

为减少无线传感器网络分簇路由协议中节点竞争簇首时多余的能耗,解决簇首能耗不均的问题,提出一种基于时间延迟机制的非均匀分簇算法。该算法使能量较多的节点被优先选为簇首,并提出了簇首竞争半径的计算方法,确保其数目稳定且位置均匀分布。成簇过程中,节点根据最小消费函数选择簇首,簇内成员加入时考虑簇首能量、二者距离以及簇首和汇聚节点角度等因素来均衡簇首能耗。仿真结果表明:算法能有效地均衡节点能耗,延长网络寿命,分别比CHTD和EEUC算法延长了35.1%和12.9%。相似文献

13.

Metric and trigonometric pruning for clustering of uncertain data in 2D geometric space

Wang Kay Ngai Ben Kao Reynold Cheng Michael Chau Sau Dan Lee David W. Cheung Kevin Y. Yip 《Information Systems》2011

We study the problem of clustering data objects with location uncertainty. In our model, a data object is represented by an uncertainty region over which a probability density function (pdf) is defined. One method to cluster such uncertain objects is to apply the UK-means algorithm [1], an extension of the traditional K-means algorithm, which assigns each object to the cluster whose representative has the smallest expected distance from it. For arbitrary pdf, calculating the expected distance between an object and a cluster representative requires expensive integration of the pdf. We study two pruning methods: pre-computation (PC) and cluster shift (CS) that can significantly reduce the number of integrations computed. Both pruning methods rely on good bounding techniques. We propose and evaluate two such techniques that are based on metric properties (Met) and trigonometry (Tri). Our experimental results show that Tri offers a very high pruning power. In some cases, more than 99.9% of the expected distance calculations are pruned. This results in a very efficient clustering algorithm. ¹ 相似文献

14.

全局中心聚类算法在课程序化中的应用

段桂芹刘松邹臣嵩《计算机与数字工程》2020,48(3):528-533

针对K-means在聚类过程中存在的随机性强、准确率不稳定等问题,提出了一种改进聚类算法,首先选取k个首尾相连且距离乘积最大的数据对象作为初始聚类中心,在簇中心迭代过程中,选取簇内距离和最小的样本作为簇中心,再将其他样本划分至相应簇中,反复迭代,直至收敛。在UCI数据集上的仿真实验结果表明:新算法与K-means算法和其他两种改进算法相比,不仅能够降低运算耗时,在准确率、Jaccard系数、F值等多项聚类指标上也有较大的提升,在实际应用中,使用新算法对现代学徒制的职业能力进行了聚类分析,解决了课程间的序化问题。相似文献

15.

Effective fuzzy c-means clustering algorithms for data clustering problems 总被引：3，自引：0，他引：3

S.R. Kannan S. RamathilagamP.C. Chung 《Expert systems with applications》2012,39(7):6292-6300

Clustering is a well known technique in identifying intrinsic structures and find out useful information from large amount of data. One of the most extensively used clustering techniques is the fuzzy c-means algorithm. However, computational task becomes a problem in standard objective function of fuzzy c-means due to large amount of data, measurement uncertainty in data objects. Further, the fuzzy c-means suffer to set the optimal parameters for the clustering method. Hence the goal of this paper is to produce an alternative generalization of FCM clustering techniques in order to deal with the more complicated data; called quadratic entropy based fuzzy c-means. This paper is dealing with the effective quadratic entropy fuzzy c-means using the combination of regularization function, quadratic terms, mean distance functions, and kernel distance functions. It gives a complete framework of quadratic entropy approaching for constructing effective quadratic entropy based fuzzy clustering algorithms. This paper establishes an effective way of estimating memberships and updating centers by minimizing the proposed objective functions. In order to reduce the number iterations of proposed techniques this article proposes a new algorithm to initialize the cluster centers.In order to obtain the cluster validity and choosing the number of clusters in using proposed techniques, we use silhouette method. First time, this paper segments the synthetic control chart time series directly using our proposed methods for examining the performance of methods and it shows that the proposed clustering techniques have advantages over the existing standard FCM and very recent ClusterM-k-NN in segmenting synthetic control chart time series. 相似文献

16.

基于均衡化函数的快速K-means算法

下载免费PDF全文

施培蓓钱雪忠汪中《计算机工程与应用》2008,44(3):189-191

聚类分析的应用很广泛,传统的K-means算法要求事先给定k值,限制了很多实际的应用,由于聚类的质量主要考察类内的紧凑性和类间的距离,提出了均衡化的评价函数,使用最近邻搜索算法减少算法的计算量,不仅自动生成聚类的数目,同时均衡了类内差异和类间差异对于聚类结果的影响,实验结果证明改进的K-means算法的有效性。相似文献

17.

一种基于边松弛的大规模WSN分簇定位算法

王静邓平《传感技术学报》2013,26(5)

针对大规模WSN定位问题中,基于半定规划的分簇定位算法在分簇不均匀及节点密度较大时,部分簇会出现定位计算复杂度过高的问题,提出了一种新的基于边松弛的分簇定位算法-EES-Cluster.该算法通过对每一个网络簇子图进行边的松弛预处理,减少了边的数目;在网络分簇数目较少时,能有效降低定位过程的计算复杂度,同时较好地保持较高的定位精度,减小簇头节点信息融合的功耗.仿真实验及分析表明,EES-Cluster算法能有效降低分簇定位算法的计算复杂度,提高大规模WSN的定位效率. 相似文献

18.

一种基于密度和网格的簇心可确定聚类算法

何熊熊管俊轶叶宣佐詹亦钊《控制与决策》2017,32(5):913-919

以网格化数据集来减少聚类过程中的计算复杂度,提出一种基于密度和网格的簇心可确定聚类算法.首先网格化数据集空间,以落在单位网格对象里的数据点数表示该网格对象的密度值,以该网格到更高密度网格对象的最近距离作为该网格的距离值;然后根据簇心网格对象同时拥有较高的密度和较大的距离值的特征,确定簇心网格对象,再通过一种基于密度的划分方式完成聚类;最后,在多个数据集上对所提出算法与一些现有聚类算法进行聚类准确性与执行时间的对比实验,验证了所提出算法具有较高的聚类准确性和较快的执行速度. 相似文献

19.

基于MapReduce并行化计算的大数据聚类算法

张文杰蒋烈辉《计算机应用研究》2020,37(1):53-56

面对大数据规模庞大且计算复杂等问题,基于MapReduce框架采用两阶段渐进式的聚类思想,提出了改进的K-means并行化计算的大数据聚类方法。第一阶段,该算法通过Canopy算法初始化划分聚类中心,从而迅速获取粗精度的聚类中心点;第二阶段,基于MapReduce框架提出了并行化计算方案,使每个数据点围绕其邻近的Canopy中心进行细化的聚类或合并,从而对大数据实现快速、准确地聚类分析。在MapReduce并行框架上进行算法验证,实验结果表明,所提算法能够有效地提升并行计算效率,减少计算时间,并提升大数据的聚类精度。相似文献

20.

A novel encoding algorithm for vector quantization using transformed codebook

Jim Z.C. Lai Author Vitae Author Vitae 《Pattern recognition》2009,42(11):3065-3070

In this paper, a novel encoding algorithm for vector quantization is presented. Our method uses a set of transformed codewords and partial distortion rejection to determine the reproduction vector of an input vector. Experimental results show that our algorithm is superior to other methods in terms of the computing time and number of distance calculations. Compared with available approaches, our method can reduce the computing time and number of distance calculations significantly. Compared with the available best method of reducing the number of distance computations, our approach can reduce the number of distance calculations by 32.3-67.1%. Compared with the best encoding algorithm for vector quantization, our method can also further reduce the computing time by 19.7-23.9%. The performance of our method is better when a larger codebook is used and is weakly correlated to codebook size. 相似文献