Similar Articles
20 similar articles found.
1.
To reduce algorithmic complexity and improve localization accuracy in wireless sensor networks (WSNs) with large numbers of nodes, a weighted centroid localization algorithm based on K-means cluster point density (KCPD-WCLA) is proposed. First, the anchor nodes deployed randomly and in large numbers in the field are grouped, and trilateration is used to obtain many position estimates close to the true value in the two-dimensional plane. The K-means clustering algorithm is then introduced into the WSN localization problem: the densities of the K cluster points are taken into account, and the weighted centroid localization algorithm (WCLA) yields the final position estimate. Theoretical analysis and simulation results show that the computational complexity is clearly reduced and the localization accuracy is significantly better than that of the multilateration algorithm (MLA) and WCLA.
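The weighted centroid step of a WCLA-style scheme can be illustrated with a short sketch. This is not the authors' KCPD-WCLA implementation: the anchor coordinates, the inverse-distance weighting rule, and the variable names are assumptions for illustration (the paper derives its weights from K-means cluster point densities).

```python
import numpy as np

def weighted_centroid(anchors, weights):
    """Estimate a node position as the weighted centroid of anchor positions.

    anchors : (m, 2) array of anchor coordinates in the 2-D plane
    weights : (m,) array of non-negative weights (here derived from assumed
              estimated distances; KCPD-WCLA instead derives weights from
              cluster point densities, which is not reproduced here)
    """
    anchors = np.asarray(anchors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ anchors / weights.sum()

# Toy example: three anchors, weights inversely proportional to
# (assumed) estimated distances to the unknown node.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
est_dist = np.array([3.0, 8.0, 7.0])
weights = 1.0 / est_dist          # closer anchors count more
print(weighted_centroid(anchors, weights))
```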

2.
3.
A more effective algorithm for K-means clustering   Cited: 1 (self-citations: 0, by others: 1)
Cluster analysis is one of the key problems in data mining and machine learning. Because of its simplicity and practicality, K-means is the most widely used partitioning scheme for clustering. This paper proposes a new scheme for selecting the initial points of the traditional K-means algorithm, and an improvement that uses the triangle inequality during the convergence computation to accelerate convergence. Experimental results show that the improved method yields better cluster partitions than traditional K-means.
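A minimal sketch of how the triangle inequality can skip distance computations in the K-means assignment step. This is a generic illustration of the idea (in the spirit of Elkan-style pruning), not the specific scheme of the paper; the variable names and the simple lower-bound test are assumptions.

```python
import numpy as np

def assign_with_triangle_inequality(X, centers):
    """Assign each point to its nearest center, skipping centers that the
    triangle inequality proves cannot beat the current best.

    If d(x, c) <= 0.5 * d(c, c'), then d(x, c') >= d(x, c), so center c'
    can be skipped without computing d(x, c')."""
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(X), dtype=int)
    skipped = 0
    for i, x in enumerate(X):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            if best_d <= 0.5 * cc[best, j]:   # pruning test
                skipped += 1
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels, skipped

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
centers = rng.normal(size=(8, 2)) * 3
labels, skipped = assign_with_triangle_inequality(X, centers)
print("distance computations skipped:", skipped)
```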

4.
Harmony K-means algorithm for document clustering   Cited: 2 (self-citations: 0, by others: 2)
Fast, high-quality document clustering is a crucial task in organizing information, presenting search engine results, enhancing web crawling, and information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, K-means may generate a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) for document clustering based on the Harmony Search (HS) optimization method. It is proved by means of finite Markov chain theory that HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we have applied it to some standard datasets. We also compare HKA with other meta-heuristic and model-based document clustering approaches. Experimental results reveal that HKA converges to the best known optimum faster than the other methods and that the quality of its clusters is comparable.

5.
Efficient disk-based K-means clustering for relational databases   Cited: 7 (self-citations: 0, by others: 7)
K-means is one of the most popular clustering algorithms. We introduce an efficient disk-based implementation of K-means. The proposed algorithm is designed to work inside a relational database management system. It can cluster large data sets having very high dimensionality. In general, it only requires three scans over the data set. It is optimized to perform heavy disk I/O and its memory requirements are low. Its parameters are easy to set. An extensive experimental section evaluates quality of results and performance. The proposed algorithm is compared against the Standard K-means algorithm as well as the Scalable K-means algorithm.
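The abstract does not spell out the implementation, but the core trick behind disk-based K-means — streaming the data in blocks and keeping only per-cluster sufficient statistics (counts and coordinate sums) in memory — can be sketched as below. The chunk size, the fixed small number of passes, and the use of a NumPy memory-mapped array are illustrative assumptions, not details from the paper.

```python
import numpy as np

def kmeans_one_pass(data, centers, chunk_rows=10_000):
    """One K-means pass over data that may not fit in memory.

    data can be a NumPy memmap (or any (n, d) array); it is read in blocks
    of chunk_rows, and only per-cluster sufficient statistics (point counts
    and coordinate sums) are held in memory."""
    k, d = centers.shape
    counts = np.zeros(k)
    sums = np.zeros((k, d))
    for start in range(0, data.shape[0], chunk_rows):
        block = np.asarray(data[start:start + chunk_rows], dtype=float)
        # squared distances block -> centers, then nearest-center labels
        d2 = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            counts[j] += mask.sum()
            sums[j] += block[mask].sum(axis=0)
    nonempty = counts > 0
    new_centers = centers.copy()
    new_centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centers

# Example with an on-disk array; a fixed small number of passes mirrors
# the "few scans over the data set" flavour of disk-based K-means.
rng = np.random.default_rng(1)
np.save("points.npy", rng.normal(size=(50_000, 4)))
data = np.load("points.npy", mmap_mode="r")
centers = np.asarray(data[:5], dtype=float)   # naive initialization
for _ in range(3):
    centers = kmeans_one_pass(data, centers)
print(centers)
```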

6.
7.
Mao YiMin, Gan DeJin, Mwakapesa D. S., Nanehkaran Y. A., Tao Tao, Huang XueYu. The Journal of Supercomputing, 2022, 78(4): 5181-5202

Partitioning-based k-means clustering is one of the most important clustering algorithms. In a big-data environment, however, it suffers from the random selection of initial cluster centers, expensive communication overhead among MapReduce nodes, and data skew across data partitions, among other problems. To address these problems, this paper proposes a parallel clustering algorithm based on grid density and locality-sensitive hashing (MR-PGDLSH), which combines the advantages of MapReduce and LSH (locality-sensitive hashing). In MR-PGDLSH, a grid density strategy (GDS) is first designed to obtain reasonably good initial cluster centers. Then, DP-LSH (data partitioning based on locality-sensitive hashing) is proposed to divide the data set into multiple segments, mapping related data objects to the same sub-data set, and a similarity function is designed to generate clusters, thereby reducing frequent communication overhead between nodes. Next, an adaptive grouping strategy (AGS) distributes the data evenly across nodes, which resolves the data-skew problem. Finally, MR-PGDLSH mines the cluster centers in parallel to obtain the final clustering results. Both theoretical analysis and experimental results show that MR-PGDLSH is superior to existing clustering algorithms.
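A rough, single-machine sketch of the grid-density idea for picking initial centers: bin points into grid cells and take the centroids of the densest, mutually separated cells as starting centers. The actual MR-PGDLSH couples this with MapReduce, LSH-based partitioning, and adaptive grouping, none of which are reproduced here; the grid resolution and the simple separation rule are assumptions.

```python
import numpy as np

def grid_density_init(X, k, cells_per_dim=10):
    """Pick k initial centers from the densest grid cells.

    Each point is binned into a regular grid with cells_per_dim cells per
    dimension; candidate centers are the centroids of the most populated
    cells, taken in decreasing density while skipping cells adjacent to an
    already chosen one (a simple separation rule)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    idx = np.minimum(((X - lo) / span * cells_per_dim).astype(int),
                     cells_per_dim - 1)
    cells = {}
    for point, cell in zip(X, map(tuple, idx)):
        cells.setdefault(cell, []).append(point)
    ranked = sorted(cells.items(), key=lambda kv: len(kv[1]), reverse=True)
    chosen, centers = [], []
    for cell, pts in ranked:
        if any(max(abs(a - b) for a, b in zip(cell, c)) <= 1 for c in chosen):
            continue                              # too close to a chosen cell
        chosen.append(cell)
        centers.append(np.mean(pts, axis=0))
        if len(centers) == k:
            break
    if len(centers) < k:                          # fallback: ignore separation
        for cell, pts in ranked:
            if cell not in chosen:
                chosen.append(cell)
                centers.append(np.mean(pts, axis=0))
                if len(centers) == k:
                    break
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc, 0.3, size=(200, 2))
               for loc in ((0, 0), (3, 3), (0, 3))])
print(grid_density_init(X, k=3))
```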


8.
9.
An improved K-means clustering algorithm   Cited: 1 (self-citations: 0, by others: 1)
K-means is the most commonly used partition-based clustering algorithm, but it has drawbacks such as requiring the number of clusters K to be specified in advance and selecting the initial cluster centers at random, which affect the stability of its results. This paper improves on the random selection of initial cluster centers: the proposed algorithm determines the initial cluster centers and then performs clustering to obtain the final result. Experiments show that the improved algorithm outperforms the version with randomly chosen initial centers and achieves higher accuracy and stability.

10.
A modified K-means algorithm for circular invariant clustering   Cited: 3 (self-citations: 0, by others: 3)
Several important pattern recognition applications are based on feature vector extraction and vector clustering. Directional patterns are commonly represented by rotation-variant vectors F_d formed from features uniformly extracted in M directions. It is often desirable that pattern recognition algorithms are invariant under pattern rotation. This paper introduces a distance measure and a k-means-based algorithm, namely circular k-means (CK-means), to cluster vectors containing directional information, such as F_d, in a circular-shift invariant manner. A circular shift of F_d corresponds to pattern rotation; thus, the algorithm is rotation invariant. An efficient Fourier-domain representation of the proposed measure is presented to reduce computational complexity. A split and merge approach (SMCK-means), suited to the proposed CK-means technique, is proposed to reduce the possibility of converging at local minima and to estimate the correct number of clusters. Experiments performed on texture images illustrate the superior performance of the proposed algorithm for clustering directional vectors F_d, compared to the alternative approach that uses the original k-means and rotation-invariant feature vectors transformed from F_d.
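The circular-shift invariant distance at the heart of CK-means can be illustrated compactly: the best alignment between two directional feature vectors is found over all circular shifts, and the maximum circular correlation can be computed in the Fourier domain. This sketch shows only that distance measure (with assumed vector names), not the full CK-means or SMCK-means procedure.

```python
import numpy as np

def circular_distance(x, y):
    """Squared Euclidean distance between x and y, minimized over all
    circular shifts of y, so patterns that are rotations of each other
    have distance ~0. The maximum circular correlation is computed via
    the FFT in O(M log M)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    corr = np.real(np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))))
    # corr[s] equals the dot product of x with y circularly shifted by s
    return float(x @ x + y @ y - 2.0 * corr.max())

def best_alignment(x, y):
    """Circular shift s such that np.roll(y, s) best matches x."""
    corr = np.real(np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))))
    return int(corr.argmax())

# A directional feature vector and a "rotated" copy of it: the plain
# Euclidean distance is large, the circular-shift invariant one is ~0.
rng = np.random.default_rng(3)
f = rng.random(16)
f_rot = np.roll(f, 5)
print(np.sum((f - f_rot) ** 2), circular_distance(f, f_rot))
print(best_alignment(f_rot, f))   # shift of f that reproduces f_rot: 5
```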

11.
Data mining and knowledge discovery over uncertain data have become a research hotspot in recent years because they better reflect real-world conditions. In K-means clustering, the fact that different dimensions of the sample space contribute differently to the clustering result is likewise an unavoidable issue in practical applications. To obtain more objective and realistic clustering results, attribute weights are introduced into the classical K-means algorithm and the clustering algorithm is reconstructed for uncertain data sets; experiments demonstrate the effectiveness of the algorithm.
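A tiny sketch of how per-attribute weights enter the K-means distance computation. The weights here are fixed and supplied by the caller; how the paper derives them for uncertain data is not reproduced, so the weight vector and the plain Lloyd loop should be read as assumptions.

```python
import numpy as np

def weighted_kmeans(X, k, w, iters=50, seed=0):
    """Lloyd-style K-means using the weighted squared Euclidean distance
    d(x, c) = sum_j w[j] * (x[j] - c[j])**2, so attributes judged more
    important (larger w[j]) dominate the assignment step."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    w = np.asarray(w, float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = (w * (X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(4)
X = np.vstack([rng.normal((0, 0, 5), 1.0, (100, 3)),
               rng.normal((4, 4, 5), 1.0, (100, 3))])
# the third attribute is uninformative, so it gets a near-zero weight
labels, centers = weighted_kmeans(X, k=2, w=[1.0, 1.0, 0.01])
print(centers)
```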

12.
To address the drawback that the initial cluster centers in K-means are selected at random, a new algorithm is proposed to choose the initial cluster centers before clustering. Compared with random selection of initial centers, this algorithm improves performance and achieves higher accuracy.

13.
To address the strong influence of the initial cluster centers on the results of traditional K-means, an initial-center selection method is proposed that dynamically adjusts the between-center distances according to the intra-class distances of the sample points. The resulting initial centers are as dispersed and representative as possible, which effectively prevents K-means from getting stuck in local optima. Experiments on UCI data sets show that the improved algorithm increases clustering accuracy.
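The dispersion idea — pick initial centers that are far from those already chosen while staying representative — can be sketched with the generic maximum-minimum-distance rule below. The paper's exact rule for adjusting between-center distances from intra-class distances is not reproduced; the "start from the point nearest the overall mean" choice is an assumption.

```python
import numpy as np

def dispersed_initial_centers(X, k):
    """Choose k initial centers that are mutually far apart.

    The first center is the point closest to the overall mean (a
    representative point); each subsequent center is the point whose
    minimum distance to the already chosen centers is largest
    (maximum-minimum-distance rule)."""
    X = np.asarray(X, float)
    first = np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centers = [X[first]]
    min_d = np.linalg.norm(X - centers[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_d))
        centers.append(X[nxt])
        min_d = np.minimum(min_d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(centers)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.4, (150, 2)) for c in ((0, 0), (5, 0), (0, 5))])
print(dispersed_initial_centers(X, k=3))
```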

14.
K-means is improved by introducing a transaction-style recovery mechanism: the improved algorithm can be stopped at any point during execution and, after restarting, continues from the results computed before the shutdown until the algorithm finishes. This makes it feasible to run K-means on large data sets on ordinary machines. The improved algorithm was validated in a clustering run lasting 400 hours.
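The recovery mechanism can be approximated very simply: persist the centroids and the iteration counter after every iteration, and on start-up resume from the last checkpoint if one exists. The file name, checkpoint format, and convergence test below are illustrative assumptions, not the paper's transaction machinery.

```python
import os
import numpy as np

CHECKPOINT = "kmeans_checkpoint.npz"

def resumable_kmeans(X, k, iters=100, tol=1e-6, seed=0):
    """K-means that can be killed at any time and restarted: after each
    iteration the current centers and iteration number are written to disk,
    and a fresh call continues from that checkpoint instead of restarting."""
    X = np.asarray(X, float)
    if os.path.exists(CHECKPOINT):                 # resume a previous run
        saved = np.load(CHECKPOINT)
        centers, start = saved["centers"], int(saved["iteration"])
    else:                                          # fresh start
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        start = 0
    for it in range(start, iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        np.savez(CHECKPOINT, centers=centers, iteration=it + 1)  # checkpoint
        if shift < tol:
            break
    return centers

# Delete kmeans_checkpoint.npz to start from scratch; otherwise the call
# resumes from the last saved iteration.
rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 2))
print(resumable_kmeans(X, k=4))
```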

15.
In this paper, a new clustering algorithm based on a genetic algorithm (GA) with gene rearrangement (GAGR) is proposed, which can effectively remove degeneracy and thereby search more efficiently. A new crossover operator that exploits a measure of similarity between chromosomes in a population is also presented. Adaptive crossover and mutation probabilities are employed to prevent the GAGR from converging to a local optimum. Using real-world data sets, we compare the performance of the GAGR clustering algorithm with the K-means algorithm and other GA methods. An application of the GAGR clustering algorithm to unsupervised classification of multispectral remote sensing images is also provided. Experimental results demonstrate that the GAGR clustering algorithm has high performance, effectiveness and flexibility.

16.
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k-cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.

17.
We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sample-based clustering algorithms that approximate the optimal clustering. We show that the K-median clustering, as well as K-means and the Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center-based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k. A preliminary version of this work appeared in the proceedings of COLT'04 (Ben-David, 2004).
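The sample-based framework can be illustrated with a simple two-step procedure: run K-means on a small uniform sample, then extend the resulting centers to the whole input set by nearest-center assignment. The sample size and the use of plain Lloyd iterations are assumptions for illustration; the paper's contribution concerns how large the sample must be for the guarantee, which this sketch does not address.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd K-means, used here only on the sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def sample_based_clustering(X, k, sample_size=500, seed=0):
    """Cluster a uniform sample, then assign every point in the full data
    set to its nearest sampled center. The cost of the K-means step depends
    only on sample_size and k, not on len(X)."""
    rng = np.random.default_rng(seed)
    sample = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]
    centers = kmeans(sample, k, seed=seed)
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 0.5, (20_000, 2)) for c in ((0, 0), (6, 0), (0, 6))])
labels, centers = sample_based_clustering(X, k=3)
print(centers)
```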

18.
Workflow design, mashup configuration, and composite service formation are examples where the capabilities of multiple simple services combined achieve a complex functionality. In this paper, we address the problem of limiting the number of required services that fulfill the required capabilities while exploiting the functional specialization of individual services. Our approach strikes a balance between finding one service that matches all required capabilities and having one service for each required capability. Specifically, we introduce a weighted fuzzy clustering algorithm that detects implicit service capability groups. The clustering algorithm considers capability importance and service fitness to support those capabilities. Evaluation based on a real-world data set successfully demonstrates the effectiveness of and applicability for service aggregation.

19.
An improved K-means algorithm for accelerating big-data clustering   Cited: 1 (self-citations: 0, by others: 1)
To handle large-scale data clustering effectively, a method is proposed that first samples the data and then applies the maximum-minimum distance method for parallelized K-means clustering. Sampling keeps the clustering from falling into local solutions, and the maximum-minimum distance method pushes the initial cluster centers toward the optimum. Extensive experiments show that, in both single-machine and cluster environments, the method is less sensitive to the initial cluster centers, improves clustering accuracy, reduces the number of iterations, and shortens clustering time.

20.
To remedy the shortcomings of K-means clustering, namely its sensitivity to the initial cluster centers, weak global search ability, and reliance on experience to determine the number of clusters, an improved K-means clustering based on the gravitational search algorithm (GSA) is proposed. A particle encoding strategy treats each set of cluster centers as a population particle; GSA is introduced to search for the initial cluster centers with the best clustering quality, with the mean squared error as the fitness function guiding the global search; a population maturity factor keeps the algorithm from falling into local optima; and a clustering-quality evaluation index is introduced to obtain the best number of clusters. Simulation tests on four UCI data sets verify that the improved K-means clustering achieves higher accuracy and better stability.
