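Distributed k-Means Clustering Algorithm Based on Vector Inner-Product Inequalities
Cite this article:Ni Weiwei, Lu Jieping, Sun Zhihui. Distributed k-Means Clustering Algorithm Based on Vector Inner-Product Inequalities[J]. Journal of Computer Research and Development, 2005, 42(9):1493-1497.
Authors:Ni Weiwei  Lu Jieping  Sun Zhihui
Affiliation:Department of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding:National Natural Science Foundation of China (70371015); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)
Abstract:Cluster analysis is an important research topic in data mining. As data volumes grow rapidly, clustering large data sets has become difficult. Although the k-means algorithm is easy to implement and its complexity is linear in the size of the data set, it is still inefficient when applied to large data sets. Distributed clustering is an effective way to address this problem. Building on the existing distributed clustering algorithm k-DMeans, this paper optimizes that algorithm using vector inner-product inequalities and proposes the distributed clustering algorithm k-DCBIP. Theoretical analysis and experimental results show that k-DCBIP outperforms k-DMeans and can effectively solve the clustering problem for large data sets; the algorithm is effective and feasible.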

Keywords:distributed clustering  norm of a data point  vector inner product  vector inner-product inequality
Received:2005-03-15
Revised:2005-06-09

An Effective Distributed k-Means Clustering Algorithm Based on the Pretreatment of Vectors' Inner-Product
Ni Weiwei, Lu Jieping, Sun Zhihui. An Effective Distributed k-Means Clustering Algorithm Based on the Pretreatment of Vectors' Inner-Product[J]. Journal of Computer Research and Development, 2005, 42(9):1493-1497.
Authors:Ni Weiwei  Lu Jieping  Sun Zhihui
Abstract:Clustering is an important research topic in data mining. As data accumulates, clustering large data sets becomes a hard problem. Despite its simplicity and linear time complexity, a serial k-Means algorithm remains expensive when applied to a large data set. Distributed clustering is an effective way to address this problem. In this paper, vector inner-product inequalities are used to improve the efficiency of the existing parallel k-Means algorithm (k-DMeans), and an effective distributed k-Means clustering algorithm, k-DCBIP, is proposed. Theoretical analysis and experimental results show that k-DCBIP outperforms k-DMeans and is both effective and efficient.
Keywords:distributed clustering  norm of a data point  vectors' inner product  vectors' inner product inequality
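
The abstract does not spell out how the inner-product inequality enters the algorithm. As a minimal sketch, and not the authors' k-DCBIP, one common way to exploit such an inequality in the k-means assignment step is the Cauchy-Schwarz bound x·c ≤ ‖x‖‖c‖, which gives ‖x − c‖² ≥ (‖x‖ − ‖c‖)². When this lower bound already meets or exceeds the best squared distance found so far, the full distance to that center need not be computed. The function names below (`assign_with_norm_pruning`, `squared_distance`, `norm`) are hypothetical and chosen for illustration only.

```python
import math


def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def assign_with_norm_pruning(points, centers):
    """Assign each point to its nearest center.

    Assumption (not taken from the paper): pruning uses the bound
    ||x - c||^2 >= (||x|| - ||c||)^2, which follows from Cauchy-Schwarz.
    Returns the list of labels and the number of skipped distance
    computations.
    """
    center_norms = [norm(c) for c in centers]  # computed once per iteration
    labels = []
    skipped = 0
    for x in points:
        x_norm = norm(x)
        best_j, best_d2 = -1, math.inf
        for j, c in enumerate(centers):
            lower_bound = (x_norm - center_norms[j]) ** 2
            if lower_bound >= best_d2:
                # This center cannot be strictly closer than the current best.
                skipped += 1
                continue
            d2 = squared_distance(x, c)
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        labels.append(best_j)
    return labels, skipped


if __name__ == "__main__":
    pts = [(0.0, 0.1), (10.0, 0.0), (0.2, 0.0), (9.0, 1.0)]
    ctrs = [(0.0, 0.0), (10.0, 0.0)]
    print(assign_with_norm_pruning(pts, ctrs))
```

The bound costs only a subtraction and a multiplication per point-center pair once the norms are cached, so it pays off mainly in high dimensions or with many centers. In a distributed setting each site could apply the same test to its local points, though how k-DCBIP organizes this is not stated in the abstract.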
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.