
A Distributed k-Means Clustering Algorithm Based on Vector Inner-Product Inequalities

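Cite as:	Ni Weiwei, Lu Jieping, Sun Zhihui. A distributed k-means clustering algorithm based on vector inner-product inequalities [J]. Journal of Computer Research and Development (计算机研究与发展), 2005, 42(9): 1493-1497.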

Authors:	Ni Weiwei (倪巍伟), Lu Jieping (陆介平), Sun Zhihui (孙志挥)

Affiliation:	Department of Computer Science and Engineering, Southeast University, Nanjing 210096

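Funding:	National Natural Science Foundation of China (70371015); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)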

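Abstract:	Cluster analysis is an important research topic in data mining. As data volumes grow rapidly, clustering large data sets has become difficult. Although the k-means algorithm is easy to implement and its complexity is linear in the size of the data set, it remains inefficient when applied to large data sets. Distributed clustering is an effective way to address this problem. Building on the existing distributed clustering algorithm k-DMeans, this paper uses vector inner-product inequalities to optimize that algorithm and proposes the distributed clustering algorithm k-DCBIP. Theoretical analysis and experimental results show that k-DCBIP outperforms k-DMeans and can effectively cluster large data sets; the algorithm is effective and feasible.
Keywords:	distributed clustering; norm (modulus) of a data point; vector inner product; vector inner-product inequality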
Received:	2005-03-15
Revised:	2005-06-09
An Effective Distributed k-Means Clustering Algorithm Based on the Pretreatment of Vectors' Inner-Product

Ni Weiwei, Lu Jieping, Sun Zhihui. An Effective Distributed k-Means Clustering Algorithm Based on the Pretreatment of Vectors' Inner-Product [J]. Journal of Computer Research and Development, 2005, 42(9): 1493-1497.

Authors:	Ni Weiwei, Lu Jieping, Sun Zhihui

Abstract:	Clustering is an important research topic in data mining. As data accumulate, clustering large data sets becomes a hard problem. Despite its simplicity and its linear time complexity, the serial k-means algorithm remains expensive when applied to a large data set. Distributed clustering is an effective way to address this problem. In this paper, vector inner-product inequalities are used to improve the efficiency of the existing parallel k-means algorithm (k-DMeans), and an effective distributed k-means clustering algorithm, k-DCBIP, is proposed. Theoretical analysis and experimental results show that k-DCBIP outperforms k-DMeans and that it is both effective and efficient.

Keywords:	distributed clustering; norm (modulus) of a data point; vectors' inner product; vectors' inner-product inequality
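
The abstract does not spell out which inner-product inequality k-DCBIP relies on, so the following is only an illustrative sketch under an assumption: it uses the Cauchy-Schwarz inequality, <p, c> <= |p||c|, which implies |p - c| >= | |p| - |c| |. With that bound, a centroid can be skipped during the assignment step whenever the difference of the two norms already reaches the best distance found so far. The function names (`assign_with_norm_pruning`, `assign_brute_force`) and the sample data are hypothetical, and the sketch covers a single site's assignment step only, not the distributed protocol of k-DMeans or k-DCBIP.

```python
import math


def norm(v):
    """Euclidean norm (modulus) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dist(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_norm_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    Returns (labels, number of full distance computations).
    Assumption: pruning uses the Cauchy-Schwarz lower bound
    |p - c| >= | |p| - |c| |, not necessarily the paper's exact rule.
    """
    centroid_norms = [norm(c) for c in centroids]  # computed once per iteration
    labels = []
    full_computations = 0
    for p in points:
        p_norm = norm(p)
        best_j, best_d = -1, math.inf
        for j, c in enumerate(centroids):
            # The lower bound already reaches the best distance: skip.
            if abs(p_norm - centroid_norms[j]) >= best_d:
                continue
            d = dist(p, c)
            full_computations += 1
            if d < best_d:
                best_j, best_d = j, d
        labels.append(best_j)
    return labels, full_computations


def assign_brute_force(points, centroids):
    """Reference assignment that computes every point-centroid distance."""
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]


# Hypothetical sample data: two well-separated groups.
points = [(1.0, 0.0), (0.0, 1.2), (10.0, 10.0), (9.0, 11.0)]
centroids = [(1.0, 1.0), (10.0, 10.0)]

labels, full_computations = assign_with_norm_pruning(points, centroids)
print(labels, full_computations, "of", len(points) * len(centroids))
```

The pruned assignment returns the same labels as the brute-force version because a skipped centroid can never be strictly closer than the current best. How much work is saved depends on how widely the centroid norms are spread.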
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.
