k-cmeans text clustering algorithm based on similar centroid

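Cite this article:	XU Hou-jin, LIU Yong-yan, DENG Cheng-yu, LIU Yong-shan. k-cmeans text clustering algorithm based on similar centroid[J]. Computer Engineering and Design, 2010, 31(8).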

Authors:	XU Hou-jin, LIU Yong-yan, DENG Cheng-yu, LIU Yong-shan

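Affiliations:	1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004; 2. Department of Mathematics, Zhangjiakou Education College, Zhangjiakou, Hebei 075000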

Funding:	Ministry of Industry and Information Technology 2007 Electronic Information Industry Development Fund

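Abstract:	The k-means clustering algorithm is only guaranteed to converge to a local optimum, which makes its clustering results sensitive to the initial cluster centers. To address this problem, a text clustering algorithm based on similar centroids is proposed. First, the similarity between documents is measured and the documents are sorted in descending order of similarity; the first k documents in the sequence are chosen as the initial cluster centers. Each remaining document (one not chosen as an initial cluster center) is then assigned to the cluster whose center it is most similar to, the cluster means are updated, and the process is iterated until the means no longer change, which yields more stable clustering results. Experimental results show that the proposed algorithm significantly improves macro-average clustering precision and macro-average recall and produces clusters of good quality.
Keywords:	clustering; k-cmeans algorithm; similarity measurement; macro-average clustering precision; macro-average recall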
k-cmeans text clustering algorithm based on similar centroid

XU Hou-jin, LIU Yong-yan, DENG Cheng-yu, LIU Yong-shan. k-cmeans text clustering algorithm based on similar centroid[J]. Computer Engineering and Design, 2010, 31(8).

Authors:	XU Hou-jin, LIU Yong-yan, DENG Cheng-yu, LIU Yong-shan

Affiliation:	XU Hou-jin(1), LIU Yong-yan(2), DENG Cheng-yu(1), LIU Yong-shan(1) (1. Institute of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; 2. Department of Mathematics, Zhangjiakou Education College, Zhangjiakou 075000, China)

Abstract:	The k-means clustering algorithm can only guarantee convergence to a local optimum, which makes the clustering result sensitive to the initial cluster centers. To address this, an improved centroid-based text clustering algorithm is proposed. First, the similarity between documents is calculated; the documents are then sorted by similarity in descending order, and the first k documents of the sequence are selected as the initial cluster centers. Then, according to the similarity between every document that was not selected as an initial cluster center and the existing cluster...

Keywords:	clustering; k-cmeans algorithm; similarity measurement; macro-average clustering precision; macro-average recall rate
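
The procedure outlined in the abstract (rank documents by similarity, take the first k as initial centers, assign each remaining document to its most similar center, update the means, repeat until the means stop changing) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not say which similarity measure is used or how a single ranking is derived from pairwise similarities, so two assumptions are made here: cosine similarity over document vectors, and a ranking score equal to each document's total similarity to all other documents. The function names and the `max_iter` parameter are hypothetical.

```python
import numpy as np


def cosine_similarity(A, B):
    """Cosine similarity between every row of A and every row of B."""
    a_norm = np.linalg.norm(A, axis=1, keepdims=True)
    b_norm = np.linalg.norm(B, axis=1, keepdims=True)
    # Replace zero norms with 1 so empty documents do not divide by zero.
    a_unit = A / np.where(a_norm == 0, 1.0, a_norm)
    b_unit = B / np.where(b_norm == 0, 1.0, b_norm)
    return a_unit @ b_unit.T


def similar_centroid_kmeans(X, k, max_iter=100):
    """Cluster the rows of X (one document vector per row) into k clusters.

    ASSUMPTION: documents are ranked by their summed cosine similarity to
    all other documents; the abstract does not specify the ranking score.
    """
    X = np.asarray(X, dtype=float)
    sim = cosine_similarity(X, X)
    # Total similarity to the other documents (self-similarity excluded).
    score = sim.sum(axis=1) - np.diag(sim)
    order = np.argsort(-score, kind="stable")  # descending by score
    centers = X[order[:k]].copy()              # first k documents as centers

    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Assign each document to the most similar cluster center.
        labels = np.argmax(cosine_similarity(X, centers), axis=1)
        # Update each cluster mean; an empty cluster keeps its old center.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        # Stop once the means no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Because the initial centers are chosen deterministically, repeated runs on the same input give the same result, which matches the stability goal stated in the abstract. One caveat of the assumed ranking score: if the k highest-scoring documents are near-duplicates of each other, the initial centers will all fall in one region, so a different score may be needed in practice.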
This article is indexed in CNKI, Wanfang Data, and other databases.