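A Semi-supervised K-Means Text Clustering Method Constrained by Class Associated Words
Cite this article:HAN Hong-qi, ZHU Dong-hua, WANG Xue-feng. A Semi-supervised K-Means Text Clustering Method Constrained by Class Associated Words[J]. 微计算机信息, 2010, 0(15)
Authors:HAN Hong-qi, ZHU Dong-hua, WANG Xue-feng
Affiliation:School of Management and Economics, Beijing Institute of Technology; School of Management and Economics, North China University of Water Conservancy and Electric Power
Abstract:A method is proposed that classifies text documents using class associated words together with the K-Means clustering algorithm. Class associated words are words or phrases that relate to, and reflect, the topic of a class. The initial cluster centers are formed from the class associated words that the documents contain. During clustering, the information carried by the class associated words constrains the similarity comparison between each document to be classified and the cluster centers, which speeds up the algorithm. Experiments demonstrate the effectiveness of the algorithm.

Keywords:text clustering  text classification  class associated words  K-Means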

Semi-supervised K-Means Text Clustering Algorithm Using Class Associated Words
HAN Hong-qi, ZHU Dong-hua, WANG Xue-feng. Semi-supervised K-Means Text Clustering Algorithm Using Class Associated Words[J]. Control & Automation, 2010, 0(15)
Authors:HAN Hong-qi, ZHU Dong-hua, WANG Xue-feng
Affiliation:(School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China) (School of Management and Economics, North China University of Water Conservancy and Electric Power, Zhengzhou 450011, China)
Abstract:An improved K-Means algorithm is presented that classifies text documents using class associated words. Class associated words are words or phrases that represent the subject of a class. The initial cluster centroids are produced from prior knowledge of the class associated words. The class associated words found in the documents are used to supervise clustering and improve the algorithm's performance. Experimental results show that the algorithm is effective.
Keywords:text clustering  text classification  class associated words  K-Means  
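
The abstract gives the idea but not the details, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes term-frequency vectors, cosine similarity, and a fallback to all centroids for documents that contain no class associated word; the function names (`cosine`, `mean_vector`, `constrained_kmeans`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors; empty input gives an empty vector."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def constrained_kmeans(docs, class_words, max_iter=20):
    """docs: list of token lists; class_words: {label: set of associated words}."""
    vecs = [Counter(d) for d in docs]
    labels = list(class_words)
    # Candidate classes per document: classes whose associated words it contains.
    cands = [[c for c in labels if class_words[c] & set(d)] for d in docs]
    # Initial centroids: mean of the documents that mention each class's words.
    cents = {c: mean_vector([v for v, cs in zip(vecs, cands) if c in cs])
             for c in labels}
    assign = [None] * len(docs)
    for _ in range(max_iter):
        new = []
        for v, cs in zip(vecs, cands):
            # Assumed constraint: compare only with candidate centroids,
            # falling back to all centroids when the document has none.
            pool = cs or labels
            new.append(max(pool, key=lambda c: cosine(v, cents[c])))
        if new == assign:  # assignments stable, so stop
            break
        assign = new
        for c in labels:
            members = [v for v, a in zip(vecs, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                cents[c] = mean_vector(members)
    return assign


docs = [
    ["football", "match", "goal"],
    ["stock", "market", "price"],
    ["team", "goal", "score"],
    ["bank", "price", "rate"],
    ["goal", "score", "match"],  # contains no class associated word
]
class_words = {"sports": {"football", "team"}, "finance": {"stock", "bank"}}
result = constrained_kmeans(docs, class_words)
print(result)
```

In this reading, a document that contains associated words of only one class is compared with a single centroid, which is where the speed-up described in the abstract would come from; the last toy document has no such words and is compared with every centroid.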
This article is indexed in CNKI and other databases.