Two Spectral Algorithms for Ensembling Document Clusters
Citation: XU Sen, LU Zhi-Mao, GU Guo-Chang. Two spectral algorithms for ensembling document clusters[J]. Acta Automatica Sinica, 2009, 35(7): 997-1002.
Authors: XU Sen, LU Zhi-Mao, GU Guo-Chang
Affiliation: 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
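Funding: Supported by the National Natural Science Foundation of China (60603092) and the Doctoral Program Research Fund of the Ministry of Education of China (20070217043)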
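Abstract: The key problem in cluster ensembles is how to combine the results of different clusterers into a better final clustering. This paper brings the idea of spectral clustering to the document cluster ensemble problem. Spectral clustering, however, requires the eigenvalue decomposition of a large-scale matrix to obtain low-dimensional embeddings of the documents, which are then used for the subsequent clustering. We first propose an ensemble algorithm that uses an algebraic transformation to convert the large-scale eigenvalue decomposition into an equivalent singular value decomposition, and then into a much smaller eigenvalue decomposition. We then study the properties of spectral clustering further and propose a second ensemble algorithm, which obtains the low-dimensional embeddings of the documents indirectly by computing low-dimensional embeddings of the hyperedges. Experimental results on the TREC and Reuters document datasets show that both proposed spectral algorithms are more robust than other graph-partitioning-based ensemble algorithms and are an effective way to solve the document cluster ensemble problem.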
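The reduction used by the first algorithm can be illustrated with a standard linear-algebra identity. The following is a minimal NumPy sketch under assumptions that are not taken from the paper: the ensemble is encoded as a binary document-by-hyperedge incidence matrix H (one column per cluster of each base clustering), the large n × n matrix to decompose is the normalized co-association matrix D^{-1/2} H H^T D^{-1/2}, and the resulting embedding is row-normalized and grouped with k-means. Writing A = D^{-1/2} H, the eigenvectors of A A^T are the left singular vectors of A, and these follow from the small m × m eigenproblem A^T A v = s^2 v via u = A v / s, where m is the total number of clusters across all base clusterings. The function names `incidence_matrix`, `spectral_embedding_small`, `kmeans` and `spectral_ensemble` are hypothetical.

```python
import numpy as np

def incidence_matrix(labelings):
    """Binary n x m matrix H: H[i, j] = 1 if document i lies in hyperedge j.
    Each base clustering contributes one hyperedge (column) per cluster."""
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)

def spectral_embedding_small(H, k):
    """Top-k eigenvectors of the n x n matrix D^{-1/2} H H^T D^{-1/2}, obtained
    without forming it: with A = D^{-1/2} H they are the left singular vectors
    of A, recovered from the m x m eigenproblem A^T A v = s^2 v as u = A v / s."""
    degrees = H @ H.sum(axis=0)            # row sums of S = H H^T; S is never built
    A = H / np.sqrt(degrees)[:, None]      # A = D^{-1/2} H
    evals, V = np.linalg.eigh(A.T @ A)     # small m x m symmetric problem
    top = np.argsort(evals)[::-1][:k]      # indices of the k largest eigenvalues
    s = np.sqrt(np.clip(evals[top], 1e-12, None))
    return (A @ V[:, top]) / s             # n x k, orthonormal columns

def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_ensemble(labelings, k):
    U = spectral_embedding_small(incidence_matrix(labelings), k)
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalise each document
    return kmeans(U, k)

labelings = [[0, 0, 0, 1, 1, 1],   # three base clusterings of six documents
             [1, 1, 1, 0, 0, 0],
             [0, 0, 1, 1, 1, 1]]
print(spectral_ensemble(labelings, k=2))
```

In this toy input the third base clustering disagrees about document 2, but two of the three clusterings place it with documents 0 and 1, so the consensus separates documents 0 to 2 from documents 3 to 5. The point of the design is that the eigensolver only ever sees an m × m matrix, which pays off when the number of documents n is much larger than the total number of clusters m.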

Keywords: Clustering analysis, cluster ensemble, spectral clustering, document clustering
Received: 2008-6-12
Revised: 2008-11-25

Affiliations: 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; 2. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001
Abstract: A critical problem in cluster ensembles is how to combine multiple clusterings to yield a superior result. In this paper, the idea of spectral clustering is brought into the document cluster ensemble problem. Since spectral clustering must solve the eigenvalue decomposition of a large-scale matrix to obtain low-dimensional embeddings of the documents for later clustering, a fast spectral algorithm is first proposed, in which the large-scale eigenvalue decomposition problem is transformed into an equivalent singular value decomposition problem and then into a much smaller eigenvalue decomposition problem. The characteristics of spectral clustering are further investigated and another spectral algorithm is proposed, in which the low-dimensional embeddings of the documents are obtained indirectly from those of the hyperedges. Experiments on the TREC and Reuters document sets show that both proposed spectral algorithms outperform other cluster ensemble techniques based on graph partitioning and can effectively solve the document cluster ensemble problem.
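The second algorithm works from the hyperedge side: the hyperedges are embedded first and the document embeddings are derived from them. The sketch below is one plausible way to realize that idea, not the paper's construction. It assumes that hyperedges are embedded with the leading eigenvectors of the m × m normalized hyperedge matrix D_e^{-1/2} H^T D_v^{-1} H D_e^{-1/2}, where D_v counts the hyperedges containing each document and D_e counts the documents in each hyperedge, and that each document is then placed at the mean of the embeddings of its hyperedges before k-means. The names `hyperedge_embedding` and `document_embedding` are hypothetical, and `incidence_matrix` and `kmeans` are repeated so the block runs on its own.

```python
import numpy as np

def incidence_matrix(labelings):
    """Binary n x m matrix H: one column (hyperedge) per cluster of each base clustering."""
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)

def hyperedge_embedding(H, k):
    """Leading k eigenvectors of the m x m matrix D_e^{-1/2} H^T D_v^{-1} H D_e^{-1/2}."""
    dv = H.sum(axis=1)                     # number of hyperedges containing each document
    de = H.sum(axis=0)                     # number of documents in each hyperedge
    W = (H.T / dv) @ H                     # H^T D_v^{-1} H
    W /= np.sqrt(np.outer(de, de))         # symmetric normalisation by hyperedge sizes
    evals, V = np.linalg.eigh(W)
    return V[:, np.argsort(evals)[::-1][:k]]

def document_embedding(H, E):
    """Place each document at the mean embedding of the hyperedges containing it."""
    return (H @ E) / H.sum(axis=1, keepdims=True)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

labelings = [[0, 0, 0, 1, 1, 1],   # three base clusterings of six documents
             [1, 1, 1, 0, 0, 0],
             [0, 0, 1, 1, 1, 1]]
H = incidence_matrix(labelings)
E = hyperedge_embedding(H, k=2)    # 6 hyperedges -> 6 x 2 embedding
X = document_embedding(H, E)       # 6 documents  -> 6 x 2 embedding
print(kmeans(X, k=2))
```

Only the m × m hyperedge matrix is decomposed; the documents never enter an eigensolver and are positioned by a single sparse-friendly product H E. On the toy input this again separates documents 0 to 2 from documents 3 to 5.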