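Text clustering algorithm based on improved probabilistic latent semantic analysis
Cite this article: ZHANG Yu-fang, ZHU Jun, XIONG Zhong-yang. Text clustering algorithm based on improved probabilistic latent semantic analysis[J]. Journal of Computer Applications, 2011, 31(3): 674-676.
Authors: ZHANG Yu-fang  ZHU Jun  XIONG Zhong-yang
Affiliation: College of Computer Science, Chongqing University
Funding: China Postdoctoral Science Foundation; Chongqing Science and Technology Commission Foundation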
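Abstract: The probabilistic latent semantic analysis (PLSA) model trains its parameters with the expectation-maximization (EM) algorithm. Because these parameters are initialized randomly, the clustering results overfit and depend heavily on the initial values. This paper converts the parameters of the latent semantic analysis (LSA) model into probabilities and uses them to initialize the parameters of the PLSA model; the resulting improved algorithm effectively resolves the random-initialization problem. Experiments show that the proposed method markedly improves both the normalized mutual information (NMI) and the accuracy of text clustering.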

Keywords: text clustering; probabilistic latent semantic analysis; parameter initialization; latent semantic analysis
Received: 2010-09-06
Revised: 2010-10-27
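
The abstract does not spell out how the LSA parameters are converted into probabilities, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the symmetric PLSA formulation P(d,w) = Σ_z P(z) P(d|z) P(w|z), and it assumes one plausible "probabilization" of LSA: take a rank-k truncated SVD of the document-word count matrix, use the absolute values of the singular vectors (normalized per topic) as P(d|z) and P(w|z), and use the normalized singular values as P(z). The function names `lsa_init`, `plsa_em`, and `cluster_docs` are hypothetical.

```python
import numpy as np

def lsa_init(X, k, eps=1e-12):
    """Derive PLSA starting values from a rank-k SVD (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Singular vectors may contain negative entries; taking absolute
    # values is an assumption made here to obtain valid probabilities.
    p_d_z = np.abs(U[:, :k]) + eps        # shape (n_docs, k)
    p_w_z = np.abs(Vt[:k, :]).T + eps     # shape (n_words, k)
    p_z = s[:k] + eps                     # shape (k,)
    p_d_z /= p_d_z.sum(axis=0, keepdims=True)
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z

def plsa_em(X, p_z, p_d_z, p_w_z, n_iter=50, eps=1e-12):
    """Run EM for symmetric PLSA from the given starting values.

    Uses a dense (n_docs, n_words, k) tensor, so it only suits small corpora.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z) P(d|z) P(w|z).
        joint = p_z[None, None, :] * p_d_z[:, None, :] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate each distribution from expected counts.
        weighted = X[:, :, None] * post
        p_w_z = weighted.sum(axis=0) + eps
        p_d_z = weighted.sum(axis=1) + eps
        p_z = weighted.sum(axis=(0, 1)) + eps
        p_w_z /= p_w_z.sum(axis=0, keepdims=True)
        p_d_z /= p_d_z.sum(axis=0, keepdims=True)
        p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z

def cluster_docs(p_z, p_d_z):
    """Assign each document to argmax_z P(z|d), proportional to P(z) P(d|z)."""
    return np.argmax(p_z[None, :] * p_d_z, axis=1)

# Toy corpus (made up for illustration): rows are documents, columns are words.
X = np.array([[4, 3, 0, 0],
              [3, 4, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
p_z, p_d_z, p_w_z = plsa_em(X, *lsa_init(X, k=2))
labels = cluster_docs(p_z, p_d_z)
print(labels)
```

Because the starting point comes from a deterministic decomposition instead of a random draw, repeated runs on the same matrix begin EM from the same parameters, which is the property the abstract attributes to the improved algorithm.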

Improved text clustering algorithm of probabilistic latent with semantic analysis
ZHANG Yu-fang, ZHU Jun, XIONG Zhong-yang. Improved text clustering algorithm of probabilistic latent with semantic analysis[J]. Journal of Computer Applications, 2011, 31(3): 674-676.
Authors: ZHANG Yu-fang  ZHU Jun  XIONG Zhong-yang
Affiliation: College of Computer Science, Chongqing University, Chongqing 400044, China
Abstract: The Probabilistic Latent Semantic Analysis (PLSA) model is trained with the Expectation Maximization (EM) algorithm, whose parameters are randomly initialized. As a result, the model's performance depends heavily on the initialization, and the iteration converges to a local maximum rather than a global one. The authors derived probabilities from Latent Semantic Analysis (LSA) and used them to initialize the parameters of the PLSA model for document clustering. The improved PLSA effectively addresses the random-initialization problem of EM. Experiments show that the improved algorithm yields a distinct improvement in Normalized Mutual Information (NMI) and accuracy.
Keywords: document clustering; Probabilistic Latent Semantic Analysis (PLSA); model parameters initialization; Latent Semantic Analysis (LSA)
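
For readers unfamiliar with the NMI metric named in the abstract, here is a minimal sketch of how it can be computed from true class labels and predicted cluster labels. The abstract does not state which normalization the authors used; this sketch assumes the geometric-mean variant, NMI = I(T;P) / sqrt(H(T) H(P)), and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_true, labels_pred):
    """Normalized mutual information with geometric-mean normalization (assumed)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    # Mutual information between the two labelings.
    mi = sum((nij / n) * log(n * nij / (count_true[i] * count_pred[j]))
             for (i, j), nij in count_joint.items())
    # Entropy of each labeling.
    h_true = -sum((c / n) * log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * log(c / n) for c in count_pred.values())

    # Degenerate case: a labeling with a single group has zero entropy.
    if h_true == 0.0 or h_pred == 0.0:
        return 1.0 if h_true == h_pred else 0.0
    return mi / sqrt(h_true * h_pred)

# NMI ignores how clusters are numbered: a relabeled perfect clustering
# still matches the true classes exactly.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

NMI is invariant to permutations of the cluster IDs, which is why it is convenient for comparing clustering runs that start from different initializations.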
This article is indexed in CNKI, Wanfang Data, and other databases.