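Topic Model Combining DSTM and USTM Methods
Citation: JIANG Yuyan, LI Ping, WANG Qing, LI Changxun. Topic Model Combining DSTM and USTM Methods[J]. Journal of Frontier of Computer Science and Technology (计算机科学与探索), 2014(5): 630-639.
Authors: JIANG Yuyan, LI Ping, WANG Qing, LI Changxun
Affiliation: School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002, China
Funding: National Natural Science Foundation of China (Grant No. 71172219); Natural Science Foundation of Anhui Province of China (Grant No. KJ2011Z039)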
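Abstract: Most current supervised and semi-supervised latent Dirichlet allocation (LDA) models incorporate additional information (metadata) in either the DSTM (downstream supervised topic model) or the USTM (upstream supervised topic model) manner. This gives them strong topic extraction and dimensionality reduction capabilities, but they cannot handle academic documents that carry more than one kind of metadata. Building on a study of LDA and its extensions, this paper proposes ART (author & reference topic), a probabilistic topic model that combines DSTM and USTM. ART models the generation of a document's authors in the USTM manner and the generation of its cited references in the DSTM manner, so it can analyze documents that contain both author information and reference information. In the experiments, the model parameters are learned with Stochastic EM Sampling, and the results are compared with those of Labeled LDA and DMR. The results show that ART not only extracts document topics and clusters documents efficiently, but also performs well at identifying a document's authors and ranking its cited references.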
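
The abstract describes ART only at a high level, so the following is a minimal sketch under stated assumptions, not the paper's ART specification. It shows one way the two supervision styles can coexist in a single generative process. The upstream (author) side is assumed to work like the author-topic model: each word token picks one of the document's observed authors, and that author's topic distribution chooses the topic. The downstream (reference) side is assumed to work in a Corr-LDA style: each cited reference is generated from a topic already assigned to one of the document's words. The names `generate_document`, `theta`, `phi`, `psi`, and all sizes and hyperparameter values are hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, dim, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(authors, n_words, n_refs, theta, phi, psi, rng):
    """Generate one document under a hypothetical upstream + downstream process.

    Upstream (assumed, author-topic style): the observed author list conditions
    each token's topic through theta[author].
    Downstream (assumed, Corr-LDA style): each reference is drawn from psi[z],
    where z is a topic already assigned to one of the document's words.
    """
    if n_refs > 0 and n_words == 0:
        raise ValueError("references are drawn from word topics, so n_words must be > 0")
    words, topics = [], []
    for _ in range(n_words):
        x = rng.choice(authors)          # author responsible for this token
        z = categorical(theta[x], rng)   # topic from that author's distribution
        words.append(categorical(phi[z], rng))
        topics.append(z)
    refs = []
    for _ in range(n_refs):
        z_ref = rng.choice(topics)       # reuse a topic already assigned to a word
        refs.append(categorical(psi[z_ref], rng))
    return words, topics, refs


# Hypothetical toy sizes: topics, vocabulary, candidate references, authors.
rng = random.Random(0)
K, V, R, A = 3, 20, 10, 4
theta = [sample_dirichlet(0.5, K, rng) for _ in range(A)]  # author -> topic
phi = [sample_dirichlet(0.1, V, rng) for _ in range(K)]    # topic -> word
psi = [sample_dirichlet(0.1, R, rng) for _ in range(K)]    # topic -> reference
words, topics, refs = generate_document([0, 2], 50, 5, theta, phi, psi, rng)
print(Counter(topics).most_common())
```

Placing authors upstream lets the observed authors shape the topic mixture, while placing references downstream makes them depend on the topics the text actually uses. Parameter learning (the paper uses Stochastic EM Sampling) is not shown here.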

Keywords: latent Dirichlet allocation (LDA); supervised topic model; document clustering; author prediction