
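Joint incremental topic modeling by fusing text and link
Cite this article: MA Hui-fang, WANG Bo. Joint incremental topic modeling by fusing text and link[J]. Application Research of Computers, 2012, 29(4): 1289-1293.
Authors: MA Hui-fang  WANG Bo
Affiliation: 1. Department of Computer Science, College of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070
2. Department of Military Information Management, Shanghai Branch of PLA Nanjing Institute of Politics, Shanghai 200433
Funding: Northwest Normal University Young Teachers' Research Capability Enhancement Program (NWNU-LKQN-10-1, SKQNGG10018)
Abstract: Building on link-based probabilistic latent semantic analysis (link-PLSA), this paper proposes an incremental method for topic modeling that fuses text and links. Topic modeling is first performed on the original collection of webpages. Then, as the structure and content of the pages change, an update mechanism revises the model parameters so that the dynamics of an online webpage stream can be handled quickly and efficiently. In addition, an adaptive asymmetric learning method is proposed to fuse the latent topics of the text and link modalities. For each webpage, its topic distributions on the two modalities are fused by weighting, and the weights are determined by the entropy of the page's feature-word distribution. Because the fused probabilistic structure properly associates the information in the link and text modalities, it yields good modeling results. Experiments on two types of datasets show that the algorithm saves time and substantially improves webpage classification performance. Topics generated by the model are also presented.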

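Keywords: topic model  incremental learning  link-based probabilistic latent semantic analysis (link-PLSA)  adaptive asymmetric learning  adaptive incremental link-PLSA (adaptive link-IPLSA)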

Joint incremental topic modeling by fusing text and link
MA Hui-fang, WANG Bo. Joint incremental topic modeling by fusing text and link[J]. Application Research of Computers, 2012, 29(4): 1289-1293.
Authors:MA Hui-fang  WANG Bo
Affiliation: 1. Dept. of Computer Science, College of Mathematics & Information Science, Northwest Normal University, Lanzhou 730070, China; 2. Dept. of Military Information Management, Shanghai Branch of PLA Nanjing Institute of Politics, Shanghai 200433, China
Abstract: This paper proposes an incremental algorithm, based on link-PLSA, that integrates both content and links for topic modeling. First, topic modeling is performed on the initial dataset. Then an update technique revises the model parameters so that newly arriving documents and links are integrated into the original model efficiently. Furthermore, an adaptive asymmetric learning approach is proposed to fuse the latent topics of the content and link modalities. For each webpage, the topic distributions from the two modalities are fused by weighting, with the weights determined by the entropy of the page's word distribution. Because the fused probabilistic structure properly associates the content and link modalities, better topic modeling can be achieved. Experiments on two datasets with different link structures show that the approach saves time and leads to systematic improvements in classification quality. The paper also presents visualizations of the topics generated by the model.
Keywords:topic models  incremental learning  link-PLSA  adaptive asymmetric learning  adaptive link-IPLSA
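
The abstract does not give the fusion formula, so the following is only a minimal sketch of what entropy-weighted fusion of two per-page topic distributions could look like. The function names (`entropy`, `fuse_topics`) and the specific weight mapping (text weight = 1 minus the normalized entropy of the page's word distribution) are assumptions made for illustration, not the authors' method.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; zero entries are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)


def fuse_topics(text_topics, link_topics, word_dist):
    """Fuse one page's topic distributions from the text and link modalities.

    ASSUMPTION (not from the paper): the text weight is
    1 - H(word_dist) / log(V), where V is the vocabulary size, so a page
    with a peaked word distribution relies more on its text topics and a
    page with a flat word distribution relies more on its link topics.
    """
    v = len(word_dist)
    max_h = math.log(v) if v > 1 else 1.0
    w_text = 1.0 - entropy(word_dist) / max_h
    w_text = min(max(w_text, 0.0), 1.0)  # clamp against rounding error
    mixed = [w_text * t + (1.0 - w_text) * l
             for t, l in zip(text_topics, link_topics)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize to a distribution


if __name__ == "__main__":
    text = [0.7, 0.2, 0.1]   # hypothetical P(topic | page) from text
    link = [0.1, 0.3, 0.6]   # hypothetical P(topic | page) from links
    words = [0.5, 0.5, 0.0, 0.0]  # hypothetical word distribution of the page
    print(fuse_topics(text, link, words))
```

Under this assumed mapping, a page whose word distribution is uniform falls back entirely on the link topics, while a page dominated by a single word uses only the text topics. The paper itself may define the weights differently.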
This article is indexed in CNKI, Wanfang Data, and other databases.