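Research on Key Factors in Multi-document Topic Modeling Application with HLDA
Cite this article: HENG Wei, YU Jia, LI Lei, LIU Yongbin. Research on Key Factors in Multi-document Topic Modeling Application with HLDA[J]. Journal of Chinese Information Processing, 2013, 27(6): 117-128.
Authors: HENG Wei  YU Jia  LI Lei  LIU Yongbin
Affiliation: Center for Intelligence Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876
Funding: National Natural Science Foundation of China (71231002, 61202247); Beijing University of Posts and Telecommunications Youth Research and Innovation Program; Beijing Institute of Science and Technology Information project "Science and Technology Intelligence Assistance System"; Fundamental Research Funds for the Central Universities (2013RC0304); Engineering Research Center of Information Networks, Ministry of Education.
Abstract: hLDA (hierarchical latent Dirichlet allocation) has been widely shown to perform well in hierarchical topic modeling. To make modeling semi-supervised or unsupervised, the parameters are usually determined by cross-validation or by sampling the hyperparameters. However, because of uncertain factors such as corpus characteristics and modeling requirements, parameter tuning, modeling quality, and efficiency all remain difficult in practical applications. This paper first uses a unified analytical framework built from Bayesian cues and range cues to study the key factors that influence hLDA topic modeling, then presents a practical and effective modeling strategy and workflow, and finally evaluates actual modeling performance on the ACL MultiLing 2013 multi-document summarization corpus.

Keywords: hierarchical latent Dirichlet allocation  hierarchical topic modeling  unified analytical framework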

Research on Key Factors in Multi-document Topic Modeling Application with HLDA
HENG Wei,YU Jia,LI Lei,LIU Yongbin.Research on Key Factors in Multi-document Topic Modeling Application with HLDA[J].Journal of Chinese Information Processing,2013,27(6):117-128.
Authors:HENG Wei  YU Jia  LI Lei  LIU Yongbin
Affiliation: Center for Intelligence Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: The effectiveness of hLDA (hierarchical Latent Dirichlet Allocation) in hierarchical topic modeling has been widely validated. To achieve semi-supervised or unsupervised learning, cross-validation or hyperparameter sampling is usually used to determine the parameters. However, corpus features, modeling requirements, and other factors are uncertain, so parameter tuning, modeling effectiveness, and efficiency remain difficult in practical applications. This paper builds a unified analytical framework by combining Bayesian theory and boundary information, analyzes the key factors in hLDA topic modeling, then gives a practical and effective modeling strategy and process, and finally evaluates the modeling results on the multi-document summarization corpus from ACL MultiLing 2013.
Keywords: Hierarchical LDA  Hierarchical Topic Modeling  Unified Analytical Framework