Interactive Topic Modeling Based on Hierarchical Dirichlet Process
Citation: YAN Yu-Yu, TAO Yu-Bo, LIN Hai. Interactive Topic Modeling Based on Hierarchical Dirichlet Process[J]. Journal of Software, 2016, 27(5): 1114-1126.
Authors: YAN Yu-Yu, TAO Yu-Bo, LIN Hai
Affiliation: State Key Laboratory of CAD & CG (Zhejiang University), Hangzhou 310058, Zhejiang, China (all three authors)
Funding: National Natural Science Foundation of China (61472354); National High Technology Research and Development Program of China (863 Program) (2012AA12A404)
Abstract: With the rapid development of information technology, large amounts of text data are produced, collected, and stored. Topic modeling is one of the important tools in text analysis and is widely used to analyze large text collections. However, topic models usually cannot incorporate users' domain knowledge intuitively and effectively to correct the modeling results. To address this problem, this paper proposes an interactive visual analysis system that helps users refine generated topic models. First, the hierarchical Dirichlet process is modified to support word constraints. Then, the generated topic model is displayed in a matrix view that reveals the underlying relationships between words and topics, and semantic-preserving word clouds help users find word constraints effectively. Users iteratively refine the topic model by adding word constraints. Finally, the usability of the system is evaluated through case studies and user studies.

Keywords: text visualization; topic model; text analysis; hierarchical Dirichlet process
Received: 2015-07-24
Revised: 2015-11-09