     

Self-Adaptive Topic Model: A Solution to the Problem of "Rich Topics Get Richer"
Authors: FANG Ying, HUANG Heyan, JIAN Ping, XIN Xin, FENG Chong
Affiliations: [1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, P. R. China; [2] Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing, Beijing 100081, P. R. China; [3] School of Computer & Technology, ShangQiu Normal College, ShangQiu HeNan, 476000, P. R. China
Funding: This work is supported by grants from the National 973 Project (No.2013CB29606), the Natural Science Foundation of China (No.61202244), and the research fund of ShangQiu Normal College (No. 2013GGJS013). NIPS corpus is supported by SourceForge. We thank the anonymous reviewers for their helpful comments.
Abstract:
The problem of "rich topics get richer" (RTGR) is common in topic models and leads to a wrong topic distribution if the distribution process is left without intervention. In the standard LDA (Latent Dirichlet Allocation) model, every word in every document carries the same statistical weight. In fact, words have different impacts on different topics. Guided by this idea, we extend ILDA (Infinite LDA) by taking into account the biasing role that words play in dividing topics, and we propose a self-adaptive topic model aimed specifically at the RTGR problem. The proposed model addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to topic distribution; (3) a self-adaptive method is used to re-sample automatically. To verify the model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and calculation of the strength of sub-topics in order. Experiments on both the NIPS corpus and our self-built news collections show that the system meets these requirements and that the approach is feasible.
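
The abstract does not give the model's sampling equations, so the following is only an illustrative sketch of how a "rich topics get richer" effect can arise, not the authors' method. It assumes a Chinese-restaurant-style prior of the kind commonly used in nonparametric topic models, where a word joins an existing topic with probability proportional to that topic's current size. The function name `crp_assignments` and the parameter `alpha` are hypothetical names chosen for this example.

```python
import random

def crp_assignments(n_words, alpha, seed=0):
    """Assign words to topics one at a time (illustrative prior only).

    An existing topic k is chosen with weight counts[k]; a new topic is
    opened with weight alpha. Larger topics therefore attract more words.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of words currently in topic k
    for _ in range(n_words):
        weights = counts + [alpha]  # last slot stands for "open a new topic"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)        # a new topic is created
        else:
            counts[k] += 1          # an existing topic grows
    return counts

counts = crp_assignments(n_words=1000, alpha=1.0)
print(sorted(counts, reverse=True))  # topic sizes, largest first
```

In runs of this kind the printed sizes are typically very uneven, with a few topics holding most of the words. This is the sort of imbalance that an unmodified sampling process can produce and that a re-weighting or re-sampling step would aim to counteract.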

Keywords: self-adaptive model; adaptive method; model; Dirichlet; dynamic data; evolution analysis; LDA; corpus
This article is indexed in VIP (维普) and other databases.