Probabilistic topic models for sequence data
Authors: Nicola Barbieri, Giuseppe Manco, Ettore Ritacco, Marco Carnuccio, Antonio Bevacqua
Affiliations: 1. Yahoo Research, Av. Diagonal 177, Barcelona, Spain
2. Institute for High Performance Computing and Networks (ICAR), Italian National Research Council, via Bucci 41c, 87036, Rende, CS, Italy
3. Department of Electronics, Informatics and Systems, University of Calabria, via Bucci 41c, 87036, Rende, CS, Italy
Abstract: Probabilistic topic models are widely used in different contexts to uncover the hidden structure of large text corpora. One of the main (and perhaps strongest) assumptions of these models is that the generative process follows the bag-of-words assumption: each token is generated independently of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and previous topics. For each of these modeling assumptions we present a Gibbs sampling procedure for parameter estimation. Experimental evaluation on real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.
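For orientation, the three Markovian dependencies can be written out explicitly. The notation below is a sketch in our own shorthand (w_n denotes the n-th token of a document and z_n its topic assignment), not necessarily the paper's:

```latex
% Standard LDA emits each token from its topic alone: p(w_n \mid z_n).
% The three conditional Markovian extensions described in the abstract:
\begin{align*}
\text{(i)}\;\;   & p(w_n \mid z_n, w_{n-1})  && \text{token depends on topic and previous token}\\
\text{(ii)}\;\;  & p(z_n \mid z_{n-1})       && \text{topic depends on previous topic}\\
\text{(iii)}\;\; & p(w_n \mid z_n, z_{n-1})  && \text{token depends on current and previous topics}
\end{align*}
```

As a concrete illustration of the estimation side, here is a minimal collapsed Gibbs sweep for variant (ii), where topics form a first-order chain governed by a shared transition matrix. Everything here (the function name, the count arrays, the symmetric Dirichlet priors beta and gamma, and the uniform initial-topic distribution) is an illustrative assumption, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(doc, z, n_kw, n_k, n_tt, n_t, beta=0.01, gamma=0.1):
    """One collapsed-Gibbs sweep over one document for variant (ii):
    topics follow a first-order Markov chain with a global K x K
    transition matrix (integrated out, Dirichlet(gamma) prior) and
    topic-word multinomials with a Dirichlet(beta) prior.

    doc : token ids; z : per-token topics (updated in place);
    n_kw: K x V topic-word counts;  n_k: K topic totals;
    n_tt: K x K transition counts;  n_t: K transition row totals.
    """
    K, V = n_kw.shape
    ks = np.arange(K)
    for n, w in enumerate(doc):
        prev = z[n - 1] if n > 0 else None
        nxt = z[n + 1] if n + 1 < len(doc) else None
        k_old = z[n]
        # Remove token n's contribution from the sufficient statistics.
        n_kw[k_old, w] -= 1
        n_k[k_old] -= 1
        if prev is not None:
            n_tt[prev, k_old] -= 1
            n_t[prev] -= 1
        if nxt is not None:
            n_tt[k_old, nxt] -= 1
            n_t[k_old] -= 1
        # Emission term p(w_n | z_n = k), word multinomials integrated out.
        p = (n_kw[:, w] + beta) / (n_k + V * beta)
        # Incoming transition p(z_n = k | z_{n-1}); the first token gets a
        # uniform initial-topic distribution (a simplifying assumption).
        if prev is not None:
            p = p * (n_tt[prev] + gamma)
        # Outgoing transition p(z_{n+1} | z_n = k). When k == prev, the
        # incoming transition prev -> k itself raises the counts this term
        # conditions on, hence the indicator corrections below.
        if nxt is not None:
            same = (ks == prev) if prev is not None else np.zeros(K, dtype=bool)
            num = n_tt[:, nxt] + gamma + (same & (ks == nxt))
            den = n_t + K * gamma + same
            p = p * num / den
        p = p / p.sum()
        k_new = rng.choice(K, p=p)
        # Reinsert token n with its freshly sampled topic.
        z[n] = k_new
        n_kw[k_new, w] += 1
        n_k[k_new] += 1
        if prev is not None:
            n_tt[prev, k_new] += 1
            n_t[prev] += 1
        if nxt is not None:
            n_tt[k_new, nxt] += 1
            n_t[k_new] += 1
```

The key detail in any sampler of this shape is the outgoing-transition correction: once the candidate topic k equals z_{n-1}, assigning it changes the very transition counts that p(z_{n+1} | z_n = k) conditions on, so the numerator and denominator each pick up an indicator term.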