Topic model validation
Authors: Eduardo H. Ramirez, Ramon Brena
Affiliations: (a) Tecnologico de Monterrey, Campus Monterrey, Monterrey, Mexico
(b) DISCo, Università degli Studi di Milano-Bicocca, Viale Sarca 336, 20126 Milano, Italy
Abstract: This paper considers the problem of externally validating the semantic coherence of topic models. The Fowlkes-Mallows index, a known clustering validation metric, is generalized to overlapping partitions and multi-labeled collections, making it suitable for validating topic modeling algorithms. In addition, we propose new probabilistic metrics inspired by the concepts of recall and precision. The proposed metrics have clear probabilistic interpretations and can be applied to validate and compare other soft and overlapping clustering algorithms. The approach is exemplified by validating LDA models on the Reuters-21578 multi-labeled collection, and Monte Carlo simulations are then used to show convergence to the correct results. Additional statistical evidence is provided to clarify the relationship among the metrics presented.
Keywords: Topic models; Soft clustering; Fowlkes-Mallows index; Monte Carlo
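
For orientation, the classical Fowlkes-Mallows index that the abstract takes as its starting point can be sketched as follows. This is only the standard pair-counting form for hard, non-overlapping partitions, not the generalization to overlapping partitions that the paper proposes; the function name and the label-list inputs are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Classical Fowlkes-Mallows index for two hard partitions.

    Illustrative sketch only: each item carries exactly one label in
    each partition, so overlapping or multi-labeled data is NOT handled.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    tp = fp = fn = 0
    # Examine every unordered pair of items once.
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both partitions
        elif same_pred:
            fp += 1  # grouped together only in the predicted partition
        elif same_true:
            fn += 1  # grouped together only in the reference partition

    if tp == 0:
        return 0.0
    # Geometric mean of pairwise precision and pairwise recall.
    return tp / sqrt((tp + fp) * (tp + fn))
```

Under these assumptions, `fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, because the index depends only on which items are grouped together and not on the label names.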
This article is indexed in ScienceDirect and other databases.