Adaptive context trees and text clustering |
| |
Authors: | Vert J-P |
| |
Affiliation: | Dept. of Math. & Applications, Ecole Normale Superieure, Paris |
| |
Abstract: | In the finite-alphabet context we propose four alternatives to fixed-order Markov models to estimate a conditional distribution. They consist in working with a large class of variable-length Markov models represented by context trees, and building an estimator of the conditional distribution whose risk is of the same order as the risk of the best estimator for every model simultaneously, in a conditional Kullback-Leibler sense. Such estimators can be used to model complex objects such as texts written in natural language and to define a notion of similarity between them. This idea is illustrated by experimental results on unsupervised text clustering. |