Prediction suffix trees for supervised classification of sequences |
Authors: | Christine Largeron-Letno |
Affiliation: | EURISE-Université Jean Monnet, 23 rue du Dr Michelon, 42023, Saint-Etienne Cedex 2, France |
Abstract: | This paper presents a statistical test and algorithms for pattern extraction and supervised classification of sequential data. It first defines the notion of a prediction suffix tree (PST), a type of tree that can efficiently describe a variable-order chain. A PST performs better than a Markov chain of order L, at a lower storage cost. We propose an improvement of this model based on a statistical test, which lets us control the risk that the patterns found in the model of the sequence to be classified differ from those in the model of its class. Applications to biological sequences illustrate this procedure. We compare the results obtained with different models (Markov chain of order L, variable-order model, and the statistical test, with or without smoothing) and show how the choice of model parameters influences performance in these applications. These algorithms can also be used in other fields where the data are naturally ordered. |
Keywords: | Prediction suffix tree; Pattern extraction; Supervised classification; Variable-order chain; Markov model; Chronobiological and DNA sequences |
This article is indexed in ScienceDirect and other databases.
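The abstract describes prediction suffix trees and likelihood-based classification only in words. As a rough illustration, the Python sketch below builds a simplified variable-order model over strings: it counts next symbols after every context up to a maximum depth, prunes rarely seen contexts, predicts with the longest surviving suffix of the history, and assigns a sequence to the class whose model gives it the highest log-likelihood. This is not the author's algorithm. The class `SimplePST`, the function `classify`, the pruning rule (`min_count`), the additive smoothing (`alpha`), and the toy sequences are all hypothetical, and the statistical test proposed in the paper is not implemented here.

```python
import math
from collections import defaultdict


class SimplePST:
    """Hypothetical, simplified prediction suffix tree over string sequences.

    Assumptions (not taken from the paper): a context of length <= max_depth is
    kept only if it was followed by at least min_count symbols in training, and
    next-symbol probabilities use additive smoothing with parameter alpha.
    """

    def __init__(self, alphabet, max_depth=3, min_count=2, alpha=0.5):
        self.alphabet = sorted(set(alphabet))
        self.max_depth = max_depth
        self.min_count = min_count
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[context][next symbol]
        self.contexts = {""}  # the empty context (root) is always kept

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Count sym after every preceding context of length 0..max_depth.
                for k in range(min(i, self.max_depth) + 1):
                    self.counts[seq[i - k:i]][sym] += 1
        # Prune rare contexts. A context's count never exceeds that of its
        # suffixes, so the kept set stays closed under taking suffixes.
        self.contexts = {""} | {ctx for ctx, nxt in self.counts.items()
                                if sum(nxt.values()) >= self.min_count}
        return self

    def _longest_context(self, history):
        # Longest suffix of the history that survived pruning (at worst the root "").
        for k in range(min(len(history), self.max_depth), 0, -1):
            if history[-k:] in self.contexts:
                return history[-k:]
        return ""

    def prob(self, history, sym):
        # Smoothed probability of sym given the longest matching context.
        nxt = self.counts.get(self._longest_context(history), {})
        total = sum(nxt.values())
        return (nxt.get(sym, 0) + self.alpha) / (total + self.alpha * len(self.alphabet))

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))


def classify(seq, models):
    """Assign seq to the class whose model gives it the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


# Toy usage with made-up DNA-like classes.
alphabet = "ACGT"
models = {
    "A-rich": SimplePST(alphabet).fit(["AAAAACAAAA", "AAGAAAAAA"]),
    "CG-rich": SimplePST(alphabet).fit(["CGCGCGCGCG", "GCGCGCCGCG"]),
}
print(classify("AAAAGAAAA", models))  # → A-rich
print(classify("CGCGGCGC", models))   # → CG-rich
```

With `min_count=1`, every observed context up to `max_depth` is kept, so the sketch behaves much like a Markov chain of order `max_depth`, except that it falls back to a shorter context when a context was never observed. Raising `min_count` keeps fewer contexts and therefore less storage, at the cost of coarser predictions.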