Learning local languages and their application to DNA sequence analysis |
| |
Authors: | Yokomori T, Kobayashi S |
| |
Affiliation: | Dept. of Math., Waseda Univ., Tokyo; |
| |
Abstract: | This paper presents an efficient algorithm for learning in the limit a special type of regular language, called strictly locally testable languages, from positive data, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear-time algorithm that, given a strictly locally testable language, learns its deterministic finite-state automaton in the limit from only positive data. This provides a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results obtained with this learning algorithm. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95% correct identification for positive data and 96% for negative data. |
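To illustrate the idea of learning a strictly locally testable language from positive data only, here is a minimal sketch of one common textbook formulation: for a fixed window size k, record the length-(k-1) prefixes and suffixes and all length-k substrings seen in the sample, and accept exactly the strings built from those pieces. This is not taken from the paper. The function names `learn_k_local` and `accepts`, the assumption that k is known in advance, and the toy strings are all illustrative, and the authors' algorithm builds a deterministic finite-state automaton rather than the plain sets used here.

```python
def learn_k_local(samples, k):
    """Collect the prefixes, suffixes, and k-factors seen in positive samples.

    Hypothetical helper: k is assumed to be fixed and known beforehand.
    """
    prefixes, suffixes, factors, short = set(), set(), set(), set()
    for w in samples:
        if len(w) < k:
            # Too short to contain a k-factor: remember the string itself.
            short.add(w)
            continue
        prefixes.add(w[:k - 1])
        suffixes.add(w[len(w) - k + 1:])
        for i in range(len(w) - k + 1):
            factors.add(w[i:i + k])
    return prefixes, suffixes, factors, short


def accepts(model, w, k):
    """Return True if w is consistent with every constraint in the model."""
    prefixes, suffixes, factors, short = model
    if len(w) < k:
        return w in short
    return (
        w[:k - 1] in prefixes
        and w[len(w) - k + 1:] in suffixes
        and all(w[i:i + k] in factors for i in range(len(w) - k + 1))
    )


# Toy positive sample (illustrative strings, not real sequence data).
model = learn_k_local(["abab", "ab"], k=2)
print(accepts(model, "ababab", 2))  # generalizes beyond the sample
print(accepts(model, "abba", 2))    # contains the unseen factor "bb"
```

Each sample string is scanned once, so for fixed k the learning step takes time linear in the total length of the input. The hypothesis only grows as new positive examples arrive, which is what makes identification in the limit from positive data possible for this class.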