Stochastic Grammatical Inference of Text Database Structure |
| |
Authors: | Matthew Young-Lai; Frank Wm. Tompa |
| |
Affiliation: | (1) Computer Science Department, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1; (2) Computer Science Department, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1 |
| |
Abstract: | For a document collection in which structural elements are identified with markup, it is often necessary to construct a grammar retrospectively that constrains element nesting and ordering. This has been addressed by others as an application of grammatical inference. We describe an approach based on stochastic grammatical inference which scales more naturally to large data sets and produces models with richer semantics. We adopt an algorithm that produces stochastic finite automata and describe modifications that enable better interactive control of results. Our experimental evaluation uses four document collections with varying structure. |
| |
Keywords: | stochastic grammatical inference; text database structure |
This article is indexed in SpringerLink and other databases. |
|