Unsupervised grammar induction using history based approach |
| |
Authors: | Heshaam Feili, Gholamreza Ghassem-Sani |
| |
Affiliation: | Department of Computer Engineering, Sharif University of Technology, Azadi Avenue, Tehran, Iran |
| |
Abstract: | Grammar induction, also known as grammar inference, is one of the most important research areas in natural language processing. The availability of large corpora has encouraged many researchers to use statistical methods for grammar induction. Approaches to this problem fall into three categories, supervised, semi-supervised, and unsupervised, according to the type of data set required for the training phase. Most current induction methods are supervised and need a bracketed data set for training; the lack of such data sets for many languages encouraged us to focus on unsupervised approaches. Here, we introduce a novel approach to unsupervised grammar inference, which we call history-based inside-outside (HIO), that uses part-of-speech tag sequences as the only source of lexical information. HIO is an extension of the inside-outside algorithm enriched with some notions from history-based approaches. Our experiments on English and Persian show that adding some conditions to the rule assumptions of the induced grammar yields an acceptable improvement in the quality of the output grammar. |
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
|