Corpus based part-of-speech tagging |
| |
Authors: | Chengyao Lv, Huihua Liu, Yuanxing Dong, Yunliang Chen |
| |
Affiliation: | 1. School of Foreign Language, China University of Geosciences, Wuhan, China; 2. School of Computer Science, China University of Geosciences, Wuhan, China |
| |
Abstract: | In natural language processing, a part-of-speech (POS) tagger is a crucial subsystem in a wide range of applications: it labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. This article presents a brief account of the state of the art in POS tagging. POS tagging approaches use a labeled corpus to train computational models. Several typical models from three kinds of tagging are introduced: rule-based tagging, statistical approaches and evolutionary algorithms. The advantages and pitfalls of each are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96%, while some evolutionary algorithms reach about 96–97%. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases.
|
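To make the corpus-based idea in the abstract concrete, the following is a minimal sketch of a most-frequent-tag baseline: it counts tags per word in a pre-annotated corpus and reuses the most common tag at tagging time. It is not one of the models surveyed in the article. The toy corpus, the tag names, and the function names `train_unigram_tagger` and `tag` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy pre-annotated corpus, invented for illustration only:
# each sentence is a list of (word, tag) pairs.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]


def train_unigram_tagger(sentences):
    """Learn word -> most frequent tag, plus a fallback tag for unknown words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        for word, pos in sentence:
            per_word[word.lower()][pos] += 1
            overall[pos] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]  # most frequent tag in the corpus
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Tag each word with its learned tag, or the fallback tag if unseen."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


lexicon, default_tag = train_unigram_tagger(TRAIN)
print(tag(["The", "bird", "sleeps"], lexicon, default_tag))
```

A baseline like this ignores context entirely, which is the gap that rule-based, statistical and evolutionary taggers try to close by looking at neighbouring words and tags.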