Annotation of sentence structure |
| |
Authors: | Markéta Lopatková, Petr Homola, Natalia Klyueva |
| |
Affiliation: | (1) Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czech Republic |
| |
Abstract: | The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence
structure. We show that the concept of linear segments—linguistically motivated units, which may be easily detected automatically—serves
as a good basis for the identification of clauses in Czech. The segment annotation captures such relationships as subordination,
coordination, apposition and parenthesis; based on segmentation charts, individual clauses forming a complex sentence are
identified. The annotation of sentence structure enriches a dependency-based framework with explicit syntactic information
on relations among complex units like clauses. We have gathered a collection of 3,444 sentences from the Prague Dependency
Treebank, which were annotated with respect to their sentence structure (these sentences comprise 10,746 segments forming
6,341 clauses). The main purpose of the project is to obtain development data: promising results have already been reported
for Czech NLP tools (such as a dependency parser or a machine translation system for related languages) that adopt the idea of
clause segmentation. The collection of sentences with annotated sentence structure provides the possibility of further improvement
of such tools. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|