Elementary Dependency Trees for Identifying Corpus-Specific Semantic Classes |
| |
Authors: | B Habert, C Fabre |
| |
Affiliation: | (1) UMR 9952 & LIMSI -- CNRS, Ecole Normale Supérieure de Fontenay-St Cloud, 31 av. Lombart, F-92260 Fontenay-aux-Roses; (2) ERSS -- CNRS, 5 allées Antonio Machado, F-31058 Toulouse cédex |
| |
Abstract: | Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate
the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic
dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grained semantic categories
and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlap with
an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-sized corpora which
resist statistical clustering methods, but it seems more appropriate for specialized texts.
This revised version was published online in July 2006 with corrections to the Cover Date. |
| |
Keywords: | clustering, semantic acquisition, noun phrase extraction |
This article is indexed in SpringerLink and other databases. |