Making sense of collocations |
| |
Authors: | Leo Wanner, Bernd Bohnet, Mark Giereth |
| |
Affiliation: | (a) ICREA and Pompeu Fabra University, Passeig de Circumvallació, 8, Barcelona 08003, Spain; (b) Intelligent Systems Institute, University of Stuttgart, Germany |
| |
Abstract: | Lexico-semantic collocations (LSCs) are a prominent type of multiword expression. Over the last decade, the automatic compilation of LSCs from text corpora has been addressed in a significant number of works. However, the output of an LSC-extraction program is very often a plain list of LSCs. Although useful as raw material for dictionary construction, such plain lists are of rather limited use in NLP applications: for NLP, LSCs must be assigned syntactic and, especially, semantic information. Our goal is to develop an “off-the-shelf” LSC-acquisition program that annotates each LSC identified in the corpus with its syntax and semantics. In this article, we treat the annotation task as a classification task and approach it as a machine learning problem. The LSC typology we use is that of the lexical functions of Explanatory Combinatorial Lexicology; EuroWordNet serves as the lexico-semantic resource. The machine learning technique applied is a variant of the nearest-neighbor family, defined over lexico-semantic features of the elements of LSCs. The technique has been tested on Spanish verb–noun bigrams. |
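The abstract frames LSC annotation as nearest-neighbor classification over lexico-semantic features of a bigram's elements. The sketch below illustrates only that general idea, under stated assumptions: it uses a plain k-nearest-neighbor vote with Jaccard overlap (not the authors' own variant), hand-written feature tags that merely stand in for EuroWordNet-derived features, and a tiny hypothetical training set of Spanish verb–noun bigrams labelled with lexical functions (Oper1, Real1, CausFunc0). The names `TRAINING`, `jaccard`, and `classify` are illustrative, not from the paper.

```python
from collections import Counter

# Hypothetical toy training data: (verb, noun, feature set, lexical function).
# The "v:"/"n:" tags are invented stand-ins for EuroWordNet-derived semantic
# features of the verb and the noun; they are NOT real EuroWordNet data.
TRAINING = [
    ("tomar", "decisión", {"v:Dynamic", "v:Agentive", "n:Mental", "n:Purpose"}, "Oper1"),
    ("dar", "paseo", {"v:Dynamic", "v:Agentive", "n:Dynamic", "n:Location"}, "Oper1"),
    ("hacer", "pregunta", {"v:Dynamic", "v:Agentive", "n:Communication"}, "Oper1"),
    ("cumplir", "promesa", {"v:Dynamic", "v:Purpose", "n:Communication", "n:Purpose"}, "Real1"),
    ("cumplir", "obligación", {"v:Dynamic", "v:Purpose", "n:Mental", "n:Purpose"}, "Real1"),
    ("provocar", "incendio", {"v:Dynamic", "v:Cause", "n:Dynamic", "n:Physical"}, "CausFunc0"),
]


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two feature sets, in [0, 1]; two empty sets score 0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(features: set[str], k: int = 3) -> str:
    """Label a bigram with the majority lexical function among its k nearest neighbors.

    sorted() is stable and Counter.most_common(1) returns the first-seen label
    among equal counts, so vote ties go to the label of the most similar neighbor.
    """
    ranked = sorted(TRAINING, key=lambda ex: jaccard(features, ex[2]), reverse=True)
    votes = Counter(ex[3] for ex in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, `classify({"v:Dynamic", "v:Agentive", "n:Mental", "n:Purpose"})` (tags one might assign to *adoptar una decisión*) ranks *tomar decisión*, *cumplir obligación*, and *hacer pregunta* as the three nearest neighbors and returns `"Oper1"` by a two-to-one vote. A set-overlap similarity is used here only because it keeps the sketch short; any similarity defined over the semantic features of the verb and noun would fit the same scheme.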
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases.
|