Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation |
| |
Authors: | Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada |
| |
Affiliation: | 1. NILC – ICMC, University of São Paulo, São Carlos, SP, Brazil; 2. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, 03071, Alacant, Spain
|
| |
Abstract: | The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work. As a consequence, bilingual resources are usually more difficult to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks. |
| |
Keywords: | Machine translation; Automatic induction; Transfer rule; Bilingual dictionary; Shallow transfer |
This article is indexed in SpringerLink and other databases. |
|