

Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair
Authors: Mireia Farrús, Marta R Costa-jussà, José B Mariño, Marc Poch, Adolfo Hernández, Carlos Henríquez, José A R Fonollosa
Affiliation: 1. TALP Research Center, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain; 2. Office of Learning Technologies, Universitat Oberta de Catalunya, Barcelona, Spain; 3. Voice and Language Department, Barcelona Media Innovation Center, Barcelona, Spain; 4. Universitat Pompeu Fabra, Barcelona, Spain
Abstract: This work aims to improve an N-gram-based statistical machine translation system between Catalan and Spanish, trained on an aligned Spanish–Catalan parallel corpus of 1.7 million sentences taken from the El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on grammatical categories, as well as lexical categorization. The improved system performs clearly better in both human and automatic evaluations, with a gain of about 1.1 BLEU points in the Spanish-to-Catalan direction and about 0.5 BLEU points in the reverse direction. The final system is freely available online as a linguistic resource.
This article is indexed in SpringerLink and other databases.