An image-based automatic Arabic translation system期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

An image-based automatic Arabic translation system

Authors:	Yi Chang^{[Author Vitae]} Datong Chen [Author Vitae]Author Vitae] Jie Yang [Author Vitae]

Affiliation:	School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Abstract:	In this paper, we present a system that automatically translates Arabic text embedded in images into English. The system consists of three components: text detection from images, character recognition, and machine translation. We formulate the text detection as a binary classification problem and apply gradient boosting tree (GBT), support vector machine (SVM), and location-based prior knowledge to improve the F1 score of text detection from 78.95% to 87.05%. The detected text images are processed by off-the-shelf optical character recognition (OCR) software. We employ an error correction model to post-process the noisy OCR output, and apply a bigram language model to reduce word segmentation errors. The translation module is tailored with compact data structure for hand-held devices. The experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in the experiment of Arabic transparent font, the BLEU score increases from 18.70 to 33.47 with use of the error correction module.

Keywords:	Text detection Image classification OCR Error correction
本文献已被 ScienceDirect 等数据库收录！