IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach |
| |
Authors: | Moftah Elzobi, Ayoub Al-Hamadi, Zaher Al Aghbari, Laslo Dings |
| |
Affiliation: | 1. Institute for Electronics, Signal Processing and Communications (IESK), Magdeburg, Germany; 2. Computer Science Department, University of Sharjah, Sharjah, UAE
|
| |
Abstract: | Although much research has addressed unconstrained handwriting recognition, an effective solution remains a serious challenge. In this article, we address two issues related to Arabic handwriting recognition. First, we present IESK-arDB, a new multi-purpose off-line Arabic handwriting database. It is publicly available and contains more than 4,000 word images, each accompanied by a binary version, a thinned version, and ground-truth information stored in a separate XML file. Additionally, it contains around 6,000 character images segmented from the database. A letter frequency analysis shows that the database exhibits letter frequencies similar to those of large corpora of digital text, which supports the usefulness of the database. Second, we propose a multi-phase segmentation approach that starts by detecting and resolving sub-word overlaps, then hypothesizes a large number of segmentation points that are later reduced by a set of heuristic rules. The proposed approach has been tested successfully on IESK-arDB. The results are very promising and indicate the efficiency of the suggested approach. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|