首页 | 本学科首页   官方微博 | 高级检索  
     


IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach
Authors:Moftah Elzobi  Ayoub Al-Hamadi  Zaher Al Aghbari  Laslo Dings
Affiliation:1. Institute for Electronics, Signal Processing and Communications (IESK), Magdeburg, Germany
2. Computer Science Department, University of Sharjah, Sharjah, UAE
Abstract:Even though a lot of researches have been conducted in order to solve the problem of unconstrained handwriting recognition, an effective solution is still a serious challenge. In this article, we address two Arabic handwriting recognition-related issues. Firstly, we present IESK-arDB, a new multi-propose off-line Arabic handwritten database. It is publicly available and contains more than 4,000 word images, each equipped with binary version, thinned version as well as a ground truth information stored in separate XML file. Additionally, it contains around 6,000 character images segmented from the database. A letter frequency analysis showed that the database exhibits letter frequencies similar to that of large corpora of digital text, which proof the database usefulness. Secondly, we proposed a multi-phase segmentation approach that starts by detecting and resolving sub-word overlaps, then hypothesizing a large number of segmentation points that are later reduced by a set of heuristic rules. The proposed approach has been successfully tested on IESK-arDB. The results were very promising, indicating the efficiency of the suggested approach.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号