Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths
Authors: Nikos Nikolaou, Michael Makridis, Basilis Gatos, Nikolaos Stamatopoulos, Nikos Papamarkos
Affiliation: 1. Department of Electrical and Computer Engineering, Democritus University of Thrace, 67 100 Xanthi, Greece; 2. Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, 153 10 Athens, Greece
Abstract: In this paper, we work toward developing efficient techniques for segmenting document pages that result from the digitization of historical machine-printed sources. Such documents often suffer from low quality, local skew, and degradations caused by the poor quality of old printing matrices or by ink diffusion, and they exhibit complex and dense layouts. To address these problems, we introduce the following innovative aspects: (i) a novel Adaptive Run Length Smoothing Algorithm (ARLSA) to handle complex and dense document layouts, (ii) detection of noisy areas and punctuation marks, which are common in historical machine-printed documents, (iii) detection of possible obstacles formed by background areas in order to separate neighboring text columns or text lines, and (iv) use of skeleton segmentation paths to isolate characters that may be connected. Comparative experiments on several historical machine-printed documents demonstrate the efficiency of the proposed technique.
Keywords: Text line segmentation; Word segmentation; Character segmentation; Historical machine-printed documents; Run Length Smoothing Algorithm
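
For readers unfamiliar with the Run Length Smoothing Algorithm named in the keywords, the following is a minimal sketch of the classic, non-adaptive horizontal RLSA on a binary image (1 = foreground, 0 = background). It is not the ARLSA proposed in the paper, whose additional constraints are not given in the abstract. The function names `rlsa_row` and `rlsa_horizontal`, the list-of-lists image representation, and the "gap length at most `threshold`" convention are assumptions made for illustration.

```python
def rlsa_row(row, threshold):
    """Classic 1-D run-length smoothing (illustrative, not the paper's ARLSA).

    Background gaps (0s) of at most `threshold` pixels that lie between two
    foreground pixels (1s) are filled with foreground. Background runs that
    touch the start or end of the row are left unchanged.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1  # number of background pixels in between
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply the 1-D smoothing independently to every row of a binary image."""
    return [rlsa_row(row, threshold) for row in image]


if __name__ == "__main__":
    page = [
        [1, 0, 0, 1, 0, 0, 0, 0, 1],  # short gap is filled, long gap is kept
        [0, 0, 1, 0, 1, 0, 0, 0, 0],  # border background is never filled
    ]
    for smoothed_row in rlsa_horizontal(page, threshold=2):
        print(smoothed_row)
```

With a fixed threshold, nearby characters merge into word or line blobs, which is why a single global value can struggle on dense layouts of the kind the abstract describes.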
This article is indexed in ScienceDirect and other databases.