Character pattern extraction from documents with complex backgrounds |
| |
Authors: | Hideaki Goto, Hirotomo Aso |
| |
Affiliation: | (1) Information Synergy Center, Tohoku University, Aoba, Aramaki, Aoba-ku, Sendai-shi, 980–8578 Japan; e-mail: hgot@isc.tohoku.ac.jp; (2) Graduate School of Engineering, Tohoku University, Aoba 05, Aramaki, Aoba-ku, Sendai-shi, Japan |
| |
Abstract: | Recent remarkable progress in computer systems and printing devices has made it easier to produce printed documents with
various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs,
computer graphics, etc. Some methods have been developed for character pattern extraction from document images and scene images
with complex backgrounds. However, the previous methods are suitable only for extracting rather large characters, and the
processes often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns
can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel
labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document
images. The performance of extracting small character patterns has been improved by suppressing the influence of mixed-color
pixels around character edges. Experimental results show that the method is capable of extracting very small character patterns
from main text blocks in various documents, separating characters and complex backgrounds, as long as the thickness of the
character strokes is more than about 1.5 pixels.
Received July 23, 2001 / Accepted November 5, 2001 |
| |
Keywords: | Character pattern extraction – Multilevel thresholding – Region growing – Complex background – Document image analysis |
This article is indexed in SpringerLink and other databases. |
|
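The abstract names three building blocks: local multilevel thresholding, pixel labeling, and region growing. The sketch below is only an illustration of how such stages might fit together on a grayscale image; it is not the authors' algorithm. Everything in it is an assumption made for the example: the function names, the tile size, the equal-width thresholds inside each tile, the `min_contrast` and `tol` parameters, and the premise that characters are darker than their background. It also makes no attempt to handle the mixed-color edge pixels that the paper addresses.

```python
from collections import deque


def local_level_labels(img, tile=3, levels=3, min_contrast=40):
    """Label each pixel 0..levels-1 using equal-width thresholds per tile.

    Hypothetical stand-in for local multilevel thresholding. Tiles whose
    gray-level range is below min_contrast stay undecided (-1).
    """
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            cells = [(y, x)
                     for y in range(ty, min(ty + tile, h))
                     for x in range(tx, min(tx + tile, w))]
            lo = min(img[y][x] for y, x in cells)
            hi = max(img[y][x] for y, x in cells)
            span = hi - lo
            if span < min_contrast:
                continue  # low-contrast tile: leave pixels undecided
            for y, x in cells:
                level = (img[y][x] - lo) * levels // span
                labels[y][x] = min(levels - 1, level)
    return labels


def grow_region(img, seed, tol=20):
    """Grow a 4-connected region of pixels within tol of the seed's gray value."""
    h, w = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in region
                    and abs(img[ny][nx] - ref) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


def extract_dark_regions(img, tile=3, levels=3, min_contrast=40, tol=20):
    """Seed region growing from the darkest local level (assumes dark text)."""
    labels = local_level_labels(img, tile, levels, min_contrast)
    seen = set()
    regions = []
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab == 0 and (y, x) not in seen:
                region = grow_region(img, (y, x), tol)
                seen |= region
                regions.append(region)
    return regions


# Toy input: a thin dark vertical stroke on a uniform light background.
demo = [[200] * 6 for _ in range(6)]
for y in range(1, 5):
    demo[y][2] = 30
demo_regions = extract_dark_regions(demo)
```

Because the thresholds are computed per tile, a stroke can in principle be picked up even when overall brightness varies across the page, which is the motivation the abstract gives for a local approach. Each returned region is a set of `(row, column)` coordinates that a later stage could filter by size or shape.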