Text segmentation using Gabor filters for automatic document processing |
| |
Authors: | Anil K Jain Sushil Bhattacharjee |
| |
Affiliation: | (1) Pattern Recognition and Image Processing Laboratory, Michigan State University, East Lansing, MI 48824-1027, USA |
| |
Abstract: | There is considerable interest in designing automatic systems that will scan a given paper document and store it on electronic
media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text. Thus,
the document image has to be segmented to identify the text regions, so that OCR techniques may be applied only to those regions.
In this paper, we present a simple method for document image segmentation in which text regions in a given document image
are automatically identified. The proposed segmentation method for document images is based on a multichannel filtering approach
to texture segmentation. The text in the document is considered as a textured region. Nontext contents in the document, such
as blank spaces, graphics, and pictures, are considered as regions with different textures. Thus, the problem of segmenting
document images into text and nontext regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters
are used to extract texture features for each of these regions. These filters have previously been used extensively for a variety
of texture segmentation tasks. Here we apply the same filters to the document image segmentation problem. Our segmentation
method does not assume any a priori knowledge about the content or font styles of the document, and is shown to work even
for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images
which demonstrate the robustness of this technique.
This work was supported by the National Science Foundation under NSF grant CDA-88-06599 and by a grant from E. I. Du Pont
De Nemours & Company. |
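
The multichannel filtering idea summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the frequencies, orientations, kernel sizes, the Gaussian smoothing of the squared responses, and the two-cluster k-means step used to label pixels. The function names (gabor_kernel, texture_features, segment_text) are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, size):
    """Normalized 2-D Gaussian on an odd size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.sum()

def gabor_kernel(freq, theta, sigma, size):
    """Even-symmetric Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)  # axis along the wave direction
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * xr)
    return kernel - kernel.mean()  # zero mean, so uniform regions give no response

def filter_same(image, kernel):
    """Circular convolution via the FFT, output aligned with the input."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center at (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def texture_features(image, freqs=(0.1, 0.2), n_orient=4, sigma=4.0, size=25):
    """One feature map per channel: locally averaged energy of the filter output."""
    smooth = gaussian_kernel(6.0, size)  # assumed smoothing scale
    feats = []
    for f in freqs:
        for k in range(n_orient):
            response = filter_same(image, gabor_kernel(f, k * np.pi / n_orient, sigma, size))
            feats.append(filter_same(response**2, smooth))
    return np.stack(feats, axis=-1)

def segment_text(image, n_iter=10):
    """Label each pixel with 2-means on the feature vectors; True = text-like."""
    feats = texture_features(image)
    X = feats.reshape(-1, feats.shape[-1])
    total = X.sum(axis=1)
    # Deterministic start: the lowest-energy and highest-energy pixels.
    centers = np.stack([X[total.argmin()], X[total.argmax()]])
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :])**2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    if centers[0].sum() > centers[1].sum():  # make label 1 the high-energy cluster
        labels = 1 - labels
    return labels.reshape(image.shape) == 1

# Synthetic page: blank left half, text-like striped pattern on the right half.
yy, xx = np.mgrid[0:64, 0:256]
image = (((yy % 8) < 5) & ((xx % 5) < 2) & (xx >= 128)).astype(float)
mask = segment_text(image)
```

In this sketch the blank half produces no filter response, while the striped half has high energy in several channels, so the two clusters separate cleanly away from the region boundary. Real scanned documents would need tuned filter parameters and probably more than two texture classes to separate graphics and pictures from text.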
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|