A knowledge-based system for extracting text-lines from mixed and overlapping text/graphics compound document images |
| |
Authors: | Yen-Lin Chen, Zeng-Wei Hong, Cheng-Hung Chuang |
| |
Affiliation: | a Department of Computer Science and Information Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-hsiao E. Rd., Taipei 10608, Taiwan; b Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Rd., Wufeng, Taichung 41354, Taiwan |
| |
Abstract: | This paper presents a new knowledge-based system for extracting and identifying text-lines from various real-life mixed text/graphics compound document images. The proposed system first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. The system is flexible and extensible: it can cope with new types of real-life complex document images simply by adding or updating rules. Experimental and comparative results demonstrate the effectiveness of the proposed knowledge-based system and its advantages in extracting text-lines with a large variety of illumination levels, sizes, and font styles from various types of mixed and overlapping text/graphics complex compound document images. |
| |
Keywords: | Document image analysis; Knowledge-based systems; Text extraction; Region segmentation; Complex compound document images |
This article is indexed in ScienceDirect and other databases. |
|
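
The two stages named in the abstract (decomposition into object planes, then rule-based selection of text candidates within each plane) can be illustrated with a minimal sketch. This is not the authors' algorithm: the gray-level thresholds, the 4-connected component search, the function names `split_into_planes`, `connected_components`, and `text_candidates`, and the two size rules in `RULES` are all assumptions made for illustration. Grouping the surviving candidates into text-lines is left out.

```python
from collections import deque

def split_into_planes(image, thresholds):
    """Assign each pixel to one plane by gray level (assumed decomposition).

    Returns len(thresholds) + 1 binary masks, ordered from darkest to brightest.
    """
    planes = [[[0] * len(row) for row in image] for _ in range(len(thresholds) + 1)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            index = sum(value >= t for t in thresholds)  # number of thresholds passed
            planes[index][y][x] = 1
    return planes

def connected_components(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one region
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Hypothetical rule base: each rule is a predicate on a bounding box.
# New document types would be handled by adding or editing entries here.
RULES = [
    lambda box: box[3] - box[1] + 1 <= 3,  # character-like height (illustrative limit)
    lambda box: box[2] - box[0] + 1 <= 3,  # character-like width (illustrative limit)
]

def text_candidates(mask, rules=RULES):
    """Keep only the components that satisfy every rule."""
    return [box for box in connected_components(mask) if all(rule(box) for rule in rules)]
```

Keeping the rules as a plain list of predicates mirrors the extensibility claim in the abstract: the decomposition and component search stay fixed, and only the rule list changes.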