A knowledge-based system for extracting text-lines from mixed and overlapping text/graphics compound document images
Authors: Yen-Lin Chen, Zeng-Wei Hong, Cheng-Hung Chuang
Affiliation: a Department of Computer Science and Information Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-hsiao E. Rd., Taipei 10608, Taiwan
b Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Rd., Wufeng, Taichung 41354, Taiwan
Abstract: This paper presents a new knowledge-based system for extracting and identifying text-lines from various real-life mixed text/graphics compound document images. The proposed system first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability, since it can cope with various types of real-life complex document images merely by adding or updating rules. Experimental and comparative results demonstrate the effectiveness of the proposed knowledge-based system and its advantages in extracting text-lines with a large variety of illumination levels, sizes, and font styles from various types of mixed and overlapping text/graphics complex compound document images.
Keywords: Document image analysis; Knowledge-based systems; Text extraction; Region segmentation; Complex compound document images
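
The two-stage idea in the abstract (decompose into object planes, then apply rules to each plane) can be illustrated with a small sketch. This is not the authors' method: the abstract gives neither the decomposition algorithm nor the rule base, so the sketch substitutes fixed intensity banding for the decomposition and two hypothetical geometric rules (`plausible_height`, `plausible_width`) for the knowledge base. All function names, thresholds, and the demo image are assumptions made for illustration.

```python
from collections import deque

def decompose_into_planes(image, num_planes=3, max_value=255):
    """Split a grayscale image (rows of ints) into binary planes, one per
    intensity band. Assumption: fixed banding stands in for the paper's
    decomposition, which the abstract does not describe."""
    band = (max_value + 1) / num_planes
    planes = [[[False] * len(row) for row in image] for _ in range(num_planes)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            k = min(int(value / band), num_planes - 1)
            planes[k][y][x] = True
    return planes

def connected_components(plane):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected True regions."""
    h = len(plane)
    w = len(plane[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not plane[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and plane[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Hypothetical rules: each takes a bounding box and returns True if the
# component could be a character. Extending the system means editing RULES.
def plausible_height(box, min_h=2, max_h=6):
    return min_h <= box[3] - box[1] + 1 <= max_h

def plausible_width(box, max_w=6):
    return box[2] - box[0] + 1 <= max_w

RULES = [plausible_height, plausible_width]

def group_into_lines(boxes):
    """Group boxes whose vertical extents overlap into left-to-right lines."""
    lines = []  # each entry: [top, bottom, member_boxes]
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for line in lines:
            if box[1] <= line[1] and box[3] >= line[0]:
                line[0] = min(line[0], box[1])
                line[1] = max(line[1], box[3])
                line[2].append(box)
                break
        else:
            lines.append([box[1], box[3], [box]])
    return [sorted(members) for _, _, members in lines]

def extract_text_lines(image, rules=RULES):
    """Return, for each plane, the text-lines whose components pass all rules."""
    result = []
    for plane in decompose_into_planes(image):
        candidates = [b for b in connected_components(plane)
                      if all(rule(b) for rule in rules)]
        result.append(group_into_lines(candidates))
    return result

def make_demo_image():
    """White page, a mid-gray graphic, and dark 2x3 glyphs; one glyph
    overlaps the graphic."""
    img = [[255] * 20 for _ in range(12)]
    for y in range(2, 11):
        for x in range(12, 20):
            img[y][x] = 128
    for top, lefts in ((1, (1, 4, 7)), (6, (1, 4, 14))):
        for left in lefts:
            for y in range(top, top + 3):
                for x in range(left, left + 2):
                    img[y][x] = 0
    return img
```

Calling `extract_text_lines(make_demo_image())` returns one list of text-lines per plane. Because the dark glyph that sits on top of the gray graphic lands in a different plane from the graphic, it is grouped with the other glyphs on its row rather than being absorbed into the graphic's component, which is the kind of separation the abstract attributes to plane decomposition.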
This article is indexed in ScienceDirect and other databases.