Character pattern extraction from documents with complex backgrounds
Authors: Hideaki Goto, Hirotomo Aso
Affiliation: (1) Information Synergy Center, Tohoku University, Aoba, Aramaki, Aoba-ku, Sendai-shi, 980–8578 Japan; e-mail: hgot@isc.tohoku.ac.jp; (2) Graduate School of Engineering, Tohoku University, Aoba 05, Aramaki, Aoba-ku, Sendai-shi, Japan
Abstract: Recent progress in computer systems and printing devices has made it easier to produce printed documents with a wide variety of designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs or computer graphics. Several methods have been developed for extracting character patterns from document images and scene images with complex backgrounds. However, these earlier methods are suitable only for rather large characters and often fail to extract small characters with thin strokes. This paper proposes a new method for extracting character patterns from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document images. Performance on small character patterns is improved by suppressing the influence of mixed-color pixels around character edges. Experimental results show that the method can extract very small character patterns from main text blocks in various documents, separating characters from complex backgrounds, as long as the character strokes are more than about 1.5 pixels thick.
Received: July 23, 2001 / Accepted: November 5, 2001
Keywords: Character pattern extraction – Multilevel thresholding – Region growing – Complex background – Document image analysis
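
The abstract names three building blocks: local multilevel thresholding, pixel labeling, and region growing. The sketch below is a minimal illustration of how such a pipeline could fit together, not the authors' algorithm. Several choices are assumptions made only for illustration: the function names `local_multilevel_labels` and `grow_regions`, the fixed square tiles, the equal-width bins between each tile's minimum and maximum gray value (the paper's actual threshold selection is not described here), and the rule that regions never cross tile boundaries. The suppression of mixed-color edge pixels mentioned in the abstract is not modeled.

```python
from collections import deque


def local_multilevel_labels(img, tile=4, levels=3):
    """Label every pixel with (tile_y, tile_x, level).

    Assumption: each tile is quantized into `levels` classes using
    equal-width bins between that tile's own min and max gray value.
    """
    h, w = len(img), len(img[0])
    labels = [[None] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            ys = range(ty, min(ty + tile, h))
            xs = range(tx, min(tx + tile, w))
            vals = [img[y][x] for y in ys for x in xs]
            lo, hi = min(vals), max(vals)
            span = (hi - lo) or 1  # flat tile: avoid division by zero
            for y in ys:
                for x in xs:
                    level = min((img[y][x] - lo) * levels // span, levels - 1)
                    labels[y][x] = (ty, tx, level)
    return labels


def grow_regions(labels):
    """Grow regions by flood fill over 4-connected pixels with equal labels.

    Because the label includes the tile origin, regions stay inside
    one tile (a simplification of this sketch).
    """
    h, w = len(labels), len(labels[0])
    region = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if region[sy][sx] != -1:
                continue  # already assigned to a region
            region[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and region[ny][nx] == -1
                            and labels[ny][nx] == labels[y][x]):
                        region[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return region, count
```

Calling `grow_regions(local_multilevel_labels(img))` on a grayscale image stored as a list of rows returns a region map and the number of regions. A real extractor would still have to decide which regions are character strokes, a step this sketch leaves out.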
This article is indexed in SpringerLink and other databases.