     


A word extraction algorithm for machine-printed documents using a 3D neighborhood graph model
Authors: Hwan-Chul Park, Se-Young Ok, Young-Jung Yu, Hwan-Gue Cho
Affiliations: (1) R&D Center, PAXVR, Seocho Jeil B/D, 1624-2, Seocho-Dong, Seocho-Ku, Seoul 137-878, Korea; (2) LG Innotek, Yongin-shi, Kyunggi-do, Korea; (3) Graphics Application Lab., Department of Computer Science, Pusan National University, Kum-Jung-Ku, Pusan 609-735, Korea
Abstract: Automatic character recognition and image understanding of paper documents are among the main objectives of the computer vision field. A basic step toward both is to isolate characters and then group the isolated characters into words. In this paper, we propose a new method for extracting characters from mixed text/graphics machine-printed documents and an algorithm for grouping the isolated characters into words. For character extraction, we exploit several features of characters (size, elongation, and density) and propose a characteristic value for classification based on the run-length frequency of each image component. Previous work on word grouping has largely been concerned with words placed on horizontal or vertical lines; our word grouping algorithm can also group words on inclined lines, intersecting lines, and even curved lines. To do this, we introduce the 3D neighborhood graph model, which is useful and efficient for both character classification and word grouping. In this model, each connected component of a text image segment is mapped into 3D space according to the area of its bounding box and its position in the document. We conducted tests with more than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental results show that more than 95% of words are successfully extracted from general documents, including very complicated oriental documents.
Received August 3, 2001 / Accepted August 8, 2001
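The abstract does not spell out how the 3D neighborhood graph is built or thresholded, so the following Python sketch only illustrates the general idea under our own assumptions; it is not the authors' algorithm. Each connected component is given as a bounding box and mapped to a 3D point: the box centre supplies (x, y), and the square root of the box area (so that all three coordinates share pixel units) supplies z. Two components are linked when their 3D distance is at most `k` times their mean character size, and the connected parts of the resulting graph are returned as word groups. The names `Box` and `group_words`, the parameters `k` and `z_scale`, the square-root scaling, and the linking rule are all hypothetical. The character/graphics classification step (size, elongation, density, run-length frequency) is not modeled here.

```python
from dataclasses import dataclass
from math import dist, sqrt


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component, in pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def size(self) -> float:
        # Hypothetical "character size": square root of the box area.
        return sqrt((self.x1 - self.x0) * (self.y1 - self.y0))

    def point3d(self, z_scale: float = 1.0) -> tuple[float, float, float]:
        # Hypothetical mapping into 3D: box centre for (x, y) and the
        # scaled size (derived from the box area) as the third axis.
        cx = (self.x0 + self.x1) / 2
        cy = (self.y0 + self.y1) / 2
        return (cx, cy, z_scale * self.size())


def group_words(boxes: list[Box], k: float = 1.5,
                z_scale: float = 1.0) -> list[list[int]]:
    """Link components whose 3D points lie within k * (mean size of the
    pair) and return each connected part of that graph as a word group
    (lists of box indices, sorted)."""
    pts = [b.point3d(z_scale) for b in boxes]
    sizes = [b.size() for b in boxes]
    parent = list(range(len(boxes)))  # union-find forest over components

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force O(n^2) neighbor test; the linking rule is direction-free,
    # so inclined or curved text lines are treated the same as horizontal ones.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            threshold = k * (sizes[i] + sizes[j]) / 2
            if dist(pts[i], pts[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    word1 = [Box(x, 0, x + 10, 10) for x in (0, 12, 24)]       # horizontal
    word2 = [Box(x, 0, x + 10, 10) for x in (60, 72)]          # same line, after a gap
    diag = [Box(d, 100 + d, d + 10, 110 + d) for d in (0, 10, 20)]  # inclined
    print(group_words(word1 + word2 + diag))  # [[0, 1, 2], [3, 4], [5, 6, 7]]
```

In the usage example, the inclined group is recovered by the same distance rule as the horizontal ones, which mirrors the orientation independence the abstract claims; the paper's actual neighborhood criteria and its use of the area axis may well differ from this sketch.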
Keywords: Document analysis, Text extraction, 3D neighborhood graph, Word grouping
This article is indexed in SpringerLink and other databases.