

Table form document analysis based on the document structure grammar
Authors: Akira Amano, Naoki Asada, Masayuki Mukunoki, Masahito Aoyama
Affiliations: (1) Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, Japan; (2) Department of Information Sciences, Hiroshima City University, 3-4-1 Ozukahigashi, Asaminami, Hiroshima, Japan
Abstract: Structure analysis of table form documents is an important issue because neither printed documents nor even electronic documents provide logical structure information; they provide only geometric layout and lexical information. To handle these documents automatically, logical structure information is necessary. In this paper, we first analyze the elements of form documents from a communication point of view and extract the grammatical elements that appear in them. We then present a document structure grammar that governs the logical structure of form documents. Finally, we propose a structure analysis system for table form documents based on this grammar. Because the grammar rules are relatively simple, the grammar notation makes them easy to modify and keep consistent. Another advantage of the grammar notation is that it can be used to generate documents from their logical structure alone. In our system, documents are assumed to be composed of a set of boxes, which are classified into seven box types. The relations between each indication box and its associated entry box are then analyzed based on the semantic and geometric knowledge defined in the document structure grammar. Experimental results show that the system successfully analyzed several kinds of table forms.
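To make the idea of pairing an indication box (a label such as "Name") with its entry box (the field to be filled in) more concrete, here is a minimal sketch. It is not the authors' method: the paper classifies boxes into seven types and uses semantic and geometric knowledge from its grammar, whereas this sketch assumes boxes are already labeled with just two hypothetical kinds ("indication" and "entry") and applies only a hypothetical geometric rule: prefer an adjacent entry box to the right, otherwise one directly below. The names `Box`, `find_entry`, `associate`, and the pixel tolerance `tol` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; y grows downward, as in image coordinates."""
    name: str
    x: int
    y: int
    w: int
    h: int
    kind: str  # hypothetical simplified label: "indication" or "entry"

    @property
    def right(self) -> int:
        return self.x + self.w

    @property
    def bottom(self) -> int:
        return self.y + self.h


def overlaps(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the half-open intervals [a0, a1) and [b0, b1) overlap."""
    return a0 < b1 and b0 < a1


def find_entry(ind: Box, boxes: List[Box], tol: int = 2) -> Optional[Box]:
    """Hypothetical geometric rule: prefer an entry box touching the right edge
    of the indication box (sharing vertical extent), else one touching its
    bottom edge (sharing horizontal extent)."""
    entries = [b for b in boxes if b.kind == "entry"]
    right = [b for b in entries
             if abs(b.x - ind.right) <= tol
             and overlaps(ind.y, ind.bottom, b.y, b.bottom)]
    if right:
        return min(right, key=lambda b: abs(b.y - ind.y))
    below = [b for b in entries
             if abs(b.y - ind.bottom) <= tol
             and overlaps(ind.x, ind.right, b.x, b.right)]
    if below:
        return min(below, key=lambda b: abs(b.x - ind.x))
    return None


def associate(boxes: List[Box]) -> Dict[str, Optional[str]]:
    """Map each indication box name to its associated entry box name (or None)."""
    result: Dict[str, Optional[str]] = {}
    for b in boxes:
        if b.kind == "indication":
            entry = find_entry(b, boxes)
            result[b.name] = entry.name if entry else None
    return result


# Usage: a label with its field to the right, and a label with its field below.
form = [
    Box("name_label", 0, 0, 40, 10, "indication"),
    Box("name_field", 40, 0, 80, 10, "entry"),
    Box("addr_label", 0, 10, 120, 10, "indication"),
    Box("addr_field", 0, 20, 120, 20, "entry"),
]

if __name__ == "__main__":
    print(associate(form))  # {'name_label': 'name_field', 'addr_label': 'addr_field'}
```

A purely geometric rule like this breaks down on forms where the label sits above a field in one region and beside it in another, which is one reason the paper encodes such knowledge in a grammar rather than in fixed code.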
Keywords: Form processing; Document models; Document analysis systems
This article is indexed in SpringerLink and other databases.
