Intelligent forms processing system

Authors: Richard Casey, David Ferguson, K. Mohiuddin, Eugene Walach
Affiliation: (1) IBM Professional Services, 520 Capitol Mall, Sacramento, CA, USA; (2) IBM Israel Scientific Center, Haifa, Israel; (3) Integrated Data Management, K52/803, Almaden Research Center, 650 Harry Road, 95120-6099 San Jose, CA, USA
Abstract: This paper describes an intelligent forms processing system (IFPS) which provides capabilities for automatically indexing
form documents for storage in and retrieval from a document library, and for capturing information from scanned form images using
intelligent character recognition (ICR). The system also provides capabilities for storing form images efficiently. IFPS consists
of five major processing components: (1) An interactive document analysis stage that analyzes a blank form in order to define
a model of each type of form to be accepted by the system; the parameters of each model are stored in a form library. (2)
A form recognition module that extracts features from an input form in order to match it against a model in the form
library; the primary feature used in this step is the pattern of lines defining data areas on the form. (3) A data extraction
component that registers the selected model to the input form, locates data added to the form in fields of interest, and moves
the data image to a separate image area. A simple mask defining the center of the data region suffices to initiate the extraction
process; search routines are then invoked to track data that extends beyond the masks. Additional processing detects lines that
intersect the data image and deletes them with minimum distortion to the rest of the image. (4) An ICR unit that converts the
extracted image data to symbol codes for input to a database or other conventional processing systems.
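The extraction step in component (3) can be illustrated with a minimal sketch. It assumes a binary image held as a set of black-pixel coordinates, already registered to the form model, with form lines one pixel thick; the functions `remove_horizontal_lines` and `extract_field`, the `min_run` threshold, and the rectangular mask format are hypothetical and are not taken from IFPS. Long horizontal runs are treated as form lines and deleted, except where a stroke continues directly above or below them; every connected component that touches the mask is then grown outward, so data extending beyond the mask is still captured.

```python
# Illustrative sketch only; not the IFPS implementation. All names and
# parameters here are hypothetical.
from collections import deque

Pixel = tuple[int, int]  # (row, col) of a black pixel


def remove_horizontal_lines(black: set[Pixel], min_run: int) -> set[Pixel]:
    """Delete horizontal runs of at least `min_run` black pixels, treating them
    as form lines, but keep any run pixel where a stroke continues directly
    above or below it. Assumes form lines are one pixel thick."""
    kept = set(black)
    rows: dict[int, list[int]] = {}
    for r, c in black:
        rows.setdefault(r, []).append(c)
    for r, cols in rows.items():
        cols.sort()
        start = 0
        for i in range(1, len(cols) + 1):
            # A run ends at the last pixel of the row or at a gap in the columns.
            if i == len(cols) or cols[i] != cols[i - 1] + 1:
                run = cols[start:i]
                if len(run) >= min_run:
                    for c in run:
                        crossing = (r - 1, c) in black or (r + 1, c) in black
                        if not crossing:
                            kept.discard((r, c))
                start = i
    return kept


def extract_field(black: set[Pixel], mask: tuple[int, int, int, int]) -> set[Pixel]:
    """Return every 8-connected component with at least one pixel inside the
    inclusive rectangle mask = (top, left, bottom, right). Growing components
    outward from the mask tracks data that extends beyond it."""
    top, left, bottom, right = mask
    seeds = [(r, c) for r, c in black if top <= r <= bottom and left <= c <= right]
    field = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q in black and q not in field:
                    field.add(q)
                    queue.append(q)
    return field


if __name__ == "__main__":
    line = {(5, c) for c in range(20)}       # a one-pixel form line
    stroke = {(r, 10) for r in range(2, 9)}  # handwriting crossing the line
    image = line | stroke | {(0, 18)}        # plus an unrelated speck
    cleaned = remove_horizontal_lines(image, min_run=10)
    print(sorted(extract_field(cleaned, (4, 9, 6, 11))))
    # [(2, 10), (3, 10), (4, 10), (5, 10), (6, 10), (7, 10), (8, 10)]
```

The order of the two steps matters in this sketch: if the lines are not removed first, the component search follows the form line and pulls the entire line into the extracted field.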
Three types of ICR logic have been implemented to accommodate monospaced typing, proportionally spaced machine-printed text,
and handprinted alphanumerics. (5) A forms dropout module that removes the fixed part of a form and retains only the filled-in
data for storage. The stored data can later be combined with the fixed form to reconstruct the original form. This makes the
storage of form images extremely efficient, so that a very large number of forms can be stored in the system.
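The dropout and reconstruction described in component (5) reduce to set operations once the input form has been registered exactly to its blank model. The sketch below assumes that exact registration and represents each image as a set of black-pixel coordinates; `drop_out`, `reconstruct`, and `storage_fraction` are hypothetical names, not the IFPS interface. Under this assumption reconstruction is lossless whenever every pixel of the blank form is also black in the scanned form: data strokes that overlap the printed form are dropped, but they return on reconstruction because those pixels are black in the blank form anyway.

```python
# Illustrative sketch only; assumes perfect registration between the scanned
# form and its blank model. Function names are hypothetical.
Pixel = tuple[int, int]  # (row, col) of a black pixel


def drop_out(filled: set[Pixel], blank: set[Pixel]) -> set[Pixel]:
    """Keep only the filled-in data: black pixels of the registered input
    form that are not part of the blank form model."""
    return filled - blank


def reconstruct(data: set[Pixel], blank: set[Pixel]) -> set[Pixel]:
    """Overlay the stored data on the fixed form to rebuild the page image."""
    return data | blank


def storage_fraction(filled: set[Pixel], blank: set[Pixel]) -> float:
    """Fraction of the input's black pixels that must actually be stored."""
    return len(drop_out(filled, blank)) / len(filled) if filled else 0.0


if __name__ == "__main__":
    # Hypothetical blank form: the border of a 10 x 40 box (96 pixels)
    blank = ({(r, c) for r in (0, 9) for c in range(40)}
             | {(r, c) for r in range(10) for c in (0, 39)})
    # Filled form: the box plus a short stroke that touches the top border
    filled = blank | {(r, 5) for r in range(6)}
    data = drop_out(filled, blank)
    print(len(data), len(filled))  # prints: 5 101
```

In this toy case only 5 of 101 black pixels need to be stored; the blank form itself is kept once in the form library and shared by every filled instance.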
IFPS is implemented as part of a larger image management system called the Image and Records Management (IRM) system. It is
being applied to forms data management in several state government applications.
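Component (2) identifies the form type from its pattern of lines. One simple way to picture this, offered only as an illustration and not as the IFPS matching procedure, is to describe each model in the form library as a list of horizontal and vertical line segments and to score an input form by how many of its detected segments pair up with a model's segments within a pixel tolerance. The `Line` record, the greedy pairing, the score normalization, the form names, and the default `tol` and `threshold` values are all hypothetical; the sketch also assumes that line detection has already been done and that the scan has been corrected for skew and offset.

```python
# Illustrative sketch only; not the IFPS recognizer. The representation,
# scoring rule, and defaults are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Line:
    """A detected or modeled line segment."""
    orientation: str  # "h" for horizontal, "v" for vertical
    position: int     # y of a horizontal line, x of a vertical line
    start: int        # first coordinate along the line
    end: int          # last coordinate along the line


def lines_match(a: Line, b: Line, tol: int) -> bool:
    """Segments match if they share orientation and agree within tol pixels."""
    return (a.orientation == b.orientation
            and abs(a.position - b.position) <= tol
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


def match_score(model: list[Line], found: list[Line], tol: int) -> float:
    """Greedily pair each model line with an unused detected line; normalize by
    the larger list so that both missing and extra lines lower the score."""
    unused = list(found)
    matched = 0
    for m in model:
        for i, f in enumerate(unused):
            if lines_match(m, f, tol):
                matched += 1
                del unused[i]  # safe: iteration stops immediately
                break
    denom = max(len(model), len(found))
    return matched / denom if denom else 0.0


def recognize(library: dict[str, list[Line]], found: list[Line],
              tol: int = 5, threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring form type, or None if no model is close enough."""
    best_name, best_score = None, 0.0
    for name, model in library.items():
        score = match_score(model, found, tol)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    tax = [Line("h", 100, 50, 550), Line("h", 200, 50, 550),
           Line("v", 50, 100, 200), Line("v", 550, 100, 200)]
    claim = [Line("h", 80, 40, 300), Line("h", 400, 40, 600), Line("v", 300, 80, 400)]
    # Simulated detection: the tax form's lines, displaced by a pixel or two
    found = [Line(ln.orientation, ln.position + 2, ln.start + 1, ln.end - 1) for ln in tax]
    print(recognize({"tax_form": tax, "claim_form": claim}, found))  # tax_form
```

Greedy pairing keeps the sketch short; an optimal one-to-one assignment would avoid the case where an early model line claims a detected line that a later model line needed.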
Keywords: Document analysis; optical character recognition; image management systems; image compression; automatic forms processing
This article is indexed in SpringerLink and other databases.