Segmentation and classification of mixed text/graphics/image documents |
| |
Authors: | Kuo-Chin Fan, Chi-Hwa Liu, Yuan-Kai Wang |
| |
Affiliation: | Institute of Computer Science and Electronic Engineering, National Central University, Chung-Li, Taiwan, ROC |
| |
Abstract: | This paper presents a feature-based document analysis system that uses domain knowledge to segment and classify mixed text/graphics/image documents. The approach first performs a run-length smearing operation, followed by a stripe merging procedure, to segment the blocks embedded in a document. Classification is then performed using domain knowledge derived from the primitives associated with each type of medium. Proper use of domain knowledge proves effective in speeding up segmentation and reducing classification error. The experimental study demonstrates the feasibility of the technique for segmenting and classifying mixed text/graphics/image documents. |
| |
Keywords: | Document segmentation; Block classification; Projection feature; Connectivity histogram |
This article is indexed in ScienceDirect and other databases. |
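
The abstract names run-length smearing as the first segmentation step but gives no details. As a rough illustration only, the sketch below shows the generic run-length smearing idea on a binary image (1 = ink, 0 = background): short background runs between ink pixels are filled so that nearby characters merge into blocks. The function names, the thresholds, the choice to fill only gaps bounded by ink on both sides, and the AND combination of the horizontal and vertical passes are all assumptions made for this example; the paper's own procedure, including its stripe merging step, is not reproduced here.

```python
def smear_row(row, threshold):
    """Fill background runs (0s) of length <= threshold that lie
    between two foreground pixels (1s). Hypothetical helper."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def smear_image(image, h_threshold, v_threshold):
    """Smear rows and columns separately, then AND the two results.
    The thresholds are illustrative, not values from the paper."""
    horiz = [smear_row(r, h_threshold) for r in image]
    # Transpose, smear each column as if it were a row, transpose back.
    cols = [smear_row(c, v_threshold) for c in zip(*image)]
    vert = [list(r) for r in zip(*cols)]
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
```

For instance, `smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2)` fills the two-pixel gap but leaves the four-pixel gap open, giving `[1, 1, 1, 1, 0, 0, 0, 0, 1]`. Connected regions of the smeared image would then serve as candidate blocks for classification.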