Text region extraction in a document image based on the Delaunay tessellation |
Authors: | Yi Xiao, Hong Yan |
Affiliation: | (a) School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia; (b) Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong |
Abstract: | In this paper, Delaunay triangulation is applied to extract text areas from a document image. Each connected component in the image is represented by its centroid, so the page structure is described as a set of points in two-dimensional space. When these points are Delaunay-triangulated, the triangles in text regions have features that distinguish them from the triangles in image and drawing regions. For the analysis, the Delaunay triangles are divided into four classes. The study shows that certain triangles in text areas can be clustered together and identified as text body. With this method, text regions can also be recognized accurately in document images that contain fragments. Experiments show that the method is also very efficient. |
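The abstract does not specify the paper's four triangle classes or its clustering rule, so the following is only a hypothetical sketch of the pipeline it outlines. It computes the centroids of 8-connected components and builds a Delaunay triangulation with the incremental Bowyer-Watson algorithm. As an assumed stand-in for the four-class scheme, it keeps the triangles whose longest edge is at most a chosen `max_edge` and groups those that share an edge. All function names and the threshold are illustrative assumptions. The super-triangle approach used here can occasionally miss thin triangles along the convex hull, so a practical implementation would more likely use a robust library such as `scipy.spatial.Delaunay`.

```python
from collections import deque
from math import dist


def component_centroids(img):
    """Centroids (x, y) of the 8-connected foreground components of a 0/1 image."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, sx, sy, n = deque([(r, c)]), 0, 0, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                sx, sy, n = sx + x, sy + y, n + 1
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            centroids.append((sx / n, sy / n))
    return centroids


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc (any winding)."""
    ax, ay, bx, by = a[0] - p[0], a[1] - p[1], b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0


def edges(t):
    """The three undirected edges of a triangle, as sorted index pairs."""
    return [tuple(sorted(e)) for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0]))]


def delaunay(points):
    """Bowyer-Watson triangulation; returns triangles as index triples into points."""
    n = len(points)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    mx, my = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    # A large "super-triangle" around all points; its vertices are n, n+1, n+2.
    pts = list(points) + [(mx - 20 * span, my - span),
                          (mx + 20 * span, my - span), (mx, my + 20 * span)]
    tris = [(n, n + 1, n + 2)]
    for i in range(n):
        bad = [t for t in tris if in_circumcircle(*(pts[k] for k in t), pts[i])]
        count = {}
        for t in bad:
            for e in edges(t):
                count[e] = count.get(e, 0) + 1
        # Re-triangulate the cavity: join point i to each edge used by one bad triangle.
        tris = [t for t in tris if t not in bad]
        tris += [(a, b, i) for (a, b), k in count.items() if k == 1]
    return [t for t in tris if max(t) < n]  # drop triangles touching the super-triangle


def cluster_triangles(tris, points, max_edge):
    """Group triangles whose longest edge is <= max_edge into edge-connected clusters.

    Returns one set of point indices per cluster. The single length threshold is a
    hypothetical stand-in for the paper's four-class triangle scheme.
    """
    def longest(t):
        return max(dist(points[a], points[b]) for a, b in edges(t))
    keep = [t for t in tris if longest(t) <= max_edge]
    by_edge = {}
    for j, t in enumerate(keep):
        for e in edges(t):
            by_edge.setdefault(e, []).append(j)
    seen, clusters = set(), []
    for start in range(len(keep)):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # depth-first walk over triangles that share an edge
            j = stack.pop()
            members.update(keep[j])
            for e in edges(keep[j]):
                for k in by_edge[e]:
                    if k not in seen:
                        seen.add(k)
                        stack.append(k)
        clusters.append(members)
    return clusters
```

As a usage example, consider centroids laid out as a tight 4 by 3 grid (spacing 10 by 12) plus two distant isolated points. Calling `cluster_triangles(delaunay(pts), pts, 20.0)` then groups the twelve grid points into one cluster and leaves the isolated points out. This mirrors, in a much simplified form, the abstract's idea that densely packed text components produce small, mutually adjacent triangles.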
Keywords: | Delaunay triangulation; Page segmentation; Document image analysis |
This article is indexed in ScienceDirect and other databases.