Attributed point matching for automatic groundtruth generation |
| |
Authors: | Doe-Wan Kim, Tapas Kanungo |
| |
Affiliation: | (1) Information Sciences Institute/USC, 3811 North Fairfax Dr., Suite 200, Arlington, VA 22030, USA; e-mail: dwkim@isi.edu; (2) IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA; e-mail: kanungo@us.ibm.com |
| |
Abstract: | Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition
(OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned
document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically
registered the original image to the rescanned one using four corner points and then transformed the original groundtruth
using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing
the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance
between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout
complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using
the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth
for microfilmed and FAXed versions of the University of Washington dataset documents.
Received: July 24, 2001 / Accepted: May 20, 2002 |
| |
Keywords: | Image registration – Attributed point matching – Branch-and-bound – Automatic groundtruth generation – Microfilm – FAX |
This article is indexed in SpringerLink and other databases. |
|
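The registration step summarized in the abstract (estimate a transformation from matched points, map the original groundtruth through it, then score the result by the Euclidean distance between character centroids) can be illustrated with a minimal sketch. The abstract does not state which transformation model is used, so the affine model below is an assumption, and the names `fit_affine`, `apply_affine`, and `mean_centroid_error` are hypothetical helpers introduced only for illustration. The sketch assumes the point correspondences are already known; it does not implement the branch-and-bound matching itself.

```python
from math import hypot


def solve3(m, v):
    """Solve the 3x3 system m w = v by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        # Absolute threshold; adequate for pixel-scale coordinates (an assumption).
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map src -> dst from matched pairs (>= 3, non-collinear).

    Assumed model (not taken from the paper): u = a*x + b*y + c, v = d*x + e*y + f.
    """
    # Normal equations (A^T A) w = A^T t, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    return solve3(ata, atu), solve3(ata, atv)


def apply_affine(params, pt):
    """Map one (x, y) point through the affine parameters."""
    (a, b, c), (d, e, f) = params
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


def mean_centroid_error(predicted, reference):
    """Mean Euclidean distance between corresponding centroids."""
    dists = [hypot(px - rx, py - ry)
             for (px, py), (rx, ry) in zip(predicted, reference)]
    return sum(dists) / len(dists)


# Illustrative data only: made-up centroids and a made-up distortion.
original = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 2.0)]
distortion = ((1.01, 0.02, 3.0), (-0.02, 0.99, -2.0))
rescanned = [apply_affine(distortion, p) for p in original]

estimated = fit_affine(original, rescanned)
transferred = [apply_affine(estimated, p) for p in original]
error = mean_centroid_error(transferred, rescanned)
```

Using every matched point in a least-squares fit, rather than only four corner points, averages out localization noise in individual points, which is the motivation the abstract gives for matching all data points.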