Modeling broken characters recognition as a set-partitioning problem |
| |
Authors: | Chaivatna Sumetphong, Supachai Tangwongsan |
| |
Affiliation: | 1. Department of Geography and Geology, Geology Section, University of Turku, FI-20014 Turku, Finland;2. Department of Biology, University of Turku, FI-20014 Turku, Finland;1. Theoretical Separation Science Laboratory, Kroungold Analytical Inc., 1299 Butler Pike, Blue Bell, PA 19422, USA;2. Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455-0132, USA;3. Advanced Materials Technology, Inc., Suite 1-K, Quillen Building, 3521 Silverside Rd, Wilmington, DE 19810, USA;4. Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE 19716, USA;1. School of Economics and Management, Harbin Institute of Technology, Shenzhen 518055, China;2. School of Business and Management, Hong Kong University of Science and Technology, Kowloon, Hong Kong |
| |
Abstract: | This paper presents a novel technique for recognizing broken characters in degraded text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set partition of the connected components, in which each subset yields one reconstructed character. Because the objective function needed for optimal set partitioning is non-linear, we design an algorithm that we call Heuristic Incremental Integer Programming (HIIP). The algorithm applies integer programming (IP) incrementally and uses heuristics to speed up convergence. The objective function is formulated from probability functions that reflect common OCR measurements: pattern resemblance, sizing conformity, and distance between connected components. We applied HIIP to degraded Thai and English text documents and achieved accuracy rates over 90%. We also compared HIIP against three competing algorithms, and HIIP achieved higher accuracy in each case. |
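The set-partitioning view described in the abstract can be illustrated with a small sketch. This is not the HIIP algorithm: it enumerates every partition exhaustively, which is only feasible for a handful of components, and it does not use integer programming. The representation of a connected component as a horizontal interval `(start, end)`, the scoring function `block_score`, and the constant `EXPECTED_WIDTH` are all assumptions made for illustration. The pattern-resemblance term, which would require a trained classifier, is omitted.

```python
from math import exp, prod

# Hypothetical expected character width, in pixels (an assumption).
EXPECTED_WIDTH = 10.0


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: `first` forms a block of its own.
        yield [[first]] + partition
        # Option 2: `first` joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def block_score(block):
    """Toy probability-like score for one candidate character.

    Combines a sizing-conformity term and a distance term. Both formulas
    are illustrative assumptions, not the paper's objective function.
    """
    pieces = sorted(block)
    width = max(end for _, end in pieces) - min(start for start, _ in pieces)
    # Sizing conformity: 1.0 when the block matches the expected width.
    conformity = exp(-abs(width - EXPECTED_WIDTH) / EXPECTED_WIDTH)
    # Distance: penalize horizontal gaps between consecutive pieces.
    gaps = sum(max(0, b[0] - a[1]) for a, b in zip(pieces, pieces[1:]))
    closeness = exp(-gaps / EXPECTED_WIDTH)
    return conformity * closeness


def best_partition(components):
    """Return (partition, score) maximizing the product of block scores."""
    best, best_score = None, -1.0
    for partition in set_partitions(list(components)):
        score = prod(block_score(block) for block in partition)
        if score > best_score:
            best, best_score = partition, score
    return best, best_score


if __name__ == "__main__":
    # Two fragments of one broken character, followed by an intact one.
    components = [(0, 4), (5, 10), (14, 24)]
    partition, score = best_partition(components)
    print(partition, round(score, 3))
```

In this toy input the first two intervals are grouped into one character and the third stands alone. The number of partitions grows as the Bell numbers, so exhaustive search stops being practical beyond a few components, which is presumably why an incremental, heuristic-guided search is needed.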
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
|