Handwritten Chinese text line segmentation by clustering with distance metric learning |
| |
Authors: | Fei Yin, Cheng-Lin Liu |
| |
Affiliation: | National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, PR China |
| |
Abstract: | Separating text lines in unconstrained handwritten documents remains a challenge because handwritten text lines are often non-uniformly skewed and curved, and the space between lines is not obvious. In this paper, we propose a novel text line segmentation algorithm based on minimal spanning tree (MST) clustering with distance metric learning. Given a distance metric, the connected components (CCs) of a document image are grouped into a tree structure, from which text lines are extracted by dynamically cutting edges using a new hypervolume reduction criterion and a straightness measure. Because the distance metric is learned in a supervised manner from a dataset of CC pairs, the proposed algorithm is robust to a variety of documents with multi-skewed and curved text lines. In experiments on a database of 803 unconstrained handwritten Chinese document images containing a total of 8,169 lines, the proposed algorithm achieved a line detection correct rate of 98.02% and compared favorably with other competitive algorithms. |
| |
Keywords: | Handwritten text line segmentation; Clustering; Minimal spanning tree (MST); Distance metric learning; Hypervolume reduction |
This article is indexed in ScienceDirect and other databases. |
|
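The clustering idea in the abstract (build an MST over connected components, then cut edges to obtain text lines) can be illustrated with a minimal sketch. This is not the authors' implementation: the axis-weighted distance with hypothetical weights `wx` and `wy` stands in for the learned metric, and a fixed threshold `cut` stands in for the hypervolume reduction criterion and straightness measure, neither of which is specified in this record. The function names `mst_edges` and `segment_lines` are likewise illustrative, and each CC is assumed to be represented by its centroid.

```python
import math


def mst_edges(points, wx, wy):
    """Prim's algorithm on the complete graph over CC centroids.

    Distance is an axis-weighted Euclidean distance; wx and wy are
    hypothetical hand-set weights, NOT a learned metric.
    Returns a list of (weight, i, j) tree edges.
    """
    n = len(points)
    if n == 0:
        return []

    def dist(a, b):
        dx, dy = a[0] - b[0], a[1] - b[1]
        return math.sqrt(wx * dx * dx + wy * dy * dy)

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def segment_lines(centroids, wx=1.0, wy=4.0, cut=30.0):
    """Group CC centroids into lines by cutting MST edges longer than `cut`.

    The fixed threshold is a simplification assumed for this sketch; the
    paper cuts edges dynamically with its own criteria.
    Returns a sorted list of index lists, one per detected line.
    """
    n = len(centroids)
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    # Keep only short edges; each remaining subtree is one line.
    for w, i, j in mst_edges(centroids, wx, wy):
        if w <= cut:
            root[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Weighting vertical offsets more heavily than horizontal ones (here `wy > wx`) makes components on the same line appear closer than components on adjacent lines, which is the role the learned metric plays in the abstract's description. Calling `segment_lines` on a list of `(x, y)` centroids returns one list of component indices per detected line.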