首页 | 本学科首页   官方微博 | 高级检索  
     


Handwritten Chinese text line segmentation by clustering with distance metric learning
Authors:Fei Yin  Cheng-Lin Liu[Author vitae]
Affiliation:National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, PR China
Abstract:Separating text lines in unconstrained handwritten documents remains a challenge because the handwritten text lines are often un-uniformly skewed and curved, and the space between lines is not obvious. In this paper, we propose a novel text line segmentation algorithm based on minimal spanning tree (MST) clustering with distance metric learning. Given a distance metric, the connected components (CCs) of document image are grouped into a tree structure, from which text lines are extracted by dynamically cutting the edges using a new hypervolume reduction criterion and a straightness measure. By learning the distance metric in supervised learning on a dataset of pairs of CCs, the proposed algorithm is made robust to handle various documents with multi-skewed and curved text lines. In experiments on a database with 803 unconstrained handwritten Chinese document images containing a total of 8,169 lines, the proposed algorithm achieved a correct rate 98.02% of line detection, and compared favorably to other competitive algorithms.
Keywords:Handwritten text line segmentation  Clustering  Minimal spanning tree (MST)  Distance metric learning  Hypervolume reduction
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号