Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images
Authors:Xiabi Liu  Hui Fu  Yunde Jia
Affiliation:1. Laboratoire de Spectrochimie Infrarouge et Raman, Bâtiment C5, UMR 8516 CNRS-Université de Lille1, Sciences et Technologies, 59655 Villeneuve d'Ascq Cedex, France; 2. Institut des Sciences Moléculaires, UMR n° 5255 CNRS-Université Bordeaux, 351, Cours de la Libération, 33405 Talence Cedex, France; 3. Institute of Solution Chemistry RAS, Ivanovo, Russia
Abstract:This paper proposes an approach to extracting multilingual text from images, based on the statistical modeling and learning of neighboring characters. The case of three neighboring characters is modeled by a Gaussian mixture model and discriminated from other cases by a corresponding 'pseudo-probability' defined under the Bayesian framework. Based on this modeling, text extraction is performed by labeling each connected component in the binary image as character or non-character according to its neighbors. A method based on mathematical morphology is introduced to detect and connect the separated parts of each character, and a method based on Voronoi partition is used to establish the neighborhoods of connected components. We further present a discriminative training algorithm based on the maximum–minimum similarity (MMS) criterion to estimate the parameters of the proposed text extraction approach. Experimental results on Chinese and English text extraction demonstrate the effectiveness of our approach trained with the MMS algorithm, which achieved a precision rate of 93.56% and a recall rate of 98.55% on the test data set. The experiments also show that MMS provides a significant improvement in overall performance compared with two influential training criteria, maximum likelihood (ML) and minimum classification error (MCE).
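The modeling step described in the abstract can be illustrated with a minimal sketch: evaluate a Gaussian mixture density on a feature vector describing three neighboring components, map it to a bounded score, and threshold that score. Everything specific here is an assumption made for illustration: the diagonal covariances, the mapping `1 - exp(-scale * density)`, the threshold, and all function and parameter names. The abstract does not give the paper's actual pseudo-probability definition or feature set, so this should not be read as the authors' formulation.

```python
import math


def gmm_density(x, weights, means, variances):
    """Density of a diagonal-covariance Gaussian mixture at feature vector x.

    weights: mixture weights summing to 1
    means, variances: one list per component, same length as x
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # Log-density of one diagonal Gaussian component, summed over dimensions.
        log_p = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += w * math.exp(log_p)
    return total


def bounded_score(density, scale=1.0):
    """Map a non-negative density into [0, 1).

    Hypothetical stand-in for a pseudo-probability; the mapping and the
    'scale' parameter are assumptions, not taken from the paper.
    """
    return 1.0 - math.exp(-scale * density)


def is_character_triple(x, weights, means, variances, threshold=0.5, scale=1.0):
    """Label a triple of neighboring components as text if its score passes the threshold."""
    score = bounded_score(gmm_density(x, weights, means, variances), scale)
    return score >= threshold
```

In a pipeline of the kind the abstract outlines, such a decision function would be applied to each connected component together with its neighbors, and parameters such as the mixture weights, means, variances, and `scale` would be the quantities estimated during training.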
This article is indexed in ScienceDirect and other databases.