Sequential alignment attention model for scene text recognition
Affiliation: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; 2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Abstract: Scene text recognition is an active research topic in computer vision because of its wide range of applications. State-of-the-art solutions usually rely on an attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. In real-world scenarios, however, feature areas and text labels are often severely misaligned. To address this problem, this paper proposes a sequential alignment attention model that strengthens the alignment between input images and output character sequences. In this model, an attention gated recurrent unit (AGRU) is first devised to distinguish text regions from background regions and to extract localized features that focus on sequential text regions. In addition, a CTC-guided decoding strategy is integrated into the popular attention-based decoder, which both speeds up training convergence and improves the recognition of well-aligned sequences. Extensive experiments on various benchmarks, including the IIIT5k, SVT, and ICDAR datasets, show that the proposed method substantially outperforms state-of-the-art methods.
Keywords: Scene text recognition; Attention-gated recurrent unit; Attention mechanism; Connectionist temporal classification
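
The abstract does not specify how the CTC guidance is combined with the attention decoder, so the following is only a minimal sketch of standard best-path CTC decoding, shown to illustrate why CTC enforces a left-to-right alignment between image frames and characters. The function name `ctc_greedy_decode`, the blank index `0`, and the toy alphabet are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class independently for every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    chars: List[str] = []
    previous = blank
    for index in best_path:
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen for the previous frame.
        if index != blank and index != previous:
            chars.append(alphabet[index - 1])  # class 1 maps to alphabet[0]
        previous = index
    return "".join(chars)


# Toy example: 7 frames over the classes (blank, 'a', 'b', 'c').
frames = [
    [0.9, 0.1, 0.0, 0.0],  # blank
    [0.1, 0.8, 0.1, 0.0],  # a
    [0.2, 0.7, 0.1, 0.0],  # a (repeat, merged)
    [0.6, 0.3, 0.1, 0.0],  # blank separates the two a's
    [0.1, 0.9, 0.0, 0.0],  # a
    [0.0, 0.1, 0.9, 0.0],  # b
    [0.1, 0.1, 0.8, 0.0],  # b (repeat, merged)
]
decoded = ctc_greedy_decode(frames, "abc")
```

Because each frame contributes at most one character and frames are read in order, the decoded string follows the horizontal order of the text in the image, which is the kind of monotonic alignment that a pure attention decoder does not guarantee.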
This article is indexed in ScienceDirect and other databases.