Similar Literature
20 similar documents retrieved (search time: 31 ms)
1.
Detecting and recognizing text in natural images are quite challenging and have received much attention from the computer vision community in recent years. In this paper, we propose a robust end-to-end scene text recognition method, which utilizes tree-structured character models and normalized pictorial structured word models. For each category of characters, we build a part-based tree-structured model (TSM) so as to make use of the character-specific structure information as well as the local appearance information. The TSM can detect each part of a character and recognize its unique structure, seamlessly combining character detection and recognition. As TSMs accurately detect characters against complex backgrounds, for text localization we apply TSMs to all characters in the coarse text detection regions to eliminate false positives and to recover possibly missing characters. For word recognition, we propose a normalized pictorial structure (PS) framework to deal with the bias caused by words of different lengths. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods for both text localization and word recognition.

2.
Objective: MSERs (maximally stable extremal regions)-based methods are currently the mainstream approach to text detection in natural scene images. However, some text in natural scenes lies on complex, highly variable backgrounds that the MSERs algorithm cannot extract accurately, which reduces the robustness of such methods. Targeting these complex and variable backgrounds, this paper applies the MSCRs (maximally stable color regions) algorithm to natural scene text detection and proposes a detection method that combines MSCRs with MSERs. Method: First, candidate character regions are extracted with both the MSCRs and MSERs algorithms. Then, texture features of the candidate regions are used to train a random forest character classifier, which classifies the candidates to obtain character regions. Finally, the characters are merged according to the color consistency and geometric adjacency of the character regions to produce the final detection result. Results: On ICDAR 2013, the method achieves a recall of 71.9%, a precision of 84.1% and an F-measure of 77.5%, improving on the recall and F-measure of competing methods. Conclusion: The method is robust for text detection in natural scene images, and the experimental results verify its effectiveness.
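The final merging step above, grouping character regions into text lines by color consistency and geometric adjacency, can be sketched with union-find. The region format (x, y, w, h boxes plus mean RGB colors) and all thresholds below are illustrative assumptions, not the paper's exact values:

```python
# Hypothetical sketch: merge candidate character boxes into text lines when
# they are horizontally adjacent, vertically aligned and similar in color.

def vertical_overlap(a, b):
    # Fraction of the smaller box's height shared by both boxes.
    top = max(a[1], b[1]); bot = min(a[1] + a[3], b[1] + b[3])
    return max(0, bot - top) / min(a[3], b[3])

def color_dist(c1, c2):
    return sum((u - v) ** 2 for u, v in zip(c1, c2)) ** 0.5

def merge_characters(regions, colors, max_gap=1.5, min_overlap=0.5, max_color=40):
    """Union-find grouping of character boxes (x, y, w, h) into text lines."""
    parent = list(range(len(regions)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]; i = parent[i]
        return i
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            a, b = regions[i], regions[j]
            gap = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]))
            if (gap < max_gap * min(a[3], b[3])
                    and vertical_overlap(a, b) > min_overlap
                    and color_dist(colors[i], colors[j]) < max_color):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(regions)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The quadratic pairwise loop is fine for the few hundred candidates a page typically yields; a real system would prune pairs with a spatial index first.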

3.
To address the complex backgrounds and uncertain text orientations of natural scene images, a multi-oriented scene text detection method is proposed. First, a color-enhanced maximally stable extremal regions (C-MSER) method extracts candidate character regions, and heuristic rules together with a LIBSVM classifier eliminate non-character regions. Then, a position-color model recovers characters that were wrongly filtered out, and the centers of the character regions are fitted to estimate the skew angle of each text line. Finally, a CNN classifier refines the results. Evaluated on two standard datasets (ICDAR2011 and ICDAR2013), the algorithm achieves f-scores of 0.81 and 0.82, respectively, demonstrating its effectiveness.

4.
To improve the accuracy of classic object detection algorithms for localizing text in natural scenes, and to overcome the erroneous segmentation of Chinese characters caused by the non-connectivity between strokes in traditional character detection models, a direct and efficient approximate localization method for Chinese characters in natural scenes is proposed. The classic EAST algorithm first detects text in the scene image. The initially detected text boxes are then adjusted to enclose the text more tightly and completely, in three steps: extracting connected stroke components, segmenting Chinese characters, and approximating text shapes. Finally, the text regions are rectified and the text content is recognized. Experimental results show that, while maintaining an average frame rate of 3.1 frames/s, the algorithm achieves F-scores of 83.5%, 72.8% and 81.1% on the text localization tasks of the multi-oriented ICDAR2015, ICDAR2017-MLT and MSRA-TD500 datasets, respectively; ablation experiments verify the effectiveness of each module. Its performance on the joint detection and recognition evaluation of ICDAR2015 also shows that the method outperforms several recent methods.

5.
Wu Qin, Luo Wenli, Chai Zhilei, Guo Guodong. Applied Intelligence, 2022, 52(1): 514-529

Since convolutional neural networks (CNNs) were applied to scene text detection, detection accuracy has improved considerably. However, limited by the receptive fields of regular CNNs and by the large scale variations of text in images, current methods may fail on more challenging instances, such as arbitrarily shaped texts and extremely small texts. In this paper, we propose a new segmentation-based scene text detector equipped with deformable convolution and global channel attention. To detect texts of arbitrary shapes, our method replaces traditional convolutions with deformable convolutions, whose sampling locations are shifted by learned offsets so that they better adapt to text of any shape, especially curved text. To obtain more representative text features, an Adaptive Feature Selection module better exploits text content through global channel attention. Meanwhile, a scale-aware loss, which adjusts the weights of text instances of different sizes, is formulated to handle text scale variation. Experiments on several standard benchmarks, including ICDAR2015, SCUT-CTW1500, ICDAR2017-MLT and MSRA-TD500, verify the superiority of the proposed method.

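The scale-aware loss idea above can be sketched as an instance reweighting: smaller text instances get larger loss weights so they are not drowned out by large ones. The inverse-square-root-of-area weighting and the mean-one normalization are assumptions for illustration; the paper's exact formulation may differ:

```python
import numpy as np

def scale_aware_weights(areas):
    """Per-instance weights that decrease with instance area, mean-normalized."""
    areas = np.asarray(areas, dtype=float)
    w = 1.0 / np.sqrt(areas)          # small instances -> larger weight
    return w * len(w) / w.sum()       # normalize so the mean weight is 1

def weighted_loss(per_instance_losses, areas):
    """Combine per-instance losses under the scale-aware weights."""
    w = scale_aware_weights(areas)
    return float(np.mean(w * np.asarray(per_instance_losses, dtype=float)))
```

Normalizing the weights to mean one keeps the overall loss magnitude (and thus the learning-rate tuning) comparable to the unweighted case.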

6.
Scene text detection plays a significant role in various applications, such as object recognition, document management, and visual navigation. Instance segmentation based methods have mostly been used in existing research due to their advantages in dealing with multi-oriented texts. However, a large number of non-text pixels exist in the labels during model training, leading to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework comprising two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary directions. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500 and MLT datasets demonstrate the effectiveness of our approach, with F-measures of 92.62%, 88.02%, 83.69% and 77.81%, respectively.

7.
Text contained in scene images provides the semantic context of the images. For that reason, robust extraction of text regions is essential for successful scene text understanding. However, separating text pixels from scene images remains challenging because of uncontrolled lighting conditions and complex backgrounds. In this paper, we propose a two-stage conditional random field (TCRF) approach to robustly extract text regions from scene images. The proposed approach models the spatial and hierarchical structures of scene text and finds text regions based on this model. In the first stage, the system generates multiple character proposals for the given image by using multiple image segmentations and a local CRF model. In the second stage, the system selectively integrates the generated character proposals to determine proper character regions by using a holistic CRF model. Through the TCRF approach, we cast the scene text separation problem as a probabilistic labeling problem, which yields the optimal label configuration of pixels that maximizes the conditional probability of the given image. Experimental results indicate that our framework performs well on public databases.

8.
To address the inability of the traditional maximally stable extremal regions (MSER) method to extract text regions from low-contrast images, a new edge-enhanced scene text detection method is proposed. First, the MSER method is improved with histogram of oriented gradients (HOG) values, strengthening its robustness on low-contrast images, and maximally stable extremal regions are extracted separately in each color channel. Second, a Bayesian model classifies the regions, using three translation- and rotation-invariant features — stroke width, edge gradient direction and corner points — to eliminate non-character regions. Finally, the geometric properties of characters are used to assemble them into text lines. The algorithm is evaluated on the public ICDAR 2003 and ICDAR 2013 datasets. Experimental results show that the color-space edge-enhanced MSER method handles complex backgrounds and correctly extracts text regions from low-contrast scene images, and that the Bayesian classification screens characters well even with small training samples, achieving a high recall. Compared with traditional MSER-based text detection, the proposed method improves both detection rate and run-time performance.
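The Bayesian classification step above can be illustrated with a small Gaussian naive-Bayes filter: each candidate region is summarized by a feature vector (e.g. stroke-width variance, gradient coherence, corner count) and kept only if the character class has the higher posterior. The Gaussian likelihoods and feature choice are simplifying assumptions, not the paper's exact model:

```python
import numpy as np

class NaiveBayesFilter:
    """Gaussian naive Bayes over per-region features (char vs. non-char)."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(y)
        # Per-class feature means and variances (variance floor for stability).
        self.mu = np.array([X[y == c].mean(0) for c in self.classes])
        self.var = np.array([X[y == c].var(0) + 1e-6 for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        X = np.asarray(X, float)[:, None, :]           # shape (n, 1, d)
        # Log-likelihood of each sample under each class's diagonal Gaussian.
        loglik = -0.5 * (np.log(2 * np.pi * self.var)
                         + (X - self.mu) ** 2 / self.var).sum(-1)
        return self.classes[np.argmax(loglik + self.logprior, axis=1)]
```

Working in log space avoids the underflow that multiplying many small per-feature likelihoods would cause.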

9.
Scene text detection is an important step in scene text recognition and a challenging problem in its own right. Unlike general object detection, its main difficulties are that text in natural scene images appears at arbitrary orientations, in small sizes and with widely varying aspect ratios. Building on TextBoxes[8], this paper proposes a detector for text of arbitrary orientations, named OSTD (Oriented Scene Text Detector), which effectively and accurately detects multi-oriented text in natural scenes. OSTD is evaluated on public datasets, and all experimental results show that it is highly competitive in both accuracy and speed. On the ICDAR2015 Incidental Text dataset[16] at 1024×1024 resolution, OSTD achieves F-measure = 0.794 at 10.7 FPS.

10.
Color and strokes are the salient features of text regions in an image. In this work, we use both features as cues and introduce a novel energy function to formulate the text binarization problem; the minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background, as we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image and street view text datasets, as well as full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements under a variety of measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
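The per-iteration relearning of foreground/background mixture models can be illustrated with a tiny EM fit of a two-component 1-D Gaussian mixture (e.g. gray values of text vs. background pixels). This is a minimal sketch under that 1-D assumption; the actual method uses full color GMMs plus a stroke term inside a graph-cut energy:

```python
import numpy as np

def em_two_gaussians(x, iters=50):
    """EM for a 2-component 1-D Gaussian mixture; returns (means, vars, weights)."""
    x = np.asarray(x, float)
    mu = np.array([x.min(), x.max()])            # spread-out initialization
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        lik = pi / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        r = lik / lik.sum(1, keepdims=True)
        # M-step: re-estimate means, variances and mixing weights.
        n = r.sum(0)
        mu = (r * x[:, None]).sum(0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(0) / n + 1e-6
        pi = n / len(x)
    return mu, var, pi
```

In the binarization loop, the responsibilities would feed the unary terms of the energy, and the graph cut's new labeling would in turn refit the mixtures.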

11.
Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines and different document structures, making line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation solving the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state-of-the-art considering the different types and difficulties of the benchmarking data.
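The path-finding idea can be sketched in simplified form: treat the page as a grid where ink pixels are expensive and background is cheap, then find the cheapest left-to-right path with Dijkstra's algorithm. The real method traverses a graph built on the image skeleton, but the principle is the same; the grid-and-cost setup here is an assumption for illustration:

```python
import heapq

def separating_path(cost):
    """cost: 2-D list of per-pixel costs. Returns (total_cost, row per column)."""
    h, w = len(cost), len(cost[0])
    dist = {(r, 0): cost[r][0] for r in range(h)}   # any row may start the path
    prev = {}
    pq = [(d, rc) for rc, d in dist.items()]
    heapq.heapify(pq)
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > dist.get((r, c), float("inf")):
            continue                                 # stale queue entry
        if c == w - 1:                               # first right-edge pop = optimum
            path = [(r, c)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, [p[0] for p in reversed(path)]
        for dr in (-1, 0, 1):                        # step right, optionally up/down
            nr, nc = r + dr, c + 1
            if 0 <= nr < h:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None
```

Because Dijkstra pops nodes in order of increasing cost, the first right-edge cell popped ends a globally cheapest separating path.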

12.
To handle text-line images that cannot be accurately split into characters because of complex backgrounds or touching characters, a recognition method for text lines in complex scenes is proposed. The method exploits the correlation between the image and its character sequence: a bidirectional recurrent neural network encodes the image feature sequence, and an integrated connectionist temporal classification (CTC) and attention module decodes the encoded features into the output text. Tested on several datasets (the public ICDAR2013 and ICDAR2003 sets and a CAPTCHA dataset), the algorithm reaches recognition accuracies of 90.2%, 87.4% and 92.5%, respectively, demonstrating its effectiveness and its value for text-line recognition applications.
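The CTC half of such a decoder can be illustrated with greedy best-path decoding: take the argmax label at every time step, collapse consecutive repeats, then drop the blank symbol. The tiny alphabet below is an assumption for demonstration, and the attention branch is not shown:

```python
BLANK = 0  # CTC convention used here: index 0 is the blank symbol

def ctc_greedy_decode(frame_probs, alphabet):
    """frame_probs: list of per-frame probability lists (index 0 = blank)."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], BLANK
    for label in best:
        # Emit only on a change of label, and never emit the blank.
        if label != BLANK and label != prev:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)
```

Note how a blank between two identical argmax labels is what allows doubled letters to survive the repeat-collapsing step.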

13.

In computer vision, scene text component recognition is an important problem in end-to-end scene text reading systems. It involves two major sub-problems: segmentation of such components into scene characters and classification of segmented characters into known character classes. Significant attention and increasingly focused research efforts are being put forth, and reasonable progress has already been made, though a diversity of challenges such as background complexity, variety of text appearances, noise, blur, distortion and various other degradation and deformation issues remain to be addressed. In this paper, we present (i) a detailed survey of scene component segmentation and/or recognition methods reported so far in the literature, (ii) related datasets available for quantitative evaluation and benchmarking of segmentation and/or recognition performance, (iii) comparative results and analysis over the reported methods, and (iv) a discussion of open areas to be explored in order to achieve the desired goal of end-to-end scene text recognition. Moreover, this paper provides a useful reference for researchers in the area of scene text component segmentation and recognition.


14.
Recognizing characters in scene images
An effective algorithm for character recognition in scene images is studied. Scene images are segmented into regions by an image segmentation method based on adaptive thresholding. Character candidate regions are detected by observing gray-level differences between adjacent regions. To ensure extraction of multisegment characters as well as single-segment characters, character pattern candidates are obtained by associating the detected regions according to their positions and gray levels. A character recognition process selects patterns with high similarities by calculating the similarities between character pattern candidates and the standard patterns in a dictionary and then comparing the similarities to thresholds. A relaxational approach to determining character patterns updates the similarities by evaluating the interactions between categories of patterns, and finally character patterns and their recognition results are obtained. Highly promising experimental results have been obtained using the method on 100 images involving characters of different sizes and formats under uncontrolled lighting.
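The adaptive-thresholding segmentation mentioned above can be sketched as a mean-based local threshold: each pixel is compared against the mean of its local window rather than one global value, which tolerates uneven scene lighting. The window size and offset C below are illustrative parameters, not the paper's:

```python
import numpy as np

def adaptive_threshold(img, win=3, C=0):
    """Binarize: pixel is foreground if brighter than its local mean minus C."""
    img = np.asarray(img, float)
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")      # replicate borders
    h, w = img.shape
    local_mean = np.empty_like(img)
    for r in range(h):
        for c in range(w):
            local_mean[r, c] = padded[r:r + win, c:c + win].mean()
    return (img > local_mean - C).astype(np.uint8)
```

The double loop keeps the sketch obvious; a production version would compute the local means with an integral image or a box filter in O(1) per pixel.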

15.
Text region localization is important for character recognition and retrieval in images with complex backgrounds. Existing methods achieve high precision and recall but are inefficient, making them hard to apply in practical systems. This paper proposes a text region localization method based on connected-component filtering and K-means clustering. The image is first segmented adaptively, and connected components are extracted from the character color layers. Features of the connected components are then extracted, and an Adaboost classifier filters out non-character components. Finally, the candidate character components are clustered with K-means according to their positions and color layers to localize the text regions. Experimental results show precision and recall comparable to current methods at a lower computational cost.
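The final clustering step can be sketched with a compact k-means: candidate components, described here by a small feature vector such as (x-position, color-layer index), are grouped so each cluster approximates one text region. The feature choice and k are illustrative assumptions; a real system would also use y-position and run the Adaboost filter first:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, cluster centers)."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # init from data points
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute centers; keep a center in place if its cluster is empty.
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```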

16.
Text detection in the real world images captured in unconstrained environment is an important yet challenging computer vision problem due to a great variety of appearances, cluttered background, and character orientations. In this paper, we present a robust system based on the concepts of Mutual Direction Symmetry (MDS), Mutual Magnitude Symmetry (MMS) and Gradient Vector Symmetry (GVS) properties to identify text pixel candidates regardless of any orientations including curves (e.g. circles, arc shaped) from natural scene images. The method works based on the fact that the text patterns in both Sobel and Canny edge maps of the input images exhibit a similar behavior. For each text pixel candidate, the method proposes to explore SIFT features to refine the text pixel candidates, which results in text representatives. Next an ellipse growing process is introduced based on a nearest neighbor criterion to extract the text components. The text is verified and restored based on text direction and spatial study of pixel distribution of components to filter out non-text components. The proposed method is evaluated on three benchmark datasets, namely, ICDAR2005 and ICDAR2011 for horizontal text evaluation, MSRA-TD500 for non-horizontal straight text evaluation and on our own dataset (CUTE80) that consists of 80 images for curved text evaluation to show its effectiveness and superiority over existing methods.

17.
Text embedded in multimedia documents carries important semantic information that helps to automatically access the content. This paper proposes two neural-based optical character recognition (OCR) systems that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform state-of-the-art methods.

18.
The hand-crafted features used by traditional scene text detection methods lack robustness in complex natural scenes. For multi-oriented text detection in such scenes, a new deep-learning-based method is proposed: a fully convolutional network (FCN) fuses multi-scale text feature maps and, combined with semantic segmentation, segments text candidate regions; candidate text boxes are obtained directly from the segmented regions and enlarged with compensation; the candidate boxes are then post-processed to produce the final detections. Evaluated on the ICDAR2013 and ICDAR2015 benchmarks, the method outperforms several recent approaches.
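The step from a segmented text mask to candidate boxes can be sketched as: find connected components in the predicted mask, take each component's bounding box, then expand it by a small margin (the "enlargement with compensation"). The 4-connectivity and margin value are assumptions for illustration:

```python
def mask_to_boxes(mask, margin=1):
    """mask: 2-D 0/1 list. Returns expanded (x0, y0, x1, y1) per component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill one connected component (4-neighborhood).
                stack, rs, cs = [(r, c)], [], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    rs.append(y); cs.append(x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                # Bounding box, expanded by the margin and clipped to the image.
                boxes.append((max(0, min(cs) - margin), max(0, min(rs) - margin),
                              min(w - 1, max(cs) + margin), min(h - 1, max(rs) + margin)))
    return boxes
```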

19.
Chen Peng, Li Ming, Zhang Yu, Wang Zhipeng. 测控技术 (Measurement and Control Technology), 2022, 41(7): 17-22
An effective end-to-end scene text recognition method combining convolutional and recurrent neural networks is proposed. A feature pyramid network (FPN) first extracts multi-scale image features; a deep bidirectional recurrent network (Bi-LSTM) incorporating a residual network (ResNet) then encodes these features into a text sequence representation, which an attention mechanism decodes to produce the recognition result. Experiments on the ICDAR2013 and ICDAR2015 datasets fully verify the effectiveness of the algorithm, which lowers training difficulty, speeds up network convergence and improves text recognition accuracy.

20.
In recent years, object-based segmentation methods and shallow-model classification algorithms have been widely integrated for remote sensing image supervised classification. However, as the image resolution increases, remote sensing images contain increasingly complex characteristics, leading to higher intraclass heterogeneity and interclass homogeneity and thus posing substantial challenges for the application of segmentation methods and shallow-model classification algorithms. As important methods of deep learning technology, convolutional neural networks (CNNs) can hierarchically extract higher-level spatial features from images, providing CNNs with a more powerful recognition ability for target detection and scene classification in high-resolution remote sensing images. However, the input of the traditional CNN is an image patch, the shape of which is scarcely consistent with a given segment. This inconsistency may lead to errors when directly using CNNs in object-based remote sensing classification: jagged errors may appear along the land cover boundaries, and some land cover areas may overexpand or shrink, leading to many obvious classification errors in the resulting image. To address the above problem, this paper proposes an object-based and heterogeneous segment filter convolutional neural network (OHSF-CNN) for high-resolution remote sensing image classification. Before the CNN processes an image patch, the OHSF-CNN includes a heterogeneous segment filter (HSF) to process the input image. For the segments in the image patch that are obviously different from the segment to be classified, the HSF can differentiate them and reduce their negative influence on the CNN training and decision-making processes.
Experimental results show that the OHSF-CNN not only can take full advantage of the recognition capabilities of deep learning methods but also can effectively avoid the jagged errors along land cover boundaries and the expansion/shrinkage of land cover areas originating from traditional CNN structures. Moreover, compared with the traditional methods, the proposed OHSF-CNN can achieve higher classification accuracy. Furthermore, the OHSF-CNN algorithm can serve as a bridge between deep learning technology and object-based segmentation algorithms, thereby enabling the application of object-based segmentation methods to more complex high-resolution remote sensing images.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号