Similar Articles (20 results)
1.
Reading text in natural images has again attracted the attention of many researchers during the last few years, due to the increasing availability of cheap image-capturing devices in low-cost products such as mobile phones. As text can be found in almost any environment, the applicability of text-reading systems is extensive. To this end, we present in this paper a robust method to read text in natural images. It is composed of two main separate stages. First, text is located in the image using a set of simple, fast-to-compute features that discriminate strongly between character and non-character objects; they are based on geometric and gradient properties. The second stage carries out the recognition of the previously detected text: it uses gradient features to recognize single characters and dynamic programming (DP) to correct misspelled words. Experimental results obtained on several challenging datasets show that the proposed system exceeds state-of-the-art performance, in terms of both localization and recognition.
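The DP-based word correction mentioned above is, in essence, edit-distance matching of the recognized string against a lexicon. A minimal sketch (the lexicon contents and function names are illustrative, not from the paper):

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(word, lexicon):
    """Replace a recognized word with its nearest lexicon entry."""
    return min(lexicon, key=lambda w: edit_distance(word, w))
```

For example, an OCR output such as `"stroet"` would be mapped to `"street"` (distance 1) rather than `"store"` (distance 3).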

2.
Objective: Current convolutional neural network (CNN)-based text detection methods have great difficulty localizing small-scale text in natural scenes. However, text in natural scene images is strongly associated with other objects: scene text usually co-occurs with specific objects such as billboards and road signs. Based on this observation, this paper proposes a cascaded CNN scene text detection method that exploits object association. Method: First, a CNN detects both text and the associated text-bearing objects, producing text candidate boxes and candidate boxes of text-bearing objects. The regions of the text-bearing object candidates are then enlarged and cropped from the original image, and the cropped images are fed into the CNN again to detect text candidates more precisely. Finally, non-maximum suppression merges the text candidates generated by the two steps into the final detection result. Results: The method effectively detects small-scale text, achieving a recall of 0.817, a precision of 0.880, and an F-measure of 0.847 on the ICDAR-2013 dataset. Conclusion: By exploiting the strong association between scene text and the objects that carry it, the method improves the recall of small-scale text detection in natural scene images.
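The final fusion step above relies on standard greedy non-maximum suppression. A minimal sketch (the IoU threshold is illustrative, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep boxes in descending score order, dropping any
    box that overlaps an already-kept box by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

Here the boxes from both cascade stages would simply be concatenated (with their scores) before calling `nms`.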

3.
Existing high-performing anchor-free text detection methods mine only the geometric properties of text boxes, ignore their positional properties, and lack an effective filtering mechanism. To address this, an anchor-free natural scene text detection method that mines the positional properties of text boxes is proposed. The method uses ResNet50 as the backbone of the convolutional neural network, fuses feature maps of several different scales, and then predicts both the geometric and the positional properties of text boxes; a two-level filtering mechanism finally yields the detected text boxes. The method achieves F-measures of 0.870 and 0.861 on the public ICDAR2013 and ICDAR2011 datasets respectively, demonstrating its effectiveness.

4.
To address the difficulty of detecting Uyghur text in natural scenes, an improved single deep neural network is used for Uyghur text detection. The network consists of a Uyghur feature extraction component and a multi-layer feature-fusion text detection component, and is trained end-to-end to predict the locations and confidences of Uyghur text boxes. The feature extraction component uses a convolutional neural network to extract multi-scale, multi-level Uyghur features from natural scene images containing Uyghur text. The multi-layer feature-fusion detection component then uses these features to predict text box locations and Uyghur class confidences. Analysis shows that, unlike Chinese and English text, Uyghur text has more distinctive characteristics; accordingly, default boxes with multiple aspect ratios and sizes were designed and the sizes of some convolution kernels were adjusted. Experiments on a set of natural scene images containing Uyghur text show that the improved single deep neural network, by accounting for the influence of multi-scale and multi-level image features on detection accuracy, achieves a precision of 0.723 4 and an F-measure of 0.611 5, improving detection accuracy.

5.
We analyze some spatial frequency-based features used for text region detection in natural scene images, and redefine the DCT-based feature. We employ Fisher’s discriminant analysis to improve the DCT-based feature and to achieve higher accuracy. An unsupervised thresholding method for discriminating text and non-text regions is introduced and tested as well. Experimental results show that a wide high frequency band, covering some lower-middle frequency components, is generally more suitable for scene text detection despite the original definition of the DCT-based feature.
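The band-energy idea behind such a DCT-based feature can be sketched with a naive 1-D DCT-II; the band limits below are illustrative only, while the abstract's point is that widening the band toward lower-middle frequencies helps:

```python
import math

def dct2(x):
    """Naive (unnormalized) 1-D DCT-II of a block of samples."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N)) for k in range(N)]

def band_energy(block, lo, hi):
    """Sum of squared DCT coefficients in the band [lo, hi).
    Text blocks, with their dense stroke transitions, concentrate
    energy in mid-to-high frequencies; flat background blocks do not."""
    c = dct2(block)
    return sum(v * v for v in c[lo:hi])
```

A flat block yields essentially zero energy outside the DC coefficient, while a rapidly alternating block (crudely mimicking character strokes) yields large AC band energy.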

6.
Video text detection and segmentation for optical character recognition
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.
Published online: 14 December 2004
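The projection technique mentioned above amounts to profiling foreground density row by row. A minimal sketch of the coarse step, assuming a binarized edge map as input (the threshold parameter is illustrative):

```python
def text_line_bands(binary, min_count=1):
    """Horizontal projection profile: count foreground pixels per row,
    then return (start, end) row ranges whose counts reach min_count.
    Each returned band is a candidate text line."""
    counts = [sum(row) for row in binary]
    bands, start = [], None
    for y, c in enumerate(counts):
        if c >= min_count and start is None:
            start = y                      # entering a dense band
        elif c < min_count and start is not None:
            bands.append((start, y))       # leaving a dense band
            start = None
    if start is not None:
        bands.append((start, len(counts)))
    return bands
```

The fine step would repeat the same profiling vertically inside each band to split it into words or characters.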

7.
This paper proposes a two-stage system for text detection in video images. In the first stage, text lines are detected based on the edge map of the image leading in a high recall rate with low computational time expenses. In the second stage, the result is refined using a sliding window and an SVM classifier trained on features obtained by a new Local Binary Pattern-based operator (eLBP) that describes the local edge distribution. The whole algorithm is used in a multiresolution fashion enabling detection of characters for a broad size range. Experimental results, based on a new evaluation methodology, show the promising overall performance of the system on a challenging corpus, and prove the superior discriminating ability of the proposed feature set against the best features reported in the literature.
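The eLBP operator is a variant not specified in this abstract, but the base Local Binary Pattern it builds on is standard: each pixel is encoded by comparing it against its eight neighbours. A minimal sketch of that base operator:

```python
def lbp_code(img, y, x):
    """Standard 8-neighbour local binary pattern at pixel (y, x):
    each neighbour greater than or equal to the centre sets one bit,
    walking the neighbourhood clockwise from the top-left."""
    c = img[y][x]
    nbrs = [img[y-1][x-1], img[y-1][x], img[y-1][x+1], img[y][x+1],
            img[y+1][x+1], img[y+1][x], img[y+1][x-1], img[y][x-1]]
    return sum((v >= c) << i for i, v in enumerate(nbrs))
```

A histogram of these codes over a sliding window is the kind of feature vector an SVM classifier would consume in the second stage.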

8.
We report on the development and implementation of a robust algorithm for extracting text in digitized color video. The algorithm first computes maximum gradient difference to detect potential text line segments from horizontal scan lines of the video. Potential text line segments are then expanded or combined with potential text line segments from adjacent scan lines to form text blocks, which are then subject to filtering and refinement. Color information is then used to more precisely locate text pixels within the detected text blocks. The robustness of the algorithm is demonstrated by using a variety of color images digitized from broadcast television for testing. The algorithm also performs well on images after JPEG compression and decompression, and on images corrupted with different types of noise.
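The maximum gradient difference (MGD) along a scan line can be sketched as follows: take the horizontal first difference as the gradient, then at each position measure the spread between the largest and smallest gradient inside a sliding window. Text produces dense alternations of strong positive and negative edges, so its MGD is high. The window size below is illustrative:

```python
def max_gradient_difference(scanline, win=5):
    """MGD per position of one scan line: max minus min of the
    horizontal gradient inside a window centred at each position."""
    grad = [scanline[i + 1] - scanline[i] for i in range(len(scanline) - 1)]
    half = win // 2
    mgd = []
    for i in range(len(grad)):
        w = grad[max(0, i - half): i + half + 1]
        mgd.append(max(w) - min(w))
    return mgd
```

Positions whose MGD exceeds a threshold would then be grouped into the potential text line segments described above.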

9.
Recently, segmentation-based scene text detection has drawn wide research interest due to its flexibility in describing scene text instances of arbitrary shapes such as curved texts. However, existing methods usually need complex post-processing stages to process ambiguous labels, i.e., the labels of the pixels near the text boundary, which may belong to the text or the background. In this paper, we present a framework for segmentation-based scene text detection by learning from ambiguous labels. We use the label distribution learning method to process the label ambiguity of text annotation, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art methods for segmentation-based scene text detection.

10.
Multimedia Tools and Applications - Text detection in natural scene images is a challenging problem in computer vision. To robust detect various texts in complex scenes, a hierarchical recursive...

11.
Many natural scene images contain rich text, which plays an important role in scene understanding. With the rapid development of mobile Internet technology, many new application scenarios, such as signboard recognition and autonomous driving, need to exploit this text information. The analysis and processing of natural scene text has therefore increasingly become one of the research hotspots in computer vision; the task mainly comprises text detection and text recognition. Traditional detection and recognition methods rely on hand-crafted features and rules, and suffer from complex model design, low efficiency, and poor generalization. With the development of deep learning, scene text detection, scene text recognition, and end-to-end scene text detection and recognition have all made breakthrough progress, with significantly improved performance and efficiency. This paper introduces the research background of the field; organizes, classifies, and summarizes deep-learning-based methods for scene text detection, recognition, and end-to-end detection and recognition; and describes the basic ideas, strengths, and weaknesses of each category of methods. For the methods under each category, it further discusses and analyzes the algorithmic pipelines, applicable scenarios, and lines of technical development of the main models. In addition, it lists the major public datasets and compares the performance of the models on representative benchmarks. Finally, it summarizes the limitations of current scene text detection, recognition, and end-to-end algorithms under different data scenarios, as well as future challenges and development trends.

12.
Text contained in scene images provides the semantic context of the images. For that reason, robust extraction of text regions is essential for successful scene text understanding. However, separating text pixels from scene images still remains as a challenging issue because of uncontrolled lighting conditions and complex backgrounds. In this paper, we propose a two-stage conditional random field (TCRF) approach to robustly extract text regions from the scene images. The proposed approach models the spatial and hierarchical structures of the scene text, and it finds text regions based on the scene text model. In the first stage, the system generates multiple character proposals for the given image by using multiple image segmentations and a local CRF model. In the second stage, the system selectively integrates the generated character proposals to determine proper character regions by using a holistic CRF model. Through the TCRF approach, we cast the scene text separation problem as a probabilistic labeling problem, which yields the optimal label configuration of pixels that maximizes the conditional probability of the given image. Experimental results indicate that our framework exhibits good performance in the case of the public databases.  相似文献   

13.
俸亚特  文益民 《计算机应用》2021,41(12):3551-3557
To address the scarcity of training data for Vietnamese scene text detection and the incomplete detection of Vietnamese diacritical (tone) marks, a detection algorithm for Vietnamese scene text is proposed on the basis of an improved instance segmentation network, Mask R-CNN. To segment Vietnamese scene text with tone marks accurately, the algorithm uses only the P2 feature level to segment text regions, and enlarges the mask matrix of text regions from 14×14 to 14×28 to better fit text regions. Since the conventional non-maximum suppression (NMS) algorithm cannot remove duplicate text detection boxes, a text-region filtering module is designed and appended after the detection module to remove redundant detection boxes effectively. The network is trained by joint model training in two parts. In the first part, the feature pyramid network (FPN) and the region proposal network (RPN) are trained on large-scale public Latin-script text data, to strengthen the model's generalization ability in extracting text across different scenes. In the second part, the box-coordinate regression module and the region segmentation module are trained with pixel-level annotated Vietnamese scene text data, so that the model can segment Vietnamese text regions including tone marks. Extensive cross-validation and comparison experiments show that, compared with Mask R-CNN, the proposed algorithm achieves better precision and recall at various intersection-over-union (IoU) thresholds.

14.
Pattern Analysis and Applications - The most important intricacy when processing natural scene text images is the existence of fog, smoke or haze. These intrusion elements decrease the contrast and...

15.
16.
Based on the special typography of the Uyghur script, a method for locating Uyghur text in video is proposed. The method first applies an RGB color edge-detection operator to obtain edge maps in the horizontal, vertical, upper-right, and upper-left directions, then extracts texture features of the image from the weighted edge maps and detects candidate text regions with an improved fuzzy C-means clustering algorithm. False text regions are removed according to heuristic rules on text regions, and finally the baseline feature of Uyghur text is used to decide whether a detected region is a Uyghur text region. Experimental results show that the method performs well on video images with both simple and complex backgrounds.

17.
A multi-population genetic algorithm for robust and fast ellipse detection
This paper discusses a novel and effective technique for extracting multiple ellipses from an image, using a genetic algorithm with multiple populations (MPGA). MPGA evolves a number of subpopulations in parallel, each of which is clustered around an actual or perceived ellipse in the target image. The technique uses both evolution and clustering to direct the search for ellipses—full or partial. MPGA is explained in detail, and compared with both the widely used randomized Hough transform (RHT) and the sharing genetic algorithm (SGA). In thorough and fair experimental tests, using both synthetic and real-world images, MPGA exhibits solid advantages over RHT and SGA in terms of accuracy of recognition—even in the presence of noise and/or multiple imperfect ellipses in an image—and speed of computation.

18.
A new homogeneity-based method for text detection
Text in video images carries rich semantic content description and provides an important index information resource for semantics-based image retrieval. A text detection method for video images based on homogeneity and support vector machines (SVM) is proposed: the image is first mapped from the spatial domain into the homogeneity domain, features are then extracted from the image mapped into the homogeneity space, and an SVM is used to discriminate text regions. Experiments show that this text detection method outperforms edge-feature-based text detection methods.

19.
Objective: Methods based on MSERs (maximally stable extremal regions) are currently the mainstream for text detection in natural scene images. However, the background of some scene text is complex and variable, and the MSERs algorithm cannot extract such text accurately, reducing the robustness of this class of methods. Targeting this characteristic of natural scene text, this paper applies the MSCRs (maximally stable color regions) algorithm to scene text detection and proposes a detection method combining MSCRs with MSERs. Method: First, candidate character regions are extracted with both the MSCRs and MSERs algorithms; then a random-forest character classifier is trained on the texture features of the candidate character regions and used to classify them, yielding character regions; finally, characters are merged according to the color consistency and geometric adjacency of the character regions to obtain the final text detection result. Results: The method achieves a recall of 71.9%, a precision of 84.1%, and an F-measure of 77.5% on ICDAR 2013, improving on the recall and F-measure of competing methods. Conclusion: The method is robust for text detection in natural scene images; the experimental results verify its effectiveness.

20.
Wavelet-based edge detection for rubbing character images
Character images obtained from rubbings are blurry, rich in fine detail, and of poor visual quality, and traditional algorithms detect their edges with low precision. Exploiting the property that rubbing character edges propagate independently of scale, an edge extraction and enhancement algorithm for rubbing character images based on the dyadic wavelet transform is proposed. The rubbing character image is first decomposed at multiple scales with the dyadic wavelet; then, using the differing cross-scale propagation behavior of the wavelet-transform modulus, edges are extracted, enhanced, and thinned at multiple scales. Experiments show that the algorithm overcomes the shortcomings of traditional algorithms and mitigates the single-scale trade-off between noise suppression and the precision of edge-detail extraction, making it more practical.
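The abstract above uses the dyadic wavelet transform with cross-scale modulus tracking; a single-level Haar step already shows the core mechanism, with large detail coefficients marking edges. This is a minimal sketch of one decomposition level along rows, not the paper's multi-scale algorithm:

```python
def haar_rows(img):
    """One Haar wavelet step along each row: every pixel pair maps to
    an (average, difference) pair.  A large |difference| (detail
    coefficient) signals a vertical edge at that position; repeating
    on `approx` would give the next coarser scale."""
    approx, detail = [], []
    for row in img:
        a = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
        d = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
        approx.append(a)
        detail.append(d)
    return approx, detail
```

The multi-scale edge criterion described in the abstract then compares how these detail magnitudes persist (true edges) or decay (noise) across successive levels.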
