1.
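Objective: Current text detection methods based on convolutional neural networks (CNNs) have great difficulty localizing small-scale text in natural scenes. However, text in natural scene images is strongly correlated with other objects: it usually appears together with specific objects such as billboards and road signs. Based on this observation, this paper proposes a cascaded CNN method for natural scene text detection that takes object correlation into account. Method: First, a CNN detects text objects and the associated objects that contain text, yielding text candidate boxes and candidate boxes for the text-bearing associated objects. Next, the candidate box regions of the text-bearing objects are enlarged and cropped from the original image, and the cropped images are fed to a CNN to detect text candidate boxes more precisely. Finally, non-maximum suppression merges the text candidate boxes produced by the two steps to give the final detection result. Results: The method detects small-scale text effectively, achieving recall, precision, and F-measure of 0.817, 0.880, and 0.847, respectively, on the ICDAR-2013 dataset. Conclusion: By taking into account the strong correlation between text and text-bearing objects in natural scenes, the method improves the recall of small-scale text detection in natural scene images.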
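The final merging step described above can be illustrated with a minimal sketch of greedy non-maximum suppression over the pooled candidates from both passes. The box layout `(x1, y1, x2, y2, score)`, the function names, and the 0.5 IoU threshold are assumptions made for illustration and are not taken from the paper; second-pass boxes are also assumed to have been mapped back to original-image coordinates already.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score); this layout is an assumption for illustration.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_with_nms(first_pass: List[Box], second_pass: List[Box],
                   iou_threshold: float = 0.5) -> List[Box]:
    """Pool both candidate lists, then keep the highest-scoring box of each overlapping group."""
    candidates = sorted(first_pass + second_pass, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

For example, a coarse first-pass box and a refined second-pass box around the same word collapse into the higher-scoring one, while a small text box found only in the second pass is kept.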
2.
Text detection in real-world images captured in unconstrained environments is an important yet challenging computer vision problem because of the great variety of appearances, cluttered backgrounds, and character orientations. In this paper, we present a robust system based on the Mutual Direction Symmetry (MDS), Mutual Magnitude Symmetry (MMS), and Gradient Vector Symmetry (GVS) properties to identify text pixel candidates in natural scene images regardless of orientation, including curved text (e.g., circular or arc-shaped). The method relies on the fact that text patterns in the Sobel and Canny edge maps of an input image behave similarly. SIFT features are then used to refine the text pixel candidates, which yields text representatives. Next, an ellipse-growing process based on a nearest-neighbor criterion extracts the text components. The text is verified and restored based on text direction and a spatial study of the pixel distribution of components, which filters out non-text components. The proposed method is evaluated on three benchmark datasets, namely ICDAR2005 and ICDAR2011 for horizontal text and MSRA-TD500 for non-horizontal straight text, and on our own dataset (CUTE80), which consists of 80 images, for curved text, to show its effectiveness and superiority over existing methods.
3.
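Existing state-of-the-art anchor-free text detection methods exploit only the geometric properties of text boxes, ignore their positional properties, and lack an effective filtering mechanism. To address this, an anchor-free natural scene text detection method that exploits the positional properties of text boxes is proposed. The method uses ResNet50 as the backbone of the convolutional neural network, fuses multiple feature layers of different sizes to predict the geometric and positional properties of text boxes, and then applies a two-level filtering mechanism to obtain the final detected text boxes. On the public ICDAR2013 and ICDAR2011 datasets, the F-measure reaches 0.870 and 0.861, respectively, demonstrating the effectiveness of the method.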
4.
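To address the lack of training data for Vietnamese scene text detection and the incomplete detection of Vietnamese tone marks, a detection algorithm for Vietnamese scene text is proposed on the basis of an improved instance segmentation network, Mask R-CNN. To segment Vietnamese scene text with tone marks accurately, the algorithm uses only the P2 feature layer to segment text regions and changes the size of the text-region mask matrix from 14×14 to 14×28 to fit text regions better. Because the conventional non-maximum suppression (NMS) algorithm cannot remove duplicate text detection boxes, a text region filtering module designed for text regions is added after the detection module to remove redundant boxes effectively. The network is trained with a joint model training approach in two parts. The first part trains the feature pyramid network (FPN) and the region proposal network (RPN) on large-scale public Latin-script text data, with the aim of strengthening the model's ability to generalize when extracting text in different scenes. The second part trains the candidate-box coordinate regression module and the region segmentation module on Vietnamese scene text data with pixel-level annotations, so that the model can segment Vietnamese text regions including tone marks. Extensive cross-validation and comparison experiments show that, compared with Mask R-CNN, the proposed algorithm achieves better precision and recall at different intersection-over-union (IoU) thresholds.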
5.
Multimedia Tools and Applications - Text detection in natural scene images is a challenging problem in computer vision. To robustly detect various texts in complex scenes, a hierarchical recursive...
7.
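Pedestrian detection is an important and challenging research direction in computer vision. To improve recognition accuracy, a more effective feature extraction method is proposed. Its key characteristic is that it captures more gradient information when extracting histogram of oriented gradients (HOG) features, and thus produces feature descriptors that better represent human body details over a larger range of the image or detection window. A pedestrian detection classifier is trained using a support vector machine (SVM) with a linear kernel on the HOG features; multi-scale detection and non-maximum suppression are then applied to localize pedestrians in the image precisely. Experimental results show that the pedestrian detection system achieves high detection accuracy.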
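To make the HOG building block concrete, the following is a simplified sketch of the magnitude-weighted orientation histogram computed for a single cell. It shows only the baseline idea, not the improved extraction the abstract proposes. The function name, the 9-bin unsigned (0 to 180 degree) layout, and the centered-difference gradient are assumptions; a full HOG descriptor would also interpolate votes between bins and normalize over blocks of cells.

```python
import math
from typing import List


def cell_orientation_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Skip the border pixels so that centered differences stay inside the cell.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A cell containing a vertical edge puts all of its gradient energy into the first bin (orientation near 0 degrees), while a horizontal edge fills the bin that covers 90 degrees. Concatenated histograms like these are what a linear SVM would be trained on.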
8.
Human detection in emerging intelligent transportation systems is a challenging task for hardware implementation. Human detection based on the histogram of oriented gradients (HOG) is the most successful algorithm because of its superior performance. Unfortunately, its intensive computation and poor performance at multiple scales and low contrast make human detection more difficult and unreliable. To address these problems, an efficient human detection method based on the histogram of edge oriented gradients (HEOG) is proposed to preserve edge gradients at low contrast and to support multi-scale detection. The proposed algorithm uses approximation methods and adopts a pipelined structure to achieve low cost and high speed, respectively. Experiments conducted on various challenging human datasets show that the proposed method provides efficient detection. The algorithm has been synthesized on a Xilinx Virtex-5 FPGA board and achieves better hardware utilization than other state-of-the-art approaches.
9.
Pattern Analysis and Applications - The most important intricacy when processing natural scene text images is the existence of fog, smoke or haze. These intrusion elements decrease the contrast and...
10.
Multimedia Tools and Applications - The problem of text detection and localization in scene images has always been challenging for the researchers over the years due to diversities present in these...
12.
Multimedia Tools and Applications - Facial expression recognition plays a significant role in human behavior detection. In this study, we present an efficient and fast facial expression recognition...
13.
Multimedia Tools and Applications - Text detection in video/images is challenging due to the presence of multiple blur caused by defocus and motion. In this paper, we present a new method for...
14.
International Journal on Document Analysis and Recognition (IJDAR) - How to precisely detect arbitrary-shaped texts in natural images has recently become a new hot topic in areas of computer vision...
16.
There are many ways to prevent the spread of the COVID-19 virus, and one of the most effective is wearing a face mask. During the coronavirus pandemic, almost everyone wears a face mask at all times in public places. This encourages us to explore face mask detection technology for monitoring mask wearing in public places. The most recent and advanced face mask detection approaches are based on deep learning. In this article, two state-of-the-art object detection models, namely YOLOv3 and Faster R-CNN, are used for this task. The authors trained both models on a dataset consisting of images of people in two categories, with and without face masks. This work proposes a technique that draws bounding boxes (red or green) around people's faces depending on whether a person is wearing a mask, and keeps a daily record of the ratio of people wearing face masks. The authors also compare the performance of the two models, i.e., their precision and inference time.
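The bookkeeping step at the end of this pipeline can be sketched as follows, assuming each detection has already been reduced to a `(label, confidence)` pair. The label strings, the confidence cut-off, and the green-for-mask color mapping are hypothetical; the abstract states only that boxes are drawn in red or green and that a mask-wearing ratio is recorded.

```python
from typing import Iterable, Tuple

# (label, confidence); the label names "mask" and "no_mask" are hypothetical.
Detection = Tuple[str, float]


def mask_ratio(detections: Iterable[Detection], min_confidence: float = 0.5) -> float:
    """Fraction of sufficiently confident face detections labeled as wearing a mask."""
    kept = [label for label, confidence in detections if confidence >= min_confidence]
    if not kept:
        return 0.0
    return sum(1 for label in kept if label == "mask") / len(kept)


def box_color(label: str) -> str:
    """Assumed mapping: green for a masked face, red otherwise."""
    return "green" if label == "mask" else "red"
```

Accumulating the detections from all frames of a day and calling `mask_ratio` once would give a daily figure of the kind the abstract mentions.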
17.
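Scene text detection is a major research direction in computer vision. This article reviews the application of deep learning to scene text detection in recent years, including a description of the problems in detecting text in scene images, a classification and analysis of recent scene text detection algorithms, and an introduction to scene text detection datasets. It concludes with a summary and an outlook on future trends in scene text detection.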
18.
This paper proposes a new two-phase approach to robust text detection that integrates visual appearance with geometric reasoning rules. In the first phase, geometric rules are used to achieve a higher recall rate. Specifically, a robust stroke width transform (RSWT) feature is proposed to better recover the stroke width by additionally considering the crossing of two strokes and the continuity of the letter border. In the second phase, a classification scheme based on visual appearance features is used to reject false alarms while maintaining the recall rate. To learn a better classifier from multiple visual appearance features, a novel classification method called double soft multiple kernel learning (DS-MKL) is proposed. DS-MKL is motivated by a novel kernel margin perspective on multiple kernel learning and can effectively suppress the influence of noisy base kernels. Comprehensive experiments on the benchmark ICDAR2005 competition dataset demonstrate the effectiveness of the proposed two-phase text detection approach over state-of-the-art approaches, with a performance gain of up to 4.4% in F-measure.
19.
Developing automated systems to detect and track on-road vehicles is a demanding research area in Intelligent Transportation Systems (ITS). This article proposes a method for on-road vehicle detection and tracking in varying weather conditions using several region proposal networks (RPNs) in Faster R-CNN. The use of several RPNs in Faster R-CNN is still unexplored in this area of research. The conventional Faster R-CNN produces regions of interest (ROIs) through a single fixed-size RPN and therefore cannot detect vehicles of varying sizes, whereas the present investigation proposes an end-to-end method of on-road vehicle detection in which ROIs are generated by several RPNs of varying sizes, so that vehicles of varying sizes can be detected. The novelty of the proposed method lies in introducing several RPNs of varying sizes into the conventional Faster R-CNN. The vehicles have been detected in varying weather conditions. Three public datasets, namely DAWN, CDNet 2014, and LISA, have been used to evaluate the performance of the proposed system, which achieves average precision of 89.48%, 91.20%, and 95.16% on them, respectively. The proposed system outperforms the existing methods in this regard.
20.
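To address the difficulty of detecting Uyghur text in natural scenes, a single deep neural network is improved to detect Uyghur text in natural scenes. The network consists of a Uyghur feature extraction component and a text detection component with multi-layer feature fusion, and it is trained end to end to predict the positions and confidence scores of Uyghur text boxes. The Uyghur feature extraction component uses a convolutional neural network to extract multi-scale, multi-level Uyghur features from natural scene images. The text detection component uses these features to predict the positions of text boxes and the confidence of the Uyghur class. Analysis shows that, unlike Chinese and English, Uyghur text has more distinctive characteristics; default boxes with multiple aspect ratios and sizes were designed accordingly, and the sizes of some convolution kernels were adjusted. Experiments on an image set containing Uyghur text in natural scenes show that the improved single deep neural network method takes into account the influence of multi-scale and multi-level image features on detection accuracy; the algorithm achieves a precision of 0.7234 and an F-measure of 0.6115, improving detection precision.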
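As a rough illustration of what default boxes with multiple aspect ratios and sizes look like, the sketch below tiles boxes over a square feature map in the way single-shot detectors commonly do. The function name, the normalized `(cx, cy, w, h)` layout, the square-root parameterization of width and height, and the example scale and ratio values are assumptions; the abstract does not give the actual settings used for Uyghur text.

```python
import math
from typing import List, Sequence, Tuple

# (center_x, center_y, width, height), all normalized to [0, 1]; layout is an assumption.
DefaultBox = Tuple[float, float, float, float]


def default_boxes(feature_size: int, scales: Sequence[float],
                  aspect_ratios: Sequence[float]) -> List[DefaultBox]:
    """Place one default box per (scale, aspect ratio) pair at every feature-map cell."""
    boxes: List[DefaultBox] = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Center of the cell in normalized image coordinates.
            cx = (col + 0.5) / feature_size
            cy = (row + 0.5) / feature_size
            for scale in scales:
                for ratio in aspect_ratios:
                    # ratio = width / height; the box area stays scale ** 2.
                    width = scale * math.sqrt(ratio)
                    height = scale / math.sqrt(ratio)
                    boxes.append((cx, cy, width, height))
    return boxes
```

Wide ratios such as 4.0 yield long, flat boxes, which is the kind of shape one would expect to suit horizontally written text lines.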