Similar Documents
20 similar documents found (search time: 15 ms)
1.
A new method for text detection and recognition in natural scene images is presented in this paper. In the detection process, color, texture, and OCR statistical features are combined in a coarse-to-fine framework to discriminate text from non-text patterns. The color feature is used to group text pixels into candidate text lines; the texture feature captures the "dense intensity variance" property of text patterns; and statistical features from OCR (Optical Character Recognition) results are employed to further reduce false alarms. After the detection process, a restoration process based on plane-to-plane homography is carried out: when an affine transformation is detected on a located text, the background plane of the text is rectified, independently of the camera parameters. Experimental results on a large dataset demonstrate that the proposed method is effective and practical.
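The restoration step above rests on mapping points between the imaged text plane and a rectified plane. A minimal sketch of estimating such a plane-to-plane homography from four point correspondences via the standard Direct Linear Transform (the abstract does not specify its estimator; this NumPy-only version is an illustrative assumption):

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit a 3x3 homography H so that
    dst ~ H @ src (in homogeneous coordinates) from >= 4 point
    correspondences. The null vector of the stacked constraint
    matrix, found via SVD, gives the entries of H."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the arbitrary scale
```

With exact correspondences the recovered matrix matches the true plane-to-plane mapping up to numerical precision; with noisy detections one would use more correspondences and a robust estimator.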

2.
Text extraction from color images based on graph-theoretic clustering
This paper proposes a method for automatically extracting text regions from color images. First, a statistical color model is applied to greatly reduce the size of the image's color space. Second, graph-theoretic clustering is used to group colors, decomposing the image into a set of binary images, one per cluster. Connected-component analysis is then performed on these binary images to extract candidate text regions, which are subsequently verified. Finally, the extraction results from the individual binary images are combined to obtain the text regions of the original color image. For specific applications, the extracted text regions can be further processed and fed into an optical character recognition (OCR) system. Experimental results demonstrate the effectiveness of the proposed method.

3.
Road traffic sign detection and classification
A vision-based vehicle guidance system for road vehicles can have three main roles: (1) road detection; (2) obstacle detection; and (3) sign recognition. The first two have been studied for many years with good results, but traffic sign recognition is a less-studied field. Traffic signs provide drivers with very valuable information about the road, making driving safer and easier. The authors argue that traffic signs must play the same role for autonomous vehicles. They are designed to be easily recognized by human drivers mainly because their colors and shapes are very different from natural environments, and the algorithm described in this paper takes advantage of these features. It has two main parts: the first, for detection, uses color thresholding to segment the image and shape analysis to detect the signs; the second, for classification, uses a neural network. Some results from natural scenes are shown.
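The detection stage described above (color thresholding followed by shape analysis) can be sketched with NumPy alone. The RGB thresholds and the fill-ratio shape cue below are illustrative assumptions, not the paper's actual values:

```python
import numpy as np

def red_mask(img):
    """Coarse segmentation of red sign pixels in RGB space.
    Thresholds are illustrative; a real system would tune them
    or work in a hue-based color space."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (r > 150) & (g < 100) & (b < 100)

def fill_ratio(mask):
    """Shape cue: mask area divided by bounding-box area.
    Roughly 1.0 for a square sign, ~0.79 for a circle, and
    ~0.5 for a triangle, so it separates the basic sign shapes."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return 0.0
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return mask.sum() / float(h * w)
```

In a full pipeline each connected component of the color mask would be scored this way before being passed to the neural-network classifier.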

4.
Due to the rapid development of mobile devices equipped with cameras, instant translation of any text seen in any context is possible: mobile devices can serve as a translation tool by recognizing the texts present in captured scenes. Images captured by cameras embed external or unwanted effects that traditional optical character recognition (OCR) does not have to consider. In this paper, we segment a text image captured by a mobile device into individual characters to facilitate OCR kernel processing. Before character segmentation, text detection and text line construction are performed. A novel character segmentation method that integrates touched-character filters is applied to the camera-captured text images. In addition, periphery features are extracted from the segmented images of touched characters and fed as inputs to support vector machines to calculate confidence values. In our experiments, the accuracy rate of the proposed character segmentation system is 94.90%, which demonstrates the effectiveness of the proposed method.
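A much-simplified version of the character segmentation step can be sketched with a vertical projection profile, cutting a binarized text line at empty columns. This deliberately omits the touched-character filters and SVM confidence values that are the paper's actual contribution:

```python
import numpy as np

def segment_characters(binary_line):
    """Split a binarized text line (text pixels = 1) into character
    segments by cutting at zero columns of the vertical projection
    profile. Touched characters end up in one segment here; the
    paper's method would detect and split them separately."""
    profile = binary_line.sum(axis=0)        # ink count per column
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                        # segment begins
        elif ink == 0 and start is not None:
            segments.append((start, x))      # segment ends
            start = None
    if start is not None:                    # segment touching right edge
        segments.append((start, binary_line.shape[1]))
    return segments
```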

5.
The exploitation of video data requires methods able to extract high-level information from the images; video summarization, video retrieval, and video surveillance are example applications. In this paper, we tackle the challenging problem of recognizing dynamic video content from low-level motion features. We adopt a statistical approach involving modeling, (supervised) learning, and classification. Because of the diversity of video content (even within a given class of events), we must design appropriate models of visual motion and learn them from videos. We have defined original parsimonious global probabilistic motion models, both for the dominant image motion (assumed to be due to the camera motion) and for the residual image motion (related to scene motion). Motion measurements include affine motion models to capture the camera motion and low-level local motion features to account for scene motion. Motion learning and recognition are solved using maximum likelihood criteria. To validate the proposed motion modeling and recognition framework, we report dynamic content recognition results on sports videos.
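The dominant (camera) motion mentioned above is commonly modeled as a 6-parameter affine flow field. A minimal least-squares fit of such a model to sparse flow vectors, sketched with NumPy (the paper's own estimator is more elaborate; this only illustrates the parametric form):

```python
import numpy as np

def fit_affine_motion(pts, flows):
    """Least-squares fit of the 6-parameter affine motion model
        u = a0 + a1*x + a2*y,   v = a3 + a4*x + a5*y
    to flow vectors (u, v) observed at points (x, y). The residual
    flow - model difference would then serve as the scene-motion cue."""
    pts = np.asarray(pts, dtype=float)
    flows = np.asarray(flows, dtype=float)
    ones = np.ones(len(pts))
    A = np.column_stack([ones, pts[:, 0], pts[:, 1]])
    ax, _, _, _ = np.linalg.lstsq(A, flows[:, 0], rcond=None)
    ay, _, _, _ = np.linalg.lstsq(A, flows[:, 1], rcond=None)
    return np.concatenate([ax, ay])          # [a0, a1, a2, a3, a4, a5]
```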

6.
To address the low recognition rate of small-scale traffic signs in intelligent transportation systems, this paper proposes a traffic sign recognition method based on an improved convolutional neural network. The method raises the detection rate of small-scale traffic signs by adding an optimized RPN (region proposal network) on the low-level feature maps of the Faster R-CNN algorithm, and uses max pooling to fully fuse local detail features with global semantic features. Experimental results on the TT-100K dataset show that the new method significantly improves the recognition rate of small-scale traffic signs.

7.
Traffic sign detection is an important prerequisite for traffic sign recognition systems. Because of the complexity of the background, parts of the image may interfere with color segmentation, making detection difficult. This paper uses the HSV and RGB color spaces to coarsely detect traffic signs of different colors, marking them with different values to segment regions of interest (ROIs). Template matching is then applied: a template slides over each ROI, and the position of maximum template similarity yields the detection. Experimental results show that the method achieves good detection results.
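The template matching step above (slide a template over the ROI, take the maximum similarity) can be sketched with normalized cross-correlation; the similarity measure here is an illustrative choice, not necessarily the one used in the paper:

```python
import numpy as np

def ncc_match(roi, template):
    """Slide `template` over `roi` and return the (row, col) of the
    best normalized cross-correlation score together with that score.
    Scores lie in [-1, 1]; a score near 1 means a near-exact match."""
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t * t).sum())
    best, best_pos = -np.inf, (0, 0)
    for y in range(roi.shape[0] - th + 1):
        for x in range(roi.shape[1] - tw + 1):
            w = roi[y:y + th, x:x + tw]
            wz = w - w.mean()
            denom = tnorm * np.sqrt((wz * wz).sum())
            score = (wz * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

A production system would use an FFT-based correlation or a library routine instead of this O(n^2) double loop.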

8.
Research on a localization method for traffic warning signs
Combining the color and shape features of traffic signs in color images, this paper studies a method for localizing triangular traffic warning signs in scene images captured by a CCD camera. First, RGB-to-HSV color space conversion and threshold segmentation enhance the traffic sign region. Edge detection then extracts the triangular shape of the sign, and its three vertices are accurately detected. Finally, line fitting marks the precise position of the sign with three red lines. Experimental results show that this combined design can accurately localize and mark triangular traffic signs in images.

9.
Building on an explanation of the principles of OCR in the digitization of information resources, this article analyzes the role OCR plays in that work. It then divides the life cycle of OCR in information resource digitization into five stages: acquisition of the objects to be scanned, production of digital images, digital image processing, OCR text recognition, and optimization of the recognition results. The main tasks and characteristics of each stage are introduced in turn.

10.
Class imbalance among traffic sign samples often weakens the detection performance of classifiers. To overcome this problem, this paper proposes a traffic sign detection method based on regions of interest and fused HOG-MBLBP features. First, color enhancement is used to segment the regions of interest containing traffic signs from the natural background. HOG-MBLBP fusion features are then extracted from the sign sample library, and a genetic algorithm optimizes the parameters of SVM cross-validation, training and improving the SVM classifier. Finally, the HOG-MBLBP features of the extracted regions of interest are fed into the trained multi-class SVM for precise detection and localization, rejecting falsely detected regions. Experiments on a self-built Chinese traffic sign sample library show that the proposed method achieves a classification accuracy of 99.2%, and the confusion matrix results also demonstrate its superiority.
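The HOG half of the HOG-MBLBP fusion feature can be illustrated with a minimal global gradient orientation histogram; real HOG adds cell/block structure and block normalization, which are omitted in this sketch:

```python
import numpy as np

def hog_descriptor(gray, bins=9):
    """Minimal histogram-of-oriented-gradients descriptor for a whole
    image patch: accumulate gradient magnitude into unsigned
    orientation bins (0-180 degrees), then L2-normalize. Stands in
    for the HOG component of the fused feature vector."""
    gy, gx = np.gradient(gray.astype(float))   # row and column derivatives
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist = np.zeros(bins)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for b in range(bins):
        hist[b] = mag[idx == b].sum()
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

The resulting vectors, concatenated with MBLBP texture features, would be what the SVM classifier consumes.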

11.
Extraction of foreground content from complex-background document images is very difficult, as the background texture and color and the foreground font, size, color, and tilt are not known in advance. In this work, we propose an RGB color model for input complex color document images, together with an algorithm that detects text regions using Gabor filters and then extracts text using the color feature luminance. The proposed approach consists of three stages. In stage 1, candidate image segments containing text are detected based on Gabor features; because of the complex background, some high-frequency non-text objects are also detected as text. In stage 2, a number of these false text objects are dropped by connected component analysis. In stage 3, the image segments containing textual information obtained from the previous stage are binarized to extract the foreground text: the color feature luminance is extracted from the input color document image, and the threshold value is derived automatically from it. The proposed approach handles both printed and handwritten color document images with foreground text in any color, font, size, and orientation. For experimental evaluation, we considered a variety of document images with non-uniform/uniform textured and multicolored backgrounds. Segmentation of foreground text is evaluated on a commercially available OCR; evaluation results show better recognition accuracy for foreground characters in the processed document images than in the unprocessed ones.
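The Gabor filtering in stage 1 relies on oriented band-pass kernels whose responses are strong over text-like textures. One such kernel can be generated directly; the parameter values below are illustrative, not the paper's:

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=6.0, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian envelope modulating a
    cosine carrier of wavelength `lambd`, oriented at angle `theta`.
    Convolving an image with a bank of these at several orientations
    yields the text-sensitive responses used for candidate detection."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lambd)
```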

12.
13.

In today's highly computerized society, detecting and recognizing text in natural scene images remains complex and difficult. Most existing algorithms and models focus on detecting and recognizing text in still images. Many recent machine translation systems are built on the Encoder-Decoder framework, which encodes the input sequence and then decodes the output from that encoding; both the encoder and the decoder use an attention mechanism as an interface, making the model complex. Against this background, an alternative method for recognizing text in videos is proposed, based on a single two-dimensional convolutional neural network (2D CNN). An algorithm for extracting features from an image, called crosswise feature extraction, is also proposed. Tests of the proposed model show that crosswise feature extraction gives better recognition accuracy while requiring less training time than the conventional feature extraction technique used by CNNs.


14.
Recently, sparse coding has become popular for image classification. However, images are often captured under different conditions, such as varied poses, scales, and camera parameters, so local features may not be discriminative enough to cope with these variations. To solve this problem, affine transformation combined with sparse coding has been proposed. Although proven effective, affine sparse coding imposes no constraints on the tilt and orientation of the transformed local features, nor on the consistency of their encoding parameters. To address these problems, we propose a Laplacian affine sparse coding algorithm that combines the tilt and orientation of affine local features with the dependencies among local features. We add tilt and orientation smoothness constraints to the objective function of sparse coding; in addition, a Laplacian regularization term characterizes the similarity of encoding parameters. Experimental results on several public datasets demonstrate the effectiveness of the proposed method.

15.
Uyghur, Chinese, and English are scripts with completely different characteristics. Following the design principles of multilingual character recognition systems (multi-level language identification with appropriate intervention), this paper implements, for the first time, a recognition system for mixed Uyghur/Chinese/English text. The system first makes a preliminary judgment of the script of each character block according to the characteristics of the three scripts, and then applies a different character segmentation algorithm for each script. Character recognition confidence is used to judge whether the script attribution and the segmentation results are correct. Experimental results show a recognition rate above 96.4% on various mixed Uyghur/Chinese/English texts.

16.
Discriminating between the text and non-text regions of an image is a complex and challenging task. In contrast to caption text, scene text can have any orientation and may be distorted by perspective projection; it is also often affected by variations in scene and camera parameters such as illumination and focus. These variations make the design of a unified text extraction method for various kinds of images extremely difficult. This paper proposes a unified statistical approach for extracting text from hybrid textual images (both scene text and caption text in an image) and from document images with varied text, using features carefully selected with a multi-level feature priority (MLFP) algorithm. The selected features prove to be a good choice of feature vectors, able to discriminate between text and non-text regions in scene text, caption text, and document images, and the proposed system is robust to illumination, transformation/perspective projection, font size, and radially changing/angular text. The MLFP feature selection algorithm is evaluated with three common machine learning algorithms (a decision tree inducer, C4.5; a naive Bayes classifier; and an instance-based k-nearest-neighbour learner), and its effectiveness is shown by comparison with three feature selection methods on a benchmark dataset. The proposed text extraction system is compared with edge-based, connected-component, and texture-based methods with encouraging results, and finds its main applications in preprocessing for optical character recognition, multimedia processing, mobile robot navigation, vehicle license detection and recognition, page segmentation, and text-based image indexing.

17.
Vehicle detection in aerial surveillance using dynamic Bayesian networks
We present an automatic vehicle detection system for aerial surveillance. The system departs from existing frameworks for vehicle detection in aerial surveillance, which are either region based or sliding window based, and instead uses a pixelwise classification method. The novelty lies in the fact that, despite performing pixelwise classification, relations among neighboring pixels in a region are preserved in the feature extraction process. We consider features including vehicle colors and local features. For vehicle color extraction, we utilize a color transform to separate vehicle colors from non-vehicle colors effectively. For edge detection, we apply moment preserving to adjust the thresholds of the Canny edge detector automatically, which increases the adaptability and accuracy of detection in various aerial images. Afterward, a dynamic Bayesian network (DBN) is constructed for classification: regional local features are converted into quantitative observations that can be referenced when applying pixelwise classification via the DBN. Experiments were conducted on a wide variety of aerial videos. The results demonstrate the flexibility and good generalization ability of the proposed method on a challenging dataset with aerial surveillance images taken at different heights and under different camera angles.
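The idea of deriving Canny thresholds automatically from image statistics can be illustrated with the common median heuristic; note that the paper itself uses moment-preserving thresholding, for which this simpler rule is only a stand-in:

```python
import numpy as np

def auto_canny_thresholds(gray, sigma=0.33):
    """Derive low/high thresholds for the Canny edge detector from
    the image median, clamped to the valid 8-bit range. `sigma`
    controls how wide the hysteresis band is around the median."""
    med = float(np.median(gray))
    low = int(max(0.0, (1.0 - sigma) * med))
    high = int(min(255.0, (1.0 + sigma) * med))
    return low, high
```

The returned pair would then be passed to the edge detector, so that brighter or darker aerial images automatically get appropriately scaled thresholds.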

18.
We present an irreversible watermarking technique robust to affine transform attacks for camera, biomedical, and satellite images stored as monochrome bitmap images. The approach is based on image normalisation: both watermark embedding and extraction are carried out with respect to an image normalised to meet a set of predefined moment criteria, and the normalisation procedure is invariant to affine transform attacks. The resulting scheme is suitable for public watermarking applications, where the original image is not available for watermark extraction. Direct-sequence code division multiple access is used to embed multibit text information in the DCT and DWT transform domains. The proposed watermarking schemes are robust against various types of attacks, such as Gaussian noise, shearing, scaling, rotation, flipping, affine transforms, signal processing, and JPEG compression. Performance analysis results are measured using image processing metrics.

19.
A new approach is proposed for registering a set of histological coronal two-dimensional images of rat brain sectional material with coronal sections of a three-dimensional brain atlas, an intrinsic step in, and a significant challenge to, current efforts in brain mapping and multimodal fusion of experimental data. The alignment problem is based on matching external contours of the brain sections, and operates in the presence of the tissue distortion and tears that are routinely encountered, as well as possible scale, rotation, and shear changes (the affine and weak perspective groups). It is based on a novel set of local absolute affine invariants derived from the set of ordered inflection points on the external contour, represented by a cubic B-spline curve. The inflection points are local intrinsic geometric features preserved under both the affine and the weak perspective transformations. The invariants are constructed from the sequence of area patches bounded by the contour and the line connecting two consecutive inflection points, and hence make direct use of the area (volume) invariance property of the affine transformation. These local absolute invariants are very well suited to handling tissue distortion and tears (the occlusion problem).

20.
Vincent Nozick, Annals of Telecommunications, 2013, 68(11-12): 581-596
This paper presents an image rectification method for an arbitrary number of views with aligned camera centers, and describes how to extend this method to easily perform a robust camera calibration. These two techniques can be used in stereoscopic rendering to enhance perceptual comfort, or for depth from stereo. We first explain why epipolar geometry is not suited to solving this problem. Second, we propose a nonlinear method that includes all the images in the rectification process. We then detail how to extract the rectification parameters to provide a quasi-Euclidean camera calibration. Our method only requires point correspondences between the views and can handle images with different resolutions. Tests show that it is robust to noise and to sparse point correspondences among the views.
