This paper proposes a two-stage system for text detection in video images. In the first stage, text lines are detected based on the edge map of the image, leading to a high recall rate at low computational cost. In the second stage, the result is refined using a sliding window and an SVM classifier trained on features obtained by a new Local Binary Pattern-based operator (eLBP) that describes the local edge distribution. The whole algorithm is used in a multiresolution fashion, enabling detection of characters over a broad size range. Experimental results, based on a new evaluation methodology, show the promising overall performance of the system on a challenging corpus and demonstrate the superior discriminating ability of the proposed feature set compared with the best features reported in the literature.

Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex background make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image & video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and to point out promising directions for future research.

In this paper, we first introduce a recursive procedure for efficiently computing cubic facet parameters for edge detection. The procedure allows the facet parameters to be computed in a fixed number of operations, independent of kernel size. We then introduce an image-independent quantitative criterion for analytically evaluating different edge detectors (both gradient-based and zero-crossing-based methods) without the need for ground-truth information. The criterion is based on our observation that all edge detectors decide whether a pixel is an edgel from the result of convolving the image with a kernel. The variance of the convolution output therefore directly affects the performance of an edge detector. We propose to compute the variance of the convolution output analytically and use it as a measure to characterize the performance of four well-known edge detectors.
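
The role of the output variance can be illustrated with a small calculation. The sketch below assumes additive, independent, zero-mean pixel noise with variance sigma2, which the abstract does not state; under that assumption a linear kernel with weights w_i produces output noise of variance sigma2 * sum(w_i^2). The function names and the two sample kernels are illustrative choices, not the four detectors evaluated in the paper.

```python
def output_noise_variance(kernel, sigma2=1.0):
    """Variance of the filter output under i.i.d. pixel noise of variance sigma2."""
    return sigma2 * sum(w * w for row in kernel for w in row)


def normalized(kernel):
    """Scale the kernel so that its correlation with the ramp I(x, y) = x equals 1."""
    gain = sum(w * (c - len(row) // 2) for row in kernel for c, w in enumerate(row))
    return [[w / gain for w in row] for row in kernel]


# Two common horizontal-gradient masks, used here only as examples.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]

if __name__ == "__main__":
    for name, kernel in (("Sobel", SOBEL_X), ("Prewitt", PREWITT_X)):
        print(name, output_noise_variance(normalized(kernel)))
```

Normalizing each kernel to unit response on a ramp puts the masks on a common signal scale, so the remaining difference in output variance reflects noise sensitivity alone. With these two masks the normalized variances are 12/64 for Sobel and 6/36 for Prewitt.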


In recent years, many approaches have been explored for automatic urban road extraction. Most of these approaches are based on edge and line detection algorithms. In this paper, a new integrated system for automatic extraction of main roads in high-resolution optical satellite images is presented. Firstly, a multi-scale greylevel morphological cleaning algorithm is proposed to reduce the grey deviation of the road regions. Secondly, based on the greylevel difference between road surfaces and environmental objects, a colour high-resolution satellite image is segmented into a simplified imagemap by using the mean shift algorithm, which consists of three stages. The first stage deals with image filtering, the second stage deals with colour segmentation, and the third stage fuses small regions in the segmented image. The mean shift filter algorithm not only smooths the image, but also preserves abrupt changes (i.e. edges) in the local structure. The mean shift segmentation algorithm is a straightforward extension of the smoothing algorithm, which preserves discontinuity. From the histogram of the simplified imagemap, we can find the potential road surfaces, and use a greylevel threshold to convert the segmented image into a binary one. The binary image is processed by using binary mathematical morphological closing and opening to remove small objects and to open the connected street blocks. We use a contour tracing algorithm to remove holes in street-block regions and to detect the street blocks' contours. We found that many street blocks' contours were preserved well, although some were indented. Finally, we use the convex hull algorithm to smooth the street blocks' zigzag edges and to close the gaps in some street blocks, which yields the road edges.
The integrated system for road network extraction is tested on the red band of an IKONOS multispectral image; all algorithms in this study are developed in C++ under the Windows XP operating system. Results of the road network extraction are presented to illustrate the validity of the extraction strategy and the corresponding algorithms, and future prospects are outlined.
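
The final step, replacing a zigzag or indented street-block contour by its convex hull, can be sketched as follows. The abstract does not say which convex hull algorithm the authors use (their implementation is in C++); Andrew's monotone chain is assumed here, and the function names and sample contour are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2-D points, counter-clockwise, via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # A square block whose traced contour has two indented points.
    contour = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (2, 3)]
    print(convex_hull(contour))
```

The hull discards the indented points and keeps only the four corners, which is the smoothing effect the abstract describes. It also removes any genuine concavity, so it suits only blocks that are approximately convex.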

Figures of merit are defined for possible linking of pairs of edge segments that continue one another or are anti-parallel to each other. These figures depend on both the geometrical configuration of the segments and the gray levels associated with them. Examples are given involving edge segments of buildings and roads on high resolution aerial photographs.

An edge detection algorithm was developed that is capable of objectively detecting significant edges in remotely sensed images of the surface ocean. The algorithm utilizes a gradient-based edge detector that is less sensitive to noise in the input image than previously used detectors and can detect edges at different length scales. The algorithm was used to provide a statistical view of surface front occurrences in the Southern California Bight using six years of satellite observations of sea-surface temperature (AVHRR) and chlorophyll (SeaWiFS). Regions of high front occurrence probability were identified near capes, headlands, and islands. In the onshore direction, chlorophyll concentration increased and temperature decreased, indicative of coastal upwelling. The algorithm was further applied to coincident time series of temperature and chlorophyll to investigate the event-scale dynamics of mesoscale features and the spatial relationships between physical and biological processes. A simple scheme for identifying and classifying eddies delineated by the edge detection algorithm was developed to yield a census of eddy occurrence in the Southern California Bight.

Breast cancer is the second leading cause of death for women all over the world. Since the cause of the disease remains unknown, early detection and diagnosis are the key to breast cancer control, and they can increase the success of treatment, save lives, and reduce cost. Ultrasound imaging is one of the most frequently used diagnostic tools for detecting and classifying abnormalities of the breast. To eliminate operator dependency and improve diagnostic accuracy, a computer-aided diagnosis (CAD) system is a valuable means of breast cancer detection and classification. Generally, a CAD system consists of four stages: preprocessing, segmentation, feature extraction and selection, and classification. In this paper, the approaches used in these stages are summarized and their advantages and disadvantages are discussed. The performance evaluation of CAD systems is investigated as well.

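Objective: Shot type and field region are prerequisites for event detection in soccer video and play an important role in the semantic analysis of soccer video. To address the shortcomings of existing shot classification methods, a fluctuation detection method for identifying soccer video shot types is proposed. Method: The method slides a window across the video frame and records the number of times the proportion of field pixels within the window fluctuates above and below the long-shot threshold; the shot type is determined from this fluctuation count. For field region classification, a method is proposed that identifies the field region type from the positional relationship between the top-left and top-right corner points of the field region in the video image. It uses a Gaussian mixture model to identify the field and determines the field region type by comparing the heights of the left and right boundary coordinates of the field in the frame. The method is simple and efficient. Results: Compared with existing classification methods, the two proposed methods considerably improve precision and recall, are computationally efficient, and can meet real-time requirements. Conclusion: The proposed methods resolve the inability of the traditional sliding-window method to correctly identify frames in which the field is tilted at too large an angle, and mitigate the low accuracy that traditional field region detection methods suffer because of their dependence on field line detection.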
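
The fluctuation count described above can be sketched as follows. This is a minimal illustration that assumes a binary field mask (1 for a field pixel) and a full-height window sliding horizontally; the window geometry, the threshold value, and the rule that maps the count to a shot type are not given in the abstract, so the function names and parameters are hypothetical and the sketch stops at the count itself.

```python
def window_field_ratios(mask, width, step=1):
    """Proportion of field pixels in each full-height window sliding left to right."""
    rows, cols = len(mask), len(mask[0])
    col_sums = [sum(mask[r][c] for r in range(rows)) for c in range(cols)]
    return [
        sum(col_sums[start:start + width]) / (rows * width)
        for start in range(0, cols - width + 1, step)
    ]


def count_fluctuations(ratios, threshold):
    """Number of times consecutive ratios move from one side of the threshold to the other."""
    crossings = 0
    previous_above = None
    for ratio in ratios:
        above = ratio >= threshold
        if previous_above is not None and above != previous_above:
            crossings += 1
        previous_above = above
    return crossings


if __name__ == "__main__":
    # Toy mask: field on the left and right, a non-field object in the middle.
    mask = [[1, 1, 0, 0, 1, 1],
            [1, 1, 0, 0, 1, 1]]
    ratios = window_field_ratios(mask, width=2)
    print(ratios, count_fluctuations(ratios, threshold=0.6))
```

In the toy mask the ratio drops below the threshold over the object and rises again afterwards, giving two fluctuations.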

This paper presents a new knowledge-based system for extracting and identifying text-lines from various real-life mixed text/graphics compound document images. The proposed system first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability, since new rules can simply be added to cope with various types of real-life complex document images. Experimental and comparative results demonstrate the effectiveness of the proposed knowledge-based system and its advantages in extracting text-lines with a large variety of illumination levels, sizes, and font styles from various types of mixed and overlapping text/graphics complex compound document images.

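Edge detection is an important and practical image processing method in computer vision and is applied in many fields. However, interference from the external environment during image acquisition or transmission easily leads to a low edge detection rate or to false edges, and researchers have proposed many improved methods in response. General-purpose edge detection methods are nevertheless rare, and existing algorithms are designed to handle specific scenes or specific situations. An RGB image edge detection algorithm that combines the Kirsch operator with high and low dual thresholds is proposed to address these problems. First, the component images of the original image in RGB colour space are extracted, and the edge strength of each component image is computed with an improved Kirsch operator. Then the high and low thresholds are used to separate edge points from background points, giving an edge result for each colour component. Finally, the edge detection results of the different components are fused to obtain the final edge result. The algorithm is validated on the 200 test images of the benchmark BSDS500 dataset. The experimental results show that, compared with other algorithms, the proposed algorithm detects clearer edges with more complete detail and better edge continuity, achieves a higher detection rate, and is more widely applicable.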
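
The three steps above can be sketched as follows. Several parts are assumptions rather than the paper's method: the standard eight-direction Kirsch masks stand in for the unspecified "improved" operator, a pixel between the two thresholds is kept only if it touches a strong edge pixel, and the per-channel results are fused with a logical OR. All function names are hypothetical.

```python
# Border positions of a 3x3 neighbourhood in clockwise order, starting top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Standard Kirsch weights along the ring for one direction; rotating gives all eight.
BASE = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_strength(channel):
    """Maximum response of the eight standard Kirsch masks (border pixels stay 0)."""
    h, w = len(channel), len(channel[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ring = [channel[y - 1 + dy][x - 1 + dx] for dy, dx in RING]
            out[y][x] = max(
                sum(BASE[(i - shift) % 8] * v for i, v in enumerate(ring))
                for shift in range(8)
            )
    return out


def dual_threshold(strength, low, high):
    """Strong pixels (>= high) are edges; weak ones (>= low) only next to a strong pixel."""
    h, w = len(strength), len(strength[0])
    strong = [[strength[y][x] >= high for x in range(w)] for y in range(h)]
    result = [row[:] for row in strong]
    for y in range(h):
        for x in range(w):
            if low <= strength[y][x] < high:
                result[y][x] = any(
                    strong[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                )
    return result


def rgb_edges(r, g, b, low, high):
    """Fuse the per-channel edge maps with a logical OR (assumed fusion rule)."""
    maps = [dual_threshold(kirsch_strength(c), low, high) for c in (r, g, b)]
    return [[any(m[y][x] for m in maps) for x in range(len(r[0]))] for y in range(len(r))]


if __name__ == "__main__":
    step = [[0, 0, 10, 10, 10] for _ in range(5)]   # vertical step edge
    flat = [[7] * 5 for _ in range(5)]              # no structure
    for row in rgb_edges(step, flat, flat, low=80, high=120):
        print("".join("#" if e else "." for e in row))
```

Because each Kirsch mask sums to zero, a flat channel gives zero strength everywhere, so only the channel that contains the step contributes edge pixels in this example.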

Text characters embedded in images represent a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to detect and recognize because of their varying sizes and grayscale values and their complex backgrounds. Existing methods do not handle well text of varying contrast or text embedded in a complex image background. In this paper, a set of sequential algorithms for text extraction and image enhancement using cellular automata is proposed. The image enhancement includes gray level and contrast manipulation, edge detection, and filtering. First, edge detection is applied and a threshold is used to filter out low-contrast text and to simplify the complex background of high-contrast text in the binary image. The proposed algorithm is simple and easy to use and requires only a sample texture binary image as input. It generates textures whose perceived quality is better than that obtained with previously published techniques. The performance of our method is demonstrated by presenting experimental results for a set of text-based binary images. The quality of thresholding is assessed using precision and recall analysis of the resultant text in the binary image.

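Objective: Current text detection methods based on convolutional neural networks (CNNs) have great difficulty locating small-scale text in natural scenes. However, text objects in natural scene images are strongly associated with other objects: text in natural scenes usually appears together with specific objects such as billboards and road signs. On this basis, a cascaded CNN method for natural scene text detection that takes object association into account is proposed. Method: First, a CNN is used to detect text objects and the associated objects that contain text, yielding text candidate boxes and candidate boxes for the text-bearing objects. Next, the candidate box regions of the text-bearing objects are enlarged and cropped from the original image, and each cropped image is fed to the CNN to detect text candidate boxes more precisely. Finally, non-maximum suppression is used to merge the text candidate boxes generated in the two steps, giving the text detection result. Results: The proposed method detects small-scale text effectively, with recall, precision, and F-measure of 0.817, 0.880, and 0.847, respectively, on the ICDAR-2013 dataset. Conclusion: By taking into account the strong association between text objects and text-bearing objects in natural scenes, the proposed method improves the recall of small-scale text detection in natural scene images.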
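
The last step, merging the text candidate boxes from the two detection passes with non-maximum suppression, can be sketched as follows. This is the generic greedy form of NMS; the box format (x1, y1, x2, y2), the confidence scores, and the IoU threshold of 0.5 are assumptions, since the abstract gives no such details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep boxes in descending score order unless they overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical candidates: one from the first pass, two from the refined pass.
    first_pass = [((0, 0, 10, 10), 0.9)]
    second_pass = [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
    boxes, scores = zip(*(first_pass + second_pass))
    print([boxes[i] for i in nms(list(boxes), list(scores))])
```

The second box overlaps the first with an IoU of 81/119, so it is suppressed, while the distant small box found only by the refined pass is kept.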

Our main objective is to provide an improved technique that can assist pathologists in classifying meningioma tumours correctly and with high accuracy. The proposed technique, which is based on an optimum combination of texture measures, inspects the separability of the RGB colour channels and selects the channel that best segments the cell nuclei of the histopathological images. The morphological gradient is applied to extract the region of interest for each subtype and to eliminate possible noise (e.g. cracks) that might occur during biopsy preparation. Meningioma texture features are extracted by four different texture measures (two model-based and two statistical-based), the corresponding features are fused in different combinations after excluding highly correlated features, and a Bayesian classifier is used for meningioma subtype discrimination. The combined Gaussian Markov random field and run-length matrix texture measures outperformed all other combinations in quantitatively characterising the meningioma tissue, achieving an overall classification accuracy of 92.50%, compared with 83.75%, the best accuracy achieved when the texture measures are used individually.

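This paper introduces an automatic video caption extraction system developed in the object-oriented Visual C++ language on the Windows MFC platform. The system applies vertical, horizontal, and diagonal edge detection operators to detect caption edge information in three directions, processes the edge image of each direction with morphological operations, and finally extracts the caption regions by AND fusion. Experimental results confirm that the system performs stably and locates captions fairly accurately.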

The paper describes the pattern recognition subsystem of an automated specimen analyser being developed for pre-screening applications with a high resolution system. The complete recognition system is divided into a hierarchy of two different recognition systems: the single cell classifier and the specimen classifier. The single cell image classifier operates on the cell data of isolated cell images, based on features that are extracted from the nucleus only. For each specimen, a large number of single cells are processed by the single cell classifier. The collection of the resulting discriminant vectors, and not just the decisions, is used as the set of measurements for the subsequent specimen classifier, which has to produce a real-valued discriminant function indicating the degree of suspiciousness of the specimen analysed. By proper thresholding of this figure of malignancy, the final decision can be made. The approach offers the opportunity of detecting suspicious cell modifications at an early stage.

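Based on the distinctive script characteristics of Uyghur, a method for locating Uyghur text in video is proposed. The method first uses an RGB colour edge detection operator to obtain edge maps in the horizontal, vertical, upper-right, and upper-left directions. It then extracts texture features of the image from the weighted edge maps and detects candidate text regions with an improved fuzzy C-means clustering algorithm. False text regions are removed according to heuristic rules for text regions, and finally the baseline feature of Uyghur text is used to decide whether each detected region is a Uyghur text region. Experimental results show that the method performs well on video images with both simple and complex backgrounds.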

This paper addresses the problem of target detection and classification, where the performance is often limited due to high rates of false alarm and classification error, possibly because of inadequacies in the underlying algorithms of feature extraction from sensory data and subsequent pattern classification. In this paper, a recently reported feature extraction algorithm, symbolic dynamic filtering (SDF), is investigated for target detection and classification by using unmanned ground sensors (UGS). In SDF, sensor time series data are first symbolized to construct probabilistic finite state automata (PFSA) that, in turn, generate low-dimensional feature vectors. In this paper, the performance of SDF is compared with that of two commonly used feature extractors, namely Cepstrum and principal component analysis (PCA), for target detection and classification. Three different pattern classifiers have been employed to compare the performance of the three feature extractors for target detection and human/animal classification by UGS systems based on two sets of field data that consist of passive infrared (PIR) and seismic sensors. The results show consistently superior performance of SDF-based feature extraction over Cepstrum-based and PCA-based feature extraction in terms of successful detection, false alarm, and misclassification rates.
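
The symbolize-then-count idea behind SDF can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes uniform amplitude partitioning, a first-order automaton whose states are the symbols themselves, and a feature vector formed by flattening the row-normalized transition matrix. The function names and the choice of n_symbols are hypothetical.

```python
def symbolize(series, n_symbols):
    """Map each sample to a symbol index by uniform partitioning of the value range."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    return [min(int((x - lo) / span * n_symbols), n_symbols - 1) for x in series]


def transition_matrix(symbols, n_symbols):
    """Row-normalized symbol-to-symbol transition frequencies (first-order PFSA)."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def sdf_feature(series, n_symbols=4):
    """Flatten the transition matrix into a feature vector of length n_symbols ** 2."""
    symbols = symbolize(series, n_symbols)
    return [p for row in transition_matrix(symbols, n_symbols) for p in row]


if __name__ == "__main__":
    alternating = [0, 1, 0, 1, 0, 1]
    print(sdf_feature(alternating, n_symbols=2))
```

A strictly alternating signal produces a matrix with all probability on the off-diagonal entries. A slowly varying signal concentrates mass on the diagonal instead, and that contrast is what a downstream classifier would exploit.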

In the last few years, growing attention has been paid to the problem of human-human communication, with the aim of devising artificial systems able to mediate a conversational setting between two or more people. In this paper, we propose an automatic system based on a generative structure able to classify dialog scenarios. The generative model is composed by integrating a Gaussian mixture model and an (observed) Markovian influence model, and it is fed with a novel low-level acoustic feature termed the steady conversational period (SCP). SCPs are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features, and may be important for predicting the evolution of typical conversational situations in different dialog scenarios. The model has been tested on an extensive set of real, dyadic and multi-person conversational settings, including a recent dyadic dataset and the AMI meeting corpus. Comparative tests are made using conventional acoustic features and classification methods, showing that the proposed scheme provides superior classification performance for all conversational settings in our datasets. Moreover, we show that our approach is able to characterize the nature of multi-person conversation (namely, the role of the participants) very accurately, thus demonstrating great versatility.
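
The raw material of an SCP, the duration of each continuous slot of speech or silence, can be sketched as a run-length encoding of a voice-activity sequence. This covers a single speaker only and omits the turn-taking information the paper also uses; the frame length and the function name are assumptions.

```python
from itertools import groupby


def steady_periods(vad_frames, frame_seconds=0.01):
    """Return (is_speech, duration_in_seconds) for each run of identical frames."""
    return [
        (bool(state), sum(1 for _ in run) * frame_seconds)
        for state, run in groupby(vad_frames)
    ]


if __name__ == "__main__":
    # 1 = speech frame, 0 = silence frame, from a hypothetical voice-activity detector.
    frames = [1, 1, 1, 0, 0, 1]
    print(steady_periods(frames, frame_seconds=0.5))
```

The resulting sequence of (state, duration) pairs is the kind of observation a Gaussian mixture model could quantize before a Markovian model captures the transitions between periods.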

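The feature points of the fundus retinal vascular structure, which is made up of arteries and veins, are important features for cardiovascular disease prediction, image analysis, and biological applications. Corner detection is introduced into the extraction of retinal vessel bifurcation and crossover points. An edge detection operator is used to obtain a binary edge image, a corner detection method based on chord-to-point distance accumulation (CPDA) yields candidate feature points, and an adaptive rectangular detector designed according to the topology of the retinal vessel image then prunes and classifies the candidates. Experimental results show that the method combining CPDA-based corner detection with the adaptive rectangular detector extracts and classifies the nodes effectively.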

