首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper proposes a two-stage system for text detection in video images. In the first stage, text lines are detected based on the edge map of the image leading in a high recall rate with low computational time expenses. In the second stage, the result is refined using a sliding window and an SVM classifier trained on features obtained by a new Local Binary Pattern-based operator (eLBP) that describes the local edge distribution. The whole algorithm is used in a multiresolution fashion enabling detection of characters for a broad size range. Experimental results, based on a new evaluation methodology, show the promising overall performance of the system on a challenging corpus, and prove the superior discriminating ability of the proposed feature set against the best features reported in the literature.  相似文献   

2.
Automatic text segmentation and text recognition for video indexing   总被引:13,自引:0,他引:13  
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.  相似文献   

3.
This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization, with the strength of a machine learning text verification step applied on background independent features. Text recognition, applied on the detected text lines, is addressed by a text segmentation step followed by an traditional OCR algorithm within a multi-hypotheses framework relying on multiple segments, language modeling and OCR statistics. Experiments conducted on large databases of real broadcast documents demonstrate the validity of our approach.  相似文献   

4.
Extraction of special effects caption text events from digital video   总被引:1,自引:1,他引:1  
Abstract. The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature. Received: January 29, 2002 / Accepted: September 13, 2002 D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov Correspondence to: David Crandall  相似文献   

5.
Video text detection and segmentation for optical character recognition   总被引:1,自引:0,他引:1  
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.Published online: 14 December 2004  相似文献   

6.
Real-world text on street signs, nameplates, etc. often lies in an oblique plane and hence cannot be recognized by traditional OCR systems due to perspective distortion. Furthermore, such text often comprises only one or two lines, preventing the use of existing perspective rectification methods that were primarily designed for images of document pages. We propose an approach that reliably rectifies and subsequently recognizes individual lines of text. Our system, which includes novel algorithms for extraction of text from real-world scenery, perspective rectification, and binarization, has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time.Received: 15 December 2003, Published online: 14 December 2004Gregory K. Myers: Correspondence to  相似文献   

7.
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex background make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image & video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and to point out promising directions for future research.  相似文献   

8.
This paper presents a morphology-based text line extraction algorithm for extracting text regions from cluttered images. First of all, the method defines a novel set of morphological operations for extracting important contrast regions as possible text line candidates. The contrast feature is robust to lighting changes and invariant against different image transformations like image scaling, translation, and skewing. In order to detect skewed text lines, a moment-based method is then used for estimating their orientations. According to the orientation, an x-projection technique can be applied to extract various text geometries from the text-analogue segments for text verification. However, due to noise, a text line region is often fragmented to different pieces of segments. Therefore, after the projection, a novel recovery algorithm is then proposed for recovering a complete text line from its pieces of segments. After that, a verification scheme is then proposed for verifying all extracted potential text lines according to their text geometries. Experimental results show that the proposed method improves the state-of-the-art work in terms of effectiveness and robustness for text line detection.  相似文献   

9.
朱成军  李超  熊璋 《计算机工程》2007,33(10):218-219
视频中的文本提供了描述视频内容的有用信息,对于构建基于高级语义的多媒体检索系统具有重要作用。该文从视频文本的特点出发,分析了视频文本检测和识别的各种技术方法及优缺点,以及该领域国内外的发展现状和下一步研究的重点方向。  相似文献   

10.
To increase the range of sizes of video scene text recognizable by optical character recognition (OCR), we developed a Bayesian super-resolution algorithm that uses a text-specific bimodal prior. We evaluated the effectiveness of the bimodal prior, compared and in conjunction with a piecewise smoothness prior, visually and by measuring the accuracy of the OCR results on the variously super-resolved images. The bimodal prior improved the readability of 4- to 7-pixel-high scene text significantly better than bicubic interpolation and increased the accuracy of OCR results better than the piecewise smoothness prior.  相似文献   

11.
Neural network-based text location in color images   总被引:9,自引:0,他引:9  
This paper proposes neural network-based text locations in complex color images. Texture information extracted on several color bands using neural networks is combined and corresponding text location algorithms are then developed. Text extraction filters can be automatically constructed using neural networks. Comparisons with other text location methods are presented; indicating that the proposed system has a better accuracy.  相似文献   

12.
视频和图像文本提取方法综述   总被引:1,自引:0,他引:1  
文本提取在视频和图像中具有重要的应用价值。近年来,大数据时代带来了海量信息检索的迫切需求,大量视频和图像中文本的提取方法涌现出来。回顾了视频和图像中文本提取的算法,从文本提取流程出发,将其分为文本区域检测定位和文本分割两大步骤。在每个步骤中,分析并比较了现有算法的使用范围及相对优缺点,讨论了图像公用数据库,列举了近些年来图像中文本提取的重要应用,指出了当前研究中存在的问题,展望了视频和场景图像文本提取方法的发展趋势。  相似文献   

13.
Authors use images to present a wide variety of important information in documents. For example, two-dimensional (2-D) plots display important data in scientific publications. Often, end-users seek to extract this data and convert it into a machine-processible form so that the data can be analyzed automatically or compared with other existing data. Existing document data extraction tools are semi-automatic and require users to provide metadata and interactively extract the data. In this paper, we describe a system that extracts data from documents fully automatically, completely eliminating the need for human intervention. The system uses a supervised learning-based algorithm to classify figures in digital documents into five classes: photographs, 2-D plots, 3-D plots, diagrams, and others. Then, an integrated algorithm is used to extract numerical data from data points and lines in the 2-D plot images along with the axes and their labels, the data symbols in the figure’s legend and their associated labels. We demonstrate that the proposed system and its component algorithms are effective via an empirical evaluation. Our data extraction system has the potential to be a vital component in high volume digital libraries.  相似文献   

14.
A novel text line extraction technique is presented for multi-skewed document images of handwritten English or Bengali text. It assumes that hypothetical water flows, from both left and right sides of the image frame, face obstruction from characters of text lines. The stripes of areas left unwetted on the image frame are finally labelled for extraction of text lines. The success rate of the technique, as observed experimentally, are 90.34% and 91.44% for handwritten Bengali and English document images, respectively. The work may contribute significantly for the development of applications related to optical character recognition of Bengali/English text.  相似文献   

15.
基于COM技术的视频流文字检测   总被引:7,自引:1,他引:7  
从数字视频中提取文字对基于内容的视频索引的建立具有重要意义。讨论了视频中文字检测的算法,并提出了一种基于COM技术的实现。  相似文献   

16.
现有的从PDF文档抽取文本内容的方法(如PDFBox类库采用的方法)处理速度较低,无法满足高速网络中内容分析的需求,也不能对网络中部分到达的PDF数据包进行流式的处理。为此,提出了基于自动机理论的PDF文本内容抽取方法。该方法通过建立具有层次的关键字自动机,可以快速地抽取完整PDF文档和不完整PDF文档中的文本内容。在中文和英文PDF文档数据集下的实验结果表明,基于自动机理论的PDF文本内容抽取方法耗时仅为PDFBox方法的17%~37%。  相似文献   

17.
隧道的监视与控制管理是高速公路隧道安全正常运行的重要课题。本文对隧道交通智能视频监控系统的硬件和软件进行了介绍,结合视频监控场景的复杂特性,针对视频监控中涉及视频图像处理与分析的图像分割、目标提取等技术进行了研究。并把系统应用于重庆市中梁山隧道(长度3165米左右),结果表明本文方法是比较实用的,能满足实时视频监控系统的要求。  相似文献   

18.
This paper addresses an integrated information mining techniques for broadcasting TV-news. This utilizes technique from the fields of acoustic, image, and video analysis, for information on news story title, newsman and scene identification. The goal is to construct a compact yet meaningful abstraction of broadcast TV-news, allowing users to browse through large amounts of data in a non-linear fashion with flexibility and efficiency. By adding acoustic analysis, a news program can be partitioned into news and commercial clips, with 90% accuracy on a data set of 400 h TV-news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, each news stories can be segmented with a better accuracy of 95.92%. On-screen captions or subtitles are recognized by OCR techniques to produce the text title of each news stories. The extracted title words can be used to link or to navigate more related news contents on the WWW. In cooperation with facial and scene analysis and recognition techniques, OCR results can provide users with multimodal query on specific news stories. Some experimental results are presented and discussed for the system reliability, performance evaluation and comparison.  相似文献   

19.
Automatic parsing and indexing of news video   总被引:9,自引:0,他引:9  
Automatic construction of content-based indices for video source material requires general semantic interpretation of both images and their accompanying sounds; but such a broadly-based semantic analysis is beyond the capabilities of the current technologies of machine vision and audio signal analysis. However, if one can assume a limited and well-demarcated body of domain knowledge for describing the content of a body of video, then it becomes easier to interpret a video source in terms of that domain knowledge. This paper presents our work on using domain knowledge to parse news video programs and to index them on the basis of their visual content. Models based on both the spatial structure of image frames and the temporal structure of the entire program have been developed for news videos, along with algorithms that apply these models by locating and identifying instances of their elements. Experimental results are also discussed in detail to evaluate both the models and the algorithms that use them. Finally, proposals for future work are summarized.  相似文献   

20.
图象和视频的检索技术   总被引:10,自引:0,他引:10  
随着网络技术的发展,多媒体数据将成为网络服务的主要内容,因此对多媒体数据管理问题的研究成为近几年的热点。由于媒体信息表现性质的不同,传统关系数据库的检索方式不再适用于图象和视频,因此,必须采用基于自身内容的检索方式。文章对基于内容的图象和视频检索技术分不同层次进行了全面的总结,内容包括依据基本特征,色彩、纹理、形状、和位置关系的技术,视频的场景分割、关键帧提取技术以及基于声音、文字的检索技术等,并阐述了各种方法的优缺点,现状及发展方向。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号