1.
This paper proposes a two-stage system for text detection in video images. In the first stage, text lines are detected based on the edge map of the image, yielding a high recall rate at low computational cost. In the second stage, the result is refined using a sliding window and an SVM classifier trained on features obtained by a new Local Binary Pattern-based operator (eLBP) that describes the local edge distribution. The whole algorithm is applied in a multiresolution fashion, enabling the detection of characters over a broad range of sizes. Experimental results, based on a new evaluation methodology, show the promising overall performance of the system on a challenging corpus and demonstrate the superior discriminating ability of the proposed feature set against the best features reported in the literature.
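The eLBP operator itself is not specified in this abstract. For orientation only, the classic 8-neighbour Local Binary Pattern on which LBP-based operators build can be sketched as below. This is a minimal sketch, assuming grey-level images stored as 2-D lists; the names `lbp_code` and `lbp_histogram` are hypothetical, and using a histogram of codes as the per-window SVM feature vector is one common choice, not necessarily the authors'.

```python
def lbp_code(img, r, c):
    """Classic 8-neighbour LBP code of interior pixel (r, c) in a 2-D list of grey levels."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(window):
    """256-bin histogram of LBP codes over the interior pixels of a window."""
    hist = [0] * 256
    for r in range(1, len(window) - 1):
        for c in range(1, len(window[0]) - 1):
            hist[lbp_code(window, r, c)] += 1
    return hist


# A flat patch gives code 255, because every neighbour is >= the centre.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(lbp_code(flat, 1, 1))  # 255
```

In a sliding-window setting, `lbp_histogram` would be evaluated on each window and the resulting vector passed to the classifier.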
2.
Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos, which enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then passed directly to a standard OCR software package, which translates the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced and used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences in a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
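The integration of several bitmaps of the same character over time can be illustrated with a simple per-pixel vote. This is a minimal sketch, assuming the bitmaps are already binarized and aligned by the tracking step; the function name `integrate_bitmaps` and the majority-vote rule are illustrative choices, not the paper's exact procedure.

```python
def integrate_bitmaps(bitmaps, threshold=0.5):
    """Fuse aligned 0/1 bitmaps of one character into a single bitmap.

    A pixel is kept as text if it is set in at least `threshold` of the frames,
    so flickering background pixels are voted out while static text survives.
    """
    n = len(bitmaps)
    rows, cols = len(bitmaps[0]), len(bitmaps[0][0])
    fused = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = sum(b[r][c] for b in bitmaps)
            fused[r][c] = 1 if votes / n >= threshold else 0
    return fused


frames = [[[1, 0], [1, 1]],
          [[1, 1], [1, 0]],
          [[1, 0], [1, 1]]]
print(integrate_bitmaps(frames))  # [[1, 0], [1, 1]]
```

Raising `threshold` toward 1.0 keeps only pixels present in every frame, which suppresses more background at the risk of losing thin strokes.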
3.
Datong Chen, Jean-Marc Odobez. Pattern Recognition, 2004, 37(3): 595-608
This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization, with the strength of a machine learning text verification step applied to background-independent features. Text recognition, applied to the detected text lines, is addressed by a text segmentation step followed by a traditional OCR algorithm within a multi-hypotheses framework relying on multiple segments, language modeling, and OCR statistics. Experiments conducted on large databases of real broadcast documents demonstrate the validity of our approach.
4.
David Crandall, Sameer Antani, Rangachar Kasturi. International Journal on Document Analysis and Recognition, 2003, 5(2-3): 138-157
The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video, and such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index, so it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but have so far received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature.
Received: January 29, 2002 / Accepted: September 13, 2002
D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com
S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov
Correspondence to: David Crandall
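Determining the temporal extent of a text event can be pictured as linking per-frame detections whose boxes overlap. The following is a minimal greedy tracker, assuming axis-aligned boxes `(x0, y0, x1, y1)` and an intersection-over-union threshold. It ignores the moving, rotating, and scaling text effects the paper addresses, and the names `iou` and `group_text_events` are hypothetical.

```python
def box_area(box):
    """Area of an (x0, y0, x1, y1) box; degenerate boxes have area 0."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def group_text_events(frames, min_iou=0.5):
    """Link per-frame text boxes into events spanning consecutive frames.

    frames[t] is the list of boxes detected in frame t. An event ends as soon as
    a frame contains no sufficiently overlapping box (no gap tolerance).
    """
    events, active = [], []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active = []
        for ev in active:
            best = max(unmatched, key=lambda b: iou(ev["box"], b), default=None)
            if best is not None and iou(ev["box"], best) >= min_iou:
                ev["end"], ev["box"] = t, best
                unmatched.remove(best)
                still_active.append(ev)
        for box in unmatched:  # every unmatched box starts a new event
            ev = {"start": t, "end": t, "box": box}
            events.append(ev)
            still_active.append(ev)
        active = still_active
    return events


frames = [[(0, 0, 10, 10)], [(1, 0, 11, 10)], [], [(0, 0, 10, 10)]]
for ev in group_text_events(frames):
    print(ev["start"], ev["end"])
# 0 1
# 3 3
```

Each event would then be entered once in the index; a practical tracker would also tolerate short detection gaps, which this sketch deliberately does not.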
5.
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators to video frames of different modalities by classifying their background complexity. Effective operators such as repeated shifting operations are applied to remove noise from images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. In addition to text detection, a text segmentation technique based on adaptive thresholding is proposed. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.
Published online: 14 December 2004
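A horizontal projection profile, the usual starting point of projection-based line extraction, can be sketched as follows. This is a minimal sketch of the coarse (row) pass only, assuming a binary map in which 1 marks candidate text pixels; `text_line_bands` and `min_fraction` are hypothetical names, and the paper's own coarse-to-fine procedure is not reproduced.

```python
def text_line_bands(binary, min_fraction=0.1):
    """Split a binary text map into horizontal text-line bands.

    binary: 2-D list of 0/1 values (1 = candidate text pixel).
    A row belongs to a text line if at least `min_fraction` of its pixels are set.
    Returns (first_row, last_row) pairs, both inclusive.
    """
    width = len(binary[0])
    profile = [sum(row) for row in binary]  # horizontal projection: one count per row
    bands, start = [], None
    for r, count in enumerate(profile):
        is_text = count >= min_fraction * width
        if is_text and start is None:
            start = r                       # a band opens
        elif not is_text and start is not None:
            bands.append((start, r - 1))    # a band closes
            start = None
    if start is not None:                   # band still open at the bottom edge
        bands.append((start, len(profile) - 1))
    return bands


edges = [[0] * 10,
         [1] * 5 + [0] * 5,
         [1] * 3 + [0] * 7,
         [0] * 10,
         [1] * 10]
print(text_line_bands(edges))  # [(1, 2), (4, 4)]
```

A finer pass could apply the same function to the transposed rows of each band, for example `[list(col) for col in zip(*edges[1:3])]`, to split a line into horizontal extents.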
6.
Gregory K. Myers, Robert C. Bolles, Quang-Tuan Luong, James A. Herson, Hrishikesh B. Aradhye. International Journal on Document Analysis and Recognition, 2005, 7(2-3): 147-158
Real-world text on street signs, nameplates, etc. often lies in an oblique plane and hence cannot be recognized by traditional OCR systems because of perspective distortion. Furthermore, such text often comprises only one or two lines, which prevents the use of existing perspective rectification methods that were designed primarily for images of document pages. We propose an approach that reliably rectifies and then recognizes individual lines of text. Our system, which includes novel algorithms for extracting text from real-world scenery, perspective rectification, and binarization, has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time.
Received: 15 December 2003 / Published online: 14 December 2004
Correspondence to: Gregory K. Myers
7.
Keechul Jung, Kwang In Kim. Pattern Recognition, 2004, 37(5): 977-997
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extracting this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image. However, variations in text size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image and video indexing can be found, the problem of text information extraction has not been well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and point out promising directions for future research.
8.
This paper presents a morphology-based text line extraction algorithm for extracting text regions from cluttered images. First, the method defines a novel set of morphological operations for extracting important contrast regions as possible text line candidates. The contrast feature is robust to lighting changes and invariant to image transformations such as scaling, translation, and skewing. To detect skewed text lines, a moment-based method is then used to estimate their orientations. Given the orientation, an x-projection technique can be applied to extract various text geometries from the text-like segments for text verification. However, because of noise, a text line region is often fragmented into several segments. Therefore, after the projection, a novel recovery algorithm is proposed for reconstructing a complete text line from its fragments. Finally, a verification scheme is proposed for verifying all extracted candidate text lines according to their text geometries. Experimental results show that the proposed method improves on state-of-the-art methods in terms of effectiveness and robustness for text line detection.
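Moment-based orientation estimation is commonly done with second-order central moments, where the principal-axis angle is θ = ½·atan2(2μ11, μ20 − μ02). The abstract does not give the exact formula used, so the following is a minimal sketch of this standard estimator, assuming a segment is given as a list of foreground pixel coordinates; `orientation_degrees` is a hypothetical name.

```python
import math


def orientation_degrees(pixels):
    """Principal-axis orientation of a segment from its second-order central moments.

    pixels: list of (x, y) coordinates of the segment's foreground pixels.
    Returns an angle in [-90, 90] degrees measured from the x-axis. With image
    coordinates (y growing downwards), a positive angle means the line descends
    to the right on screen.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


print(round(orientation_degrees([(i, i) for i in range(10)]), 6))  # 45.0
```

The estimated angle can then be used to deskew a segment, or to project it along its own direction, before verification.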
9.
10.
Katherine Donaldson, Gregory K. Myers. International Journal on Document Analysis and Recognition, 2005, 7(2-3): 159-167
To increase the range of sizes of video scene text that can be recognized by optical character recognition (OCR), we developed a Bayesian super-resolution algorithm that uses a text-specific bimodal prior. We evaluated the effectiveness of the bimodal prior, both compared with and in conjunction with a piecewise smoothness prior, visually and by measuring the accuracy of OCR results on the variously super-resolved images. The bimodal prior improved the readability of 4- to 7-pixel-high scene text significantly more than bicubic interpolation did, and it increased OCR accuracy more than the piecewise smoothness prior.
11.
Neural network-based text location in color images (total citations: 9; self-citations: 0; citations by others: 9)
This paper proposes a neural network-based method for locating text in complex color images. Texture information extracted from several color bands using neural networks is combined, and corresponding text location algorithms are then developed. Text extraction filters can be constructed automatically using neural networks. Comparisons with other text location methods are presented, indicating that the proposed system achieves better accuracy.
12.
13.
Xiaonan Lu, Saurabh Kataria, William J. Brouwer, James Z. Wang, Prasenjit Mitra, C. Lee Giles. International Journal on Document Analysis and Recognition, 2009, 12(2): 65-81
Authors use images to present a wide variety of important information in documents. For example, two-dimensional (2-D) plots display important data in scientific publications. End users often want to extract this data and convert it into a machine-processable form so that it can be analyzed automatically or compared with other existing data. Existing document data extraction tools are semi-automatic and require users to provide metadata and to extract the data interactively. In this paper, we describe a system that extracts data from documents fully automatically, completely eliminating the need for human intervention. The system uses a supervised learning-based algorithm to classify figures in digital documents into five classes: photographs, 2-D plots, 3-D plots, diagrams, and others. An integrated algorithm then extracts numerical data from the data points and lines in the 2-D plot images, along with the axes and their labels, and the data symbols in the figure's legend and their associated labels. An empirical evaluation demonstrates that the proposed system and its component algorithms are effective. Our data extraction system has the potential to be a vital component of high-volume digital libraries.
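Turning detected data points from pixel coordinates into data values requires calibrating each axis from its tick labels. A minimal sketch, assuming two tick positions and their OCR-read values are already known for an axis that is either linear or base-10 logarithmic; `make_axis_mapper` is a hypothetical helper, not part of the described system.

```python
import math


def make_axis_mapper(p1, v1, p2, v2, log_scale=False):
    """Return a function mapping a pixel position on one axis to a data value.

    (p1, v1) and (p2, v2) are two tick marks: pixel position and labelled value.
    For a y-axis, p2 < p1 is fine because image rows grow downwards.
    """
    if log_scale:
        v1, v2 = math.log10(v1), math.log10(v2)
    scale = (v2 - v1) / (p2 - p1)  # data units per pixel (log10 units if log_scale)

    def to_data(p):
        v = v1 + (p - p1) * scale
        return 10 ** v if log_scale else v

    return to_data


x_of = make_axis_mapper(100, 0.0, 300, 10.0)  # x ticks: "0" at px 100, "10" at px 300
y_of = make_axis_mapper(400, 0.0, 100, 3.0)   # y ticks: "0" at row 400, "3" at row 100
print(x_of(200), y_of(250))  # approximately 5.0 and 1.5
```

Two ticks per axis are the minimum; fitting a line through all readable ticks would make the mapping more robust to OCR or tick-localization errors.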
14.
S. Basu, M. Kundu, D.K. Basu. Pattern Recognition, 2007, 40(6): 1825-1839
A novel text line extraction technique is presented for multi-skewed document images of handwritten English or Bengali text. It assumes that hypothetical water flows, from both the left and right sides of the image frame, are obstructed by the characters of the text lines. The stripes of area left unwetted on the image frame are then labelled for extraction of text lines. The experimentally observed success rates of the technique are 90.34% and 91.44% for handwritten Bengali and English document images, respectively. The work may contribute significantly to the development of applications related to optical character recognition of Bengali and English text.
15.
16.
17.
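Monitoring, control, and management of tunnels are important issues for the safe and normal operation of expressway tunnels. This paper introduces the hardware and software of an intelligent video surveillance system for tunnel traffic. Taking into account the complex characteristics of video surveillance scenes, it studies the video image processing and analysis techniques involved in video surveillance, such as image segmentation and object extraction. The system was applied to the Zhongliangshan Tunnel in Chongqing (about 3,165 m long), and the results show that the proposed method is fairly practical and meets the requirements of a real-time video surveillance system.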
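Object extraction in fixed-camera tunnel surveillance is often based on background subtraction. The abstract does not say which segmentation method the system uses, so this is only a minimal sketch of one common approach, assuming grey-level frames stored as 2-D lists; the running-average rate `alpha`, the threshold, and the function names are illustrative assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F, per pixel."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


background = [[100, 100], [100, 100]]
frame = [[100, 200], [90, 100]]
print(foreground_mask(background, frame))  # [[0, 1], [0, 0]]
background = update_background(background, frame)
```

Updating every pixel lets a stopped vehicle slowly merge into the background; a common refinement updates only pixels currently classified as background.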
18.
H.T. Pao, Y.H. Chen, P.S. Lai, Y.Y. Xu, Hsin-Chia Fu. Expert Systems with Applications, 2008, 35(3): 1444-1450
This paper addresses integrated information mining techniques for broadcast TV news. The approach uses techniques from acoustic, image, and video analysis to obtain information on news story titles and to identify newscasters and scenes. The goal is to construct a compact yet meaningful abstraction of broadcast TV news, allowing users to browse large amounts of data non-linearly, flexibly, and efficiently. Using acoustic analysis, a news program can be partitioned into news and commercial clips with 90% accuracy on a data set of 400 hours of TV news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, news stories can be segmented with a higher accuracy of 95.92%. On-screen captions or subtitles are recognized by OCR to produce the text title of each news story. The extracted title words can be used to link to or navigate related news content on the WWW. Combined with facial and scene analysis and recognition techniques, the OCR results allow users to perform multimodal queries on specific news stories. Experimental results on system reliability, performance evaluation, and comparison are presented and discussed.
19.
Automatic parsing and indexing of news video (total citations: 9; self-citations: 0; citations by others: 9)
Automatic construction of content-based indices for video source material requires general semantic interpretation of both images and their accompanying sounds; but such a broadly-based semantic analysis is beyond the capabilities of the current technologies of machine vision and audio signal analysis. However, if one can assume a limited and well-demarcated body of domain knowledge for describing the content of a body of video, then it becomes easier to interpret a video source in terms of that domain knowledge. This paper presents our work on using domain knowledge to parse news video programs and to index them on the basis of their visual content. Models based on both the spatial structure of image frames and the temporal structure of the entire program have been developed for news videos, along with algorithms that apply these models by locating and identifying instances of their elements. Experimental results are also discussed in detail to evaluate both the models and the algorithms that use them. Finally, proposals for future work are summarized.
20.
Image and video retrieval techniques (total citations: 10; self-citations: 0; citations by others: 10)
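With the development of network technology, multimedia data will become the main content of network services, so research on multimedia data management has become a hot topic in recent years. Because different media represent information in different ways, the retrieval methods of traditional relational databases are no longer suitable for images and video, which must instead be retrieved on the basis of their own content. This article gives a comprehensive summary of content-based image and video retrieval techniques at different levels, covering techniques based on basic features (color, texture, shape, and spatial relationships), video scene segmentation and key-frame extraction, and retrieval based on audio and text, and it describes the advantages and disadvantages, current status, and development directions of each method.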
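Colour-based retrieval, the first of the basic features listed, is often illustrated with colour histograms compared by histogram intersection. A minimal sketch, assuming RGB pixels with channel values in 0..255 and a coarse joint histogram; the bin count and the function names are illustrative, and the survey itself does not prescribe this measure.

```python
def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram of a list of (r, g, b) pixels in 0..255."""
    step = 256 / bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Quantise each channel to one of `bins` levels (clamped to the last bin).
        ri, gi, bi = (min(int(v // step), bins - 1) for v in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between normalised histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_images(query_hist, database):
    """Database keys sorted from most to least similar to the query histogram."""
    return sorted(database,
                  key=lambda name: histogram_intersection(query_hist, database[name]),
                  reverse=True)


red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
mixed = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
print(rank_images(red, {"blue": blue, "mixed": mixed, "red": red}))
# ['red', 'mixed', 'blue']
```

Because the histograms are normalised, images of different sizes can be compared directly; texture and shape features would be combined with this score in a fuller system.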