1.

A two-stage scheme for text detection in video images

Marios Anthimopoulos Basilis Gatos Ioannis Pratikakis 《Image and vision computing》2010

This paper proposes a two-stage system for text detection in video images. In the first stage, text lines are detected based on the edge map of the image, which yields a high recall rate at low computational cost. In the second stage, the result is refined using a sliding window and an SVM classifier trained on features obtained by a new Local Binary Pattern-based operator (eLBP) that describes the local edge distribution. The whole algorithm is used in a multiresolution fashion, enabling detection of characters over a broad size range. Experimental results, based on a new evaluation methodology, show the promising overall performance of the system on a challenging corpus, and demonstrate the superior discriminating ability of the proposed feature set compared with the best features reported in the literature.
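
The abstract does not define the eLBP operator, so the following is only a minimal sketch of the classic 8-neighbour Local Binary Pattern on which such operators build. The function names `lbp_code` and `lbp_histogram` are hypothetical, and the image is assumed to be a 2-D list of gray levels.

```python
def lbp_code(img, r, c):
    """Return the 8-bit LBP code of interior pixel (r, c) in a 2-D list of gray levels."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Histogram (256 bins) of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A histogram like this, computed per sliding window, is the kind of feature vector that could be passed to a classifier such as an SVM; the feature layout actually used in the paper is not given here.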

2.

Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; citations by others: 13)

Rainer Lienhart Wolfgang Effelsberg 《Multimedia Systems》2000,8(1):69-81

Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.
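
The abstract does not say how the multiple bitmaps of a tracked character are merged, so the sketch below simply assumes a pixel-wise majority vote over aligned binary bitmaps. The function name `integrate_bitmaps` is hypothetical, and tracking and alignment are assumed to have been done already.

```python
def integrate_bitmaps(bitmaps):
    """Combine equally sized binary bitmaps (lists of 0/1 rows) by majority vote."""
    if not bitmaps:
        raise ValueError("need at least one bitmap")
    rows, cols = len(bitmaps[0]), len(bitmaps[0][0])
    n = len(bitmaps)
    merged = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # A pixel is kept as foreground if more than half of the frames agree.
            votes = sum(b[r][c] for b in bitmaps)
            row.append(1 if 2 * votes > n else 0)
        merged.append(row)
    return merged
```

Voting across frames suppresses background pixels that change over time while keeping the stable character strokes, which is one plausible reason for integrating over the full duration of a text occurrence.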

3.

Extraction of special effects caption text events from digital video (total citations: 1; self-citations: 1; citations by others: 1)

David Crandall Sameer Antani Rangachar Kasturi 《International Journal on Document Analysis and Recognition》2003,5(2-3):138-157

The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature. Received: January 29, 2002 / Accepted: September 13, 2002. D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com. S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov. Correspondence to: David Crandall.

4.

Text detection and recognition in images and video frames

Datong Chen Jean-Marc Odobez 《Pattern recognition》2004,37(3):595-608

This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization, with the strength of a machine learning text verification step applied on background-independent features. Text recognition, applied on the detected text lines, is addressed by a text segmentation step followed by a traditional OCR algorithm within a multi-hypotheses framework relying on multiple segments, language modeling and OCR statistics. Experiments conducted on large databases of real broadcast documents demonstrate the validity of our approach.

5.

Video text detection and segmentation for optical character recognition (total citations: 1; self-citations: 0; citations by others: 1)

Chong-Wah Ngo Chi-Kwong Chan 《Multimedia Systems》2005,10(3):261-272

In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments. Published online: 14 December 2004.
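
As a rough illustration of projection-based text line extraction, the sketch below performs a single horizontal projection pass over a binary edge or text mask. It is not the paper's coarse-to-fine procedure, whose details are not given in the abstract. The function name `text_line_bands` and the `min_pixels` threshold are assumptions.

```python
def text_line_bands(binary, min_pixels=1):
    """Return inclusive (start_row, end_row) pairs of consecutive rows whose
    foreground-pixel count is at least min_pixels."""
    # Horizontal projection profile: number of foreground pixels in each row.
    profile = [sum(row) for row in binary]
    bands, start = [], None
    for i, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = i                      # a band begins
        elif count < min_pixels and start is not None:
            bands.append((start, i - 1))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

A refinement step could re-apply the same idea with a vertical projection inside each band to trim its left and right extent.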

6.

Text information extraction in images and video: a survey

Keechul Jung Kwang In Kim 《Pattern recognition》2004,37(5):977-997

Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex background make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image & video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and to point out promising directions for future research.

7.

Rectification and recognition of text in 3-D scenes

Gregory K. Myers Robert C. Bolles Quang-Tuan Luong James A. Herson Hrishikesh B. Aradhye 《International Journal on Document Analysis and Recognition》2005,7(2-3):147-158

Real-world text on street signs, nameplates, etc. often lies in an oblique plane and hence cannot be recognized by traditional OCR systems due to perspective distortion. Furthermore, such text often comprises only one or two lines, preventing the use of existing perspective rectification methods that were primarily designed for images of document pages. We propose an approach that reliably rectifies and subsequently recognizes individual lines of text. Our system, which includes novel algorithms for extraction of text from real-world scenery, perspective rectification, and binarization, has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time. Received: 15 December 2003. Published online: 14 December 2004. Correspondence to: Gregory K. Myers.

8.

Morphology-based text line extraction

Jui-Chen Wu Jun-Wei Hsieh Yung-Sheng Chen 《Machine Vision and Applications》2008,19(3):195-207

This paper presents a morphology-based text line extraction algorithm for extracting text regions from cluttered images. First, the method defines a novel set of morphological operations for extracting important contrast regions as possible text line candidates. The contrast feature is robust to lighting changes and invariant to image transformations such as scaling, translation, and skewing. In order to detect skewed text lines, a moment-based method is then used to estimate their orientations. According to the orientation, an x-projection technique can be applied to extract various text geometries from the text-analogue segments for text verification. However, due to noise, a text line region is often fragmented into several segments. Therefore, after the projection, a novel recovery algorithm is proposed for recovering a complete text line from its segments. Finally, a verification scheme is proposed for verifying all extracted potential text lines according to their text geometries. Experimental results show that the proposed method improves on state-of-the-art work in terms of effectiveness and robustness for text line detection.
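
The abstract mentions a moment-based orientation estimate without giving the formula. A common choice, assumed here, is the principal-axis angle computed from second-order central moments, θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). The function name `orientation` is hypothetical.

```python
import math


def orientation(points):
    """Principal-axis angle (radians) of a set of (x, y) pixel coordinates,
    computed from second-order central moments."""
    n = len(points)
    if n == 0:
        raise ValueError("empty region")
    # Centroid of the region.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Second-order central moments.
    mu20 = sum((x - mx) ** 2 for x, _ in points)
    mu02 = sum((y - my) ** 2 for _, y in points)
    mu11 = sum((x - mx) * (y - my) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)
```

Note that in image coordinates the y axis usually points downward, so a positive angle here corresponds to a clockwise skew on screen. Rotating the candidate region by the negative of this angle aligns it with the x axis before projection.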

9.

Research on video text detection and recognition techniques

Zhu Chengjun (朱成军) Li Chao (李超) Xiong Zhang (熊璋) 《计算机工程》 (Computer Engineering) 2007,33(10):218-219

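Text in video provides useful information describing video content and plays an important role in building multimedia retrieval systems based on high-level semantics. Starting from the characteristics of video text, this paper analyzes the various techniques for video text detection and recognition and their advantages and disadvantages, as well as the state of development of the field in China and abroad and the key directions for future research.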

10.

Bayesian super-resolution of text in video with a text-specific bimodal prior

Katherine Donaldson Gregory K. Myers 《International Journal on Document Analysis and Recognition》2005,7(2-3):159-167

To increase the range of sizes of video scene text recognizable by optical character recognition (OCR), we developed a Bayesian super-resolution algorithm that uses a text-specific bimodal prior. We evaluated the effectiveness of the bimodal prior, both compared with and in conjunction with a piecewise smoothness prior, visually and by measuring the accuracy of the OCR results on the variously super-resolved images. The bimodal prior improved the readability of 4- to 7-pixel-high scene text significantly more than bicubic interpolation did, and it increased the accuracy of OCR results more than the piecewise smoothness prior did.

11.

Neural network-based text location in color images (total citations: 9; self-citations: 0; citations by others: 9)

Keechul Jung 《Pattern recognition letters》2001,22(14):1503-1515

This paper proposes neural network-based text location in complex color images. Texture information extracted on several color bands using neural networks is combined, and corresponding text location algorithms are then developed. Text extraction filters can be automatically constructed using neural networks. Comparisons with other text location methods are presented, indicating that the proposed system has better accuracy.

12.

Automated analysis of images in documents for intelligent document search

Xiaonan Lu Saurabh Kataria William J. Brouwer James Z. Wang Prasenjit Mitra C. Lee Giles 《International Journal on Document Analysis and Recognition》2009,12(2):65-81

Authors use images to present a wide variety of important information in documents. For example, two-dimensional (2-D) plots display important data in scientific publications. Often, end-users seek to extract this data and convert it into a machine-processible form so that the data can be analyzed automatically or compared with other existing data. Existing document data extraction tools are semi-automatic and require users to provide metadata and interactively extract the data. In this paper, we describe a system that extracts data from documents fully automatically, completely eliminating the need for human intervention. The system uses a supervised learning-based algorithm to classify figures in digital documents into five classes: photographs, 2-D plots, 3-D plots, diagrams, and others. Then, an integrated algorithm is used to extract numerical data from data points and lines in the 2-D plot images along with the axes and their labels, the data symbols in the figure’s legend and their associated labels. We demonstrate that the proposed system and its component algorithms are effective via an empirical evaluation. Our data extraction system has the potential to be a vital component in high volume digital libraries.

13.

Text line extraction from multi-skewed handwritten documents

S. Basu M. Kundu D.K. Basu 《Pattern recognition》2007,40(6):1825-1839

A novel text line extraction technique is presented for multi-skewed document images of handwritten English or Bengali text. It assumes that hypothetical water flows, from both left and right sides of the image frame, face obstruction from characters of text lines. The stripes of areas left unwetted on the image frame are finally labelled for extraction of text lines. The success rates of the technique, as observed experimentally, are 90.34% and 91.44% for handwritten Bengali and English document images, respectively. The work may contribute significantly to the development of applications related to optical character recognition of Bengali/English text.

14.

Text detection in video streams based on COM technology (total citations: 8; self-citations: 1; citations by others: 7)

Hu Hongbin (胡宏斌) Xu Jun (徐骏) Zhou Dongru (周洞汝) 《计算机工程》 (Computer Engineering) 2001,27(6):95-97

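Extracting text from digital video is of great significance for building content-based video indexes. This paper discusses algorithms for detecting text in video and proposes an implementation based on COM technology.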

15.

An image recognition system applied to tunnel video surveillance

Quan Hongyuan (全洪渊) Huang Xiyue (黄席樾) Liu Aijun (刘爱君) 《自动化与仪器仪表》 (Automation & Instrumentation) 2007,(2):48-51

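Monitoring and control management of tunnels is an important issue for the safe and normal operation of expressway tunnels. This paper introduces the hardware and software of an intelligent video surveillance system for tunnel traffic and, taking into account the complex characteristics of video surveillance scenes, studies the image segmentation, object extraction, and other video image processing and analysis techniques involved in video surveillance. The system was applied to the Zhongliangshan Tunnel in Chongqing (about 3165 m long), and the results show that the proposed method is fairly practical and can meet the requirements of a real-time video surveillance system.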

16.

PDF text content extraction based on automata theory

Wang Xiaojuan (王晓娟) Tan Jianlong (谭建龙) Liu Yanbing (刘燕兵) Liu Jingang (刘金刚) 《计算机应用》 (Journal of Computer Applications) 2012,32(9):2491-2495

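Existing methods for extracting text content from PDF documents (such as the one used by the PDFBox library) are slow, cannot meet the needs of content analysis in high-speed networks, and cannot process, in a streaming manner, PDF packets that have only partially arrived over the network. To address this, a PDF text content extraction method based on automata theory is proposed. By building a hierarchical keyword automaton, the method can quickly extract text content from both complete and incomplete PDF documents. Experimental results on Chinese and English PDF document datasets show that the automata-based method takes only 17%~37% of the time required by the PDFBox method.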
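
The structure of the paper's hierarchical keyword automaton is not described in the abstract. As a stand-in, the sketch below uses a plain Aho-Corasick automaton that keeps its state between calls, which illustrates how keyword matching can continue across partially arrived chunks. The class name `KeywordAutomaton`, its `feed` method, and the example keywords are all assumptions.

```python
from collections import deque


class KeywordAutomaton:
    """Aho-Corasick automaton that keeps its state between feed() calls."""

    def __init__(self, keywords):
        self.goto = [{}]   # per-state transitions
        self.fail = [0]    # failure links
        self.out = [[]]    # keywords recognised on entering a state
        for word in keywords:
            state = 0
            for ch in word:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(word)
        # Breadth-first pass to fill in failure links and inherited outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]
        self.state = 0
        self.offset = 0    # number of symbols consumed so far

    def feed(self, chunk):
        """Consume one chunk; return (end_offset, keyword) for each match."""
        hits = []
        for ch in chunk:
            while self.state and ch not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(ch, 0)
            self.offset += 1
            hits.extend((self.offset, w) for w in self.out[self.state])
        return hits
```

For example, an automaton built from the PDF keywords `"BT"`, `"ET"`, and `"stream"` reports a `"stream"` match even when the word is split across two `feed` calls, because the current state survives between chunks. A real extractor would also need to decode stream contents, which is outside this sketch.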

17.

Constructing and application of multimedia TV-news archives

H.T. Pao Y.H. Chen P.S. Lai Y.Y. Xu Hsin-Chia Fu 《Expert systems with applications》2008,35(3):1444-1450

This paper addresses integrated information mining techniques for broadcast TV news. It uses techniques from acoustic, image, and video analysis to identify news story titles, newsmen, and scenes. The goal is to construct a compact yet meaningful abstraction of broadcast TV news, allowing users to browse through large amounts of data in a non-linear fashion with flexibility and efficiency. By adding acoustic analysis, a news program can be partitioned into news and commercial clips with 90% accuracy on a data set of 400 h of TV news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, news stories can be segmented with a higher accuracy of 95.92%. On-screen captions or subtitles are recognized by OCR techniques to produce the text title of each news story. The extracted title words can be used to link to or navigate further related news content on the WWW. In cooperation with facial and scene analysis and recognition techniques, OCR results can provide users with multimodal queries on specific news stories. Experimental results on system reliability, performance evaluation, and comparison are presented and discussed.

18.

Image and video retrieval techniques (total citations: 10; self-citations: 0; citations by others: 10)

Fan Lingtao (樊凌涛) Chen Jian (陈健) 《计算机工程与应用》 (Computer Engineering and Applications) 2001,37(9):71-77

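With the development of network technology, multimedia data will become the main content of network services, so research on multimedia data management has become a hot topic in recent years. Because media information differs in how it is represented, the retrieval methods of traditional relational databases are no longer suitable for images and video, and retrieval based on the content itself must be used instead. This article gives a comprehensive, level-by-level summary of content-based image and video retrieval techniques, covering techniques based on basic features (color, texture, shape, and positional relationships), video scene segmentation and key-frame extraction, and retrieval based on audio and text, and it describes the strengths and weaknesses, current state, and future directions of each method.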

19.

Automatic parsing and indexing of news video (total citations: 9; self-citations: 0; citations by others: 9)

HongJiang Zhang Shuang Yeo Tan Stephen W. Smoliar Gong Yihong 《Multimedia Systems》1995,2(6):256-266

Automatic construction of content-based indices for video source material requires general semantic interpretation of both images and their accompanying sounds; but such a broadly-based semantic analysis is beyond the capabilities of the current technologies of machine vision and audio signal analysis. However, if one can assume a limited and well-demarcated body of domain knowledge for describing the content of a body of video, then it becomes easier to interpret a video source in terms of that domain knowledge. This paper presents our work on using domain knowledge to parse news video programs and to index them on the basis of their visual content. Models based on both the spatial structure of image frames and the temporal structure of the entire program have been developed for news videos, along with algorithms that apply these models by locating and identifying instances of their elements. Experimental results are also discussed in detail to evaluate both the models and the algorithms that use them. Finally, proposals for future work are summarized.

20.

Chinese text location under complex background using Gabor filter and SVM (total citations: 1; self-citations: 0; citations by others: 1)

Jianqiang Yan Jie Li Xinbo Gao 《Neurocomputing》2011,74(17):2998-3008

For Chinese text location under complex backgrounds, this paper presents a novel method that combines Gabor filters and a support vector machine (SVM). It is based on the fact that Chinese characters are composed of four kinds of strokes. By extracting four kinds of stroke features with Gabor filters, the Chinese text location problem can be transformed into a texture classification problem, which can be solved with an SVM classifier. The proposed method is therefore composed of two phases. First, Gabor filters with different scales and orientations are employed to obtain four texture images representing the strokes of Chinese text in the horizontal line, top-down vertical line, left-downward slope line, and short pausing stroke directions. Then, the text regions and background regions in the four texture images are used to train four SVM classifiers that distinguish the texture in the four directions; the classifiers are integrated into an SVM classification network, and the weighted sum of their outputs determines whether a block is a text region. Experiments are conducted on a large number of typical images with different texts and different fonts. Compared with some existing methods, the proposed approach achieves better results for Chinese text location.
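
To make the first phase more concrete, the sketch below builds a small bank of real (cosine-phase) Gabor kernels using the standard formula, a Gaussian envelope multiplied by a cosine carrier. The function name `gabor_kernel`, the kernel size, wavelength, and sigma, and the four angles in `STROKE_ANGLES` are placeholder assumptions, not the parameters used in the paper.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0):
    """Real Gabor kernel as a size x size list of lists; size must be odd."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


# One kernel per assumed stroke direction (angles are illustrative only).
STROKE_ANGLES = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
bank = [gabor_kernel(9, 4.0, angle, 2.0) for angle in STROKE_ANGLES]
```

Convolving an image with each kernel in `bank` would give one texture image per direction, from which block-level features could be taken for the per-direction classifiers.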