首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 21 毫秒
1.
The automatic extraction and recognition of news captions and annotations can be of great help locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address problems, describe the method by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multi-frame integration and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.  相似文献   

2.
Scene change detection techniques for video database systems   总被引:1,自引:0,他引:1  
Scene change detection (SCD) is one of several fundamental problems in the design of a video database management system (VDBMS). It is the first step towards the automatic segmentation, annotation, and indexing of video data. SCD is also used in other aspects of VDBMS, e.g., hierarchical representation and efficient browsing of the video data. In this paper, we provide a taxonomy that classifies existing SCD algorithms into three categories: full-video-image-based, compressed-video-based, and model-based algorithms. The capabilities and limitations of the SCD algorithms are discussed in detail. The paper also proposes a set of criteria for measuring and comparing the performance of various SCD algorithms. We conclude by discussing some important research directions.  相似文献   

3.
Extraction of special effects caption text events from digital video   总被引:2,自引:1,他引:1  
Abstract. The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature. Received: January 29, 2002 / Accepted: September 13, 2002 D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov Correspondence to: David Crandall  相似文献   

4.
We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of frames of video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next we describe how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based “signature.” Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we compare experimentally our shot transition mechanism to a color histogram implementation, and also evaluate the effectiveness of our database-searching scheme. Our algorithms are very efficient and run in realtime on a desktop computer. We describe how this technology could be employed to construct a “smart VCR” that was capable of alerting the viewer to the beginning of a specific program or identifying  相似文献   

5.
Using string matching to detect video transitions   总被引:2,自引:0,他引:2  
The detection of shot boundaries in videos captures the structure of the image sequences by the identification of transitional effects. This task is important in the video indexing and retrieval domain. The video slice or visual rhythm is a single two-dimensional image sampling that has been used to detect several types of video events, including transitions. We use the longest common subsequence (LCS) between two strings to transform the video slice into one-dimensional signals obtaining a highly simplified representation of the video content. We also developed a chain of mathematical morphology operations over these signals leading to the detection of the most frequent video transitions, namely, cut, fade, and wipe. The algorithms are tested with success with various genres of videos.  相似文献   

6.
Extraction and recognition of artificial text in multimedia documents   总被引:1,自引:0,他引:1  
Abstract The systems currently available for contentbased image and video retrieval work without semantic knowledge, i. e. they use image processing methods to extract low level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge into the indexing process is to use the text included in the images and video sequences. It is rich in information but easy to use, e. g. by key word based queries. In this paper we present an algorithm to localise artificial text in images and videos using a measure of accumulated gradients and morphological processing. The quality of the localised text is improved by robust multiple frame integration. A new technique for the binarisation of the text boxes based on a criterion maximizing local contrast is proposed. Finally, detection and OCR results for a commercial OCR are presented, justifying the choice of the binarisation technique.An erratum to this article can be found at  相似文献   

7.
Dot-matrix text recognition is a difficult problem, especially when characters are broken into several disconnected components. We present a dot-matrix text recognition system which uses the fact that dot-matrix fonts are fixed-pitch, in order to overcome the difficulty of the segmentation process. After finding the most likely pitch of the text, a decision is made as to whether the text is written in a fixed-pitch or proportional font. Fixed-pitch text is segmented using a pitch-based segmentation process that can successfully segment both touching and broken characters. We report performance results for the pitch estimation, fixed-pitch decision and segmentation, and recognition processes. Received October 18, 1999 / Revised April 21, 2000  相似文献   

8.
Video in digital format is now commonplace and widespread in both professional use, and in domestic consumer products from camcorders to mobile phones. Video content is growing in volume and while we can capture, compress, store, transmit and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper, we give a brief review of the state of the art of video analysis, indexing and retrieval and we point to research directions which we think are promising and could make searching and browsing of video archives based on video content, as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area.  相似文献   

9.
图象和视频的检索技术   总被引:10,自引:0,他引:10  
随着网络技术的发展,多媒体数据将成为网络服务的主要内容,因此对多媒体数据管理问题的研究成为近几年的热点。由于媒体信息表现性质的不同,传统关系数据库的检索方式不再适用于图象和视频,因此,必须采用基于自身内容的检索方式。文章对基于内容的图象和视频检索技术分不同层次进行了全面的总结,内容包括依据基本特征,色彩、纹理、形状、和位置关系的技术,视频的场景分割、关键帧提取技术以及基于声音、文字的检索技术等,并阐述了各种方法的优缺点,现状及发展方向。  相似文献   

10.
11.
Video text detection and segmentation for optical character recognition   总被引:1,自引:0,他引:1  
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.Published online: 14 December 2004  相似文献   

12.
视频和图像中的文本通常在基于内容的视频数据库检索、网络视频搜索,图像分割和图像修复等中起到重要作用,为了提高文本检测的效率,给出了一种基于多种特征自适应阈值的视频文本检测方法.方法是在Michael算法的基础上,利用文本边缘的强度,密度,水平竖直边缘比3个特征计算自适应局部阈值,用阈值能较好去除非文本区域,提取文本边缘,检测并定位文本,减少了Michael算法单一特征阈值的不利影响.在文本定位阶段引入了合并机制.减少了不完整区域的出现.实验结果表明有较高的精度和召回率,可用于视频搜索、图像分割和图像修复等.  相似文献   

13.
On fast microscopic browsing of MPEG-compressed video   总被引:1,自引:0,他引:1  
MPEG has been established as a compression standard for efficient storage and transmission of digital video. However, users are limited to VCR-like (and tedious) functionalities when viewing MPEG video. The usefulness of MPEG video is presently limited by the lack of tools available for fast browsing, manipulation and processing of MPEG video. In this paper, we first address the problem of rapid access to individual shots and frames in MPEG video. We build upon the compressed-video-processing framework proposed in [1, 8], and propose new and fast algorithms based on an adaptive mixture of approximation techniques for extracting spatially reduced image sequence of uniform quality from MPEG video across different frame types and also under different motion activities in the scenes. The algorithms execute faster than real time on a Pentium personal computer. We demonstrate how the reduced images facilitate fast and convenient shot- and frame-level video browsing and access, shot-level editing and annotation, without the need for frequent decompression of MPEG video. We further propose methods for reducing the auxiliary data size associated with the reduced images through exploitation of spatial and temporal redundancy. We also address how the reduced images lead to computationally efficient algorithms for video analysis based on intra- and inter-shot processing for video database and browsing applications. The algorithms, tools for browsing and techniques for video processing presented in this paper have been used by many in IBM Research on more than 30 h of MPEG-1 video for video browsing and analysis.  相似文献   

14.
In video processing, a common first step is to segment the videos into physical units, generally called shots. A shot is a video segment that consists of one continuous action. In general, these physical units need to be clustered to form more semantically significant units, such as scenes, sequences, programs, etc. This is the so-called story-based video structuring. Automatic video structuring is of great importance for video browsing and retrieval. The shots or scenes are usually described by one or several representative frames, called key-frames. Viewed from a higher level, key frames of some shots might be redundant in terms of semantics. In this paper, we propose automatic solutions to the problems of: (i) video partitioning, (ii) key frame computing, (iii) key frame pruning. For the first problem, an algorithm called “net comparison” is devised. It is accurate and fast because it uses both statistical and spatial information in an image and does not have to process the entire image. For the last two problems, we develop an original image similarity criterion, which considers both spatial layout and detail content in an image. For this purpose, coefficients of wavelet decomposition are used to derive parameter vectors accounting for the above two aspects. The parameters exhibit (quasi-) invariant properties, thus making the algorithm robust for many types of object/camera motions and scaling variances. The novel “seek and spread” strategy used in key frame computing allows us to obtain a large representative range for the key frames. Inter-shot redundancy of the key-frames is suppressed using the same image similarity measure. Experimental results demonstrate the effectiveness and efficiency of our techniques.  相似文献   

15.
Query by video clip   总被引:15,自引:0,他引:15  
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of feature representation and matching schemes.  相似文献   

16.
17.
Saliency text is that characters are ordered with visibility and expressivity. It also contains important clues for video analysis, indexing, and retrieval. Thus, in order to localize the saliency text, a critical stage is to collect key points from real text pixels. In this paper, we propose an evidence-based model of saliency feature extraction (SFE) to probe saliency text points (STPs), which have strong text signal structure in multi-observations simultaneously and always appear between text and its background. Through the multi-observations, each signal structure with rhythms of signal segments is extracted at every location in the visual field. It supports source of evidences for our evidence-based model, where evidences are measured to effectively estimate the degrees of plausibility for obtaining the STP. The evaluation results on benchmark datasets also demonstrate that our proposed approach achieves the state-of-the-art performance on exploring real text pixels and significantly outperforms the existing algorithms for detecting text candidates. The STPs can be the extremely reliable text candidates for future text detectors.  相似文献   

18.
基于多帧图像的视频文字跟踪和分割算法   总被引:6,自引:2,他引:6  
视频中文字的提取是视频语义理解和检索的重要信息来源.针对视频中的静止文字时间和空间上的冗余特性,以文字区域的边缘位图为特征对检测结果作精化,并提出了基于二分搜索法的快速文字跟踪算法,实现了对文字对象快速有效的定位.在分割阶段,除了采用传统的灰度融合图像进行文字区域增强方法,还结合边缘位图对文字区域进行进一步的背景过滤.实验表明,文字的检测精度和分割质量都有很大提高.  相似文献   

19.
In this paper, we present a novel approach for multimedia data indexing and retrieval that is machine independent and highly flexible for sharing multimedia data across applications. Traditional multimedia data indexing and retrieval problems have been attacked using the central data server as the main focus, and most of the indexing and query-processing for retrieval are highly application dependent. This precludes the use of created indices and query processing mechanisms for multimedia data which, in general, have a wide variety of uses across applications. The approach proposed in this paper addresses three issues: 1. multimedia data indexing; 2. inference or query processing; and 3. combining indices and inference or query mechanism with the data to facilitate machine independence in retrieval and query processing. We emphasize the third issue, as typically multimedia data are huge in size and requires intra-data indexing. We describe how the proposed approach addresses various problems faced by the application developers in indexing and retrieval of multimedia data. Finally, we present two applications developed based on the proposed approach: video indexing; and video content authorization for presentation.  相似文献   

20.
The transportation of prerecorded, compressed video data without loss of picture quality requires the network and video servers to support large fluctuations in bandwidth requirements. Fully utilizing a client-side buffer for smoothing bandwidth requirements can limit the fluctuations in bandwidth required from the underlying network and the video-on-demand servers. This paper shows that, for a fixed-size buffer constraint, the critical bandwidth allocation technique results in plans for continuous playback of stored video that have (1) the minimum number of bandwidth increases, (2) the smallest peak bandwidth requirements, and (3) the largest minimum bandwidth requirements. In addition, this paper introduces an optimal bandwidth allocation algorithm which, in addition to the three critical bandwidth allocation properties, minimizes the total number of bandwidth changes necessary for continuous playback. A comparison between the optimal bandwidth allocation algorithm and other critical bandwidth-based algorithms using 17 full-length movie videos and 3 seminar videos is also presented.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号