1.
WebClip (on-line demo at http://www.ctr.columbia.edu/webclip) is a compressed video searching and editing system operating
over the World Wide Web. WebClip uses a distributed client-server model including a server engine for content analysis/editing,
and clients for interactive controls of video browsing/editing. It offers several unique features, including compressed-domain
video feature extraction and manipulation, multi-resolution video access, content based video browsing/retrieval, and a distributed
network architecture.
2.
Published online: 15 March 2002
3.
Automatic text segmentation and text recognition for video indexing (cited 13 times in total; self-citations: 0; citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval
is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
4.
Xiaodong Wen Theodore D. Huffmire Helen H. Hu Adam Finkelstein 《Multimedia Systems》1999,7(5):350-358
We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of frames of
video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next we
describe how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based “signature.”
Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the
database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the
video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we compare experimentally
our shot transition mechanism to a color histogram implementation, and also evaluate the effectiveness of our database-searching
scheme. Our algorithms are very efficient and run in realtime on a desktop computer. We describe how this technology could
be employed to construct a “smart VCR” that was capable of alerting the viewer to the beginning of a specific program or identifying
5.
Much work on video servers has concentrated on movies on demand, in which a relatively small number of titles are viewed
and users are given basic VCR-style controls. This paper concentrates on analyzing video server performance for non-linear
access applications. In particular, we study two non-linear video applications: video libraries, in which users select from
a large collection of videos and may be interested in viewing only a small part of the title; and video walk-throughs, in
which users can move through an image-mapped representation of a space. We present a characterization of the workloads of
these applications. Our simulation studies show that video server architectures developed for movies on demand can be adapted
to video library usage, though caching is less effective and the server can support a smaller user population for non-linear
video applications. We also show that video walk-throughs require extremely large amounts of RAM buffering to provide adequate
performance for even a small number of users.
6.
Query by video clip (cited 15 times in total; self-citations: 0; citations by others: 15)
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries
that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features
around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained
with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar
to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features
of the sub-sampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and
a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one
basketball video as query and a different basketball video as the database show the effectiveness of feature representation
and matching schemes.
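The sub-sampled-frame scheme (ii) in the abstract above can be sketched as a sliding-window match. This is a minimal illustration under stated assumptions, not the authors' implementation: each frame is assumed to be already reduced to a numeric feature vector, and the names `subsample`, `l1_distance`, and `best_match`, the L1 distance, and the averaging of per-frame costs are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Feature = Sequence[float]  # assumed: one feature vector per frame


def subsample(frames: Sequence[Feature], step: int) -> List[Feature]:
    """Keep every `step`-th frame feature, starting with the first frame."""
    return list(frames[::step])


def l1_distance(a: Feature, b: Feature) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(query: Sequence[Feature],
               database: Sequence[Feature],
               step: int) -> Tuple[int, float]:
    """Slide the sub-sampled query over the sub-sampled database.

    Returns (frame index in the original database, mean L1 cost)
    for the best-aligned window.
    """
    q = subsample(query, step)
    d = subsample(database, step)
    if not q or len(q) > len(d):
        raise ValueError("query must be non-empty and no longer than the database")
    best_offset, best_cost = 0, float("inf")
    for offset in range(len(d) - len(q) + 1):
        cost = sum(l1_distance(qf, d[offset + i]) for i, qf in enumerate(q)) / len(q)
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    # Convert the offset in sub-sampled units back to a frame index.
    return best_offset * step, best_cost
```

Because both sequences are sampled on the same grid, a query that starts between two sampled database frames can only be matched approximately; a smaller `step` trades computation for alignment accuracy.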
7.
Scene change detection techniques for video database systems (cited 1 time in total; self-citations: 0; citations by others: 1)
Haitao Jiang Abdelsalam Helal Ahmed K. Elmagarmid Anupam Joshi 《Multimedia Systems》1998,6(3):186-195
Scene change detection (SCD) is one of several fundamental problems in the design of a video database management system (VDBMS).
It is the first step towards the automatic segmentation, annotation, and indexing of video data. SCD is also used in other
aspects of VDBMS, e.g., hierarchical representation and efficient browsing of the video data. In this paper, we provide a
taxonomy that classifies existing SCD algorithms into three categories: full-video-image-based, compressed-video-based, and
model-based algorithms. The capabilities and limitations of the SCD algorithms are discussed in detail. The paper also proposes
a set of criteria for measuring and comparing the performance of various SCD algorithms. We conclude by discussing some important
research directions.
8.
Fast techniques for the optimal smoothing of stored video (cited 3 times in total; self-citations: 0; citations by others: 3)
Work-ahead smoothing is a technique whereby a server, transmitting stored compressed video to a client, utilizes client buffer
space to reduce the rate variability of the transmitted stream. The technique requires the server to compute a schedule of
transfer under the constraints that the client buffer neither overflows nor underflows. Recent work established an optimal
off-line algorithm (which minimizes peak, variance and rate variability of the transmitted stream) under the assumptions of
fixed client buffer size, known worst case network jitter, and strict playback of the client video. In this paper, we examine
the practical considerations of heterogeneous and dynamically variable client buffer sizes, variable worst case network jitter
estimates, and client interactivity. These conditions require on-line computation of the optimal transfer schedule. We focus on techniques for reducing on-line computation time. Specifically,
(i) we present an algorithm for precomputing and storing the optimal schedules for all possible client buffer sizes in a compact
manner; (ii) we show that it is theoretically possible to precompute and store compactly the optimal schedules for all possible
estimates of worst case network jitter; (iii) in the context of playback resumption after client interactivity, we show convergence
of the recomputed schedule with the original schedule, implying greatly reduced on-line computation time; and (iv) we propose
and empirically evaluate an “approximation scheme” that produces a schedule close to optimal but takes much less computation
time.
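The two buffer constraints named in the abstract above (no overflow, no underflow) can be made concrete with a small feasibility check. This is only a sketch under an assumed discrete-time model, not the optimal algorithm the paper refers to: in each slot the server first delivers `sent_per_slot[t]` units, then the client consumes the frame of size `frame_sizes[t]`. The function names are hypothetical.

```python
from typing import Sequence


def is_feasible(frame_sizes: Sequence[int],
                sent_per_slot: Sequence[int],
                buffer_size: int) -> bool:
    """Return True if the schedule never overflows or underflows the client buffer.

    Assumed model: per slot, data arrives first, then one frame is consumed.
    """
    if len(frame_sizes) != len(sent_per_slot):
        raise ValueError("schedule and frame list must have the same length")
    occupancy = 0
    for sent, frame in zip(sent_per_slot, frame_sizes):
        occupancy += sent
        if occupancy > buffer_size:
            return False  # overflow: more data than the client can hold
        if occupancy < frame:
            return False  # underflow: the frame is not fully available at playback
        occupancy -= frame
    return True


def peak_rate(sent_per_slot: Sequence[int]) -> int:
    """Largest amount sent in any single slot."""
    return max(sent_per_slot)
```

For frames of sizes `[2, 8, 2, 8]` and a buffer of 8 units, both the unsmoothed schedule `[2, 8, 2, 8]` and the constant schedule `[5, 5, 5, 5]` pass the check, but the second lowers the peak rate from 8 to 5, which is the kind of rate-variability reduction the technique aims for.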
9.
Easy-to-use audio/video authoring tools play a crucial role in moving multimedia software from research curiosity to mainstream
applications. However, research in multimedia authoring systems has rarely been documented in the literature. This paper describes
the design and implementation of an interactive video authoring system called Zodiac, which employs an innovative edit history abstraction to support several unique editing features not found in existing commercial
and research video editing systems. Zodiac provides users a conceptually clean and semantically powerful branching history model of edit operations to organize the authoring process, and to navigate among versions of authored documents. In addition,
by analyzing the edit history, Zodiac is able to reliably detect a composed video stream's shot and scene boundaries, which facilitates interactive video browsing.
Zodiac also features a video object annotation capability that allows users to associate annotations to moving objects in a video sequence. The annotations themselves could
be text, image, audio, or video. Zodiac is built on top of MMFS, a file system specifically designed for interactive multimedia development environments, and implements an internal buffer
manager that supports transparent lossless compression/decompression. Shot/scene detection, video object annotation, and buffer
management all exploit the edit history information for performance optimization.
10.
The promise of a broadband integrated service digital network has led to the design of mechanisms for efficient transport
of real-time compressed video in packet switching networks. We examine feedback control for video transport in ATM networks
where the available feedback is a single bit of information carried in the cell header. We investigate the performance of
three single-bit schemes for source rate adaptation. Two were originally designed for congestion control of bursty data traffic and
are modified to control video traffic. The third scheme conveys more information about
the state of queue(s) at the bottleneck. The simulation results show that all three schemes for feedback control of VBR video
streams work remarkably well. During severe network congestion, the signal quality degrades gracefully, but not uniformly
across all connections. Based on insights from the initial simulations, we propose a scheme to improve the fairness of service
and demonstrate its effectiveness.
11.
An optimal bandwidth allocation strategy for the delivery of compressed prerecorded video (cited 1 time in total; self-citations: 0; citations by others: 1)
The transportation of prerecorded, compressed video data without loss of picture quality requires the network and video
servers to support large fluctuations in bandwidth requirements. Fully utilizing a client-side buffer for smoothing bandwidth
requirements can limit the fluctuations in bandwidth required from the underlying network and the video-on-demand servers.
This paper shows that, for a fixed-size buffer constraint, the critical bandwidth allocation technique results in plans
for continuous playback of stored video that have (1) the minimum number of bandwidth increases, (2) the smallest peak bandwidth
requirements, and (3) the largest minimum bandwidth requirements. In addition, this paper introduces an optimal bandwidth allocation algorithm which, in addition to the three critical bandwidth allocation properties, minimizes the total number of bandwidth
changes necessary for continuous playback. A comparison between the optimal bandwidth allocation algorithm and other critical
bandwidth-based algorithms using 17 full-length movie videos and 3 seminar videos is also presented.
12.
David Crandall Sameer Antani Rangachar Kasturi 《International Journal on Document Analysis and Recognition》2003,5(2-3):138-157
Abstract. The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically
index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's
semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary
color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about
the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video.
In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However,
text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once
in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial
problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television
programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing,
and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared
with existing work found in the literature.
Received: January 29, 2002 / Accepted: September 13, 2002
D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com
S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov
Correspondence to: David Crandall
13.
Abstract. Conventional tracking methods encounter difficulties as the number of objects, clutter, and sensors increase, because of
the requirement for data association. Statistical tracking, based on the concept of network tomography, is an alternative
that avoids data association. It estimates the number of trips made from one region to another in a scene based on interregion
boundary traffic counts accumulated over time. It is not necessary to track an object through a scene to determine when an
object crosses a boundary. This paper describes statistical tracking and presents an evaluation based on the estimation of
pedestrian and vehicular traffic intensities at an intersection over a period of 1 month. We compare the results with those
from a multiple-hypothesis tracker and manually counted ground-truth estimates.
Received: 30 August 2001 / Accepted: 28 May 2002
Correspondence to: J.E. Boyd
14.
Craig G. Nevill-Manning Ian H. Witten Gordon W. Paynter 《International Journal on Digital Libraries》1999,2(2-3):111-123
Developing intuition for the content of a digital collection is difficult. Hierarchies of subject terms allow users to explore the space of topics that a collection covers, to form and specialize useful query terms, and to directly identify interesting documents. We describe two interfaces for navigating such hierarchies, and present a technique for inferring hierarchies automatically from large corpora. We also discuss scalability issues for the techniques involved, and our solutions to these problems. Received: 15 December 1997 / Revised: June 1999
15.
In this paper, we propose a multi-level abstraction mechanism for capturing the spatial and temporal semantics associated
with various objects in an input image or in a sequence of video frames. This abstraction can manifest itself effectively
in conceptualizing events and views in multimedia data as perceived by individual users. The objective is to provide an efficient
mechanism for handling content-based queries, with the minimum amount of processing performed on raw data during query evaluation.
We introduce a multi-level architecture for video data management at different levels of abstraction. The architecture facilitates
a multi-level indexing/searching mechanism. At the finest level of granularity, video data can be indexed based on mere appearance
of objects and faces. For management of information at higher levels of abstractions, an object-oriented paradigm is proposed
which is capable of supporting domain specific views.
16.
We present a novel technique, called 2-Phase Service Model, for streaming videos to home users in a limited-bandwidth environment. This scheme first delivers some number of non-adjacent
data fragments to the client in Phase 1. The missing fragments are then transmitted in Phase 2 as the client is playing back
the video. This approach offers many benefits. The isochronous bandwidth required for Phase 2 can be controlled within the
capability of the transport medium. The data fragments received during Phase 1 can be used to provide an excellent preview
of the video. They can also be used to facilitate VCR-style operations such as fast-forward and fast-reverse. Systems designed
based on this method are less expensive because the fast-forward and fast-reverse versions of the video files are no longer
needed. Eliminating these files also improves system performance because mapping between the regular files and their fast-forward
and fast-reverse versions is no longer part of the VCR operations. Furthermore, since each client machine handles its own
VCR-style interaction, this technique is very scalable. We provide simulation results to show that 2-Phase Service Model is
able to handle VCR functions efficiently. We also implement a video player called FRVplayer. With this prototype, we
are able to judge that the visual quality of the previews and VCR-style operations is excellent. These features are essential
to many important applications. We discuss the application of FRVplayer in the design of a video management system, called
VideoCenter. This system is intended for Internet applications such as digital video libraries.
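The split between the two phases described above can be illustrated with a short sketch. The abstract does not say how the non-adjacent fragments are chosen, so the fixed-stride selection below, the `stride` parameter, and the function name `split_phases` are assumptions made purely for illustration.

```python
from typing import List, Tuple


def split_phases(num_fragments: int, stride: int) -> Tuple[List[int], List[int]]:
    """Split fragment indices into a Phase 1 set and a Phase 2 set.

    Phase 1 takes every `stride`-th fragment (assumed selection rule), so no two
    Phase 1 fragments are adjacent when stride >= 2. Phase 2 holds the remaining
    fragments in playback order.
    """
    if stride < 2:
        raise ValueError("stride must be at least 2 to keep Phase 1 fragments non-adjacent")
    phase1 = list(range(0, num_fragments, stride))
    in_phase1 = set(phase1)
    phase2 = [i for i in range(num_fragments) if i not in in_phase1]
    return phase1, phase2
```

Under this assumption, the Phase 1 list doubles as a coarse preview and as the frame sequence for fast-forward (played in order) or fast-reverse (played in reverse order), which is consistent with the abstract's point that separate fast-forward and fast-reverse files are not needed.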
17.
Metrics for shot boundary detection in digital video sequences (cited 5 times in total; self-citations: 0; citations by others: 5)
The detection of shot boundaries in video sequences is an important task for generating indexed video databases. This paper
provides a comprehensive quantitative comparison of the metrics that have been applied to shot boundary detection. In addition,
several standardized statistical tests that have not been applied to this problem, as well as three new metrics, are considered.
A mathematical framework for quantitatively comparing metrics is supplied. Experimental results based on a video database
containing 39,000 frames are included.
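As a concrete example of what a shot boundary metric looks like, the sketch below computes a normalized gray-level histogram difference between consecutive frames and flags a boundary when it exceeds a threshold. This is one widely used kind of metric chosen for illustration; the abstract does not list the metrics it compares, and the bin count, the threshold, and the function names are assumptions. Frames are assumed to be flat sequences of 8-bit gray values of equal length.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[int]:
    """Count pixels per intensity bin for a flat sequence of gray values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def histogram_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Bin-wise absolute histogram difference, normalized to the range [0, 1].

    Assumes both frames have the same number of pixels.
    """
    ha, hb = histogram(a), histogram(b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / (2 * len(a))


def detect_boundaries(frames: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Return each index i where frame i differs from frame i-1 by more than threshold."""
    return [
        i for i in range(1, len(frames))
        if histogram_difference(frames[i - 1], frames[i]) > threshold
    ]
```

A fixed threshold is the simplest decision rule; comparing metrics quantitatively, as the paper sets out to do, would mean sweeping such a threshold and measuring missed and false detections against labeled boundaries.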
18.
19.
Multimedia data, especially continuous media including video and audio objects, represent a rich and natural stimulus for
humans, but require a large amount of storage capacity and real-time processing. In this paper, we describe how to organize
video data efficiently on multiple disks in order to support arbitrary-rate playback requested by different users independently.
Our approach is to segment and decluster video objects and to place the segments in multiple disks using a restricted round-robin
scheme, called prime round-robin (PRR). Its placement scheme provides uniform load balance of disks for arbitrary retrieval rate as well as normal playback,
since it eliminates hot spots. Moreover, it does not require any additional disk bandwidth to support VCR-like operations
such as fast-forward and rewind. We have studied the various effects of placement and retrieval schemes in a storage server
by simulation. The results show that PRR offers even disk accesses, and failures to read a segment by its deadline occur
only at the beginning of new operations. In addition, the number of users admitted is not decreased, regardless of arbitrary-rate
playback requests.
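The hot-spot problem mentioned above can be seen with plain round-robin placement. The sketch below does not reproduce the PRR placement rule, which the abstract does not define; it only assumes that segment i is stored on disk `i % num_disks` and that playback at rate r reads every r-th segment, and it shows why a prime number of disks avoids concentrating load. All names are hypothetical.

```python
from math import gcd
from typing import Set


def disks_touched(num_disks: int, rate: int, num_segments: int) -> Set[int]:
    """Disks accessed when every `rate`-th segment is read.

    Assumed placement: plain round-robin, segment i on disk i % num_disks.
    """
    return {segment % num_disks for segment in range(0, num_segments, rate)}


def expected_disk_count(num_disks: int, rate: int) -> int:
    """Distinct disks visited in the long run: num_disks / gcd(num_disks, rate)."""
    return num_disks // gcd(num_disks, rate)
```

With 8 disks and 4x playback, only disks 0 and 4 are ever read, so they become hot spots. With 7 disks, any rate that is not a multiple of 7 eventually visits all disks, because a prime disk count shares no common factor with such a rate.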
20.
The broadband integrated services digital networks (B-ISDN) based on asynchronous transfer mode (ATM) technology can support
a wide range of applications such as voice, video, still images, and data. Compression techniques increase the effective bandwidth
utilization, but the bursty and asynchronous nature of the traffic can still lead to congestion in the network, and degradation
of image quality and quality of service (QOS). Some of the features to provide better coding schemes for ATM networks are
layered coding, resynchronization, buffering, interleaved schemes, constrained bit rate due to buffers, encapsulation with
the RTP or AAL1 for clock recovery, lapped transforms, motion compensation, and optimal bit allocation for coders based on
wavelet transforms. We review various techniques for image and video coding such as transforms, motion compensation, vector
quantization, and subband coding. We outline the impact of the cell loss ratio (CLR), delay and cell delay variation (CDV)
on video coding: blocking effects, loss of frame synchronization, motion vectors, and vector quantization codewords. The open
problems include tuning coding parameters to the available QOS provided by the network.