Found 20 similar documents (search time: 78 ms)
1.
The paper presents an automatic video summarization technique based on graph theory and the dominant sets clustering
algorithm. The large size of the video data set is handled by exploiting the connectivity information of prototype frames
that are extracted from a down-sampled version of the original video sequence. The connectivity information for the prototypes,
which is obtained from the whole data set, improves the video representation and reveals its structure. The optimal number
of clusters, and hence of keyframes, is then selected automatically by the dominant sets clustering
algorithm. The method is free of user-specified modeling parameters and is evaluated in terms of several metrics that quantify
its content representational ability. The proposed summarization technique is compared with the Open Video storyboard, the
Adaptive clustering algorithm, and the Delaunay clustering approach.
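The dominant sets step can be illustrated with replicator dynamics, the iteration commonly used to extract dominant sets from a similarity graph. The sketch below is not the paper's implementation: the function names, the toy similarity matrix, and the numerical `cutoff` used to read off cluster membership are assumptions made for illustration. It shows how the number of clusters can fall out of repeatedly peeling off one dominant set instead of being supplied by the user.

```python
def dominant_set(similarity, iterations=1000, tol=1e-9, cutoff=1e-4):
    """Return the indices of one dominant set of a symmetric similarity
    matrix with a zero diagonal, found by replicator dynamics."""
    n = len(similarity)
    x = [1.0 / n] * n  # start from the centre of the simplex
    for _ in range(iterations):
        ax = [sum(similarity[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        if total == 0.0:  # no similarity left, nothing to separate
            break
        new_x = [x[i] * ax[i] / total for i in range(n)]
        change = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if change < tol:
            break
    # Assumption: weights above a small cutoff are treated as members.
    return [i for i in range(n) if x[i] > cutoff]


def dominant_set_clustering(similarity):
    """Peel off dominant sets until every frame is assigned to a cluster."""
    remaining = list(range(len(similarity)))
    clusters = []
    while remaining:
        sub = [[similarity[i][j] for j in remaining] for i in remaining]
        members = dominant_set(sub)
        if not members:  # defensive fallback: keep the rest together
            members = list(range(len(remaining)))
        clusters.append([remaining[m] for m in members])
        remaining = [r for k, r in enumerate(remaining) if k not in members]
    return clusters


# Toy similarity between five prototype frames (hypothetical values).
similarity = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.0],
]
print(dominant_set_clustering(similarity))  # [[0, 1, 2], [3, 4]]
```

In a summarization setting, one keyframe would then be chosen per cluster; how the paper makes that choice is not stated in the abstract.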
2.
Grouping video content into semantic segments and classifying semantic scenes into different types are crucial steps
in content-based video organization, management and retrieval. In this paper, a novel approach to automatic scene segmentation
and semantic scene representation is proposed. Firstly, video shots are detected using a rough-to-fine algorithm. Secondly,
key-frames within each shot are selected adaptively with hybrid features, and redundant key-frames are removed by template
matching. Thirdly, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video
content and visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously
recorded videos, scene content is semantically represented to meet human demands in video retrieval. The proposed algorithm
has been tested on various genres of films and TV programs. Promising experimental results show that the proposed method
supports efficient retrieval of interesting video content.
3.
This paper addresses the problem of ensuring the integrity of a digital video and presents a scalable signature scheme for
video authentication based on cryptographic secret sharing. The proposed method detects spatial cropping and temporal jittering
in a video, yet is robust against frame dropping in the streaming video scenario. In our scheme, the authentication signature
is compact and independent of the size of the video. Given a video, we identify the key frames based on differential energy
between the frames. Considering video frames as shares, we compute the corresponding secret at three hierarchical levels.
The master secret is used as the digital signature to authenticate the video. The proposed signature scheme is scalable to three
hierarchical levels of signature computation based on the needs of different scenarios. We provide extensive experimental
results to show the utility of our technique in three different scenarios—streaming video, video identification and face tampering.
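The key-frame step can be sketched as follows. The abstract only says that key frames are identified from the differential energy between frames, so the details here are assumptions: energy is taken as the sum of squared pixel differences, a frame becomes a key frame when its energy relative to the previous key frame exceeds a hypothetical `threshold`, and frames are flat lists of grey levels.

```python
def differential_energy(frame_a, frame_b):
    """Sum of squared pixel differences between two equally sized frames."""
    return sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))


def select_key_frames(frames, threshold):
    """Return indices of key frames.

    Assumption: the first frame is always a key frame, and a later frame
    is selected when it differs enough from the most recent key frame.
    """
    if not frames:
        return []
    key_indices = [0]
    for index in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if differential_energy(last_key, frames[index]) > threshold:
            key_indices.append(index)
    return key_indices


# Five tiny 4-pixel frames (hypothetical data).
frames = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [10, 10, 10, 10],
    [10, 10, 10, 11],
    [0, 0, 0, 0],
]
print(select_key_frames(frames, threshold=50))  # [0, 2, 4]
```

The selected frames would then serve as the shares from which the hierarchical secrets are computed; that part of the scheme is not sketched here.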
4.
We present a real-time implementation of 2D to 3D video conversion using compressed video. In our method, compressed 2D video
is analyzed by extracting motion vectors. Using the motion vector maps, depth maps are built for each frame and the frames
are segmented to provide object-wise depth ordering. These data are then used to synthesize stereo pairs. 3D video synthesized
in this fashion can be viewed using any stereoscopic display. In our implementation, anaglyph projection was selected as the
3D visualization method, because it is the best suited to standard displays.
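Two of the steps above lend themselves to a short sketch: turning a motion vector map into a depth map, and combining a stereo pair into an anaglyph. The mapping from motion to depth is an assumption (larger motion magnitude is treated as nearer, scaled to `max_depth`), and the function names are hypothetical. The red-cyan composition, with red taken from the left view and green and blue from the right view, is the usual anaglyph layout.

```python
def depth_from_motion(motion_vectors, max_depth=255):
    """Map per-block motion vectors (dx, dy) to integer depth values.

    Assumption: depth is proportional to motion magnitude, normalised so
    that the fastest block gets max_depth.
    """
    magnitudes = [[(dx * dx + dy * dy) ** 0.5 for dx, dy in row]
                  for row in motion_vectors]
    peak = max(max(row) for row in magnitudes)
    if peak == 0:
        return [[0] * len(row) for row in magnitudes]
    return [[round(max_depth * m / peak) for m in row] for row in magnitudes]


def make_anaglyph(left, right):
    """Combine two RGB images (nested lists of (r, g, b)) into a
    red-cyan anaglyph: red from the left view, green/blue from the right."""
    return [[(l[0], r[1], r[2]) for l, r in zip(left_row, right_row)]
            for left_row, right_row in zip(left, right)]


# A 2x2 block grid of motion vectors (hypothetical values).
vectors = [[(0, 0), (3, 4)], [(6, 8), (0, 0)]]
print(depth_from_motion(vectors, max_depth=100))  # [[0, 50], [100, 0]]
```

Synthesizing the left and right views from the depth map, which sits between these two steps, depends on details the abstract does not give.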
5.
In this paper, we propose a new real-time content filtering framework for live broadcasts in TV terminals. Content filtering
in TV terminals is a necessary provision of personalized broadcasting services in that it enables a TV viewer to obtain desired
scenes from multiple channel broadcasts. In this paper, a stable and reliable filtering structure and an algorithm for multiple
inputs are proposed. Moreover, real-time filtering requirements such as frame sampling rate per channel, number of input channels,
and buffer condition are analyzed to achieve real-time processing in terminals with limited computing power. Based on queueing
theory, we model the system and resolve the filtering requirements. To verify the proposed system and analysis, a filtering
algorithm for soccer videos, modified for real-time processing, is applied. Through analysis of visual features (e.g.,
dominant color and edge components) and detection of spatial objects (e.g., a score board), it recognizes a temporal pattern
between successive video frames and filters desired scenes. Experiments on soccer videos have been performed and the results
validate the effectiveness of the proposed approach and system.
6.
The paper presents a real-time algorithm that compensates for image distortions due to atmospheric turbulence in video sequences,
while keeping the real moving objects in the video unharmed. The algorithm involves (1) generation of a “reference” frame,
(2) estimation, for each incoming video frame, of a local image displacement map with respect to the reference frame, (3)
segmentation of the displacement map into two classes, stationary and moving objects, and (4) turbulence compensation of stationary
objects. Experiments with both simulated and real-life sequences have shown that the restored videos, generated in real time
using standard computer hardware, exhibit excellent stability for stationary objects while retaining real motion.
7.
This paper proposes an appearance generative mixture model based on key frames for meanshift tracking. The meanshift tracking
algorithm tracks an object by maximizing the similarity between the histogram in the tracking window and a static histogram acquired
at the beginning of tracking. The tracking could therefore fail if the appearance of the object varies substantially. In this
paper, we assume the key appearances of the object can be acquired before tracking and the manifold of the object appearance
can be approximated by a piecewise linear combination of these key appearances in histogram space. The generative process is
described by a Bayesian graphical model. An online EM algorithm is proposed to estimate the model parameters from the observed
histogram in the tracking window and to update the appearance histogram. We applied this approach to track human head motion
and to infer the head pose simultaneously in videos. Experiments verify that our online histogram generative model constrained
by key appearance histograms alleviates the drifting problem often encountered in tracking with online updating, that the
enhanced meanshift algorithm is capable of tracking objects of varying appearance more robustly and accurately, and that our
tracking algorithm can infer additional information such as the object pose.
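The idea of replacing a static reference histogram with a combination of key appearance histograms can be sketched as follows. The abstract does not name the similarity measure; the Bhattacharyya coefficient is a common choice in mean-shift tracking and is assumed here. The function names and the two-bin histograms are hypothetical, and the online EM estimation of the mixing weight is not shown.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalised histograms
    (1.0 for identical histograms, 0.0 for non-overlapping ones)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def blend_key_histograms(key_a, key_b, weight):
    """Linear combination of two key appearance histograms.

    Assumption: weight in [0, 1] is the share of key_a; in the paper the
    weights would be estimated online, here the value is given by hand.
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(key_a, key_b)]


# Two key appearances and an observation lying between them.
frontal = [1.0, 0.0]
profile = [0.0, 1.0]
observed = [0.5, 0.5]

static_score = bhattacharyya(observed, frontal)
blended_score = bhattacharyya(observed, blend_key_histograms(frontal, profile, 0.5))
print(blended_score > static_score)  # True
```

The blended reference matches the intermediate appearance better than either key histogram alone, which is the situation where a static histogram would let the tracker drift.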
8.
In applications such as post-production and archiving of audiovisual material, users are confronted with large amounts of
redundant unedited raw material, called rushes. Viewing and organizing this material are crucial but time-consuming tasks.
Typically, multiple but slightly different takes of the same scene can be found in the rushes video. We propose a method for
detecting and clustering takes of one scene shot from the same or very similar camera positions. An important subproblem is
to determine the similarity of video segments. We propose a distance measure based on the Longest Common Subsequence (LCSS)
model. Two variants of the proposed approach, one with a threshold parameter and one with an automatically determined threshold,
are compared against the Dynamic Time Warping (DTW) distance measure on six videos from the TRECVID 2007 BBC rushes summarization
data set. We also evaluate the influence of the applied temporal segmentation method at the input on the results. Applications
of the proposed method to automatic skimming and interactive browsing of rushes video are described.
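A distance based on the Longest Common Subsequence model can be sketched with the standard dynamic-programming recursion. This is a generic version, not the paper's exact measure: each video segment is reduced to a sequence of scalar feature values, two values match when they differ by at most `epsilon` (the threshold parameter), and the normalisation by the shorter sequence length is an assumption.

```python
def lcss_length(a, b, epsilon):
    """Length of the longest common subsequence of two numeric sequences,
    where elements match if they differ by at most epsilon."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def lcss_distance(a, b, epsilon):
    """Distance in [0, 1]: 0 when the shorter sequence is fully matched.

    Assumption: normalise by the length of the shorter sequence.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0
    return 1.0 - lcss_length(a, b, epsilon) / shorter


# Two takes of the same scene, with one outlier frame in the second take.
take_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
take_2 = [1.0, 2.0, 9.0, 4.0, 5.0]
print(lcss_length(take_1, take_2, epsilon=0.5))  # 4
```

Unlike Dynamic Time Warping, which must align every element, the recursion simply skips the outlier frame, so a single disturbed frame costs one unmatched element.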
9.
This paper presents and analyzes a video encryption scheme combined with advanced video coding (AVC), which differs
from the schemes used in MPEG1/2 video encryption. In the proposed scheme, the intra-prediction mode and motion vector difference
are encrypted with the length-kept encryption algorithm (LKE) in order to keep the format compliance, and the residue data
of the macroblocks are encrypted with the residue data encryption algorithm (RDE) in order to keep low cost. Additionally,
a key distribution scheme is proposed to keep the robustness to transmission errors, which assigns sub-keys to different frames
or slices independently. The encryption scheme’s security, time efficiency and error robustness are analyzed in detail. Experimental
results show that the encryption scheme keeps file format unchanged, is secure against replacement attacks, is efficient in
computing, and is robust to some transmission errors. These properties make it a suitable choice for real-time applications
such as secure IPTV, secure videoconferencing, and mobile/wireless multimedia.
10.
In this paper, we address the problem of video frame rate up-conversion (FRC) in the compressed domain. FRC is often recognized
as video temporal interpolation. This problem is very challenging when targeted for video sequences with inconsistent camera
and object motion, such as sports videos. A novel compressed domain motion compensation scheme is presented and applied in
this paper, aiming at up-sampling frame rates in sports videos. MPEG-2 encoded motion vectors (MVs) are utilized as inputs
in the proposed algorithm. The decoded MVs undergo a cumulative spatiotemporal interpolation. An iterative rejection scheme
based on the dense motion vector field (MVF) and the generalized affine motion model is exploited to detect global camera
motion. Subsequently, the foreground object separation is performed by additionally examining the temporal consistency of
the output of iterative rejections. This consistency check process helps coalesce the resulting foreground blocks and weed
out the unqualified blocks. Finally, different compensation strategies for the camera and object motions are applied to interpolate
the new frames. Illustrative examples are provided to demonstrate the efficacy of the proposed approach. Experimental results
are compared with the popular block and non-block based frame interpolation approaches.
11.
This paper presents an FPGA-based architecture for local tone mapping of gray scale high dynamic range images. The architecture
is described in VHDL and has been synthesized using Altera Quartus tools. It achieves an operating frequency consistent with
a video rate of 60 frames per second for a frame of 1,024 × 768 pixels. The proposed architecture is a modification of the
nine-scale Reinhard operator. Approximations to the original Reinhard operator ensure that the operator is amenable to implementation
in hardware. A peak signal-to-noise ratio study shows that our fixed-point hardware approximation produces results similar
to a floating-point original.
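For orientation, the simplest member of the Reinhard family is the global operator, which scales luminance by its log-average and then compresses it with L / (1 + L). The sketch below shows only that global form in floating point; it is not the nine-scale local operator or the fixed-point approximation the paper implements. The `key` value of 0.18 and the small `delta` guarding against log(0) are conventional choices, assumed here.

```python
import math


def reinhard_global(luminance, key=0.18, delta=1e-6):
    """Apply a global Reinhard-style tone curve to a 2D luminance image.

    Simplified sketch: no local (multi-scale) adaptation is performed.
    """
    flat = [value for row in luminance for value in row]
    # Log-average luminance of the whole image.
    log_average = math.exp(sum(math.log(delta + v) for v in flat) / len(flat))
    scaled = [[key * v / log_average for v in row] for row in luminance]
    # Compress the scaled luminance into the range [0, 1).
    return [[s / (1.0 + s) for s in row] for row in scaled]


# A tiny high dynamic range image spanning six orders of magnitude.
hdr = [[0.01, 1.0], [100.0, 10000.0]]
for row in reinhard_global(hdr):
    print([round(v, 4) for v in row])
```

The curve keeps the ordering of luminance values while mapping an unbounded input range into [0, 1), which is what makes the result displayable.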
12.
The objective measurement of blocking artifacts plays an important role in the design, optimization, and assessment of image
and video compression. In this paper, we propose a novel measurement algorithm for blocking artifacts. Computer simulation
results indicate that the proposed method accurately measures the blocking artifacts without using the original image. Moreover,
the proposed algorithm can be easily implemented in both pixel and DCT domains.
13.
Multimedia applications often have performance requirements that make them computing-resource intensive; e.g., the number of video frames displayed to the user must be about 25 frames per second. A user hint is an indication of the interest that a user has in an application. Examples of user hints include a screen saver being invoked or a window being covered by another window. When the user is running video, the occurrence of these hints implies that the user is no longer viewing the video. However, the resource usage of the application has not changed. This paper describes an architecture that makes use of user hints to reduce the resource consumption of an application. The emphasis is on network traffic and CPU usage. Experimental results are presented.
14.
Video is an information-intensive medium with much redundancy. Therefore, it is desirable to be able to mine the structure or semantics
of video data for efficient browsing, summarization and highlight extraction. In this paper, we propose a mosaic-based approach
to key-event and structure mining, which is regarded as a complementary view for sports video analysis. A mosaic is generated
for each shot by a novel, efficient mosaicing scheme, which constructs a global motion path and selects the best subset of frames
for mosaicing. These improved mosaics are then used as the representative images of shot content. Based on the mosaics, the structure
and events in sports video are mined by methods with and without prior knowledge. Without prior knowledge,
our system is able to locate global view shots taken by the dominant camera. If prior knowledge is available, the events in these
global view shots are detected using robust features extracted from the mosaics. For global view mining, experiments comparing it
with a key-frame-based scheme have demonstrated that the mosaic-based scheme gives better results on several kinds of sports
videos; for event mining, the detection of key plays and key events in the specific domain of soccer videos has proved its
effectiveness.
15.
Using multiple reference frame compensation in the H.264 coder improves the coding efficiency for sequences which contain
uncovered backgrounds, repetitive motions and highly textured areas. Unfortunately, this technique requires excessive memory
and computation resources. In this article, we proposed and implemented a technique based on a Markov Random Fields algorithm
relying on robust moving pixel segmentation. With this technique, we were able to decrease the number of
reference frames from five to three while keeping similar video coding performance. The coding time decreased by 35% and
the sequence quality was preserved. After validating our idea, we evaluated the processing time of the Markov algorithm
on architectures intended for embedded multimedia applications. Both DSP and FPGA implementations were explored. We were able
to process 50 frames (128 × 128) per second on the EP1S10 FPGA platform and 35 frames (128 × 128) per second on the ADSP BF533.
16.
Quantitative usability requirements are a critical but challenging, and hence often neglected, aspect of a usability engineering process. A case study is described in which quantitative usability requirements played a key role in the development of a new user interface for a mobile phone. Within the practical constraints of the project, existing methods for determining usability requirements and evaluating the extent to which they are met could not be applied as such; therefore, tailored methods had to be developed. These methods and their applications are discussed.
17.
This paper presents an efficient VLSI architecture for fast implementation of sub-pixel interpolation of H.264/AVC. Several
optimization techniques at different design levels, such as parallel processing, vector register, pipeline architecture, and
in-place computation, are utilized to reduce the number of memory accesses and accelerate the interpolation computations. The
proposed application-specific processor can meet the real-time constraint of the sub-pixel interpolation algorithm for the
16:9 video format (4,096 × 2,304) at 30 frames per second (fps) at a 100 MHz clock rate.
18.
Television produces massive amounts of video every day. Digital video is unfortunately an unstructured document in which it is
very difficult to find any information. Television streams have however a strong and stable but hidden structure that we want
to discover by detecting repeating objects in the video stream. This paper shows that television streams are actually highly
redundant and that detecting repeats can be an effective way to detect the underlying structure of the video. A method for
detecting these repetitions is presented here with an emphasis on the efficiency of the search in a large video corpus. Very
good results are obtained in terms of both effectiveness (98% recall and precision) and efficiency, since one day
of video is queried against a 3-week dataset in only 1 s.
19.
In the conventional wavelet coding scheme based on motion compensated temporal filtering, where the group of picture structure and
low-pass frame position are fixed, variations in the motion activity of video sequences are not considered. In this paper, we
propose an adaptive group of picture structure selection scheme, in which the group of picture size and low-pass frame position
are selected based on mutual information. Furthermore, the temporal decomposition process is determined adaptively according
to the selected group of picture structure. Extensive experiments are carried out to compare the compression
performance of the proposed method with the conventional motion compensated temporal filtering encoding scheme and the adaptive group
of picture structure in the standard scalable video coding model. The proposed low-pass frame selection improves the compression
quality by about 0.3–0.5 dB compared to the conventional scheme in video sequences with high motion activity. In scenes
with uneven variation of motion activity, e.g. frequent shot cuts, the proposed adaptive group of picture size achieves
better compression than the conventional scheme. Compared to the adaptive group of picture in the standard scalable
video coding model, the proposed group of picture structure scheme leads to improvements of about 0.2–0.8 dB in sequences
with high motion activity or shot cuts.
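Mutual information between two frames can be computed from the joint histogram of their pixel values, as sketched below. How the paper turns this into a choice of group of picture size and low-pass frame position is not given in the abstract; the rule in `pick_low_pass_frame` (choose the frame sharing the most information with the others) is a hypothetical criterion used only to make the example concrete. Frames are flat lists of quantised pixel values.

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b):
    """Mutual information (in bits) between two equally sized frames."""
    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))
    count_a = Counter(frame_a)
    count_b = Counter(frame_b)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return total


def pick_low_pass_frame(frames):
    """Hypothetical rule: index of the frame with the largest summed
    mutual information with all other frames in the group."""
    scores = [
        sum(mutual_information(frames[i], frames[j])
            for j in range(len(frames)) if j != i)
        for i in range(len(frames))
    ]
    return scores.index(max(scores))


# Three tiny frames: the middle one carries the content of both others.
group = [[0, 0, 1, 1], [0, 1, 2, 3], [0, 1, 0, 1]]
print(pick_low_pass_frame(group))  # 1
```

A sharp drop in mutual information between consecutive frames is also a plausible cue for a shot cut, which is the kind of event the adaptive group of picture size is meant to handle.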
20.
The purposes of this study are (a) to establish a measurement for evaluating conversational impressions of group discussions,
and (b) to make an exploratory investigation of the interactional processes that may shape those impressions.
The impression rating and factor analysis undertaken first give us four factors concerning conversational impressions of “focus
group interviews (FGIs)”: conversational activeness, conversational sequencing, the attitudes of participants and the relationships
of participants. In relation to the factors of conversational activeness and conversational sequencing in particular, the
microanalysis of four selected topical scenes from our database further shows that the behavior of the moderator and the interviewees
is organized not independently but with reference to each other. The study thus emphasizes the importance of the integration
of quantitative and qualitative approaches towards human interactions.