Similar Literature
1.
Video summarization is a technique for reducing an original raw video to a short video summary. It automates the task of acquiring key frames or segments from the video and combining them to generate a summary. This paper provides a framework for summarization based on different criteria and also compares different works in the video summarization literature. The framework formulates a video summarization model based on several criteria: target audience/viewership, number of videos, type of output intended, type of video summary, and summarization factor. The paper examines significant research in the area of video summarization to present a comprehensive review against the framework. Different techniques, perspectives, and modalities are considered to preserve the diversity of the survey, and important mathematical formulations are examined to provide meaningful insights for video summarization model creation.

2.
The world is covered with millions of cameras, each recording a huge amount of video. Watching these videos is time-consuming, as most of them are of little interest due to the lack of activity. Video representation is thus an important technology for tackling this issue. However, conventional video representation methods mainly focus on a single video, aiming at reducing the spatiotemporal redundancy as much as possible. In contrast, this paper describes a novel approach to presenting the dynamics of multiple videos simultaneously, aiming at a less intrusive viewing experience. Given a main video and multiple supplementary videos, the proposed approach automatically constructs a synthesized multi-video synopsis by integrating the supplementary videos into the most suitable spatiotemporal portions of the main video. The problem of finding a suitable integration between the main video and supplementary videos is formulated as a maximum a posteriori (MAP) problem, in which the desired properties related to a less intrusive viewing experience, i.e., informativeness, consistency, visual naturalness, and stability, are maximized. This problem is solved using an efficient Viterbi beam search algorithm. Furthermore, an informative blending algorithm that naturalizes the connecting boundary between different videos is proposed. The proposed method has a wide variety of applications such as visual information representation, surveillance video browsing, video summarization, and video advertising. The effectiveness of multi-video synopsis is demonstrated in extensive experiments over different types of videos with different synopsis cases.

3.
Recent advances in technology have increased the availability of video data, creating a strong requirement for efficient systems to manage such material. Making efficient use of video information requires that the data be accessible in a user-friendly way. Ideally, one would like to understand a video's content without having to watch it entirely; this has been the goal of a quickly evolving research area known as video summarization. In this paper, we present a novel approach for video summarization that works in the compressed domain and allows the progressive generation of a video summary. The proposed method relies on visual features extracted from the video stream and on a simple and fast algorithm to summarize the video content. Experiments on the TRECVID 2007 dataset show that our approach achieves quality comparable to state-of-the-art solutions at a computational cost that makes it suitable for online usage.

4.
Video summarization can facilitate rapid browsing and efficient video indexing in many applications. A good summary should maintain the semantic interestingness and diversity of the original video. While many previous methods extract key frames based on low-level features, this study proposes memorability-entropy-based video summarization. The proposed method focuses on creating semantically interesting summaries based on image memorability, and image entropy is introduced to maintain the diversity of the summary. In the proposed framework, perceptual-hashing-based mutual information (MI) is used for shot segmentation. A large annotated image memorability dataset is then used to fine-tune Hybrid-AlexNet; the fine-tuned network predicts the memorability score, and the entropy value of each image is calculated. The frame with the maximum memorability score and entropy value in each shot is selected to constitute the video summary. Finally, our method is evaluated on a benchmark dataset that comes with five human-created summaries, where it generates high-quality results comparable to the human-created summaries and conventional methods.
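The entropy half of the selection rule is easy to make concrete. Below is a minimal Python sketch: `image_entropy` computes the Shannon entropy of a grayscale histogram, and `select_key_frame` picks the frame with the best combined score in a shot. The memorability scores are treated as given (in the paper they come from the fine-tuned Hybrid-AlexNet), and summing the two terms is an illustrative combination rule, not necessarily the paper's.

```python
import math

def image_entropy(pixels, levels=256):
    """Shannon entropy of a grayscale intensity histogram (bits)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist if c)

def select_key_frame(shot_frames, memorability):
    """Pick the frame maximizing memorability + entropy within a shot.

    `memorability` maps frame index -> predicted score; here it is an
    assumed input rather than a network prediction.
    """
    best, best_score = None, float("-inf")
    for idx, pixels in shot_frames.items():
        score = memorability[idx] + image_entropy(pixels)
        if score > best_score:
            best, best_score = idx, score
    return best

# Toy shot: frame 0 is flat (zero entropy), frame 1 is varied.
shot = {0: [128] * 64, 1: list(range(64))}
mem = {0: 0.5, 1: 0.5}
print(select_key_frame(shot, mem))  # frame 1 wins on entropy
```

With equal memorability, the varied frame wins purely on entropy (6 bits for 64 distinct values versus 0 for a flat frame).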

5.
Optimal video stream multiplexing through linear programming
This paper presents a new optimal multiplexing scheme for compressed video streams based on their individual e-PCRTT transmission schedules. A linear programming algorithm is proposed that takes into account the different constraints of each client. The algorithm simultaneously finds the optimal total multiplexed schedule and individual stream schedules that minimize the peak transmission rate. Since the problem is formulated as a linear program, it is solvable in polynomial time. It is shown that the algorithm obtains maximum bandwidth utilization with Quality of Service (QoS) guarantees. Simulation results using 10 real MPEG-1 video sequences are presented. The optimal multiplexing linear programming results are compared to the e-PCRTT and Join-the-Shortest-Queue (JSQ) procedures in terms of peak transmission bandwidth, P-loss performance, and standard deviation. For several client buffer sizes, the rate obtained by our LP solution yields reductions of 47% and 56% compared to the e-PCRTT and JSQ methods, respectively. For a fixed-rate problem, this implies that the proposed scheme can allow an increase in the number of simultaneously served video streams.
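The full LP jointly handles per-client buffer constraints; absent those, the peak-minimizing constant rate for the aggregate delivery deadlines alone has a closed form, which the following sketch computes. The slot-based demand model and all names here are illustrative assumptions, not the paper's formulation.

```python
def min_peak_rate(per_stream_cum_demand):
    """Minimal constant transmission rate that meets every cumulative
    delivery deadline of the aggregate (multiplexed) stream.

    per_stream_cum_demand: per-stream cumulative bit curves, all
    sampled on the same slot grid. Client buffer-size constraints
    from the full LP formulation are deliberately omitted.
    """
    slots = len(per_stream_cum_demand[0])
    total = [sum(s[k] for s in per_stream_cum_demand) for k in range(slots)]
    # A constant rate r is feasible iff r * (k+1) >= total[k] for all k,
    # so the minimal peak is the largest prefix-average demand.
    return max(total[k] / (k + 1) for k in range(slots))

streams = [[2, 4, 9], [1, 2, 3]]   # cumulative bits per slot, per stream
print(min_peak_rate(streams))       # 4.0
```

Here the aggregate curve is [3, 6, 12], and the binding deadline is the last slot: 12 bits over 3 slots forces a peak of 4.0.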

6.
This paper presents a novel video stabilization algorithm for real-time optical character recognition (OCR) applications. The proposed method generates output frames that stabilize the position of a target word to be recognized by the OCR application. Unlike conventional algorithms, the proposed algorithm does not apply a causal low-pass filter to the trajectory of the target word to reduce the high-frequency component of camera motion. Instead, it directly calculates the stable position of the word using two forces: a force that pulls the target word toward the center of the output frame, and a back force that returns the center of the output frame to the center of the input frame. Hence, the proposed algorithm significantly reduces the time taken to respond to sudden camera movement. Although the proposed method may not outperform state-of-the-art video stabilization in terms of video stability, it is much more appropriate for real-time OCR applications than conventional techniques in terms of accuracy, computational cost, processing delay, and response time to sudden camera movement. Simulation results demonstrate the superiority of the proposed method over conventional techniques for real-time OCR applications.
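The two-force update can be sketched in a few lines. The gains `k_pull` and `k_back` below are illustrative, not values from the paper, and the geometry is reduced to one dimension:

```python
def stabilize(word_positions, k_pull=0.3, k_back=0.1):
    """Two-force stabilizer sketch (illustrative gains, 1-D geometry).

    Each step nudges the crop-window centre toward the target word
    (pull force) while a back force drags it toward the input-frame
    centre, so the window recovers quickly after sudden camera motion.
    """
    centre = 0.0           # crop-window centre, relative to frame centre
    out = []
    for word in word_positions:
        pull = k_pull * (word - centre)   # toward the target word
        back = k_back * (0.0 - centre)    # toward the input-frame centre
        centre += pull + back
        out.append(centre)
    return out

# Step input: the word jumps to position 10 and stays there.
print(stabilize([10.0] * 50)[-1])
```

With a constant target at 10, the centre settles where the two forces balance, k_pull·(10 − c) = k_back·c, i.e. c = 7.5 for the gains above; there is no filter delay to wait out after the jump.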

7.
Segmentation of semantic video object planes (VOPs) from a video sequence is key to the MPEG-4 standard with content-based video coding. In this paper, an approach for automatic segmentation of VOPs based on spatio-temporal information (SBSTI) is proposed. The experimental results demonstrate the good performance of the algorithm.

8.
A novel error concealment algorithm based on a stochastic modeling approach is proposed as a post-processing tool at the decoder side for recovering information lost during the transmission of encoded digital video bitstreams. In the proposed scheme, the spatial and temporal contextual features in video signals are separately modeled using the multiscale Markov random field (MMRF). The lost information is then estimated using a maximum a posteriori (MAP) probabilistic approach based on the spatial and temporal MMRF models; hence, a unified MMRF-MAP framework. To preserve the high-frequency information (in particular, the edges) of the damaged video frames through iterative optimization, a new adaptive potential function is also introduced. Compared to existing MRF-based schemes and other traditional concealment algorithms, the proposed dual MMRF (DMMRF) modeling method offers significant improvement in both objective peak signal-to-noise ratio (PSNR) measurement and subjective visual quality of the restored video sequences.

9.
10.
This paper presents a novel method of key-frame selection for video summarization based on multidimensional time series analysis. In the proposed scheme, the given video is first segmented into a set of sequential clips containing a number of similar frames. The key frames are then selected by a clustering procedure as the frames closest to the cluster centres in each resulting video clip. The proposed algorithm is evaluated experimentally on a wide range of test data and compared with state-of-the-art approaches in the literature; it demonstrates excellent performance and outperforms existing frame-selection methods in terms of a fidelity-based metric and subjective perception.
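A minimal version of the clustering step can be sketched as follows: a plain Lloyd's k-means over per-frame feature vectors, returning the index of the frame nearest each cluster centre. Both the feature choice and the clustering routine are stand-ins; the paper's multidimensional time-series machinery is not reproduced here.

```python
import random

def kmeans_key_frames(features, k, iters=20, seed=0):
    """Select one key frame per cluster: the frame nearest each centre.

    features: list of per-frame feature vectors (e.g. colour histograms).
    A minimal Lloyd's k-means stands in for the paper's clustering step.
    """
    rng = random.Random(seed)
    centres = [list(f) for f in rng.sample(features, k)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[min(range(k), key=lambda c: dist2(f, centres[c]))].append(f)
        for c, g in enumerate(groups):
            if g:  # keep the old centre if a cluster empties out
                centres[c] = [sum(col) / len(g) for col in zip(*g)]

    # Key frame of each cluster = frame index closest to its centre.
    return sorted(min(range(len(features)),
                      key=lambda i: dist2(features[i], centres[c]))
                  for c in range(k))

# Six frames with 1-D features forming two obvious clusters.
frames = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
print(kmeans_key_frames(frames, 2))  # [1, 4]
```

The middle frame of each cluster is nearest its centre, so frames 1 and 4 are returned as the summary.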

11.
In this paper, we propose a novel modality fusion method designed to combine spatial and temporal fingerprint information to improve video copy detection performance. Most previously developed methods use only pre-specified weights to combine spatial and temporal modality information. Hence, previous approaches cannot adaptively adjust the significance of the temporal fingerprints depending on the difference between the temporal variances of the compared videos, leading to performance degradation in video copy detection. To overcome this limitation, the proposed method extracts two types of fingerprint information: (1) a spatial fingerprint consisting of the signs of DCT coefficients in local areas of a keyframe, and (2) a temporal fingerprint computing the temporal variances in local areas of consecutive keyframes. In addition, a so-called temporal strength measurement technique is developed to quantitatively represent the amount of temporal variance; it can be used adaptively to weigh the significance of the compared temporal fingerprints. The experimental results show that the proposed modality fusion method outperforms other state-of-the-art fusion methods and popular spatio-temporal fingerprints in video copy detection. Furthermore, the proposed method reduces the time needed to perform video fingerprint matching by 39.0%, 25.1%, and 46.1% on our synthetic dataset, the TRECVID 2009 CCD task, and MUSCLE-VCD 2007, respectively, without a significant loss of detection accuracy. This indicates that the proposed method can be readily incorporated into real-life video copy detection systems.
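The sign-based fingerprint and its adaptive fusion can be illustrated schematically. The DCT coefficients are assumed precomputed (computing the transform is outside this sketch), and the weighting rule in `fused_score` is a hedged stand-in for the paper's temporal strength measurement, not its actual formula:

```python
def spatial_fingerprint(coeffs):
    """One sign bit per (precomputed) local DCT coefficient."""
    return [1 if c >= 0 else 0 for c in coeffs]

def hamming_similarity(fp_a, fp_b):
    """Fraction of matching fingerprint bits (1.0 = identical)."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

def fused_score(spatial_sim, temporal_sim, temporal_strength):
    """Adaptive fusion: weight the temporal similarity by how much
    temporal variation the compared clips actually contain.
    The linear weighting here is illustrative only."""
    w = min(1.0, temporal_strength)   # strength assumed normalised to [0, 1]
    return (1 - w) * spatial_sim + w * temporal_sim

fp1 = spatial_fingerprint([3, -1, 2, -4])
fp2 = spatial_fingerprint([2, -2, -1, -5])
print(hamming_similarity(fp1, fp2))  # 0.75
```

When a clip is nearly static (low temporal strength), the fused score leans on the spatial bits; for highly dynamic clips, the temporal term dominates, which is the adaptivity the fixed-weight baselines lack.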

12.
To improve the efficiency of video retrieval in in-vehicle surveillance, a fast retrieval method is proposed. First, an improved block-matching approach quickly removes video jitter, completing the preprocessing of the image sequence. Then, according to different user requirements, videos are retrieved in two modes, automatic and interactive. Experimental results show that the method can quickly retrieve the video segments of interest to the user without missing important frames, saving users a large amount of video browsing time.

13.
In this paper, we propose a novel and robust approach for fast and accurate shot boundary detection whose design philosophy is based on human perceptual rules and the well-known "Information Seeking Mantra". By adopting a top-down approach, redundant video processing is avoided and high shot boundary detection accuracy is obtained at significantly low computational cost. Objects within shots are detected via local image features and used to reveal visual discontinuities between shots. The proposed method can detect all types of gradual transitions as well as abrupt changes. Another important feature is that the method is fully generic: it can be applied to any video content without requiring any training or tuning in advance. Furthermore, it allows user interaction to direct the SBD process to the user's region of interest or to stop it once satisfactory results are obtained. Experimental results demonstrate that the proposed algorithm achieves superior computational times compared to state-of-the-art methods without sacrificing performance.
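As a rough illustration of measuring visual discontinuity between frames, the sketch below uses histogram intersection between consecutive frames, a deliberately simpler stand-in for the paper's local-image-feature comparison; the threshold value is an arbitrary assumption:

```python
def detect_cuts(hist_per_frame, threshold=0.5):
    """Abrupt-cut detection via histogram intersection between
    consecutive frames. Intersection near 1.0 means similar content;
    a sharp drop below `threshold` is flagged as a cut."""
    cuts = []
    for t in range(1, len(hist_per_frame)):
        a, b = hist_per_frame[t - 1], hist_per_frame[t]
        inter = sum(min(x, y) for x, y in zip(a, b)) / sum(a)
        if inter < threshold:
            cuts.append(t)
    return cuts

# Two frames of one scene, then two frames of a completely different one.
hists = [[10, 0], [10, 0], [0, 10], [0, 10]]
print(detect_cuts(hists))  # [2]
```

Gradual transitions would need a multi-frame window rather than this pairwise test, which is part of what the paper's top-down design addresses.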

14.
To meet the requirements of high-speed, real-time transmission and storage of video data from large-area-array aviation CCD cameras, this paper designs a compression system based on the H.264 video coding algorithm. The system comprises a CCD front end, video compression, video display, compressed-bitstream storage, and compression analysis units. The video compression unit uses the high-performance video DSP TMS320DM642, and the H.264 compression algorithm is implemented in C on the CCS3.1 software platform. To make the compression algorithm run efficiently, DSP/BIOS resources are used to manage the software and hardware, and an EDMA high-speed data transfer strategy ensures the real-time data transmission required. Experimental results show that the proposed compression system works stably with good compression performance: over compression ratios from 40:1 to 10:1, the average signal-to-noise ratio exceeds 35 dB, meeting the requirements of aviation CCD camera applications.

15.
Rate-distortion optimal video summary generation
The need for video summarization originates primarily from a viewing time constraint: a shorter version of the original video sequence is desirable in a number of applications, and is also necessary where storage, communication bandwidth, and/or power are limited. The summarization process inevitably introduces distortion, and the amount of distortion is related to the summary's "conciseness", i.e., the number of frames available in it. If there are m frames in the original sequence and n frames in the summary, we define the summarization rate as m/n to characterize this conciseness. We also develop a new summarization distortion metric and formulate the summarization problem as a rate-distortion optimization problem. Optimal algorithms based on dynamic programming are presented and compared experimentally with heuristic algorithms. Practical constraints, such as the maximum number of frames that can be skipped, are also considered in the formulation and solution of the problem.
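The dynamic-programming formulation can be illustrated on a toy distortion metric. Here each original frame is approximated by the most recent selected frame, and distortion is a summed absolute difference; the paper's actual metric and skip constraints are not reproduced:

```python
def optimal_summary(frames, n):
    """Pick n of m frames minimizing total reconstruction distortion.

    Each original frame is approximated by the most recent selected
    frame; distortion is the summed absolute difference (a stand-in
    for the paper's metric). Solved exactly by dynamic programming.
    """
    m = len(frames)
    INF = float("inf")

    def seg_cost(i, j):  # frames[i..j) represented by frames[i]
        return sum(abs(frames[t] - frames[i]) for t in range(i, j))

    # dp[c][j]: min distortion covering frames[0..j) with c key frames.
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for c in range(1, n + 1):
        for j in range(1, m + 1):
            for i in range(j):          # i = position of the c-th key frame
                if dp[c - 1][i] < INF:
                    cost = dp[c - 1][i] + seg_cost(i, j)
                    if cost < dp[c][j]:
                        dp[c][j], choice[c][j] = cost, i
    # Backtrack the selected frame indices.
    picks, j = [], m
    for c in range(n, 0, -1):
        i = choice[c][j]
        picks.append(i)
        j = i
    return dp[n][m], sorted(picks)

cost, picks = optimal_summary([0, 0, 0, 9, 9, 9], 2)
print(cost, picks)  # 0.0 [0, 3]
```

For this toy sequence the summarization rate is m/n = 3, and the optimum places one key frame at each plateau, achieving zero distortion; a greedy heuristic has no such guarantee.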

16.
The rate budget constraint and the available instantaneous signal-to-noise ratio (SNR) of the best relay selection in cooperative systems can dramatically impact the system performance and complexity of video applications, since they determine the video distortion. Taking these constraints into account, we first outline the signal model and formulate the system optimization problem. Next, we propose a new approach to cross-layer optimization for 3-D video transmission over cooperative relay systems. We propose procedures for estimating the end-to-end instantaneous SNR from estimates of the available instantaneous SNRs of the source-destination and source-relay-destination links before the video signal is sent to the best relay and the destination. A novel approach using Lagrange multipliers is developed to solve the optimal bit allocation problem. Based on the rate budget constraint and the estimated end-to-end instantaneous SNR, the proposed joint source-channel coding (JSCC) algorithm simultaneously assigns source code rates for the application layer, the numbers of high- and low-priority packets for the network layer, and channel code rates for the physical layer, using criteria that maximize video quality while minimizing system complexity. Finally, we investigate the impact of the estimated end-to-end instantaneous SNR on video system performance and complexity. Experimental results show that the proposed JSCC algorithm outperforms existing algorithms in terms of peak signal-to-noise ratio. Moreover, the proposed algorithm is computationally more efficient, since it minimizes the overall video distortion in a few iterations.

17.
Video compression is essential for uploading videos to online platforms, which usually have bandwidth limitations; however, compression reduces visual quality. To overcome this problem, the visual quality of low-bitrate compressed videos needs to be improved at the decoder for various standards, including H.264 and HEVC. Accordingly, this paper proposes a novel method for improving video quality based on 3D convolutional neural networks (CNNs). The method is fully compatible with the encoders of video compression standards, i.e., H.264, HEVC, and VVC, and can be implemented easily. In particular, the proposed neural network receives five frames of the low-bitrate compressed video as input, predicts the compression error of frames using the first and fifth frames, and finally reconstructs an improved, high-quality version of the frame. The CNN is an additive 3D model that can predict the inter-frame redundancies eliminated by compression. Our goal is to increase the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) of the luminance (Y) and chrominance (U, V) frames of the video. The additive 3D-CNN achieves average BD-rate gains of 12.4%, 9.9%, and 5% for LP, LB, and RA on the Y component. The results indicate that the proposed algorithm outperforms previous methods in terms of PSNR, SSIM, and BD-rate.

18.
To address the power efficiency and real-time requirements of video transmission in TD-LTE systems, this paper proposes a QoS-guaranteed base station architecture with a prediction function and a cross-layer optimization algorithm for discontinuous reception (DRX) during video transmission. By adding a prediction block to the base station architecture and combining it with modeling, parameter estimation, and prediction of video frame sizes, the DRX cycle is optimized, reducing power consumption while guaranteeing the real-time performance of video transmission. Extensive simulations show that an optimal operating region can be found that balances power saving against video transmission quality, saving terminal devices roughly 30-80% of their power.

19.
Video format conversion algorithms suitable for hardware implementation
The coexistence of a large number of formats in the video domain makes it difficult to exchange information between video signals of different formats; this problem can be solved by techniques such as video format conversion. Building on a discussion of related format conversion algorithms, two effective and hardware-friendly video format conversion algorithms are proposed: a window-function (edge-enhancement) method and a median-filter (edge-enhancement) method. Verification of the first method on an FPGA board, together with software simulation, shows good subjective image quality and strong robustness. The second method exploits the spatial and temporal correlation of the signal, overcoming insufficiently sharp image edges, and achieves good results at low hardware cost. Both methods are of considerable practical value.
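The median-filter method's use of spatial and temporal correlation can be illustrated with a common hardware-friendly kernel: each missing pixel is the median of the pixels above and below it and the co-located pixel of the previous field. This is an illustrative sketch, not the exact filter from the paper:

```python
def median3(a, b, c):
    """Median of three values; a single sorting network in hardware."""
    return sorted((a, b, c))[1]

def interpolate_line(above, below, prev_field_line):
    """Spatio-temporal median interpolation of one missing scan line:
    per pixel, the median of the line above, the line below, and the
    co-located line of the previous field. The temporal sample keeps
    static detail sharp; the two spatial samples suppress motion
    artefacts, since the median rejects the outlier."""
    return [median3(a, b, p) for a, b, p in zip(above, below, prev_field_line)]

# Pixel 0: static area, temporal sample (12) wins.
# Pixel 1: motion, the outlier temporal sample (200) is rejected.
print(interpolate_line([10, 10], [20, 20], [12, 200]))  # [12, 20]
```

This is why such kernels are cheap in hardware: a per-pixel three-input median needs only a few comparators and no frame-rate multipliers.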

20.