Similar Literature
20 similar records found (search time: 31 ms)
1.
Key frame extraction based on visual attention model   Cited by: 2 (self-citations: 0, citations by others: 2)
Key frame extraction is an important technique in video summarization, browsing, searching and understanding. In this paper, we propose a novel approach to extract the most attractive key frames by using a saliency-based visual attention model that bridges the gap between semantic interpretation of the video and low-level features. First, dynamic and static conspicuity maps are constructed based on motion, color and texture features. Then, by introducing suppression-factor and motion-priority schemes, the conspicuity maps are fused into a saliency map that includes only true attention regions, from which an attention curve is produced. Finally, after a time-constrained clustering algorithm groups frames with similar content, the frames with the maximum saliency value are selected as key frames. Experimental results demonstrate the effectiveness of our approach for video summarization by retrieving meaningful key frames.
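A minimal sketch of the final selection step, assuming per-frame saliency scores and time-constrained clusters are already available; the fusion weights and the motion-priority parameter below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def fuse_saliency(motion_map, color_map, texture_map, motion_priority=0.5):
    """Hypothetical fusion: motion gets a priority weight, static cues share the rest."""
    static = 0.5 * (color_map + texture_map)
    return motion_priority * motion_map + (1.0 - motion_priority) * static

def select_key_frames(frame_saliency, clusters):
    """Pick the frame with the maximum saliency value inside each temporal cluster."""
    return [cluster[int(np.argmax(frame_saliency[cluster]))] for cluster in clusters]

# toy example: 10 frames grouped into three time-constrained clusters
saliency = np.random.rand(10)
clusters = [np.array([0, 1, 2]), np.array([3, 4, 5, 6]), np.array([7, 8, 9])]
print(select_key_frames(saliency, clusters))
```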

2.
With the fast evolution of digital video, research and development of new technologies are greatly needed to lower the cost of video archiving, cataloging and indexing, as well as to improve the efficiency and accessibility of stored video sequences. A number of methods have been researched and proposed to meet these requirements. As one of the most important research topics, video abstraction enables us to quickly browse a large video database and to achieve efficient content access and representation. In this paper, a video abstraction algorithm based on the visual attention model and online clustering is proposed. First, shot boundaries are detected and key frames in each shot are extracted so that consecutive key frames within a shot are evenly spaced. Second, a spatial saliency map indicating the saliency value of each region of the image is generated from each key frame, and regions of interest (ROIs) are extracted according to the saliency map. Third, the key frames, together with their corresponding saliency maps, are passed through a filter with several thresholds so that key frames containing little information are discarded. Finally, the key frames are clustered using an online clustering method based on the features in the ROIs. Experimental results demonstrate the performance and effectiveness of the proposed video abstraction algorithm.
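The paper's online clustering is not specified in detail here; a hedged, leader-style sketch over ROI feature vectors might look like this (the distance threshold is an assumed parameter):

```python
import numpy as np

def online_cluster(features, threshold=1.2):
    """Leader-style online clustering of ROI feature vectors: assign each key
    frame to the nearest existing centroid, or open a new cluster when no
    centroid is close enough."""
    centroids, counts, labels = [], [], []
    for f in features:
        if centroids:
            dists = [np.linalg.norm(f - c) for c in centroids]
            j = int(np.argmin(dists))
        if not centroids or dists[j] > threshold:
            centroids.append(f.astype(float).copy())
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[j] += 1
            centroids[j] += (f - centroids[j]) / counts[j]   # running-mean update
            labels.append(j)
    return labels

roi_features = np.random.rand(8, 16)
print(online_cluster(roi_features))
```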

3.
Video summarization is a method to reduce redundancy and generate a succinct representation of the video data. One mechanism for generating video summaries is to extract key frames that represent the most important content of the video. In this paper, a new technique for key frame extraction is presented. The scheme uses an aggregation mechanism to combine the visual features extracted from the correlation of the RGB color channels, the color histogram, and moments of inertia to extract key frames from the video. An adaptive formula is then used to combine the results of the current iteration with those of the previous one. The adaptive formula yields a smooth output function and also reduces redundancy. The results are compared with those of other techniques using objective criteria. The experimental results show that the proposed technique generates summaries that are closer to the summaries created by humans.
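One plausible reading of the adaptive combination, sketched under assumed weights and threshold (the exact formula is the paper's own):

```python
import numpy as np

def smooth_and_select(raw_scores, alpha=0.7, threshold=0.6):
    """Adaptive update: blend the current aggregated feature-difference score
    with the previous smoothed value, and declare a key frame whenever the
    smoothed score exceeds the threshold."""
    smoothed, key_frames = 0.0, []
    for t, score in enumerate(raw_scores):
        smoothed = alpha * score + (1 - alpha) * smoothed   # adaptive combination
        if smoothed > threshold:
            key_frames.append(t)
    return key_frames

# raw_scores would come from aggregating RGB-correlation, color-histogram and
# moment-of-inertia differences between consecutive frames
print(smooth_and_select(np.random.rand(20)))
```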

4.
Video summarization can facilitate rapid browsing and efficient video indexing in many applications. A good summary should maintain the semantic interestingness and diversity of the original video. While many previous methods extracted key frames based on low-level features, this study proposes memorability-entropy-based video summarization. The proposed method focuses on creating semantically interesting summaries based on image memorability, and image entropy is introduced to maintain the diversity of the summary. In the proposed framework, perceptual-hashing-based mutual information (MI) is used for shot segmentation. A large annotated image-memorability dataset is then used to fine-tune Hybrid-AlexNet. We predict the memorability score with the fine-tuned deep network and calculate the entropy value of the images. The frame with the maximum memorability score and entropy value in each shot is selected to constitute the video summary. Finally, our method is evaluated on a benchmark dataset that comes with five human-created summaries. The evaluation shows that it generates high-quality results comparable to the human-created summaries and to conventional methods.
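The entropy term is standard and easy to reproduce; a small sketch follows, with a hypothetical pick_key_frame showing one way the memorability and entropy scores could be combined (the paper's exact selection rule may differ):

```python
import numpy as np

def frame_entropy(gray_frame, bins=256):
    """Shannon entropy (bits) of a grayscale frame's intensity histogram."""
    hist, _ = np.histogram(gray_frame.ravel(), bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pick_key_frame(memorability, entropies):
    """Hypothetical selection rule: rank frames in a shot by the product of
    their normalized memorability and entropy scores."""
    m, e = np.asarray(memorability), np.asarray(entropies)
    return int(np.argmax((m / m.max()) * (e / e.max())))

shot = [(np.random.rand(90, 120) * 255).astype(np.uint8) for _ in range(8)]
ent = [frame_entropy(f) for f in shot]
mem = np.random.rand(8)          # stand-in for the fine-tuned network's output
print(pick_key_frame(mem, ent))
```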

5.
Ji Zhong, Fan Shuaifei. Acta Electronica Sinica (电子学报), 2017, 45(5): 1035-1043
Video summarization has attracted wide attention as a way to quickly perceive video content. Existing graph-based video summarization methods treat video frames as vertices and represent the relationship between two vertices by an edge, but they cannot capture the complex relationships among video frames well. To overcome this drawback, this paper proposes a static video summarization method based on hypergraph ranking (Hyper-Graph Ranking based Video Summarization, HGRVS). HGRVS first builds a video hypergraph model in which any number of intrinsically related video frames are connected by a single hyperedge; it then proposes a hypergraph-ranking-based frame classification algorithm that classifies video frames by content; finally, a static video summary is generated by solving a proposed optimization function. Extensive subjective and objective experiments on the Open Video Project and YouTube datasets verify the excellent performance of the proposed HGRVS algorithm.
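A compact sketch of Zhou-style hypergraph ranking over frame features, as a stand-in for the HGRVS ranking stage; the k-NN hyperedge construction and uniform weights are assumptions, and the paper's classification and optimization steps are omitted:

```python
import numpy as np

def hypergraph_ranking(X, k=3, alpha=0.9):
    """Each frame spawns one hyperedge connecting itself and its k nearest
    neighbours; frames are then ranked via a regularized random walk on the
    hypergraph (Zhou et al. style)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)    # pairwise distances
    H = np.zeros((n, n))                                   # incidence: vertex x edge
    for e in range(n):
        for v in np.argsort(D[e])[:k + 1]:                 # frame e plus k neighbours
            H[v, e] = 1.0
    w = np.ones(n)                                         # unit hyperedge weights
    dv = H @ w                                             # vertex degrees
    de = H.sum(axis=0)                                     # hyperedge degrees
    Dv_isq = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_isq @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dv_isq
    y = np.ones(n) / n                                     # uniform query vector
    f = np.linalg.solve(np.eye(n) - alpha * Theta, (1 - alpha) * y)
    return np.argsort(-f)                                  # frames, most central first

X = np.random.rand(12, 8)                                  # 12 frames, 8-D features
print(hypergraph_ranking(X)[:3])
```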

6.
Key-frame-based video summarization has emerged as an important area of research for the multimedia community. Video key frames enable a user to access any video in a friendly and meaningful way. In this paper, we propose an automated method of video key frame extraction using dynamic Delaunay graph clustering via an iterative edge-pruning strategy. A structural constraint in the form of a lower limit on the deviation ratio of the graph vertices further improves the video summary. We also employ information-theoretic pre-sampling, where significant valleys in the mutual information profile of successive frames in a video are used to capture more informative frames. Various key frame visualization techniques for efficient video browsing and navigation are incorporated. A comprehensive evaluation on 100 videos from the Open Video and YouTube databases, using both objective and subjective measures, demonstrates the superiority of our key frame extraction method.
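The mutual-information pre-sampling is straightforward to reproduce; a sketch of the per-pair MI computation (the bin count is an assumed parameter):

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    """MI between two grayscale frames from their joint intensity histogram;
    significant valleys in this profile hint at informative content changes."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

a = (np.random.rand(90, 120) * 255).astype(np.uint8)
b = np.clip(a + np.random.randint(-10, 10, a.shape), 0, 255)
print(mutual_information(a, b))
```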

7.
To improve the accuracy of key frame extraction and the quality of video summaries, a key frame extraction method for video summarization in the HEVC compressed domain is proposed. First, the video sequence is encoded and decoded, and during decoding the numbers of luma prediction modes of HEVC intra-coded PU blocks are counted. Then, for feature extraction, the counted mode numbers are assembled into a mode feature vector, which serves as the texture feature of a video frame for key frame extraction. Finally, an adaptive clustering algorithm that incorporates the Iterative Self-Organizing Data Analysis Technique (ISODATA) clusters the mode feature vectors; the frame corresponding to the middle vector within each cluster is selected as a candidate key frame, and the candidates are further screened by similarity to remove redundant frames, yielding the final key frames. Extensive experiments on the Open Video Project dataset show that the method achieves a precision of 79.9%, a recall of 93.6%, and an F-score of 86.2% for key frame extraction, effectively improving the quality of video summaries.
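Parsing PU modes requires an instrumented HEVC decoder, which is assumed here; the sketch only shows how the 35-bin mode histogram could be assembled into the per-frame feature vector that feeds the ISODATA-style clustering:

```python
import numpy as np

# HEVC defines 35 luma intra prediction modes (Planar, DC, 33 angular)
NUM_MODES = 35

def mode_feature_vector(pu_modes):
    """Normalized histogram of intra PU luma modes for one frame; pu_modes is
    assumed to come from an instrumented HEVC decoder."""
    hist = np.bincount(np.asarray(pu_modes), minlength=NUM_MODES).astype(float)
    return hist / max(hist.sum(), 1.0)

# toy stand-in for decoded frames: random mode lists
frames = [np.random.randint(0, NUM_MODES, size=200) for _ in range(30)]
features = np.stack([mode_feature_vector(f) for f in frames])
print(features.shape)  # (30, 35) -- input to the ISODATA-style clustering stage
```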

8.
Video summarization refers to an important set of abstraction techniques aimed at providing a compact representation of a video, essential for effectively browsing and retrieving video content from multimedia repositories. Most video summarization techniques, such as image storyboards, video skims and fast previews, are based on selecting frames or segments. H.264/AVC has become a widely accepted coding standard, and much content is expected to be available in this format soon. This paper proposes a generic model of video summarization especially suitable for generating summaries of H.264/AVC bitstreams in a highly efficient manner, using the concept of temporal scalability via hierarchical prediction structures. Along with the model, specific examples of summarization techniques are given to prove the utility of the model.
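A sketch of how a summary could be pulled from a dyadic hierarchical-B structure by keeping only the lower temporal layers; the layer rule below is the common dyadic convention, not necessarily the paper's exact model:

```python
def temporal_layer(poc, gop_size=8):
    """Temporal layer of a frame in a dyadic hierarchical-B GOP: key pictures
    (POC multiples of the GOP size) sit in layer 0, and each halving of the
    prediction distance adds one layer."""
    if poc % gop_size == 0:
        return 0
    layer, step = 1, gop_size // 2
    while poc % step != 0:
        step //= 2
        layer += 1
    return layer

# keep only the lowest layers to form a coarse, cheap-to-extract preview
summary = [poc for poc in range(33) if temporal_layer(poc) <= 1]
print(summary)  # [0, 4, 8, 12, 16, 20, 24, 28, 32]
```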

9.
Video summarization based on key frame extraction is an effective means of rapidly accessing video content. Traditional summary generation requires high video resolution, which poses a problem because most existing studies offer no targeted solutions for videos subject to privacy protection. We propose a novel key frame extraction algorithm for video data in the visual-shielding domain, named visual-shielding compressed sensing coding and double-layer affinity propagation (VSCS-DAP). VSCS-DAP involves three main steps. First, the video is compressed by compressed sensing to provide a visual shielding effect (protecting the privacy of monitored subjects) while significantly reducing the data volume. Then, pyramid histogram of oriented gradients (PHOG) features are extracted from the compressed video and clustered by a first-stage affinity propagation (AP) to obtain first-stage summaries. Finally, PHOG and histogram (Hist) features are extracted from the first-stage key frames and fused, and the fused PHOG-Hist features are clustered by a second-stage AP to obtain the final summaries. Experimental results on two common video datasets show that our method offers low redundancy and few missing frames, low computational complexity, strong real-time performance, and robustness to vision-shielded video.
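A hedged sketch of the double-layer clustering using scikit-learn's AffinityPropagation, with random stand-ins for the PHOG and fused PHOG-Hist features:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def two_stage_ap(features_stage1, refine_features):
    """Two-step affinity propagation in the spirit of VSCS-DAP: cluster the
    first-stage features, keep the exemplar frames, then re-cluster those
    exemplars with the fused second-stage features for the final summary."""
    ap1 = AffinityPropagation(random_state=0).fit(features_stage1)
    stage1 = ap1.cluster_centers_indices_                 # first-stage key frames
    ap2 = AffinityPropagation(random_state=0).fit(refine_features[stage1])
    return stage1[ap2.cluster_centers_indices_]           # final key frame indices

phog = np.random.rand(60, 40)            # stand-in for PHOG features
phog_hist = np.random.rand(60, 56)       # stand-in for fused PHOG-Hist features
print(two_stage_ap(phog, phog_hist))
```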

10.
Online video has become one of the top activities for users and is now easy to access. Meanwhile, how to manage such a huge amount of video data and retrieve it efficiently has become a big issue. In this article, we propose a novel method for video abstraction based on fast clustering of regions of interest (ROIs). First, the key frames in each shot are extracted using the average-histogram algorithm. Second, saliency and edge maps are generated from each key frame; from these two maps, the key points for the visual attention model can be determined. Meanwhile, to expand the regions surrounding the key points, several thresholds are calculated from the corresponding key frame. Third, based on the key points and thresholds, regions of interest are expanded, capturing the main content of each frame. Finally, fast clustering is performed on the key frames using their ROIs. The performance and effectiveness of the proposed video abstraction algorithm are demonstrated by several experimental results.
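The average-histogram step admits a simple reading: pick the frame whose histogram is closest to the shot's mean histogram. A sketch under that assumption:

```python
import numpy as np

def keyframe_by_average_histogram(shot_frames, bins=64):
    """Pick the frame whose normalized histogram is closest (L1 distance) to
    the shot's mean histogram."""
    hists = np.stack([np.histogram(f.ravel(), bins=bins, range=(0, 256))[0]
                      for f in shot_frames]).astype(float)
    hists /= hists.sum(axis=1, keepdims=True)
    mean_hist = hists.mean(axis=0)
    return int(np.argmin(np.abs(hists - mean_hist).sum(axis=1)))

shot = [(np.random.rand(72, 96) * 255).astype(np.uint8) for _ in range(15)]
print(keyframe_by_average_histogram(shot))
```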

11.
Recently, saliency maps computed from input images have been used to detect interesting regions in images and videos and to focus processing on these salient regions. This paper introduces a novel macroblock-level, visual-saliency-guided video compression algorithm, modelled as a two-step process: salient region detection and frame foveation. Visual saliency is modelled as a combination of low-level features and the high-level features that become important in the higher-level visual cortex. A relevance vector machine is trained over three-dimensional feature vectors, pertaining to global, local and rarity measures of conspicuity, to yield probabilistic values that form the saliency map. These saliency values are used for non-uniform bit allocation over video frames. To achieve these goals, we also propose a novel video compression architecture that incorporates saliency to save a tremendous amount of computation. The architecture is based on thresholding the mutual information between successive frames to flag frames requiring recomputation of saliency, and on using motion vectors to propagate saliency values.
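A minimal sketch of the MI thresholding that flags frames for saliency recomputation; the threshold is an assumed value, and the motion-vector propagation on unflagged frames is only indicated in a comment:

```python
import numpy as np

def needs_recompute(prev_gray, cur_gray, mi_threshold=1.5, bins=32):
    """Flag the current frame for full saliency recomputation when its mutual
    information with the previous frame drops below the threshold."""
    joint, _, _ = np.histogram2d(prev_gray.ravel(), cur_gray.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = (p[nz] * np.log2(p[nz] / np.outer(px, py)[nz])).sum()
    return mi < mi_threshold

a = (np.random.rand(60, 80) * 255).astype(np.uint8)
print(needs_recompute(a, a))  # identical frames share high MI: no recomputation
# on unflagged frames the previous saliency map would be propagated along the
# decoded motion vectors instead of being recomputed (not shown)
```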

12.
This paper presents a novel approach to automatically extract salient video objects based on a visual attention mechanism and a seeded object-growing technique. First, a dynamic visual attention model that captures object motion via global motion estimation and compensation is constructed; combining it with a static attention model yields a saliency map. Then, with a modified inhibition-of-return (MIOR) strategy, a winner-take-all (WTA) neural network scans the saliency map, and the most salient locations are selected as attention seeds. Lastly, the particle swarm optimization (PSO) algorithm is employed to grow the attention objects, modeled by a Markov random field (MRF), from the seeds. Experiments verify that the presented approach can extract both stationary and moving salient objects efficiently.
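The WTA scan with inhibition of return reduces to repeatedly taking the map maximum and suppressing its neighbourhood; a sketch follows (the MIOR details and the PSO/MRF growing are omitted, and the seed count and radius are assumptions):

```python
import numpy as np

def attention_seeds(saliency, n_seeds=3, inhibit_radius=15):
    """Winner-take-all scan with inhibition of return: repeatedly take the
    global maximum of the saliency map, then suppress a disc around it."""
    s = saliency.astype(float).copy()
    h, w = s.shape
    yy, xx = np.mgrid[0:h, 0:w]
    seeds = []
    for _ in range(n_seeds):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        seeds.append((int(y), int(x)))
        s[(yy - y) ** 2 + (xx - x) ** 2 <= inhibit_radius ** 2] = 0.0  # IOR
    return seeds

print(attention_seeds(np.random.rand(60, 80)))
```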

13.
In recent years, many computational models for saliency prediction have been introduced. For dynamic scenes, existing models typically combine feature maps extracted from the spatial and temporal domains, either by following generic integration strategies such as averaging or winner-take-all, or by using machine learning techniques to set each feature's importance. Rather than resorting to these fixed feature integration schemes, in this paper we propose a novel weakly supervised dynamic saliency model called HedgeSal, based on a decision-theoretic online learning scheme. Our framework uses two pretrained deep static saliency models as experts to extract individual saliency maps from the appearance and motion streams, and then generates the final saliency map from the weighted decisions of all these models. As the visual characteristics of dynamic scenes constantly vary, the models providing consistently good predictions in the past are automatically assigned higher weights, allowing each expert to adjust itself to the current conditions. We demonstrate the effectiveness of our model on the CRCNS, UCFSports and CITIUS datasets.
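The decision-theoretic scheme is in the spirit of the Hedge algorithm; a sketch of one fusion-and-update round, where the per-expert losses and the learning rate are assumptions (HedgeSal's actual loss definition may differ):

```python
import numpy as np

def hedge_fuse(expert_maps, weights, losses, eta=0.5):
    """One Hedge-style round: combine the expert saliency maps with the current
    weights, then exponentially discount each expert by its incurred loss."""
    maps = np.stack(expert_maps)
    fused = np.tensordot(weights, maps, axes=1)            # weighted decision
    weights = weights * np.exp(-eta * np.asarray(losses))
    return fused, weights / weights.sum()                  # renormalize

appearance = np.random.rand(48, 64)    # appearance-stream expert's map
motion = np.random.rand(48, 64)        # motion-stream expert's map
w = np.array([0.5, 0.5])
fused, w = hedge_fuse([appearance, motion], w, losses=[0.2, 0.7])
print(w)  # the appearance expert, with the lower loss, gains weight
```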

14.
Recent advances in technology have increased the availability of video data, creating a strong requirement for efficient systems to manage these materials. Making efficient use of video information requires that the data be accessible in a user-friendly way. Ideally, one would like to understand a video's content without having to watch it in its entirety. This has been the goal of a quickly evolving research area known as video summarization. In this paper, we present a novel approach for video summarization that works in the compressed domain and allows progressive generation of a video summary. The proposed method relies on visual features extracted from the video stream and on a simple, fast algorithm to summarize the video content. Experiments on a TRECVID 2007 dataset show that our approach produces high quality relative to state-of-the-art solutions, in a computational time that makes it suitable for online use.

15.
In this paper, a novel face segmentation algorithm based on a facial saliency map (FSM) is proposed for head-and-shoulder video applications. The method consists of three stages. The first stage generates the saliency map of the input video image using our proposed facial attention model. In the second stage, a geometric model and an eye map built from chrominance components are employed to localize the face region according to the saliency map. The third stage involves adaptive boundary correction and final face contour extraction. Based on the segmentation result, an effective boundary saliency map (BSM) is then constructed and applied to the tracking-based segmentation of successive frames. Experimental evaluation on test sequences shows that the proposed method segments the face area quite effectively.

16.
Video summarization is an effective way to quickly obtain the key information of a video, but existing video summarization methods are usually computationally complex and hard to apply in practice under limited computing resources. To this end, an efficient joint spatiotemporal surveillance-video summarization method that takes direction information into account is proposed. The method first uses horizontal slices to obtain the spatiotemporal motion trajectories of targets; next, it removes the background from the spatiotemporal trajectories and computes the slopes of the straight-line trajectories, determining each target's motion direction from its trajectory slope; it then detects motion segments in the sampling domain to determine the target's temporal position in the video; finally, it adaptively constructs the video summary according to the targets' temporal positions and motion directions. Experimental results show that the average frame processing time (AFPT) of the proposed method reaches 0.374 s, clearly outperforming the comparison methods, and that the generated summaries are concise, efficient, and provide a good user experience.
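The direction decision reduces to the sign of a fitted trajectory slope; a sketch assuming trajectories are given as (frame index, x position) pairs:

```python
import numpy as np

def motion_direction(trajectory):
    """Fit a line to a slice trajectory given as (frame index, x position)
    pairs; the sign of the slope indicates the horizontal motion direction."""
    t, x = np.asarray(trajectory, dtype=float).T
    slope = np.polyfit(t, x, 1)[0]
    return "left-to-right" if slope > 0 else "right-to-left"

traj = [(f, 5 + 2.3 * f) for f in range(30)]   # target drifting rightwards
print(motion_direction(traj))
```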

17.
State-of-the-art methods for video resizing usually produce perceivable visual discontinuities, so preserving visual continuity across video frames is one of the most critical issues. In this paper, we propose a novel approach for modeling dynamic visual attention based on spatiotemporal analysis, in order to detect the focus of interest automatically. Continuously varying co-sited blocks in a video cube are first detected and their variations characterized as visual cubes, which are then employed to determine a proper extent of salient regions in video frames. Once the proper extent through the video cubes is determined, the resizing process can find the global optimum. Our experiments show that the proposed content-aware video resizing based on spatiotemporal visual cubes can effectively generate resized videos while preserving isotropic manipulation and the continuous dynamics of visual perception.

18.
With the emerging development of three-dimensional (3D) technologies, 3D visual saliency modeling is becoming particularly important and challenging. This paper presents a new depth-perception and visual-comfort guided saliency computational model for stereoscopic 3D images. The prominent advantage of the proposed model is that it incorporates the influence of depth perception and visual comfort on 3D visual saliency computation. The model comprises three components: 2D image saliency, depth saliency and visual-comfort-based saliency. Color saliency, texture saliency and spatial compactness are computed respectively and fused to derive the 2D image saliency, while global disparity contrast is used to compute the depth saliency. In particular, we train a visual comfort prediction function to classify a stereoscopic image pair as high-comfort stereo viewing (HCSV) or low-comfort stereo viewing (LCSV), and devise different computational rules to generate the visual-comfort-based saliency map. The final 3D saliency map is obtained by a linear combination and enhanced by a "saliency-center bias" model. Experimental results show that the proposed 3D saliency model outperforms state-of-the-art models in predicting human eye fixations and visual comfort assessment.
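A sketch of the final linear fusion plus a Gaussian center-bias enhancement; the weights and bias width are illustrative assumptions, not the paper's trained values:

```python
import numpy as np

def saliency_3d(s2d, s_depth, s_comfort, w=(0.5, 0.3, 0.2), sigma_ratio=0.3):
    """Linear combination of the three component maps followed by a Gaussian
    'saliency-center bias' enhancement."""
    fused = w[0] * s2d + w[1] * s_depth + w[2] * s_comfort
    h, wd = fused.shape
    yy, xx = np.mgrid[0:h, 0:wd]
    cy, cx = (h - 1) / 2, (wd - 1) / 2
    bias = np.exp(-(((yy - cy) / (sigma_ratio * h)) ** 2 +
                    ((xx - cx) / (sigma_ratio * wd)) ** 2) / 2)
    return fused * bias

maps = [np.random.rand(48, 64) for _ in range(3)]
print(saliency_3d(*maps).shape)
```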

19.
We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can be used as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach over a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevance of a video frame to the GMM-based spatiotemporal video model. This lets us use a small set of salient frames to facilitate model training by reducing data redundancy and irrelevance. A modified expectation-maximization algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimate for video segmentation. Interestingly, frame saliency also reflects certain object behaviors, which makes the proposed method applicable to other frame-related video analysis tasks, such as key frame extraction and video skimming. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method.
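A plain-EM stand-in for the paper's modified EM, using scikit-learn's GaussianMixture over 6-D (x, y, t, R, G, B) pixel features; the frame-saliency estimation is omitted, so this is only the baseline framework:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def spatiotemporal_segment(frames, n_segments=4):
    """Fit a GMM over 6-D (x, y, t, R, G, B) features and label every pixel
    of every frame with its mixture component."""
    feats = []
    for t, frame in enumerate(frames):
        h, w, _ = frame.shape
        yy, xx = np.mgrid[0:h, 0:w]
        feats.append(np.column_stack([xx.ravel(), yy.ravel(),
                                      np.full(h * w, t),
                                      frame.reshape(-1, 3)]))
    X = np.vstack(feats).astype(float)
    gmm = GaussianMixture(n_components=n_segments, random_state=0).fit(X)
    return gmm.predict(X).reshape(len(frames), h, w)

video = [(np.random.rand(24, 32, 3) * 255).astype(np.uint8) for _ in range(5)]
print(spatiotemporal_segment(video).shape)  # (5, 24, 32)
```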

20.
Fast detection of visually salient objects in remote sensing images based on region growing   Cited by: 4 (self-citations: 3, citations by others: 4)
Zhang Libao. Chinese Journal of Lasers (中国激光), 2012, 39(11): 1114001
To address the high computational complexity and low detection accuracy of traditional visual attention models in detecting visually salient regions of remote sensing images, a new fast detection algorithm for visually salient regions is proposed. First, an integer wavelet transform is used to reduce the spatial resolution of the remote sensing image, thereby lowering the computational complexity of detecting the focus of attention; then, a two-dimensional discrete moment transform is introduced into the visual feature fusion to generate a saliency map richer in edge and texture information; finally, a region-growing strategy is proposed in the saliency-map analysis to obtain accurate contours of the visually salient regions. Experimental results show that the new algorithm not only effectively reduces the computational complexity of salient-region detection in remote sensing images but also accurately describes the contours of the salient regions, while avoiding segmentation and feature extraction over the entire image, providing a useful reference for future object detection in remote sensing imagery.
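A sketch of the region-growing stage: a 4-connected flood fill from the most salient point, accepting pixels above an assumed saliency threshold (the wavelet pre-processing and moment-based fusion are omitted):

```python
import numpy as np
from collections import deque

def region_grow(saliency, seed, threshold=0.5):
    """Grow a salient region from a seed: 4-connected flood fill that accepts
    neighbours whose saliency stays above the threshold."""
    h, w = saliency.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and saliency[ny, nx] >= threshold):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

s = np.random.rand(40, 40)
seed = tuple(np.unravel_index(np.argmax(s), s.shape))  # start at the focus of attention
print(region_grow(s, seed).sum(), "pixels in the salient region")
```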
