期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient visual attention based framework for extracting key frames from videos

Naveed Ejaz Irfan Mehmood Sung Wook Baik 《Signal Processing: Image Communication》2013,28(1):34-44

The huge amount of video data on the internet requires efficient video browsing and retrieval strategies. One of the viable solutions is to provide summaries of the videos in the form of key frames. The video summarization using visual attention modeling has been used of late. In such schemes, the visually salient frames are extracted as key frames on the basis of theories of human attention modeling. The visual attention modeling schemes have proved to be effective in video summarization. However, the high computational costs incurred by these techniques limit their applicability in practical scenarios. In this context, this paper proposes an efficient visual attention model based key frame extraction method. The computational cost is reduced by using the temporal gradient based dynamic visual saliency detection instead of the traditional optical flow methods. Moreover, for static visual saliency, an effective method employing discrete cosine transform has been used. The static and dynamic visual attention measures are fused by using a non-linear weighted fusion method. The experimental results indicate that the proposed method is not only efficient, but also yields high quality video summaries. 相似文献

2.

Video saliency detection incorporating temporal information in compressed domain

《Signal Processing: Image Communication》2015

Saliency detection is widely used to pick out relevant parts of a scene as visual attention regions for various image/video applications. Since video is increasingly being captured, moved and stored in compressed form, there is a need for detecting video saliency directly in compressed domain. In this study, a compressed video saliency detection algorithm is proposed based on discrete cosine transformation (DCT) coefficients and motion information within a visual window. Firstly, DCT coefficients and motion information are extracted from H.264 video bitstream without full decoding. Due to a high quantization parameter setting in encoder, skip/intra is easily chosen as the best prediction mode, resulting in a large number of blocks with zero motion vector and no residual existing in video bitstream. To address these problems, the motion vectors of skip/intra coded blocks are calculated by interpolating its surroundings. In addition, a visual window is constructed to enhance the contrast of features and to avoid being affected by encoder. Secondly, after spatial and temporal saliency maps being generated by the normalized entropy, a motion importance factor is imposed to refine the temporal saliency map. Finally, a variance-like fusion method is proposed to dynamically combine these maps to yield the final video saliency map. Experimental results show that the proposed approach significantly outperforms other state-of-the-art video saliency detection models. 相似文献

3.

Visual saliency guided video compression algorithm

Rupesh Gupta Meera Thapar Khanna Santanu Chaudhury 《Signal Processing: Image Communication》2013,28(9):1006-1022

Recently Saliency maps from input images are used to detect interesting regions in images/videos and focus on processing these salient regions. This paper introduces a novel, macroblock level visual saliency guided video compression algorithm. This is modelled as a 2 step process viz. salient region detection and frame foveation. Visual saliency is modelled as a combination of low level, as well as high level features which become important at the higher-level visual cortex. A relevance vector machine is trained over 3 dimensional feature vectors pertaining to global, local and rarity measures of conspicuity, to yield probabilistic values which form the saliency map. These saliency values are used for non-uniform bit-allocation over video frames. To achieve these goals, we also propose a novel video compression architecture, incorporating saliency, to save tremendous amount of computation. This architecture is based on thresholding of mutual information between successive frames for flagging frames requiring re-computation of saliency, and use of motion vectors for propagation of saliency values. 相似文献

4.

Video abstraction based on the visual attention model and online clustering

Qing-Ge Ji Zhi-Dang Fang Zhen-Hua Xie Zhe-Ming Lu 《Signal Processing: Image Communication》2013,28(3):241-253

With the fast evolution of digital video, research and development of new technologies are greatly needed to lower the cost of video archiving, cataloging and indexing, as well as improve the efficiency and accessibility of stored video sequences. A number of methods to respectively meet these requirements have been researched and proposed. As one of the most important research topics, video abstraction helps to enable us to quickly browse a large video database and to achieve efficient content access and representation. In this paper, a video abstraction algorithm based on the visual attention model and online clustering is proposed. First, shot boundaries are detected and key frames in each shot are extracted so that consecutive key frames in a shot have the same distance. Second, the spatial saliency map indicating the saliency value of each region of the image is generated from each key frame and regions of interest (ROI) is extracted according to the saliency map. Third, key frames, as well as their corresponding saliency map, are passed to a specific filter, and several thresholds are used so that the key frames containing less information are discarded. Finally, key frames are clustered using an online clustering method based on the features in ROIs. Experimental results demonstrate the performance and effectiveness of the proposed video abstraction algorithm. 相似文献

5.

Compressed domain video saliency detection using global and local spatiotemporal features

《Journal of Visual Communication and Image Representation》2016

A compressed domain video saliency detection algorithm, which employs global and local spatiotemporal (GLST) features, is proposed in this work. We first conduct partial decoding of a compressed video bitstream to obtain motion vectors and DCT coefficients, from which GLST features are extracted. More specifically, we extract the spatial features of rarity, compactness, and center prior from DC coefficients by investigating the global color distribution in a frame. We also extract the spatial feature of texture contrast from AC coefficients to identify regions, whose local textures are distinct from those of neighboring regions. Moreover, we use the temporal features of motion intensity and motion contrast to detect visually important motions. Then, we generate spatial and temporal saliency maps, respectively, by linearly combining the spatial features and the temporal features. Finally, we fuse the two saliency maps into a spatiotemporal saliency map adaptively by comparing the robustness of the spatial features with that of the temporal features. Experimental results demonstrate that the proposed algorithm provides excellent saliency detection performance, while requiring low complexity and thus performing the detection in real-time. 相似文献

6.

Spatiotemporal saliency detection and salient region determination for H.264 videos

Kang-Ting Hu Jin-Jang Leou Han-Hui Hsiao 《Journal of Visual Communication and Image Representation》2013,24(7):760-772

In this study, a spatiotemporal saliency detection and salient region determination approach for H.264 videos is proposed. After Gaussian filtering in Lab color space, the phase spectrum of Fourier transform is used to generate the spatial saliency map of each video frame. On the other hand, the motion vector fields from each H.264 compressed video bitstream are backward accumulated. After normalization and global motion compensation, the phase spectrum of Fourier transform for the moving parts is used to generate the temporal saliency map of each video frame. Then, the spatial and temporal saliency maps of each video frame are combined to obtain its spatiotemporal saliency map using adaptive fusion. Finally, a modified salient region determination scheme is used to determine salient regions (SRs) of each video frame. Based on the experimental results obtained in this study, the performance of the proposed approach is better than those of two comparison approaches. 相似文献

7.

A novel video abstraction method based on fast clustering of the regions of interest in key frames

《AEUE-International Journal of Electronics and Communications》2014,68(8):783-794

Online video nowadays has become one of the top activities for users and has become easy to access. In the meantime, how to manage such huge amount of video data and retrieve them efficiently has become a big issue. In this article, we propose a novel method for video abstraction based on fast clustering of the regions of interest (ROIs). Firstly, the key-frames in each shot are extracted using the average histogram algorithm. Secondly, the saliency and edge maps are generated from each key-frame. According to these two maps, the key points for the visual attention model can be determined. Meanwhile, in order to expand the regions surrounding the key points, several thresholds are calculated from the corresponding key-frame. Thirdly, based on the key points and thresholds, several regions of interest are expanded and thus the main content in each frame is obtained. Finally, the fast clustering method is performed on the key frames by utilizing their ROIs. The performance and effectiveness of the proposed video abstraction algorithm is demonstrated by several experimental results. 相似文献

8.

Activity-driven content adaptation for effective video summarization

Jinchang Ren Jianmin Jiang Yue Feng 《Journal of Visual Communication and Image Representation》2010,21(8):930-938

In this paper, we present a novel method for content adaptation and video summarization fully implemented in compressed-domain. Firstly, summarization of generic videos is modeled as the process of extracted human objects under various activities/events. Accordingly, frames are classified into five categories via fuzzy decision including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others by using two inter-frame measurements. Secondly, human objects are detected using Haar-like features. With the detected human objects and attained frame categories, activity levels for each frame are determined to adapt with video contents. Continuous frames belonging to same category are grouped to form one activity entry as content of interest (COI) which will convert the original video into a series of activities. An overall adjustable quota is used to control the size of generated summarization for efficient streaming purpose. Upon this quota, the frames selected for summarization are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations have proved the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic as domain-specific tasks such as accurate recognition of objects can be avoided. 相似文献

9.

基于超图排序算法的视频摘要

下载免费PDF全文

冀中樊帅飞《电子学报》2017,45(5):1035-1043

视频摘要技术作为一种快速感知视频内容的方式得到了广泛的关注.现有基于图模型的视频摘要方法将视频帧作为顶点,通过边表示两个顶点之间的关系,但并不能很好地捕获视频帧之间的复杂关系.为了克服该缺点,本文提出了一种基于超图排序算法的静态视频摘要方法（Hyper-Graph Ranking based Video Summarization,HGRVS）.HGRVS方法首先通过构建视频超图模型,将任意多个有内在关联的视频帧使用一条超边连接;然后提出一种基于超图排序的视频帧分类算法将视频帧按内容分类;最后通过求解提出的一种优化函数来生成静态视频摘要.在Open Video Project和YouTube两个数据集上的大量主观与客观实验验证了所提HGRVS算法的优良性能. 相似文献

10.

No-reference quality assessment of HEVC video streams based on visual memory modelling

《Journal of Visual Communication and Image Representation》2021

Providing adequate Quality of Experience (QoE) to end-users is crucial for streaming service providers. In this paper, in order to realize automatic quality assessment, a No-Reference (NR) bitstream Human-Vision-System-(HVS)-based video quality assessment (VQA) model is proposed. Inspired by discoveries from the neuroscience community, which suggest there is a considerable overlap between active areas of the brain when engaging in video quality assessment and saliency detection tasks, saliency maps are used in the proposed method to improve the quality assessment accuracy. To this end, saliency maps are first generated from features extracted from the HEVC bitstream. Then, saliency map statistics are employed to create a model of visual memory. Finally, a support vector regression pipeline learns an estimate of the video quality from the visual memory, saliency, and frame features. Evaluations on SJTU dataset indicate that the proposed bitstream based no-reference video quality assessment algorithm achieves a competitive performance. 相似文献

11.

HEVC压缩域的视频摘要关键帧提取方法

下载免费PDF全文

朱树明王凤随程海鹰《信号处理》2019,35(3):481-489

为了提高关键帧提取的准确率，改善视频摘要的质量，提出了一种HEVC压缩域的视频摘要关键帧提取方法。首先，对视频序列进行编解码，在解码中统计HEVC帧内编码PU块的亮度预测模式数目。然后，特征提取是利用统计得到的模式数目构建成模式特征向量，并将其作为视频帧的纹理特征用于关键帧的提取。最后，利用融合迭代自组织数据分析算法(ISODATA）的自适应聚类算法对模式特征向量进行聚类，在聚类结果中选取每个类内中间向量对应的帧作为候选关键帧，并通过相似度对候选关键帧进行再次筛选，剔除冗余帧，得到最终的关键帧。实验结果表明，在Open Video Project数据集上进行的大量实验验证，该方法提取关键帧的精度为79.9%、召回率达到93.6%、F-score为86.2%，有效地改善了视频摘要的质量。相似文献

12.

注意力模型指导下的视频质量评价方法

王淦宋利张文军《电视技术》2014,38(7):11-14,5

在视频质量评价方法中,常常需要对人眼视觉系统做出合理的假设,其中注意力模型就是一个很重要的因素。提出了一种在注意力模型指导下的视频质量评价方法,在图像帧的质量评价中加入了显著性区域信息,使之更能符合人眼视觉特性,并兼顾了视频中的运动信息,在一定程度上提高了客观质量评价方法的性能。相似文献

13.

Selecting salient frames for spatiotemporal video modeling and segmentation.

Xiaomu Song Guoliang Fan 《IEEE transactions on image processing》2007,16(12):3035-3046

We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can be used as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach that involves a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevancy of a video frame to the GMM-based spatiotemporal video modeling. This helps us use a small set of salient frames to facilitate the model training by reducing data redundancy and irrelevance. A modified expectation maximization algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimation for video segmentation. Moreover, it is interesting to find that frame saliency can imply some object behaviors. This makes the proposed method also applicable to other frame-related video analysis tasks, such as key-frame extraction, video skimming, etc. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method. 相似文献

14.

Adaptive key frame extraction for video summarization using an aggregation mechanism

Naveed Ejaz Tayyab Bin Tariq Sung Wook Baik 《Journal of Visual Communication and Image Representation》2012,23(7):1031-1040

Video summarization is a method to reduce redundancy and generate succinct representation of the video data. One of the mechanisms to generate video summaries is to extract key frames which represent the most important content of the video. In this paper, a new technique for key frame extraction is presented. The scheme uses an aggregation mechanism to combine the visual features extracted from the correlation of RGB color channels, color histogram, and moments of inertia to extract key frames from the video. An adaptive formula is then used to combine the results of the current iteration with those from the previous. The use of the adaptive formula generates a smooth output function and also reduces redundancy. The results are compared to some of the other techniques based on objective criteria. The experimental results show that the proposed technique generates summaries that are closer to the summaries created by humans. 相似文献

15.

Finding spatio-temporal salient paths for video objects discovery

《Journal of Visual Communication and Image Representation》2016

Many videos capture and follow salient objects in a scene. Detecting such salient objects is thus of great interests to video analytics and search. However, the discovery of salient objects in an unsupervised way is a challenging problem as there is no prior knowledge of the salient objects provided. Different from existing salient object detection methods, we propose to detect and track salient object by finding a spatio-temporal path which has the largest accumulated saliency density in the video. Inspired by the observation that salient video objects usually appear in consecutive frames, we leverage the motion coherence of videos into the path discovery and make the salient object detection more robust. Without any prior knowledge of the salient objects, our method can detect salient objects of various shapes and sizes, and is able to handle noisy saliency maps and moving cameras. Experimental results on two public datasets validate the effectiveness of the proposed method in both qualitative and quantitative terms. Comparisons with the state-of-the-art methods further demonstrate the superiority of our method on salient object detection in videos. 相似文献

16.

基于视觉显著性的雷达视频舰船检测

周伟何东亮关键李国强《雷达科学与技术》2012,10(1):54-58

将计算机视觉领域的视觉显著性的概念引入常规雷达视频序列中舰船目标的检测,提出了适用于雷达视频的目标显著性表示模型。利用局部对比度来表征目标与杂波在回波强度上的差异,利用运动显著性来提取目标与固定地杂波和起伏海杂波之间的差异,经线性组合形成综合显著图,可快速准确提取目标。将连续多帧提取的结果进行积累,通过分析目标历史轨迹对目标加以确认。最后,利用采集的某型导航雷达实测数据进行的实验表明了方法的有效性。相似文献

17.

A comprehensive survey and mathematical insights towards video summarization

《Journal of Visual Communication and Image Representation》2022

Video Summarization is a technique to reduce the original raw video into a short video summary. Video summarization automates the task of acquiring key frames/segments from the video and combining them to generate a video summary. This paper provides a framework for summarization based on different criteria and also compares different literature work related to video summarization. The framework deals with formulating model for video summarization based on different criteria. Based on target audience/ viewership, number of videos, type of output intended, type of video summary and summarization factor; a model generating video summarization framework is proposed. The paper examines significant research works in the area of video summarization to present a comprehensive review against the framework. Different techniques, perspectives and modalities are considered to preserve the diversity of survey. This paper examines important mathematical formulations to provide meaningful insights for video summarization model creation. 相似文献

18.

Spatial-temporal saliency action mask attention network for action recognition

《Journal of Visual Communication and Image Representation》2020

Recently, video action recognition about two-stream network is still a popular research topic in computer vision. However, most of current two-stream-based methods have two redundancy issues, including: inter-frame redundancy and intra-frame redundancy. To solve the above problems, a Spatial-Temporal Saliency Action Mask Attention network (STSAMANet) is built for action recognition. First, this paper introduces a key-frame mechanism to eliminate inter-frame redundancy. This mechanism can compute key frames on each video sequence to get the greatest difference between frames. Then, Mask R-CNN detection technology is introduced to build a saliency attention layer to eliminate intra-frame redundancy. This layer is to focus on the saliency human body and objects for each action class. We experiment on two public video action datasets, i.e., the UCF101 dataset and Penn Action dataset to verify the effectiveness of our method in action recognition. 相似文献

19.

A novel video salient object extraction method based on visual attention

Yang Yi Jia Ding Jieling Lai 《Signal Processing: Image Communication》2013,28(1):45-54

This paper addresses a novel approach to automatically extract video salient objects based on visual attention mechanism and seeded object growing technique. First, a dynamic visual attention model to capture the object motions by global motion estimation and compensation is constructed. Through combining it with a static attention model, a saliency map is formed. Then, with a modified inhibition of return (MIOR) strategy, the winner-take-all (WTA) neural network is used to scan the saliency map for the most salient locations selected as attention seeds. Lastly, the particle swarm optimization (PSO) algorithm is employed to grow the attention objects modeled by Markov random field (MRF) from the seeds. Experiments verify that our presented approach could extract both of stationary and moving salient objects efficiently. 相似文献

20.

基于提取双选紧密特征的RGB-D显著性检测网络

化春键邹新童蒋毅俞建峰陈莹《光电子．激光》2023,34(10):1026-1035

针对现有算法对不同来源特征之间的交互选择关注度欠缺以及对跨模态特征提取不充分的问题,提出了一种基于提取双选紧密特征的RGB-D显著性检测网络。首先,为了筛选出能够同时增强RGB图像显著区域和深度图像显著区域的特征,引入双向选择模块(bi-directional selection module, BSM);为了解决跨模态特征提取不充分,导致算法计算冗余且精度低的问题,引入紧密提取模块(dense extraction module, DEM);最后,通过特征聚合模块(feature aggregation module, FAM)对密集特征进行级联融合,并将循环残差优化模块(recurrent residual refinement aggregation module, RAM)配合深度监督实现粗显著图的持续优化,最终得到精确的显著图。在4个广泛使用的数据集上进行的综合实验表明,本文提出的算法在4个关键指标方面优于7种现有方法。相似文献