Similar Documents
20 similar documents retrieved.
1.

Video anomaly detection (VAD) automatically recognizes abnormal events in surveillance videos. Existing works have made advances in recognizing whether a video contains abnormal events; however, they cannot temporally localize the abnormal events within videos. This paper presents a novel anomaly attention-based framework for accurate temporal localization of abnormal events. Benefiting from the proposed framework, we can achieve frame-level VAD using only video-level labels, which significantly reduces the burden of data annotation. Our method is an end-to-end deep neural network-based approach containing three modules: an anomaly attention module (AAM), a discriminative anomaly attention module (DAAM), and a generative anomaly attention module (GAAM). Specifically, AAM is trained to generate the anomaly attention, which measures the abnormal degree of each frame, while DAAM and GAAM alternately augment AAM from two different aspects. On the one hand, DAAM enhances AAM by optimizing video-level classification. On the other hand, GAAM adopts a conditional variational autoencoder to model the likelihood of each frame given the attention, thereby refining AAM. As a result, AAM generates higher anomaly scores for abnormal frames and lower anomaly scores for normal frames. Experimental results show that our proposed approach outperforms state-of-the-art methods, which validates the superiority of our AAVAD.
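A minimal PyTorch sketch of the core idea may help: an attention module scores each frame's abnormality, and attention-weighted pooling turns per-frame features into a video-level representation so that frame-level localization can be trained from video-level labels alone. Layer sizes and names are illustrative assumptions, not the authors' AAVAD code.

```python
# Minimal sketch of frame-level anomaly attention trained from video-level
# labels; dimensions and module names are illustrative, not AAVAD's code.
import torch
import torch.nn as nn

class AnomalyAttention(nn.Module):
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, feats):               # feats: (T, feat_dim) frame features
        attn = self.score(feats).squeeze(-1)         # (T,) frame anomaly scores
        # Attention-weighted pooling gives one feature vector per video.
        pooled = (attn.unsqueeze(-1) * feats).sum(0) / (attn.sum() + 1e-6)
        return attn, pooled

# A classifier on `pooled` trained with the video-level label plays the role of
# the discriminative branch (DAAM); at test time `attn` localizes abnormal frames.
```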


2.
Image and video matting are still challenging problems in areas with low foreground-background contrast. Video matting also has the challenge of ensuring temporally coherent mattes because the human visual system is highly sensitive to temporal jitter and flickering. On the other hand, video provides the opportunity to use information from other frames to improve the matte accuracy on a given frame. In this paper, we present a new video matting approach that improves the temporal coherence while maintaining high spatial accuracy in the computed mattes. We build sample sets of temporal and local samples that cover all the color distributions of the object and background over all previous frames. This helps guarantee spatial accuracy and temporal coherence by ensuring that proper samples are found even when distantly located in space or time. An explicit energy term encourages temporal consistency in the mattes derived from the selected samples. In addition, we use localized texture features to improve spatial accuracy in low contrast regions where color distributions overlap. The proposed method results in better spatial accuracy and temporal coherence than existing video matting methods.

3.
In recent years, deep learning has shown excellent performance in artificial intelligence. Deep-learning-based face generation and manipulation techniques can now synthesize realistic forged face videos, also known as deepfakes, that are difficult for the human eye to distinguish from real ones. However, such forged face videos may pose great potential threats to society, for example when used to produce fake political news that incites political violence or interferes with elections. There is therefore an urgent need for detection methods that can actively discover forged face videos. Existing forgery methods tend to leave subtle spatial and temporal artifacts in the forged videos, such as distortions in texture and color or flickering of the face. Mainstream detection methods likewise use deep learning and fall into two categories: frame-based methods, which use convolutional neural networks (CNN) to find spatial artifacts within individual frames, and clip-based methods, which additionally use recurrent neural networks (RNN) to capture temporal artifacts across frames. Both make decisions based on global image information, whereas forgery artifacts usually lie in local regions around the facial features. This paper therefore proposes a unified forged-face-video detection framework that exploits global temporal features and local spatial features. The framework consists of an image feature extraction module, a global temporal feature classification module, and a local spatial feature classification module. Experimental results on the FaceForensics++ dataset show that the proposed method achieves better detection performance than previous methods.
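The two-branch design lends itself to a compact sketch: a shared CNN encodes whole frames (fed to a recurrent global temporal branch) and facial-region crops (fed to a local spatial branch), and the two classification heads are fused. This is a hedged sketch under assumed backbones and dimensions, not the paper's implementation.

```python
# Hedged sketch of a global-temporal + local-spatial deepfake detector;
# the backbone, dimensions, and fusion rule are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalLocalDetector(nn.Module):
    def __init__(self, cnn, feat_dim=512, n_regions=4):
        super().__init__()
        self.cnn = cnn                                  # shared frame encoder
        self.gru = nn.GRU(feat_dim, 256, batch_first=True)  # global temporal
        self.global_head = nn.Linear(256, 2)
        self.local_head = nn.Linear(feat_dim * n_regions, 2)  # local spatial

    def forward(self, frames, region_crops):
        # frames: (B, T, C, H, W); region_crops: (B, R, C, h, w) facial regions
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(f)                              # temporal artifacts
        local = self.cnn(region_crops.flatten(0, 1)).view(b, -1)
        return self.global_head(h[-1]) + self.local_head(local)  # fused logits
```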

4.
Temporal coherence is an important problem in non-photorealistic rendering for videos. In this paper, we present a novel approach to enhance temporal coherence in video painting. Instead of painting on each video frame independently, our approach first partitions the video into multiple motion layers and then places brush strokes on the layers to generate the painted imagery. The extracted motion layers consist of one background layer and several object layers in each frame. The background layers from all frames are aligned into a panoramic image, on which brush strokes are placed to paint the background in one shot. The strokes used to paint object layers are propagated frame by frame using smooth transformations defined by thin-plate splines. Once the background and object layers are painted, they are projected back to each frame and blended to form the final painting results. Thanks to painting a single image, our approach completely eliminates flickering in the background, and temporal coherence on the object layers is also significantly enhanced by the smooth transformations across frames. Additionally, by controlling the painting strokes on different layers, our approach can easily generate painted videos in multiple styles. Experimental results show that our approach is both robust and efficient at generating plausible video paintings.

5.
A Video Summarization Model Based on a Self-Attention Mechanism
To efficiently identify representative content in a video, a video summarization algorithm is proposed that assigns different importance to different frames. First, a long short-term memory (LSTM) network models the temporal relationships of the video sequence; then a self-attention mechanism models the importance of each frame and extracts global features; finally, frames are sampled according to the importance score regressed for each frame, and the model parameters are optimized with a reinforcement-learning strategy. The reinforcement-learning action is defined as selecting or not selecting each frame, the state as the current selection for the video, and the reward signal combines diversity and representativeness costs. Video summarization experiments on two public datasets, SumMe and TVSum, use the F-measure to evaluate the accuracy of different summarization algorithms; the results show that the proposed algorithm outperforms the other methods.
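The diversity-plus-representativeness reward has a standard formulation, sketched below in NumPy: diversity is the mean pairwise dissimilarity of the selected frames, and representativeness rewards selections whose frames are close to all frames (k-medoid style). Function and variable names are illustrative.

```python
# Sketch of a diversity + representativeness reward for a summarization policy.
import numpy as np

def summary_reward(feats, picks):
    """feats: (T, D) L2-normalized frame features; picks: selected frame indices."""
    sel = feats[picks]
    n = len(picks)
    # Diversity: mean pairwise dissimilarity among selected frames.
    sim = sel @ sel.T
    r_div = (1.0 - sim[~np.eye(n, dtype=bool)]).mean() if n > 1 else 0.0
    # Representativeness: every frame should be near some selected frame.
    dists = ((feats[:, None, :] - sel[None, :, :]) ** 2).sum(-1)
    r_rep = np.exp(-np.sqrt(dists.min(axis=1)).mean())
    return r_div + r_rep   # fed back as the policy-gradient reward
```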

6.
Chen, Yu; Hu, Ruimin; Xiao, Jing; Xu, Liang; Wang, Zhongyuan. Multimedia Tools and Applications, 2019, 78(11): 14705-14731.

The rapidly increasing volume of surveillance video data has challenged existing video coding standards. Even though knowledge-based video coding schemes have been proposed to remove the redundancy of moving objects across multiple videos and have achieved great coding-efficiency improvements, they still have difficulty coping with the complicated visual changes of objects resulting from various factors. In this paper, a novel hierarchical knowledge extraction method is proposed. Common knowledge on three coarse-to-fine levels, namely the category level, object level, and video level, is extracted from historical data to model the initial appearance, stable changes, and temporal changes, respectively, for better object representation and redundancy removal. In addition, we apply the extracted hierarchical knowledge to surveillance video coding and establish a hybrid prediction-based coding framework. On the one hand, hierarchical knowledge is projected onto the image plane to generate references for I frames, achieving better prediction performance. On the other hand, we develop a transform-based prediction for P/B frames that reduces computational complexity while improving coding efficiency. Experimental results demonstrate the effectiveness of our proposed method.


7.
Full-frame video stabilization with motion inpainting
Video stabilization is an important video enhancement technology that aims at removing annoying shaky motion from videos. We propose a practical and robust approach to video stabilization that produces full-frame stabilized videos with good visual quality. While most previous methods end up producing smaller-size stabilized videos, our completion method produces full-frame videos by naturally filling in missing image parts through local alignment of image data from neighboring frames. To achieve this, motion inpainting is proposed to enforce spatial and temporal consistency of the completion in both static and dynamic image areas. In addition, image quality in the stabilized video is enhanced with a new practical deblurring algorithm. Instead of estimating point spread functions, our method transfers and interpolates sharper image pixels from neighboring frames to increase the sharpness of the frame. The proposed video completion and deblurring methods enabled us to develop a complete video stabilizer that naturally preserves the original image quality in the stabilized videos. The effectiveness of our method is confirmed by extensive experiments over a wide variety of videos.

8.
We present an algorithm that stylizes an input video into a painterly animation without user intervention. In particular, we focus on pointillist animation with stable temporal coherence. Temporal coherence is an important problem in non-photorealistic rendering for videos. To realize pointillist animation, the various characteristics of pointillism should be considered in the painting process to maintain temporal coherence. For this, we used the particle video algorithm, a recent approach to long-range motion estimation in video. Based on this method, we introduce a method to control the density of particles considering the features of frames and importance maps. Finally, we introduce stroke-propagation methods that minimize the flickering of brush strokes.

9.
10.
We focus on the recognition of human actions in uncontrolled videos that may contain complex temporal structures. It is a difficult problem because of the large intra-class variations in viewpoint, video length, motion pattern, etc. To address these difficulties, we propose a novel system that represents each action class by hidden temporal models. In this system, we represent the crucial action event of each category by a video segment that covers a fixed number of frames and can move temporally within the sequence. To capture temporal structure, the video segment is described by a temporal pyramid model. To capture large intra-class variations, multiple models are combined using an OR operation to represent alternative structures. The index of the model and the start frame of the segment are both treated as hidden variables. We implement a learning procedure based on the latent SVM method. The proposed approach is tested on two difficult benchmarks: the Olympic Sports and HMDB51 data sets. The experimental results show that our system is comparable to state-of-the-art methods in the literature.
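Inference in such a latent model amounts to maximizing the score over the hidden variables, i.e., the model index and the segment's start frame. Below is a NumPy sketch under simplified assumptions (a raw concatenated-feature segment descriptor stands in for the temporal pyramid; names are illustrative).

```python
# Sketch of latent-variable scoring: maximize over model index and start frame.
import numpy as np

def best_latent_score(video_feats, models, seg_len=20):
    """video_feats: (T, D) frame features; models: list of (D * seg_len,)
    weight vectors, one per alternative structure, OR-combined via the max."""
    t_max = len(video_feats) - seg_len
    best = (-np.inf, None, None)
    for m, w in enumerate(models):
        for s in range(t_max + 1):           # slide the segment temporally
            phi = video_feats[s:s + seg_len].ravel()    # segment descriptor
            score = float(w @ phi)
            if score > best[0]:
                best = (score, m, s)         # hidden variables: model, start
    return best                              # (score, model index, start frame)
```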

11.

Videos are tampered with by forgers to modify or remove their content for malicious purposes, and many video authentication algorithms have been developed to detect such tampering. At present, very few standard, diversified tampered-video datasets are publicly available for reliable verification of forensic algorithms. In this paper, we present a Temporal Domain Tampered Video Dataset (TDTVD) of 210 videos created using frame deletion, frame duplication, and frame insertion. Of the 210 videos, 120 are based on Event/Object/Person (EOP) removal or modification, and the remaining 90 are based on Smart Tampering (ST), i.e., multiple tampering. Sixteen original videos from SULFA and 24 original videos from YouTube (VTD dataset) are used to develop the tampered videos. The EOP-based videos include 40 videos for each tampering type (frame deletion, frame insertion, and frame duplication). Each ST-based video contains multiple tamperings, in three categories: (1) 10 frames tampered (deleted, duplicated, or inserted) at 3 different locations, (2) 20 frames tampered at 3 different locations, and (3) 30 frames tampered at 3 different locations in the video. The proposed TDTVD dataset thus covers all temporal-domain tampering types and includes multiply tampered videos. The resulting videos range from 6 s to 18 s in length at a resolution of 320×240 or 640×360 pixels. The database comprises static and dynamic videos with various activities, such as traffic, sports, news, a rolling ball, an airport, a garden, highways, and zooming. The entire dataset is publicly accessible, which will be especially valuable for researchers testing their algorithms. Detailed ground-truth information (tampering type, tampered frames, and tampering location) is provided for each tampered video to support the verification of tampering-detection algorithms. The dataset is compared with the state of the art and validated with two video-tampering detection methods.
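The three tampering operations are simple to state precisely. Below is a Python sketch of how such videos could be generated from a decoded frame list; the parameters are illustrative (the dataset's exact frame positions are documented in its ground truth).

```python
# Sketch of the three temporal tampering operations applied to a frame list.
def delete_frames(frames, start, count):
    return frames[:start] + frames[start + count:]

def duplicate_frames(frames, start, count):
    return frames[:start + count] + frames[start:start + count] + frames[start + count:]

def insert_frames(frames, start, foreign_frames):
    return frames[:start] + list(foreign_frames) + frames[start:]

frames = [f"frame_{i}" for i in range(300)]   # stand-in for decoded frames
# "Smart tampering" applies one operation at several locations, e.g. deleting
# 10 frames at each of 3 positions (later positions shift after each deletion):
tampered = frames
for loc in (50, 140, 230):
    tampered = delete_frames(tampered, loc, 10)
```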


12.
马彦博, 李琳, 陈缘, 赵洋, 胡锐. 图学学报 (Journal of Graphics), 2022, 43(4): 651-658.
To reduce the storage and transmission cost of video, lossy compression is commonly applied to reduce its size, which introduces various compression artifacts and severely degrades subjective quality. Single-frame restoration methods for compressed images use only the limited spatial information of the current frame and are of limited effectiveness. Existing multi-frame methods mostly use inter-frame alignment or temporal structures to exploit neighboring frames for reconstruction, but there is still considerable room for improvement in alignment performance. To address these problems, a compressed-video restoration method based on multi-frame spatio-temporal fusion is proposed: a purpose-designed deep feature extraction block and an adaptive alignment network achieve better alignment and fusion, making full use of multi-frame spatio-temporal information to reconstruct high-quality video. On a public test set (HEVC HM16.5, low-delay P configuration), the method outperforms all compared methods and achieves an average gain of 0.13 dB in peak signal-to-noise ratio (PSNR) over STDF, the current state-of-the-art method. It also leads in subjective comparisons, reconstructing cleaner frames and effectively removing compression artifacts.

13.
王寒光, 王旭光, 汪浩源. 计算机应用 (Journal of Computer Applications), 2016, 36(10): 2849-2853.
To address challenges in video stitching such as real-time performance and the ghosting produced by moving objects, a method is proposed that combines image registration within a circular region of interest (ROI) with simplified processing and graphics processing unit (GPU) acceleration. First, feature points are extracted only inside the ROI, which improves feature-detection efficiency and matching accuracy. Second, to further reduce the time cost and meet real-time requirements, two strategies are adopted: image registration is performed only on the first frame, with subsequent frames fused using the resulting homography matrix; and the GPU's many cores are used for parallel hardware acceleration. In addition, when dynamic objects appear in the field of view, graph segmentation and multi-band blending effectively eliminate ghosting. In experiments stitching two 640×480 video streams, the method reaches 27.8 frames per second, 26.27 times faster than a stitching method based on speeded-up robust features (SURF) and 11.57 times faster than one based on oriented FAST and rotated BRIEF (ORB). The results show that the method can stitch multiple video streams into a high-quality large-scene video in real time.
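A hedged OpenCV sketch of the first-frame-only registration strategy: features are detected only inside the ROI mask, the homography is estimated once, and later frames reuse it. Function names and parameters are illustrative assumptions, not the authors' code.

```python
# Sketch: ROI-masked feature registration once, then homography reuse.
import cv2
import numpy as np

def homography_from_roi(frame_a, frame_b, roi_mask):
    """roi_mask: uint8 mask, nonzero inside the circular ROI."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, roi_mask)
    kp_b, des_b = orb.detectAndCompute(frame_b, roi_mask)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:200]
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

# Register once on the first frame pair, then reuse H for every later frame:
# H = homography_from_roi(first_left, first_right, circular_mask)
# warped = cv2.warpPerspective(left_t, H, (w, h))   # then blend with right_t
```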

14.
Hashing is a common solution for content-based multimedia retrieval that encodes high-dimensional feature vectors into short binary codes. Previous works mainly focus on the image hashing problem. However, these methods cannot be directly used for video hashing, as videos contain not only spatial structure within each frame but also temporal correlation between successive frames. Several researchers have proposed handling this by encoding extracted key frames, but these frame-based methods are time-consuming in real applications. Others have proposed characterizing a video by averaging the spatial features of its frames so that existing image hashing methods can be adopted. Unfortunately, such "video" features do not take the correlation between frames into consideration and may lose temporal information. Therefore, in this paper, we propose a novel unsupervised video hashing framework based on a deep neural network that incorporates the temporal structure as well as the conventional spatial structure. Specifically, the spatial features of videos are obtained with a convolutional neural network, and the temporal features are established via long short-term memory. A time-series pooling strategy is then employed to obtain a single feature vector for each video. The resulting spatio-temporal feature can be applied to many existing unsupervised hashing methods. Experimental results on two real datasets indicate that by employing spatio-temporal features, our hashing method significantly improves the performance of existing methods that deploy only spatial features, and it obtains higher mean average precision than state-of-the-art video hashing methods.
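A minimal PyTorch sketch of this pipeline (CNN frame features, LSTM for temporal structure, time-series pooling, then binarization); the projection and pooling details are illustrative assumptions.

```python
# Sketch: CNN frame features -> LSTM -> temporal pooling -> binary codes.
import torch
import torch.nn as nn

class VideoHasher(nn.Module):
    def __init__(self, cnn, feat_dim=2048, code_bits=64):
        super().__init__()
        self.cnn = cnn                                # spatial structure
        self.lstm = nn.LSTM(feat_dim, 512, batch_first=True)  # temporal
        self.proj = nn.Linear(512, code_bits)

    def forward(self, frames):                        # frames: (B, T, C, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(f)
        pooled = h.mean(dim=1)                        # time-series pooling
        return torch.sign(self.proj(pooled))          # +/-1 binary code
```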

15.
庄燕滨, 桂源, 肖贤建. 计算机应用 (Journal of Computer Applications), 2013, 33(9): 2577-2579.
To address the blurring produced when traditional video compressive sensing reconstructs each frame independently, a video compressive sensing method based on motion estimation and motion compensation is proposed, combining compressive sensing theory with techniques from MPEG-standard video coding to remove the spatial and temporal redundancy of the video signal. Taking full account of the temporal correlation of video sequences, the method first performs forward, backward, and bidirectional prediction and compensation, then reconstructs the motion-prediction residual with the backtracking-based adaptive orthogonal matching pursuit (BAOMP) algorithm, and finally reconstructs the current frame. Experimental results show that the method noticeably improves image quality over frame-by-frame reconstruction and achieves a higher peak signal-to-noise ratio.
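A simplified NumPy sketch of the measure-predict-reconstruct loop: only the residual between the frame and its motion-compensated prediction is recovered from the measurements. Plain OMP (from scikit-learn) stands in for the paper's BAOMP, and the residual is assumed sparse in the pixel domain for brevity; in practice a sparsifying transform would be folded into the sensing matrix.

```python
# Sketch: reconstruct only the motion-prediction residual from CS measurements.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruct_frame(y, phi, x_pred, n_nonzero=32):
    """Recover frame x from measurements y = phi @ x, given a motion-
    compensated prediction x_pred, by reconstructing the residual only."""
    y_residual = y - phi @ x_pred           # measurements of (x - x_pred)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False)
    omp.fit(phi, y_residual)                # residual assumed sparse
    return x_pred + omp.coef_
```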

16.
Despite considerable advances in natural image matting over the last decades, video matting still remains a difficult problem. The main challenges faced by existing methods are the large amount of user input required, and temporal inconsistencies in mattes between pairs of adjacent frames. We present a temporally-coherent matte-propagation method for videos based on PatchMatch and edge-aware filtering. Given an input video and trimaps for a few frames, including the first and last, our approach generates alpha mattes for all frames of the video sequence. We also present a user scribble-based interface for video matting that takes advantage of the efficiency of our method to interactively refine the matte results. We demonstrate the effectiveness of our approach by using it to generate temporally-coherent mattes for several natural video sequences. We perform quantitative comparisons against the state-of-the-art sparse-input video matting techniques and show that our method produces significantly better results according to three different metrics. We also perform qualitative comparisons against the state-of-the-art dense-input video matting techniques and show that our approach produces similar quality results while requiring only about 7% of the amount of user input required by such techniques. These results show that our method is both effective and user-friendly, outperforming state-of-the-art solutions.

17.
Video remains the method of choice for capturing temporal events. However, without access to the underlying 3D scene models, it remains difficult to make object-level edits in a single video or across multiple videos. While it may be possible to explicitly reconstruct the 3D geometries to facilitate these edits, such a workflow is cumbersome, expensive, and tedious. In this work, we present a much simpler workflow to create plausible editing and mixing of raw video footage using only sparse structure points (SSP) directly recovered from the raw sequences. First, we utilize user scribbles to structure the point representations obtained using structure-from-motion on the input videos. The resultant structure points, even when noisy and sparse, are then used to enable various video edits in 3D, including view perturbation, keyframe animation, object duplication and transfer across videos, etc. Specifically, we describe how to synthesize object images from new views adopting a novel image-based rendering technique using the SSPs as proxy for the missing 3D scene information. We propose a structure-preserving image warping on multiple input frames adaptively selected from object video, followed by a spatio-temporally coherent image stitching to compose the final object image. Simple planar shadows and depth maps are synthesized for objects to generate plausible video sequences mimicking real-world interactions. We demonstrate our system on a variety of input videos to produce complex edits, which are otherwise difficult to achieve.

18.
In warning applications such as aircraft take-off and landing, ship navigation, and traffic guidance in foggy weather, rapid, high-definition (HD) dehazing of images and videos is increasingly necessary. Existing dehazing technologies for videos and images have not fully exploited the parallel computing capacity of modern multi-core CPUs and GPUs, leading to long dehazing times or low video frame rates that cannot meet real-time requirements. In this paper, we propose a parallel implementation and optimization method for real-time dehazing of high-definition videos based on a single-image haze removal algorithm. Our optimization takes full advantage of the modern CPU+GPU architecture, increases the parallelism of the algorithm, and greatly reduces the computational complexity and execution time. The optimized OpenCL parallel implementation is integrated into FFmpeg as an independent module. The experimental results show that for a single image, the performance of the optimized OpenCL algorithm is improved by approximately 500% compared with the existing algorithm, and by approximately 153% over the basic OpenCL implementation. 1080p (1920×1080) high-definition hazy video can also be processed at a real-time rate (more than 41 frames per second).
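The abstract does not name the underlying single-image algorithm; a common choice is the dark channel prior, sketched below in NumPy/OpenCV as an assumption. This serial version is exactly the kind of per-pixel workload the paper parallelizes in OpenCL.

```python
# Hedged sketch of dark-channel-prior dehazing (assumed baseline algorithm).
import cv2
import numpy as np

def dehaze_dark_channel(img, patch=15, omega=0.95, t0=0.1):
    """img: float32 BGR in [0, 1]. Returns a dehazed image."""
    kernel = np.ones((patch, patch), np.uint8)
    dark = cv2.erode(img.min(axis=2), kernel)        # dark channel (min filter)
    # Atmospheric light: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate, clamped, then scene radiance recovery.
    t = 1.0 - omega * cv2.erode((img / A).min(axis=2), kernel)
    t = np.clip(t, t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)
```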

19.
Objective: To improve the coding efficiency of High Efficiency Video Coding (HEVC) so that it meets the demand for real-time encoding and transmission of high-resolution, high-frame-rate video. Analysis shows that intra coding unit (CU) partitioning has a decisive influence on HEVC coding efficiency, so improving CU-partitioning efficiency can greatly improve the real-time performance of HEVC encoding. Method: Video data exhibit strong temporal and spatial correlation, and intra CU partitioning results are likewise strongly correlated in time and space, so the partitioning results of the previous frame and of the current frame can be used for prediction. Accordingly, a fast intra CU partitioning algorithm is presented: the coding tree unit (CTU) shape of the current block is first estimated from the temporal correlation between adjacent frames and the spatial correlation within the frame, and the final CTU shape is then decided using the average depth of the co-located CTU in the previous frame, the depths of already-coded CTUs in the current frame, and the corresponding rate-distortion costs. A refresh frame is inserted every fixed number of frames and encoded with the standard HM16.7 CU partitioning to avoid accumulated error from the fast algorithm. Results: Tests on videos of different resolutions and frame rates show that, compared with the HEVC reference model HM16.7, the algorithm saves about 40% of encoding time on average while video quality remains essentially unchanged and the bitrate increases slightly; the bitrate increase for high-resolution, high-frame-rate videos is generally smaller than for low-resolution, low-frame-rate ones. Conclusion: Within the HEVC framework, exploiting the temporal and spatial correlation of video data to optimize intra CU partitioning plays an important role in improving the real-time performance of HEVC encoding, especially for high-resolution, high-frame-rate video.
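An illustrative Python sketch (not the paper's exact rule set) of the central idea: restrict the CU depth search around a prediction derived from the co-located CTU in the previous frame and already-coded neighboring CTUs, falling back to the full search on refresh frames.

```python
# Sketch of restricting the CU depth search from temporal/spatial depth hints.
def depth_search_range(prev_colocated_depths, coded_neighbor_depths,
                       is_refresh_frame, full_range=(0, 3)):
    """Return the (min_depth, max_depth) to try for the current CTU."""
    if is_refresh_frame:          # refresh frames use the full HM16.7 search
        return full_range
    hints = list(prev_colocated_depths) + list(coded_neighbor_depths)
    avg = sum(hints) / len(hints)
    # Search only around the predicted depth, clamped to the legal range.
    return (max(full_range[0], int(avg) - 1), min(full_range[1], int(avg) + 1))

# Example: co-located CTU at average depth 2, neighbors at depths 2 and 3
# -> search depths 1..3 instead of 0..3, skipping the depth-0 RD check.
print(depth_search_range([2, 2, 2, 2], [2, 3], False))  # (1, 3)
```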

20.
Deep-learning-based video super-resolution methods often suffer from either low reconstruction accuracy or excessively long reconstruction time, making it difficult to obtain high-accuracy results in real time. To address this problem, a video super-resolution method based on deep residual networks is proposed that reconstructs video quickly and with high accuracy, achieving real-time reconstruction for lower-resolution videos. An adaptive key-frame discrimination subnetwork adaptively identifies key frames, which are then reconstructed by a high-accuracy key-frame reconstruction subnetwork. For non-key frames, their features are obtained directly by fusing, layer by layer, their own features with motion-estimation features relative to neighboring key frames and with the features of those key frames, so that non-key frames are reconstructed quickly. Experiments on public datasets show that the method achieves fast, high-accuracy video reconstruction with good robustness.
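A hedged PyTorch sketch of the routing idea: a gating head decides whether a frame is a key frame; key frames take the high-accuracy path, while non-key frames fuse their own features with the nearest key frame's. The module names and the gating rule are illustrative assumptions.

```python
# Sketch of key-frame routing for fast video super-resolution.
import torch
import torch.nn as nn

class KeyFrameRouter(nn.Module):
    def __init__(self, feat_net, heavy_sr, light_fuse, threshold=0.5):
        super().__init__()
        self.feat_net, self.heavy_sr, self.light_fuse = feat_net, heavy_sr, light_fuse
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.LazyLinear(1), nn.Sigmoid())
        self.threshold = threshold

    def forward(self, frame, key_feat):
        # frame: (1, C, H, W); batch size 1 assumed for this sketch.
        feat = self.feat_net(frame)
        if self.gate(feat).item() > self.threshold:   # key frame: slow path
            return self.heavy_sr(feat), feat          # also refresh key_feat
        # Non-key frame: fuse own features with the nearest key frame's.
        return self.light_fuse(torch.cat([feat, key_feat], dim=1)), key_feat
```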
