Similar Literature
20 related records found
1.
To address the low accuracy of crowd abnormal-behavior detection in public places and the shortage of training samples, a method for detecting and localizing crowd abnormal behavior based on a deep spatio-temporal convolutional neural network is proposed. First, in view of the characteristics of crowd behavior in surveillance video, the spatial features of static images and the temporal features between consecutive frames are used jointly: two-dimensional convolution is extended into three-dimensional space, and a deep spatio-temporal convolutional network oriented to crowd abnormal-behavior detection and localization is designed. To localize the anomalies, the video is divided into several sub-regions and spatio-temporal data samples are extracted from each sub-region; these samples are then fed into the designed network for training and classification, achieving both detection and localization of crowd abnormal behavior. To cope with the shortage of training samples for the deep spatio-temporal network, a transfer-learning scheme is designed in which the network is pre-trained on a large dataset and then fine-tuned and optimized on the dataset under test. Experimental results show that the method reaches detection accuracies of over 99% and 93% on the public UCSD and Subway datasets, respectively.
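A minimal sketch of the kind of 3D spatio-temporal CNN plus transfer-learning workflow the abstract describes, in PyTorch; the layer sizes, clip dimensions, two-class head, and checkpoint name are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SpatioTemporal3DCNN(nn.Module):
    """Toy 3D CNN that classifies a short video sub-region clip as normal/abnormal."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),  # joint space-time filters over (T, H, W)
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.Linear(32 * 4 * 8 * 8, num_classes)

    def forward(self, x):                    # x: (batch, 1, 8, 32, 32) grayscale clips
        f = self.features(x)
        return self.classifier(f.flatten(1))

# Transfer learning as described: pre-train on a large dataset, then fine-tune on the target data.
model = SpatioTemporal3DCNN()
# model.load_state_dict(torch.load("pretrained_large_dataset.pt"))  # hypothetical checkpoint
out = model(torch.randn(2, 1, 8, 32, 32))
print(out.shape)  # torch.Size([2, 2])
```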

2.
This paper presents an approach for detecting suspicious events in videos by using only the video itself as the training samples for valid behaviors. These salient events are obtained in real-time by detecting anomalous spatio-temporal regions in a densely sampled video. The method codes a video as a compact set of spatio-temporal volumes, while considering the uncertainty in the codebook construction. The spatio-temporal compositions of video volumes are modeled using a probabilistic framework, which calculates their likelihood of being normal in the video. This approach can be considered as an extension of the Bag of Video Words (BOV) approaches, which represent a video as an order-less distribution of video volumes. The proposed method imposes spatial and temporal constraints on the video volumes so that an inference mechanism can estimate the probability density functions of their arrangements. Anomalous events are assumed to be video arrangements with very low frequency of occurrence. The algorithm is very fast and does not employ background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. Experiments were performed on four video datasets of abnormal activities in both crowded and non-crowded scenes and under difficult illumination conditions. The proposed method outperformed all other approaches based on BOV that do not account for contextual information.
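The following toy sketch covers only the order-less bag-of-video-words baseline that the abstract extends (codebook quantization and codeword frequency as a normality score); the paper's actual contribution, modeling the spatio-temporal compositions of volumes, is not reproduced here, and the descriptors and codebook size are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Quantize spatio-temporal volume descriptors into a codebook learned from the video itself,
# then score a query volume by how frequent its codeword is (rare codeword -> suspicious).
train_volumes = np.random.rand(1000, 64)          # descriptors densely sampled from the video (stand-ins)
codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(train_volumes)
counts = np.bincount(codebook.labels_, minlength=50)
prior = counts / counts.sum()                     # frequency of each codeword in the "normal" video

query = np.random.rand(1, 64)
word = codebook.predict(query)[0]
print("likelihood of being normal (BOV baseline only):", prior[word])
```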

3.
The spatio-temporal features of anomalous events in video are strongly correlated, and this correlation affects detection performance. To address this, a video anomaly detection method based on spatio-temporal fusion graph network learning is proposed. For the features of the video segments, a spatial similarity graph and a temporal continuity graph are constructed separately: each segment corresponds to a node in the graph, and edge weights are formed dynamically from the top-k similarity between each node's features and the features of the other nodes, yielding the spatial similarity graph; ...
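A sketch of how the top-k spatial similarity graph over segment features could be assembled; cosine similarity, k = 3, and the feature dimension are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def build_topk_similarity_graph(features, k=3):
    """Adjacency matrix where each video segment (node) keeps edges only to its
    k most similar segments, with cosine similarity as the edge weight."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)               # no self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        topk = np.argsort(sim[i])[-k:]           # indices of the k most similar segments
        adj[i, topk] = sim[i, topk]
    return adj

segments = np.random.rand(10, 128)               # 10 segments, 128-D features (placeholders)
A_spatial = build_topk_similarity_graph(segments)
print(A_spatial.shape)  # (10, 10)
```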

4.
In recent years, deep learning has shown excellent performance in artificial intelligence. Deep-learning-based face generation and manipulation can now synthesize highly realistic forged face videos, also known as deepfakes, which are hard for the human eye to distinguish from genuine ones. Such forged face videos may pose serious potential threats to society, for example by being used to fabricate political fake news, which may trigger political violence or interfere with elections. There is therefore an urgent need for detection methods that can actively uncover forged face videos. Existing forgery pipelines tend to leave subtle spatial and temporal artifacts, such as texture and color distortions or facial flickering. Mainstream detection methods also rely on deep learning and fall into two categories: frame-based methods and clip-based methods. The former use convolutional neural networks (CNNs) to find spatial forgery artifacts within single frames, while the latter additionally use recurrent neural networks (RNNs) to capture temporal forgery artifacts across frames. These methods make decisions from global image information, yet forgery artifacts usually reside in local regions around the facial features. This paper therefore proposes a unified forged-face-video detection framework that exploits global temporal features together with local spatial features. The framework consists of an image feature extraction module, a global temporal feature classification module, and a local spatial feature classification module. Experimental results on the FaceForensics++ dataset show that the proposed method detects forgeries better than previous methods.
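A minimal sketch of the frame-level CNN plus temporal RNN pipeline that the clip-based detectors surveyed here follow; the ResNet-18 backbone, LSTM size, and input resolution are assumptions, and the paper's local-region branch is omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

class FrameSequenceDetector(nn.Module):
    """Per-frame CNN features followed by an LSTM over time, in the spirit of
    clip-based deepfake detectors (dimensions and backbone assumed)."""
    def __init__(self, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()               # 512-D feature per frame
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # real vs. fake

    def forward(self, clip):                      # clip: (B, T, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])

logits = FrameSequenceDetector()(torch.randn(1, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 2])
```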

5.
A Holistic Video Matching Method
柴登峰, 彭群生. 《软件学报》, 2006, 17(9): 1899-1907
A holistic method for spatio-temporal video registration is presented. A spatial registration strategy that combines intra-video matching with inter-video matching is proposed, and the dynamic time warping method is improved for alignment along the time dimension. Intra-video matching tracks the feature points in each frame of a video and records their trajectories; inter-video matching registers frames from different videos, using the trajectory correspondences to provide the initial feature-point correspondences required for image registration, while the feature-point correspondences obtained from image registration are in turn used to establish and update the trajectory correspondences. This matching strategy makes full use of the temporal coherence of video to improve the stability and efficiency of matching, and at the same time improves the coherence of the registered videos. The improved dynamic time warping establishes frame correspondences between two videos by minimizing their overall distance, preserving the temporal order of frames within each video and handling nonlinear offsets.
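A compact sketch of classic dynamic time warping, the component the paper improves for temporal alignment; the Euclidean per-frame distance and feature dimensions are assumptions.

```python
import numpy as np

def dtw_align(seq_a, seq_b):
    """Classic dynamic time warping: the monotonic frame correspondence that
    minimizes the overall distance between two videos' frame features."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

video_a = np.random.rand(30, 64)   # 30 frames, 64-D per-frame features (placeholders)
video_b = np.random.rand(40, 64)
print(dtw_align(video_a, video_b))
```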

6.
Detection of anomalous events such as crowd gathering has broad application prospects in video surveillance, yet related research has progressed relatively slowly in China. A hidden-Markov-model (HMM)-based method for detecting gathering events is presented here; its procedure is briefly as follows: first, on the basis of targets detected with a Gaussian mixture model, and in view of the characteristics of gathering-event video sequences, a two-tuple feature is extracted for each frame image; then, with a reasonably chosen initial model, the Baum-Welch algorithm is used to train the HMM of gathering events; finally, its effectiveness is verified on video sequences captured in real scenes.
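A sketch of the HMM stage using hmmlearn, whose GaussianHMM.fit performs Baum-Welch (EM) training internally; the 2-D per-frame feature, three hidden states, and the likelihood-scoring step are illustrative assumptions.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

# Per-frame two-tuple feature (e.g. foreground area, motion magnitude) -- placeholder values.
gathering_obs = np.random.rand(500, 2)   # observations from clips containing gathering events

# Baum-Welch training is what GaussianHMM.fit performs internally (EM).
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
model.fit(gathering_obs)

# Score a test clip: a high log-likelihood under the gathering-event model flags the event.
test_clip = np.random.rand(60, 2)
print("log-likelihood:", model.score(test_clip))
```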

7.
Face-swapping (deepfake) technology has advanced rapidly in recent years. It can be used to forge videos that influence political actions or obtain improper gains, causing serious harm to society, and it has drawn wide attention from governments and the public. By analyzing the mainstream face-swap generation and detection techniques, this paper points out that current generation methods leave forgery traces and generation losses in both the temporal and spatial domains, while most existing neural-network-based detectors of synthesized face videos consider only spatial features of single images and exhibit obvious overfitting in practice. To address these shortcomings, an efficient detection algorithm that combines the spatial and temporal domains is proposed. The method captures the forgery traces of face-swapped videos in both domains: for the per-frame spatial features, a fully convolutional module with a 3D convolution structure is designed that accurately extracts the forgery traces of each frame in the frame array; for the temporal features of the frame array, a convolutional long short-term memory (ConvLSTM) module is designed that detects temporal forgery traces between forged frames; finally, a feature pyramid network structure is designed for classification, fusing spatio-temporal features of different scales to improve classification performance and reduce overfitting. Compared with existing methods, the proposed method shows clear advantages in training convergence and classification performance; in addition, it uses fewer parameters while maintaining detection accuracy, so training is more efficient than with existing architectures.

8.
Video steganalysis can detect videos that carry hidden secret messages, providing a safeguard for public security. Besides the spatial information within frames, video also contains temporal information between adjacent frames. Exploiting this characteristic, a video steganalysis method based on finely modeled spatio-temporal features is proposed. The method models the feature quantities of video in both the temporal and spatial dimensions in a fine-grained way: Markov modeling of the intra-block and inter-block processes at the spatial level is used to extract spatial features, difference analysis of changes at the temporal level is used to extract temporal features, and the temporal and spatial features are fed into an SVM for training and detection. Test results show that the method can effectively distinguish stego videos from clean videos, achieving a detection accuracy of 97.13% on 3,100 test video samples.
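A sketch of one flavour of first-order Markov transition feature over a difference plane, fed to an SVM, in the spirit of the spatial features the abstract mentions; the clipping threshold, horizontal-only transitions, and toy labels are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def markov_features(plane, T=3):
    """First-order horizontal Markov transition matrix of a clipped difference plane."""
    diff = np.clip(np.diff(plane.astype(int), axis=1), -T, T)
    counts = np.zeros((2 * T + 1, 2 * T + 1))
    for row in diff:
        for a, b in zip(row[:-1], row[1:]):
            counts[a + T, b + T] += 1
    return (counts / max(counts.sum(), 1)).ravel()

# One feature vector per video (here a single frame stands in for a video).
X = np.array([markov_features(np.random.randint(0, 255, (64, 64))) for _ in range(40)])
y = np.array([0] * 20 + [1] * 20)        # 0 = cover, 1 = stego (toy labels)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))
```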

9.
The need for early detection of temporal events from sequential data arises in a wide spectrum of applications ranging from human-robot interaction to video security. While temporal event detection has been extensively studied, early detection is a relatively unexplored problem. This paper proposes a maximum-margin framework for training temporal event detectors to recognize partial events, enabling early detection. Our method is based on Structured Output SVM, but extends it to accommodate sequential data. Experiments on datasets of varying complexity, for detecting facial expressions, hand gestures, and human activities, demonstrate the benefits of our approach.

10.
罗凡波, 王平, 梁思源, 徐桂菲, 王伟. 《计算机工程》, 2020, 46(4): 287-293, 300
Current detection of abnormal crowd behavior in public places has low accuracy over the various anomaly types, and most methods cannot recognize some abnormal behaviors such as sudden running. To address this, a crowd abnormal-behavior recognition algorithm based on YOLO_v3 and sparse optical flow is proposed; by detecting small-group anomalies it provides sufficient time for early warning of group anomalies and for taking corresponding emergency measures. To make it easier to localize where an anomaly occurs, the video is divided into several sub-regions, and image samples of each sub-region are used to detect the small-group anomalies that may trigger a group anomaly. An improved YOLO_v3 network detects anomalies that are hard for traditional algorithms, such as pedestrians carrying sticks, guns, or knives, or with occluded faces. When none of these triggering anomalies is detected, the sparse optical flow method is used to obtain the crowd's average kinetic energy and motion-direction entropy, and the resulting feature data are classified with PSO-ELM to distinguish normal behavior from co-directional or irregular sudden dispersal. Experimental results show that, compared with existing algorithms of the same kind, the proposed algorithm effectively detects small-group anomalies such as armed pedestrians and facial occlusion, and localizes the region where the anomaly occurs with an accuracy of 98.227%.
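A sketch of the two motion statistics the abstract computes from sparse optical flow (mean kinetic energy and motion-direction entropy) using OpenCV Lucas-Kanade tracking; the corner-detection parameters and histogram binning are assumptions, and the PSO-ELM classifier stage is not shown.

```python
import cv2
import numpy as np

def flow_energy_and_direction_entropy(prev_gray, gray, n_bins=8):
    """Sparse Lucas-Kanade flow on corner points, then mean kinetic energy and
    the entropy of the motion-direction histogram."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=5)
    if pts is None:
        return 0.0, 0.0
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    vec = (nxt - pts)[status.ravel() == 1].reshape(-1, 2)
    mag = np.linalg.norm(vec, axis=1)
    ang = np.arctan2(vec[:, 1], vec[:, 0]) % (2 * np.pi)
    energy = float(np.mean(0.5 * mag ** 2)) if len(mag) else 0.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    p = hist / max(hist.sum(), 1e-8)
    entropy = float(-np.sum(p[p > 0] * np.log(p[p > 0])))
    return energy, entropy

f0 = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
f1 = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
print(flow_energy_and_direction_entropy(f0, f1))
```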

11.
12.
In this paper, a method based on convolutional neural networks (CNN) and optical flow is proposed for predicting visual attention in videos. First, a deep-learning framework is employed to extract spatial features from frames, replacing commonly used handcrafted features. Optical flow is calculated to obtain the temporal features of the moving objects in video frames, which tend to draw viewers' attention. By integrating these two groups of features, a hybrid spatio-temporal feature set is obtained and used as the input of a support vector machine (SVM) to predict the degree of visual attention. Finally, two publicly available video datasets were used to test the performance of the proposed model, and the results demonstrate the efficacy of the proposed approach.

13.

Video anomaly detection (VAD) automatically recognizes abnormal events in surveillance videos. Existing works have made advances in recognizing whether a video contains abnormal events; however, they cannot temporally localize the abnormal events within videos. This paper presents a novel anomaly-attention-based framework for accurately localizing abnormal events in time. Benefiting from the proposed framework, we can achieve frame-level VAD using only video-level labels, which significantly reduces the burden of data annotation. Our method is an end-to-end deep neural network approach containing three modules: an anomaly attention module (AAM), a discriminative anomaly attention module (DAAM), and a generative anomaly attention module (GAAM). Specifically, AAM is trained to generate the anomaly attention, which is used to measure the abnormal degree of each frame, while DAAM and GAAM are used to alternately augment AAM from two different aspects. On the one hand, DAAM enhances AAM by optimizing the video-level classification. On the other hand, GAAM adopts a conditional variational autoencoder to model the likelihood of each frame given the attention, refining AAM. As a result, AAM generates higher anomaly scores for abnormal frames and lower anomaly scores for normal frames. Experimental results show that our proposed approach outperforms state-of-the-art methods, which validates the superiority of our AAVAD.
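A toy PyTorch head showing the core idea of frame-level anomaly attention pooled into a video-level prediction (roughly the AAM/DAAM interaction); dimensions are assumptions, and the conditional-VAE module (GAAM) is omitted.

```python
import torch
import torch.nn as nn

class AnomalyAttentionHead(nn.Module):
    """Per-frame anomaly attention scores, attention-pooled into a video-level
    prediction trainable with video-level labels only."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1))
        self.cls = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):                    # (B, T, D)
        a = torch.sigmoid(self.attn(frame_feats))      # (B, T, 1) frame anomaly scores
        pooled = (a * frame_feats).sum(1) / (a.sum(1) + 1e-6)
        return torch.sigmoid(self.cls(pooled)).squeeze(-1), a.squeeze(-1)

video_prob, frame_scores = AnomalyAttentionHead()(torch.randn(2, 32, 512))
print(video_prob.shape, frame_scores.shape)  # torch.Size([2]) torch.Size([2, 32])
```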


14.
The widespread use of video technology produces massive amounts of video data, and it is hardly possible to detect anomalies in surveillance video by human effort alone; automated detection of abnormal behavior is therefore extremely important in fields such as public safety. An anomaly detection method that jointly considers object properties and spatio-temporal context is proposed. The method uses optical-flow texture maps to describe the rigid characteristics of moving objects and builds a temporal-context anomaly detection model based on a hidden Markov model (HMM). On this basis, Radon features of the anomalous objects are extracted and, starting from the preliminary anomaly classification produced by a support vector machine (SVM), an HMM-based spatial-context classification model of anomalous scenes is built. The model is validated on the public UCSD PED2 dataset; the results show that the algorithm not only outperforms existing algorithms in anomaly detection but also provides anomaly classification.
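A sketch of the Radon-feature plus SVM pre-classification stage mentioned in the abstract, using scikit-image and scikit-learn; the patch size, projection angles, pooling, and labels are placeholders, and the HMM context models are not shown.

```python
import numpy as np
from skimage.transform import radon
from sklearn.svm import SVC

def radon_feature(patch, n_angles=18):
    """Radon projections of a candidate-anomaly patch, pooled into a
    fixed-length descriptor for an SVM pre-classifier."""
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(patch, theta=theta, circle=False)  # (projection length, n_angles)
    return sinogram.mean(axis=0)                         # one value per projection angle

X = np.array([radon_feature(np.random.rand(32, 32)) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)                        # toy normal/abnormal labels
svm = SVC().fit(X, y)
print(svm.score(X, y))
```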

15.
With the popularity of multimedia editing tools, more and more forged multimedia content appears on the network, so legal authorities need new techniques to distinguish copyright infringements among the large number of videos on the Internet. Since logo removal is a common editing operation during unauthorized reproduction, logo-removal detection is, to some extent, equivalent to copyright-infringement detection. In this paper, we propose a video forensics framework for logo-removal detection. The framework contains two main stages: removal-trace detection and removal-region localization. In the first stage, we use sparse representation to expose the difference in sparsity between the tampered areas and the original areas. In the second stage, spatial priors and temporal correlations are used to refine the location of the removal regions. Finally, a spatio-temporal suspected region clearly shows the edited regions. The proposed method is validated by extensive experiments on our video logo-removal dataset, showing promising results.

16.
Video capture is limited by the trade-off between spatial and temporal resolution: when capturing videos of high temporal resolution, the spatial resolution decreases due to bandwidth limitations in the capture system. Achieving both high spatial and temporal resolution is only possible with highly specialized and very expensive hardware, and even then the same basic trade-off remains. The recent introduction of compressive sensing and sparse reconstruction techniques allows for the capture of single-shot high-speed video, by coding the temporal information in a single frame, and then reconstructing the full video sequence from this single-coded image and a trained dictionary of image patches. In this paper, we first analyse this approach, and find insights that help improve the quality of the reconstructed videos. We then introduce a novel technique, based on convolutional sparse coding (CSC), and show how it outperforms the state-of-the-art, patch-based approach in terms of flexibility and efficiency, due to the convolutional nature of its filter banks. The key idea for CSC high-speed video acquisition is extending the basic formulation by imposing an additional constraint in the temporal dimension, which enforces sparsity of the first-order derivatives over time.
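The "additional constraint in the temporal dimension" admits a compact statement; one plausible way to write the extended CSC objective is sketched below (notation assumed, not taken from the paper): y is the single coded image, \Phi the temporal coding operator, d_k the convolutional filters, z_k the sparse coefficient maps over space-time, and \nabla_t the first-order temporal difference.

```latex
\min_{\{z_k\}}\;
\tfrac{1}{2}\Big\| y - \Phi\Big(\textstyle\sum_k d_k * z_k\Big) \Big\|_2^2
\;+\; \lambda \sum_k \| z_k \|_1
\;+\; \mu \sum_k \| \nabla_t z_k \|_1
```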

17.
Abnormal crowd behavior detection is an important research issue in computer vision. Traditional methods first extract local spatio-temporal cuboids from the video and then describe each cuboid with optical-flow or gradient features, etc. Unfortunately, because of complex environmental conditions such as severe occlusion and over-crowding, the existing algorithms cannot be applied efficiently. In this paper, we derive high-frequency and spatio-temporal (HFST) features to detect abnormal crowd behaviors in videos. They are obtained by applying the wavelet transform to the planes in the cuboid that are parallel to the time direction; the high-frequency information characterizes the dynamic properties of the cuboid. The HFST features are applied to both global and local abnormal crowd behavior detection. For global detection, latent Dirichlet allocation is used to model the normal scenes; for local detection, multiple hidden Markov models with a competitive mechanism are employed to model the normal scenes. Comprehensive experimental results show that the detection speed is greatly improved with our approach, and good accuracy is achieved in terms of the false-positive and false-negative detection rates.
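A sketch of how a high-frequency spatio-temporal descriptor could be computed with a 2D wavelet transform over x-t slices of a cuboid (the planes parallel to the time direction mentioned in the abstract); the Haar wavelet, slice pooling, and cuboid size are assumptions.

```python
import numpy as np
import pywt  # pip install PyWavelets

def hfst_feature(cuboid):
    """Wavelet-decompose each x-t slice of a video cuboid and keep the energy
    of the high-frequency sub-bands as a dynamic-property descriptor."""
    energies = []
    for y in range(cuboid.shape[1]):
        xt_plane = cuboid[:, y, :]                       # (time, width) slice
        _, (lh, hl, hh) = pywt.dwt2(xt_plane, "haar")    # high-frequency sub-bands
        energies.append([np.mean(lh ** 2), np.mean(hl ** 2), np.mean(hh ** 2)])
    return np.mean(energies, axis=0)                     # 3-D descriptor per cuboid

cuboid = np.random.rand(8, 16, 16)                       # (time, height, width) placeholder
print(hfst_feature(cuboid))
```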

18.
Methods for detecting and tracking moving people in video sequences captured by a fixed camera are studied. Human targets are detected using the bidirectional projection information of the grayscale frame-difference image, a segmentation algorithm based on fixed ratios of the statistical geometric features of motion regions is proposed, and nearest-neighbor matching is used to track people. A complete and effective real-time crowd-counting system is implemented. Extensive indoor and outdoor experiments show that the algorithm has good real-time performance (processing 25-30 frames per second, with four video channels handled in parallel), robustness to illumination changes, and high detection accuracy for sparse crowds.
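A sketch of the grayscale frame-difference and bidirectional (row/column) projection step the abstract uses to locate moving people under a fixed camera; the binarization threshold and frame size are assumptions, and the segmentation and nearest-neighbor tracking stages are not shown.

```python
import cv2
import numpy as np

def difference_projections(prev_gray, gray, thresh=25):
    """Grayscale frame difference, then the row/column (bidirectional)
    projection profiles used to locate moving people."""
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 1, cv2.THRESH_BINARY)
    vertical = mask.sum(axis=0)     # projection onto the x-axis (columns)
    horizontal = mask.sum(axis=1)   # projection onto the y-axis (rows)
    return vertical, horizontal

f0 = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
f1 = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
v, h = difference_projections(f0, f1)
print(v.shape, h.shape)  # (320,) (240,)
```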

19.
20.
Tracking pedestrians is a vital component of many computer vision applications, including surveillance, scene understanding, and behavior analysis. Videos of crowded scenes present significant challenges to tracking due to the large number of pedestrians and the frequent partial occlusions that they produce. The movement of each pedestrian, however, contributes to the overall crowd motion (i.e., the collective motions of the scene's constituents over the entire video) that exhibits an underlying spatially and temporally varying structured pattern. In this paper, we present a novel Bayesian framework for tracking pedestrians in videos of crowded scenes using a space-time model of the crowd motion. We represent the crowd motion with a collection of hidden Markov models trained on local spatio-temporal motion patterns, i.e., the motion patterns exhibited by pedestrians as they move through local space-time regions of the video. Using this unique representation, we predict the next local spatio-temporal motion pattern a tracked pedestrian will exhibit based on the observed frames of the video. We then use this prediction as a prior for tracking the movement of an individual in videos of extremely crowded scenes. We show that our approach of leveraging the crowd motion enables tracking in videos of complex scenes that present unique difficulty to other approaches.
