Similar Articles
10 similar articles retrieved.
1.
In visual analysis, the same human action can be understood in completely different ways depending on the scene. To judge whether behavior is anomalous in a given scene, a two-layer bag-of-words model is used in the surveillance system: video information is placed in the first-layer bag, and scene-specific action text words are placed in the second-layer bag. A video is represented by a spatio-temporal dictionary composed of space-time interest points, and the nature of an action is determined by the set of action text words for the given scene. A probabilistic latent semantic analysis (pLSA) model not only learns the distribution of spatio-temporal words automatically and finds the corresponding action category, but also, under supervision, learns the distribution of action text words for a given scene and separates the corresponding anomalous and normal outcomes. After training, the algorithm can classify the behavior in a new video as anomalous or normal for the corresponding scene.
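As a rough illustration of the pLSA step, here is a minimal EM fitting routine over a document-word count matrix, assuming the space-time interest points have already been quantized into visual words; the function name, parameters, and two-layer wiring are simplifications, not the paper's implementation:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a (n_docs, n_words) count matrix.

    Illustrative sketch, not the paper's code. Returns P(topic|doc)
    and P(word|topic). Dense responsibilities are kept in memory,
    which is fine for small vocabularies.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, words, topics)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        weighted = counts[:, :, None] * joint
        # M-step: re-estimate both conditional distributions
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```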

2.
This paper discusses the task of continuous human action recognition. Continuous here refers to videos that contain multiple actions joined end to end. This task is important to applications like video surveillance and content-based video retrieval. It aims to identify the action category and detect the start and end key frame of each action. It is a challenging task due to the frequent changes of human actions and the ambiguity of action boundaries. In this paper, a novel and efficient continuous action recognition framework is proposed. Our approach is based on the bag-of-words representation. A visual local pattern is regarded as a word and the action is modeled by the distribution of words. A generative translation- and scale-invariant probabilistic Latent Semantic Analysis model is presented. The continuous action recognition result is obtained frame by frame and updated over time. Experimental results show that this approach is effective and efficient in recognizing both isolated actions and continuous actions.
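A hedged sketch of the frame-by-frame decision loop described above, using a plain sliding-window bag-of-words histogram in place of the paper's translation- and scale-invariant pLSA model; `classify`, the window size, and the input format are assumptions:

```python
import numpy as np

def continuous_labels(frame_words, n_vocab, classify, win=30):
    """Assign an action label to each frame of a long video.

    Illustrative sketch only. frame_words: one list of visual-word
    ids per frame. classify: callable mapping a normalized word
    histogram to an action label. A sliding window around each frame
    stands in for the paper's frame-by-frame model updating.
    """
    n = len(frame_words)
    labels = []
    for t in range(n):
        lo, hi = max(0, t - win), min(n, t + win + 1)
        hist = np.zeros(n_vocab)
        for f in range(lo, hi):
            for w in frame_words[f]:
                hist[w] += 1
        total = hist.sum()
        labels.append(classify(hist / total if total else hist))
    # action boundaries fall where the label sequence changes
    changes = [t for t in range(1, n) if labels[t] != labels[t - 1]]
    return labels, changes
```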

3.
Traditional human motion analysis and action scoring techniques based on computer-vision features discriminate local human motion features poorly, making them insensitive to intra-class differences between similar actions and yielding low automatic-scoring accuracy. This paper proposes a locally spatio-temporal-preserving Fisher vector (FV) encoding of human action features in monocular motion video, together with an automatic scoring technique. First, histograms of oriented gradients (HOG) and histograms of optical flow (HOF) are extracted to describe human pose and motion in the video; after ℓ2 normalization and PCA-based dimensionality reduction, discriminative action feature vectors are obtained. Then a spatio-temporal pyramid embeds spatio-temporal structure into the FV encoding, improving the ability to judge the correctness and coordination of actions. Finally, linear models built for each action class determine the action score. Experiments on an aerobics action-scoring dataset show sensitivity of about 94.4% and specificity of about 71.4%, with a median average error of 7.0% relative to expert scores, making the method suitable for evaluating the quality of action completion from monocular motion video in online physical-education teaching and everyday sports training.
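The Fisher vector encoding itself is standard and can be sketched as follows under a diagonal-covariance GMM (mean and variance gradients, followed by power and ℓ2 normalization); the paper's spatio-temporal pyramid would concatenate one such vector per pyramid cell:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descs, gmm):
    """Encode local descriptors (n, d) as a Fisher vector under a
    diagonal-covariance GMM. Standard formulation, sketched here;
    not the paper's exact implementation."""
    n = descs.shape[0]
    gamma = gmm.predict_proba(descs)                 # (n, k) posteriors
    mu, sig = gmm.means_, np.sqrt(gmm.covariances_)  # (k, d) each
    w = gmm.weights_
    diff = (descs[:, None, :] - mu[None]) / sig[None]        # (n, k, d)
    g_mu = (gamma[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_sig = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) \
        / (n * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)         # l2 normalization

# Usage (vocabulary size is a placeholder):
# gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(train_descs)
# fv = fisher_vector(video_descs, gmm)
```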

4.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
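The paper's LSTM variant adds multiple input and output connections; as a baseline illustration of dense multi-label frame labeling, a plain LSTM with an independent sigmoid per class might look like the sketch below (feature and hidden sizes are placeholders; 65 classes reflects MultiTHUMOS's label set):

```python
import torch
import torch.nn as nn

class DenseLabelLSTM(nn.Module):
    """Per-frame multi-label action classifier: an LSTM over frame
    features with one independent logit per action class, so several
    actions can be active at the same time step. Baseline sketch,
    not the paper's MultiLSTM variant."""

    def __init__(self, feat_dim=1024, hidden=512, n_actions=65):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames):            # (batch, time, feat_dim)
        h, _ = self.lstm(frames)
        return self.head(h)               # logits, (batch, time, n_actions)

model = DenseLabelLSTM()
x = torch.randn(2, 100, 1024)             # two clips, 100 frames each
logits = model(x)
# per-frame, per-class binary cross-entropy for dense multi-labeling
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```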

5.
Temporal action detection aims to find the start time, end time, and category of every action instance contained in a long untrimmed video. For this task, an action detection model based on a two-stream convolutional neural network is proposed. First, a two-stream CNN extracts a feature sequence from the video, and TAG (Temporal Actionness Grouping) generates action proposals. To obtain high-quality proposals, they are refined by a boundary-regression network so that they fit the ground truth more closely, then extended into a three-stage feature design that incorporates context information; finally, a multilayer perceptron classifies the actions. Experimental results show that the algorithm achieves good recognition rates on the THUMOS 2014 and ActivityNet v1.3 datasets.
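A simplified, single-threshold version of actionness grouping is sketched below; the actual TAG algorithm groups frames with a watershed-like scheme over multiple thresholds, so treat this as an illustration of the idea rather than the paper's method (all parameters are placeholders):

```python
import numpy as np

def group_proposals(actionness, tau=0.7, min_len=5, tol=2):
    """Group contiguous frames whose actionness score exceeds tau
    into proposals (start, end), tolerating short gaps of up to tol
    low-score frames inside a segment. Simplified sketch of TAG."""
    props, start, gap = [], None, 0
    for t, s in enumerate(actionness):
        if s >= tau:
            if start is None:
                start = t
            gap = 0
        elif start is not None:
            gap += 1
            if gap > tol:                 # gap too long: close the segment
                end = t - gap
                if end - start + 1 >= min_len:
                    props.append((start, end))
                start, gap = None, 0
    if start is not None and len(actionness) - start >= min_len:
        props.append((start, len(actionness) - 1))
    return props
```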

6.
Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.
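A minimal sketch of combining early and late fusion over two feature sets, with logistic regression standing in for whatever base classifier the authors used; the function and class names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def early_fusion(scene, motion):
    """Early fusion: concatenate scene-context and motion descriptors
    into one vector before training a single classifier."""
    return np.hstack([scene, motion])

class LateFusion:
    """Late fusion: one classifier per feature set; class-probability
    outputs are summed at test time. Illustrative sketch only."""

    def __init__(self):
        self.clfs = [LogisticRegression(max_iter=1000) for _ in range(2)]

    def fit(self, scene, motion, y):
        for clf, X in zip(self.clfs, (scene, motion)):
            clf.fit(X, y)
        return self

    def predict(self, scene, motion):
        probs = sum(clf.predict_proba(X)
                    for clf, X in zip(self.clfs, (scene, motion)))
        return probs.argmax(axis=1)
```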

7.
Sun Yanjing, Huang Han, Yun Xiao, Yang Bin, Dong Kaiwen. Applied Intelligence, 2022, 52(1): 113-126

Skeleton-based action recognition has recently attracted widespread attention in the field of computer vision. Previous studies on skeleton-based action recognition are susceptible to interference from redundant video frames when judging complex actions, and ignore the fact that the spatial-temporal features of different actions differ greatly. To solve these problems, we propose a triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition (AM-GCN), which not only captures multiple spacetime-semantic features from video frames, avoiding the limited information diversity of a single-layer feature representation, but also improves the generalization ability of the network. We also present a triplet attention mechanism that applies attention to the key points, key channels, and key frames of an action, improving the accuracy and interpretability of judgments about complex actions. In addition, different kinds of spacetime-semantic feature information are combined through the proposed fusion decision for comprehensive prediction, improving the robustness of the algorithm. We validate AM-GCN on two standard datasets, NTU-RGBD and Kinetics, and compare it with other mainstream models. The results show that the proposed model achieves substantial improvements.

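As a loose sketch of the triplet attention idea (reweighting joints, channels, and frames), one simple gating scheme is shown below; AM-GCN's actual attention design and the surrounding graph convolutions are not reproduced here, and all sizes are placeholders:

```python
import torch
import torch.nn as nn

class TripletAttention(nn.Module):
    """Reweight a skeleton feature map of shape
    (batch, channels, frames, joints) along three axes: key joints,
    key frames, and key channels, each via a squeeze-and-sigmoid
    gate. Illustrative sketch, not the AM-GCN module."""

    def __init__(self, channels):
        super().__init__()
        self.chan = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):                            # (b, c, t, v)
        joint_w = torch.sigmoid(x.mean(dim=(1, 2)))  # (b, v) key joints
        frame_w = torch.sigmoid(x.mean(dim=(1, 3)))  # (b, t) key frames
        chan_w = self.chan(x.mean(dim=(2, 3)))       # (b, c) key channels
        x = x * joint_w[:, None, None, :]
        x = x * frame_w[:, None, :, None]
        x = x * chan_w[:, :, None, None]
        return x

att = TripletAttention(channels=64)
out = att(torch.randn(2, 64, 30, 25))  # e.g. 30 frames, 25 joints
```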

8.
Video recordings of earthmoving construction operations provide understandable data that can be used for benchmarking and analyzing their performance. These recordings further support project managers in taking corrective actions on performance deviations and, in turn, improving operational efficiency. Despite these benefits, manual stopwatch studies of previously recorded videos can be labor-intensive, may suffer from observer bias, and are impractical after a substantial period of observation. This paper presents a new computer-vision-based algorithm for recognizing single actions of earthmoving construction equipment. This is a particularly challenging task, as equipment can be partially occluded in site video streams and comes in a wide variety of sizes and appearances. The scale and pose of equipment actions can also vary significantly with camera configuration. In the proposed method, a video is initially represented as a collection of spatio-temporal visual features by extracting space-time interest points and describing each feature with a Histogram of Oriented Gradients (HOG). The algorithm automatically learns the distributions of the spatio-temporal features and action categories using a multi-class Support Vector Machine (SVM) classifier. This strategy handles noisy feature points arising from typical dynamic backgrounds. Given a video sequence captured from a fixed camera, the multi-class SVM classifier recognizes and localizes equipment actions. For the purpose of evaluation, a new video dataset is introduced which contains 859 sequences of excavator and truck actions. This dataset contains large variations in equipment pose and scale, and has varied backgrounds and levels of occlusion. The experimental results, with average accuracies of 86.33% and 98.33%, show that our supervised method outperforms previous algorithms for excavator and truck action recognition. The results hold promise for the applicability of the proposed method to construction activity analysis.
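The recognition pipeline (a bag of spatio-temporal HOG features plus a multi-class SVM) can be sketched with scikit-learn as follows, assuming per-video descriptor lists have already been extracted at space-time interest points; vocabulary size and SVM parameters are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bag_of_features(desc_lists, vocab):
    """Quantize each video's HOG descriptors against a visual
    vocabulary (a fitted KMeans model) and return one normalized
    histogram per video. Illustrative sketch only."""
    hists = []
    for descs in desc_lists:
        words = vocab.predict(descs)
        h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        hists.append(h / (h.sum() + 1e-12))
    return np.vstack(hists)

# Usage sketch:
# vocab = KMeans(n_clusters=200).fit(np.vstack(train_desc_lists))
# X = bag_of_features(train_desc_lists, vocab)
# clf = SVC(kernel='rbf', C=10).fit(X, y)  # multi-class one-vs-one by default
```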

9.
This paper aims to address the problem of modeling human behavior patterns captured in surveillance videos for the application of online normal behavior recognition and anomaly detection. A novel framework is developed for automatic behavior modeling and online anomaly detection without the need for manual labeling of the training data set. The framework consists of the following key components. 1) A compact and effective behavior representation method is developed based on spatial-temporal interest point detection. 2) The natural grouping of behavior patterns is determined through a novel clustering algorithm, topic hidden Markov model (THMM), built upon the existing hidden Markov model (HMM) and latent Dirichlet allocation (LDA), which overcomes the current limitations in accuracy, robustness, and computational efficiency. The new model is a four-level hierarchical Bayesian model, in which each video is modeled as a Markov chain of behavior patterns where each behavior pattern is a distribution over some segments of the video. Each of these segments in the video can be modeled as a mixture of actions where each action is a distribution over spatial-temporal words. 3) An online anomaly measure is introduced to detect abnormal behavior, whereas normal behavior is recognized by runtime accumulative visual evidence using the likelihood ratio test (LRT) method. Experimental results demonstrate the effectiveness and robustness of our approach using noisy and sparse data sets collected from a real surveillance scenario.
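A simplified, CUSUM-style version of an online likelihood ratio test is sketched below; in the paper's framework the per-frame log-likelihoods would come from the learned THMM, and the threshold here is a placeholder:

```python
def online_lrt(frame_loglik_normal, frame_loglik_alt, threshold=5.0):
    """Accumulate a log-likelihood ratio over incoming frames and
    flag an anomaly once the alternative model dominates the normal
    behavior model by `threshold` nats. Simplified CUSUM-style
    sketch, not the paper's exact test."""
    llr = 0.0
    for t, (ln, la) in enumerate(zip(frame_loglik_normal,
                                     frame_loglik_alt)):
        llr += la - ln          # runtime accumulative visual evidence
        if llr >= threshold:
            return t            # frame index where anomaly is declared
        llr = max(llr, 0.0)     # reset keeps the test online
    return None                 # behavior recognized as normal
```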

10.
Online social video websites such as YouTube allow users to manually annotate their video documents with textual labels. These labels can be used as indexing keywords to facilitate the search and organization of video data. However, manual video annotation is usually a labor-intensive and time-consuming process. In this work, we propose a novel social video annotation approach that combines multiple feature sets based on a tri-adaptation approach. The shots in each video are annotated by aggregating models learned from three complementary feature sets. Meanwhile, the models are collaboratively adapted by exploring unlabeled shots. In this sense, the method can be viewed as a novel semi-supervised algorithm that explores three complementary views. Our approach also exploits the temporal smoothness of video labels by applying a label correction strategy. Experiments on a web video dataset demonstrate the effectiveness of the proposed approach.
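A rough sketch of the three-view co-labeling loop behind such a tri-adaptation scheme, with logistic regression as a stand-in base learner; the paper's model adaptation details and temporal label-correction step are omitted, and all names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def tri_adapt(views_l, y_l, views_u, rounds=5, agree_conf=0.9):
    """Semi-supervised learning over three complementary feature
    views: three classifiers are trained on labeled shots, and in
    each round a classifier is retrained with unlabeled shots that
    the other two label confidently and identically. Sketch only."""
    clfs = [LogisticRegression(max_iter=1000).fit(X, y_l)
            for X in views_l]
    for _ in range(rounds):
        probs = [clf.predict_proba(X) for clf, X in zip(clfs, views_u)]
        preds = [p.argmax(1) for p in probs]
        confs = [p.max(1) for p in probs]
        for i in range(3):
            j, k = (i + 1) % 3, (i + 2) % 3
            # unlabeled shots where the other two views agree confidently
            mask = ((preds[j] == preds[k])
                    & (confs[j] > agree_conf)
                    & (confs[k] > agree_conf))
            if mask.any():
                X_aug = np.vstack([views_l[i], views_u[i][mask]])
                y_aug = np.concatenate([y_l, preds[j][mask]])
                clfs[i] = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return clfs
```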

