Similar Documents
20 similar documents found (search time: 31 ms)
1.
This paper proposes a novel data-driven modeling framework that constructs an agent-based crowd model from real-world video data. The constructed crowd model can generate crowd behaviors that match those observed in the video and can be used to predict trajectories of pedestrians in the same scenario. The framework models crowd behaviors with a dual-layer architecture: the bottom layer models microscopic collision-avoidance behaviors, while the top layer models macroscopic crowd behaviors such as goal-selection and path-navigation patterns. An automatic learning algorithm learns behavior patterns from the video data, and the learned patterns are then integrated into the dual-layer architecture to generate realistic crowd behaviors. To validate its effectiveness, the framework is applied to two different real-world scenarios. The simulation results demonstrate that the framework generates crowd behaviors similar to those observed in the videos in terms of crowd density distribution, and also offers promising performance in predicting the trajectories of pedestrians.
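As a rough illustration of the dual-layer idea (not the paper's learned model), the sketch below lets an assumed top layer supply per-agent goals while a bottom layer steers toward them with a simple, made-up repulsion rule for collision avoidance; all names and constants are illustrative.

```python
import numpy as np

def step_agents(pos, goals, speed=1.0, avoid_radius=1.0, dt=0.1):
    """One dual-layer update: top layer supplies `goals`,
    bottom layer steers toward them with pairwise collision avoidance."""
    # Top layer (assumed given): desired direction toward each agent's goal.
    to_goal = goals - pos
    desired = to_goal / (np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9)

    # Bottom layer: simple repulsion from nearby agents (illustrative rule).
    diff = pos[:, None, :] - pos[None, :, :]                 # (N, N, 2) offsets
    dist = np.linalg.norm(diff, axis=2) + np.eye(len(pos)) + 1e-9
    repel = (diff / dist[..., None] ** 2) * (dist[..., None] < avoid_radius)
    avoidance = repel.sum(axis=1)

    vel = speed * desired + avoidance
    return pos + dt * vel

# Toy usage: 3 agents heading to fixed goals.
pos = np.array([[0., 0.], [1., 0.], [0., 1.]])
goals = np.array([[5., 5.], [5., 0.], [0., 5.]])
for _ in range(50):
    pos = step_agents(pos, goals)
print(pos.round(2))
```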

2.
3.
This paper addresses learning and recognition of human behavior models from multimodal observation in a smart home environment. The proposed approach is part of a framework for acquiring a high-level contextual model of human behavior in an augmented environment. A 3-D video tracking system creates and tracks entities (persons) in the scene. A speech activity detector analyzes audio streams coming from headset microphones and determines, for each entity, whether the entity is speaking. An ambient sound detector detects noises in the environment. An individual role detector derives basic activities like "walking" or "interacting with table" from the entity properties extracted by the 3-D tracker. From the derived multimodal observations, different situations like "aperitif" or "presentation" are learned and detected using statistical models (HMMs). The objective of the proposed general framework is twofold: the automatic offline analysis of human behavior recordings and the online detection of learned human behavior models. To evaluate the approach, several multimodal recordings showing different situations were conducted. The obtained results, in particular for offline analysis, are very good, showing that multimodality as well as multiperson observation generation are beneficial for situation recognition.
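Situation detection with HMMs of this kind reduces to scoring a discretized multimodal observation sequence under each situation's model and picking the best. A minimal, self-contained sketch of that scoring step (the scaled forward algorithm), with toy parameters standing in for the learned "aperitif"/"presentation" models:

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Scaled forward algorithm: log-likelihood of a discrete observation
    sequence under an HMM (pi: initial, A: transition, B: emission probs)."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum(); loglik = np.log(c); alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum(); loglik += np.log(c); alpha /= c
    return loglik

# Toy 2-state models over 3 joint audio/video symbols (all values assumed).
pi = np.array([0.6, 0.4])
A = np.array([[0.8, 0.2], [0.3, 0.7]])
B_aperitif = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
B_presentation = np.array([[0.1, 0.6, 0.3], [0.2, 0.2, 0.6]])
obs = [0, 0, 1, 0, 2]
scores = {name: forward_loglik(pi, A, B, obs)
          for name, B in [("aperitif", B_aperitif),
                          ("presentation", B_presentation)]}
print(max(scores, key=scores.get))   # most likely situation for this clip
```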

4.
5.
This paper addresses the problem of fully automated mining of public space video data, a highly desirable capability under contemporary commercial and security considerations. The task is especially challenging due to the complexity of the object behaviors to be profiled, the difficulty of analysis under the visual occlusions and ambiguities common in public space video, and the computational challenge of doing so in real time. We address these issues by introducing a new dynamic topic model, termed a Markov Clustering Topic Model (MCTM). The MCTM builds on existing dynamic Bayesian network models and Bayesian topic models, and overcomes their drawbacks in sensitivity, robustness, and efficiency. Specifically, our model profiles complex dynamic scenes by robustly clustering visual events into activities and these activities into global behaviors with temporal dynamics. A Gibbs sampler is derived for offline learning with unlabeled training data, and a new approximation to online Bayesian inference is formulated to enable dynamic scene understanding and behavior mining in new video data, online and in real time. The strength of this model is demonstrated by unsupervised learning of dynamic scene models for four complex and crowded public scenes, and by successful mining of behaviors and detection of salient events in each.
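Models in this family first quantize low-level motion into discrete "visual events". A minimal sketch of that preprocessing step (not the MCTM itself), assuming cell-based quantization of Farneback optical flow directions; cell size, thresholds, and word layout are illustrative:

```python
import cv2
import numpy as np

def flow_words(prev_gray, curr_gray, cell=10, mag_thresh=1.0, n_dirs=4):
    """Quantize frame-to-frame optical flow into discrete visual words:
    word id = cell index * n_dirs + quantized motion direction."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    words = []
    for y in range(0, h - cell, cell):
        for x in range(0, w - cell, cell):
            fx, fy = flow[y:y+cell, x:x+cell].mean(axis=(0, 1))
            if np.hypot(fx, fy) < mag_thresh:
                continue                      # ignore near-static cells
            d = int(((np.arctan2(fy, fx) + np.pi) / (2*np.pi)) * n_dirs) % n_dirs
            cell_id = (y // cell) * (w // cell) + (x // cell)
            words.append(cell_id * n_dirs + d)
    return words

# Toy usage: two synthetic grayscale frames with simulated rightward motion.
rng = np.random.default_rng(0)
f0 = rng.integers(0, 255, (80, 80), dtype=np.uint8)
f1 = np.roll(f0, 2, axis=1)
print(len(flow_words(f0, f1)))
```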

6.
This paper presents a method for learning decision theoretic models of human behaviors from video data. Our system learns relationships between the movements of a person, the context in which they are acting, and a utility function. This learning makes explicit that the meaning of a behavior to an observer is contained in its relationship to actions and outcomes. An agent wishing to capitalize on these relationships must learn to distinguish the behaviors according to how they help the agent to maximize utility. The model we use is a partially observable Markov decision process, or POMDP. The video observations are integrated into the POMDP using a dynamic Bayesian network that creates spatial and temporal abstractions amenable to decision making at the high level. The parameters of the model are learned from training data using an a posteriori constrained optimization technique based on the expectation-maximization algorithm. The system automatically discovers classes of behaviors and determines which are important for choosing actions that optimize over the utility of possible outcomes. This type of learning obviates the need for labeled data from expert knowledge about which behaviors are significant and removes bias about what behaviors may be useful to recognize in a particular situation. We show results in three interactions: a single player imitation game, a gestural robotic control problem, and a card game played by two people.
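The decision-making core of any POMDP agent is the belief update followed by action selection. A minimal sketch with toy, assumed matrices (the paper learns these from video via its constrained EM procedure):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_pred = b @ T[a]            # predicted state distribution after action a
    b_new = O[a, :, o] * b_pred  # weight by observation likelihood
    return b_new / b_new.sum()

def greedy_action(b, R):
    """One-step lookahead: pick the action with highest expected reward."""
    return int(np.argmax(R @ b))   # R: (n_actions, n_states)

# Toy example: 2 states, 2 actions, 2 observations (all values assumed).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])      # T[a, s, s']
O = np.array([[[0.8, 0.2], [0.3, 0.7]]] * 2)  # O[a, s', o]
R = np.array([[1.0, -1.0], [-0.5, 0.5]])      # R[a, s]
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=1, T=T, O=O)
print(b, greedy_action(b, R))
```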

7.
In this paper, we propose an approach for learning appearance models of moving objects directly from compressed video. The appearance of a moving object changes dynamically in video due to varying object poses, lighting conditions, and partial occlusions. Efficiently mining the appearance models of objects is a crucial and challenging technology to support content-based video coding, clustering, indexing, and retrieval at the object level. The proposed approach learns the appearance models of moving objects in the spatial-temporal dimension of video data by taking advantage of the MPEG video compression format. It detects a moving object and recovers the trajectory of each macroblock covered by the object using the motion vectors present in the compressed stream. The appearances are then reconstructed in the DCT domain along the object's trajectory, and modeled as a mixture of Gaussians (MoG) using DCT coefficients. We prove that, under certain assumptions, the MoG model learned in the DCT domain can achieve pixel-level accuracy when transformed back to the spatial domain, and has better band selectivity than the MoG model learned in the spatial domain. We finally cluster the MoG models to merge the appearance models of the same object together for object-level content analysis.
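A compact way to picture the appearance model: collect DCT coefficients of the blocks along an object's trajectory and fit a mixture of Gaussians to them. The sketch below uses scipy and scikit-learn as stand-ins for the paper's own learning machinery; patch size and coefficient selection are illustrative:

```python
import numpy as np
from scipy.fft import dctn
from sklearn.mixture import GaussianMixture

def block_dct_features(patches, n_coeffs=10):
    """Low-frequency 2-D DCT coefficients (top-left corner) per patch."""
    feats = []
    for p in patches:
        c = dctn(p.astype(float), norm='ortho')
        feats.append(c[:4, :4].ravel()[:n_coeffs])  # keep low-frequency terms
    return np.array(feats)

# Toy usage: random 8x8 "macroblock" patches tracked along a trajectory.
rng = np.random.default_rng(0)
patches = rng.random((200, 8, 8))
X = block_dct_features(patches)
mog = GaussianMixture(n_components=3, covariance_type='diag').fit(X)
print(mog.means_.shape)   # (3, n_coeffs): appearance model in the DCT domain
```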

8.
A graphical model for audiovisual object tracking
We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.
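The core of such audio-video fusion is that independent per-modality likelihoods over the object's location multiply in the posterior. A minimal grid-based sketch with assumed Gaussian likelihoods (the paper learns its likelihood models via EM):

```python
import numpy as np

def fuse_audio_video(audio_loglik, video_loglik):
    """Posterior over object position from independent audio and video
    likelihoods: p(x | a, v) ∝ p(a | x) p(v | x) p(x) (uniform prior here)."""
    logp = audio_loglik + video_loglik
    p = np.exp(logp - logp.max())     # stabilize before normalizing
    return p / p.sum()

# Toy 1-D example: audio (time-delay) and video (template) likelihoods
# are both Gaussian bumps at slightly different locations.
x = np.linspace(-1, 1, 201)
audio = -0.5 * ((x - 0.10) / 0.15) ** 2
video = -0.5 * ((x - 0.05) / 0.05) ** 2
post = fuse_audio_video(audio, video)
print(x[np.argmax(post)])   # fused estimate, dominated by the sharper cue
```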

9.
10.
We present a novel approach for analyzing the quality of multi-agent crowd simulation algorithms. Our approach is data-driven, taking as input a set of user-defined metrics and reference training data, either synthetic or from video footage of real crowds. Given a simulation, we formulate the crowd analysis problem as an anomaly detection problem and exploit state-of-the-art outlier detection algorithms to address it. To that end, we introduce a new framework for the visual analysis of crowd simulations. Our framework allows us to capture potentially erroneous behaviors on a per-agent basis, either by automatically detecting outliers based on individual evaluation metrics or by accounting for multiple evaluation criteria in a principled fashion using Principal Component Analysis and the notion of Pareto optimality. We discuss optimizations necessary to allow real-time performance on large datasets and demonstrate the applicability of our framework through the analysis of simulations created by several widely used methods, including a simulation from a commercial game.
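As a rough sketch of the per-agent outlier detection idea: project multiple evaluation metrics with PCA, then flag anomalous agents with an off-the-shelf detector. IsolationForest here is a stand-in, not the paper's detector, and the metric names are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Each row: one agent's evaluation metrics for a simulation run
# (e.g., mean speed, path length ratio, min neighbor distance) -- assumed names.
rng = np.random.default_rng(1)
metrics = rng.normal(size=(500, 3))
metrics[:5] += 4.0                      # inject a few anomalous agents

# Combine multiple criteria via PCA, then flag outliers per agent.
z = PCA(n_components=2).fit_transform(metrics)
labels = IsolationForest(random_state=0).fit_predict(z)  # -1 = outlier
print(np.where(labels == -1)[0][:10])
```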

11.
We revisit an application originally developed using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data was derived from studies of the effects of toxins on rats, using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches, abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM), to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible-worlds semantics for SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples; compared with PILP models learned from non-probabilistic examples, the models learned from probabilistic examples yield a significant decrease in error and improved insight into the learned results.

12.
In this paper, a novel probabilistic topic model is proposed for mining activities from complex video surveillance scenes. To handle the temporal nature of video data, we devise a dynamical causal topic model (DCTM) that can detect latent topics and the causal interactions between them. The model is based on the assumption that all temporal relationships between latent topics at neighboring time steps follow a noisy-OR distribution, whose parameter is estimated by a data-driven approach based on a nonparametric Granger causality statistic. Furthermore, for convergence analysis during model learning, the Kullback-Leibler divergence between the prior and posterior distributions is calculated. Finally, using the causality matrix learned by the DCTM, the total causal influence of each topic is measured. We evaluate the proposed model through experiments on several challenging datasets and demonstrate that it can identify high-influence activities in crowded scenes.
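The noisy-OR assumption itself is simple to state in code: a topic stays inactive only if every active parent fails to trigger it and the leak term also fails. A minimal sketch with assumed causal strengths:

```python
import numpy as np

def noisy_or(parent_active, p_cause, leak=0.01):
    """Noisy-OR: P(child=1 | parents) = 1 - (1-leak) * prod_i (1-p_i)^x_i,
    where x_i indicates whether parent topic i was active at the previous step."""
    x = np.asarray(parent_active, dtype=float)
    p = np.asarray(p_cause, dtype=float)
    return 1.0 - (1.0 - leak) * np.prod((1.0 - p) ** x)

# Topics 0 and 2 active at t-1; each parent's causal strength on the child topic.
print(noisy_or([1, 0, 1], [0.6, 0.3, 0.2]))   # ≈ 1 - 0.99*0.4*0.8 = 0.6832
```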

13.
We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learned in an unsupervised way: given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Process (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of co-occurring activities. Without tracking or human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest, such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.
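To make the topic-model machinery concrete, here is a compact collapsed Gibbs sampler for vanilla LDA, the base language model these hierarchical models extend (this is not the paper's mixture/HDP variants); documents play the role of short clips and words the role of quantized visual events:

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA. docs: list of word-id lists."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))        # doc-topic counts
    nkw = np.zeros((K, V))                # topic-word counts
    nk = np.zeros(K)                      # topic totals
    z = [[0] * len(d) for d in docs]
    for d, doc in enumerate(docs):        # random initialization
        for i, w in enumerate(doc):
            k = rng.integers(K)
            z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return nkw  # unnormalized topic-word distributions ("atomic activities")

# Toy corpus over a 6-word vocabulary; clips = documents, visual events = words.
docs = [[0, 0, 1, 1, 2], [3, 4, 4, 5, 5], [0, 1, 2, 2], [3, 3, 4, 5]]
print(lda_gibbs(docs, V=6, K=2).round(1))
```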

14.
This paper proposes an end-to-end system to recognize multi-person behaviors in video, unifying tasks like segmentation, modeling, and recognition within a single optical-flow-based motion analysis framework. We show how optical flow can be used for analyzing the activities of individual actors, as opposed to the dense crowds on which the existing literature has mostly concentrated. The algorithm consists of two steps: identification of motion patterns and modeling of motion patterns. Activities are analyzed using the underlying motion patterns formed by the optical flow field over a period of time, and streaklines capture these motion patterns via integration of the flow field. To recognize the regions of interest, we utilize the Helmholtz decomposition to compute the divergence potential; the extrema, or critical points, of this potential indicate regions of high activity in the video, which are then represented as motion patterns by clustering the streaklines. We then present a method to compare two videos by measuring the similarity between their motion patterns using a combination of shape theory and subspace analysis. Such an analysis allows us to represent, compare, and recognize a wide range of activities. We perform experiments on state-of-the-art datasets and show that the proposed method is suitable for natural videos in the presence of noise, background clutter, and high intra-class variation. Our method has two significant advantages over recent related approaches: it provides a single framework that handles both low-level and high-level visual analysis tasks, and it is computationally efficient.
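The critical-point idea can be illustrated directly on a toy flow field: regions where the flow strongly diverges (sources/sinks) mark high activity. The sketch below takes the divergence itself rather than computing the paper's divergence potential via the Helmholtz decomposition:

```python
import numpy as np

def divergence(u, v):
    """Divergence of a 2-D flow field, du/dx + dv/dy, via finite differences."""
    return np.gradient(u, axis=1) + np.gradient(v, axis=0)

# Toy flow: a localized source at the scene center (flow radiating outward,
# damped by a Gaussian), standing in for activity around a hotspot.
h = w = 64
y, x = np.mgrid[0:h, 0:w].astype(float)
g = np.exp(-((x - w/2)**2 + (y - h/2)**2) / (2 * 10.0**2))
u, v = (x - w/2) * g, (y - h/2) * g
div = divergence(u, v)
peak = np.unravel_index(np.argmax(np.abs(div)), div.shape)
print(peak)   # critical point of high activity, near (32, 32)
```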

15.
Huge amounts of video are recorded every day by surveillance systems. Since video can record and preserve an enormous amount of information usable in many applications, it is worth examining the degree of privacy loss that might occur due to public access to the recorded video. A fundamental requirement of privacy solutions is an understanding and analysis of the inference channels that can lead to a breach of privacy. Though inference channels and privacy risks are well studied in traditional data-sharing applications (e.g., hospitals sharing patient records for data analysis), privacy assessments of video data have been limited to direct identifiers such as people's faces in the video. Other important inference channels such as location (where), time (when), and activities (what) are generally overlooked. In this paper we propose a privacy loss model that highlights and incorporates identity leakage through the multiple inference channels that exist in a video due to what, when, and where information. We model the identity leakage and the sensitive information separately and combine them to calculate the privacy loss. The proposed identity leakage model is able to consolidate identity leakage across multiple events and multiple cameras. Experimental results are provided to demonstrate the proposed privacy analysis framework.
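A minimal sketch of how leakage from several inference channels might be consolidated; the multiplicative combination rule and the final product with sensitivity are assumptions for illustration, not the paper's exact formulas:

```python
def identity_leakage(channel_leakage):
    """Combine leakage from what/when/where channels. Assumed rule (not the
    paper's formula): the identity stays hidden only if it leaks through
    none of the channels, treating channels as independent."""
    hidden = 1.0
    for leak in channel_leakage.values():
        hidden *= (1.0 - leak)
    return 1.0 - hidden

def privacy_loss(channel_leakage, sensitivity):
    """Privacy loss = identity leakage x sensitivity of the revealed info."""
    return identity_leakage(channel_leakage) * sensitivity

channels = {"what": 0.3, "when": 0.1, "where": 0.5}
print(privacy_loss(channels, sensitivity=0.8))  # 0.8 * (1 - 0.7*0.9*0.5) = 0.548
```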

16.

Human activity recognition is a challenging computer vision problem with various emerging applications. Recognizing human activities from video sequences is especially difficult because of their highly variable nature and the requirement of real-time processing. This paper proposes a combination of features in a multiresolution framework for human activity recognition. We exploit multiresolution analysis through the Daubechies complex wavelet transform (DCxWT), combining Local Binary Patterns (LBP) with Zernike moments (ZM) at multiple resolutions of the Daubechies complex wavelet decomposition. First, LBP coefficients of the DCxWT coefficients of image frames are computed to extract texture features; then ZMs of these LBP coefficients are computed to extract shape features from the texture features and construct the final feature vector. A multi-class support vector machine classifier is used to classify the recognized human activities. The proposed method has been tested on various standard, publicly available datasets. The experimental results demonstrate that it works well for multi-view human activities and outperforms several state-of-the-art methods on different quantitative performance measures.
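A rough sketch of the descriptor pipeline, with loudly labeled substitutions: a standard real Daubechies DWT (pywt) stands in for the complex wavelet transform (DCxWT), and skimage/mahotas/scikit-learn supply LBP, Zernike moments, and the multi-class SVM; all parameters are illustrative:

```python
import numpy as np
import pywt
from skimage.feature import local_binary_pattern
from mahotas.features import zernike_moments
from sklearn.svm import SVC

def frame_features(frame, levels=2):
    """Per-frame descriptor: at each (real) wavelet resolution, compute LBP
    (texture), then Zernike moments of the LBP map (shape of texture)."""
    feats = []
    approx = frame.astype(float)
    for _ in range(levels):
        approx, _ = pywt.dwt2(approx, 'db2')         # next coarser resolution
        lbp = local_binary_pattern(approx, P=8, R=1, method='uniform')
        feats.extend(zernike_moments(lbp, radius=min(lbp.shape) // 2))
    return np.array(feats)

# Toy usage: classify activities from per-frame descriptors (labels assumed).
rng = np.random.default_rng(0)
X = np.stack([frame_features(rng.random((64, 64))) for _ in range(20)])
y = rng.integers(0, 3, size=20)                      # 3 activity classes
clf = SVC(kernel='rbf').fit(X, y)                    # multi-class by default
print(clf.predict(X[:3]))
```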


17.
Understanding pair-wise activities is an essential step towards studying complex group and crowd behaviors in video. However, such research is often hampered by a lack of datasets that concentrate specifically on Atomic Pair Actions. [Here, we distinguish between the atomic motion of individual objects and the atomic motion of pairs of objects. The term "action" in Atomic Pair Action means an atomic interaction movement of two objects in video; a pair activity, then, is composed of multiple actions by a pair, or multiple pairs, of interacting objects. Please see Section 1 for details.] In addition, the general dearth in computer vision of a standardized, structured approach for reproducing and analyzing the efficacy of different models limits the ability to compare approaches. In this paper, we introduce the ISI Atomic Pair Actions dataset, a set of 90 videos that concentrate on the Atomic Pair Actions of objects in video, namely converging, diverging, and moving in parallel. We further incorporate a structured, end-to-end analysis methodology, based on workflows, to easily and automatically allow standardized testing of state-of-the-art models, as well as interoperability of varied codebases and incorporation of novel models. We demonstrate the efficacy of our structured framework by testing several models on the new dataset. In addition, we make the full dataset (the videos, along with their associated tracks and ground truth, and the exported workflows) publicly available to the research community for free use and extension at <http://research.sethi.org/ricky/datasets/>.
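The three Atomic Pair Actions have a simple geometric signature: the trend of the distance between the two objects. A minimal baseline sketch (not one of the models tested in the paper):

```python
import numpy as np

def atomic_pair_action(traj_a, traj_b, tol=1e-2):
    """Label a pair of (T, 2) trajectories as 'converging', 'diverging',
    or 'parallel' from the trend of their mutual distance."""
    d = np.linalg.norm(traj_a - traj_b, axis=1)
    slope = np.polyfit(np.arange(len(d)), d, 1)[0]   # linear trend of distance
    if slope < -tol:
        return 'converging'
    if slope > tol:
        return 'diverging'
    return 'parallel'

t = np.linspace(0, 1, 50)[:, None]
a = np.hstack([t, 0 * t])                 # moves right along y = 0
b = np.hstack([t, 1 - t])                 # moves right while descending
print(atomic_pair_action(a, b))           # 'converging'
```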

18.
Objective: Video highlight extraction is an active research topic in video content annotation and content-based video retrieval. Existing highlight extraction methods rely mainly on low-level video features and ignore the influence of user interest, so the extracted highlights may not match user expectations. On the other hand, semantic modeling of user interest requires a large number of labeled training videos to obtain a robust semantic classifier, and annotating such data is time-consuming and laborious. Since the Internet contains abundant, easily obtained images, transferring knowledge from Internet images into the semantic model of video segments can greatly reduce video annotation effort. We therefore propose a video highlight extraction framework driven by user interest that exploits Internet images. Method: User interest semantics are modeled with large collections of Internet images. Because knowledge harvested from the Internet is diverse and noisy, using it blindly would degrade extraction quality; images are therefore grouped by semantic similarity, and groups of semantically similar images retrieved with different keywords are called near-synonym image groups. On this basis, a joint group-weight model over near-synonym semantics is proposed, which assigns different weights to image groups according to their semantic relevance to the video. First, image sets related to a user's interest are retrieved from an Internet image search engine as the knowledge source for interest-specific highlight extraction; then, the knowledge learned from the images is transferred to video by jointly learning the group weights of the near-synonym image groups; finally, the semantic model learned from the image sets is used to extract highlights from candidate segments. Results: The method is validated on videos from the CCV database and compared with several existing keyframe extraction algorithms. The average precision of the proposed algorithm reaches 46.54, a 21.6% improvement over the other algorithms, with no increase in running time. In addition, to examine the influence of the balance parameters in the optimization and further verify the method, the regularization terms are removed one at a time; accuracy drops noticeably in every case, indicating the effectiveness of the proposed joint group-weight model for extracting video segments of interest to users. Conclusion: A video highlight extraction method targeting user-interest semantics is proposed, which extracts different highlight segments for different users according to their interests.
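A minimal sketch of the scoring idea: each near-synonym image group contributes a classifier score per candidate segment, and group weights (learned jointly in the paper, fixed toy values here) determine each group's influence; all numbers and keywords are illustrative:

```python
import numpy as np

def segment_scores(group_scores, group_weights):
    """Score each video segment as a weighted sum of per-group classifier
    scores; weights reflect each near-synonym image group's relevance."""
    w = np.asarray(group_weights, dtype=float)
    w = w / w.sum()
    return np.asarray(group_scores) @ w   # (n_segments, n_groups) @ (n_groups,)

# 4 candidate segments scored by 3 near-synonym image groups for one interest.
scores = np.array([[0.9, 0.2, 0.4],
                   [0.1, 0.8, 0.7],
                   [0.3, 0.3, 0.2],
                   [0.7, 0.6, 0.9]])
weights = [0.5, 0.3, 0.2]                 # e.g., "dog" > "puppy" > "canine"
ranked = np.argsort(-segment_scores(scores, weights))
print(ranked)                             # highlight candidates, best first
```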

19.
20.
Context: As trajectory analysis is widely used in video surveillance, crowd monitoring, behavioral prediction, and anomaly detection, finding motion patterns is a fundamental task for pedestrian trajectory analysis. Objective: In this paper, we focus on learning dominant motion patterns in unstructured scenes. Methods: As an invisible implicit indicator of scene structure, latent structural information is first defined and learned by clustering source/sink points using the CURE algorithm. Based on the assumption that most pedestrians with fixed entry and exit areas take similar paths through an unstructured scene, trajectories are then grouped using this latent structural information. Finally, motion patterns are learned for each group, characterized by a series of statistical temporal and spatial properties including length, duration, and envelopes in polar coordinate space. Results: Experimental results demonstrate the feasibility and effectiveness of our method; the learned motion patterns efficiently describe the statistical spatiotemporal models of typical pedestrian behaviors in a real scene, and abnormal or suspicious trajectories are detected on their basis. Conclusion: Our approach shows high spatial accuracy and low computational cost.
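A minimal sketch of the grouping step: cluster trajectory entry and exit points, then group trajectories by their (source, sink) pair. Hierarchical clustering from scikit-learn stands in for the CURE algorithm used in the paper:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def group_by_entry_exit(trajs, n_sources=2, n_sinks=2):
    """Group trajectories by clustered entry/exit points; each group then
    supports one learned motion pattern."""
    entries = np.array([t[0] for t in trajs])
    exits = np.array([t[-1] for t in trajs])
    src = AgglomerativeClustering(n_clusters=n_sources).fit_predict(entries)
    snk = AgglomerativeClustering(n_clusters=n_sinks).fit_predict(exits)
    groups = {}
    for i, key in enumerate(zip(src, snk)):
        groups.setdefault(key, []).append(i)
    return groups

# Toy trajectories: two flows between opposite corners of the scene.
rng = np.random.default_rng(0)
trajs = [np.linspace([0, 0], [10, 10], 20) + rng.normal(0, .2, (20, 2))
         for _ in range(5)]
trajs += [np.linspace([10, 0], [0, 10], 20) + rng.normal(0, .2, (20, 2))
          for _ in range(5)]
print(group_by_entry_exit(trajs))
```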
