Abstract: | This paper describes a probabilistic framework for simultaneously performing object tracking and event detection in monocular videos. Mathematically, we cast the problem of jointly tracking and detecting semantic events as a principled model-based search problem in a multi-dimensional state space, where the tracking trajectory and event type are discovered via maximum a posteriori (MAP) optimization. The benefit of this approach comes from its combined utilization of particle probabilistic representation, multiple hypothesis retention, efficient particle propagation, and temporal optimization. We present qualitative and quantitative results from realistic video sequences to demonstrate the effectiveness of this approach. |