Abstract: | A multi-level attention framework for tracking and segmentation of humans and objects under complex occlusions is investigated,
featuring an effective probabilistic appearance-based technique for pixel reclassification during object grouping and splitting.
A novel ‘spatial-depth affinity metric’ is introduced into the conventional likelihood function, utilising information on both
the spatial locations of pixels and the dynamic depth ordering of the component objects in a group. Depth ordering estimation is
achieved through a combination of top-down and bottom-up approaches. Experiments on difficult real-world scenarios involving
low-quality, highly compressed videos demonstrate very promising results. |