Joint face and head tracking inside multi-camera smart rooms |
| |
Authors: | Zhenqiu Zhang, Gerasimos Potamianos, Andrew W. Senior, Thomas S. Huang |
| |
Affiliation: | (1) Beckman Institute, University of Illinois, Urbana, IL 61801, USA; (2) IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA |
| |
Abstract: | The paper introduces a novel detection and tracking system that provides both frame-view and world-coordinate human location
information, based on video from multiple synchronized and calibrated cameras with overlapping fields of view. The system
is developed and evaluated for the specific scenario of a seminar lecturer presenting in front of an audience inside a “smart
room”, its aim being to track the lecturer’s head centroid in the three-dimensional (3D) space and also yield two-dimensional
(2D) face information in the available camera views. The proposed approach is primarily based on a statistical appearance
model of human faces by means of well-known AdaBoost-like face detectors, extended to address the head pose variation observed
in the smart room scenario of interest. The appearance module is complemented by two novel components and assisted by a simple
tracking drift detection mechanism. The first component of interest is the initialization module, which employs a spatio-temporal
dynamic programming approach with appropriate penalty functions to obtain optimal 3D location hypotheses. The second is an
adaptive-subspace-learning-based 2D tracking scheme with a novel forgetting mechanism, introduced to reduce tracking drift
and increase robustness. System performance is benchmarked on an extensive database of realistic human interaction in the
lecture smart room scenario, collected as part of the European integrated project “CHIL”. The system consistently achieves
excellent tracking precision, with a 3D mean tracking error of less than 16 cm, and is demonstrated to outperform four alternative
tracking schemes. Furthermore, the proposed system performs relatively well in detecting frontal and near-frontal faces in
the available frame views.
This work was performed while Zhenqiu Zhang was on a summer internship with the Human Language Technology Department at the
IBM T.J. Watson Research Center. |
| |
Keywords: | Person tracking; Face detection; Multi-camera tracking; Dynamic programming; Adaptive subspace tracking; Mean-shift tracking; AdaBoost; Lecture data; Smart rooms |
This article is indexed in SpringerLink and other databases. |
|