Real-Time Automated Video and Audio Capture with Multiple Cameras and Microphones
Authors: Ce Wang, Scott Griebel, Michael Brandstein, and Bo-June Hsu
Affiliations: (1) Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA; (2) Microsoft Corporation, Redmond, WA 98052, USA
Abstract: This work presents the acoustic- and visual-based tracking system operating in the Harvard Intelligent Multi-Media Environments Laboratory (HIMMEL). The environment is populated with a number of microphones and steerable video cameras. Acoustic source localization, video-based face tracking and pose estimation, and multi-channel speech enhancement are applied in combination to detect and track individuals in a practical environment while also providing an improved audio signal to accompany the video stream. The video portion of the system tracks talkers using source motion, contour geometry, color data, and simple facial features. Decisions about which camera to use are based on an estimate of the head's gaze angle. This head pose estimation is achieved with a very general head model that employs hairline features and a learned network classification procedure. Finally, a beamforming and postfiltering microphone-array technique is used to create an enhanced speech waveform to accompany the recorded video signal. The system presented in this paper is robust to both visual clutter (e.g., ovals in the scene of interest which are not faces) and audible noise (e.g., reverberation and background noise).
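The abstract mentions a beamforming microphone-array technique for speech enhancement. As a rough illustration only (not the paper's actual method, which also includes postfiltering and multi-channel enhancement), a minimal delay-and-sum beamformer might be sketched as follows; the array geometry, sample rate, and source direction are all hypothetical values chosen for the example:

```python
import numpy as np

# Hypothetical setup: 4-microphone linear array, far-field source at 30 degrees.
fs = 16000                                  # sample rate, Hz
c = 343.0                                   # speed of sound, m/s
mic_x = np.array([0.0, 0.05, 0.10, 0.15])   # microphone positions along x, m
theta = np.deg2rad(30.0)                    # assumed source direction

# Per-microphone arrival delays (in samples) for a plane wave from theta.
delays = mic_x * np.sin(theta) / c * fs

# Synthetic recordings: a 500 Hz tone, each channel shifted by its delay.
t = np.arange(2048) / fs
channels = np.stack([np.sin(2 * np.pi * 500 * (t - d / fs)) for d in delays])

# Delay-and-sum: undo each channel's delay in the frequency domain
# (handles fractional-sample delays), then average the aligned channels.
n = channels.shape[1]
freqs = np.fft.rfftfreq(n, 1 / fs)
spectra = np.fft.rfft(channels, axis=1)
steered = spectra * np.exp(2j * np.pi * freqs * (delays[:, None] / fs))
output = np.fft.irfft(steered.mean(axis=0), n)

# Channels aligned to the steering direction add coherently, so the output
# closely matches the original source tone.
source = np.sin(2 * np.pi * 500 * t)
```

Steering toward the direction estimated by the acoustic localizer reinforces the talker's speech while off-axis noise and reverberation add incoherently, which is the basic principle behind the enhancement stage described above.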
Keywords: person tracking, talker tracking, acoustic localization, head pose estimation, speech enhancement, audio enhancement, video conferencing
This article is indexed in SpringerLink and other databases.