In this work we propose algorithms to learn the locations of static occlusions and reason about both static and dynamic occlusion
scenarios in multi-camera scenes for 3D surveillance (e.g., reconstruction, tracking). We show that this leads to a system that can more effectively track
objects in video when they are occluded in some of the views. Because of the nature of the application area, our algorithm
operates under the constraint of using few cameras (no more than three) in a wide-baseline configuration. Our algorithm consists
of a learning phase, where a 3D probabilistic model of occlusions is estimated per-voxel, per-view over time via an iterative
framework. In this framework, at each frame the visual hull of each foreground object (person) is computed via a Markov Random
Field that integrates the occlusion model. The model is then updated at each frame using this solution, providing an iterative
process that can accurately estimate the occlusion model over time and overcome the few-camera constraint. We demonstrate
the application of such a model to a number of areas, including visual hull reconstruction, the reconstruction of the occluding
structures themselves, and 3D tracking.
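The per-voxel, per-view occlusion model described above can be sketched with a simple running-average update: whenever a voxel belongs to the reconstructed visual hull but does not project to foreground in a given view, that is evidence the voxel is occluded in that view. The array layout, function name, and the exponential-moving-average rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_occlusion_model(occ_prob, hull, foreground, alpha=0.1):
    """Update per-view, per-voxel occlusion probabilities.

    occ_prob:   (V, X) array, current P(occluded) for each view/voxel.
    hull:       (X,) bool array, voxels in the current visual hull.
    foreground: (V, X) bool array, whether each voxel projects to
                foreground in each view at this frame.
    """
    # Evidence of occlusion: voxel is occupied but not observed in view v.
    evidence = hull[None, :] & ~foreground
    # Move the probability toward the observed evidence (EMA update).
    return (1 - alpha) * occ_prob + alpha * evidence.astype(float)

# Toy scene: 3 views, 5 voxels, uninformative prior of 0.5.
occ = np.full((3, 5), 0.5)
hull = np.array([True, True, False, True, False])
fg = np.array([[True, False, False, True, False],
               [True, True, False, False, False],
               [True, True, False, True, False]])
occ = update_occlusion_model(occ, hull, fg)
```

Iterating this update over frames is what lets the probabilities sharpen toward the static occluders, which is the intuition behind the learning phase described in the abstract.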
3D video billboard clouds reconstruct and represent a dynamic three-dimensional scene using displacement-mapped billboards. They consist of geometric proxy planes augmented with detailed displacement maps and combine the generality of geometry-based 3D video with the regularization properties of image-based 3D video. 3D video billboards are an image-based representation placed in the disparity space of the acquisition cameras and thus provide a regular sampling of the scene with a uniform error model. We propose a general geometry filtering framework which generates time-coherent models and removes reconstruction and quantization noise as well as calibration errors. This replaces the complex and time-consuming sub-pixel matching process in stereo reconstruction with a bilateral filter. Rendering is performed using a GPU-accelerated algorithm which generates consistent view-dependent geometry and textures for each individual frame. In addition, we present a semi-automatic approach for modeling dynamic three-dimensional scenes with a set of multiple 3D video billboard clouds.
This paper presents a novel method for estimating camera motion and reconstructing a human face from a video sequence. A coarse-to-fine method is applied, combining Powell's minimization with gradient descent. Sparse points defining the human face in every frame are tracked using the active appearance model. Occluded points, even in cases of self-occlusion, do not pose a problem for the proposed method. Robustness in the presence of noise and the 3D accuracy of the method are also demonstrated. Face reconstructions obtained with other methods, including the trifocal tensor, Powell's minimization, and gradient descent, are compared to the proposed method. Experiments on both synthetic and real faces are presented and analyzed, and different camera movement paths are illustrated. All real-world experiments used an off-the-shelf digital camera carried by a walking person, without any dolly, to demonstrate the robustness and practicality of the proposed method.
To enable real-time, person-independent 3D registration from 2D video, we developed a 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60°. From a single 2D image of a person's face, a dense 3D shape is registered in real time for each frame. The algorithm utilizes a fast cascade regression framework trained on high-resolution 3D face-scans of posed and spontaneous emotion expression. The algorithm first estimates the location of a dense set of landmarks and their visibility, then reconstructs face shapes by fitting a part-based 3D model. Because no assumptions are required about illumination or surface properties, the method can be applied to a wide range of imaging conditions that include 2D video and uncalibrated multi-view video. The method has been validated in a battery of experiments that evaluate its precision of 3D reconstruction, extension to multi-view reconstruction, temporal integration for videos and 3D head-pose estimation. Experimental findings strongly support the validity of real-time, 3D registration and reconstruction from 2D video. The software is available online at http://zface.org.
In recent years, the convergence of computer vision and computer graphics has put forth a new field of research that focuses on the reconstruction of real-world scenes from video streams. To make immersive 3D video a reality, the whole pipeline spanning from scene acquisition over 3D video reconstruction to real-time rendering needs to be researched. In this paper, we describe the latest advancements of our system to record, reconstruct and render free-viewpoint videos of human actors. We apply a silhouette-based non-intrusive motion capture algorithm making use of a 3D human body model to estimate the actor's parameters of motion from multi-view video streams. A renderer plays back the acquired motion sequence in real time from any arbitrary perspective. Photo-realistic physical appearance of the moving actor is obtained by generating time-varying multi-view textures from video. This work shows how the motion capture sub-system can be enhanced by incorporating texture information from the input video streams into the tracking process. 3D motion fields are reconstructed from optical flow and used in combination with silhouette matching to estimate pose parameters. We demonstrate that a high visual quality can be achieved with the proposed approach and validate the enhancements contributed by the motion field step.
How to effectively utilize inter-frame redundancy is key to improving the accuracy and speed of video super-resolution reconstruction methods. Previous methods usually process every frame in the video in the same way and do not make full use of redundant information between frames, resulting in low accuracy or long reconstruction times. In this paper, we propose reconstructing key frames and non-key frames separately, and present a video super-resolution reconstruction method based on deep back-projection and motion-feature fusion. The key-frame reconstruction subnet obtains key-frame features and reconstruction results with high accuracy. For non-key frames, the key-frame features are reused by fusing them with motion features, so that accurate non-key-frame features and reconstruction results are obtained quickly. Experiments on several public datasets show that the proposed method outperforms state-of-the-art methods and is robust.
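The key-frame/non-key-frame split described above is essentially a scheduling pattern: run the heavy reconstruction subnet on key frames and cache its features, then reuse those cached features for the cheaper frames in between. The sketch below shows only that control flow; the two "subnets" are stand-ins (naive 2x upscaling and a simple blend), and the function names and the fixed key-frame interval are hypothetical, not the paper's architecture.

```python
import numpy as np

def keyframe_sr(frame):
    # Stand-in for the deep back-projection subnet: returns an
    # upscaled frame plus features cached for reuse downstream.
    up = np.kron(frame, np.ones((2, 2)))   # naive 2x nearest-neighbor upscaling
    return up, frame.copy()                # (result, cached features)

def nonkey_sr(frame, key_features):
    # Stand-in for motion-feature fusion: blend the current frame
    # with the cached key-frame features, then upscale cheaply.
    fused = 0.5 * frame + 0.5 * key_features
    return np.kron(fused, np.ones((2, 2)))

def super_resolve(video, key_interval=4):
    out, cached = [], None
    for i, frame in enumerate(video):
        if i % key_interval == 0:
            sr, cached = keyframe_sr(frame)   # expensive path, refresh cache
        else:
            sr = nonkey_sr(frame, cached)     # cheap path, reuse cache
        out.append(sr)
    return out

video = [np.full((2, 2), float(i)) for i in range(6)]
result = super_resolve(video)
```

The design point is the asymmetry: accuracy comes from the key-frame path, while speed comes from amortizing its cost over the non-key frames.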
The ability to detect and track human heads and faces in video sequences can be considered the finest level of any video surveillance system. In this paper, we introduce a general framework for evaluating our recent appearance-based 3D face tracker using dense 3D data. This tracker combines online appearance models with an image registration technique, runs in real time, and is insensitive to drift. More precisely, the accuracy and usability of the developed tracker are assessed using stereo-based range facial data from which ground-truth 3D motions are computed. This evaluation quantifies the monocular tracker's accuracy and identifies its working range in 3D space. Additionally, the evaluation gives some hints on how the tracker can be fully exploited.
Reconstructing 3D face models from 2D face images is usually done by using a single reference 3D face model or some gender/ethnicity-specific 3D face models. However, different persons, even those of the same gender or ethnicity, usually have significantly different faces in terms of their overall appearance, which forms the basis of person recognition via faces. Consequently, existing 3D reference model based methods have limited capability of reconstructing precise 3D face models for a large variety of persons. In this paper, we propose to explore a reservoir of diverse reference models for 3D face reconstruction from forensic mugshot face images, where facial exemplars coherent with the input determine the final shape estimation. Specifically, our 3D face reconstruction is formulated as an energy minimization problem with: 1) a shading constraint from multiple input face images, 2) distortion and self-occlusion based color consistency between different views, and 3) a depth-uncertainty-based smoothness constraint on adjacent pixels. The proposed energy is minimized in a coarse-to-fine manner, where the shape refinement step is done by using a multi-label segmentation algorithm. Experimental results on challenging datasets demonstrate that the proposed algorithm is capable of recovering high-quality 3D face models. We also show that our reconstructed models successfully boost face recognition accuracy.
In this paper, we propose a novel framework for 3D facial similarity measures and facial data organization. The 3D facial similarity measures of our method are based on iso-geodesic stripes and conformal parameterization. Using the conformal parameterization, the 3D facial surface can be mapped into a 2D domain and the iso-geodesic stripes of the face can be measured. The measurement results can be regarded as the similarity of faces, which is robust to head poses and facial expressions. Based on the measurement results, a hierarchical structure of faces can be constructed and used to organize different faces. The structure can be utilized to accelerate face searching in a large database. In experiments, we construct hierarchical structures from two public facial databases, Gavab and Texas3D. Search speed based on the structure increases by a factor of 4-6 with no loss of recognition accuracy.
We present a real-time implementation of 2D to 3D video conversion using compressed video. In our method, compressed 2D video
is analyzed by extracting motion vectors. Using the motion vector maps, depth maps are built for each frame and the frames
are segmented to provide object-wise depth ordering. These data are then used to synthesize stereo pairs. 3D video synthesized
in this fashion can be viewed on any stereoscopic display. In our implementation, anaglyph projection was selected as the
3D visualization method because it is best suited to standard displays.
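The last two stages described above can be sketched directly: shift each pixel horizontally by a disparity derived from its depth value to synthesize a stereo pair, then combine the pair into a red/cyan anaglyph. The grayscale input, the simple horizontal-shift model, and the function names are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

def synthesize_view(image, depth, scale=1, sign=+1):
    """Warp a grayscale image by shifting pixels horizontally
    with a per-pixel disparity proportional to depth."""
    h, w = image.shape
    out = np.zeros_like(image)
    cols = np.arange(w)
    for y in range(h):
        shifted = np.clip(cols + sign * (depth[y] * scale).astype(int), 0, w - 1)
        out[y, shifted] = image[y, cols]
    return out

def anaglyph(image, depth):
    """Build a red/cyan anaglyph from one image plus its depth map."""
    left = synthesize_view(image, depth, sign=-1)
    right = synthesize_view(image, depth, sign=+1)
    # Red channel from the left eye, green/blue from the right eye.
    return np.stack([left, right, right], axis=-1)

img = np.arange(16, dtype=float).reshape(4, 4)
dep = np.ones((4, 4))          # flat depth: uniform disparity of 1 pixel
rgb = anaglyph(img, dep)
```

A real pipeline would also fill the disocclusion holes left by the warp; the sketch leaves them at zero for brevity.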
Automatic reconstruction of 3D objects from 2D orthographic views has been a major research issue in CAD/CAM. In this paper, two accelerating techniques to improve the efficiency of reconstruction are presented. First, some pseudo elements are removed using depth and topology information as soon as the wire-frame is constructed, which reduces the search space. Second, the proposed algorithm does not establish all possible surfaces in the process of generating 3D faces. The surfaces and edge loops are generated by using the relationship between the boundaries of 3D faces and their projections. This avoids the combinatorial growth of previous methods, which have to check all possible pairs of 3D candidate edges.
Face recognition with variant pose, illumination and expression (PIE) is a challenging problem. In this paper, we propose an analysis-by-synthesis framework for face recognition with variant PIE. First, an efficient two-dimensional (2D)-to-three-dimensional (3D) integrated face reconstruction approach is introduced to reconstruct a personalized 3D face model from a single frontal face image with neutral expression and normal illumination. Then, realistic virtual faces with different PIE are synthesized based on the personalized 3D face to characterize the face subspace. Finally, face recognition is conducted based on these representative virtual faces. Compared with other related work, this framework has following advantages: (1) only one single frontal face is required for face recognition, which avoids the burdensome enrollment work; (2) the synthesized face samples provide the capability to conduct recognition under difficult conditions like complex PIE; and (3) compared with other 3D reconstruction approaches, our proposed 2D-to-3D integrated face reconstruction approach is fully automatic and more efficient. The extensive experimental results show that the synthesized virtual faces significantly improve the accuracy of face recognition with changing PIE.