Similar literature
Found 20 similar articles (search time: 15 ms)
1.
2.
Recent studies have demonstrated that high-level semantics in data can be captured using sparse representation. In this paper, we propose an approach to human body pose estimation in static images based on sparse representation. Given a visual input, the objective is to estimate 3D human body pose using feature space information and geometrical information of the pose space. On the assumption that each data point and its neighbors are likely to reside on a locally linear patch of the underlying manifold, our method learns the sparse representation of the new input using both feature and pose space information and then estimates the corresponding 3D pose by a linear combination of the bases of the pose dictionary. Two strategies for dictionary construction are presented: (i) constructing the dictionary by randomly selecting the frames of a sequence and (ii) selecting specific frames of a sequence as dictionary atoms. We analyzed the effect of each strategy on the accuracy of pose estimation. Extensive experiments on datasets of various human activities show that our proposed method outperforms state-of-the-art methods.

3.
Head pose estimation plays an essential role in many high-level face analysis tasks. However, accurate and robust pose estimation with existing approaches remains challenging. In this paper, we propose a novel method for accurate three-dimensional (3D) head pose estimation with noisy depth maps and high-resolution color images that are typically produced by popular RGBD cameras such as the Microsoft Kinect. Our method combines the advantages of the high-resolution RGB image with the 3D information of the depth image. For better accuracy and robustness, features are first detected using only the color image, and then the 3D feature points used for matching are obtained by combining depth information. The outliers are then filtered with depth information using rules proposed for depth consistency, normal consistency, and re-projection consistency, which effectively eliminate the influence of depth noise. The pose parameters are then iteratively optimized using the Extended LM (Levenberg-Marquardt) method. Finally, a Kalman filter is used to smooth the parameters. To evaluate our method, we built a database of more than 10K RGBD images with ground-truth poses recorded using motion capture. Both qualitative and quantitative evaluations show that our method produces notably smaller errors than previous methods.
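
The final smoothing step mentioned above can be illustrated with a minimal sketch. It assumes a scalar, constant-state Kalman filter applied independently to one pose parameter (for example, yaw in degrees). The abstract does not specify the state model or the noise settings, so the function name `kalman_smooth` and the values of `process_var` and `measurement_var` are hypothetical.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.25):
    """Smooth a sequence of scalar pose parameters with a constant-state
    Kalman filter. The noise variances are illustrative assumptions."""
    if not measurements:
        return []
    estimate = measurements[0]      # initialise with the first measurement
    variance = measurement_var      # initial uncertainty of that estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


if __name__ == "__main__":
    noisy_yaw = [10.0, 10.8, 9.3, 10.5, 9.6, 10.2]
    print(kalman_smooth(noisy_yaw))
```

A larger `process_var` makes the filter follow rapid head motion more closely at the cost of less smoothing; a full pose tracker would more likely use a multivariate state with velocity terms.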

4.
Automatic initialization and tracking of human pose is an important task in visual surveillance. We present a part-based approach that incorporates a variety of constraints in a unified framework. These constraints include the kinematic constraints between parts that are physically connected to each other, the occlusion of one part by another and the high correlation between the appearance of certain parts, such as the arms. The location probability distribution of each part is determined by evaluating appropriate likelihood measures. The graphical (non-tree) structure representing the interdependencies between parts is utilized to "connect" such part distributions via nonparametric belief propagation. Methods are also developed to perform this optimization efficiently in the large space of pose configurations.

5.
In this paper, a real-time 3D pose estimation algorithm using range data is described. The system relies on a novel 3D sensor that generates a dense range image of the scene. By not relying on brightness information, the proposed system guarantees robustness under a variety of illumination conditions and scene contents. Efficient face detection using global features and exploitation of prior knowledge along with novel feature localization and tracking techniques are described. Experimental results demonstrate accurate estimation of the six degrees of freedom of the head and robustness under occlusions, facial expressions, and head shape variability.

6.
Fast and globally convergent pose estimation from video images   Cited by: 10 (self-citations: 0, citations by others: 10)
Determining the rigid transformation relating 2D images to known 3D geometry is a classical problem in photogrammetry and computer vision. Heretofore, the best methods for solving the problem have relied on iterative optimization methods which cannot be proven to converge and/or which do not effectively account for the orthonormal structure of rotation matrices. We show that the pose estimation problem can be formulated as that of minimizing an error metric based on collinearity in object (as opposed to image) space. Using object space collinearity error, we derive an iterative algorithm which directly computes orthogonal rotation matrices and which is globally convergent. Experimentally, we show that the method is computationally efficient, that it is no less accurate than the best currently employed optimization methods, and that it outperforms all tested methods in robustness to outliers.
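
To make the idea of a collinearity error in object space concrete, the sketch below evaluates one common form of such a metric. The abstract does not state the formula, so this is an assumption: each 3D point is transformed into the camera frame, and the residual is the part of that point orthogonal to the line of sight through its observed image point. The helper names `mat_vec` and `object_space_error` are hypothetical, and image points are assumed to be in normalized coordinates (u, v) with line of sight (u, v, 1).

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def object_space_error(rotation, translation, points_3d, image_points):
    """Sum of squared object-space collinearity residuals (assumed form).

    For each correspondence, the residual is the transformed 3D point minus
    its orthogonal projection onto the line of sight of the image point."""
    total = 0.0
    for p, (u, v) in zip(points_3d, image_points):
        # Point expressed in the camera frame: q = R p + t.
        q = [a + b for a, b in zip(mat_vec(rotation, p), translation)]
        ray = (u, v, 1.0)
        # Scalar projection of q onto the (unnormalised) line of sight.
        scale = sum(qi * ri for qi, ri in zip(q, ray)) / sum(ri * ri for ri in ray)
        residual = [qi - scale * ri for qi, ri in zip(q, ray)]
        total += sum(r * r for r in residual)
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    observed = [(0.2, 0.0), (0.0, 0.2)]  # projections for t = (0, 0, 5)
    print(object_space_error(identity, [0.0, 0.0, 5.0], points, observed))
    print(object_space_error(identity, [0.5, 0.0, 5.0], points, observed))
```

The sketch only evaluates the error for a given pose; the iterative algorithm that minimizes it while keeping the rotation orthogonal is not reproduced here.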

7.
Reliable manipulation of everyday household objects is essential to the success of service robots. In order to accurately manipulate these objects, robots need to know objects’ full 6-DOF pose, which is challenging due to sensor noise, clutter, and occlusions. In this paper, we present a new approach for effectively guessing the object pose given an observation of just a small patch of the object, by leveraging the fact that many household objects can only remain stable on a planar surface under a small set of poses. In particular, for each stable pose of an object, we slice the object with horizontal planes and extract multiple cross-section 2D contours. The pose estimation is then reduced to finding a stable pose whose contour matches best with that of the sensor data, and this can be solved efficiently by cross-correlation. Experiments on the manipulation tasks in the DARPA Robotics Challenge validate our approach. In addition, we also investigate our method’s performance on object recognition tasks arising in the challenge.
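
As a rough illustration of contour matching by cross-correlation, the sketch below assumes that each cross-section contour has been summarized as a radial-distance signature sampled at equally spaced angles, so that a rotation about the vertical axis becomes a circular shift. This representation and the function name `best_circular_shift` are assumptions for illustration, not details taken from the paper.

```python
def best_circular_shift(model, observed):
    """Return (shift, score) maximising the circular cross-correlation
    between a model signature and an observed signature of equal length."""
    n = len(model)
    if n == 0 or n != len(observed):
        raise ValueError("signatures must be non-empty and of equal length")
    best_shift, best_score = 0, float("-inf")
    for shift in range(n):
        # Correlation of the observation with the model rotated by `shift`.
        score = sum(model[(i + shift) % n] * observed[i] for i in range(n))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


if __name__ == "__main__":
    model = [1, 2, 5, 2, 1, 1, 1, 1]
    # Observation: the same contour rotated by three angular samples.
    observed = [model[(i + 3) % len(model)] for i in range(len(model))]
    print(best_circular_shift(model, observed))
```

This brute-force loop costs O(n^2) per contour pair; an FFT-based correlation would be the usual way to obtain the efficiency the abstract refers to.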

8.
Pattern Recognition, 2014, 47(2): 525-534
In this study, we develop a central profile-based 3D face pose estimation algorithm. The central profile is a unique curve on a 3D face surface that starts from the forehead center, goes down through the nose ridge, nose tip, and mouth center, and ends at the chin tip. The points on the central profile are co-planar and belong to a symmetry plane that separates the human face into two identical parts. The central profile is protrusive and has a certain length. Most importantly, the normal vectors of the central profile points are parallel to the symmetry plane. Based on the properties of the central profile, a Hough transform is employed to determine the symmetry plane by invoking a voting procedure. An objective function is introduced in the parameter space to quantify the vote importance for face profile points and map the central profile to an accumulator cell with the maximal value. Subsequently, a nose model matching algorithm is used to detect the nose tip on the central profile. A pitch angle estimation algorithm is also proposed. The pose estimation experiments completed for a synthetic 3D face model and the FRGC v2.0 3D database demonstrate the effectiveness of the proposed pose estimation algorithm. The obtained central profile detection rate is 99.9%, and the nose tip detection rate has reached 98.16% with error not larger than 10 mm.

9.
We present a fast and efficient non-rigid shape tracking method for modeling dynamic 3D objects from multiview video. Starting from an initial mesh representation, the shape of a dynamic object is tracked over time, both in geometry and topology, based on multiview silhouette and 3D scene flow information. The mesh representation of each frame is obtained by deforming the mesh representation of the previous frame towards the optimal surface defined by the time-varying multiview silhouette information with the aid of 3D scene flow vectors. The whole time-varying shape is then represented as a mesh sequence which can efficiently be encoded in terms of restructuring and topological operations, and small-scale vertex displacements along with the initial model. The proposed method has the ability to deal with dynamic objects that may undergo non-rigid transformations and topological changes. The time-varying mesh representations of such non-rigid shapes, which are not necessarily of fixed connectivity, can successfully be tracked thanks to restructuring and topological operations employed in our deformation scheme. We demonstrate the performance of the proposed method both on real and synthetic sequences.

10.
This article presents a comprehensive framework for the recognition of untextured 3D models in a single image. The method proposed here is capable of recovering a 3D pose in a few hundred milliseconds, which is a difficult challenge using this type of model.

11.
Simultaneously tracking poses of multiple people is a difficult problem because of inter-person occlusions and self occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views. Image features from non-occluded views are given more weight than image features from occluded views. Self occlusion is handled by local occlusion estimation. The local occlusion estimation is used to update the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of the global and the local occlusion estimation leads to accurate tracking results at much lower computational costs. We evaluate the performance of our approach on a pose estimation data set in which inter-person and self occlusions are present. The results of our experiments show that our approach is able to robustly track multiple people during large movement with severe inter-person occlusions and self occlusions, whilst maintaining near real-time performance.

12.
In this paper, we consider the problem of 2D human pose estimation on stereo image pairs. In particular, we aim at estimating the location, orientation and scale of upper-body parts of people detected in stereo image pairs from realistic stereo videos that can be found on the Internet. To address this task, we propose a novel pictorial structure model to exploit the stereo information included in such stereo image pairs: the Stereo Pictorial Structure (SPS). To validate our proposed model, we contribute a new annotated dataset of stereo image pairs, the Stereo Human Pose Estimation Dataset (SHPED), obtained from YouTube stereoscopic video sequences, depicting people in challenging poses and diverse indoor and outdoor scenarios. The experimental results on SHPED indicate that SPS improves on state-of-the-art monocular models thanks to the appropriate use of the stereo information.

13.
Autonomous robots performing cooperative tasks need to know the relative pose of the other robots in the fleet. Deducing these poses might be performed through structure-from-motion methods in applications where there are no landmarks or GPS, for instance, in unexplored indoor environments. Structure from motion is a technique that deduces the pose of cameras given only the 2D images. This technique relies on a first step that obtains a correspondence between salient points of images. For this reason, the weakness of this method is that poses cannot be estimated if a proper correspondence is not obtained due to low quality of the images or images that do not share enough salient points. We propose, for the first time, an interactive structure-from-motion method to deduce the pose of 2D cameras. Autonomous robots with embedded cameras have to stop when they cannot deduce their position because the structure-from-motion method fails. In these cases, a human interacts by simply mapping a pair of points in the robots’ images. By performing this action, the human imposes the correct correspondence between them. Then, the interactive structure from motion is capable of deducing the robots’ lost positions and the fleet of robots can continue their high-level task. From the practical point of view, the interactive method allows the whole system to achieve more complex tasks in more complex environments since the human interaction can be seen as a recovery or reset process.

14.
It is well known that biological motion conveys a wealth of socially meaningful information. From even a brief exposure, biological motion cues enable the recognition of familiar people, and the inference of attributes such as gender, age, mental state, actions and intentions. In this paper we show that from the output of a video-based 3D human tracking algorithm we can infer physical attributes (e.g., gender and weight) and aspects of mental state (e.g., happiness or sadness). In particular, with 3D articulated tracking we avoid the need for view-based models, specific camera viewpoints, and constrained domains. The task is useful for man–machine communication, and it provides a natural benchmark for evaluating the performance of 3D pose tracking methods (vs. conventional Euclidean joint error metrics). We show results on a large corpus of motion capture data and on the output of a simple 3D pose tracker applied to videos of people walking.

15.
Multimedia Tools and Applications - Multi-person 3D pose estimation using a monocular freely moving camera in real-world scenarios remains a challenge. There is a lack of data with 3D ground truth,...

16.
Knowledge about relative poses within a tractor/trailer combination is a vital prerequisite for kinematic modelling and trajectory estimation. In the case of autonomous vehicles or driver assistance systems, for example, the monitoring of an attached passive trailer is crucial for operational safety. We propose a camera-based 3D pose estimation system based on a Kalman filter. It is evaluated against previously published methods for the same problem.

17.
Extracting 3D facial animation parameters from multiview video clips   Cited by: 1 (self-citations: 0, citations by others: 1)
We propose an accurate and inexpensive procedure that estimates 3D facial motion parameters from mirror-reflected multiview video clips. We place two planar mirrors near a subject's cheeks and use a single camera to simultaneously capture a marker's front and side view images. We also propose a novel closed-form linear algorithm to reconstruct 3D positions from real versus mirrored point correspondences in an uncalibrated environment. Our computer simulations reveal that exploiting mirrors' various reflective properties yields a more robust, accurate, and simpler 3D position estimation approach than general-purpose stereo vision methods that use a linear approach or maximum-likelihood optimization. Our experiments show a root mean square (RMS) error of less than 2 mm in 3D space with only 20-point correspondences. For semiautomatic 3D motion tracking, we use an adaptive Kalman predictor and filter to improve stability and infer the occluded markers' position. Our approach tracks more than 50 markers on a subject's face and lips from 30-frame-per-second video clips. We've applied the facial motion parameters estimated from the proposed method to our facial animation system.

18.
Head gaze, or the orientation of the head, is a very important attentional cue in face-to-face conversation. Some subtleties of the gaze can be lost in common teleconferencing systems, because a single perspective warps spatial characteristics. A recent random hole display is a potentially interesting display for group conversation, as it allows multiple stereo viewers in arbitrary locations, without the restriction of conventional autostereoscopic displays on viewing positions. We represented a remote person as an avatar on a random hole display. We evaluated this system by measuring the ability of multiple observers with different horizontal and vertical viewing angles to accurately and simultaneously judge which targets the avatar is gazing at. We compared three perspective conditions: a conventional 2D view, a monoscopic perspective-correct view, and a stereoscopic perspective-correct view. In the latter two conditions, the random hole display shows three and six views simultaneously. Although the random hole display does not provide a high-quality view, because it has to distribute display pixels among multiple viewers, the different views are easily distinguished. Results suggest the combined presence of perspective-correct and stereoscopic cues significantly improved the effectiveness with which observers were able to assess the avatar's head gaze direction. This motivates the need for stereo in future multiview displays.

19.

Most existing methods in the field of Human Pose Estimation take high accuracy as the main research goal; however, reducing model complexity and improving detection speed are also very important for Human Pose Estimation, especially when running on edge devices with weak computing capability. The core motivation of this article is to reduce the model size of the original Human Pose Estimation network while maintaining its performance. To achieve this goal, we present a lightweight Human Pose Estimation network for RGB image input. The network follows the Stacked Hourglass network architecture and is named Capable and Vigorous Campstool Network (CVC-Net). Specifically: (1) To reduce the number of model parameters, we proposed a new residual block named Res2Net_depth block, and used it to replace the residual blocks in the Hourglass network. (2) We used three techniques to further improve model performance, namely a channel attention mechanism, the PixelShuffle up-sampling method and a newly designed Cross-Stage Heatmap Fusion method. (3) In the coordinate regression step, we adopted the Differentiable Spatial to Numerical Transform model combined with a Euclidean distance loss, so that the model can be trained end-to-end. We evaluated CVC-Net on widely used datasets of different scales, e.g., LSP, MPII and COCO. In Single-Person Pose Estimation tasks, CVC-Net achieved a PCK@0.2 score of 93.4% on the LSP test set and a PCKh@0.5 score of 91.6% on the MPII test set, with only about 4.2M parameters. In the Multi-Person Pose Estimation task, the combination of YOLOv3 and CVC-Net obtained 69.4 mAP on COCO test-dev, and inference speed reached 22 FPS on a GTX 1660 Ti GPU machine. The experimental results showed that CVC-Net can greatly reduce the number of model parameters while ensuring quite high accuracy.
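
The coordinate regression step can be sketched as follows. The sketch assumes the usual formulation of a differentiable spatial-to-numerical transform: the heatmap is normalized to sum to one (softmax is assumed here) and the joint coordinate is the expectation of a coordinate grid spanning (-1, 1). The function names `dsnt` and `euclidean_loss` are hypothetical, and plain Python lists stand in for the tensors a real network would use.

```python
import math


def dsnt(heatmap):
    """Map a 2D heatmap (list of rows of raw scores) to an (x, y)
    coordinate in (-1, 1) as the expectation under a softmax."""
    height, width = len(heatmap), len(heatmap[0])
    peak = max(max(row) for row in heatmap)
    # Softmax normalisation (shifted by the peak for numerical stability).
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = y = 0.0
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            p = w / total
            # Pixel-centre coordinates scaled to the open interval (-1, 1).
            x += p * (2 * c + 1 - width) / width
            y += p * (2 * r + 1 - height) / height
    return x, y


def euclidean_loss(pred, target):
    """Euclidean distance between predicted and target coordinates."""
    return math.hypot(pred[0] - target[0], pred[1] - target[1])


if __name__ == "__main__":
    heatmap = [[0.0, 0.0, 0.0], [0.0, 0.0, 50.0], [0.0, 0.0, 0.0]]
    prediction = dsnt(heatmap)
    print(prediction, euclidean_loss(prediction, (2 / 3, 0.0)))
```

Because every operation is a smooth function of the heatmap values, gradients of the distance loss can flow back through the coordinates into the heatmap, which is what allows end-to-end training.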


20.
Robust 3-D-3-D pose estimation   Cited by: 1 (self-citations: 0, citations by others: 1)
This correspondence focuses on robust 3-D-3-D pose estimation, especially multiple pose estimation. The robust 3-D-3-D multiple pose estimation problem is formulated as a series of general regressions which involve a successively size-decreasing data set, with each regression relating to one particular pose of interest. Since the first few regressions may carry a severely contaminated Gaussian error noise model, the MF-estimator (Zhuang et al., 1992) is used to solve each regression for each pose of interest. Extensive computer experiments with both real imagery and simulated data are conducted and results are promising. Three distinctive features of the MF-estimator are theoretically discussed and experimentally demonstrated: it is highly robust in the sense that it is not much affected by a possibly large portion of outliers or incorrect matches as long as the minimum number of inliers necessary to give a unique solution is provided; it is made virtually independent of initial guesses; and it is computationally reasonable and admits an efficient parallel implementation.
