Similar literature
 20 similar documents found (search time: 31 ms)
1.
We propose a model-based tracking method for articulated objects in monocular video sequences under varying illumination conditions. The tracking method uses estimates of optical flows constructed by projecting model textures into the camera images and comparing the projected textures with the recorded information. An articulated body is modelled in terms of 3D primitives, each possessing a specified texture on its surface. An important step in model-based tracking of 3D objects is the estimation of the pose of the object during the tracking process. The optimal pose is estimated by minimizing errors between the computed optical flow and the projected 2D velocities of the model textures. This estimation uses a least-squares method with kinematic constraints for the articulated object and a perspective camera model. We test our framework with an articulated robot and show results.

2.
Multiple human pose estimation is an important yet challenging problem. In an operating room (OR) environment, the 3D body poses of surgeons and medical staff can provide important clues for surgical workflow analysis. For that purpose, we propose an algorithm for localizing and recovering body poses of multiple humans in an OR environment under a multi-camera setup. Our model builds on 3D Pictorial Structures and 2D body part localization across all camera views, using convolutional neural networks (ConvNets). To evaluate our algorithm, we introduce a dataset captured in a real OR environment. Our dataset is unique, challenging and publicly available with annotated ground truths. Our proposed algorithm yields promising pose estimation results on this dataset.

3.
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered to fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize ghosting artifacts known from conventional billboard rendering, but also alleviate restrictions to the setup and sensitivities to errors of more complex 3D representations and multiview reconstruction techniques. Our results demonstrate the flexibility and the robustness of our approach with high quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

4.
《Real》1997,3(6):415-432
Real-time motion capture plays a very important role in various applications, such as 3D interfaces for virtual reality systems, digital puppetry, and real-time character animation. In this paper we tackle the problem of estimating and recognizing the motion of articulated objects using the optical motion capture technique. In addition, we present an effective method to control the articulated human figure in real time. The heart of this problem is the estimation of 3D motion and posture of an articulated, volumetric object using feature points from a sequence of multiple perspective views. Under some moderate assumptions such as smooth motion and known initial posture, we develop a model-based technique for the recovery of the 3D location and motion of a rigid object using a variation of the Kalman filter. The posture of the 3D volumetric model is updated by the 2D image flow of the feature points for all views. Two novel concepts, the hierarchical Kalman filter (HKF) and the adaptive hierarchical structure (AHS) incorporating the kinematic properties of the articulated object, are proposed to extend our formulation for the rigid object to the articulated one. Our formulation also allows us to avoid two classic problems in 3D tracking: the multi-view correspondence problem, and the occlusion problem. By adding more cameras and placing them appropriately, our approach can deal with the motion of the object in a very wide area. Furthermore, multiple objects can be handled by managing multiple AHSs and processing multiple HKFs. We show the validity of our approach using synthetic data acquired simultaneously from multiple virtual cameras in a virtual environment (VE) and real data derived from a moving light display with walking motion. The results confirm that the model-based algorithm works well on the tracking of multiple rigid objects.
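As a hedged illustration of the Kalman filtering this abstract builds on, the sketch below runs a plain one-dimensional constant-velocity Kalman filter in Python. It is not the hierarchical Kalman filter described in the paper; the function name `kalman_cv_1d`, the noise parameters `q` and `r`, and the identity initial covariance are assumptions made only for this example.

```python
def kalman_cv_1d(measurements, dt=1.0, q=1e-3, r=0.25):
    """Track position and velocity from position-only measurements.

    Hypothetical helper: q is the process noise added to both state
    variances, r is the measurement noise variance.
    """
    # State [position, velocity] and its 2x2 covariance P (starts as identity).
    x, v = measurements[0], 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0
    estimates = []
    for z in measurements:
        # Predict: x' = F x with F = [[1, dt], [0, 1]], P' = F P F^T + Q.
        x = x + dt * v
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11
        # Update with z = H x + noise, where H = [1, 0].
        s = p00 + r                   # innovation variance
        k0, k1 = p00 / s, p10 / s     # Kalman gain
        y = z - x                     # innovation
        x, v = x + k0 * y, v + k1 * y
        # Covariance update P = (I - K H) P, using the pre-update values.
        p00, p01, p10, p11 = (
            (1 - k0) * p00, (1 - k0) * p01,
            p10 - k1 * p00, p11 - k1 * p01,
        )
        estimates.append((x, v))
    return estimates
```

Feeding the function noise-free samples of a point moving at constant speed should drive the velocity estimate toward that speed, even though velocity is never measured directly.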

5.
The photorealistic modeling of large-scale objects, such as urban scenes, requires the combination of range sensing technology and digital photography. In this paper, we attack the key problem of camera pose estimation, in an automatic and efficient way. First, the camera orientation is recovered by matching vanishing points (extracted from 2D images) with 3D directions (derived from a 3D range model). Then, a hypothesis-and-test algorithm computes the camera positions with respect to the 3D range model by matching corresponding 2D and 3D linear features. The camera positions are further optimized by minimizing a line-to-line distance. The advantage of our method over earlier work is that we do not need to rely on extracted planar facades or other higher-order features; we use low-level linear features. That makes this method more general, robust, and efficient. We have also developed a user-interface for allowing users to accurately texture-map 2D images onto 3D range models at interactive rates. We have tested our system in a large variety of urban scenes.

6.
We propose a general algorithm for identifying an arbitrary pose of an articulated subject with sparse point features. The algorithm aims to identify a one-to-one correspondence between a model point-set and an observed point-set taken from freeform motion of the articulated subject. We avoid common assumptions such as pose similarity or small motions with respect to the model, and assume no prior knowledge from which to infer an initial or partial correspondence between the two point-sets. The algorithm integrates local segment-based correspondences under a set of affine transformations, and a global hierarchical search strategy. Experimental results, based on synthetic pose and real-world human motion data, demonstrate the ability of the algorithm to perform the identification task. Reliability is increasingly compromised with increasing data noise and segmental distortion, but the algorithm can tolerate moderate levels. This work contributes to establishing a crucial self-initializing identification in model-based point-feature tracking for articulated motion.

7.
We present a system that tracks an articulated body performing 3D movement with occlusions using a combination of cameras and mirrors. By integrating cameras and mirrors we get a simultaneous coverage of almost every point on the target and avoid occlusions. The suggested setup is much simpler and easier to handle compared to the equivalent, camera-based setup. Our tracking algorithm is model-based, and errors in the model are treated using the bundle adjustment procedure. In order to deal with the problem of feature visibility, each feature is set to be valid or invalid based on the model and on its expected appearance; this ensures that the system always tracks a set of distinguishable features. The proposed algorithm was able to track targets in 3D using the Gauss-Newton method to minimize geometric errors. We tested our setup by tracking the chameleon’s eyes. Tracking the eyes of a chameleon can be considered as the estimation of the 3D pose of an articulated body, where the head of the chameleon is considered as a rigid body, and each of the two eyes has two additional degrees of freedom. The proposed algorithm can be easily extended to cope with more complex objects.
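The Gauss-Newton minimization mentioned in this abstract can be sketched on a much smaller geometric problem. The Python example below is an assumption-laden stand-in, not the paper's tracker: it recovers a 2D position from distances to known landmarks, and the name `gauss_newton_position` and the landmark/range setup are hypothetical.

```python
import math


def gauss_newton_position(landmarks, ranges, guess, iterations=20):
    """Find (x, y) minimizing the sum of squared range residuals."""
    x, y = guess
    for _ in range(iterations):
        # Accumulate the normal equations (J^T J) step = -J^T residual.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (lx, ly), rho in zip(landmarks, ranges):
            dx, dy = x - lx, y - ly
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # Jacobian undefined exactly at a landmark
            res = dist - rho               # geometric residual
            jx, jy = dx / dist, dy / dist  # d(res)/dx, d(res)/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * res
            b2 -= jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, stop rather than divide by ~0
        # Solve the 2x2 system by Cramer's rule.
        sx = (a22 * b1 - a12 * b2) / det
        sy = (a11 * b2 - a12 * b1) / det
        x, y = x + sx, y + sy
        if math.hypot(sx, sy) < 1e-12:
            break  # converged
    return x, y
```

With exact ranges and a reasonable starting guess, the iteration typically converges in a handful of steps; a real tracker would replace the range residual with image reprojection errors and add safeguards such as damping.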

8.
A combined 2D, 3D approach is presented that allows for robust tracking of moving people and recognition of actions. It is assumed that the system observes multiple moving objects via a single, uncalibrated video camera. Low-level features are often insufficient for detection, segmentation, and tracking of non-rigid moving objects. Therefore, an improved mechanism is proposed that integrates low-level (image processing), mid-level (recursive 3D trajectory estimation), and high-level (action recognition) processes. A novel extended Kalman filter formulation is used in estimating the relative 3D motion trajectories up to a scale factor. The recursive estimation process provides a prediction and error measure that is exploited in higher-level stages of action recognition. Conversely, higher-level mechanisms provide feedback that allows the system to reliably segment and maintain the tracking of moving objects before, during, and after occlusion. Heading-guided recognition (HGR) is proposed as an efficient method for adaptive classification of activity. The HGR approach is demonstrated using “motion history images” that are then recognized via a mixture-of-Gaussians classifier. The system is tested in recognizing various dynamic human outdoor activities: running, walking, roller blading, and cycling. In addition, experiments with real and synthetic data sets are used to evaluate stability of the trajectory estimator with respect to noise.

9.
Earthwork operations are crucial parts of most construction projects. Heavy construction equipment and workers are often required to work in limited workspaces simultaneously. Struck-by accidents resulting from poor worker and equipment interactions account for a large proportion of accidents and fatalities on construction sites. The emerging technologies based on computer vision and artificial intelligence offer an opportunity to enhance construction safety through advanced monitoring utilizing site cameras. A crucial pre-requisite to the development of safety monitoring applications is the ability to accurately identify and localize the equipment and its critical components in 3D space. This study proposes a workflow for excavator 3D pose estimation based on deep learning using RGB images. In the proposed workflow, an articulated 3D digital twin of an excavator is used to generate the necessary data for training a 3D pose estimation model. In addition, a method for generating hybrid datasets (simulation and laboratory) for adapting the 3D pose estimation model for various scenarios with different camera parameters is proposed. Evaluations prove the capability of the workflow in estimating the 3D pose of excavators. The study concludes by discussing the limitations and future research opportunities.

10.
While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture synchronized video and ground-truth 3D motion. The resulting HumanEva datasets contain multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 40,000 frames of synchronized motion capture and multi-view video (resulting in over one quarter million image frames in total) were collected at 60 Hz with an additional 37,000 time instants of pure motion capture data. A standard set of error measures is defined for evaluating both 2D and 3D pose estimation and tracking algorithms. We also describe a baseline algorithm for 3D articulated tracking that uses a relatively standard Bayesian framework with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering. In the context of this baseline algorithm we explore a variety of likelihood functions, prior models of human motion and the effects of algorithm parameters. Our experiments suggest that image observation models and motion priors play important roles in performance, and that in a multi-view laboratory environment, where initialization is available, Bayesian filtering tends to perform well. The datasets and the software are made available to the research community. This infrastructure will support the development of new articulated motion and pose estimation algorithms, will provide a baseline for the evaluation and comparison of new methods, and will help establish the current state of the art in human pose estimation and tracking.
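The abstract does not spell out its error measures, so the following Python sketch only shows one measure commonly used for 3D pose evaluation: the mean Euclidean distance between estimated and ground-truth joint positions. Treating this as representative of the dataset's own definitions is an assumption, and the function name `mean_joint_error` is hypothetical.

```python
import math


def mean_joint_error(estimated, ground_truth):
    """Average Euclidean distance between corresponding joints.

    Both arguments are sequences of coordinate tuples (2D or 3D),
    listed in the same joint order.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("poses must be non-empty and have equal length")
    total = sum(math.dist(p, q) for p, q in zip(estimated, ground_truth))
    return total / len(estimated)
```

For example, a two-joint pose with one joint off by 5 units and one exact match has a mean error of 2.5 units. The same function works for 2D poses because `math.dist` accepts points of any matching dimension.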

11.
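Objective: Visual odometry (VO) can achieve reasonably accurate self-localization using only an ordinary camera, and it has become a research hotspot in computer vision and robotics. However, most current research and applications assume a static scene, that is, a scene containing only one motion model (the camera's own motion), and cannot handle multiple motion models. This paper therefore proposes a multi-motion visual odometry method based on split-and-merge motion segmentation, which obtains the motion states of multiple moving objects in the scene in addition to the camera motion. Methods: Building on the traditional visual odometry framework, a multi-model fitting method is introduced to segment the multiple motion models in a dynamic scene, and RANSAC (random sample consensus) is used to estimate the motion parameter instances of these models. The camera motion and the motion of each moving object are then transformed into a unified coordinate system, yielding the camera's visual odometry result together with the pose of each moving object at each time instant. Finally, local-window bundle adjustment is used to directly correct the camera pose and the computed poses of the camera relative to each moving object; the inliers of the camera motion model and the motion parameters of the camera relative to the moving objects obtained at each time instant are used to optimize the trajectories of the multiple motion models. Results: The proposed consecutive-frame motion segmentation method achieves good segmentation results and is robust; segmentation accuracy over consecutive frames reaches nearly 100%, which ensures the accuracy of the subsequent estimation of each motion model's parameters. The method not only estimates the camera pose effectively but also estimates the poses of salient moving objects in the scene; in every path segment, the average position error of both camera self-localization and moving-object localization is below 6%. Conclusion: The method simultaneously segments the camera's own motion model and the motion models of differently moving dynamic objects in a dynamic scene, and then simultaneously estimates the absolute trajectories of the camera and of each dynamic object, forming a multi-motion visual odometry pipeline.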
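To make the RANSAC step in this abstract concrete, here is a minimal Python sketch under strong simplifying assumptions: a single motion model, and a 2D translation in place of the paper's camera and object motion models. The function name `ransac_translation`, the threshold, and the fixed seed are hypothetical choices for illustration.

```python
import math
import random


def ransac_translation(src, dst, threshold=0.1, iterations=100, seed=0):
    """Estimate a 2D translation t with dst ~ src + t despite outliers."""
    if not src or len(src) != len(dst):
        raise ValueError("src and dst must be non-empty and equally long")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one correspondence fully determines a translation.
        i = rng.randrange(len(src))
        tx, ty = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        # Consensus set: matches whose residual is within the threshold.
        inliers = [
            j for j, (s, d) in enumerate(zip(src, dst))
            if math.hypot(d[0] - s[0] - tx, d[1] - s[1] - ty) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the best consensus set by averaging its displacements.
    n = len(best_inliers)
    tx = sum(dst[j][0] - src[j][0] for j in best_inliers) / n
    ty = sum(dst[j][1] - src[j][1] for j in best_inliers) / n
    return (tx, ty), best_inliers
```

One way to extend this toward several motions is to remove the returned inliers and run the function again on the remaining matches; whether that matches the multi-model fitting used in the paper is not stated in the abstract.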

12.
Inserting synthetic objects into video sequences has gained much interest in recent years. Fast and robust vision-based algorithms are necessary to make such an application possible. Traditional pose tracking schemes using recursive structure from motion techniques adopt one Kalman filter and thus only favor a certain type of camera motion. We propose a robust simultaneous pose tracking and structure recovery algorithm using the interacting multiple model (IMM) to improve performance. In particular, a set of three extended Kalman filters (EKFs), each describing a frequently occurring camera motion in real situations (general, pure translation, pure rotation), is applied within the IMM framework to track the pose of a scene. Another set of EKFs, one filter for each model point, is used to refine the positions of the model features in 3-D space. The filters for pose tracking and structure refinement are executed in an interleaved manner. The results are used for inserting virtual objects into the original video footage. The performance of the algorithm is demonstrated with both synthetic and real data. Comparisons with different approaches have been performed and show that our method is more efficient and accurate.

13.
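Ai Qinglin  Wang Wei  Liu Gangjiang 《机器人》(Robot) 2022,44(4):431-442
To address the low localization accuracy and poor mapping quality of existing RGB-D SLAM (simultaneous localization and mapping) systems in dynamic indoor environments, an RGB-D SLAM algorithm based on grid segmentation and dual-map coupling is proposed. Based on homography motion compensation and a bidirectional compensated optical flow method, grid-based motion segmentation is achieved according to geometric connectivity and depth-image clustering results, while keeping the algorithm fast. The camera's position is estimated by minimizing the reprojection error of feature points in static regions. Combining the camera pose, the RGB-D images, and the grid-based motion segmentation images, a sparse point cloud map and a static octree map of the scene are built simultaneously and coupled; on keyframes, static map points are selected using a method based on grid segmentation and ray traversal of the octree map, and the sparse point cloud map is updated to maintain localization accuracy. Experimental results on public datasets and in real dynamic scenes both show that the proposed algorithm effectively improves camera pose estimation accuracy in dynamic indoor scenes and builds and updates a static octree map of the scene in real time. In addition, the algorithm runs in real time on a standard CPU platform without additional computing resources such as a GPU.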

14.
15.
Structured light methods achieve 3D modelling by observing, with a camera system, a known pattern projected on the scene. The main drawback of single projection structured light methods is that moving the projector significantly changes the appearance of the scene at every acquisition time. Classical multi-view stereovision approaches based on appearance matching are then not usable. The presented work is based on a system with two cameras and a single slide projector, embedded in a hand-held device for industrial applications (reverse engineering, dimensional control, etc.). We propose a method to achieve multi-view modelling for camera pose and surface reconstruction estimation in a joint process. The proposed method is based on the extension of a stereo-correlation criterion. Acquisitions are linked through a generalized expression of local homographies. The constraints brought by this formulation allow an accurate estimation of the modelling parameters for dense reconstruction of the scene and improve the result when dealing with detailed or sharp objects, compared to pairwise stereovision methods.

16.
17.
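To address category-level pose estimation of 3D deformable objects, a category-oriented pose estimation method for 3D deformable objects is proposed based on object keypoints. The method designs a keypoint-based end-to-end deep learning framework that uses PointNet++ as its backend network and estimates the pose of deformable objects through feature extraction, part segmentation, keypoint extraction, and keypoint-based pose estimation; it offers high computational accuracy and strong robustness. In addition, building on the ANCSH method, a normalized hierarchical keypoint representation suited to the K-AOPE network is designed, which can represent a category of objects by extracting only a small number of keypoints. To verify the method's effectiveness, it is tested on the public dataset shape2motion. The experimental results show that the proposed pose estimation method (taking the eyeglasses category as an example) has rotation errors of 2.3°, 3.1°, and 3.7°, translation errors of 0.034, 0.030, and 0.046, joint state errors of 2.4° and 2.5°, and joint parameter errors of 1.2° and 0.9° and of 0.008 and 0.010. Compared with the ANCSH method, the proposed method has higher accuracy and robustness.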

18.
We develop a method for the estimation of articulated pose, such as that of the human body or the human hand, from a single (monocular) image. Pose estimation is formulated as a statistical inference problem, where the goal is to find a posterior probability distribution over poses as well as a maximum a posteriori (MAP) estimate. The method combines two modeling approaches, one discriminative and the other generative. The discriminative model consists of a set of mapping functions that are constructed automatically from a labeled training set of body poses and their respective image features. The discriminative formulation allows for modeling ambiguous, one-to-many mappings (through the use of multi-modal distributions) that may yield multiple valid articulated pose hypotheses from a single image. The generative model is defined in terms of a computer graphics rendering of poses. While the generative model offers an accurate way to relate observed (image features) and hidden (body pose) random variables, it is difficult to use it directly in pose estimation, since inference is computationally intractable. In contrast, inference with the discriminative model is tractable, but considerably less accurate for the problem of interest. A combined discriminative/generative formulation is derived that leverages the complementary strengths of both models in a principled framework for articulated pose inference. Two efficient MAP pose estimation algorithms are derived from this formulation; the first is deterministic and the second non-deterministic. Performance of the framework is quantitatively evaluated in estimating articulated pose of both the human hand and human body. Most of this work was done while the first author was with Boston University.

19.
This paper presents a real-time vision based algorithm for 5 degrees-of-freedom pose estimation and set-point control for a Micro Aerial Vehicle (MAV). The camera is mounted on-board a quadrotor helicopter. Camera pose estimation is based on the appearance of two concentric circles which are used as a landmark. We show that by using a calibrated camera, conic sections, and the assumption that yaw is controlled independently, it is possible to determine the six degrees-of-freedom pose of the MAV. First we show how to detect the landmark in the image frame. Then we present a geometric approach for camera pose estimation from the elliptic appearance of a circle in perspective projection. Using this information we are able to determine the pose of the vehicle. Finally, given a set point in the image frame we are able to control the quadrotor such that the feature appears in the respective target position. The performance of the proposed method is presented through experimental results.

20.
This paper presents a CAD-based six-degrees-of-freedom (6-DoF) pose estimation design for random bin picking for multiple objects. A virtual camera generates a point cloud database for the objects using their 3D CAD models. To reduce the computational time of 3D pose estimation, a voxel grid filter reduces the number of points in the 3D point clouds of the objects. A voting scheme is used for object recognition and to estimate the 6-DoF pose for different objects. An outlier filter removes badly matching poses so that the robot arm always picks up the upper object in the bin, which increases the success rate. In a computer simulation using a synthetic scene, the average recognition rate is 97.81% for three different objects with various poses. A series of experiments has been conducted to validate the proposed method using a Kuka robot arm. The average recognition rate for three objects is 92.39% and the picking success rate is 89.67%.
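The voxel grid filter mentioned in this abstract can be sketched in a few lines of Python. This is a generic centroid-per-voxel downsampler written for illustration, not the implementation used in the paper; the function name `voxel_grid_filter` and the choice to keep each voxel's centroid are assumptions.

```python
import math
from collections import defaultdict


def voxel_grid_filter(points, voxel_size):
    """Downsample a point cloud to one centroid per occupied voxel."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis identifies the cell.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # Replace every occupied voxel by the mean of the points inside it.
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]
```

A larger `voxel_size` merges more points per cell, trading geometric detail for fewer points to process in later matching stages.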
