Similar Documents

20 similar documents found (search time: 31 ms)
1.
This contribution addresses the problem of pose estimation and tracking of vehicles in image sequences of traffic scenes recorded by a stationary camera. In a new algorithm, the vehicle pose is estimated by directly matching polyhedral vehicle models to image gradients, without an intermediate edge-segment extraction step. The new approach is significantly more robust than approaches that rely on feature extraction, since it exploits more of the information in the image data. We successfully tracked vehicles that were partially occluded by textured objects, e.g., foliage, where a previous approach based on edge-segment extraction failed. Moreover, the new pose estimation approach is also used to determine the orientation and position of the road relative to the camera by matching an intersection model directly to image gradients. Results from various experiments with real-world traffic scenes are presented.
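The core idea of scoring a pose hypothesis directly against image gradients (rather than extracted edge segments) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the synthetic step-edge image, and the simple mean-gradient-magnitude score are all assumptions:

```python
import numpy as np

def gradient_match_score(image, edge_pixels):
    """Score a hypothesized model pose by the image-gradient magnitude
    sampled along the projected model edges (no edge segmentation)."""
    gy, gx = np.gradient(image.astype(float))   # gradients along rows, cols
    mag = np.hypot(gx, gy)
    rows, cols = edge_pixels[:, 0], edge_pixels[:, 1]
    return mag[rows, cols].mean()

# A synthetic image with a vertical step edge at column 8.
img = np.zeros((16, 16))
img[:, 8:] = 1.0

on_edge  = np.array([[r, 8] for r in range(16)])   # pixels on the step
off_edge = np.array([[r, 2] for r in range(16)])   # pixels in a flat region
```

A pose whose projected edges fall on true intensity discontinuities scores higher, which is what a pose optimizer would maximize.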

2.
This paper presents a robust framework for tracking complex objects in video sequences. The multiple hypothesis tracking (MHT) algorithm reported in IEEE Trans. Pattern Anal. Mach. Intell. 18(2) (1996) is modified to accommodate high-level representations (2D edge maps, 3D models) of objects for tracking. The framework exploits the strength of the MHT algorithm, its ability to resolve data-association uncertainty, and integrates it with object-matching techniques to provide robust behavior while tracking complex objects. To track objects in 2D, a 4D feature is used to represent edge/line segments, which are tracked using MHT. In many practical applications, 3D models provide more information about an object's pose (i.e., rotation information in the transformation space) than can be recovered from 2D edge information alone; hence, a 3D model-based object tracking algorithm is also presented. A probabilistic Hausdorff image-matching algorithm is incorporated into the framework to determine the geometric transformation that best maps the model features onto their corresponding features in the image plane. The 3D model of the object is used to constrain the tracker to operate in a consistent manner. Experimental results on real and synthetic image sequences demonstrate the efficacy of the proposed framework.
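The (non-probabilistic) directed Hausdorff distance that underlies Hausdorff image matching can be sketched in a few lines. A minimal numpy illustration under assumed 2D point-set inputs, not the paper's probabilistic variant:

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance h(A, B) = max_a min_b ||a - b||.
    A: (N, 2) model points; B: (M, 2) image points."""
    diff = A[:, None, :] - B[None, :, :]     # (N, M, 2) pairwise differences
    d = np.linalg.norm(diff, axis=2)         # (N, M) pairwise distances
    return d.min(axis=1).max()

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
image = model + 0.1                          # image features shifted by (0.1, 0.1)
h = directed_hausdorff(model, image)
```

A matcher would search over candidate transformations of the model points for the one minimizing this distance.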

3.
Generating large-scale and high-quality 3D scene reconstructions from monocular images is an essential technical foundation in augmented reality and robotics. However, apparent shortcomings (e.g., scale ambiguity, dense depth estimation in texture-less areas) make applying monocular 3D reconstruction to real-world practice challenging. In this work, we combine the advantages of deep learning and multi-view geometry to propose RGB-Fusion, which effectively addresses the inherent limitations of traditional monocular reconstruction. To overcome the limits that neural-network prediction errors place on tracking accuracy, we integrate the PnP (Perspective-n-Point) algorithm into the tracking module. We employ 3D ICP (Iterative Closest Point) matching and 2D feature matching to construct separate error terms and jointly optimize them, reducing the dependence on the accuracy of depth prediction and improving pose estimation accuracy. The approximate pose predicted by the neural network is employed as the initial optimization value to avoid becoming trapped in local minima. We formulate a depth map refinement strategy based on the uncertainty of the depth values, which naturally leads to a refined depth map. Through our method, low-uncertainty elements can significantly update the current depth value, while high-uncertainty elements are prevented from adversely affecting depth estimation accuracy. Qualitative and quantitative evaluations of tracking, depth prediction, and 3D reconstruction show that RGB-Fusion outperforms most monocular 3D reconstruction systems.
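An uncertainty-weighted depth update of the kind described above is commonly realized as per-pixel inverse-variance fusion. A minimal sketch under that assumption (the function name and the two-pixel example are illustrative, not RGB-Fusion's actual refinement rule):

```python
import numpy as np

def fuse_depth(d_old, var_old, d_new, var_new):
    """Inverse-variance fusion of two per-pixel depth estimates.
    Low-uncertainty (small-variance) measurements dominate the update."""
    w_old = 1.0 / var_old
    w_new = 1.0 / var_new
    d = (w_old * d_old + w_new * d_new) / (w_old + w_new)
    var = 1.0 / (w_old + w_new)              # fused variance always shrinks
    return d, var

# Pixel 0: confident new measurement (var 0.01) pulls the estimate strongly.
# Pixel 1: very uncertain new measurement (var 100) barely moves it.
d_old   = np.array([2.0, 2.0])
var_old = np.array([0.04, 0.04])
d_new   = np.array([2.2, 5.0])
var_new = np.array([0.01, 100.0])
d, var = fuse_depth(d_old, var_old, d_new, var_new)
```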

4.
It is well known that biological motion conveys a wealth of socially meaningful information. From even a brief exposure, biological motion cues enable the recognition of familiar people, and the inference of attributes such as gender, age, mental state, actions and intentions. In this paper we show that from the output of a video-based 3D human tracking algorithm we can infer physical attributes (e.g., gender and weight) and aspects of mental state (e.g., happiness or sadness). In particular, with 3D articulated tracking we avoid the need for view-based models, specific camera viewpoints, and constrained domains. The task is useful for man–machine communication, and it provides a natural benchmark for evaluating the performance of 3D pose tracking methods (vs. conventional Euclidean joint error metrics). We show results on a large corpus of motion capture data and on the output of a simple 3D pose tracker applied to videos of people walking.

5.
Image-moment-based visual tracking of the 3D translational motion of a moving target
林靖, 陈辉堂, 王月娟, 徐强华, 蒋平. Robot (《机器人》), 2000, 22(3): 217-223
Unlike simple geometric image features, this paper uses a global image descriptor, the image-moment feature, as the image feature information, and achieves image-based visual tracking of the 3D translational motion of a moving target. A set of moment features is selected to meet the task requirements. Based on the selected moments, the paper derives the relation matrix between changes in the moment features and changes in the relative pose, i.e., the image Jacobian matrix. Using this derived Jacobian, a visual servo controller composed of image feedback and adaptive compensation of the target's motion is designed, achieving 3D translational tracking of the moving target without knowledge of the target's imaging depth or the camera's focal length.
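Raw image moments of the kind used as global features above are straightforward to compute. A minimal numpy sketch (the function names and the synthetic blob are illustrative assumptions; the paper's chosen moment set is not specified here):

```python
import numpy as np

def image_moment(img, p, q):
    """Raw image moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return (xs ** p * ys ** q * img).sum()

def centroid(img):
    """First-order moments normalized by area give the blob centroid."""
    m00 = image_moment(img, 0, 0)
    return image_moment(img, 1, 0) / m00, image_moment(img, 0, 1) / m00

img = np.zeros((10, 10))
img[4:7, 2:5] = 1.0          # 3x3 blob centered at (x=3, y=5)
cx, cy = centroid(img)
```

In moment-based visual servoing, changes in such moment features are related to camera/target motion through the image Jacobian.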

6.
Robust GPU-assisted camera tracking using free-form surface models
We propose a marker-less model-based camera tracking approach, which makes use of GPU-assisted analysis-by-synthesis methods on a very wide field of view (e.g. fish-eye) camera. After an initial registration based on a learned database of robust features, the synthesis part of the tracking is performed on graphics hardware, which simulates the internal and external parameters of the camera, thereby minimizing lens and viewpoint differences between a model view and a real camera image. Based on an automatically reconstructed free-form surface model, we analyze the sensitivity of the tracking to the model accuracy, in particular the case where curved surfaces are represented by planar patches. We also examine accuracy and show on synthetic and real data that the system does not suffer from drift accumulation. The wide field of view of the camera and the subdivision of our reference model into many textured free-form surface patches make the system robust against illumination changes, moving persons and other occlusions within the environment, and provide a camera pose estimate in a fixed and known coordinate system.

7.
Most current 3D reconstruction systems are built on simultaneous localization and mapping (SLAM) systems that use either feature-point methods or direct methods. Feature-point SLAM struggles to produce good reconstructions in regions lacking feature points, while direct SLAM has difficulty estimating pose when the camera moves quickly, leading to unsatisfactory reconstruction results. To address these problems, this paper proposes a large-scale dense 3D reconstruction system based on semi-direct SLAM. Scanning with a depth (RGB-D) camera, the system estimates camera pose with the feature-point method in feature-rich regions and with the direct method in regions lacking feature points, reducing photometric error and optimizing the camera pose. The optimized, more accurate camera poses are then used for map construction: a surfel model is adopted, and a deformation-graph approach is applied for point-cloud pose estimation and fusion, finally yielding a satisfactory 3D reconstruction model. Experiments show that the system is applicable to 3D reconstruction in a variety of settings and produces good reconstruction models.

8.
Whereas 3D surface models are often used for augmented reality (e.g., for occlusion handling or model-based camera tracking), the creation and the use of such dense 3D models in augmented reality applications usually are two separated processes. The 3D surface models are often created in offline preparation steps, which makes it difficult to detect changes and to adapt the 3D model to these changes. This work presents a 3D change detection and model adjustment framework that combines AR techniques with real-time depth imaging to close the loop between dense 3D modeling and augmented reality. The proposed method detects the differences between a scene and a 3D model of the scene in real time. Then, the detected geometric differences are used to update the 3D model, thus bringing AR and 3D modeling closer together. The accuracy of the geometric difference detection depends on the depth measurement accuracy as well as on the accuracy of the intrinsic and extrinsic parameters. To evaluate the influence of these parameters, several experiments were conducted with simulated ground truth data. Furthermore, the evaluation shows the applicability of AR and depth image–based 3D modeling for model-based camera tracking.
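The per-pixel geometric difference test at the heart of such change detection can be sketched by comparing a rendered model depth map against a measured one. A minimal illustration under assumed inputs (the tolerance, array shapes, and handling of missing depth are assumptions, not the paper's exact procedure):

```python
import numpy as np

def change_mask(model_depth, measured_depth, tol=0.05):
    """Flag pixels where the measured depth deviates from the rendered
    model depth by more than `tol` (same units), i.e. geometric change.
    Pixels with no valid measurement (depth 0) are never flagged."""
    valid = measured_depth > 0
    return valid & (np.abs(measured_depth - model_depth) > tol)

model    = np.full((4, 4), 2.0)   # rendered depth of the known 3D model
measured = model.copy()
measured[1, 1] = 2.5              # scene changed: surface moved 0.5 m
measured[0, 0] = 0.0              # sensor dropout: no measurement
mask = change_mask(model, measured)
```

Flagged regions would then drive the update of the 3D model.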

9.
10.

Tracking the head in a video stream is a common thread seen within computer vision literature, supplying the research community with a large number of challenging and interesting problems. Head pose estimation from monocular cameras is often considered an extended application after the face tracking task has already been performed. This often involves passing the resultant 2D data through a simpler algorithm that best fits the data to a static 3D model to determine the 3D pose estimate. This work describes the 2.5D constrained local model, combining a deformable 3D shape point model with 2D texture information to provide direct estimation of the pose parameters, avoiding the need for additional optimization strategies. It achieves this through an analytical derivation of a Jacobian matrix describing how changes in the parameters of the model create changes in the shape within the image through a full-perspective camera model. In addition, the model has very low computational complexity and can run in real-time on modern mobile devices such as tablets and laptops. The point distribution model of the face is built in a unique way, so as to minimize the effect of changes in facial expressions on the estimated head pose and hence make the solution more robust. Finally, the texture information is trained via local neural fields—a deep learning approach that utilizes small discriminative patches to exploit spatial relationships between the pixels and provide strong peaks at the optimal locations.
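The analytical Jacobian of a full-perspective projection, of the kind derived for the 2.5D model above, can be sketched for the simplest case of translation parameters. A minimal illustration with assumed focal length and principal point (the parameterization here covers translation only, not the paper's full shape-and-pose parameter set), verified against finite differences:

```python
import numpy as np

def project(X, t, f=500.0, c=(320.0, 240.0)):
    """Full-perspective projection of a 3D point X after translation t."""
    x, y, z = X + t
    return np.array([f * x / z + c[0], f * y / z + c[1]])

def jacobian_t(X, t, f=500.0):
    """Analytic 2x3 Jacobian of the projected point w.r.t. translation t."""
    x, y, z = X + t
    return np.array([[f / z, 0.0, -f * x / z ** 2],
                     [0.0, f / z, -f * y / z ** 2]])

X = np.array([0.1, -0.2, 2.0])
t = np.zeros(3)
J = jacobian_t(X, t)

# Finite-difference check of the analytic Jacobian.
eps = 1e-6
J_num = np.column_stack([
    (project(X, t + eps * e) - project(X, t - eps * e)) / (2 * eps)
    for e in np.eye(3)])
```

Such Jacobians let a Gauss-Newton-style solver update pose parameters directly from image-plane residuals, which is what makes additional optimization strategies unnecessary.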


11.
We propose a vision-based robust automatic 3D object recognition method, which provides object identification and 3D pose information by combining feature matching with tracking. For object identification, we propose a robust visual feature and a probabilistic voting scheme. An initial object pose is estimated using predefined correlations between the model image and the 3D CAD model, together with the homography obtained as a byproduct of the identification. In tracking, a Lie-group formalism is used for robust and fast motion computation. Experimental results show that object recognition by the proposed method improves the recognition range considerably. Sungho Kim received the B.S. degree in Electrical Engineering from Korea University, Korea, in 2000 and the M.S. degree in Electrical Engineering and Computer Science from the Korea Advanced Institute of Science and Technology, Korea, in 2002. He is currently pursuing his Ph.D. at the latter institution, concentrating on 3D object recognition and tracking. In So Kweon received the Ph.D. degree in robotics from Carnegie Mellon University, Pittsburgh, PA, in 1990. Since 1992, he has been a Professor of Electrical Engineering at KAIST. His current research interests include human visual perception, object recognition, real-time tracking, vision-based mobile robot localization, volumetric 3D reconstruction, and camera calibration. He is a member of the IEEE and the Korea Robotics Society (KRS).

12.
Head pose estimation is a key task for visual surveillance, HCI and face recognition applications. In this paper, a new approach is proposed for estimating 3D head pose from a monocular image. The approach assumes the full perspective projection camera model. Our approach employs general prior knowledge of face structure and the corresponding geometrical constraints provided by the location of a certain vanishing point to determine the pose of human faces. To achieve this, the eye-lines, formed from the far and near eye corners, and the mouth-line of the mouth corners are assumed parallel in 3D space. The vanishing point of these parallel lines, found by the intersection of the eye-line and mouth-line in the image, can then be used to infer the 3D orientation and location of the human face. In order to deal with the variance of the facial model parameters, e.g. the ratio between the eye-line and the mouth-line, an EM framework is applied to update the parameters. We first compute the 3D pose using some initially learnt parameters (such as ratio and length) and then adapt the parameters statistically for individual persons and their facial expressions by minimizing the residual errors between the projection of the model feature points and the actual features on the image. In doing so, we assume every facial feature point can be associated with each of the feature points in the 3D model with some a posteriori probability. The expectation step of the EM algorithm provides an iterative framework for computing the a posteriori probabilities using Gaussian mixtures defined over the parameters. A robustness analysis of the algorithm on synthetic data and on real images with known ground truth is included.
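Finding the vanishing point as the intersection of the eye-line and mouth-line is a standard homogeneous-coordinates computation. A minimal numpy sketch (the specific corner coordinates are made up for illustration; they are not from the paper):

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two 2D image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines, returned as (x, y)."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]

# Eye-line and mouth-line of a rotated face converge at a vanishing point.
eye_line   = line_through((100.0, 100.0), (200.0, 110.0))
mouth_line = line_through((110.0, 160.0), (190.0, 150.0))
vp = intersect(eye_line, mouth_line)
```

Under the assumed parallelism in 3D, the direction from the camera center through this image point gives the 3D direction of the eye/mouth lines, from which face orientation follows.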

13.
To construct a water quality monitoring system, challenging issues need to be addressed regarding the acquisition of target information (e.g. 3D location and occlusion) as well as the behavioural analysis of aquatic organisms. This paper presents a novel 3D information acquisition and location method, by means of an information acquisition platform consisting of a monitoring terminal, frame grabbers, a single camera and a single mirror. Using this platform, we propose a theoretical 2D image model for locating 3D targets and then validate it using data obtained from both real and artificial fish. The proposed model is based on the principles of light refraction, plane mirror imaging, underwater objects and camera imaging as well as the technologies of digital to analog conversion and object segmentation. In contrast with existing methods, our method can accurately reflect 3D information of aquatic organisms, thus providing critical technical support for the development of water quality monitoring systems in the future.
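The light-refraction principle the model relies on is Snell's law at the air-water interface. A minimal sketch (the refractive indices and test angle are standard textbook values, assumed for illustration):

```python
import math

def refraction_angle(theta_i_deg, n1=1.0, n2=1.33):
    """Snell's law: n1 * sin(theta_i) = n2 * sin(theta_t).
    Defaults model a ray passing from air (n=1.0) into water (n=1.33)."""
    s = n1 * math.sin(math.radians(theta_i_deg)) / n2
    return math.degrees(math.asin(s))

theta_t = refraction_angle(30.0)   # ray bends toward the normal in water
```

Correcting each camera ray by this bending is what allows the 2D image model to recover true underwater 3D positions.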

14.
The high-dimensional pose state space is the main challenge in articulated human pose tracking, making pose analysis computationally expensive or even infeasible. In this paper, we propose a novel generative approach in the framework of evolutionary computation, with which we try to widen this bottleneck via an effective search strategy embedded in an extracted state subspace. Firstly, we use ISOMAP to learn the low-dimensional latent space of the pose state, with the aim of both reducing dimensionality and extracting the prior knowledge of human motion simultaneously. Then, we propose a manifold reconstruction method to establish smooth mappings between the latent space and the original space, which enables us to perform pose analysis in the latent space. For the search strategy, we adopt a new evolutionary approach, the clonal selection algorithm (CSA), for pose optimization. We design a CSA-based method to estimate human pose from a static image, which can be used for initialization of motion tracking. In order to make CSA suitable for motion tracking, we propose a sequential CSA (S-CSA) algorithm by incorporating temporal continuity information into the traditional CSA. In a Bayesian inference view, the sequential CSA algorithm is in essence a multilayer importance-sampling-based particle filter. Our methods are demonstrated on different motion types and different image sequences. Experimental results show that our CSA-based pose estimation method can achieve viewpoint-invariant 3D pose reconstruction and that the S-CSA-based motion tracking method can achieve accurate and stable tracking of 3D human motion.
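The clone-mutate-select loop of a clonal selection algorithm can be sketched on a toy objective. This is a bare-bones illustration, not the paper's S-CSA: the population sizes, mutation schedule, and quadratic test function are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def csa_minimize(f, dim=2, pop=20, clones=5, iters=200, sigma=0.5):
    """Minimal clonal selection algorithm: clone the better antibodies,
    mutate the clones (more aggressively for lower-ranked ones), and
    keep a clone whenever it improves on its parent."""
    P = rng.normal(0.0, 2.0, size=(pop, dim))
    for _ in range(iters):
        fit = np.array([f(x) for x in P])
        P = P[np.argsort(fit)]               # rank by affinity (fitness)
        for i in range(pop // 2):
            scale = sigma * (i + 1) / pop    # rank-dependent mutation size
            C = P[i] + scale * rng.normal(size=(clones, dim))
            cf = np.array([f(x) for x in C])
            j = cf.argmin()
            if cf[j] < f(P[i]):
                P[i] = C[j]
    return min(P, key=f)

best = csa_minimize(lambda x: ((x - 1.0) ** 2).sum())   # optimum at (1, 1)
```

The sequential (S-CSA) variant would reuse the previous frame's population as the prior for the next frame, which is what gives it its particle-filter interpretation.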

15.
In this paper, we introduce a method to estimate the object’s pose from multiple cameras. We focus on direct estimation of the 3D object pose from 2D image sequences. Scale-Invariant Feature Transform (SIFT) is used to extract corresponding feature points from adjacent images in the video sequence. We first demonstrate that centralized pose estimation from the collection of corresponding feature points in the 2D images from all cameras can be obtained as a solution to a generalized Sylvester’s equation. We subsequently derive a distributed solution to pose estimation from multiple cameras and show that it is equivalent to the solution of the centralized pose estimation based on Sylvester’s equation. Specifically, we rely on collaboration among the multiple cameras to provide an iterative refinement of the independent solution to pose estimation obtained for each camera based on Sylvester’s equation. The proposed approach to pose estimation from multiple cameras relies on all of the information available from all cameras to obtain an estimate at each camera even when the image features are not visible to some of the cameras. The resulting pose estimation technique is therefore robust to occlusion and sensor errors from specific camera views. Moreover, the proposed approach does not require matching feature points among images from different camera views nor does it demand reconstruction of 3D points. Furthermore, the computational complexity of the proposed solution grows linearly with the number of cameras. Finally, computer simulation experiments demonstrate the accuracy and speed of our approach to pose estimation from multiple cameras.
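A Sylvester equation AX + XB = C of the kind underlying the centralized solution can be solved by vectorization. A minimal numpy sketch for small matrices (the specific A, B, C are made-up examples, and this dense Kronecker approach is only practical at small sizes):

```python
import numpy as np

def solve_sylvester(A, B, C):
    """Solve AX + XB = C via the identity
    (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C), using column-major vec."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

A = np.array([[3.0, 1.0], [0.0, 2.0]])
B = np.array([[4.0, 0.0], [1.0, 5.0]])
C = np.array([[1.0, 2.0], [3.0, 4.0]])
X = solve_sylvester(A, B, C)
```

A unique solution exists whenever A and -B share no eigenvalues, which holds for this example.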

16.
3D template tracking aims to precisely register a pre-built 3D CAD model to the corresponding target in an input image. It has important applications in augmented reality, robotics, and related fields, and is one of the key problems in computer vision. In recent years, the accuracy and stability of 3D template tracking have improved continuously, but only a small amount of work has focused on the construction of 3D template tracking datasets. With the spread of deep learning, the construction of large-scale datasets has received growing attention in every field, providing algorithms with...

17.
18.
19.
2D action recognition based on 3D human action models
To address the problems caused by changes in the performer's orientation in action recognition, this paper proposes a 2D action recognition algorithm based on 3D human action models. When learning the action classifiers, action samples are represented as 3D occupancy grids, and 3D human joint points are extracted as the features describing each action; an exemplar-based hidden Markov model (EHMM) is trained for each action class. At the same time, a number of frames are selected from the 3D action samples as a set of 3D key poses; this set serves as the bridge between the 2D observation samples and the 3D joint-point features. When recognizing 2D actions, the 2D observation sequence can be captured by one or more uncalibrated cameras. For each frame of the 2D observation sequence, the best-matching 3D key pose is first found in the key-pose set; the action classifier then recognizes the 3D key-pose sequence corresponding to the 2D observation sequence. The algorithm requires 3D reconstruction of the performer and extraction of 3D joint points when training the classifiers, but no 3D reconstruction when recognizing 2D actions. Experiments on three datasets demonstrate that the algorithm can effectively recognize actions at arbitrary performer orientations and can adapt to different action-capture environments.
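The per-frame matching of 2D observations to 3D key poses can be sketched as a nearest-neighbor assignment in a shared feature space. A minimal illustration (the 2D feature vectors and key-pose set are made up; the paper's actual matching criterion is not reproduced here):

```python
import numpy as np

def match_key_poses(observations, key_poses):
    """For each observation feature vector, return the index of the
    closest key pose by Euclidean distance in the shared feature space."""
    d = np.linalg.norm(observations[:, None, :] - key_poses[None, :, :], axis=2)
    return d.argmin(axis=1)

key_poses = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
obs_seq   = np.array([[0.5, 0.2], [9.0, 1.0], [1.0, 9.0], [10.0, 0.5]])
idx = match_key_poses(obs_seq, key_poses)   # key-pose index per frame
```

The resulting index sequence is what the per-class EHMMs would then score.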

20.
苏乐, 柴金祥, 夏时洪. Journal of Software (《软件学报》), 2016, 27(S2): 172-183
This paper proposes a method for real-time online capture of 3D human motion from depth images based on local pose priors. The key idea is to automatically extract semantically meaningful virtual sparse 3D marker points from the captured depth images, quickly retrieve the K nearest pose neighbors from a pre-built heterogeneous 3D human pose database to construct a local pose prior model, and reconstruct the 3D human pose sequence online in real time by iteratively optimizing the maximum a posteriori estimate. Experimental results show that the method can track and reconstruct stable, accurate 3D human pose sequences in real time and, after an automatic calibration of individualized body parameters, can track performers with substantially different body sizes, at a frame rate of about 25 fps. The proposed method is therefore applicable to 3D game/film production, human-computer interaction control, and other areas.
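The K-nearest-neighbor retrieval step that builds the local pose prior can be sketched as follows. A minimal illustration with made-up feature and pose vectors; the distance-weighted mean shown here is one simple way to summarize the neighbors, not necessarily the paper's prior model:

```python
import numpy as np

def knn_pose_prior(marker_feats, db_feats, db_poses, k=3):
    """Retrieve the K database poses whose marker features are nearest to
    the observed virtual markers, and return a distance-weighted mean
    pose to serve as a local prior for MAP optimization."""
    d = np.linalg.norm(db_feats - marker_feats, axis=1)
    nn = np.argsort(d)[:k]                 # indices of the K nearest poses
    w = 1.0 / (d[nn] + 1e-8)               # closer neighbors weigh more
    w /= w.sum()
    return (w[:, None] * db_poses[nn]).sum(axis=0)

db_feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.0]])
db_poses = np.array([[0.0], [10.0], [50.0], [20.0]])
prior = knn_pose_prior(np.array([0.1, 0.0]), db_feats, db_poses, k=2)
```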


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号