Similar Literature
Found 20 similar documents.
1.
2.
This paper deals with the 3D structure estimation and exploration of static scenes using active vision. Our method is based on the structure-from-controlled-motion approach, which constrains camera motions to obtain an optimal estimation of the 3D structure of a geometrical primitive. Since this approach involves gazing at the considered primitive, we have developed perceptual strategies able to perform a succession of robust estimations. This leads to a gaze-planning strategy that mainly uses a representation of known and unknown areas as a basis for selecting viewpoints. This approach ensures a reconstruction of the scene that is as complete as possible.
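The gaze-planning idea — track which parts of the scene are already reconstructed and pick the next viewpoint that uncovers the most unknown area — can be sketched as a greedy selection. This is an illustrative toy, not the paper's method: the 1-D ring of cells and the fixed angular window per viewpoint are my assumptions.

```python
# Toy gaze planning (illustrative assumption, not the paper's scene model):
# a 1-D ring of scene cells marked known/unknown; each candidate viewpoint
# sees a fixed angular window of cells. Greedily pick the viewpoint that
# covers the most unknown cells, then mark them known, until done.

def best_viewpoint(known, window):
    """Return (start index, gain) of the window covering the most unknown cells."""
    n = len(known)
    best, best_gain = 0, -1
    for s in range(n):
        gain = sum(1 for i in range(window) if not known[(s + i) % n])
        if gain > best_gain:
            best, best_gain = s, gain
    return best, best_gain

def explore(n_cells=12, window=4):
    known = [False] * n_cells
    views = []
    while not all(known):
        s, _ = best_viewpoint(known, window)
        views.append(s)
        for i in range(window):
            known[(s + i) % n_cells] = True
    return views

views = explore()
```

With 12 cells and a 4-cell window, the greedy planner covers the ring in three non-overlapping views.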

3.
Tracking People on a Torus
We present a framework for monocular 3D kinematic pose tracking and viewpoint estimation of periodic and quasi-periodic human motions from an uncalibrated camera. The approach is based on supervised, joint learning of both the visual observation manifold and the kinematic manifold of the motion. We show that the visual manifold of the observed shape of a human performing a periodic motion, observed from different viewpoints, is topologically equivalent to a torus. Instead of learning an embedding of the manifold, we learn the geometric deformation between an ideal manifold (a conceptually equivalent topological structure) and a twisted version of the manifold (the data). Experimental results show accurate estimation of the 3D body posture and the viewpoint from a single uncalibrated camera.
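The torus claim can be made concrete with the standard embedding: a minimal sketch (using the textbook torus parametrization, not the paper's learned deformation) maps a gait phase and a viewing angle to a point satisfying the implicit torus equation.

```python
# Hedged illustration: the joint (gait phase, viewpoint) configuration of a
# periodic motion seen from a circle of viewpoints lives on a torus. Here we
# use the ideal textbook parametrization; the paper learns a deformation of it.
import math

def torus_point(phase, view, R=2.0, r=1.0):
    """Map gait phase and viewing angle (radians) to a point on a torus in R^3."""
    x = (R + r * math.cos(phase)) * math.cos(view)
    y = (R + r * math.cos(phase)) * math.sin(view)
    z = r * math.sin(phase)
    return (x, y, z)

def on_torus(p, R=2.0, r=1.0, tol=1e-9):
    """Check the implicit torus equation (sqrt(x^2+y^2) - R)^2 + z^2 = r^2."""
    x, y, z = p
    return abs((math.hypot(x, y) - R) ** 2 + z ** 2 - r ** 2) < tol
```

Both coordinates are periodic, which is exactly the product-of-two-circles topology the paper exploits.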

4.
This paper first introduces a canonical representation for cylinders. The canonical representation introduced here is closely related to the Plücker line representation. We show that this representation is an appropriate one for computer vision applications: in particular, it allows us to easily develop a series of mathematical methods for pose estimation, 3D reconstruction, and motion estimation. One of the major novelties in this paper is the introduction of the main equations governing the three-view geometry of cylinders. We show the relationship between cylinders' three-view geometry and that of lines (Spetsakis and Aloimonos, 1990; Weng et al., 1993) and points (Shashua, 1995) defined by the trilinear tensor (Hartley, 1997), and propose a linear method which uses the correspondences between six cylinders across three views to recover the motion and structure. Cylindrical pipes and containers are the main components in the majority of chemical, water-treatment and power plants, oil platforms, refineries and many other industrial installations. We have developed professional software, called CyliCon, which allows efficient as-built reconstruction of such installations from a series of pre-calibrated images. Markers are used for this pre-calibration process. The theoretical and practical results in this paper represent the first steps towards marker-less calibration and reconstruction of such industrial sites. Here, the experimental results take advantage of the two-view and three-view geometry of cylinders introduced in this paper to provide initial camera calibration results.
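As background, here is a minimal sketch of the Plücker line representation that the canonical cylinder representation builds on (the helper names are mine): a line through two points is encoded by its direction d and its moment m = p × d, which always satisfy the Plücker constraint d · m = 0. A cylinder's axis can be represented this way, together with its radius.

```python
# Hedged background sketch: Plücker coordinates (d, m) of a 3-D line through
# points p and q. The constraint d . m = 0 holds for any valid Plücker line,
# and the moment is the same for every point on the line.

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def plucker(p, q):
    d = sub(q, p)       # direction of the line
    m = cross(p, d)     # moment of the line about the origin
    return d, m

d, m = plucker([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
```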

5.
《Real》1999,5(3):215-230
Real-time pose estimation between a 3D scene and a single camera is a fundamental task in most 3D computer vision and robotics applications, such as object tracking, visual servoing, and virtual reality. In this paper we present two fast methods for estimating the 3D pose using 2D-to-3D point and line correspondences. The first method is based on the iterative use of a weak perspective camera model and forms a generalization of DeMenthon's method (1995), which determines the pose from point correspondences. In this method the pose is iteratively improved with a weak perspective camera model, and at convergence the computed pose corresponds to the perspective camera model. The second method is based on the iterative use of a paraperspective camera model, which is a first-order approximation of perspective. We describe these two methods in detail for both non-planar and planar objects. Experiments involving synthetic data as well as real range data indicate the feasibility and robustness of the two methods. We analyse their convergence and conclude that the iterative paraperspective method has better convergence properties than the iterative weak perspective method. We also introduce a non-linear optimization method for solving the pose problem.
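A minimal sketch of the key modelling idea (not DeMenthon's full iteration): weak perspective projects with one common reference depth instead of each point's own depth, and the approximation error shrinks as the object recedes, which is why alternating between the two models converges.

```python
# Hedged sketch of the camera models only. The test points and depths are
# made-up values; the paper's iteration over the pose itself is not shown.

def project_perspective(P, f=1.0):
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)       # divide by each point's own depth

def project_weak(P, Z0, f=1.0):
    X, Y, Z = P
    return (f * X / Z0, f * Y / Z0)     # divide by one common depth Z0

def weak_error(depth):
    """Max image discrepancy over a small object centred at the given depth."""
    pts = [(0.3, -0.2, depth + 0.1), (-0.1, 0.4, depth - 0.1)]
    z0 = depth  # centroid depth, as in weak perspective
    err = 0.0
    for P in pts:
        u, v = project_perspective(P)
        uw, vw = project_weak(P, z0)
        err = max(err, abs(u - uw), abs(v - vw))
    return err
```

The farther the object, the smaller the discrepancy between the two models, so each weak-perspective solve lands closer to the true perspective pose.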

6.
The role of fixation in visual motion analysis
How does the ability of humans and primates to fixate on environmental points in the presence of relative motion help their visual systems solve various tasks? To state the question in a more formal setting, we investigate in this article the following problem: suppose that we have an active vision system, that is, a camera resting on a platform and controlled through motors by a computer that has access to the images sensed by the camera in real time. The platform can move freely in the environment. If this machine can fixate on targets that are in relative motion with it, can it solve visual tasks in an efficient and robust manner? By restricting our attention to a set of navigational tasks, we find that such an active observer can solve the problems of 3D motion estimation, egomotion recovery, and estimation of time-to-contact very efficiently, using as input the spatiotemporal derivatives of the image-intensity function (or normal flow). Fixation over time changes the input (the motion field) in a controlled way, and from this change additional information is derived that makes the previously mentioned tasks easier to solve.
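One of the quantities mentioned, time-to-contact, illustrates how such tasks can be solved without metric depth: it follows from the relative expansion rate of an image feature alone. A hedged toy example (the pinhole size model s = k/Z and all numbers are my assumptions):

```python
# Toy time-to-contact (TTC) estimate: tau = s / (ds/dt), where s is the
# image size of a feature. No metric depth is needed. We simulate an object
# approaching at constant speed under a pinhole size model s = k / Z.

def time_to_contact(size_prev, size_now, dt):
    ds = (size_now - size_prev) / dt    # expansion rate of the image feature
    return size_now / ds                # tau = s / (ds/dt)

k, v = 100.0, 2.0          # size constant and approach speed (made up)
Z0, dt = 10.0, 0.001       # current depth and frame interval
s_prev = k / Z0
s_now = k / (Z0 - v * dt)
tau = time_to_contact(s_prev, s_now, dt)
```

The true time-to-contact is Z/v = 5 s; the image-only estimate matches it without ever using Z.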

7.
This paper is centered around landmark detection, tracking, and matching for visual simultaneous localization and mapping using a monocular vision system with active gaze control. We present a system that specializes in creating and maintaining a sparse set of landmarks based on a biologically motivated feature-selection strategy. A visual attention system detects salient features that are highly discriminative and ideal candidates for visual landmarks that are easy to redetect. Features are tracked over several frames to determine stable landmarks and to estimate their 3-D position in the environment. Matching of current landmarks to database entries enables loop closing. Active gaze control allows us to overcome some of the limitations of using a monocular vision system with a relatively small field of view. It supports 1) the tracking of landmarks that enable a better pose estimation, 2) the exploration of regions without landmarks to obtain a better distribution of landmarks in the environment, and 3) the active redetection of landmarks to enable loop closing in situations in which a fixed camera fails to close the loop. Several real-world experiments show that accurate pose estimation is obtained with the presented system and that active camera control outperforms the passive approach.

8.
MonoSLAM: real-time single camera SLAM
We present a real-time algorithm which can recover the 3D trajectory of a monocular camera moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time yet drift-free performance inaccessible to structure-from-motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera.

9.
In this paper, the visual servoing problem is addressed by coupling nonlinear control theory with a convenient representation of the visual information used by the robot. The visual representation, which is based on a linear camera model, is extremely compact to comply with active vision requirements. The devised control law is proven to ensure global asymptotic stability in the Lyapunov sense, assuming exact model and state measurements. It is also shown that, in the presence of bounded uncertainties, the closed-loop behavior is characterized by a global attractor. The well known pose ambiguity arising from the use of linear camera models is solved at the control level by choosing a hybrid visual state vector including both image space (2D) information and 3D object parameters. A method is expounded for on-line visual state estimation that avoids camera calibration. Simulation and real-time experiments validate the theoretical framework in terms of both system convergence and control robustness.
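The closed-loop behaviour that the stability analysis certifies can be illustrated with the generic exponential-decay law commonly imposed in visual servoing, ė = −λe. This is a simplification of the paper's hybrid control law, shown only to visualize the Lyapunov-style contraction of the error:

```python
# Hedged sketch: discrete simulation of the generic visual servoing error
# dynamics e_dot = -lambda * e. Every component of the visual error vector
# contracts exponentially toward zero, the behaviour a Lyapunov analysis
# certifies globally for the paper's (more elaborate) control law.

def servo(e0, lam=2.0, dt=0.01, steps=500):
    e = list(e0)
    for _ in range(steps):
        e = [ei + dt * (-lam * ei) for ei in e]   # explicit Euler step
    return e

e_final = servo([0.5, -0.3, 0.1])   # made-up initial visual error
```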

10.
11.
Hu Zhaozheng, Tan Zheng. Acta Automatica Sinica, 2007, 33(5): 494-499
Using three orthogonal translational motions, a new algorithm for 3D structure recovery and direct Euclidean reconstruction is proposed. The algorithm only requires an active vision platform to control the camera through a set of three orthogonal translations; planar structure information can then be recovered and Euclidean reconstruction performed from image point correspondences and the translation distances, without assuming the camera distortion factor to be zero. The computation requires neither the camera intrinsic parameters nor stratified reconstruction: it is a direct Euclidean reconstruction algorithm that avoids the two major difficulties of traditional methods, camera calibration and affine reconstruction, and the whole computation is linear, simple and practical. Finally, the algorithm is validated with simulated and real-image experiments, and the results demonstrate its effectiveness and accuracy.
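The kind of metric constraint a controlled translation provides can be sketched as follows. This is illustrative only: it assumes a known focal length, which the paper's calibration-free method specifically avoids needing.

```python
# Hedged sketch: with a camera translation of known length t parallel to the
# image plane, a point's depth follows from its image disparity, Z = f*t/d.
# The focal length f here is an assumption for illustration; the paper's
# algorithm works without knowing the intrinsic parameters.

def depth_from_translation(f, t, disparity):
    """Depth from a controlled sideways translation of known length."""
    return f * t / disparity

f = 800.0     # focal length in pixels (assumed, for illustration)
t = 0.05      # translation in metres (controlled by the platform)
Z = 2.0       # true depth in metres (ground truth for the toy example)
disparity = f * t / Z            # simulated image displacement in pixels
Z_est = depth_from_translation(f, t, disparity)
```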

12.
Tracking is a very important research subject in a real-time augmented reality context. The main requirements for trackers are high accuracy and low latency at a reasonable cost. In order to address these issues, a real-time, robust, and efficient 3D model-based tracking algorithm is proposed for a "video see-through" monocular vision system. Tracking objects in the scene amounts to computing the pose between the camera and the objects; virtual objects can then be projected into the scene using this pose. In this paper, nonlinear pose estimation is formulated by means of a virtual visual servoing approach. In this context, the derivations of the point-to-curve interaction matrices are given for different 3D geometrical primitives, including straight lines, circles, cylinders, and spheres. A local moving-edges tracker is used in order to provide real-time tracking of points normal to the object contours. Robustness is obtained by integrating an M-estimator into the visual control law via an iteratively reweighted least squares implementation. This approach is then extended to address the 3D model-free augmented reality problem. The method presented in this paper has been validated on several complex image sequences, including outdoor environments. Results show the method to be robust to occlusion, changes in illumination, and mistracking.
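The M-estimator / iteratively reweighted least squares (IRLS) idea can be sketched on a toy robust line fit. Tukey's biweight with a fixed scale c = 3 is my choice for the example; the paper embeds the same reweighting inside its visual control law rather than in a line fit.

```python
# Hedged IRLS sketch: fit y = a*x + b to data with one outlier. Each pass
# refits with weights from Tukey's biweight of the previous residuals, so
# the outlier's influence is progressively driven to zero.

def wls_line(xs, ys, w):
    """Closed-form weighted least-squares fit of y = a*x + b."""
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

def tukey_weight(r, c=3.0):
    """Tukey biweight: smooth down-weighting, zero beyond the cutoff c."""
    return (1 - (r / c) ** 2) ** 2 if abs(r) < c else 0.0

def irls(xs, ys, iters=10):
    w = [1.0] * len(xs)
    for _ in range(iters):
        a, b = wls_line(xs, ys, w)
        w = [tukey_weight(y - (a * x + b)) for x, y in zip(xs, ys)]
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]   # last point is an outlier of y = x
a, b = irls(xs, ys)
```

A plain least-squares fit of this data gives a = 2.2, b = -1.2; the IRLS fit rejects the outlier and recovers the inlier line exactly.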

13.
The view-independent visualization of 3D scenes is most often based on rendering accurate 3D models or utilizes image-based rendering techniques. To compute the 3D structure of a scene from a moving vision sensor, or to use image-based rendering approaches, we need to be able to estimate the motion of the sensor from the recorded image information with high accuracy, a problem that has been well studied. In this work, we investigate the relationship between camera design and our ability to perform accurate 3D photography, by examining the influence of camera design on the estimation of the motion and structure of a scene from video data. By relating the differential structure of the time-varying plenoptic function to different known and new camera designs, we can establish a hierarchy of cameras based upon the stability and complexity of the computations necessary to estimate structure and motion. At the low end of this hierarchy is the standard planar pinhole camera, for which the structure-from-motion problem is non-linear and ill-posed. At the high end is a camera, which we call the full-field-of-view polydioptric camera, for which the motion estimation problem can be solved independently of the depth of the scene, leading to fast and robust algorithms for 3D photography. In between are multiple-view cameras with a large field of view, which we have built, as well as omni-directional sensors.

14.
We introduce the concept of self-calibration of a 1D projective camera from point correspondences, and describe a method for uniquely determining the two internal parameters of a 1D camera based on the trifocal tensor of three 1D images. The method requires estimating the trifocal tensor, which, unlike the trifocal tensor of 2D images, can be done linearly with no approximation, and then solving for the roots of a cubic polynomial in one variable. Interestingly, we prove that a 2D camera undergoing planar motion reduces to a 1D camera. From this observation, we deduce a new method for self-calibrating a 2D camera using planar motions. Both the self-calibration method for a 1D camera and its application to 2D camera calibration are demonstrated on real image sequences.

15.
2D visual servoing consists of using data provided by a vision sensor to control the motions of a dynamic system. Most visual servoing approaches have relied on geometric features that must be tracked and matched in the image acquired by the camera. Recent works have highlighted the interest of taking into account the photometric information of the entire image. This approach was developed for images from perspective cameras. In this paper, we propose to extend the technique to central cameras, a generalization that allows this kind of method to be applied to catadioptric cameras and wide-field-of-view cameras. Several experiments have been carried out successfully with a fisheye camera controlling a six-degrees-of-freedom robot, and with a catadioptric camera for a mobile-robot navigation task.

16.
The paper describes the rank-1 weighted factorization solution to the structure-from-motion problem. This method recovers the 3D structure from the factorization of a data matrix that is rank 1 rather than rank 3. This matrix collects the estimates of the 2D motions of a set of feature points of the rigid object. These estimates are weighted by the inverse of their estimated error standard deviation, so that the 2D motion estimates for "sharper" features, which are usually well estimated, are given more weight, while the noisier motion estimates for "smoother" features are weighted less. We analyze the performance of the rank-1 weighted factorization algorithm to determine which 3D shapes and 3D motions are most suitable for recovering the 3D structure of a rigid object from the 2D motions of its features. Our approach is developed for the orthographic camera model. It avoids expensive singular value decompositions by using the power method and is suitable for handling dense sets of feature points and long video sequences. Experimental studies with synthetic and real data illustrate the good performance of our approach.
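The SVD-free computation can be sketched with the power method on an exactly rank-1 matrix. This is a pure-Python toy, not the paper's weighted formulation: the power iteration recovers the dominant left/right singular vectors, which for a rank-1 matrix reconstruct it exactly.

```python
# Hedged sketch: power method for the dominant singular triple of a matrix,
# avoiding a full SVD. Applied to a noise-free rank-1 matrix M = u_true v_true^T,
# the recovered (u, sigma, v) reproduce M exactly.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matvec_t(M, u):
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def power_rank1(M, iters=50):
    v = [1.0] * len(M[0])
    for _ in range(iters):
        u = matvec(M, v)
        nu = norm(u)
        u = [x / nu for x in u]       # normalized left singular vector
        v = matvec_t(M, u)            # unnormalized right singular vector
    sigma = norm(v)                   # dominant singular value
    v = [x / sigma for x in v]
    return u, sigma, v

u_true = [1.0, 2.0, 3.0]
v_true = [2.0, 1.0]
M = [[ui * vj for vj in v_true] for ui in u_true]   # rank-1 data matrix
u, s, v = power_rank1(M)
```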

17.
Registration of 3D data is a key problem in many applications in computer vision, computer graphics and robotics. This paper provides a family of minimal solutions for the 3D-to-3D registration problem in which the 3D data are represented as points and planes. Such scenarios occur frequently when a 3D sensor provides 3D points and our goal is to register them to a 3D object represented by a set of planes. In order to compute the 6 degrees-of-freedom transformation between the sensor and the object, we need at least six points on three or more planes. We systematically investigate and develop pose estimation algorithms for several configurations, including all minimal configurations, that arise from the distribution of points on planes. We also identify the degenerate configurations in such registrations. The underlying algebraic equations used in many registration problems are the same and we show that many 2D-to-3D and 3D-to-3D pose estimation/registration algorithms involving points, lines, and planes can be mapped to the proposed framework. We validate our theory in simulations as well as in three real-world applications: registration of a robotic arm with an object using a contact sensor, registration of planar city models with 3D point clouds obtained using multi-view reconstruction, and registration between depth maps generated by a Kinect sensor.
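A heavily simplified instance of point-to-plane registration can show the shape of the algebra (translation only, three unknowns, instead of the paper's full 6-DoF solvers): minimizing the sum of squared plane residuals (n_i . (p_i + t) - d_i)^2 leads to 3x3 normal equations.

```python
# Hedged, translation-only sketch of point-to-plane registration. Given
# planes (n_i, d_i) with n_i . x = d_i and sensed points p_i that should lie
# on them, solve (sum n_i n_i^T) t = sum (d_i - n_i . p_i) n_i.

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

def register_translation(planes, points):
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (n, d), p in zip(planes, points):
        r = d - sum(n[i] * p[i] for i in range(3))   # signed plane residual
        for i in range(3):
            b[i] += r * n[i]
            for j in range(3):
                A[i][j] += n[i] * n[j]
    return solve3(A, b)

# Three orthogonal planes and points displaced by a known translation (made up).
planes = [((1.0, 0.0, 0.0), 2.0),
          ((0.0, 1.0, 0.0), 1.0),
          ((0.0, 0.0, 1.0), 3.0)]
t_true = (0.5, -0.2, 0.7)
points = [(2.0 - t_true[0], 5.0, 9.0),
          (4.0, 1.0 - t_true[1], -1.0),
          (0.0, 3.0, 3.0 - t_true[2])]
t = register_translation(planes, points)
```

With three non-parallel plane normals the 3x3 system is well conditioned; fewer independent normals give exactly the degenerate configurations the paper catalogues (in this reduced setting).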

18.
We present a novel optimisation framework for multi-body motion segmentation and 3D reconstruction of a set of point trajectories in the presence of missing data. The proposed solution not only assigns the trajectories to the correct motion but also solves for the 3D location of the multi-body shape and fills the missing entries in the measurement matrix. The solution is based on two fundamental principles: each of the multi-body motions is controlled by a set of metric constraints given by the specific camera model, and the shape matrix that describes the multi-body 3D shape is generally sparse. We jointly include these constraints in a single optimisation framework which, starting from an initial segmentation, iteratively enforces them in three stages. First, metric constraints are used to estimate the 3D metric shape and to fill the missing entries according to an orthographic camera model. Then, wrongly segmented trajectories are detected using sparse optimisation of the shape matrix. A final reclassification strategy assigns the detected points to the right motion or discards them as outliers. We provide experiments that show consistent improvements over previous approaches on both synthetic and real data.

19.
Objective: Visual localization aims to use easily acquired RGB images to localize a moving target and estimate its pose. Object occlusion and weakly textured regions, which are common in indoor scenes, easily cause erroneous estimation of the target's keypoints and severely degrade localization accuracy. To address this problem, this paper proposes an active-passive fusion indoor localization system that combines the advantages of fixed-viewpoint and moving-viewpoint schemes to achieve accurate localization of moving targets in indoor scenes. Method: A plane-prior-based object pose estimation method is proposed: on top of a keypoint-detection monocular localization framework, plane constraints are used for 3-degrees-of-freedom pose optimization, improving the localization stability of moving targets on indoor planes under a fixed viewpoint. A data-fusion localization system based on the unscented Kalman filter is designed, which fuses the passive localization results from the fixed viewpoint with the active localization results from the moving viewpoint, improving the reliability of the pose estimates for the moving target. Results: The proposed active-passive fusion indoor visual localization system achieves an average localization accuracy of 2-3 cm on the iGibson simulation dataset, with 99% of localization errors within 10 cm; in real scenes the average accuracy is 3-4 cm, with over 90% of errors within 10 cm, i.e. centimeter-level accuracy. Conclusion: The proposed indoor visual localization system combines the advantages of passive and active localization methods and achieves high-accuracy target localization in indoor scenes at low equipment cost, and under occlusion, target...
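The fusion step can be illustrated with the simplest possible case: inverse-variance fusion of two 1-D position estimates. This is a plain Kalman-style measurement update, far simpler than the paper's unscented Kalman filter, and all numbers are made up.

```python
# Hedged sketch: fuse a passive fixed-view estimate and an active moving-view
# estimate of the same target position with inverse-variance weights. The
# fused variance is always smaller than either input variance, which is why
# combining the two viewpoints improves reliability.

def fuse(x1, var1, x2, var2):
    w1 = var2 / (var1 + var2)          # weight grows as the other's variance grows
    w2 = var1 / (var1 + var2)
    x = w1 * x1 + w2 * x2
    var = var1 * var2 / (var1 + var2)  # harmonic combination of variances
    return x, var

x_passive, var_passive = 1.00, 0.03 ** 2   # fixed-view estimate, metres (made up)
x_active,  var_active  = 1.04, 0.04 ** 2   # moving-view estimate, metres (made up)
x_fused, var_fused = fuse(x_passive, var_passive, x_active, var_active)
```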

20.
Accurate visual hand pose estimation at the joint level has several applications in human-robot interaction, natural user interfaces, and virtual/augmented reality. However, it is still an open problem being addressed by the computer vision community. Recent deep learning techniques may help circumvent the limitations of standard approaches, but they require large amounts of accurately annotated data. The hand pose datasets released so far present issues such as a limited number of samples, inaccurate data, or only high-level annotations. Moreover, most of them are focused on depth-based approaches, providing only depth information (missing RGB data). In this work, we present a novel multiview hand pose dataset in which we provide hand color images and different kinds of annotations for each sample, i.e. the bounding box and the 2D and 3D locations of the joints in the hand. Furthermore, we introduce a simple yet accurate deep learning architecture for real-time, robust 2D hand pose estimation. We then conduct experiments that show how using the proposed dataset in the training stage produces accurate results for 2D hand pose estimation with a single color camera.

