Similar literature
1.
This paper is concerned with three-dimensional (3D) analysis, and analysis-guided synthesis, of images showing 3D motion of an observer relative to a scene. The paper has two objectives. First, it presents an approach to recovering 3D motion and structure parameters from multiple cues present in a monocular image sequence, such as point features, optical flow, regions, lines, texture gradient, and vanishing line. Second, it introduces the notion that the cues that contribute the most to 3D interpretation are also the ones that would yield the most realistic synthesis, thus suggesting an approach to analysis-guided 3D representation. For concreteness, the paper focuses on flight image sequences of a planar, textured surface. The information in these diverse cues is integrated using optimization. For reliable estimation, a sequential batch method is used to compute motion and structure. Synthesis uses (i) image attributes extracted from the image sequence, and (ii) simple, artificial image attributes which are not present in the original images. For display, real and/or artificial attributes are shown as a monocular or a binocular sequence. Performance is evaluated through experiments with one synthetic sequence and two real image sequences digitized from a commercially available video tape and a laserdisc. The attribute-based representation of these sequences compressed their sizes by 502 and 367. The visualization sequence appears very similar to the original sequence in informal monocular and stereo viewing on a workstation monitor.

2.
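To analyze and estimate the 3D motion of a human arm from monocular image sequences without any special markers, a multi-constraint fusion method is proposed. The method uses a stick model to represent the human arm. First, the monocular image sequence is processed to obtain the correspondences of the arm joint points across the sequence automatically. Then, using multi-constraint fusion and these joint correspondences, the 3D relative motion trajectories of the joints are estimated up to scale. Finally, real images are used to obtain the 3D motion trajectory of the corresponding human arm, which is compared experimentally with the true 3D trajectory captured by a motion capture system. The experimental results show that the method is very effective for human arm motion analysis.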

3.
In this paper we address the problem of recovering 3D non-rigid structure from a sequence of images taken with a stereo pair. We have extended existing non-rigid factorization algorithms to the stereo camera case and presented an algorithm to decompose the measurement matrix into the motion of the left and right cameras and the 3D shape, represented as a linear combination of basis shapes. The added constraints in the stereo camera case are that both cameras are viewing the same structure and that the relative orientation between the two cameras is fixed. Our focus in this paper is on the recovery of flexible 3D shape rather than on the correspondence problem. We propose a method to compute reliable 3D models of deformable structure from stereo images. Our experiments with real data show that improved reconstructions can be achieved using this method. The algorithm includes a non-linear optimization step that minimizes image reprojection error and imposes the correct structure on the motion matrix by choosing an appropriate parameterization. We show that 3D shape and motion estimates can be successfully disambiguated after bundle adjustment and demonstrate this on synthetic and real image sequences. While this optimization step is proposed for the stereo camera case, it can be readily applied to the case of non-rigid structure recovery using a monocular video sequence.

4.
Image stitching method based on log-polar mapping   Total citations: 7, self-citations: 1, citations by others: 7
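Image stitching is widely used in image-based rendering, video retrieval, and scene matching. To obtain a large, wide-field-of-view representation of a scene from images that differ by rotation and scaling, an image stitching method based on log-polar mapping is proposed. The method first transforms the images from Cartesian coordinates to log-polar coordinates, so that rotation and scaling in Cartesian space become a two-dimensional translation in log-polar space. The rotation angle and scale factor between the images can then be estimated directly by phase correlation. Using these estimates as initial values, a nonlinear minimization algorithm further refines the motion parameters between the images to register them. Finally, the images are stitched by image fusion. Experimental results show that the method is effective.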
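The key property used in the abstract above, that rotation and scaling about the mapping centre turn into a pure shift in log-polar coordinates, can be checked on individual points. The following is a minimal sketch, not the paper's implementation: the function names are hypothetical, the mapping centre is assumed to be the origin, and the phase-correlation and refinement stages are not shown.

```python
import math


def to_log_polar(x, y):
    """Map a Cartesian point to (log radius, angle) about the origin."""
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("the origin has no log-polar image")
    return math.log(r), math.atan2(y, x)


def rotate_and_scale(x, y, angle, scale):
    """Rotate a point by `angle` radians about the origin, then scale it."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)


def log_polar_shift(p, q):
    """Shift from p to q in log-polar space, with the angle wrapped to [-pi, pi)."""
    rho_p, theta_p = to_log_polar(*p)
    rho_q, theta_q = to_log_polar(*q)
    d_theta = (theta_q - theta_p + math.pi) % (2.0 * math.pi) - math.pi
    return rho_q - rho_p, d_theta


# Every point moves by the same log-polar shift (log(scale), angle),
# regardless of where it lies in the image.
for point in [(3.0, 4.0), (-2.0, 5.0), (1.0, -7.0)]:
    moved = rotate_and_scale(*point, angle=0.3, scale=1.5)
    d_rho, d_theta = log_polar_shift(point, moved)
    assert abs(d_rho - math.log(1.5)) < 1e-9
    assert abs(d_theta - 0.3) < 1e-9
```

Because the shift is the same for every point, a translation estimator such as phase correlation applied to the two log-polar images can recover `log(scale)` and `angle` at once, which is the role the abstract assigns to it.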

5.
This paper deals with the estimation of motion and structure with an absolute scale factor from stereo image sequences without stereo correspondence. We show that the absolute motion and structure can be determined using only motion correspondences. This property is very useful in two respects: first, motion correspondence is easier to solve than stereo correspondence because sequences of images can be taken at short time intervals; second, it is not necessary that the rigid scene be included in the intersection of the fields of view of the two cameras. It is also shown that the degenerate cases reported in this paper constitute all of the degenerate cases for the scheme and can be easily avoided.

6.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth with a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks objects' bounding boxes with an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We quantitatively validate the scene depth accuracy of the sparse features on KITTI against its ground-truth depth maps built from LiDAR observations, and qualitatively validate the depth of detected objects on the Hyundai driving datasets against satellite maps. We compare the depth map of our algorithm with the results of supervised and unsupervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms trained on depth indirectly from stereo image pairs or directly from depth images, and better than that of algorithms trained with monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.


7.
Advanced Robotics, 2013, 27(6-7): 893-921
Visual odometry refers to the use of images to estimate the motion of a mobile robot. Real-time systems have already been demonstrated for terrestrial robotic vehicles, while a near real-time system has been successfully used on the Mars Exploration Rovers for planetary exploration. In this paper, we adapt this method to estimate the motion of a hopping rover on an asteroid surface. Due to the limited stereo depth resolution and the continuous rotational motion on a hopping rover, we propose to use a system of multiple monocular cameras. We describe how the scale of the scene observed by different cameras without overlapping views can be transferred between the cameras, allowing us to reconstruct a single continuous trajectory from multiple image sequences. We describe the implementation of our algorithm and its performance under simulation using rendered images.

8.
Structure from motion causally integrated over time   Total citations: 4, self-citations: 0, citations by others: 4
We describe an algorithm for reconstructing three-dimensional structure and motion causally, in real time from monocular sequences of images. We prove that the algorithm is minimal and stable, in the sense that the estimation error remains bounded with probability one throughout a sequence of arbitrary length. We discuss a scheme for handling occlusions (point features appearing and disappearing) and drift in the scale factor. These issues are crucial for the algorithm to operate in real time on real scenes. We describe in detail the implementation of the algorithm, which runs on a personal computer and has been made available to the community. We report the performance of our implementation on a few representative long sequences of real and synthetic images. The algorithm, which has been tested extensively over the course of the past few years, exhibits honest performance when the scene contains at least 20-40 points with high contrast, when the relative motion is "slow" compared to the sampling frequency of the frame grabber (30 Hz), and the lens aperture is "large enough" (typically more than 30° of visual field).

9.
We present an approach to jointly estimating camera motion and dense structure of a static scene in terms of depth maps from monocular image sequences in driver-assistance scenarios. At each instant of time, only two consecutive frames are processed as input data of a joint estimator that fully exploits second-order information of the corresponding optimization problem and effectively copes with the non-convexity due to both the imaging geometry and the manifold of motion parameters. Additionally, carefully designed Gaussian approximations enable probabilistic inference based on locally varying confidence and globally varying sensitivity due to the epipolar geometry, with respect to the high-dimensional depth map estimation. Embedding the resulting joint estimator in an online recursive framework achieves a pronounced spatio-temporal filtering effect and robustness. We evaluate hundreds of images that were taken from a car moving at speeds up to 100 km/h and that are part of a publicly available benchmark data set. The results compare favorably with two alternative settings: stereo-based scene reconstruction and camera motion estimation in batch mode using multiple frames. These alternatives, however, require a calibrated camera pair or storage for more than two frames, which is less attractive from a technical viewpoint than the proposed monocular and recursive approach. In addition to real data, a synthetic sequence is considered which provides reliable ground truth.

10.
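Rigid-body motion reconstruction from straight-line optical flow based on a genetic algorithm   Total citations: 1, self-citations: 0, citations by others: 1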
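A new model is established for recovering rigid-body motion and structure from monocular image sequences based on the straight-line optical flow field. The relationship between the straight-line optical flow field and the motion parameters of the rigid body is derived and expressed with two second-order linear differential equations, and a genetic algorithm is proposed for solving the rigid-body motion parameters. Only the optical flow of two straight lines in the image plane is needed to solve for the rotation parameters of the rigid body. The effectiveness of the algorithm is tested on synthetic images.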

11.
We present a new variational method for multi-view stereovision and non-rigid three-dimensional motion estimation from multiple video sequences. Our method minimizes the prediction error of the shape and motion estimates. Both problems then translate into a generic image registration task. The latter is entrusted to a global measure of image similarity, chosen depending on imaging conditions and scene properties. Rather than integrating a matching measure computed independently at each surface point, our approach computes a global image-based matching score between the input images and the predicted images. The matching process fully handles projective distortion and partial occlusions. Neighborhood as well as global intensity information can be exploited to improve the robustness to appearance changes due to non-Lambertian materials and illumination changes, without any approximation of shape, motion or visibility. Moreover, our approach results in a simpler, more flexible, and more efficient implementation than existing methods. The computation time on large datasets does not exceed thirty minutes on a standard workstation. Finally, our method is amenable to a hardware implementation on graphics processing units. Our stereovision algorithm yields very good results on a variety of datasets including specularities and translucency. We have successfully tested our motion estimation algorithm on a very challenging multi-view video sequence of a non-rigid scene.

12.
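Objective: Traditional monocular visual depth measurement methods offer simple equipment, low cost, and fast computation, but they require complex camera calibration and apply only under specific scene conditions. To address this, an object depth measurement method based on motion parallax cues is proposed, which extracts feature points from the images and obtains the measurement from the relationship between the feature points and image depth. Method: The two images are segmented to obtain the region containing the measured object. An improved scale-invariant feature transform (SIFT) algorithm proposed in the paper is then used to match the two images, and the matching and segmentation results are combined to obtain the matches on the measured object. The convex hull of the matched feature points is computed with the Graham scan, and the length of the longest line segment on the hull is obtained. Finally, the image depth is computed from the basic principles of camera imaging and triangle geometry. Results: Experiments show that the method improves both measurement accuracy and real-time performance. When the object in the image is not occluded, the error between the actual and measured distances is 2.60%, and the measurement takes 1.577 s. When the object is partially occluded, the method still achieves good results: the error between the actual and measured distances is 3.19%, and the measurement takes 1.689 s. Conclusion: Estimating image depth from feature points in two images is robust to partial occlusion of the object and avoids a complex camera calibration procedure, which gives the method practical value.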
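The geometric steps of the abstract above (convex hull of the matched feature points, longest segment on the hull, depth from similar triangles) might look roughly like the sketch below. Several things here are assumptions rather than the paper's method: the hull is built with Andrew's monotone chain, a sorting-based variant of the Graham scan; the "longest segment" is taken to be the largest distance between two hull vertices; and the depth relation assumes a pinhole camera that moves a known distance straight toward the object between the two shots. All function names are hypothetical.

```python
import math
from itertools import combinations


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def longest_hull_segment(points):
    """Largest distance between any two hull vertices (brute force)."""
    hull = convex_hull(points)
    if len(hull) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(hull, 2))


def depth_from_two_views(len_far, len_near, forward_shift):
    """Object distance at the farther view under the assumed pinhole model.

    Image length is f * L / Z, so len_far * Z_far == len_near * (Z_far - shift),
    which gives Z_far = shift * len_near / (len_near - len_far).
    """
    if len_near <= len_far:
        raise ValueError("the nearer view must show a longer segment")
    return forward_shift * len_near / (len_near - len_far)
```

With matched points from each image, `longest_hull_segment` would supply `len_far` and `len_near` in pixels, and `forward_shift` would be the known camera displacement. Only a ratio of pixel lengths enters the result, so this particular relation does not need the focal length.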

13.
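Scene depth estimation is a classic problem in computer vision and an important step in applications such as 3D reconstruction and image synthesis. Monocular depth estimation based on deep learning has developed rapidly, and a variety of network architectures have been proposed. This paper surveys the latest progress in deep-learning-based monocular depth estimation and reviews the development of supervised and unsupervised methods. It focuses on the optimization ideas behind monocular depth estimation and how they are realized in deep network architectures. Supervised methods are divided into five categories: multi-scale feature fusion methods, methods combined with conditional random fields (CRF), methods based on ordinal relations, methods that combine multiple kinds of image information, and other methods. Unsupervised methods are divided into five categories: methods based on stereo vision, methods based on structure from motion (SfM), methods combined with adversarial networks, methods based on ordinal relations, and methods that incorporate uncertainty. The paper also introduces the datasets and evaluation metrics commonly used for monocular depth estimation, and discusses the current state and challenges of deep-learning-based monocular depth estimation in terms of accuracy, generalization, application scenarios, and uncertainty in unsupervised networks, providing a fairly comprehensive reference for researchers in related fields.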

14.
In this paper we describe an algorithm to recover the scene structure, the trajectories of the moving objects and the camera motion simultaneously given a monocular image sequence. The number of moving objects is automatically detected without prior motion segmentation. Assuming that the objects are moving linearly with constant speeds, we propose a unified geometrical representation of the static scene and the moving objects. This representation enables the embedding of the motion constraints into the scene structure, which leads to a factorization-based algorithm. We also discuss solutions to the degenerate cases which can be automatically detected by the algorithm. Extension of the algorithm to weak perspective projections is presented as well. Experimental results on synthetic and real images show that the algorithm is reliable under noise.

15.
Tracking both structure and motion of nonrigid objects from monocular images is an important problem in vision. In this paper, a hierarchical method which integrates local analysis (that recovers small details) and global analysis (that appropriately limits possible nonrigid behaviors) is developed to recover dense depth values and nonrigid motion from a sequence of 2D satellite cloud images without any prior knowledge of point correspondences. This problem is challenging not only due to the absence of correspondence information but also due to the lack of depth cues in the 2D cloud images (scaled orthographic projection). In our method, the cloud images are segmented into several small regions and local analysis is performed for each region. A recursive algorithm is proposed to integrate local analysis with appropriate global fluid model constraints, based on which a structure and motion analysis system, SMAS, is developed. We believe that this is the first reported system for estimating dense structure and nonrigid motion under scaled orthographic views using fluid model constraints. Experiments on cloud image sequences captured by meteorological satellites (GOES-8 and GOES-9) have been performed using our system, along with their validation and analyses. Both structure and 3D motion correspondences are estimated to subpixel accuracy. Our results are very encouraging and have many potential applications in earth and space sciences, especially in cloud models for weather prediction.

16.
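Objective: Monocular camera trajectory recovery lacks scale information because its only input is a monocular video sequence, so the generated trajectories drift severely and cannot be used in high-precision applications. To exploit the wide availability and low cost of monocular cameras, a scene-geometry-based method is proposed for recovering true scale in autonomous driving. Method: First, a depth estimation network estimates relative depth for consecutive images, and the estimated depth values are used to project pixels from the 2D image plane into 3D space. Next, a forward-backward consistency check on the optical flow estimated by an optical flow network yields valid matching points, and the pose is solved with a traditional method so that the relative depth and the pose share the same scale. The relative depth values are then used to compute a surface normal map from which the ground points are found; the camera height at that same scale is computed from geometric relations, and a prior camera height is introduced to obtain the initial scale. Finally, to reduce the scale bias caused by image noise, a compensation scale computed by an additional vehicle detection module is weighted with the initial scale to obtain the final scale. Results: Experiments were carried out on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) autonomous driving dataset, and the accuracy of both the camera trajectory and the image depth improved. The absolute error of the relative depth after scale restoration with ground-truth depth is 0.114, and the absolute error of the absolute depth after scale recovery with the proposed method is 0.116. For the recovered camera motion trajectory...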
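The scale-recovery step of the abstract above can be illustrated with a small sketch, written under explicit assumptions. It assumes the ground points are already expressed in the camera frame at the relative (unscaled) depth, that the ground plane normal is known, and that the estimated camera height is the median point-to-plane distance. The median, the weight value, and all function names are hypothetical choices; the abstract does not specify them.

```python
from statistics import median


def estimated_camera_height(ground_points, ground_normal):
    """Median distance from the camera centre (origin) to the ground plane,
    measured along the given normal, in relative-depth units."""
    nx, ny, nz = ground_normal
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm == 0.0 or not ground_points:
        raise ValueError("need a non-zero normal and at least one ground point")
    return median(abs(nx * x + ny * y + nz * z) / norm for x, y, z in ground_points)


def initial_scale(prior_height, estimated_height):
    """Metric scale factor: known camera height over the height at relative scale."""
    if estimated_height <= 0.0:
        raise ValueError("estimated height must be positive")
    return prior_height / estimated_height


def fuse_scales(initial, compensation, weight=0.5):
    """Weighted blend of the initial scale and a compensation scale.

    `weight` is an assumed tuning parameter in [0, 1], not a value from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * initial + (1.0 - weight) * compensation
```

Multiplying the relative depth map and the translation part of the pose by the fused scale would then put both in metric units, given that the abstract states they already share one relative scale.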

17.
18.
Research progress in deep learning monocular depth estimation   Total citations: 1, self-citations: 0, citations by others: 1
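Monocular depth estimation is an important technique for obtaining scene depth from a single image. It is widely used in fields such as intelligent vehicles and robot localization and has significant research value. With the development of deep learning, many deep-learning-based studies of monocular depth estimation have emerged, and its performance has improved greatly. This paper reviews recent deep-learning-based monocular depth estimation methods in three groups, according to the type of training data the model uses: models trained on single images, models trained on multiple images, and models whose training is optimized with auxiliary information. After surveying the datasets and performance metrics commonly used in monocular depth estimation research, the paper also compares the performance of classic monocular depth estimation models. Models trained on single images have simple network structures but generalize poorly. Depth estimation networks trained on multiple images generalize better, but they have many parameters, converge slowly, and take a long time to train. Networks that introduce auxiliary information achieve further gains in depth estimation accuracy, but the auxiliary information makes the network structure complex and convergence slow. Monocular depth estimation research still faces many difficulties and challenges. Using the latent information contained in multi-image input, together with domain-specific constraints, to improve monocular depth estimation performance is gradually becoming the trend in this research area.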

19.
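Based on the principles of multi-view geometry, an unsupervised monocular visual odometry method is constructed that effectively combines convolutional neural networks for image depth estimation and match filtering. To address the tendency of mainstream depth estimation networks to lose shallow image features, a depth estimation network based on an improved dense module is built, which effectively aggregates shallow features and improves depth estimation accuracy. The odometry uses the depth estimation network to predict the depth of monocular images accurately, uses an optical flow network to obtain bidirectional optical flow, and selects high-quality matches by the forward-backward flow consistency principle. The initial pose and the computed depth are obtained by multi-view geometry and optimization, and a specific scale alignment principle yields 6-degree-of-freedom poses with globally consistent scale. In addition, to improve the network's ability to learn scene details and weakly textured regions, a feature-metric loss based on feature map synthesis is incorporated into the network loss function. In experiments on the KITTI Odometry dataset, depth estimation reaches accuracies of 85.9%, 95.8%, and 97.2% under different thresholds. In odometry evaluation on sequences 09 and 10, the absolute trajectory error is 0.007 m. The experimental results verify the effectiveness and accuracy of the proposed method and show that it outperforms existing methods on depth estimation and visual odometry.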
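The forward-backward flow consistency check mentioned in the abstract above is commonly implemented by following the forward flow to the second frame, reading the backward flow there, and keeping the pixel only if the two displacements nearly cancel. The sketch below assumes sparse flows stored as dictionaries keyed by integer pixel coordinates, a nearest-pixel lookup, and a hypothetical error threshold; none of these details come from the paper.

```python
import math


def consistent_matches(forward, backward, max_error=1.0):
    """Keep pixels whose forward and backward flows cancel out.

    forward:  {(x, y): (dx, dy)} flow from frame 1 to frame 2
    backward: {(x, y): (dx, dy)} flow from frame 2 to frame 1
    Returns a list of ((x1, y1), (x2, y2)) point matches.
    """
    matches = []
    for (x, y), (dx, dy) in forward.items():
        x2, y2 = x + dx, y + dy
        # Nearest-pixel lookup of the backward flow at the landing position.
        key = (round(x2), round(y2))
        if key not in backward:
            continue
        bx, by = backward[key]
        # A consistent pair returns (almost) to the starting pixel.
        if math.hypot(dx + bx, dy + by) <= max_error:
            matches.append(((x, y), (x2, y2)))
    return matches
```

Matches that survive the check can then be passed to a multi-view geometry solver for pose estimation, which is how the abstract describes using them. Dense flow fields from a network would more likely use bilinear sampling of the backward flow than the nearest-pixel lookup assumed here.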

20.