Similar literature
 20 similar documents found (search time: 15 ms)
1.
Generating 3D models of objects from video sequences is an important problem in many multimedia applications, ranging from teleconferencing to virtual reality. In this paper, we present a method for estimating a 3D face model from a monocular image sequence, using a few standard results from the affine camera geometry literature in computer vision and spline fitting based on a modified nonparametric regression technique. We use bicubic spline functions to model the depth map, given a set of observed depth maps computed from frame pairs in a video sequence. The minimal number of splines is chosen on the basis of Schwarz's criterion. We extend the spline fitting algorithm to hierarchical splines. Note that neither the camera calibration parameters nor prior knowledge of the object shape is required by the algorithm. The system has been successfully demonstrated to extract the 3D face structure of humans, as well as the structure of other objects, from their image sequences.
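As a rough illustration of how Schwarz's criterion can pick a model size, the sketch below scores candidate least-squares fits with the Gaussian-error form of the criterion, n ln(RSS/n) + k ln(n), and keeps the candidate with the lowest score. The function names and the residual values are hypothetical, and the abstract does not state the exact formulation used for the splines, so this is an assumption-laden sketch and not the paper's procedure.

```python
import math

def schwarz_criterion(rss, n_obs, n_params):
    """Schwarz (Bayesian) criterion for a least-squares fit with Gaussian errors."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

def select_model(candidates, n_obs):
    """Return the parameter count with the lowest criterion value.

    candidates maps a parameter count (for example, the number of spline
    coefficients) to the residual sum of squares of that fit.
    """
    return min(candidates, key=lambda k: schwarz_criterion(candidates[k], n_obs, k))

# Hypothetical residuals: the fit improves sharply up to 16 coefficients,
# then barely improves at 64, so the complexity penalty dominates.
fits = {4: 50.0, 16: 10.0, 64: 9.5}
best = select_model(fits, n_obs=200)
```

With these made-up numbers the 16-coefficient fit wins, because the small drop in residual at 64 coefficients does not pay for the extra parameters.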

2.
This paper presents an object tracking framework based on the mean-shift algorithm, a nonparametric technique that uses the statistical color distribution of objects. Tracking objects against a background of very similar color is one of the problems that needs to be addressed. In cases where the object and background color distributions are very similar, the color distribution obtained from a single frame is not sufficient to track objects reliably. To deal with this problem, the proposed algorithm uses adaptive statistical background and foreground modeling to detect changes due to motion, applying kernel density estimation over multiple recent frames. Multiple frames supply more information than a single frame and thus give a more accurate model of both background and foreground. In addition to the color distribution, this statistical, multiple-frame motion representation is integrated into a modified mean-shift algorithm to create a more robust object tracking framework. The motion distribution gives the framework additional discriminative power. The superior performance of the framework has been validated quantitatively in experiments on synthetic and real image sequences.
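To make the mean-shift step concrete, here is a minimal sketch of the basic iteration with a flat kernel: a window center moves to the weighted mean of the samples inside it until it stops moving. The per-sample weights stand in for a pixel's likelihood of belonging to the object (for instance from color and motion models); that interpretation, the function name, and the sample data are assumptions for illustration, not the modified algorithm of the paper.

```python
import math

def mean_shift(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Shift a window center to the weighted mean of nearby points until it settles."""
    cx, cy = start
    for _ in range(max_iter):
        sum_w = sum_x = sum_y = 0.0
        for (px, py), w in zip(points, weights):
            # Flat kernel: only points inside the window contribute.
            if (px - cx) ** 2 + (py - cy) ** 2 <= bandwidth ** 2:
                sum_w += w
                sum_x += w * px
                sum_y += w * py
        if sum_w == 0.0:
            break  # empty window: nothing to move toward
        nx, ny = sum_x / sum_w, sum_y / sum_w
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < tol:
            break
    return cx, cy

# Hypothetical samples: four object pixels around (5, 5) and one distant outlier.
samples = [(4.0, 5.0), (6.0, 5.0), (5.0, 4.0), (5.0, 6.0), (0.0, 0.0)]
likelihood = [1.0, 1.0, 1.0, 1.0, 1.0]
center = mean_shift(samples, likelihood, start=(3.5, 4.5), bandwidth=2.0)
```

Starting off-center, the window first captures two of the object samples, moves toward them, and then settles on the cluster center while the outlier never enters the window.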

3.
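Scene depth estimation is one of the classic problems in computer vision and an important step in applications such as 3D reconstruction and image synthesis. Deep-learning-based monocular depth estimation has developed rapidly, and a variety of network architectures have been proposed. This paper surveys the latest progress in deep-learning-based monocular depth estimation and reviews the development of supervised and unsupervised methods. It focuses on the optimization ideas behind monocular depth estimation and how they are reflected in deep network architectures. Supervised methods are divided into five categories: multi-scale feature fusion methods, methods combined with conditional random fields (CRFs), methods based on ordinal relations, methods that incorporate multiple kinds of image information, and other methods. Unsupervised methods are likewise divided into five categories: methods based on stereo vision, methods based on structure from motion (SfM), methods combined with adversarial networks, methods based on ordinal relations, and methods that incorporate uncertainty. In addition, the paper introduces the datasets and evaluation metrics commonly used for monocular depth estimation, and discusses the current state and open challenges of deep-learning-based monocular depth estimation in terms of accuracy, generalization, application scenarios, and the study of uncertainty in unsupervised networks, providing a fairly comprehensive reference for researchers in related fields.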

4.
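Research progress in deep-learning-based monocular depth estimation   Total citations: 1, self-citations: 0, citations by others: 1
Monocular depth estimation, which recovers scene depth from a single image, is an important technique that is widely used in areas such as intelligent vehicles and robot localization, and it has significant research value. With the development of deep learning, many deep-learning-based studies of monocular depth estimation have appeared, and performance has improved considerably. This paper surveys recent deep-learning-based monocular depth estimation methods from three perspectives, according to the type of training data the model uses: models trained on single images, models trained on multiple images, and models whose training is optimized with auxiliary information. After reviewing the datasets and performance metrics commonly used in this research, the paper also compares and analyzes the performance of classic monocular depth estimation models. Models trained on single images have simple network structures but generalize poorly. Depth estimation networks trained on multiple images generalize better, but they have many parameters, converge slowly, and take a long time to train. Networks that introduce auxiliary information achieve further gains in depth estimation accuracy, but the auxiliary information leads to problems such as complex network structures and slow convergence. Many difficulties and challenges remain in monocular depth estimation research. Using the latent information contained in multi-image input, together with domain-specific constraints, to improve monocular depth estimation performance is gradually becoming the trend in this research.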

5.

This paper proposes real-time object depth estimation using only a monocular camera and an onboard computer with a low-cost GPU. Our algorithm estimates scene depth with a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks objects' bounding boxes using an existing object detection algorithm. The two algorithms share their results (features, motion, and bounding boxes) to handle static and dynamic objects in the scene. We quantitatively validate the scene depth accuracy of the sparse features on KITTI against its ground-truth depth maps built from LiDAR observations, and qualitatively validate the depth of detected objects using the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of supervised and unsupervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms that learn depth indirectly from stereo image pairs or directly from depth images, and better than that of algorithms trained with monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while performance remains comparable.


6.
Estimation of object motion parameters from noisy images   Total citations: 2, self-citations: 0, citations by others: 2
An approach is presented for estimating object motion parameters from a sequence of noisy images. The problem considered is that of a rigid body undergoing unknown rotational and translational motion. The measurement data consist of a sequence of noisy image coordinates of two or more object correspondence points. By modeling the object dynamics as a function of time, estimates of the model parameters (including motion parameters) can be extracted from the data using recursive and/or batch techniques. This permits a desired degree of smoothing to be achieved through the use of an arbitrarily large number of images. Some assumptions regarding object structure are presently made. Results are presented for a recursive estimation procedure: the case considered here is that of a sequence of one-dimensional images of a two-dimensional object. Thus, the object moves in one transverse dimension and in depth, preserving the fundamental ambiguity of the central projection image model (loss of depth information). An iterated extended Kalman filter is used for the recursive solution. Noise levels of 5-10 percent of the object image size are used. Approximate Cramér-Rao lower bounds are derived for the model parameter estimates as a function of object trajectory and noise level. This approach may be of use in situations where it is difficult to resolve large numbers of object match points, but relatively long sequences of images (10 to 20 or more) are available.

7.
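3D human pose reconstruction from a single video image sequence   Total citations: 1, self-citations: 0, citations by others: 1
Under the constraint that the depth of at least one point is known, a method is proposed for reconstructing 3D human pose from a single video image sequence. The camera parameters are obtained by calibration with a planar dot array of known spacing; under the perspective projection model, the 3D information of the human joints is then reconstructed from their 2D data in the single video image sequence. The human motion sequence is divided into several subsequences at abrupt motion change points, which effectively eliminates the interference of ambiguity and yields a fairly accurate reconstruction of 3D human pose. The experimental procedure and computed results are given, verifying the feasibility and accuracy of the algorithm.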

8.
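Embedded real-time target tracking system based on DSP/FPGA   Total citations: 1, self-citations: 1, citations by others: 1
田茜, 何鑫. 《计算机工程》 (Computer Engineering), 2005, 31(15): 219-221
A coprocessor architecture based on a DSP and an FPGA is proposed to implement an embedded vision system for real-time target tracking. The DSP serves as the main processor and performs global control, while the FPGA, with its pipelined parallel processing structure, acts as a coprocessor and completes in real time the processing tasks assigned by the DSP. The FPGA quickly produces an initial motion estimate; the DSP then analyzes and corrects it further and feeds the correction back to the FPGA, achieving fast and accurate tracking.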

9.
The literature on recursive estimation of structure and motion from monocular image sequences comprises a large number of apparently unrelated models and estimation techniques. We propose a framework that allows us to derive and compare all models by following the idea of dynamical system reduction. The "natural" dynamic model, derived from the rigidity constraint and the projection model, is first reduced by explicitly decoupling structure (depth) from motion. Then, implicit decoupling techniques are explored, which consist of imposing that some function of the unknown parameters is held constant. By appropriately choosing such a function, not only can we account for models seen so far in the literature, but we can also derive novel ones.

10.
This paper introduces an interactive method for object segmentation in image sequences that combines classical morphological segmentation with motion estimation: the watershed from propagated markers. In this method, the objects are segmented interactively in the first frame, and the mask generated by that segmentation provides the markers used to track and segment the object in the next frame. Besides interactivity, the proposed method has the following important characteristics: generality, rapid response, and progressive manual editing. This paper also introduces a new benchmark for the quantitative evaluation of assisted object segmentation methods applied to image sequences. The evaluation is done according to several criteria, such as the robustness of the segmentation and the ease of segmenting the objects through the sequence.

11.
This paper is concerned with three-dimensional (3D) analysis, and analysis-guided synthesis, of images showing 3D motion of an observer relative to a scene. The paper has two objectives. First, it presents an approach to recovering 3D motion and structure parameters from multiple cues present in a monocular image sequence, such as point features, optical flow, regions, lines, texture gradient, and vanishing line. Second, it introduces the notion that the cues that contribute the most to 3D interpretation are also the ones that would yield the most realistic synthesis, thus suggesting an approach to analysis-guided 3D representation. For concreteness, the paper focuses on flight image sequences of a planar, textured surface. The information in these diverse cues is integrated using optimization. For reliable estimation, a sequential batch method is used to compute motion and structure. Synthesis is done using (i) image attributes extracted from the image sequence, and (ii) simple, artificial image attributes that are not present in the original images. For display, real and/or artificial attributes are shown as a monocular or a binocular sequence. Performance is evaluated through experiments with one synthetic sequence and two real image sequences digitized from a commercially available video tape and a laserdisc. The attribute-based representation of these sequences compressed their sizes by 502 and 367. The visualization sequence appears very similar to the original sequence in informal monocular and stereo viewing on a workstation monitor.

12.
A method for estimating mobile robot ego-motion is presented, which relies on tracking contours in real-time images acquired with a calibrated monocular video system. After an active contour is fitted to an object in the image, 3D motion is derived from the affine deformations that the contour undergoes over an image sequence. More than one object can be tracked at the same time, yielding several pose estimates. Pose determination is then improved by fusing all of these estimates. Inertial information is used to obtain better estimates, as it introduces a measure of the real velocity into the tracking algorithm. Inertial information is also used to eliminate some ambiguities arising from the use of a monocular image sequence. As the algorithms developed are intended for use in real-time control systems, computational cost is taken into account. © 2004 Wiley Periodicals, Inc.

13.
We propose a depth and image scene flow estimation method that takes binocular video as input. The key component is the preservation of motion-depth temporal consistency, which makes computation over long sequences reliable. We tackle a number of fundamental technical issues, including establishing the connection between motion and depth, preserving structure consistency across multiple frames, and employing long-range temporal constraints for error correction. We address all of them in a unified depth and scene flow estimation framework. Our main contributions include the development of motion trajectories, which robustly link frame correspondences in a voting manner; rejection of depth and motion outliers through temporal robust regression; novel edge occurrence map estimation; and the introduction of anisotropic smoothing priors for proper regularization.

14.
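When current self-supervised monocular depth estimation methods are applied to urban street scenes, occlusion between objects and object motion lead to blurry estimated depth maps and boundary artifacts. To address these problems, an occlusion-resistant monocular depth estimation method is proposed through the design of the loss function. The method uses a minimum photometric reprojection loss, which matches each pixel using the smaller of the errors computed from the frames before and after the target image and thereby ignores occluded pixels with high loss; it also uses an auto-masking loss to handle the boundary artifacts caused by object motion. Comparative experiments on the KITTI dataset show that the depth maps estimated by the proposed method are sharper and that boundary artifacts in the depth maps are effectively reduced.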
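The loss design described above can be sketched on flat lists of per-pixel errors. For each pixel, the smallest photometric error across the source frames is kept, so a pixel occluded in one neighboring frame is scored by the frame where it is visible. The auto-mask shown here is one common formulation, in which a pixel is dropped when the unwarped source frame already matches it at least as well as the warped one; the abstract gives no formula, so that rule, the function name, and the sample values are assumptions.

```python
def min_reprojection_loss(warped_errors, identity_errors):
    """Average the per-pixel minimum photometric error over unmasked pixels.

    warped_errors:   one list of per-pixel errors per source frame, measured
                     after warping that frame into the target view.
    identity_errors: same layout, measured against the unwarped source frames.
    """
    total, kept = 0.0, 0
    for px in range(len(warped_errors[0])):
        # Minimum over source frames: occluded views are ignored per pixel.
        warped = min(frame[px] for frame in warped_errors)
        static = min(frame[px] for frame in identity_errors)
        # Auto-mask (assumed rule): skip pixels that match without warping.
        if warped < static:
            total += warped
            kept += 1
    return total / kept if kept else 0.0

# Hypothetical errors for three pixels and two source frames (previous, next).
prev_err = [0.10, 0.90, 0.30]
next_err = [0.80, 0.20, 0.40]
ident_prev = [0.50, 0.60, 0.05]
ident_next = [0.70, 0.50, 0.35]
loss = min_reprojection_loss([prev_err, next_err], [ident_prev, ident_next])
```

In this toy input the first two pixels are each scored by the frame in which they reproject well, and the third pixel is masked out because it barely changes between unwarped frames.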

15.
Moving vehicles are detected and tracked automatically in monocular image sequences of road traffic scenes recorded by a stationary camera. To exploit a priori knowledge about the shape and motion of vehicles in traffic scenes, a parameterized vehicle model is used for an intraframe matching process, and a recursive estimator based on a motion model is used for motion estimation. An interpretation cycle supports the intraframe matching process with a state MAP-update step. Initial model hypotheses are generated by an image segmentation component that clusters coherently moving image features into candidate representations of images of a moving vehicle. Including an illumination model allows shadow edges of the vehicle to be taken into account during the matching process. Only such an elaborate combination of techniques has enabled us to track vehicles under complex illumination conditions and over long monocular image sequences (more than 400 frames). Results on various real-world road traffic scenes are presented, and open problems and future work are outlined.

16.
We present an approach to jointly estimating camera motion and the dense structure of a static scene, in terms of depth maps, from monocular image sequences in driver-assistance scenarios. At each instant of time, only two consecutive frames are processed as input to a joint estimator that fully exploits second-order information of the corresponding optimization problem and effectively copes with the non-convexity due to both the imaging geometry and the manifold of motion parameters. Additionally, carefully designed Gaussian approximations enable probabilistic inference based on locally varying confidence and globally varying sensitivity due to the epipolar geometry, with respect to the high-dimensional depth map estimation. Embedding the resulting joint estimator in an online recursive framework achieves a pronounced spatio-temporal filtering effect and robustness. We evaluate hundreds of images from a publicly available benchmark data set, taken from a car moving at speeds of up to 100 km/h. The results compare favorably with two alternative settings: stereo-based scene reconstruction and camera motion estimation in batch mode using multiple frames. These alternatives, however, require a calibrated camera pair or storage for more than two frames, which is less attractive from a technical viewpoint than the proposed monocular and recursive approach. In addition to real data, a synthetic sequence that provides reliable ground truth is considered.

17.
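Human motion estimation based on a relative deformation model and regularization techniques   Total citations: 1, self-citations: 0, citations by others: 1
To make the estimation of the motion and structure parameters of human arms and legs from monocular dynamic image sequences of a walking person more reliable and more robust, a human motion estimation method based on a relative deformation model and regularization techniques is proposed. First, using an object-centered coordinate representation of motion, a nonrigid motion model based on the concept of relative deformation is obtained by adding deformation coefficients to the rigid-body motion model. Then, regularized estimation of the motion and structure parameters is carried out with this nonrigid motion model, and prior knowledge of human motion is incorporated in regularized form, making the motion estimates more robust. Experimental results show that the method effectively reflects the nonrigid motion patterns of the human body, and that the relative deformation coefficients added to the motion model also reflect, to some extent, the regularities of human motion.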

18.
In this paper, new techniques for deformed image motion estimation and compensation using variable-size block matching are proposed, which can be applied to an image sequence compression system or a moving object recognition system. Motion estimation and compensation techniques have been successfully applied in image sequence coding. Many papers on improving the performance of these techniques have been published, and the directions they propose can all lead to better performance than the conventional techniques. Among them, generalized block matching and variable-size block matching have been successfully applied to reducing the data rate of the compensation error and of the motion information, respectively. Both algorithms have merits, but both also have drawbacks. Moreover, reducing the data rate of the compensation error sometimes increases the data rate of the motion information, and vice versa. Based on these two algorithms, we propose and examine several algorithms that are effective in reducing the data rate. We then incorporate these algorithms into a system in which they work together to overcome their individual disadvantages while keeping their merits. The proposed system can optimally balance the data rate between the two components (compensation error and motion information). Experimental results show that the proposed system outperforms the conventional techniques. Since we propose a recovery operation that tries to recover incorrect motion vectors from the global motion, the proposed system can also be applied to moving object recognition in image sequences.
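For readers unfamiliar with the building block these techniques extend, the sketch below shows plain fixed-size block matching: for one block of the current frame, every displacement within a search radius is tried and the one with the lowest sum of absolute differences (SAD) against the previous frame is kept. This is the conventional baseline only, not the variable-size or generalized scheme of the paper; the function names and the toy frames are hypothetical.

```python
def sad(prev, cur, top, left, dy, dx, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    return sum(
        abs(cur[top + r][left + c] - prev[top + r + dy][left + c + dx])
        for r in range(size)
        for c in range(size)
    )

def best_motion_vector(prev, cur, top, left, size, radius):
    """Exhaustively search displacements within +/- radius for the lowest SAD."""
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= top + dy and top + dy + size <= rows):
                continue
            if not (0 <= left + dx and left + dx + size <= cols):
                continue
            cost = sad(prev, cur, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

# Toy frames: a 2x2 bright patch moves one pixel to the right between frames.
prev_frame = [[0] * 6 for _ in range(6)]
cur_frame = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (1, 2):
        prev_frame[r][c] = 9
    for c in (2, 3):
        cur_frame[r][c] = 9

cost, dy, dx = best_motion_vector(prev_frame, cur_frame, top=2, left=2, size=2, radius=1)
```

The returned vector points from the block in the current frame to its best match in the previous frame, so a patch that moved right yields a negative horizontal component.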

19.
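Driven by applications such as autonomous driving, robotics, digital cities, and virtual/mixed reality, 3D vision has attracted wide attention. Research in 3D vision mainly revolves around tasks such as depth image acquisition, visual localization and mapping, 3D modeling, and 3D understanding. This paper reviews and comparatively analyzes research progress in China and abroad on these tasks. First, for depth image acquisition, progress in stereo matching is reviewed from three perspectives (non-end-to-end stereo matching, end-to-end stereo matching, and unsupervised stereo matching), and progress in monocular depth estimation is reviewed from two perspectives (depth regression networks and depth completion networks). Second, for visual localization and mapping, progress in large-scale visual localization is reviewed in terms of end-to-end and non-end-to-end visual localization, and progress in simultaneous localization and mapping (SLAM) is reviewed in terms of visual SLAM and SLAM fused with other sensors. Third, for 3D modeling, progress in 3D geometric modeling is reviewed from four perspectives (deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep-learning-based 3D reconstruction), and progress in dynamic human modeling is reviewed from three perspectives (multi-view RGB reconstruction, single- and multi-depth-camera methods, and single-view RGB methods). Finally, for 3D understanding, progress in point cloud semantic understanding is reviewed in terms of point cloud semantic segmentation and point cloud instance segmentation. On this basis, future trends in 3D vision research are outlined, with the aim of providing a reference for researchers in the field.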

20.
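Visual localization of a mobile robot and motion estimation of a target are studied. Using a monocular vision system and artificial markers, a visual ranging model is derived from the pinhole imaging model and spatial geometric relations, and both self-localization of the mobile robot and localization of the target are achieved. From image sequences, a feature-based motion analysis method is used to estimate the motion parameters of a ball, and a model for tracking the moving target with the mobile robot is derived. Results of ball localization experiments show that the method achieves fairly high localization accuracy.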
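The simplest ranging relation that follows from the pinhole model can be sketched as below: a marker of known physical height H that appears h pixels tall under a focal length of f pixels lies at depth Z = f H / h, and a horizontal image offset u from the principal point corresponds to a lateral offset X = Z u / f. The abstract does not give its ranging model, so the fronto-parallel marker assumption, the function names, and the numbers are illustrative assumptions only.

```python
def distance_from_marker(focal_px, marker_height_m, image_height_px):
    """Range to a fronto-parallel marker under the pinhole model: Z = f * H / h."""
    if image_height_px <= 0:
        raise ValueError("marker must have a positive height in the image")
    return focal_px * marker_height_m / image_height_px

def lateral_offset(focal_px, pixel_offset_px, distance_m):
    """Sideways offset of the marker from the optical axis: X = Z * u / f."""
    return distance_m * pixel_offset_px / focal_px

# Hypothetical camera and marker: f = 800 px, marker 0.2 m tall, seen 40 px tall,
# centered 100 px to the right of the principal point.
z = distance_from_marker(focal_px=800.0, marker_height_m=0.2, image_height_px=40.0)
x = lateral_offset(focal_px=800.0, pixel_offset_px=100.0, distance_m=z)
```

The apparent size shrinks in proportion to distance, which is why a single camera can recover range only when the marker's true size is known in advance.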

