Similar Documents
20 similar documents found; search time: 31 ms
1.
This paper addresses the issue of optimal motion and structure estimation from monocular image sequences of a rigid scene. The new method has the following characteristics: (1) the dimension of the search space in the nonlinear optimization is drastically reduced by exploiting the relationship between structure and motion parameters; (2) the degree of reliability of the observations and estimates is effectively taken into account; (3) the proposed formulation allows arbitrary interframe motion; (4) the information about the structure of the scene, acquired from previous images, is systematically integrated into the new estimates; (5) the integration of multiple views using this method gives a large 2.5D visual map, much larger than that covered by any single view. It is also shown that the scale factor associated with any two consecutive images in a monocular sequence is determined by the scale factor of the first two images. Our simulation results and experiments with long image sequences of real-world scenes indicate that the optimization method developed in this paper not only greatly reduces the computational complexity but also substantially improves the motion and structure estimates over those produced by linear algorithms.

2.
This paper presents a 2D to 3D conversion scheme to generate a 3D human model using a single depth image with several color images. In building a complete 3D model, no prior knowledge such as a pre-computed scene structure or photometric and geometric calibrations is required, since the depth camera can directly acquire the calibrated geometric and color information in real time. The proposed method deals with the self-occlusion problem that often occurs in images captured by a monocular camera: when an image is obtained from a fixed view, data may be missing for part of an object due to occlusion. The proposed method consists of the following steps to resolve this problem. First, the noise in a depth image is reduced by a series of image processing techniques. Second, a 3D mesh surface is constructed using the proposed depth-image-based modeling method. Third, the occlusion problem is resolved by removing the unwanted triangles in the occlusion region and filling the corresponding hole. Finally, textures are extracted and mapped onto the 3D surface of the model to provide a photo-realistic appearance. Comparison with related work demonstrates the efficiency of our method in terms of visual quality and computation time. It can be utilized to create 3D human models in many 3D applications.

3.
We present an approach which exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve single-view 3D scene understanding approaches. The proposed method is validated on monocular time-lapse sequences from YouTube and still images of indoor scenes gathered from the Internet. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.

4.
To address the low efficiency, stability, and accuracy of large-scale 3D reconstruction, this paper proposes a large-scale hybrid multi-view 3D reconstruction method based on scene-graph partitioning. The method first partitions the scene graph with a multi-level weighted kernel k-means algorithm. Each sub-scene-graph is then reconstructed in a hybrid fashion to generate a corresponding sub-model; scene-graph partitioning, hybrid reconstruction, and local optimization improve reconstruction efficiency and reduce computational cost, while an enhanced best-image selection criterion, a robust triangulation method, and iterative optimization improve accuracy and robustness. Finally, all sub-models are merged to complete the large-scale reconstruction. The method is validated on both Internet-collected data and UAV aerial imagery, and compared with the 1DSFM and HSFM algorithms in terms of accuracy and efficiency. Experimental results show that the proposed algorithm greatly improves computational efficiency and accuracy, preserves the completeness of the reconstructed model, and enables large-scale scene reconstruction on a single machine.

5.
To build a fast and convenient 3D scanning system based on monocular vision, this paper proposes a high-precision scene-modeling method using monocular geometric projection and develops a low-cost, high-precision 3D scanning system. First, the image coordinates of planar calibration points are acquired and converted by projective transformation into 3D coordinates in the camera frame, from which the 3D plane equations of the translation stage and the base are established. Second, by moving the translation stage, the spatial coordinates of corresponding calibration points are obtained to solve for the stage's translation vector, and the laser plane is solved from the laser stripes falling on the stage and the base. Finally, the centers of the laser stripe in the image are extracted and transformed into 3D point-cloud data on the object surface. Experimental results show that the plane equations obtained by projective transformation have an error below 0.2%, and the scanning error is below 0.05 mm.
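The core geometric step of such a laser-plane scanner is intersecting the camera ray through each laser-stripe pixel with the calibrated laser plane. A minimal sketch follows; the intrinsics and the plane equation are made-up illustration values, not the paper's calibration results.

```python
# Sketch: recovering a 3D surface point by intersecting a camera ray with
# a calibrated laser plane. Intrinsics (fx, fy, cx, cy) and the plane
# coefficients below are hypothetical illustration values.

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a viewing-ray direction in the camera frame."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def intersect_ray_plane(direction, plane):
    """Intersect the ray t * direction (from the camera center) with the
    plane a*x + b*y + c*z + d = 0; returns the 3D intersection point."""
    a, b, c, d = plane
    denom = a * direction[0] + b * direction[1] + c * direction[2]
    t = -d / denom
    return tuple(t * comp for comp in direction)

# Hypothetical laser plane z = 0.5*x + 1, written as 0.5x + 0y - z + 1 = 0.
plane = (0.5, 0.0, -1.0, 1.0)
ray = pixel_to_ray(320.0, 240.0, 500.0, 500.0, 320.0, 240.0)  # principal point -> ray (0, 0, 1)
point = intersect_ray_plane(ray, plane)
```

Sweeping this intersection over every detected stripe center, as stage or laser moves, accumulates the surface point cloud.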

6.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks objects' bounding boxes using an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We validate the scene depth accuracy of sparse features quantitatively on KITTI against its ground-truth depth map built from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms that learn depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained with monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
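Monocular visual odometry recovers depth only up to an unknown global scale, so comparing it against metric references (LiDAR in KITTI) typically involves aligning the two. A common practice for that alignment, shown here as a sketch rather than the paper's exact procedure, is scaling by the median ratio of reference to predicted depths:

```python
import statistics

def median_scale(pred_depths, ref_depths):
    """Scale factor aligning up-to-scale depth predictions to metric
    reference depths, using the median ratio (robust to outliers)."""
    ratios = [r / p for p, r in zip(pred_depths, ref_depths)]
    return statistics.median(ratios)

# Sparse feature depths (arbitrary scale) vs. roughly 2x metric references.
pred = [1.0, 2.0, 4.0, 10.0]
ref = [2.0, 4.1, 7.9, 20.0]
s = median_scale(pred, ref)
aligned = [s * p for p in pred]
```

The median, unlike the mean, is insensitive to a few grossly wrong depths on dynamic objects or occlusion boundaries.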


7.
The StOMP algorithm is well suited to large-scale underdetermined sparse vector estimation problems. It reduces computational complexity and has attractive asymptotic statistical properties. However, its estimation speed comes at the cost of accuracy. This paper suggests an improvement to the StOMP algorithm that is more efficient at finding sparse solutions to large-scale underdetermined problems. Compared with StOMP, the modified algorithm not only estimates the parameters of the distribution of matched-filter coefficients more accurately, but also improves estimation accuracy for the sparse vector itself. A theoretical success boundary is provided, based on a large-system limit for approximate recovery of the sparse vector by the modified algorithm, which validates that it is more efficient than StOMP. Computations with simulated data show that the proposed algorithm greatly improves estimation accuracy without a significant increase in computation time.
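For orientation, the baseline StOMP idea that the paper modifies can be sketched as follows: at each stage, every column whose residual correlation exceeds a threshold joins the support, and the coefficients are re-fit by least squares. This is a minimal illustration (thresholding by a fixed constant rather than the noise-adaptive rule), not the paper's improved variant.

```python
import numpy as np

def stomp(A, y, threshold=0.5, max_stages=10):
    """Stagewise OMP sketch: grow the support by all columns whose
    residual correlation exceeds the threshold, then least-squares re-fit."""
    n = A.shape[1]
    support = set()
    x = np.zeros(n)
    residual = y.copy()
    for _ in range(max_stages):
        corr = A.T @ residual                       # matched-filter coefficients
        picked = {j for j in range(n) if abs(corr[j]) > threshold}
        if not picked - support:                    # no new columns: stop
            break
        support |= picked
        idx = sorted(support)
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        x = np.zeros(n)
        x[idx] = coef
        residual = y - A @ x
    return x

# With an orthonormal dictionary the correlations equal the true
# coefficients, so a single stage recovers the support exactly.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x_true = np.zeros(8)
x_true[1], x_true[5] = 1.5, -2.0
x_hat = stomp(Q, Q @ x_true, threshold=0.5)
```

In the genuinely underdetermined setting the dictionary is not orthonormal and recovery is only approximate, which is where the threshold choice and the paper's refinements matter.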

8.
Achieving convincing visual consistency between virtual objects and a real scene mainly relies on the lighting effects of virtual-real composite scenes. The problem becomes more challenging when lighting virtual objects in a single real image. Recently, scene understanding from a single image has made great progress. The estimated geometry, semantic labels, and intrinsic components provide mostly coarse information, and are not accurate enough to re-render the whole scene. However, carefully integrating the estimated coarse information can lead to an estimate of the illumination parameters of the real scene. We present a novel method that uses the coarse information estimated by current scene understanding technology to estimate the parameters of a ray-based illumination model in order to light virtual objects in a real scene. Our key idea is to estimate the illumination via a sparse set of small 3D surfaces using normal and semantic constraints. The coarse shading image obtained by intrinsic image decomposition is treated as the irradiance of the selected small surfaces. The virtual objects are then illuminated with the estimated illumination parameters. Experimental results show that our method can convincingly light virtual objects in a single real image, without any pre-recorded 3D geometry, reflectance, illumination acquisition equipment, or imaging information about the image.
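The principle of estimating illumination from a sparse set of shaded surfaces can be illustrated with a deliberately simplified model: under a Lambertian surface lit by a single distant light, each patch's shading is the dot product of its normal with the light vector, so the light is the least-squares solution of a small linear system. The paper's ray-based model is richer; the normals and shading values here are synthetic.

```python
import numpy as np

# Simplified illustration: shading s_i = n_i . l for a Lambertian patch
# under one distant light, so l solves N l = s in the least-squares sense.
normals = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.577, 0.577, 0.577],
])
l_true = np.array([0.2, 0.3, 0.9])      # synthetic "ground-truth" light
shading = normals @ l_true              # irradiance of each selected patch
l_est, *_ = np.linalg.lstsq(normals, shading, rcond=None)
```

With only coarse shading (from intrinsic decomposition) and coarse normals (from geometry estimation), using many small surfaces and solving in a least-squares sense averages out per-patch errors.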

9.
Image fusion is a process in which multiple images of a scene are combined to form a single image. The aim of image fusion is to preserve the full content and retain important features of each original image. In this paper, we propose a novel wavelet-transform-based approach to capturing and fusing real-world rough surface textures, which are commonly used in multimedia applications and referred to as 3D surface textures. These textures differ from 2D textures in that their appearance can vary dramatically under different illumination conditions, due to complex surface geometry and reflectance properties. In our approach, we first extract gradient/height and albedo maps from sample 3D surface texture images as their representation. Then we measure the saliency of the wavelet coefficients of these representations. The saliency values reflect the meaningful content of the wavelet coefficients and are consistent with human visual perception. Finally, we fuse the gradient/height and albedo maps based on the measured saliency values. This scheme aims to preserve the original texture patterns together with the geometry and reflectance characteristics of the input images. Experimental results show that the proposed approach can not only capture and fuse 3D surface textures under arbitrary illumination directions, but also retains the surface geometry properties and preserves perceptual features of the original images.
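The flavor of wavelet-domain fusion can be shown with a one-level Haar transform and the crudest possible saliency, coefficient magnitude: detail coefficients are taken from whichever input has the larger response, and the approximations are averaged. This is a toy sketch, not the paper's saliency measure.

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar transform of an even-sized image:
    returns (approximation, three detail subbands)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row-pair average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row-pair difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    out = np.empty((2 * h, 2 * w))
    out[0::2, :], out[1::2, :] = a + d, a - d
    return out

def fuse(img1, img2):
    """Max-magnitude selection of detail coefficients (a crude saliency);
    approximation coefficients are averaged."""
    c1, c2 = haar2d(img1), haar2d(img2)
    fused = [(c1[0] + c2[0]) / 2.0]
    for b1, b2 in zip(c1[1:], c2[1:]):
        fused.append(np.where(np.abs(b1) >= np.abs(b2), b1, b2))
    return ihaar2d(*fused)
```

The paper replaces the magnitude rule with a perceptually motivated saliency measure and applies it to gradient/height and albedo maps rather than raw intensity images.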

10.
王伟, 任国恒, 陈立勇, 张效尉. 《自动化学报》 (Acta Automatica Sinica), 2019, 45(11): 2187-2198
In image-based 3D reconstruction of urban scenes, piecewise-planar reconstruction algorithms can overcome weak texture, illumination changes, and other confounding factors to quickly recover a complete approximate structure of the scene. However, their reliability is often low when the initial spatial points are sparse, the candidate plane set is incomplete, or the image over-segmentation quality is poor. To address this, based on the structural characteristics of urban scenes, this paper constructs a novel plane-reliability measure that fuses scene-structure priors, spatial-point visibility, and color similarity, and then infers the scene structure by jointly optimizing image regions and their corresponding planes. Experimental results show that the algorithm can effectively reconstruct the complete scene structure from sparse spatial points, with high overall accuracy and efficiency.

11.
Generating 3D models of objects from video sequences is an important problem in many multimedia applications, ranging from teleconferencing to virtual reality. In this paper, we present a method for estimating a 3D face model from a monocular image sequence, using a few standard results from the affine camera geometry literature in computer vision and spline-fitting techniques based on a modified nonparametric regression technique. We use bicubic spline functions to model the depth map, given a set of observed depth maps computed from frame pairs in a video sequence. The minimal number of splines is chosen on the basis of Schwarz's criterion. We extend the spline-fitting algorithm to hierarchical splines. Note that neither the camera calibration parameters nor prior knowledge of the object shape is required by the algorithm. The system has been successfully demonstrated to extract the 3D structure of human faces as well as other objects, starting from their image sequences.

12.
To address the weak training constraints and low prediction accuracy of monocular 3D object detection networks, this paper proposes a monocular 3D object detection network based on perspective projection, developed through network-structure improvements, the establishment of perspective-projection constraints, and loss-function optimization. First, based on the mechanism of perspective projection and the transformations among the world, the camera, and the object, a model is established that solves for an object's 3D bounding box from vanishing points (VP). Second, using spatial geometric relations and prior size information, this model is simplified into constraints among the orientation angle, the object dimensions, and the 3D bounding box. Finally, exploiting the unimodal, easily regressed nature of the size constraint, a learned orientation-size loss function is proposed, improving the network's learning efficiency and prediction accuracy. During training, to remedy the lack of constraints on the 3D center in monocular 3D detection networks, a strategy that jointly constrains orientation, size, and the 3D center is proposed, based on the spatial geometry between the 3D bounding box and the 2D box. Experiments on the KITTI and SUN-RGBD datasets show that the algorithm obtains more accurate detection results, demonstrating that the method is more effective than other 3D object detection algorithms.
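The geometric constraint underlying such methods is that the eight corners of a hypothesized 3D box, given its orientation, dimensions, and center, must project inside the detected 2D box. A minimal sketch of that projection follows; the intrinsics, dimensions, and pose are made-up illustration values.

```python
import numpy as np

def box_corners(center, dims, yaw):
    """Eight corners of a 3D box in the camera frame; dims = (l, w, h),
    yaw is rotation about the vertical (y) axis."""
    l, w, h = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2.0
    y = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * h / 2.0
    z = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * w / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center)

def project(points, f, cx, cy):
    """Pinhole projection of Nx3 camera-frame points to pixel coordinates."""
    u = f * points[:, 0] / points[:, 2] + cx
    v = f * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=1)

# Hypothetical car-sized box 10 m in front of the camera, yaw = 0.
corners = box_corners(center=(0.0, 0.0, 10.0), dims=(4.0, 2.0, 1.5), yaw=0.0)
pix = project(corners, f=500.0, cx=320.0, cy=240.0)
# Tightest 2D box consistent with this (orientation, size, center) hypothesis:
u_min, v_min = pix.min(axis=0)
u_max, v_max = pix.max(axis=0)
```

Requiring this projected extent to match the detected 2D box is what ties the regressed orientation and size to the otherwise unconstrained 3D center.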

13.
Putting Objects in Perspective
Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach.

14.
The wide availability of affordable RGB-D sensors is changing the landscape of indoor scene analysis. Years of research on simultaneous localization and mapping (SLAM) have made it possible to merge multiple RGB-D images into a single point cloud and provide a 3D model of a complete indoor scene. However, these reconstructed models contain only geometric information, not semantic knowledge. Advances in robot autonomy and the ability to carry out complex tasks in unstructured environments can be greatly enhanced by endowing environment models with semantic knowledge. Towards this goal, we propose a novel approach to generating 3D semantic maps of an indoor scene. Our approach first creates a 3D reconstructed map from an RGB-D image sequence, then jointly infers the semantic object category and structural class for each point of the global map. Twelve object categories (e.g. walls, tables, chairs) and four structural classes (ground, structure, furniture, and props) are labeled in the global map, capturing both object and structure information. To obtain semantic information, we compute a semantic segmentation for each RGB-D image and merge the labeling results with a dense conditional random field. Unlike previous techniques, we use temporal information and higher-order cliques to enforce label consistency across the per-image labeling results. Our experiments demonstrate that temporal information and higher-order cliques are significant for the semantic mapping procedure and improve the precision of the semantic mapping results.
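Why merging per-frame labelings helps can be seen from the simplest possible temporal fusion, a per-point majority vote across frames; the dense CRF with temporal and higher-order terms used in the paper generalizes this. The labels below are illustrative.

```python
from collections import Counter

def fuse_labels(per_frame_labels):
    """Majority vote across frames for each map point. A dense CRF with
    temporal and higher-order cliques (as in the paper) refines this
    simple scheme with pairwise and clique consistency terms."""
    fused = []
    for votes in zip(*per_frame_labels):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Three frames label the same four map points; frame 2 misclassifies point 1.
frames = [
    ["wall", "chair", "table", "ground"],
    ["wall", "props", "table", "ground"],
    ["wall", "chair", "table", "ground"],
]
fused = fuse_labels(frames)
```

A single frame's misclassification is outvoted as long as most observations of a point agree, which is exactly the redundancy an RGB-D sequence provides.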

15.
Tracking both structure and motion of nonrigid objects from monocular images is an important problem in vision. In this paper, a hierarchical method which integrates local analysis (recovering small details) and global analysis (appropriately limiting possible nonrigid behaviors) is developed to recover dense depth values and nonrigid motion from a sequence of 2D satellite cloud images without any prior knowledge of point correspondences. This problem is challenging not only due to the absence of correspondence information but also due to the lack of depth cues in the 2D cloud images (scaled orthographic projection). In our method, the cloud images are segmented into several small regions and local analysis is performed for each region. A recursive algorithm is proposed to integrate local analysis with appropriate global fluid-model constraints, based on which a structure and motion analysis system, SMAS, is developed. We believe that this is the first reported system for estimating dense structure and nonrigid motion under scaled orthographic views using fluid-model constraints. Experiments on cloud image sequences captured by meteorological satellites (GOES-8 and GOES-9) have been performed using our system, along with their validation and analysis. Both structure and 3D motion correspondences are estimated to subpixel accuracy. Our results are very encouraging and have many potential applications in earth and space sciences, especially in cloud models for weather prediction.

16.
In indoor monocular visual navigation, scene depth information is essential, but monocular depth estimation is an ill-posed problem with low accuracy. Meanwhile, 2D LiDAR is widely used in indoor navigation and is inexpensive. This paper therefore proposes an indoor monocular depth estimation algorithm that fuses 2D LiDAR to improve depth estimation accuracy. We add 2D LiDAR feature extraction to an encoder-decoder architecture, use skip connections to add detail to the monocular depth estimates, and propose a channel-attention mechanism to fuse 2D LiDAR features with RGB image features. The algorithm is validated on the public NYUDv2 dataset, and a depth dataset with 2D LiDAR data was built for the algorithm's target scenarios. Experiments show that the proposed algorithm outperforms existing monocular depth estimation methods on both the public and the self-built datasets.
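A channel-attention fusion of two feature streams is usually a squeeze-excite-style gate: pool each channel to a scalar, pass through a small bottleneck, and rescale channels by the resulting sigmoid weights. The following numpy sketch shows the mechanism only; the shapes and the tiny two-layer gate are illustrative, not the paper's architecture.

```python
import numpy as np

def channel_attention_fuse(rgb_feat, lidar_feat, w1, w2):
    """Toy channel-attention fusion: concatenate channels, squeeze by
    global average pooling, pass through a two-layer gate, and rescale
    each channel by its sigmoid weight."""
    feat = np.concatenate([rgb_feat, lidar_feat], axis=0)   # (C, H, W)
    squeeze = feat.mean(axis=(1, 2))                        # (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)                  # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))             # sigmoid, (C,)
    return feat * gate[:, None, None]                       # per-channel rescale

rng = np.random.default_rng(1)
rgb = rng.standard_normal((4, 8, 8))     # hypothetical RGB feature maps
lidar = rng.standard_normal((2, 8, 8))   # hypothetical 2D-LiDAR feature maps
w1 = rng.standard_normal((3, 6))         # squeeze 6 channels -> 3
w2 = rng.standard_normal((6, 3))         # excite 3 -> 6
fused = channel_attention_fuse(rgb, lidar, w1, w2)
```

The gate lets the network learn, per channel, when to trust the sparse but metric LiDAR features over the dense but ambiguous image features.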

17.
Objective: Depth acquisition is a key technology for 3D reconstruction, virtual reality, and other applications; among non-contact 3D measurement techniques, monocular depth acquisition is the cheapest but also the most technically difficult. Traditional monocular methods rely on depth cues such as linear perspective, texture gradient, motion parallax, and focus/defocus; they are computationally expensive, demand high camera precision, and are limited in their applicable scenes. Based on the change in object surface brightness caused by moving a point light source of fixed intensity through the scene, this paper proposes a simple and fast monocular depth extraction method. Method: First, from a surface reflection model, the radiance of the object surface under the light source is obtained; then, combining photometric stereo, the relation between surface radiance and camera image brightness is derived. With this relation, experiments are designed to solve for depth from the image-brightness change caused by moving the point light source. Results: The algorithm achieves good recovery in simple scenes and in some everyday scenes, with an error between estimated and true depth below 10%. Conclusion: The method estimates depth from the image-brightness change caused by light-source movement, avoids a complicated camera calibration process, and has low computational complexity; it is a new way to acquire scene depth information.
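The core of such a method can be reduced, under strong simplifying assumptions (Lambertian patch, point source on the surface normal, no ambient light), to the inverse-square law: brightness scales as 1/r², so two measurements before and after moving the source by a known distance determine the depth. This is an illustrative reduction, not the paper's full radiance model.

```python
import math

def depth_from_two_intensities(i_near, i_far, delta):
    """For a fixed-power point source moved delta farther from the surface,
    i_near / i_far = ((r + delta) / r)^2 under the inverse-square law,
    which solves to r = delta / (sqrt(i_near / i_far) - 1).
    Simplified model: Lambertian patch, source along the surface normal."""
    return delta / (math.sqrt(i_near / i_far) - 1.0)

# Surface 2 m from the source: brightness falls off as 1/4 and then 1/9
# when the source moves from 2 m to 3 m away.
r = depth_from_two_intensities(1.0 / 4.0, 1.0 / 9.0, delta=1.0)
```

Because only the ratio of the two intensities enters, the source power and the camera's radiometric gain cancel, which is what lets the method skip complicated calibration.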

18.
A lattice-based MRF model for dynamic near-regular texture tracking
A near-regular texture (NRT) is a geometric and photometric deformation of its regular origin, a congruent wallpaper pattern formed by 2D translations of a single tile. A dynamic NRT is an NRT under motion. Although NRTs are pervasive in man-made and natural environments, effective computational algorithms for NRTs are few. This paper addresses specific computational challenges in modeling and tracking dynamic NRTs, including ambiguous correspondences, occlusions, and drastic illumination and appearance variations. We propose a lattice-based Markov random field (MRF) model for dynamic NRTs in a 3D spatiotemporal space. Our model consists of a global lattice structure that characterizes the topological constraints among multiple textons and an image observation model that handles local geometry and appearance variations. Based on the proposed MRF model, we develop a tracking algorithm that uses belief propagation and particle filtering to effectively handle the special challenges of dynamic NRT tracking without any assumptions on the motion types or lighting conditions. We provide quantitative evaluations of the proposed method against existing tracking algorithms and demonstrate its applications in video editing.

19.
Generating large-scale, high-quality 3D scene reconstructions from monocular images is an essential technical foundation of augmented reality and robotics. However, inherent shortcomings (e.g., scale ambiguity and dense depth estimation in texture-less areas) make applying monocular 3D reconstruction in real-world practice challenging. In this work, we combine the advantages of deep learning and multi-view geometry to propose RGB-Fusion, which effectively addresses the inherent limitations of traditional monocular reconstruction. To overcome the limits on tracking accuracy imposed by the prediction deficiencies of neural networks, we integrate the PnP (Perspective-n-Point) algorithm into the tracking module. We employ 3D ICP (Iterative Closest Point) matching and 2D feature matching to construct separate error terms and jointly optimize them, reducing the dependence on the accuracy of depth prediction and improving pose estimation accuracy. The approximate pose predicted by the neural network is used as the initial optimization value to avoid becoming trapped in local minima. We formulate a depth-map refinement strategy based on the uncertainty of the depth values, which naturally leads to a refined depth map: low-uncertainty elements can significantly update the current depth value, while high-uncertainty elements are prevented from adversely affecting depth estimation accuracy. Qualitative and quantitative evaluations of tracking, depth prediction, and 3D reconstruction show that RGB-Fusion exceeds most monocular 3D reconstruction systems.
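The principle behind uncertainty-driven depth refinement is captured by standard inverse-variance fusion: the lower-variance (more certain) of the stored depth and the new measurement dominates the update, and the fused variance shrinks. A minimal sketch follows; the exact update rule and uncertainty model in RGB-Fusion may differ.

```python
def fuse_depth(d_old, var_old, d_new, var_new):
    """Inverse-variance fusion of a stored depth with a new measurement:
    each value is weighted by the reciprocal of its variance, so the
    more certain value dominates, and the fused variance decreases."""
    w_old = 1.0 / var_old
    w_new = 1.0 / var_new
    d = (w_old * d_old + w_new * d_new) / (w_old + w_new)
    var = 1.0 / (w_old + w_new)
    return d, var

# A confident new measurement (variance 0.25) pulls the estimate strongly
# toward itself, away from an uncertain stored depth (variance 1.0).
d, var = fuse_depth(d_old=2.0, var_old=1.0, d_new=4.0, var_new=0.25)
```

Applied per pixel over a sequence, this kind of update lets reliable observations sharpen the depth map while noisy predictions are effectively ignored.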

20.
A geometric approach to shape from defocus
We introduce a novel approach to shape from defocus, i.e., the problem of inferring the three-dimensional (3D) geometry of a scene from a collection of defocused images. Typically, in shape from defocus, the task of extracting geometry also requires deblurring the given images. A common approach to bypassing this task relies on approximating the scene locally by a plane parallel to the image (the so-called equifocal assumption). We show that this approximation is not necessary, as one can estimate 3D geometry while avoiding deblurring without strong assumptions on the scene. Solving the shape-from-defocus problem requires modeling how light interacts with the optics before reaching the imaging surface. This interaction is described by the so-called point spread function (PSF). When the form of the PSF is known, we propose an optimal method to infer 3D geometry from defocused images that involves computing orthogonal operators, regularized via functional singular value decomposition. When the form of the PSF is unknown, we propose a simple and efficient method that first learns a set of projection operators from blurred images and then uses these operators to estimate the 3D geometry of the scene from novel blurred images. Our experiments on both real and synthetic images show that the performance of the algorithm is relatively insensitive to the form of the PSF. Our general approach is to minimize the Euclidean norm of the difference between the estimated and observed images. The method is geometric in that we reduce the minimization to performing projections onto linear subspaces, using inner-product structures on both infinite- and finite-dimensional Hilbert spaces. Both proposed algorithms involve only simple matrix-vector multiplications, which can be implemented in real time.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号