Similar Documents
20 similar documents found (search time: 46 ms).
1.

Local occlusion cues have been successfully exploited to infer depth ordering from a monocular image. However, owing to the uncertainty of occlusion relations, inconsistent results frequently arise, especially for images of complex scenes. We propose a depth propagation mechanism that incorporates local occlusion and global ground cues within a probabilistic-to-energetic Bayesian framework. By maximizing the posterior, i.e., minimizing the energy of latent relative-depth variables with a well-defined pairwise occlusion prior, we recover the correct depth ordering in a monocular setting. Our model guarantees consistent relative-depth labeling in an automatically constructed topological graph by transferring the more confident aligned multi-depth cues among different segments. Experiments demonstrate that our depth propagation mechanism achieves more reasonable and accurate results and outperforms commonly used occlusion-based approaches on complex natural images.
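A minimal sketch of the idea in this abstract: given pairwise occlusion cues between image segments, recover a consistent relative depth ordering by minimizing a pairwise energy. The iterated-conditional-modes optimizer, the energy form, and all names below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def order_segments(n_seg, occlusion, n_sweeps=10):
    """Assign integer depth ranks to segments from pairwise occlusion evidence.

    occlusion[(i, j)] = p means "segment i occludes (is in front of) segment j"
    with confidence p in [0, 1]; smaller rank means nearer to the camera.
    """
    rank = np.zeros(n_seg, dtype=int)

    def energy(r):
        e = 0.0
        for (i, j), p in occlusion.items():
            # a confident "i occludes j" cue is violated when i is not ranked nearer
            e += p if r[i] >= r[j] else (1.0 - p)
        return e

    for _ in range(n_sweeps):            # iterated conditional modes (ICM) sweeps
        changed = False
        for s in range(n_seg):
            trials = []
            for k in range(n_seg):
                r = rank.copy()
                r[s] = k
                trials.append((energy(r), k))
            _, best_k = min(trials)
            if best_k != rank[s]:
                rank[s], changed = best_k, True
        if not changed:
            break
    return rank

# the contradictory, low-confidence cue (2, 0) is outvoted by the stronger ones
print(order_segments(3, {(0, 1): 0.9, (1, 2): 0.8, (2, 0): 0.2}))  # e.g. [0 1 2]
```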


2.
2D-to-3D conversion of monocular images by fusing objectness and visual saliency
Inspired by objectness measures and visual saliency, we propose a center-surround depth distribution hypothesis for object windows suited to 2D-to-3D conversion of monocular images, and present a monocular depth estimation algorithm that fuses the objectness measure and visual saliency. First, the visual saliency of the image is computed and mapped to depth. Second, a number of windows are randomly sampled on the image and their objectness measures are computed. Third, an energy function is defined to measure how much depth and objectness influence each other, and the depth and objectness estimates are refined by iterative optimization. Finally, 3D video is synthesized from the depth information. Experimental results show that incorporating objectness information significantly improves the quality of saliency-based 2D-to-3D depth estimation, ensuring discontinuous depth transitions at object boundaries and smooth transitions elsewhere.
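A toy sketch of the fusion loop described above, under the stated center-surround hypothesis: saliency is mapped to depth, and the depth inside high-objectness windows is iteratively pulled toward a common foreground value. The update rule, the parameter `lam`, and the data layout are assumptions made for illustration only.

```python
import numpy as np

def fuse_saliency_and_objectness(saliency, windows, objectness, n_iter=5, lam=0.5):
    """saliency:   (H, W) array in [0, 1], higher = more salient (mapped to nearer depth)
    windows:    list of (x0, y0, x1, y1) boxes randomly sampled on the image
    objectness: per-window objectness scores in [0, 1]"""
    depth = 1.0 - saliency                 # crude saliency-to-depth mapping
    for _ in range(n_iter):
        for (x0, y0, x1, y1), score in zip(windows, objectness):
            patch = depth[y0:y1, x0:x1]    # view into the depth map (edited in place)
            # pull depths inside a confident object window toward the window mean,
            # smoothing the interior while leaving object boundaries sharp
            patch += lam * score * (patch.mean() - patch)
    return depth
```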

3.
Tracking both the structure and motion of nonrigid objects from monocular images is an important problem in vision. In this paper, a hierarchical method which integrates local analysis (that recovers small details) and global analysis (that appropriately limits possible nonrigid behaviors) is developed to recover dense depth values and nonrigid motion from a sequence of 2D satellite cloud images without any prior knowledge of point correspondences. This problem is challenging not only due to the absence of correspondence information but also due to the lack of depth cues in the 2D cloud images (scaled orthographic projection). In our method, the cloud images are segmented into several small regions and local analysis is performed for each region. A recursive algorithm is proposed to integrate local analysis with appropriate global fluid model constraints, based on which a structure and motion analysis system, SMAS, is developed. We believe that this is the first reported system for estimating dense structure and nonrigid motion under scaled orthographic views using fluid model constraints. Experiments on cloud image sequences captured by meteorological satellites (GOES-8 and GOES-9) have been performed with our system, along with validation and analysis of the results. Both structure and 3D motion correspondences are estimated to subpixel accuracy. Our results are very encouraging and have many potential applications in earth and space sciences, especially in cloud models for weather prediction.

4.
Research progress in deep-learning-based monocular depth estimation
Monocular depth estimation is an important technique for recovering scene depth from a single image; it is widely used in fields such as intelligent vehicles and robot localization and has significant research value. With the development of deep learning, many deep-learning-based monocular depth estimation studies have emerged, and performance has improved considerably. According to the type of training data used, this paper surveys recent deep-learning-based monocular depth estimation methods from three perspectives: models trained on single images, models trained on multiple images, and models whose training is optimized with auxiliary information. After reviewing the datasets and performance metrics commonly used in monocular depth estimation research, the paper also compares the performance of classic models. Models trained on single images have simple network structures but poor generalization. Networks trained on multiple images generalize better, but they have many parameters, converge slowly, and take a long time to train. Networks that introduce auxiliary information achieve further gains in accuracy, but the auxiliary information complicates the network structure and slows convergence. Many open problems and challenges remain in monocular depth estimation. Exploiting the latent information contained in multi-image inputs and domain-specific constraints to improve performance is gradually becoming the research trend.

5.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and detects/tracks objects' bounding boxes by running an existing object detection algorithm in parallel. Both algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We validate the scene depth accuracy of the sparse features quantitatively on KITTI with its ground-truth depth maps made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms which learn depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained with monocular images only. Also, we confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
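One concrete way to realize the sharing step described above (an assumption on our part, not the paper's exact procedure) is to project the sparse visual-odometry landmarks into the image and take the median depth of the points that fall inside each detection box:

```python
import numpy as np

def object_depths(points_cam, K, boxes):
    """points_cam: (N, 3) landmarks in the camera frame (from the VO front end)
    K:          (3, 3) camera intrinsic matrix
    boxes:      list of (x0, y0, x1, y1) detection boxes in pixels
    Returns one median depth per box (None if no landmark projects inside it)."""
    z = points_cam[:, 2]
    valid = z > 0                                    # keep points in front of the camera
    proj = (K @ points_cam[valid].T).T
    u, v = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]
    depths = []
    for x0, y0, x1, y1 in boxes:
        inside = (u >= x0) & (u < x1) & (v >= y0) & (v < y1)
        depths.append(float(np.median(z[valid][inside])) if inside.any() else None)
    return depths
```

The median is deliberately robust to landmarks that belong to the background but still fall inside the box.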


6.
We propose an edge-based method for 6DOF pose tracking of rigid objects using a monocular RGB camera. One of the critical problems for edge-based methods is searching for the object contour points in the image that correspond to the known 3D model points. However, previous methods often produce false object contour points in cases of cluttered backgrounds and partial occlusions. In this paper, we propose a novel edge-based 3D object tracking method to tackle this problem. To search for object contour points, foreground and background clutter points are first filtered out using an edge color cue; object contour points are then found by maximizing their edge confidence, which combines edge color and distance cues. Furthermore, the edge confidence is integrated into the edge-based energy function to reduce the influence of false contour points caused by cluttered backgrounds and partial occlusions. We also extend our method to multi-object tracking, which can handle mutual occlusions. We compare our method with recent state-of-the-art methods on challenging public datasets. Experiments demonstrate that our method improves robustness and accuracy against cluttered backgrounds and partial occlusions.
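The sketch below illustrates the kind of edge-confidence scoring the abstract describes, with the color and distance cues combined into a single score; the exact weighting, the color model, and the parameter values are assumptions for illustration.

```python
import numpy as np

def search_contour_point(candidates, model_color, sigma_d=10.0, w_color=0.5):
    """candidates:  list of (signed_distance_px, rgb_color) edge points found along
                 the search line normal to the projected model contour
    model_color: expected object color near this contour point (RGB triple)
    Returns the candidate with the highest combined confidence and that confidence,
    which can later down-weight unreliable points in the tracking energy."""
    model_color = np.asarray(model_color, dtype=float)
    best, best_conf = None, -1.0
    for dist, color in candidates:
        color_score = np.exp(-np.linalg.norm(np.asarray(color, float) - model_color) / 50.0)
        dist_score = np.exp(-abs(dist) / sigma_d)
        conf = w_color * color_score + (1.0 - w_color) * dist_score
        if conf > best_conf:
            best, best_conf = (dist, color), conf
    return best, best_conf
```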

7.
Shape recovery from a monocular image is addressed. It is often said that the information conveyed by an image is insufficient to reconstruct 3D shapes of objects in the image. This implies that shape recovery from an image necessitates the use of additional plausible constraints on typical structures and features of the objects in an ordinary scene. We propose a hypothesization and verification method for 3D shape recovery based on geometrical constraints peculiar to man-made objects. The objective is to increase the robustness of computer vision systems. One difficulty with this method lies in the mutual dependency between proper assignment of constraints to the regions in a given image and recovery of a consistent 3D shape. A concurrent mechanism has been implemented which is based on energy minimization using a parallel network for relaxation. This mechanism is capable of maintaining consistency between constraint assignment and shape recovery.

8.

Saliency prediction models provide a probabilistic map of the relative likelihood that an image or video region will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and emphasizing only a few salient objects in a scene. A new fovea-based model of spatial distance between the image regions is adopted for the local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into one single saliency map that is highly correlated with the eye-fixation data, a random forest based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment, which involved 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database as well as the eye-tracking data are publicly available along with this paper. Experiment results show that the proposed saliency prediction method achieves competitive performance compared to the state-of-the-art approaches.
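A compressed sketch of the fusion step only (training and predicting on the same frame, which a real evaluation would not do): each pixel's feature vector is the stack of conspicuity maps, and a random forest regresses the eye-fixation density. The hyperparameters and data handling are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fuse_conspicuity_maps(maps, fixation_density):
    """maps:             list of (H, W) conspicuity maps (brightness, color, depth, ...)
    fixation_density: (H, W) eye-tracking fixation density used as the regression target
    Returns the trained forest and the fused (H, W) saliency map."""
    X = np.stack([m.ravel() for m in maps], axis=1)    # one feature vector per pixel
    y = fixation_density.ravel()
    forest = RandomForestRegressor(n_estimators=50, max_depth=12, n_jobs=-1)
    forest.fit(X, y)
    fused = forest.predict(X).reshape(maps[0].shape)
    return forest, fused
```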


9.
Scene depth estimation is one of the classic problems in computer vision and an important step in applications such as 3D reconstruction and image synthesis. Deep-learning-based monocular depth estimation has developed rapidly, with many network architectures proposed in succession. This paper surveys the latest progress in deep-learning-based monocular depth estimation and reviews the development of supervised and unsupervised approaches. Focusing on the optimization ideas behind monocular depth estimation and how they appear in deep network architectures, supervised methods are grouped into five categories: multi-scale feature fusion methods, methods combined with conditional random fields (CRF), methods based on ordinal relations, methods that combine multiple sources of image information, and other methods. Unsupervised methods are likewise grouped into five categories: methods based on stereo vision, methods based on structure from motion (SfM), methods combined with adversarial networks, methods based on ordinal relations, and methods that incorporate uncertainty. In addition, the datasets and evaluation metrics commonly used for monocular depth estimation are introduced, and the current state and open challenges of deep-learning-based monocular depth estimation are discussed with respect to accuracy, generalization, application scenarios, and uncertainty research in unsupervised networks, providing researchers in related fields with a fairly comprehensive reference.

10.
Scene understanding is an important research direction for intelligent autonomous robots, and image segmentation is its foundation. However, incomplete training datasets and rare situations in real environments lead to incomplete prior knowledge during image segmentation, which degrades segmentation results. We therefore propose using abstract support semantic relations on color-depth (RGB-D) images to address the incomplete-prior problem posed by diverse object shapes. Under incomplete prior knowledge, for object patches that are over-segmented by bottom-up segmentation, we first model the support semantic relations between patches and compute their support probabilities, then construct an energy function that measures the overall stability of the scene, and finally minimize this energy with the Swendsen-Wang cut (SWC) stochastic graph partitioning algorithm, converting support probabilities between patches into strong support semantic relations and merging the patches, thereby achieving segmentation under incomplete prior knowledge. Experimental results show that segmentation combined with support semantic relations can re-merge over-segmented parts of the same object when prior knowledge is incomplete, improving segmentation accuracy.
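As a simplified stand-in for the merging step (the paper minimizes a scene-stability energy with Swendsen-Wang cuts; the greedy threshold rule below is only an assumption for illustration), strong support probabilities between over-segmented patches can be turned into merges with a union-find structure:

```python
def merge_by_support(n_patches, support_prob, threshold=0.8):
    """support_prob[(i, j)] = probability that patch i physically supports patch j.
    Patches linked by an above-threshold support relation are merged; returns the
    representative patch id for each input patch."""
    parent = list(range(n_patches))

    def find(a):                       # path-compressing find
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (i, j), p in sorted(support_prob.items(), key=lambda kv: -kv[1]):
        if p < threshold:
            break                      # remaining relations are too weak to merge
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return [find(i) for i in range(n_patches)]
```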

11.
This paper presents a computational model to recover the most likely interpretation of the 3D scene structure from a planar image, where some objects may occlude others. The estimated scene interpretation is obtained by integrating some global and local cues and provides both the complete disoccluded objects that form the scene and their ordering according to depth. Our method first computes several distal scenes which are compatible with the proximal planar image. To compute these different hypothesized scenes, we propose a perceptually inspired object disocclusion method, which works by minimizing Euler's elastica as well as by incorporating the relatability of partially occluded contours and the convexity of the disoccluded objects. Then, to estimate the preferred scene, we rely on a Bayesian model and define probabilities taking into account the global complexity of the objects in the hypothesized scenes as well as the effort of bringing these objects into their relative positions in the planar image, which is also measured by an Euler's elastica-based quantity. The model is illustrated with numerical experiments on both synthetic and real images, showing the ability of our model to reconstruct the occluded objects and the preferred perceptual order among them. We also present results on images of the Berkeley dataset with provided figure-ground ground-truth labeling.
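Euler's elastica, used twice in this abstract, has a simple discrete form for a polyline contour: total length plus squared curvature integrated along the curve, E ≈ Σ (α + β κ²) Δs. A minimal numerical version is sketched below; the step sizes and the turning-angle curvature approximation are our own choices, not the paper's.

```python
import numpy as np

def elastica_energy(points, alpha=1.0, beta=1.0):
    """points: (N, 2) polyline vertices of a contour, N >= 3."""
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)
    ds = np.linalg.norm(d, axis=1)                    # segment lengths
    t = d / ds[:, None]                               # unit tangents
    cross = t[:-1, 0] * t[1:, 1] - t[:-1, 1] * t[1:, 0]
    dot = np.sum(t[:-1] * t[1:], axis=1)
    theta = np.arctan2(cross, dot)                    # turning angle at each vertex
    seg = 0.5 * (ds[:-1] + ds[1:])                    # arc length attributed to the vertex
    kappa = theta / seg                               # discrete curvature
    return alpha * ds.sum() + beta * np.sum(kappa ** 2 * seg)
```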

12.
In this paper, we propose a novel stereo method for registering foreground objects in a pair of thermal and visible videos of close-range scenes. In our stereo matching, we use Local Self-Similarity (LSS) as the similarity metric between thermal and visible images. In order to accurately assign disparities to depth discontinuities and occluded Regions Of Interest (ROIs), we have integrated color and motion cues as soft constraints in an energy minimization framework. The optimal disparity map is approximated for image ROIs using a Belief Propagation (BP) algorithm. We tested our registration method on several challenging close-range indoor video frames of multiple people at different depths, with different clothing, and different poses. We show that our global optimization algorithm significantly outperforms the existing state-of-the-art method, especially for disparity assignment of occluded people at different depths in close-range surveillance scenes and for relatively large camera baselines.

13.
Scene depth information is crucial in indoor monocular visual navigation, but monocular depth estimation is an ill-posed problem with low accuracy. 2D LiDAR is now widely used in indoor navigation tasks and is inexpensive. This paper therefore proposes an indoor monocular depth estimation algorithm that fuses 2D LiDAR to improve depth estimation accuracy. A 2D LiDAR feature extraction branch is added to an encoder-decoder architecture, skip connections enrich the details of the monocular depth estimates, and a channel attention mechanism is proposed to fuse 2D LiDAR features with RGB image features. The algorithm is validated on the public NYUDv2 dataset, and a depth dataset with 2D LiDAR data was also built for the algorithm's target application scenarios. Experiments show that the proposed algorithm outperforms existing monocular depth estimation methods on both the public and the self-built datasets.
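A minimal sketch of an SE-style channel attention block that fuses RGB and 2D LiDAR feature maps, as the abstract describes; the module name, channel counts, and reduction ratio are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Concatenate RGB and LiDAR feature maps and re-weight the channels."""

    def __init__(self, rgb_ch, lidar_ch, reduction=8):
        super().__init__()
        ch = rgb_ch + lidar_ch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # per-channel global context
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, rgb_feat, lidar_feat):
        x = torch.cat([rgb_feat, lidar_feat], dim=1)       # channel-wise concatenation
        return x * self.attn(x)                            # attention-weighted fusion

# usage on dummy tensors shaped like encoder features
fusion = ChannelAttentionFusion(rgb_ch=64, lidar_ch=16)
out = fusion(torch.rand(1, 64, 60, 80), torch.rand(1, 16, 60, 80))  # -> (1, 80, 60, 80)
```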

14.
Generating large-scale and high-quality 3D scene reconstructions from monocular images is an essential technical foundation in augmented reality and robotics. However, apparent shortcomings (e.g., scale ambiguity, dense depth estimation in texture-less areas) make applying monocular 3D reconstruction to real-world practice challenging. In this work, we combine the advantages of deep learning and multi-view geometry to propose RGB-Fusion, which effectively addresses the inherent limitations of traditional monocular reconstruction. To remove the limits on tracking accuracy imposed by the prediction deficiencies of neural networks, we propose integrating the PnP (Perspective-n-Point) algorithm into the tracking module. We employ 3D ICP (Iterative Closest Point) matching and 2D feature matching to construct separate error terms and jointly optimize them, reducing the dependence on the accuracy of depth prediction and improving pose estimation accuracy. The approximate pose predicted by the neural network is employed as the initial optimization value to avoid getting trapped in local minima. We formulate a depth map refinement strategy based on the uncertainty of the depth values, which naturally leads to a refined depth map. Through our method, low-uncertainty elements can significantly update the current depth value, while high-uncertainty elements are prevented from adversely affecting depth estimation accuracy. Numerical qualitative and quantitative evaluation results of tracking, depth prediction, and 3D reconstruction show that RGB-Fusion exceeds most monocular 3D reconstruction systems.
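The uncertainty-driven refinement the abstract mentions behaves like inverse-variance fusion: low-uncertainty observations move the estimate strongly, high-uncertainty ones barely change it. The exact update rule of RGB-Fusion is not given here, so the following is only a plausible sketch:

```python
import numpy as np

def refine_depth(d_pred, var_pred, d_obs, var_obs):
    """Fuse a predicted depth map with a new observation, element-wise.
    All arguments are (H, W) arrays; variances encode per-pixel uncertainty."""
    w_pred = 1.0 / var_pred
    w_obs = 1.0 / var_obs
    d_new = (w_pred * d_pred + w_obs * d_obs) / (w_pred + w_obs)
    var_new = 1.0 / (w_pred + w_obs)      # the fused estimate is never less certain
    return d_new, var_new
```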

15.
Semantic image segmentation aims to partition an image into non-overlapping regions and assign a pre-defined object class label to each region. In this paper, a semantic method combining low-level features and high-level contextual cues is proposed to segment natural scene images. The proposed method first takes the gist representation of an image as its global feature. The image is then over-segmented into many super-pixels, and histogram representations of these super-pixels are used as local features. In addition, co-occurrence and spatial layout relations among object classes are exploited as contextual cues. Finally, the features and cues are integrated into an inference framework based on conditional random fields by defining specific potential terms and introducing weighting functions. The proposed method has been compared with state-of-the-art methods on the MSRC database, and the experimental results show its effectiveness.

16.
This paper presents a solution to a particular curve (surface) fitting problem and demonstrates its application in modeling objects from monocular image sequences. The curve-fitting algorithm is based on a modified nonparametric regression method, which forms the core contribution of this work. This method is far more effective than standard estimation techniques, such as the maximum likelihood estimation method, and can take into account the discontinuities present in the curve. Next, the theoretical results of this 1D curve estimation technique are extended significantly to an object modeling problem. The input to the algorithm is a monocular image sequence of an object undergoing rigid motion. By using the affine camera projection geometry and a given choice of an image frame pair in the sequence, we adopt the KvD (Koenderink and van Doorn, 1991) model to express the depth at each point on the object as a function of the unknown out-of-plane rotation and some measurable quantities computed directly from the optical flow. This is repeated for multiple image pairs (keeping one fixed image frame, which we formally call the base image, and choosing another frame from the sequence). The depth map is next estimated from these equations using the modified nonparametric regression analysis. We conducted experiments on various image sequences to verify the effectiveness of the technique. The results obtained using our curve-fitting technique can be refined further by hierarchical techniques, as well as by nonlinear optimization techniques in structure from motion.

17.
余航, 焦李成, 刘芳. 《自动化学报》, 2014, 40(1): 100-116
Clustering-based segmentation algorithms can effectively analyze the distribution structure of target features in feature space and thus accurately determine the class of a target, but they struggle to use the spatial and edge information of the image. Region-growing segmentation algorithms can use multiple kinds of image information in the spatial domain to compute the similarity between targets, but they lack deep exploitation of the feature structure itself and are prone to under- or over-segmentation. Combining the strengths of both, and targeting the characteristics of synthetic aperture radar (SAR) images, this paper proposes an unsupervised hierarchical iterative algorithm based on context analysis. The algorithm operates on over-segmented regions to improve segmentation speed and reduce the influence of SAR speckle noise. When merging over-segmented regions, it adopts a hierarchical iterative strategy. First, an improved fuzzy C-means clustering algorithm clusters the appearance features of the over-segmented regions to obtain class labels that encode the distribution structure of the features. Then, multiple SAR image features are used to analyze the spatial context of regions of the same class, and an iterative region-growing algorithm merges similar regions globally until no pair of over-segmented regions satisfies the merging condition, after which clustering is run again. The two sub-algorithms alternate hierarchically, complementing each other, and together provide an effective way to organize and exploit multiple sources of information for SAR image segmentation. Experiments on simulated and real SAR images show that the proposed algorithm strikes a good balance between region consistency and detail preservation, accurately segments various target regions, and is highly robust to speckle noise.
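The clustering sub-step relies on fuzzy C-means; a plain (unimproved) implementation over region feature vectors looks like the sketch below. The paper's improvement to FCM is not reproduced here.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: (n_samples, n_features) region appearance features; c: number of clusters.
    Returns the membership matrix U (n_samples, c) and the cluster centers (c, n_features)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / dist ** (2.0 / (m - 1.0))    # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return U_new, centers
        U = U_new
    return U, centers
```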

18.
Pan Baiyu, Zhang Liming, Yin Hanxiong, Lan Jun, Cao Feilong. Multimedia Tools and Applications, 2021, 80(13): 19179-19201

3D movies/videos have become increasingly popular in the market; however, they are usually produced by professionals. This paper presents a new technique for the automatic conversion of 2D to 3D video based on RGB-D sensors, which can be easily conducted by ordinary users. To generate a 3D image, one approach is to combine the original 2D color image and its corresponding depth map together to perform depth image-based rendering (DIBR). An RGB-D sensor is one of the inexpensive ways to capture an image and its corresponding depth map. The quality of the depth map and the DIBR algorithm are crucial to this process. Our approach is twofold. First, the depth maps captured directly by RGB-D sensors are generally of poor quality because there are many regions missing depth information, especially near the edges of objects. This paper proposes a new RGB-D sensor based depth map inpainting method that divides the regions with missing depths into interior holes and border holes. Different schemes are used to inpaint the different types of holes. Second, an improved hole filling approach for DIBR is proposed to synthesize the 3D images by using the corresponding color images and the inpainted depth maps. Extensive experiments were conducted on different evaluation datasets. The results show the effectiveness of our method.
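The DIBR step can be illustrated with a minimal horizontal-shift warp (a sketch only; the disparity scaling, z-buffering details, and hole handling are simplified assumptions):

```python
import numpy as np

def dibr_right_view(color, depth, baseline_px=16):
    """color: (H, W, 3) image; depth: (H, W) map, smaller = nearer.
    Returns the synthesized right view and a mask of disocclusion holes to inpaint."""
    h, w, _ = color.shape
    disparity = (baseline_px * (1.0 - depth / depth.max())).astype(int)  # near shifts more
    right = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xs = x - disparity[y, x]
            if 0 <= xs < w and depth[y, x] < zbuf[y, xs]:   # the nearer pixel wins
                zbuf[y, xs] = depth[y, x]
                right[y, xs] = color[y, x]
    holes = np.isinf(zbuf)            # pixels no source mapped to (to be filled later)
    return right, holes
```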


19.
To extract line targets from color images, this paper proposes a new color image segmentation algorithm. The image is first over-segmented with watershed segmentation, and region class probabilities are obtained by fuzzy clustering. Then, using the image's edge information and spatial properties, the line-direction adjacent regions of each region are found. Finally, the class probabilities are iteratively updated using the information from the line-direction adjacent regions. Experimental results show good extraction performance.

20.
Most conventional pencil-drawing synthesis methods are formulated in terms of geometry and strokes, or only use classic edge detection to extract image edge features. In this paper, we propose a new method to produce a pencil drawing from a natural image. The synthesized result not only yields a pencil sketch drawing, but also preserves the color tone of the natural image, and the drawing style is flexible. The sketch and style are learned from the edge map of the original natural image and one exemplar pencil image of an artist's work. This is accomplished using the convolutional neural network feature maps of the natural image and of an exemplar pencil-drawing style image. L-BFGS (limited-memory BFGS) optimization is applied to synthesize a new pencil sketch whose style is similar to the exemplar pencil sketch. We evaluate the proposed method by applying it to different kinds of images and textures. Experimental results demonstrate that our method is better than conventional methods in clarity and color tone. Besides, our method is also flexible in drawing style.
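The optimization structure (CNN feature maps, a Gram-matrix style term, and L-BFGS) can be sketched as below. The tiny random convolutional stack stands in for a pretrained feature extractor, and the loss weights are arbitrary, so this is a structural illustration rather than the paper's pipeline.

```python
import torch
import torch.nn as nn

def gram(feat):
    # feat: (C, H, W) -> normalized Gram matrix capturing style statistics
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

features = nn.Sequential(                      # stand-in feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

content = torch.rand(1, 3, 128, 128)           # e.g. edge map of the natural image
style = torch.rand(1, 3, 128, 128)             # exemplar pencil drawing

with torch.no_grad():
    target_feat = features(content)            # content target
    target_gram = gram(features(style)[0])     # style target

canvas = content.clone().requires_grad_(True)  # image being synthesized
opt = torch.optim.LBFGS([canvas], max_iter=200)

def closure():
    opt.zero_grad()
    feat = features(canvas)
    loss = nn.functional.mse_loss(feat, target_feat) \
         + 100.0 * nn.functional.mse_loss(gram(feat[0]), target_gram)
    loss.backward()
    return loss

opt.step(closure)                              # L-BFGS refines the canvas in place
```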
