Similar Literature
20 similar documents retrieved (search time: 31 ms)
1.
To address the shortage of depth cues and the limited accuracy of traditional single-image depth estimation, a depth estimation method based on non-parametric sampling is proposed. The method uses non-parametric learning to transfer depth information from an existing RGBD dataset to the input image. First, multi-scale high-level image features are computed for the input image and for the RGBD dataset. Then, a kNN search over these high-level features retrieves the candidate images in the dataset that best match the input, and the candidate pairs are warped into alignment with the input image using SIFT flow. Finally, the candidate depth maps are interpolated, smoothed, and otherwise refined to yield the final depth map. Experimental results show that, compared with existing algorithms, the method estimates more accurate depth maps and better preserves the overall structure of the input image.
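The retrieval step of this pipeline — a kNN search over high-level features followed by fusing the candidates' depth maps — can be sketched as follows. This is a minimal illustration only: the feature extractor, the SIFT-flow warping, and the smoothing step are omitted, and all names and the simple averaging fusion are assumptions, not the paper's exact method.

```python
import numpy as np

def knn_depth_prior(input_feat, dataset_feats, dataset_depths, k=3):
    # Distance from the input's feature vector to every RGBD exemplar.
    dists = np.linalg.norm(dataset_feats - input_feat, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k best matches
    # Average the candidates' depth maps as a crude depth prior
    # (the paper instead warps them with SIFT flow and optimizes).
    return dataset_depths[nearest].mean(axis=0)

# Toy data: 5 exemplars with 4-D "high-level" features and 2x2 depth maps.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))
depths = rng.uniform(1.0, 10.0, size=(5, 2, 2))
prior = knn_depth_prior(feats[0], feats, depths, k=2)
```

Since the fused map is a mean of valid depth maps, it stays inside the depth range of the dataset, which is part of what makes depth transfer robust.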

2.
Obtaining exact depth from binocular disparities is hard if camera calibration is needed. We show that qualitative information can be obtained from stereo disparities with little computation and without prior knowledge (or computation) of camera parameters. First, we derive two expressions that order all matched points in the images by depth, in two distinct ways, from image coordinates only. Using one for tilt estimation and point separation (in depth) demonstrates some anomalies observed in psychophysical experiments, most notably the “induced size effect.” We apply the same approach to detect qualitative changes in the curvature of a contour on the surface of an object, with either the x- or y-coordinate fixed. Second, we develop an algorithm to compute axes of zero curvature from disparities alone. The algorithm is shown to be quite robust against violations of its basic assumptions for synthetic data with relatively large controlled deviations. It performs almost as well on real images, as demonstrated on an image of four cans at different orientations.
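The core observation — that a depth *ordering* needs no calibration — can be sketched with a simplified stand-in for the paper's ordering expressions. The sketch assumes a roughly parallel (rectified) camera pair, where disparity shrinks monotonically with depth:

```python
def depth_order(matches):
    # For a near-parallel camera pair, disparity d = xL - xR decreases with
    # depth, so sorting by -d orders matched points near-to-far using image
    # coordinates only -- no camera parameters needed.
    return sorted(matches, key=lambda m: -(m[0] - m[1]))

# (xL, xR) pairs; a larger disparity means a nearer point.
pts = [(10.0, 2.0), (5.0, 4.5), (8.0, 3.0)]
ordered = depth_order(pts)  # nearest first
```

Recovering *metric* depth from these disparities would require the baseline and focal length; the ordering does not.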

3.
This paper presents a homotopy-based algorithm for the simultaneous recovery of defocus blur and the affine parameters of apparent shifts between planar patches of two pictures. These parameters are recovered from two images of the same scene acquired by a camera evolving in time and/or space and whose intrinsic parameters are known. Using a limited Taylor expansion, one of the images (and its partial derivatives) is expressed as a function of the partial derivatives of the two images, the blur difference, the affine parameters, and a continuous parameter derived from homotopy methods. All of these unknowns can thus be computed directly by solving a system of equations at a single scale. The proposed algorithm is tested on synthetic and real images. The results confirm that dense and accurate estimates of the aforementioned parameters can be obtained.

4.
In this paper, we propose a novel stereo method for registering foreground objects in a pair of thermal and visible videos of close-range scenes. Our stereo matching uses Local Self-Similarity (LSS) as the similarity metric between thermal and visible images. To assign disparities accurately at depth discontinuities and in occluded Regions Of Interest (ROIs), we integrate color and motion cues as soft constraints in an energy minimization framework. The optimal disparity map is approximated for image ROIs using a Belief Propagation (BP) algorithm. We tested our registration method on several challenging close-range indoor video frames of multiple people at different depths, with different clothing and poses. We show that our global optimization algorithm significantly outperforms the existing state-of-the-art method, especially for disparity assignment of occluded people at different depths in close-range surveillance scenes and for relatively large camera baselines.

5.
Edge and Depth from Focus
This paper proposes a novel method to obtain reliable edge and depth information by integrating a set of multi-focus images, i.e., a sequence of images taken while systematically varying the camera's focus. In previous work on depth measurement using focusing or defocusing, accuracy depends upon the size and location of the local windows in which the amount of blur is measured. In contrast, our method needs no windowing; blur is evaluated from the intensity change along corresponding pixels in the multi-focus images. Such a blur analysis enables us not only to detect edge points without spatial differentiation but also to estimate depth with high accuracy. In addition, the analysis is stable because the method relies on integral computations such as summation and least-squares model fitting. The paper first discusses the fundamental properties of multi-focus images based on a step-edge model. Two algorithms are then presented: edge detection using an accumulated defocus image, which represents the spatial distribution of blur, and depth estimation using a spatio-focal image, which represents the intensity distribution along the focus axis. The experimental results demonstrate highly precise measurement: 0.5-pixel position fluctuation in edge detection and 0.2% error at 2.4 m in depth estimation.
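For contrast with the windowless analysis proposed above, here is the classical *windowed* depth-from-focus baseline the paper improves on: per pixel, pick the frame of the focal stack with the highest local contrast and report that frame's focus distance. The Laplacian sharpness measure and all names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def laplacian(img):
    # Discrete Laplacian with wrap-around borders (via np.roll).
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)

def depth_from_focus(stack, focus_depths):
    # Per pixel, select the stack frame with the highest |Laplacian|
    # (sharpest local contrast) and return that frame's focus distance.
    sharp = np.abs([laplacian(f.astype(float)) for f in stack])
    best = np.argmax(sharp, axis=0)
    return np.asarray(focus_depths)[best]

# Synthetic stack: frames 0 and 1 are flat (defocused), frame 2 is a sharp
# checkerboard, so every pixel should map to the third focus distance.
stack = np.zeros((3, 4, 4))
xx, yy = np.meshgrid(range(4), range(4), indexing="ij")
stack[2] = (-1.0) ** (xx + yy)
dmap = depth_from_focus(stack, [0.5, 1.0, 2.4])
```

The weakness the paper targets is visible here: the sharpness measure mixes information across neighboring pixels, so accuracy depends on the spatial support of the operator.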

6.
Several strategies for retrieving depth information from a sequence of images have been described to date. This paper introduces a method that builds on the symbiosis between stereovision and motion: motion minimizes correspondence ambiguities, and stereovision enhances motion information. The central idea is to transpose the spatially defined problem of disparity estimation into the spatio-temporal domain. Motion is analyzed in the original sequences by means of the so-called permanency effect, and disparities are calculated from the resulting two-dimensional motion charge maps. This is an important contribution to traditional stereovision depth analysis, where disparity is derived from image luminance; in our approach, disparity is obtained from a motion-based persistency charge measure.

7.
To identify salient objects in images more accurately, a region saliency detection algorithm based on foreground optimization and probability estimation is proposed. The algorithm comprises three parts: selection of foreground and background cues, optimization of the foreground cues, and saliency region detection based on probability estimation. First, the image is initially segmented with the simple linear iterative clustering (SLIC) algorithm. Then, background and foreground cues are detected separately, and the background cues are used to optimize the foreground cues. Finally, a probability estimation algorithm performs saliency region detection on the background cues and on the optimized foreground cues, and the two results are fused. Comparative experiments show that the proposed algorithm achieves higher precision and better detection performance than competing algorithms.

8.
This paper presents a novel vision-based global localization method that uses hybrid maps of objects and spatial layouts. We model indoor environments with a stereo camera using the following visual cues: local invariant features for object recognition and their 3D positions for object pose estimation. We also use the depth information at the horizontal centerline of the image, through which the optical axis passes; this is similar to the data from a 2D laser range finder. It allows us to build a topological node composed of a horizontal depth map and an object location map. The horizontal depth map describes the explicit spatial layout of each local space and provides metric information for computing the spatial relationships between adjacent spaces, while the object location map contains the pose information of objects found in each local space and the visual features for object recognition. Based on this map representation, we suggest a coarse-to-fine strategy for global localization: the coarse pose is estimated by means of object recognition and SVD-based point cloud fitting, and then refined by stochastic scan matching. Experimental results show that our approach serves both as an effective vision-based map representation and as a global localization method.
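The "SVD-based point cloud fitting" step in the coarse stage is the classical Kabsch/Umeyama least-squares rigid alignment; a minimal sketch follows. The test data (a rotation about z plus a translation) are made up for illustration.

```python
import numpy as np

def svd_fit(P, Q):
    # Kabsch/Umeyama: least-squares rigid transform (R, t) with Q ~= P @ R.T + t.
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflection
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

# Recover a known rotation about z plus a translation.
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = np.random.default_rng(1).normal(size=(10, 3))
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
R, t = svd_fit(P, Q)
```

The determinant guard matters in practice: without it, noisy or near-planar point sets can yield a reflection rather than a proper rotation.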

9.
Depth for a monocular image can be estimated from similar images and their corresponding depth information. However, image-matching ambiguity and non-uniformity of the estimated depth limit the performance of such algorithms. To address this, a monocular depth estimation algorithm based on convolutional neural network (CNN) feature extraction and weighted depth transfer is proposed. First, CNN features are extracted to find the input image's nearest neighbors in the dataset. Then, pixel-level dense spatial warping functions between each candidate neighbor and the input image are computed. The warping functions are transferred to the candidate depth maps, a SIFT-based transfer weight (SSW) is introduced, and the final depth is obtained by optimizing the weighted, warped candidate depth maps. Experimental results show that the method significantly reduces the mean error of the estimated depth maps and improves depth estimation quality.

10.
This paper presents an algorithm for a dense computation of the difference in blur between two images. The two images are acquired by varying the intrinsic parameters of the camera. The image formation system is assumed to be passive. Estimation of depth from the blur difference is straightforward. The algorithm is based on a local image decomposition technique using the Hermite polynomial basis. We show that any coefficient of the Hermite polynomial computed using the more blurred image is a function of the partial derivatives of the other image and the blur difference. Hence, the blur difference is computed by resolving a system of equations. The resulting estimation is dense and involves simple local operations carried out in the spatial domain. The mathematical developments underlying estimation of the blur in both 1D and 2D images are presented. The behavior of the algorithm is studied for constant images, step edges, line edges, and junctions. The selection of its parameters is discussed. The proposed algorithm is tested using synthetic and real images. The results obtained are accurate and dense. They are compared with those obtained using an existing algorithm.

11.
Super-resolution mapping (SRM) is a technique for exploring spatial distribution information of land-cover classes at a finer spatial resolution. The soft-then-hard super-resolution mapping (STHSRM) algorithm is a type of SRM algorithm that first estimates the soft class values for sub-pixels at the target fine spatial resolution and then predicts the hard class labels for sub-pixels. Sub-pixel shifted images from the same area can be incorporated to improve the accuracy of the STHSRM algorithm. In this article, multiscale sub-pixel shifted images (MSSI), based on a fine-scale model and a coarse-scale model, are utilized to increase the accuracy of STHSRM. First, class fraction images are derived from multiple sub-pixel shifted coarse spatial resolution images by soft classification. Then, using the sub-pixel/sub-pixel spatial attraction model at the fine scale and the sub-pixel/pixel spatial attraction model at the coarse scale, all MSSI are derived from the fraction images. The MSSI for each class are then integrated to obtain the desired fine spatial resolution images. Finally, the integrated fine spatial resolution images are used to allocate classes to sub-pixels. Experiments on two synthetic remote sensing images and real hyperspectral remote-sensing imagery show that the proposed method produces more accurate mapping results.

12.
The potential of multitemporal coarse spatial resolution remotely sensed images for vegetation monitoring is reduced in fragmented landscapes, where most of the pixels are composed of a mixture of different surfaces. Several approaches have been proposed for the estimation of reflectance or NDVI values of the different land-cover classes included in a low-resolution mixed pixel. In this paper, we propose a novel approach for the estimation of sub-pixel NDVI values from multitemporal coarse resolution satellite data. Sub-pixel NDVIs for the different land-cover classes are calculated by solving a weighted linear system of equations for each pixel of a coarse resolution image, exploiting information about within-pixel fractional cover derived from a high-resolution land-use map. The weights assigned to the different pixels of the image for the estimation of sub-pixel NDVIs of a target pixel i are calculated taking into account both the spatial distance between each pixel and the target and their spectral dissimilarity estimated on medium-resolution remote-sensing images acquired in different periods of the year. The algorithm was applied to daily and 16-day composite MODIS NDVI images, using Landsat-5 TM images for calculation of weights and accuracy evaluation. Results showed that application of the algorithm provided good estimates of sub-pixel NDVIs even for poorly represented land-cover classes (i.e., with a low total cover in the test area). No significant accuracy differences were found between results obtained on daily and composite MODIS images. The main advantage of the proposed technique with respect to others is that the inclusion of the spectral term in weight calculation allows an accurate estimate of sub-pixel NDVI time series even for land-cover classes characterized by large and rapid spatial variations in their spectral properties.
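The per-pixel weighted linear system can be sketched as a weighted least-squares unmixing problem: each equation says that a coarse pixel's mixed NDVI is the fraction-weighted sum of the unknown per-class NDVIs. The toy fractions, NDVI values, and weights below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def subpixel_ndvi(fractions, ndvi_mixed, weights):
    # Weighted least squares for NDVI_mix ~= F @ ndvi_class: each row of F
    # holds one coarse pixel's within-pixel fractional cover per class;
    # `weights` encode spatial/spectral similarity to the target pixel.
    w = np.sqrt(np.asarray(weights, dtype=float))
    sol, *_ = np.linalg.lstsq(w[:, None] * fractions,
                              w * ndvi_mixed, rcond=None)
    return sol

# Two classes with true NDVIs 0.8 (vegetated) and 0.2 (bare soil).
F = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1], [0.2, 0.8]])
mixed = F @ np.array([0.8, 0.2])
est = subpixel_ndvi(F, mixed, weights=[1.0, 0.5, 1.0, 0.8])
```

In the noiseless full-rank case the weights do not change the solution; with real data they let nearby, spectrally similar pixels dominate the estimate for the target pixel.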

13.
3-D Depth Reconstruction from a Single Still Image
We consider the task of 3-d depth estimation from a single still image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured indoor and outdoor environments which include forests, sidewalks, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the value of the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates multiscale local and global image features, and models the depths and the relations between depths at different points in the image. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.

14.
This paper describes an algorithm to continually and accurately estimate the absolute location of a diagnostic or surgical tool (such as a laser) pointed at the human retina, from a series of image frames. We treat the problem as a registration problem, using diagnostic images to build a spatial map of the retina and then registering each online image against this map. Since the image location where the laser strikes the retina is easily found, this registration determines the position of the laser in the global coordinate system defined by the spatial map. For each online image, the algorithm computes similarity invariants, locally valid despite the curved nature of the retina, from constellations of vascular landmarks. These are detected using a high-speed algorithm that iteratively traces the blood vessel structure. Invariant indexing establishes initial correspondences between landmarks from the online image and landmarks stored in the spatial map. Robust alignment and verification steps extend the similarity transformation computed from these initial correspondences to a global, high-order transformation. In initial experimentation, the method has achieved 100 percent success on 1024 × 1024 retina images. With a version of the tracing algorithm optimized for speed on 512 × 512 images, the computation time is only 51 milliseconds per image on a 900 MHz Pentium III processor and a 97 percent success rate is achieved. The median registration error in either case is about 1 pixel.
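The idea of a similarity invariant from a landmark constellation can be illustrated with a minimal three-landmark version: the ratio of two side lengths and the enclosed angle are unchanged by rotation, translation, and uniform scaling. This is a toy illustration of the indexing idea, not the paper's actual (larger-constellation, locally valid) invariants.

```python
import numpy as np

def similarity_invariant(p0, p1, p2):
    # A three-landmark invariant under rotation + translation + uniform
    # scale: the ratio of two side lengths and the cosine of the angle
    # they enclose at p0.
    a = np.linalg.norm(p1 - p0)
    b = np.linalg.norm(p2 - p0)
    cos_angle = float(np.dot(p1 - p0, p2 - p0) / (a * b))
    return a / b, cos_angle

pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * pts @ R.T + np.array([5.0, -2.0])  # rotate, scale, translate
inv_before = similarity_invariant(*pts)
inv_after = similarity_invariant(*moved)
```

Because the invariant is the same before and after the transform, it can be used as an index key to look up candidate landmark correspondences without first estimating the transform itself.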

15.
Numerous studies have been conducted to compare the classification accuracy of coral reef maps produced from satellite and aerial imagery with different sensor characteristics, such as spatial or spectral resolution, or under different environmental conditions. However, in addition to these physical-environment and sensor-design factors, the ecologically determined spatial complexity of the reef itself presents significant challenges for remote sensing objectives. While previous studies have considered the spatial resolution of the sensors, none have directly drawn the link from sensor spatial resolution to the scale and patterns in the heterogeneity of reef benthos. In this paper, we study how the accuracy of the commonly used maximum likelihood classification (MLC) algorithm is affected by spatial elements typical of a Caribbean atoll system present in high spectral and spatial resolution imagery. The results indicate that the degree to which ecologically determined spatial factors influence accuracy depends on both the amount of coral cover on the reef and the spatial resolution of the images being classified, and may be a contributing factor to the differences in the accuracies obtained for mapping reefs in different geographical locations. Differences in accuracy also arise from the method of pixel selection for training the maximum likelihood classification algorithm. With respect to estimation of live coral cover, a method that randomly selects training samples from all samples in each class provides better estimates for lower-resolution images, while a method biased toward the pixels with the highest substrate purity gives better estimates for higher-resolution images.

16.
In traditional color-guided depth image super-resolution (SR) reconstruction, the reference image must be a high-resolution color image, so the color image's resolution sets an upper bound on how far the depth image can be upscaled. Moreover, in practice only a low-resolution color image may be available, in which case such methods no longer apply. This work therefore explores depth SR reconstruction guided by a color image of arbitrary resolution. First, a large number of image SR algorithms of different types are used to upsample the input color image, and the resulting high-resolution color image serves as the guide. Then, based on a second-order total generalized variation (TGV) method, the image reconstructed from the low-resolution color image is used as a regularization term, image edge information is added, and an objective function is constructed, turning depth SR reconstruction into an optimization problem that is solved with a primal-dual method to obtain the high-resolution depth image. The method covers a case ignored by previous related work and applies to guide color images of arbitrary resolution. The experiments also reveal a surprising phenomenon: using an upsampled low-resolution color image as the guide can produce results close to, or even better than, those obtained with a high-resolution color guide, which is of reference value for related research and applications.

17.
Sam Y. Sung, Tianming Hu. Knowledge, 2006, 19(8): 687-695
This work is on the use of multiple attributes or features and spatial relationships, with the help of a user interface based on an iconic paradigm, to retrieve images represented by iconic pictures. An icon has texture, color, and text attributes. Texture is represented by three statistical textural properties, namely coarseness, contrast, and directionality. For text, the vector space model is used. For color, a representation based on a modified color histogram method, which is less storage-intensive, is proposed. The final icon similarity combines the attribute similarity values using a proven adaptive algorithm. 2-D strings and their variants are commonly used to represent spatial relationships and perform spatial reasoning. We extend the method to include similarity ranking, by using different similarity functions for different spatial relationships and an efficient embedding algorithm. Furthermore, our method solves the problem of query expressiveness from which all methods based on 2-D string representations suffer.
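The attribute-combination step can be sketched as a normalized weighted sum of per-attribute similarity scores. The paper uses a proven adaptive weighting algorithm; the fixed weights below are illustrative assumptions only.

```python
def icon_similarity(sims, weights):
    # Normalized weighted sum of per-attribute similarity scores
    # (texture, color, text), each score assumed to lie in [0, 1].
    total = sum(weights[a] for a in sims)
    return sum(weights[a] * sims[a] for a in sims) / total

score = icon_similarity(
    sims={"texture": 0.9, "color": 0.6, "text": 0.8},
    weights={"texture": 2.0, "color": 1.0, "text": 1.0},
)
```

Normalizing by the weight total keeps the combined score in the same [0, 1] range as the individual attribute similarities, so rankings remain comparable across queries that use different attribute subsets.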

18.
This paper reports on an experimental approach to adjusting stereo parameters automatically, thereby providing a low-eye-strain, easily accommodated stereo view for computer graphics applications. To this end, the concept of virtual eye separation is defined. Experiment 1 shows that dynamic changes in virtual eye separation are not noticed if they occur over a period of a few seconds. Experiment 2 shows that when subjects are given control over their virtual eye separation, they change it depending on the amount of depth in the scene. Based partly on these results, an algorithm is presented for enhancing stereo depth cues in moving computer-generated 3D images. It has the effect of doubling the stereo depth in flat scenes and limiting the stereo depth in deep scenes. It also reduces the occurrence of double images and the discrepancy between focus and vergence. The algorithm is applied dynamically in real time, with an optional damping factor applied so the disparities never change too abruptly. Finally, Experiment 3 provides a qualitative assessment of the algorithm with a dynamic “flight” over a digital elevation map.
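One possible reading of the behavior described above — boost separation in flat scenes, limit it in deep ones, and damp changes — can be sketched as follows. All constants, the reference-depth heuristic, and the damping form are invented for illustration; they are not the paper's algorithm.

```python
def virtual_eye_separation(scene_depth, ref_depth=10.0, base_sep=6.5,
                           damping=0.3, prev=None):
    # Scale the separation up (at most 2x, doubling depth in flat scenes)
    # and down (limiting depth in deep scenes), then damp the change so
    # the separation never jumps abruptly between frames.
    scale = min(2.0, max(0.5, ref_depth / max(scene_depth, 1e-6)))
    target = base_sep * scale
    if prev is None:
        return target
    return prev + damping * (target - prev)

flat = virtual_eye_separation(scene_depth=2.0)    # shallow scene: doubled
deep = virtual_eye_separation(scene_depth=100.0)  # deep scene: limited
```

Passing the previous frame's value via `prev` yields the damped, per-frame update; calling without it returns the undamped target.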

19.
This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and detects/tracks objects' bounding boxes by running an existing object detection algorithm in parallel. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We validate the scene depth accuracy of sparse features quantitatively on KITTI and its ground-truth depth maps made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare our depth map with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms which learn depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained on monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.

20.
We address the problem of depth and ego-motion estimation from omnidirectional images. We propose a correspondence-free structure-from-motion formulation for sequences of images mapped on the 2-sphere. A novel graph-based variational framework is first proposed for depth estimation between pairs of images. The estimation is cast as a TV-L1 optimization problem that is solved by a fast graph-based algorithm. The ego-motion is then estimated directly from the depth information, without explicit computation of the optical flow. Both problems are finally addressed together in an iterative algorithm that alternates between depth and ego-motion estimation, for fast computation of 3D information from motion in image sequences. Experimental results demonstrate the effective performance of the proposed algorithm for 3D reconstruction from synthetic and natural omnidirectional images.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号