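席林, 孙韶媛, 李琳娜, 邹芳喻. 《激光与红外》 (Laser & Infrared), 2012, 42(11): 1311-1315
Proposes an algorithm that estimates depth from monocular infrared images with a nonlinear learning model. The algorithm first uses stepwise linear regression and independent component analysis (ICA) to find the features most strongly correlated with infrared image depth. It then takes a kernel-based nonlinear support vector machine (SVM) as the base model and applies supervised learning to regress on and train with the infrared depth features; during training, the model parameters are corrected using the minimum mean squared error of the regression on known data. The trained model can estimate the depth distribution of a monocular infrared image. Experimental results show that the model estimates the depth information of monocular infrared images fairly consistently.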

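Proposes a depth estimation method for monocular vehicle-mounted infrared images in which the depth estimation model is obtained by supervised learning. First, kernel principal component analysis (KPCA) is used to select infrared image features: the initially extracted features are mapped nonlinearly by a kernel function into a linearly separable high-dimensional feature space, and principal component analysis (PCA) is then performed there to obtain dimension-reduced infrared image features. Next, with a BP neural network as the base model, the network is trained on the infrared image features and depth values. The trained model can estimate the depth distribution of monocular vehicle-mounted infrared images. Experimental results show that the depth information estimated by the model is consistent with the depth information of the original infrared images.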

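To address insufficient prediction accuracy when estimating depth from monocular images, this paper proposes a road-scene depth estimation method based on a pyramid pooling network. The method extracts road-scene image features with a combination of four residual network blocks and then gradually restores the feature maps to the original image size through upsampling; the added residual blocks increase the depth of the network. To account for the diversity of information at different scales during upsampling, the feature maps of each size produced during feature extraction are fused with the same-size feature maps produced during upsampling, which improves depth estimation accuracy. In addition, a pyramid pooling block performs scene parsing on the high-level features extracted by the four residual blocks; the feature map output by the pyramid pooling block is then restored to the original image size and fed into the prediction layer together with the output of the upsampling module. Experiments on the KITTI dataset show that the proposed method outperforms existing estimation methods.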

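Proposes a depth estimation method based on monocular dual-focal-length imaging and SIFT feature matching. The method relies on the principle that, for a scene point at a given depth, its image position vectors under different focal lengths and the corresponding focal lengths satisfy a geometric relationship. After a single camera captures two images at different focal lengths, the SIFT algorithm is used to extract and match features in the two images, which yields the ratio of each matched point's offsets from the image center. The depth of the pixel is then computed from the geometric relationship, producing a depth map. Experiments verify the feasibility of the method and show that it needs only a single camera to obtain depth values, is simple to implement, is low in cost, and has a wide range of applications.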

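赵霖, 赵滟, 靳捷. 《信号处理》 (Journal of Signal Processing), 2022, 38(5): 1088-1097
Self-supervised monocular depth estimation is widely used in autonomous driving, intelligent manufacturing, and other fields. However, self-supervised training involves a large amount of training noise, which severely limits estimation accuracy. To address the limited accuracy of self-supervised monocular depth estimation algorithms, this paper proposes a self-supervised monocular depth estimation framework based on a local attention mechanism and iterative refinement. First, for the depth estimation network, the paper exploits the high correlation between depth values of neighboring pixels and designs a local attention mechanism that fuses local features of high-resolution feature maps, improving depth estimation accuracy. Second, for the pose estimation network, the paper designs an iteratively refined pose estimation structure that uses residual optimization to reduce the difficulty of pose estimation, improving pose accuracy and, in turn, the performance of the depth estimation network. Experiments show that the improved self-supervised monocular depth estimation algorithm effectively increases depth estimation accuracy.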

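Converting 2D video to 3D video is the main way to address the shortage of 3D content, and depth estimation from a single image is a key step in that conversion. Proposes a single-image depth extraction method based on weighted SIFT flow depth transfer and energy model optimization. First, global image descriptors are used to retrieve neighbor images from a depth map database. Second, SIFT flow establishes dense pixel-level correspondences between the input image and the neighbor images. Third, transfer weights are computed from the SIFT flow error, and the depth of the corresponding pixel in each neighbor image is multiplied by its weight and transferred to the input image. Fourth, mean filtering fuses the transferred neighbor depths. Finally, an energy model for depth map optimization is built that smooths the depth in regions with small gradients while staying as close as possible to the transferred neighbor depths. Experimental results show that the method reduces the mean relative error of the estimated depth map and improves its uniformity.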
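
The weighted transfer and fusion steps can be illustrated with a small sketch. The abstract does not state how the weights are derived from the SIFT flow error, so the inverse-error weighting below, the function name `fuse_transferred_depths`, and the flat per-pixel lists are assumptions made for illustration only; the paper's own weighting and its mean-filter fusion may differ.

```python
def fuse_transferred_depths(candidate_depths, flow_errors, eps=1e-6):
    """Fuse depths transferred from several neighbor images, pixel by pixel.

    candidate_depths[k][i]: depth that neighbor k transfers to pixel i.
    flow_errors[k][i]: matching error of that correspondence (lower is better).
    Assumption: weight = 1 / (error + eps), normalized across neighbors.
    """
    n_pixels = len(candidate_depths[0])
    fused = []
    for i in range(n_pixels):
        weights = [1.0 / (errors[i] + eps) for errors in flow_errors]
        total = sum(weights)
        weighted_sum = sum(w * depths[i] for w, depths in zip(weights, candidate_depths))
        fused.append(weighted_sum / total)
    return fused


# Two neighbors, two pixels: the second pixel trusts neighbor 0 much more.
depths = [[2.0, 2.0], [4.0, 4.0]]
errors = [[1.0, 0.0], [1.0, 100.0]]
print(fuse_transferred_depths(depths, errors))
```

With equal errors the result is the plain mean of the candidates; as one neighbor's matching error grows, its depth contributes less to the fused value.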

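In the deep ordinal regression algorithm for monocular depth estimation, the full-image encoder tends to lose the feature and position information of pixels with large values. To address this shortcoming, this paper proposes a deep ordinal regression method based on CBAM. First, CBAM is embedded in the deep ordinal regression algorithm as the full-image encoder; it applies a channel attention mechanism and then a spatial attention mechanism to capture the complete feature and position information of the image, and the resulting attention maps are used to reweight the original features. Second, pixel depth values are discretized so that depth estimation is recast as an ordinal regression problem. Finally, the network is trained with a regression loss function. Experimental results show that, compared with other supervised, semi-supervised, and unsupervised methods, the method achieves better results on the KITTI dataset.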
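
The discretization step can be sketched as follows. The abstract does not say which discretization is used; the sketch assumes the spacing-increasing (log-uniform) scheme commonly associated with deep ordinal regression, and the names `sid_thresholds` and `depth_to_ordinal_label` are hypothetical.

```python
import math


def sid_thresholds(d_min, d_max, num_bins):
    """Return num_bins + 1 bin edges spaced uniformly in log-depth.

    Assumption: spacing-increasing discretization, so bins get wider
    as depth grows and far-away errors are penalized less finely.
    """
    log_min, log_max = math.log(d_min), math.log(d_max)
    step = (log_max - log_min) / num_bins
    return [math.exp(log_min + step * i) for i in range(num_bins + 1)]


def depth_to_ordinal_label(depth, thresholds):
    """Map a continuous depth to a bin index in [0, num_bins - 1]."""
    # Count how many interior edges the depth exceeds.
    return sum(1 for edge in thresholds[1:-1] if depth > edge)


edges = sid_thresholds(1.0, 81.0, 4)  # roughly 1, 3, 9, 27, 81
print(edges, depth_to_ordinal_label(10.0, edges))
```

Each pixel then carries an integer label, and the network predicts, for every interior edge, whether the true depth lies beyond it, which is what turns the task into ordinal regression.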

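刘香凝, 赵洋, 王荣刚. 《信号处理》 (Journal of Signal Processing), 2020, 36(9): 1450-1456
Depth estimation from a single image is an important step in understanding scene geometry, but because of scale ambiguity it is widely regarded in computer vision as a typical ill-posed problem. In recent years supervised learning methods have achieved largely satisfactory results in monocular depth estimation, but they require datasets labeled with large amounts of ground-truth depth, which is costly. In addition, because of common problems such as object motion, occlusion, and illumination changes, monocular depth estimation still performs poorly, especially at object edges and in weakly textured regions. To address these problems, this paper proposes a multi-stage unsupervised monocular depth estimation network based on self-attention. The method has the following characteristics: 1) the multi-stage network structure imposes strong constraints and supervision on depth estimation during training; 2) the network is optimized with a mask-weighted reconstruction loss and a left-right disparity consistency loss; 3) a self-attention mechanism captures more contextual information and thereby improves the predictions. Experimental results show that the method matches or exceeds existing methods in depth estimation on the KITTI dataset.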

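SLAM, or simultaneous localization and mapping, has long been a research focus in robotics and computer vision. Visual SLAM in particular has made clear theoretical and practical breakthroughs since the start of the 21st century and is gradually moving toward commercial application. Mapping, one of the two main goals of SLAM, can serve a wider range of application needs. Given the camera trajectory, this paper proposes a monocular semi-dense mapping method for visual SLAM that uses epipolar search and block matching, and adds image transformation and an inverse-depth Gaussian depth filter. The aim is to avoid the heavy texture dependence and high computational cost of monocular dense mapping and to improve the accuracy and robustness of monocular semi-dense mapping. Tests show that the improved method is more accurate at detecting pixels with pronounced gradient changes, and that the mean error and squared error of depth estimation are reduced by 9% and 47% respectively, making it a feasible and effective monocular semi-dense mapping solution for visual SLAM.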
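
The core update of a Gaussian depth filter can be written in a few lines. The sketch below assumes the standard product-of-Gaussians fusion applied to inverse depth; the function name `fuse_inverse_depth` is hypothetical, and the paper's filter (including how the observation variance is derived from the epipolar match) is not specified in the abstract.

```python
def fuse_inverse_depth(mu, var, mu_obs, var_obs):
    """Fuse a prior inverse-depth estimate with a new observation.

    Both are modeled as Gaussians; the fused estimate is their normalized
    product, so the more certain term dominates and the variance shrinks.
    """
    fused_var = var * var_obs / (var + var_obs)
    fused_mu = (var_obs * mu + var * mu_obs) / (var + var_obs)
    return fused_mu, fused_var


# A pixel starts at inverse depth 0.5 (depth 2 m) and receives a new
# measurement of 0.4 with the same uncertainty.
mu, var = fuse_inverse_depth(0.5, 0.04, 0.4, 0.04)
print(mu, var, 1.0 / mu)  # depth is recovered as 1 / inverse depth
```

Filtering in inverse depth rather than depth is a common choice because distant points map to values near zero, which keeps their distribution closer to Gaussian.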

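To address the scarcity of foggy-image datasets, proposes a fog simulation method based on depth estimation. The clear source image is preprocessed by adaptively adjusting its brightness and saturation; a self-supervised monocular depth network generates its depth map, which is refined by guided filtering; a transmission map is obtained for the chosen simulated visibility; the dark channel map is used to separate the sky region and estimate the atmospheric light; finally, the atmospheric scattering model yields the simulated foggy image at the chosen visibility. Experimental data show that the method effectively mitigates unclear targets and over-sharp fog edges in simulated images, and that results are stable when simulating foggy images with visibility below 2000 m. The mean error rate of the feature evaluation metrics between simulated and real foggy images is 6.28%, which indicates that the method is feasible and can be used to simulate fog on clear images of natural scenes, helping to address the scarcity of foggy-image datasets and the lack of visibility data.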
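
The last two steps can be sketched with the usual atmospheric scattering model, I = J·t + A·(1 − t) with t = exp(−β·d). The abstract does not give the mapping from visibility to the extinction coefficient β, so the sketch assumes Koschmieder's relation with a 2% contrast threshold (β ≈ 3.912 / V); the function names and the fixed `airlight` value are hypothetical.

```python
import math


def transmission(depth_m, visibility_m, contrast_threshold=0.02):
    """Transmission t = exp(-beta * d) for a given meteorological visibility.

    Assumption: Koschmieder's relation, beta = -ln(threshold) / visibility,
    which gives beta ~= 3.912 / V for a 2% contrast threshold.
    """
    beta = -math.log(contrast_threshold) / visibility_m
    return math.exp(-beta * depth_m)


def add_fog(pixel, depth_m, visibility_m, airlight=0.9):
    """Apply the scattering model to one normalized intensity value."""
    t = transmission(depth_m, visibility_m)
    return pixel * t + airlight * (1.0 - t)


# A dark pixel (0.2) at increasing distance under 500 m visibility.
for d in (0.0, 100.0, 500.0):
    print(d, round(add_fog(0.2, d, 500.0), 3))
```

Near pixels keep their original intensity, while distant pixels converge to the atmospheric light, which is why an accurate depth map matters for realistic fog.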

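Research on a Monocular-Vision-Based Obstacle Detection Algorithm for UAVs
To address obstacle avoidance for low-flying unmanned aerial vehicles (UAVs), studies an obstacle depth extraction algorithm based on monocular vision. In images captured by the UAV's forward-looking camera, corner points are extracted as feature points with the Harris corner detector and matched with normalized cross-correlation; the obstacle depth is then computed from the change in distance between feature points across the image sequence together with the UAV's motion. The depth calculation method for a single obstacle is improved by using the RANSAC algorithm to separate obstacles at different depths. Simulations show that the algorithm can effectively detect and distinguish obstacles at different depths.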
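
The depth computation from feature spacing can be illustrated under simplifying assumptions that the abstract does not state: a pinhole camera, pure forward translation along the optical axis, and two feature points on a fronto-parallel obstacle. The image distance between the points is then d = f·L / Z, so after moving forward by Δ the spacing grows from d1 to d2 and the current depth is Z2 = Δ·d1 / (d2 − d1). The function name `depth_from_scale_change` is hypothetical.

```python
def depth_from_scale_change(d1_px, d2_px, forward_motion_m):
    """Depth of an obstacle after the camera has moved forward.

    d1_px, d2_px: image distance between the same two feature points
    before and after the motion. Assumes pure translation along the
    optical axis toward a fronto-parallel obstacle.
    """
    if d2_px <= d1_px:
        raise ValueError("feature spacing must grow when approaching the obstacle")
    return forward_motion_m * d1_px / (d2_px - d1_px)


# Spacing grows from 20 px to 25 px while the UAV advances 10 m.
print(depth_from_scale_change(20.0, 25.0, 10.0))
```

Because the focal length and the true point separation cancel out, only the spacing ratio and the travelled distance are needed; features on obstacles at different depths produce different ratios, which is the kind of grouping the abstract assigns to RANSAC.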

No-reference quality assessment of images has received considerable attention. However, the accuracy of such assessment remains questionable because of its weak biological basis. In this paper, we propose a novel quality assessment model based on the superpixel index and biological binocular mechanisms. The technical contributions of our model are the introduction of local monocular superpixel features and three global binocular visual features. We utilize monocular superpixel segmentation to extract two types of entropies as the local visual features for accurate quality-aware feature extraction. In addition, natural scene statistics features are extracted from the binocular visual information to complement the local monocular features and quantify the naturalness of the stereoscopic images. Finally, a regression model is learned to evaluate the quality of the stereoscopic images. Experimental results from three popular databases demonstrate that the proposed model has a more reliable performance than earlier models in terms of prediction accuracy and generalizability.

Human visual theory is closely related to stereo image quality assessment (SIQA): it determines whether the results of an SIQA method are consistent with subjective perception. Many SIQA methods are not fully based on human visual theory, so there is still room for improvement. Research on the visual system tends to focus on the dorsal and ventral pathways and ignores the information differences in the early visual pathways. Notably, the ON and OFF receptive fields in retinal ganglion cells (RGCs) respond asymmetrically to the statistical features of images. Inspired by this, we propose an SIQA method based on monocular and binocular visual features that takes into account the difference between ON and OFF response features in the early visual pathways. Moreover, the different information interaction mechanisms of the visual cortex are used to fuse the response maps of the left and right images. Finally, monocular and binocular features are extracted and sent to support vector regression (SVR) for quality prediction. Experimental results show that the proposed method is superior to several mainstream SIQA metrics on four publicly available stereo image databases.

In this paper, a convolutional neural network (CNN) with multi-loss constraints is designed for stereoscopic image quality assessment (SIQA). A stereoscopic image contains not only monocular information but also binocular information, which is equally crucial. We therefore take image patches from the left-view images, the right-view images, and the difference images as the inputs of the network to utilize both monocular and binocular information. Moreover, we propose a method to obtain a proxy label for each image patch that preserves the quality difference between different regions and views. In addition, multiple loss functions with adaptive loss weights are introduced in the network; they consider both local and global features and constrain feature learning from multiple perspectives. The adaptive loss weights also make the multi-loss CNN more flexible. Experimental results on four public SIQA databases show that the proposed method outperforms other existing SIQA methods and achieves state-of-the-art performance.

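Thermal imaging reflects the temperature distribution of a scene. Estimating depth from thermal images makes it possible to recover the three-dimensional temperature field of the scene, which is important in fields such as fault diagnosis and night-vision navigation. This paper proposes a non-parametric depth sampling method for monocular thermal image depth estimation. To overcome the lack of texture and the blurred contours of thermal images, spatial pyramid matching (SPM) is used for feature analysis. First, based on SPM feature matching, candidate thermal images with scenes similar to the thermal image whose depth is to be estimated are selected from a database. Then the SIFT Flow warping algorithm is used to sample the depth maps of the candidate thermal images and transfer the depth information to the target thermal image. Experimental results show that the method estimates depth from monocular thermal images effectively and has a clear advantage over comparable algorithms.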

In the field of intelligent transportation, autonomous driving technologies, especially visual sensing solutions, have attracted increasing attention in recent years. Pedestrian location based on a monocular camera still poses challenges, because the pedestrian is a non-rigid object and its depth information cannot be obtained from a monocular camera easily and accurately. In this paper, a pedestrian location framework based on monocular cameras is proposed. The framework consists of three parts: coarse positioning, auxiliary information generation, and information fusion. In coarse positioning, human skeleton information is obtained from the monocular images, and a lightweight feed-forward neural network predicts the pedestrian position from it. In auxiliary information generation, pseudo-LiDAR points carrying pedestrian depth information are generated from the monocular images by an auxiliary network. Finally, the outputs of these two parts are fused to locate the pedestrian. Experimental results on the KITTI dataset show that our method achieves better performance than other methods.

This paper presents a novel intelligent system for the automatic visual inspection of vessels consisting of three processing levels: (a) data acquisition: images are collected using a magnetic climbing robot equipped with a low-cost monocular camera for hull inspection; (b) feature extraction: all the images are characterized by 12 features consisting of color moments in each channel of the HSV space; (c) classification: a novel tool, based on an ensemble of classifiers, is proposed to classify sub-images as rust or non-rust. This paper provides a helpful roadmap to guide future research on the detection of rusting of metals using image processing.

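In computer stereo vision, obtaining the disparity distribution of features in stereo image pairs that contain monocular features has long been difficult. Unlike traditional matching based on feature attributes, this paper builds on the FACADE (Form-And-Color-And-Depth) vision theory and its neuron dynamics equations to construct a FACADE "bilateral competition" binocular filter. The filter applies different competition strategies to the neurons representing monocular features and the neurons representing binocular features, and thereby assigns features at different depths in the stereo pair to different neuron representation planes. Experimental results show that it is feasible to use a neuron dynamics approach to obtain the disparity distribution of stereo image pairs containing monocular features.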

A challenging problem confronted when designing a blind/no-reference (NR) stereoscopic image quality assessment (SIQA) algorithm is to simulate the quality assessment (QA) behavior of the human visual system (HVS) during binocular vision. An effective way to solve this problem is to estimate the quality of the merged single view created in the human brain which is also referred to as the cyclopean image. However, due to the difficulty in modeling the binocular fusion and rivalry properties of the HVS, obtaining effective cyclopean images for QA is non-trivial, and consequently previous NR SIQA algorithms either require the MOS/DMOS values of the distorted 3D images for training or ignore the quality analysis of the merged cyclopean view. In this paper, we focus on (1) constructing accurate and appropriate cyclopean views for QA of stereoscopic images by adaptively analyzing the distortion information of two monocular views, and (2) training NR SIQA models without requiring the assistance of the MOS/DMOS values in existing databases. Accordingly, we present an effective opinion-unaware SIQA algorithm called MUSIQUE-3D, which blindly assesses the quality of multiply and singly distorted stereoscopic images by analyzing quality degradations of both monocular and cyclopean views. The monocular view quality is estimated by an extended version of the MUSIQUE algorithm, and the cyclopean view quality is computed from the distortion parameter values predicted by a two-layer classification-regression model trained on a large 3D image dataset. Tests on various 3D image databases demonstrate the superiority of our method as compared with other state-of-the-art SIQA algorithms.

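To address insufficient accuracy in stereoscopic image quality prediction, this paper proposes a no-reference stereoscopic image quality assessment model that extracts quality-aware features in both the spatial domain and the transform domain. Natural scene statistics features of the input left and right views are extracted in the spatial and transform domains, and natural scene statistics features of the synthesized cyclopean image are extracted in the transform domain. These features are fed into support vector regression (SVR) to train a prediction model that maps from the feature domain to the quality score domain, which forms the objective SIQA model. The model is compared with several mainstream stereoscopic image quality assessment algorithms on four public stereoscopic image databases. On the LIVE 3D Phase I database, for example, the Spearman rank correlation coefficient, Pearson linear correlation coefficient, and root mean squared error reach 0.967, 0.946, and 5.603 respectively, which verifies the effectiveness of the proposed algorithm.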
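
The three reported metrics can be computed as in the sketch below, which uses only the standard library. It is an illustration, not the paper's evaluation script: the helper names are hypothetical, and the nonlinear logistic mapping that quality assessment studies often apply to predictions before computing the Pearson coefficient and RMSE is omitted.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root mean squared error between predictions and subjective scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]
print(spearman(predicted, subjective), pearson(predicted, subjective), rmse(predicted, subjective))
```

SROCC measures whether the predicted ordering matches the subjective ordering, PLCC measures linear agreement, and RMSE measures absolute error, so higher is better for the first two and lower is better for the third.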

