Funding: National Natural Science Foundation of China; Natural Science Foundation of Shanghai

Received: 2020-09-25
Revised: 2020-10-28

Monocular Image Depth Understanding Based on Scene Modality Depth Understanding Network
CHEN Yang, LI Dawei. Monocular Image Depth Understanding Based on Scene Modality Depth Understanding Network[J]. Computer Engineering, 2021, 47(2): 268-278.
Authors: CHEN Yang, LI Dawei
Affiliation:College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
Abstract: The monocular depth images produced by methods based on Deep Convolutional Neural Networks (DCNN) are of much higher quality than those of traditional image processing methods. However, such methods are prone to error accumulation when training on useless features, and continuous depth prediction based on regression has low accuracy, which leads to imprecise extraction of image depth information, blurred object edges, and missing image details. This paper proposes a Scene Modality Depth Understanding Network (SMDUN) for monocular color images. A network model built on a stacked-hourglass framework is established: repeated bottom-up and top-down feature extraction fuses low-level texture with high-level semantic features; at each layer of network training, discrete depth labels are combined with real depth images to reduce the difficulty of depth understanding; and an error-correction sub-module and a maximum-likelihood-decoding optimization sub-module are inserted to extract depth features accurately. Experimental results show that the network obtains more accurate depth information: its Absolute Relative Error (AbsRel) on the NYUv2 dataset is 0.72% lower than that of the ACAN network, and its Mean Squared Relative Error (MSqRel) on the KITTI dataset is 41.28% lower than that of the GASDA network. Compared with DORN and other depth networks, its predicted depth images contain more detail and clearer object contours.
Keywords: monocular depth understanding; scene modality labeling; ordinal regression; error correction; maximum likelihood decoding
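The abstract rests on two reusable ideas: casting continuous depth regression as ordinal classification over discrete depth labels, and evaluating predictions with relative-error metrics (AbsRel, MSqRel). A minimal NumPy sketch of both follows; the log-space (spacing-increasing) discretization is the convention popularized by DORN, and all function names, bin counts, and depth ranges here are illustrative assumptions, not the paper's actual code or settings:

```python
import numpy as np

# Illustrative depth range and bin count, not the paper's settings.
D_MIN, D_MAX, N_BINS = 0.5, 10.0, 80

def discretize_depth(depth):
    """Map continuous depth (meters) to ordinal labels 0..N_BINS-1 using
    log-space (spacing-increasing) bin edges, as in DORN-style ordinal regression."""
    depth = np.clip(depth, D_MIN, D_MAX)
    frac = (np.log(depth) - np.log(D_MIN)) / (np.log(D_MAX) - np.log(D_MIN))
    return np.clip((N_BINS * frac).astype(int), 0, N_BINS - 1)

def label_to_depth(labels):
    """Decode an ordinal label back to the depth at its log-space bin center."""
    t = (labels + 0.5) / N_BINS
    return np.exp(np.log(D_MIN) + t * (np.log(D_MAX) - np.log(D_MIN)))

def ml_decode(probs):
    """Pick the most likely depth label per pixel and decode it to meters.
    A plain argmax stand-in; the paper's maximum-likelihood-decoding
    optimization sub-module refines this step."""
    return label_to_depth(np.argmax(probs, axis=-1))

def abs_rel(pred, gt):
    """Absolute Relative Error: mean(|pred - gt| / gt)."""
    return np.mean(np.abs(pred - gt) / gt)

def msq_rel(pred, gt):
    """Mean Squared Relative Error: mean((pred - gt)^2 / gt)."""
    return np.mean((pred - gt) ** 2 / gt)
```

Round-tripping ground truth through `discretize_depth` and `label_to_depth` shows how coarse the label quantization is; the residual AbsRel of that round trip is the floor that an error-correction stage, such as the sub-module the paper inserts, would aim to push below.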
This article is indexed in databases including VIP (维普) and Wanfang Data.