Funding: National Natural Science Foundation of China; Natural Science Foundation of Shanghai

Received: 2020-09-25
Revised: 2020-10-28

Monocular Image Depth Understanding Based on Scene Modality Depth Understanding Network
CHEN Yang, LI Dawei. Monocular Image Depth Understanding Based on Scene Modality Depth Understanding Network[J]. Computer Engineering, 2021, 47(2): 268-278.
Authors: CHEN Yang, LI Dawei
Affiliation:College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
Abstract: The monocular depth images produced by methods based on Deep Convolutional Neural Networks (DCNN) are of much higher quality than those of traditional image processing methods. However, such methods are prone to error accumulation when training on useless features, and continuous depth prediction based on regression has low accuracy, which leads to imprecise extraction of image depth information, blurred object edges, and missing image details. This paper proposes a Scene Modality Depth Understanding Network (SMDUN) for monocular color images. A network model built on a stacked-hourglass framework is established: repeated bottom-up and top-down feature extraction fuses low-level texture with high-level semantic features; at each layer of network training, discrete depth labels are combined with real depth images to reduce the difficulty of depth understanding; and an error-correction sub-module and a maximum-likelihood-decoding optimization sub-module are inserted to extract depth features accurately. Experimental results show that the network obtains more accurate depth information: its Absolute Relative Error (AbsRel) on the NYUv2 dataset is 0.72% lower than that of the ACAN network, and its Mean Squared Relative Error (MSqRel) on the KITTI dataset is 41.28% lower than that of the GASDA network. Compared with DORN and other depth networks, its predicted depth images contain more detail and clearer object contours.
Keywords: monocular depth understanding; scene modality labeling; ordinal regression; error correction; maximum likelihood decoding
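The abstract rests on two reusable ideas: casting continuous depth regression as ordinal classification over discrete depth labels, and evaluating predictions with relative-error metrics (AbsRel, MSqRel). A minimal NumPy sketch of both follows; the log-space (spacing-increasing) discretization is the convention popularized by DORN, and all function names, bin counts, and depth ranges here are illustrative assumptions, not the paper's actual code or settings:

```python
import numpy as np

# Illustrative depth range and bin count, not the paper's settings.
D_MIN, D_MAX, N_BINS = 0.5, 10.0, 80

def discretize_depth(depth):
    """Map continuous depth (meters) to ordinal labels 0..N_BINS-1 using
    log-space (spacing-increasing) bin edges, as in DORN-style ordinal regression."""
    depth = np.clip(depth, D_MIN, D_MAX)
    frac = (np.log(depth) - np.log(D_MIN)) / (np.log(D_MAX) - np.log(D_MIN))
    return np.clip((N_BINS * frac).astype(int), 0, N_BINS - 1)

def label_to_depth(labels):
    """Decode an ordinal label back to the depth at its log-space bin center."""
    t = (labels + 0.5) / N_BINS
    return np.exp(np.log(D_MIN) + t * (np.log(D_MAX) - np.log(D_MIN)))

def ml_decode(probs):
    """Pick the most likely depth label per pixel and decode it to meters.
    A plain argmax stand-in; the paper's maximum-likelihood-decoding
    optimization sub-module refines this step."""
    return label_to_depth(np.argmax(probs, axis=-1))

def abs_rel(pred, gt):
    """Absolute Relative Error: mean(|pred - gt| / gt)."""
    return np.mean(np.abs(pred - gt) / gt)

def msq_rel(pred, gt):
    """Mean Squared Relative Error: mean((pred - gt)^2 / gt)."""
    return np.mean((pred - gt) ** 2 / gt)
```

Round-tripping ground truth through `discretize_depth` and `label_to_depth` shows how coarse the label quantization is; the residual AbsRel of that round trip is the floor that an error-correction stage, such as the sub-module the paper inserts, would aim to push below.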
This article is indexed in databases including VIP (维普) and Wanfang Data.