Depth Estimation Based on Semantic Guidance for Light Field Image

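Cite this article:	DENG Huiping, SHENG Zhichao, XIANG Sen, WU Jing. Depth Estimation Based on Semantic Guidance for Light Field Image[J]. Journal of Electronics & Information Technology, 2022, 44(8): 2940-2948. doi: 10.11999/JEIT210545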

Authors:	DENG Huiping, SHENG Zhichao, XIANG Sen, WU Jing

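Affiliations:	1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; 2. Engineering Research Center for Metallurgical Automation and Measurement Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China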

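Abstract:	Depth estimation from light field images is a key technology in applications such as 3D reconstruction, autonomous driving, and object tracking. However, existing deep learning methods ignore the geometric characteristics of light field images and learn poorly in regions such as edges and weak textures, so details are missing from the resulting depth maps. This paper proposes a semantics-guided depth estimation network for light field images that uses contextual information to address the ill-posed problem in complex regions. The encoder-decoder structure of a semantic perception module is designed to reconstruct spatial information and better capture object boundaries, while a spatial pyramid pooling structure uses atrous convolution to enlarge the receptive field and mine multi-scale contextual information. An adaptive feature attention module without dimensionality reduction performs local cross-channel interaction, eliminating information redundancy while effectively fusing the multi-branch features. Finally, a stacked hourglass connects multiple hourglass modules in series, and their encoder-decoder structure yields richer contextual information. Experimental results on the HCI 4D light field dataset show that the method achieves high accuracy and generalization ability, outperforms the depth estimation methods it is compared with, and preserves edge details well.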
Keywords:	light field image; depth estimation; semantic perception; attention mechanism
Received:	2021-06-08
Revised:	2022-03-10
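
The abstract states that atrous convolution enlarges the receptive field and that a pyramid of such convolutions gathers multi-scale context. The following is a minimal 1D NumPy sketch of that idea only, not the network from the paper: the function names, the dilation rates (1, 2, 4), and the all-ones kernel are assumptions chosen for illustration.

```python
import numpy as np


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Zero-padded, same-length 1D atrous correlation (odd kernel length assumed)."""
    span = effective_kernel_size(len(kernel), dilation)
    pad = (span - 1) // 2
    padded = np.pad(signal, pad)
    out = np.zeros(len(signal))
    for i in range(len(signal)):
        # Sample every `dilation`-th element inside the span around position i.
        taps = padded[i:i + span:dilation]
        out[i] = np.dot(taps, kernel)
    return out


def pyramid_features(signal: np.ndarray, kernel: np.ndarray, dilations=(1, 2, 4)) -> np.ndarray:
    """Stack the responses of the same kernel at several dilation rates (hypothetical rates)."""
    return np.stack([atrous_conv1d(signal, kernel, d) for d in dilations])


# An impulse shows how far each dilation rate reaches with only three weights.
impulse = np.zeros(9)
impulse[4] = 1.0
features = pyramid_features(impulse, np.ones(3))
```

With three weights, the span grows from 3 to 5 to 9 samples as the dilation rate goes from 1 to 2 to 4, which is the sense in which the receptive field is enlarged without adding parameters.
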
Depth Estimation Based on Semantic Guidance for Light Field Image

DENG Huiping, SHENG Zhichao, XIANG Sen, WU Jing. Depth Estimation Based on Semantic Guidance for Light Field Image[J]. Journal of Electronics & Information Technology, 2022, 44(8): 2940-2948. doi: 10.11999/JEIT210545

Authors:	DENG Huiping, SHENG Zhichao, XIANG Sen, WU Jing

Affiliations:	1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; 2. Engineering Research Center for Metallurgical Automation and Measurement Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China

Abstract:	Light Field Depth Estimation (LFDE) is critical to applications such as 3D reconstruction, autonomous driving, and object tracking. However, existing deep learning-based methods ignore the geometric characteristics of light field images in the learning network and therefore lose detail at edges, in weakly textured regions, and in other complex areas. This paper proposes a semantic-guided LFDE network that uses the contextual information of light field images to address the ill-posed problem in complex regions. The encoder-decoder structure of a semantic perception module is designed to reconstruct spatial information and better capture object boundaries. A spatial pyramid pooling structure uses atrous convolution to enlarge the receptive field and capture multi-scale contextual information. An adaptive feature attention module with local cross-channel interaction and no dimensionality reduction then eliminates information redundancy and fuses the multiple feature branches effectively. Finally, a stacked hourglass connects multiple hourglass modules in series, and its encoder-decoder structure provides richer contextual information. Experimental results on the new HCI 4D light field dataset demonstrate that the proposed method achieves higher accuracy and better generalization than the depth estimation methods it is compared with, and that it retains edge details better.

Keywords:	light field image; depth estimation; semantic perception; attention mechanism
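
The abstract describes an attention module that relies on local cross-channel interaction without dimensionality reduction. One common way to realize that description is to pool each channel to a scalar, slide a small 1D kernel across neighboring channels, and gate the features with a sigmoid. The NumPy sketch below assumes that design; the abstract does not confirm the paper's module works this way, and `channel_attention` and its fixed kernel are hypothetical names and values.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(features: np.ndarray, weights: np.ndarray):
    """Gate a (C, H, W) feature map using a 1D kernel slid across channels.

    `weights` has odd length k; each channel's gate depends only on its k
    nearest channels, and no bottleneck layer reduces the channel count.
    """
    channels = features.shape[0]
    k = len(weights)
    # Global average pooling: one descriptor per channel.
    pooled = features.mean(axis=(1, 2))
    # Zero-pad so every channel has k neighbors to interact with.
    padded = np.pad(pooled, k // 2)
    scores = np.array([np.dot(padded[i:i + k], weights) for i in range(channels)])
    gates = sigmoid(scores)
    # Broadcast the per-channel gates over the spatial dimensions.
    return features * gates[:, None, None], gates


# Example input: 4 channels of 2x2 features and an assumed, fixed 3-tap kernel.
feature_map = np.ones((4, 2, 2))
gated, gates = channel_attention(feature_map, np.array([0.0, 1.0, 0.0]))
```

In a trained network the kernel weights would be learned and the kernel length might be chosen adaptively from the channel count; here they are fixed so the sketch stays self-contained.
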
