Three-dimensional spatial structured encoding deep network for RGB-D scene parsing

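Cite this article:	WANG Zeyu, WU Yanxia, ZHANG Guoyin, BU Shuhui. Three-dimensional spatial structured encoding deep network for RGB-D scene parsing[J]. Journal of Computer Applications, 2017, 37(12): 3458-3466.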

Authors:	WANG Zeyu (王泽宇), WU Yanxia (吴艳霞), ZHANG Guoyin (张国印), BU Shuhui (布树辉)

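Affiliations:	1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China; 2. School of Aeronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China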

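Funding:	National Key Research and Development Program of China (2016YFB1000400); National Natural Science Foundation of China (60903098); Free Exploration Fund of the Central Universities (HEUCF100606).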

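Abstract:	Effective feature extraction from RGB-D images and accurate learning of 3D spatial structure are the keys to improving RGB-D scene parsing. Fully Convolutional Neural Networks (FCNN) extract features well, but they cannot adequately learn 3D spatial structure information. To address this, a novel Three-dimensional Spatial Structured Encoding Deep Network (3D-SSEDN) is proposed. Its embedded structural learning layer integrates a graphical model network with a spatial structured encoding algorithm, which learns and describes fairly accurately how objects are distributed in the 3D space they occupy. The network extracts a Hierarchical Visual Feature (HVF) and a Hierarchical Depth Feature (HDF), which carry multi-level shape and depth information. It also generates a spatial structure feature that encodes 3D structural information. Fusing these three kinds of features yields a hybrid feature that expresses the semantic information of RGB-D images more accurately. Experiments on the standard RGB-D datasets NYUDv2 and SUNRGBD show that the network significantly improves RGB-D scene parsing results compared with existing state-of-the-art scene parsing methods.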
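Keywords:	Fully Convolutional Neural Network (FCNN); graphical model; spatial structured encoding algorithm; Hierarchical Visual Feature (HVF); Hierarchical Depth Feature (HDF); spatial structure feature; hybrid feature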
Received:	2017-05-15
Revised:	2017-07-24
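The abstract says that HVF, HDF and the spatial structure feature are fused into a single hybrid feature, but it does not say how. The Python sketch below assumes the simplest common choice, per-pixel concatenation along the channel axis, purely for illustration. The function name `fuse_hybrid_feature` and the nested-list H x W x C layout are hypothetical and do not come from the paper.

```python
from typing import List

# Hypothetical layout: a feature map is an H x W grid of per-pixel channel vectors.
FeatureMap = List[List[List[float]]]


def fuse_hybrid_feature(hvf: FeatureMap, hdf: FeatureMap,
                        spatial: FeatureMap) -> FeatureMap:
    """Fuse three aligned feature maps by per-pixel channel concatenation.

    Concatenation is an ASSUMED fusion rule used only for illustration;
    the abstract does not specify how the three features are combined.
    """
    h = len(hvf)
    w = len(hvf[0]) if h else 0
    # All three maps must share the same spatial size (H x W).
    for name, fmap in (("hvf", hvf), ("hdf", hdf), ("spatial", spatial)):
        if len(fmap) != h or any(len(row) != w for row in fmap):
            raise ValueError(f"{name} does not match the {h}x{w} grid of hvf")
    # For each pixel, append the HDF and spatial channels after the HVF channels.
    return [[hvf[i][j] + hdf[i][j] + spatial[i][j] for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    visual = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]   # 1 x 2 map, 3 channels
    depth = [[[1.0], [2.0]]]                        # 1 x 2 map, 1 channel
    relation = [[[5.0, 6.0], [7.0, 8.0]]]           # 1 x 2 map, 2 channels
    fused = fuse_hybrid_feature(visual, depth, relation)
    print(len(fused[0][0]))  # 6 channels per pixel
```

Concatenation keeps every channel of each source intact and leaves the weighting to later layers. Other schemes, such as element-wise summation after projecting each feature to a common width, are also common, and the abstract does not say which one the authors use.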