     

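High-Precision Real-Time Object Detection Based on Bird's Eye View from 3D Point Clouds
Cite this article: ZHANG Yi, XIANG Zhiyu, QIAO Chengyu, CHEN Shuya. High-precision real-time object detection based on bird's eye view from 3D point clouds[J]. Robot, 2020, 42(2): 148-156.
Authors: ZHANG Yi  XIANG Zhiyu  QIAO Chengyu  CHEN Shuya
Affiliation: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
Abstract: To address object detection from 3D point clouds, a high-precision, real-time single-stage deep neural network is proposed, with new solutions in three areas: feature extraction, loss function design, and training data augmentation. First, the point cloud is voxelized directly to construct a bird's eye view (BEV). In the feature extraction stage, residual structures extract high-level semantic features, and multi-level features are fused to output a dense feature map. While the object boxes are regressed on the BEV, the loss function takes the quadratic offset into account so that training converges with higher precision. During training, 3D point clouds from different frames are mixed as a form of data augmentation to improve the generalization of the network. Experiments on the KITTI BEV object detection dataset show that the proposed network, using only the position information of the lidar point cloud, outperforms not only the state-of-the-art BEV object detection networks but also detection schemes that fuse images and point clouds. The whole network runs at 20 frames per second, which meets the real-time requirement.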
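The voxelization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid extents, voxel size, or feature encoding, so the function name `voxelize_to_bev`, the default ranges, the default voxel size, and the binary occupancy encoding below are all assumptions chosen for illustration, not the authors' settings.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Range = Tuple[float, float]


def voxelize_to_bev(
    points: Iterable[Point],
    x_range: Range = (0.0, 70.0),      # assumed forward extent in metres
    y_range: Range = (-40.0, 40.0),    # assumed lateral extent in metres
    z_range: Range = (-2.5, 1.0),      # assumed height extent in metres
    voxel_size: Tuple[float, float, float] = (0.1, 0.1, 0.5),  # assumed
) -> List[List[List[int]]]:
    """Return a binary occupancy grid indexed as grid[x_bin][y_bin][z_bin].

    The x and y bins form the bird's eye view plane; the z bins act as
    channels of the BEV image.
    """
    ranges = (x_range, y_range, z_range)
    # Number of bins along each axis.
    nx, ny, nz = (
        int(round((hi - lo) / size))
        for (lo, hi), size in zip(ranges, voxel_size)
    )
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]

    for point in points:
        # Drop points that fall outside the region of interest.
        if not all(lo <= v < hi for v, (lo, hi) in zip(point, ranges)):
            continue
        # Map each coordinate to its bin; clamp to guard against float rounding.
        ix, iy, iz = (
            min(int(math.floor((v - lo) / size)), n - 1)
            for v, (lo, _), size, n in zip(point, ranges, voxel_size, (nx, ny, nz))
        )
        grid[ix][iy][iz] = 1  # mark the voxel as occupied
    return grid
```

A practical pipeline would normally use an array library rather than nested lists; plain Python is used here only to keep the sketch dependency-free.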

Keywords: 3D point cloud  bird's eye view  convolutional neural network  single-stage object detection
Received: 2019-05-08

High-precision Real-time Object Detection Based on Bird's Eye View from 3D Point Clouds
ZHANG Yi,XIANG Zhiyu,QIAO Chengyu,CHEN Shuya.High-precision Real-time Object Detection Based on Bird's Eye View from 3D Point Clouds[J].Robot,2020,42(2):148-156.
Authors:ZHANG Yi  XIANG Zhiyu  QIAO Chengyu  CHEN Shuya
Affiliation:College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Abstract:For object detection from 3D point clouds, a high-precision, real-time single-stage deep neural network is proposed, with new solutions in three aspects: network feature extraction, loss function design, and data augmentation. First, the point clouds are voxelized directly to build a bird's eye view (BEV). In the feature extraction step, a residual structure is used to extract high-level semantic features, and multi-level features are combined to output a dense feature map. While the bounding boxes of objects are regressed from the BEV, the quadratic offset is considered in the loss function so that training converges with higher precision. During training, data augmentation is performed by mixing 3D point clouds from different frames, which improves the generalization of the network. Experimental results on the KITTI BEV object detection dataset show that the proposed network, using only the position information of the lidar point cloud, outperforms both the state-of-the-art BEV object detection networks and the methods that fuse images and point clouds. The entire network runs at 20 frames/s, which meets the real-time requirement.
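The data augmentation described above mixes 3D point clouds from different frames. The abstract does not specify the mixing procedure, so the sketch below assumes the simplest possible variant: concatenating the points and the ground-truth boxes of two frames. The function name `mix_frames` and the `Box` layout are hypothetical, and overlap between objects from the two frames is not handled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical BEV box layout: (center_x, center_y, length, width, yaw).
Box = Tuple[float, float, float, float, float]


def mix_frames(
    points_a: Sequence[Point],
    boxes_a: Sequence[Box],
    points_b: Sequence[Point],
    boxes_b: Sequence[Box],
) -> Tuple[List[Point], List[Box]]:
    """Merge two lidar frames into one training sample.

    Assumption: both frames are expressed in the same sensor coordinate
    system, so points and boxes can be concatenated without any transform.
    """
    mixed_points = list(points_a) + list(points_b)
    mixed_boxes = list(boxes_a) + list(boxes_b)
    return mixed_points, mixed_boxes


# Usage: combine a frame with one labelled object and a frame with none.
frame_a_points = [(10.0, 1.0, -1.2), (10.2, 1.1, -1.0)]
frame_a_boxes = [(10.1, 1.0, 4.0, 1.8, 0.0)]
frame_b_points = [(25.0, -3.0, -1.5)]
frame_b_boxes: List[Box] = []

points, boxes = mix_frames(frame_a_points, frame_a_boxes, frame_b_points, frame_b_boxes)
```

The mixed sample contains every point and every label from both frames, so the network sees denser scenes with more objects than either frame provides alone.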
Keywords:3D point cloud  bird's eye view (BEV)  convolutional neural network (CNN)  single-stage object detection  
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.