Multi-view convolutional vision transformer for 3D object recognition
Affiliation: 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; 2. Engineering Research Center of Mine Digitization, Ministry of Education of the People's Republic of China, Xuzhou 221116, China; 3. Disaster Intelligent Prevention and Control and Emergency Rescue Innovation Research Center, Xuzhou 221116, China; 4. Jiangsu Junsheng Wanbang Holding Group Co., Ltd., Xuzhou 221116, China; 5. School of Electrical Engineering and Computer Science, University of Ottawa, Canada
Abstract: With the rapid development of three-dimensional (3D) vision technology and the increasing application of 3D objects, there is an urgent need for 3D object recognition in the fields of computer vision, virtual reality, and artificial intelligence robots. View-based methods project 3D objects into two-dimensional (2D) images from different viewpoints and apply convolutional neural networks (CNNs) to model the projected views. Although these methods have achieved excellent recognition performance, they lack sufficient information interaction between the features of different views. Inspired by the recent success of the vision transformer (ViT) in image recognition, we propose a hybrid network that uses a CNN to extract multi-scale local information from each view and a transformer to capture the relevance of multi-scale information between different views. To verify the effectiveness of our multi-view convolutional vision transformer (MVCVT), we conduct experiments on two public benchmarks, ModelNet40 and ModelNet10, and compare its results with those of some state-of-the-art methods. The results show that MVCVT achieves competitive performance in 3D object recognition.
Keywords: Multi-view; 3D object recognition; Feature fusion; Convolutional neural networks
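
The cross-view interaction described in the abstract can be illustrated with a small sketch. This is not the authors' MVCVT implementation: it assumes each view has already been reduced to one feature vector (for example by a CNN backbone), uses single-head scaled dot-product self-attention with identity query/key/value projections instead of learned ones, ignores the multi-scale aspect, and fuses the attended views by mean pooling. The names `cross_view_attention` and `fuse_views` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(view_features):
    """Scaled dot-product self-attention across per-view feature vectors.

    Assumption: queries, keys, and values are the raw features themselves
    (identity projections); a real transformer would learn these projections.
    Returns the attended features and the attention weights per view.
    """
    dim = len(view_features[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in view_features:
        # Similarity of this view to every view (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in view_features
        ]
        row = softmax(scores)
        weights.append(row)
        # Each view becomes a weighted mix of all views' features.
        attended.append([
            sum(row[j] * view_features[j][d] for j in range(len(view_features)))
            for d in range(dim)
        ])
    return attended, weights


def fuse_views(view_features):
    """Mean-pool the attended views into one object-level descriptor."""
    attended, _ = cross_view_attention(view_features)
    count = len(attended)
    return [sum(v[d] for v in attended) / count for d in range(len(attended[0]))]


# Toy input: three views of one object, each a 4-dimensional feature vector.
views = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
attended, weights = cross_view_attention(views)
descriptor = fuse_views(views)
```

In this toy input the first two views are similar, so the first view attends more strongly to the second view than to the third. The fused `descriptor` would then be passed to a classifier, which is omitted here.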
This article is indexed in ScienceDirect and other databases.