

Multi-view motion modelled deep attention networks (M2DA-Net) for video based sign language recognition
Affiliation:1. Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia;2. Department of Computer Science, Prince Sultan University, Riyadh 11586, Saudi Arabia;3. College of Applied Computer Sciences, King Saud University, Saudi Arabia;4. Turabah University College, Computer Sciences Program, Taif University, Taif 21944, Saudi Arabia;5. Software Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia;6. Centre of Smart Robotics Research (CS2R), King Saud University, Riyadh 11543, Saudi Arabia;7. Artificial Intelligence Center of Advanced Studies (Thakaa), King Saud University, Saudi Arabia.
Abstract:Video-based sign language recognition (SLR) has been studied extensively with deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Combining a multi-view attention mechanism with CNNs is an appealing way to make the machine interpretation process robust to finger self-occlusions. The proposed multi-stream CNN mixes spatial and motion-modelled video sequences to create a low-dimensional feature vector at multiple stages of the CNN pipeline, recasting the view-invariance problem as a video classification problem solved with attention-based CNNs. For better performance during training, the signs are learned through a motion attention network that focuses on the regions most relevant to view-based paired pooling, which is carried out by a trainable view pair pooling network (VPPN). The VPPN pairs views to produce maximally discriminative features aggregated from all views, improving sign recognition. The results show increased recognition accuracies on 2D video sign language datasets. Because no multi-view sign language dataset exists other than ours, similar results are also reported on benchmark action datasets such as NTU RGB+D, MuHAVi, WEIZMANN and NUMA.
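The abstract describes a trainable view pair pooling network (VPPN) that pairs per-view features and pools them into a single discriminative descriptor. The sketch below is an illustrative PyTorch interpretation of that idea, not the authors' implementation: the class name, layer sizes, pairing-by-combinations strategy and attention-weighted pooling are all assumptions made for clarity.

```python
# Hypothetical sketch of a trainable view-pair pooling (VPPN-style) layer.
# All names, shapes, and the pairing strategy are illustrative assumptions,
# not the paper's exact architecture.
import itertools
import torch
import torch.nn as nn


class ViewPairPooling(nn.Module):
    """Scores every pair of per-view feature vectors and pools the pairs
    into a single view-invariant descriptor."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Small MLP that scores how discriminative a concatenated view pair is.
        self.pair_scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 1),
        )
        # Projects a concatenated pair back to the base feature dimension.
        self.project = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim) per-view CNN features.
        pairs, scores = [], []
        num_views = view_feats.size(1)
        for i, j in itertools.combinations(range(num_views), 2):
            pair = torch.cat([view_feats[:, i], view_feats[:, j]], dim=-1)
            pairs.append(self.project(pair))          # (batch, feat_dim)
            scores.append(self.pair_scorer(pair))     # (batch, 1)
        pairs = torch.stack(pairs, dim=1)             # (batch, n_pairs, feat_dim)
        weights = torch.softmax(torch.cat(scores, dim=1), dim=1)  # (batch, n_pairs)
        # Attention-weighted sum over view pairs -> pooled, view-invariant feature.
        return (weights.unsqueeze(-1) * pairs).sum(dim=1)


# Usage example: 4 camera views, 256-d per-view features, batch of 8 clips.
pool = ViewPairPooling(feat_dim=256)
fused = pool(torch.randn(8, 4, 256))   # -> (8, 256)
```

In this reading, the pair scorer plays the role of the attention model: pairs of views that jointly carry the most discriminative information receive higher pooling weights, so occluded or redundant views contribute less to the fused descriptor.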
Keywords:Multi-view; Sign language recognition; Deep learning; Attention models; Motion modelled
This article is indexed in databases including ScienceDirect.