Leveraging Transfer Learning for Spatio-Temporal Human Activity Recognition from Video Sequences
Authors: Umair Muneer Butt, Hadiqa Aman Ullah, Sukumar Letchmunan, Iqra Tariq, Fadratul Hafinaz Hassan, Tieng Wei Koh
Affiliation: 1. School of Computer Sciences, Universiti Sains Malaysia, Penang, 1180, Malaysia; 2. Department of Computer Science, The University of Chenab, Gujrat, 50700, Pakistan; 3. Department of Software Engineering and Information System, Universiti Putra Malaysia, Selangor, 43400, Malaysia
Abstract: Human Activity Recognition (HAR) is an active research area because of its applications in pervasive computing, human-computer interaction, artificial intelligence, health care, and social sciences. Dynamic environments and anthropometric differences between individuals make actions harder to recognize. This study focuses on human activity in video sequences acquired with an RGB camera, a setting with a wide range of real-world applications. It uses a two-stream ConvNet to extract spatial and temporal information and proposes a fine-tuned deep neural network. The transfer learning paradigm is adopted to extract varied and fixed frames while reusing object identification information. Six state-of-the-art pre-trained models are compared to find the best model for spatial feature extraction. For the temporal sequence, the study uses dense optical flow, following the two-stream ConvNet, and Bidirectional Long Short-Term Memory (BiLSTM) to capture long-term dependencies. Two benchmark datasets, UCF101 and HMDB51, are used for evaluation. In addition, seven state-of-the-art optimizers are used to fine-tune the parameters of the proposed network. Furthermore, the study uses an ensemble mechanism to aggregate spatio-temporal features with a four-stream Convolutional Neural Network (CNN), in which two streams use RGB data and the others use optical flow images. Finally, the proposed ensemble approach using max hard voting outperforms state-of-the-art methods, with accuracies of 96.30% and 90.07% on the UCF101 and HMDB51 datasets, respectively.
Keywords: Human activity recognition, deep learning, transfer learning, neural network, ensemble learning, spatio-temporal
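
The abstract names max hard voting as the rule that fuses the four streams but does not give its details. The following is a minimal sketch of hard (majority) voting in general, not the authors' implementation. The function name `hard_vote`, the tie-breaking rule, and the example labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(stream_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Fuse per-stream class predictions by majority (hard) voting.

    stream_predictions[s][i] is the class label stream s predicts for
    sample i. Ties go to the label voted for by the earliest stream;
    this tie-break is an assumption of the sketch, not from the paper.
    """
    if not stream_predictions:
        raise ValueError("need at least one stream")
    n_samples = len(stream_predictions[0])
    if any(len(p) != n_samples for p in stream_predictions):
        raise ValueError("all streams must predict the same number of samples")

    fused = []
    for votes in zip(*stream_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in stream order, that reaches the top vote count.
        fused.append(next(v for v in votes if counts[v] == best))
    return fused


# Hypothetical labels from four streams (two RGB, two optical flow), 3 clips.
rgb_a = [0, 2, 1]
rgb_b = [0, 1, 1]
flow_a = [3, 1, 2]
flow_b = [0, 1, 2]
print(hard_vote([rgb_a, rgb_b, flow_a, flow_b]))  # [0, 1, 1]
```

In the third clip the vote is split two against two, so the result depends entirely on the tie-breaking rule. A real system might instead break ties using the streams' confidence scores.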