Similar Articles
13 similar articles found.
1.
Over the past few years, skeleton-based action recognition has achieved great success because skeleton data are immune to illumination variation, viewpoint variation, background clutter, scaling, and camera motion. However, effectively modeling the latent information in skeleton data remains a challenging problem. In this paper, we therefore propose action embedding with a self-attention Transformer network for skeleton-based action recognition. Our proposed technique comprises two modules: (i) action embedding and (ii) a self-attention Transformer. The action embedding encodes the relationships between corresponding body joints (e.g., the joints of both hands move together when clapping) and thus captures the spatial features of the joints, while the temporal features and dependencies of body joints are modeled with the Transformer architecture. Our method works in a single-stream (end-to-end) fashion, with a multilayer perceptron (MLP) used for classification. We carry out an ablation study and evaluate our model on the small-scale SYSU-3D dataset and the large-scale NTU-RGB+D and NTU-RGB+D 120 datasets; the results establish that our method outperforms other state-of-the-art architectures.
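The spatial step the abstract describes can be pictured as standard scaled dot-product self-attention applied across a frame's joint embeddings. The following is a minimal sketch under that assumption, with NumPy; the function name, toy dimensions, and identity projections are illustrative, not the authors' implementation.

```python
import numpy as np

def joint_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over embedded joints.

    x: (num_joints, d_model) action-embedding vectors for one frame.
    w_q, w_k, w_v: (d_model, d_model) projection matrices.
    Returns attended joint features of shape (num_joints, d_model).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # joint-to-joint affinities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over joints
    return weights @ v

# Toy example: 4 joints embedded in 8 dimensions, identity projections.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
eye = np.eye(8)
out = joint_self_attention(x, eye, eye, eye)
```

In the paper's setting the attention weights would let, say, both hand joints attend strongly to each other during a clapping action, before the temporal Transformer models dependencies across frames.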

2.
3.
Analysis of human behavior from visual information has been one of the active research areas in the computer vision community over the last decade. Vision-based human action recognition (HAR) is a crucial part of human behavior analysis and is in great demand across a wide range of applications. HAR was initially performed on images from conventional cameras; more recently, depth sensors have been added as an additional informative resource alongside cameras. In this paper, we propose a novel approach that substantially improves human action recognition by extracting complex-network-based features from RGB-D information. The constructed complex network is employed for single-person action recognition from skeletal data consisting of the 3D positions of body joints. These indirect features help the model cope with the majority of challenges in action recognition. We also introduce the meta-path concept into the complex network to mitigate the challenges posed by unusual action structures, which further boosts recognition performance. Extensive experimental results on two widely adopted benchmark datasets, MSR Action Pairs and MSR Daily Activity 3D, indicate the efficiency and validity of the method.
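A meta-path in a heterogeneous graph is a walk constrained to follow a given sequence of node types. As a hedged illustration of that concept only (the abstract does not give the authors' graph construction), the sketch below enumerates meta-path instances in a tiny typed graph; the node names and type labels are invented for the example.

```python
def meta_path_instances(neighbors, node_type, path_types):
    """Enumerate meta-path instances in a typed (heterogeneous) graph.

    neighbors: dict node -> iterable of adjacent nodes.
    node_type: dict node -> type label.
    path_types: type sequence the path must follow,
                e.g. ("joint", "limb", "joint").
    """
    paths = [[n] for n in neighbors if node_type[n] == path_types[0]]
    for t in path_types[1:]:
        paths = [p + [nb] for p in paths
                 for nb in neighbors[p[-1]] if node_type[nb] == t]
    return paths

# Toy skeleton graph: two joint nodes connected through one limb node.
nbrs = {"j1": ["l1"], "j2": ["l1"], "l1": ["j1", "j2"]}
types = {"j1": "joint", "j2": "joint", "l1": "limb"}
paths = meta_path_instances(nbrs, types, ("joint", "limb", "joint"))
```

Each returned path such as `["j1", "l1", "j2"]` is one joint-limb-joint instance; counting or weighting such instances yields the kind of structural feature a complex-network model can feed to a classifier.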

4.
5.
Human action recognition plays an important role in modern intelligent systems such as human–computer interaction (HCI), sports analysis, and somatosensory games. Compared with conventional 2-D human action analysis, a Kinect sensor additionally provides depth information about the action, which is significant for human action recognition. In this paper, we propose a joint-angle sequence model for recognizing human actions, where the depth images are acquired with a Kinect sensor. We design an improved dynamic time warping (DTW) method to improve matching accuracy. Comprehensive experiments show the effectiveness and robustness of our proposed method.
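The abstract does not detail its DTW improvement, so the following is only the textbook dynamic time warping baseline it builds on: aligning two joint-angle sequences of different lengths by minimizing cumulative pointwise distance. Function and parameter names are illustrative.

```python
def dtw(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: minimal cumulative alignment cost between two sequences.

    seq_a, seq_b: sequences of joint angles (or any comparable samples).
    dist: pointwise distance between two samples.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # extend the cheapest of the three admissible alignments
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Note that DTW tolerates differences in execution speed: a repeated sample in one sequence aligns at zero extra cost, which is exactly why it suits matching the same action performed at different tempos.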

6.
7.
Constructing a bag-of-features model from space–time interest points (STIPs) has been successfully utilized for human action recognition. However, how to eliminate the large number of irrelevant STIPs when representing a specific action in realistic scenarios, and how to select discriminative codewords for an effective bag-of-features model, still need further investigation. In this paper, we propose to select more representative codewords based on our pruned-interest-points algorithm, so as to reduce computational cost while improving recognition performance. Taking human perception into account, an attention-based saliency map is employed to choose the salient interest points that fall into salient regions, since visual saliency provides strong evidence for the location of acting subjects. After the salient interest points are identified, each human action is represented with the bag-of-features model. To obtain more discriminative codewords, an unsupervised codeword selection algorithm is utilized. Finally, a Support Vector Machine (SVM) is employed to perform human action recognition. Comprehensive experimental results on the widely used and challenging Hollywood-2 Human Action (HOHA-2) and YouTube datasets demonstrate that our proposed method is computationally efficient while achieving improved performance in recognizing realistic human actions.
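Once the salient interest points are chosen, the bag-of-features step reduces to assigning each descriptor to its nearest codeword and normalizing the resulting counts. A minimal sketch of that quantization step (not the paper's saliency or codeword-selection algorithms), with invented toy descriptors:

```python
def bow_histogram(descriptors, codebook):
    """Quantize descriptors against a codebook and return a normalized
    bag-of-features histogram (one bin per codeword)."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # index of the nearest codeword by squared Euclidean distance
        idx = min(range(len(codebook)),
                  key=lambda k: sum((a - b) ** 2
                                    for a, b in zip(d, codebook[k])))
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Toy 2-D descriptors clustered near two codewords.
codebook = [(0.0, 0.0), (10.0, 10.0)]
hist = bow_histogram([(0.0, 1.0), (1.0, 0.0), (9.0, 10.0)], codebook)
```

The normalized histogram is what would then be fed to the SVM; pruning irrelevant STIPs beforehand shrinks `descriptors` and hence both the quantization cost and the noise in `hist`.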

8.
9.
The traditional bag-of-words (BOW) algorithm tends to confuse actions because it lacks information about the distribution of features, and the choice of vocabulary size strongly affects recognition results. To capture the distribution of interest points, this paper computes the positional relationships between interest points within a spatio-temporal neighborhood as a local spatio-temporal distribution-consistency feature, and proposes an enhanced bag-of-words algorithm that fuses this feature with the appearance features of the interest points; a multi-class Support Vector Machine (SVM) is used for classification. Experiments are conducted on the KTH dataset for single-person action recognition and on the UT-Interaction dataset for multi-person action recognition. Compared with the traditional bag-of-words algorithm, the enhanced bag-of-words algorithm not only improves recognition efficiency but also weakens the influence of vocabulary-size changes on the recognition rate; the experimental results verify the effectiveness of the algorithm.
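One simple way to realize "positional relationships between interest points within a spatio-temporal neighborhood" is to collect the relative displacements of point pairs that co-occur in a neighborhood. The sketch below is that idea in its plainest form, an assumption rather than the paper's exact feature; coordinates and the radius are illustrative.

```python
def distribution_feature(points, radius):
    """Local spatio-temporal distribution feature: relative displacements
    between interest points that fall in the same neighborhood.

    points: list of (x, y, t) space-time interest point coordinates.
    radius: neighborhood half-width applied per coordinate.
    """
    feats = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            p, q = points[i], points[j]
            if all(abs(a - b) <= radius for a, b in zip(p, q)):
                # displacement vector from p to q
                feats.append(tuple(b - a for a, b in zip(p, q)))
    return feats

# Two nearby points produce one displacement; the far point contributes none.
pairs = distribution_feature([(0, 0, 0), (1, 1, 0), (50, 50, 9)], radius=2)
```

Because such displacement features describe geometry rather than codeword identity, fusing them with appearance codewords is what lets the enhanced model stay stable as the vocabulary size changes.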

10.
Much of the existing work on action recognition combines simple features with complex classifiers or models to represent an action. The parameters of such models usually have no physical meaning, nor do they provide any qualitative insight relating the action to the actual motion of the body or its parts. In this paper, we propose a new representation of human actions, the sequence of most informative joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select the few skeletal joints deemed most informative for the current action, based on highly interpretable measures such as the mean or variance of joint-angle trajectories, and represent the action as the sequence of these most informative joints. Experiments on multiple databases show that the SMIJ representation is discriminative for human action recognition and performs better than several state-of-the-art algorithms.
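The per-instant selection step is easy to make concrete: rank joints by an interpretable statistic of their angle trajectories over a window and keep the top few. A minimal sketch using variance as the measure (one of the measures the abstract names); joint names and numbers are invented.

```python
def most_informative_joints(angle_tracks, k=3):
    """SMIJ-style selection: rank joints by the variance of their angle
    trajectories within a window and keep the top k."""
    def variance(xs):
        mu = sum(xs) / len(xs)
        return sum((x - mu) ** 2 for x in xs) / len(xs)
    ranked = sorted(angle_tracks,
                    key=lambda j: variance(angle_tracks[j]),
                    reverse=True)
    return ranked[:k]

tracks = {
    "elbow": [10, 80, 10, 80],   # large swings -> highly informative
    "knee":  [30, 31, 30, 31],   # nearly static
    "wrist": [0, 40, 0, 40],
}
top = most_informative_joints(tracks, k=2)
```

Concatenating `top` across successive windows yields the sequence the representation is named after, and that sequence stays human-readable: a wave action literally reads as "elbow, wrist, elbow, ...".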

11.
This paper presents a novel no-reference video quality assessment (NR-VQA) model that utilizes proposed 3D steerable wavelet transform-based natural video statistics (NVS) features as well as human perceptual features. We also propose a novel two-stage regression scheme that significantly improves the overall performance of quality estimation. In the first stage, the transform-based NVS features and the human perceptual features are separately passed through the proposed hybrid regression scheme: support vector regression (SVR) followed by polynomial curve fitting. The two visual quality scores predicted in the first stage are then used as features for an analogous second stage, which predicts the final quality scores of distorted videos through score-level fusion. Extensive experiments were conducted on five authentic-distortion and four synthetic-distortion databases. The results demonstrate that the proposed method outperforms other published state-of-the-art benchmark methods on the synthetic-distortion databases and is among the top performers on the authentic-distortion databases. The source code is available at https://github.com/anishVNIT/two-stage-vqa.
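The "polynomial curve fitting" half of each hybrid stage can be read as calibrating a regressor's raw outputs against subjective scores (MOS). A minimal sketch of that calibration step, assuming NumPy; the function name, toy data, and degree are illustrative, and the SVR half is omitted.

```python
import numpy as np

def fit_stage_mapping(raw_scores, mos, degree=1):
    """Polynomial curve fitting step of one regression stage: learn a
    mapping from a regressor's raw outputs to subjective scores (MOS)."""
    coeffs = np.polyfit(raw_scores, mos, degree)
    return lambda s: float(np.polyval(coeffs, s))

# Toy calibration set in which raw outputs relate linearly to MOS.
mapping = fit_stage_mapping([0.0, 1.0, 2.0, 3.0],
                            [1.0, 3.0, 5.0, 7.0], degree=1)
```

In the full scheme the two calibrated stage-one scores (NVS branch and perceptual branch) would themselves become the two input features of the second, score-fusing stage.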

12.
13.
In this paper, we propose a novel approach to key-frame extraction for human action recognition from 3D video sequences. To represent human actions, an energy feature (EF) combining kinetic energy and potential energy is extracted from the 3D video sequences. A self-adaptive weighted affinity propagation (SWAP) algorithm is then proposed to extract the key frames. Finally, we employ an SVM to recognize human actions on the EFs of the selected key frames. Experiments show that our method effectively extracts information covering the whole course of an action and obtains good recognition performance without losing classification accuracy; moreover, the recognition speed is greatly improved.
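A per-frame energy feature of the kind described can be sketched from first principles: kinetic energy from finite-difference joint velocities plus gravitational potential energy from joint heights. This is a plain-physics illustration under those assumptions, not the paper's exact EF; the joint masses and coordinates are invented.

```python
def energy_feature(prev_pos, cur_pos, masses, g=9.81, dt=1.0):
    """Per-frame energy feature: kinetic (0.5 * m * v^2, with velocity from
    a finite difference between frames) plus potential (m * g * h, taking
    the y coordinate as height), summed over all joints."""
    kinetic = potential = 0.0
    for p0, p1, m in zip(prev_pos, cur_pos, masses):
        v_sq = sum((a - b) ** 2 for a, b in zip(p1, p0)) / dt ** 2
        kinetic += 0.5 * m * v_sq
        potential += m * g * p1[1]
    return kinetic + potential

# One joint of unit mass rising one unit between consecutive frames.
ef = energy_feature([(0.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)], masses=[1.0])
```

Frames where this scalar changes sharply are natural key-frame candidates, which is the signal a clustering step such as affinity propagation would then exploit.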


Copyright©北京勤云科技发展有限公司  京ICP备09084417号