Similar Documents
20 similar documents found.
1.
In this paper we introduce a novel method for action/movement recognition in motion capture data. A motion capture sequence is represented by the joint orientation angles and the forward differences of these angles at different temporal scales. Initially, K-means is applied to the training data to discover the most representative patterns in the orientation angles and their forward differences; for the angles themselves, a novel K-means variant that accounts for the periodic nature of angular data is used. Each frame is then assigned to one or more of these patterns, and histograms describing the frequency of occurrence of the patterns are constructed for each movement. Nearest-neighbour and SVM classifiers perform action recognition on the test data. The effectiveness and robustness of the method are demonstrated through extensive experiments on four standard motion capture databases under various experimental setups.
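As a rough Python sketch of the angle-aware clustering step described above, the variant below measures distance on the circle and updates centres with circular means; the function name, initialization and iteration count are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: K-means over periodic angles (in radians). The circular
# distance and circular-mean update are the only changes from plain K-means.
import numpy as np

def circular_kmeans(angles, k, n_iter=50, rng=np.random.default_rng(0)):
    angles = np.asarray(angles, dtype=np.float64)
    centers = rng.choice(angles, size=k, replace=False)
    for _ in range(n_iter):
        # wrap differences into (-pi, pi] so that 359 deg and 1 deg are close
        dist = np.abs(np.angle(np.exp(1j * (angles[:, None] - centers[None, :]))))
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = angles[labels == j]
            if len(members):
                centers[j] = np.angle(np.exp(1j * members).mean())  # circular mean
    return centers, labels
```

Frames would then be assigned to their nearest pattern, and the per-movement histograms of pattern occurrences built from `labels`.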

2.
3.
4.
Action recognition in video is one of the most important and challenging tasks in computer vision, and how to efficiently combine spatial-temporal information to represent video plays a crucial role. In this paper, a recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: a two-stream CNN for learning semantic features, a two-stream single-layer LSTM for learning long-term temporal features, and an Improved Dense Trajectories (IDT) stream for learning short-term temporal motion features. To mitigate overfitting on small-scale datasets, a video data augmentation method is used to increase the amount of training data, and a two-step training strategy is adopted to train the recurrent hybrid network. Experimental results on two challenging datasets, UCF-101 and HMDB-51, demonstrate that the proposed method reaches state-of-the-art performance.
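One plausible way to combine such streams at prediction time is weighted late fusion of their class scores; the sketch below is an assumption for illustration, not the paper's exact fusion scheme, and the stream names are invented.

```python
# Hedged sketch: weighted late fusion of per-stream class-probability vectors.
import numpy as np

def fuse_stream_scores(stream_scores, weights=None):
    """stream_scores: dict name -> (num_classes,) softmax scores."""
    names = list(stream_scores)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}  # equal weights assumed
    fused = sum(weights[n] * np.asarray(stream_scores[n]) for n in names)
    return int(np.argmax(fused)), fused

# e.g. fuse_stream_scores({"rgb": p_rgb, "flow": p_flow, "idt": p_idt})
```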

5.
As a challenging video classification task, action recognition has become a significant topic in the computer vision community. The most popular two-stream methods to date still simply fuse the prediction scores of each stream, so the complementary characteristics of the two streams are not fully utilized and the effect of shallower features is often overlooked. In addition, treating all features equally may weaken the role of the features that contribute most to classification. Accordingly, a novel network called Multiple Depth-levels Features Fusion Enhanced Network (MDFFEN) is proposed. It improves the two-stream architecture in two respects. For the two-stream interaction mechanism, multiple depth-levels features fusion (MDFF) aggregates the spatial-temporal features extracted from several sub-modules of the original two streams via spatial-temporal features fusion (STFF). To further refine the spatiotemporal features, a group-wise spatial-channel enhance (GSCE) module highlights meaningful regions and expressive channels automatically through priority assignment. Competitive results are achieved when MDFFEN is validated on three challenging public action recognition datasets, HMDB51, UCF101 and ChaLearn LAP IsoGD.
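For a sense of what a channel-enhancement block can look like, here is a minimal PyTorch module in the squeeze-and-excitation style; it only illustrates the general idea of reweighting expressive channels and is not the paper's GSCE design.

```python
# Hedged sketch: squeeze spatially, then reweight channels with learned gates.
import torch
import torch.nn as nn

class ChannelEnhance(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))      # per-channel gate in [0, 1]
        return x * w[:, :, None, None]       # emphasize expressive channels
```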

6.
Multi-label recognition is a fundamental yet challenging task in computer vision. Recently, deep learning models have made great progress in learning discriminative features from input images. However, conventional approaches cannot model the inter-class discrepancies among features in multi-label images, since they are designed for image-level feature discrimination. In this paper, we propose a unified deep network that learns discriminative features for the multi-label task. Given a multi-label image, the proposed method first disentangles the features corresponding to different classes, then discriminates between these classes by increasing the inter-class distance while decreasing the intra-class differences in the output space. Regularizing the whole network with the proposed loss significantly improves the performance of the well-known ResNet-101. Extensive experiments on the COCO-2014, VOC2007 and VOC2012 datasets demonstrate that the proposed method outperforms state-of-the-art approaches by a significant margin of 3.5% on the large-scale COCO dataset. Moreover, analysis shows that the discriminative feature learning approach can be plugged into various types of multi-label methods as a general module.
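A hedged sketch of such an objective, pulling features toward their class centres while pushing centres apart by a margin, could look as follows; the margin, weighting and centre parameterization are illustrative assumptions, not the paper's loss.

```python
# Hedged sketch: intra-class pull to learnable centers + inter-class margin push.
import torch
import torch.nn.functional as F

def discriminative_loss(feats, labels, centers, margin=1.0):
    """feats: (N, D) class-specific features; labels: (N,) class ids;
    centers: (C, D) learnable class centers."""
    intra = F.mse_loss(feats, centers[labels])            # shrink intra-class spread
    d = torch.cdist(centers, centers)                     # pairwise center distances
    off = ~torch.eye(len(centers), dtype=torch.bool, device=centers.device)
    inter = F.relu(margin - d[off]).mean()                # push centers apart
    return intra + inter
```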

7.
In action recognition, fully learning and exploiting the correlation between the spatial and temporal features of a video is crucial to the final recognition result. Traditional action recognition methods tend to ignore this spatial-temporal correlation as well as fine-grained features, which degrades recognition accuracy. To address this, we propose a human action recognition method based on a convolutional gated recurrent unit (convolutional GRU, ConvGRU) and attentional feature fusion (AFF). First, an Xception network is used as the spatial feature extractor for video frames, and spatial-temporal excitation (STE) and channel excitation (CE) modules are introduced to strengthen the modeling of temporal actions while spatial features are extracted. In addition, the traditional long short-term memory (LSTM) network is replaced with a ConvGRU network, which uses convolutions to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), which strengthens the recognition of fine-grained features and raises the model's accuracy. Experimental results show recognition accuracies of 95.66% on the UCF101 dataset and 69.82% on the HMDB51 dataset. The algorithm captures more complete spatial-temporal features and compares favourably with current mainstream models.
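For reference, a minimal ConvGRU cell, i.e. GRU gating with convolutions in place of fully connected transforms so spatial structure is preserved, can be sketched in PyTorch as below; kernel size and channel counts are illustrative, not the paper's configuration.

```python
# Hedged sketch of a single ConvGRU cell.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):                  # x: (N,Cin,H,W), h: (N,Chid,H,W)
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], 1)))  # candidate state
        return (1 - z) * h + z * n            # gated update of the hidden map
```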

8.
Two-stream video action recognition remains a popular research topic in computer vision. However, most current two-stream methods suffer from two redundancy issues: inter-frame redundancy and intra-frame redundancy. To solve these problems, a Spatial-Temporal Saliency Action Mask Attention network (STSAMANet) is built for action recognition. First, a key-frame mechanism is introduced to eliminate inter-frame redundancy by selecting, for each video sequence, the frames with the greatest inter-frame difference. Then, Mask R-CNN detection is used to build a saliency attention layer that eliminates intra-frame redundancy by focusing on the salient human body and objects of each action class. Experiments on two public video action datasets, the UCF101 dataset and the Penn Action dataset, verify the effectiveness of the method for action recognition.
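A simple reading of such a key-frame mechanism is to score frames by inter-frame difference and keep the highest-scoring ones; the sketch below is an assumption for illustration, not the paper's exact criterion.

```python
# Hedged sketch: pick the frames that differ most from their predecessor.
import numpy as np

def select_key_frames(frames, n_keep):
    """frames: (T, H, W) grayscale video; returns sorted key-frame indices."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    scores = np.concatenate([[0.0], diffs])   # frame 0 has no predecessor
    return np.sort(np.argsort(scores)[-n_keep:])
```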

9.
10.
A new gait recognition algorithm is proposed. First, abnormal gait silhouettes are detected by computing inter-silhouette distances, and the abnormal images are reconstructed using the average nearest-neighbour image and the average silhouette. The subject's gait energy image is then decomposed into two parts, and a series of extended images is generated for each part to construct an energy decomposition image. Next, static shape information is removed according to the correspondence between the skeleton image and the gait offset image, and the foot region is corrected, yielding a motion offset image. Finally, the energy decomposition image and the motion offset image are used jointly for classification. Experimental results show that the recognition rate of the proposed algorithm is far higher than that of three typical algorithms.
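For context, the gait energy image that the method decomposes is conventionally the pixel-wise average of aligned binary silhouettes over a gait cycle, as in this minimal sketch; the decomposition itself is not reproduced here.

```python
# Standard gait energy image (GEI): average of aligned binary silhouettes.
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: (T, H, W) binary masks over one gait cycle -> (H, W) GEI."""
    return silhouettes.astype(np.float32).mean(axis=0)
```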

11.
In video-based action recognition, training a two-stream network with videos of different frame numbers can cause data skew problems. Moreover, extracting key frames from a video is crucial for improving the training and recognition efficiency of action recognition systems. However, previous works suffer from information loss and optical-flow interference when handling videos with different frame numbers. In this paper, an augmented two-stream network (ATSNet) is proposed to achieve robust action recognition. A frame-number-unified strategy is first incorporated into the temporal stream network to unify the frame numbers of videos. Subsequently, grayscale statistics of the optical-flow images are extracted to filter out invalid optical-flow images and to produce dynamic fusion weights for the two branch networks, adapting to different action videos. Experiments on the UCF101 dataset demonstrate that ATSNet outperforms previous methods, improving recognition accuracy by 1.13%.
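A hedged sketch of the flow-filtering idea: drop optical-flow frames whose grayscale statistics show almost no variation, since they carry little motion. The statistic and threshold here are illustrative assumptions, not the paper's values.

```python
# Hedged sketch: filter near-constant (invalid) optical-flow images by std-dev.
import numpy as np

def filter_flow_frames(flow_imgs, std_thresh=2.0):
    """flow_imgs: (T, H, W) grayscale-encoded flow; keep frames with motion."""
    stds = flow_imgs.reshape(len(flow_imgs), -1).std(axis=1)
    keep = stds > std_thresh                  # low variance -> likely no motion
    return flow_imgs[keep], keep
```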

12.
13.
In action recognition, a proper frame sampling method can not only reduce redundant video information but also improve the accuracy of recognition. In this paper, an action-density-based non-isometric frame sampling method, NFS, is proposed to discard redundant video information and sample rational frames for neural networks, where action density indicates the intensity of actions in a video. In particular, the NFS method comprises an action density determination mechanism, a focused-clips division mechanism, and a reinforcement-learning-based frame sampling (RLFS) mechanism. Evaluations with various neural networks and datasets show that NFS is highly effective at frame sampling and helps achieve better action recognition accuracy than existing methods.
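To illustrate density-guided sampling, the sketch below estimates per-frame motion intensity and draws more frames where it is high; this simple proportional scheme stands in for the paper's reinforcement-learning sampler and is only an assumption.

```python
# Hedged sketch: sample frames with probability proportional to motion density.
import numpy as np

def density_sample(frames, n_sample, rng=np.random.default_rng(0)):
    """frames: (T, H, W) grayscale video; returns sorted sampled indices."""
    motion = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    density = np.concatenate([[motion.mean()], motion]) + 1e-6  # frame 0 padded
    p = density / density.sum()
    idx = rng.choice(len(frames), size=n_sample, replace=False, p=p)
    return np.sort(idx)
```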

14.
Although multiple methods have been proposed for human action recognition, existing multi-view approaches cannot adequately discover meaningful relationships among action categories across views. To handle this problem, this paper proposes a multi-view learning approach for multi-view action recognition. First, the method leverages popular visual representations, bag-of-visual-words (BoVW) and Fisher vectors (FV), to represent the individual videos in each view. Second, a sparse coding algorithm transfers the low-level features of the various views into a discriminative, high-level semantic space. Third, multi-task learning (MTL) is employed for joint action modeling and for discovering latent relationships among different action categories. Extensive experimental results on the M2I and IXMAS datasets demonstrate the effectiveness of the proposed approach. Moreover, the experiments further show that the discovered latent relationships benefit multi-view model learning and augment action recognition performance.
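As a reminder of the first step, a bag-of-visual-words representation quantizes local descriptors against a learned codebook and builds a normalized histogram; the minimal sketch below (codebook size illustrative) assumes a scikit-learn KMeans codebook.

```python
# Hedged sketch: BoVW histogram from local descriptors and a KMeans codebook.
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(descriptors, codebook):
    """descriptors: (N, D) local features; codebook: a fitted KMeans model."""
    words = codebook.predict(descriptors)                       # quantize
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)                          # normalize

# codebook = KMeans(n_clusters=256, n_init=10).fit(training_descriptors)
```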

15.
Collaborative recognition of fingerprint global features via multi-classifier fusion
Fingerprint recognition is a hot topic in biometrics, and recognition based on global fingerprint features offers clear advantages, but a single classifier generally cannot achieve satisfactory results. This paper uses Bayesian theory to analyse the common product, sum, median and voting rules for multi-classifier fusion, and proposes two improvements to the voting rule modelled on real election scenarios. Three classifiers for collaborative recognition of global fingerprint features, based respectively on gray values, principal components and the orientation field, are then fused at the decision level using a novel and efficient collaborative pattern recognition method. Experiments on the FVC2002 fingerprint database show that the method has good classification performance: preprocessing and feature extraction are simple, computational complexity is low, recognition is fast, the recognition rate on smudged fingerprints is reliable, and the method is robust; it also performs well when applied to identity authentication.
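The four classical decision-level rules analysed above can be sketched directly over per-classifier posteriors; this is a generic illustration of the standard rules, not the paper's improved voting schemes.

```python
# Hedged sketch: classical fusion rules over per-classifier class posteriors.
import numpy as np

def fuse(posteriors, rule="sum"):
    """posteriors: (n_classifiers, n_classes) array; returns predicted class."""
    p = np.asarray(posteriors, dtype=np.float64)
    if rule == "product":
        scores = p.prod(axis=0)
    elif rule == "sum":
        scores = p.sum(axis=0)
    elif rule == "median":
        scores = np.median(p, axis=0)
    elif rule == "vote":
        votes = p.argmax(axis=1)                      # each classifier votes once
        scores = np.bincount(votes, minlength=p.shape[1])
    else:
        raise ValueError(rule)
    return int(np.argmax(scores))
```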

16.
17.
18.
A new semi-serial fusion method for multiple features, based on the learning using privileged information (LUPI) model, is put forward. The LUPI paradigm improves learning accuracy and stability through additional information and optimization-based computation, while the sparsity and reduced dimension of the test features also cut execution time. The improvements obtained from using multiple feature types for emotion recognition (speech expression recognition) are particularly applicable when only one modality is available but recognition still needs to improve. The results show that LUPI is effective in the unimodal case when the feature size is considerable. Compared with methods that use a single type of feature or simply concatenate several types, the new method achieves better recognition accuracy, lower execution time, and greater stability.

19.
20.
Modeling and reasoning about the interactions between multiple entities (actors and objects) benefit the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between different entities in videos. The proposed DCTR network consists of a spatial modeling module and a temporal reasoning module. The spatial modeling module uses 3D deformable convolution to capture relationship dependencies between entities within the same frame, while the temporal reasoning module uses Conv-LSTM to reason about how these relationship dependencies change over time. Experiments on the Moments-in-Time, UCF101 and HMDB51 datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
