Similar Documents
20 similar documents found.
1.
2.
3.
Ling Hefei, Chen Yao, Chen Jiazhong, Wu Lei, Shi Yuxuan, Deng Jing. Multimedia Tools and Applications (2020) 79(37-38): 26913-26926
Multimedia Tools and Applications - With the emergence of a large number of video resources, video action recognition is attracting much attention. Recently, realizing the outstanding performance...

4.
Efficient modeling of actions is critical for recognizing human actions. Recently, the bag of video words (BoVW) representation, in which features computed around spatiotemporal interest points are quantized into video words based on their appearance similarity, has been widely and successfully explored. The performance of this representation, however, is highly sensitive to two main factors: the granularity, and therefore the size, of the vocabulary, and the space in which features and words are clustered, i.e., the distance measure between data points at different levels of the hierarchy. The goal of this paper is to propose a representation and learning framework that addresses both these limitations. We present a principled approach to learning a semantic vocabulary from a large amount of video words using Diffusion Maps embedding. As opposed to the flat vocabularies used in traditional methods, we propose to exploit the hierarchical nature of feature vocabularies representative of human actions. Spatiotemporal features computed around interest points in videos form the lowest level of representation. Video words are then obtained by clustering those spatiotemporal features. Each video word is then represented by a vector of Pointwise Mutual Information (PMI) between that video word and the training video clips, and is treated as a mid-level feature. At the highest level of the hierarchy, our goal is to further cluster the mid-level features, while exploiting semantically meaningful distance measures between them. We conjecture that the mid-level features produced by similar video sources (action classes) must lie on a certain manifold. To capture the relationship between these features, and retain it during clustering, we propose to use diffusion distance as a measure of similarity between them. The underlying idea is to embed the mid-level features into a lower-dimensional space, so as to construct a compact yet discriminative, high-level vocabulary. Unlike some of the supervised vocabulary construction approaches and unsupervised methods such as pLSA and LDA, Diffusion Maps can capture local relationships between the mid-level features on the manifold. We have tested our approach on diverse datasets and have obtained very promising results.
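A minimal sketch of the mid-level stage described above, assuming the per-clip video-word counts are already available (the counts matrix below is toy data and all parameters are illustrative rather than the paper's settings): each video word is turned into a PMI vector over training clips, and the PMI vectors are then embedded with a simple diffusion-map-style eigendecomposition before high-level clustering.

    import numpy as np

    def pmi_features(counts, eps=1e-12):
        # counts[w, c] = occurrences of video word w in training clip c
        joint = counts / counts.sum()                        # P(w, c)
        pw = joint.sum(axis=1, keepdims=True)                # P(w)
        pc = joint.sum(axis=0, keepdims=True)                # P(c)
        return np.log((joint + eps) / (pw @ pc + eps))       # PMI(w, c) per word

    def diffusion_map(X, sigma=5.0, dims=8, t=1):
        # Gaussian affinities between mid-level features (rows of X)
        d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
        W = np.exp(-d2 / (2 * sigma ** 2))
        P = W / W.sum(axis=1, keepdims=True)                 # row-stochastic Markov matrix
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)[1:dims + 1]           # skip the trivial eigenvector
        return (vals.real[order] ** t) * vecs.real[:, order] # diffusion coordinates

    counts = np.random.randint(0, 5, size=(200, 30)).astype(float)  # 200 words, 30 clips (toy)
    embedded = diffusion_map(pmi_features(counts))
    print(embedded.shape)  # (200, 8): compact space in which to build the high-level vocabulary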

5.
In recent years, emotion recognition from multimodal data has become an important research direction in natural human-computer interaction and artificial intelligence. Work that uses visual-modality information for emotion recognition usually focuses on facial features and rarely considers action features or multimodal features that fuse them. Although action and emotion are closely related, extracting effective action information from the visual modality for emotion recognition is difficult. Taking the relationship between action and emotion as the starting point, this work introduces visual-modality action data into the classic MELD multimodal emotion recognition dataset, uses an ST-GCN network to extract body-action features, and performs single-modality emotion recognition with an LSTM network on those features. Building on the text and audio features of the MELD dataset, the body-action features further improve the accuracy of LSTM-based multimodal emotion recognition, and combining text features with body-action features improves the text-only accuracy of a contextual memory model. Experiments show that although body-action features alone cannot surpass traditional text and audio features for single-modality emotion recognition, they play an important role in multimodal recognition. The single-modality and multimodal experiments confirm that human actions carry emotional information and that body-action features have significant potential for multimodal emotion recognition.
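As a rough illustration of the fusion idea (not the MELD pipeline itself): once per-utterance text, audio and body-action features have been extracted, a bidirectional LSTM over the concatenated features can produce per-utterance emotion predictions. All dimensions and layer choices below are assumptions for the sketch.

    import torch
    import torch.nn as nn

    class LateFusionLSTM(nn.Module):
        # Fuse per-utterance text, audio and body-action features for dialogue-level
        # emotion recognition (feature sizes are placeholders, not the paper's values).
        def __init__(self, text_dim=600, audio_dim=300, action_dim=256,
                     hidden=128, num_classes=7):
            super().__init__()
            self.lstm = nn.LSTM(text_dim + audio_dim + action_dim, hidden,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, text, audio, action):
            # each input: (batch, n_utterances, dim)
            x = torch.cat([text, audio, action], dim=-1)
            out, _ = self.lstm(x)
            return self.classifier(out)          # per-utterance emotion logits

    model = LateFusionLSTM()
    logits = model(torch.randn(2, 10, 600), torch.randn(2, 10, 300), torch.randn(2, 10, 256))
    print(logits.shape)  # (2, 10, 7)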

6.
Ongoing human action recognition is a challenging problem that has many applications, such as video surveillance, patient monitoring, human–computer interaction, etc. This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data. Unlike the after-the-fact classification of completed activities, this work aims at achieving early recognition of ongoing activities. The proposed method is time efficient as it is based on histograms of action poses, extracted from MoCap data, that are computed according to Hausdorff distance. The histograms are then compared with the Bhattacharyya distance and warped by a dynamic time warping process to achieve their optimal alignment. This process, implemented by our dynamic programming-based solution, has the advantage of allowing some stretching flexibility to accommodate for possible action length changes. We have shown the success and effectiveness of our solution by testing it on large datasets and comparing it with several state-of-the-art methods. In particular, we were able to achieve excellent recognition rates that have outperformed many well known methods.
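The alignment step can be sketched as dynamic time warping with a Bhattacharyya cost between pose histograms; the Hausdorff-based histogram extraction is omitted, and the toy data below stands in for real MoCap histograms.

    import numpy as np

    def bhattacharyya(p, q, eps=1e-12):
        # distance between two normalised pose histograms
        return -np.log(np.sum(np.sqrt(p * q)) + eps)

    def dtw(seq_a, seq_b):
        # seq_a, seq_b: arrays of pose histograms, one per time step
        n, m = len(seq_a), len(seq_b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = bhattacharyya(seq_a[i - 1], seq_b[j - 1])
                # stretching flexibility: skip in either sequence or match
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    rng = np.random.default_rng(0)
    streamed = rng.dirichlet(np.ones(16), size=20)   # ongoing action: 20 steps, 16-bin histograms
    template = rng.dirichlet(np.ones(16), size=25)   # reference action of a different length
    print(dtw(streamed, template))                   # lower score = better alignment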

7.
Pattern Analysis and Applications - Human action recognition from realistic video data constitutes a challenging and relevant research area. Leading the state of the art we can find those methods...

8.
This paper presents a novel and efficient framework for human action recognition based on modeling the motion of human body-parts. Intuitively, a collective understanding of human body-part movements can lead to better understanding and representation of any human action. In this paper, we propose a generative representation of the motion of human body-parts to learn and classify human actions. The proposed representation combines the advantages of both local and global representations, encoding the relevant motion information as well as being robust to local appearance changes. Our work is motivated by the pictorial structures model and the framework of sparse representations for recognition. Human body-part movements are represented efficiently through quantization in the polar space. The key discrimination within each action is efficiently encoded by sparse representation for classification. The proposed framework is evaluated on both the KTH and the UCF Sport action datasets and the results are compared against several state-of-the-art methods.
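One way to picture the polar-space quantisation of body-part movements is the sketch below; the bin layout and parameters are assumptions, and the resulting per-part histograms would then be stacked into the descriptor passed to the sparse-representation classifier.

    import numpy as np

    def polar_motion_histogram(displacements, n_angle_bins=8, n_radius_bins=4, max_radius=20.0):
        # displacements: (T, 2) frame-to-frame (dx, dy) motion of one body part.
        # Quantise each displacement into an angle/magnitude cell of a polar grid.
        dx, dy = displacements[:, 0], displacements[:, 1]
        angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
        radius = np.clip(np.hypot(dx, dy), 0, max_radius - 1e-6)
        a_bin = np.minimum((angle / (2 * np.pi) * n_angle_bins).astype(int), n_angle_bins - 1)
        r_bin = (radius / max_radius * n_radius_bins).astype(int)
        hist = np.zeros((n_angle_bins, n_radius_bins))
        np.add.at(hist, (a_bin, r_bin), 1)
        return hist.ravel() / max(hist.sum(), 1)

    # toy trajectory of one body part over 30 frames
    steps = np.random.default_rng(1).normal(scale=3.0, size=(30, 2))
    print(polar_motion_histogram(steps).shape)   # (32,) descriptor for this body part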

9.
Pattern Analysis and Applications - Recently, deep neural networks (DNNs) have shown the remarkable success of feature representations in computer vision, audio analysis, and natural language...

10.
Multimedia Tools and Applications - Currently RNN-based methods achieve excellent performance on action recognition using skeletons. But the inputs of these approaches are limited to coordinates of...

11.
Motion representation based on local spatio-temporal features for action recognition
In recent years, motion representations based on local spatio-temporal features have been increasingly applied to action recognition in video, and a variety of feature detectors and descriptors have been proposed with good results. However, these methods still fall short in coping with camera motion, illumination changes, and differences in clothing. This paper therefore proposes a motion representation built on local spatio-temporal features around space-time interest points and implements action recognition based on spatio-temporal words. First, space-time interest points are detected from the video with a detector that combines Gabor and Gaussian filters; static, motion, and spatio-temporal features are then extracted at those points and used to represent the motion; finally, a classifier based on a spatio-temporal codebook assigns the action label. Experiments on the Weizmann and KTH action datasets show that the spatio-temporal-feature representation adapts better to camera motion, illumination changes, and differences in the actors' clothing and execution, and achieves better recognition results.
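The Gabor-plus-Gaussian detection mentioned above is in the spirit of the periodic space-time interest point detector: spatial Gaussian smoothing followed by a quadrature pair of temporal Gabor filters, with interest points at local maxima of the response. The sketch below uses illustrative filter parameters, not the paper's values.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def periodic_response(video, sigma=2.0, tau=3.0, omega=0.25):
        # video: (T, H, W) grayscale volume; sigma/tau/omega are illustrative.
        smoothed = np.stack([gaussian_filter(f, sigma) for f in video])
        t = np.arange(-int(3 * tau), int(3 * tau) + 1)
        envelope = np.exp(-t ** 2 / (2 * tau ** 2))
        h_even = np.cos(2 * np.pi * omega * t) * envelope   # quadrature Gabor pair
        h_odd = np.sin(2 * np.pi * omega * t) * envelope
        even = np.apply_along_axis(lambda s: np.convolve(s, h_even, mode='same'), 0, smoothed)
        odd = np.apply_along_axis(lambda s: np.convolve(s, h_odd, mode='same'), 0, smoothed)
        return even ** 2 + odd ** 2     # local maxima mark space-time interest points

    clip = np.random.rand(40, 64, 64)       # toy clip
    print(periodic_response(clip).shape)    # (40, 64, 64) response volume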

12.
Yi Yun, Wang Hanli, Zhang Bowen. Multimedia Tools and Applications (2017) 76(18): 18891-18913
Multimedia Tools and Applications - Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve...

13.
In this paper, we address the problem of recognizing human actions from videos. Most of the existing approaches employ low-level features (e.g., local features and global features) to represent an action video. However, algorithms based on low-level features are not robust to complex environments such as cluttered background, camera movement and illumination change. Therefore, we propose a novel random forest learning framework to construct a discriminative and informative mid-level feature from low-level features of densely sampled 3D cuboids. Each cuboid is classified by the corresponding random forests with a novel fusion scheme, and the cuboid's posterior probabilities over all categories are normalized to generate a histogram. After that, we obtain our mid-level feature by concatenating the histograms of all the cuboids. Since a single low-level feature is not enough to capture the variations of human actions, multiple complementary low-level features (i.e., optical flow and histogram of gradient 3D features) are employed to describe the 3D cuboids. Moreover, temporal context between local cuboids is exploited as another type of low-level feature. The above three low-level features (i.e., optical flow, histogram of gradient 3D features and temporal context) are effectively fused in the proposed learning framework. Finally, the mid-level feature is employed by a random forest classifier for robust action recognition. Experiments on the Weizmann, UCF sports, Ballet, and multi-view IXMAS datasets demonstrate that our mid-level feature learned from multiple low-level features can achieve a superior performance over state-of-the-art methods.
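The posterior-histogram construction can be pictured with scikit-learn random forests on made-up cuboid descriptors; the descriptor size, cuboid count and forest settings below are placeholders, and the fusion of the three low-level features is omitted.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n_classes, n_cuboids, desc_dim = 6, 12, 64

    # One forest per cuboid position, trained on low-level descriptors of training videos
    forests = []
    for _ in range(n_cuboids):
        X = rng.normal(size=(300, desc_dim))          # toy descriptors
        y = rng.integers(0, n_classes, size=300)      # toy action labels
        forests.append(RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y))

    def mid_level_feature(cuboid_descriptors):
        # cuboid_descriptors: (n_cuboids, desc_dim) for one test video.
        # Concatenate the per-cuboid class posteriors into one mid-level vector.
        parts = [f.predict_proba(d[None, :])[0] for f, d in zip(forests, cuboid_descriptors)]
        return np.concatenate(parts)

    feat = mid_level_feature(rng.normal(size=(n_cuboids, desc_dim)))
    print(feat.shape)    # (n_cuboids * n_classes,) = (72,)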

14.
Chen Yanfang, Wang Liwei, Li Chuankun, Hou Yonghong, Li Wanqing. Multimedia Tools and Applications (2020) 79(3-4): 1707-1725

With the advance of deep learning, deep learning based action recognition has become an important research topic in computer vision. The skeleton sequence is often encoded into an image, such as a Joint Trajectory Map (JTM), to better exploit Convolutional Neural Networks (ConvNets). However, this encoding cannot effectively capture long temporal information. To solve this problem, this paper presents an effective method to encode spatial-temporal information from skeleton sequences into color texture images, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs), and ConvNets are applied to capture discriminative features from the TPSMMs for human action recognition. The TPSMMs not only capture short temporal information, but also embed long dynamic information over the period of an action. The proposed method has been verified and achieved state-of-the-art results on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets.
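A loose sketch of the temporal-pyramid idea, not a faithful TPSMM implementation (the paper encodes joint trajectories as colour texture images): level l of the pyramid splits the sequence into 2**l segments and renders one motion map per segment, so both short and whole-action dynamics appear in the image stack fed to a ConvNet. Map size, pyramid depth and joint layout are assumptions.

    import numpy as np

    def skeleton_motion_map(joints, size=64):
        # joints: (T, J, 2) normalised 2D joint coordinates in [0, 1).
        # Accumulate joint positions over time into a single 2D map.
        m = np.zeros((size, size))
        xy = np.clip((joints * size).astype(int), 0, size - 1)
        np.add.at(m, (xy[..., 1].ravel(), xy[..., 0].ravel()), 1.0)
        return m / max(m.max(), 1e-6)

    def temporal_pyramid_maps(joints, levels=3, size=64):
        maps = []
        for l in range(levels):
            for seg in np.array_split(np.arange(joints.shape[0]), 2 ** l):
                maps.append(skeleton_motion_map(joints[seg], size))
        return np.stack(maps)              # (1 + 2 + 4, size, size) stack for a ConvNet

    seq = np.random.rand(60, 20, 2)           # toy sequence: 60 frames, 20 joints
    print(temporal_pyramid_maps(seq).shape)   # (7, 64, 64)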


15.
Human action recognition is a challenging computer vision task and many efforts have been made to improve performance. Most previous work has concentrated on hand-crafted features or spatial-temporal features learned from multiple contiguous frames. In this paper, we present a dual-channel model to decouple spatial and temporal feature extraction. More specifically, we propose to capture the complementary static form information from single frames and dynamic motion information from multi-frame differences in two separate channels. In both channels we use two stacked classical subspace networks to learn hierarchical representations, which are subsequently fused for action recognition. Our model is trained and evaluated on three typical benchmarks: the KTH, UCF and Hollywood2 datasets. The experimental results illustrate that our approach achieves performance comparable to state-of-the-art methods. In addition, both feature analysis and control experiments are carried out to demonstrate the effectiveness of the proposed approach for feature extraction and thereby action recognition.
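The decoupling of the two channels can be sketched in a few lines: the static channel receives individual frames (form), the dynamic channel receives frame differences (motion). The frame spacing is an assumption, and the stacked subspace networks that follow are omitted.

    import numpy as np

    def dual_channel_inputs(video, step=1):
        # video: (T, H, W) grayscale clip.
        static = video                                   # form information per frame
        dynamic = np.abs(video[step:] - video[:-step])   # motion from multi-frame differences
        return static, dynamic

    clip = np.random.rand(16, 112, 112)
    form, motion = dual_channel_inputs(clip)
    print(form.shape, motion.shape)   # (16, 112, 112) (15, 112, 112)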

16.
A key assumption of the traditional machine learning approach is that the test data are drawn from the same distribution as the training data. However, this assumption does not hold in many real-world scenarios. For example, in facial expression recognition, the appearance of an expression may vary significantly across people. As a result, previous work has shown that learning from adequate person-specific data can improve expression recognition performance over that obtained from generic data. However, person-specific data is typically very sparse in real-world applications due to the difficulties of data collection and labeling, and learning from sparse data may suffer from serious over-fitting. In this paper, we propose to learn a person-specific model through transfer learning. By transferring informative knowledge from other people, it allows us to learn an accurate model for a new subject with only a small amount of person-specific data. We conduct extensive experiments to compare different person-specific models for facial expression and action unit (AU) recognition, and show that transfer learning significantly improves recognition performance with a small amount of training data.
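A generic illustration of the idea, not the paper's specific transfer-learning algorithm: a classifier is first trained on plentiful generic (multi-subject) data and then adapted with only a handful of person-specific samples, instead of training on the sparse data alone. Feature sizes and the classifier choice are assumptions.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    classes = np.arange(6)                               # toy expression labels

    # Plentiful generic data from many subjects, sparse data for the new subject
    X_generic, y_generic = rng.normal(size=(5000, 50)), rng.integers(0, 6, 5000)
    X_person, y_person = rng.normal(size=(40, 50)), rng.integers(0, 6, 40)

    clf = SGDClassifier(random_state=0)
    clf.partial_fit(X_generic, y_generic, classes=classes)   # learn generic knowledge first
    for _ in range(20):
        clf.partial_fit(X_person, y_person)                  # adapt with sparse person-specific data

    print(clf.score(X_person, y_person))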

17.
18.

Human action recognition with a dual-stream architecture using a linear dynamical systems (LDSs) approach is discussed in this paper. First, a slicing process is established to extract original slices from video sequences. Two slicing methods are adopted to subtract or reserve the remaining frames in the video sequences. By applying background subtraction to adjacent frames of the original slices, difference slices are also obtained. To capture the spatial component of the background and difference expressed in each slice simultaneously, a framework based on pre-trained convolutional neural networks (CNNs) is introduced for dual-stream deep feature extraction. Subsequently, LDSs are established to model the temporal relationship between adjacent slices and obtain the temporal component of the background and difference features, expressed as the linear dynamical background feature (LD-BF) and linear dynamical difference feature (LD-DF). Experiments were conducted on the UCF50, UCF101, and HMDB51 datasets to demonstrate the effectiveness and robustness of the proposed approach. The impact of retaining various principal component analysis (PCA) feature dimensions and of the distinct slicing methods on recognition was evaluated. In particular, combining LD-BF with LD-DF under appropriate feature dimensions and slicing methods further improved the accuracy on the UCF50, UCF101, and HMDB51 datasets. In addition, the computational cost of the feature extraction process was evaluated to illustrate the efficiency of the proposed approach. The experimental results show that the proposed approach is competitive with state-of-the-art approaches on the three datasets.
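The LDS modelling step can be pictured as fitting a linear transition between consecutive slice features; the sketch below assumes dual-stream CNN features have already been extracted and PCA-reduced per slice, and all dimensions are placeholders.

    import numpy as np

    def fit_lds(features):
        # features: (T, d) PCA-reduced CNN features, one row per slice.
        # Fit features[t + 1] ≈ features[t] @ A by least squares.
        X0, X1 = features[:-1], features[1:]
        A, *_ = np.linalg.lstsq(X0, X1, rcond=None)      # (d, d) transition matrix
        residual = X1 - X0 @ A
        return A, residual.std()

    rng = np.random.default_rng(0)
    slice_feats = rng.normal(size=(30, 16))              # toy: 30 slices, 16 PCA dimensions
    A, noise = fit_lds(slice_feats)
    # A.ravel(), together with simple slice statistics, can serve as a temporal
    # descriptor in the spirit of LD-BF (background stream) or LD-DF (difference stream).
    print(A.shape, round(float(noise), 3))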


19.
International Journal on Document Analysis and Recognition (IJDAR) - Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant...

20.
Li Jianxin, Liu Minjie, Ma Dongliang, Huang Jinyu, Ke Min, Zhang Tao. The Journal of Supercomputing (2020) 76(3): 2139-2157
The Journal of Supercomputing - Human action recognition under complex environments is a challenging task; in deep learning, and in these specifically difficult recognition tasks, the multi-label...
