1.
Human pose estimation aims at predicting the poses of human body parts in images or videos. Since pose motions are often driven by specific human actions, knowing the body pose of a human is critical for action recognition. This survey focuses on recent progress in human pose estimation and its application to action recognition. We attempt to provide a comprehensive review of recent bottom-up and top-down deep human pose estimation models, as well as how pose estimation systems can be used for action recognition. Thanks to the availability of commodity depth sensors such as Kinect and their capability for skeletal tracking, there is a large body of literature on 3D skeleton-based action recognition, and there are already survey papers such as [1] on this topic. In this survey, we focus on 2D skeleton-based action recognition, where the human poses are estimated from regular RGB images instead of depth images. We summarize the performance of recent action recognition methods that use poses estimated from color images as input, and then show that there is much room for improvement in this direction.
2.
3.
4.
Human action recognition in videos remains an important yet challenging task. Existing methods based on RGB images or optical flow are easily affected by clutter and ambiguous backgrounds. In this paper, we propose a novel Pose-Guided Inflated 3D ConvNet framework (PI3D) to address this issue. First, we design a spatial-temporal pose module, which provides essential clues for the Inflated 3D ConvNet (I3D). The pose module consists of pose estimation and pose-based action recognition. Second, for the multi-person estimation task, the introduced pose estimation network can determine the action most relevant to the action category. Third, we propose a hierarchical pose-based network to learn the spatial-temporal features of human pose. Moreover, the pose-based network and the I3D network are fused at the last convolutional layer without loss of performance. Finally, experimental results on four data sets (HMDB-51, SYSU 3D, JHMDB and Sub-JHMDB) demonstrate that the proposed PI3D framework outperforms existing methods on human action recognition. This work also shows that posture cues significantly improve the performance of I3D.
5.
6.
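SAR image target recognition method based on non-negative sparse representation (total citations: 1; self-citations: 0; citations by others: 1)
To address object occlusion in synthetic aperture radar (SAR) image target recognition, this paper proposes a classification method based on non-negative sparse representation. By analyzing the difference between L0-norm and L1-norm minimization in solving the non-negative sparse representation problem, it proves that, under certain conditions, L1-norm minimization not only preserves the sparsity of the solution but also yields a set of atoms that is more similar to the input signal, and is therefore better suited to classification problems. Recognition experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that the non-negative sparse representation classification method using the L1 norm achieves good recognition performance and is more robust than traditional methods to recognition under occlusion.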
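The abstract does not give the solver, so the following is only a minimal sketch of L1-regularized non-negative coding followed by class-wise residual classification. It assumes a projected gradient solver for min 0.5*||y - Dx||^2 + lam*sum(x) subject to x >= 0; the function names, the toy dictionary, and the value of `lam` are hypothetical and not taken from the paper.

```python
import math

def nonneg_l1_code(atoms, y, lam=0.01, steps=500):
    """Projected gradient descent for 0.5*||y - D x||^2 + lam*sum(x), x >= 0.

    atoms is a list of dictionary atoms (each a list of floats), y a signal.
    The solver is an assumption; the abstract does not specify one.
    """
    n = len(atoms)
    x = [0.0] * n
    # 1 / (sum of squared atom norms) is a safe step: it is at most 1 / L,
    # where L is the largest eigenvalue of D^T D.
    step = 1.0 / sum(sum(v * v for v in a) for a in atoms)
    for _ in range(steps):
        recon = [sum(x[j] * atoms[j][i] for j in range(n)) for i in range(len(y))]
        resid = [yi - ri for yi, ri in zip(y, recon)]
        for j in range(n):
            grad = -sum(a * r for a, r in zip(atoms[j], resid)) + lam
            x[j] = max(0.0, x[j] - step * grad)  # projection onto x >= 0
    return x

def classify(atoms, labels, y, lam=0.01):
    """Assign y to the class whose atoms reconstruct it with the smallest residual."""
    x = nonneg_l1_code(atoms, y, lam)
    best_label, best_res = None, math.inf
    for c in sorted(set(labels)):
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)) if labels[j] == c)
                 for i in range(len(y))]
        res = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if res < best_res:
            best_label, best_res = c, res
    return best_label, x

# Toy dictionary (hypothetical data): two atoms per class.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = [0, 0, 1, 1]
label, code = classify(atoms, labels, [0.95, 0.05, 0.0])
```

The non-negativity constraint is enforced by clipping after every gradient step, and the L1 penalty reduces to a constant added to the gradient because all coefficients are non-negative.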
7.
Based on the observation that a human action can be intuitively regarded as a sequence of key poses and atomic motions in a particular order, this paper proposes a human action recognition method using multi-layer codebooks of key poses and atomic motions. Inspired by the dynamics models of human joints, normalized relative orientations are computed as features for each limb of the human body. In order to extract key poses and atomic motions precisely, feature sequences are dynamically segmented into pose feature segments and motion feature segments, based on the potential differences of the feature sequences. Multi-layer codebooks of each human action are constructed from the key poses extracted from pose feature segments and the atomic motions extracted from the motion feature segments associated with each two key poses. The multi-layer codebooks represent the action patterns of each human action, which can be used to recognize human actions with the proposed pattern-matching method. Three classification methods are employed for action recognition based on the multi-layer codebooks. Two public action datasets, CAD-60 and MSRC-12, are used to demonstrate the advantages of the proposed method. The experimental results show that the proposed method obtains comparable or better performance compared with state-of-the-art methods.
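The abstract does not define the limb feature precisely. As a rough illustration only, the sketch below assumes that a "normalized relative orientation" is the unit vector pointing from a limb's parent joint to its child joint; the function names, joint names, and coordinates are hypothetical.

```python
import math

def limb_orientation(parent, child):
    """Unit vector from the parent joint to the child joint (assumed definition)."""
    delta = [c - p for p, c in zip(parent, child)]
    norm = math.sqrt(sum(v * v for v in delta))
    if norm == 0.0:
        return [0.0] * len(delta)  # degenerate limb: joints coincide
    return [v / norm for v in delta]

def frame_features(joints, limbs):
    """Concatenate the orientation of every limb into one feature vector."""
    feats = []
    for parent_name, child_name in limbs:
        feats.extend(limb_orientation(joints[parent_name], joints[child_name]))
    return feats

# Hypothetical skeleton frame with two limbs.
joints = {"shoulder": (0.0, 0.0, 0.0), "elbow": (0.0, 3.0, 4.0), "wrist": (0.0, 3.0, 6.0)}
limbs = [("shoulder", "elbow"), ("elbow", "wrist")]
features = frame_features(joints, limbs)
```

Because each limb is normalized to unit length, the feature does not depend on the subject's limb lengths or position in the scene, only on limb direction.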
8.
9.
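Dimensionality reduction is an effective means of handling high-dimensional data. Subspace learning algorithms are widely used in pattern recognition and related fields because of their low computational cost and strong performance, and the traditional subspace learning algorithms can all be cast in the graph embedding framework. Sparse representation has been a research hotspot in recent years and is widely used in signal processing and pattern recognition, but its computational complexity is high. Building on sparse representation, researchers proposed collaborative representation. Compared with sparse representation, collaborative representation has a closed-form solution, so it is computationally cheaper and has good discriminative performance, and it can be regarded as an effective method of data representation. This paper interprets graph embedding algorithms from the perspective of collaborative representation, treating them as a class of regression models. By minimizing the within-class reconstruction error scatter while maximizing the between-class reconstruction error scatter, a new graph embedding algorithm, reconstructive discriminant analysis, is proposed and applied to this regression model; the problem is then reduced to a generalized eigenvalue problem. To some extent, the algorithm effectively avoids the matrix singularity problem in subspace learning. Experiments on face recognition verify the correctness and effectiveness of the algorithm.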
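For reference, the closed-form solution mentioned above is usually written as the ridge-regression solution below. The notation is an assumption, since the abstract gives no symbols: $X$ holds the training samples as columns, $y$ is the query sample, $\alpha$ is the coefficient vector, and $\lambda > 0$ is the regularization weight.

```latex
\hat{\alpha}
  = \arg\min_{\alpha} \; \lVert y - X\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_2^2
  = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y
```

Since $(X^{\top} X + \lambda I)^{-1} X^{\top}$ does not depend on $y$, it can be computed once and reused for every query, which is where the lower cost relative to iterative sparse coding comes from.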
10.
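In face recognition, face images are affected by expression, illumination, occlusion, pose variation, and especially the number of training samples, and in practice only a few training samples are often available. Because virtual samples generated from the original samples can increase the number of training samples, a collaborative representation algorithm that fuses original samples with axisymmetric samples is proposed. Mirror samples and axisymmetric samples are first generated, classification is then performed with the collaborative representation classifier, and the results are finally fused with weights; face recognition rates under different weights are analyzed. Experimental results show that fusing original, mirror, and axisymmetric samples improves the recognition rate, while fusing original samples with axisymmetric samples performs better still: the recognition rate is 2% to 9% higher than with original samples alone and 1% to 5% higher than with the fusion of original and mirror samples. The results indicate that the proposed method effectively improves the face recognition rate.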
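The abstract does not say how the virtual samples are built or how the scores are combined. The sketch below makes three assumptions: a mirror sample is a horizontal flip, an axisymmetric sample reflects the left half of the image onto the right half, and fusion is a weighted sum of per-class residuals. All function names and numbers are hypothetical.

```python
def mirror_sample(img):
    """Horizontal flip of an image stored as a list of pixel rows."""
    return [row[::-1] for row in img]

def axisymmetric_sample(img):
    """Reflect the left half onto the right half (assumed definition)."""
    out = []
    for row in img:
        width = len(row)
        left = row[: (width + 1) // 2]          # includes the middle column if width is odd
        out.append(left + left[: width // 2][::-1])
    return out

def fuse_and_classify(res_original, res_virtual, weight):
    """Weighted fusion of per-class residuals; the smallest fused residual wins."""
    fused = {c: weight * res_original[c] + (1.0 - weight) * res_virtual[c]
             for c in res_original}
    return min(fused, key=fused.get)

# Hypothetical residuals from two collaborative representation classifiers.
res_orig = {"A": 0.4, "B": 0.5}
res_sym = {"A": 0.6, "B": 0.3}
predicted = fuse_and_classify(res_orig, res_sym, weight=0.5)
```

Sweeping `weight` over a grid is one way to reproduce the kind of weight analysis the abstract describes.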
11.
This paper presents a dynamic representation-based tracker (DRT) to handle occlusions in the long-term pedestrian tracking of a single target. In our DRT, an adaptive representation network (ARN) is first constructed to extract multiple features, including classical features such as appearance and pose as well as some vector-format deep features. These features are then stacked to form a dynamic representation, which converts target tracking into a matching problem between the target features and candidate features, where the Euclidean distance (ED) and locality-constrained linear coding (LLC) are used as measurements in the decision-making. Next, the target state is determined through a voting procedure according to the feature matching error. Finally, a pose supervised module (PSM) and an IOU filtering module (IFM) are applied, respectively, to refine the target state and to filter out invalid candidate targets that have been detected. Experimental results on public benchmark datasets show that our DRT is robust to complex environments with long-term pedestrian occlusions and outperforms several existing state-of-the-art trackers, producing the best performance on both the pedestrian tracking dataset with occlusion (PTDO) and the pedestrian tracking dataset with occlusion plus (PTDO Plus).
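The matching and voting step can be illustrated with a small sketch. It covers only the Euclidean distance measurement, not LLC, and assumes a simple rule in which each feature type casts one vote for its nearest candidate; the paper's actual voting procedure is not described in the abstract. The names and feature values are hypothetical.

```python
import math
from collections import Counter

def vote_for_target(target, candidates):
    """Each feature type votes for the candidate nearest to the target in
    Euclidean distance; the candidate index with the most votes is returned."""
    votes = Counter()
    for name, target_vec in target.items():
        nearest = min(range(len(candidates)),
                      key=lambda i: math.dist(target_vec, candidates[i][name]))
        votes[nearest] += 1
    winner, _ = votes.most_common(1)[0]
    return winner

# Hypothetical target and candidate features.
target = {"appearance": [1.0, 0.0], "pose": [0.5, 0.5], "deep": [0.2, 0.8]}
candidates = [
    {"appearance": [0.9, 0.1], "pose": [0.0, 1.0], "deep": [0.9, 0.1]},
    {"appearance": [0.0, 1.0], "pose": [0.5, 0.4], "deep": [0.25, 0.8]},
]
chosen = vote_for_target(target, candidates)
```

Here the appearance feature favors candidate 0 while the pose and deep features favor candidate 1, so candidate 1 wins the vote.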
12.
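To address the sensitivity of face recognition algorithms to illumination changes, a face recognition method based on illumination-robust sparse representation is proposed. The method applies a wavelet transform to the image to obtain an illumination-normalized image, applies a sparse transform to the illumination-normalized face image, and obtains the recognition result for the test image by sparse representation classification. In simulation experiments on the Yale B face database, the method achieves a high recognition rate and shows a degree of robustness to illumination, expression, and occlusion.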
13.
14.
Recently, various sparse representation methods have been successfully used in multi-focus image fusion. Most of them produce spatial artifacts and blurring effects because, owing to the patch processing strategy, they consider only local image information. In order to reduce the spatial artifacts and blurring effects on edge details and improve the robustness of multi-focus image fusion, a novel fusion method based on joint convolutional analysis and synthesis (JCAS) sparse representation is presented. The JCAS model, which integrates analysis sparse representation and synthesis sparse representation by using convolutional operations, can effectively separate the large-scale structures and fine-scale textures of a single image. First, each source image is decomposed into a base layer and a detail layer using the JCAS model. Second, a Laplacian pyramid transform method is used to fuse the base layers, and a weighted regional energy method is used to fuse the detail layers. Finally, the fused image is reconstructed by combining the fused base and detail layers. Experimental results demonstrate that the proposed method obtains clearer edge details than some popular multi-focus image fusion methods, exhibiting state-of-the-art performance in terms of both visual quality and objective assessment.
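The detail-layer fusion rule is not spelled out in the abstract. As one plausible reading, the sketch below computes a weighted sum of squared coefficients over a 3x3 window and, at each pixel, keeps the detail coefficient from the layer with the larger regional energy. The weighting kernel, the border handling, and the function names are assumptions.

```python
# Assumed 3x3 weighting kernel; the paper's weights are not given in the abstract.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]

def regional_energy(layer, r, c):
    """Weighted sum of squared coefficients in the 3x3 window around (r, c).
    Indices outside the image are clamped to the nearest border pixel."""
    h, w = len(layer), len(layer[0])
    energy = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            energy += KERNEL[dr + 1][dc + 1] * layer[rr][cc] ** 2
    return energy

def fuse_detail_layers(a, b):
    """Per pixel, keep the coefficient from the layer with larger regional energy
    (ties go to layer a)."""
    return [[a[r][c] if regional_energy(a, r, c) >= regional_energy(b, r, c) else b[r][c]
             for c in range(len(a[0]))]
            for r in range(len(a))]

# Hypothetical one-row detail layers: a is sharp on the left, b on the right.
detail_a = [[5.0, 0.0, 0.0, 0.0, 0.0]]
detail_b = [[0.0, 0.0, 0.0, 0.0, 3.0]]
fused = fuse_detail_layers(detail_a, detail_b)
```

In the toy input, each layer contributes the coefficient from the side where it carries more local energy, which mimics selecting the in-focus source at each location.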
15.
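Deep-space probes have limited power and volume and face diverse mission conditions; compared with low-Earth-orbit probes, they place higher demands on the mission capability of navigation sensors. A fast pose measurement and ground-object target recognition technique based on time-of-flight imaging is proposed. To meet the timing requirements of pose measurement while preserving its accuracy, a dynamic scale estimation method based on depth information is proposed. The method improves the temporal stability of point cloud registration under multi-scale changes in object space, shortening the average registration time by more than 60%, with an average registration accuracy of about 0.04 m. To meet the need for recognizing ground objects of multiple scales and shapes, a lightweight deep neural network is used that detects ground objects according to scene depth information. The results show that the method can rapidly perceive ground-object features, with an accuracy above 70% in real scenes.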
16.
In this paper, we present a new approach for dynamic hand gesture recognition. Our goal is to integrate spatiotemporal features extracted from multimodal data captured by the Kinect sensor. When skeleton data are not provided, we apply a novel skeleton estimation method to compute temporal features. Furthermore, we introduce an effective method to extract a fixed number of keyframes to reduce the processing time. To extract pose features from RGB-D data, we take advantage of two different approaches: (1) Convolutional Neural Networks and (2) Histogram of Cumulative Magnitudes. We test different integration methods to fuse the extracted spatiotemporal features to boost recognition performance in a linear SVM classifier. Extensive experiments demonstrate the effectiveness and feasibility of the proposed framework for hand gesture recognition.
17.
18.
19.
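To overcome the shortcoming that the kernel sparse representation classification (KSRC) algorithm cannot capture the locality information of the data, which limits the discriminative power of the resulting sparse representation coefficients, a locality-sensitive KSRC (LS-KSRC) algorithm is proposed for face recognition. By integrating sparsity and data locality information simultaneously in the kernel feature space, it obtains sparse representation coefficients with good discriminative power for classification. Experimental results on the standard ORL and Extended Yale B face databases show that the classification performance of the proposed method is better than that of the traditional KSRC algorithm, sparse representation classification (SRC), locality-constrained linear coding (LLC), support vector machine (SVM), nearest neighbor (NN), and nearest subspace (NS).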