Similar Documents
18 similar documents found.
1.
Modeling and reasoning about the interactions between multiple entities (actors and objects) benefits the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between different entities in videos. The DCTR network consists of a spatial modeling module and a temporal reasoning module. The spatial modeling module uses 3D deformable convolution to capture relationship dependencies between different entities in the same frame. The temporal reasoning module uses Conv-LSTM to reason about how these multi-entity relationship dependencies change over time. Experiments on the Moments-in-Time, UCF101 and HMDB51 datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
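The temporal reasoning module is built on Conv-LSTM. For reference, the standard ConvLSTM cell (with peephole terms) updates its gates and cell state as shown below. Here $*$ is convolution, $\circ$ is the Hadamard product and $\sigma$ is the logistic sigmoid. Reading $X_t$ as the frame-$t$ feature map produced by the 3D deformable convolution module is an assumption, because the abstract does not say which ConvLSTM variant DCTR uses.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

All weights act through convolutions rather than full matrix products. This is what lets the hidden state $H_t$ keep the spatial layout of the entity relationships as it evolves over time.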

2.
Much of the existing work on action recognition combines simple features with complex classifiers or models to represent an action. The parameters of such models usually have no physical meaning, and they provide no qualitative insight that relates the action to the actual motion of the body or its parts. In this paper, we propose a new representation of human actions called the sequence of the most informative joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed most informative for performing the current action. The selection uses highly interpretable measures such as the mean or variance of joint angle trajectories. We then represent the action as a sequence of these most informative joints. Experiments on multiple databases show that the SMIJ representation is discriminative for human action recognition and performs better than several state-of-the-art algorithms.
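A minimal sketch of the SMIJ idea follows. It assumes the "time instants" are fixed-length temporal segments and uses variance as the informativeness measure. The function name `smij` and its parameters are illustrative, not the authors' code.

```python
from statistics import pvariance


def smij(angles, num_segments, k):
    """Return the k most informative joints for each temporal segment.

    angles: list of frames; each frame is a list with one joint angle per joint.
    Informativeness is measured here by the variance of each joint's angle
    trajectory within the segment (the abstract also mentions the mean).
    Trailing frames that do not fill a whole segment are ignored.
    """
    num_joints = len(angles[0])
    seg_len = len(angles) // num_segments
    if seg_len == 0:
        raise ValueError("need at least one frame per segment")
    sequence = []
    for s in range(num_segments):
        segment = angles[s * seg_len:(s + 1) * seg_len]
        # Variance of each joint's angle trajectory within this segment
        variances = [pvariance([frame[j] for frame in segment]) for j in range(num_joints)]
        # Rank joints from most to least variable and keep the top k
        ranked = sorted(range(num_joints), key=lambda j: variances[j], reverse=True)
        sequence.append(ranked[:k])
    return sequence


frames = [[0, 0, 0], [0, 1, 5], [3, 0, 0], [0, 0, 1]]
print(smij(frames, num_segments=2, k=1))  # [[2], [0]]
```

The output is a short list of joint indices per segment. That list is itself the interpretable feature: it says which joints move most during each phase of the action.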

3.
Analysis of human behavior through visual information has been an active research area in the computer vision community over the last decade. Vision-based human action recognition (HAR) is a crucial part of human behavior analysis and is in great demand in a wide range of applications. HAR was initially performed on images from conventional cameras; more recently, depth sensors have been added to cameras as an additional informative resource. In this paper, we propose a novel approach that substantially improves human action recognition performance using complex network-based feature extraction from RGB-D information. The constructed complex network is used for single-person action recognition from skeletal data consisting of the 3D positions of body joints. The indirect features derived from the network help the model cope with most of the challenges in action recognition. We also introduce the meta-path concept from complex networks to mitigate the challenges posed by unusual action structures, which further boosts recognition performance. Extensive experimental results on two widely adopted benchmark datasets, MSR-Action Pairs and MSR Daily Activity3D, demonstrate the efficiency and validity of the method.
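The abstract does not say how the complex network is built or which indirect features are extracted. The sketch below is therefore hypothetical. It links two joints when their Euclidean distance in a frame falls below a threshold `radius`, and uses each joint's degree as a feature. Meta-path features are not shown.

```python
import math


def joint_network_degrees(joints, radius):
    """Degree of every joint in a distance-threshold network built over one frame.

    joints: list of (x, y, z) positions. Two joints are linked when their
    Euclidean distance is below `radius` (a hypothetical construction rule).
    """
    n = len(joints)
    degrees = [0] * n
    for a in range(n):
        for b in range(a + 1, n):
            if math.dist(joints[a], joints[b]) < radius:
                degrees[a] += 1
                degrees[b] += 1
    return degrees


def sequence_degree_features(frames, radius):
    """Stack per-frame degree vectors into a feature sequence for a classifier."""
    return [joint_network_degrees(frame, radius) for frame in frames]


frame = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
print(joint_network_degrees(frame, radius=1.5))  # [2, 2, 2, 0]
```

Degree is only the simplest node statistic. Richer network measures (clustering coefficient, path-based features) would slot into the same per-frame loop.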

4.
5.
6.
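In video-based action recognition, videos of different actions captured by fixed-viewpoint cameras differ in viewpoint, which lowers recognition accuracy. Using multi-view video is one way to improve accuracy. This paper proposes a multi-view human action recognition algorithm based on a 3D Residual Network (3D ResNet) and a Long Short-Term Memory (LSTM) network. The 3D ResNet learns fused spatiotemporal features from the action sequence of each view. A multi-layer LSTM network then learns long-term activity representations in the video stream and mines the temporal information between video frames in depth. Experiments on the NTU RGB+D 120 dataset show that the model reaches an accuracy of 83.2% for action recognition on multi-view video sequences.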

7.
Liu Guiyu, Liu Peilin, Qian Jiuchao. Information Technology (信息技术), 2020, (5): 121-124, 130
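Action recognition based on 3D skeletons has become an important means of human-computer interaction. To improve the accuracy of 3D action recognition, this paper proposes a two-stream neural network that fuses 3D skeleton features with 2D image features. One network processes the 3D skeleton sequence and the other processes the 2D images. Their features are then fused to improve recognition accuracy. Compared with action recognition using 3D skeletons alone, the proposed method achieves a substantial accuracy improvement on both the NTU_RGBD and SYSU datasets.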
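The abstract does not specify the fusion operator. The following is a hedged sketch of one common feature-level choice: L2-normalize each stream and then concatenate. The function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_two_stream(skeleton_feat, image_feat):
    """Feature-level fusion: normalize each stream so neither dominates by scale,
    then concatenate into a single vector for the classifier."""
    return l2_normalize(skeleton_feat) + l2_normalize(image_feat)


print(fuse_two_stream([3.0, 4.0], [0.0, 2.0]))  # [0.6, 0.8, 0.0, 1.0]
```

Normalizing before concatenation matters because the two streams come from different networks. Their raw activations can differ in magnitude by orders of magnitude.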

8.
In this paper, we introduce a novel method for action/movement recognition in motion capture data. A motion capture sequence is represented by the joint orientation angles and the forward differences of these angles at different temporal scales. First, K-means is applied to the training data to discover the most representative patterns of the orientation angles and of their forward differences. For the angles, a novel K-means variant that takes the periodic nature of angular data into account is used. Each frame is then assigned to one or more of these patterns, and histograms describing how often each pattern occurs in each movement are constructed. Nearest-neighbour and SVM classifiers are used for action recognition on the test data. Extensive experiments on four standard motion capture databases under various experimental setups demonstrate the effectiveness and robustness of the method.
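The paper's exact periodic K-means variant is not given. The sketch below assumes the usual approach for angles. It measures distance with the wrapped (circular) angle difference and updates each centroid with the circular mean. Centroids are initialized deterministically from the first k points. Each frame is assigned to a single nearest pattern, which simplifies the "one or more" assignment described above.

```python
import math

TWO_PI = 2 * math.pi


def circ_diff(a, b):
    """Smallest absolute difference between two angles in radians, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def circ_dist2(u, v):
    """Squared circular distance between two angle vectors."""
    return sum(circ_diff(a, b) ** 2 for a, b in zip(u, v))


def circ_mean(values):
    """Circular mean of a list of angles."""
    return math.atan2(sum(math.sin(x) for x in values), sum(math.cos(x) for x in values))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: circ_dist2(point, centroids[c]))


def periodic_kmeans(points, k, iters=20):
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dims = len(members[0])
                centroids[c] = [circ_mean([m[d] for m in members]) for d in range(dims)]
    return centroids


def pattern_histogram(frames, centroids):
    """Normalized frequency of each pattern among a movement's frames."""
    counts = [0] * len(centroids)
    for f in frames:
        counts[nearest(f, centroids)] += 1
    return [c / len(frames) for c in counts]


centroids = periodic_kmeans([[0.1], [3.0], [TWO_PI - 0.1], [3.2]], k=2)
print(pattern_histogram([[0.05], [3.05], [3.15], [6.2]], centroids))  # [0.5, 0.5]
```

Plain K-means would place 0.1 and 2π − 0.1 rad far apart. The circular distance treats them as 0.2 rad apart, so both end up in the pattern centred near 0.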

9.
10.
Under low-light or no-light conditions, the recognition accuracy of mature 2D face recognition technology drops sharply. In this paper, a face recognition algorithm based on matching 3D face data with 2D face images is proposed. First, 3D face data is reconstructed from the 2D faces in the database using the 3DMM algorithm, and face depth images are obtained through orthogonal projection. Then, the mean curvature map of each face depth image is used to augment the depth image data. Finally, an improved residual neural network based on the depth image and curvature is designed to compare the scanned face with the faces in the database. The proposed method is tested on the 3D face data of three public face datasets (Texas 3DFRD, FRGC v2.0 and Lock3DFace), achieving recognition accuracies of 84.25%, 83.39% and 78.24%, respectively.
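The mean curvature map can be computed directly from a depth image $z(x, y)$ with the standard formula for a graph surface:
$H = \frac{(1+z_x^2) z_{yy} - 2 z_x z_y z_{xy} + (1+z_y^2) z_{xx}}{2(1+z_x^2+z_y^2)^{3/2}}$.
The sketch below approximates the derivatives with central finite differences. It assumes unit pixel spacing and leaves border pixels at zero. How the paper discretizes the derivatives is not stated.

```python
def mean_curvature_map(depth):
    """Mean curvature H of a depth image z(x, y) (rows = y, columns = x).

    Uses central finite differences with unit pixel spacing; border pixels,
    where central differences are undefined, are left at 0.
    """
    rows, cols = len(depth), len(depth[0])
    z = depth
    H = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            zx = (z[i][j + 1] - z[i][j - 1]) / 2
            zy = (z[i + 1][j] - z[i - 1][j]) / 2
            zxx = z[i][j + 1] - 2 * z[i][j] + z[i][j - 1]
            zyy = z[i + 1][j] - 2 * z[i][j] + z[i - 1][j]
            zxy = (z[i + 1][j + 1] - z[i + 1][j - 1] - z[i - 1][j + 1] + z[i - 1][j - 1]) / 4
            num = (1 + zx ** 2) * zyy - 2 * zx * zy * zxy + (1 + zy ** 2) * zxx
            den = 2 * (1 + zx ** 2 + zy ** 2) ** 1.5
            H[i][j] = num / den
    return H


# Paraboloid z = x^2 + y^2 sampled on a 3x3 grid centred at the origin
bowl = [[(j - 1) ** 2 + (i - 1) ** 2 for j in range(3)] for i in range(3)]
print(mean_curvature_map(bowl)[1][1])  # 2.0
```

One plausible way to use the result (an assumption, not stated in the abstract) is to stack the curvature map with the depth image as a second input channel. Curvature is invariant to the absolute depth offset of the face.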

11.
With the rapid spread of portable digital video equipment such as camcorders, digital cameras and smartphones, video stabilization techniques for removing camera shake are in strong demand. Early video stabilization relied on 2D motion only, whereas cutting-edge techniques exploit 3D motion to deliver outstanding visual quality. Recently, a content-preserving warping algorithm has been recognized as the state of the art thanks to its superior stabilization performance. Despite that performance, its huge computational cost is a serious burden. We therefore propose a fast video stabilization algorithm that greatly reduces computational complexity relative to the state of the art while achieving comparable stabilization performance. First, we estimate the 3D information of the feature points in each input frame and define a region of interest (ROI) based on this estimated 3D information. Next, if the ROI contains enough feature points, we apply the proposed ROI-based pre-warping followed by content-preserving warping to the input frame; otherwise, conventional full-frame warping is applied. Extensive simulation results show that the proposed algorithm reduces computational complexity to 14% of that of the state-of-the-art method while keeping almost equivalent stabilization performance.
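The per-frame branching described above can be sketched as follows. How the ROI is derived from the 3D estimates is not specified. Here it is hypothetically taken as the feature points whose depth lies in `roi_depth_range`, and `min_points` stands in for the unstated "sufficient" threshold. The warping steps themselves are represented only by labels.

```python
def choose_warping(points_3d, roi_depth_range, min_points):
    """Pick the warping path for one frame (hypothetical ROI criterion).

    points_3d: (x, y, z) estimates for the frame's feature points.
    Returns the chosen path and the feature points that fall inside the ROI.
    """
    near, far = roi_depth_range
    roi = [p for p in points_3d if near <= p[2] <= far]
    if len(roi) >= min_points:
        # Enough support: ROI-based pre-warping, then content-preserving warping
        return "roi_prewarp+content_preserving", roi
    # Too few ROI points for a reliable local warp: fall back to full-frame warping
    return "full_frame", roi


pts = [(0, 0, 1.0), (1, 0, 2.0), (2, 1, 10.0)]
print(choose_warping(pts, (0.5, 3.0), min_points=2)[0])  # roi_prewarp+content_preserving
```

The speed-up comes from restricting the expensive content-preserving warp to frames, and regions, with enough reliable feature points.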

12.
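Matching and Recognition Method for Forward-Looking Infrared Targets Based on 3D Models   Total citations: 4, self-citations: 1, cited by others: 3
To address the recognition of fixed ground targets in forward-looking infrared (FLIR) images, a 3D model-based matching and recognition method is proposed. First, a 3D model of the target is built from 3D scene data and numbered through manual labeling to preserve boundary-line information. Then, a 2D template image of the target is rendered by 2D projection according to real-time observation parameters. Finally, edge-weighted HOG features are extracted and matched in the observed image. Experimental results on a large amount of measured data show that the method has high recognition accuracy and ...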
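The projection step that turns the 3D model into a 2D template can be sketched with a pinhole camera. The observation parameters used here (a yaw angle, camera position, focal length and principal point) are illustrative assumptions, not the paper's parameter set. Rasterizing the projected model into a template image is not shown.

```python
import math


def project_points(points, yaw, cam_pos, focal, cx, cy):
    """Project 3D model points to pixel coordinates with a pinhole camera.

    Camera convention (assumed): x right, y down, z forward. `yaw` rotates the
    world about the camera's vertical (y) axis; `cam_pos` is the camera centre
    in world coordinates. Points at or behind the image plane map to None.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    pixels = []
    for x, y, z in points:
        # Translate into the camera frame, then rotate about the y axis
        tx, ty, tz = x - cam_pos[0], y - cam_pos[1], z - cam_pos[2]
        xc = c * tx + s * tz
        yc = ty
        zc = -s * tx + c * tz
        if zc <= 0:
            pixels.append(None)
        else:
            pixels.append((focal * xc / zc + cx, focal * yc / zc + cy))
    return pixels


print(project_points([(1, 2, 10), (0, 0, -5)], yaw=0.0, cam_pos=(0, 0, 0),
                     focal=100, cx=320, cy=240))
# [(330.0, 260.0), None]
```

Re-running the projection with each new set of observation parameters is what keeps the template aligned with the current viewpoint before HOG matching.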

13.
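The rapid development of 3D acquisition devices has greatly advanced research on 3D data technology. In particular, research results on 3D facial expression recognition, which uses 3D face data as its carrier, keep emerging. 3D facial expression recognition can largely overcome problems such as pose and illumination variation that affect 2D recognition. This paper systematically reviews 3D expression recognition technology. It focuses on the key techniques, namely expression feature extraction, expression encoding and classification, and expression databases, and offers suggestions for future research. 3D facial expression recognition essentially meets requirements in terms of recognition rate, but its real-time performance needs further optimization. This review offers guidance for research in the field.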

14.
Human activity prediction has become increasingly valuable in many applications. Starting from a cognitive science perspective, this paper presents a novel approach that learns hierarchical spatio-temporal patterns of human activities. The approach predicts ongoing activities from videos that contain only the onsets of those activities. The spatio-temporal patterns are learned by a Hierarchical Self-Organizing Map (HSOM). The HSOM consists of two self-organizing maps (an action map and an actionlet map) connected via associative links trained by Hebbian learning. Ongoing activities are predicted by a Variable order Markov Model (VMM), which captures both large- and small-order Markov dependencies in the training actionlet sequences. Experiments with the proposed method on four challenging 3D action datasets captured by commodity depth cameras show promising results.
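The abstract does not name the VMM variant (for example a prediction suffix tree or PPM). As a simplified stand-in, the sketch below is a back-off predictor. It counts next-actionlet frequencies for every context up to `max_order`. To predict, it uses the longest context of the observed prefix that was seen during training. The class name and parameters are hypothetical.

```python
from collections import Counter, defaultdict


class LongestContextPredictor:
    """Variable-order next-symbol predictor with back-off to shorter contexts."""

    def __init__(self, max_order):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context tuple -> counts of next symbols

    def fit(self, sequences):
        for seq in sequences:
            for i in range(len(seq)):
                # Record seq[i] after every context of length 0..max_order ending at i
                for order in range(self.max_order + 1):
                    if i - order < 0:
                        break
                    self.counts[tuple(seq[i - order:i])][seq[i]] += 1
        return self

    def predict(self, prefix):
        # Try the longest usable context first, then back off to shorter ones
        for order in range(min(self.max_order, len(prefix)), -1, -1):
            context = tuple(prefix[len(prefix) - order:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None


train = [["a", "b", "c"], ["a", "b", "c"], ["x", "b", "d"]]
model = LongestContextPredictor(max_order=2).fit(train)
print(model.predict(["x", "b"]))  # d
```

Long contexts disambiguate cases such as "b" preceded by "x", while unseen contexts gracefully fall back to shorter ones. This mix of orders is the property that motivates VMMs over fixed-order Markov chains.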

15.
16.
Research on emotion recognition based on electroencephalogram (EEG) signals often ignores the related information between brain electrode channels and the contextual emotional information present in EEG signals. Both may carry important characteristics of emotional states. To address these shortcomings, a spatiotemporal emotion recognition method based on a 3-dimensional (3D) time-frequency domain feature matrix was proposed. Specifically, the extracted time-frequency domain EEG features are first arranged into a 3D matrix according to the actual electrode positions on the cerebral cortex. The 3D matrix is then processed successively by a multivariate convolutional neural network (MVCNN) and a long short-term memory (LSTM) network to classify the emotional state. The method was evaluated on the DEAP dataset. In binary classification tasks it achieved accuracies of 87.58% and 88.50% on the arousal and valence dimensions, respectively, and in the four-class classification task it achieved 84.58%. The experimental results show that the 3D matrix represents emotional information more reasonably than a two-dimensional (2D) representation. In addition, MVCNN and LSTM can exploit the spatial information of the electrode channels and the temporal context information of the EEG signal, respectively.
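A minimal sketch of the 3D matrix construction follows. Each channel's per-band features are placed at that electrode's grid cell, and cells without an electrode are zero-filled. The tiny 3x3 layout below is a hypothetical placement of a few 10-20 system channels. It is not the paper's mapping of DEAP's 32 electrodes.

```python
def to_3d_matrix(features, layout, height, width):
    """Arrange per-channel EEG features into a height x width x bands matrix.

    features: {channel: [band_1, band_2, ...]}; layout: {channel: (row, col)}.
    Grid cells without an electrode are zero-filled.
    """
    num_bands = len(next(iter(features.values())))
    grid = [[[0.0] * num_bands for _ in range(width)] for _ in range(height)]
    for channel, values in features.items():
        row, col = layout[channel]
        grid[row][col] = list(values)
    return grid


# Hypothetical scalp layout (front row, centre, back row)
layout = {"Fp1": (0, 0), "Fp2": (0, 2), "Cz": (1, 1), "O1": (2, 0), "O2": (2, 2)}
matrix = to_3d_matrix({"Cz": [1.0, 2.0], "Fp1": [0.5, 0.1]}, layout, 3, 3)
print(matrix[1][1], matrix[0][1])  # [1.0, 2.0] [0.0, 0.0]
```

Building one such matrix per time window gives the sequence that the CNN and LSTM stages consume. Neighbouring cells then correspond to physically neighbouring electrodes, which a 2D channel-by-feature layout would not preserve.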

17.
In this paper, we propose a novel key frame extraction approach for human action recognition from 3D video sequences. To represent human actions, an Energy Feature (EF) combining kinetic energy and potential energy is extracted from 3D video sequences. A Self-adaptive Weighted Affinity Propagation (SWAP) algorithm is then proposed to extract the key frames. Finally, we employ SVM to recognize human actions from the EFs of the selected key frames. Experiments show that our method effectively extracts information covering the whole course of an action, and achieves good recognition performance without losing classification accuracy. Moreover, recognition speed is greatly improved.
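A hedged sketch of a per-frame energy feature follows. It rests on several assumptions that are not in the abstract: equal unit masses for all joints, velocities from backward differences between consecutive frames, and potential energy measured along a chosen vertical axis relative to `ref_height`. The kinetic and potential terms are returned as a pair. Key frame selection with SWAP is not shown.

```python
def energy_features(frames, dt=1.0, mass=1.0, g=9.8, ref_height=0.0, up_axis=1):
    """Per-frame (kinetic, potential) energy of a 3D skeleton sequence.

    frames: list of frames, each a list of (x, y, z) joint positions.
    The first frame has no predecessor, so its kinetic energy is 0.
    """
    features = []
    for t, joints in enumerate(frames):
        kinetic = 0.0
        if t > 0:
            for p, q in zip(joints, frames[t - 1]):
                # Squared speed from the backward difference of joint positions
                v2 = sum(((a - b) / dt) ** 2 for a, b in zip(p, q))
                kinetic += 0.5 * mass * v2
        potential = sum(mass * g * (p[up_axis] - ref_height) for p in joints)
        features.append((kinetic, potential))
    return features


seq = [[(0, 0, 0), (0, 1, 0)], [(1, 0, 0), (0, 1, 0)]]
print(energy_features(seq, g=10.0))  # [(0.0, 10.0), (0.5, 10.0)]
```

Frames where the energy profile changes sharply are natural key frame candidates. This gives some intuition for why an energy-based representation suits the clustering step that follows.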

18.
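Current ultrasonic 3D target recognition methods mainly use sensors to acquire one-dimensional echoes at one or more points in space. Signal processing then recovers 3D information about the target, which is used to recognize it. These methods generally suffer from low recognition rates and poor robustness, which limits the adoption and application of the technology. To address this, this paper proposes an ultrasonic 3D target recognition method based on the fusion of visual and non-visual features. The method combines target echo signal processing with the synthetic aperture method and fuses the extracted target information at the feature level. Classification is then performed by a BP neural network, which significantly alleviates the shortcomings of existing methods. Experiments on three types of artificial targets show that the method significantly improves the 3D recognition rate of defects, keeping it above 90%, and also markedly improves robustness.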
