期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Coupled Action Recognition and Pose Estimation from Multiple Views

Angela Yao Juergen Gall Luc Van Gool 《International Journal of Computer Vision》2012,100(1):16-37

Action recognition and pose estimation are two closely related topics in understanding human body movements; information from one task can be leveraged to assist the other, yet the two are often treated separately. We present here a framework for coupled action recognition and pose estimation by formulating pose estimation as an optimization over a set of action-specific manifolds. The framework allows for integration of a 2D appearance-based action recognition system as a prior for 3D pose estimation and for refinement of the action labels using relational pose features based on the extracted 3D poses. Our experiments show that our pose estimation system is able to estimate body poses with high degrees of freedom using very few particles and can achieve state-of-the-art results on the HumanEva-II benchmark. We also thoroughly investigate the impact of pose estimation and action recognition accuracy on each other on the challenging TUM kitchen dataset. We demonstrate not only the feasibility of using extracted 3D poses for action recognition, but also improved performance in comparison to action recognition using low-level appearance features. 相似文献

2.

Human activity recognition using multidimensional indexing 总被引：12，自引：0，他引：12

Ben-Arie J. Zhiqian Wang Pandit P. Rajaram S. 《IEEE transactions on pattern analysis and machine intelligence》2002,24(8):1091-1104

In this paper, we develop a novel method for view-based recognition of human action/activity from videos. By observing just a few frames, we can identify the activity that takes place in a video sequence. The basic idea of our method is that activities can be positively identified from a sparsely sampled sequence of a few body poses acquired from videos. In our approach, an activity is represented by a set of pose and velocity vectors for the major body parts (hands, legs, and torso) and stored in a set of multidimensional hash tables. We develop a theoretical foundation that shows that robust recognition of a sequence of body pose vectors can be achieved by a method of indexing and sequencing and it requires only a few pose vectors (i.e., sampled body poses in video frames). We find that the probability of false alarm drops exponentially with the increased number of sampled body poses. So, matching only a few body poses guarantees high probability for correct recognition. Our approach is parallel, i.e., all possible model activities are examined at one indexing operation. In addition, our method is robust to partial occlusion since each body part is indexed separately. We use a sequence-based voting approach to recognize the activity invariant to the activity speed. 相似文献

3.

Three-dimensional view-invariant face recognition using a hierarchical pose-normalization strategy

Martin D. Levine Ajit Rajwade 《Machine Vision and Applications》2006,17(5):309-325

Face recognition from three-dimensional (3D) shape data has been proposed as a method of biometric identification as a way of either supplanting or reinforcing a two-dimensional approach. This paper presents a 3D face recognition system capable of recognizing the identity of an individual from a 3D facial scan in any pose across the view-sphere, by suitably comparing it with a set of models (all in frontal pose) stored in a database. The system makes use of only 3D shape data, ignoring textural information completely. Firstly, we propose a generic learning strategy using support vector regression [Burges, Data Mining Knowl Discov 2(2): 121–167, 1998] to estimate the approximate pose of a 3D head. The support vector machine (SVM) is trained on range images in several poses belonging to only a small set of individuals and is able to coarsely estimate the pose of any unseen facial scan. Secondly, we propose a hierarchical two-step strategy to normalize a facial scan to a nearly frontal pose before performing any recognition. The first step consists of either a coarse normalization making use of facial features or the generic learning algorithm using the SVM. This is followed by an iterative technique to refine the alignment to the frontal pose, which is basically an improved form of the Iterated Closest Point Algorithm [Besl and Mckay, IEEE Trans Pattern Anal Mach Intell 14(2):239–256, 1992]. The latter step produces a residual error value, which can be used as a metric to gauge the similarity between two faces. Our two-step approach is experimentally shown to outperform both of the individual normalization methods in terms of recognition rates, over a very wide range of facial poses. Our strategy has been tested on a large database of 3D facial scans in which the training and test images of each individual were acquired at significantly different times, unlike all except two of the existing 3D face recognition methods. 相似文献

4.

Real-time 3D human pose recovery from a single depth image using principal direction analysis

Dong-Luong Dinh Myeong-Jun Lim Nguyen Duc Thang Sungyoung Lee Tae-Seong Kim 《Applied Intelligence》2014,41(2):473-486

In this paper, we present a novel approach to recover a 3D human pose in real-time from a single depth image using principal direction analysis (PDA). Human body parts are first recognized from a human depth silhouette via trained random forests (RFs). PDA is applied to each recognized body part, which is presented as a set of points in 3D, to estimate its principal direction. Finally, a 3D human pose is recovered by mapping the principal direction to each body part of a 3D synthetic human model. We perform both quantitative and qualitative evaluations of our proposed 3D human pose recovering methodology. We show that our proposed approach has a low average reconstruction error of 7.07 degrees for four key joint angles and performs more reliably on a sequence of unconstrained poses than conventional methods. In addition, our methodology runs at a speed of 20 FPS on a standard PC, indicating that our system is suitable for real-time applications. Our 3D pose recovery methodology is applicable to applications ranging from human computer interactions to human activity recognition. 相似文献

5.

Multiperson interaction recognition in images: A body keypoint based feature image analysis

Amit Verma Toshanlal Meenpal Bibhudendra Acharya 《Computational Intelligence》2021,37(1):461-483

Most interaction recognition approaches have been limited to single‐person action classification in videos. However, for still images where motion information is not available, the task becomes more complex. Aiming to this point, we propose an approach for multiperson human interaction recognition in images with keypoint‐based feature image analysis. Proposed method is a three‐stage framework. In the first stage, we propose feature‐based neural network (FCNN) for action recognition trained with feature images. Feature images are body features, that is, effective distances between a set of body part pairs and angular relation between body part triplets, rearranged in 2D gray‐scale image to learn effective representation of complex actions. In the later stage, we propose a voting‐based method for direction encoding to anticipate probable motion in steady images. Finally, our multiperson interaction recognition algorithm identifies which human pairs are interacting with each other using an interaction parameter. We evaluate our approach on two real‐world data sets, that is, UT‐interaction and SBU kinect interaction. The empirical experiments show that results are better than the state‐of‐the‐art methods with recognition accuracy of 95.83% on UT‐I set 1, 92.5% on UT‐I set 2, and 94.28% on SBU clean data set. 相似文献

6.

Pose-Robust Facial Expression Recognition Using View-Based 2D $+$ 3D AAM 总被引：1，自引：0，他引：1

Sung J. Kim D. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2008,38(4):852-866

This paper proposes a pose-robust face tracking and facial expression recognition method using a view-based 2D 3D active appearance model (AAM) that extends the 2D 3D AAM to the view-based approach, where one independent face model is used for a specific view and an appropriate face model is selected for the input face image. Our extension has been conducted in many aspects. First, we use principal component analysis with missing data to construct the 2D 3D AAM due to the missing data in the posed face images. Second, we develop an effective model selection method that directly uses the estimated pose angle from the 2D 3D AAM, which makes face tracking pose-robust and feature extraction for facial expression recognition accurate. Third, we propose a double-layered generalized discriminant analysis (GDA) for facial expression recognition. Experimental results show the following: 1) The face tracking by the view-based 2D 3D AAM, which uses multiple face models with one face model per each view, is more robust to pose change than that by an integrated 2D 3D AAM, which uses an integrated face model for all three views; 2) the double-layered GDA extracts good features for facial expression recognition; and 3) the view-based 2D 3D AAM outperforms other existing models at pose-varying facial expression recognition. 相似文献

7.

Trajectory-based human action segmentation

Luís Santos Kamrad Khoshhal Jorge Dias 《Pattern recognition》2015

This paper proposes a sliding window approach, whose length and time shift are dynamically adaptable in order to improve model confidence, speed and segmentation accuracy in human action sequences. Activity recognition is the process of inferring an action class from a set of observations acquired by sensors. We address the temporal segmentation problem of body part trajectories in Cartesian Space in which features are generated using Discrete Fast Fourier Transform (DFFT) and Power Spectrum (PS). We pose this as an entropy minimization problem. Using entropy from the classifier output as a feedback parameter, we continuously adjust the two key parameters in a sliding window approach, to maximize the model confidence at every step. The proposed classifier is a Dynamic Bayesian Network (DBN) model where classes are estimated using Bayesian inference. We compare our approach with our previously developed fixed window method. Experiments show that our method accurately recognizes and segments activities, with improved model confidence and faster convergence times, exhibiting anticipatory capabilities. Our work demonstrates that entropy feedback mitigates variability problems, and our method is applicable in research areas where action segmentation and classification is used. A working demo source code is provided online for academical dissemination purposes, by requesting the authors. 相似文献

8.

Simultaneous particle tracking in multi-action motion models with synthesized paths

Norimichi Ukita 《Image and vision computing》2013,31(6-7):448-459

This paper proposes human motion models of multiple actions for 3D pose tracking. A training pose sequence of each action, such as walking and jogging, is separately recorded by a motion capture system and modeled independently. This independent modeling of action-specific motions allows us 1) to optimize each model in accordance with only its respective motion and 2) to improve the scalability of the models. Unlike existing approaches with similar motion models (e.g. switching dynamical models), our pose tracking method uses the multiple models simultaneously for coping with ambiguous motions. For robust tracking with the multiple models, particle filtering is employed so that particles are distributed simultaneously in the models. Efficient use of the particles can be achieved by locating many particles in the model corresponding to an action that is currently observed. For transferring the particles among the models in quick response to changes in the action, transition paths are synthesized between the different models in order to virtually prepare inter-action motions. Experimental results demonstrate that the proposed models improve accuracy in pose tracking. 相似文献

9.

基于姿态估计与GRU网络的人体康复动作识别

闫航陈刚佟瑶姬波胡北辰《计算机工程》2021,47(1):12-20

康复锻炼是脑卒中患者的重要治疗方式,为提高康复动作识别的准确率与实时性,更好地辅助患者在居家环境中进行长期康复训练,结合姿态估计与门控循环单元(GRU)网络提出一种人体康复动作识别算法Pose-AMGRU。采用OpenPose姿态估计方法从视频帧中提取骨架关节点,经过姿态数据预处理后得到表达肢体运动的关键动作特征,并利用注意力机制构建融合三层时序特征的GRU网络实现人体康复动作分类。实验结果表明,该算法在KTH和康复动作数据集中的识别准确率分别为98.14%和100%,且在GTX1060显卡上的运行速度达到14.23frame/s,具有较高的识别准确率与实时性。相似文献

10.

2D representation of facial surfaces for multi-pose 3D face recognition

Yan-Ning ZhangZhe Guo Yong Xia Zeng-Gang LinDavid Dagan Feng 《Pattern recognition letters》2012,33(5):530-536

The increasing availability of 3D facial data offers the potential to overcome the intrinsic difficulties faced by conventional face recognition using 2D images. Instead of extending 2D recognition algorithms for 3D purpose, this letter proposes a novel strategy for 3D face recognition from the perspective of representing each 3D facial surface with a 2D attribute image and taking the advantage of the advances in 2D face recognition. In our approach, each 3D facial surface is mapped homeomorphically onto a 2D lattice, where the value at each site is an attribute that represents the local 3D geometrical or textural properties on the surface, therefore invariant to pose changes. This lattice is then interpolated to generate a 2D attribute image. 3D face recognition can be achieved by applying the traditional 2D face recognition techniques to obtained attribute images. In this study, we chose the pose invariant local mean curvature calculated at each vertex on the 3D facial surface to construct the 2D attribute image and adopted the eigenface algorithm for attribute image recognition. We compared our approach to state-of-the-art 3D face recognition algorithms in the FRGC (Version 2.0), GavabDB and NPU3D database. Our results show that the proposed approach has improved the robustness to head pose variation and can produce more accurate 3D multi-pose face recognition. 相似文献

11.

Aligned texture map creation for pose invariant face recognition

Antonio Rama Francesc Tarrés Jürgen Rurainsky 《Multimedia Tools and Applications》2010,49(3):545-565

In last years, Face recognition based on 3D techniques is an emergent technology which has demonstrated better results than conventional 2D approaches. Using texture (180° multi-view image) and depth maps is supposed to increase the robustness towards the two main challenges in Face Recognition: Pose and illumination. Nevertheless, 3D data should be acquired under highly controlled conditions and in most cases depends on the collaboration of the subject to be recognized. Thus, in applications such as surveillance or control access points, this kind of 3D data may not be available during the recognition process. This leads to a new paradigm using some mixed 2D-3D face recognition systems where 3D data is used in the training but either 2D or 3D information can be used in the recognition depending on the scenario. Following this concept, where only part of the information (partial concept) is used in the recognition, a novel method is presented in this work. This has been called Partial Principal Component Analysis (P²CA) since they fuse the Partial concept with the fundamentals of the well known PCA algorithm. This strategy has been proven to be very robust in pose variation scenarios showing that the 3D training process retains all the spatial information of the face while the 2D picture effectively recovers the face information from the available data. Furthermore, in this work, a novel approach for the automatic creation of 180° aligned cylindrical projected face images using nine different views is presented. These face images are created by using a cylindrical approximation for the real object surface. The alignment is done by applying first a global 2D affine transformation of the image, and afterward a local transformation of the desired face features using a triangle mesh. This local alignment allows a closer look to the feature properties and not the differences. Finally, these aligned face images are used for training a pose invariant face recognition approach (P²CA). 相似文献

12.

基于人体行为3D模型的2D行为识别 总被引：5，自引：1，他引：4

谷军霞丁晓青王生进《自动化学报》2010,36(1):46-53

针对行为识别中行为者朝向变化带来的问题, 提出了一种基于人体行为3D模型的2D行为识别算法. 在学习行为分类器时, 以3D占据网格表示行为样本, 提取人体3D关节点作为描述行为的特征, 为每一类行为训练一个基于范例的隐马尔可夫模型(Exemplar-based hidden Markov model, EHMM), 同时从3D行为样本中选取若干帧作为3D关键姿势集, 这个集合是连接2D观测样本和人体3D关节点特征的桥梁. 在识别2D行为时, 2D观测样本序列可以由一个或多个非标定的摄像机采集. 首先在3D关键姿势集中为每一帧2D观测样本寻找与之最匹配的3D关键姿势帧, 之后由行为分类器对2D观测样本序列对应的3D关键姿势序列进行识别. 该算法在训练行为分类器时要进行行为者的3D重构和人体3D关节点的提取, 而在识别2D行为时不再需要进行3D重构. 通过在3个数据库上的实验, 证明该算法可以有效识别行为者在任意朝向下的行为, 并可以适应不同的行为采集环境. 相似文献

13.

Tracking in object action space

Volker Krüger Dennis Herzog 《Computer Vision and Image Understanding》2013,117(7):764-789

In this paper we focus on the joint problem of tracking humans and recognizing human action in scenarios such as a kitchen scenario or a scenario where a robot cooperates with a human, e.g., for a manufacturing task. In these scenarios, the human directly interacts with objects physically by using/manipulating them or by, e.g., pointing at them such as in “Give me that…”. To recognize these types of human actions is difficult because (a) they ought to be recognized independent of scene parameters such as viewing direction and (b) the actions are parametric, where the parameters are either object-dependent or as, e.g., in the case of a pointing direction convey important information. One common way to achieve recognition is by using 3D human body tracking followed by action recognition based on the captured tracking data. For the kind of scenarios considered here we would like to argue that 3D body tracking and action recognition should be seen as an intertwined problem that is primed by the objects on which the actions are applied. In this paper, we are looking at human body tracking and action recognition from a object-driven perspective. Instead of the space of human body poses we consider the space of the object affordances, i.e., the space of possible actions that are applied on a given object. This way, 3D body tracking reduces to action tracking in the object (and context) primed parameter space of the object affordances. This reduces the high-dimensional joint-space to a low-dimensional action space. In our approach, we use parametric hidden Markov models to represent parametric movements; particle filtering is used to track in the space of action parameters. We demonstrate its effectiveness on synthetic and on real image sequences using human-upper body single arm actions that involve objects. 相似文献

14.

多模态数据的行为识别综述

下载免费PDF全文

王帅琛黄倩张云飞李兴聂云清雒国萃《中国图象图形学报》2022,27(11):3139-3159

行为识别是当前计算机视觉方向中视频理解领域的重要研究课题。从视频中准确提取人体动作的特征并识别动作,能为医疗、安防等领域提供重要的信息,是一个十分具有前景的方向。本文从数据驱动的角度出发,全面介绍了行为识别技术的研究发展,对具有代表性的行为识别方法或模型进行了系统阐述。行为识别的数据分为RGB模态数据、深度模态数据、骨骼模态数据以及融合模态数据。首先介绍了行为识别的主要过程和人类行为识别领域不同数据模态的公开数据集;然后根据数据模态分类,回顾了RGB模态、深度模态和骨骼模态下基于传统手工特征和深度学习的行为识别方法,以及多模态融合分类下RGB模态与深度模态融合的方法和其他模态融合的方法。传统手工特征法包括基于时空体积和时空兴趣点的方法（RGB模态）、基于运动变化和外观的方法（深度模态）以及基于骨骼特征的方法（骨骼模态）等;深度学习方法主要涉及卷积网络、图卷积网络和混合网络,重点介绍了其改进点、特点以及模型的创新点。基于不同模态的数据集分类进行不同行为识别技术的对比分析。通过类别内部和类别之间两个角度对比分析后,得出不同模态的优缺点与适用场景、手工特征法与深度学习法的区别和融合多模态的优势。最后,总结了行为识别技术当前面临的问题和挑战,并基于数据模态的角度提出了未来可行的研究方向和研究重点。相似文献

15.

Scanning 3D full human bodies using Kinects 总被引：3，自引：0，他引：3

Tong J Zhou J Liu L Pan Z Yan H 《IEEE transactions on visualization and computer graphics》2012,18(4):643-650

Depth camera such as Microsoft Kinect, is much cheaper than conventional 3D scanning devices, and thus it can be acquired for everyday users easily. However, the depth data captured by Kinect over a certain distance is of extreme low quality. In this paper, we present a novel scanning system for capturing 3D full human body models by using multiple Kinects. To avoid the interference phenomena, we use two Kinects to capture the upper part and lower part of a human body respectively without overlapping region. A third Kinect is used to capture the middle part of the human body from the opposite direction. We propose a practical approach for registering the various body parts of different views under non-rigid deformation. First, a rough mesh template is constructed and used to deform successive frames pairwisely. Second, global alignment is performed to distribute errors in the deformation space, which can solve the loop closure problem efficiently. Misalignment caused by complex occlusion can also be handled reasonably by our global alignment algorithm. The experimental results have shown the efficiency and applicability of our system. Our system obtains impressive results in a few minutes with low price devices, thus is practically useful for generating personalized avatars for everyday users. Our system has been used for 3D human animation and virtual try on, and can further facilitate a range of home–oriented virtual reality (VR) applications. 相似文献

16.

3D-freehand-pose initialization based on operator’s cognitive behavioral models 总被引：1，自引：0，他引：1

Zhiquan Feng Minming Zhang Zhigeng Pan Bo Yang Tao Xu Haokui Tang Yi Li 《The Visual computer》2010,26(6-8):607-617

Tracking, recognition and interaction based on 3D freehand are a part of our virtual assembly system, in which monocular camera is used to input online freehand videos and the hand pose tracker requires a reliable initial pose in the first frame. A novel approach to initializing 3D pose and position of freehand is put forward in this paper visualization of 3D hand model and modeling the operators’ cognitive behaviors. Our approach is composed of three phases: hand posture recognition, coarse-tuning and fine-tuning. The operator moves his/her hand onto the to meet the needs of our virtual assembly system. The main contribution of this paper is that the three core techniques are for the first time integrated together, including human–computer interaction (HCI) in the process of initializing, projection of the 3D hand model in the period of coarse-tuning time. Then, the computer repeatedly fine-tunes the 3D hand model until the projection of the 3D hand model is completely superimposed onto the operator’s hand image. We focus on exploring and modeling cognitive behavior of operator’s hand upon which we design our initialization algorithm. Our research shows that cognitive behavioral models are not only beneficial to reducing cognitive loads for operators, because it makes the computers cater for the changes of the operators’ hand poses, but also helpful to address high dimensionality of articulated 3D hand model. Our experimental results also show that the approach presented in this paper is easier, more pleasurable and satisfactory experience for the operators. Our initialization system has successfully been applied to our 3D freehand tracking system and a simulation virtual assembly system. 相似文献

17.

Genetic CONDENSATION for motion tracking

Zhu Ye Zhi-Qiang Liu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2007,11(4):349-354

Tracking serves as a means to prepare data for pose estimation and action recognition. The CONDENSATION algorithm is a conditional density propagation method for motion tracking. This algorithm combines factored sampling with learned dynamic models to propagate an entire probability distributes for object position and shape over time. It can accomplish highly robust tracking of object motion. However, it usually requires a large number of samples to ensure a fair maximum likelihood estimation of the current state. In this paper, we use the mutation and crossover operators of the genetic algorithm to find appropriate samples. With this approach, we are able to improve robustness, accuracy and flexibility in CONDENSATION for visual tracking. 相似文献

18.

3D Free-Form Object Recognition Using Indexing by Contour Features 总被引：1，自引：0，他引：1

Jin-Long Chen George C. Stockman 《Computer Vision and Image Understanding》1998,71(3):334-355

We address the problem of recognizing free-form 3D objects from a single 2D intensity image. A model-based solution within the alignment paradigm is presented which involves three major schemes—modeling, matching, and indexing. The modeling scheme constructs a set of model aspects which can predict the object contour as seen from any viewpoint. The matching scheme aligns the edgemap of a candidate model to the observed edgemap using an initial approximate pose. The major contribution of this paper involves the indexing scheme and its integration with modeling and matching to perform recognition. Indexing generates hypotheses specifying both candidate model aspects and approximate pose and scale. Hypotheses are ordered by likelihood based on prior knowledge of pre-stored models and the visual evidence from the observed objects. A prototype implementation has been tested in recognition and localization experiments with a database containing 658 model aspects from twenty 3D objects and eighty 2D objects. Bench tests and simulations show that many kinds of objects can be handled accurately and efficiently even in cluttered scenes. We conclude that the proposed recognition-by-alignment paradigm is a viable approach to many 3D object recognition problems. 相似文献

19.

Efficient 3D reconstruction for face recognition

Dalong Jiang Author Vitae Yuxiao Hu Author Vitae Author Vitae Lei Zhang Author Vitae Hongjiang Zhang Author Vitae Author Vitae 《Pattern recognition》2005,38(6):787-798

Face recognition with variant pose, illumination and expression (PIE) is a challenging problem. In this paper, we propose an analysis-by-synthesis framework for face recognition with variant PIE. First, an efficient two-dimensional (2D)-to-three-dimensional (3D) integrated face reconstruction approach is introduced to reconstruct a personalized 3D face model from a single frontal face image with neutral expression and normal illumination. Then, realistic virtual faces with different PIE are synthesized based on the personalized 3D face to characterize the face subspace. Finally, face recognition is conducted based on these representative virtual faces. Compared with other related work, this framework has following advantages: (1) only one single frontal face is required for face recognition, which avoids the burdensome enrollment work; (2) the synthesized face samples provide the capability to conduct recognition under difficult conditions like complex PIE; and (3) compared with other 3D reconstruction approaches, our proposed 2D-to-3D integrated face reconstruction approach is fully automatic and more efficient. The extensive experimental results show that the synthesized virtual faces significantly improve the accuracy of face recognition with changing PIE. 相似文献

20.

Pose Estimation of 3D Free-Form Contours

Rosenhahn Bodo Perwass Christian Sommer Gerald 《International Journal of Computer Vision》2005,62(3):267-289

相似文献