Similar literature
 20 similar documents found (search time: 15 ms)
1.
Simultaneous tracking and action recognition for single actor human actions   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents an approach to simultaneously tracking the pose and recognizing human actions in a video. This is achieved by combining a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which limits recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both pose estimation and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to be maintained in each frame while avoiding sample impoverishment. In addition, we present an approach to automating the keypose selection task for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset with 500 action sequences, and we show that, compared to DBAN, our algorithm achieves a 6% improvement in accuracy.
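The abstract does not say how the action entropy is turned into a number of states, so the following is only a minimal sketch of one plausible reading: compute the Shannon entropy of the current posterior over actions and scale the number of retained states linearly between two bounds. The function names and the bounds `n_min` and `n_max` are hypothetical and do not come from the paper.

```python
import math

def action_entropy(probs):
    """Shannon entropy (in nats) of a discrete posterior over actions."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def num_states_to_keep(probs, n_min=10, n_max=200):
    """Hypothetical rule: keep more states when the action is ambiguous.

    The normalized entropy (0 = one action is certain, 1 = all actions
    equally likely) is mapped linearly onto [n_min, n_max].
    """
    if len(probs) < 2:
        return n_min
    h_max = math.log(len(probs))  # entropy of the uniform distribution
    ratio = min(1.0, max(0.0, action_entropy(probs) / h_max))
    return n_min + round(ratio * (n_max - n_min))
```

A confident posterior such as `[1.0, 0.0, 0.0, 0.0]` maps to the lower bound, while a uniform posterior maps to the upper bound, which is one way to spend fewer samples on unambiguous frames.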

2.
3D human pose estimation is a very difficult task. In this paper we propose that this problem can be more easily solved by first finding the solutions to a set of easier sub-problems. These are to locally estimate pose conditioned on a fixed root node state, which defines the global position and orientation of the person. The global solution can then be found using information extracted during this procedure. This approach has two key benefits. The first is that each local solution can be found by modeling the articulated object as a kinematic chain, which has far fewer degrees of freedom than alternative models. The second is that by using this approach we can represent, or support, a much larger area of the posterior than is currently possible. This allows far more robust algorithms to be implemented, since there is far less pressure to prune the search space to free up computational resources. We apply this approach to two problems. The first is single-frame monocular 3D pose estimation, where we propose a method to directly extract 3D pose without first extracting any intermediate 2D representation or depending on strong spatial prior models. The second is multi-view 3D tracking, where we show that using the above technique results in an approach that is far more robust than current approaches, without relying on strong temporal prior models. In both domains we demonstrate the strength and versatility of the proposed method.

3.
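Objective: Errors in 2D pose estimation are the main cause of errors in 3D human pose estimation. The key to improving 3D human pose estimation is how to map a 2D pose to the optimal, most plausible 3D pose in the presence of 2D errors or noise. This paper proposes a 3D pose estimation method that combines sparse representation with a deep model, so that the geometric prior of the 3D pose space is combined with temporal information to improve 3D pose estimation accuracy. Method: A 3D deformable shape model incorporating sparse representation is used to obtain a reliable initial 3D estimate for each single frame. A multi-channel long short-term memory (MLSTM) denoising encoder/decoder is constructed, and the single-frame initial 3D estimates are fed into it as a time series. The MLSTM denoising encoder/decoder learns the temporal dependencies of the person's pose between adjacent frames and applies a temporal smoothness constraint, yielding the final optimized 3D pose. Results: Comparative experiments were conducted on the Human3.6M dataset. For two kinds of input data, the 2D coordinates provided by the dataset and the 2D coordinates estimated by a convolutional neural network, the average reconstruction error over video sequences after optimization by the MLSTM denoising encoder/decoder decreased by 12.6% and 13%, respectively, compared with single-frame estimation. Compared with an existing video-based sparse model method, the average reconstruction error on video decreased by 6.4% and 9.1%. For the estimated 2D coordinate data, the average reconstruction error on video decreased by 12.8% compared with an existing deep model method. Conclusion: The proposed combination of an MLSTM denoising encoder/decoder based on temporal information with a sparse model makes effective use of 3D pose prior knowledge and of the temporal and spatial dependencies of the continuously changing human pose across video frames, and improves the accuracy of 3D pose estimation from monocular video to a certain extent.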

4.
We present a system for human pose estimation by using a single frame and without making assumptions on temporal coherence. The system uses 3D voxel data reconstructed from multiple synchronized video streams as input, and computes, for each frame, a skeleton model which best fits the body pose. This system adopts a hierarchical approach where the head and torso locations are found first based on template fitting with their specific shapes and dimensions. It is followed by a limb detection procedure that estimates the pose parameters of four limbs. However, a problem generally faced with skeleton models is the means to find adequate measurements to fit the model. In this paper, voxel data, together with two novel local shape features, are used for this purpose. Experiments show that this system is robust to several perturbations associated with the input data, such as voxel reconstruction errors and complex poses with self-contact, and also allows unconstrained motions, such as fast or unpredictable movements.

5.
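2D action recognition based on 3D models of human actions   Total citations: 4, self-citations: 1, citations by others: 4
To address the problems caused by changes in the actor's orientation in action recognition, a 2D action recognition algorithm based on 3D models of human actions is proposed. When learning the action classifiers, action samples are represented as 3D occupancy grids, the 3D joint points of the human body are extracted as features describing the action, and an exemplar-based hidden Markov model (EHMM) is trained for each action class. At the same time, a number of frames are selected from the 3D action samples to form a set of 3D key poses; this set is the bridge connecting the 2D observation samples and the 3D joint features. When recognizing 2D actions, the sequence of 2D observation samples may be captured by one or more uncalibrated cameras. First, for each frame of the 2D observation samples, the best-matching 3D key pose frame is found in the 3D key pose set; the action classifiers then recognize the 3D key pose sequence corresponding to the 2D observation sequence. The algorithm requires 3D reconstruction of the actor and extraction of 3D joint points when training the action classifiers, but 3D reconstruction is no longer needed when recognizing 2D actions. Experiments on three databases show that the algorithm can effectively recognize actions with the actor in arbitrary orientations and can adapt to different action capture environments.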

6.
7.
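3D model-based hand tracking in augmented reality applications   Total citations: 2, self-citations: 0, citations by others: 2
This paper presents a stepwise iterative method based on a 3D model for estimating and tracking global and local hand motion. The hand position is approximated by the palm shape obtained with the ICP (Iterative Closest Point) algorithm and a factorization method. Incorporating natural hand motion constraints, a sequential Monte Carlo algorithm is used to track finger motion. Finally, an algorithm that iterates between pose estimation and finger joint tracking yields an accurate estimate of the hand configuration. Experiments confirm that the method is accurate and robust for natural hand motion.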

8.
Guo Chuan, Zuo Xinxin, Wang Sen, Liu Xinshuang, Zou Shihao, Gong Minglun, Cheng Li. International Journal of Computer Vision, 2022, 130(2): 285-315

We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories. The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearance. This is achieved by a two-step process, action2motion and motion2video, that maintains internal 3D pose and shape representations. Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos. Specifically, Lie algebraic theory is used to represent natural human motions that follow the physical laws of human kinematics, and a temporal variational auto-encoder is developed that encourages diversity of the output motions. Moreover, given an additional input image of a clothed human character, an entire pipeline is proposed to extract his or her detailed 3D shape and to render the plausible motions in videos from different views. This is realized by improving existing methods to extract 3D human shapes and textures from single 2D images, followed by rigging, animating, and rendering to form 2D videos of human motions. It also necessitates the curation and reannotation of 3D human motion datasets for training purposes. Thorough empirical experiments, including an ablation study and qualitative and quantitative evaluations, demonstrate the applicability of our approach and its competitiveness in addressing related tasks, where components of our approach compare favorably with the state of the art.


9.
We present an approach which exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve single-view 3D scene understanding approaches. The proposed method is validated on monocular time-lapse sequences from YouTube and still images of indoor scenes gathered from the Internet. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.

10.
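Markerless 3D human motion tracking for monocular video   Total citations: 2, self-citations: 2, citations by others: 0
In markerless human motion tracking, the tracked target lacks distinctive features and the background is complex, so the tracked pose deviates substantially from the ground truth and long video sequences cannot be tracked. To address this, a 3D human motion tracking algorithm for monocular video based on deformable appearance template matching is proposed, in which the human appearance model consists of a 3D human skeleton model and a 2D cardboard model. First, the Euler angles of joint rotation are computed by inverse kinematics under human skeletal proportion constraints. Next, forward kinematics is used to obtain the 3D coordinates of the pixels in the cardboard model, and these pixels are projected into the 2D image according to the camera imaging model to obtain the deformed appearance template. Finally, histogram matching is used to obtain the tracking result. Experimental results show that the algorithm obtains fairly good tracking results for some complex, long human motion sequences, and it can be applied in areas such as human-computer interaction and animation production.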
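The abstract does not specify which camera imaging model is used to project the cardboard-model pixels into the image. As an illustration only, the sketch below assumes the standard pinhole model with a focal length and principal point expressed in pixels; the function name and parameters are hypothetical.

```python
def project_pinhole(point, focal, cx=0.0, cy=0.0):
    """Project a 3D point in camera coordinates onto the image plane.

    Assumes an ideal pinhole camera: u = f * X / Z + cx, v = f * Y / Z + cy.
    The point must lie in front of the camera (Z > 0).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)
```

For example, `project_pinhole((1.0, 2.0, 4.0), focal=8.0)` gives the image point `(2.0, 4.0)`; applying this to every 3D pixel position produced by forward kinematics would yield a projected template of the kind the abstract describes.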

11.
Recovering 3D human body configurations using shape contexts   Total citations: 3, self-citations: 0, citations by others: 3
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently; tracking then becomes repeated recognition. We present results on a variety of data sets.

12.
Network space has gone beyond the map. Many critics claim that Geographic Information Systems (GIS) determine the environment. New geotechnical technologies such as embedded computer systems and tools, cyberspace network space reconstruction, and aerial photography allow network space to obtain large amounts of back-specified 3D data, but these 3D technologies are environmentally friendly, help deal with positive criticism, and can connect GIS and 3D. The approach back-specifies the 3D model with basic data and adds a ground-based network space reconstruction to the traditional GIS aerial view. This paper proposes to place GIS and 3D in an index frame and wrap the environmental and network space elements using 3D GIS for the network space reconstruction of the new landscape archeological network space. 3D models and displays are used in scene architectonic examination and planning. These scene models, close to drawings and guides, are urgent during plan examination and exploration by the scene engineering program. In this regard, models offer scene scientists and creators of various activity methods for visual reasoning and visual correspondence. This contribution investigates the different elements of (enlarged) models for scene designers, incorporating investigation, affirmation, union and introduction regarding use and cycle. It presents a typology of models depending on their exhibition (instrumental use), exemplified by a few cases. Even though the attention here is on the function of models in scene engineering, the contention created in this paper is additionally pertinent to the firmly related controls of engineering and metropolitan plan and accordingly replaceable.

13.
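To address the sensitivity of 2D face recognition to pose and illumination changes, a 2D face recognition method based on 3D data and mixture of multi-scale singular value (MMSV) features is proposed. In the training stage, 3D face data and an illumination model are used to generate a large number of virtual 2D images with different poses and illumination conditions, laying the foundation for constructing a complete set of feature templates. Subset partitioning effectively alleviates the nonlinearity problem in face feature extraction. Finally, MMSV features are extracted from the face images, fusing the global and local features of the face. In the recognition stage, classification is performed by computing distances between MMSV feature subspaces. Experiments show that the extracted MMSV features contain more discriminative information and are robust to pose and illumination changes. The method achieves a recognition rate of about 98.4% on the WHU-3D database.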

14.
Tracking human body poses in monocular video has many important applications. The problem is challenging in realistic scenes due to background clutter, variation in human appearance and self-occlusion. The complexity of pose tracking is further increased when there are multiple people whose bodies may inter-occlude. We propose a three-stage approach with multi-level state representation that enables a hierarchical estimation of 3D body poses. Our method addresses various issues including automatic initialization, data association, and self- and inter-occlusion. In the first stage, humans are tracked as foreground blobs and their positions and sizes are coarsely estimated. In the second stage, parts such as face, shoulders and limbs are detected using various cues and the results are combined by a grid-based belief propagation algorithm to infer 2D joint positions. The derived belief maps are used as proposal functions in the third stage to infer the 3D pose using data-driven Markov chain Monte Carlo. Experimental results on several realistic indoor video sequences show that the method is able to track multiple persons during complex movement, including sitting and turning movements with self- and inter-occlusion.

15.
In this paper, we present a novel approach to recover a 3D human pose in real-time from a single depth image using principal direction analysis (PDA). Human body parts are first recognized from a human depth silhouette via trained random forests (RFs). PDA is applied to each recognized body part, which is represented as a set of points in 3D, to estimate its principal direction. Finally, a 3D human pose is recovered by mapping the principal direction to each body part of a 3D synthetic human model. We perform both quantitative and qualitative evaluations of our proposed 3D human pose recovery methodology. We show that our proposed approach has a low average reconstruction error of 7.07 degrees for four key joint angles and performs more reliably on a sequence of unconstrained poses than conventional methods. In addition, our methodology runs at a speed of 20 FPS on a standard PC, indicating that our system is suitable for real-time applications. Our 3D pose recovery methodology is applicable to applications ranging from human-computer interaction to human activity recognition.
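The abstract does not give the details of PDA. One common way to estimate the principal direction of a 3D point set is to take the dominant eigenvector of its covariance matrix, and the sketch below does that with power iteration in plain Python. Treating PDA as equivalent to this is an assumption, and the function name is hypothetical.

```python
import math

def principal_direction(points, iters=100):
    """Return a unit vector along the dominant axis of a 3D point set.

    Uses power iteration on the 3x3 covariance matrix. If the start
    vector happens to be orthogonal to the dominant axis, or all points
    coincide, the start vector is returned unchanged.
    """
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    s = 1.0 / math.sqrt(3.0)
    v = [s, s, s]  # arbitrary unit start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v
```

For points spread mostly along the x-axis, such as a forearm segment seen side-on, the returned vector is close to `(1, 0, 0)` up to sign; that direction could then be mapped onto the corresponding limb of a body model.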

16.
We propose a framework to reconstruct the 3D pose of a human for animation from a sequence of single-view video frames. The framework for pose construction starts with background estimation, after which the performer's silhouette is extracted from each frame using image subtraction. Then the body silhouettes are automatically labeled using a model-based approach. Finally, the 3D pose is constructed from the labeled human silhouette by assuming orthographic projection. The proposed approach does not require camera calibration. It assumes that the input video has a static background, that there are no significant perspective effects, and that the performer is in an upright position. The proposed approach requires minimal user interaction.
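The abstract does not state how depth is recovered under the orthographic assumption. A standard relation in this setting is foreshortening: if a body segment has known length and its image projection is shorter, the difference gives the magnitude of the relative depth between its endpoints. The sketch below illustrates that relation; the function name and the `scale` parameter (image units per world unit) are hypothetical and not taken from the paper.

```python
import math

def relative_depth(p1, p2, length, scale=1.0):
    """Magnitude of the depth offset between two joints of one segment.

    Under scaled orthographic projection, a segment of 3D length L whose
    endpoints project to p1 and p2 satisfies
        dz^2 = L^2 - ((u1 - u2)^2 + (v1 - v2)^2) / scale^2.
    The sign of dz (towards or away from the camera) is ambiguous.
    """
    du = (p1[0] - p2[0]) / scale
    dv = (p1[1] - p2[1]) / scale
    sq = length * length - (du * du + dv * dv)
    if sq < 0:
        raise ValueError("projected segment is longer than the 3D length")
    return math.sqrt(sq)
```

For instance, a segment of length 5 that projects to an image length of 3 has a relative depth of 4 between its endpoints. The sign ambiguity at every segment is why methods of this kind usually need extra constraints or user input to pick a single pose.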

17.
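A method is proposed for generating realistic video from a user-specified human motion and viewpoint. First, a multi-view video database of an actor performing a small number of basic motions is captured, and markerless motion capture is used to obtain the skeleton and 3D model of the body at every time instant. Next, the user specifies a motion for the human skeleton and sets a viewpoint, thereby defining the target video. Experimental results verify that the method can use a limited database to synthesize realistic video of the actor under the user-specified motion and viewpoint.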

18.
In this paper, we introduce a method to estimate the object’s pose from multiple cameras. We focus on direct estimation of the 3D object pose from 2D image sequences. Scale-Invariant Feature Transform (SIFT) is used to extract corresponding feature points from adjacent images in the video sequence. We first demonstrate that centralized pose estimation from the collection of corresponding feature points in the 2D images from all cameras can be obtained as a solution to a generalized Sylvester’s equation. We subsequently derive a distributed solution to pose estimation from multiple cameras and show that it is equivalent to the solution of the centralized pose estimation based on Sylvester’s equation. Specifically, we rely on collaboration among the multiple cameras to provide an iterative refinement of the independent solution to pose estimation obtained for each camera based on Sylvester’s equation. The proposed approach to pose estimation from multiple cameras relies on all of the information available from all cameras to obtain an estimate at each camera even when the image features are not visible to some of the cameras. The resulting pose estimation technique is therefore robust to occlusion and sensor errors from specific camera views. Moreover, the proposed approach does not require matching feature points among images from different camera views nor does it demand reconstruction of 3D points. Furthermore, the computational complexity of the proposed solution grows linearly with the number of cameras. Finally, computer simulation experiments demonstrate the accuracy and speed of our approach to pose estimation from multiple cameras.

19.
We introduce a framework for unconstrained 3D human upper body pose estimation from multiple camera views in complex environments. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration and model texture adaptation. Single-frame pose recovery consists of a hypothesis generation stage, in which candidate 3D poses are generated, based on probabilistic hierarchical shape matching in each camera view. In the subsequent hypothesis verification stage, the candidate 3D poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration consists of computing K-best trajectories combining a motion model and observations in a Viterbi-style maximum-likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape likelihood measure used for pose recovery. The multiple trajectory hypotheses are used to generate pose predictions, augmenting the 3D pose candidates generated at the next time step.
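The temporal integration step is described as a Viterbi-style maximum-likelihood search over pose candidates. The sketch below shows only the single-best case (K = 1) of such a search, not the K-best variant of the paper. The inputs `obs_loglik` (per-frame candidate log-likelihoods) and `trans_loglik` (a motion-model log-probability function) are hypothetical stand-ins for the paper's observation and motion models.

```python
def viterbi_best_trajectory(obs_loglik, trans_loglik):
    """Return the index of the chosen pose candidate for each frame.

    obs_loglik[t][j]     : log-likelihood of candidate j at frame t.
    trans_loglik(t, i, j): log-probability of moving from candidate i at
                           frame t-1 to candidate j at frame t.
    """
    score = list(obs_loglik[0])
    back = []  # back[t-1][j] = best predecessor of candidate j at frame t
    for t in range(1, len(obs_loglik)):
        new_score, ptr = [], []
        for j, obs in enumerate(obs_loglik[t]):
            best_i = max(range(len(score)),
                         key=lambda i: score[i] + trans_loglik(t, i, j))
            new_score.append(score[best_i] + trans_loglik(t, best_i, j) + obs)
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Trace back from the best final candidate.
    j = max(range(len(score)), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path
```

With a motion model that penalizes switching between candidates, the search prefers a temporally consistent trajectory even when a single frame favors a different candidate, which is the effect temporal integration is meant to achieve.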

20.
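Estimation of human upper-limb pose and analysis of multiple solutions   Total citations: 1, self-citations: 0, citations by others: 1
Ren Haibing, Xu Guangyou. Journal of Software (软件学报), 2002, 13(11): 2127-2133
The human body is a complex deformable object with many structural degrees of freedom. Starting from 2D appearance features alone, it is difficult to analyze and recognize fine-grained actions, let alone understand the user's intent. Because gestures are a primary means of communication, the human upper limbs are modeled on the basis of a 3D model and their 3D pose is reconstructed. With practicality and minimal introduced error in mind, the minimum conditions required to reconstruct upper-limb pose are analyzed. Under these conditions, an end-determined articulate model and its corresponding system of equations are proposed to estimate the 3D coordinates of each joint. The maximum possible number of solutions of the system is then analyzed, and a corresponding solution method is given. Finally, the resulting error and the pose are used to check the plausibility of the solutions.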
