Similar articles
 20 similar articles found
1.
Human Body Model Acquisition and Tracking Using Voxel Data   Total citations: 1 (self-citations: 0, citations by others: 1)
We present an integrated system for automatic acquisition of the human body model and motion tracking using input from multiple synchronized video streams. The video frames are segmented and the 3D voxel reconstructions of the human body shape in each frame are computed from the foreground silhouettes. These reconstructions are then used as input to the model acquisition and tracking algorithms. The human body model consists of ellipsoids and cylinders and is described using the twists framework, resulting in a non-redundant set of model parameters. Model acquisition starts with a simple body part localization procedure based on template fitting and growing, which uses prior knowledge of average body part shapes and dimensions. The initial model is then refined using a Bayesian network that imposes human body proportions onto the body part size estimates. The tracker is an extended Kalman filter that estimates model parameters based on the measurements made on the labeled voxel data. A voxel labeling procedure that handles large frame-to-frame displacements was designed, resulting in very robust tracking performance. Extensive evaluation shows that the system performs very reliably on sequences that include different types of motion such as walking, sitting, dancing, running and jumping, and on people of very different body sizes, from a nine-year-old girl to a tall adult male.
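The silhouette-based voxel reconstruction step can be illustrated with a minimal sketch. It assumes hypothetical orthographic cameras and silhouettes given as sets of foreground pixels; the name `carve_voxels` and the toy data are illustrative only and are not taken from the paper, which does not describe its calibration or segmentation here.

```python
from itertools import product


def carve_voxels(grid_size, views):
    """Keep the voxels whose projection falls inside every silhouette.

    views is a list of (project, silhouette) pairs: project maps a voxel
    index (x, y, z) to a pixel (u, v), and silhouette is the set of
    foreground pixels for that view. Both are assumptions of this sketch.
    """
    occupied = []
    for voxel in product(range(grid_size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            occupied.append(voxel)
    return occupied


# Two hypothetical orthographic cameras: one looks along z, one along x.
front = (lambda v: (v[0], v[1]), {(1, 1), (1, 2)})
side = (lambda v: (v[2], v[1]), {(0, 1), (0, 2), (1, 2)})
voxels = carve_voxels(3, [front, side])
```

A voxel survives only if no view sees background at its projection, so adding views can only shrink the reconstruction.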

2.
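Video interpolation synthesizes intermediate frames from the image information of adjacent video frames and is directly applicable to slow-motion video playback, high-frame-rate video synthesis, and animation production. Existing video interpolation models based on deep voxel flow suffer from low synthesis accuracy and large parameter counts, which limits their deployment on mobile devices. This paper proposes a compression-driven refined deep voxel flow interpolation model. A deep voxel flow model is pre-trained to improve interpolation quality and determine high-accuracy parameters; sparse compression is then used to prune the number of convolution channels, which reduces the parameter count and yields a coarse voxel flow. The input video frames, the coarse voxel flow, and the coarse intermediate frame are fed together into a fine voxel flow network to obtain a fine voxel flow. On this basis, the fine intermediate frame is computed by trilinear interpolation, which strengthens the model's ability to capture edge information and thereby improves the quality of the intermediate frame. Experiments on the Vimeo 90K and UCF101 datasets show that, compared with models such as DVF, SepConv, and CDFI, the proposed model improves peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by 1.59 dB and 0.015 on average, respectively, and effectively improves video synthesis quality while keeping the increase in parameter count small.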
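The peak signal-to-noise ratio reported above follows the standard definition, PSNR = 10 log10(peak^2 / MSE). A minimal sketch, assuming 8-bit intensities (peak 255) and images flattened to equal-length lists; the function name `psnr` is illustrative and the paper's exact evaluation protocol is not specified here.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a gain of 1.59 dB corresponds to a mean squared error roughly 31% lower (10^(-0.159) ≈ 0.69).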

3.
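赵威, 李毅. 《计算机应用》 (Journal of Computer Applications), 2022, 42(9): 2830-2837
To generate more accurate and fluid virtual human animation, a Kinect device is used to capture 3D human pose data while a monocular 3D human pose estimation algorithm infers skeleton point data from the Kinect color stream, so that pose estimation is refined in real time and drives a virtual character model to generate animation. First, a spatio-temporally optimized method for processing skeleton point data is proposed to improve the stability of monocular 3D human pose estimation. Second, a pose estimation method that fuses Kinect with the Occlusion-Robust Pose-Maps (ORPM) algorithm is proposed to address Kinect's occlusion problem. Finally, a virtual human animation system based on quaternion vector interpolation and inverse kinematics constraints is developed, which supports motion simulation and real-time animation generation. Compared with generating animation from Kinect motion capture alone, the proposed method yields more robust pose estimation data and a degree of occlusion resistance; compared with ORPM-based animation generation, it raises the frame rate by two times and produces more realistic and fluid animation.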
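Quaternion interpolation between two joint orientations can be sketched as follows. The abstract does not say which interpolation scheme is used, so spherical linear interpolation (slerp) is an assumption here, as are the function name and the (w, x, y, z) component order.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly parallel inputs: linear interpolation avoids dividing by ~0.
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)  # a 45 degree turn about z
```

Interpolating on the unit sphere keeps the angular velocity constant between key poses, which is one reason slerp is a common choice for skeletal animation.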

4.
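Extracting specific motion frames from sports video is a key step toward intelligent sports instruction. To obtain specific motion frames for further video analysis, a method for extracting them from sports video is proposed, drawing on pose estimation and clustering. First, the HRNet pose estimation model is chosen as the basis. HRNet is accurate but too large for practical use, so it is made lightweight and combined with DARK data encoding, giving the Small-HRNet network model, which reduces the parameter count by 82.0% while largely preserving accuracy. Small-HRNet is then used to extract human joint points from the video; the human skeleton features of each video frame serve as the sample points for clustering, and the skeleton features of standard motion frames serve as the cluster centers. Clustering the whole video yields its specific motion frames. In experiments on a martial arts dataset, the method extracts martial arts action frames with an accuracy of 87.5%, showing that it extracts such frames effectively.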
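The idea of using standard motion frames as fixed cluster centers can be sketched as a nearest-center selection. This is a simplification under stated assumptions: skeleton features are plain coordinate tuples, the distance is Euclidean, and the name `pick_key_frames` is hypothetical; the abstract does not give the actual feature definition or clustering procedure.

```python
import math


def pick_key_frames(frame_features, reference_poses):
    """For each reference pose, return the index of the closest video frame."""
    key_frames = []
    for ref in reference_poses:
        distances = [math.dist(feat, ref) for feat in frame_features]
        key_frames.append(distances.index(min(distances)))
    return key_frames


# Toy 2-D "skeleton features" for four frames and two reference poses.
frames = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
references = [(1.2, 0.9), (8.0, 8.0)]
selected = pick_key_frames(frames, references)
```

Fixing the centers in advance means each extracted frame corresponds to a known standard action rather than to an arbitrary cluster found by the data.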

5.
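Objective: Errors in 2D pose estimation are the main source of error in 3D human pose estimation. The key to improving 3D human pose estimation is how to map a 2D pose to the optimal and most plausible 3D pose in the presence of 2D error or noise. This paper proposes a 3D pose estimation method that combines sparse representation with a deep model, so as to fuse the geometric prior of the 3D pose space with temporal information and improve 3D pose estimation accuracy. Method: A 3D deformable shape model incorporating sparse representation provides a reliable initial 3D estimate for each single frame. A multi-channel long short term memory (MLSTM) denoising encoder/decoder is constructed, and the single-frame initial 3D estimates are fed into it as a time series. The MLSTM denoising encoder/decoder learns the temporal dependence of the person's pose between adjacent frames and imposes a temporal smoothness constraint, producing the final optimized 3D pose. Results: Comparative experiments were conducted on the Human3.6M dataset. For two kinds of input data, the 2D coordinates provided with the dataset and 2D coordinates estimated by a convolutional neural network, the average reconstruction error over video sequences after optimization with the MLSTM denoising encoder/decoder fell by 12.6% and 13%, respectively, relative to single-frame estimation. Relative to an existing video-based sparse model method, the average reconstruction error of the proposed method fell by 6.4% and 9.1%. For the estimated 2D coordinates, the average reconstruction error fell by 12.8% relative to an existing deep model method. Conclusion: The proposed temporal MLSTM denoising encoder/decoder, combined with the sparse model, makes effective use of prior knowledge of 3D poses and of the temporal and spatial dependence of continuously changing poses across video frames, and improves the accuracy of monocular video 3D pose estimation to some extent.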

6.
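To address the frequent traffic accidents caused by dangerous driving behavior, a dangerous driving posture recognition system based on MobileNetV3 and ST-SRU is proposed. First, the MobileNetV3 network structure is modified to suit human pose estimation: it outputs heatmaps and offset maps for the joints, which are used to estimate the 2D coordinates of J joints. Second, the ST-SRU skeleton-based action recognition algorithm is defined, which classifies actions from skeleton sequence data. Experimental results show that the MobileNetV3 pose estimation algorithm achieves a PCP (percentage of correct parts) of 95.6% on a self-built AI Challenger upper-limb pose dataset, and 1,000 test runs take only 5.03 s. The pose estimation and action recognition models, trained using a self-built dangerous driving behavior dataset, are ported to an embedded platform, yielding a real-time dangerous driving posture recognition system.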
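Decoding a joint position from a heatmap and an offset map can be sketched as below. The convention used (take the heatmap peak, scale by the output stride, then add the offset stored at that cell) is a common one and is an assumption here; the abstract does not state the exact decoding rule, and `decode_keypoint` is a hypothetical name.

```python
def decode_keypoint(heatmap, offset_x, offset_y, stride):
    """Return (x, y) in input-image pixels for one joint.

    heatmap, offset_x and offset_y are 2-D lists of the same shape;
    stride is the assumed downsampling factor of the network output.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Locate the most confident heatmap cell.
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    # Coarse position from the cell index, refined by the predicted offset.
    return (c * stride + offset_x[r][c], r * stride + offset_y[r][c])


heat = [[0.1, 0.2], [0.9, 0.3]]
off_x = [[0.0, 0.0], [3.0, 0.0]]
off_y = [[0.0, 0.0], [1.5, 0.0]]
joint = decode_keypoint(heat, off_x, off_y, stride=8)
```

The offset map recovers sub-cell precision that a low-resolution heatmap alone would lose, which matters for lightweight backbones with coarse outputs. Repeating the call for each of the J heatmaps gives all joint coordinates.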

7.
In articulated tracking, one is concerned with estimating the pose of a person in every frame of a film. This pose is most often represented as a kinematic skeleton where the joint angles are the degrees of freedom. Least-committed predictive models are then phrased as a Brownian motion in joint angle space. However, the metric of the joint angle space is rather unintuitive as it ignores both bone lengths and how bones are connected. As Brownian motion is strongly linked with the underlying metric, this has severe impact on the predictive models. We introduce the spatial kinematic manifold of joint positions, which is embedded in a high dimensional Euclidean space. This Riemannian manifold inherits the metric from the embedding space, such that distances are measured as the combined physical length that joints travel during movements. We then develop a least-committed Brownian motion model on the manifold that respects the natural metric. This model is expressed in terms of a stochastic differential equation, which we solve using a novel numerical scheme. Empirically, we validate the new model in a particle filter based articulated tracking system. Here, we not only outperform the standard Brownian motion in joint angle space, we are also able to specialise the model in ways that otherwise are both difficult and expensive in joint angle space.
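The remark that the joint angle metric ignores bone lengths and connectivity can be made concrete with a small sketch. It assumes a planar two-bone chain and uses the straight-line (chord) distance between stacked joint positions as a stand-in for the path length on the manifold; the function names are hypothetical and this is not the paper's numerical scheme.

```python
import math


def joint_positions(angles, bone_lengths):
    """Forward kinematics of a planar chain; returns each joint's position."""
    x = y = heading = 0.0
    positions = []
    for angle, length in zip(angles, bone_lengths):
        heading += angle  # each joint rotates everything further down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


def spatial_distance(angles_a, angles_b, bone_lengths):
    """Euclidean distance between two poses in stacked joint-position space."""
    pa = joint_positions(angles_a, bone_lengths)
    pb = joint_positions(angles_b, bone_lengths)
    return math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(pa, pb))
    )


bones = [1.0, 1.0]
rest = [0.0, 0.0]
root_move = spatial_distance(rest, [0.1, 0.0], bones)  # rotate the first joint
tip_move = spatial_distance(rest, [0.0, 0.1], bones)   # rotate the second joint
```

Both perturbations have the same size (0.1 rad) in joint angle space, yet rotating the root joint moves the joints more than twice as far in space as rotating the last joint.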

8.
Creating and animating subject‐specific anatomical models is traditionally a difficult process involving medical image segmentation, geometric corrections and the manual definition of kinematic parameters. In this paper, we introduce a novel template morphing algorithm that facilitates three‐dimensional modelling and parameterization of skeletons. Target data can be either medical images or surfaces of the whole skeleton. We incorporate prior knowledge about bone shape, the feasible skeleton pose and the morphological variability in the population. This allows for noise reduction, bone separation and the transfer, from the template, of anatomical and kinematical information not present in the input data. Our approach treats both local and global deformations in successive regularization steps: smooth elastic deformations are represented by an as‐rigid‐as‐possible displacement field between the reference and current configuration of the template, whereas global and discontinuous displacements are estimated through a projection onto a statistical shape model and a new joint pose optimization scheme with joint limits.

9.
In this paper, we address simultaneous markerless motion and shape capture from 3D input meshes of partial views onto a moving subject. We exploit a computer graphics model based on kinematic skinning as template tracking model. This template model consists of vertices, joints and skinning weights learned a priori from registered full‐body scans, representing true human shape and kinematics‐based shape deformations. Two data‐driven priors are used together with a set of constraints and cues for setting up sufficient correspondences. A Gaussian mixture model‐based pose prior of successive joint configurations is learned to soft‐constrain the attainable pose space to plausible human poses. To make the shape adaptation robust to outliers and non‐visible surface regions and to guide the shape adaptation towards realistically appearing human shapes, we use a mesh‐Laplacian‐based shape prior. Both priors are learned/extracted from the training set of the template model learning phase. The output is a model adapted to the captured subject with respect to shape and kinematic skeleton as well as the animation parameters to resemble the observed movements. With example applications, we demonstrate the benefit of such footage. Experimental evaluations on publicly available datasets show the achieved natural appearance and accuracy.

10.
We present a novel approach for 3D human body shape model adaptation to a sequence of multi-view images, given an initial shape model and initial pose sequence. In a first step, the most informative frames are determined by optimization of an objective function that maximizes a shape–texture likelihood function and a pose diversity criterion (i.e. the model surface area that lies close to the occluding contours), in the selected frames. Thereafter, a batch-mode optimization is performed of the underlying shape- and pose-parameters, by means of an objective function that includes both contour and texture cues over the selected multi-view frames. Using the above approach, we implement automatic pose and shape estimation using a three-step procedure: first, we recover initial poses over a sequence using an initial (generic) body model. Both model and poses then serve as input to the above-mentioned adaptation process. Finally, a more accurate pose recovery is obtained by means of the adapted model. We demonstrate the effectiveness of our frame selection, model adaptation and integrated pose and shape recovery procedure in experiments using both challenging outdoor data and the HumanEva data set.

11.
3D mapping with multi-resolution occupied voxel lists   Total citations: 1 (self-citations: 0, citations by others: 1)
Most current navigation algorithms in mobile robotics produce 2D maps from data provided by 2D sensors. In large part this is due to the limited availability of suitable 3D sensors and the difficulty of managing the large amount of data supplied by 3D sensors. This paper presents a novel, multi-resolution algorithm that aligns 3D range data stored in occupied voxel lists so as to facilitate the construction of 3D maps. Multi-resolution occupied voxel lists (MROL) are voxel based data structures that efficiently represent 3D scan and map information. The process described in this research can align a sequence of scans to produce maps and localise a range sensor within a prior global map. An office environment (200 square metres) is mapped in 3D at 0.02 m resolution, resulting in a 200,000 voxel occupied voxel list. Global localisation within this map, with no prior pose estimate, is completed in 5 seconds on a 2 GHz processor. The MROL based sequential scan matching is compared to a standard iterative closest point (ICP) implementation with an error in the initial pose estimate of plus or minus 1 m and 90 degrees. MROL correctly scan matches 94% of scans to within 0.1 m as opposed to ICP with 30% within 0.1 m.
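An occupied voxel list at several resolutions can be sketched as sets of integer voxel indices. This is a minimal illustration under assumptions: each coarser level doubles the voxel edge length, points are (x, y, z) tuples in metres, and the function names are hypothetical; the MROL data structure in the paper may be organised differently.

```python
import math


def occupied_voxels(points, resolution):
    """Quantize 3-D points to the set of voxel indices they occupy."""
    return {tuple(math.floor(c / resolution) for c in p) for p in points}


def multi_resolution_lists(points, finest, levels):
    """Return [(resolution, occupied set), ...] from finest to coarsest."""
    return [
        (finest * 2 ** k, occupied_voxels(points, finest * 2 ** k))
        for k in range(levels)
    ]


scan = [(0.01, 0.01, 0.01), (0.03, 0.01, 0.01), (0.05, 0.05, 0.05)]
lists = multi_resolution_lists(scan, finest=0.02, levels=2)
```

Storing only occupied cells keeps memory proportional to the observed surface rather than to the mapped volume, and the coarse levels allow a cheap first alignment before refinement at 0.02 m.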

12.
LEGO is a globally popular toy composed of colorful interlocking plastic bricks that can be assembled in many ways; however, this special feature makes designing a LEGO sculpture particularly challenging. Building a stable sculpture is not easy for a beginner; even an experienced user requires a good deal of time to build one. This paper provides a novel approach to creating a balanced LEGO sculpture for a 3D model in any pose, using centroid adjustment and inner engraving. First, the input 3D model is transformed into a voxel data structure. Next, the model’s centroid is adjusted to an appropriate position using inner engraving to ensure that the model stands stably. A model can stand stably without any struts when the center of mass is moved to the ideal position. Third, voxels are merged into layer-by-layer brick layout assembly instructions. Finally, users will be able to build a LEGO sculpture by following these instructions. The proposed method is demonstrated with a number of LEGO sculptures and the results of the physical experiments are presented.
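The stability condition behind centroid adjustment can be sketched with a simple test: the centre of mass must project inside the support region. The sketch assumes unit voxels of uniform density and approximates the support region by the bounding box of the bottom layer; both are simplifications, and the function names are hypothetical.

```python
def centroid(voxels):
    """Centre of mass of unit voxels with uniform density (an assumption)."""
    n = len(voxels)
    return tuple(sum(v[i] + 0.5 for v in voxels) / n for i in range(3))


def stands_stably(voxels):
    """True if the centroid projects into the bottom layer's bounding box."""
    cx, cy, _ = centroid(voxels)
    z_min = min(v[2] for v in voxels)
    base = [v for v in voxels if v[2] == z_min]
    x_lo, x_hi = min(v[0] for v in base), max(v[0] for v in base) + 1
    y_lo, y_hi = min(v[1] for v in base), max(v[1] for v in base) + 1
    return x_lo <= cx <= x_hi and y_lo <= cy <= y_hi


# A one-voxel foot carrying a long overhang, then the same model with
# mass removed from the overhanging side.
overhang = [(0, 0, 0), (0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1)]
trimmed = [(0, 0, 0), (0, 0, 1), (1, 0, 1)]
```

Here `stands_stably(overhang)` is false and `stands_stably(trimmed)` is true. Inner engraving pursues the same effect without changing the outer shape, by hollowing out interior voxels on the heavy side.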

13.
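To address the severe degradation that motion blur causes in video frame interpolation, a new method for interpolating blurry video is proposed. First, a multi-task fusion convolutional neural network is proposed, consisting of two modules: a deblurring module and a frame interpolation module. The deblurring module is a deep convolutional neural network (CNN) built from stacked residual blocks, which extracts and learns deep blur features to remove motion blur from the two input frames. The interpolation module estimates the voxel flow between frames, and the resulting voxel flow guides trilinear interpolation of pixels to synthesize the intermediate frame. Second, a large simulated blurry-video dataset is created, and a training strategy that first trains the modules separately and then jointly, from coarse to fine, is proposed; experiments show that this strategy helps the multi-task network converge effectively. Finally, compared with combinations of state-of-the-art deblurring and interpolation algorithms, the proposed method improves the peak signal-to-noise ratio of the synthesized intermediate frame by at least 1.41 dB, improves structural similarity by 0.020, and reduces interpolation error by 1.99. Visual comparisons and reconstructed sequences show that the proposed model achieves marked frame rate up-conversion for blurry video, that is, it can reconstruct two blurry frames end to end into three sharp, visually coherent frames.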
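Trilinear interpolation over a space-time volume can be sketched as follows. This is generic trilinear sampling, assuming a volume indexed as volume[t][y][x] with scalar intensities and non-negative in-range coordinates; how the network predicts the sampling coordinates from the voxel flow is not shown, and `trilinear_sample` is a hypothetical name.

```python
def trilinear_sample(volume, x, y, t):
    """Sample volume[t][y][x] at fractional coordinates."""
    x0, y0, t0 = int(x), int(y), int(t)
    # Clamp the upper neighbours so samples on the border stay in range.
    x1 = min(x0 + 1, len(volume[0][0]) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    t1 = min(t0 + 1, len(volume) - 1)
    fx, fy, ft = x - x0, y - y0, t - t0

    def bilinear(frame):
        top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
        bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
        return top * (1 - fy) + bottom * fy

    # Blend the two spatially interpolated frames along the time axis.
    return bilinear(volume[t0]) * (1 - ft) + bilinear(volume[t1]) * ft


two_frames = [
    [[0.0, 10.0], [20.0, 30.0]],
    [[100.0, 110.0], [120.0, 130.0]],
]
midpoint = trilinear_sample(two_frames, 0.5, 0.5, 0.5)
```

Because the sample is a weighted average of eight neighbours with weights that vary smoothly in x, y and t, the operation is differentiable with respect to the sampling coordinates, which is what allows a flow-predicting network to be trained end to end.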

14.
A new method is presented for the efficient and reliable pose determination of 3D objects in dense range image data. The method is based upon a minimalistic Geometric Probing strategy that hypothesizes the intersection of the object with some selected image point, and searches for additional surface data at locations relative to that point. The strategy is implemented in the discrete domain as a binary decision tree classifier. The tree leaf nodes represent individual voxel templates of the model, with one template per distinct model pose. The internal nodes represent the union of the templates of their descendant leaf nodes. The union of all leaf node templates is the complete template set of the model over its discrete pose space. Each internal node also encodes a single voxel which is the most common element of its child node templates. Traversing the tree is equivalent to efficiently matching the large set of templates at a selected image seed location. The method was implemented and extensive experiments were conducted for a variety of combinations of tree designs and traversals under isolated, cluttered, and occluded scene conditions. The results demonstrated a tradeoff between efficiency and reliability. It was concluded that there exist combinations of tree design and traversal which are both highly efficient and reliable.

15.
We present a novel representation and rendering method for free‐viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered to fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi‐automatic, data‐driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize ghosting artifacts known from conventional billboard rendering, but also alleviate restrictions to the setup and sensitivities to errors of more complex 3D representations and multiview reconstruction techniques. Our results demonstrate the flexibility and the robustness of our approach with high quality free‐viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

16.
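林辉灿, 吕强, 王国胜, 卫恒, 梁冰. 《机器人》 (Robot), 2018, 40(6): 911-920
SLAM (simultaneous localization and mapping) systems based on visual features suffer a sharp drop in robustness and accuracy, or fail outright, under image blur, overly fast motion, and missing features. To address this, a tightly coupled stereo visual-inertial SLAM system based on nonlinear optimization is proposed. First, with keyframe poses as constraints, the bias of the inertial measurement unit (IMU) is estimated using a divide-and-conquer strategy. In the front end, to address the failure of ORB-SLAM2's constant-velocity motion model during tracking under fast motion, the IMU data from the previous frame to the current frame are pre-integrated to predict the initial pose of the current frame, and an IMU pre-integration constraint is added to pose optimization. Then, in back-end optimization, keyframe poses, map points, and IMU pre-integration terms are optimized within a sliding window and the IMU bias is updated. Finally, the system is evaluated on the EuRoC dataset: compared with ORB-SLAM2, VINS-Mono, and OKVIS, its accuracy improves by factors of 1.14, 1.48, and 4.59, respectively; compared with state-of-the-art SLAM systems, its robustness under fast motion, image blur, and missing features is also improved.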

17.
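Using a recent skeleton extraction algorithm, a gesture recognition system based on the Euclidean skeleton of the hand gesture is designed and implemented. The system consists of a general-purpose video capture module and ARM development board hardware. The gesture region is segmented using a dynamic foreground detection algorithm combined with a YCbCr skin color model. The Euclidean skeleton of the gesture region is obtained through the Euclidean distance transform and the Delta-medial-axis skeleton extraction algorithm, and geometric parameters such as skeleton key points and Euclidean distances are extracted to build a geometric model for gesture recognition. In experiments, the correct recognition rate reaches 94% and the processing time per frame is under 25 ms, indicating that the system is real-time and effective.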

18.
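This paper proposes an algorithm for acquiring and tracking the human skeleton from 3D voxel data. Human body voxel data are obtained from 16 cameras; a voxel is the unit of image information defining a point in 3D space and includes the X, Y, and Z coordinates of each point. Extracting accurate, real-time human skeleton information from 3D human voxel data acquired by multiple cameras is a current research focus and open problem in computer vision. Borrowing from quantum genetic algorithms, applying their iteration and optimization ideas to skeleton optimization and extraction proves to be an effective approach. Experimental results demonstrate that the algorithm is robust and effective.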

19.
In this paper, we present a novel approach for recovering a 3-D pose from a single human body depth silhouette using nonrigid point set registration and body part tracking. In our method, a human body depth silhouette is presented as a set of 3-D points and matched to another set of 3-D points using point correspondences. To recognize and maintain body part labels, we initialize the first set of points to corresponding human body parts, resulting in a body part-labeled map. Then, we transform the points to a sequential set of points based on point correspondences determined by nonrigid point set registration. After point registration, we utilize the information from tracked body part labels and registered points to create a human skeleton model. A 3-D human pose is recovered by mapping joint information from the skeleton model to a 3-D synthetic human model. Quantitative and qualitative evaluation results on synthetic and real data show that complex human poses can be recovered more reliably with lower errors compared to other conventional techniques for 3-D pose recovery.

20.
We present a skeleton-based algorithm for intrinsic symmetry detection on imperfect 3D point cloud data. The data imperfections such as noise and incompleteness make it difficult to reliably compute geodesic distances, which play essential roles in existing intrinsic symmetry detection algorithms. In this paper, we leverage recent advances in curve skeleton extraction from point clouds for symmetry detection. Our method exploits the properties of curve skeletons, such as homotopy to the input shape, approximate isometry-invariance, and skeleton-to-surface mapping, for the detection task. Starting from a curve skeleton extracted from an input point cloud, we first compute symmetry electors, each of which is composed of a set of skeleton node pairs pruned with a cascade of symmetry filters. The electors are used to vote for symmetric node pairs indicating the symmetry map on the skeleton. A symmetry correspondence matrix (SCM) is constructed for the input point cloud through transferring the symmetry map from skeleton to point cloud. The final symmetry regions on the point cloud are detected via spectral analysis over the SCM. Experiments on raw point clouds, captured by a 3D scanner or the Microsoft Kinect, demonstrate the robustness of our algorithm. We also apply our method to repair incomplete scans based on the detected intrinsic symmetries.
