3D Human Pose Estimation Based on Multi-layer Spatial Feature Fusion

Citation: LIANG An-Yuan, XIAO Xue-Zhong. 3D Human Pose Estimation Based on Multi-layer Spatial Feature Fusion[J]. Computer Systems & Applications, 2024, 33(8): 250-256
Authors: LIANG An-Yuan, XIAO Xue-Zhong
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

Abstract: In 3D human pose estimation, the connections between human joints form a complex topology. Modeling this structure with a graph convolutional network effectively captures the relationships between neighboring joints. Although non-adjacent joints have no direct physical connection, human motion and pose are governed by biomechanical constraints and by the coordinated action of the joints, so a Transformer encoder that establishes contextual relationships among all joints allows the pose to be inferred more accurately. In the era of large models, it is also particularly important to reduce the parameter count while preserving performance. To address these issues, a multi-layer spatial feature fusion network (MLSFFN) based on graph convolution and the Transformer is designed; it fuses local and global spatial features effectively while using a relatively small number of parameters. Experimental results show that the proposed method achieves a mean per joint position error (MPJPE) of 49.9 mm on the Human3.6M dataset with only 2.1M parameters. The model also demonstrates strong generalization ability on the MPI-INF-3DHP dataset.

Keywords: multi-layer spatial feature fusion  3D human pose estimation  graph convolutional network (GCN)  Transformer  lightweight

Received: 2024-02-26
Revised: 2024-03-28
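
The abstract describes a hybrid spatial design: graph convolutions model the physical skeleton topology (local relations between connected joints), a Transformer encoder models context among all joints including non-adjacent ones, and the two kinds of spatial features are fused. The following is a minimal PyTorch sketch of that general idea only, not the authors' MLSFFN: the block structure, the 17-joint edge list, layer sizes, and all names (SpatialFusionBlock, GraphConv, H36M_EDGES) are illustrative assumptions. A small helper for the MPJPE metric mentioned in the abstract is included.

```python
# Illustrative sketch only: NOT the authors' MLSFFN implementation.
# Assumes PyTorch and a 17-joint, Human3.6M-style skeleton; names and
# hyperparameters below are hypothetical.
import torch
import torch.nn as nn

# Assumed bone list (parent, child) for a 17-joint skeleton.
H36M_EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
              (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16)]

def normalized_adjacency(num_joints, edges):
    """Symmetrically normalized adjacency with self-loops (standard GCN preprocessing)."""
    a = torch.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class GraphConv(nn.Module):
    """One graph convolution over the fixed skeleton adjacency: local joint relations."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, joints, in_dim)
        return torch.relu(self.linear(self.adj @ x))

class SpatialFusionBlock(nn.Module):
    """Hypothetical fusion block: a GCN branch for local structure plus a Transformer
    encoder branch for global joint-to-joint context, fused here by summation."""
    def __init__(self, dim=64, heads=4, num_joints=17):
        super().__init__()
        adj = normalized_adjacency(num_joints, H36M_EDGES)
        self.gcn = GraphConv(dim, dim, adj)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):  # x: (batch, joints, dim)
        return self.gcn(x) + self.transformer(x)

def mpjpe(pred, gt):
    """Mean per joint position error: average Euclidean distance between predicted
    and ground-truth joints (in mm if inputs are in mm). Shapes: (batch, joints, 3)."""
    return (pred - gt).norm(dim=-1).mean()
```

As a usage illustration, `SpatialFusionBlock()(torch.randn(8, 17, 64))` returns fused per-joint features of the same shape; a full estimator would embed detected 2D keypoints into this feature space, stack several such blocks, and regress 3D joint coordinates that are then scored with `mpjpe`.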