Similar literature
1.
Hand pose estimation aims to predict the positions of hand joints from an image, and it has become popular with the emergence of VR/AR/MR technology. The task is difficult because a hand is prone to self-occlusion, and to external occlusion when it interacts with objects, and many projects have been dedicated to solving this problem. This paper develops a system that accurately estimates hand pose in 3D space from depth images for VR applications. We propose a data-driven approach that trains a deep learning model for hand pose estimation with object interaction. In the convolutional neural network (CNN) training procedure, we design a skeleton-difference loss function that can effectively learn the physical constraints of a hand. We also propose an object-manipulating loss function, which incorporates knowledge of the hand-object interaction, to enhance performance. In experiments on hand pose estimation under different conditions, the results validate the robustness and performance of our system and show that our method predicts the joints more accurately in challenging environmental settings. These results may be attributed to the consideration of the physical joint relationships as well as object information, which in turn can be applied to future VR/AR/MR systems for a more natural experience.
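The abstract does not define the skeleton-difference loss. As a rough illustration only, one plausible reading is a penalty on the difference between predicted and ground-truth bone vectors (the offsets between connected joints). The function names, the bone list, and the squared-error form below are assumptions, not the authors' formulation.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Bone = Tuple[int, int]  # (parent joint index, child joint index)


def bone_vectors(joints: Sequence[Joint], bones: Sequence[Bone]) -> List[Joint]:
    """Return the child-minus-parent offset for every bone."""
    return [
        tuple(joints[child][k] - joints[parent][k] for k in range(3))
        for parent, child in bones
    ]


def skeleton_difference_loss(
    pred: Sequence[Joint], target: Sequence[Joint], bones: Sequence[Bone]
) -> float:
    """Hypothetical loss: mean squared difference between bone vectors.

    Because only joint-to-joint offsets are compared, a rigid translation
    of the whole hand does not change the loss.
    """
    total = 0.0
    for u, v in zip(bone_vectors(pred, bones), bone_vectors(target, bones)):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(bones)
```

In practice a term like this would be added to an ordinary per-joint position loss, so that the network is penalized both for misplaced joints and for implausible bone geometry.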

2.
Most existing Action Quality Assessment (AQA) methods for scoring sports videos focus on evaluating a single action, or several actions in a defined sequence, performed in short sports videos such as diving and vault. They extract features directly from RGB videos with 3D ConvNets, which mixes the features with ambiguous scene information. To investigate the effectiveness of deep pose feature learning for automatically evaluating complicated activities in long-duration sports videos, such as figure skating and artistic gymnastics, we propose a skeleton-based deep pose feature learning method. For pose feature extraction, a spatial–temporal pose extraction module (STPE) is built to capture subtle changes in human body movements and obtain detailed representations of skeletal data in the space and time dimensions. For temporal information representation, an inter-action temporal relation extraction module (ATRE) is implemented with a recurrent neural network to model the dynamic temporal structure of skeletal subsequences. We evaluate the proposed method on the figure skating activity of the MIT-Skate and FIS-V datasets. The experimental results show that the proposed method is more effective than RGB video-based deep feature learning methods, including SENet and C3D. Significant improvement is achieved in Spearman Rank Correlation (SRC) on the MIT-Skate dataset. On the FIS-V dataset, for both the Total Element Score (TES) and the Program Component Score (PCS), the predicted scores achieve better SRC and MSE against the judges' scores than the SENet and C3D feature methods.
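For readers unfamiliar with the evaluation metric, the Spearman Rank Correlation used above can be sketched as the Pearson correlation of the two rank vectors. The helper names are illustrative, and the code is a generic textbook computation rather than the evaluation script used in the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rank_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks matter, a model whose predicted scores order the performances exactly as the judges do reaches an SRC of 1 even if the absolute scores differ, which is why MSE is reported alongside it.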

3.
3D hand pose estimation from point cloud input has received increasing attention recently. In this paper, a new module for point cloud processing, named Local-aware Point Processing Module (LPPM), is designed. It extracts local information, is permutation invariant with respect to neighboring points in the input point cloud, and is an independent module that is easy to implement and flexible for constructing point cloud networks. Based on this module, an LPPM-Net is constructed to estimate 3D hand pose. To normalize the orientation of the point cloud while maintaining diversity in a controllable manner, we transform the point cloud into an oriented bounding box coordinate system (OBB C.S.) and then rotate it randomly around the principal axis during training. In addition, a simple but effective technique called sampling ensemble is used at test time, which compensates for the resolution degradation caused by downsampling and improves performance without extra parameters. We evaluate the proposed method on three public hand datasets: NYU, ICVL, and MSRA. Results show that our approach achieves competitive performance on all three datasets.
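The random rotation around the principal axis can be illustrated with Rodrigues' rotation formula. This is a minimal sketch that assumes the principal axis has already been obtained (for example from a PCA of the points, which is not shown); the function names are hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_about_axis(point: Point, axis: Sequence[float], angle: float) -> Point:
    """Rotate a point about an axis through the origin (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit rotation axis
    x, y, z = point
    dot = k[0] * x + k[1] * y + k[2] * z
    cross = (k[1] * z - k[2] * y, k[2] * x - k[0] * z, k[0] * y - k[1] * x)
    c, s = math.cos(angle), math.sin(angle)
    # v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    return tuple(point[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3))


def augment_cloud(
    points: Sequence[Point], principal_axis: Sequence[float], rng: random.Random
) -> List[Point]:
    """Apply one random rotation about the principal axis to the whole cloud."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return [rotate_about_axis(p, principal_axis, angle) for p in points]
```

Rotating only about the principal axis keeps the coarse orientation fixed by the OBB alignment while still varying the remaining degree of freedom, which matches the "controllable diversity" the abstract describes.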

4.
Recently, convolutional neural networks (CNNs) have been employed to address the problem of hand pose estimation. In this work, we introduce an end-to-end deep architecture that accurately estimates hand pose through the joint use of model-based and fine-tuning methods. In the model-based stage, we use the prior information in hand model geometry to ensure the geometric validity of the estimated poses. Next, we introduce a fine-tuning approach that learns to correct the errors between the model and the observed hand. Our approach is validated on three challenging public datasets and achieves state-of-the-art performance.

5.
Image steganalysis based on convolutional neural networks (CNNs) has attracted great attention. However, existing networks pay too little attention to regional features with complex texture, which weakens their discriminative learning ability. In this paper, we describe a new CNN designed to focus on useful features and improve detection accuracy for spatial-domain steganalysis. The proposed model consists of three modules: a noise extraction module, a noise analysis module, and a classification module. A channel attention mechanism is used in the noise extraction and noise analysis modules, realized by embedding the SE (Squeeze-and-Excitation) module into the residual block. We then use convolutional pooling instead of average pooling to aggregate features. The experimental results show that the detection accuracy of the proposed model is significantly higher than that of existing models such as SRNet, Zhu-Net, and GBRAS-Net. Compared with these models, our model also generalizes better, which is critical for practical application.
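The channel attention mentioned above follows the general Squeeze-and-Excitation pattern: global average pooling per channel, a small two-layer gating network, and channel-wise rescaling. The sketch below shows that pattern on plain nested lists. The weight matrices are placeholders supplied by the caller, biases are omitted for brevity, and nothing here reflects the specific layer sizes of the proposed model.

```python
import math
from typing import List

Channel = List[List[float]]  # one 2D feature map


def se_block(
    feature_maps: List[Channel], w1: List[List[float]], w2: List[List[float]]
) -> List[Channel]:
    """Squeeze-and-Excitation style channel reweighting (biases omitted).

    w1 has shape (hidden, channels); w2 has shape (channels, hidden).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in ch]
        for ch, gate in zip(feature_maps, gates)
    ]
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead push the gates of informative channels toward 1 and the rest toward 0.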

6.
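To address the low recognition accuracy of surface electromyography (sEMG) gesture recognition, which arises because convolutional neural networks (CNNs) extract features insufficiently and ignore temporal information, this paper proposes an sEMG gesture recognition memory network model that fuses two layers of attention with a multi-stream convolutional neural network (MS-CNN). First, surface EMG images generated with a sliding window are used as the model input. Next, a channel attention module (CAM) is embedded in the MS-CNN to suppress irrelevant information so that the network focuses more on the effective features of sEMG. Then, a long short-term memory network (LSTM) excites the input features along the time axis, attending to more of the temporal information in sEMG and giving the network stronger learning ability in the time dimension. Finally, a time-sequence attention (TSA) layer attends to the LSTM states to better learn important muscle information and improve gesture recognition accuracy. On the NinaPro dataset...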

7.
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs are able to capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss–Seidel iterative method. Motivated by this iterative solution, we design a Gauss–Seidel network (GS-Net) architecture, which makes use of weight and adjacency modulation, skip connection, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer’s initial features as the network depth increases. We evaluate our proposed model on two standard benchmark datasets, and compare it with a comprehensive set of strong baseline methods for 3D human pose estimation. Our experimental results demonstrate that our approach outperforms the baseline methods on both datasets, achieving state-of-the-art performance. Furthermore, we conduct ablation studies to analyze the contributions of different components of our model architecture and show that the skip connection and adjacency modulation help improve the model performance.
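The idea behind the iterative solution can be illustrated on a scalar graph signal. Assuming the standard formulation of graph filtering with Laplacian regularization, minimizing ||x - y||^2 + alpha * x^T L x with L = D - A leads to the linear system (I + alpha * L) x = y, and one Gauss–Seidel sweep updates each node from its neighbors' latest values. The sketch below uses that textbook formulation; it is not the GS-Net layer itself, and the function name and parameters are illustrative.

```python
from typing import List, Sequence


def gauss_seidel_graph_filter(
    adjacency: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float,
    sweeps: int = 50,
) -> List[float]:
    """Solve (I + alpha * L) x = y by Gauss-Seidel, with L = D - A.

    Assumes a symmetric adjacency matrix with a zero diagonal (no self-loops).
    """
    n = len(y)
    degree = [sum(row) for row in adjacency]
    x = list(y)  # start from the observed signal
    for _ in range(sweeps):
        for i in range(n):
            # Uses already-updated x[j] for j < i, which is what
            # distinguishes Gauss-Seidel from the Jacobi method.
            neighbor_sum = sum(adjacency[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (y[i] + alpha * neighbor_sum) / (1.0 + alpha * degree[i])
    return x
```

Since I + alpha * L is strictly diagonally dominant for alpha > 0, the sweeps converge; each update is a weighted average of a node's own observation and its neighbors' current estimates, which is the smoothing behavior that graph filtering is meant to produce.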

8.
Convolutional neural network (CNN) based methods have recently achieved extraordinary performance in single image super-resolution (SISR) tasks. However, most existing CNN-based approaches increase the model’s depth by stacking massive kernel convolutions, bringing expensive computational costs and limiting their application in mobile devices with limited resources. Furthermore, large kernel convolutions are rarely used in lightweight super-resolution designs. To alleviate the above problems, we propose a multi-scale convolutional attention network (MCAN), a lightweight and efficient network for SISR. Specifically, a multi-scale convolutional attention (MCA) is designed to aggregate the spatial information of different large receptive fields. Since the contextual information of the image has a strong local correlation, we design a local feature enhancement unit (LFEU) to further enhance the local feature extraction. Extensive experimental results illustrate that our proposed MCAN can achieve better performance with lower model complexity compared with other state-of-the-art lightweight methods.

9.
In this paper, a method is proposed to improve the accuracy of 3D hand pose estimation. Existing methods make poor use of the depth information of hand joints and have difficulty estimating the 3D coordinates accurately. To solve this problem, a method that uses the information between adjacent joints of each finger is proposed to estimate the depth coordinates of joints. To make full use of 2D information for depth estimation, this paper divides hand pose estimation into two sub-tasks: 2D hand joint estimation and depth estimation. For depth estimation, a multi-stage network is proposed. We first estimate the depth of a subset of hand joints; then, with the help of these depths and the 2D information, the depth coordinates of adjacent joints can be estimated well. The proposed method is shown to be effective on three public hand pose datasets through self-comparisons. Compared with methods based on 2D CNNs, our method achieves state-of-the-art performance on the ICVL and NYU datasets and also performs well on the MSRA dataset.

10.
Recognizing human interactions in still images is a challenging task because, compared to videos, a single image offers only a glimpse of the interaction. This work investigates the role of human poses in recognizing human–human interactions in still images. To this end, a multi-stream convolutional neural network architecture is proposed, which fuses different levels of human pose information to recognize human interactions better. In this context, several pose-based representations are explored. Experimental evaluations on an extended benchmark dataset show that the proposed multi-stream pose convolutional neural network is successful in discriminating a wide range of human–human interactions, and that human poses, when used in conjunction with the overall context, provide discriminative cues about human–human interactions.

11.
Hand pose estimation plays an important role in human–computer interaction and augmented reality. Regressing the joint coordinates is a difficult task because of joint flexibility, self-occlusion, and similar factors. In this paper, we propose a novel and simple hierarchical neural network for hand pose estimation. The hand joint coordinates are divided into six parts, and each part is regressed in sequence with this hierarchical architecture. This divides the complex task of regressing all hand joint coordinates into several sub-tasks, which can make the estimation more accurate. When the joint coordinates of one part are regressed, the features of other parts may negatively influence that part because of the similarity among the fingers, so we use an interference cancellation operation in our hierarchical architecture. Once the joint coordinates of one part have been regressed, the corresponding features are removed from the global hand feature to eliminate the interference of this part. The remaining features are used as input for regressing the joint coordinates of the next part. The ablation study verifies the effectiveness of our hierarchical architecture. The experimental results demonstrate that our method achieves state-of-the-art or comparable results relative to existing methods on four public hand pose datasets.

12.
This work addresses the challenging problem of estimating the full 3D hand pose when a hand interacts with an unknown object. Compared to isolated single hand pose estimation, occlusion and interference induced by the manipulated object and the cluttered background make this task more difficult. Our proposed Multi-Level Fusion Net focuses on extracting more effective features to overcome these difficulties through a multi-level fusion design within a new end-to-end Convolutional Neural Network (CNN) framework. It takes cropped RGBD data from a single RGBD camera at a free viewpoint as input, without requiring additional hand–object pre-segmentation or pre-modeling of the object or hand. Through extensive evaluations on a public hand–object interaction dataset, we demonstrate the state-of-the-art performance of our method.

13.
The high performance of state-of-the-art deep learning methods for 3D hand pose estimation depends heavily on a large annotated training set. However, obtaining annotations for 3D hand poses is difficult and time-consuming. To leverage unannotated images and reduce the annotation cost, we propose a semi-supervised method based on Multi-Task and Multi-View Consistency (MTMVC) for hand pose estimation. First, we obtain the joints from heatmap prediction and coordinate regression in parallel and encourage their consistency. Second, we introduce multi-view consistency to encourage the predicted poses to be rotation-invariant. Third, to make the network pay more attention to the hand region, we propose a spatially weighted consistency. Experiments on four public datasets show that the proposed MTMVC outperforms existing semi-supervised hand pose estimation methods and that, using only half of the annotations, its accuracy is comparable to that of several state-of-the-art fully supervised methods.

14.
Three-dimensional (3D) human pose tracking has recently attracted increasing attention in the computer vision field. Real-time pose tracking is highly useful in various domains such as video surveillance, somatosensory games, and human-computer interaction. However, vision-based pose tracking techniques usually raise privacy concerns, making human pose tracking without vision data an important problem. We therefore propose using Radio Frequency Identification (RFID) as a pose tracking technique via a low-cost wearable sensing device. Although our prior work illustrated how deep learning could translate RFID data into real-time human poses, generalization across different subjects remains challenging. This paper proposes a subject-adaptive technique to address this generalization problem. In the proposed system, termed Cycle-Pose, we leverage a cross-skeleton learning structure to improve the adaptability of the deep learning model to different human skeletons. Moreover, a novel cycle kinematic network is proposed for unpaired RFID and labeled pose data from different subjects. The Cycle-Pose system is implemented and evaluated by comparing its prototype with a traditional RFID pose tracking system. The experimental results demonstrate that Cycle-Pose achieves lower estimation error and better subject generalization than the traditional system.

15.
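To address the problem that PointNet++, a deep neural network that processes point cloud data directly, cannot fully learn point cloud shape information, a 3D point cloud classification and segmentation method that fuses spatial awareness and feature enhancement (SAFE) modules, named SAFE-PointNet++, is proposed. First, a spatial awareness (SA) module is designed so that the feature extraction network incorporates weight information containing spatial structure when raising the feature dimensionality, which strengthens the spatial expressiveness of the features. Second, a feature enhancement (FE) module is designed that splits the enhanced geometric information from the additional information and encodes them separately, so that the additional point cloud information is fully used. Experimental results show that, on the ModelNet40 and S3DIS datasets, SAFE-PointNet++ achieves higher classification and segmentation accuracy than 10 other classic networks.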

16.
刘唐波, 杨锐, 王文伟, 何楚. 《信号处理》 (Journal of Signal Processing), 2019, 35(12): 2062-2069
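To effectively detect improper driver behavior while driving, this paper studies a detection algorithm that incorporates human pose estimation information and, by constraining the detection target, builds a multi-stage hand action detection method. The method contains three modules. First, the human pose estimation module takes the joint Gaussian heatmap layer of a human pose estimation network and uses the output pose heatmaps to obtain spatial information about the detection target. Second, the hand detection module is a CNN-based detection network; fusing the human pose Gaussian heatmaps at the network input layer improves the hand detection rate. Third, the hand action classification module takes the output of the hand detection module, removes background that interferes with the detection result, and constrains the feature extraction of the classification network to the local hand region, which improves hand action classification accuracy. The hand region is fed into the classification network to obtain the driver's hand action, from which the system determines whether the driver is engaging in improper behavior such as smoking or answering the phone, thereby achieving driver behavior detection. To validate the proposed multi-stage hand action detection method, experiments were conducted on a self-built dataset.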
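The joint Gaussian heatmaps that the first module passes on can be pictured as one 2D map per joint, peaking at the joint location. The sketch below generates such a map; the function name, the unnormalized peak of 1, and the choice of sigma are assumptions for illustration, since the abstract gives no details of the heatmap layer.

```python
import math
from typing import List


def gaussian_heatmap(
    width: int, height: int, cx: float, cy: float, sigma: float
) -> List[List[float]]:
    """Return a height x width map with an unnormalized 2D Gaussian at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Stacking maps like this as extra input channels next to the image is one common way to give a detector a spatial prior about where hands are likely to be, which is the role the abstract assigns to the fusion at the input layer.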

17.
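Convolutional neural networks have become a research hotspot in abdominal artery segmentation, but classic convolutional networks suffer from low segmentation accuracy and discontinuous segmented vessels. This paper therefore proposes an abdominal artery segmentation algorithm based on an improved 3D fully convolutional network. The method constructs side inputs at different scales along the encoding path of the network and fuses the convolved side-input images with the downsampled, convolved images to extract more feature information. In addition, a new multi-scale feature extraction module is embedded in the network; it fuses channel attention with dense dilated convolution and effectively captures higher-level feature information. Segmentation results on abdominal arteries show that, compared with other segmentation methods, the proposed method improves both visually and quantitatively, demonstrating that it can improve vessel segmentation accuracy.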

18.
6D object pose (3D rotation and translation) estimation from RGB-D images is an important and challenging task in computer vision and has been widely applied in areas such as robotic manipulation, autonomous driving, and augmented reality. Prior works extract global features or reason about local appearance from an individual frame, which neglects the spatial geometric relevance between two frames and limits their performance for occluded or truncated objects in heavily cluttered scenes. In this paper, we present a dual-stream network for estimating the 6D pose of a set of known objects from RGB-D images. In contrast to prior work, our method learns latent geometric consistency in pairwise dense feature representations from multiple observations of the same objects in a self-supervised manner. We show in experiments that our method outperforms state-of-the-art approaches to 6D object pose estimation on two challenging datasets, YCB-Video and LineMOD.

19.
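Efficient and accurate detection of target vehicle information is one of the key technologies in fields such as automated parking and intelligent transportation. To meet the need of intelligent parking robots to measure a target vehicle at close range, a vehicle pose detection method based on single-line lidar is proposed. The lidar scans the underbody region of the target vehicle, and the DBSCAN clustering algorithm segments the point cloud. Treating each wheel point cluster as an L-shaped feature, a wheel fitting algorithm based on feature point search is proposed, together with two criteria for searching feature corner points. For the resulting set of wheels, a strategy for selecting the target vehicle's wheels is proposed; two vehicle pose detection operating conditions are assumed, and corresponding algorithms are designed. Tests in a real-vehicle environment verify the real-time performance and accuracy of the method, which meets the pose detection requirements of the parking robot.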
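The point cloud segmentation step relies on DBSCAN, which groups points that are densely connected and marks isolated points as noise. The following is a minimal pure-Python sketch of the textbook algorithm on 2D points, such as a single-line lidar scan projected to the ground plane. The parameter values in any real system would need tuning, and none of the names below come from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point2D], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point2D], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return [NOISE if label is None else label for label in labels]
```

Because DBSCAN does not need the number of clusters in advance, it suits this setting, where the number of wheels visible in a scan is not known beforehand and stray returns should be discarded as noise.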

20.
Most existing deep learning-based methods treat 3D hand pose estimation from a single depth image as a detection-based or regression-based problem, an approach that does not fully exploit the geometry of the hand, such as its structural and physical constraints. To overcome these weaknesses, we design a network with three simple parallel branches that correspond to the three functional parts of the hand. This design is motivated by the biological viewpoint that each finger plays a different role in grasping and manipulation. In each branch, we perform a more detailed regression in two stages (top-down joint location regression followed by bottom-up hand pose regression), which fully exploits both the local and global structure of a hand. Finally, we further use the hand structure and physical constraints to refine each joint by its auxiliary points. The proposed network is a unified structure and function model that is more appropriate for hand pose estimation. Our system does not require pose pre-processing or feedback, since it performs training and prediction end-to-end. The experimental results on three public datasets demonstrate that the proposed system achieves performance comparable to state-of-the-art methods.
