Similar Literature
20 similar articles retrieved.
1.
For the problem of robotic grasping of point-cloud targets in the weakly textured, randomly cluttered scenes common in industry, this paper proposes a deep learning network for 6D pose estimation. First, the physical environment of point-cloud targets randomly placed in multiple poses in complex scenes is simulated to generate a dataset with ground-truth labels. A 6D pose estimation deep network is then designed: the proposed Multi-scale Point Cloud Segmentation Network (MPCS-Net) performs instance segmentation directly on the complete geometric point cloud, removing the dependence on RGB information and on point-cloud segmentation pre-processing. Next, a Multilayer Feature Pose Estimation Network (MFPE-Net) is proposed, which effectively handles pose estimation for symmetric objects. Finally, experimental results and analysis confirm that, compared with traditional point-cloud registration methods and existing deep learning pose estimation methods that operate on segmented point clouds, the proposed method achieves higher accuracy and more stable performance, and is robust when estimating the poses of symmetric objects.

2.
Three-dimensional human pose estimation (3D HPE) has broad application prospects in the fields of trajectory prediction, posture tracking and action analysis. However, the frequent self-occlusions and the substantial depth ambiguity in two-dimensional (2D) representations hinder further improvement of accuracy. In this paper, we propose a novel video-based human body geometric aware network to mitigate the above problems. Our network can implicitly be aware of the geometric constraints of the human body by capturing spatial and temporal context information from 2D skeleton data. Specifically, a novel skeleton attention (SA) mechanism is proposed to model geometric context dependencies among different body joints, thereby improving the spatial feature representation ability of the network. To enhance temporal consistency, a novel multilayer perceptron (MLP)-Mixer based structure is exploited to comprehensively learn temporal context information from input sequences. We conduct experiments on publicly available challenging datasets to evaluate the proposed approach. The results outperform the previous best approach by 0.5 mm on the Human3.6M dataset. It also demonstrates significant improvements on the HumanEva-I dataset.
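The abstract does not spell out the skeleton attention (SA) mechanism; as a rough illustration, a minimal NumPy sketch of joint-to-joint attention over skeleton features might look like the following, assuming plain scaled dot-product attention with random projection weights (the function name, the 17-joint input, and `Wq`/`Wk`/`Wv` are all illustrative, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def skeleton_attention(joints, d_k=16, seed=0):
    """Toy attention over J joint embeddings of shape (J, d).
    Each joint attends to every other joint, modeling geometric
    context dependencies among body joints."""
    rng = np.random.default_rng(seed)
    J, d = joints.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = joints @ Wq, joints @ Wk, joints @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (J, J) joint-to-joint weights
    return attn @ V, attn

# 17 joints (a common 2D-skeleton layout) with 32-dim features
joints = np.random.default_rng(1).standard_normal((17, 32))
out, attn = skeleton_attention(joints)
```

Each row of `attn` is a distribution over all joints, so a wrist feature can, for example, draw context from the elbow and shoulder regardless of graph distance.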

3.
Human action recognition in videos is an important yet challenging task. Existing methods based on RGB images or optical flow are easily affected by clutter and ambiguous backgrounds. In this paper, we propose a novel Pose-Guided Inflated 3D ConvNet framework (PI3D) to address this issue. First, we design a spatial–temporal pose module, which provides essential clues for the Inflated 3D ConvNet (I3D). The pose module consists of pose estimation and pose-based action recognition. Second, for the multi-person estimation task, the introduced pose estimation network can determine the person most relevant to the action category. Third, we propose a hierarchical pose-based network to learn the spatial–temporal features of human pose. Moreover, the pose-based network and the I3D network are fused at the last convolutional layer without loss of performance. Finally, experimental results on four datasets (HMDB-51, SYSU 3D, JHMDB and Sub-JHMDB) demonstrate that the proposed PI3D framework outperforms existing methods on human action recognition. This work also shows that posture cues significantly improve the performance of I3D.

4.
Human pose estimation aims at predicting the poses of human body parts in images or videos. Since pose motions are often driven by specific human actions, knowing the body pose of a human is critical for action recognition. This survey focuses on recent progress in human pose estimation and its application to action recognition. We attempt to provide a comprehensive review of recent bottom-up and top-down deep human pose estimation models, as well as how pose estimation systems can be used for action recognition. Thanks to the availability of commodity depth sensors like Kinect and their capability for skeletal tracking, there has been a large body of literature on 3D skeleton-based action recognition, and there are already survey papers such as [1] about this topic. In this survey, we focus on 2D skeleton-based action recognition, where the human poses are estimated from regular RGB images instead of depth images. We summarize the performance of recent action recognition methods that use poses estimated from color images as input, then show that there is much room for improvement in this direction.

5.
Finding objects and tracking their poses are essential functions for service robots, in order to manipulate objects and interact with humans. We present novel algorithms for local feature matching for object detection, and for 3D pose estimation. Our feature matching algorithm takes advantage of local geometric consistency for better performance, and the new 3D pose estimation algorithm solves the pose in closed form using homography, followed by a non-linear optimization step for stability. Advantages of our approach include better performance, minimal prior knowledge of the target pattern, and easy implementation and portability as a modularized software component. We have implemented our approach along with both CPU- and GPU-based feature extraction, and built an interoperable component that can be used in any Robot Technology (RT)-based control system. Experiments show that our approach produces very robust estimates of the 3D pose and maintains a very low false-positive rate. It is also fast enough to be used in on-line applications. We integrated our vision component into an autonomous robot system with a search-and-grasp task, and tested it with several objects found in ordinary domestic environments. We present the details of our approach, the design of our modular component, and the results of the experiments in this paper.

6.
7.
Hand pose estimation aims to predict the positions of joints on a hand from an image, and it has become popular because of the emergence of VR/AR/MR technology. Nevertheless, an issue surfaces when trying to achieve this goal, since a hand easily causes self-occlusion or external occlusion as it interacts with external objects. As a result, many projects have been dedicated to finding a better solution to this problem. This paper develops a system that accurately estimates a hand pose in 3D space using depth images for VR applications. We propose a data-driven approach of training a deep learning model for hand pose estimation with object interaction. In the convolutional neural network (CNN) training procedure, we design a skeleton-difference loss function, which can effectively learn the physical constraints of a hand. We also propose an object-manipulating loss function, which incorporates knowledge of the hand-object interaction, to enhance performance. In the experiments we have conducted for hand pose estimation under different conditions, the results validate the robustness and the performance of our system and show that our method is able to predict the joints more accurately in challenging environmental settings. Such appealing results may be attributed to the consideration of the physical joint relationship as well as object information, which in turn can be applied to future VR/AR/MR systems for a more natural experience.

8.
Hand pose estimation is a challenging task owing to the high flexibility and serious self-occlusion of the hand. Therefore, an optimized convolutional pose machine (OCPM) was proposed in this study to estimate the hand pose accurately. Traditional CPMs have two components: a feature extraction module and an information processing module. First, the backbone network of the feature extraction module was replaced by ResNet-18 to reduce the number of network parameters. Furthermore, an attention module called the convolutional block attention module (CBAM) was embedded into the feature extraction module to enhance information extraction. Then, the structure of the information processing module was adjusted through a residual connection in each stage, which consists of a series of continuous convolutional operations, with dense fusion between the outputs of all previous stages and the feature extraction module. Experimental results on two public datasets showed that the OCPM network achieved excellent performance.
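The CBAM module mentioned above gates features channel-wise (and, in the full module, spatially). As a minimal sketch under stated assumptions, the channel-attention half can be written in NumPy as follows; the random MLP weights stand in for learned parameters, and all names are illustrative:

```python
import numpy as np

def channel_attention(feat, reduction=4, seed=0):
    """CBAM-style channel attention on a (C, H, W) feature map:
    squeeze with both average- and max-pooling, pass each through a
    shared two-layer MLP, and gate the channels with a sigmoid."""
    rng = np.random.default_rng(seed)
    C = feat.shape[0]
    W1 = rng.standard_normal((C, C // reduction)) * 0.1
    W2 = rng.standard_normal((C // reduction, C)) * 0.1
    avg = feat.mean(axis=(1, 2))                     # (C,) average-pooled
    mx = feat.max(axis=(1, 2))                       # (C,) max-pooled
    mlp = lambda v: np.maximum(v @ W1, 0) @ W2       # shared MLP with ReLU
    gate = 1 / (1 + np.exp(-(mlp(avg) + mlp(mx))))   # sigmoid gate, (C,)
    return feat * gate[:, None, None]                # reweighted channels

feat = np.random.default_rng(1).random((8, 4, 4))
out = channel_attention(feat)
```

The gate lies in (0, 1), so informative channels are kept while less useful ones are suppressed before the next stage.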

9.
Three-dimensional (3D) human pose tracking has recently attracted more and more attention in the computer vision field. Real-time pose tracking is highly useful in various domains such as video surveillance, somatosensory games, and human-computer interaction. However, vision-based pose tracking techniques usually raise privacy concerns, making human pose tracking without vision data usage an important problem. Thus, we propose using Radio Frequency Identification (RFID) as a pose tracking technique via a low-cost wearable sensing device. Although our prior work illustrated how deep learning could transfer RFID data into real-time human poses, generalization for different subjects remains challenging. This paper proposes a subject-adaptive technique to address this generalization problem. In the proposed system, termed Cycle-Pose, we leverage a cross-skeleton learning structure to improve the adaptability of the deep learning model to different human skeletons. Moreover, our novel cycle kinematic network is proposed for unpaired RFID and labeled pose data from different subjects. The Cycle-Pose system is implemented and evaluated by comparing its prototype with a traditional RFID pose tracking system. The experimental results demonstrate that Cycle-Pose can achieve lower estimation error and better subject generalization than the traditional system.

10.
6D object pose (3D rotation and translation) estimation from RGB-D images is an important and challenging task in computer vision and has been widely applied in a variety of applications such as robotic manipulation, autonomous driving, augmented reality, etc. Prior works extract global features or reason about local appearance from an individual frame, neglecting the spatial geometric relevance between two frames and limiting their performance for occluded or truncated objects in heavily cluttered scenes. In this paper, we present a dual-stream network for estimating the 6D pose of a set of known objects from RGB-D images. In contrast to prior work, our method learns latent geometric consistency in pairwise dense feature representations from multiple observations of the same objects in a self-supervised manner. We show in experiments that our method outperforms state-of-the-art approaches on 6D object pose estimation on two challenging datasets, YCB-Video and LineMOD.

11.
Dense 3D reconstruction is required for robots to safely navigate or perform advanced tasks. Accurate depth information for each image and its pose are the basis of 3D reconstruction. The resolution of depth maps obtained by LIDAR and RGB-D cameras is limited, and traditional pose calculation methods are not accurate enough. In addition, if every image is used for dense 3D reconstruction, the dense point clouds greatly increase the amount of computation. To address these issues, we propose a 3D reconstruction system. Specifically, we propose a depth network with contour and gradient attention, which is used to complete and correct depth maps to obtain high-resolution, high-quality depth maps. Then, we propose a pose estimation method that fuses traditional algorithms and deep learning to obtain accurate localization results. Finally, we adopt autonomous keyframe selection to reduce the number of keyframes, and surfel-based geometric reconstruction is performed to reconstruct the dense 3D environment. On the TUM RGB-D, ICL-NUIM, and KITTI datasets, our method significantly improves the quality of the depth maps, the localization results, and the quality of 3D reconstruction. At the same time, we have also accelerated the speed of 3D reconstruction.

12.
In this paper, crossed electric dipole pairs are used to construct a polarization-sensitive array on the surface of a conical conformal carrier, and joint estimation of source direction and polarization parameters is achieved on the basis of the array's snapshot data model. The algorithm first decouples the source direction and polarization information in the array manifold through a suitable matrix transformation, then estimates them using rank-reduction theory and the rotational-invariance subspace (ESPRIT) idea respectively, and finally pairs the parameters through a round-robin comparison matching method to complete the joint estimation. Monte-Carlo simulations show that the proposed algorithm solves the multi-parameter joint estimation problem for conical conformal arrays well.

13.
In this paper, a method is proposed to improve the accuracy of 3D hand pose estimation. Existing methods make poor use of the depth information of hand joints and have difficulty estimating the 3D coordinates accurately. To solve this problem, a method utilizing the information between adjacent joints of each finger is proposed to estimate the depth coordinates of joints. In order to make full use of 2D information for depth estimation, this paper divides hand pose estimation into two sub-tasks: 2D hand joint estimation and depth estimation. For depth estimation, a multi-stage network is proposed. We first estimate the depth of a subset of hand joints, and then, with its help and the 2D information, the depth coordinates of adjacent joints can be estimated well. The method proposed in this paper has been shown to be effective on three public hand pose datasets through self-comparisons. Compared with methods based on 2D CNNs, our method achieves state-of-the-art performance on the ICVL and NYU datasets, and also performs well on the MSRA dataset.

14.
In this paper, we propose a novel Two-layer Motion Estimation (TME) method, which searches motion vectors on two layers with partial distortion measures in order to reduce the overwhelming computational complexity of Motion Estimation (ME) in video coding. A layer is an image derived from the reference frame such that the sum of a block of pixels in the reference frame determines a point of the layer. It has been observed on different video sequences that many motion vectors on the layers are the same as those searched on the reference frame. The proposed TME performs a coarse search on the first layer to identify the small region in which the best candidate block is likely to be positioned, and then performs a refined local search on the next layer to pick the best candidate block in the located small area. A key feature of TME is its flexibility in combining with any fast search algorithm. Experimental results on a wide variety of video sequences show that the proposed algorithm achieves both fast speed and good motion prediction quality compared with well-known as well as state-of-the-art fast block-matching algorithms.
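The coarse-then-refined idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's algorithm: it uses a single layer built by summing 2x2 pixel blocks, a full search at the coarse level, and a plain SAD measure rather than the partial distortion measures the paper uses; all names and parameters are assumptions.

```python
import numpy as np

def sad(a, b):
    # sum of absolute differences between two blocks
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def make_layer(frame, k=2):
    """Each layer point is the sum of a k-by-k block of frame pixels."""
    H, W = frame.shape
    return frame[:H // k * k, :W // k * k].reshape(H // k, k, W // k, k).sum(axis=(1, 3))

def two_layer_me(ref, cur_block, top_left, rng=8, k=2):
    """Coarse full search on the layer, then a refined local search
    on the reference frame around the best coarse candidate."""
    y0, x0 = top_left
    B = cur_block.shape[0]
    layer_ref = make_layer(ref, k)
    layer_blk = make_layer(cur_block, k)
    # coarse search on the layer (motion vectors in layer units)
    best, bmv = None, (0, 0)
    for dy in range(-rng // k, rng // k + 1):
        for dx in range(-rng // k, rng // k + 1):
            ly, lx = y0 // k + dy, x0 // k + dx
            if 0 <= ly and 0 <= lx and ly + B // k <= layer_ref.shape[0] \
                    and lx + B // k <= layer_ref.shape[1]:
                cost = sad(layer_ref[ly:ly + B // k, lx:lx + B // k], layer_blk)
                if best is None or cost < best:
                    best, bmv = cost, (dy * k, dx * k)
    # refined search on the reference frame around the coarse winner
    best, mv = None, (0, 0)
    for dy in range(bmv[0] - k, bmv[0] + k + 1):
        for dx in range(bmv[1] - k, bmv[1] + k + 1):
            ry, rx = y0 + dy, x0 + dx
            if 0 <= ry and 0 <= rx and ry + B <= ref.shape[0] and rx + B <= ref.shape[1]:
                cost = sad(ref[ry:ry + B, rx:rx + B], cur_block)
                if best is None or cost < best:
                    best, mv = cost, (dy, dx)
    return mv, best

ref = np.random.default_rng(0).integers(0, 256, (32, 32))
cur = ref[16:24, 10:18]           # block displaced by (+4, -2) from (12, 12)
mv, cost = two_layer_me(ref, cur, (12, 12))
```

The coarse pass touches only a quarter of the candidate positions at a quarter of the block size, which is where the complexity saving comes from.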

15.
To address the difficulty of obtaining prior templates and the high computational complexity of target recognition based on synthetic aperture sonar (SAS) images, this paper proposes an improved model-based fast correlation recognition method. First, the target's aspect angle is estimated via convex hull construction, yielding the imaging geometry of the target. Second, an improved fast target-image generation method based on hidden point removal is proposed, which produces in real time simulated images of each candidate target under the corresponding imaging geometry. Target recognition is then performed through image correlation. Finally, simulation experiments confirm the effectiveness of the algorithm: compared with conventional direct template matching, the proposed method achieves a higher recognition rate and faster computation.

16.
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs are able to capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss–Seidel iterative method. Motivated by this iterative solution, we design a Gauss–Seidel network (GS-Net) architecture, which makes use of weight and adjacency modulation, skip connection, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer's initial features as the network depth increases. We evaluate our proposed model on two standard benchmark datasets, and compare it with a comprehensive set of strong baseline methods for 3D human pose estimation. Our experimental results demonstrate that our approach outperforms the baseline methods on both datasets, achieving state-of-the-art performance. Furthermore, we conduct ablation studies to analyze the contributions of different components of our model architecture and show that the skip connection and adjacency modulation help improve the model performance.
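The iterative solution that motivates GS-Net can be illustrated concretely: graph filtering with Laplacian regularization amounts to solving (I + lambda*L) X = Y for smoothed joint features X, and Gauss–Seidel solves this one joint at a time using freshly updated neighbors. The sketch below is the classical numerical method only, not the network itself; the toy chain graph and all names are illustrative.

```python
import numpy as np

def gauss_seidel_filter(Y, A, lam=1.0, iters=50):
    """Solve (I + lam*L) X = Y by Gauss-Seidel sweeps, where
    L = D - A is the graph Laplacian of the joint adjacency A."""
    L = np.diag(A.sum(1)) - A
    M = np.eye(len(A)) + lam * L          # strictly diagonally dominant
    X = Y.copy()
    for _ in range(iters):
        for i in range(len(A)):           # sweep joints, reusing fresh updates
            X[i] = (Y[i] - M[i] @ X + M[i, i] * X[i]) / M[i, i]
    return X

# toy 4-joint chain graph with noisy 2D "positions"
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
Y = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, -0.1], [3.0, 0.0]])
X = gauss_seidel_filter(Y, A, lam=0.5)
```

Because M is strictly diagonally dominant, the sweeps converge to the exact regularized solution; each sweep resembles one layer of message passing, which is the analogy the framework builds on.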

17.
Monocular 3D human pose estimation is a challenging task because of depth ambiguity and occlusion. Recent methods exploit spatio-temporal information and generate different hypotheses to simulate diverse solutions and alleviate these problems. However, these methods do not fully extract spatial and temporal information or the relationships between hypotheses. To ease these limitations, we propose EMHIFormer (Enhanced Multi-Hypothesis Interaction Transformer) to model 3D human pose with better performance. In detail, we build connections between different Transformer layers so that our model is able to integrate spatio-temporal information from previous layers and establish more comprehensive hypotheses. Furthermore, a cross-hypothesis model consisting of parallel Transformers is proposed to strengthen the relationships between the various hypotheses. We also design an enhanced regression head that adaptively adjusts the channel weights to output the final 3D human pose. Extensive experiments are conducted on two challenging datasets, Human3.6M and MPI-INF-3DHP, to evaluate our EMHIFormer. The results show that EMHIFormer achieves competitive performance on Human3.6M and state-of-the-art performance on MPI-INF-3DHP. Compared with its closest counterpart, MHFormer, our model outperforms it by 0.6% in P-MPJPE and 0.5% in MPJPE on the Human3.6M dataset, and by 46.0% in MPJPE on MPI-INF-3DHP.

18.
This paper proposes a novel fast fractional-pel search algorithm based on a polynomial model. Based on an analysis of the distribution characteristics of the motion-compensation error surface inside the fractional-pel search window, the matching error is fitted with a parabola along the horizontal and vertical directions respectively. The proposed search strategy needs to check only 6 points rather than the 16 or 24 points used in the Hierarchical Fractional Pel Search (HFPS) algorithm for 1/4-pel and 1/8-pel Motion Estimation (ME). Experimental results show that the proposed algorithm preserves rate-distortion performance while reducing the computational load to a large extent compared with the HFPS algorithm.
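The per-axis parabolic fit can be made concrete. Given matching errors at integer-pel offsets -1, 0 and +1 along one axis, a parabola through the three points has its minimum at a closed-form sub-pel offset, so no dense sub-pel grid needs to be searched. This standard interpolation formula is a sketch of the idea; the paper's exact polynomial model and 6-point pattern are not reproduced here.

```python
def parabolic_subpel(e_minus, e0, e_plus):
    """Fit e(x) = a*x^2 + b*x + c through errors at x = -1, 0, +1
    and return the sub-pel offset of the parabola's minimum:
        x* = (e(-1) - e(+1)) / (2 * (e(-1) - 2*e(0) + e(+1)))
    One evaluation per axis replaces checking a dense sub-pel grid."""
    denom = 2.0 * (e_minus - 2.0 * e0 + e_plus)
    if denom <= 0:        # flat or non-convex fit: stay at the integer pel
        return 0.0
    return (e_minus - e_plus) / denom
```

Applying this once horizontally and once vertically yields a fractional motion vector from a handful of integer-pel error values.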

19.
In this paper, we propose a new method for image retrieval that combines collage error in the fractal domain and Hu moment invariants with a statistical method, variable-bandwidth Kernel Density Estimation (KDE). The proposed method, called CHK (KDE of Collage error and Hu moments), is tested on the Vistex texture database of 640 natural images. Experimental results show that the Average Retrieval Rate (ARR) reaches 78.18%, demonstrating that the proposed method outperforms using either feature alone as well as the commonly used histogram method, in both retrieval rate and retrieval time.
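Variable-bandwidth KDE differs from ordinary KDE in that each sample carries its own kernel width. As a minimal one-dimensional sketch, a common rule sets each sample's bandwidth to its distance to its k-th nearest neighbour; the paper's exact bandwidth rule and feature space are not given here, so everything below is illustrative.

```python
import numpy as np

def variable_kde(samples, query, k=2):
    """Gaussian KDE with a per-sample bandwidth: each sample's
    bandwidth h_i is its distance to its k-th nearest neighbour,
    so dense regions get narrow kernels and sparse regions wide ones."""
    samples = np.asarray(samples, float)
    d = np.abs(samples[:, None] - samples[None, :])
    h = np.sort(d, axis=1)[:, k]          # k-th NN distance per sample
    h = np.maximum(h, 1e-6)               # guard against zero bandwidth
    z = (query - samples) / h
    return np.mean(np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi)))

# density is high near the cluster of samples, low far away
density = variable_kde([0.0, 0.1, 0.2, 2.0, 2.1], 0.1)
```

In retrieval, such a density over feature values (e.g. collage errors or Hu moments) gives a smooth signature that two images can be compared by.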

20.
In lidar target recognition, accurate estimation of the target's attitude can effectively simplify the recognition process. The existing PDVA algorithm is a 3D pose estimation method proposed mainly for structured ground targets. It determines the target's three attitude angles from the positive direction vectors of the axes of a model coordinate system (MCS), and its effectiveness has been verified experimentally. However, determining the positive direction vectors of the MCS axes consumes considerable time, which limits the efficiency of the algorithm. This paper proposes an improved PDVA algorithm that accelerates the determination of the positive direction vectors of the MCS axes with a cluster-center neighborhood discrimination (CCND) method. Simulation experiments on four ground military-vehicle model targets show that the average running time of the improved PDVA algorithm is about 66% of that of the original PDVA algorithm, greatly improving the efficiency of 3D target pose estimation.
