Similar Documents
20 similar documents found
1.
For 3D object pose estimation, a 6D object pose estimation algorithm based on an improved YOLO V2 is proposed, combined with a deep-learning object detection model. A convolutional neural network extracts the target's feature information from an RGB image; building on the 2D detection, the target's position information is mapped into 3D space; a point-to-point mapping relationship is then used to match in 3D space and compute the target's degrees of freedom, from which the 6D pose is estimated. The algorithm not only detects targets in a single RGB image but also predicts their 6D pose, and requires no additional post-processing. Experiments show that it outperforms other recently proposed CNN-based methods on the LineMod and Occlusion LineMod datasets and runs at 37 frames/s on a Titan X GPU, making it suitable for real-time processing.
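A common way to realize the point-to-point 2D-3D mapping step is a PnP solve on the projected corners of the object's 3D bounding box; the abstract does not name the exact solver, so the sketch below is illustrative, with a synthetic cube, camera intrinsics, and pose standing in for real detector output:

```python
import cv2
import numpy as np

# Hypothetical object: a 10 cm cube; its 8 bounding-box corners in the object frame.
w = 0.05
object_points = np.array([[x, y, z] for x in (-w, w) for y in (-w, w) for z in (-w, w)],
                         dtype=np.float32)

# Synthetic camera intrinsics and ground-truth pose, used here to generate the
# 2D corner projections that the detection network would otherwise predict.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.02, -0.01, 0.5])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)

# PnP recovers the 6D pose (rotation + translation) from the 2D-3D correspondences.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)              # 3x3 rotation matrix of the estimated pose
print(ok, rvec.ravel(), tvec.ravel())   # should reproduce rvec_true / tvec_true
```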

2.
2D hand pose estimation is a key technology in human-computer interaction. To strengthen robustness in complex environments and improve the accuracy of hand pose estimation, a YOLOv3-HM algorithm based on object detection and heatmap regression is proposed. First, YOLOv3 is used to detect and box the hand region in an RGB image, with CIoU as the bounding-box loss function; the 21 hand keypoints are then labelled with a heatmap regression algorithm; finally, 2D hand pose estimation is achieved by regressing the hand heatmaps. Tests on the FreiHAND dataset and in real scenes show that, compared with traditional gesture detection algorithms, the method improves both pose estimation accuracy and detection speed, reaching 99.28% keypoint recognition accuracy and a real-time detection speed of 59 f/s, and it accurately estimates hand pose even in complex scenes.
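The heatmap-regression half of such a pipeline typically ends by decoding one heatmap per keypoint with a per-channel argmax; a minimal sketch, where the 21x64x64 "network output" is a random stand-in:

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Decode (K, H, W) heatmaps into K (x, y) keypoints via per-channel argmax."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)
    ys, xs = np.unravel_index(idx, (H, W))
    conf = flat.max(axis=1)               # peak value doubles as a confidence score
    return np.stack([xs, ys], axis=1), conf

# Hypothetical network output for the 21 hand keypoints.
heatmaps = np.random.rand(21, 64, 64).astype(np.float32)
keypoints, confidence = decode_keypoints(heatmaps)
print(keypoints.shape)  # (21, 2), pixel coordinates in heatmap resolution
```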

3.
Sun Shantong, Liu Rongke, Du Qiuchen, Sun Shuqiao. Neural Processing Letters, 2020, 51(3): 2417-2436.
Deep learning methods for 6D object pose estimation from RGB and depth (RGB-D) images have been successfully applied to robot grasping. The fusion of RGB and depth is...

4.
International Journal of Computer Mathematics, 2012, 89(13): 2857-2870.
Three novel object contour detection schemes based on image fusion are proposed in this paper. In these schemes, an active contour model is applied to detect the object's contour edge. Since an object's contour in an infrared (IR) image is usually clearer than in a visible image, the converged active contour in the visible image is improved with that in the IR image. The first contour detection scheme is realized by revising the shape-preserving active contour model. The second scheme minimizes the squared L2 norm of the difference between the B-spline control point vectors in the two modal images. Contour tracking and extraction experiments indicate that the first scheme outperforms the second. Moreover, a third scheme based on the active contour and pixel-level image fusion is proposed for images with incomplete but complementary scene information. An example extracting the contour of a partially hidden tank proves its efficacy.
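A minimal sketch of the quantity the second scheme minimizes, assuming both contours are represented by equal-length B-spline control-point vectors (the active-contour evolution itself is omitted and the points are synthetic):

```python
import numpy as np

def control_point_energy(P_vis, P_ir):
    """Squared L2 norm of the difference between the visible-image and
    IR-image B-spline control-point vectors."""
    d = np.asarray(P_vis, float) - np.asarray(P_ir, float)
    return float(np.sum(d * d))

# Hypothetical control points for the two modal contours (32 points, x/y each).
P_vis = np.random.rand(32, 2)
P_ir = P_vis + 0.01 * np.random.randn(32, 2)
print(control_point_energy(P_vis, P_ir))
```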

5.
6.
Obtaining the 6D pose of target objects from images has wide applications in robot manipulation, virtual reality, and related fields. However, deep-learning pose estimation methods usually require large training datasets to generalize well, and common data collection approaches suffer from high collection cost and a lack of 3D spatial position information. To address this, a 6D object pose estimation network framework based on low-quality rendered images is proposed. In this network, the feature extraction part takes a single RGB image as input and extracts features with a residual network; in the pose estimation part, an object classification stream predicts the target's category, while a pose regression stream regresses the target's rotation angles and translation vector in 3D space. In addition, domain randomization is used to build, at low collection cost, Pose6DDR, a large-scale dataset of low-quality rendered images annotated with 3D spatial position information. Test results on Pose6DDR and the public LineMod dataset demonstrate the superiority of the proposed pose estimation method and the effectiveness of generating large-scale data by domain randomization.
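Domain randomization amounts to sampling scene parameters at random for every rendered training image; a minimal sketch, with parameter names and ranges that are illustrative rather than the paper's:

```python
import random

def sample_render_params():
    """Draw one randomized rendering configuration, in the spirit of the
    domain randomization used to build Pose6DDR (names/ranges are illustrative)."""
    return {
        "light_intensity": random.uniform(0.2, 2.0),
        "light_azimuth_deg": random.uniform(0.0, 360.0),
        "texture_id": random.randrange(1000),        # random distractor texture
        "camera_distance_m": random.uniform(0.4, 1.5),
        "object_yaw_deg": random.uniform(0.0, 360.0),
        "background_id": random.randrange(5000),     # random background image
    }

# Each rendered sample stores the ground-truth 6D pose alongside the image.
configs = [sample_render_params() for _ in range(4)]
print(configs[0])
```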

7.
3D human pose estimation in motion is an active research direction in computer vision. However, performance is limited by the complexity of 3D spatial information, self-occlusion of the human body, mapping uncertainty, and other problems. In this paper, we propose a 3D human joint localization method based on a multi-stage regression depth network and a 2D-to-3D point mapping algorithm. First, we take a single RGB image as input and introduce heatmaps with multi-stage regression to iteratively refine the coordinates of the human joint points. We then feed the 2D joint points into the mapping network to compute the 3D joint coordinates, completing the 3D human pose estimation task. The algorithm achieves an MPJPE of 40.7 on the Human3.6M dataset, and the evaluation shows that our method has clear advantages.
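MPJPE, the metric reported above, is simply the mean Euclidean distance between predicted and ground-truth joints; a minimal sketch with hypothetical 17-joint, millimetre-scale data:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints, in the units of the input (mm here)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical predictions for one frame of a 17-joint skeleton.
gt = np.random.rand(17, 3) * 1000
pred = gt + np.random.randn(17, 3) * 20
print(mpjpe(pred, gt))
```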

8.
In this paper, we introduce a method to estimate the object’s pose from multiple cameras. We focus on direct estimation of the 3D object pose from 2D image sequences. Scale-Invariant Feature Transform (SIFT) is used to extract corresponding feature points from adjacent images in the video sequence. We first demonstrate that centralized pose estimation from the collection of corresponding feature points in the 2D images from all cameras can be obtained as a solution to a generalized Sylvester’s equation. We subsequently derive a distributed solution to pose estimation from multiple cameras and show that it is equivalent to the solution of the centralized pose estimation based on Sylvester’s equation. Specifically, we rely on collaboration among the multiple cameras to provide an iterative refinement of the independent solution to pose estimation obtained for each camera based on Sylvester’s equation. The proposed approach to pose estimation from multiple cameras relies on all of the information available from all cameras to obtain an estimate at each camera even when the image features are not visible to some of the cameras. The resulting pose estimation technique is therefore robust to occlusion and sensor errors from specific camera views. Moreover, the proposed approach does not require matching feature points among images from different camera views nor does it demand reconstruction of 3D points. Furthermore, the computational complexity of the proposed solution grows linearly with the number of cameras. Finally, computer simulation experiments demonstrate the accuracy and speed of our approach to pose estimation from multiple cameras.
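SciPy ships a standard Sylvester solver; the snippet below solves a generic AX + XB = Q instance to illustrate the algebraic core of the centralized estimate (the random matrices stand in for the paper's construction from corresponding feature points):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Generic Sylvester equation AX + XB = Q, solved for X.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
Q = rng.standard_normal((4, 4))

X = solve_sylvester(A, B, Q)
print(np.allclose(A @ X + X @ B, Q))  # True: X satisfies the equation
```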

9.
In this paper, we address the problem of 2D–3D pose estimation. Specifically, we propose an approach to jointly track a rigid object in a 2D image sequence and to estimate its pose (position and orientation) in 3D space. We revisit a joint 2D segmentation/3D pose estimation technique, and then extend the framework by incorporating a particle filter to robustly track the object in a challenging environment, and by developing an occlusion detection and handling scheme to continuously track the object in the presence of occlusions. In particular, we focus on partial occlusions that prevent the tracker from extracting the exact region properties of the object, which play a pivotal role in maintaining the track for region-based tracking methods. To this end, a dynamical choice of how to invoke the objective functional is performed online, based on the degree of dependency between predictions and measurements of the system, in accordance with the degree of occlusion and the variation of the object’s pose. This scheme provides the robustness to deal with occlusions by an obstacle with different statistical properties from those of the object of interest. Experimental results demonstrate the practical applicability and robustness of the proposed method in several challenging scenarios.
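A minimal bootstrap particle-filter cycle over a single pose parameter, a toy stand-in for the full tracked state (predict, re-weight by the measurement likelihood, resample when the effective sample size collapses):

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, motion_std, meas_std, rng):
    """One predict-weight-resample cycle of a bootstrap particle filter over a
    1D pose parameter (illustrative; the paper tracks a full 2D-3D pose state)."""
    # Predict: propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: re-weight by a Gaussian measurement likelihood.
    weights = weights * np.exp(-0.5 * ((particles - measurement) / meas_std) ** 2)
    weights /= weights.sum()
    # Resample: multinomial resampling when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles, weights = particles[idx], np.full(len(particles), 1.0 / len(particles))
    return particles, weights

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, 500)
weights = np.full(500, 1.0 / 500)
particles, weights = particle_filter_step(particles, weights, 0.3, 0.05, 0.2, rng)
print(np.average(particles, weights=weights))  # posterior mean estimate
```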

10.
We propose a feature-fusion network for pose estimation directly from RGB images, without any depth information. First, we introduce a two-stream architecture consisting of segmentation and regression streams. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop. These features are further coupled with the image crop in the fusion network. Second, we use an efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve the estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
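OpenCV exposes the EPnP solver that such a regression stream relies on; a self-checking sketch with synthetic 3D-2D keypoint pairs (the model points, intrinsics, and pose are stand-ins):

```python
import cv2
import numpy as np

# Hypothetical 3D keypoints on the object model and a synthetic camera/pose,
# used to generate the matching 2D keypoints.
model_points = np.random.rand(12, 3).astype(np.float32)
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
rvec_gt = np.array([0.2, -0.1, 0.3])
tvec_gt = np.array([0.0, 0.0, 1.0])
image_points, _ = cv2.projectPoints(model_points, rvec_gt, tvec_gt, K, None)

# EPnP recovers the pose from the 3D-2D keypoint correspondences.
ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
print(ok, rvec.ravel(), tvec.ravel())  # should reproduce rvec_gt / tvec_gt
```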

11.
Object pose estimation is a key technology for robots to pick 3D objects in cluttered environments. However, most current deep-learning methods for object pose estimation depend heavily on the scene's RGB information, which limits their range of application. A deep-learning-based 6D pose estimation method is proposed: a dataset of industrial parts is generated in a physics simulation environment, 3D point clouds are mapped onto a 2D plane to produce depth feature maps and normal feature maps, and a feature fusion network performs 6D pose estimation of industrial parts in cluttered scenes. Experimental results on simulated and real datasets show that, compared with traditional point-cloud pose estimation methods, the method achieves higher accuracy and shorter computation time, and is more robust both to point clouds of uneven density and to noise.
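The 3D-to-2D mapping can be sketched as a pinhole projection that keeps the nearest point per pixel; the intrinsics and cloud below are synthetic, and the normal feature map (built analogously from per-point normals) is omitted:

```python
import numpy as np

def cloud_to_depth(points, K, height, width):
    """Project an (N, 3) point cloud (camera frame, metres) to a 2D depth map,
    keeping the nearest point per pixel."""
    depth = np.full((height, width), np.inf)
    z = points[:, 2]
    valid = z > 0
    u = np.round(points[valid, 0] * K[0, 0] / z[valid] + K[0, 2]).astype(int)
    v = np.round(points[valid, 1] * K[1, 1] / z[valid] + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for uu, vv, zz in zip(u[inside], v[inside], z[valid][inside]):
        depth[vv, uu] = min(depth[vv, uu], zz)   # nearest surface wins
    depth[np.isinf(depth)] = 0.0                 # 0 marks empty pixels
    return depth

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
cloud = np.random.rand(5000, 3) + [0, 0, 0.5]    # synthetic cloud in front of the camera
print(cloud_to_depth(cloud, K, 480, 640).shape)  # (480, 640)
```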

12.
Earthwork operations are crucial parts of most construction projects. Heavy construction equipment and workers are often required to work simultaneously in limited workspaces. Struck-by accidents resulting from poor worker-equipment interaction account for a large proportion of accidents and fatalities on construction sites. Emerging technologies based on computer vision and artificial intelligence offer an opportunity to enhance construction safety through advanced monitoring with site cameras. A crucial prerequisite for developing safety monitoring applications is the ability to accurately identify and localize the equipment and its critical components in 3D space. This study proposes a workflow for excavator 3D pose estimation based on deep learning using RGB images. In the proposed workflow, an articulated 3D digital twin of an excavator is used to generate the data needed to train a 3D pose estimation model. In addition, a method for generating hybrid (simulation and laboratory) datasets is proposed for adapting the 3D pose estimation model to scenarios with different camera parameters. Evaluations prove the capability of the workflow in estimating the 3D pose of excavators. The study concludes by discussing limitations and future research opportunities.

13.
In this paper we present a novel vision-based markerless hand pose estimation scheme with the input of depth image sequences. The proposed scheme exploits both temporal constraints and spatial features of the input sequence, and focuses on hand parsing and 3D fingertip localization for hand pose estimation. The hand parsing algorithm incorporates a novel spatial-temporal feature into a Bayesian inference framework to assign the correct label to each image pixel. The 3D fingertip localization algorithm adapts a recently developed geodesic extrema extraction method to fingertip detection, using the hand parsing algorithm, a novel path-reweighting method, and K-means clustering in metric space. The detected 3D fingertip locations are finally used for hand pose estimation with an inverse kinematics solver. Quantitative experiments on synthetic data show the proposed hand pose estimation scheme can accurately capture natural hand motion. A simulated water-oscillator application is also built to demonstrate the effectiveness of the proposed method in human-computer interaction scenarios.
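The final grouping step can be sketched as K-means over fingertip candidates in 3D; geodesic extrema extraction and path re-weighting are omitted, and the candidate points below are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical ground-truth fingertip locations, each surrounded by a cloud of
# noisy candidate extrema (as the extraction stage would produce).
rng = np.random.default_rng(0)
fingertips_gt = rng.uniform(-0.1, 0.1, size=(5, 3))
candidates = np.concatenate([tip + rng.normal(0, 0.005, (20, 3))
                             for tip in fingertips_gt])

# Cluster the candidates into five groups, one centre per fingertip.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(candidates)
print(kmeans.cluster_centers_)  # estimated 3D fingertip locations
```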

14.
Head pose estimation plays an essential role in many high-level face analysis tasks. However, accurate and robust pose estimation with existing approaches remains challenging. In this paper, we propose a novel method for accurate three-dimensional (3D) head pose estimation with noisy depth maps and high-resolution color images that are typically produced by popular RGBD cameras such as the Microsoft Kinect. Our method combines the advantages of the high-resolution RGB image with the 3D information of the depth image. For better accuracy and robustness, features are first detected using only the color image, and then the 3D feature points used for matching are obtained by combining depth information. The outliers are then filtered with depth information using rules proposed for depth consistency, normal consistency, and re-projection consistency, which effectively eliminate the influence of depth noise. The pose parameters are then iteratively optimized using the Extended LM (Levenberg-Marquardt) method. Finally, a Kalman filter is used to smooth the parameters. To evaluate our method, we built a database of more than 10K RGBD images with ground-truth poses recorded using motion capture. Both qualitative and quantitative evaluations show that our method produces notably smaller errors than previous methods.
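The last stage, Kalman smoothing of the pose parameters, reduces in the simplest case to a constant-position filter per parameter; a minimal sketch with illustrative noise variances:

```python
import numpy as np

def kalman_smooth_1d(z, q=1e-4, r=1e-2):
    """Constant-position Kalman filter over one pose parameter (e.g. yaw).
    q and r are process and measurement noise variances; values are illustrative."""
    x, p = z[0], 1.0
    out = [x]
    for zk in z[1:]:
        p = p + q                      # predict: state uncertainty grows
        k = p / (p + r)                # Kalman gain
        x = x + k * (zk - x)           # update with the new measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

# Synthetic noisy yaw-angle trajectory (radians).
noisy_yaw = np.sin(np.linspace(0, 3, 100)) + np.random.randn(100) * 0.05
print(kalman_smooth_1d(noisy_yaw)[:5])
```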

15.
A model-based approach for estimating human 3D poses in static images
Estimating human body poses in static images is important for many image understanding applications including semantic content extraction and image database query and retrieval. This problem is challenging due to the presence of clutter in the image, ambiguities in image observation, unknown human image boundary, and high-dimensional state space due to the complex articulated structure of the human body. Human pose estimation can be made more robust by integrating the detection of body components such as face and limbs, with the highly constrained structure of the articulated body. In this paper, a data-driven approach based on Markov chain Monte Carlo (DD-MCMC) is used, where component detection results generate state proposals for 3D pose estimation. To translate these observations into pose hypotheses, we introduce the use of "proposal maps," an efficient way of consolidating the evidence and generating 3D pose candidates during the MCMC search. Experimental results on a set of test images show that the method is able to estimate the human pose in static images of real scenes.
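The DD-MCMC search can be sketched as Metropolis-Hastings over the pose vector; here a toy Gaussian posterior and a symmetric random-walk proposal stand in for the paper's proposal maps built from component detections:

```python
import numpy as np

def mcmc_pose_search(log_posterior, propose, x0, steps, rng):
    """Metropolis-Hastings search over a pose vector with a symmetric proposal,
    returning the highest-scoring state visited."""
    x, lp = x0, log_posterior(x0)
    best, best_lp = x, lp
    for _ in range(steps):
        cand = propose(x, rng)
        cand_lp = log_posterior(cand)
        if np.log(rng.uniform()) < cand_lp - lp:   # accept/reject
            x, lp = cand, cand_lp
            if lp > best_lp:
                best, best_lp = x, lp
    return best

rng = np.random.default_rng(0)
target = np.array([0.5, -1.0, 2.0])                     # hypothetical true pose
log_post = lambda x: -np.sum((x - target) ** 2) / 0.02  # toy unimodal posterior
prop = lambda x, rng: x + rng.normal(0, 0.1, size=x.shape)
print(mcmc_pose_search(log_post, prop, np.zeros(3), 5000, rng))  # ~target
```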

16.
Deep-learning-based human pose estimation methods aim to regress human pose information directly from 2D image features by constructing suitable neural networks. This paper systematically reviews recent deep-learning human pose estimation methods, proceeding from 2D to 3D human pose estimation and covering single-person versus multi-person detection and sparse keypoint detection versus dense model construction. It thereby outlines how deep learning recovers the elements of human pose: the relative orientation and scale of limb parts, the coordinates and connectivity of skeletal joints, and even more complex skinned body-model information. Finally, it summarizes the current challenges and emerging research directions, clearly presenting the development of the field.

17.
Objective: Errors in 2D pose estimation are the main cause of errors in 3D human pose estimation; the key to improving 3D estimation is mapping from a 2D pose, under 2D error or noise, to the optimal and most plausible 3D pose. This paper proposes a 3D pose estimation method that combines sparse representation with a deep model, integrating geometric priors on the 3D pose space with temporal information to improve accuracy. Method: A 3D deformable shape model fused with sparse representation yields a reliable initial 3D value for each single frame. A multi-channel long short-term memory (MLSTM) denoising encoder/decoder is then constructed; the single-frame 3D initial values are fed into it as a time series, it learns the temporal dependencies of the subject's pose between adjacent frames, and a temporal smoothness constraint is imposed to obtain the final optimized 3D pose. Results: Comparative experiments were conducted on the Human3.6M dataset. For two kinds of input, the dataset's ground-truth 2D coordinates and 2D coordinates estimated by a convolutional neural network, the average reconstruction error of video sequences optimized by the MLSTM denoising encoder/decoder dropped by 12.6% and 13% respectively relative to single-frame estimation, and by 6.4% and 9.1% relative to an existing video-based sparse-model method. For the estimated 2D coordinates, the average reconstruction error dropped by 12.8% relative to an existing deep-model method. Conclusion: Combining the temporally informed MLSTM denoising encoder/decoder with a sparse model effectively exploits 3D pose priors and the temporal and spatial dependencies of continuously varying human poses across video frames, improving the accuracy of monocular-video 3D pose estimation.
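A single-channel simplification of the MLSTM denoising encoder/decoder can be sketched in PyTorch; the layer sizes, joint count, and plain MSE training loss are assumptions (the paper uses multiple channels and adds a temporal-smoothness term):

```python
import torch
import torch.nn as nn

class LSTMDenoiser(nn.Module):
    """Single-channel LSTM denoising encoder/decoder over 3D pose sequences;
    a simplified stand-in for the paper's multi-channel MLSTM design."""
    def __init__(self, n_joints=17, hidden=512):
        super().__init__()
        d = n_joints * 3
        self.encoder = nn.LSTM(d, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, d)

    def forward(self, x):                  # x: (batch, frames, n_joints*3)
        h, _ = self.encoder(x)             # encode the noisy sequence
        h, _ = self.decoder(h)
        return self.out(h)                 # denoised sequence, same shape as x

model = LSTMDenoiser()
clean_seq = torch.randn(2, 50, 17 * 3)                      # ground-truth 3D sequences
noisy_seq = clean_seq + 0.05 * torch.randn_like(clean_seq)  # per-frame initial estimates
loss = nn.functional.mse_loss(model(noisy_seq), clean_seq)  # reconstruction loss only
print(loss.item())
```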

18.
In this paper, a homography-based visual servo controller is developed for a rigid body to track a moving object in three-dimensional space with a fixed relative pose. Specifically, a monocular camera is mounted on the rigid body, and the desired relative pose is expressed by a pre-recorded reference image. Homography is exploited to obtain the orientation and scaled position for controller design. Considering the moving object's unknown velocities and distance information, a continuous nonlinear visual controller is developed using the robust integral of the signum of the error (RISE) methodology. To facilitate the stability analysis, the system uncertainties regarding the moving object's velocities and distance information are divided into error-unrelated and error-related system uncertainties. The upper bounds of the error-related system uncertainties are then derived with composited system errors. Asymptotic tracking of the leading object is proved based on Lyapunov methods and the derived upper bounds. In addition, the proposed controller is extended to address the trajectory tracking problem. Simulation results validate the effectiveness of the proposed approach.
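OpenCV can recover the orientation and scaled position that the controller is designed on from a homography; in the sketch below H is synthesized from a known plane and pose rather than matched against a reference image:

```python
import cv2
import numpy as np

# Synthetic intrinsics and a planar-scene homography H = K (R + (t/d) n^T) K^-1,
# standing in for one estimated from current-vs-reference image matches.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.1, 0.0]))
n = np.array([[0.0], [0.0], [1.0]])            # reference-plane normal
t_over_d = np.array([[0.02], [0.0], [0.05]])   # translation scaled by plane depth
H = K @ (R_true + t_over_d @ n.T) @ np.linalg.inv(K)

# Decompose into candidate (rotation, scaled translation, normal) solutions.
num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
print(num)  # up to four candidates; visibility constraints select the true one
```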

19.
A calibration-free 3D robotic-arm grasping method for multi-object environments, based on an improved YOLOv2, is proposed. First, to reduce the bounding-box overlap rate of the deep-learning detector YOLOv2 on multiple objects and the error of 3D distance computation, an improved YOLOv2 algorithm is proposed; it detects and recognizes the target objects in the image and yields their positions in the RGB image. Then, based on the depth image, a K-means++ clustering algorithm quickly computes the distance from each target object to the camera and estimates the object's size and pose; at the same time, the gripper's position is detected and its distance to the target object is computed. Finally, using the object's size, pose, and distance to the gripper, a PID algorithm controls the gripper to grasp the object. The improved YOLOv2 produces more precise object bounding boxes with smaller box intersections, which improves the accuracy of distance detection and of size and pose estimation. To avoid tedious calibration, the calibration-free grasping method replaces Jacobian-matrix-based calibration-free estimation and generalizes well. Experiments verify that the proposed framework can automatically classify and localize objects in images fairly accurately, and that a Universal Robot 3 arm can grasp arbitrarily placed objects with good accuracy.
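The distance-computation step can be sketched as k-means++ clustering of the depth values inside a detection box, taking the nearest cluster centre as the object distance; the ROI below is synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

def object_distance(depth_roi, k=2):
    """Estimate camera-to-object distance inside a detection box by clustering
    the ROI's depth values with k-means++ and taking the nearest cluster centre."""
    z = depth_roi[depth_roi > 0].reshape(-1, 1).astype(float)  # drop invalid pixels
    km = KMeans(n_clusters=k, init="k-means++", n_init=5, random_state=0).fit(z)
    return float(km.cluster_centers_.min())   # foreground cluster = smallest depth

# Hypothetical depth ROI (metres): object at ~0.6 m against a ~1.5 m background.
rng = np.random.default_rng(0)
roi = np.where(rng.random((40, 40)) < 0.4,
               rng.normal(0.6, 0.01, (40, 40)),
               rng.normal(1.5, 0.05, (40, 40)))
print(object_distance(roi))  # ~0.6
```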

20.
Accurate visual hand pose estimation at the joint level has several applications in human-robot interaction, natural user interfaces, and virtual/augmented reality. However, it is still an open problem being addressed by the computer vision community. Recent deep learning techniques may help circumvent the limitations of standard approaches, but they require large amounts of accurately annotated data. The hand pose datasets released so far suffer from issues such as a limited number of samples, inaccurate data, or only high-level annotations; moreover, most of them focus on depth-based approaches, providing only depth information and no RGB data. In this work, we present a novel multiview hand pose dataset that provides hand color images and several kinds of annotations for each sample, i.e. the bounding box and the 2D and 3D locations of the joints in the hand. Furthermore, we introduce a simple yet accurate deep learning architecture for real-time, robust 2D hand pose estimation. We then conduct experiments showing that training on the proposed dataset produces accurate results for 2D hand pose estimation using a single color camera.
