Similar Documents
20 similar documents found.
1.
Finding objects and tracking their poses are essential functions for service robots that must manipulate objects and interact with humans. We present novel algorithms for local feature matching for object detection and for 3D pose estimation. Our feature matching algorithm exploits local geometric consistency for better performance, and the new 3D pose estimation algorithm solves the pose in closed form using a homography, followed by a non-linear optimization step for stability. Advantages of our approach include better performance, minimal prior knowledge about the target pattern, and easy implementation and portability as a modularized software component. We have implemented our approach with both CPU- and GPU-based feature extraction, and built an interoperable component that can be used in any Robot Technology (RT)-based control system. Experiments show that our approach produces very robust estimates of the 3D pose while maintaining a very low false-positive rate, and it is fast enough for on-line applications. We integrated our vision component into an autonomous robot system with a search-and-grasp task and tested it with several objects found in ordinary domestic environments. We present the details of our approach, the design of our modular component, and the experimental results in this paper.
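The closed-form-then-refine pipeline in this abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes a planar target with known metric coordinates and a calibrated camera, and it substitutes OpenCV's homography and PnP routines for the paper's own solver.

```python
import cv2
import numpy as np

def estimate_plane_pose(img_pts, obj_pts, K, dist=None):
    """img_pts: Nx2 matched image points; obj_pts: Nx3 planar model points (Z = 0)."""
    # Closed-form initial pose from the plane-to-image homography.
    H, _ = cv2.findHomography(obj_pts[:, :2], img_pts, cv2.RANSAC, 3.0)
    # Decompose H ~ K [r1 r2 t]: recover rotation columns and translation
    # up to scale (sign handling for a plane behind the camera is omitted).
    A = np.linalg.inv(K) @ H
    s = 1.0 / np.linalg.norm(A[:, 0])
    r1, r2, t = s * A[:, 0], s * A[:, 1], s * A[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)          # project onto SO(3)
    rvec0, _ = cv2.Rodrigues(U @ Vt)
    # Non-linear refinement of the reprojection error, as the abstract describes.
    _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                 rvec0, t.reshape(3, 1),
                                 useExtrinsicGuess=True,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec
```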

2.
Pose estimation for laser point cloud models based on the viewpoint feature histogram (Total citations: 2; self-citations: 2; external: 0)
A pose estimation algorithm for point cloud models based on the viewpoint feature histogram (VFH) is proposed. First, 3D point clouds are captured around the target object and stitched into a complete point cloud model of the object. The viewpoint feature histogram of the model is then computed to build a feature database. For the point cloud whose pose is to be estimated, the same histogram is computed, and a KNN search in the database returns the closest pose as the initial estimate. Finally, the iterative closest point (ICP) algorithm precisely registers the query point cloud to the model point cloud, yielding the relative pose between the coordinate frames. Experiments show that this method is highly robust for object pose recognition and computes the 3D pose of target objects well.
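A minimal sketch of this retrieve-then-refine pipeline, under stated assumptions: `compute_vfh` stands in for a PCL-style VFH extractor (not implemented here), scikit-learn provides the KNN search, and Open3D provides the ICP refinement; the clouds are assumed to be `open3d.geometry.PointCloud` objects.

```python
import numpy as np
import open3d as o3d
from sklearn.neighbors import NearestNeighbors

def build_pose_database(model_views, poses, compute_vfh):
    # One VFH descriptor per stored view of the model (compute_vfh is assumed).
    feats = np.stack([compute_vfh(c) for c in model_views])
    return NearestNeighbors(n_neighbors=1).fit(feats), poses

def estimate_pose(query_cloud, model_cloud, index, poses, compute_vfh):
    # 1) Coarse pose: nearest VFH in the database (KNN with k = 1).
    _, nn = index.kneighbors(compute_vfh(query_cloud)[None, :])
    T_init = poses[int(nn[0, 0])]
    # 2) Fine pose: ICP registration of the query onto the model.
    result = o3d.pipelines.registration.registration_icp(
        query_cloud, model_cloud,
        max_correspondence_distance=0.01, init=T_init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```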

3.
A Human–machine interaction system requires precise information about the user's body position in order to allow natural 3D interaction in stereoscopic augmented reality environments, where real and virtual objects should coherently coexist. The diffusion of RGB-D sensors seems to provide an effective solution to this problem. Nevertheless, interaction with stereoscopic 3D environments, in particular in peripersonal space, requires a higher degree of precision. To this end, a reliable calibration of such sensors and an accurate estimation of the relative pose of different RGB-D and visualization devices are crucial. Here, robust and straightforward procedures to calibrate an RGB-D camera, to improve the accuracy of its 3D measurements, and to co-register different calibrated devices are proposed. Quantitative measures validate the proposed approach. Moreover, the calibrated devices have been used in an augmented reality system, based on a dynamic stereoscopic rendering technique that needs accurate information about the position of the observer's eyes.
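As a concrete illustration of the calibration step (a standard checkerboard procedure with OpenCV, not necessarily the paper's exact routine), the sketch below recovers the intrinsics needed to improve 3D measurements; the board dimensions and square size are assumptions.

```python
import cv2
import numpy as np

def calibrate_rgb_camera(images, board=(9, 6), square=0.025):
    # 3D corner coordinates of the checkerboard in its own frame (Z = 0 plane).
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)
    # Intrinsics K and distortion; the per-view extrinsics give the board pose.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    return K, dist, rms
```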

4.
Dense 3D reconstruction is required for robots to navigate safely or perform advanced tasks. Accurate depth information for each image, together with its pose, is the basis of 3D reconstruction. The resolution of depth maps obtained by LIDAR and RGB-D cameras is limited, and traditional pose calculation methods are not accurate enough. In addition, if every image is used for dense 3D reconstruction, the dense point clouds greatly increase the amount of computation. To address these issues, we propose a 3D reconstruction system. Specifically, we propose a depth network with contour and gradient attention, which completes and corrects depth maps to obtain high-resolution, high-quality results. We then propose a pose estimation method that fuses traditional algorithms with deep learning to obtain accurate localization. Finally, we adopt autonomous keyframe selection to reduce the number of keyframes, and surfel-based geometric reconstruction is performed to reconstruct the dense 3D environment. On the TUM RGB-D, ICL-NUIM, and KITTI datasets, our method significantly improves the quality of the depth maps, the localization results, and the 3D reconstruction, while also accelerating the reconstruction.

5.
We present a geometry-based indexing approach for the retrieval of video databases. It consists of two modules: 3D object shape inferencing from video data and geometric modeling from the reconstructed shape structure. A motion-based segmentation algorithm employing feature block tracking and principal component split is used for multi-moving-object motion classification and segmentation. After segmentation, feature blocks from each individual object are used to reconstruct its motion and structure through a factorization method. The estimated shape structure and motion parameters are used to generate the implicit polynomial model for the object. The video data is retrieved using the geometric structure of objects and their spatial relationship. We generalize the 2D string to 3D to compactly encode the spatial relationship of objects.
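The factorization step can be illustrated with a short numpy sketch in the style of Tomasi-Kanade: the tracked feature blocks form a measurement matrix whose rank-3 factorization separates camera motion from object shape. This is a generic sketch under affine-camera assumptions; the metric upgrade and the paper's specifics are omitted.

```python
import numpy as np

def factorize(W):
    """W: 2F x P matrix of tracked image coordinates (F frames, P features)."""
    # Register to the per-frame centroid so translation drops out.
    W = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Rank-3 factorization: motion (2F x 3) and shape (3 x P), recovered
    # up to an affine ambiguity (the metric upgrade is omitted here).
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S
```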

6.
To address robotic grasping of point cloud targets in the weakly textured, randomly stacked complex scenes common in industry, this paper proposes a deep learning network for 6D pose estimation. First, a physical environment in which point cloud targets are randomly placed in multiple poses within complex scenes is simulated to generate a dataset with ground-truth labels. A deep 6D pose estimation network is then designed: the proposed Multi-scale Point Cloud Segmentation Network (MPCS-Net) performs instance segmentation directly on the complete geometric point cloud, removing the dependence on RGB information and on point cloud segmentation preprocessing. Next, a Multi-layer Feature Pose Estimation Network (MFPE-Net) is proposed, which effectively handles pose estimation for symmetric objects. Finally, experimental results and analysis confirm that, compared with traditional point cloud registration methods and existing deep learning pose estimation methods that operate on segmented point clouds, the proposed method achieves higher accuracy and more stable performance, and is robust when estimating the poses of symmetric objects.

7.
We address the problem of accurate and efficient alignment of 3D point clouds captured by an RGB-D (Kinect-style) camera from different viewpoints. While the Iterative Closest Point (ICP) algorithm has been widely used for dense point cloud matching, it is limited in its ability to produce accurate results in challenging scenarios involving objects that lack structural features and undergo significant camera view changes. In this paper, we introduce a new cost function with dynamic weights for the ICP algorithm to tackle this problem. It balances the significance of structural and photometric features with dynamically adjusted weights to improve the error minimization process. Our algorithm also includes a novel outlier rejection method, which adopts adaptive thresholding at each ICP iteration, using both the structural information of the object and the spatial distances of sparse SIFT feature pairs. The effectiveness of our proposed approach is demonstrated by experimental results from various challenging scenarios. We obtain higher registration accuracy than related previous methods while maintaining low computational requirements.
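A hedged sketch of one ICP iteration with such a combined cost follows. The weighting and thresholding rules are placeholders, not the paper's formulas: geometric and photometric residuals are mixed with a fixed weight `w_geo`, and a MAD-based cutoff stands in for the adaptive outlier rejection.

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_from_pairs(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def weighted_icp_step(src_xyz, src_rgb, dst_xyz, dst_rgb, w_geo=0.7):
    d, idx = cKDTree(dst_xyz).query(src_xyz)             # closest-point pairs
    photo = np.linalg.norm(src_rgb - dst_rgb[idx], axis=1)
    r = w_geo * d + (1.0 - w_geo) * photo                # combined residual
    med = np.median(r)
    keep = r < med + 3.0 * np.median(np.abs(r - med))    # adaptive rejection
    return rigid_from_pairs(src_xyz[keep], dst_xyz[idx[keep]])
```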

8.
王鹏, 周权通, 孙长库. 《红外与激光工程》 (Infrared and Laser Engineering), 2017, 46(5): 517001-0517001(9)
To solve the problem in monocular pose measurement where a large number of target feature points leaves the topological correspondence between image points and target points unknown, a pose measurement algorithm with multi-feature-point topology determination is proposed. A larger set of feature points guarantees that enough points remain available for pose solving when the target undergoes large-angle motion, improving measurement accuracy compared with fewer feature points. The algorithm nests the topology-determination process inside the iterative pose-solving process, so topology determination and pose computation proceed simultaneously. The pose iteration is based on the parallel-perspective projection model and does not require the projected coordinates of the target's center of gravity as an initial value. Topology determination is cast as an assignment problem, solved once per pose iteration, and its result in turn yields a better pose estimate. Multi-pose measurement experiments and accuracy comparisons show that the algorithm is suitable for large-range, high-accuracy pose measurement: within the range of -120 to 120, the root-mean-square error of the pose measurement is 0.272.
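A sketch of the nested assign-then-solve loop follows. This is an illustration under assumptions, not the paper's algorithm: SciPy's Hungarian solver handles the assignment (topology) step, OpenCV's iterative PnP stands in for the parallel-perspective pose iteration, and the initial pose guess is arbitrary.

```python
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment

def pose_with_unknown_topology(img_pts, obj_pts, K, n_iter=10):
    rvec = np.zeros((3, 1))                     # arbitrary initial pose guess
    tvec = np.array([[0.0], [0.0], [1.0]])
    for _ in range(n_iter):
        # Project the model under the current pose and build a cost matrix
        # of image-point-to-projection distances.
        proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, None)
        cost = np.linalg.norm(img_pts[:, None, :] - proj[:, 0][None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)   # topology determination
        # Re-solve the pose with the current correspondence assignment.
        _, rvec, tvec = cv2.solvePnP(obj_pts[cols], img_pts[rows], K, None,
                                     rvec, tvec, useExtrinsicGuess=True)
    return rvec, tvec
```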

9.
Detecting objects and estimating their 6D poses from a single RGB image is quite challenging under severe occlusion. Recently, vector-field based methods have shown certain robustness to occlusion and truncation. Based on the vector-field representation, applying a voting strategy to localize 2D keypoints can further reduce the influence of outliers. To improve the effectiveness of the vector-field based deep network and voting scheme, we propose Atrous Spatial Pyramid Pooling and Distance-Filtered PVNet (ASPP-DF-PVNet), an occlusion-resistant framework for 6D object pose estimation. ASPP-DF-PVNet utilizes the effective Atrous Spatial Pyramid Pooling (ASPP) module of Deeplabv3 to capture multi-scale features and encode global context information, which improves the accuracy of segmentation and vector-field prediction compared with the original PVNet, especially under severe occlusions. Considering that the distances between pixels and keypoint hypotheses affect the voting deviations, we then present a distance-filtered voting scheme that takes the voting distances into consideration to filter out votes with large deviations. Experiments demonstrate that our method outperforms state-of-the-art methods by a considerable margin without using pose refinement, and obtains competitive results against methods with refinement on the LINEMOD and Occlusion LINEMOD datasets.

10.
Hand pose estimation aims to predict the positions of the joints of a hand from an image, and it has become popular with the emergence of VR/AR/MR technology. Nevertheless, an issue surfaces when pursuing this goal, since a hand easily causes self-occlusion or external occlusion as it interacts with external objects. As a result, many projects have been dedicated to better solutions to this problem. This paper develops a system that accurately estimates a hand pose in 3D space from depth images for VR applications. We propose a data-driven approach of training a deep learning model for hand pose estimation with object interaction. In the convolutional neural network (CNN) training procedure, we design a skeleton-difference loss function, which can effectively learn the physical constraints of a hand. We also propose an object-manipulating loss function, which incorporates knowledge of the hand-object interaction, to enhance performance. In the experiments we have conducted for hand pose estimation under different conditions, the results validate the robustness and the performance of our system and show that our method predicts the joints more accurately in challenging environmental settings. These appealing results may be attributed to the consideration of the physical joint relationships as well as object information, which in turn can be applied to future VR/AR/MR systems for a more natural experience.
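The skeleton-difference idea can be sketched as a loss that penalizes bone-vector discrepancies in addition to per-joint error, so the hand's physical structure constrains the prediction. A minimal PyTorch sketch follows; the paper's exact formulation is not given here, and `BONES` is a hypothetical connectivity list.

```python
import torch

BONES = [(0, 1), (1, 2), (2, 3)]  # placeholder parent-child joint pairs

def skeleton_difference_loss(pred, gt, w_bone=0.5):
    """pred, gt: (B, J, 3) joint positions."""
    joint_term = torch.mean(torch.norm(pred - gt, dim=-1))       # per-joint error
    pred_bones = torch.stack([pred[:, j] - pred[:, i] for i, j in BONES], dim=1)
    gt_bones = torch.stack([gt[:, j] - gt[:, i] for i, j in BONES], dim=1)
    bone_term = torch.mean(torch.norm(pred_bones - gt_bones, dim=-1))
    return joint_term + w_bone * bone_term                       # combined loss
```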

11.
This work addresses the challenging problem of estimating the full 3D hand pose when a hand interacts with an unknown object. Compared to isolated single-hand pose estimation, the occlusion and interference induced by the manipulated object and the cluttered background bring more difficulties to this task. Our proposed Multi-Level Fusion Net focuses on extracting more effective features to overcome these disadvantages through a multi-level fusion design within a new end-to-end Convolutional Neural Network (CNN) framework. It takes cropped RGBD data from a single RGBD camera at a free viewpoint as input, without requiring additional hand-object pre-segmentation or object or hand pre-modeling. Through extensive evaluations on a public hand-object interaction dataset, we demonstrate the state-of-the-art performance of our method.

12.
An approach to model-based dynamic object verification and identification using video is proposed. From image sequences containing the moving object, we compute its motion trajectory. Then we estimate its three-dimensional (3-D) pose at each time step. Pose estimation is formulated as a search problem, with the search space constrained by the motion trajectory information of the moving object and assumptions about the scene structure. A generalized Hausdorff (1962) metric, which is more robust to noise and allows a confidence interpretation, is suggested for the matching procedure used for pose estimation as well as for the identification and verification problem. The pose evolution curves are used to assist in the acceptance or rejection of an object hypothesis. The models are acquired from real image sequences of the objects. Edge maps are extracted and used for matching. Results are presented for both infrared and optical sequences containing moving objects involved in complex motions.
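The generalized (partial) Hausdorff idea can be sketched concisely: rather than taking the maximum nearest-neighbor distance, which a single noisy edge point can dominate, one takes the k-th ranked distance, which tolerates a fraction of outliers. A hedged numpy sketch, not the paper's exact metric:

```python
import numpy as np
from scipy.spatial import cKDTree

def partial_hausdorff(A, B, frac=0.8):
    """Directed partial Hausdorff distance from point set A to point set B."""
    d, _ = cKDTree(B).query(A)           # nearest-neighbor distance per point of A
    k = max(0, int(np.ceil(frac * len(d))) - 1)
    return np.sort(d)[k]                 # k-th ranked distance (frac-quantile)
```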

13.
Object segmentation of unknown objects with arbitrary shape in cluttered scenes is an ambitious goal in computer vision that received a strong impulse with the introduction of cheap and powerful RGB-D sensors. We introduce a framework for segmenting RGB-D images in which data is processed hierarchically. After pre-clustering at the pixel level, parametric surface patches are estimated. Different relations between patch pairs, derived from perceptual grouping principles, are calculated, and support vector machine classification is employed to learn perceptual grouping. Finally, we show that object hypothesis generation with graph cuts finds a globally optimal solution and prevents wrong grouping. Our framework is able to segment objects even if they are stacked or jumbled in cluttered scenes, and it also tackles the problem of segmenting partially occluded objects. The work is evaluated on publicly available object segmentation databases and compared with state-of-the-art object segmentation work.

14.
Three-dimensional human pose estimation (3D HPE) has broad application prospects in trajectory prediction, posture tracking, and action analysis. However, frequent self-occlusions and the substantial depth ambiguity of two-dimensional (2D) representations hinder further improvements in accuracy. In this paper, we propose a novel video-based human-body geometry-aware network to mitigate these problems. Our network is implicitly aware of the geometric constraints of the human body by capturing spatial and temporal context information from 2D skeleton data. Specifically, a novel skeleton attention (SA) mechanism is proposed to model geometric context dependencies among body joints, thereby improving the spatial feature representation ability of the network. To enhance temporal consistency, a novel multilayer perceptron (MLP)-Mixer based structure is exploited to comprehensively learn temporal context information from the input sequences. We conduct experiments on publicly available challenging datasets to evaluate the proposed approach. The results outperform the previous best approach by 0.5 mm on the Human3.6M dataset, and show significant improvements on the HumanEva-I dataset.

15.
Traditional VSLAM algorithms assume a static scene; in dynamic indoor scenes their localization accuracy degrades, and the 3D sparse point cloud map suffers from mismatches caused by dynamic feature points. This work improves on the ORB-SLAM2 framework by incorporating Mask R-CNN for semantic segmentation of the images, removing the dynamic feature points located on dynamic objects, optimizing the camera pose, and producing a static 3D sparse point cloud map. Experimental results on the public TUM dataset show that combining...
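The dynamic-feature rejection step described above can be sketched as follows: ORB keypoints that fall inside the semantic mask of movable objects are discarded before tracking. This is a minimal illustration; the Mask R-CNN inference itself is assumed to be available externally as a boolean `dynamic_mask`.

```python
import cv2
import numpy as np

def filter_dynamic_keypoints(gray, dynamic_mask):
    """gray: grayscale frame; dynamic_mask: H x W boolean mask of dynamic objects."""
    orb = cv2.ORB_create(nfeatures=2000)
    kps = orb.detect(gray, None)
    # Keep only keypoints that lie on static regions of the image.
    static = [kp for kp in kps
              if not dynamic_mask[int(round(kp.pt[1])), int(round(kp.pt[0]))]]
    kps, desc = orb.compute(gray, static)
    return kps, desc
```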

16.
Automatic generation of object recognition programs (Total citations: 1; self-citations: 0; external: 1)
Issues and techniques are discussed for automatically compiling object and sensor models into a visual recognition strategy for recognizing and locating an object in three-dimensional space from visual data. Automatic generation of recognition programs by compilation, in an attempt to automate this process, is described. An object model describes the geometric and photometric properties of an object to be recognized. A sensor model specifies the sensor characteristics in predicting object appearances and variations of feature values. It is emphasized that the sensors, as well as the objects, must be explicitly modeled to achieve the goal of automatic generation of reliable and efficient recognition programs. The actual creation of interpretation trees for two objects and their execution for recognition from a bin of parts are demonstrated.

17.
Saliency prediction on RGB-D images is an underexplored and challenging task in computer vision. We propose a channel-wise attention and contextual interaction asymmetric network for RGB-D saliency prediction. In the proposed network, a common feature extractor provides cross-modal complementarity between the RGB image and the corresponding depth map. In addition, we introduce a four-stream feature-interaction module that fully leverages multiscale and cross-modal features for extracting contextual information. Moreover, we propose a channel-wise attention module to highlight the feature representation of salient regions. Finally, we refine coarse maps through a corresponding refinement block. Experimental results show that the proposed network achieves performance comparable with state-of-the-art saliency prediction methods on two representative datasets.
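As an illustration of channel-wise attention (a standard squeeze-and-excitation design, not necessarily the paper's exact module), a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                  # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))             # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                       # excite: reweight the channels
```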

18.
Hand pose estimation is a challenging task owing to the high flexibility and severe self-occlusion of the hand. Therefore, an optimized convolutional pose machine (OCPM) is proposed in this study to estimate the hand pose accurately. Traditional CPMs have two components: a feature extraction module and an information processing module. First, the backbone network of the feature extraction module is replaced by ResNet-18 to reduce the number of network parameters. Furthermore, an attention module, the convolutional block attention module (CBAM), is embedded into the feature extraction module to enhance information extraction. Then, the structure of the information processing module is adjusted through a residual connection in each stage, which consists of a series of continuous convolutional operations and requires a dense fusion between the outputs of all previous stages and the feature extraction module. Experimental results on two public datasets show that the OCPM network achieves excellent performance.

19.
Bottom-up and top-down visual cues are two types of information that help visual saliency models. These salient cues can come from the spatial distribution of features (space-based saliency) or from contextual, task-dependent features (object-based saliency). Saliency models generally incorporate salient cues in either a bottom-up or a top-down manner, separately. In this work, we combine bottom-up and top-down cues from both space-based and object-based salient features on RGB-D data. In addition, we investigate the ability of various pre-trained convolutional neural networks to extract top-down saliency on color images based on object-dependent feature activations. We demonstrate that combining salient features from color and depth through bottom-up and top-down methods gives a significant improvement in salient object detection with space-based and object-based salient cues. The RGB-D saliency integration framework yields promising results compared with several state-of-the-art models.

20.
Human action recognition in videos remains an important yet challenging task. Existing methods based on RGB images or optical flow are easily affected by clutter and ambiguous backgrounds. In this paper, we propose a novel Pose-Guided Inflated 3D ConvNet framework (PI3D) to address this issue. First, we design a spatial-temporal pose module, which provides essential clues for the Inflated 3D ConvNet (I3D); the pose module consists of pose estimation and pose-based action recognition. Second, for the multi-person estimation task, the introduced pose estimation network can determine the action most relevant to the action category. Third, we propose a hierarchical pose-based network to learn the spatial-temporal features of human pose. Moreover, the pose-based network and the I3D network are fused at the last convolutional layer without loss of performance. Finally, experimental results on four datasets (HMDB-51, SYSU 3D, JHMDB, and Sub-JHMDB) demonstrate that the proposed PI3D framework outperforms existing methods on human action recognition. This work also shows that posture cues significantly improve the performance of I3D.
