Similar Documents
20 similar documents found.
1.
Finding objects and tracking their poses are essential functions for service robots that must manipulate objects and interact with humans. We present novel algorithms for local feature matching for object detection and for 3D pose estimation. Our feature matching algorithm exploits local geometric consistency for better performance, and the new 3D pose estimation algorithm solves the pose in closed form using a homography, followed by a non-linear optimization step for stability. Advantages of our approach include better performance, minimal prior knowledge about the target pattern, and easy implementation and portability as a modularized software component. We have implemented our approach with both CPU- and GPU-based feature extraction, and built an interoperable component that can be used in any Robot Technology (RT)-based control system. Experiments show that our approach produces very robust estimates of the 3D pose while maintaining a very low false-positive rate, and it is fast enough for on-line applications. We integrated our vision component into an autonomous robot system with a search-and-grasp task and tested it with several objects found in ordinary domestic environments. We present the details of our approach, the design of our modular component, and the experimental results in this paper.
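The closed-form-then-refine pipeline in this abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes a planar target with known metric coordinates and a calibrated camera, and it substitutes OpenCV's homography and PnP routines for the paper's own solver.

```python
import cv2
import numpy as np

def estimate_plane_pose(img_pts, obj_pts, K, dist=None):
    """img_pts: Nx2 matched image points; obj_pts: Nx3 planar model points (Z = 0)."""
    # Closed-form initial pose from the plane-to-image homography.
    H, _ = cv2.findHomography(obj_pts[:, :2], img_pts, cv2.RANSAC, 3.0)
    # Decompose H ~ K [r1 r2 t]: recover rotation columns and translation
    # up to scale (sign handling for a plane behind the camera is omitted).
    A = np.linalg.inv(K) @ H
    s = 1.0 / np.linalg.norm(A[:, 0])
    r1, r2, t = s * A[:, 0], s * A[:, 1], s * A[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)          # project onto SO(3)
    rvec0, _ = cv2.Rodrigues(U @ Vt)
    # Non-linear refinement of the reprojection error, as the abstract describes.
    _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                 rvec0, t.reshape(3, 1),
                                 useExtrinsicGuess=True,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec
```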

2.
Pose estimation for laser point cloud models based on the viewpoint feature histogram (Total citations: 2; self-citations: 2; external: 0)
A pose estimation algorithm for point cloud models based on the viewpoint feature histogram (VFH) is proposed. First, 3D point clouds are captured around the target object and stitched into a complete point cloud model of the object. The viewpoint feature histogram of the model is then computed to build a feature database. For the point cloud whose pose is to be estimated, the same histogram is computed, and a KNN search in the database returns the closest pose as the initial estimate. Finally, the iterative closest point (ICP) algorithm precisely registers the query point cloud to the model point cloud, yielding the relative pose between the coordinate frames. Experiments show that this method is highly robust for object pose recognition and computes the 3D pose of target objects well.
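A minimal sketch of this retrieve-then-refine pipeline, under stated assumptions: `compute_vfh` stands in for a PCL-style VFH extractor (not implemented here), scikit-learn provides the KNN search, and Open3D provides the ICP refinement; the clouds are assumed to be `open3d.geometry.PointCloud` objects.

```python
import numpy as np
import open3d as o3d
from sklearn.neighbors import NearestNeighbors

def build_pose_database(model_views, poses, compute_vfh):
    # One VFH descriptor per stored view of the model (compute_vfh is assumed).
    feats = np.stack([compute_vfh(c) for c in model_views])
    return NearestNeighbors(n_neighbors=1).fit(feats), poses

def estimate_pose(query_cloud, model_cloud, index, poses, compute_vfh):
    # 1) Coarse pose: nearest VFH in the database (KNN with k = 1).
    _, nn = index.kneighbors(compute_vfh(query_cloud)[None, :])
    T_init = poses[int(nn[0, 0])]
    # 2) Fine pose: ICP registration of the query onto the model.
    result = o3d.pipelines.registration.registration_icp(
        query_cloud, model_cloud,
        max_correspondence_distance=0.01, init=T_init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```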

3.
A Human–machine interaction system requires precise information about the user's body position in order to allow natural 3D interaction in stereoscopic augmented reality environments, where real and virtual objects should coherently coexist. The diffusion of RGB-D sensors seems to provide an effective solution to this problem. Nevertheless, interaction with stereoscopic 3D environments, in particular in peripersonal space, requires a higher degree of precision. To this end, a reliable calibration of such sensors and an accurate estimation of the relative pose of different RGB-D and visualization devices are crucial. Here, robust and straightforward procedures to calibrate an RGB-D camera, to improve the accuracy of its 3D measurements, and to co-register different calibrated devices are proposed. Quantitative measures validate the proposed approach. Moreover, the calibrated devices have been used in an augmented reality system, based on a dynamic stereoscopic rendering technique that needs accurate information about the position of the observer's eyes.
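As a concrete illustration of the calibration step (a standard checkerboard procedure with OpenCV, not necessarily the paper's exact routine), the sketch below recovers the intrinsics needed to improve 3D measurements; the board dimensions and square size are assumptions.

```python
import cv2
import numpy as np

def calibrate_rgb_camera(images, board=(9, 6), square=0.025):
    # 3D corner coordinates of the checkerboard in its own frame (Z = 0 plane).
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)
    # Intrinsics K and distortion; the per-view extrinsics give the board pose.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    return K, dist, rms
```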

4.
Dense 3D reconstruction is required for robots to navigate safely or perform advanced tasks. Accurate depth information for each image, together with its pose, is the basis of 3D reconstruction. The resolution of depth maps obtained by LIDAR and RGB-D cameras is limited, and traditional pose calculation methods are not accurate enough. In addition, if every image is used for dense 3D reconstruction, the dense point clouds greatly increase the amount of computation. To address these issues, we propose a 3D reconstruction system. Specifically, we propose a depth network with contour and gradient attention, which completes and corrects depth maps to obtain high-resolution, high-quality results. We then propose a pose estimation method that fuses traditional algorithms with deep learning to obtain accurate localization. Finally, we adopt autonomous keyframe selection to reduce the number of keyframes, and surfel-based geometric reconstruction is performed to reconstruct the dense 3D environment. On the TUM RGB-D, ICL-NUIM, and KITTI datasets, our method significantly improves the quality of the depth maps, the localization results, and the 3D reconstruction, while also accelerating the reconstruction.

5.
We present a geometry-based indexing approach for the retrieval of video databases. It consists of two modules: 3D object shape inferencing from video data and geometric modeling from the reconstructed shape structure. A motion-based segmentation algorithm employing feature block tracking and principal component split is used for multi-moving-object motion classification and segmentation. After segmentation, feature blocks from each individual object are used to reconstruct its motion and structure through a factorization method. The estimated shape structure and motion parameters are used to generate the implicit polynomial model for the object. The video data is retrieved using the geometric structure of objects and their spatial relationship. We generalize the 2D string to 3D to compactly encode the spatial relationship of objects.
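The factorization step can be illustrated with a short numpy sketch in the style of Tomasi-Kanade: the tracked feature blocks form a measurement matrix whose rank-3 factorization separates camera motion from object shape. This is a generic sketch under affine-camera assumptions; the metric upgrade and the paper's specifics are omitted.

```python
import numpy as np

def factorize(W):
    """W: 2F x P matrix of tracked image coordinates (F frames, P features)."""
    # Register to the per-frame centroid so translation drops out.
    W = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Rank-3 factorization: motion (2F x 3) and shape (3 x P), recovered
    # up to an affine ambiguity (the metric upgrade is omitted here).
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S
```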

6.
To address robotic grasping of point cloud targets in the weakly textured, randomly stacked complex scenes common in industry, this paper proposes a deep learning network for 6D pose estimation. First, a physical environment in which point cloud targets are randomly placed in multiple poses within complex scenes is simulated to generate a dataset with ground-truth labels. A deep 6D pose estimation network is then designed: the proposed Multi-scale Point Cloud Segmentation Network (MPCS-Net) performs instance segmentation directly on the complete geometric point cloud, removing the dependence on RGB information and on point cloud segmentation preprocessing. Next, a Multi-layer Feature Pose Estimation Network (MFPE-Net) is proposed, which effectively handles pose estimation for symmetric objects. Finally, experimental results and analysis confirm that, compared with traditional point cloud registration methods and existing deep learning pose estimation methods that operate on segmented point clouds, the proposed method achieves higher accuracy and more stable performance, and is robust when estimating the poses of symmetric objects.

7.
We address the problem of accurate and efficient alignment of 3D point clouds captured by an RGB-D (Kinect-style) camera from different viewpoints. While the Iterative Closest Point (ICP) algorithm has been widely used for dense point cloud matching, it is limited in its ability to produce accurate results in challenging scenarios involving objects that lack structural features and undergo significant camera view changes. In this paper, we introduce a new cost function with dynamic weights for the ICP algorithm to tackle this problem. It balances the significance of structural and photometric features with dynamically adjusted weights to improve the error minimization process. Our algorithm also includes a novel outlier rejection method, which adopts adaptive thresholding at each ICP iteration, using both the structural information of the object and the spatial distances of sparse SIFT feature pairs. The effectiveness of our proposed approach is demonstrated by experimental results from various challenging scenarios. We obtain higher registration accuracy than related previous methods while maintaining low computational requirements.
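A hedged sketch of one ICP iteration with such a combined cost follows. The weighting and thresholding rules are placeholders, not the paper's formulas: geometric and photometric residuals are mixed with a fixed weight `w_geo`, and a MAD-based cutoff stands in for the adaptive outlier rejection.

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_from_pairs(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def weighted_icp_step(src_xyz, src_rgb, dst_xyz, dst_rgb, w_geo=0.7):
    d, idx = cKDTree(dst_xyz).query(src_xyz)             # closest-point pairs
    photo = np.linalg.norm(src_rgb - dst_rgb[idx], axis=1)
    r = w_geo * d + (1.0 - w_geo) * photo                # combined residual
    med = np.median(r)
    keep = r < med + 3.0 * np.median(np.abs(r - med))    # adaptive rejection
    return rigid_from_pairs(src_xyz[keep], dst_xyz[idx[keep]])
```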

8.
王鹏, 周权通, 孙长库. 《红外与激光工程》 (Infrared and Laser Engineering), 2017, 46(5): 517001-0517001(9)
To solve the problem in monocular pose measurement where a large number of target feature points leaves the topological correspondence between image points and target points unknown, a pose measurement algorithm with multi-feature-point topology determination is proposed. A larger set of feature points guarantees that enough points remain available for pose solving when the target undergoes large-angle motion, improving measurement accuracy compared with fewer feature points. The algorithm nests the topology-determination process inside the iterative pose-solving process, so topology determination and pose computation proceed simultaneously. The pose iteration is based on the parallel-perspective projection model and does not require the projected coordinates of the target's center of gravity as an initial value. Topology determination is cast as an assignment problem, solved once per pose iteration, and its result in turn yields a better pose estimate. Multi-pose measurement experiments and accuracy comparisons show that the algorithm is suitable for large-range, high-accuracy pose measurement: within the range of -120 to 120, the root-mean-square error of the pose measurement is 0.272.
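A sketch of the nested assign-then-solve loop follows. This is an illustration under assumptions, not the paper's algorithm: SciPy's Hungarian solver handles the assignment (topology) step, OpenCV's iterative PnP stands in for the parallel-perspective pose iteration, and the initial pose guess is arbitrary.

```python
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment

def pose_with_unknown_topology(img_pts, obj_pts, K, n_iter=10):
    rvec = np.zeros((3, 1))                     # arbitrary initial pose guess
    tvec = np.array([[0.0], [0.0], [1.0]])
    for _ in range(n_iter):
        # Project the model under the current pose and build a cost matrix
        # of image-point-to-projection distances.
        proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, None)
        cost = np.linalg.norm(img_pts[:, None, :] - proj[:, 0][None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)   # topology determination
        # Re-solve the pose with the current correspondence assignment.
        _, rvec, tvec = cv2.solvePnP(obj_pts[cols], img_pts[rows], K, None,
                                     rvec, tvec, useExtrinsicGuess=True)
    return rvec, tvec
```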

9.
Detecting objects and estimating their 6D poses from a single RGB image is quite challenging under severe occlusion. Recently, vector-field based methods have shown certain robustness to occlusion and truncation. Based on the vector-field representation, applying a voting strategy to localize 2D keypoints can further reduce the influence of outliers. To improve the effectiveness of the vector-field based deep network and voting scheme, we propose Atrous Spatial Pyramid Pooling and Distance-Filtered PVNet (ASPP-DF-PVNet), an occlusion-resistant framework for 6D object pose estimation. ASPP-DF-PVNet utilizes the effective Atrous Spatial Pyramid Pooling (ASPP) module of Deeplabv3 to capture multi-scale features and encode global context information, which improves the accuracy of segmentation and vector-field prediction compared with the original PVNet, especially under severe occlusions. Considering that the distances between pixels and keypoint hypotheses affect the voting deviations, we then present a distance-filtered voting scheme that takes the voting distances into consideration to filter out votes with large deviations. Experiments demonstrate that our method outperforms state-of-the-art methods by a considerable margin without using pose refinement, and obtains competitive results against methods with refinement on the LINEMOD and Occlusion LINEMOD datasets.

10.
Hand pose estimation aims to predict the positions of the joints of a hand from an image, and it has become popular with the emergence of VR/AR/MR technology. Nevertheless, an issue surfaces when pursuing this goal, since a hand easily causes self-occlusion or external occlusion as it interacts with external objects. As a result, many projects have been dedicated to better solutions to this problem. This paper develops a system that accurately estimates a hand pose in 3D space from depth images for VR applications. We propose a data-driven approach of training a deep learning model for hand pose estimation with object interaction. In the convolutional neural network (CNN) training procedure, we design a skeleton-difference loss function, which can effectively learn the physical constraints of a hand. We also propose an object-manipulating loss function, which incorporates knowledge of the hand-object interaction, to enhance performance. In the experiments we have conducted for hand pose estimation under different conditions, the results validate the robustness and the performance of our system and show that our method predicts the joints more accurately in challenging environmental settings. These appealing results may be attributed to the consideration of the physical joint relationships as well as object information, which in turn can be applied to future VR/AR/MR systems for a more natural experience.
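The skeleton-difference idea can be sketched as a loss that penalizes bone-vector discrepancies in addition to per-joint error, so the hand's physical structure constrains the prediction. A minimal PyTorch sketch follows; the paper's exact formulation is not given here, and `BONES` is a hypothetical connectivity list.

```python
import torch

BONES = [(0, 1), (1, 2), (2, 3)]  # placeholder parent-child joint pairs

def skeleton_difference_loss(pred, gt, w_bone=0.5):
    """pred, gt: (B, J, 3) joint positions."""
    joint_term = torch.mean(torch.norm(pred - gt, dim=-1))       # per-joint error
    pred_bones = torch.stack([pred[:, j] - pred[:, i] for i, j in BONES], dim=1)
    gt_bones = torch.stack([gt[:, j] - gt[:, i] for i, j in BONES], dim=1)
    bone_term = torch.mean(torch.norm(pred_bones - gt_bones, dim=-1))
    return joint_term + w_bone * bone_term                       # combined loss
```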

11.
This work addresses the challenging problem of estimating the full 3D hand pose when a hand interacts with an unknown object. Compared to isolated single-hand pose estimation, the occlusion and interference induced by the manipulated object and the cluttered background bring more difficulties to this task. Our proposed Multi-Level Fusion Net focuses on extracting more effective features to overcome these disadvantages through a multi-level fusion design within a new end-to-end Convolutional Neural Network (CNN) framework. It takes cropped RGBD data from a single RGBD camera at a free viewpoint as input, without requiring additional hand-object pre-segmentation or object or hand pre-modeling. Through extensive evaluations on a public hand-object interaction dataset, we demonstrate the state-of-the-art performance of our method.

12.
An approach to model-based dynamic object verification and identification using video is proposed. From image sequences containing the moving object, we compute its motion trajectory. Then we estimate its three-dimensional (3-D) pose at each time step. Pose estimation is formulated as a search problem, with the search space constrained by the motion trajectory information of the moving object and assumptions about the scene structure. A generalized Hausdorff (1962) metric, which is more robust to noise and allows a confidence interpretation, is suggested for the matching procedure used for pose estimation as well as for the identification and verification problem. The pose evolution curves are used to assist in the acceptance or rejection of an object hypothesis. The models are acquired from real image sequences of the objects. Edge maps are extracted and used for matching. Results are presented for both infrared and optical sequences containing moving objects involved in complex motions.
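The generalized (partial) Hausdorff idea can be sketched concisely: rather than taking the maximum nearest-neighbor distance, which a single noisy edge point can dominate, one takes the k-th ranked distance, which tolerates a fraction of outliers. A hedged numpy sketch, not the paper's exact metric:

```python
import numpy as np
from scipy.spatial import cKDTree

def partial_hausdorff(A, B, frac=0.8):
    """Directed partial Hausdorff distance from point set A to point set B."""
    d, _ = cKDTree(B).query(A)           # nearest-neighbor distance per point of A
    k = max(0, int(np.ceil(frac * len(d))) - 1)
    return np.sort(d)[k]                 # k-th ranked distance (frac-quantile)
```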

13.
Object segmentation of unknown objects with arbitrary shape in cluttered scenes is an ambitious goal in computer vision that received a strong impulse with the introduction of cheap and powerful RGB-D sensors. We introduce a framework for segmenting RGB-D images in which data is processed hierarchically. After pre-clustering at the pixel level, parametric surface patches are estimated. Different relations between patch pairs, derived from perceptual grouping principles, are calculated, and support vector machine classification is employed to learn perceptual grouping. Finally, we show that object hypothesis generation with graph cuts finds a globally optimal solution and prevents wrong grouping. Our framework is able to segment objects even if they are stacked or jumbled in cluttered scenes, and it also tackles the problem of segmenting partially occluded objects. The work is evaluated on publicly available object segmentation databases and compared with state-of-the-art object segmentation work.

14.
Three-dimensional human pose estimation (3D HPE) has broad application prospects in trajectory prediction, posture tracking, and action analysis. However, frequent self-occlusions and the substantial depth ambiguity of two-dimensional (2D) representations hinder further improvements in accuracy. In this paper, we propose a novel video-based human-body geometry-aware network to mitigate these problems. Our network is implicitly aware of the geometric constraints of the human body by capturing spatial and temporal context information from 2D skeleton data. Specifically, a novel skeleton attention (SA) mechanism is proposed to model geometric context dependencies among body joints, thereby improving the spatial feature representation ability of the network. To enhance temporal consistency, a novel multilayer perceptron (MLP)-Mixer based structure is exploited to comprehensively learn temporal context information from the input sequences. We conduct experiments on publicly available challenging datasets to evaluate the proposed approach. The results outperform the previous best approach by 0.5 mm on the Human3.6M dataset, and show significant improvements on the HumanEva-I dataset.

15.
Traditional VSLAM algorithms assume a static scene; in dynamic indoor scenes their localization accuracy degrades, and the 3D sparse point cloud map suffers from mismatches caused by dynamic feature points. This work improves on the ORB-SLAM2 framework by incorporating Mask R-CNN for semantic segmentation of the images, removing the dynamic feature points located on dynamic objects, optimizing the camera pose, and producing a static 3D sparse point cloud map. Experimental results on the public TUM dataset show that combining...
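The dynamic-feature rejection step described above can be sketched as follows: ORB keypoints that fall inside the semantic mask of movable objects are discarded before tracking. This is a minimal illustration; the Mask R-CNN inference itself is assumed to be available externally as a boolean `dynamic_mask`.

```python
import cv2
import numpy as np

def filter_dynamic_keypoints(gray, dynamic_mask):
    """gray: grayscale frame; dynamic_mask: H x W boolean mask of dynamic objects."""
    orb = cv2.ORB_create(nfeatures=2000)
    kps = orb.detect(gray, None)
    # Keep only keypoints that lie on static regions of the image.
    static = [kp for kp in kps
              if not dynamic_mask[int(round(kp.pt[1])), int(round(kp.pt[0]))]]
    kps, desc = orb.compute(gray, static)
    return kps, desc
```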

16.
Automatic generation of object recognition programs (Total citations: 1; self-citations: 0; external: 1)
Issues and techniques are discussed for automatically compiling object and sensor models into a visual recognition strategy for recognizing and locating an object in three-dimensional space from visual data. Automatic generation of recognition programs by compilation, in an attempt to automate this process, is described. An object model describes the geometric and photometric properties of an object to be recognized. A sensor model specifies the sensor characteristics in predicting object appearances and variations of feature values. It is emphasized that the sensors, as well as the objects, must be explicitly modeled to achieve the goal of automatic generation of reliable and efficient recognition programs. The actual creation of interpretation trees for two objects and their execution for recognition from a bin of parts are demonstrated.

17.
Saliency prediction on RGB-D images is an underexplored and challenging task in computer vision. We propose a channel-wise attention and contextual interaction asymmetric network for RGB-D saliency prediction. In the proposed network, a common feature extractor provides cross-modal complementarity between the RGB image and the corresponding depth map. In addition, we introduce a four-stream feature-interaction module that fully leverages multiscale and cross-modal features for extracting contextual information. Moreover, we propose a channel-wise attention module to highlight the feature representation of salient regions. Finally, we refine coarse maps through a corresponding refinement block. Experimental results show that the proposed network achieves performance comparable with state-of-the-art saliency prediction methods on two representative datasets.
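As an illustration of channel-wise attention (a standard squeeze-and-excitation design, not necessarily the paper's exact module), a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                  # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))             # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                       # excite: reweight the channels
```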

18.
Hand pose estimation is a challenging task owing to the high flexibility and severe self-occlusion of the hand. Therefore, an optimized convolutional pose machine (OCPM) is proposed in this study to estimate the hand pose accurately. Traditional CPMs have two components: a feature extraction module and an information processing module. First, the backbone network of the feature extraction module is replaced by ResNet-18 to reduce the number of network parameters. Furthermore, an attention module, the convolutional block attention module (CBAM), is embedded into the feature extraction module to enhance information extraction. Then, the structure of the information processing module is adjusted through a residual connection in each stage, which consists of a series of continuous convolutional operations and requires a dense fusion between the outputs of all previous stages and the feature extraction module. Experimental results on two public datasets show that the OCPM network achieves excellent performance.

19.
Bottom-up and top-down visual cues are two types of information that help visual saliency models. These salient cues can come from the spatial distribution of features (space-based saliency) or from contextual, task-dependent features (object-based saliency). Saliency models generally incorporate salient cues in either a bottom-up or a top-down manner, separately. In this work, we combine bottom-up and top-down cues from both space-based and object-based salient features on RGB-D data. In addition, we investigate the ability of various pre-trained convolutional neural networks to extract top-down saliency on color images based on object-dependent feature activations. We demonstrate that combining salient features from color and depth through bottom-up and top-down methods gives a significant improvement in salient object detection with space-based and object-based salient cues. The RGB-D saliency integration framework yields promising results compared with several state-of-the-art models.

20.
Human action recognition in videos remains an important yet challenging task. Existing methods based on RGB images or optical flow are easily affected by clutter and ambiguous backgrounds. In this paper, we propose a novel Pose-Guided Inflated 3D ConvNet framework (PI3D) to address this issue. First, we design a spatial-temporal pose module, which provides essential clues for the Inflated 3D ConvNet (I3D); the pose module consists of pose estimation and pose-based action recognition. Second, for the multi-person estimation task, the introduced pose estimation network can determine the action most relevant to the action category. Third, we propose a hierarchical pose-based network to learn the spatial-temporal features of human pose. Moreover, the pose-based network and the I3D network are fused at the last convolutional layer without loss of performance. Finally, experimental results on four datasets (HMDB-51, SYSU 3D, JHMDB, and Sub-JHMDB) demonstrate that the proposed PI3D framework outperforms existing methods on human action recognition. This work also shows that posture cues significantly improve the performance of I3D.
