In the spirit of recent work on contextual recognition and estimation, we present a method for estimating the pose of human hands that employs information about the shape of the object in the hand. Although most applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Occlusion of the hand by grasped objects often poses a severe challenge to the estimation of hand pose. In the presented method, object occlusion is not only compensated for, it contributes to the pose estimation in a contextual fashion, and it does so without an explicit model of object shape. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (.. entries) of hand poses with and without grasped objects. The system operates in real time, is robust to self-occlusions, object occlusions, and segmentation errors, and provides full hand pose reconstruction from monocular video. Temporal consistency in hand pose is taken into account without explicitly tracking the hand in the high-dimensional pose space. Experiments show that the non-parametric method outperforms other state-of-the-art regression methods while operating at a significantly lower computational cost than comparable model-based hand tracking methods.

A candidate pose algorithm is described which computes object pose from an assumed correspondence between a pair of 2D image points and a pair of 3D model points. By computing many pose candidates, the actual object pose can usually be determined by detecting a cluster in the space of all candidates. Cluster space can receive candidate pose parameters from independent computations in different camera views. It is shown that the use of geometric constraints can be sufficient for reliable pose detection, but other knowledge, such as edge presence and type, can easily be added for increased efficiency.

Robust grasping under object pose uncertainty   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents a decision-theoretic approach to problems that require accurate placement of a robot relative to an object of known shape, such as grasping for assembly or tool use. The decision process is applied to a robot hand with tactile sensors, to localize the object on a table and ultimately achieve a target placement by selecting among a parameterized set of grasping and information-gathering trajectories. The process is demonstrated in simulation and on a real robot. This work has been previously presented in Hsiao et al. (Workshop on Algorithmic Foundations of Robotics (WAFR), 2008; Robotics Science and Systems (RSS), 2010) and Hsiao (Relatively robust grasping, Ph.D. thesis, Massachusetts Institute of Technology, 2009).

Primates are very good at recognizing objects independent of viewing angle or retinal position, and they outperform existing computer vision systems by far. But invariant object recognition is only one prerequisite for successful interaction with the environment. An animal also needs to assess an object's position and relative rotational angle. We propose here a model that is able to extract object identity, position, and rotation angles. We demonstrate the model behavior on complex three-dimensional objects under translation and rotation in depth on a homogeneous background. A similar model has previously been shown to extract hippocampal spatial codes from quasi-natural videos. The framework for mathematical analysis of this earlier application carries over to the scenario of invariant object recognition. Thus, the simulation results can be explained analytically even for the complex high-dimensional data we employed.

Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex. Autonomous Robots, 2022, 46(1): 83-98
Autonomous Robots - Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying...

We describe an approach to category-level detection and viewpoint estimation for rigid 3D objects from single 2D images. In contrast to many existing methods, we directly integrate 3D reasoning with an appearance-based voting architecture. Our method relies on a nonparametric representation of a joint distribution of shape and appearance of the object class. Our voting method employs a novel parameterization of the joint detection and viewpoint hypothesis space, allowing efficient accumulation of evidence. We combine this with a re-scoring and refinement mechanism, using an ensemble of view-specific support vector machines. We evaluate the performance of our approach in detection and pose estimation of cars on a number of benchmark datasets. Finally, we introduce the “Weizmann Cars ViewPoint” (WCVP) dataset, a benchmark for evaluating continuous pose estimation.

We present an efficient method for estimating the pose of a three-dimensional object. Its implementation is embedded in a computer vision system which is motivated by and based on cognitive principles concerning the visual perception of three-dimensional objects. Viewpoint-invariant object recognition has long been the subject of controversy. An important point of discussion is the nature of internal object representations. Behavioral studies with primates, which are summarized in this article, support the model of view-based object representations. We designed our computer vision system according to these findings and demonstrate that very precise estimates of the poses of real-world objects are possible even if only a small number of sample views of an object is available. The system can be used for a variety of applications.

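Objective: Hashing is an effective method for large-scale image retrieval. To improve retrieval accuracy, hash codes should preserve semantic information: the more similar two images are, the closer their hash codes should be. Existing methods first extract features that describe the image as a whole and then generate hash codes. This approach cannot accurately describe the multiple objects an image contains, which limits the accuracy of multi-label image retrieval. We therefore propose a hash generation method based on convolutional neural networks and object extraction. Method: First, a series of regions that may contain objects is extracted from the image. A deep convolutional neural network then extracts and fuses the features of each region, generating a set of features that characterizes each object in the image, and finally the hash code of the whole image is produced. Training with a triplet loss makes the hash codes preserve as much semantic information as possible. Results: Multi-label image retrieval was performed on the VOC2012, Flickr25K, and NUSWIDE datasets. In terms of NDCG (normalized discounted cumulative gain), when 1,000 images are returned, the proposed method on VOC2012 improves by 24 percentage points over DSRH (deep semantic ranking hashing) and by 36 percentage points over ITQ-CCA (iterative quantization-canonical correlation analysis); on Flickr25K it improves by about 2 percentage points over DSRH; on NUSWIDE it improves by about 4 percentage points over DSRH. In terms of mean retrieval precision, the proposed method improves by 25 percentage points on NUSWIDE and Flickr25K. Across multiple evaluation metrics, the proposed method describes images accurately at a finer granularity and significantly improves multi-label image retrieval performance. Conclusion: The new feature learning model shows that fine-grained feature encoding of images is a feasible approach that can effectively improve retrieval performance on these datasets.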
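
The NDCG figures quoted in this abstract can be made concrete with a small sketch. It is not the authors' evaluation code. It assumes the common exponential-gain form of DCG, graded relevance scores supplied by the caller (for multi-label retrieval these could be, for example, the number of labels shared with the query), and an ideal ranking obtained by re-sorting the returned list only. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results.

    Assumes the exponential-gain variant: (2^rel - 1) / log2(rank + 1),
    with ranks starting at 1.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering.

    Assumption: the ideal ordering is taken over the returned list itself,
    not over the whole database.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item was returned at all
    return dcg_at_k(relevances, k) / ideal
```

A ranking that already lists results in decreasing relevance scores 1.0, and any ranking that places a less relevant result above a more relevant one scores lower.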

Pose tracking is an important task in Augmented Reality (AR), interactive systems, and robotic systems. Frame-by-frame pose tracking is effective in many cases but still faces challenges in complex environments, such as occlusions, illumination changes, and flipping. In this paper, three improvements are proposed to the optimization model of Ye et al. (J Vis Commun Image Represent 44:72–81, 2017). First, a feature adjustment strategy based on a group of neighbors is offered to alleviate a sharp reduction in the number of features. Second, for the case where the features no longer represent the scene of interest well, a score model based on a weighted histogram is presented for evaluating results and realizing an adaptive interval. Third, a forward-backward algorithm is provided to improve accuracy by replacing the detection method with the tracking method. Experimental results demonstrate the effectiveness of the proposed algorithms.

Model-based object pose in 25 lines of code   Cited by: 17 (self-citations: 3, citations by others: 17)
In this paper, we describe a method for finding the pose of an object from a single image. We assume that we can detect and match in the image four or more noncoplanar feature points of the object, and that we know their relative geometry on the object. The method combines two algorithms. The first algorithm, POS (Pose from Orthography and Scaling), approximates the perspective projection with a scaled orthographic projection and finds the rotation matrix and the translation vector of the object by solving a linear system. The second algorithm, POSIT (POS with ITerations), uses in its iteration loop the approximate pose found by POS in order to compute better scaled orthographic projections of the feature points, then applies POS to these projections instead of the original image projections. POSIT converges to accurate pose measurements in a few iterations. POSIT can be used with many feature points at once for added insensitivity to measurement errors and image noise. Compared to classic approaches making use of Newton's method, POSIT does not require starting from an initial guess, and computes the pose using an order of magnitude fewer floating point operations; it may therefore be a useful alternative for real-time operation. When speed is not an issue, POSIT can be written in 25 lines or fewer in Mathematica; the code is provided in an Appendix.
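
The POS/POSIT loop described in this abstract can be sketched in a few lines of Python. This is an illustrative reconstruction from the abstract, not the Mathematica code from the paper's appendix. It assumes exactly four noncoplanar points (so the linear system is 3x3 and can be solved directly; handling more points would need a least-squares solve), image coordinates measured from the principal point, a focal length in the same units as the image coordinates, and a fixed iteration count instead of a convergence test. All function names (`solve3`, `posit`, `project`) are hypothetical.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve3(rows, rhs):
    """Solve the 3x3 system rows . v = rhs using cross products."""
    a, b, c = rows
    det = dot(a, cross(b, c))
    bc, ca, ab = cross(b, c), cross(c, a), cross(a, b)
    return [(rhs[0] * bc[i] + rhs[1] * ca[i] + rhs[2] * ab[i]) / det
            for i in range(3)]


def posit(model_pts, image_pts, focal, iterations=30):
    """Estimate (rotation rows, translation) from four point matches."""
    m0 = model_pts[0]
    # Object vectors from the reference point to the other three points.
    rows = [[p[k] - m0[k] for k in range(3)] for p in model_pts[1:4]]
    x0, y0 = image_pts[0]
    eps = [0.0, 0.0, 0.0]  # eps = 0 is the plain POS approximation
    for _ in range(iterations):
        # Scaled orthographic image vectors implied by the current eps.
        xs = [image_pts[i + 1][0] * (1 + eps[i]) - x0 for i in range(3)]
        ys = [image_pts[i + 1][1] * (1 + eps[i]) - y0 for i in range(3)]
        big_i, big_j = solve3(rows, xs), solve3(rows, ys)
        s1, s2 = math.sqrt(dot(big_i, big_i)), math.sqrt(dot(big_j, big_j))
        i_hat = [v / s1 for v in big_i]
        j_hat = [v / s2 for v in big_j]
        k_hat = cross(i_hat, j_hat)
        z0 = focal / ((s1 + s2) / 2)  # depth of the reference point
        eps = [dot(r, k_hat) / z0 for r in rows]
    return [i_hat, j_hat, k_hat], [x0 * z0 / focal, y0 * z0 / focal, z0]


def project(model_pts, rotation, translation, focal):
    """Perspective projection, used here only to build test input."""
    m0 = model_pts[0]
    out = []
    for p in model_pts:
        v = [p[k] - m0[k] for k in range(3)]
        cam = [dot(rotation[r], v) + translation[r] for r in range(3)]
        out.append((focal * cam[0] / cam[2], focal * cam[1] / cam[2]))
    return out
```

Feeding `posit` the output of `project` for a known pose is a quick way to check the sketch: when the object is small relative to its distance from the camera, the recovered rotation rows and translation should match the pose used for projection.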

Although using domain-specific knowledge sources for information retrieval yields more accurate results than pure keyword-based methods, further improvements can be achieved by considering both the relations between concepts in an ontology and their statistical dependencies over the corpus. In this paper, an approach named concept-based pseudo-relevance feedback is introduced for improving the accuracy of biomedical retrieval systems. The proposed method uses a hybrid retrieval algorithm, based on a combination of keyword- and concept-based approaches, to determine the relevance of documents to queries. It also uses a pseudo-relevance feedback mechanism to expand initial queries with auxiliary biomedical concepts extracted from the top-ranked results of the hybrid retrieval. Concept-based similarities allow the system to detect documents that are semantically close to a user's query without necessarily sharing keywords with it. In addition, expanding initial queries with concepts introduced by pseudo-relevance feedback captures relations between queries and documents that rely on statistical dependencies between the concepts they contain. Such relations may remain undetected when only the existing links between concepts in an external knowledge source are examined. The proposed approach is evaluated using the OHSUMED test collection and standard evaluation methods from the Text Retrieval Conference (TREC). Experimental results on MEDLINE documents (in the OHSUMED collection) show a 21% improvement in mean average precision over the keyword-based approach.

This paper deals with the problem of stable grasping under pose uncertainty. Our method utilizes tactile sensing data to estimate grasp stability and make necessary hand adjustments after an initial grasp is established. We first discuss a learning approach to estimating grasp stability based on tactile sensing data. This estimator can be used as an indicator of the stability of the current grasp during a grasping procedure. We then present a hand adjustment algorithm, based on tactile experience, that synthesizes a hand adjustment and optimizes the hand pose to achieve a stable grasp. Experiments show that our method improves the grasping performance under pose uncertainty.

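Obtaining the 6D pose of a target object from images has wide application in robotic manipulation, virtual reality, and other fields. However, deep-learning-based pose estimation methods usually require large training datasets to improve model generalization, and typical data collection methods suffer from high collection cost and a lack of 3D spatial position information. In view of this, a 6D pose estimation network framework for target objects based on low-quality rendered images is proposed. In this network, the feature extraction part takes a single RGB image as input and extracts image features with a residual network. In the pose estimation part, an object classification stream predicts the category of the target object, and a pose regression stream regresses the rotation angles and translation vector of the target object in 3D space. In addition, domain randomization is used to build, at low collection cost, Pose6DDR, a large-scale dataset of low-quality rendered images that carry the 3D spatial position information of objects. Test results on the constructed Pose6DDR dataset and the public LineMod dataset demonstrate the superiority of the proposed pose estimation method and the effectiveness of generating large-scale datasets by domain randomization.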

RGB-D sensors have in recent years become easily accessible to general users. They provide both a color image and a depth image of the scene and, besides being used for object modeling, they can also offer important cues for object detection and tracking in real time. In this context, the work presented in this paper investigates the use of consumer RGB-D sensors for object detection and pose estimation from natural features. Two methods based on depth-assisted rectification are proposed, which transform features extracted from the color image to a canonical view using depth data in order to obtain a representation invariant to rotation, scale and perspective distortions. While one method is suitable for textured objects, either planar or non-planar, the other method focuses on texture-less planar objects. Qualitative and quantitative evaluations of the proposed methods are performed, showing that they can obtain better results than some existing methods for object detection and pose estimation, especially when dealing with oblique poses.

Image and Vision Computing, 2002, 20(5-6): 341-348
The viewing hemisphere of a three-dimensional object can be partitioned into areas of similar views, which provide pose robustness. We compare two procedures for measuring the robustness of views to pose variation: tracking of object features, i.e. Gabor wavelet responses, by utilizing the continuity of successive views and matching of features in different views, which are assumed to be independent. Both procedures proved to be appropriate to detect canonical views. We found no difference concerning the size of the view bubbles, but tracking provides more precise correspondences than matching. Tracking is more appropriate for recognizing changes of features, whereas matching is more suitable if features of the same appearance are to be found.

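To address the shortcomings of existing object shape description methods in shape-based image retrieval, an improved method is proposed. First, the target image undergoes a series of preprocessing steps to obtain its outer contour, and an improved Hough transform extracts the linear features of the contour. Pairwise geometric features, namely the directed relative angle and the directed relative position, are then introduced to describe the shape in the image. Finally, a histogram intersection algorithm measures the similarity between image features. Experiments show that retrieving database images with the shape attributes described by the improved method is highly efficient.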
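
The histogram intersection step mentioned in this abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes the pairwise geometric features have already been binned into two histograms of equal length and uses the classic normalization by the total count of the database (model) histogram. The function name `histogram_intersection` is hypothetical.

```python
def histogram_intersection(query_hist, model_hist):
    """Similarity in [0, 1]: overlapping mass divided by the model's mass.

    Assumes both histograms use the same bins and hold non-negative counts.
    """
    if len(query_hist) != len(model_hist):
        raise ValueError("histograms must have the same number of bins")
    total = sum(model_hist)
    if total == 0:
        return 0.0  # an empty model histogram matches nothing
    overlap = sum(min(q, m) for q, m in zip(query_hist, model_hist))
    return overlap / total
```

Identical histograms score 1.0 and histograms with no shared bins score 0.0, so database images can be ranked by this value in descending order.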
