首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Robust Object Detection with Interleaved Categorization and Segmentation   总被引:5,自引:0,他引:5  
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is employed in an MDL based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.  相似文献   

2.
毛凌  解梅 《计算机应用研究》2013,30(11):3514-3517
图像语义分割方法大多基于点对条件随机场模型, 不能定位到单个目标, 并且难以利用全局形状特征, 造成误识。针对这些问题, 提出一种新的高阶条件随机场模型, 将基于全局形状特征的目标检测结果和点对条件随机场模型统一在一个概率模型框架中, 同时完成图像分割、目标检测与识别的任务。利用目标检测器和前背景分割算法获取图像中目标区域, 在目标区域上定义新的高阶能量项。新的高阶条件随机场模型就是高阶能量项和点对条件随机场模型的加权混合模型, 其最优解即为图像语义分割结果。在MSRC-21类数据库上进行的实验验证了该模型能够显著提升图像语义分割性能, 并定位到单个目标。  相似文献   

3.
Detecting objects, estimating their pose, and recovering their 3D shape are critical problems in many vision and robotics applications. This paper addresses the above needs using a two stages approach. In the first stage, we propose a new method called DEHV – Depth-Encoded Hough Voting. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes objects depth maps from either a single image (when no depth maps are available in testing) or a single image augmented with depth map (when this is available in testing). Inspired by the Hough voting scheme introduced in [1], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. Once the depth map is given, a full reconstruction is achieved in a second (3D modelling) stage, where modified or state-of-the-art 3D shape and texture completion techniques are used to recover the complete 3D model. Extensive quantitative and qualitative experimental analysis on existing datasets [2], [3], [4] and a newly proposed 3D table-top object category dataset shows that our DEHV scheme obtains competitive detection and pose estimation results. Finally, the quality of 3D modelling in terms of both shape completion and texture completion is evaluated on a 3D modelling dataset containing both in-door and out-door object categories. We demonstrate that our overall algorithm can obtain convincing 3D shape reconstruction from just one single uncalibrated image.  相似文献   

4.
This paper introduces a novel interactive framework for segmenting images using probabilistic hypergraphs which model the spatial and appearance relations among image pixels. The probabilistic hypergraph provides us a means to pose image segmentation as a machine learning problem. In particular, we assume that a small set of pixels, which are referred to as seed pixels, are labeled as the object and background. The seed pixels are used to estimate the labels of the unlabeled pixels by learning on a hypergraph via minimizing a quadratic smoothness term formed by a hypergraph Laplacian matrix subject to the known label constraints. We derive a natural probabilistic interpretation of this smoothness term, and provide a detailed discussion on the relation of our method to other hypergraph and graph based learning methods. We also present a front-to-end image segmentation system based on the proposed method, which is shown to achieve promising quantitative and qualitative results on the commonly used GrabCut dataset.  相似文献   

5.
This paper presents a novel algorithm for performing integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state of the art methods which focus on either segmentation or pose estimation individually, our approach tackles these two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as the prior information on the shape and pose of the subject can be combined and used in a Bayesian framework. Optimizing such a cost function would have been computationally infeasible. However, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in the paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object.  相似文献   

6.
为了实现复杂自然场景中多类目标的识别与分割,利用条件随机场(CRF)对目标特征进行建模,并在此基础上运用过分割算法将图片分为有限个连续区域,提出一种新的基于区域的CRF模型,即R-CRF模型,并采用Joint-boost算法对标注样本进行训练,研究基于主题的R-CRF模型在多类目标识别与分割中的应用。MSRC-21类数据库的实验结果表明,该算法在多类目标识别与分割中取得的结果优于国内外其他算法,尤其对于其他算法中正确率很低的形状多变而样本少的高结构物体的识别和分割取得了很好的结果。  相似文献   

7.
Probabilistic graphical models have had a tremendous impact in machine learning and approaches based on energy function minimization via techniques such as graph cuts are now widely used in image segmentation. However, the free parameters in energy function-based segmentation techniques are often set by hand or using heuristic techniques. In this paper, we explore parameter learning in detail. We show how probabilistic graphical models can be used for segmentation problems to illustrate Markov random fields (MRFs), their discriminative counterparts conditional random fields (CRFs) as well as kernel CRFs. We discuss the relationships between energy function formulations, MRFs, CRFs, hybrids based on graphical models and their relationships to key techniques for inference and learning. We then explore a series of novel 3D graphical models and present a series of detailed experiments comparing and contrasting different approaches for the complete volumetric segmentation of multiple organs within computed tomography imagery of the abdominal region. Further, we show how these modeling techniques can be combined with state of the art image features based on histograms of oriented gradients to increase segmentation performance. We explore a wide variety of modeling choices, discuss the importance and relationships between inference and learning techniques and present experiments using different levels of user interaction. We go on to explore a novel approach to the challenging and important problem of adrenal gland segmentation. We present a 3D CRF formulation and compare with a novel 3D sparse kernel CRF approach we call a relevance vector random field. The method yields state of the art performance and avoids the need to discretize or cluster input features. We believe our work is the first to provide quantitative comparisons between traditional MRFs with edge-modulated interaction potentials and CRFs for multi-organ abdominal segmentation and the first to explore the 3D adrenal gland segmentation problem. Finally, along with this paper we provide the labeled data used for our experiments to the community.  相似文献   

8.
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs) as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide about assigning observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.  相似文献   

9.
10.
We propose a general approach using Laplacian Eigenmaps and a graphical model of the human body to segment 3D voxel data of humans into different articulated chains. In the bottom-up stage, the voxels are transformed into a high-dimensional (6D or less) Laplacian Eigenspace (LE) of the voxel neighborhood graph. We show that LE is effective at mapping voxels on long articulated chains to nodes on smooth 1D curves that can be easily discriminated, and prove these properties using representative graphs. We fit 1D splines to voxels belonging to different articulated chains such as the limbs, head and trunk, and determine the boundary between splines using the spline fitting error. A top-down probabilistic approach is then used to register the segmented chains, utilizing their mutual connectivity and individual properties. Our approach enables us to deal with complex poses such as those where the limbs form loops. We use the segmentation results to automatically estimate the human body models. While we use human subjects in our experiments, the method is fairly general and can be applied to voxel-based segmentation of any articulated object composed of long chains. We present results on real and synthetic data that illustrate the usefulness of this approach.  相似文献   

11.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods which use appearance for detection, our method relies primarily on the shape of objects as is reflected by their bottom-up segmentation. Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation by “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of an object. Incorporating competition in the process allows us to resolve ambiguities that arise when two different regions are matched to the same object part and to discard nearby false regions that participated in the voting process. We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state of the art methods. Moreover, our method allows us to simultaneously detect multiple instances of class objects in images and to cope with challenging types of occlusions such as occlusions by a bar of varying size or by another object of the same class, that are difficult to handle with other existing class-specific top-down segmentation methods.  相似文献   

12.
We describe a top-down object detection and segmentation approach that uses a skeleton-based shape model and that works directly on real images. The approach is based on three components. First, we propose a fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments. The model is capable of generating a wide variation of shapes as instances of a given object category. Second, we develop a progressive selection mechanism to search among the generated shapes for the category instances that are present in the image. The search begins with a large pool of candidates identified by a dynamic programming (DP) algorithm and progressively reduces it in size by applying series of criteria, namely, local minimum criterion, extent of shape overlap, and thresholding of the objective function to select the final object candidates. Third, we propose the Partitioned Chamfer Matching (PCM) measure to capture the support of image edges for a hypothesized shape. This measure overcomes the shortcomings of the Oriented Chamfer Matching and is robust against spurious edges, missing edges, and accidental alignment between the image edges and the shape boundary contour. We have evaluated our approach on the ETHZ dataset and found it to perform well in both object detection and object segmentation tasks.  相似文献   

13.
In this article we present the integration of 3-D shape knowledge into a variational model for level set based image segmentation and contour based 3-D pose tracking. Given the surface model of an object that is visible in the image of one or multiple cameras calibrated to the same world coordinate system, the object contour extracted by the segmentation method is applied to estimate the 3-D pose parameters of the object. Vice-versa, the surface model projected to the image plane helps in a top-down manner to improve the extraction of the contour. While common alternative segmentation approaches, which integrate 2-D shape knowledge, face the problem that an object can look very differently from various viewpoints, a 3-D free form model ensures that for each view the model can fit the data in the image very well. Moreover, one additionally solves the problem of determining the object’s pose in 3-D space. The performance is demonstrated by numerous experiments with a monocular and a stereo camera system.  相似文献   

14.
15.
We present a method to learn probabilistic object models (POMs) with minimal supervision, which exploit different visual cues and perform tasks such as classification, segmentation, and recognition. We formulate this as a structure induction and learning task and our strategy is to learn and combine elementary POMs that make use of complementary image cues. We describe a novel structure induction procedure, which uses knowledge propagation to enable POMs to provide information to other POMs and “teach them” (which greatly reduces the amount of supervision required for training and speeds up the inference). In particular, we learn a POM-IP defined on Interest Points using weak supervision [1], [2] and use this to train a POM-mask, defined on regional features, which yields a combined POM that performs segmentation/localization. This combined model can be used to train POM-edgelets, defined on edgelets, which gives a full POM with improved performance on classification. We give detailed experimental analysis on large data sets for classification and segmentation with comparison to other methods. Inference takes five seconds while learning takes approximately four hours. In addition, we show that the full POM is invariant to scale and rotation of the object (for learning and inference) and can learn hybrid objects classes (i.e., when there are several objects and the identity of the object in each image is unknown). Finally, we show that POMs can be used to match between different objects of the same category, and hence, enable objects recognition.  相似文献   

16.
We address the problem of object detection and segmentation using global holistic properties of object shape. Global shape representations are highly susceptible to clutter inevitably present in realistic images, and thus can be applied robustly only using a precise segmentation of the object. To this end, we propose a figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient. Our shape representation, called the chordiogram, is based on geometric relationships of object boundary edges, while the perceptual saliency cues we use favor coherent regions distinct from the background. We formulate the segmentation problem as an integer quadratic program and use a semidefinite programming relaxation to solve it. The obtained solutions provide a segmentation of the object as well as a detection score used for object recognition. Our single-step approach achieves state-of-the-art performance on several object detection and segmentation benchmarks.  相似文献   

17.
Feature space trajectory methods for active computer vision   总被引:2,自引:0,他引:2  
We advance new active object recognition algorithms that classify rigid objects and estimate their pose from intensity images. Our algorithms automatically detect if the class or pose of an object is ambiguous in a given image, reposition the sensor as needed, and incorporate data from multiple object views in determining the final object class and pose estimate. A probabilistic feature space trajectory (FST) in a global eigenspace is used to represent 3D distorted views of an object and to estimate the class and pose of an input object. Confidence measures for the class and pose estimates, derived using the probabilistic FST object representation, determine when additional observations are required as well as where the sensor should be positioned to provide the most useful information. We demonstrate the ability to use FSTs constructed from images rendered from computer-aided design models to recognize real objects in real images and present test results for a set of metal machined parts.  相似文献   

18.
书法在文化传承中占据重要地位,书法书写笔迹的生成也一直是计算机图形学的研究重点和难点.现存基于模型和经验的方法,由于建模难度大,大都将笔触表述为简单的几何图形并且缺少变化,难以真实还原毛笔书写的笔触和笔迹.使得现存书法笔迹生成软件仅仅用于娱乐,而难以上升到数字化书法教育层面.文中从计算机视觉的角度出发,通过4个相机获取毛笔的实时书写图像;针对Deeplabv3+语义分割算法无法有效地分割小尺寸类别的缺点进行优化,使用优化的Deeplabv3+算法提取图像中毛笔笔头等关键信息,并通过Hough变换和PnP位姿估计算法计算笔杆相对位姿;基于位姿信息矫正和融合各相机笔触图像,提出一种未知区域估计方法估计相机无法拍摄到的笔触区域.按照不同条件提取400多幅书写图像作为数据集并进行实验结果表明,优化后的Deeplabv3+算法平均交并比(mean intersection-over-union,mIOU)达到0.849,与优化前相比提升了0.117;在小尺寸类别上交并比(intersection-over-union,IOU)达到0.59,提升了0.473.在保证实时性的前提下,最终生成的笔触与传统基于模型和经验的方法相比,可以更加真实地还原书写时的笔触,并避免对毛笔进行复杂的建模,为笔迹生成研究提供一种新的思路.  相似文献   

19.
We present a bottom-up aggregation approach to image segmentation. Beginning with an image, we execute a sequence of steps in which pixels are gradually merged to produce larger and larger regions. In each step, we consider pairs of adjacent regions and provide a probability measure to assess whether or not they should be included in the same segment. Our probabilistic formulation takes into account intensity and texture distributions in a local area around each region. It further incorporates priors based on the geometry of the regions. Finally, posteriors based on intensity and texture cues are combined using “a mixture of experts” formulation. This probabilistic approach is integrated into a graph coarsening scheme, providing a complete hierarchical segmentation of the image. The algorithm complexity is linear in the number of the image pixels and it requires almost no user-tuned parameters. In addition, we provide a novel evaluation scheme for image segmentation algorithms, attempting to avoid human semantic considerations that are out of scope for segmentation algorithms. Using this novel evaluation scheme, we test our method and provide a comparison to several existing segmentation algorithms.  相似文献   

20.
ABSTRACT

A shape prior-based object segmentation is developed in this paper by using a shape transformation distance to constrain object contour evolution. In the proposed algorithm, the transformation distance measures the dissimilarity between two unaligned shapes by cyclic shift, which is called ‘circulant dissimilarity’. This dissimilarity with respect to transformation of the object shape is represented by circular convolution, which could be efficiently computed by using fast Fourier transform. Given a set of training shapes, the kernel density estimation is adopted to model shape prior. By integrating low-level image feature, high-level shape prior and transformation distance, a variational segmentation model is proposed to solve the transformation invariance of shape prior. Numerical experiments demonstrate that circulant dissimilarity-based shape registration outperforms the iterative optimization on explicit pose parameters, and show promising results and highlight the potential of the method for object registration and segmentation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号