Similar literature
20 similar documents found (search time: 46 ms)
1.
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs) as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide about assigning observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.
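
The E-step/M-step alternation described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes scalar observations and replaces the AAM object models with one-dimensional Gaussians, and the function names `gaussian_pdf` and `em_soft_assign` are hypothetical. The E-step softly assigns each observation to an object hypothesis; the M-step refits each model to the observations weighted by those assignments.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian; stands in for an object model's fit to an observation."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_soft_assign(observations, means, variances, weights, iterations=20):
    """Toy EM: soft assignment of observations to hypotheses, then model refitting.

    Assumption: each "object model" is a 1D Gaussian, not an AAM.
    Returns the fitted means, variances, weights, and the last responsibilities.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each hypothesis for each observation.
        resp = []
        for x in observations:
            likes = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: refit every model to its softly assigned observations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, observations)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, observations)) / nj
            variances[j] = max(var, 1e-3)  # floor keeps the density well defined
            weights[j] = nj / len(observations)
    return means, variances, weights, resp
```

Called with two well-separated clusters of observations and rough initial means, the two models settle on the cluster centers and each responsibility row sums to one.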

2.
Active Appearance Models (AAMs) are generative, parametric models that have been successfully used in the past to model deformable objects such as human faces. The original AAMs formulation was 2D, but they have recently been extended to include a 3D shape model. A variety of single-view algorithms exist for fitting and constructing 3D AAMs but one area that has not been studied is multi-view algorithms. In this paper we present multi-view algorithms for both fitting and constructing 3D AAMs. Fitting an AAM to an image consists of minimizing the error between the input image and the closest model instance; i.e. solving a nonlinear optimization problem. In the first part of the paper we describe an algorithm for fitting a single AAM to multiple images, captured simultaneously by cameras with arbitrary locations, rotations, and response functions. This algorithm uses the scaled orthographic imaging model used by previous authors, and in the process of fitting computes, or calibrates, the scaled orthographic camera matrices. In the second part of the paper we describe an extension of this algorithm to calibrate weak perspective (or full perspective) camera models for each of the cameras. In essence, we use the human face as a (non-rigid) calibration grid. We demonstrate that the performance of this algorithm is roughly comparable to a standard algorithm using a calibration grid. In the third part of the paper, we show how camera calibration improves the performance of AAM fitting. A variety of non-rigid structure-from-motion algorithms, both single-view and multi-view, have been proposed that can be used to construct the corresponding 3D non-rigid shape models of a 2D AAM. In the final part of the paper, we show that constructing a 3D face model using non-rigid structure-from-motion suffers from the Bas-Relief ambiguity and may result in a “scaled” (stretched/compressed) model. 
We outline a robust non-rigid motion-stereo algorithm for calibrated multi-view 3D AAM construction and show how using calibrated multi-view motion-stereo can eliminate the Bas-Relief ambiguity and yield face models with higher 3D fidelity.

3.
Deformable models are widely used for image segmentation, most commonly to find single objects within an image. Although several methods have been proposed to segment multiple objects using deformable models, substantial limitations in their utility remain. This paper presents a multiple object segmentation method using a novel and efficient object representation for both two and three dimensions. The new framework guarantees object relationships and topology, prevents overlaps and gaps, enables boundary-specific speeds, and has a computationally efficient evolution scheme that is largely independent of the number of objects. Maintaining object relationships and straightforward use of object-specific and boundary-specific smoothing and advection forces enables the segmentation of objects with multiple compartments, a critical capability in the parcellation of organs in medical imaging. Comparing the new framework with previous approaches shows its superior performance and scalability.

4.
The detailed understanding of animal locomotion is an important part of biology, motion science and robotics. To analyze the motion, high-speed x-ray sequences of walking animals are recorded. The biological evaluation is based on anatomical key points in the images, and the goal is to find these landmarks automatically. Unfortunately, low contrast and occlusions in the images drastically complicate this task. As recently shown, Active Appearance Models (AAMs) can be successfully applied to this problem. However, obtaining reliable quantitative results is a tedious task, as the human error is unknown. In this work, we present the results of a large-scale study that allows us to quantify the tracking performance of both humans and AAMs. Furthermore, we show that the AAM-based approach provides results that are comparable to those of human experts.

5.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods which use appearance for detection, our method relies primarily on the shape of objects as reflected by their bottom-up segmentation. Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation by a “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of an object. Incorporating competition in the process allows us to resolve ambiguities that arise when two different regions are matched to the same object part and to discard nearby false regions that participated in the voting process. We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state-of-the-art methods. Moreover, our method allows us to simultaneously detect multiple instances of class objects in images and to cope with challenging types of occlusions, such as occlusions by a bar of varying size or by another object of the same class, that are difficult to handle with other existing class-specific top-down segmentation methods.

6.
Active appearance models (AAMs) are useful for face tracking owing to their detailed face interpretation, accurate alignment and high efficiency. However, they are sensitive to initial parameters and may easily get stuck in local minima due to the gradient-descent optimization, which makes the AAM-based face tracker unstable in the presence of large pose deviation and fast motion. In this paper, we propose to combine view-based AAMs with two novel temporal filters to overcome these limitations. First, we build a new view space based on the shape parameters of AAMs, instead of the model parameters controlling both the shape and appearance, for the purpose of pose estimation. Then the Kalman filter is used to simultaneously update the pose and shape parameters for a better fitting of each frame. Second, we propose a temporal matching filter which is twofold. The inter-frame local appearance constraint is incorporated into AAM fitting, where the mechanism of the active shape model (ASM) is also implemented in a unified framework to find more accurate matching points. Moreover, we propose to initialize the shape with correspondences found by a random-forest-based local feature matching. By introducing the local information and temporal correspondences, the twofold temporal matching filter improves the tracking stability when confronted with fast appearance changes. Experimental results show that our algorithm is more pose robust than basic AAMs and some state-of-the-art AAM-based methods, and that it can also handle large expressions and non-extreme illumination changes in test video sequences.
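
To make the Kalman filtering step concrete, here is a minimal sketch of the predict/update cycle for a single tracked parameter. It is an illustration under stated assumptions, not the paper's tracker: the state is one scalar (for example a single pose angle), the motion model is a random walk, and the function name `kalman_1d` and its parameters are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (an assumed simplification).

    measurements    : per-frame noisy observations of one parameter
    process_var     : variance added to the state uncertainty at each frame
    measurement_var : variance of the observation noise
    Returns the filtered estimate after each frame.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried over, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

A larger `process_var` makes the filter follow fast changes more closely, while a larger `measurement_var` makes it smooth more heavily; a tracker that filters several pose and shape parameters jointly would use the vector form of the same two steps.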

7.
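A new active contour model is proposed for gray-scale image segmentation. The model is built on hydrostatic theory, which is used to directly drive a continuous curve toward the enclosed target. The model can segment multiple objects, can segment nested objects, and can effectively control over-segmentation.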

8.

This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.

9.
In this paper, we present a new framework for three-dimensional (3D) reconstruction of multiple rigid objects from dynamic scenes. Conventional 3D reconstruction from multiple views is applicable to static scenes, in which the configuration of objects is fixed while the images are taken. In our framework, we aim to reconstruct the 3D models of multiple objects in a more general setting where the configuration of the objects varies among views. We solve this problem by object-centered decomposition of the dynamic scenes using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms that require a small-motion assumption between consecutive views, the co-recognition method provides reliable, accurate correspondences of the same object among unordered and wide-baseline views. In order to segment each object region, we benefit from the sparse 3D points obtained from structure-from-motion. These points are reliable and serve as automatic seed points for a seeded-segmentation algorithm. Experiments on various challenging real image sequences demonstrate the effectiveness of our approach, especially in the presence of abrupt independent motions of objects.

10.
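黄叶珏, 褚一平. Computer Engineering (《计算机工程》), 2010, 36(9): 232-234
For practical applications in which the types of the objects to be segmented are known, a multi-object video segmentation algorithm that incorporates recognition information is proposed. A training data set is used to build feature dictionaries for the objects and the background, superpixels are computed for each video frame, and a hierarchical conditional random field model is constructed to constrain the local and global neighborhoods of the video frame. The final segmentation is obtained by solving the hierarchical conditional random field model. Experimental results show that the algorithm can effectively segment multiple objects in video that occlude one another or are incomplete.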

11.
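For practical applications in which the types of the objects to be segmented are known, a multi-object video segmentation algorithm that incorporates recognition information is proposed. A training data set is used to build feature dictionaries for the objects and the background, superpixels are computed for each video frame, and a hierarchical conditional random field model is constructed to constrain the local and global neighborhoods of the video frame. The final segmentation is obtained by solving the hierarchical conditional random field model. Experimental results show that the algorithm can effectively segment multiple objects in video that occlude one another or are incomplete.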

12.
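Automatic video object segmentation using statistical inference   Total citations: 7; self-citations: 7; citations by others: 7
In the new-generation MPEG-4 video coding standard, automatic segmentation of video objects (VOs) is one of the key technologies for supporting object-oriented coding and content-based applications. Background subtraction is the basic method for automatic video object segmentation, but varying ambient lighting conditions often make segmentation difficult. A background subtraction method based on statistical inference is proposed. The method first builds a statistical model of the background and then applies hypothesis testing to subsequent frames in order to segment the video objects. The algorithm works in the HSV color space and, by analyzing the color components of the background statistical model and using them differently, adapts well to different ambient lighting conditions. Experiments show that the algorithm can automatically and accurately segment video objects under various lighting conditions.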
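
The two stages named in this abstract, building a statistical background model and then testing each pixel of a new frame against it, can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: it assumes a single channel (for example the V component of HSV) stored as nested lists, a per-pixel Gaussian background model, and a fixed z-score threshold as the hypothesis test. The names `fit_background`, `segment_foreground`, `z_threshold` and `min_std` are hypothetical.

```python
import math


def fit_background(frames):
    """Per-pixel mean and standard deviation from background-only frames.

    frames: list of equally sized 2D lists holding one channel's values.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[r][c] for f in frames) / n for c in range(cols)] for r in range(rows)]
    std = [
        [math.sqrt(sum((f[r][c] - mean[r][c]) ** 2 for f in frames) / n) for c in range(cols)]
        for r in range(rows)
    ]
    return mean, std


def segment_foreground(frame, mean, std, z_threshold=3.0, min_std=1.0):
    """Mark a pixel as foreground when the background hypothesis is rejected.

    A pixel is rejected as background if it deviates from the background mean
    by more than z_threshold standard deviations (min_std guards static pixels).
    """
    return [
        [
            abs(frame[r][c] - mean[r][c]) > z_threshold * max(std[r][c], min_std)
            for c in range(len(frame[0]))
        ]
        for r in range(len(frame))
    ]
```

Treating the hue, saturation and value components separately, as the abstract describes, would amount to running such a test per component with component-specific thresholds and combining the results.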

13.
M-reps (formerly called DSLs) are a multiscale medial means for modeling and rendering 3D solid geometry. They are particularly well suited to model anatomic objects and in particular to capture prior geometric information effectively in deformable models segmentation approaches. The representation is based on figural models, which define objects at coarse scale by a hierarchy of figures, each figure generally a slab representing a solid region and its boundary simultaneously. This paper focuses on the use of single figure models to segment objects of relatively simple structure. A single figure is a sheet of medial atoms, which is interpolated from the model formed by a net, i.e., a mesh or chain, of medial atoms (hence the name m-reps), each atom modeling a solid region via not only a position and a width but also a local figural frame giving figural directions and an object angle between opposing, corresponding positions on the boundary implied by the m-rep. The special capability of an m-rep is to provide spatial and orientational correspondence between an object in two different states of deformation. This ability is central to effective measurement of both geometric typicality and geometry to image match, the two terms of the objective function optimized in segmentation by deformable models. The other ability of m-reps central to effective segmentation is their ability to support segmentation at multiple levels of scale, with successively finer precision. Objects modeled by single figures are segmented first by a similarity transform augmented by object elongation, then by adjustment of each medial atom, and finally by displacing a dense sampling of the m-rep implied boundary. While these models and approaches also exist in 2D, we focus on 3D objects. The segmentation of the kidney from CT and the hippocampus from MRI serve as the major examples in this paper. The accuracy of segmentation as compared to manual, slice-by-slice segmentation is reported.

14.
This paper proposes a new method to segment and track multiple objects through occlusion by integrating a spatial-color Gaussian mixture model (SCGMM) into an energy minimization framework. When occlusion does not occur, an SCGMM is learned for each object. When the objects are subject to occlusion, energy minimization is used to segment the objects from occlusion. To make the learned SCGMMs suitable for the segmentation of the current occlusion, a displacing procedure is utilized to adapt the SCGMMs to the spatial variations. A multi-label energy function is formulated building on the displaced SCGMMs and then minimized using the multi-label graph cut algorithm, thus leading to both the segmentation and tracking results of the objects with occlusion. Experimental validation of the proposed method is performed and presented on several video sequences.

15.
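Semi-automatic spatio-temporal video segmentation based on hierarchical optical flow   Total citations: 1; self-citations: 0; citations by others: 1
In the new-generation MPEG-4 video coding standard, semi-automatic video segmentation is one of the key technologies for supporting object-oriented coding and content-based applications. To this end, a semi-automatic spatio-temporal video segmentation algorithm based on hierarchical optical flow is proposed. The algorithm consists of spatial segmentation and temporal segmentation. In spatial segmentation, the proposed point-based graphical user interface (PBGUI) allows the video object (VO) to be segmented to be defined precisely with the user's assistance. Temporal segmentation uses the hierarchical optical flow algorithm to track the boundary and the whole of the video object, based on the result of spatial segmentation. Experimental results show that the algorithm can segment video objects accurately.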

16.
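In the new-generation MPEG-4 video coding standard, semi-automatic video segmentation is one of the key technologies for supporting object-oriented coding and content-based applications. To this end, a semi-automatic spatio-temporal video segmentation algorithm based on hierarchical optical flow is proposed. The algorithm consists of spatial segmentation and temporal segmentation. In spatial segmentation, the proposed point-based graphical user interface (PBGUI) allows the video object (VO) to be segmented to be defined precisely with the user's assistance. Temporal segmentation uses the hierarchical optical flow algorithm to track the boundary and the whole of the video object, based on the result of spatial segmentation. Experimental results show that the algorithm can segment video objects fairly accurately.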

17.
18.
We construct a segmentation scheme that combines top-down with bottom-up processing. In the proposed scheme, segmentation and recognition are intertwined rather than proceeding in a serial manner. The top-down part applies stored knowledge about object shapes acquired through learning, whereas the bottom-up part creates a hierarchy of segmented regions based on uniformity criteria. Beginning with unsegmented training examples of class and non-class images, the algorithm constructs a bank of class-specific fragments and determines their figure-ground segmentation. This bank is then used to segment novel images in a top-down manner: the fragments are first used to recognize images containing class objects, and then to create a complete cover that best approximates these objects. The resulting segmentation is then integrated with bottom-up multi-scale grouping to better delineate the object boundaries. Our experiments, applied to a large set of four classes (horses, pedestrians, cars, faces), demonstrate segmentation results that surpass those achieved by previous top-down or bottom-up schemes. The main novel aspects of this work are the fragment learning phase, which efficiently learns the figure-ground labeling of segmentation fragments, even in training sets with high object and background variability; combining the top-down segmentation with bottom-up criteria to draw on their relative merits; and the use of segmentation to improve recognition.

19.
The active appearance model (AAM) is a powerful method for modeling and segmenting deformable visual objects. The utility of the AAM stems from two fronts: its compact representation as a linear object class and its rapid fitting procedure, which utilizes fixed linear updates. Although the original fitting procedure works well for objects with restricted variability when initialization is close to the optimum, its efficacy deteriorates in more general settings, with regard to both accuracy and capture range. In this paper, we propose a novel fitting procedure where training is coupled with, and directly addresses, AAM fitting in its deployment. This is achieved by simulating the conditions of real fitting problems and learning the best set of fixed linear mappings, such that performance over these simulations is optimized. The power of the approach does not stem from an update model with larger capacity, but from addressing the whole fitting procedure simultaneously. To motivate the approach, it is compared with a number of existing AAM fitting procedures on two publicly available face databases. It is shown that this method exhibits convergence rates, capture range and convergence accuracy that are significantly better than other linear methods and comparable to a nonlinear method, whilst affording superior computational efficiency.

20.
In this paper, we introduce an interactive method for object segmentation in image sequences that combines classical morphological segmentation with motion estimation: the watershed from propagated markers. In this method, the objects are segmented interactively in the first frame, and the mask generated by this segmentation provides the markers that are used to track and segment the object in the next frame. Besides interactivity, the proposed method has the following important characteristics: generality, rapid response and progressive manual editing. This paper also introduces a new benchmark for the quantitative evaluation of assisted object segmentation methods applied to image sequences. The evaluation is done according to several criteria, such as the robustness of the segmentation and the ease of segmenting the objects through the sequence.
