Found 20 similar documents; search took 359 ms.
1.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods, which
use appearance for detection, our method relies primarily on the shape of objects as reflected in their bottom-up segmentation.
Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is
then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s
shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape
and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at
a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation
by a “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular
detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of an object.
Incorporating competition in the process allows us to resolve ambiguities that arise when two different regions are matched
to the same object part and to discard nearby false regions that participated in the voting process.
We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our
method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state-of-the-art methods. Moreover, our method allows us to simultaneously detect multiple instances of an object class in images
and to cope with challenging types of occlusions such as occlusions by a bar of varying size or by another object of the same
class, which are difficult to handle with other existing class-specific top-down segmentation methods.
2.
Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent
top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms,
they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how
to combine both top-down and bottom-up cues in a principled manner.
In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train
a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms, which train top-down and bottom-up modules separately. We formulate the problem
in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently
search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous
learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute
high quality segmentations.
3.
Michael G. Ross, Leslie Pack Kaelbling 《IEEE Transactions on Pattern Analysis and Machine Intelligence》2009, 31(4): 661-676
The Segmentation According to Natural Examples (SANE) algorithm learns to segment objects in static images from video training data. SANE uses background subtraction to find the segmentation of moving objects in videos. This provides object segmentation information for each video frame. The collection of frames and segmentations forms a training set that SANE uses to learn the image and shape properties of the observed motion boundaries. When presented with new static images, the trained model infers segmentations similar to the observed motion segmentations. SANE is a general method for learning environment-specific segmentation models. Because it can automatically generate training data from video, it can adapt to a new environment and new objects with relative ease, an advantage over untrained segmentation methods or those that require human-labeled training data. By using the local shape information in the training data, it outperforms a trained local boundary detector. Its performance is competitive with a trained top-down segmentation algorithm that uses global shape. The shape information it learns from one class of objects can assist the segmentation of other classes.
4.
Seungjin Lee, Kwanho Kim, Minsu Kim 《Pattern Recognition》2010, 43(3): 1116-1128
Even though visual attention models using bottom-up saliency can speed up object recognition by predicting object locations, in the presence of multiple salient objects, saliency alone cannot discern target objects from the clutter in a scene. Using a metric named familiarity, we propose a top-down method for guiding attention towards target objects, in addition to bottom-up saliency. To demonstrate the effectiveness of familiarity, the unified visual attention model (UVAM), which combines top-down familiarity and bottom-up saliency, is applied to SIFT-based object recognition. The UVAM is tested on 3600 artificially generated images containing COIL-100 objects with varying amounts of clutter, and on 126 images of real scenes. The recognition times are reduced by 2.7× and 2×, respectively, with no reduction in recognition accuracy, demonstrating the effectiveness and robustness of the familiarity-based UVAM.
5.
In this paper, a new artificial neural network model is proposed for visual object recognition, in which the bottom-up, sensory-driven pathway and top-down, expectation-driven pathway are fused in information processing and their corresponding weights are learned based on the fused neuron activities. During the supervised learning process, the target labels are applied to update the bottom-up synaptic weights of the neural network. Meanwhile, the hypotheses generated by the bottom-up pathway produce expectations on sensory inputs through the top-down pathway. The expectations are constrained by the real data from the sensory inputs, which can be used to update the top-down synaptic weights accordingly. To further improve the visual object recognition performance, the multi-scale histograms of oriented gradients (MS-HOG) method is proposed to extract local features of visual objects from images. Extensive experiments on different image datasets demonstrate the efficiency and robustness of the proposed neural network model with features extracted using the MS-HOG method on visual object recognition compared with other state-of-the-art methods.
6.
FORMS: A flexible object recognition and modelling system
We describe a flexible object recognition and modelling system (FORMS) which represents and recognizes animate objects from their silhouettes. This consists of a model for generating the shapes of animate objects which gives a formalism for solving the inverse problem of object recognition. We model all objects at three levels of complexity: (i) the primitives, (ii) the mid-grained shapes, which are deformations of the primitives, and (iii) objects constructed by using a grammar to join mid-grained shapes together. The deformations of the primitives can be characterized by principal component analysis or modal analysis. When doing recognition the representations of these objects are obtained in a bottom-up manner from their silhouettes by a novel method for skeleton extraction and part segmentation based on deformable circles. These representations are then matched to a database of prototypical objects to obtain a set of candidate interpretations. These interpretations are verified in a top-down process. The system is demonstrated to be stable in the presence of noise, the absence of parts, the presence of additional parts, and considerable variations in articulation and viewpoint. Finally, we describe how such a representation scheme can be automatically learnt from examples.
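The second modeling level above, characterizing mid-grained shapes as principal-component deformations of a primitive, can be sketched as follows. The toy contours (ellipses of varying elongation, standing in for deformed circles) and the single-mode projection are illustrative assumptions, not the shapes or representation FORMS actually uses.

```python
import numpy as np

def deformation_modes(shapes, n_modes=2):
    """PCA of aligned shapes: each shape is an (N, 2) array of boundary
    points; returns the mean shape and the top deformation modes."""
    X = np.stack([s.ravel() for s in shapes])      # (num_shapes, 2N)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes]                      # modes: (n_modes, 2N)

def project(shape, mean, modes):
    """Coefficients describing a shape as a deformation of the mean."""
    return modes @ (shape.ravel() - mean)

# Toy example: ellipses of varying elongation as deformations of a circle.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
shapes = [np.stack([(1 + 0.1 * k) * np.cos(t), np.sin(t)], axis=1)
          for k in range(5)]
mean, modes = deformation_modes(shapes, n_modes=1)
coeffs = [project(s, mean, modes) for s in shapes]
# Elongation varies along a single deformation mode, so the projection
# coefficients change monotonically across the five shapes.
```

A new silhouette's deformation coefficients can then be matched against a database of prototypes, as in the recognition stage described above.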
7.
João Carreira, Fuxin Li, Cristian Sminchisescu 《International Journal of Computer Vision》2012, 98(3): 243-262
We present an approach to visual object-class segmentation and recognition based on a pipeline that combines multiple figure-ground
hypotheses with large object spatial support, generated by bottom-up computational processes that do not exploit knowledge
of specific categories, and sequential categorization based on continuous estimates of the spatial overlap between the image
segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption
that good object-level segments can be obtained in a feed-forward fashion, but also in formulating recognition as a regression problem. Instead of focusing
on a one-vs.-all winning margin that may not preserve the ordering of segment qualities inside the non-maximum (non-winning)
set, our learning method produces a globally consistent ranking with close ties to segment quality, hence to the extent to which entire object or part hypotheses are likely to spatially
overlap the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection
and semantic segmentation, in a number of challenging datasets including Caltech-101, ETHZ-Shape as well as PASCAL VOC 2009
and 2010.
8.
Bastian Leibe, Aleš Leonardis, Bernt Schiele 《International Journal of Computer Vision》2008, 77(1-3): 259-289
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes.
Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate
towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each
other and improve the combined performance.
The core part of our approach is a highly flexible learned representation for object shape that can combine the information
observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach
can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result.
This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object
pixels and to discard misleading influences from the background. Moreover, the information about where in the image a hypothesis
draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses
and factor out the effects of partial occlusion.
An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object
categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance even from training sets that are one to two orders of magnitude smaller than those used in comparable systems.
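The core voting step described above, in which local features cast weighted votes for an object center through a learned codebook and maxima in the accumulator give detection hypotheses, can be sketched minimally as follows. The toy codebook and features are invented for illustration; they stand in for the probabilistic extension of the Generalized Hough Transform that the paper actually learns.

```python
import numpy as np

def hough_votes(features, codebook, shape):
    """Accumulate weighted votes for object centers.

    features: list of (x, y, word_id) local features.
    codebook: dict word_id -> list of ((dx, dy), weight) center offsets
              observed on training examples.
    """
    acc = np.zeros(shape)
    for x, y, w in features:
        for (dx, dy), weight in codebook.get(w, []):
            cx, cy = x + dx, y + dy
            if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                acc[cy, cx] += weight
    return acc

# Toy codebook: word 0 is seen left of object centers, word 1 to the right.
codebook = {0: [((+5, 0), 1.0)], 1: [((-5, 0), 1.0)]}
features = [(10, 20, 0), (20, 20, 1), (11, 20, 0)]  # two agree on (15, 20)
acc = hough_votes(features, codebook, shape=(40, 40))
cy, cx = np.unravel_index(acc.argmax(), acc.shape)
# The strongest hypothesis is the center that both feature types vote for.
```

In the full approach, the pixels backing each hypothesis would then be traced back to infer a probabilistic segmentation, and an MDL-style verification would arbitrate between overlapping hypotheses.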
9.
We propose an attention-based image segmentation algorithm that combines a task-driven mechanism based on target color features with the scene-selection mechanism of vision, forming an integrated bottom-up and top-down attention mechanism for segmentation. In a multi-scale image space, the algorithm extracts the intensity, color, and orientation features of the visual scene together with the color features of the task target, generating a saliency map that combines scene and target information. The "scene-target" saliency maps are then normalized and fused across scales in the attention-based image space, and finally the attentional focus on the target is segmented out via bilinear interpolation and binarization of the connected regions of the saliency map. Experiments on natural and indoor scene images show that the method successfully segments and extracts the target object in a variety of environments, especially when distracting objects are highly salient.
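The multi-scale contrast-and-fusion idea can be sketched minimally, assuming a single intensity feature with center-surround contrast computed at three surround radii; the actual algorithm also uses color, orientation, and task-driven target-color features, and fuses across an image pyramid rather than across blur radii.

```python
import numpy as np

def box_blur(img, r):
    """Mean filter over a (2r+1)^2 window (wrap-around edges, for brevity)."""
    acc = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / (2 * r + 1) ** 2

def saliency(img, scales=(2, 4, 8)):
    """Center-surround contrast at several scales, normalized and fused."""
    maps = []
    for r in scales:
        m = np.abs(img - box_blur(img, r))   # center-surround contrast
        rng = m.max() - m.min()
        if rng > 0:
            m = (m - m.min()) / rng          # per-scale normalization
        maps.append(m)
    return sum(maps) / len(maps)             # cross-scale fusion

# A bright square on a dark background is the most salient region.
img = np.zeros((32, 32))
img[12:20, 12:20] = 1.0
s = saliency(img)
y, x = np.unravel_index(s.argmax(), s.shape)
mask = s > 0.5 * s.max()                     # binarized attention focus
```

Thresholding the fused map, as in the last line, corresponds loosely to the binarization of connected saliency regions described above.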
10.
Laura Gui, Jean-Philippe Thiran, Nikos Paragios 《International Journal of Computer Vision》2009, 84(2): 146-162
In this paper, we propose a general framework for fusing bottom-up segmentation with top-down object behavior inference over
an image sequence. This approach is beneficial for both tasks, since it enables them to cooperate so that knowledge relevant
to each can aid in the resolution of the other, thus enhancing the final result. In particular, the behavior inference process
offers dynamic probabilistic priors to guide segmentation. At the same time, segmentation supplies its results to the inference
process, ensuring that they are consistent both with prior knowledge and with new image information. The prior models are
learned from training data and they adapt dynamically, based on newly analyzed images. We demonstrate the effectiveness of
our framework via particular implementations that we have employed in the resolution of two hand gesture recognition applications.
Our experimental results illustrate the robustness of our joint approach to segmentation and behavior inference in challenging
conditions involving complex backgrounds and occlusions of the target object.
11.
This paper presents a model of 3D object recognition motivated by the robust properties of the human visual system (HVS), which shows remarkable efficiency and robustness in object identification tasks. These robust properties include visual attention, contrast mechanisms, feature binding, multi-resolution processing, size tuning, and part-based representation; in addition, bottom-up and top-down information are combined cooperatively. Based on these observations, we propose a plausible computational model that integrates them under a Monte Carlo optimization technique. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process initializes the parameters in a discriminative way, and the top-down process optimizes them in a generative way. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation in visible and infrared band images.
12.
This article proposes a method to segment Internet images, that is, a group of images corresponding to a specific object (the query) that contains a significant number of irrelevant images. The segmentation algorithm we propose is a combination of two distinct methods based on color. The first one considers all images to classify pixels into two sets: object pixels and background pixels. The second method segments images individually by trying to find a central object. The final segmentation is obtained by intersecting the results from both. The segmentation results are then used to re-rank images and display a clean set of images illustrating the query. The algorithm is tested on various queries for animals, natural and man-made objects, and results are discussed, showing that the obtained segmentation results are suitable for object learning.
13.
Chen Pan, Dong Sun Park, Huijuan Lu, Xiangping Wu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012, 16(9): 1569-1584
Human vision observes an image by making a series of fixations. During a fixation, our eyes continually tremble; these microsaccades may reflect an optimal sampling strategy and its spatiotemporal characteristics, and the decrease in microsaccade magnitude leads to visual fading in the brain, which may provide a mechanism for shifting fixation. This paper proposes an iterative figure-ground segmentation framework based on sampling and learning that simulates this behavior. First, fixation-based sampling is used to obtain a few positive and negative samples. A pixel classifier based on RGB color is trained with the ELM (extreme learning machine) algorithm; it both extracts object regions and provides a reference object boundary. The object-region boundary is then refined by graph-cut minimization. The refined object region is re-sampled to provide more accurate object and background pixels for the next round of training, and the iteration converges when the pixel classifier produces a stable segmentation. Based on the ELM algorithm, the proposed method runs faster than state-of-the-art methods and can cope with the complexity and uncertainty of the scene. Experimental results demonstrate that the learning-based method can reliably segment multi-colored objects from complex scenes.
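The sample-train-resample loop can be sketched with a nearest-mean color classifier standing in for the ELM stage and no graph-cut refinement; every detail below is an illustrative simplification of the paper's pipeline.

```python
import numpy as np

def iterative_segment(img, fg_seed, bg_seed, iters=5):
    """Iterate: fit color models to the current samples, classify every
    pixel, then re-sample from the new segmentation (nearest-mean models
    replace the ELM classifier here, as a simplification)."""
    fg, bg = img[fg_seed], img[bg_seed]          # (n, 3) RGB samples
    for _ in range(iters):
        mu_f, mu_b = fg.mean(axis=0), bg.mean(axis=0)
        df = ((img - mu_f) ** 2).sum(axis=-1)    # distance to object model
        db = ((img - mu_b) ** 2).sum(axis=-1)    # distance to background
        mask = df < db                           # pixel-level classification
        fg, bg = img[mask], img[~mask]           # re-sample for next round
    return mask

# Toy image: a red object on a blue background, seeded with a few pixels.
img = np.zeros((20, 20, 3))
img[..., 2] = 1.0                 # blue background
img[5:15, 5:15] = [1.0, 0.0, 0.0]  # red object
fg_seed = (np.array([7, 8]), np.array([7, 8]))
bg_seed = (np.array([0, 1]), np.array([0, 1]))
mask = iterative_segment(img, fg_seed, bg_seed)
# The mask converges to exactly the red square.
```

In the full method, each iteration would additionally refine the mask boundary by graph-cut minimization before re-sampling, and the loop would stop once consecutive segmentations agree.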
14.
Iasonas Kokkinos, Petros Maragos 《IEEE Transactions on Pattern Analysis and Machine Intelligence》2009, 31(8): 1486-1501
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs) as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide about assigning observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.
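The E-step/M-step alternation described above can be sketched in its simplest form, with two Gaussian intensity models standing in for the Active Appearance Models: the E-step softly assigns pixels (observations) to the models, and the M-step refits each model to the observations assigned to it.

```python
import numpy as np

def em_segment(pixels, iters=30):
    """Two-component Gaussian EM over pixel intensities.
    E-step: softly assign pixels to the object/background models.
    M-step: refit each model (mean, variance, prior) to its assignments."""
    mu = np.array([pixels.min(), pixels.max()], dtype=float)
    var = np.full(2, pixels.var() + 1e-6)
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibilities r[i, k] proportional to
        # pi_k * N(x_i | mu_k, var_k)
        d = pixels[:, None] - mu[None, :]
        p = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: fit each model to its softly assigned observations
        n = r.sum(axis=0)
        mu = (r * pixels[:, None]).sum(axis=0) / n
        var = (r * (pixels[:, None] - mu)**2).sum(axis=0) / n + 1e-6
        pi = n / len(pixels)
    return r.argmax(axis=1)  # hard segmentation after convergence

# Toy data: dark background pixels and bright object pixels.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0.2, 0.05, 300),   # background
                         rng.normal(0.8, 0.05, 100)])  # object
labels = em_segment(pixels)
# Background and object pixels land in different components.
```

In the paper, the per-observation likelihood comes from AAM predictions rather than scalar Gaussians, and the M-step performs AAM fitting that accounts for the current segmentation, which is what enables the automated handling of occlusions.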
15.
DeLiang Wang, Xiuwen Liu 《IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics》2002, 32(3): 254-268
Scene analysis is a major aspect of perception and continues to challenge machine perception. This paper addresses the scene-analysis problem by integrating a primitive segmentation stage with a model of associative memory. The model is a multistage system that consists of an initial primitive segmentation stage, a multimodule associative memory, and a short-term memory (STM) layer. Primitive segmentation is performed by a locally excitatory globally inhibitory oscillator network (LEGION), which segments the input scene into multiple parts that correspond to groups of synchronous oscillations. Each segment triggers memory recall and multiple recalled patterns then interact with one another in the STM layer. The STM layer projects to the LEGION network, giving rise to memory-based grouping and segmentation. The system achieves scene analysis entirely in phase space, which provides a unifying mechanism for both bottom-up analysis and top-down analysis. The model is evaluated with a systematic set of three-dimensional (3-D) line drawing objects, which are arranged in an arbitrary fashion to compose input scenes that allow object occlusion. Memory-based organization is responsible for a significant improvement in performance. A number of issues are discussed, including input-anchored alignment, top-down organization, and the role of STM in producing context sensitivity of memory recall.
16.
John Weng, Narendra Ahuja, Thomas S. Huang 《International Journal of Computer Vision》1997, 25(2): 109-143
This paper presents a framework called Cresceptron for view-based learning, recognition and segmentation. Specifically, it recognizes and segments image patterns that are similar to those learned, using a stochastic distortion model and view-based interpolation, allowing other view points that are moderately different from those used in learning. The learning phase is interactive. The user trains the system using a collection of training images. For each training image, the user manually draws a polygon outlining the region of interest and types in the label of its class. Then, from the directional edges of each of the segmented regions, the Cresceptron uses a hierarchical self-organization scheme to grow a sparsely connected network automatically, adaptively and incrementally during the learning phase. At each level, the system detects new image structures that need to be learned and assigns a new neural plane for each new feature. The network grows by creating new nodes and connections which memorize the new image structures and their context as they are detected. Thus, the structure of the network is a function of the training exemplars. The Cresceptron incorporates both individual learning and class learning; with the former, each training example is treated as a different individual while with the latter, each example is a sample of a class. In the performance phase, segmentation and recognition are tightly coupled. No foreground extraction is necessary, which is achieved by backtracking the response of the network down the hierarchy to the image parts contributing to recognition. Several stochastic shape distortion models are analyzed to show why multilevel matching such as that in the Cresceptron can deal with more general stochastic distortions that a single-level matching scheme cannot. 
The system is demonstrated using images from broadcast television and other video segments to learn faces and other objects, and then later to locate and to recognize similar, but possibly distorted, views of the same objects.
17.
Cui Mao, Arunkumar Gururajan, Hamed Sari-Sarraf, Eric Hequet 《Machine Vision and Applications》2012, 23(2): 349-361
This paper presents an efficient and practical approach for automatic, unsupervised object detection and segmentation in two-texture
images based on the concept of Gabor filter optimization. The entire process occurs within a hierarchical framework and consists
of the steps of detection, coarse segmentation, and fine segmentation. In the object detection step, the image is first processed
using a Gabor filter bank. Then, the histograms of the filtered responses are analyzed using the scale-space approach to predict
the presence/absence of an object in the target image. If the presence of an object is reported, the proposed approach proceeds
to the coarse segmentation stage, wherein the best Gabor filter (among the bank of filters) is automatically chosen, and used
to segment the image into two distinct regions. Finally, in the fine segmentation step, the coefficients of the best Gabor
filter (output from the previous stage) are iteratively refined in order to further fine-tune and improve the segmentation
map produced by the coarse segmentation step. In the validation study, the proposed approach is applied as part of a machine
vision scheme with the goal of quantifying the stain-release property of fabrics. To that end, the presented hierarchical
scheme is used to detect and segment stains on a sizeable set of digitized fabric images, and the performance evaluation of
the detection, coarse segmentation, and fine segmentation steps is conducted using appropriate metrics. The promising nature
of these results bears testimony to the efficacy of the proposed approach.
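A minimal sketch of the first two stages — filtering with a Gabor bank, then selecting the single filter that best separates the two textures — under the assumption of a simple between-region energy-difference score, which stands in for the paper's scale-space histogram analysis and filter optimization:

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=4.0, size=21):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr)

def filter_image(img, kern):
    """Centered circular convolution via the FFT (adequate for a sketch)."""
    h = kern.shape[0] // 2
    kp = np.zeros_like(img, dtype=float)
    kp[:kern.shape[0], :kern.shape[1]] = kern
    kp = np.roll(kp, (-h, -h), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kp)))

def best_filter(img, bank, mask):
    """Pick the filter whose response energy best separates the masked
    region from its complement (a stand-in selection criterion)."""
    scores = []
    for k in bank:
        e = np.abs(filter_image(img, k))
        scores.append(abs(e[mask].mean() - e[~mask].mean()))
    return int(np.argmax(scores))

# Two-texture image: coarse stripes on the left, fine stripes on the right.
x = np.arange(64)
img = np.zeros((64, 64))
img[:, :32] = np.sin(2 * np.pi * 0.1 * x[:32])
img[:, 32:] = np.sin(2 * np.pi * 0.3 * x[32:])
bank = [gabor_kernel(f, 0.0) for f in (0.1, 0.3)]
mask = np.zeros((64, 64), dtype=bool)
mask[:, :32] = True
e0 = np.abs(filter_image(img, bank[0]))
e1 = np.abs(filter_image(img, bank[1]))
# The 0.1 cycles/pixel filter responds more strongly on the left texture,
# the 0.3 filter on the right.
```

Thresholding the winning filter's response energy then yields the coarse two-region segmentation, which the fine-segmentation stage would further refine by iterating on the filter coefficients.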
18.
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a parsing graph, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches—generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns—generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing corresponds to image segmentation; Tu and Zhu, 2002. IEEE Trans. PAMI, 24(5):657–673). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
19.
Scalability is an important issue in object recognition as it reduces database storage and recognition time. In this paper, we propose a new scalable 3D object representation and a learning method to recognize many everyday objects. The key proposal for scalable object representation is to combine the concept of feature sharing with multi-view clustering in part-based object representation, in particular a common-frame constellation model (CFCM). In this representation scheme, we also propose a fully automatic learning method: appearance-based automatic feature clustering and sequential construction of clustered CFCMs from labeled multi-views and multiple objects. We evaluated the scalability of the proposed method on the COIL-100 DB and applied the learning scheme to 112 objects with 620 training views. Experimental results show that the scalable learning results in almost constant recognition performance relative to the number of objects.
20.
Objective: To address background interference in fine-grained image classification, we propose a classification model based on top-down attention-map segmentation. Method: First, a convolutional neural network is trained to perform an initial classification of the fine-grained image dataset, yielding a base network model. Visualization analysis of this network shows that only some image regions contribute to the target category. The learned base network is then used to compute the spatial support of image pixels for the relevant category, generating a top-down attention map that detects the key regions in the image. The attention map then initializes a GraphCut algorithm, which segments out the key object regions and thereby improves the discriminability of the image. Finally, CNN features are extracted from the segmented images for fine-grained classification. Results: Using only image-level class labels, the model is validated on the public fine-grained datasets Cars196 and Aircrafts100, achieving average classification accuracies of 86.74% and 84.70%, respectively. This shows that introducing attention information on top of the GoogLeNet model further improves fine-grained classification accuracy. Conclusion: The semantic segmentation strategy based on top-down attention maps improves fine-grained classification performance. Because it requires no bounding-box or part annotations, the model is general and robust, and is applicable to salient object detection, foreground segmentation, and fine-grained image classification.