Similar Literature
20 similar documents found (search time: 359 ms)
1.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods which use appearance for detection, our method relies primarily on the shape of objects as reflected by their bottom-up segmentation. Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation by a “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of the object. Incorporating competition into the process allows us to resolve ambiguities that arise when two different regions are matched to the same object part, and to discard nearby false regions that participated in the voting process. We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state-of-the-art methods. Moreover, our method can simultaneously detect multiple instances of class objects in images and cope with challenging types of occlusions, such as occlusion by a bar of varying size or by another object of the same class, which are difficult to handle with other existing class-specific top-down segmentation methods.

2.
Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high quality segmentations.
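The core idea of combining cues can be sketched with a toy energy over a 1-D "scanline": a top-down unary term rewards labeling pixels as figure where class-specific evidence is strong, and a bottom-up pairwise term penalizes label changes where there is no low-level edge. The energy form, the weights, and the brute-force minimization below are illustrative assumptions, not the paper's CRF or its feature-induction procedure:

```python
from itertools import product

def energy(labels, topdown, edge_strength, w_td=1.0, w_bu=0.5):
    """Figure/ground energy: top-down unary evidence plus bottom-up
    pairwise terms that discourage label changes where the low-level
    edge evidence is weak."""
    # Labeling a pixel as figure (1) pays off only where evidence > 0.5.
    unary = sum(-w_td * (topdown[i] - 0.5)
                for i, l in enumerate(labels) if l == 1)
    # A label change between neighbors is cheap only across a strong edge.
    pairwise = sum(w_bu * (1.0 - edge_strength[i])
                   for i, (a, b) in enumerate(zip(labels, labels[1:]))
                   if a != b)
    return unary + pairwise

# Toy 1-D scanline: class-specific figure evidence and boundary strength.
topdown = [0.1, 0.2, 0.9, 0.8, 0.9, 0.1]
edges   = [0.0, 0.9, 0.1, 0.2, 0.9, 0.0]
best = min(product([0, 1], repeat=len(topdown)),
           key=lambda ls: energy(ls, topdown, edges))
print(best)  # → (0, 0, 1, 1, 1, 0)
```

The minimizer places the figure/ground transitions exactly where the strong top-down evidence and the strong bottom-up edges agree; real CRF inference would replace the brute-force search with graph cuts or belief propagation.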

3.
The Segmentation According to Natural Examples (SANE) algorithm learns to segment objects in static images from video training data. SANE uses background subtraction to find the segmentation of moving objects in videos. This provides object segmentation information for each video frame. The collection of frames and segmentations forms a training set that SANE uses to learn the image and shape properties of the observed motion boundaries. When presented with new static images, the trained model infers segmentations similar to the observed motion segmentations. SANE is a general method for learning environment-specific segmentation models. Because it can automatically generate training data from video, it can adapt to a new environment and new objects with relative ease, an advantage over untrained segmentation methods or those that require human-labeled training data. By using the local shape information in the training data, it outperforms a trained local boundary detector. Its performance is competitive with a trained top-down segmentation algorithm that uses global shape. The shape information it learns from one class of objects can assist the segmentation of other classes.
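The background-subtraction step that generates SANE's training masks can be sketched in a few lines: estimate a static background as the per-pixel median over the sequence, then mark pixels that deviate from it. The frame layout and threshold below are illustrative assumptions, not SANE's actual implementation:

```python
def median(xs):
    s = sorted(xs)
    return s[len(s) // 2]

def motion_masks(frames, thresh=30):
    """Estimate a static background as the per-pixel median over the
    sequence, then mark pixels that deviate from it as moving objects.
    Returns one binary mask per frame."""
    h, w = len(frames[0]), len(frames[0][0])
    background = [[median([f[y][x] for f in frames]) for x in range(w)]
                  for y in range(h)]
    return [[[1 if abs(f[y][x] - background[y][x]) > thresh else 0
              for x in range(w)] for y in range(h)] for f in frames]

# Three 1x4 grayscale frames: an intensity-200 "object" moves left to
# right over an intensity-50 background.
frames = [[[200, 50, 50, 50]], [[50, 200, 50, 50]], [[50, 50, 200, 50]]]
masks = motion_masks(frames)
print(masks[1])  # → [[0, 1, 0, 0]]
```

Each (frame, mask) pair is exactly the kind of automatically generated training example the abstract describes.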

4.
Even though visual attention models using bottom-up saliency can speed up object recognition by predicting object locations, in the presence of multiple salient objects, saliency alone cannot discern target objects from the clutter in a scene. Using a metric named familiarity, we propose a top-down method for guiding attention towards target objects, in addition to bottom-up saliency. To demonstrate the effectiveness of familiarity, the unified visual attention model (UVAM), which combines top-down familiarity and bottom-up saliency, is applied to SIFT-based object recognition. The UVAM is tested on 3600 artificially generated images containing COIL-100 objects with varying amounts of clutter, and on 126 images of real scenes. The recognition times are reduced by 2.7× and 2×, respectively, with no reduction in recognition accuracy, demonstrating the effectiveness and robustness of the familiarity-based UVAM.
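The fusion of the two maps can be illustrated with a minimal sketch: a weighted sum of a bottom-up saliency map and a top-down familiarity map, followed by picking the most attended location. The maps, the equal weighting, and the argmax selection are illustrative assumptions, not the UVAM's actual combination rule:

```python
def fuse_attention(saliency, familiarity, w=0.5):
    """Combine a bottom-up saliency map with a top-down familiarity map
    by weighted sum, then return the fused map and the most attended
    location (row, col)."""
    h, w_px = len(saliency), len(saliency[0])
    fused = [[(1 - w) * saliency[y][x] + w * familiarity[y][x]
              for x in range(w_px)] for y in range(h)]
    best = max(((y, x) for y in range(h) for x in range(w_px)),
               key=lambda p: fused[p[0]][p[1]])
    return fused, best

# Two equally salient blobs; familiarity with the target disambiguates.
saliency    = [[0.9, 0.0, 0.9],
               [0.0, 0.0, 0.0]]
familiarity = [[0.1, 0.0, 0.8],
               [0.0, 0.0, 0.0]]
_, focus = fuse_attention(saliency, familiarity)
print(focus)  # → (0, 2)
```

With saliency alone the two blobs tie; the top-down term breaks the tie toward the familiar target, which is the abstract's central claim in miniature.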

5.
In this paper, a new artificial neural network model is proposed for visual object recognition, in which the bottom-up, sensory-driven pathway and the top-down, expectation-driven pathway are fused in information processing, and their corresponding weights are learned from the fused neuron activities. During the supervised learning process, the target labels are applied to update the bottom-up synaptic weights of the neural network. Meanwhile, the hypotheses generated by the bottom-up pathway produce expectations on sensory inputs through the top-down pathway. The expectations are constrained by the real data from the sensory inputs, which can be used to update the top-down synaptic weights accordingly. To further improve visual object recognition performance, the multi-scale histograms of oriented gradients (MS-HOG) method is proposed to extract local features of visual objects from images. Extensive experiments on different image datasets demonstrate that the proposed neural network model, with features extracted by the MS-HOG method, is efficient and robust for visual object recognition compared with other state-of-the-art methods.

6.
FORMS: A flexible object recognition and modelling system   (Total citations: 4; self-citations: 1; citations by others: 3)
We describe a flexible object recognition and modelling system (FORMS) which represents and recognizes animate objects from their silhouettes. This consists of a model for generating the shapes of animate objects which gives a formalism for solving the inverse problem of object recognition. We model all objects at three levels of complexity: (i) the primitives, (ii) the mid-grained shapes, which are deformations of the primitives, and (iii) objects constructed by using a grammar to join mid-grained shapes together. The deformations of the primitives can be characterized by principal component analysis or modal analysis. When doing recognition the representations of these objects are obtained in a bottom-up manner from their silhouettes by a novel method for skeleton extraction and part segmentation based on deformable circles. These representations are then matched to a database of prototypical objects to obtain a set of candidate interpretations. These interpretations are verified in a top-down process. The system is demonstrated to be stable in the presence of noise, the absence of parts, the presence of additional parts, and considerable variations in articulation and viewpoint. Finally, we describe how such a representation scheme can be automatically learnt from examples.
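The PCA-style deformation of primitives mentioned in the abstract amounts to reconstructing a shape as the mean shape plus a weighted sum of learned deformation modes. The contour encoding, the single "stretch" mode, and the coefficient below are illustrative assumptions, not FORMS' actual shape model:

```python
def deform(mean_shape, components, coeffs):
    """Reconstruct a shape as the mean plus a weighted sum of learned
    deformation modes (the PCA formulation of primitive deformation)."""
    shape = list(mean_shape)
    for c, comp in zip(coeffs, components):
        shape = [s + c * d for s, d in zip(shape, comp)]
    return shape

# A 4-point contour stored as [x0, y0, x1, y1, ...]; one "stretch" mode
# that widens the right edge of a unit square.
mean_shape = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
stretch    = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(deform(mean_shape, [stretch], [0.5]))
# → [0.0, 0.0, 1.5, 0.0, 1.5, 1.0, 0.0, 1.0]
```

In a real system the modes would come from the eigenvectors of the training shapes' covariance matrix, and the coefficients would be fitted to a silhouette during recognition.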

7.
We present an approach to visual object-class segmentation and recognition based on a pipeline that combines multiple figure-ground hypotheses with large object spatial support, generated by bottom-up computational processes that do not exploit knowledge of specific categories, and sequential categorization based on continuous estimates of the spatial overlap between the image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in formulating recognition as a regression problem. Instead of focusing on a one-vs.-all winning margin that may not preserve the ordering of segment qualities inside the non-maximum (non-winning) set, our learning method produces a globally consistent ranking with close ties to segment quality, hence to the extent to which entire object or part hypotheses are likely to spatially overlap the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation on a number of challenging datasets, including Caltech-101 and ETHZ-Shape as well as PASCAL VOC 2009 and 2010.

8.
Robust Object Detection with Interleaved Categorization and Segmentation   (Total citations: 5; self-citations: 0; citations by others: 5)
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information about where in the image a hypothesis draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.
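The Generalized-Hough-style voting at the core of this approach can be sketched as follows: each matched local feature casts votes for possible object centers via the offsets stored with its codebook entry, and the accumulator peak becomes the detection hypothesis. The visual words, offsets, and integer grid below are toy assumptions; the actual model uses probabilistic, continuous-scale voting:

```python
from collections import Counter

def hough_vote(features, codebook):
    """Each matched feature votes for candidate object centers via the
    offsets stored in the codebook; return the accumulator peak as
    ((center_x, center_y), vote_count), or None if nothing voted."""
    acc = Counter()
    for (fx, fy), word in features:
        for (dx, dy) in codebook.get(word, []):
            acc[(fx + dx, fy + dy)] += 1
    return acc.most_common(1)[0] if acc else None

# Codebook: visual word -> offsets from the feature to the object center
# (a wheel may be the front or the rear one, hence two offsets).
codebook = {"wheel": [(2, -2), (-2, -2)], "window": [(0, 1)]}
features = [((2, 4), "wheel"), ((6, 4), "wheel"), ((4, 1), "window")]
print(hough_vote(features, codebook))  # → ((4, 2), 3)
```

All three features agree on the center (4, 2) while the spurious votes scatter, illustrating how consistent voting separates an object hypothesis from clutter.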

9.
We propose an attention-based image segmentation algorithm that combines the bottom-up scene-selection mechanism of vision with a task-driven mechanism based on the target's color features, forming an integrated bottom-up and top-down attention segmentation scheme. In a multi-scale image space, the algorithm extracts the intensity, color, and orientation features of the visual scene together with the color features of the task target, generating a saliency map that combines scene and target. The "scene-target" saliency maps are then normalized and fused across scales in the visual-attention image space. Finally, the attention focus on the image target is segmented out by bilinear interpolation and binarization of the connected regions of the saliency map. Experiments applying the algorithm to natural-scene and indoor-scene images show that the method can successfully segment and extract target objects in a variety of environments, especially when distracting objects are highly salient.

10.
In this paper, we propose a general framework for fusing bottom-up segmentation with top-down object behavior inference over an image sequence. This approach is beneficial for both tasks, since it enables them to cooperate so that knowledge relevant to each can aid in the resolution of the other, thus enhancing the final result. In particular, the behavior inference process offers dynamic probabilistic priors to guide segmentation. At the same time, segmentation supplies its results to the inference process, ensuring that they are consistent both with prior knowledge and with new image information. The prior models are learned from training data and they adapt dynamically, based on newly analyzed images. We demonstrate the effectiveness of our framework via particular implementations that we have employed in the resolution of two hand gesture recognition applications. Our experimental results illustrate the robustness of our joint approach to segmentation and behavior inference in challenging conditions involving complex backgrounds and occlusions of the target object.

11.
This paper presents a model of 3D object recognition motivated by the robust properties of the human visual system (HVS). The HVS shows the best efficiency and robustness for object identification tasks. Its robust properties include visual attention, contrast mechanisms, feature binding, multi-resolution processing, size tuning, and part-based representation. In addition, bottom-up and top-down information are combined cooperatively. Based on these facts, we propose a plausible computational model that integrates them under a Monte Carlo optimization technique. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process initializes parameters in a discriminative way, and the top-down process optimizes them in a generative way. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation in visible and infrared band images.

12.
This article proposes a method to segment Internet images, that is, a group of images corresponding to a specific object (the query) containing a significant amount of irrelevant images. The segmentation algorithm we propose is a combination of two distinct methods based on color. The first one considers all images to classify pixels into two sets: object pixels and background pixels. The second method segments images individually by trying to find a central object. The final segmentation is obtained by intersecting the results from both. The segmentation results are then used to re-rank images and display a clean set of images illustrating the query. The algorithm is tested on various queries for animals, natural and man-made objects, and results are discussed, showing that the obtained segmentation results are suitable for object learning.

13.
Color image segmentation by fixation-based active learning with ELM   (Total citations: 1; self-citations: 0; citations by others: 1)
The human visual system observes an image by making a series of fixations. During fixation, our eyes continually tremble with small movements called microsaccades, which may reflect an optimal sampling strategy and its spatiotemporal characteristics; a decrease in microsaccade magnitude leads to visual fading in the brain, which may provide a mechanism for shifting fixation. This paper proposes an iterative sampling-learning framework for figure-ground segmentation that simulates this behavior. First, fixation-based sampling is used to obtain a few positive and negative samples. A pixel classifier based on RGB color is trained with the ELM (extreme learning machine) algorithm, which not only extracts object regions but also provides a reference boundary for objects. The object boundary is then refined by minimizing a graph cut. The refined object region is re-sampled to provide more accurate object and background samples/pixels for the next round of training. The iteration converges when the pixel classifier produces a stable segmentation result. Thanks to the ELM algorithm, the proposed method runs faster than state-of-the-art methods and can cope with the complexity and uncertainty of the scene. Experimental results demonstrate that the learning-based method can reliably segment multiple-color objects from complex scenes.
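The sample-train-segment-resample loop can be sketched in miniature. Here a nearest-mean color classifier stands in for the ELM, a 1-D grayscale scanline stands in for the image, and the graph-cut refinement is omitted; all names and values are illustrative:

```python
def nearest_mean_segment(pixels, fg_samples, bg_samples):
    """Label each pixel by whichever sample mean (figure or ground) its
    intensity is closer to — a simple stand-in for the ELM classifier."""
    def mean(xs):
        return sum(xs) / len(xs)
    fg_m, bg_m = mean(fg_samples), mean(bg_samples)
    return [1 if abs(p - fg_m) < abs(p - bg_m) else 0 for p in pixels]

def iterative_segment(pixels, fg_seed, bg_seed, rounds=5):
    """Fixation-style loop: classify, then re-sample figure/ground
    pixels from the current segmentation, until the labels stabilize."""
    fg, bg = list(fg_seed), list(bg_seed)
    prev = None
    for _ in range(rounds):
        labels = nearest_mean_segment(pixels, fg, bg)
        if labels == prev:          # converged: segmentation is stable
            break
        prev = labels
        fg = [p for p, l in zip(pixels, labels) if l] or fg
        bg = [p for p, l in zip(pixels, labels) if not l] or bg
    return prev

# 1-D scanline; the seeds come from one fixation on the bright object.
pixels = [10, 12, 11, 200, 205, 198, 9]
print(iterative_segment(pixels, fg_seed=[190], bg_seed=[15]))
# → [0, 0, 0, 1, 1, 1, 0]
```

Even from a single seed sample per class, re-sampling lets the classifier sharpen its own training set, which is the abstract's active-learning loop in its simplest form.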

14.
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs) as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide about assigning observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.
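The E-step/M-step alternation described here can be illustrated with a toy version: two unit-variance Gaussian intensity models stand in for the AAMs, the E-step softly assigns pixels to the object or background model, and the M-step refits each model's mean to its weighted observations. All values are illustrative, not the paper's AAM fitting equations:

```python
import math

def em_segment(pixels, mu_obj, mu_bg, iters=10):
    """Toy EM: E-step computes soft object responsibilities under two
    unit-variance Gaussians; M-step refits each model's mean."""
    for _ in range(iters):
        # E-step: responsibility of the object model for each pixel.
        resp = []
        for p in pixels:
            wo = math.exp(-0.5 * (p - mu_obj) ** 2)
            wb = math.exp(-0.5 * (p - mu_bg) ** 2)
            resp.append(wo / (wo + wb))
        # M-step: refit both means to the soft assignment.
        mu_obj = sum(r * p for r, p in zip(resp, pixels)) / sum(resp)
        mu_bg = (sum((1 - r) * p for r, p in zip(resp, pixels))
                 / sum(1 - r for r in resp))
    return [1 if r > 0.5 else 0 for r in resp], mu_obj, mu_bg

pixels = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
labels, mo, mb = em_segment(pixels, mu_obj=4.0, mu_bg=2.0)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The alternation is exactly the structure the abstract describes: assignment (segmentation) and model fitting (recognition) improve each other until they agree.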

15.
Scene analysis is a major aspect of perception and continues to challenge machine perception. This paper addresses the scene-analysis problem by integrating a primitive segmentation stage with a model of associative memory. The model is a multistage system that consists of an initial primitive segmentation stage, a multimodule associative memory, and a short-term memory (STM) layer. Primitive segmentation is performed by a locally excitatory globally inhibitory oscillator network (LEGION), which segments the input scene into multiple parts that correspond to groups of synchronous oscillations. Each segment triggers memory recall and multiple recalled patterns then interact with one another in the STM layer. The STM layer projects to the LEGION network, giving rise to memory-based grouping and segmentation. The system achieves scene analysis entirely in phase space, which provides a unifying mechanism for both bottom-up analysis and top-down analysis. The model is evaluated with a systematic set of three-dimensional (3-D) line drawing objects, which are arranged in an arbitrary fashion to compose input scenes that allow object occlusion. Memory-based organization is responsible for a significant improvement in performance. A number of issues are discussed, including input-anchored alignment, top-down organization, and the role of STM in producing context sensitivity of memory recall.

16.
This paper presents a framework called Cresceptron for view-based learning, recognition and segmentation. Specifically, it recognizes and segments image patterns that are similar to those learned, using a stochastic distortion model and view-based interpolation, allowing other view points that are moderately different from those used in learning. The learning phase is interactive. The user trains the system using a collection of training images. For each training image, the user manually draws a polygon outlining the region of interest and types in the label of its class. Then, from the directional edges of each of the segmented regions, the Cresceptron uses a hierarchical self-organization scheme to grow a sparsely connected network automatically, adaptively and incrementally during the learning phase. At each level, the system detects new image structures that need to be learned and assigns a new neural plane for each new feature. The network grows by creating new nodes and connections which memorize the new image structures and their context as they are detected. Thus, the structure of the network is a function of the training exemplars. The Cresceptron incorporates both individual learning and class learning; with the former, each training example is treated as a different individual while with the latter, each example is a sample of a class. In the performance phase, segmentation and recognition are tightly coupled. No foreground extraction is necessary, which is achieved by backtracking the response of the network down the hierarchy to the image parts contributing to recognition. Several stochastic shape distortion models are analyzed to show why multilevel matching such as that in the Cresceptron can deal with more general stochastic distortions that a single-level matching scheme cannot. 
The system is demonstrated using images from broadcast television and other video segments to learn faces and other objects, and then later to locate and to recognize similar, but possibly distorted, views of the same objects.

17.
This paper presents an efficient and practical approach for automatic, unsupervised object detection and segmentation in two-texture images based on the concept of Gabor filter optimization. The entire process occurs within a hierarchical framework and consists of the steps of detection, coarse segmentation, and fine segmentation. In the object detection step, the image is first processed using a Gabor filter bank. Then, the histograms of the filtered responses are analyzed using the scale-space approach to predict the presence/absence of an object in the target image. If the presence of an object is reported, the proposed approach proceeds to the coarse segmentation stage, wherein the best Gabor filter (among the bank of filters) is automatically chosen, and used to segment the image into two distinct regions. Finally, in the fine segmentation step, the coefficients of the best Gabor filter (output from the previous stage) are iteratively refined in order to further fine-tune and improve the segmentation map produced by the coarse segmentation step. In the validation study, the proposed approach is applied as part of a machine vision scheme with the goal of quantifying the stain-release property of fabrics. To that end, the presented hierarchical scheme is used to detect and segment stains on a sizeable set of digitized fabric images, and the performance evaluation of the detection, coarse segmentation, and fine segmentation steps is conducted using appropriate metrics. The promising nature of these results bears testimony to the efficacy of the proposed approach.
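The "best filter" selection step can be sketched with a histogram-separability criterion: the filter whose responses split most cleanly into two modes (maximum Otsu-style between-class variance) is the one that best distinguishes the two textures. The criterion and the toy response sets are illustrative assumptions, not the paper's scale-space histogram analysis:

```python
def between_class_variance(values, t):
    """Otsu-style separability of filter responses at threshold t."""
    lo = [v for v in values if v <= t]
    hi = [v for v in values if v > t]
    if not lo or not hi:
        return 0.0
    n = len(values)
    m_lo, m_hi = sum(lo) / len(lo), sum(hi) / len(hi)
    return (len(lo) / n) * (len(hi) / n) * (m_lo - m_hi) ** 2

def best_filter(responses_per_filter):
    """Pick the filter whose response histogram is most bimodal, i.e.
    maximizes between-class variance over candidate thresholds."""
    def score(responses):
        return max(between_class_variance(responses, t) for t in responses)
    return max(range(len(responses_per_filter)),
               key=lambda i: score(responses_per_filter[i]))

# Filter 1 separates the two textures far better than filter 0.
responses = [
    [5, 6, 5, 6, 5, 6],   # weak, nearly unimodal response
    [1, 1, 1, 9, 9, 9],   # strong bimodal response
]
print(best_filter(responses))  # → 1
```

The winning filter's optimal threshold then yields the coarse two-region segmentation, which the fine-segmentation stage would subsequently refine.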

18.
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a parsing graph, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition; if we use generic visual patterns only, image parsing reduces to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657–673). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

19.
Scalability is an important issue in object recognition, as it reduces database storage and recognition time. In this paper, we propose a new scalable 3D object representation and a learning method to recognize many everyday objects. The key proposal for scalable object representation is to combine the concept of feature sharing with multi-view clustering in a part-based object representation, in particular a common-frame constellation model (CFCM). Within this representation scheme, we also propose a fully automatic learning method: appearance-based automatic feature clustering and sequential construction of clustered CFCMs from labeled multi-views of multiple objects. We evaluated the scalability of the proposed method on the COIL-100 database and applied the learning scheme to 112 objects with 620 training views. Experimental results show that the scalable learning yields nearly constant recognition performance as the number of objects grows.

20.
Objective: To address background interference in fine-grained image classification, we propose a classification model that exploits segmentation guided by top-down attention maps. Method: First, a convolutional neural network is trained to perform an initial classification of the fine-grained image database, producing a base network model. Visualization of this model reveals that only some image regions contribute to the target class. The trained base network is therefore used to compute the spatial support of image pixels for the relevant class, generating a top-down attention map that detects the key regions of the image. The attention map then initializes the GraphCut algorithm to segment out the key object regions, improving the discriminability of the image. Finally, CNN features are extracted from the segmented images for fine-grained classification. Results: Using only image-level class labels, the model achieves average classification accuracies of 86.74% and 84.70% on the public fine-grained datasets Cars196 and Aircrafts100, respectively. This shows that introducing attention information on top of the GoogLeNet model further improves fine-grained classification accuracy. Conclusion: The semantic segmentation strategy based on top-down attention maps improves fine-grained classification performance. Since no bounding-box or part annotations are required, the model is general and robust, and is applicable to salient object detection, foreground segmentation, and fine-grained image classification.

