Similar Literature
2.
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs), as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide whether to assign observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.
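The E-step/M-step alternation described above can be illustrated with a deliberately simplified sketch. It assumes each object model is reduced to a single mean value with a shared, fixed Gaussian noise level (a hypothetical stand-in for an AAM, not the authors' model). The E-step softly assigns segment observations to models according to how well each model predicts them, and the M-step refits each model from its weighted observations.

```python
import math

def em_assign(observations, means, sigma=1.0, iters=20):
    """Toy EM: softly assign scalar segment observations to object models.

    Each object model is just a mean value (hypothetical stand-in for an AAM);
    all models share a fixed noise level sigma and equal mixing weights.
    Returns (refitted means, responsibilities per observation).
    """
    means = list(means)
    resp = []
    for _ in range(iters):
        # E-step: responsibility of model k for observation x is proportional
        # to the Gaussian likelihood of the residual (the model's "fidelity").
        resp = []
        for x in observations:
            w = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
            total = sum(w) or 1.0
            resp.append([wk / total for wk in w])
        # M-step: refit each model as the responsibility-weighted mean.
        for k in range(len(means)):
            num = sum(r[k] * x for r, x in zip(resp, observations))
            den = sum(r[k] for r in resp)
            if den > 0:
                means[k] = num / den
    return means, resp
```

For example, `em_assign([0.1, -0.2, 0.0, 5.1, 4.9, 5.0], [1.0, 4.0])` separates the two groups, and the refitted means end up close to the group averages (about -0.033 and 5.0).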

3.
Recently, various bag-of-features (BoF) methods have shown good robustness to within-class variations and occlusions in object categorization. In this paper, we present a novel approach for multi-object categorization within the BoF framework. The approach simultaneously addresses two issues in BoF-related methods: how to avoid scene modeling, and how to predict the labels of an image in which objects of multiple categories co-exist. We employ a biased sampling strategy that combines bottom-up, biologically inspired saliency information with loose, top-down class prior information for object class modeling. This biased sampling component is then integrated with a multi-instance multi-label learning and classification algorithm. With the proposed biased sampling strategy, we can perform multi-object categorization within an image without semantic segmentation. Experimental results on PASCAL VOC2007 and SUN09 show that the proposed method significantly improves the discriminative ability of BoF methods and achieves good performance in multi-object categorization tasks.
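One way to read the biased sampling strategy is as drawing patch locations from a probability map that mixes bottom-up saliency with a top-down class prior. The sketch below assumes a simple linear mix controlled by a hypothetical weight `alpha`; the abstract does not state how the two cues are actually combined.

```python
import random

def biased_sample(saliency, prior, n, alpha=0.5, seed=None):
    """Sample n patch locations with probability proportional to a weighted
    mix of a bottom-up saliency map and a top-down class-prior map.

    saliency, prior: 2-D lists of non-negative floats with the same shape.
    alpha: weight of the bottom-up term (hypothetical linear mixing rule).
    Returns a list of (row, col) tuples, sampled with replacement.
    """
    rng = random.Random(seed)
    coords, weights = [], []
    for r, (srow, prow) in enumerate(zip(saliency, prior)):
        for c, (s, p) in enumerate(zip(srow, prow)):
            coords.append((r, c))
            weights.append(alpha * s + (1 - alpha) * p)
    if sum(weights) <= 0:
        raise ValueError("saliency and prior maps are both zero everywhere")
    # random.choices picks items with probability proportional to weights,
    # so zero-weight locations are never sampled.
    return rng.choices(coords, weights=weights, k=n)
```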

4.
《Real》1999,5(2):95-107
Human beings perform object recognition tasks remarkably well: they perceive images through sensors and convey information that is processed in parallel in the brain. To some extent, massively parallel computers offer natural support for similar tasks, since detecting an object in a scene can be performed by repeating the same operations in different zones of the scene. Unfortunately, most parametric models commonly used in computer vision are not well suited to complex matching operations that involve both noise and severe image distortions. In this paper we discuss an expectation-driven approach to object recognition in which, on the basis of the shape of the object to be recognized, we select a few candidate zones of the scene on which attention is focused (shape perception); we then examine the selected areas, trying to confirm or reject object hypotheses, if any (object classification). We propose an architecture that relies on neural networks for both shape perception and object classification. A vision system based on this architecture has been tested on board a mobile robot as support for its localization and navigation in indoor environments. The results demonstrated good tolerance to both noise and landmark distortions, allowing the robot to perform its task “just-in-time”. The proposed approach has also been tested on a massively parallel architecture, with promising performance.
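The two-stage structure (select candidate zones by shape, then confirm or reject each hypothesis) can be sketched as follows. The paper uses neural networks for both stages; this sketch swaps in plain binary template overlap for both, purely to show the control flow, and the `min_fraction` acceptance rule is a hypothetical choice.

```python
def select_zones(image, template, top_k=3):
    """Stage 1 ("shape perception"): slide a binary shape template over a
    binary image and return the top_k (score, (row, col)) windows by overlap."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    scored = []
    for r in range(h - th + 1):
        for c in range(w - tw + 1):
            score = sum(image[r + i][c + j] * template[i][j]
                        for i in range(th) for j in range(tw))
            scored.append((score, (r, c)))
    scored.sort(key=lambda t: -t[0])  # best-matching zones first
    return scored[:top_k]

def confirm(zones, template, min_fraction=0.9):
    """Stage 2 ("object classification"): accept a hypothesis only if the zone
    covers at least min_fraction of the template's foreground pixels."""
    needed = min_fraction * sum(map(sum, template))
    return [pos for score, pos in zones if score >= needed]
```

Only the few zones returned by the first stage are examined by the second, which is where the approach saves work compared with classifying every window.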

5.
Fu Hao, Xu Hegen, Zhang Zhiming, Qi Shaohua 《Journal of Computer Applications》2021,41(11):3337-3344
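To address localization and static semantic map construction in dynamic scenes, a simultaneous localization and mapping (SLAM) algorithm for dynamic environments based on semantic and optical-flow constraints is proposed to reduce the impact of dynamic objects on localization and mapping. First, for each input frame, object masks are obtained through semantic segmentation, and feature points that do not satisfy the epipolar constraint are filtered out geometrically. Next, the dynamic probability of each object is computed by combining the object masks with optical flow; feature points are filtered according to this dynamic probability to obtain static feature points, which are then used for subsequent camera pose estimation. Then, a static point cloud is built from the RGB-D images and the object dynamic probabilities, and a semantic octree map is built in combination with semantic segmentation. Finally, a sparse semantic map is created from the static point cloud and the semantic segmentation. Test results on the public TUM dataset show that, in highly dynamic scenes, the proposed algorithm improves on ORB-SLAM2 by more than 95% in both absolute trajectory error and relative pose error, and reduces the absolute trajectory error by 41% and 11% compared with DS-SLAM and DynaSLAM, respectively, verifying that the algorithm achieves good localization accuracy and robustness in highly dynamic scenes. Mapping experiments show that the proposed algorithm creates a static semantic map, and that the sparse semantic map requires 99% less storage than a point cloud map.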
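The geometric filtering step relies on the epipolar constraint $x_2^\top F x_1 = 0$: a static point matched across two frames should lie on the epipolar line $F x_1$. A minimal sketch of that check follows, assuming a known fundamental matrix `F`. As a hypothetical simplification, each object's dynamic probability is taken here as the fraction of its matches that violate the constraint; the paper instead combines object masks with optical flow.

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance (pixels) of p2 in frame 2 from the epipolar line F @ [p1, 1]."""
    x1, y1 = p1
    x2, y2 = p2
    a = F[0][0] * x1 + F[0][1] * y1 + F[0][2]
    b = F[1][0] * x1 + F[1][1] * y1 + F[1][2]
    c = F[2][0] * x1 + F[2][1] * y1 + F[2][2]
    return abs(a * x2 + b * y2 + c) / math.hypot(a, b)

def filter_static(F, matches, labels, threshold=1.0, max_dynamic_ratio=0.5):
    """matches: list of (p1, p2) pixel pairs; labels: object id per match.

    A match violates the epipolar constraint if its distance exceeds
    threshold. Hypothetical rule: an object's dynamic probability is the
    fraction of its violating matches; all matches of objects above
    max_dynamic_ratio, and every violating match, are discarded.
    """
    violations, counts = {}, {}
    for (p1, p2), obj in zip(matches, labels):
        bad = epipolar_distance(F, p1, p2) > threshold
        violations[obj] = violations.get(obj, 0) + bad
        counts[obj] = counts.get(obj, 0) + 1
    dyn_prob = {obj: violations[obj] / counts[obj] for obj in counts}
    static = [m for m, obj in zip(matches, labels)
              if dyn_prob[obj] <= max_dynamic_ratio
              and epipolar_distance(F, *m) <= threshold]
    return static, dyn_prob
```

For a purely horizontal camera translation with identity intrinsics, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` and the epipolar distance reduces to the vertical displacement `|y1 - y2|`, so points that move vertically are flagged as dynamic.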

6.
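Visual understanding tasks, such as object detection, semantic and instance segmentation, and action recognition, are widely applied and play a crucial role in fields such as human-computer interaction and autonomous driving. In recent years, deep visual understanding networks based on fully supervised learning have achieved significant performance gains. However, annotating data for tasks such as object detection, semantic and instance segmentation, and video action recognition often requires substantial labor and time, which has become a key factor limiting their wide application. Weakly supervised learning, as an effective way to reduce annotation cost, promises a feasible solution to this problem and has therefore received considerable attention. Focusing on visual weakly supervised learning, this paper surveys research progress in China and abroad, taking object detection, semantic and instance segmentation, and action recognition as examples, and discusses its development directions and application prospects. After briefly reviewing general weakly supervised learning models, such as multiple instance learning (MIL) and the expectation-maximization (EM) algorithm, it summarizes work on object detection and localization in terms of multiple instance learning, class attention map mechanisms, and related aspects, with emphasis on self-training and supervision-form conversion methods. For semantic segmentation, it summarizes and analyzes research progress according to weak supervision of different granularities, such as bounding-box annotations, image-level class labels, line (scribble) annotations, or point annotations, and mainly reviews methods based on image-level class...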

7.
This paper presents a novel image representation method for generic object recognition that uses higher-order local autocorrelations on posterior probability images. The proposed method extends the bag-of-features approach to posterior probability images. The standard bag-of-features approach can be viewed approximately as a method that classifies an image into the category whose posterior probabilities, summed over the posterior probability image, are maximal. By using local autocorrelations of posterior probability images, however, the proposed method extracts richer information than the standard bag-of-features. Experimental results show that the proposed method achieves higher classification performance than the standard bag-of-features method.
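The relation between the standard bag-of-features decision and local autocorrelations can be made concrete. The zeroth-order term is just the summed posterior $\sum_x f(x)$, while first-order terms $\sum_x f(x) f(x+a)$ also capture how high-probability pixels are arranged. This sketch stops at first order (the paper uses higher orders) and assumes one 2-D posterior probability image per category.

```python
def local_autocorrelations(prob, displacements=((0, 1), (1, -1), (1, 0), (1, 1))):
    """Zeroth- and first-order local autocorrelations of a 2-D posterior
    probability image `prob` (list of lists of floats).

    Zeroth order: sum_x f(x), i.e. the summed posterior that the standard
    bag-of-features decision effectively uses.
    First order: sum_x f(x) * f(x + a) for each displacement a in a 3x3
    neighbourhood (only the non-redundant displacements are listed).
    """
    h, w = len(prob), len(prob[0])
    features = [sum(map(sum, prob))]
    for dr, dc in displacements:
        total = 0.0
        for r in range(h):
            for c in range(w):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    total += prob[r][c] * prob[rr][cc]
        features.append(total)
    return features
```

For instance, `[[1, 0], [0, 1]]` and `[[1, 1], [0, 0]]` have the same summed posterior (2), so a plain bag-of-features score cannot tell them apart, but their first-order features differ (`[2, 0, 0, 0, 1]` versus `[2, 1, 0, 0, 0]`).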

8.
《Advanced Robotics》2013,27(8):833-846
A systematic approach to modeling deformable string objects is presented. Various string objects, such as cords and wires, are manipulated in many manufacturing processes. In such processes, successful manipulation requires evaluating their shapes on a computer in advance, because their shapes can change easily even under the same conditions. We refer to the situation in which a shape can change into another shape even under the same constraints as shape bifurcation. In this paper, we develop an analytical model of the shape of string objects that includes shape bifurcation. First, we investigate the mechanism of shape bifurcation based on the potential energy and propose a hypothesis that attributes shape bifurcation to local minima of the potential energy. Second, the potential energy of a string object and the geometric constraints imposed on it are formulated; the shape of the object can be derived by minimizing the potential energy under the geometric constraints. Third, a procedure to compute the shape of a deformed string object is developed that takes the local minima of the energy into account. Finally, we show numerical examples with shape bifurcation computed by the proposed method. From the results, we conclude that the proposed method accurately describes deformed shapes of string objects, including shape bifurcation.
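The hypothesis that shape bifurcation corresponds to distinct local minima of one potential energy can be illustrated with a toy example. The sketch below is not the paper's string model. It uses a hypothetical one-parameter double-well energy $U(q) = (q^2 - 1)^2$ with no geometric constraints and shows that plain gradient descent from two different initial shapes settles into two different equilibria of the same energy.

```python
def U(q):
    """Toy double-well potential with local minima at q = -1 and q = +1
    (hypothetical stand-in for a string's shape parameter)."""
    return (q * q - 1) ** 2

def dU(q):
    """Derivative of U."""
    return 4 * q * (q * q - 1)

def minimize(grad, x0, step=0.05, iters=2000):
    """Plain gradient descent: returns the local minimum reached from x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Same energy, different starting shapes: descent reaches different minima.
left = minimize(dU, -0.2)    # converges to about -1.0
right = minimize(dU, 0.2)    # converges to about +1.0
```

In the actual problem, the energy would be minimized subject to the geometric constraints, but the mechanism is the same: which minimum is reached depends on the initial state, not only on the constraints.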

9.
To enable content-based functionalities in video processing algorithms, scenes must be decomposed into semantic objects. A semi-automatic, Markov random field (MRF) based multiresolution algorithm is presented for video object extraction in complex scenes. In the first frame, spatial segmentation and user intervention determine the objects of interest. The specified objects are subsequently tracked in successive frames, and newly appearing objects/regions are also detected. The video object extraction algorithm includes discrete wavelet transform decomposition, multiresolution MRF-based spatial segmentation with emphasis on border smoothness at different resolutions, and an MRF-based backward region classification that determines the tracked objects in the scene. Finally, a motion constraint embedded in the region classifier determines the newly appeared objects/regions and completes the proposed algorithm as an efficient video segmentation algorithm. The results are applicable to generic segmentation applications; however, the proposed multiresolution video segmentation algorithm supports scalable object-based wavelet coding in particular. Moreover, compared with traditional object extraction algorithms, it produces smoother and more visually pleasing shape masks at different resolutions. The proposed multiresolution video object extraction method allows for larger motion, better noise tolerance, and lower computational complexity.
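The role of border smoothness in MRF-based segmentation can be shown with a single-resolution sketch. It uses iterated conditional modes (ICM) on a Potts model, a standard MRF optimizer that the abstract does not name, so it is an assumption here, and it omits the wavelet multiresolution and tracking parts. Each pixel takes the label minimizing a data term plus `beta` times the number of disagreeing 4-neighbours.

```python
def icm_segment(image, means, beta=1.0, iters=5):
    """Iterated conditional modes for a Potts-model MRF segmentation.

    image: 2-D list of intensities; means: one mean intensity per label.
    Per-pixel energy: (I - mean_label)^2 + beta * (number of 4-neighbours
    with a different label). Larger beta favours smoother region borders.
    """
    h, w = len(image), len(image[0])
    labels_range = range(len(means))
    # Initialise with the maximum-likelihood (nearest-mean) labelling.
    labels = [[min(labels_range, key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(w)] for r in range(h)]
    for _ in range(iters):
        for r in range(h):
            for c in range(w):
                nbrs = [labels[rr][cc] for rr, cc in
                        ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < h and 0 <= cc < w]

                def energy(k):
                    data = (image[r][c] - means[k]) ** 2
                    smooth = sum(1 for n in nbrs if n != k)
                    return data + beta * smooth

                labels[r][c] = min(labels_range, key=energy)
    return labels
```

With `beta=0` this reduces to nearest-mean classification, so an isolated noisy pixel keeps its own label; with `beta=1` the smoothness term absorbs it into the surrounding region.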

10.
An interactive, example-driven approach to graphics recognition in engineering drawings is proposed. In this scenario, the user first interactively provides an example of a graphic object; the system instantly learns its graphical knowledge and uses the acquired knowledge to recognize graphic objects of the same type. The proposed approach represents the graphical knowledge of an object in terms of its structural components and their syntactical relationships. We summarize four types of geometric constraints for knowledge representation, on which we base an algorithm for knowledge acquisition. We also propose an algorithm for graphics recognition using the acquired graphical knowledge, which essentially examines these constraints sequentially. In this algorithm, we first predict the next component’s attributes (e.g., size, position, and orientation) by reasoning from an earlier-found component and the constraint between them, and then search for this hypothetical component in the drawing. If all the hypothetical components are found, a graphic object of this type is recognized. To improve the system’s recognition accuracy, we develop a user feedback scheme that updates the graphical knowledge from both positive (missed) and negative (misrecognized) examples provided by the user for subsequent recognition. Experiments show that the proposed approach is both efficient and effective for recognizing various types of graphic objects in engineering drawings. This paper is an extension of our papers published in ICDAR2003 and GREC2003.
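The predict-then-search loop of the recognition algorithm can be sketched as follows. The component and constraint types are hypothetical: a single relative-placement constraint (offset and size ratio relative to an earlier-found component) stands in for the paper's four constraint types, and orientation is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    kind: str      # hypothetical primitive type, e.g. "circle" or "line"
    x: float
    y: float
    size: float

@dataclass(frozen=True)
class Constraint:
    ref: int       # index of an earlier-found component in the match list
    kind: str      # expected kind of the next component
    dx: float      # expected offset from the reference, in units of ref.size
    dy: float
    scale: float   # expected size relative to ref.size

def recognize(seed, constraints, drawing, tol=0.1):
    """Check constraints sequentially: predict each next component from an
    earlier-found one, then search the drawing for a component matching the
    prediction. Returns the matched components, or None if any search fails."""
    found = [seed]
    for con in constraints:
        ref = found[con.ref]
        # Hypothetical component predicted from the reference and constraint.
        px = ref.x + con.dx * ref.size
        py = ref.y + con.dy * ref.size
        psize = con.scale * ref.size
        match = next((c for c in drawing
                      if c.kind == con.kind
                      and abs(c.x - px) <= tol * ref.size
                      and abs(c.y - py) <= tol * ref.size
                      and abs(c.size - psize) <= tol * ref.size), None)
        if match is None:
            return None
        found.append(match)
    return found
```

Because each prediction narrows the search to one expected position and size, the drawing is only scanned for a small set of candidates rather than matched exhaustively.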

11.
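To address the problem that traditional visual SLAM is prone to feature-matching errors in dynamic scenes, which degrades localization accuracy, a semantic SLAM algorithm based on dynamic object tracking is proposed. Built on the classic visual SLAM framework, it extracts dynamic objects, tracks them across frames, and uses their pose information to assist the localization of the camera itself. First, in data preprocessing, the algorithm uses the YOLACT, RAFT, and SC-Depth networks to extract semantic masks, optical flow vectors, and per-pixel depth values from the images, respectively. Second, using this information, the visual front end computes a probability map through semantic segmentation masks, a motion consistency check, and an occlusion point check, so as to smoothly distinguish dynamic from static features in the scene. Then, the bundle adjustment module in the back end fuses multi-feature constraints on object motion to improve pose estimation in dynamic scenes. Finally, comparative validation is performed on dynamic scenes from the KITTI and OMD datasets. Experiments show that the proposed algorithm tracks dynamic objects accurately and achieves robust, good localization performance in both indoor and outdoor dynamic scenes.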
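The abstract does not describe how the occlusion point check works. A common technique for this purpose, assumed here and not confirmed by the source, is the forward-backward optical flow consistency check: following the forward flow and then the backward flow should return a visible pixel to its starting point. The sketch uses nearest-pixel lookup instead of interpolation.

```python
def occlusion_mask(forward, backward, tol=1.0):
    """Forward-backward optical-flow consistency check.

    forward[r][c]  = (du, dv): flow from frame t to t+1 at pixel (r, c),
                     du along columns (x) and dv along rows (y).
    backward[r][c] = (du, dv): flow from frame t+1 back to frame t.
    A pixel is flagged True (occluded or inconsistent) if the round trip
    misses its start by more than tol pixels, or the forward flow leaves
    the image.
    """
    h, w = len(forward), len(forward[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            du, dv = forward[r][c]
            # Nearest-pixel lookup of the target location (no interpolation).
            tc, tr = round(c + du), round(r + dv)
            if not (0 <= tr < h and 0 <= tc < w):
                mask[r][c] = True
                continue
            bu, bv = backward[tr][tc]
            err = ((du + bu) ** 2 + (dv + bv) ** 2) ** 0.5
            mask[r][c] = err > tol
    return mask
```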

12.
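Objective: Grasp pose detection for objects in cluttered scenes is a fundamental skill for intelligent robots. Although six-degree-of-freedom (6-DoF) grasp learning has made progress, previous methods ignore differences in object size during sampling and learning, resulting in poor grasping performance on small objects. Method: An object-mask-assisted sampling method is proposed that samples the same number of points on every object to balance the grasp distribution, solving the problem of unevenly distributed sampling points. In addition, a multi-scale learning strategy is adopted during learning: multi-scale cylinder grouping is applied to partial object point clouds to improve local geometric representation, addressing the difficulty of learning grasp parameters caused by differences in object scale. An end-to-end grasping network is designed that embeds the proposed sampling and learning methods, which effectively improves object grasp detection performance. Result: Evaluated on the large-scale benchmark GraspNet-1Billion, the proposed method achieves the best performance among the compared methods, with the grasp metric on small objects improved by 7% on average; extensive real-robot experiments also show that the method generalizes well to grasping unseen objects. Conclusion: Focusing on grasping small objects, this paper proposes a mask-assisted sampling method embedded in the proposed end-to-end learning network and introduces a multi-scale grouping learning strategy to improve the local geometric representation of objects. The method effectively improves grasp quality on small objects, and its grasp evaluation results on all objects exceed those of the compared methods.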
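The two ideas in the Method section can be sketched independently of any network. The first function samples the same number of points per object (object ids assumed to come from an instance mask) instead of sampling uniformly over the scene. The second crops the points inside a cylinder along a given approach axis; calling it with several radii gives a multi-scale grouping. The cylinder's extent (from the center along the axis up to `height`) is an assumption, not taken from the paper.

```python
import random

def mask_balanced_sample(points, object_ids, per_object, seed=None):
    """Sample per_object points from every object, so that small objects are
    not under-represented as they would be under scene-uniform sampling."""
    rng = random.Random(seed)
    by_obj = {}
    for p, oid in zip(points, object_ids):
        by_obj.setdefault(oid, []).append(p)
    sampled = []
    for pts in by_obj.values():
        if len(pts) >= per_object:
            sampled.extend(rng.sample(pts, per_object))
        else:
            # Very small object: sample with replacement to reach per_object.
            sampled.extend(rng.choices(pts, k=per_object))
    return sampled

def cylinder_group(points, center, axis, radius, height):
    """Return the points inside a cylinder starting at `center` and extending
    `height` along the unit vector `axis`, with the given radius."""
    out = []
    for p in points:
        d = [pi - ci for pi, ci in zip(p, center)]
        t = sum(di * ai for di, ai in zip(d, axis))       # offset along the axis
        radial_sq = sum(di * di for di in d) - t * t       # squared distance to axis
        if 0.0 <= t <= height and radial_sq <= radius * radius:
            out.append(p)
    return out
```

For a multi-scale grouping around one grasp candidate, one might collect `[cylinder_group(pts, c, a, r, h) for r in (0.01, 0.02, 0.04)]`, where the radii are illustrative values only.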

13.
Moving shadow detection and removal for traffic sequences
Segmentation of moving objects in a video sequence is a basic task in computer vision applications. However, shadows extracted along with the objects can cause large errors in object localization and recognition. In this paper, we propose an edge-based method for moving shadow detection that can effectively detect the cast shadow of a moving vehicle in a traffic scene. Once shadows have been confirmed to exist in a frame, we apply the proposed shadow removal algorithm to separate the shadow from the foreground. The shadow elimination algorithm first removes the boundary of the cast shadow while preserving object edges; second, it reconstructs coarse object shapes from the edge information of the objects; and finally, it extracts the cast shadow by subtracting the moving object from the change detection mask and performs further processing. The proposed method has been tested on images taken under different shadow orientations, vehicle colors, and vehicle sizes, and the results show that shadows can be successfully eliminated, yielding good video segmentation.
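The final extraction step is a set difference on binary masks: the change detection mask covers the moving vehicle together with its cast shadow, so removing the reconstructed object shape leaves the shadow. A minimal sketch of just that step, assuming both masks are already available as same-sized 2-D lists (the edge-based reconstruction and the "further processing" are not shown):

```python
def extract_cast_shadow(change_mask, object_mask):
    """Cast shadow = pixels flagged by change detection (moving object plus
    its shadow) that are not covered by the reconstructed object shape."""
    return [[bool(ch) and not bool(ob) for ch, ob in zip(crow, orow)]
            for crow, orow in zip(change_mask, object_mask)]
```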

15.
Due to distortion, noise, segmentation errors, overlap, and occlusion of objects in digital images, it is usually impossible to extract complete object contours or to segment whole objects. However, in many cases parts of contours can be correctly reconstructed, either by edge grouping or as parts of the boundaries of segmented regions. Therefore, recognizing objects from their contour parts seems to be both a promising and a necessary research direction. The main contribution of this paper is a system for detecting and recognizing contour parts in digital images. Both detection and recognition are based on the shape similarity of contour parts. For each contour part produced by contour grouping, we use shape similarity to retrieve the most similar contour parts from a database of known contour segments. A shape-based classification of the retrieved contour parts then performs simultaneous detection and recognition. An important step in our approach is the construction of the database of known contour segments. First, complete contours of known objects are decomposed into parts using discrete curve evolution. Then, a representation of these parts that is invariant to scaling, rotation, and translation is constructed.
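Discrete curve evolution (DCE), used above to decompose contours, simplifies a polygon by repeatedly deleting the vertex with the lowest relevance $K = \beta\, l_1 l_2 / (l_1 + l_2)$, where $\beta$ is the turn angle at the vertex and $l_1, l_2$ are the lengths of its two adjacent segments normalized by the perimeter. The sketch below shows only this simplification loop for a closed polygon; how the paper turns the evolved vertices into contour parts is not shown.

```python
import math

def relevance(prev, cur, nxt, perimeter):
    """Relevance K = beta * l1 * l2 / (l1 + l2) of vertex `cur`, with turn
    angle beta in [0, pi] and segment lengths normalised by the perimeter."""
    l1 = math.dist(prev, cur) / perimeter
    l2 = math.dist(cur, nxt) / perimeter
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    beta = abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return beta * l1 * l2 / (l1 + l2) if l1 + l2 > 0 else 0.0

def discrete_curve_evolution(polygon, keep):
    """Repeatedly delete the least relevant vertex of a closed polygon until
    `keep` vertices remain (keep >= 3)."""
    pts = list(polygon)
    while len(pts) > keep:
        n = len(pts)
        perimeter = sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))
        scores = [relevance(pts[i - 1], pts[i], pts[(i + 1) % n], perimeter)
                  for i in range(n)]
        del pts[scores.index(min(scores))]
    return pts
```

Collinear vertices have $\beta = 0$ and are removed first, while vertices at strong, long-armed corners survive longest, which is why DCE preserves the visually significant parts of a contour.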

16.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods that use appearance for detection, our method relies primarily on the shape of objects as reflected by their bottom-up segmentation. Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation by a “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of an object. Incorporating competition in the process allows us to resolve ambiguities that arise when two different regions are matched to the same object part and to discard nearby false regions that participated in the voting process. We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state-of-the-art methods. Moreover, our method allows us to simultaneously detect multiple instances of an object class in images and to cope with challenging types of occlusion, such as occlusion by a bar of varying size or by another object of the same class, which are difficult to handle with other existing class-specific top-down segmentation methods.
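The search for a "coherent whole" rests on segments voting for an object at a single location. A minimal Hough-style sketch of that voting step follows. It assumes each segment already comes with a predicted offset to the object center (a hypothetical input, since the paper derives votes from shape and color), bins the votes on a coarse grid, and returns the segments that agree on the winning bin.

```python
def vote_for_object(segments, cell=10):
    """Each segment votes for an object center (its position plus a predicted
    offset); votes are binned on a coarse grid of size `cell`, and the fullest
    bin is taken as the detection.

    segments: list of (seg_id, (x, y), (dx, dy)), where (dx, dy) is the
    hypothetical center offset predicted from the segment's shape.
    Returns (winning bin, ids of the segments that voted for it).
    """
    votes = {}
    for seg_id, (x, y), (dx, dy) in segments:
        key = (int((x + dx) // cell), int((y + dy) // cell))
        votes.setdefault(key, []).append(seg_id)
    best = max(votes, key=lambda k: len(votes[k]))
    return best, votes[best]
```

The supporting segments returned here are the ones that would then compete in the figure-ground ("competitive coverage") stage; segments that voted elsewhere are ignored for this detection.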

17.
A novel multilevel decision fusion approach is proposed for urban mapping using very-high-resolution (VHR) multi/hyperspectral imagery. The proposed framework consists of three levels: (1) at level I, we first propose a self-dual filter for extracting structural features from the VHR imagery, and the spectral and structural features are then integrated by weighted probability fusion; (2) level II extends level I by implementing the spectral–structural fusion in an object-based framework; and (3) at level III, the object-based probabilistic outputs of level II are used to identify unreliable objects, whose shape attributes are then used to refine the classification. At this level, decision-level object merging is used to improve the initial segmentation, since shape feature extraction depends heavily on segmentation quality. Experiments were conducted on a Hyperspectral Digital Imagery Collection Experiment (HYDICE) DC Mall image and a QuickBird Beijing data set. The results showed that the proposed approach yielded progressively higher accuracies as the multilevel features were gradually incorporated into the processing chain.
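Levels I and III can be illustrated with a small sketch: a weighted linear fusion of spectral and structural class probabilities, followed by flagging an object as unreliable when its top fused probability is low. Both the linear fusion rule and the confidence threshold are assumptions; the abstract names weighted probability fusion but not its exact form.

```python
def fuse_probabilities(p_spectral, p_structural, w=0.5):
    """Weighted linear fusion of two per-class probability vectors for one
    pixel or object. Returns (fused probabilities, predicted class index)."""
    fused = [w * a + (1 - w) * b for a, b in zip(p_spectral, p_structural)]
    total = sum(fused)
    if total <= 0:
        raise ValueError("fused probabilities sum to zero")
    fused = [f / total for f in fused]
    return fused, max(range(len(fused)), key=fused.__getitem__)

def is_unreliable(fused, threshold=0.6):
    """Hypothetical reliability rule: flag an object whose most probable
    class has a fused probability below the threshold."""
    return max(fused) < threshold
```

Objects flagged this way would be the candidates for the shape-based refinement at level III.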

18.
In this article we present a new appearance-based approach for the classification and localization of 3-D objects in complex scenes. A main problem for object recognition is that the size and appearance of objects in the image vary under 3-D transformations. For this reason, we model the region of the object in the image, as well as the object features themselves, as functions of these transformations. We integrate the model into a statistical framework, which allows us to deal with noise and illumination changes. To handle heterogeneous backgrounds and occlusions, we introduce a background model and an assignment function. Thus, the object recognition system becomes robust and can reliably distinguish which features belong to the object and which to the background. Experiments on three large data sets, together comprising more than 100 000 images with rotations orthogonal to the image plane and scaling, show that the approach is well suited for this task.

19.
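In video applications, the extraction of moving objects is an important research topic. To segment moving objects more effectively, a spatio-temporal segmentation algorithm that automatically extracts moving objects from video sequences is proposed. In temporal segmentation, the algorithm uses an object detection method based on anomalous-vector elimination (齐异矢量消除 in the original) to obtain an initial mask of the moving object. This initial mask usually has discontinuous boundaries and some "holes". To obtain a more complete object region, a region-growing algorithm with a distance constraint is used to compensate the initial mask. In spatial segmentation, watershed segmentation improves its accuracy by taking global information into account. The precise moving objects are then extracted by a spatio-temporal fusion module. Experimental results show that the proposed spatio-temporal segmentation algorithm is effective.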
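The compensation step (filling gaps and holes in the initial temporal mask by region growing under a distance constraint) can be sketched as a breadth-first search. The specific rules here are assumptions: a pixel joins the region if it is 4-connected to it, its intensity differs from the neighbor that reached it by at most `max_diff`, and it lies within `max_dist` grid steps of the initial mask.

```python
from collections import deque

def grow_region(initial_mask, image, max_diff=10, max_dist=3):
    """Compensate an initial motion mask by region growing.

    initial_mask: 2-D list of 0/1 (or bools); image: 2-D list of intensities.
    Hypothetical distance constraint: growth stops max_dist BFS steps away
    from the initial mask.
    """
    h, w = len(image), len(image[0])
    mask = [[bool(v) for v in row] for row in initial_mask]
    dist = [[0 if mask[r][c] else None for c in range(w)] for r in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w) if mask[r][c])
    while queue:
        r, c = queue.popleft()
        if dist[r][c] >= max_dist:
            continue  # distance constraint: do not grow further from here
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= rr < h and 0 <= cc < w and not mask[rr][cc]
                    and abs(image[rr][cc] - image[r][c]) <= max_diff):
                mask[rr][cc] = True
                dist[rr][cc] = dist[r][c] + 1
                queue.append((rr, cc))
    return mask
```

Because the search is breadth-first, `dist` holds the shortest grid distance from the initial mask, so the distance limit stops the region from leaking far into similar-looking background.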

20.
We present a method to learn probabilistic object models (POMs) with minimal supervision; these models exploit different visual cues and perform tasks such as classification, segmentation, and recognition. We formulate this as a structure induction and learning task, and our strategy is to learn and combine elementary POMs that make use of complementary image cues. We describe a novel structure induction procedure that uses knowledge propagation to enable POMs to provide information to other POMs and “teach them” (which greatly reduces the amount of supervision required for training and speeds up inference). In particular, we learn a POM-IP defined on Interest Points using weak supervision [1], [2] and use it to train a POM-mask, defined on regional features, which yields a combined POM that performs segmentation/localization. This combined model can be used to train POM-edgelets, defined on edgelets, which gives a full POM with improved classification performance. We give a detailed experimental analysis on large data sets for classification and segmentation, with comparisons to other methods. Inference takes five seconds, while learning takes approximately four hours. In addition, we show that the full POM is invariant to scale and rotation of the object (for learning and inference) and can learn hybrid object classes (i.e., when there are several objects and the identity of the object in each image is unknown). Finally, we show that POMs can be used to match between different objects of the same category and hence enable object recognition.
