Similar Articles
20 similar articles found (search time: 31 ms)
1.
2.
We present an approach to the recognition of complex-shaped objects in cluttered environments based on edge information. We first use example images of a target object in typical environments to train a classifier cascade that determines whether edge pixels in an image belong to an instance of the desired object or the clutter. Presented with a novel image, we use the cascade to discard clutter edge pixels and group the object edge pixels into overall detections of the object. The features used for the edge pixel classification are localized, sparse edge density operations. Experiments validate the effectiveness of the technique for recognition of a set of complex objects in a variety of cluttered indoor scenes under arbitrary out-of-image-plane rotation. Furthermore, our experiments suggest that the technique is robust to variations between training and testing environments and is efficient at runtime.
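The pipeline in this abstract can be sketched in a few lines: compute localized edge-density features around each edge pixel, then reject the pixel as clutter unless every stage of a cascade accepts it. This is an illustrative sketch under assumed details only; the function names, window shapes, and linear stage form are my assumptions, not the authors' implementation.

```python
import numpy as np

def edge_density_features(edge_map, y, x, radii=(2, 4, 8)):
    """Sparse local edge-density features at an edge pixel: the fraction
    of edge pixels inside square windows of increasing radius."""
    feats = []
    for r in radii:
        y0, y1 = max(0, y - r), min(edge_map.shape[0], y + r + 1)
        x0, x1 = max(0, x - r), min(edge_map.shape[1], x + r + 1)
        feats.append(edge_map[y0:y1, x0:x1].mean())
    return np.array(feats)

def cascade_keep(feats, stages):
    """A pixel survives the cascade only if every stage accepts it.
    Each stage is a (weights, threshold) pair acting on the features."""
    for w, thresh in stages:
        if feats @ w < thresh:
            return False  # rejected as clutter
    return True
```

Surviving pixels would then be grouped spatially into overall object detections.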

3.
This paper introduces a new free-form surface representation scheme for the purpose of fast and accurate registration and matching. Accurate registration of surfaces is a common task in computer vision. The proposed representation scheme captures the surface curvature information (seen from certain points) and produces images, called "surface signatures," at these points. Matching signatures of different surfaces enables the recovery of the transformation parameters between these surfaces. We propose using template matching to compare the signature images. To enable partial matching, another criterion, the overlap ratio, is used. This representation scheme can be used as a global representation of the surface as well as a local one and performs near real-time registration. We show that the signature representation can be used to recover scaling transformation as well as matching objects in 3D scenes in the presence of clutter and occlusion. Applications presented include: free-form object matching, multimodal medical volumes registration, and dental teeth reconstruction from intraoral images.
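Comparing two signature images with template matching plus an overlap ratio, as the abstract describes, might look like the following sketch. The normalized-correlation score and the NaN-based "undefined cell" convention are my assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def signature_similarity(sig_a, sig_b):
    """Compare two signature images. Cells marked NaN are undefined
    (e.g., unseen surface); the overlap ratio is the fraction of cells
    defined in both signatures, enabling partial matching. The score is
    the normalized cross-correlation over the overlapping cells."""
    valid = ~np.isnan(sig_a) & ~np.isnan(sig_b)
    overlap_ratio = float(valid.mean())
    a = sig_a[valid] - sig_a[valid].mean()
    b = sig_b[valid] - sig_b[valid].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    score = float(a @ b / denom) if denom > 0 else 0.0
    return score, overlap_ratio
```

A match would be accepted only when both the score and the overlap ratio exceed thresholds.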

4.
5.
Viewpoint independent recognition of free-form objects and their segmentation in the presence of clutter and occlusions is a challenging task. We present a novel 3D model-based algorithm which performs this task automatically and efficiently. A 3D model of an object is automatically constructed offline from its multiple unordered range images (views). These views are converted into multidimensional table representations (which we refer to as tensors). Correspondences are automatically established between these views by simultaneously matching the tensors of a view with those of the remaining views using a hash table-based voting scheme. This results in a graph of relative transformations used to register the views before they are integrated into a seamless 3D model. These models and their tensor representations constitute the model library. During online recognition, a tensor from the scene is simultaneously matched with those in the library by casting votes. Similarity measures are calculated for the model tensors which receive the most votes. The model with the highest similarity is transformed to the scene and, if it aligns accurately with an object in the scene, that object is declared as recognized and is segmented. This process is repeated until the scene is completely segmented. Experiments were performed on real and synthetic data comprised of 55 models and 610 scenes and an overall recognition rate of 95 percent was achieved. Comparison with the spin images revealed that our algorithm is superior in terms of recognition rate and efficiency.
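The hash table-based voting at the core of this abstract can be sketched generically: each scene tensor's hash key casts a vote for every model tensor stored under that key, and the top-voted models go on to similarity verification. The key and table layout below are illustrative assumptions, not the paper's actual tensor hashing.

```python
from collections import defaultdict

def vote(scene_keys, hash_table):
    """Hash-table voting: each key derived from a scene tensor votes for
    every model stored under that key in the library's hash table.
    Returns (model_id, votes) pairs sorted by descending vote count."""
    votes = defaultdict(int)
    for key in scene_keys:
        for model_id in hash_table.get(key, ()):
            votes[model_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

Only the highest-voted candidates would then be aligned to the scene and verified.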

6.
Traditional approaches to three dimensional object recognition exploit the relationship between three dimensional object geometry and two dimensional image geometry. The capability of object recognition systems can be improved by also incorporating information about the color of object surfaces. Using physical models for image formation, the authors derive invariants of local color pixel distributions that are independent of viewpoint and the configuration, intensity, and spectral content of the scene illumination. These invariants capture information about the distribution of spectral reflectance which is intrinsic to a surface and thereby provide substantial discriminatory power for identifying a wide range of surfaces including many textured surfaces. These invariants can be computed efficiently from color image regions without requiring any form of segmentation. The authors have implemented an object recognition system that indexes into a database of models using the invariants and that uses associated geometric information for hypothesis verification and pose estimation. The approach to recognition is based on the computation of local invariants and is therefore relatively insensitive to occlusion. The authors present several examples demonstrating the system's ability to recognize model objects in cluttered scenes independent of object configuration and scene illumination. The discriminatory power of the invariants has been demonstrated by the system's ability to process a large set of regions over complex scenes without generating false hypotheses.
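To make the idea of illumination-independent color invariants concrete, here is a classical and much simpler stand-in than the paper's distribution-based invariants: cross-ratios of channel responses at two nearby pixels, which cancel per-channel illumination gain under a diagonal illumination model. This is an assumed illustrative example, not the authors' invariants.

```python
import numpy as np

def color_ratio_invariant(p, q, eps=1e-9):
    """Cross-ratio of (R, G, B) sensor responses at two nearby pixels.
    Under a diagonal illumination model each channel is scaled by an
    unknown gain, which cancels in these ratios."""
    r1, g1, b1 = p
    r2, g2, b2 = q
    return np.array([
        (r1 * g2) / (r2 * g1 + eps),
        (g1 * b2) / (g2 * b1 + eps),
    ])
```

Changing the illumination multiplies each channel at both pixels by the same gain, leaving the ratios unchanged.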

7.
8.
3D local shapes are a critical cue for object recognition in 3D point clouds. This paper presents an instance-based 3D object recognition method via informative and discriminative shape primitives. We propose a shape primitive model that measures geometrical informativity and discriminativity of 3D local shapes of an object. Discriminative shape primitives of the object are extracted automatically by model parameter optimization. We achieve object recognition from 2.5/3D scenes via shape primitive classification and recover the 3D poses of the identified objects simultaneously. The effectiveness and the robustness of the proposed method were verified on popular instance-based 3D object recognition datasets. The experimental results show that the proposed method outperforms some existing instance-based 3D object recognition pipelines in the presence of noise, varying resolutions, clutter and occlusion.

9.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or global scene statistics, can help successfully disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations and possible future approaches.

10.
We present a novel Object Recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the power of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor. This research was supported by EC project VIBES, the Fund for Scientific Research Flanders, and the IST Network of Excellence PASCAL.

11.
This paper concerns the problem of recognition and localization of three-dimensional objects from range data. Most of the previous approaches suffered from one or both of the following shortcomings: (1) They dealt with single object scenes and/or (2) they dealt with polyhedral objects or objects that were approximated as polyhedra. The work in this paper addresses both of these shortcomings. The input scenes are allowed to contain multiple objects with partial occlusion. The objects are not restricted to polyhedra but are allowed to have a piecewise combination of curved surfaces, namely, spherical, cylindrical, and conical surfaces. This restriction on the types of curved surfaces is not unreasonable since most objects encountered in an industrial environment can be thus modeled. This paper shows how the qualitative classification of the surfaces based on the signs of the mean and Gaussian curvature can be used to come up with dihedral feature junctions as features to be used for recognition and localization. Dihedral feature junctions are robust to occlusion, offer a viewpoint independent modeling technique for the object models, do not require elaborate segmentation, and the feature extraction process is amenable to parallelism. Hough clustering, on account of its ease of parallelization, is chosen as the constraint propagation/satisfaction mechanism. Experimental results are presented using the Connection Machine. The fine-grained architecture of the Connection Machine is shown to be well suited for the recognition/localization technique presented in this paper.
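The qualitative surface classification mentioned here follows the standard sign-of-curvature table (in the style of Besl and Jain): the signs of the mean curvature H and Gaussian curvature K jointly label a surface patch. A minimal sketch, with label strings and the epsilon tolerance as my own assumptions:

```python
def surface_type(H, K, eps=1e-6):
    """Qualitative surface label from the signs of mean (H) and Gaussian
    (K) curvature, the classification used to find feature junctions."""
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (0, 0): "plane",
        (-1, 0): "ridge (convex cylindrical/conical)",
        (1, 0): "valley (concave cylindrical/conical)",
        (-1, 1): "peak (convex spherical)",
        (1, 1): "pit (concave spherical)",
        (0, -1): "minimal saddle",
        (-1, -1): "saddle ridge",
        (1, -1): "saddle valley",
    }
    return table[(h, k)]
```

Junctions where two such labeled surfaces meet give the dihedral features used for recognition.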

12.
Even though visual attention models using bottom-up saliency can speed up object recognition by predicting object locations, in the presence of multiple salient objects, saliency alone cannot discern target objects from the clutter in a scene. Using a metric named familiarity, we propose a top-down method for guiding attention towards target objects, in addition to bottom-up saliency. To demonstrate the effectiveness of familiarity, the unified visual attention model (UVAM) which combines top-down familiarity and bottom-up saliency is applied to SIFT based object recognition. The UVAM is tested on 3600 artificially generated images containing COIL-100 objects with varying amounts of clutter, and on 126 images of real scenes. The recognition times are reduced by 2.7× and 2×, respectively, with no reduction in recognition accuracy, demonstrating the effectiveness and robustness of the familiarity based UVAM.
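Combining a bottom-up saliency map with a top-down familiarity map, as this abstract proposes, can be sketched as a weighted sum of normalized maps whose peak drives the next fixation. The convex-combination form and weight parameter are illustrative assumptions, not the UVAM's exact fusion rule.

```python
import numpy as np

def attention_map(saliency, familiarity, w_top_down=0.5):
    """Unified attention map: convex combination of a bottom-up saliency
    map and a top-down familiarity map, each normalized to [0, 1]."""
    def norm(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    return (1 - w_top_down) * norm(saliency) + w_top_down * norm(familiarity)

def next_fixation(att):
    """Attend next to the location with the highest combined score."""
    return np.unravel_index(np.argmax(att), att.shape)
```

Recognition (e.g., SIFT matching) would then be run only around the attended locations.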

13.
Most existing 2D object recognition algorithms are not perspective (or projective) invariant, and hence are not suitable for many real-world applications. By contrast, one of the primary goals of this research is to develop a flat object matching system that can identify and localise an object, even when seen from different viewpoints in 3D space. In addition, we also strive to achieve good scale invariance and robustness against partial occlusion as in any practical 2D object recognition system. The proposed system uses multi-view model representations and objects are recognised by self-organised dynamic link matching. The merit of this approach is that it offers a compact framework for concurrent assessments of multiple match hypotheses by promoting competitions and/or co-operations among several local mappings of model and test image feature correspondences. Our experiments show that the system is very successful in recognising objects under perspective distortion, even in rather cluttered scenes. Received: 29 May 1998 / Received in revised form: 12 October 1998 / Accepted: 26 October 1998

14.
Bir, Yingqiang. Pattern Recognition, 2003, 36(12): 2855-2873
Recognition of occluded objects in synthetic aperture radar (SAR) images is a significant problem for automatic target recognition. Stochastic models provide some attractive features for pattern matching and recognition under partial occlusion and noise. In this paper, we present a hidden Markov model-based approach for recognizing objects in SAR images. We identify the peculiar characteristics of SAR sensors and using these characteristics we develop feature based multiple models for a given SAR image of an object. The models exploiting the relative geometry of feature locations or the amplitude of SAR radar return are based on sequentialization of scattering centers extracted from SAR images. In order to improve performance we integrate these models synergistically using their probabilistic estimates for recognition of a particular target at a specific azimuth. Experimental results are presented using both synthetic and real SAR images.

15.
Distinctive Image Features from Scale-Invariant Keypoints
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
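The first stage of the recognition pipeline described here, nearest-neighbor matching of individual features, is commonly implemented with Lowe's distance-ratio test: a descriptor match is accepted only when its nearest database neighbor is substantially closer than the second nearest. A brute-force sketch (the paper uses a fast approximate nearest-neighbor search; the 0.8 ratio is the commonly cited default):

```python
import numpy as np

def ratio_test_matches(query, database, ratio=0.8):
    """Match each query descriptor to the database, keeping it only if
    the nearest neighbor is clearly closer than the second nearest."""
    matches = []
    for i, q in enumerate(query):
        d = np.linalg.norm(database - q, axis=1)
        order = np.argsort(d)
        best, second = order[0], order[1]
        if d[best] < ratio * d[second]:
            matches.append((i, int(best)))
    return matches
```

Accepted matches would then feed the Hough-transform clustering and least-squares pose verification.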

16.
Mel BW, Fiser J. Neural Computation, 2000, 12(4): 731-762
We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors: Malsburg's classical binding problem. We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word size, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
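The greedy feature-learning step can be sketched generically: repeatedly add the candidate feature that best separates objects from cluttered backgrounds. The selection criterion below (difference between on-object and on-background firing rates) is a simplified stand-in for the model-derived criterion in the paper, and all names are illustrative.

```python
def greedy_feature_selection(candidates, k):
    """Greedily grow a representation of k conjunctive features.
    `candidates` maps feature -> (p_object, p_background): estimated
    firing rates on objects vs. cluttered backgrounds. At each step,
    pick the feature with the largest rate separation."""
    chosen = []
    pool = dict(candidates)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda f: pool[f][0] - pool[f][1])
        chosen.append(best)
        del pool[best]
    return chosen
```

A fuller version would re-estimate candidate utility after each addition to favor decorrelated features.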

17.
Mel B, Fiser J. Neural Computation, 2000, 12(2): 247-278

18.
In this paper, we discuss an appearance-matching approach to the difficult problem of interpreting color scenes containing occluded objects. We have explored the use of an iterative, coarse-to-fine sum-squared-error method that uses information from hypothesized occlusion events to perform run-time modification of scene-to-template similarity measures. These adjustments are performed by using a binary mask to adaptively exclude regions of the template image from the squared-error computation. At each iteration higher resolution scene data as well as information derived from the occluding interactions between multiple object hypotheses are used to adjust these masks. We present results which demonstrate that such a technique is reasonably robust over a large database of color test scenes containing objects at a variety of scales, and tolerates minor 3D object rotations and global illumination variations. Received: 21 November 1996 / Accepted: 14 October 1997
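The core similarity measure described here, a sum-squared error with a binary mask excluding hypothesized-occluded template regions, is straightforward to sketch. Normalizing by the number of included pixels is my own assumption to keep scores comparable across masks.

```python
import numpy as np

def masked_ssd(scene_patch, template, mask):
    """Mean squared error between a scene patch and a template, with
    occluded template pixels excluded by a binary mask (1 = include)."""
    diff = (scene_patch - template) * mask
    n = mask.sum()
    return float((diff ** 2).sum() / n) if n > 0 else np.inf
```

In the iterative scheme, the mask would be refined at each resolution level from the occlusion interactions between competing object hypotheses.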

19.
In complex natural scenes, object recognition suffers from background interference, occlusion by surrounding objects, and illumination changes, and the targets to be recognized usually come in many different sizes and types. To address these problems, this paper proposes a method based on an improved YOLOv3 for recognizing medium- and large-sized objects in unconstrained natural scenes (abbreviated CDSP-YOLO). The method applies CLAHE image-enhancement preprocessing to remove the influence of illumination changes on recognition, adopts stochastic spatial sampling pooling (S3Pool) as the downsampling method of the feature-extraction network to preserve the spatial information of feature maps and counter background interference in complex environments, and improves the multi-scale recognition scheme to remedy YOLOv3's weaker performance on medium and large objects. Experimental results show that the proposed method achieves 97% precision and 80% recall on a mobile-communication-tower test set. Compared with YOLOv3, the method offers better performance and broader application prospects for object recognition in unconstrained natural scenes.
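The CLAHE preprocessing mentioned in this abstract is contrast-limited *adaptive* (per-tile) histogram equalization; as a self-contained illustration of the underlying idea, here is plain global histogram equalization in NumPy. CLAHE differs by applying this per tile with a clip limit and bilinear blending; this sketch is not the paper's pipeline.

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image:
    map intensities through the normalized cumulative histogram so the
    output uses the full [0, 255] range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / hist.sum()
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]
```

In practice one would use an existing CLAHE implementation (e.g., OpenCV's `createCLAHE`) rather than this global variant.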

20.
In an earlier study it was shown that the low level image segmentation technique known as binary object forest (BOF) analysis could be successfully used to extract one or two moving objects from complex backgrounds, even when the motion involved was very large. The method involved performing BOF analysis on each of a pair of images from a sequence and then matching the vertices of the resulting graphs. In the present study the problem of tracking multiple objects in complex backgrounds and in difficult circumstances such as partial occlusion, is considered. The approach taken is once again to perform an initial BOF analysis of each image but now to attempt matching over subgraphs of the BOF rather than simply on individual vertices. It is shown theoretically and experimentally that this results in a much more robust matching scheme. This increase in robustness not only allows multiple objects to be tracked but facilitates correct matching even when partial object occlusion occurs and when motion towards the sensor results in large (apparent) size changes between frames.
