Similar documents
20 similar documents found (search time: 31 ms)
1.
Image retrieval based on the spatial relations between objects in an image is often hampered by the difficulty of automatically and accurately determining the categories and spatial positions of the objects in the images to be processed. Building on the output of an object recognition algorithm, this paper proposes a triplet representation of object spatial relations; presents methods for image indexing, similarity computation, and result ranking based on this representation, together with a two-dimensional input interface that lets users express queries with keywords and spatial relations; and implements a prototype system. The representation is robust and tolerates a certain degree of error in the object recognition algorithm: the confidence values produced by object recognition are incorporated into the triplet confidence computation and the ranking algorithm, reducing the impact of recognition errors on retrieval performance. Experiments on the prototype show that the system returns more accurate results for queries involving object spatial relations, outperforming existing systems on NDCG@m, MAP, and F@m.
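A minimal sketch of how such a triplet index and confidence-weighted ranking might look. The function names, data layout, and the summed-confidence score are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch: index images by (object_a, relation, object_b) triplets, each carrying
# the recognizer's confidence, and rank images for a spatial-relation query.
from collections import defaultdict

def build_index(detections_per_image):
    """detections_per_image: {image_id: [(obj_a, relation, obj_b, confidence), ...]}"""
    index = defaultdict(list)
    for image_id, triplets in detections_per_image.items():
        for obj_a, rel, obj_b, conf in triplets:
            index[(obj_a, rel, obj_b)].append((image_id, conf))
    return index

def query(index, obj_a, rel, obj_b):
    """Return image ids ranked by summed triplet confidence (illustrative score)."""
    scores = defaultdict(float)
    for image_id, conf in index.get((obj_a, rel, obj_b), []):
        scores[image_id] += conf
    return sorted(scores, key=scores.get, reverse=True)
```

Weighting each match by the recognizer's confidence is what lets low-confidence (possibly erroneous) detections contribute without dominating the ranking.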

2.
3.
Feature space trajectory methods for active computer vision
We advance new active object recognition algorithms that classify rigid objects and estimate their pose from intensity images. Our algorithms automatically detect if the class or pose of an object is ambiguous in a given image, reposition the sensor as needed, and incorporate data from multiple object views in determining the final object class and pose estimate. A probabilistic feature space trajectory (FST) in a global eigenspace is used to represent 3D distorted views of an object and to estimate the class and pose of an input object. Confidence measures for the class and pose estimates, derived using the probabilistic FST object representation, determine when additional observations are required as well as where the sensor should be positioned to provide the most useful information. We demonstrate the ability to use FSTs constructed from images rendered from computer-aided design models to recognize real objects in real images and present test results for a set of metal machined parts.

4.
An object recognition method based on a component vocabulary is proposed. AdaBoost selects the most discriminative components from the components of object sample images to form a component vocabulary. Each image is then represented by the components in the vocabulary, and a sparse neural network is trained on this representation as the classifier. Experimental results show that the method achieves high recognition accuracy and is robust to occlusion and cluttered backgrounds.
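A simplified stand-in for the vocabulary-selection step (the abstract uses AdaBoost; here components are ranked by a plain occurrence-rate gap between object and background images, which captures the same "most discriminative first" idea without the boosting rounds). All names are illustrative:

```python
import numpy as np

def select_vocabulary(presence, labels, k):
    """presence: (n_images, n_components) binary matrix of component occurrences;
    labels: 1 = object image, 0 = background image.
    Greedy stand-in for AdaBoost selection: rank components by the gap between
    their occurrence rate on object images and on background images."""
    presence = np.asarray(presence, dtype=float)
    labels = np.asarray(labels)
    pos_rate = presence[labels == 1].mean(axis=0)
    neg_rate = presence[labels == 0].mean(axis=0)
    scores = np.abs(pos_rate - neg_rate)
    return list(np.argsort(scores)[::-1][:k])
```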

5.
Visual learning and recognition of 3-d objects from appearance
The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image into the eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with fewer than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper concludes with a discussion of various issues related to the proposed learning and recognition methodology.
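The core pipeline can be sketched in a few lines: compress the training views into a low-dimensional eigenspace via PCA, then classify an unknown image by its nearest stored manifold point. This is a bare-bones illustration (dense manifold interpolation, illumination parametrization, and the labels used here are all simplifications):

```python
import numpy as np

def build_eigenspace(images, dim):
    """images: (n_samples, n_pixels), each row a vectorized training view.
    Returns the mean image and the top `dim` principal directions."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data yields the eigenspace basis.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

def recognize(image, mean, basis, manifold_points, point_labels):
    """Project the unknown image into the eigenspace and return the label
    (e.g. an (object, pose) pair) of the nearest stored manifold point."""
    coords = basis @ (np.asarray(image, dtype=float) - mean)
    dists = np.linalg.norm(manifold_points - coords, axis=1)
    return point_labels[int(np.argmin(dists))]
```

In the full method the manifold is sampled densely (and interpolated), so the position of the nearest point also yields a continuous pose estimate.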

6.
Robust Object Detection with Interleaved Categorization and Segmentation
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information about where in the image a hypothesis draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance even from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.
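The probabilistic Hough voting at the heart of this approach can be sketched as follows: each matched local feature casts weighted votes for possible object centres via learned codebook offsets, and the accumulator peak gives the detection hypothesis. The data layout and weighting are illustrative assumptions:

```python
import numpy as np

def ght_vote(feature_detections, codebook, image_shape):
    """Accumulate probabilistic votes for the object centre.
    feature_detections: [(x, y, match_weight, codebook_entry_id), ...]
    codebook: {entry_id: [((dx, dy), vote_prob), ...]}, learned offsets to the centre.
    Returns the vote accumulator and the highest-scoring centre hypothesis."""
    acc = np.zeros(image_shape)
    for x, y, w, entry in feature_detections:
        for (dx, dy), p in codebook.get(entry, []):
            cx, cy = x + dx, y + dy
            if 0 <= cx < image_shape[0] and 0 <= cy < image_shape[1]:
                acc[cx, cy] += w * p
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, peak
```

In the full system, tracing which votes support the winning hypothesis is what yields the per-pixel figure-ground segmentation.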

7.
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

8.
The contours of rigid objects have distinct geometric properties and are largely insensitive to illumination, texture, and color. Combining these properties with the principle of sparse image representation, this paper proposes a hierarchical detection algorithm for rigid objects. Within a part-based model (PBM) framework, a matching pursuit algorithm adaptively represents the object contour sparsely as a combination of geometric parts; based on how well each part matches the object contour, an ordered chain structure describing the spatial relations among parts is constructed. The ordered property of this chain is used to narrow the detection region level by level, and the part saliency maps at each level are fused, weighted by matching score, into an object saliency map. Detection results on the PASCAL image database show that the method performs well on rigid objects with salient contour features, while reducing detection time by about 60%-90% compared with existing algorithms.
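Matching pursuit itself is a classical greedy algorithm; a minimal generic sketch (over abstract unit-norm dictionary atoms rather than the paper's geometric contour parts) looks like this:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedily represent `signal` as a sparse combination of dictionary atoms
    (rows of `dictionary`, assumed unit-norm): at each step, pick the atom most
    correlated with the residual and subtract its contribution."""
    residual = np.asarray(signal, dtype=float).copy()
    selected = []  # (atom_index, coefficient) pairs
    for _ in range(n_atoms):
        scores = dictionary @ residual          # correlation with each atom
        best = int(np.argmax(np.abs(scores)))
        coef = scores[best]
        residual = residual - coef * dictionary[best]
        selected.append((best, coef))
    return selected, residual
```

In the paper's setting the "atoms" are geometric parts and the per-part coefficients/match scores drive the ordered chain structure and the weighted fusion of saliency maps.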

9.
10.
As an important problem in image understanding, salient object detection is essential for image classification, object recognition, and image retrieval. In this paper, we propose a new approach to detect salient objects in an image using content-sensitive hypergraph representation and partitioning. First, a polygonal potential Region-Of-Interest (p-ROI) is extracted by analyzing the edge distribution in the image. Second, the image is represented by a content-sensitive hypergraph. Instead of using fixed features and parameters for all images, we propose a new content-sensitive method for feature selection and hypergraph construction. In this method, the most discriminant color channel, the one that maximizes the difference between the p-ROI and the background, is selected for each image. The number of neighbors in the hyperedges is also adjusted automatically according to the image content. Finally, incremental hypergraph partitioning is used to generate candidate regions for the final salient object detection: all candidate regions are evaluated against the p-ROI, and the best-matching one is selected as the final salient object. Our approach has been extensively evaluated on a large benchmark image database. Experimental results show that our approach not only achieves considerable improvement on commonly adopted performance measures for salient object detection, but also provides more precise object boundaries, which is desirable for further image processing and understanding.
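The channel-selection step can be illustrated with a simple criterion: pick the channel whose mean differs most between the p-ROI and the background. This is a hedged simplification (the paper's actual discriminability measure may differ), with illustrative names:

```python
import numpy as np

def most_discriminant_channel(image, roi_mask):
    """image: (H, W, C) array; roi_mask: (H, W) boolean mask of the p-ROI.
    Return the index of the channel whose mean differs most between the p-ROI
    and the background (a simplified stand-in for the selection criterion)."""
    image = np.asarray(image, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    roi_mean = image[roi_mask].mean(axis=0)      # per-channel mean inside the p-ROI
    bg_mean = image[~roi_mask].mean(axis=0)      # per-channel mean outside
    return int(np.argmax(np.abs(roi_mean - bg_mean)))
```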

11.
We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.

12.
Collision Detection between Robot Arms and People
As a result of the increasing number of robots performing tasks in a range of human activities, human-robot interaction has become a very active research field. The safety of people around robots is a major concern. This paper presents research in this context: our aim is to avoid mechanical injury to people interacting with robots. We approach the collision detection problem in a scene with people and several moving robot arms. Fast collision detection for practical motion planning depends on an adequate spatial representation of the objects involved in the scene. The authors have previously proposed a system that automatically generates a hierarchy of approximations for general objects. The spatial model has interesting properties and has been used in efficient collision detection algorithms between moving robots [8]. In spatial representations, there is a trade-off between generality and efficiency: some existing approaches claim to be general but are less efficient. In this paper, we present two extensions to the spatial model. First, the system can deal with a general class of objects, those composed of nonhomogeneous generalized cylinders. Second, a simple method for automatically converting a polyhedral representation into such a generalized cylinder is presented. We thus enhance the generality of the system without compromising its efficiency. With these extensions, virtually any object can be dealt with, particularly those composing the human body.
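A hierarchy of approximations accelerates collision tests because coarse bounding volumes prune most pairs before any fine-level work. A generic sketch using bounding spheres (the paper uses generalized cylinders; spheres are substituted here purely for brevity, and all names are illustrative):

```python
import math

class SphereNode:
    """Node in a hierarchy of bounding-sphere approximations of an object."""
    def __init__(self, center, radius, children=()):
        self.center, self.radius, self.children = center, radius, list(children)

def spheres_overlap(a, b):
    return math.dist(a.center, b.center) <= a.radius + b.radius

def collide(a, b):
    """Recursively test two hierarchies: descend only where the coarse
    approximations already overlap, pruning everything else."""
    if not spheres_overlap(a, b):
        return False
    if not a.children and not b.children:
        return True                    # both at the finest level and overlapping
    # Refine the node with the larger bounding volume first.
    if a.children and (not b.children or a.radius >= b.radius):
        return any(collide(child, b) for child in a.children)
    return any(collide(a, child) for child in b.children)
```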

13.
Due to distortion, noise, segmentation errors, overlap, and occlusion of objects in digital images, it is usually impossible to extract complete object contours or to segment whole objects. However, in many cases parts of contours can be correctly reconstructed, either by performing edge grouping or as parts of the boundaries of segmented regions. Therefore, recognition of objects based on their contour parts seems to be a promising as well as a necessary research direction. The main contribution of this paper is a system for detection and recognition of contour parts in digital images. Both detection and recognition are based on shape similarity of contour parts. For each contour part produced by contour grouping, we use shape similarity to retrieve the most similar contour parts in a database of known contour segments. A shape-based classification of the retrieved contour parts then performs simultaneous detection and recognition. An important step in our approach is the construction of the database of known contour segments. First, complete contours of known objects are decomposed into parts using discrete curve evolution. Then a representation of the parts is constructed that is invariant to scaling, rotation, and translation.
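Discrete curve evolution (DCE) simplifies a polygonal contour by repeatedly deleting the least relevant vertex, where relevance combines the turn angle at a vertex with the lengths of its adjacent segments. A compact sketch of this idea (the exact relevance formula here is the commonly cited one and should be treated as an approximation of the paper's):

```python
import math

def _relevance(prev, v, nxt):
    """Relevance of vertex v: turn angle at v weighted by the adjacent
    segment lengths, as in discrete curve evolution."""
    l1, l2 = math.dist(prev, v), math.dist(v, nxt)
    a1 = math.atan2(v[1] - prev[1], v[0] - prev[0])
    a2 = math.atan2(nxt[1] - v[1], nxt[0] - v[0])
    turn = abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return turn * l1 * l2 / (l1 + l2)

def dce(vertices, keep):
    """Simplify a closed polygon to `keep` vertices by repeatedly
    deleting the least relevant vertex."""
    pts = list(vertices)
    while len(pts) > keep:
        n = len(pts)
        scores = [_relevance(pts[i - 1], pts[i], pts[(i + 1) % n]) for i in range(n)]
        del pts[scores.index(min(scores))]
    return pts
```

Collinear vertices have zero turn angle, hence zero relevance, so noise-induced points on straight stretches are removed first.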

14.
高文 (Gao Wen), 《计算机学报》 (Chinese Journal of Computers), 1996, 19(2): 110-119
This paper proposes an automatic image understanding method based on symbolic computation, oriented toward rule-based line-drawing images. The method represents spatial objects with algebraic symbols and describes the mutual relations of objects in a given projection space with two symbol strings. The algebraic system constructed by the method specifies algebraic operations, including differentiation and erosion, and defines a number of computation rules. Using these operations and rules, the spatial positions of objects in an image, their motion trends, and changes in their size can be analyzed automatically.
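The two-symbol-string idea resembles projecting objects onto two axes and recording their order along each, so spatial relations reduce to string operations. A toy illustration under that assumption (the paper's actual algebra is richer; all names here are hypothetical):

```python
def projection_string(objects, axis):
    """objects: {name: (x, y)}; return object names ordered by their
    coordinate along the chosen axis (0 = horizontal, 1 = vertical)."""
    return [name for name, pos in sorted(objects.items(), key=lambda kv: kv[1][axis])]

def two_string_description(objects):
    """One ordering per axis yields the two-string description of a scene's
    mutual spatial relations."""
    return projection_string(objects, 0), projection_string(objects, 1)
```

Comparing such strings across frames is what makes motion trends (an object drifting left, say) detectable by purely symbolic means.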

15.
Salient object detection aims to identify both the spatial location and the scale of the salient object in an image. However, previous saliency detection methods generally fail to detect whole objects, especially when the salient objects are composed of heterogeneous parts. In this work, we propose a saliency bias and diffusion method to effectively detect the complete spatial support of salient objects. We first introduce a novel saliency-aware feature to bias the objectness detection for saliency detection on a given image and incorporate the saliency clues explicitly in refining the saliency map. Then, we propose a saliency diffusion method that fuses the saliency confidences of different parts of the same object to discover the whole salient object, using learned visual similarities among object regions to propagate saliency values across them. Benefiting from this bias-and-diffusion strategy, the performance of salient object detection is significantly improved, as shown in comprehensive experimental evaluations on four benchmark data sets: MSRA-1000, SOD, SED, and THUS-10000.
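The diffusion step can be sketched as iterated propagation over a region-similarity graph: each region's score is blended with the similarity-weighted average of its neighbours', so saliency spreads across visually similar parts of one object. This is a simplified scheme with assumed parameters, not the paper's exact update rule:

```python
import numpy as np

def diffuse_saliency(saliency, similarity, steps=10, alpha=0.5):
    """saliency: per-region initial scores; similarity: nonnegative
    region-similarity matrix. Each step blends a region's own score with
    the similarity-weighted average of its neighbours' scores."""
    s = np.asarray(saliency, dtype=float)
    W = np.asarray(similarity, dtype=float)
    W = W / W.sum(axis=1, keepdims=True)       # row-normalize transition weights
    for _ in range(steps):
        s = (1 - alpha) * s + alpha * (W @ s)
    return s
```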

16.
17.
Existing deep-learning-based saliency detection algorithms are mainly designed for 2D RGB images and fail to exploit the 3D visual information of a scene, while current light-field saliency detection methods are mostly hand-crafted and have insufficient feature representation power; as a result, these methods perform poorly on many challenging natural scene images. This paper proposes a convolutional-neural-network-based multimodal multi-level feature refinement and fusion network that exploits the rich visual information in light-field images to achieve accurate saliency detection for 4D light-field images. To fully mine 3D visual information, two parallel subnetworks are designed to process the all-in-focus image and the depth map, respectively. On this basis, a cross-modal feature aggregation module is constructed to aggregate multi-level visual features across three modalities, the all-in-focus image, the focal stack, and the depth map, so as to highlight salient target objects in the scene more effectively. Comparative experiments on the DUTLF-FS and HFUT-Lytro light-field benchmark datasets show that the algorithm outperforms mainstream salient object detection algorithms such as MOLF, AFNet, and DMRA on five authoritative evaluation metrics.

18.
Detecting independent objects in images and videos is an important perceptual grouping problem. One common perceptual grouping cue that can facilitate this objective is the cue of contour closure, reflecting the spatial coherence of objects in the world and their projections as closed boundaries separating figure from background. Detecting contour closure in images consists of finding a cycle of disconnected contour fragments that separates an object from its background. Searching the entire space of possible groupings is intractable, and previous approaches have adopted powerful perceptual grouping heuristics, such as proximity and co-curvilinearity, to constrain the search. We introduce a new formulation of the problem, by transforming the problem of finding cycles of contour fragments to finding subsets of superpixels whose collective boundary has strong edge support (few gaps) in the image. Our cost function, a ratio of a boundary gap measure to area, promotes spatially coherent sets of superpixels. Moreover, its properties support a global optimization procedure based on parametric maxflow. Extending closure detection to videos, we introduce the concept of spatiotemporal closure. Analogous to image closure, we formulate our spatiotemporal closure cost over a graph of spatiotemporal superpixels. Our cost function is a ratio of motion and appearance discontinuity measures on the boundary of the selection to an internal homogeneity measure of the selected spatiotemporal volume. The resulting approach automatically recovers coherent components in images and videos, corresponding to objects, object parts, and objects with surrounding context, providing a good set of multiscale hypotheses for high-level scene analysis. We evaluate both our image and video closure frameworks by comparing them to other closure detection approaches, and find that they yield improved performance.
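The gap-to-area ratio cost can be illustrated on a small superpixel graph: for a candidate subset, sum the unsupported ("gap") boundary length along edges leaving the subset and divide by the enclosed area. This omits the parametric-maxflow optimization entirely and uses an assumed data layout:

```python
def closure_cost(selected, areas, gap, adjacency):
    """Ratio cost for a set of superpixels: boundary gap over enclosed area.
    selected: set of superpixel ids; areas: {id: pixel count};
    gap: {(i, j): unsupported boundary length between i and j} (sorted-key pairs);
    adjacency: {id: set of neighbour ids}. Lower cost = better-supported closure."""
    boundary_gap = 0.0
    for i in selected:
        for j in adjacency[i]:
            if j not in selected:      # edge (i, j) lies on the selection boundary
                boundary_gap += gap[tuple(sorted((i, j)))]
    area = sum(areas[i] for i in selected)
    return boundary_gap / area
```

A subset whose outer boundary follows strong image edges (small gap) and encloses a large area gets a low cost, which is exactly what favours whole, spatially coherent objects.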

19.

In this paper we present a novel moment-based skeleton detection method for representing human objects in RGB-D videos with animated 3D skeletons. An object often consists of several parts, each of which can be concisely represented by a skeleton. However, it remains a challenge to detect the skeletons of individual objects in an image, since doing so requires an effective part detector and a part-merging algorithm to group parts into objects. We present a fully unsupervised learning framework to detect the skeletons of human objects in an RGB-D video. The skeleton modeling algorithm uses a pipeline architecture consisting of a series of cascaded operations: symmetry patch detection, linear-time search of symmetry patch pairs, part and symmetry detection, symmetry graph partitioning, and object segmentation. The properties of geometric-moment-based functions for embedding symmetry features into the centers of symmetry patches are also investigated in detail. Compared with state-of-the-art deep learning approaches to skeleton detection, the proposed approach does not require tedious human labeling of training images to locate skeleton pixels and their associated scale information. Although our algorithm can detect parts and objects simultaneously, a pre-trained convolutional neural network (CNN) can be used to locate the human object in each frame of the input RGB-D video in order to support real-time applications. This greatly reduces the complexity of detecting the skeleton structure of individual human objects with the proposed method. Using the segmented human object skeleton model, a video surveillance application is constructed to verify the effectiveness of the approach. Experimental results show that the proposed method performs well in terms of detection and recognition on publicly available datasets.
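Geometric moments, the building blocks of the symmetry features above, are easy to state concretely: the central moment mu_pq of a patch measures its intensity distribution about its centroid, and symmetric patches have vanishing odd-order moments. A minimal sketch (how the paper combines moments into symmetry features is not reproduced here):

```python
import numpy as np

def central_moment(patch, p, q):
    """Central moment mu_pq of a 2D intensity patch: intensity-weighted sums of
    (x - cx)^p * (y - cy)^q about the patch centroid (cx, cy)."""
    patch = np.asarray(patch, dtype=float)
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    m00 = patch.sum()                      # total mass
    cx = (xs * patch).sum() / m00          # centroid x
    cy = (ys * patch).sum() / m00          # centroid y
    return ((xs - cx) ** p * (ys - cy) ** q * patch).sum()
```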


20.