Similar Literature
 Found 20 similar documents (search time: 31 ms)
1.
Multimedia Tools and Applications - Few-shot learning aims to train classifiers to learn new visual object categories from few training examples. Recently, metric-learning based methods have made...

2.
Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g., in terms of window size and location) which may be suboptimal for classification. We propose a novel method for learning a discriminative subwindow classifier from examples annotated with binary labels indicating the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar and in many cases superior to those obtained with full supervision.

3.
One-shot learning of object categories   Total citations: 6, self-citations: 0, citations by others: 6
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.

4.
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

5.
Grasping with robotic hands is classified into three categories based on the object connectivity. We decompose the space of contact forces into four subspaces and develop a method to determine the dimensions of the subspaces with respect to the connectivity of the grasped object. The relationships we obtain reveal the kinematic and static characteristics of the three categories of grasps. They indicate how contact forces can be decomposed corresponding to each type of grasp. The technique also provides a guideline for determining the distribution of contact forces on grasped objects. We analyze how power grasps are identified from the object connectivity and used to synthesize hand configurations for grasping and manipulation tasks. A physical interpretation of the subspaces and the determination of their dimensions are illustrated by examples.

6.
Robust object recognition with cortex-like mechanisms   Total citations: 9, self-citations: 0, citations by others: 9
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based and texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.

7.
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.

8.
This paper addresses the automatic construction of complex spline object models from a few photographs. Our approach combines silhouettes from registered images to construct a G1-continuous triangular spline approximation of an object with unknown topology. We apply a similar optimization procedure to estimate the pose of a modeled object from a single image. Experimental examples of model construction and pose estimation are presented for several complex objects.

9.
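A Mean-shift-Based Segmentation Algorithm for Touching Human Targets   Total citations: 1, self-citations: 0, citations by others: 1
Human target segmentation is one of the key problems in the visual analysis of human targets. This paper proposes a Mean-shift-based algorithm for segmenting human targets that touch one another. The video images are first preprocessed to separate out the moving regions, and a human target template is built from statistical features of the human body shape. Several data points are sampled uniformly within the moving region to serve as seed points. Starting from these seed points and guided by the human target template, the Mean-shift algorithm iterates toward the mode points. The resulting set of mode points is then clustered, which automatically determines the number of clusters, i.e., the number of human targets in the moving region, and the region is segmented accordingly. Experiments on the PETS 2006 database verify the feasibility of the method.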

10.
《Advanced Robotics》2013,27(10):1143-1154
The acquisition of object categories which underlie the human lexicon is a prerequisite for domestic robots to communicate with users in a human-like manner. The theory of J. J. Gibson inspires the approach to obtain shared categories through interaction with the shared environment, where explorative behaviors of infants play the role of obtaining distinctive features of objects to shape their categories. Although several existing studies have reproduced the exploratory behaviors of infants by robots to investigate their roles in acquiring such categories, those active categorization methods utilized static touches and recognition tended to fail when contact conditions changed. This paper introduces another possible approach to object categorization: object category acquisition by dynamic touch. Dynamic touch (e.g., shaking) provides the agent with the information of the whole object to enable quick and robust recognition. The amplitude spectrum of auditory data which humans obtain during shaking is found to be an effective feature for identifying the object categories of differing dynamics, e.g., rigid objects, paper materials and bottles of water, even though the objects within each category vary in size, shape, amount and contact conditions. Experimental results are given to show the validity of the proposed method and future issues are discussed.

11.
Part decomposition and, conversely, the construction of composite objects from individual parts have long been recognized as ubiquitous and essential mechanisms involving abstraction. This applies, in particular, in areas such as CAD, manufacturing, software development and computer graphics. Although the part-of relationship is distinguished in object oriented modeling techniques, it ranks far behind the concept of generalization/specialization and a rigorous definition of its semantics is still missing. We first show in which ways a shift in emphasis on the part-of relationship leads to analysis and design models that are easier to understand and to maintain. We then investigate the properties of part-of relationships in order to define their semantics. This is achieved by means of a categorization of part-of relationships and by associating semantic constraints with individual categories. We further suggest a precise and, compared with existing techniques, less redundant specification of constraints accompanying part-of categories based on the degree of exclusiveness and dependence of parts on composite objects. Although the approach appears generally applicable, the object oriented Unified Modeling Language (UML) is used to present our findings. Several examples demonstrate the applicability of the categories introduced.

12.
The goal of an object category discovery system is to annotate a pool of unlabeled image data, where the set of labels is initially unknown to the system, and must therefore be discovered over time by querying a human annotator. The annotated data is then used to train object detectors in a standard supervised learning setting, possibly in conjunction with category discovery itself. Category discovery systems can be evaluated in terms of both accuracy of the resulting object detectors, and the efficiency with which they discover categories and annotate the training data. To improve the accuracy and efficiency of category discovery, we propose an iterative framework which alternates between optimizing nearest neighbor classification for known categories with multiple kernel metric learning, and detecting clusters of unlabeled image regions likely to belong to novel, unknown categories. Experimental results on the MSRC and PASCAL VOC2007 data sets show that the proposed method improves clustering for category discovery, and efficiently annotates image regions belonging to the discovered classes.

13.
From an early stage in their development, human infants show a profound drive to explore the objects around them. Research in psychology has shown that this exploration is fundamental for learning the names of objects and object categories. To address this problem in robotics, this paper presents a behavior-grounded approach that enables a robot to recognize the semantic labels of objects using its own behavioral interaction with them. To test this method, our robot interacted with 100 different objects grouped according to 20 different object categories. The robot performed 10 different behaviors on them, while using three sensory modalities (vision, proprioception and audio) to detect any perceptual changes. The results show that the robot was able to use multiple sensorimotor contexts in order to recognize a large number of object categories. Furthermore, the category recognition model presented in this paper was able to identify sensorimotor contexts that can be used to detect specific categories. Most importantly, the robot’s model was able to reduce exploration time by half by dynamically selecting which exploratory behavior should be applied next when classifying a novel object.

14.
Urban object recognition is the ability to categorize ambient objects into several classes and it plays an important role in various urban robotic missions, such as surveillance, rescue, and SLAM. However, there were several difficulties when previous studies on urban object recognition in point clouds were adopted for robotic missions: offline-batch processing, deterministic results in classification, and necessity of many training examples. The aim of this paper is to propose an urban object recognition algorithm for urban robotic missions with useful properties: online processing, classification results with probabilistic outputs, and training with a few examples based on a generative model. To achieve this, the proposed algorithm utilizes the consecutive point information (CPI) of a 2D LIDAR sensor. This additional information was useful for designing an online algorithm consisting of segmentation and classification. Experimental results show that the proposed algorithm using CPI enhances the applicability of urban object recognition for various urban robotic missions.

15.
One of the basic skills for autonomous robot grasping is to select the appropriate grasping point for an object. Several recent works have shown that it is possible to learn grasping points from different types of features extracted from a single image or from more complex 3D reconstructions. In the context of learning through experience, this is very convenient, since it does not require a full reconstruction of the object and implicitly incorporates kinematic constraints such as the hand morphology. These learning strategies usually require a large set of labeled examples which can be expensive to obtain. In this paper, we address the problem of actively learning good grasping points to reduce the number of examples needed by the robot. The proposed algorithm computes the probability of successfully grasping an object at a given location represented by a feature vector. By autonomously exploring different feature values on different objects, the system learns where to grasp each of the objects. The algorithm combines beta–binomial distributions and a non-parametric kernel approach to provide the full distribution for the probability of grasping. This information makes it possible to perform an active exploration that efficiently learns good grasping points even among different objects. We tested our algorithm using a real humanoid robot that acquired the examples by experimenting directly on the objects and, therefore, it deals better with complex (anthropomorphic) hand–object interactions whose results are difficult to model or predict. The results show a smooth generalization even in the presence of very few data, as is often the case in learning through experience.

16.
The discriminative power of a feature has an impact on the convergence rate in training and the running speed in evaluating an object detector. In this paper, a novel distribution-based discriminative feature is proposed to distinguish objects of rigid object categories from background. It makes full use of the advantage of the local binary pattern (LBP), which specializes in encoding local structures, and of statistical information about the distribution of the training data, which is utilized in getting the optimal separating hyperplane. The proposed feature maintains the merit of simplicity in calculation and powerful discriminative ability to distinguish objects from background patches. Three LBP-based features are converted into adaptive projection features, which are more discriminative than the original versions. The asymmetric Gentle Adaboost organized in a nested cascade structure constructs the final detector. The proposed features are evaluated on two different object categories: frontal human faces and side-view cars. Experimental results demonstrate that the proposed features are more discriminative than traditional Haar-like features and multi-block LBP (MBLBP) features. Furthermore, they are also robust to monotonic variations of illumination.

17.
We present a system architecture for domestic robots that allows them to learn object categories after one sample object was initially learned. We explore the situation in which a human teaches a robot a novel object, and the robot enhances such learning by using a large amount of image data from the Internet. The main goal of this research is to provide a robot with capabilities to enhance its learning while minimizing the time and effort required for a human to train a robot. Our active learning approach consists of learning the object name using a speech interface, and creating a visual object model by using a depth-based attention model adapted to the robot’s personal space. Given the object’s name (keyword), a large amount of object-related images from two main image sources (Google Images and the LabelMe website) are collected. We deal with the problem of separating good training samples from noisy images by performing two steps: (1) Similar image selection using a Simile Selector Classifier, and (2) non-real image filtering by implementing a variant of Gaussian Discriminant Analysis. After web image selection, object category classifiers are then trained and tested using different objects of the same category. Our experiments demonstrate the effectiveness of our robot learning approach.

18.
19.
Robust Object Detection with Interleaved Categorization and Segmentation   Total citations: 5, self-citations: 0, citations by others: 5
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.

20.
In the framework of online object retrieval with learning, we address the problem of graph matching using kernel functions. An image is represented by a graph of regions where the edges represent the spatial relationships. Kernels on graphs are built from kernels on walks in the graph. This paper first proposes new kernels on graphs and on walks, which are very efficient for graphs of regions. Second, we propose fast solutions for exact or approximate computation of these kernels. Third, we show results for the retrieval of images containing a specific object with the help of very few examples and counter-examples in the framework of an active retrieval scheme.
