Similar Articles
20 similar articles found (search time: 31 ms)
1.
Visual learning and recognition of 3-d objects from appearance (cited 33 times: 9 self-citations, 24 by others)
The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image onto the eigenspace. The object is recognized based on the manifold it lies on, and the exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with fewer than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper concludes with a discussion of various issues related to the proposed learning and recognition methodology.
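A minimal sketch of the eigenspace idea above, assuming NumPy, one shared eigenspace for all objects, and a manifold represented only by its sampled training points (no interpolation). The names `build_eigenspace`, `project` and `recognize` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def build_eigenspace(images, k):
    """images: (n_images, n_pixels) array, one flattened training image per row."""
    X = np.asarray(images, dtype=float)
    # Brightness normalization: scale every image to unit length
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0)
    # Rows of vt are eigenvectors of the covariance of the centered image set
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # basis has shape (k, n_pixels)

def project(image, mean, basis):
    """Map one image to its k-dimensional eigenspace coordinates."""
    x = np.asarray(image, dtype=float)
    x = x / np.linalg.norm(x)
    return basis @ (x - mean)

def recognize(image, mean, basis, manifolds):
    """manifolds: {object name: [(pose, eigenspace point), ...]} sampled from training.

    Returns the (object, pose) of the manifold sample closest to the projected image.
    """
    g = project(image, mean, basis)
    best = None
    for name, samples in manifolds.items():
        for pose, point in samples:
            dist = np.linalg.norm(g - point)
            if best is None or dist < best[0]:
                best = (dist, name, pose)
    return best[1], best[2]
```

Because the manifold here is only sampled, pose resolution is limited to the sampling step; denser sampling or interpolation between neighbouring samples would refine it.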

2.
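As the foundation of human action recognition, two-dimensional human pose estimation has become a research hotspot attracting wide attention with the popularity of deep learning and neural networks. Compared with traditional methods, deep learning can extract deeper image features and represent data more accurately, and has therefore become the mainstream research direction. This paper mainly introduces 2-D human pose estimation algorithms. First, according to the number of people detected, they are divided into single-person and multi-person pose estimation. Single-person pose estimation is further divided into coordinate-regression-based and heatmap-detection-based methods, and multi-person pose estimation into top-down and bottom-up methods. Finally, commonly used pose estimation datasets and evaluation metrics are introduced, the performance of several multi-person pose estimation algorithms is compared, and the open problems and development trends of human pose estimation research are discussed.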
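To make the heatmap-detection family concrete, here is a minimal sketch of how a heatmap-based method encodes a keypoint as a Gaussian target and decodes a predicted heatmap back to coordinates. The Gaussian width (`sigma=2.0`) and the quarter-pixel refinement toward the larger neighbour are assumptions (the refinement is a widely used heuristic, not something the survey prescribes), and all function names are hypothetical.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Training target: a 2-D Gaussian centred on the keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def _quarter_shift(before, after):
    # Move a quarter pixel toward the larger neighbouring response
    if after > before:
        return 0.25
    if after < before:
        return -0.25
    return 0.0

def decode_heatmap(hm):
    """Recover (x, y) from a heatmap by argmax plus a quarter-pixel refinement."""
    height, width = len(hm), len(hm[0])
    y, x = max(((r, c) for r in range(height) for c in range(width)),
               key=lambda rc: hm[rc[0]][rc[1]])
    fx, fy = float(x), float(y)
    if 0 < x < width - 1:
        fx += _quarter_shift(hm[y][x - 1], hm[y][x + 1])
    if 0 < y < height - 1:
        fy += _quarter_shift(hm[y - 1][x], hm[y + 1][x])
    return fx, fy
```

For a keypoint at (20.3, 30.0), the argmax lands on pixel (20, 30) and the refinement shifts x to 20.25; a coordinate-regression method would instead predict the two numbers directly.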

3.
4.
Closed-loop object recognition using reinforcement learning (cited 1 time: 0 self-citations, 1 by others)
Current computer vision systems based on an open-loop, or filter-type, methodology typically apply image segmentation followed by object recognition algorithms. These systems are not robust for most real-world applications. In contrast, the system presented here achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters. This is accomplished by using the confidence level of model matching as a reinforcement signal for a team of learning automata that search for segmentation parameters during training. Using the recognition algorithm as part of the evaluation function for image segmentation significantly improves system performance through the automatic generation of recognition strategies. The system is verified through experiments on sequences of indoor and outdoor color images with varying external conditions.
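The training loop described above can be sketched as a team of learning automata, one per segmentation parameter, each reinforced by the model-matching confidence. This sketch assumes the linear reward-inaction (L_R-I) update and discretized parameter values; the abstract does not specify the update rule, and `confidence` is a hypothetical callback standing in for "segment, then recognize, then return match confidence in [0, 1]".

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton over a discrete set of actions."""

    def __init__(self, actions, rate=0.1):
        self.actions = list(actions)
        self.p = [1.0 / len(self.actions)] * len(self.actions)  # action probabilities
        self.rate = rate
        self.last = None  # index of the most recently chosen action

    def choose(self, rng):
        self.last = rng.choices(range(len(self.actions)), weights=self.p)[0]
        return self.actions[self.last]

    def update(self, beta):
        """beta in [0, 1]: reinforcement; beta = 0 leaves probabilities unchanged."""
        for i in range(len(self.p)):
            if i == self.last:
                self.p[i] += self.rate * beta * (1 - self.p[i])
            else:
                self.p[i] -= self.rate * beta * self.p[i]

def train_team(automata, confidence, episodes, seed=0):
    """Every automaton picks a parameter value; all share the same reinforcement."""
    rng = random.Random(seed)
    for _ in range(episodes):
        params = [a.choose(rng) for a in automata]
        beta = confidence(params)  # hypothetical: segment + recognize + confidence
        for a in automata:
            a.update(beta)
    # Report the most probable value of each parameter
    return [a.actions[max(range(len(a.p)), key=a.p.__getitem__)] for a in automata]
```

The L_R-I rule keeps each probability vector summing to 1 and never penalizes an action; it only shifts probability toward actions that earned a reward.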

5.
A major drawback of statistical models of non-rigid, deformable objects, such as the active appearance model (AAM), is the required pseudo-dense annotation of landmark points for every training image. We propose a regression-based approach for automatic annotation of face images at arbitrary pose and expression, and for deformable model building using only the annotated frontal images. We pose the problem of learning the pattern of manual annotation as a data-driven regression problem and explore several regression strategies to effectively predict the spatial arrangement of the landmark points for unseen face images with arbitrary expression, at arbitrary poses. We show that the proposed fully sparse non-linear regression approach outperforms other regression strategies by effectively modelling the changes in the shape of the face under varying pose, and is capable of capturing the subtleties of different facial expressions at the same time, thus ensuring the high quality of the generated synthetic images. We show the generalisability of the proposed approach by automatically annotating the face images from four different databases and verifying the results against a ground truth obtained from manual annotations.

6.
7.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate the vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular generic object categorization. Critical to this problem are the issues of learning and datasets. Abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large number of images. This paper presents a novel object recognition algorithm, OPTIMOL, that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm is capable of automatically collecting much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.

8.
In this article we present a new appearance-based approach for the classification and localization of 3-D objects in complex scenes. A main problem for object recognition is that the size and appearance of objects in the image vary under 3-D transformations. For this reason, we model the region of the object in the image, as well as the object features themselves, as functions of these transformations. We integrate the model into a statistical framework, which allows us to deal with noise and illumination changes. To handle heterogeneous backgrounds and occlusions, we introduce a background model and an assignment function. As a result, the object recognition system becomes robust and can reliably distinguish the features that belong to the object from those that belong to the background. Experiments on three large data sets, which together contain more than 100,000 images with rotations orthogonal to the image plane and scaling, show that the approach is well suited for this task.

9.
10.
This paper addresses learning and recognition of human behavior models from multimodal observation in a smart home environment. The proposed approach is part of a framework for acquiring a high-level contextual model for human behavior in an augmented environment. A 3-D video tracking system creates and tracks entities (persons) in the scene. Further, a speech activity detector analyzes audio streams coming from headset microphones and determines, for each entity, whether or not the entity is speaking. An ambient sound detector detects noises in the environment. An individual role detector derives basic activities such as “walking” or “interacting with table” from the entity properties extracted by the 3-D tracker. From the derived multimodal observations, different situations such as “aperitif” or “presentation” are learned and detected using statistical models (HMMs). The objective of the proposed general framework is twofold: the automatic offline analysis of human behavior recordings and the online detection of learned human behavior models. To evaluate the proposed approach, several multimodal recordings showing different situations were made. The obtained results, in particular for offline analysis, are very good, showing that multimodality as well as multiperson observation generation are beneficial for situation recognition.
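Detecting a situation with per-situation HMMs, as described above, amounts to scoring an observation sequence under each model and picking the most likely one. This is a minimal sketch using the scaled forward algorithm; the observation alphabet (0 = nobody speaking, 1 = one speaker, 2 = several speakers) and every model parameter are hypothetical, and the paper's actual observation coding and training procedure are not reproduced.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the scaled forward algorithm.

    start[i], trans[i][j], emit[i][o] are probabilities; obs is a list of symbol indices.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)          # P(o_t | o_1..o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """models: {situation name: (start, trans, emit)}; returns the most likely situation."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Working with per-step scale factors keeps the computation numerically stable on long recordings, and their log-sum is exactly the sequence log-likelihood.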

11.
12.
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief state space is continuous and multidimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always practical to obtain. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experience with minimal prior knowledge. It describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms on a set of well-known POMDP problems.
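The belief-state transformation mentioned above (the classic model-based route, not the model-free modified U-Tree) rests on a Bayes filter over hidden states. This minimal sketch assumes the full transition model `T` and observation model `O` are known, which is exactly the requirement the article avoids; the function name is hypothetical.

```python
def belief_update(belief, action, obs, T, O):
    """Bayes filter over hidden states.

    T[a][s][s2] = P(s2 | s, a); O[a][s2][o] = P(o | s2, a).
    Returns b'(s2) proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s).
    """
    n = len(belief)
    unnormalized = [O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
                    for s2 in range(n)]
    total = sum(unnormalized)  # P(o | b, a)
    if total == 0:
        raise ValueError("observation is impossible under the current belief")
    return [u / total for u in unnormalized]
```

On the classic tiger problem (listening reports the tiger's side correctly with probability 0.85), one "hear left" observation moves a uniform belief to (0.85, 0.15), and a second moves it to about 0.97 for tiger-left.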

13.
This paper introduces a 3-D shape representation scheme for automatic face analysis and identification, and demonstrates its invariance to facial expression. The core of this scheme lies in the combination of statistical shape modelling and non-rigid deformation matching. While the former matches 3-D faces with facial expression, the latter provides a low-dimensional feature vector that controls the deformation of the model to match the shape of a new input, thereby enabling robust identification of 3-D faces. The proposed scheme is also able to handle pose variation, provided that no large part of the data is missing. To assist in establishing dense point correspondences, a modified free-form deformation based on B-spline warping is applied with the help of extracted landmarks. A hybrid iterative closest point method is introduced for matching the models to new data. The feasibility and effectiveness of the proposed method were investigated using the standard, publicly available Gavab and BU-3DFE datasets, which contain faces with expression and pose changes. The performance of the system was compared with that of nine benchmark approaches. The experimental results demonstrate that the proposed scheme provides a competitive solution for face recognition.
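For readers unfamiliar with iterative closest point matching, here is a minimal sketch of plain point-to-point ICP with NumPy: brute-force nearest neighbours, then the SVD-based (Kabsch) least-squares rigid transform, repeated. The paper's hybrid ICP is not described in the abstract, so this is the generic baseline only; the function names are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, iterations=30):
    """Align point set src to dst; returns the accumulated R, t and the moved points."""
    cur = src.copy()
    R_total, t_total = np.eye(src.shape[1]), np.zeros(src.shape[1])
    for _ in range(iterations):
        # Brute-force nearest neighbour in dst for every current source point
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose the new step with the transform accumulated so far
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, cur
```

The brute-force search uses O(N·M) memory; for full face scans a spatial index such as `scipy.spatial.cKDTree` would be the usual replacement. ICP only converges to the right answer from a reasonable initial alignment, which is why landmark-based initialization matters in practice.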

14.
Matching of images and analysis of shape differences are traditionally pursued by energy minimization of paths of deformations acting to match the shape objects. In the large deformation diffeomorphic metric mapping (LDDMM) framework, iterative gradient descent on the matching functional leads to matching algorithms informally known as Beg algorithms. When stochasticity is introduced to model stochastic variability of shapes and to provide more realistic models of observed shape data, the corresponding matching problem can be solved with a stochastic Beg algorithm, similar to the finite-temperature string method used in rare event sampling. In this paper, we apply a stochastic model compatible with the geometry of the LDDMM framework to obtain a stochastic model of images, and we derive the stochastic version of the Beg algorithm, which we compare with the string method and an expectation-maximization optimization of posterior likelihoods. The algorithm and its use for statistical inference are tested on stochastic LDDMM landmarks and images.

15.
We present a new system for automatically determining the position, size and pose of the head of a human figure in a camera image. The system is an extension of the well-known face recognition system [15] to pose estimation. The pose estimation system is characterized by a certain reliability and speed, both of which we improve with the help of statistical estimation methods. To make these methods applicable, we reduce the originally very high dimensionality of our system with the help of a number of a priori principles. At the end of the paper, we discuss a possible extension of the learning algorithm aimed at an autonomous object recognition system.

16.
The problem of determining the identity and pose of occluded objects from noisy data is examined. Previous work has shown that local measurements of the position and surface orientation of small patches of an object's surface may be used in a constrained search process to solve this problem, for the case of rigid polygonal objects using 2-D sensory data, or rigid polyhedral objects using 3-D data. The recognition system is extended to recognize and locate curved objects. The extension is done in two dimensions and applies to the recognition of 2-D objects from 2-D data, or to the recognition of 3-D objects in stable positions from 2-D data.

17.
Cartesian moments are global geometrical features frequently used in computer vision for object pose estimation and recognition. We derive a closed-form expression for the 3-D Cartesian moment of order p+q+r of a superellipsoid in its canonical coordinate system. We also show how the 3-D Cartesian moment of a globally deformed superellipsoid in general position and orientation can be computed as a linear combination of 3-D Cartesian moments of the corresponding nondeformed superellipsoid in the canonical coordinate system. Additionally, moments of objects that are compositions of superellipsoids can be computed as simple sums of the moments of the individual parts. To demonstrate a practical application of the derived results, we register pairs of range images based on moments of recovered compositions of superellipsoids. We use a standard technique to find centers of gravity and principal axes in pairs of range images, while third-order moments are used to resolve the four-way ambiguity. Experimental results show the expected improvement of the recovered rigid transformation based on moments of recovered superellipsoids, compared with registration based on moments of raw range image data. Besides object pose estimation, the presented results can be used directly for object recognition with moments and/or moment invariants as object features.
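The closed-form superellipsoid moments are not reproduced here. Instead, this minimal sketch uses discrete moments of unit-mass 3-D points to illustrate two properties the abstract relies on: moments of a composite object are sums of the parts' moments, and the center of gravity and second-order central moments (whose eigenvectors give the principal axes) follow directly from low-order moments. Function names are hypothetical, and the eigen-decomposition step is left out.

```python
def moment(points, p, q, r):
    """Discrete 3-D Cartesian moment m_pqr = sum of x^p * y^q * z^r over unit-mass points."""
    return sum(x ** p * y ** q * z ** r for x, y, z in points)

def centroid(points):
    """Center of gravity (m100/m000, m010/m000, m001/m000)."""
    m000 = moment(points, 0, 0, 0)
    return tuple(moment(points, *e) / m000 for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1)))

def central_second_moments(points):
    """3x3 matrix of second-order central moments; its eigenvectors give the principal axes."""
    cx, cy, cz = centroid(points)
    c = [(x - cx, y - cy, z - cz) for x, y, z in points]
    return [[sum(u[i] * u[j] for u in c) for j in range(3)] for i in range(3)]
```

Because `moment` is a plain sum over points, concatenating two point sets always yields the sum of their individual moments, the discrete counterpart of the "compositions of superellipsoids" property.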

18.
Due to the loss of range information, projections used as input data for a 3-D object recognition algorithm are expected to increase the computational complexity. In this work, however, we demonstrate that this deficiency carries potential for reducing the complexity of major vision problems. We show that projections reduce feature dimensions and lead to structures exhibiting simple combinatorial properties. The theoretical framework is embedded in a probabilistic setting that deals with uncertainties and variations of observed features. In statistics, marginal densities and the independence assumption prove to be the key tools when one encounters projections. The examples discussed in this paper include feature matching, pose estimation and classification of 3-D objects. The final experimental evaluation demonstrates the practical importance of the marginalization concept and independence assumptions.

19.
This paper presents a mirror morphing scheme to deal with the challenging pose variation problem in car model recognition. Conventionally, researchers adopt pose estimation techniques to overcome the pose problem, but it is difficult to obtain very accurate pose estimates. Moreover, a slight deviation in pose estimation degrades recognition performance dramatically. The mirror morphing technique uses the symmetry of cars to normalize car images of any orientation into a typical view. Therefore, the pose error and center bias can be eliminated and satisfactory recognition performance can be obtained. To support mirror morphing, an active shape model (ASM) is used to acquire car shape information. An effective pose and center estimation approach is also proposed to provide a good initialization for the ASM. In experiments, the proposed car model recognition system achieves a very high recognition rate (>95%) with a very low probability of false alarm, even when dealing with the severe pose problem for cars with similar shape and color.

20.
This paper proposes a statistical background modeling framework for target detection, in which global and local information are used to detect moving objects more accurately. Specifically, for target detection under illumination changes, a novel self-adaptive Gaussian mixture model combined with global information is used to construct a statistical background model for detecting moving objects. For target detection under dynamic backgrounds, the self-tuning spectral clustering method is first used to cluster background images, and kernel density estimation combined with local information is then used to construct a statistical background model for detecting moving objects. Experimental results demonstrate that the proposed framework can improve detection performance under illumination changes or dynamic backgrounds.
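As background on the kernel density estimation step, here is a minimal sketch of a generic per-pixel KDE background test: a pixel value is flagged as foreground when its estimated density under recent background samples falls below a threshold. The Gaussian kernel, `bandwidth=5.0` and `threshold=1e-3` are assumptions, and the paper's use of local information and self-tuning spectral clustering is not shown.

```python
import math

def kde_probability(value, samples, bandwidth):
    """Gaussian-kernel density estimate of `value` from background intensity samples."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * bandwidth)
    return sum(norm * math.exp(-0.5 * ((value - s) / bandwidth) ** 2)
               for s in samples) / len(samples)

def is_foreground(value, samples, bandwidth=5.0, threshold=1e-3):
    """A pixel is foreground when it is unlikely under the background density."""
    return kde_probability(value, samples, bandwidth) < threshold
```

With background samples from a flickering pixel, such as `[100, 102, 98, 150, 152, 148]` (for example, leaves swaying in front of a wall), both modes near 100 and near 150 count as background, while a value of 125 between them is flagged as foreground, something a single Gaussian fitted to the same samples would tend to miss.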


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417