Similar literature
20 similar documents found (search time: 46 ms)
1.
Recognition by linear combinations of models   Total citations: 18, self-citations: 0, citations by others: 18
An approach to visual object recognition in which a 3D object is represented by the linear combination of 2D images of the object is proposed. It is shown that for objects with sharp edges as well as with smooth bounding contours, the set of possible images of a given object is embedded in a linear space spanned by a small number of views. For objects with sharp edges, the linear combination representation is exact. For objects with smooth boundaries, it is an approximation that often holds over a wide range of viewing angles. Rigid transformations (with or without scaling) can be distinguished from more general linear transformations of the object by testing certain constraints placed on the coefficients of the linear combinations. Three alternative methods of determining the transformation that matches a model to a given image are proposed.
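A minimal sketch of the linear-combination idea for the sharp-edge (exact) case, assuming orthographic projection of a rigid point set. The helper names (`rot_x`, `rot_y`, `project`, `solve`), the sample points, and the choice of basis (x and y from one stored view, x from a second stored view, plus a constant term) are illustrative assumptions, not the paper's implementation. Coefficients are fitted on four points and then used to predict the novel view of the remaining points.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(R, t, p):
    """Orthographic projection: rotate, keep x and y, add a 2D shift."""
    x = sum(R[0][k] * p[k] for k in range(3)) + t[0]
    y = sum(R[1][k] * p[k] for k in range(3)) + t[1]
    return x, y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Hypothetical rigid object: a few 3D feature points (e.g. corners).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
          (0.3, -0.7, 0.5), (-1.2, 0.4, 0.9), (0.8, 0.8, -0.6)]

identity = rot_x(0.0)
view1 = [project(identity, (0.0, 0.0), p) for p in points]     # stored view 1
view2 = [project(rot_y(0.6), (0.1, 0.0), p) for p in points]   # stored view 2
novel = [project(matmul(rot_y(-0.4), rot_x(0.7)), (0.5, -0.2), p)
         for p in points]                                      # unseen view

# One basis row per point: x and y from view 1, x from view 2, constant.
rows = [[v1[0], v1[1], v2[0], 1.0] for v1, v2 in zip(view1, view2)]

# Fit the combination coefficients on the first four points only.
coef_x = solve(rows[:4], [q[0] for q in novel[:4]])
coef_y = solve(rows[:4], [q[1] for q in novel[:4]])

# Predict the novel view of the remaining points from the stored views.
max_error = 0.0
for row, q in zip(rows[4:], novel[4:]):
    px = sum(c * r for c, r in zip(coef_x, row))
    py = sum(c * r for c, r in zip(coef_y, row))
    max_error = max(max_error, abs(px - q[0]), abs(py - q[1]))

print("max prediction error:", max_error)
```

Under these assumptions the prediction error is at floating-point level, which matches the abstract's statement that the representation is exact for objects with sharp edges.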

2.
This paper and its companion are concerned with the problems of 3-D object recognition and shape estimation from image curves using a 3-D object curve model that is invariant to affine transformation onto the image space, and a binocular stereo imaging system. The objects of interest here are the ones that have markings (e.g., characters, letters, special drawings, and symbols) on their surfaces. The 3-D curves on the object are modeled as B-splines, which are characterized by a set of parameters (the control points) from which the 3-D curve can be fully generated. The B-splines are invariant under affine transformations. That means that the affine projection of the object curve onto the image space is a B-spline whose control points are related to the object control points through the affine transformation. Part I deals with issues relating to the curve modeling process. In particular, the authors address the problems of estimating the control points from the data curve, and of deciding on the “best” order B-spline and the “best” number of control points to be used to model the image or object curve(s). A minimum mean-square error (MMSE) estimation technique which is invariant to affine transformations is presented as a noniterative, simple, and fast approach for control point estimation. The “best” B-spline is decided upon using a Bayesian selection rule. Finally, we present a matching algorithm that allocates a sample curve to one of p prototype curves when the sample curve is an a priori unknown affine transformation of one of the prototype curves stored in the database. The approach is tried on a variety of images of real objects.
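The affine-invariance property stated above can be checked numerically. The sketch below assumes a single uniform cubic B-spline segment and a hypothetical affine projection from 3D to 2D (the matrix `A`, the vector `shift`, and the control points are made-up values). It compares two routes: evaluating the 3D curve and then projecting it, versus projecting the control points and then evaluating the 2D curve.

```python
def cubic_bspline_point(ctrl, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1].

    ctrl holds 4 control points of any dimension. The four basis
    weights sum to 1, which is what makes the curve affine invariant.
    """
    b = [(1 - t) ** 3 / 6,
         (3 * t ** 3 - 6 * t ** 2 + 4) / 6,
         (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6,
         t ** 3 / 6]
    dim = len(ctrl[0])
    return tuple(sum(b[i] * ctrl[i][k] for i in range(4)) for k in range(dim))

def affine_project(A, shift, p):
    """Map a 3D point to 2D with a 2x3 matrix A plus a 2D shift."""
    return tuple(sum(A[r][k] * p[k] for k in range(3)) + shift[r]
                 for r in range(2))

# Hypothetical 3D control points and affine projection.
ctrl3d = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (3.0, 1.0, -1.0), (4.0, 3.0, 2.0)]
A = [[0.9, -0.2, 0.4], [0.1, 1.1, -0.3]]
shift = (2.0, -1.0)

ts = [i / 10 for i in range(11)]

# Route 1: evaluate the 3D curve, then project each curve point.
curve_then_map = [affine_project(A, shift, cubic_bspline_point(ctrl3d, t))
                  for t in ts]

# Route 2: project the control points, then evaluate the 2D B-spline.
ctrl2d = [affine_project(A, shift, p) for p in ctrl3d]
map_then_curve = [cubic_bspline_point(ctrl2d, t) for t in ts]

max_gap = max(abs(a - b)
              for p, q in zip(curve_then_map, map_then_curve)
              for a, b in zip(p, q))
print("largest difference between the two routes:", max_gap)
```

The two routes agree up to rounding because the basis weights form a partition of unity, so the shift term passes through the weighted sum unchanged.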

3.
Ye Lu, Ze-Nian Li. Pattern Recognition, 2008, 41(3): 1159-1172
A new method of video object extraction is proposed to automatically extract the object of interest from actively acquired videos. Traditional video object extraction techniques often operate under the assumption of homogeneous object motion and extract various parts of the video that are motion consistent as objects. In contrast, the proposed active video object extraction (AVOE) approach assumes that the object of interest is being actively tracked by a non-calibrated camera under general motion and classifies the possible movements of the camera that result in the 2D motion patterns as recovered from the image sequence. Consequently, the AVOE method is able to extract the single object of interest from the active video. We formalize the AVOE process using notions from Gestalt psychology. We define a new Gestalt factor called “shift and hold” and present 2D object extraction algorithms. Moreover, since an active video sequence naturally contains multiple views of the object of interest, we demonstrate that these views can be combined to form a single 3D object regardless of whether the object is static or moving in the video.

4.
5.
Real, 1997, 3(6): 415-432
Real-time motion capture plays a very important role in various applications, such as 3D interfaces for virtual reality systems, digital puppetry, and real-time character animation. In this paper we address the problem of estimating and recognizing the motion of articulated objects using the optical motion capture technique. In addition, we present an effective method to control the articulated human figure in real time. The heart of this problem is the estimation of the 3D motion and posture of an articulated, volumetric object using feature points from a sequence of multiple perspective views. Under some moderate assumptions such as smooth motion and known initial posture, we develop a model-based technique for the recovery of the 3D location and motion of a rigid object using a variant of the Kalman filter. The posture of the 3D volumetric model is updated by the 2D image flow of the feature points for all views. Two novel concepts – the hierarchical Kalman filter (HKF) and the adaptive hierarchical structure (AHS) incorporating the kinematic properties of the articulated object – are proposed to extend our formulation for the rigid object to the articulated one. Our formulation also allows us to avoid two classic problems in 3D tracking: the multi-view correspondence problem and the occlusion problem. By adding more cameras and placing them appropriately, our approach can deal with the motion of the object in a very wide area. Furthermore, multiple objects can be handled by managing multiple AHSs and processing multiple HKFs. We show the validity of our approach using synthetic data acquired simultaneously from multiple virtual cameras in a virtual environment (VE) and real data derived from a moving light display with walking motion. The results confirm that the model-based algorithm works well on the tracking of multiple rigid objects.

6.
We consider the problem of reconstructing the 3D coordinates of a moving point seen from a monocular moving camera, i.e., to reconstruct moving objects from line-of-sight measurements only. The task is feasible only when some constraints are placed on the shape of the trajectory of the moving point. We coin the term “trajectory triangulation” for the family of such tasks. We investigate the solutions for points moving along a straight line and along conic-section trajectories. We show that if the point is moving along a straight line, then the parameters of the line (and, hence, the 3D position of the point at each time instant) can be uniquely recovered, and by linear methods, from at least five views. For the case of a conic-shaped trajectory, we show that generally nine views are sufficient for a unique reconstruction of the moving point, and fewer views when the conic is of a known type (like a circle in 3D Euclidean space, for which seven views are sufficient). The paradigm of trajectory triangulation, in general, pushes the envelope of processing dynamic scenes forward. Thus static scenes become a particular case of a more general task of reconstructing scenes rich with moving objects (where an object could be a single point).

7.
This paper examines the recognition of rigid objects bounded by smooth surfaces, using an alignment approach. The projected image of such an object changes during rotation in a manner that is generally difficult to predict. An approach to this problem is suggested, using the 3D surface curvature at the points along the silhouette. The curvature information requires a single number for each point along the object's silhouette, the radial curvature at the point. We have implemented this method and tested it on images of complex 3D objects. Models of the viewed objects were acquired using three images of each object. The implemented scheme was found to give accurate predictions of the objects' appearances for large transformations. Using this method, a small number of (viewer-centered) models can be used to predict the new appearance of an object from any given viewpoint.

8.
A method for finding analytical solutions to the problem of determining the attitude of a 3D object in space from a single perspective image is presented. Its principle is based on the interpretation of a triplet of any image lines as the perspective projection of a triplet of linear ridges of the object model, and on the search for the model attitude consistent with these projections. The geometrical transformations to be applied to the model to bring it into the corresponding location are obtained by the resolution of an eighth-degree equation in the general case. Using simple logical rules, it is shown on examples related to polyhedra that this approach leads to results useful for both location and recognition of 3D objects because few admissible hypotheses are retained from the interpretation of the three line segments. Line matching by the prediction-verification procedure is thus less complex.

9.
A new approach is presented for extracting an explicit 3D shape model from a single range image. One novel aspect is that the model represents both observed object surfaces, and surfaces which bound the volume of occluded space. Another novel aspect is that the approach does not require that the range image segmentation be perfect. The low-level segmentation may be such that the model-building process encounters topology versus geometry conflicts. The model-building process is designed to be “fail soft” in the face of such problems. The portion of the 3D model where a problem presents itself is “glued” together in a manner meant to minimize the disturbance in the 3D shape. The goal is to produce a valid boundary-representation which can be processed by higher-level routines. A third novel aspect of this work is that the implementation has been evaluated on over 200 real range images of polyhedral objects, with no operator intervention and all parameters held constant, and obtained a 97% success rate in creating valid b-reps.

10.
Genetic object recognition using combinations of views   Total citations: 1, self-citations: 0, citations by others: 1
Investigates the application of genetic algorithms (GAs) for recognizing real 2D or 3D objects from 2D intensity images, assuming that the viewpoint is arbitrary. Our approach is model-based (i.e. we assume a pre-defined set of models), while our recognition strategy relies on the theory of algebraic functions of views. According to this theory, the variety of 2D views depicting an object can be expressed as a combination of a small number of 2D views of the object. This implies a simple and powerful strategy for object recognition: novel 2D views of an object (2D or 3D) can be recognized by simply matching them to combinations of known 2D views of the object. In other words, objects in a scene are recognized by "predicting" their appearance through the combination of known views of the objects. This is an important idea, which is also supported by psychophysical findings indicating that the human visual system works in a similar way. The main difficulty in implementing this idea is determining the parameters of the combination of views. This problem can be solved either in the space of feature matches among the views ("image space") or the space of parameters ("transformation space"). In general, both of these spaces are very large, making the search very time-consuming. In this paper, we propose using GAs to search these spaces efficiently. To improve the efficiency of genetic searching in the transformation space, we use singular value decomposition and interval arithmetic to restrict the genetic search to the most feasible regions of the transformation space. The effectiveness of the GA approaches is shown on a set of increasingly complex real scenes where exact and near-exact matches are found reliably and quickly.

11.
Model-based recognition of 3D objects from single images   Total citations: 1, self-citations: 0, citations by others: 1
In this work, we treat major problems of object recognition which have received relatively little attention lately. Among them are the loss of depth information in the projection from a 3D object to a single 2D image, and the complexity of finding feature correspondences between images. We use geometric invariants to reduce the complexity of these problems. There are no geometric invariants of a projection from 3D to 2D. However, given certain modeling assumptions about the 3D object, such invariants can be found. The modeling assumptions can be either a particular model or a generic assumption about a class of models. Here, we use such assumptions for single-view recognition. We find algebraic relations between the invariants of a 3D model and those of its 2D image under general projective projection. These relations can be described geometrically as invariant models in a 3D invariant space, illuminated by invariant “light rays,” and projected onto an invariant version of the given image. We apply the method to real images.

12.
An algorithm is described which rapidly verifies the potential rigidity of three-dimensional point correspondences from a pair of two-dimensional views under perspective projection. The output of the algorithm is a simple yes or no answer to the question “Could these corresponding points from two views be the projection of a rigid configuration?” Potential applications include 3D object recognition from a single previous view and correspondence matching for stereo or motion over widely separated views. The rigidity checking problem is different from the structure-from-motion problem because it is often the case that two views cannot provide an accurate structure-from-motion estimate due to ambiguity and ill conditioning, whereas it is still possible to give an accurate yes/no answer to the rigidity question. Rigidity checking verifies point correspondences using 3D recovery equations as a matching condition. The proposed algorithm improves upon other methods that fall under this approach because it works with as few as six corresponding points under full perspective projection, handles correspondences from widely separated views, makes full use of the disparity of the correspondences, and is integrated with a linear algorithm for 3D recovery due to Kontsevich (1993). Results are given for experiments with synthetic and real image data. A complete implementation of this algorithm is being made publicly available.

13.
In this paper, we present a new framework for three-dimensional (3D) reconstruction of multiple rigid objects from dynamic scenes. Conventional 3D reconstruction from multiple views is applicable to static scenes, in which the configuration of objects is fixed while the images are taken. In our framework, we aim to reconstruct the 3D models of multiple objects in a more general setting where the configuration of the objects varies among views. We solve this problem by object-centered decomposition of the dynamic scenes using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms, which require a small-motion assumption between consecutive views, the co-recognition method provides reliable, accurate correspondences of the same object among unordered and wide-baseline views. In order to segment each object region, we benefit from the sparse 3D points obtained from structure-from-motion. These points are reliable and serve as automatic seed points for a seeded-segmentation algorithm. Experiments on various challenging real image sequences demonstrate the effectiveness of our approach, especially in the presence of abrupt independent motions of objects.

14.
Distinctive Image Features from Scale-Invariant Keypoints   Total citations: 517, self-citations: 6, citations by others: 517
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
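As a rough illustration of the first recognition step (matching each feature against a database of stored features), the sketch below runs an exhaustive nearest-neighbor search on toy 3-component descriptors. The function name, the toy data, and the brute-force search are assumptions for illustration only; the abstract describes a fast nearest-neighbor algorithm, which this simple loop merely stands in for, and the later Hough-transform and least-squares stages are not shown.

```python
def nearest_neighbor_matches(query, database):
    """For each query descriptor, return the index of the closest
    database descriptor under squared Euclidean distance."""
    matches = []
    for q in query:
        dists = [sum((a - b) ** 2 for a, b in zip(q, d)) for d in database]
        matches.append(min(range(len(database)), key=dists.__getitem__))
    return matches

# Toy descriptors (hypothetical); real descriptors are much longer vectors.
database = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
query = [(0.9, 0.1, 0.0), (0.1, 0.0, 0.8)]

matches = nearest_neighbor_matches(query, database)
print(matches)  # [1, 0]
```

Exhaustive search costs time proportional to the database size for every query feature, which is why a faster nearest-neighbor method matters for a large database of features.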

15.
In this paper, we propose a novel method to characterize graph structures based on a complex network model. First, we show that a structural graph can be modeled as a small-world complex network; then, the Complex Network Characteristics Representation of a Graph (CNCRG), which includes topological and dynamic characteristics, is obtained. Based on these characteristics, graph classification/clustering for objects viewed from different directions and characteristic view identification for single objects are investigated on one synthetic image dataset and two real image datasets. Our experimental results show that CNCRG achieves better object classification/clustering performance and also provides well-structured view spaces based on multi-dimensional scaling (MDS) and principal component analysis (PCA) embedding methods for graphs extracted from 2D views of 3D objects.

16.
The paper addresses the problem of “class-based” image-based recognition and rendering with varying illumination. The rendering problem is defined as follows: Given a single input image of an object and a sample of images with varying illumination conditions of other objects of the same general class, re-render the input image to simulate new illumination conditions. The class-based recognition problem is similarly defined: Given a single image of an object in a database of images of other objects, some of them multiply sampled under varying illumination, identify (match) any novel image of that object under varying illumination with the single image of that object in the database. We focus on Lambertian surface classes and, in particular, the class of human faces. The key result in our approach is based on a definition of an illumination invariant signature image which enables an analytic generation of the image space with varying illumination. We show that a small database of objects (in our experiments, as few as two objects) is sufficient for generating the image space with varying illumination of any new object of the class from a single input image of that object. In many cases, the recognition results outperform conventional methods by far, and the re-rendering is of remarkable quality considering the size of the database of example images and the mild preprocessing required for making the algorithm work.

17.
3D object recognition from local features is robust to occlusions and clutter. However, local features must be extracted from a small set of feature-rich keypoints to avoid computational complexity and ambiguous features. We present an algorithm for the detection of such keypoints on 3D models and partial views of objects. The keypoints are highly repeatable between partial views of an object and its complete 3D model. We also propose a quality measure to rank the keypoints and select the best ones for extracting local features. Keypoints are identified at locations where a unique local 3D coordinate basis can be derived from the underlying surface in order to extract invariant features. We also propose an automatic scale selection technique for extracting multi-scale and scale-invariant features to match objects at different unknown scales. Features are projected to a PCA subspace and matched to find correspondences between a database and query object. Each pair of matching features gives a transformation that aligns the query and database object. These transformations are clustered and the biggest cluster is used to identify the query object. Experiments on a public database revealed that the proposed quality measure relates correctly to the repeatability of keypoints and the multi-scale features have a recognition rate of over 95% for up to 80% occluded objects.

18.
Multi-view object class recognition can be achieved using existing approaches for single-view object class recognition, by treating different views as entirely independent classes. This strategy requires a large amount of training data for many viewpoints, which can be costly to obtain. We describe a method for constructing a weak three-dimensional model from as few as two views of an object of the target class, and using that model to transform images of objects from one view to several other views, effectively multiplying their value for class recognition. Our approach can be coupled with any 2D image-based recognition system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class recognition.

19.
Detecting objects, estimating their pose, and recovering their 3D shape are critical problems in many vision and robotics applications. This paper addresses these problems using a two-stage approach. In the first stage, we propose a new method called DEHV – Depth-Encoded Hough Voting. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes objects' depth maps from either a single image (when no depth maps are available in testing) or a single image augmented with a depth map (when this is available in testing). Inspired by the Hough voting scheme introduced in [1], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. Once the depth map is given, a full reconstruction is achieved in a second (3D modelling) stage, where modified or state-of-the-art 3D shape and texture completion techniques are used to recover the complete 3D model. Extensive quantitative and qualitative experimental analysis on existing datasets [2], [3], [4] and a newly proposed 3D table-top object category dataset shows that our DEHV scheme obtains competitive detection and pose estimation results. Finally, the quality of 3D modelling in terms of both shape completion and texture completion is evaluated on a 3D modelling dataset containing both indoor and outdoor object categories. We demonstrate that our overall algorithm can obtain convincing 3D shape reconstruction from just one single uncalibrated image.

20.
A mechanism is presented for direct manipulation of 3D objects with a conventional 2D input device, such as a mouse. The user can define and modify a model by graphical interaction on a 3D perspective or parallel projection. A gestural interface technique enables the specification of 3D transformations (translation, rotation and scaling) by 2D pick and drag operations. Interaction is not restricted to single objects but can be applied to compound objects as well. The method described in this paper is an easy-to-understand 3D input technique which does not require any special hardware and is compatible with the designer's mental model of object manipulation.
