Similar Literature
 20 similar documents retrieved.
1.
A robot vision system that automatically generates an object recognition strategy from a 3D model and recognizes the object using this strategy is presented. The appearance of an object from various viewpoints is described in terms of visible 2D features such as parallel lines and ellipses. Features are then ranked according to the number of viewpoints from which they are visible. The rank and feature extraction cost of each feature are used to generate a treelike strategy graph. This graph gives an efficient feature search order when the viewpoint is unknown, starting with commonly occurring features and ending with features specific to a certain viewpoint. The system searches for features in the order indicated by the graph. After detection, the system compares a line representation generated from the 3D model with the image features to localize the object. Perspective projection is used in the localization process to obtain the precise position and attitude of the object, whereas orthographic projection is used in the strategy generation process to allow symbolic manipulation. Experimental results are given.
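To make the strategy-generation idea concrete, here is a minimal Python sketch, not the authors' implementation: features are scored by how many sampled viewpoints they appear in, discounted by extraction cost, which yields the kind of search order the strategy graph encodes. The feature names, visibility sets, and the rank/cost scoring rule are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    visible_views: set       # indices of sampled viewpoints where the feature appears
    extraction_cost: float   # relative cost of detecting this feature in an image

def search_order(features, n_views):
    """Order features so that widely visible, cheap features are tried first.

    Rank = fraction of viewpoints from which the feature is visible;
    the ratio rank/cost is a simple stand-in for the paper's strategy graph.
    """
    def score(f):
        rank = len(f.visible_views) / n_views
        return rank / f.extraction_cost
    return sorted(features, key=score, reverse=True)

# Hypothetical example: features of a boxy part seen from 8 sampled viewpoints.
features = [
    Feature("parallel-line pair", {0, 1, 2, 3, 4, 5}, extraction_cost=1.0),
    Feature("ellipse (top hole)", {0, 1, 2},          extraction_cost=2.0),
    Feature("corner triple",      {3, 4},             extraction_cost=1.5),
]
for f in search_order(features, n_views=8):
    print(f.name)
```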

2.
Shape Reconstruction of 3D Bilaterally Symmetric Surfaces
The paper presents a new approach for shape recovery based on integrating geometric and photometric information. We consider 3D bilaterally symmetric objects, that is, objects which are symmetric with respect to a plane (e.g., faces), and their reconstruction from a single image. Both the viewpoint and the illumination are not necessarily frontal. Furthermore, no correspondence between symmetric points is required. The basic idea is that an image taken from a general, non-frontal viewpoint, under non-frontal illumination, can be regarded as a pair of images. Each image of the pair is one half of the object, taken from a different viewing position and with a different lighting direction. Thus, one-image variants of geometric stereo and of photometric stereo can be used. Unlike the separate invocation of these approaches, which requires point correspondence between the two images, we show that integrating the photometric and geometric information suffices to yield a dense correspondence between pairs of symmetric points and, as a result, a dense shape recovery of the object. Furthermore, the unknown lighting and viewing parameters are also recovered in this process. An unknown distant point light source, Lambertian surfaces, unknown constant albedo, and weak perspective projection are assumed. The method has been implemented and tested experimentally on simulated and real data.

3.
A view-independent relational model (VIRM) used in a vision system for recognizing known 3-D objects from single monochromatic images of unknown scenes is described. The system inspects a CAD model from a number of different viewpoints, and statistical inference is applied to identify relatively view-independent relationships among component parts of the object. These relations are stored as a relational model of the object, which is represented in the form of a hypergraph. Three-dimensional components of the object, which can be associated with extended image features obtained by grouping primitive 2-D features, are represented as nodes of the hypergraph. Covisibility of model features is represented by means of hyperedges of the hypergraph, and the pairwise view-independent relations form procedural constraints associated with the hypergraph edges. During the recognition phase, the covisibility measures allow a best-first search of the graph for acceptable matches.

4.
Advanced Robotics, 2013, 27(1): 29-42
For recognition of three-dimensional (3D) shapes and measurement of 3D positions of objects, it is important for a vision system to be able to measure 3D data at dense points in the environment. One approach is to measure distance on the basis of the triangulation principle, from the disparity of two images. However, this binocular vision method has difficulty in finding the correspondence of features between the two images. This correspondence problem can be solved geometrically by adding another camera, i.e., by trinocular vision. This paper presents the principles and implementation details of trinocular vision. On the basis of the proposed method, we carried out several experiments, from which we found that many correct correspondences could be established, even for images of a complex scene, by the geometrical constraint of trinocular vision alone. However, when there are dense points in the image, multiple candidate points are found and a unique correspondence cannot be established. Two approaches to solving this problem are discussed in this paper.
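One common way to express the geometric constraint of trinocular vision is sketched below in Python, assuming the fundamental matrices F13 (image 1 to image 3) and F23 (image 2 to image 3) are known from calibration; this illustrates the principle rather than the paper's implementation. A candidate match (x1, x2) is kept only if some feature in the third image lies near both epipolar lines it predicts there.

```python
import numpy as np

def point_line_distance(x, line):
    """Distance of image point x = (u, v) to the line a*u + b*v + c = 0."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / np.hypot(a, b)

def trinocular_check(x1, x2, candidates3, F13, F23, tol=1.5):
    """Verify a candidate correspondence (x1, x2) using the third camera.

    x1, x2: pixel coordinates in images 1 and 2; candidates3: feature points
    detected in image 3. Returns the supporting point in image 3, or None.
    """
    l1 = F13 @ np.append(x1, 1.0)   # epipolar line of x1 in image 3
    l2 = F23 @ np.append(x2, 1.0)   # epipolar line of x2 in image 3
    for x3 in candidates3:
        if point_line_distance(x3, l1) < tol and point_line_distance(x3, l2) < tol:
            return x3
    return None
```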

5.
The viewpoint consistency constraint

6.
This paper addresses the problem of recognizing three-dimensional objects bounded by smooth curved surfaces from image contours found in a single photograph. The proposed approach is based on a viewpoint-invariant relationship between object geometry and certain image features under weak perspective projection. The image features themselves are viewpoint-dependent. Concretely, the set of all possible silhouette bitangents, along with the contour points sharing the same tangent direction, is the projection of a one-dimensional set of surface points where each point lies on the occluding contour for a five-parameter family of viewpoints. These image features form a one-parameter family of equivalence classes, and it is shown that each class can be characterized by a set of numerical attributes that remain constant across the corresponding five-dimensional set of viewpoints. This is the basis for describing objects by “invariant” curves embedded in high-dimensional spaces. Modeling is achieved by moving an object in front of a camera and does not require knowing the object-to-camera transformation; nor does it involve implicit or explicit three-dimensional shape reconstruction. At recognition time, attributes computed from a single image are used to index the model database, and both qualitative and quantitative verification procedures eliminate potential false matches. The approach has been implemented and examples are presented.

7.
An important application of machine vision systems is the recognition of known three-dimensional objects. A major difficulty arises when two or more objects project the same or similar two-dimensional image, often resulting in misclassification and degradation of system performance. The changes in images which result from the motion of objects provide a source of three-dimensional information which can greatly aid the classification process, but this three-dimensional analysis is computationally complex and subject to many sources of error. This work develops a methodology which utilizes the information derived from the apparent changes in object features over time to facilitate the recognition task, without the need to actually recover the three-dimensional structure of the objects under view. The basic approach is to generate a "feature signature" by combining the feature measurements of the individual regions in a long sequence of images. The static information in the individual frames is analyzed along with the temporal information from the entire sequence. These techniques are particularly applicable in situations where static image processing methods cannot discriminate between ambiguous objects. Two example implementations are presented to illustrate the application of the techniques of object recognition using motion information.
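A minimal sketch, assuming hypothetical region measurements (area and elongation), of how per-frame feature measurements can be pooled over a sequence into a "feature signature" and classified by a nearest-neighbour rule; the statistics and distance used here are placeholders, not the paper's definitions.

```python
import numpy as np

def feature_signature(per_frame_features):
    """Combine per-frame region measurements (frames x features) into one
    'feature signature': static statistics plus simple temporal statistics."""
    f = np.asarray(per_frame_features, dtype=float)
    return np.concatenate([f.mean(axis=0), f.std(axis=0), f.max(axis=0) - f.min(axis=0)])

def classify(signature, labelled_signatures):
    """Nearest-neighbour decision among stored class signatures."""
    return min(labelled_signatures, key=lambda kv: np.linalg.norm(kv[1] - signature))[0]

# Hypothetical measurements: (area, elongation) of a tracked region over 5 frames.
seq = [(100, 1.9), (104, 1.7), (111, 1.5), (118, 1.4), (126, 1.3)]
models = [("object A", feature_signature([(100, 1.9)] * 5)),
          ("object B", feature_signature([(100 + 5 * t, 1.9 - 0.15 * t) for t in range(5)]))]
print(classify(feature_signature(seq), models))
```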

8.
View-invariant human action recognition is a challenging research topic in computer vision. Hidden Markov Models (HMMs) and their extensions have been widely used for view-invariant action recognition. However, those methods usually involve a large parameter space, require large amounts of training data, and give low classification accuracy in real applications. A novel graphical structure based on an HMM with multi-view transitions is proposed to model human action under viewpoint changes. The model consists of multiple sub-action models, each of which corresponds to a traditional HMM that models the human action within a particular rotation-viewpoint space. In the training process, the novel model is built by connecting the sub-action models between adjacent viewpoint spaces. In the recognition process, an action with unknown viewpoint is recognized using an improved forward algorithm. The proposed model not only simplifies training by decomposing the parameter space into multiple sub-spaces, but also improves performance by constraining the possible viewpoint changes. Experimental results on the IXMAS dataset demonstrate that the proposed model obtains better performance than other recent view-invariant action recognition methods.
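As an illustration of the multi-view structure (not the paper's exact "improved forward algorithm"), the sketch below builds a combined transition matrix from per-viewpoint sub-models with a small probability eps of hopping to the matching state in an adjacent viewpoint space, and evaluates an observation sequence with the standard scaled forward algorithm.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Standard HMM forward algorithm (with scaling) over the combined state space.

    pi: initial state probabilities, A: transition matrix, B: emission matrix
    (states x symbols), obs: sequence of discrete observation indices.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

def build_multiview_A(sub_A, eps=0.05):
    """Combine V per-viewpoint sub-models (sub_A: V x N x N) into one transition
    matrix: block-diagonal sub-model dynamics plus probability eps of moving to
    the matching state in each adjacent viewpoint space."""
    V, N, _ = sub_A.shape
    A = np.zeros((V * N, V * N))
    for v in range(V):
        A[v*N:(v+1)*N, v*N:(v+1)*N] = (1 - 2 * eps) * sub_A[v]
        for dv in (-1, 1):                       # adjacent viewpoint spaces
            w = (v + dv) % V
            A[v*N:(v+1)*N, w*N:(w+1)*N] += eps * np.eye(N)
    return A
```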

9.
Non-Single Viewpoint Catadioptric Cameras: Geometry and Analysis
Conventional vision systems and algorithms assume the imaging system to have a single viewpoint. However, these imaging systems need not always maintain a single viewpoint. For instance, an incorrectly aligned catadioptric system could cause non-single viewpoints. Moreover, a lot of flexibility in imaging system design can be achieved by relaxing the need for imaging systems to have a single viewpoint. Thus, imaging systems with non-single viewpoints can be designed for specific imaging tasks, or for image characteristics such as field of view and resolution. The viewpoint locus of such imaging systems is called a caustic. In this paper, we present an in-depth analysis of the caustics of catadioptric cameras with conic reflectors. We use a simple parametric model for both the reflector and the imaging system to derive an analytic solution for the caustic surface. This model completely describes the imaging system and provides a map from pixels in the image to their corresponding viewpoints and viewing directions. We use the model to analyze the imaging system's properties such as field of view, resolution and other geometric properties of the caustic itself. In addition, we present a simple technique to calibrate this class of conic catadioptric cameras and estimate their caustics from known camera motion. The analysis and results we present in this paper are general and can be applied to any catadioptric imaging system whose reflector has a parametric form.

10.
Active vision
We investigate several basic problems in vision under the assumption that the observer is active. An observer is called active when engaged in some kind of activity whose purpose is to control the geometric parameters of the sensory apparatus. The purpose of the activity is to manipulate the constraints underlying the observed phenomena in order to improve the quality of the perceptual results. A monocular observer that moves with a known or unknown motion, or a binocular observer that can rotate its eyes and track environmental objects, are just two examples of what we call an active observer. We prove that an active observer can solve basic vision problems in a much more efficient way than a passive one. Problems that are ill-posed and nonlinear for a passive observer become well-posed and linear for an active observer. In particular, the problems of shape from shading and depth computation, shape from contour, shape from texture, and structure from motion are shown to be much easier for an active observer than for a passive one. It has to be emphasized that correspondence is not used in our approach, i.e., active vision is not correspondence of features from multiple viewpoints. Finally, active vision here does not mean active sensing; this paper introduces a general methodology, a general framework in which we believe low-level vision problems should be addressed.

11.
Image and Vision Computing, 2002, 20(9-10): 639-646
In this paper we propose a paradigm called the interactive visual dialog (IVD) as a means of facilitating a system's ability to recognize objects presented to it by a human. The presentation centers around a supermarket checkout scenario in which an operator presents an item to be tallied to a stationary television camera. An active vision approach is used to provide feedback to the operator in the form of an image (or images) depicting what the system thinks the operator is most likely holding, shown in a viewpoint that suggests how the object should next be presented to improve the certainty of interpretation. Interaction proceeds iteratively until the system converges on the correct interpretation. We show how the IVD can be implemented using an entropy-based gaze planning strategy and a sequential Bayes recognition system using optical flow as input. Experimental results show that the system does, in practice, improve recognition accuracy, leading to convergence to a correct solution in a minimal number of iterations.
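A simplified sketch of the two ingredients: sequential Bayes updating of the object posterior, and entropy-driven choice of the next viewpoint. Here each candidate view is scored with a single predicted likelihood vector rather than a full expectation over possible observations, and all distributions are assumed given, so this is only an illustration of the planning loop, not the paper's implementation.

```python
import numpy as np

def bayes_update(prior, likelihoods):
    """prior: P(object); likelihoods: P(observation | object). Returns posterior."""
    post = prior * likelihoods
    return post / post.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def next_view(posterior, expected_likelihoods):
    """Pick the viewpoint whose predicted observation would most reduce posterior
    entropy. expected_likelihoods[v] ~ predicted P(obs at view v | object k)
    (an assumed, precomputed model)."""
    best_v, best_h = None, np.inf
    for v, lik in expected_likelihoods.items():
        h = entropy(bayes_update(posterior, np.asarray(lik)))
        if h < best_h:
            best_v, best_h = v, h
    return best_v
```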

12.
Existing object recognition methods can be classified into two categories: interest-point-based and discriminative-part-based. The interest-point-based methods do not perform well if the interest points cannot be selected very carefully. The performance of the discriminative-part-based methods is not stable when viewpoints change, because they select discriminative parts from the interest points. In addition, the discriminative-part-based methods often do not provide an incremental learning ability. To address these problems, we propose a novel method that consists of three phases. First, we use sliding windows of different scales to retrieve a number of local parts from each model object and extract a feature vector for each local part retrieved. Next, we construct prototypes for the model objects by using the feature vectors obtained in the first phase. Each prototype represents a discriminative part of a model object. Then, we establish the correspondence between the local parts of a test object and those of the model objects. Finally, we compute the similarity between the test object and each model object, based on the correspondence established. The test object is recognized as the model object that has the highest similarity with it. The experimental results show that the proposed method outperforms or is comparable with the compared methods in terms of recognition rate on the COIL-100, Oxford Buildings and ETH-80 datasets, and recognizes all query images of the ZuBuD dataset. It is robust to distortion, occlusion, rotation, and viewpoint and illumination changes. In addition, we accelerate the recognition process using the C4.5 decision tree technique, and the proposed method has the ability to build prototypes incrementally.
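A rough sketch of the first and last phases, with intensity histograms standing in for the paper's feature vectors and a mean-of-best-matches score standing in for its similarity measure; the window scales, stride, and scoring rule are assumptions for illustration.

```python
import numpy as np

def extract_parts(image, scales=(32, 48, 64), stride=16):
    """Slide square windows of several scales over a 2-D grey image and return
    (feature, window) pairs; the feature here is just a small intensity histogram."""
    parts = []
    for s in scales:
        for y in range(0, image.shape[0] - s + 1, stride):
            for x in range(0, image.shape[1] - s + 1, stride):
                win = image[y:y+s, x:x+s]
                hist, _ = np.histogram(win, bins=16, range=(0, 256), density=True)
                parts.append((hist, (x, y, s)))
    return parts

def similarity(test_parts, prototypes):
    """Score a test object against one model: each test part is matched to its
    nearest prototype, and the score is the negated mean of those best distances."""
    d = [min(np.linalg.norm(f - p) for p in prototypes) for f, _ in test_parts]
    return -float(np.mean(d))
```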

13.
Grasp synthesis for unknown objects is a challenging problem, as the algorithms are expected to cope with missing object shape information. This missing information is a function of the vision sensor's viewpoint. The majority of grasp synthesis algorithms in the literature synthesize a grasp by using a single image of the target object and making assumptions about the missing shape information. In contrast, this paper proposes using the robot's depth sensor actively: we propose an active vision methodology that optimizes the viewpoint of the sensor to increase the quality of the synthesized grasp over time. In this way, we aim to relax the assumptions on the sensor's viewpoint and boost the success rates of grasp synthesis algorithms. A reinforcement learning technique is employed to obtain a viewpoint optimization policy, and a training process and an automated training data generation procedure are presented. The methodology is applied to a simple force-moment balance-based grasp synthesis algorithm, and a thousand simulations with five objects are conducted with random initial poses in which the grasp synthesis algorithm was not able to obtain a good grasp from the initial viewpoint. In 94% of these cases, the policy succeeded in finding a successful grasp.
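The abstract specifies reinforcement learning but not the exact algorithm; the sketch below shows one plausible realization as tabular Q-learning over a discretized viewpoint sphere, with the improvement in grasp-quality score as the reward. The action names, state encoding, and hyperparameters are illustrative assumptions, not the paper's design.

```python
import random
from collections import defaultdict

# Tabular Q-learning over a discretized viewpoint sphere: states are coarse
# grasp-quality bins, actions move the depth sensor to a neighbouring viewpoint,
# and the reward is the change in the score returned by the (external) grasp
# synthesis routine. All names here are illustrative, not the paper's API.
ACTIONS = ["yaw+", "yaw-", "pitch+", "pitch-", "stay"]
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose(state, eps=0.1):
    """Epsilon-greedy action selection."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update."""
    target = reward + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])
```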

14.
A new approach is presented for explicitly relating image observables to models of curved three-dimensional objects. This relationship is used for object recognition and positioning. Object models consist of collections of parametric surface patches. The image observables considered are raw range data, surface normal and Gaussian curvature, raw image intensity and intensity gradient, raw image contours, and contour orientation and curvature. Elimination theory provides a method for constructing an implicit equation that relates these observables to the three-dimensional position and orientation of object models. Determining the unknown pose parameters is reduced to a fitting problem between the implicit equation and the observed data points. By considering translation-independent observables such as surface normal and curvature, this process is further decomposed into first determining orientation and then determining translation. Applications to object recognition are described, and an implementation is presented.

15.
Image compositing is widely used to combine visual elements from separate source images into a single image. Although recent image compositing techniques are capable of smoothly blending visual elements from different sources, most of them implicitly assume the source images are taken from the same viewpoint. In this paper, we present an approach to compositing novel image objects from multiple source images which have different viewpoints. Our key idea is to construct 3D proxies for meaningful components of the source image objects, and use these 3D component proxies to warp and seamlessly merge the components together in the same viewpoint. To realize this idea, we introduce a coordinate-frame based single-view camera calibration algorithm to handle general types of image objects, a structure-aware cuboid optimization algorithm to obtain cuboid proxies for image object components with correct structural relationships, and finally a 3D-proxy transformation guided image warping algorithm to stitch the object components. We further describe a novel application based on this compositing approach that automatically synthesizes a large number of image objects from a set of exemplars. Experimental results show that our compositing approach can be applied to a variety of image objects, such as chairs, cups, lamps, and robots, and that the synthesis application can create novel image objects with significant shape and style variations from a small set of exemplars.

16.
In this paper, we examine the complexities involved in retrieving images from a database composed of objects of very similar appearance. Such an operation requires a process that can discriminate among images at a very fine level, such as distinguishing among various species of fish. Furthermore, incidental environmental factors such as changes in viewpoint and slight, nonessential shape deformations must be excluded from the similarity criteria. To this end, we propose a new method for content-based image retrieval and indexing, one that is well suited for discriminating among objects within the same class in a way that is insensitive to incidental environmental changes. The scheme comprises a global alignment and a local matching process. An affine transform is used to model the different viewpoints associated with positioning the camera, while multi-dimensional indexing techniques are used to make the global alignment scheme efficient. A local matching process based on dynamic programming allows the optimal matching of local structures using cost metrics that may ignore nonessential local shape deformation. Results show the method's ability to cancel out visual distortions caused by a changing viewpoint, and its tolerance to noise, occlusion, and slight deformations of the object.
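A minimal sketch of the local matching stage, assuming the global affine alignment has already been applied and that boundary shape is summarized as a sequence of curvature samples; the edit-distance-style dynamic programme below tolerates small, nonessential deformations via a gap cost. The descriptor and cost values are assumptions, not the paper's metrics.

```python
import numpy as np

def dp_match(a, b, gap=1.0):
    """Edit-distance-style dynamic programme between two sequences of local
    boundary descriptors (e.g. curvature samples). Lower is more similar;
    the gap cost lets the match skip small, nonessential deformations."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * gap
    D[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = abs(a[i - 1] - b[j - 1])             # substitution cost
            D[i, j] = min(D[i-1, j-1] + sub,           # match/substitute
                          D[i-1, j] + gap,             # skip a sample in a
                          D[i, j-1] + gap)             # skip a sample in b
    return D[n, m]

print(dp_match([0.1, 0.3, 0.2, 0.0], [0.1, 0.25, 0.2, 0.05]))
```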

17.
Transformer models have achieved strong results in natural language processing and, because they connect vision and language more naturally, have also sparked great interest in the computer vision community. This paper surveys more than one hundred representative vision Transformer methods covering a variety of recognition tasks, compares model performance within each task, and on this basis summarizes the strengths, weaknesses, and open challenges of the models for each class of task. According to recognition granularity, it considers global-recognition methods such as image classification and video classification, and local-recognition methods such as object detection and visual segmentation. Given the wide adoption of existing methods in three specific recognition tasks, methods for face recognition, action recognition, and pose estimation are also summarized, as is the state of research on general-purpose methods that apply to multiple vision tasks or are domain-agnostic. Transformer-based models enable many end-to-end approaches and continually pursue a balance between accuracy and computational cost. For global recognition tasks, vision Transformers explore patch-sequence partitioning and token feature representation; for local recognition tasks, they achieve good performance because they capture global information better. In face recognition and action recognition, the attention mechanism reduces errors in the feature representation and can handle rich and diverse features. Transformers can resolve feature misalignment in pose estimation, which benefits regression-based methods and reduces the ambiguity introduced by depth mapping in 3D estimation. Extensive studies demonstrate the effectiveness of vision Transformers for recognition tasks, and improvements in feature representation and network architecture further boost performance.
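As a concrete example of the patch-sequence partitioning and token representation discussed for global recognition tasks, here is a ViT-style patch embedding sketched in plain NumPy; the projection weights and position embeddings are random placeholders rather than learned parameters.

```python
import numpy as np

def patch_embed(image, patch=16, dim=64, rng=np.random.default_rng(0)):
    """Split an HxWxC image into non-overlapping patches, flatten each patch and
    project it linearly to a `dim`-dimensional token; prepend a class token and
    add (placeholder) position embeddings."""
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    W_proj = rng.normal(scale=0.02, size=(patch * patch * C, dim))
    tokens = patches @ W_proj                        # (num_patches, dim)
    cls = np.zeros((1, dim))                         # class token
    tokens = np.vstack([cls, tokens])
    pos = rng.normal(scale=0.02, size=tokens.shape)  # placeholder position embeddings
    return tokens + pos

tokens = patch_embed(np.zeros((224, 224, 3)))
print(tokens.shape)   # (1 + 14*14, 64) = (197, 64)
```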

18.
19.
20.
This paper concerns the problem of recognition and localization of three-dimensional objects from range data. Most of the previous approaches suffered from one or both of the following shortcomings: (1) they dealt with single-object scenes and/or (2) they dealt with polyhedral objects or objects that were approximated as polyhedra. The work in this paper addresses both of these shortcomings. The input scenes are allowed to contain multiple objects with partial occlusion. The objects are not restricted to polyhedra but are allowed to have a piecewise combination of curved surfaces, namely, spherical, cylindrical, and conical surfaces. This restriction on the types of curved surfaces is not unreasonable since most objects encountered in an industrial environment can be thus modeled. This paper shows how the qualitative classification of the surfaces based on the signs of the mean and Gaussian curvature can be used to come up with dihedral feature junctions as features to be used for recognition and localization. Dihedral feature junctions are robust to occlusion, offer a viewpoint-independent modeling technique for the object models, do not require elaborate segmentation, and the feature extraction process is amenable to parallelism. Hough clustering, on account of its ease of parallelization, is chosen as the constraint propagation/satisfaction mechanism. Experimental results are presented using the Connection Machine. The fine-grained architecture of the Connection Machine is shown to be well suited for the recognition/localization technique presented in this paper.
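The qualitative surface classification mentioned here is commonly done from the signs of the mean curvature H and Gaussian curvature K; a small sketch follows, using the standard eight-way label table. The exact labels depend on the sign convention chosen for the surface normal, so treat the example values as illustrative.

```python
def surface_type(H, K, eps=1e-6):
    """Qualitative surface label from the signs of mean (H) and Gaussian (K)
    curvature; values within eps of zero are treated as zero. Which label is
    'peak' vs 'pit' depends on the orientation chosen for the surface normal."""
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {(-1, 1): "peak",          (1, 1): "pit",
             (-1, 0): "ridge",         (0, 0): "flat",           (1, 0): "valley",
             (-1, -1): "saddle ridge", (0, -1): "minimal saddle", (1, -1): "saddle valley"}
    return table.get((h, k), "undefined")   # (H=0, K>0) cannot occur for a real surface

print(surface_type(H=0.0, K=0.0))    # planar patch      -> flat
print(surface_type(H=-0.2, K=0.0))   # cylindrical patch -> ridge (with this convention)
```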
