Similar Documents
20 similar documents found (search time: 15 ms)
1.
Detecting objects in complex scenes while recovering the scene layout is a critical functionality in many vision-based applications. In this work, we advocate the importance of geometric contextual reasoning for object recognition. We start from the intuition that objects' location and pose in the 3D space are not arbitrarily distributed but rather constrained by the fact that objects must lie on one or multiple supporting surfaces. We model such supporting surfaces by means of hidden parameters (i.e. not explicitly observed) and formulate the problem of joint scene reconstruction and object recognition as that of finding the set of parameters that maximizes the joint probability of having a number of detected objects on K supporting planes given the observations. As a key ingredient for solving this optimization problem, we demonstrate a novel relationship between object location and pose in the image and the scene layout parameters (i.e. normal of one or more supporting planes in 3D and camera pose, location and focal length). Using a novel probabilistic formulation and the above relationship, our method has the unique ability to jointly: i) reduce false alarm and false negative object detection rates; ii) recover object location and supporting planes within the 3D camera reference system; iii) infer camera parameters (viewpoint and focal length) from a single uncalibrated image. Quantitative and qualitative experimental evaluation on two datasets (desk-top dataset [1] and LabelMe [2]) demonstrates our theoretical claims.

2.
Camera calibration is the first step of three-dimensional machine vision. A fundamental parameter to be calibrated is the position of the camera projection center with respect to the image plane. This paper presents a method for the computation of the projection center position using images of a translating rigid object, taken by the camera itself.

Many works have been proposed in literature to solve the calibration problem, but this method has several desirable features. The projection center position is computed directly, independently of all other camera parameters. The dimensions and position of the object used for calibration can be completely unknown.

This method is based on a geometric relation between the projection center and the focus of expansion. The use of this property enables the problem to be split into two parts. First a suitable number of focuses of expansion are computed from the images of the translating object. Then the focuses of expansion are taken as landmarks to build a spatial back triangulation problem, the solution of which gives the projection center position.
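The first stage can be sketched in a few lines of numpy, assuming only the geometric fact the abstract relies on: under pure translation, the displacement line through every pair of corresponding image points passes through the focus of expansion, which can therefore be recovered by least squares. The function name and the synthetic data below are illustrative, not from the paper:

```python
import numpy as np

def focus_of_expansion(p, q):
    """Least-squares intersection of the lines through corresponding
    image points p[i] -> q[i] observed under pure translation.
    Every displacement line must pass through the FOE."""
    d = q - p                                   # flow vectors
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)   # normals to the flow lines
    # each line gives n_i . (e - p_i) = 0  ->  stack into A e = b
    A, b = n, np.einsum('ij,ij->i', n, p)
    e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return e

# synthetic check: points moving radially away from a known FOE
foe_true = np.array([320.0, 240.0])
rng = np.random.default_rng(0)
p = rng.uniform(0, 600, size=(20, 2))
q = foe_true + 1.2 * (p - foe_true)             # radial expansion about the FOE
print(np.round(focus_of_expansion(p, q), 3))    # ≈ [320. 240.]
```

In practice each observed displacement would be noisy, and the least-squares formulation degrades gracefully as long as the flow directions are not all parallel.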


3.
Pan-tilt-zoom (PTZ) cameras with variable pose and focal length are now deployed at scale in traffic scenes. Existing camera calibration models typically solve for the camera parameters in a single pass from prior conditions. As a PTZ camera repeatedly changes its pose and focal length, such one-shot calibration results become unstable and inaccurate, with the estimated camera height being particularly unstable. To address this problem, a pyramid-style iterative optimization scheme over focal length and camera height is proposed. A basic calibration is first performed to obtain an initial result; road markings are chosen as the optimization reference along the road direction, and road width as the reference perpendicular to it; an optimization range and initial step size are set for the focal length and camera height, and the result is then refined by pyramid iteration until the best result is obtained. Experimental results on a highway dataset show that the method effectively improves the accuracy and stability of the calibrated camera parameters: compared with traditional calibration algorithms, accuracy improves by 8% along the road direction, 6.5% perpendicular to the road, and 6.5% overall, and the camera-height fluctuation stays within 2% while the PTZ camera changes its pose and focal length.
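The coarse-to-fine ("pyramid") refinement can be sketched generically. The snippet below is not the paper's method: the real cost compares re-projected road-marking lengths and road widths against ground truth, whereas here a stand-in quadratic cost with a known optimum is used, and `pyramid_refine` is a hypothetical name:

```python
import numpy as np

def pyramid_refine(cost, f0, h0, f_rad, h_rad, levels=5, n=9, shrink=0.5):
    """Coarse-to-fine refinement of two calibration parameters.
    At each level an n x n grid around the current best (f, h) is
    evaluated, the best grid point becomes the new centre, and the
    search radius shrinks -- the "pyramid" iteration."""
    f_best, h_best = f0, h0
    for _ in range(levels):
        fs = np.linspace(f_best - f_rad, f_best + f_rad, n)
        hs = np.linspace(h_best - h_rad, h_best + h_rad, n)
        f_best, h_best = min(((f, h) for f in fs for h in hs),
                             key=lambda fh: cost(*fh))
        f_rad, h_rad = f_rad * shrink, h_rad * shrink
    return f_best, h_best

# stand-in cost with a known optimum at f = 1234, h = 7.5 (illustrative only)
cost = lambda f, h: (f - 1234.0) ** 2 + 10.0 * (h - 7.5) ** 2
f, h = pyramid_refine(cost, f0=1200.0, h0=7.0, f_rad=100.0, h_rad=2.0)
print(round(f, 1), round(h, 2))   # ≈ 1234, 7.5
```

The shrinking radius is what stabilises the estimate: early levels tolerate a poor initial calibration, while later levels refine around the incumbent.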

4.
This paper addresses the problem of recognizing three-dimensional objects bounded by smooth curved surfaces from image contours found in a single photograph. The proposed approach is based on a viewpoint-invariant relationship between object geometry and certain image features under weak perspective projection. The image features themselves are viewpoint-dependent. Concretely, the set of all possible silhouette bitangents, along with the contour points sharing the same tangent direction, is the projection of a one-dimensional set of surface points where each point lies on the occluding contour for a five-parameter family of viewpoints. These image features form a one-parameter family of equivalence classes, and it is shown that each class can be characterized by a set of numerical attributes that remain constant across the corresponding five-dimensional set of viewpoints. This is the basis for describing objects by “invariant” curves embedded in high-dimensional spaces. Modeling is achieved by moving an object in front of a camera and does not require knowing the object-to-camera transformation; nor does it involve implicit or explicit three-dimensional shape reconstruction. At recognition time, attributes computed from a single image are used to index the model database, and both qualitative and quantitative verification procedures eliminate potential false matches. The approach has been implemented and examples are presented.

5.
The “Six-line Problem” arises in computer vision and in the automated analysis of images. Given a three-dimensional (3D) object, one extracts geometric features (for example six lines) and then, via techniques from algebraic geometry and geometric invariant theory, produces a set of 3D invariants that represents that feature set. Suppose that later an object is encountered in an image (for example, a photograph taken by a camera modeled by standard perspective projection, i.e. a “pinhole” camera), and suppose further that six lines are extracted from the object appearing in the image. The problem is to decide if the object in the image is the original 3D object. To answer this question two-dimensional (2D) invariants are computed from the lines in the image. One can show that conditions for geometric consistency between the 3D object features and the 2D image features can be expressed as a set of polynomial equations in the combined set of two- and three-dimensional invariants. The object in the image is geometrically consistent with the original object if the set of equations has a solution. One well known method to attack such sets of equations is with resultants. Unfortunately, the size and complexity of this problem made it appear overwhelming until recently. This paper will describe a solution obtained using our own variant of the Cayley–Dixon–Kapur–Saxena–Yang resultant. There is reason to believe that the resultant technique we employ here may solve other complex polynomial systems.
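As background for the resultant machinery, here is a small numpy illustration of the classical Sylvester-matrix resultant of two univariate polynomials, which vanishes exactly when they share a common root. The paper's Cayley–Dixon–Kapur–Saxena–Yang variant is far more elaborate; this sketch only shows the basic object:

```python
import numpy as np

def resultant(f, g):
    """Resultant of two polynomials (coefficient lists, highest degree
    first) as the determinant of their Sylvester matrix.  It is zero
    exactly when f and g have a common root."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n, m + n))
    for i in range(n):                  # n shifted copies of f
        S[i, i:i + m + 1] = f
    for i in range(m):                  # m shifted copies of g
        S[n + i, i:i + n + 1] = g
    return np.linalg.det(S)

# x^2 - 1 and x - 1 share the root x = 1  ->  resultant 0
print(round(resultant([1, 0, -1], [1, -1]), 6))   # 0.0
# x^2 - 1 and x - 2 share no root  ->  nonzero resultant
print(round(resultant([1, 0, -1], [1, -2]), 6))   # 3.0
```

Deciding solvability of a polynomial system by eliminating variables with resultants follows this same "determinant of a structured matrix" pattern, which is why the size and structure of that matrix dominate the cost.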

6.
Visual learning and recognition of 3-d objects from appearance
The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with fewer than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
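The eigenspace pipeline can be sketched under strong simplifications: random vectors stand in for images, the eigenspace is built with a plain SVD, and recognition uses the nearest projected training view rather than a continuous parametric manifold. Everything below is an illustrative toy, not the paper's system:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy "appearance" vectors: 50 views (rows) of each of two objects
views_a = rng.normal(0.0, 1.0, size=(50, 64)) + 5.0
views_b = rng.normal(0.0, 1.0, size=(50, 64)) - 5.0
X = np.vstack([views_a, views_b])

# build the eigenspace: mean-centre, keep the top-k right singular vectors
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:3]                                    # k = 3 dimensional eigenspace

def project(img):
    return basis @ (img - mean)

# discretised "manifolds": projected training views per object
manifold_a = np.array([project(v) for v in views_a])
manifold_b = np.array([project(v) for v in views_b])

def recognize(img):
    """Label by the nearest projected training view in eigenspace."""
    z = project(img)
    da = np.linalg.norm(manifold_a - z, axis=1).min()
    db = np.linalg.norm(manifold_b - z, axis=1).min()
    return "a" if da < db else "b"

probe = rng.normal(0.0, 1.0, size=64) + 5.0       # unseen view of object "a"
print(recognize(probe))                           # a
```

In the actual method the manifold is interpolated, so the position of the projection along it also yields a continuous pose estimate; the nearest-neighbour step above only recovers the discrete label.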

7.
Using the relative positions of feature points extracted from a 2D image, together with prior knowledge of the target object such as edge-length ratios and angle features, constraint equations based on the rules of projective imaging are formed, turning the object pose computation into a nonlinear optimization problem. Multiple local constraint cues combine into a global constraint, so a fairly accurate estimate of the object's pose, and hence of the viewpoint, can be obtained quickly. A premise of this formulation is that, for understanding an object's pose in an image, the relative rather than absolute positions of the feature points play the key role. The computation is therefore carried out at an assumed depth, and the results show that this assumption does not affect the pose computation. The method is computationally light, and recognition based on geometric features is stable, reliable, and generalizes well. Practice shows that this geometric constraint-satisfaction approach yields only a small number of candidate pose solutions, and that the recognized pose is invariant with respect to the ratios between the edges, so the method can be applied to practical invariance-based recognition problems.

8.
This paper considers the problem of finding the global optimum of the camera rotation, translation and focal length given a set of 2D–3D point pairs. The global solution is obtained under L-infinity optimality by a branch-and-bound algorithm. To achieve this goal, we first extend the previous branch-and-bound formulation and show that the image-space error (pixel distance) may be used instead of the angular error. We then show that the problem of estimating camera pose plus focal length given the rotation is quasi-convex. This provides the derivation of a novel inequality for the branch-and-bound algorithm for our problem. Finally, experimental results with synthetic and real data are provided.
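The branch-and-bound idea can be illustrated in one dimension. The paper's bounds over rotation space are replaced here by a simple Lipschitz lower bound on each interval; this is an illustrative sketch of the prune-and-subdivide mechanism, not the paper's algorithm:

```python
import heapq, math

def branch_and_bound(f, lo, hi, L, tol=1e-4):
    """Globally minimise a Lipschitz-continuous f on [lo, hi].
    On each interval, f(mid) - L * half_width lower-bounds f, so any
    interval whose bound exceeds the incumbent minus tol is pruned."""
    best_x, best_v = lo, f(lo)
    half = (hi - lo) / 2
    heap = [(f(lo + half) - L * half, lo, hi)]    # (lower bound, a, b)
    while heap:
        bound, a, b = heapq.heappop(heap)
        if bound > best_v - tol:                  # cannot improve: prune
            continue
        m = (a + b) / 2
        if f(m) < best_v:                         # update incumbent
            best_x, best_v = m, f(m)
        for a2, b2 in ((a, m), (m, b)):           # subdivide
            m2, h2 = (a2 + b2) / 2, (b2 - a2) / 2
            heapq.heappush(heap, (f(m2) - L * h2, a2, b2))
    return best_x, best_v

# multi-modal test function; L = 4 safely bounds its derivative
f = lambda x: -math.exp(-(x - 2.0) ** 2) + 0.3 * math.sin(5.0 * x)
x, v = branch_and_bound(f, -4.0, 6.0, L=4.0)
print(round(x, 3), round(v, 3))
```

The returned value is guaranteed to be within `tol` of the global minimum, which is the same certificate that makes branch-and-bound attractive for camera pose: no local minimum can be silently accepted.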

9.
A Multi-Frame Structure-from-Motion Algorithm under Perspective Projection
We present a fast, robust algorithm for multi-frame structure from motion from point features which works for general motion and large perspective effects. The algorithm is for point features but easily extends to a direct method based on image intensities. Experiments on synthetic and real sequences show that the algorithm gives results nearly as accurate as the maximum likelihood estimate in a couple of seconds on an IRIS 10000. The results are significantly better than those of an optimal two-image estimate. When the camera projection is close to scaled orthographic, the accuracy is comparable to that of the Tomasi/Kanade algorithm, and the algorithms are comparably fast. The algorithm incorporates a quantitative theoretical analysis of the bas-relief ambiguity and exemplifies how such an analysis can be exploited to improve reconstruction. Also, we demonstrate a structure-from-motion algorithm for partially calibrated cameras, with unknown focal length varying from image to image. Unlike the projective approach, this algorithm fully exploits the partial knowledge of the calibration. It is given by a simple modification of our algorithm for calibrated sequences and is insensitive to errors in calibrating the camera center. Theoretically, we show that unknown focal-length variations strengthen the effects of the bas-relief ambiguity. This paper includes extensive experimental studies of two-frame reconstruction and the Tomasi/Kanade approach in comparison to our algorithm. We find that two-frame algorithms are surprisingly robust and accurate, despite some problems with local minima. We demonstrate experimentally that a nearly optimal two-frame reconstruction can be computed quickly, by a minimization in the motion parameters alone. Lastly, we show that a well known problem with the Tomasi/Kanade algorithm is often not a significant one.

10.
This paper presents a method for computing the extrinsic camera parameters from spatial-invariant data. Spatial perspective invariants are shape descriptions that remain unchanged under geometric transformations such as projection or a change of viewpoint. Because they provide an object or scene description that is independent of external conditions, they are widely used in computer vision. Camera calibration determines the transformation between the 2D image information captured by a camera and the corresponding 3D scene information; it comprises intrinsic and extrinsic parameters. The intrinsic parameters characterize the camera's internal and optical properties, including the image-center coordinates (Cx, Cy), the image scale factor Sx, the effective focal length f, and the lens distortion coefficient K. The extrinsic parameters describe the camera's position and orientation in world coordinates, namely the translation vector T and the 3×3 rotation matrix R, which can generally be written together as the 3×4 extended matrix [R T]. Based on the computed spatial perspective-invariant data, a method for calibrating the extrinsic camera parameters is given; experimental results show that the method is highly robust.

11.
International Journal of Computer Mathematics, 2012, 89(14): 3111-3137
Reconstruction of three-dimensional (3D) object structure from multiple images is a fundamental problem in computational vision. Many applications in computer vision require the use of structure information of 3D objects. The objective of this work is to develop a stable method of 3D reconstruction of an object, which works without the availability of camera parameters, once the plane at infinity is obtained using the approximate scene information. First, a framework has been designed based on a modification of the auto-calibration procedure for 3D structure computation using singular value decomposition. In the second part of the work, ambiguities present at the various stages of 3D reconstruction have been analysed. Error norms have been proposed, and studied to quantify the ambiguity in the reconstruction process. We attempt to analyse the effect of pose difference between camera views and focal length parameters on the reconstruction process, using experimentation with simulated and real-world data.

12.
Monocular vision localization based on two coplanar points and one line
The problem of monocular vision localization from mixed point and line features is studied. Given two feature points and one feature line that are coplanar in the object coordinate frame, the pose parameters between the camera and the object are computed from their correspondences on the image plane. Based on the geometric relations among the three features, the solution procedure is given for two separate cases, and in both the problem is finally reduced to solving a quadratic equation. Localization experiments on real workpieces verify the effectiveness of the method. The result provides a new approach to workpiece localization with monocular vision.

13.
Accurate recovery of three-dimensional shape from image focus
A new shape-from-focus method is described which is based on a new concept, named focused image surface (FIS). FIS of an object is defined as the surface formed by the set of points at which the object points are focused by a camera lens. According to paraxial-geometric optics, there is a one-to-one correspondence between the shape of an object and the shape of its FIS. Therefore, the problem of shape recovery can be posed as the problem of determining the shape of the FIS. From the shape of FIS the shape of the object is easily obtained. In this paper the shape of the FIS is determined by searching for a shape which maximizes a focus measure. In contrast to previous literature where the focus measure is computed over the planar image detector of the camera, here the focus measure is computed over the FIS. This results in more accurate shape recovery than the traditional methods. Also, using FIS, a more accurate focused image can be reconstructed from a sequence of images than is possible with traditional methods. The new method has been implemented on an actual camera system, and the results of shape recovery and focused image reconstruction are presented.
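A minimal focus-measure sketch may help here: the widely used sum-modified-Laplacian, evaluated on a synthetic texture and on a defocused (box-blurred) copy of it. The paper's contribution is to evaluate such a measure over the focused image surface rather than the planar detector; this toy code does not model the FIS, only the measure itself:

```python
import numpy as np

def modified_laplacian_focus(img):
    """Sum-modified-Laplacian: |2I - I_left - I_right| + |2I - I_up - I_down|
    summed over the image.  Sharper content yields a larger value."""
    l = np.abs(2 * img[1:-1, 1:-1] - img[1:-1, :-2] - img[1:-1, 2:])
    v = np.abs(2 * img[1:-1, 1:-1] - img[:-2, 1:-1] - img[2:, 1:-1])
    return float((l + v).sum())

def box_blur(img, k=5):
    """Crude defocus model: k x k box filter via an integral image."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    c = p.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))               # zero prefix row/column
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / k ** 2

yy, xx = np.mgrid[0:64, 0:64]
sharp = ((xx // 4 + yy // 4) % 2).astype(float)   # checkerboard texture
blurred = box_blur(sharp, k=5)

print(modified_laplacian_focus(sharp) > modified_laplacian_focus(blurred))  # True
```

Shape from focus then reduces to, for each surface point, finding the lens setting (or surface hypothesis) that maximizes such a measure.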

14.
A self-calibration method for the intrinsic parameters of a digital camera, suited to image-based rendering (IBR) systems, is described. The method is based on matched feature points tracked across an image sequence captured while the camera rotates, and requires no calibration object. It assumes that the optical center remains fixed during the rotation, i.e. the image center is fixed and can be calibrated beforehand, while the focal length is allowed to vary from image to image. Experiments on real image sequences verify that the method estimates the intrinsic camera parameters robustly.

15.
When broadcasting sports events such as football, it is useful to be able to place virtual annotations on the pitch, to indicate things such as distances between players and the goal, or whether a player is offside. This requires the camera position, orientation, and focal length to be estimated in real time, so that the graphics can be rendered to match the camera view. Whilst this can be achieved by using sensors on the camera mount and lens, they can be impractical or expensive to install, and often the broadcaster only has access to the video feed itself. This paper presents a method for computing the position, orientation and focal length of a camera in real time, using image analysis. The method uses markings on the pitch, such as arcs and lines, to compute the camera pose. A novel feature of the method is the use of multiple images to improve the accuracy of the camera position estimate. A means of automatically initialising the tracking process is also presented, which makes use of a modified form of Hough transform. The paper shows how a carefully chosen set of algorithms can provide fast, robust and accurate tracking for this real-world application.
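The initialisation step can be illustrated with a bare-bones rho-theta Hough transform. The paper uses a modified form and also handles arcs; this sketch only finds the dominant straight line in a binary mask, with all names and the synthetic image being illustrative:

```python
import numpy as np

def hough_lines(mask, n_theta=180, n_rho=200):
    """Minimal rho-theta Hough transform: every foreground pixel votes
    for all (rho, theta) lines through it; the accumulator peak is the
    dominant line, in the form x*cos(theta) + y*sin(theta) = rho."""
    h, w = mask.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = np.hypot(h, w)                          # bound on |rho|
    ys, xs = np.nonzero(mask)
    rhos = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    bins = np.round((rhos + diag) / (2 * diag) * (n_rho - 1)).astype(int)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for t in range(n_theta):
        np.add.at(acc[:, t], bins[:, t], 1)        # cast the votes
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    rho = r / (n_rho - 1) * 2 * diag - diag
    return rho, thetas[t]

# synthetic pitch line: the horizontal row y = 40
img = np.zeros((100, 100), dtype=bool)
img[40, :] = True
rho, theta = hough_lines(img)
print(round(theta, 3), round(rho, 1))   # theta ≈ pi/2, rho ≈ 40 (up to bin quantisation)
```

Peaks in the accumulator survive partial occlusion of the pitch markings, which is what makes the transform a robust way to bootstrap tracking before frame-to-frame refinement takes over.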

16.

Line matching plays an important role in vision localization and three-dimensional reconstruction of building structures. The conventional method of line matching is not effective for processing stereo images with wide baselines and large viewing angles. This paper proposes a line matching method in an affine projection space, aiming to solve the problem of change of viewing angles in aerial oblique images. Firstly, monocular image orientation can be performed through geometric structures of buildings. Secondly, according to the pose information of the camera, the affine projection matrix is obtained. The original image can be rectified as a conformal image based on this projection matrix, thereby reducing the difference in the viewing angle between images. Then, line matching is performed on the rectified images to get the matched line pairs. Finally, the inverse affine projection matrix is used to back-project the matched line pairs to the original images. The experimental results of five groups of aerial oblique images show that the matched line segments obtained by the proposed method are basically superior to those of the methods which are directly processed on the original image in terms of quantity, correctness, and efficiency.

17.
Markerless tracking of complex human motions from multiple views
We present a method for markerless tracking of complex human motions from multiple camera views. In the absence of markers, the task of recovering the pose of a person during such motions is challenging and requires strong image features and robust tracking. We propose a solution which integrates multiple image cues such as edges, color information and volumetric reconstruction. We show that a combination of multiple image cues helps the tracker to overcome ambiguous situations such as limbs touching or strong occlusions of body parts. Following a model-based approach, we match an articulated body model built from superellipsoids against these image cues. Stochastic Meta Descent (SMD) optimization is used to find the pose which best matches the images. Stochastic sampling makes SMD robust against local minima and lowers the computational costs as a small set of predicted image features is sufficient for optimization. The power of SMD is demonstrated by comparing it to the commonly used Levenberg–Marquardt method. Results are shown for several challenging sequences showing complex motions and full articulation, with tracking of 24 degrees of freedom in ≈1 frame per second.

18.
In this paper, we introduce a method to estimate the object’s pose from multiple cameras. We focus on direct estimation of the 3D object pose from 2D image sequences. Scale-Invariant Feature Transform (SIFT) is used to extract corresponding feature points from adjacent images in the video sequence. We first demonstrate that centralized pose estimation from the collection of corresponding feature points in the 2D images from all cameras can be obtained as a solution to a generalized Sylvester’s equation. We subsequently derive a distributed solution to pose estimation from multiple cameras and show that it is equivalent to the solution of the centralized pose estimation based on Sylvester’s equation. Specifically, we rely on collaboration among the multiple cameras to provide an iterative refinement of the independent solution to pose estimation obtained for each camera based on Sylvester’s equation. The proposed approach to pose estimation from multiple cameras relies on all of the information available from all cameras to obtain an estimate at each camera even when the image features are not visible to some of the cameras. The resulting pose estimation technique is therefore robust to occlusion and sensor errors from specific camera views. Moreover, the proposed approach does not require matching feature points among images from different camera views nor does it demand reconstruction of 3D points. Furthermore, the computational complexity of the proposed solution grows linearly with the number of cameras. Finally, computer simulation experiments demonstrate the accuracy and speed of our approach to pose estimation from multiple cameras.  相似文献   
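As background, here is a numpy-only sketch of solving a standard Sylvester equation AX + XB = Q by vectorisation; the paper's generalized Sylvester formulation and its distributed iterative refinement are not reproduced here:

```python
import numpy as np

def solve_sylvester_dense(A, B, Q):
    """Solve A X + X B = Q by vectorisation: with row-major vec(.),
    vec(A X) = (A kron I) vec(X) and vec(X B) = (I kron B^T) vec(X),
    so the equation becomes a single linear system."""
    m, n = Q.shape
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(m), B.T)
    return np.linalg.solve(K, Q.ravel()).reshape(m, n)

# random instance with a known solution X_true
rng = np.random.default_rng(2)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(3, 3))
X_true = rng.normal(size=(4, 3))
Q = A @ X_true + X_true @ B

X = solve_sylvester_dense(A, B, Q)
print(np.allclose(X, X_true))   # True (solution is unique when A and -B
                                # share no eigenvalues)
```

The dense Kronecker system costs O((mn)^3) and is only suitable for small sizes; for larger problems a Bartels–Stewart-style solver would be used instead.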

19.
To estimate the pose of an uncalibrated camera, a high-precision algorithm that iterates over focal length and pose simultaneously is proposed. Existing pose estimation algorithms for uncalibrated cameras solve for the focal length and the camera pose separately, and the focal-length estimate is relatively poor. The proposed algorithm first obtains initial focal-length and pose parameters with an existing method; it then derives a minimization function over focal length and pose based on orthogonal iteration and iterates with both as variables, finally yielding high-precision focal-length and pose parameters. Simulation experiments show that with 10 points and a noise standard deviation of 2, the relative errors of the proposed algorithm are below 1% for rotation angles, 4% for translation, and 3% for focal length; real experiments show that its accuracy is comparable to checkerboard calibration. Compared with existing algorithms, it achieves high-precision focal-length and pose estimation for uncalibrated cameras.

20.
This paper addresses the problem of factorization-based 3D reconstruction from uncalibrated image sequences. Previous studies on structure and motion factorization are either based on simplified affine assumption or general perspective projection. The affine approximation is widely adopted due to its simplicity, whereas the extension to perspective model suffers from recovering projective depths. To fill the gap between simplicity of affine and accuracy of perspective model, we propose a quasi-perspective projection model for structure and motion recovery of rigid and nonrigid objects based on factorization framework. The novelty and contribution of this paper are as follows. Firstly, under the assumption that the camera is far away from the object with small lateral rotations, we prove that the imaging process can be modeled by quasi-perspective projection, which is more accurate than affine model from both geometrical error analysis and experimental studies. Secondly, we apply the model to establish a framework of rigid and nonrigid factorization under quasi-perspective assumption. Finally, we propose an Extended Cholesky Decomposition to recover the rotation part of the Euclidean upgrading matrix. We also prove that the last column of the upgrading matrix corresponds to a global scale and translation of the camera thus may be set freely. The proposed method is validated and evaluated extensively on synthetic and real image sequences and improved results over existing schemes are observed.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号