Similar Documents
20 similar documents found (search time: 15 ms)
1.
The view-independent visualization of 3D scenes most often relies on rendering accurate 3D models or on image-based rendering techniques. To compute the 3D structure of a scene from a moving vision sensor, or to use image-based rendering approaches, we need to estimate the motion of the sensor from the recorded image information with high accuracy, a problem that has been well studied. In this work, we investigate the relationship between camera design and our ability to perform accurate 3D photography by examining the influence of camera design on the estimation of the motion and structure of a scene from video data. By relating the differential structure of the time-varying plenoptic function to different known and new camera designs, we establish a hierarchy of cameras based on the stability and complexity of the computations necessary to estimate structure and motion. At the low end of this hierarchy is the standard planar pinhole camera, for which the structure-from-motion problem is non-linear and ill-posed. At the high end is a camera we call the full-field-of-view polydioptric camera, for which the motion estimation problem can be solved independently of the depth of the scene, leading to fast and robust algorithms for 3D photography. In between are large-field-of-view multiple-view cameras, which we have built, as well as omni-directional sensors.

2.
Object tracking is an important task in computer vision and is essential for higher-level vision applications such as surveillance systems, human-computer interaction, industrial control, smart video compression, and robotics. Tracking, however, cannot be easily accomplished due to challenges such as real-time processing, occlusions, intensity changes, abrupt motions, the variety of objects, and mobile platforms. In this paper, we propose a new method to estimate and eliminate the camera motion on mobile platforms and, accordingly, select a set of optimal feature points for accurate tracking. Experimental results on different videos show that the proposed method estimates camera motion very well and eliminates its effect on tracking moving objects, and that using the optimal feature points yields promising tracking. In terms of accuracy and processing time, the proposed method compares favorably with state-of-the-art methods.
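The paper's optimal-feature selection is its own contribution; as a rough illustration of the camera-motion estimation and elimination step it describes, here is a minimal sketch (an assumed pipeline, not the authors' code) that fits a global homography to sparse feature tracks with RANSAC and warps the frame to cancel the camera motion. All calls are standard OpenCV; the parameter values are illustrative.

```python
import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray):
    """Estimate global (camera-induced) motion between two frames as a
    homography from sparse feature tracks, then warp the current frame
    so that it aligns with the previous one."""
    # Track corners from the previous frame into the current frame.
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   pts_prev, None)
    good_prev = pts_prev[status.ravel() == 1]
    good_curr = pts_curr[status.ravel() == 1]

    # RANSAC suppresses tracks that sit on independently moving objects.
    H, inliers = cv2.findHomography(good_curr, good_prev, cv2.RANSAC, 3.0)

    # Warping with H cancels the camera motion; whatever still moves
    # after this alignment is attributed to the objects themselves.
    h, w = prev_gray.shape
    stabilized = cv2.warpPerspective(curr_gray, H, (w, h))
    return stabilized, good_curr[inliers.ravel() == 1]
```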

3.
Road safety, whatever the environment considered, relies heavily on the ability to detect and track moving objects from a moving point of view. To achieve such detection, the vehicle's ego-motion must first be estimated and compensated. This issue is crucial for a fully autonomous vehicle, which is why several approaches have already been proposed. This study presents a method, based solely on visual information, that implements such a process. Information from stereo vision and motion is combined to extract the vehicle's ego-motion. The ego-motion extraction algorithm is thoroughly evaluated in terms of precision and uncertainty. Given those statistical attributes, a method for detecting dynamic objects is presented, relying on 3D image registration and evaluation of the residual displacement field. The method is then evaluated on several real and synthetic data sequences and is shown to allow reliable and early detection, even in hard cases (e.g., occlusions, ...). Given a few additional factors (such as the detectable motion range), overall performance can be derived from the visual odometry performance.
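As a sketch of the residual displacement field test described above: once the ego-motion has been compensated, pixels whose measured displacement still deviates from the ego-motion prediction are flagged as dynamic. The threshold `tau` is a hypothetical tuning parameter, not a value from the paper.

```python
import numpy as np

def detect_moving_pixels(flow, predicted_flow, tau=1.5):
    """Flag pixels whose measured displacement deviates from the
    displacement predicted by the estimated ego-motion.

    flow, predicted_flow : HxWx2 arrays of (dx, dy) per pixel.
    tau : residual-magnitude threshold in pixels (tuning parameter).
    """
    residual = flow - predicted_flow          # displacement not explained
    magnitude = np.linalg.norm(residual, axis=2)
    return magnitude > tau                    # boolean mask of dynamic pixels
```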

4.
Video cameras must produce images at a reasonable frame rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with very low resolution. The resolution of videos can be computationally enhanced by moving the camera and applying super-resolution reconstruction algorithms. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super-resolution. The conclusion is that, to achieve the highest resolution, motion blur should be avoided. Motion blur can be minimized by sampling the space-time volume of the video in a specific manner. We have developed a novel camera, called the "jitter camera," that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has significantly higher resolution than the captured one.
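The paper's adaptive super-resolution algorithm is not reproduced here; the following is a minimal classical shift-and-add sketch that illustrates why jittered (controlled sub-pixel) sampling helps: frames offset by known sub-pixel shifts can be interleaved onto a finer grid without a motion-blur penalty. It assumes shifts in [0, 1) low-resolution pixels; the interface is hypothetical.

```python
import numpy as np

def shift_and_add(frames, shifts, factor=2):
    """Classical shift-and-add super-resolution.

    frames : list of HxW low-resolution images.
    shifts : list of (dy, dx) sub-pixel offsets of each frame, in [0, 1)
             low-resolution pixels; for a jitter camera these are the
             known detector displacements.
    factor : resolution enhancement factor.
    """
    H, W = frames[0].shape
    acc = np.zeros((H * factor, W * factor))
    cnt = np.zeros_like(acc)
    for img, (dy, dx) in zip(frames, shifts):
        # Round each frame's offset to the nearest high-res grid position.
        oy = int(round(dy * factor)) % factor
        ox = int(round(dx * factor)) % factor
        acc[oy::factor, ox::factor] += img
        cnt[oy::factor, ox::factor] += 1
    return acc / np.maximum(cnt, 1)
```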

5.
A new concept in passive ranging to moving objects is described, based on the comparison of multiple image flows. It is well known that if a static scene is viewed by an observer undergoing a known relative translation through space, the distance to objects in the scene can easily be obtained from the measured image velocities associated with features on the objects (i.e., motion stereo). But in general, individual objects are translating and rotating at unknown rates with respect to a moving observer whose own motion may not be accurately monitored. The net effect is a complicated image flow field in which absolute range information is lost. However, if a second image flow field is produced by a camera whose motion through space differs from that of the first camera by a known amount, the range information can be recovered by subtracting the first image flow from the second. This "difference flow" must then be corrected for the known relative rotation between the two cameras, resulting in a divergent relative flow from a known focus of expansion. This passive ranging process may be termed Dynamic Stereo, the known difference in camera motions playing the role of the stereo baseline. We present the basic theory of this ranging process, along with some examples for simulated scenes.
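The geometry can be made concrete. After derotation, the difference flow is purely translational and radiates from the focus of expansion, so flow(p) = (tz / Z(p)) * (p - e), where e is the FOE; depth follows from the flow magnitude and the distance to the FOE. A minimal numpy sketch, assuming the principal point at the image center and a known translational difference t (an illustration of the theory, not the paper's code):

```python
import numpy as np

def depth_from_difference_flow(diff_flow, t, f):
    """Recover depth from a derotated difference flow field.

    diff_flow : HxWx2 residual flow (second camera's flow minus the
                first's, after correcting for relative rotation).
    t         : (tx, ty, tz) known difference in camera translation,
                the "baseline" of Dynamic Stereo; tz must be nonzero.
    f         : focal length in pixels.

    For purely translational relative motion, the flow radiates from
    the focus of expansion e = f*(tx, ty)/tz with
        flow(p) = (tz / Z(p)) * (p - e),
    so Z(p) = tz * |p - e| / |flow(p)|.
    """
    tx, ty, tz = t
    H, W = diff_flow.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    # Pixel coordinates relative to the principal point (image center).
    px, py = xs - W / 2.0, ys - H / 2.0
    ex, ey = f * tx / tz, f * ty / tz        # focus of expansion
    r = np.hypot(px - ex, py - ey)           # distance from the FOE
    mag = np.linalg.norm(diff_flow, axis=2)
    return tz * r / np.maximum(mag, 1e-9)
```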

6.
Crowded motions refer to multiple objects moving around and interacting, such as crowds of pedestrians. We capture crowded scenes using a depth scanner at video frame rates, so our input is a set of depth frames that sample the scene over time. Processing such data is challenging, as it is highly unorganized, with large spatio-temporal holes due to many occlusions. As no correspondence is given, locally tracking 3D points across frames is hard due to noise and missing regions. Furthermore, global segmentation and motion completion in the presence of large occlusions is ambiguous and hard to predict. Our algorithm utilizes the Gestalt principles of common fate and good continuity to compute motion tracking and completion, respectively. Our technique does not assume any pre-given markers or motion template priors. Our key idea is to reduce the motion completion problem to a 1D curve fitting and matching problem, which can be solved efficiently using a global optimization scheme. We demonstrate our segmentation and completion method on a variety of synthetic and real-world crowded scanned scenes.
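As a toy illustration of reducing completion to 1D curve fitting (the paper solves a global matching-and-fitting optimization; this sketch just fits each coordinate of one trajectory independently over time, filling occluded frames):

```python
import numpy as np

def complete_trajectory(times, positions, all_times, degree=3):
    """Fill occluded samples of a tracked 3D point by fitting each
    coordinate as a 1D curve over time (a simplified stand-in for the
    paper's globally optimized curve fitting and matching).

    times     : (N,) frame indices where the point was observed.
    positions : (N, 3) observed 3D positions.
    all_times : (M,) frame indices to evaluate, including the holes.
    """
    completed = np.empty((len(all_times), 3))
    for k in range(3):
        coeffs = np.polyfit(times, positions[:, k], degree)
        completed[:, k] = np.polyval(coeffs, all_times)
    return completed
```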

7.
We present an algorithm for identifying and tracking independently moving rigid objects from optical flow. Some previous attempts at segmentation via optical flow have focused on finding discontinuities in the flow field. While discontinuities do indicate a change in scene depth, they do not in general signal a boundary between two separate objects. The proposed method uses the fact that each independently moving object has a unique epipolar constraint associated with its motion. Thus, motion discontinuities based on self-occlusion can be distinguished from those due to separate objects. The use of epipolar geometry allows the determination of individual motion parameters for each object, as well as the recovery of relative depth for each point on the object. The algorithm assumes an affine camera, where perspective effects are limited to changes in overall scale; no camera calibration parameters are required. A Kalman-filter-based approach is used for tracking motion parameters over time.
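The tracker state in the paper is its motion parameters; as an illustration, here is a generic constant-velocity Kalman filter over a parameter vector, written in plain numpy. The state layout and noise levels are assumptions, not the paper's settings.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for a vector of motion
    parameters (a simplified stand-in for the paper's tracker)."""

    def __init__(self, dim, q=1e-3, r=1e-2):
        self.F = np.block([[np.eye(dim), np.eye(dim)],
                           [np.zeros((dim, dim)), np.eye(dim)]])
        self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])
        self.Q = q * np.eye(2 * dim)          # process noise
        self.R = r * np.eye(dim)              # measurement noise
        self.x = np.zeros(2 * dim)            # [params, param velocities]
        self.P = np.eye(2 * dim)

    def step(self, z):
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the measured motion parameters z.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x[:len(z)]                # filtered parameters
```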

8.
Motion segmentation in moving-camera videos is a very challenging task because of the motion dependence between the camera and the moving objects. Camera motion compensation is recognized as an effective approach; however, existing work depends on prior knowledge of the camera motion and scene structure for model selection, which is not always available in practice. Moreover, the image-plane motion suffers from depth variations, which leads to depth-dependent motion segmentation in 3D scenes. To solve these problems, this paper develops a prior-free dependent motion segmentation algorithm by introducing a modified Helmholtz-Hodge decomposition (HHD) based object-motion-oriented map (OOM). By decomposing the image motion (optical flow) into a curl-free and a divergence-free component, all kinds of camera-induced image motion can be represented by these two components in an invariant way. HHD identifies the camera-induced image motion as one segment, irrespective of depth variations, with the help of OOM. To segment object motions from the scene, we deploy a novel spatio-temporally constrained quadtree labeling. Extensive experimental results on benchmarks demonstrate that our method improves on the performance of the state of the art by 10%-20%, even on challenging scenes with complex backgrounds.
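A plain Helmholtz-Hodge decomposition (the paper uses a modified variant plus the OOM, which are not reproduced here) can be computed with an FFT Poisson solve, under a periodic-boundary assumption:

```python
import numpy as np

def helmholtz_hodge(u, v):
    """Split a 2D flow field into a curl-free and a divergence-free
    component (plain HHD). Assumes periodic boundaries so the Poisson
    equation can be solved with the FFT.
    """
    H, W = u.shape
    ky = np.fft.fftfreq(H) * 2 * np.pi
    kx = np.fft.fftfreq(W) * 2 * np.pi
    KX, KY = np.meshgrid(kx, ky)

    U, V = np.fft.fft2(u), np.fft.fft2(v)
    div_hat = 1j * KX * U + 1j * KY * V       # divergence in Fourier space

    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                            # avoid division by zero at DC
    phi_hat = -div_hat / k2                   # solve laplacian(phi) = div

    # The curl-free part is the gradient of phi; the remainder of the
    # field is divergence-free.
    u_cf = np.real(np.fft.ifft2(1j * KX * phi_hat))
    v_cf = np.real(np.fft.ifft2(1j * KY * phi_hat))
    return (u_cf, v_cf), (u - u_cf, v - v_cf)
```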

9.
Multibody structure-and-motion (MSaM) is the problem of establishing the multiple-view geometry of several views of a 3D scene taken at different times, where the scene consists of multiple rigid objects moving relative to each other. We examine the case of two views. The setting is the following: we are given a set of corresponding image points in two images, which originate from an unknown number of moving scene objects, each giving rise to a motion model. Furthermore, the measurement noise is unknown, and there are a number of gross errors, which are outliers to all models. The task is to find an optimal set of motion models for the measurements. It is solved through Monte Carlo sampling, careful statistical analysis of the sampled set of motion models, and simultaneous selection of multiple motion models to best explain the measurements. The framework is not restricted to any particular model selection mechanism, because it is developed from a Bayesian viewpoint: different model selection criteria are seen as different priors for the set of moving objects, which allow one to bias the selection procedure for different purposes.
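The paper scores sampled motion models jointly under a Bayesian model-selection prior; a much simpler greedy stand-in, useful to see the data flow, is sequential RANSAC: fit one fundamental matrix, peel off its inliers, and repeat. Thresholds below are illustrative, not from the paper.

```python
import cv2
import numpy as np

def sequential_multibody_ransac(pts1, pts2, min_inliers=15, max_models=5):
    """Greedy stand-in for multibody structure-and-motion: repeatedly
    fit a fundamental matrix with RANSAC and remove its inliers.
    pts1, pts2 : Nx2 float arrays of corresponding image points."""
    models, remaining = [], np.arange(len(pts1))
    for _ in range(max_models):
        if len(remaining) < 8:
            break
        F, mask = cv2.findFundamentalMat(pts1[remaining], pts2[remaining],
                                         cv2.FM_RANSAC, 1.0, 0.99)
        if F is None:
            break
        inliers = remaining[mask.ravel() == 1]
        if len(inliers) < min_inliers:
            break
        models.append((F, inliers))          # one rigid motion + its points
        remaining = remaining[mask.ravel() == 0]
    return models                            # leftover points are outliers
```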

10.
A New Method for Detecting Moving Objects in Image Sequences of Dynamic Scenes
When detecting moving objects in image sequences of dynamic scenes, a difficult problem that must be solved is how to eliminate the inter-frame global motion caused by camera movement, so that the static background and the moving objects in the images can be separated. Targeting the characteristics of dynamic-scene image sequences with complex backgrounds, this paper presents a new method for discriminating the image background and detecting moving objects, based on recovering the 3D positions of reference points in the scene. First, a layered motion model of the image sequence and a motion segmentation method based on it are introduced. Then, the estimated projection matrices are used to compute the 3D positions of the reference points of each motion layer; the variation of the recovered 3D position of the same scene feature across different frames is used to distinguish the motion layers corresponding to the static background from those corresponding to moving objects, thereby segmenting the static background and the moving objects in the images. Finally, a detailed algorithm for detecting moving objects in dynamic-scene image sequences is given. Experimental results show that the new algorithm satisfactorily solves the problem of detecting moving objects in dynamic-scene image sequences with multiple sets of inter-frame global motion parameters, and considerably improves the effectiveness and robustness of moving-object tracking.
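The core test in the abstract, recovering a reference point's 3D position in different frames and checking whether it stays put, can be sketched with OpenCV triangulation. The tolerance and the interfaces below are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def is_static_point(proj_mats, tracks, tol=0.05):
    """Classify a tracked reference point as static background or moving
    object from the spread of its recovered 3D positions.

    proj_mats : list of 3x4 projection matrices, one per frame.
    tracks    : list of (x, y) observations of the point in each frame.
    tol       : maximum allowed spread of the 3D position (scene units;
                a tuning parameter).
    """
    positions = []
    for i in range(len(proj_mats) - 1):
        p1 = np.asarray(tracks[i], float).reshape(2, 1)
        p2 = np.asarray(tracks[i + 1], float).reshape(2, 1)
        X = cv2.triangulatePoints(proj_mats[i], proj_mats[i + 1], p1, p2)
        positions.append((X[:3] / X[3]).ravel())
    # A background point yields (nearly) the same 3D location from every
    # frame pair; a moving object's recovered position drifts.
    return np.std(np.stack(positions), axis=0).max() < tol
```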

11.
The neural mechanisms underlying motion segregation and integration still remain unclear to a large extent. Local motion estimates are often ambiguous in the absence of form features, such as corners or junctions. Furthermore, even in the presence of such features, local motion estimates may be wrong if they were generated near occlusions or from transparent objects. Here, a neural model of visual motion processing is presented that involves early stages of the cortical dorsal and ventral pathways. We investigate the computational mechanisms of V1-MT feedforward and feedback processing in the perception of coherent shape motion. In particular, we demonstrate how modulatory MT-V1 feedback helps to stabilize localized feature signals at, e.g., corners, and to disambiguate initial flow estimates of single shapes that signal ambiguous movement due to the aperture problem. In cluttered environments with multiple moving objects, partial occlusions may occur which, in turn, generate erroneous motion signals at points of overlapping form. Intrinsic-extrinsic region boundaries are indicated by local T-junctions of possibly any orientation and spatial configuration. Such junctions generate strong localized feature tracking signals that inject erroneous motion directions into the integration process. We describe a simple local mechanism of excitatory form-motion interaction that modifies spurious motion cues at T-junctions. In concert with local competitive-cooperative mechanisms of the motion pathway, the motion signals are subsequently segregated into coherent representations of moving shapes. Computer simulations demonstrate the competence of the proposed neural model.

12.
In this paper we describe an algorithm to simultaneously recover the scene structure, the trajectories of the moving objects, and the camera motion from a monocular image sequence. The number of moving objects is detected automatically, without prior motion segmentation. Assuming that the objects are moving linearly with constant speeds, we propose a unified geometric representation of the static scene and the moving objects. This representation enables the embedding of the motion constraints into the scene structure, which leads to a factorization-based algorithm. We also discuss solutions to the degenerate cases, which can be detected automatically by the algorithm. An extension of the algorithm to weak perspective projection is presented as well. Experimental results on synthetic and real images show that the algorithm is reliable under noise.
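The unified representation that folds linearly moving points into the structure is the paper's contribution; below is only the classical affine (Tomasi-Kanade-style) rank-3 factorization core on which such algorithms build, as a sketch:

```python
import numpy as np

def affine_factorization(W):
    """Classical rank-3 factorization of a measurement matrix.

    W : 2F x P matrix stacking the x- and y-coordinates of P points
        tracked over F frames.
    Returns the 2F x 3 motion matrix and the 3 x P shape matrix, each
    determined up to an affine ambiguity.
    """
    W0 = W - W.mean(axis=1, keepdims=True)   # register to the centroid
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # camera (motion) matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # 3D shape matrix
    return M, S
```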

13.
For any visual feature-based SLAM (simultaneous localization and mapping) solution, estimating the relative camera motion between two images requires finding "correct" correspondences between features extracted from those images. Given a set of feature correspondences, one can use an n-point algorithm with a robust estimation method to produce the best estimate of the relative camera pose. The accuracy of a motion estimate is heavily dependent on the accuracy of the feature correspondence, and this dependency is even more significant when features are extracted from images of scenes with drastic changes in viewpoint and illumination and in the presence of occlusions. To make feature matching robust to such challenging scenes, we propose a new feature matching method that incrementally chooses five pairs of matched features for a full-DoF (degrees of freedom) camera motion estimation. In particular, at the first stage we use our 2-point algorithm to estimate a camera motion, and at the second stage we use this estimated motion to choose three more matched features. In addition, instead of the epipolar constraint, we use a planar constraint for more accurate outlier rejection. With this set of five matched features, we estimate a full-DoF camera motion with scale ambiguity. Through experiments with three real-world data sets, our method demonstrates its effectiveness and robustness by successfully matching features (1) from images of a night market with frequent occlusions and varying illumination, (2) from images of a night market taken by a handheld camera and by the Google Street View, and (3) from images of the same location taken in daytime and at nighttime.
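For reference, the conventional pipeline the abstract alludes to, a 5-point essential-matrix estimate inside RANSAC followed by pose recovery, looks like this with OpenCV (this is the baseline approach, not the paper's incremental 2-point-plus-planar-constraint matcher; parameters are illustrative):

```python
import cv2
import numpy as np

def relative_pose_from_matches(pts1, pts2, K):
    """Full-DoF relative pose (up to scale) from matched features using
    the standard 5-point algorithm with RANSAC.

    pts1, pts2 : Nx2 arrays of matched image points.
    K          : 3x3 camera intrinsic matrix."""
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # The cheirality check resolves the fourfold pose ambiguity of E.
    _, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, mask_pose   # t is unit-norm: scale remains ambiguous
```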

14.
Camera calibration using one-dimensional (1D) rigid objects has been attracting researchers' attention because of the easy-to-construct geometric structure of the apparatus. In this paper, we extend the set of motion patterns applicable to calibration with 1D objects. We show that a 1D object with three or more markers, rotating around one marker which itself moves in a plane, provides constraint equations on the camera intrinsic parameters. A stick moving under gravity, with no other forces acting on it, performs such a motion. Simulated tests show the feasibility and numerical robustness of this method.

15.
This paper addresses the problem of estimating camera motion from a non-calibrated monocular camera. Compared to existing methods that rely on restrictive assumptions, we propose a method that can estimate camera motion under far fewer restrictions, by adopting new example-based techniques that compensate for the lack of information. Specifically, we estimate the focal length of the camera by referring to visually similar training images with which focal lengths are associated. For one-step camera estimation, we refer to stationary points (landmark points) whose depths are estimated based on RGB-D candidates. In addition to landmark points, moving objects can also be used as an information source to estimate the camera motion. Our method therefore simultaneously estimates the camera motion for a video and the 3D trajectories of the objects in the video, using Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) particle filtering. The method is evaluated on challenging datasets, demonstrating its effectiveness and efficiency.

16.
3D surface reconstruction and motion modeling have been integrated into several industrial applications. Using a pan-tilt-zoom (PTZ) camera, we present an efficient method, called dynamic 3D reconstruction (D3DR), for recovering the 3D motion and structure of a freely moving target. The proposed method estimates the PTZ measurements needed to keep the target at the center of the camera's field of view (FoV) at a constant size. A feature extraction and tracking approach is used in the imaging framework to estimate the target's translation, position, and distance. A selection strategy chooses keyframes that show significant changes in target movement and directly updates the recovered 3D information. The proposed D3DR method is designed to work in a real-time environment, where not all captured frames need to be used to update the recovered 3D motion and structure of the target; using fewer frames minimizes the required time and space complexity. Experimental results on real-time video streams with different targets demonstrate the efficiency of the proposed method. Compared to existing offline and online 3D reconstruction methods, D3DR requires less execution time than the offline method while using, on average, only 49.6% of the total number of frames captured.
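The abstract describes the keyframe selection strategy only qualitatively; a minimal sketch of such a trigger could look like the following, where the thresholds and the (x, y, size) state representation are hypothetical:

```python
def select_keyframes(states, d_thresh=0.3, s_thresh=0.2):
    """Keep only frames where the target moved or changed apparent size
    significantly since the last keyframe (a simple stand-in for the
    paper's selection strategy; thresholds are tuning parameters).

    states : list of (x, y, size) target observations per frame, with
             x, y normalized to [0, 1] image coordinates.
    """
    keys = [0]
    for i, (x, y, s) in enumerate(states[1:], start=1):
        kx, ky, ks = states[keys[-1]]
        moved = ((x - kx) ** 2 + (y - ky) ** 2) ** 0.5 > d_thresh
        rescaled = abs(s - ks) / ks > s_thresh
        if moved or rescaled:
            keys.append(i)
    return keys
```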

17.
18.
In this paper, we present a theory for combining the effects of motion, illumination, 3D structure, albedo, and camera parameters in a sequence of images obtained by a perspective camera. We show that the set of all Lambertian reflectance functions of a moving object, at any position, illuminated by arbitrarily distant light sources, lies "close" to a bilinear subspace consisting of nine illumination variables and six motion variables. This result implies that, given an arbitrary video sequence, it is possible to recover the 3D structure, motion, and illumination conditions simultaneously using the bilinear subspace formulation. The derivation builds upon existing work on linear subspace representations of reflectance by generalizing it to moving objects. Lighting can change slowly or suddenly, locally or globally, and can originate from a combination of point and extended sources. We experimentally compare the results of our theory with ground-truth data and also provide results on real data, using video sequences of a 3D face and of the entire human body with various combinations of motion and illumination directions. We also show results of our theory in estimating 3D motion and illumination model parameters from a video sequence.
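The nine illumination variables correspond to the standard nine-dimensional spherical-harmonic approximation of Lambertian reflectance. A sketch of that static building block is below (the paper's contribution is extending it with six motion variables into a bilinear model); the basis constants are the usual real spherical-harmonic coefficients, without the Lambertian-kernel scaling.

```python
import numpy as np

def sh9_basis(n):
    """First nine real spherical harmonics evaluated at unit surface
    normals n (N x 3): the nine illumination variables."""
    x, y, z = n[:, 0], n[:, 1], n[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),                          # Y_0,0
        0.488603 * y, 0.488603 * z, 0.488603 * x,            # Y_1,*
        1.092548 * x * y, 1.092548 * y * z,                  # Y_2,-2 Y_2,-1
        0.315392 * (3 * z**2 - 1),                           # Y_2,0
        1.092548 * x * z, 0.546274 * (x**2 - y**2),          # Y_2,1 Y_2,2
    ], axis=1)                                               # N x 9

def lambertian_image(albedo, normals, light9):
    """Low-dimensional Lambertian approximation: each pixel intensity is
    albedo times a linear combination of the nine harmonic images."""
    return albedo * (sh9_basis(normals) @ light9)
```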

19.
Real-Time Tracking and Ranging of Moving Objects Based on Binocular Vision
A binocular vision system is used to track and range moving objects in 3D space in real time. When a moving target goes beyond the field of view, the camera pan-tilt unit can be controlled to rotate and search for the target. In addition, an algorithm is studied that measures the distance to moving objects while the cameras themselves are moving, without requiring re-calibration. Adaptive background modeling and the CamShift algorithm are used to identify and track moving objects. Experimental results show that the proposed algorithm can track an object effectively while accurately measuring its 3D position.
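The tracking stage uses CamShift, for which OpenCV provides a direct implementation; a minimal loop over a hue back-projection is sketched below. The background modeling, pan-tilt control, and stereo ranging are not reproduced, and the `init_box` interface is an assumption.

```python
import cv2
import numpy as np

def track_camshift(video_path, init_box):
    """Track a target with CamShift on a hue back-projection.
    init_box : initial (x, y, w, h) bounding box of the target."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    x, y, w, h = init_box
    roi = frame[y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    track_window = init_box
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CamShift adapts the window size and orientation to the target.
        rot_box, track_window = cv2.CamShift(back_proj, track_window, term)
        yield rot_box                        # ((cx, cy), (w, h), angle)
    cap.release()
```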

20.
We present a template-based approach to detecting human silhouettes in a specific walking pose. Our templates consist of short sequences of 2D silhouettes obtained from motion capture data. This lets us incorporate motion information into the templates and helps distinguish actual people, who move in a predictable way, from static objects whose outlines roughly resemble those of humans. Moreover, during the training phase we use statistical learning techniques to estimate and store the relevance of the different silhouette parts to the recognition task. At run-time, we use this information to convert Chamfer distances into meaningful probability estimates. The templates can handle six different camera views, excluding the frontal and back views, as well as different scales. We demonstrate the effectiveness of our technique using both indoor and outdoor sequences of people walking in front of cluttered backgrounds, acquired with a moving camera, which makes techniques such as background subtraction impractical.
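Chamfer matching against an edge map is commonly implemented with a distance transform; the sketch below scores one template placement and converts the distance to a probability-like value. The paper instead learns silhouette-part relevances during training; the exponential model and `sigma` here are assumptions.

```python
import cv2
import numpy as np

def chamfer_probability(image_edges, template_pts, offset, sigma=4.0):
    """Score one template placement by the mean Chamfer distance of its
    silhouette points, mapped to (0, 1] with an exponential model.

    image_edges  : HxW uint8 edge map with 255 at edge pixels.
    template_pts : N x 2 integer (x, y) silhouette points of the template.
    offset       : (ox, oy) placement of the template in the image.
    """
    # Distance transform: each pixel stores its distance to the nearest
    # edge pixel, so Chamfer matching is a lookup instead of a search.
    dist = cv2.distanceTransform(255 - image_edges, cv2.DIST_L2, 3)
    pts = template_pts + np.asarray(offset)
    d = dist[pts[:, 1], pts[:, 0]].mean()    # mean Chamfer distance
    return np.exp(-d / sigma)
```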
