Similar Literature
 Found 20 similar documents (search time: 31 ms)
1.
We present a new variational method for multi-view stereovision and non-rigid three-dimensional motion estimation from multiple video sequences. Our method minimizes the prediction error of the shape and motion estimates. Both problems then translate into a generic image registration task. The latter is entrusted to a global measure of image similarity, chosen depending on imaging conditions and scene properties. Rather than integrating a matching measure computed independently at each surface point, our approach computes a global image-based matching score between the input images and the predicted images. The matching process fully handles projective distortion and partial occlusions. Neighborhood as well as global intensity information can be exploited to improve the robustness to appearance changes due to non-Lambertian materials and illumination changes, without any approximation of shape, motion or visibility. Moreover, our approach results in a simpler, more flexible, and more efficient implementation than existing methods. The computation time on large datasets does not exceed thirty minutes on a standard workstation. Finally, our method is amenable to a hardware implementation on graphics processing units. Our stereovision algorithm yields very good results on a variety of datasets including specularities and translucency. We have successfully tested our motion estimation algorithm on a very challenging multi-view video sequence of a non-rigid scene. Electronic supplementary material is available for this article and accessible to authorised users.

2.
This paper proposes an effective approach to detect and segment moving objects from two time-consecutive stereo frames, which leverages the uncertainties in camera motion estimation and in disparity computation. First, the relative camera motion and its uncertainty are computed by tracking and matching sparse features in four images. Then, the motion likelihood at each pixel is estimated by taking the ego-motion uncertainty and the disparity into account in the computation. Finally, the motion likelihood, color and depth cues are combined in the graph-cut framework for moving object segmentation. The efficiency of the proposed method is evaluated on the KITTI benchmarking datasets, and our experiments show that the proposed approach is robust against both global (camera motion) and local (optical flow) noise. Moreover, the approach is dense as it applies to all pixels in an image, and even partially occluded moving objects can be detected successfully. Without a dedicated tracking strategy, our approach achieves high recall and comparable precision on the KITTI benchmarking sequences.

3.
We present a novel strategy for computing disparity maps from omni-directional stereo images obtained with fish-eye lenses in forest environments. At a first segmentation stage, the method identifies textures of interest to be either matched or discarded. Two of them are identified by applying the Support Vector Machines approach. At a second stage, a stereovision matching process is designed based on the application of four stereovision matching constraints: epipolarity, similarity, uniqueness and smoothness. The epipolarity guides the process. The similarity and uniqueness are mapped once again through the Support Vector Machines, but in a different way from the previous case; after this an initial disparity map is obtained. This map is later filtered by applying the Discrete Simulated Annealing framework, where the smoothness constraint is conveniently mapped. The combination of the segmentation and stereovision matching approaches is the main contribution. The method is compared against the usage of simple features and combined similarity matching strategies.

4.
Disparity flow depicts the 3D motion of a scene in the disparity space of a given view and can be considered as view-dependent scene flow. A novel algorithm is presented to compute disparity maps and disparity flow maps in an integrated process. Consequently, the disparity flow maps obtained help to enforce the temporal consistency between disparity maps of adjacent frames. The disparity maps found also provide the spatial correspondence information that can be used to cross-validate disparity flow maps of different views. Two different optimization approaches are integrated in the presented algorithm for searching optimal disparity values and disparity flows. The local winner-take-all approach runs faster, whereas the global dynamic programming based approach produces better results. All major computations are performed in the image space of the given view, leading to an efficient implementation on programmable graphics hardware. Experimental results on captured stereo sequences demonstrate the algorithm’s capability of estimating both 3D depth and 3D motion in real-time. Quantitative performance evaluation using synthetic data with ground truth is also provided.
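
The local winner-take-all strategy mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: the function name `wta_disparity`, the sum-of-absolute-differences cost, the border clamping, and the convention that left pixel x matches right pixel x - d are all assumptions made for illustration on a single rectified scanline pair.

```python
def wta_disparity(left, right, max_disp, half_win=1):
    """Winner-take-all disparity for one rectified scanline pair.

    left, right: equal-length lists of pixel intensities.
    For each left pixel x, returns the d in [0, max_disp] that minimises
    the sum of absolute differences (SAD) between the window centred at
    left[x] and the window centred at right[x - d].
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0.0
            for k in range(-half_win, half_win + 1):
                # Clamp window positions to the scanline borders (an assumption).
                xl = min(max(x + k, 0), n - 1)
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            # Keep the cheapest candidate; ties go to the smaller disparity.
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Illustrative data: the left scanline is the right one shifted by 2 pixels.
right_line = [5, 9, 1, 7, 3, 8, 2, 6, 4, 0]
left_line = [5, 5] + right_line[:8]
interior = wta_disparity(left_line, right_line, max_disp=4)[3:9]
```

Each pixel is decided independently, which is why this kind of search is fast and parallelises well, but also why it produces noisier maps than a global optimisation along the scanline.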

5.
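In stereo vision, disparity indirectly reflects the depth of objects, and disparity computation is the basis of depth computation. Most research on disparity computation targets binocular stereo vision, whereas the disparity distribution in bifocal monocular stereo vision differs from binocular disparity in that it radiates along the epipolar lines. Based on these characteristics of bifocal monocular stereo vision, a method for computing monocular stereo disparity is proposed. The points of the initially computed disparity map are classified into matched points and mismatched points. Using the mean shift vector (Mean Shift) algorithm, the disparity of each mismatched point is then estimated from the matched points and the image segmentation, finally yielding a dense and accurate disparity map. Experiments show that the method efficiently obtains the disparity map of a scene from a bifocal stereo image pair.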

6.
Two novel systems computing dense three-dimensional (3-D) scene flow and structure from multiview image sequences are described in this paper. We do not assume rigidity of the scene motion, thus allowing for nonrigid motion in the scene. The first system, the integrated model-based system (IMS), assumes that each small local image region is undergoing 3-D affine motion. Non-linear motion model fitting based on both optical flow constraints and stereo constraints is then carried out on each local region in order to simultaneously estimate 3-D motion correspondences and structure. The second system, the extended gradient-based system (EGS), is a natural extension of two-dimensional (2-D) optical flow computation. In this method, a new hierarchical rule-based stereo matching algorithm is first developed to estimate the initial disparity map. Different available constraints under a multiview camera setup are further investigated and utilized in the proposed motion estimation. We use image segmentation information to adopt and maintain the motion and depth discontinuities. Within the framework for EGS, we present two different formulations for 3-D scene flow and structure computation. One formulation assumes that the initial disparity map is accurate, while the other does not. Experimental results on both synthetic and real imagery demonstrate the effectiveness of our 3-D motion and structure recovery schemes. An empirical comparison between IMS and EGS is also reported.

7.
In stereovision, indices allowing pixels of the left and right images to be matched are basically one-dimensional features of the epipolar lines. In some situations, these features are not significant or cannot be extracted from the single epipolar line. Therefore, many techniques use 2D neighbourhoods to increase the available information. In this paper, we discuss the systematic use of 2D neighbourhoods for stereo matching. We propose an alternative approach to stereo matching using multiple 1D correlation windows, which yields a semi-dense disparity map and an associated confidence map. A particular technique derived from this approach, using fuzzy filtering and a basic decision rule, is compared to about 80 other methods on the Middlebury image datasets [1]. Results are first presented in the framework of the Middlebury website, then on the Receiver Operating Characteristics (ROC) evaluation [2] and, finally, on stereo image pairs of slanted surfaces. We show that a 1D correlation window is sufficient to provide correct matchings in most cases.

8.
Genetic-Based Stereo Algorithm and Disparity Map Evaluation   Total citations: 8, self-citations: 0, citations by others: 8
In this paper, a new genetic-based stereo algorithm is presented. Our motivation is to improve the accuracy of the disparity map by removing the mismatches caused by both occlusions and false targets. In our approach, the stereo matching problem is considered as an optimization problem. The algorithm first takes advantage of multi-view stereo images to detect occlusions and thereby removes mismatches caused by visibility problems. By optimizing the compatibility between corresponding points and the continuity of the disparity map using a genetic algorithm, mismatches caused by false targets are removed. The quadtree structure is used to implement the multi-resolution framework. Since nodes at different levels of the quadtree cover different numbers of pixels, selecting nodes at different levels gives a similar effect to adjusting the window size at different locations of the image. The experimental results show that our approach can generate more accurate disparity maps than two existing approaches. In addition, we introduce a new disparity map evaluation technique, which is developed based on a similar technique employed in the image segmentation area. Compared with two existing evaluation approaches, the new technique can evaluate the generated disparity maps without additional knowledge of the scene, such as the correct depth information or novel views.

9.
Recently, stereovision has appeared in robotics as a source of information for real-time mapping and path planning. In this paper, an intelligent motion system for mobile robots is designed and implemented using stereovision. The proposed system uses stereovision as a primary method for sensing the environment, and the system is able to navigate intelligently in an indoor environment with varying degrees of obstacle complexity. It creates noiseless and high-confidence 3D point clouds and uses these point clouds as an input for the mapping and path-planning modules. The proposed system was built by developing, enhancing, and integrating various techniques, modules and algorithms. The Stereovision-based Path-planning module is the integration of three main enhanced techniques: (1) the multi-baseline multi-view stereovision filter (MMSVF), (2) accurate floor detection and segmentation (AFDS), and (3) the intelligent gazing module (IGM). This Stereovision-based Path-planning module (MMSVF, IGM, and AFDS) was integrated with the Fuzzy Logic Motion Controller (FLMC). All techniques, modules and algorithms are implemented using a multi-threaded and client–server-based architecture. To prove the viability and robustness of our proposed system, we have integrated all components of the system into a fully functional mobile robot navigation system. We compared the performance of the main modules with that of similar modules in the literature, and showed that our modules had better performance. Testing the whole system is more important than just testing each module individually. To the best of our knowledge, the literature lacks such testing. Hence, in this paper we present the performance of our complete integrated system in different environments using different parameters and different architectures.

10.
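Objective: Binocular vision is a good solution to the problem of object distance estimation. Existing binocular object distance estimation methods suffer from either low estimation accuracy or cumbersome data preparation, so an algorithm that balances accuracy with ease of data preparation is needed. Method: A network based on the R-CNN (region convolutional neural network) architecture is proposed that performs object detection and object distance estimation simultaneously. After a stereo image pair is fed into the network, features are extracted by the backbone network; a stereo region proposal network yields the bounding boxes of the same object in the left and right images simultaneously; and the local features within each pair of object boxes are fed into an object disparity estimation branch to estimate the object's distance. To obtain the bounding boxes of the same object in the left and right images simultaneously, the stereo region proposal network replaces the original region proposal network, and a stereo bounding-box branch is proposed to regress the stereo bounding boxes jointly. To improve the accuracy of disparity estimation, a disparity estimation branch based on group-wise correlation and 3D convolution is proposed, drawing on the structure of stereo disparity map estimation networks. Results: In validation experiments on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, the proposed algorithm achieves a mean relative error of about 3.2%, far lower than that of the algorithm based on stereo disparity map estimation (11.3%) and close to that of the algorithm based on 3D object detection (about 3.9%). In addition, the proposed improvement to the disparity estimation branch clearly improves accuracy, reducing the mean relative error from 5.1% to 3.2%. Similar experiments on a separately collected and annotated pedestrian surveillance dataset give a mean relative error of about 4.6%, indicating that the method can be applied effectively to surveillance scenes. Conclusion: The proposed binocular object distance estimation network combines the strengths of object detection and stereo disparity estimation and achieves high accuracy. The network can be used effectively with vehicle-mounted cameras and in surveillance scenes, and is promising for other settings equipped with stereo cameras.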
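
This abstract reports accuracy as the mean relative error of the estimated object distance. As a rough sketch of how such a figure could be computed, the snippet below converts a per-object disparity to a distance with the standard rectified-stereo relation Z = f·B/d and then averages the relative errors. The function names, the focal length, the baseline, and the sample values are all hypothetical and are not taken from the paper.

```python
def disparity_to_distance(disparity_px, focal_px, baseline_m):
    """Distance in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def mean_relative_error(estimates, ground_truth):
    """Average of |estimate - truth| / truth over all objects."""
    errors = [abs(e - g) / g for e, g in zip(estimates, ground_truth)]
    return sum(errors) / len(errors)


# Hypothetical camera: 720 px focal length, 0.5 m baseline.
distance = disparity_to_distance(40.0, focal_px=720.0, baseline_m=0.5)

# Hypothetical estimates and ground-truth distances in metres.
mre = mean_relative_error([9.0, 21.0], [10.0, 20.0])
```

Because distance is inversely proportional to disparity, a fixed disparity error translates into a larger distance error for far objects, which is one reason to report the error relative to the true distance.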

11.
Occlusions and binocular stereo   Total citations: 10, self-citations: 2, citations by others: 8
Binocular stereo is the process of obtaining depth information from a pair of cameras. In the past, stereo algorithms have had problems at occlusions and have tended to fail there (though sometimes post-processing has been added to mitigate the worst effects). We show that, on the contrary, occlusions can help stereo computation by providing cues for depth discontinuities. We describe a theory for stereo based on the Bayesian approach, using adaptive windows and a prior weak smoothness constraint, which incorporates occlusion. Our model assumes that a disparity discontinuity, along the epipolar line, in one eye always corresponds to an occluded region in the other eye, thus leading to an occlusion constraint. This constraint restricts the space of possible disparity values, thereby simplifying the computations. An estimation of the disparity at occluded features is also discussed in light of psychophysical experiments. Using dynamic programming we can find the optimal solution to our system, and the experimental results are good and support the assumptions made by the model.
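
To make the idea of dynamic programming along an epipolar line with explicit occlusions concrete, here is a minimal sketch. It is a generic ordering-constrained scanline alignment with a fixed occlusion penalty, not the Bayesian model with adaptive windows described in the abstract; the function name `scanline_dp`, the squared-difference match cost, and the constant `occlusion_cost` are assumptions.

```python
def scanline_dp(left, right, occlusion_cost):
    """Align two scanlines by dynamic programming.

    Each step either matches left[i] with right[j] (squared intensity
    difference) or declares one pixel occluded (fixed penalty).
    Returns a per-left-pixel disparity i - j, or None where occluded.
    """
    n, m = len(left), len(right)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # left pixels with nothing to match
        cost[i][0] = i * occlusion_cost
        move[i][0] = "L"
    for j in range(1, m + 1):          # right pixels with nothing to match
        cost[0][j] = j * occlusion_cost
        move[0][j] = "R"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2
            skip_left = cost[i - 1][j] + occlusion_cost
            skip_right = cost[i][j - 1] + occlusion_cost
            best = min(match, skip_left, skip_right)
            cost[i][j] = best
            if best == match:
                move[i][j] = "M"
            elif best == skip_left:
                move[i][j] = "L"
            else:
                move[i][j] = "R"
    # Backtrack from the end of both scanlines to recover the alignment.
    disparity = [None] * n
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "M":
            disparity[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif move[i][j] == "L":
            i -= 1                     # left pixel occluded in the right view
        else:
            j -= 1                     # right pixel occluded in the left view
    return disparity


# Illustrative scanlines: the scene is shifted by one pixel, so the first
# left pixel and the last right pixel have no counterpart.
result = scanline_dp([10, 20, 30, 40, 50], [20, 30, 40, 50, 60], occlusion_cost=5)
```

In this toy case the cheapest path leaves one pixel unmatched on each side, so an occlusion in one view coincides with the disparity change in the other, which loosely mirrors the occlusion constraint stated above.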

12.
Block matching along epipolar lines is the core of most stereovision algorithms in geographic information systems. The usual distances between blocks are the sum of squared differences in the block (SSD) or the correlation. Minimizing these distances causes the fattening effect, by which the center of the block inherits the disparity of the more contrasted pixels in the block. This fattening error occurs everywhere in the image, and not just on strong depth discontinuities. The fattening effect at strong depth edges is a particular case of fattening, called the foreground fattening effect. A theorem proved in the present paper shows that a simple and universal adaptive weighting of the SSD resolves the fattening problem at all smooth disparity points (a Spanish patent has been applied for by Universitat de Illes Balears (Reference P25155ES00, UIB, 2009)). The optimal SSD weights are nothing but the inverses of the squares of the image gradients in the epipolar direction. With these adaptive weights, it is shown that the optimal disparity function is the result of the convolution of the real disparity with a prefixed kernel. Experiments on simulated and real pairs prove that the method does what the theorem predicts, eliminating surface bumps caused by fattening. However, the method does not resolve the foreground fattening.
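
The weighting rule stated in this abstract (weights equal to the inverse squared image gradient in the epipolar direction) can be sketched for one rectified scanline as follows. This is an illustration under assumptions, not the paper's implementation: the function name `weighted_ssd`, the central-difference gradient taken on the left scanline, the small regulariser `eps` that avoids division by zero, and the lack of weight normalisation are all choices made here.

```python
def weighted_ssd(left, right, x, d, half_win=2, eps=1e-6):
    """Gradient-weighted SSD between a window at left[x] and one at right[x - d].

    Each squared difference is weighted by 1 / (gradient**2 + eps), where
    the gradient is the central difference of the left scanline, so highly
    contrasted pixels no longer dominate the block cost.
    The caller must keep x - half_win - 1 >= 0, x + half_win + 1 < len(left)
    and x - half_win - d >= 0.
    """
    cost = 0.0
    for k in range(-half_win, half_win + 1):
        xl = x + k
        gradient = (left[xl + 1] - left[xl - 1]) / 2.0
        weight = 1.0 / (gradient * gradient + eps)
        difference = left[xl] - right[xl - d]
        cost += weight * difference * difference
    return cost


# Illustrative ramp: the left scanline is the right one shifted by 1 pixel.
left_line = [0, 1, 2, 3, 4, 5, 6, 7]
right_line = [1, 2, 3, 4, 5, 6, 7, 8]
cost_true = weighted_ssd(left_line, right_line, x=3, d=1, half_win=1)
cost_wrong = weighted_ssd(left_line, right_line, x=3, d=0, half_win=1)
```

With plain SSD a single strong edge inside the block can dictate the disparity of the block centre; dividing by the squared gradient equalises the influence of the pixels in the block.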

13.
Biologically-inspired event-driven silicon retinas, so-called dynamic vision sensors (DVS), allow efficient solutions for various visual perception tasks, e.g. surveillance, tracking, or motion detection. Similar to retinal photoreceptors, any perceived light intensity change in the DVS generates an event at the corresponding pixel. The DVS thereby emits a stream of spatiotemporal events to encode visually perceived objects that, in contrast to conventional frame-based cameras, is largely free of redundant background information. The DVS offers multiple additional advantages, but requires the development of radically new asynchronous, event-based information processing algorithms. In this paper we present a fully event-based disparity matching algorithm for reliable 3D depth perception using a dynamic cooperative neural network. The interaction between cooperative cells applies cross-disparity uniqueness-constraints and within-disparity continuity-constraints, to asynchronously extract disparity for each new event, without any need of buffering individual events. We have investigated the algorithm’s performance in several experiments; our results demonstrate smooth disparity maps computed in a purely event-based manner, even in scenes with temporally-overlapping stimuli.

14.
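Micro stereo vision based on the stereo light microscope (SLM) has already been applied in the field of micromanipulation. This paper studies three-dimensional microscopic positioning in a micromanipulation system based on an SLM micro stereo vision model. By analysing the two optical paths of the SLM, a weakly nonlinear micro stereo vision model is given that describes the mapping between the two-dimensional image space and the three-dimensional object space. Target objects in the moving image sequence are captured in two ways, by a stereo matching algorithm and by object recognition, which yields the coordinates of corresponding points in the stereo images in batches. Microscopic 3D positioning in the micromanipulation system is achieved using the theoretical micro stereo vision model together with microscopic image processing.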

15.
In this paper, we propose a novel stereo method for registering foreground objects in a pair of thermal and visible videos of close-range scenes. In our stereo matching, we use Local Self-Similarity (LSS) as the similarity metric between thermal and visible images. In order to accurately assign disparities to depth discontinuities and occluded Regions Of Interest (ROIs), we have integrated color and motion cues as soft constraints in an energy minimization framework. The optimal disparity map is approximated for image ROIs using a Belief Propagation (BP) algorithm. We tested our registration method on several challenging close-range indoor video frames of multiple people at different depths, with different clothing, and different poses. We show that our global optimization algorithm significantly outperforms the existing state-of-the-art method, especially for disparity assignment of occluded people at different depths in close-range surveillance scenes and for relatively large camera baselines.

16.
Small Baseline Stereovision   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a study of small baseline stereovision. It is generally admitted that because of the finite resolution of images, getting a good precision in depth from stereovision demands a large angle between the views. In this paper, we show that under simple and feasible hypotheses, small baseline stereovision can be rehabilitated and even favoured. The main hypothesis is that the images should be band limited, in order to achieve sub-pixel precisions in the matching process. This assumption is not satisfied for common stereo pairs. Yet, this becomes realistic for recent spatial or aerial acquisition devices. In this context, block-matching methods, which had become somewhat obsolete for large baseline stereovision, regain their relevance. A multi-scale algorithm dedicated to small baseline stereovision is described along with experiments on small angle stereo pairs at the end of the paper.
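
The trade-off between baseline and matching precision described in this abstract can be made concrete with the first-order error relation of a rectified pinhole pair: from Z = f·B/d it follows that |δZ| ≈ Z²/(f·B)·|δd|. The sketch below evaluates it; the relation is standard textbook geometry rather than a result quoted from the paper, and the function name `depth_error` and all numbers are hypothetical.

```python
def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty for a rectified stereo pair.

    Derived from Z = f * B / d:  |dZ| ~= Z**2 / (f * B) * |dd|.
    """
    return depth_m * depth_m / (focal_px * baseline_m) * disparity_error_px


# Hypothetical setup: 10 m depth, 1000 px focal length.
wide = depth_error(10.0, 1000.0, baseline_m=0.5, disparity_error_px=1.0)
# A baseline ten times smaller needs a disparity error ten times smaller
# (sub-pixel matching) to reach the same depth precision.
narrow = depth_error(10.0, 1000.0, baseline_m=0.05, disparity_error_px=0.1)
```

Under these assumed numbers both configurations give roughly the same depth uncertainty, which illustrates why sub-pixel matching precision is the key hypothesis for a small baseline to be competitive.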

17.
Julian Martin  Bart 《Neurocomputing》2008,71(7-9):1629-1641
We introduce a model for the computation of structure from motion based on the physiology of visual cortical areas MT and MST. The model assumes that the perception of depth from motion is related to the firing of a subset of MT neurons tuned to both velocity and disparity. The model's MT neurons are connected to each other laterally to form modulatory receptive-field surrounds that are gated by feedback connections from area MST. This allows the building up of a depth map from motion in area MT, even in the absence of disparity in the input. Depth maps from motion and from stereo are combined by a weighted average at a final stage. The model's predictions for the interaction between motion and stereo cues agree with previous psychophysical data, both when the cues are consistent with each other and when they are contradictory. In particular, the model shows nonlinearities as a result of early interactions between motion and stereo before their depth maps are averaged. The two cues interact in a way that represents an alternative to the “modified weak fusion” model of depth–cue combination.

18.
In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, their individual shapes as well as the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point-cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and scene point-cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks where we consistently outperform previous methods and we qualitatively demonstrate that our method is robust to in-the-wild conditions including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human

19.
A new sense for depth of field   Total citations: 19, self-citations: 0, citations by others: 19
This paper examines a novel source of depth information: focal gradients resulting from the limited depth of field inherent in most optical systems. Previously, autofocus schemes have used depth of field to measure depth by searching for the lens setting that gives the best focus, repeating this search separately for each image point. This search is unnecessary, for there is a smooth gradient of focus as a function of depth. By measuring the amount of defocus, therefore, we can estimate depth simultaneously at all points, using only one or two images. It is proved that this source of information can be used to make reliable depth maps of useful accuracy with relatively minimal computation. Experiments with realistic imagery show that measurement of these optical gradients can provide depth information roughly comparable to stereo disparity or motion parallax, while avoiding image-to-image matching problems.

20.
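Stereo matching is an important and active research topic in computer vision. To obtain dense disparity maps of better quality, partial differential equation theory is applied to machine vision and a new energy-function-based method for obtaining dense disparity maps is proposed. First, the influence on the matching term of matched point pairs at different relative positions is analysed. Next, an anisotropic heat diffusion equation suited to disparity maps is proposed; it not only inherits the properties of the regularization term defined by Alvarez, namely smoothing the interior of the initial disparity map while preserving discontinuities at edges, but also introduces a noise-masking function of the image and second-order directional derivatives to control, respectively, the diffusion speed in different regions of the disparity map and the diffusion direction at corner points. Finally, a new energy function is constructed from the defined regularization term and matching term; the disparity map obtained by a region-based matching algorithm serves as the initial value, and the steepest descent method is used to minimise the corresponding energy functional. Experimental results show that the new algorithm performs better both in visual quality and in the assessment of the reconstructed depth maps.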
