Similar Articles
1.
A new algorithm is proposed that estimates disparity gradients to produce accurate dense disparities. The disparity gradient plays a critical role in acquiring accurate disparities for scenes containing many different object shapes. The target is road traffic scenes, which contain a variety of objects, including the road surface, vehicles, pedestrians, sidewalks, and walls. In this paper, we adopt several standard methods, such as initial matching cost computation, scanline optimization, left/right consistency checking, and cost aggregation; however, a simple combination of these methods improves disparity accuracy only slightly. Disparity quality depends decisively on how disparity gradients are applied. Accordingly, in the proposed algorithm, cost aggregation is performed along the direction of the estimated disparity gradient in a disparity space image, which improves disparity quality significantly. Because this cost aggregation is time consuming, we also designed a new 2D integral cost technique to reduce the computation time. The robustness of the proposed algorithm is demonstrated with disparity maps obtained from standard images from the Web, indoor images, and outdoor images of various road traffic scenes.
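
The 2D integral-cost idea can be sketched on its own: one summed-area table per disparity slice makes box aggregation O(1) per pixel regardless of window size. A minimal numpy sketch, assuming a precomputed (H, W, D) matching-cost volume; the paper's version additionally steers the aggregation window along the estimated disparity gradient, which this sketch does not do.

```python
import numpy as np

def box_aggregate(cost_volume, radius):
    """Aggregate an (H, W, D) matching-cost volume over square windows
    using 2D integral images (summed-area tables), one per disparity."""
    H, W, D = cost_volume.shape
    out = np.empty((H, W, D), dtype=np.float64)
    y0 = np.clip(np.arange(H) - radius, 0, H)          # window row bounds
    y1 = np.clip(np.arange(H) + radius + 1, 0, H)
    x0 = np.clip(np.arange(W) - radius, 0, W)          # window column bounds
    x1 = np.clip(np.arange(W) + radius + 1, 0, W)
    for d in range(D):
        sat = np.zeros((H + 1, W + 1))                 # padded integral image
        sat[1:, 1:] = np.cumsum(np.cumsum(cost_volume[:, :, d], axis=0), axis=1)
        # Four-corner lookup: each window sum costs O(1) per pixel.
        out[:, :, d] = (sat[y1][:, x1] - sat[y0][:, x1]
                        - sat[y1][:, x0] + sat[y0][:, x0])
    return out
```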

2.
This paper presents an efficient image-based approach to navigating a scene from only three wide-baseline uncalibrated images, without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images, an accurate trifocal plane is extracted from the trifocal tensor of the three images. Next, based on a small number of feature marks specified through a simple GUI, correct dense disparity maps are obtained with our trinocular-stereo algorithm. Employing a barycentric warping scheme with the computed disparity, we can generate an arbitrary novel view within the triangle spanned by the three camera centers. Furthermore, after self-calibration of the cameras, 3D objects can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. Three applications of the tri-view morphing algorithm are demonstrated. The first is 4D video synthesis, which fills the gaps between a few sparsely located video cameras to synthesize a video from a virtual moving camera, so that the dynamic scene can be viewed from a novel viewpoint instead of the original static camera views. The second is multiple view morphing, which allows a seamless fly-through of the scene over a 2D space constructed from more than three cameras. The last is dynamic scene synthesis from three still images, where several rigid objects may move in any orientation or direction; after segmenting the three reference frames into several layers, novel views of the dynamic scene can be generated by applying our algorithm. Finally, experiments illustrate that a series of photo-realistic virtual views can be generated to fly through a virtual environment covered by several static cameras.
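
The barycentric step at the core of the warping can be made concrete: a virtual camera inside the triangle of the three camera centers is a convex combination of them, and the same weights blend the three pre-warped views. A small sketch, assuming the camera centers have already been parameterized as 2D points in a common plane; the function name is illustrative only.

```python
import numpy as np

def barycentric_weights(p, c0, c1, c2):
    """Barycentric coordinates of virtual-camera position p with respect
    to the triangle (c0, c1, c2) of camera centers (all 2D points).
    The returned weights sum to 1 and blend the three pre-warped views."""
    T = np.column_stack((c1 - c0, c2 - c0))   # 2x2 triangle basis
    w1, w2 = np.linalg.solve(T, p - c0)       # coordinates in that basis
    return np.array([1.0 - w1 - w2, w1, w2])

# Example: the triangle centroid receives equal weight from each view.
w = barycentric_weights(np.array([1/3, 1/3]), np.array([0., 0.]),
                        np.array([1., 0.]), np.array([0., 1.]))
```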

3.
An approach for incremental refinement of disparity maps obtained from a dynamic stereo sequence of a static scene is presented. The approach has been implemented on a binocular stereo vision system mounted on a mobile robot. A robust least-median-of-squares algorithm is given for recovering the camera motion between successive viewpoints, which provides a self-calibration mechanism. The recovered motion is utilized for recursive disparity prediction and refinement using a robust Kalman filter model.
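
Per pixel, the recursive refinement amounts to a scalar Kalman update that fuses the disparity predicted from the recovered camera motion with the newly measured disparity. A minimal sketch of that update; the paper's filter is additionally robustified against outliers.

```python
def kalman_update(d_pred, var_pred, d_meas, var_meas):
    """One scalar Kalman update per pixel: fuse the disparity predicted
    from camera motion (d_pred, var_pred) with the newly measured
    disparity (d_meas, var_meas). Returns the fused disparity and its
    reduced variance."""
    gain = var_pred / (var_pred + var_meas)    # Kalman gain
    d_new = d_pred + gain * (d_meas - d_pred)  # fused disparity
    var_new = (1.0 - gain) * var_pred          # uncertainty shrinks
    return d_new, var_new
```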

4.
Building upon recent developments in optical flow and stereo matching estimation, we propose a variational framework for the estimation of stereoscopic scene flow, i.e., the motion of points in the three-dimensional world, from stereo image sequences. The proposed algorithm takes image pairs from two consecutive time steps and computes both depth and a 3D motion vector for each point in the image. In contrast to previous work, we partially decouple the depth estimation from the motion estimation, which has many practical advantages. The variational formulation is quite flexible and can handle either sparse or dense disparity maps. The proposed method is very efficient: with the depth map computed on an FPGA and the scene flow on a GPU, the algorithm runs at 20 frames per second on QVGA images (320×240 pixels). Furthermore, we present solutions to two important problems in scene flow estimation: violations of intensity consistency between input images, and uncertainty measures for the scene flow result.
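
The geometry behind the estimated quantity is simple to state: for a rectified rig with focal length f and baseline b, disparity d gives depth Z = f·b/d, and combining two disparity maps with the optical flow yields a per-pixel 3D motion vector. A rough numpy sketch of that back-projection, assuming the principal point at the image origin and nearest-neighbour sampling; this illustrates the output quantity, not the paper's variational solver.

```python
import numpy as np

def scene_flow_from_stereo(u, v, d0, d1, f, baseline):
    """Back-project two consecutive disparity maps d0, d1 and the optical
    flow (u, v) into per-pixel 3D motion vectors (shape H x W x 3)."""
    H, W = d0.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    Z0 = f * baseline / np.maximum(d0, 1e-6)              # depth at time t
    # Depth at t+1, sampled (nearest-neighbour) at the flowed position.
    x1 = np.clip(np.round(xs + u).astype(int), 0, W - 1)
    y1 = np.clip(np.round(ys + v).astype(int), 0, H - 1)
    Z1 = f * baseline / np.maximum(d1[y1, x1], 1e-6)
    X0, Y0 = xs * Z0 / f, ys * Z0 / f                     # 3D point at t
    X1, Y1 = (xs + u) * Z1 / f, (ys + v) * Z1 / f         # 3D point at t+1
    return np.stack([X1 - X0, Y1 - Y0, Z1 - Z0], axis=-1)
```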

5.
Multiview video involves a huge amount of data, so efficiently encoding each view is a critical issue for its wider application. In this paper, a fast motion and disparity estimation algorithm is proposed that exploits the close correlation between temporal and inter-view reference frames. First, a reliable predictor is found according to the correlation of motion and disparity vectors. Second, an iterative search process is carried out to find the optimal motion and disparity vectors. The proposed algorithm reuses the prediction vector obtained in the previous motion estimation for the next disparity estimation and obtains optimal motion and disparity vectors jointly. Experimental results demonstrate that the proposed algorithm saves an average of 86% of the computation time with a negligible quality drop compared with the joint multiview video model (JMVM) full-search algorithm. Furthermore, compared with conventional simulcast coding, the proposed algorithm improves video quality while greatly increasing coding speed.
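
The alternation described above can be sketched as a small fixed-point loop in which each search is seeded by a predictor derived from the other vector until both stabilize. The callables below (search_mv, search_dv, predict) are hypothetical stand-ins for the block-search and predictor steps, not JMVM interfaces.

```python
def joint_motion_disparity(search_mv, search_dv, predict, max_iters=4):
    """Alternating refinement: the motion search is seeded by a predictor
    derived from the current disparity vector and vice versa, iterating
    until both vectors stop changing (joint convergence)."""
    mv, dv = (0, 0), (0, 0)                    # start from zero vectors
    for _ in range(max_iters):
        mv_next = search_mv(predict(dv))       # motion seeded by disparity
        dv_next = search_dv(predict(mv_next))  # disparity seeded by motion
        if (mv_next, dv_next) == (mv, dv):     # both stable: converged
            break
        mv, dv = mv_next, dv_next
    return mv, dv
```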

6.
With the development of multimedia, image and video technology is moving from 2D to 3D, and interactivity is set to become a defining feature of future multimedia technology. Virtual view synthesis, which recreates views of a real scene from virtual viewpoints, is one of the key techniques in interactive 3D video systems. This paper proposes a new virtual view synthesis algorithm based on disparity estimation. Given two rectified reference images taken of the same scene simultaneously, accurate dense disparity maps are obtained first. Image interpolation is then used to synthesize the virtual view image, and reverse mapping is adopted to fill the holes created in the previous step. By defining a position parameter, the algorithm can produce results at an arbitrary viewpoint between the two original views. Experimental results illustrate the superiority of the proposed method.
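
The position parameter enters as a plain scale on disparity: forward-warping the left image by alpha times the disparity places the virtual camera a fraction alpha of the way toward the right view, leaving holes for the reverse-mapping pass to fill. A minimal grayscale sketch, assuming positive left-to-right disparities (the sign convention may differ) and ignoring occlusion ordering.

```python
import numpy as np

def synthesize_view(left, disparity, alpha):
    """Forward-warp a grayscale left image by alpha * disparity to render
    a virtual view at fractional baseline position alpha in [0, 1].
    Returns the warped image and a hole mask marking pixels left for
    reverse mapping from the right view."""
    H, W = disparity.shape
    out = np.zeros_like(left)
    hole = np.ones((H, W), dtype=bool)
    xs = np.arange(W)
    for y in range(H):
        xt = np.round(xs - alpha * disparity[y]).astype(int)  # target columns
        ok = (xt >= 0) & (xt < W)
        out[y, xt[ok]] = left[y, ok]
        hole[y, xt[ok]] = False
    return out, hole
```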

7.
Two novel systems for computing dense three-dimensional (3-D) scene flow and structure from multiview image sequences are described in this paper. We do not assume rigidity of the scene motion, thus allowing for nonrigid motion in the scene. The first system, the integrated model-based system (IMS), assumes that each small local image region undergoes 3-D affine motion. Nonlinear motion-model fitting based on both optical flow constraints and stereo constraints is then carried out on each local region to simultaneously estimate 3-D motion correspondences and structure. The second system, the extended gradient-based system (EGS), is a natural extension of two-dimensional (2-D) optical flow computation. In this method, a new hierarchical rule-based stereo matching algorithm is first developed to estimate the initial disparity map. The different constraints available under a multiview camera setup are further investigated and utilized in the proposed motion estimation. We use image segmentation information to respect and maintain motion and depth discontinuities. Within the EGS framework, we present two different formulations for 3-D scene flow and structure computation: one assumes that the initial disparity map is accurate, while the other does not. Experimental results on both synthetic and real imagery demonstrate the effectiveness of our 3-D motion and structure recovery schemes. An empirical comparison between IMS and EGS is also reported.

8.
In this work, we consider the problem of estimating the 3D position of multiple humans in a scene, as well as their body shape and articulation, from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users, as it enables affordable 3D motion capture that is easy to install and requires no expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. We then introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, their individual shape, and the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and the scene point cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial, and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks, where we consistently outperform previous methods, and we qualitatively demonstrate that our method is robust to in-the-wild conditions, including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human

9.
A new divide-and-conquer technique for disparity estimation is proposed in this paper. The technique performs feature matching following the highest-confidence-first principle, starting with the strongest feature point in the stereo pair of scanlines. Once the first matching pair is established, the ordering constraint in disparity estimation allows the original intra-scanline matching problem to be divided into two smaller subproblems. Each subproblem can then be solved recursively until no reliable feature point remains within the subintervals. This technique is very efficient for dense disparity map estimation on stereo images with rich features. For general scenes, it can be paired with the disparity-space image (DSI) technique to compute dense disparity maps with integrated occlusion detection: the divide-and-conquer part of the algorithm handles the matching of stronger features, while the DSI-based technique handles the matching of pixels between feature points and the detection of occlusions. An extension to the standard disparity-space technique is also presented to complement the divide-and-conquer algorithm. Experiments demonstrate the effectiveness of the proposed divide-and-conquer DSI algorithm.
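
The recursion implied by the ordering constraint is compact: once the most reliable pair (xl, xr) is matched, every remaining match must lie on the same side of it in both scanlines, so the interval splits into two independent subproblems. A sketch with a caller-supplied best_match, a hypothetical highest-confidence feature matcher; pixels between feature points would fall to the DSI stage described above.

```python
def match_scanline(best_match, lo_l, hi_l, lo_r, hi_r, out):
    """Divide-and-conquer intra-scanline matching. best_match(lo_l, hi_l,
    lo_r, hi_r) returns the most reliable feature pair (xl, xr) inside
    the half-open intervals, or None when no reliable feature remains.
    `out` maps left-image columns to disparities."""
    pair = best_match(lo_l, hi_l, lo_r, hi_r)
    if pair is None:                       # no reliable feature left
        return
    xl, xr = pair
    out[xl] = xl - xr                      # disparity of the matched pair
    # Ordering constraint: split into two independent subproblems.
    match_scanline(best_match, lo_l, xl, lo_r, xr, out)
    match_scanline(best_match, xl + 1, hi_l, xr + 1, hi_r, out)
```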

10.
Depth estimation from image pairs acquired with a stereo camera setup is one of the important tasks of stereo vision systems. The disparity between the stereo images allows 3D information to be acquired, which is indispensable in many machine vision applications. Practical stereo vision systems must handle wide ranges of disparity levels. Since disparity map extraction is computationally demanding, real-time FPGA-based algorithms require device resources that grow with the operational disparity range, which leads to significant power consumption. In this paper, a new hardware-efficient real-time disparity map computation module is developed. The module continuously estimates the range of disparity levels actually required for a given stereo image set, keeping this range as low as possible by verging the axes of the stereo cameras. This enables a parallel-pipelined design for the overall module, realized on a single FPGA device of the Altera Stratix IV family. Accurate disparity maps are computed at a rate of more than 320 frames per second for stereo image pairs of 640 × 480 pixels with a disparity range of 80 pixels. The presented technique trades a small amount of accuracy for very high processing speed and scales well with the number of disparity levels, yielding a module that delivers high performance in real-time stereo vision applications where space and power are significant concerns.

11.
Objective: To address the visual discomfort that viewers may experience with stereoscopic content, and motivated by the influence of disparity on visual comfort, this paper proposes a method for improving the visual comfort of stereoscopic images that combines global linear and local nonlinear disparity remapping. Method: First, taking binocular fusion limits and the visual attention mechanism into account, global and local disparity statistics of a stereoscopic image are extracted in combination with spatial-frequency and stereoscopic-saliency factors, and an objective visual comfort prediction model is built with support vector regression to constrain the degree of disparity remapping. The comfort of the input stereoscopic image is then analyzed with this model, and for insufficiently comfortable images a two-stage disparity remapping strategy is applied: a global linear remapping of the disparity range, followed by a local nonlinear remapping of the disparities within the extracted potentially uncomfortable regions. Finally, the comfort-enhanced stereoscopic image is rendered from the remapped disparity map. Results: Experiments on the IVY Lab stereoscopic image comfort database show that, compared with the results of representative comfort-enhancement methods on uncomfortable stereoscopic images, the proposed method improves visual comfort more effectively while preserving the overall sense of depth in the scene. Conclusion: The proposed method automatically performs global linear and local nonlinear disparity remapping according to a visual comfort prediction model built from different stereoscopic image features, improving visual comfort while minimizing the weakening of depth perception caused by disparity changes, and thus enhances the overall 3D experience of stereoscopic images.
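
The first stage (global linear remapping) is straightforward to sketch: the disparity range of an insufficiently comfortable pair is compressed linearly into a target range, which here is assumed to be supplied by the comfort prediction model. The local nonlinear stage and the SVR comfort model themselves are beyond this sketch.

```python
import numpy as np

def linear_remap(disparity, target_min, target_max):
    """Globally and linearly remap a disparity map into the comfort range
    [target_min, target_max]; the bounds are assumed inputs produced by
    the comfort prediction model."""
    d_min, d_max = disparity.min(), disparity.max()
    scale = (target_max - target_min) / max(d_max - d_min, 1e-6)
    return target_min + (disparity - d_min) * scale
```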

12.
Binocular Visual Odometry Based on Disparity Space
A binocular visual odometry algorithm based on disparity space is proposed. Exploiting the scale and rotation invariance of SIFT feature points, accurate feature matching between the left and right images and feature tracking across consecutive frames are achieved. Motion is estimated from the matched points within a RANSAC framework to obtain initial motion parameters, and the disparity ratios of the matched points are then updated iteratively until convergence. To overcome the non-uniform noise distribution in 3D space that afflicts traditional algorithms, motion is estimated in disparity space, where the noise distribution is isotropic, and the global minimum is reached by iteration. Experimental results show that the algorithm achieves better accuracy in motion estimation.
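
The RANSAC initialization follows the standard scheme: minimal three-point samples propose a rigid motion, which is scored by its inlier count. A generic sketch over 3D point correspondences, shown for context; the paper's refinement then iterates on disparity ratios in disparity space, which this sketch does not reproduce.

```python
import numpy as np

def estimate_rigid(A, B):
    """Kabsch: least-squares rigid transform (R, t) mapping points A onto B."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ S @ U.T
    return R, cb - R @ ca

def ransac_rigid(P, Q, iters=200, thresh=0.05):
    """RANSAC over putative 3D correspondences P -> Q: propose a motion
    from a minimal 3-point sample, keep the hypothesis with most inliers."""
    best_n, best_rt = 0, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = estimate_rigid(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)  # residual per point
        n = int((err < thresh).sum())
        if n > best_n:
            best_n, best_rt = n, (R, t)
    return best_rt, best_n
```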

13.
Genetic-Based Stereo Algorithm and Disparity Map Evaluation
In this paper, a new genetic-based stereo algorithm is presented. Our motivation is to improve the accuracy of the disparity map by removing the mismatches caused by both occlusions and false targets. In our approach, the stereo matching problem is treated as an optimization problem. The algorithm first takes advantage of multi-view stereo images to detect occlusions and thereby removes mismatches caused by visibility problems. By optimizing the compatibility between corresponding points and the continuity of the disparity map with a genetic algorithm, mismatches caused by false targets are removed. A quadtree structure is used to implement the multi-resolution framework: since nodes at different levels of the quadtree cover different numbers of pixels, selecting nodes at different levels has an effect similar to adjusting the window size at different locations in the image. The experimental results show that our approach generates more accurate disparity maps than two existing approaches. In addition, we introduce a new disparity map evaluation technique, developed from a similar technique employed in the image segmentation area. Compared with two existing evaluation approaches, the new technique can evaluate disparity maps without additional knowledge of the scene, such as ground-truth depth information or novel views.
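
The objective the genetic algorithm optimizes combines exactly the two terms named above: photometric compatibility of corresponding points and continuity of the disparity map. A per-scanline fitness sketch with an assumed smoothness weight lam; the chromosome encoding, quadtree nodes, and GA operators are omitted.

```python
import numpy as np

def fitness(disparity_row, cost_volume_row, lam=0.1):
    """Fitness of one candidate disparity assignment (a GA chromosome)
    for a scanline: matching cost plus a disparity-continuity penalty.
    cost_volume_row has shape (W, D); disparity_row holds W integers."""
    data = cost_volume_row[np.arange(len(disparity_row)), disparity_row].sum()
    smooth = np.abs(np.diff(disparity_row)).sum()   # continuity term
    return -(data + lam * smooth)                   # higher = lower energy
```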

14.
Scene flow provides the 3D motion field of the point cloud corresponding to the image pixels. Current algorithms usually require complex stereo calibration before estimating flow, which places strong restrictions on camera placement. This paper proposes a scene flow estimation algorithm for a monocular camera. First, an energy functional is constructed in which three assumptions are turned into data terms: brightness constancy, gradient constancy, and short-time constancy of object velocity; two smoothness operators serve as regularization terms. An occlusion-map computation algorithm then ensures that scene flow is estimated only at unoccluded points. Finally, the energy functional is minimized with a coarse-to-fine variational scheme on a Gaussian pyramid, which helps prevent the iteration from converging to a poor local minimum. Experimental results show that the algorithm can recover scene flow in world coordinates from as few as three sequential frames, without optical flow or disparity as input.
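
The three data terms fit the generic robust variational template; a plausible reconstruction of such an energy is shown below, where Ψ is a robust penalty, u_prev is the flow of the previous frame pair (short-time velocity constancy), and α, β, γ are assumed weights. Only one generic smoothness operator is written out; the abstract does not specify the paper's exact two operators.

```latex
E(\mathbf{u}) \;=\; \int_{\Omega}
    \Psi\!\bigl(\lvert I_{1}(\mathbf{x}+\mathbf{u}) - I_{0}(\mathbf{x})\rvert^{2}\bigr)
  + \gamma\,\Psi\!\bigl(\lvert \nabla I_{1}(\mathbf{x}+\mathbf{u}) - \nabla I_{0}(\mathbf{x})\rvert^{2}\bigr)
  + \beta\,\Psi\!\bigl(\lvert \mathbf{u} - \mathbf{u}_{\mathrm{prev}}\rvert^{2}\bigr)
  + \alpha\,\Psi\!\bigl(\lvert \nabla \mathbf{u}\rvert^{2}\bigr)\; d\mathbf{x}
```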

15.
Owing to its generality and efficiency, Cascaded Shadow Maps (CSMs) plays an important role in real-time shadow rendering for large-scale and complex virtual environments. However, CSMs suffers from a redundant rendering problem: objects are undesirably rendered into multiple shadow map textures when the view direction and light direction are not perpendicular. In this paper, we present a light-space cascaded shadow maps algorithm. The algorithm splits a scene into non-intersecting layers in light space and generates one shadow map for each layer through irregular frustum clipping and scene organization, ensuring that no shadow sample point ever appears in multiple shadow maps. A succinct shadow determination method is given for choosing the optimal shadow map when rendering the scene. We also combine the algorithm with stable cascaded shadow maps and a soft shadow algorithm to avoid shadow flickering and to produce soft shadows. The results show that the algorithm effectively improves the efficiency and shadow quality of CSMs by avoiding redundant rendering, and can deliver high-quality shadow rendering in large-scale dynamic environments at real-time rates.
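
For context, conventional CSM variants choose split distances by blending logarithmic and uniform partitions of the view range; the paper's contribution is to split in light space instead, so the scheme below is the common baseline rather than the proposed algorithm.

```python
import numpy as np

def csm_split_distances(near, far, num_cascades, lam=0.75):
    """Practical split scheme widely used with cascaded shadow maps:
    blend logarithmic and uniform splits of [near, far]; lam controls
    the blend. Returns num_cascades + 1 split distances."""
    i = np.arange(num_cascades + 1)
    log_split = near * (far / near) ** (i / num_cascades)   # logarithmic
    uni_split = near + (far - near) * i / num_cascades      # uniform
    return lam * log_split + (1.0 - lam) * uni_split
```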

16.
Objective: Binocular vision is a good solution to the problem of estimating the distance to a target. Existing binocular distance estimation methods suffer from either low accuracy or laborious data preparation, so an algorithm is needed that balances accuracy with convenient data preparation. Method: A network based on the R-CNN (region convolutional neural network) architecture is proposed that performs object detection and distance estimation simultaneously. After a binocular image pair is fed into the network, features are extracted by a backbone, and a binocular region proposal network simultaneously yields bounding boxes of the same object in the left and right images; the local features inside each pair of object boxes are passed to an object disparity estimation branch to estimate the object's distance. To obtain the paired boxes, the binocular region proposal network replaces the original proposal network, and a binocular bounding-box branch is proposed to regress the two boxes jointly. To improve disparity accuracy, a disparity estimation branch based on group-wise correlation and 3D convolution is proposed, drawing on the structure of binocular disparity-map estimation networks. Results: Validation experiments on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset show that the proposed algorithm achieves an average relative error of about 3.2%, far below that of a disparity-map-based algorithm (11.3%) and close to that of a 3D-object-detection-based algorithm (about 3.9%). The proposed disparity-branch improvements clearly raise accuracy, lowering the average relative error from 5.1% to 3.2%. Similar experiments on a separately collected and annotated pedestrian surveillance dataset give an average relative error of about 4.6%, indicating that the method applies effectively to surveillance scenes. Conclusion: The proposed binocular distance estimation network combines the strengths of object detection and binocular disparity estimation and achieves high accuracy. It can be applied to vehicle-mounted cameras and surveillance scenes, and is promising for other settings equipped with binocular cameras.
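
The group-wise correlation used by the disparity branch follows the usual construction: the feature channels are split into groups, and each group contributes one inner-product similarity per candidate disparity. A numpy sketch of that general construction; the tensor shapes and the zeroing of off-image columns are illustrative choices, and the paper builds this inside an R-CNN head with 3D convolutions on top.

```python
import numpy as np

def group_correlation(feat_l, feat_r, num_groups, max_disp):
    """Group-wise correlation cost volume. feat_l, feat_r have shape
    (C, H, W) with C divisible by num_groups; the output has shape
    (num_groups, max_disp, H, W)."""
    C, H, W = feat_l.shape
    g = C // num_groups
    fl = feat_l.reshape(num_groups, g, H, W)
    volume = np.zeros((num_groups, max_disp, H, W), dtype=feat_l.dtype)
    for d in range(max_disp):
        # Shift the right features by d so column x aligns with x - d.
        fr = np.roll(feat_r, d, axis=2).reshape(num_groups, g, H, W)
        volume[:, d] = (fl * fr).mean(axis=1)   # per-group inner product
        volume[:, d, :, :d] = 0                 # wrapped columns are invalid
    return volume
```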

17.
On the Geometry of Visual Correspondence
Image displacement fields (optical flow fields, stereo disparity fields, normal flow fields) due to rigid motion possess a global geometric structure that is independent of the scene in view. Motion vectors of certain lengths and directions are constrained to lie on the imaging surface at particular loci whose location and form depend solely on the 3D motion parameters. For optical flow fields or stereo disparity fields, equal vectors are shown to lie on conic sections; similarly, for normal motion fields, equal vectors lie within regions whose boundaries also constitute conics. By studying various properties of these curves and regions and their relationships, a characterization of the structure of rigid motion fields is given. The goal of this paper is to introduce a concept underlying the global structure of image displacement fields, which gives rise to various constraints that could form the basis of algorithms for recovering visual information from multiple views.

18.
As an observer moves and explores the environment, the visual stimulation at the eye is constantly changing, yet the observer is able to perceive the spatial layout of the scene and to discern his or her own movement through space. Computational vision researchers have been trying to solve this problem for a number of years with only limited success. It is a difficult problem because the optical flow field is nonlinearly related to the 3D motion and depth parameters.

Here, we show that the nonlinear equation describing the optical flow field can be split, by an exact algebraic manipulation, into three sets of equations. The first set relates the flow field to only the translational component of 3D motion, so depth and rotation need not be known or estimated prior to solving for translation. Once the translation has been recovered, the second set of equations can be used to solve for rotation. Finally, depth can be estimated with the third set of equations, given the recovered translation and rotation.

The algorithm applies to the general case of arbitrary motion with respect to an arbitrary scene. It is simple to compute and biologically plausible. The results reported in this article demonstrate the potential of our new approach and show that it performs favorably when compared with two other well-known algorithms.
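
The split rests on the standard instantaneous rigid-motion flow model, in which depth scales only the translational term; writing it out (pin-hole camera with focal length f, image point (x, y), translation t, rotation ω, up to sign conventions) makes clear why translation can be isolated before rotation and depth:

```latex
\dot{\mathbf{p}}
  \;=\; \frac{1}{Z(\mathbf{p})}\,A(\mathbf{p})\,\mathbf{t}
  \;+\; B(\mathbf{p})\,\boldsymbol{\omega},
\qquad
A(\mathbf{p}) = \begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix},
\qquad
B(\mathbf{p}) = \begin{pmatrix}
  \frac{xy}{f} & -\left(f + \frac{x^{2}}{f}\right) & y \\[4pt]
  f + \frac{y^{2}}{f} & -\frac{xy}{f} & -x
\end{pmatrix}
```

Because the unknown depth Z multiplies only the translational term A(p)t, eliminating it pixel-wise removes depth from the constraints, which is the starting point of the algebraic split the abstract describes.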
