Similar Documents
20 similar documents found
1.
Extracting View-Dependent Depth Maps from a Collection of Images
Stereo correspondence algorithms typically produce a single depth map. In addition to the usual problems of occlusions and textureless regions, such algorithms cannot model the variation in scene or object appearance with respect to the viewing position. In this paper, we propose a new representation that overcomes the appearance variation problem associated with an image sequence. Rather than estimating a single depth map, we associate a depth map with each input image (or a subset of them). Our representation is motivated by applications such as view interpolation and depth-based segmentation for model-building or layer extraction. We describe two approaches to extract such a representation from a sequence of images. The first approach, which is more classical, computes the local depth map associated with each chosen reference frame independently. The novelty of this approach lies in its combination of shiftable windows, temporal selection, and graph cut optimization. The second approach simultaneously optimizes a set of self-consistent depth maps at multiple key-frames. Since multiple depth maps are estimated simultaneously, visibility can be modeled explicitly and disparity consistency imposed across the different depth maps. Results, which include a difficult specular scene example, show the effectiveness of our approach.
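A minimal NumPy sketch may make the first approach's cost computation concrete. All names here are our own; the shiftable window is approximated by a min-filter over the aggregated cost, and the paper's graph-cut optimization is replaced by simple winner-take-all:

```python
import numpy as np
from scipy.ndimage import uniform_filter, minimum_filter

def depth_map_wta(ref, neighbors, max_disp=32, window=5):
    """Disparity for one reference frame by winner-take-all over a cost
    volume (hypothetical names; the paper uses graph cuts instead)."""
    volume = np.empty((max_disp,) + ref.shape)
    for d in range(max_disp):
        costs = []
        for nb in neighbors:
            shifted = np.roll(nb, d, axis=1)           # hypothesize disparity d
            sad = np.abs(ref.astype(float) - shifted)  # per-pixel matching cost
            costs.append(uniform_filter(sad, window))  # window aggregation
        # temporal selection: keep the best-matching neighbor per pixel,
        # so occlusion in one neighboring view does not ruin the cost
        best = np.min(costs, axis=0)
        volume[d] = minimum_filter(best, window)       # shiftable-window proxy
    return volume.argmin(axis=0)                       # winner-take-all depth
```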

2.
We present an integrated, fully GPU-based processing pipeline to interactively render new views of arbitrary scenes from calibrated but otherwise unstructured input views. In a two-step procedure, our method first generates for each input view a dense proxy of the scene using a new multi-view stereo formulation. Each scene proxy consists of a structured cloud of feature-aware particles which automatically have their image space footprints aligned to depth discontinuities of the scene geometry and hence effectively handle sharp object boundaries and occlusions. We propose a particle optimization routine combined with a special parameterization of the view space that enables an efficient proxy generation as well as robust and intuitive filter operators for noise and outlier removal. Moreover, our generic proxy generation allows us to flexibly handle scene complexities ranging from small objects up to complete outdoor scenes. The second phase of the algorithm combines these particle clouds in real-time into a view-dependent proxy for the desired output view and performs a pixel-accurate accumulation of the colour contributions from each available input view. This makes it possible to reconstruct even fine-scale view-dependent illumination effects. We demonstrate how all these processing stages of the pipeline can be implemented entirely on the GPU with memory efficient, scalable data structures for maximum performance. This allows us to generate new output renderings of high visual quality from input images in real-time.

3.
Crowded motions refer to multiple objects moving around and interacting, such as crowds and pedestrians. We capture crowded scenes using a depth scanner at video frame rates. Thus, our input is a set of depth frames which sample the scene over time. Processing such data is challenging as it is highly unorganized, with large spatio-temporal holes due to many occlusions. As no correspondence is given, locally tracking 3D points across frames is hard due to noise and missing regions. Furthermore, global segmentation and motion completion in the presence of large occlusions is ambiguous and hard to predict. Our algorithm utilizes the Gestalt principles of common fate and good continuity to compute motion tracking and completion, respectively. Our technique does not assume any pre-given markers or motion template priors. Our key idea is to reduce the motion completion problem to a 1D curve fitting and matching problem which can be solved efficiently using a global optimization scheme. We demonstrate our segmentation and completion method on a variety of synthetic and real world crowded scanned scenes.
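As a rough, hypothetical illustration of reducing motion completion to 1D curve fitting (the paper's global optimization and matching steps are omitted), each coordinate of a partially observed trajectory can be fit independently and evaluated at the occluded frames:

```python
import numpy as np

def complete_trajectory(times, points, all_times, deg=3):
    """Fill occluded frames of one tracked 3D point by fitting each
    coordinate with a low-degree polynomial over the observed frames.
    times:     (k,) frame indices where the point was observed
    points:    (k, 3) observed 3D positions
    all_times: (n,) all frame indices, including occluded ones"""
    completed = np.empty((len(all_times), 3))
    for axis in range(3):                 # fit x(t), y(t), z(t) independently
        coeffs = np.polyfit(times, points[:, axis], deg)
        completed[:, axis] = np.polyval(coeffs, all_times)
    observed = np.isin(all_times, times)  # keep the actual measurements
    completed[observed] = points
    return completed
```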

4.
Objective: By capturing the spatial and angular information of a scene in a single exposure, a light field camera yields multi-view images and refocused images, giving it a unique advantage in depth estimation. Occlusion is one of the difficult problems in light field depth estimation: existing methods either ignore occlusion or consider only single-occluder cases, and they fail for scene points with multiple occluders. To address occlusion, we propose an occlusion-robust light field depth estimation algorithm within a multi-view stereo matching framework. Method: First, refocused images are obtained with a digital refocusing algorithm, the occlusion types of the scene are defined, and correlation cost volumes are constructed. Then the best cost volume is selected adaptively according to the minimum-cost principle, and a local depth map is computed. Finally, a Markov random field combines the cost volume with a smoothness constraint, and a globally optimized depth map is obtained through graph cuts and weighted median filtering, improving depth estimation accuracy. Results: Experiments on the HCI synthetic dataset and the Stanford Lytro Illum real-scene dataset cover both local and global depth estimation. The results show that, compared with other state-of-the-art methods, our method performs better on occluded scenes, reducing the mean squared error by about 26.8% on average. Conclusion: The method handles different occlusion cases effectively, preserves depth map edges better, gives more accurate depth estimates, and runs faster. It is, however, intended for Lambertian scenes and has limitations in non-Lambertian scenes containing specular highlights.
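A hedged sketch of what an occlusion-aware photo-consistency cost can look like in this framework; the split of the angular patch into two halves is our simplification, not the paper's exact occlusion model:

```python
import numpy as np

def occlusion_aware_cost(lf, d):
    """Photo-consistency cost at candidate disparity d for a light field
    lf[u, v, y, x]: shear the sub-aperture views to refocus, then score
    two halves of the angular patch separately and keep the lower
    variance, so views blocked by an occluder need not corrupt the cost."""
    U, V, H, W = lf.shape
    cu, cv = U // 2, V // 2
    refocused = np.empty(lf.shape, dtype=float)
    for u in range(U):
        for v in range(V):
            # shift each sub-aperture view toward the central view
            dy = int(round(d * (u - cu)))
            dx = int(round(d * (v - cv)))
            refocused[u, v] = np.roll(lf[u, v], (dy, dx), axis=(0, 1))
    patch = refocused.reshape(U * V, H, W)
    half = U * V // 2                     # crude split of the angular patch
    return np.minimum(patch[:half].var(axis=0), patch[half:].var(axis=0))
```

Evaluating this over a range of candidate disparities and taking the per-pixel argmin gives a local depth map, which the abstract's MRF stage would then refine globally.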

5.
A compact visual representation, called the 3D layered, adaptive-resolution, and multi-perspective panorama (LAMP), is proposed for representing large-scale 3D scenes with large variations of depths and obvious occlusions. Two kinds of 3D LAMP representations are proposed: the relief-like LAMP and the image-based LAMP. Both types of LAMPs concisely represent almost all the information from a long image sequence. Methods to construct LAMP representations from video sequences with dominant translation are provided. The relief-like LAMP is basically a single extended multi-perspective panoramic view image. Each pixel has a pair of texture and depth values, but each pixel may also have multiple pairs of texture-depth values to represent occlusion in layers, in addition to adaptive resolution changing with depth. The image-based LAMP, on the other hand, consists of a set of multi-perspective layers, each of which has a pair of 2D texture and depth maps, but with adaptive time-sampling scales depending on depths of scene points. Several examples of 3D LAMP construction for real image sequences are given. The 3D LAMP is a concise and powerful representation for image-based rendering.

6.
Rational Filters for Passive Depth from Defocus
A fundamental problem in depth from defocus is the measurement of relative defocus between images. The performance of previously proposed focus operators is inevitably sensitive to the frequency spectra of local scene textures. As a result, focus operators such as the Laplacian of Gaussian produce poor depth estimates. An alternative is to use large filter banks that densely sample the frequency space. Though this approach can result in better depth accuracy, it sacrifices the computational efficiency that depth from defocus offers over stereo and structure from motion. We propose a class of broadband operators that, when used together, provide invariance to scene texture and produce accurate and dense depth maps. Since the operators are broadband, a small number of them are sufficient for depth estimation of scenes with complex textural properties. In addition, a depth confidence measure is derived that can be computed from the outputs of the operators. This confidence measure permits further refinement of computed depth maps. Experiments are conducted on both synthetic and real scenes to evaluate the performance of the proposed operators. The depth detection gain error is less than 1%, irrespective of texture frequency. Depth accuracy is found to be 0.5-1.2% of the distance of the object from the imaging optics.
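The following sketch illustrates the normalized-ratio idea behind such broadband operators; the function name and the specific band-pass kernel are our choices, not the paper's exact rational filters. A band-pass response of the difference of the two defocused images, normalized by the response of their sum, varies with relative defocus while largely canceling the dependence on texture magnitude:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, gaussian_filter

def normalized_defocus_ratio(i_near, i_far, sigma=2.0):
    """Texture-normalized relative-defocus measure from two images
    focused at different distances (an M/P-style ratio)."""
    a = i_near.astype(float)
    b = i_far.astype(float)
    m = gaussian_laplace(a - b, sigma)   # difference channel
    p = gaussian_laplace(a + b, sigma)   # sum (normalization) channel
    ratio = m / (p + 1e-6)               # cancels local texture magnitude
    # smoothing suppresses noise; |ratio| can double as a confidence cue
    return gaussian_filter(ratio, sigma)
```

After calibration against targets at known distances, this ratio maps monotonically to depth.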

7.
On-the-Fly Processing of Generalized Lumigraphs
We introduce a flexible and powerful concept for reconstructing arbitrary views from multiple source images on the fly. Our approach is based on a Lumigraph structure with per-pixel depth values, and generalizes the classical two-plane parameterized light fields and Lumigraphs. With our technique, it is possible to render arbitrary views of time-varying, non-diffuse scenes at interactive frame rates, and it allows using any kind of sensor that yields images with dense depth information. We demonstrate the flexibility and efficiency of our approach through various examples.

8.
In this paper, we introduce a novel technique for pre-filtering multi-layer shadow maps. The occluders in the scene are stored as variable-length lists of fragments for each texel. We show how this representation can be filtered by progressively merging these lists. In contrast to previous pre-filtering techniques, our method better captures the distribution of depth values, resulting in a much higher shadow quality for overlapping occluders and occluders with different depths. The pre-filtered maps are generated and evaluated directly on the GPU, and provide efficient queries for shadow tests with arbitrary filter sizes. Accurate soft shadows are rendered in real-time even for complex scenes and difficult setups. Our results demonstrate that our pre-filtered maps are general and particularly scalable.
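A small Python sketch of the progressive list-merging idea, as a CPU analogue of the GPU implementation; the (depth, weight) fragment format and the capacity limit are our assumptions:

```python
def merge_fragment_lists(a, b, max_len=8):
    """Pre-filter one step of a multi-layer shadow map: merge two texels'
    fragment lists (sorted (depth, weight) pairs), re-normalize the
    weights, and clamp the length by dropping the lightest fragments.
    This preserves the depth distribution far better than averaging a
    single depth per texel."""
    merged = sorted(a + b)                        # keep fragments depth-ordered
    total = sum(w for _, w in merged) or 1.0
    merged = [(d, w / total) for d, w in merged]
    if len(merged) > max_len:                     # bounded storage per texel
        keep = sorted(merged, key=lambda f: -f[1])[:max_len]
        merged = sorted(keep)
    return merged

def shadow_test(fragments, receiver_depth):
    """Fraction of light blocked: total weight of fragments closer to
    the light than the receiver."""
    return sum(w for d, w in fragments if d < receiver_depth)
```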

9.
Research on Light-Field-Based Rendering Technology
Image-based rendering (IBR) has gradually become one of the main approaches to scene rendering, and a large number of methods based on it have been proposed. Among them, light field rendering (LFR) requires neither depth information nor inter-image correspondence: the scene is captured as an input image set by a camera array, or by a single camera moved along a pre-designed path, and for any given new viewpoint, the view at that point is obtained by simply resampling a few nearby samples. This paper designs and implements a light field rendering software system using the two-plane parameterization, and uses it to achieve real-time walkthroughs of objects in real scenes.
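For readers unfamiliar with the two-plane parameterization, here is a minimal lookup sketch (names are our own): bilinear interpolation on the camera plane and, for brevity, nearest-neighbor sampling on the focal plane; a full implementation would interpolate quadrilinearly over all 16 neighboring samples:

```python
import numpy as np

def sample_light_field(lf, u, v, s, t):
    """Color of one ray from a two-plane light field lf[u, v, s, t, 3]:
    (u, v) is the ray's intersection with the camera plane, (s, t) with
    the focal plane. Indices must lie strictly inside the array."""
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    si, ti = int(round(s)), int(round(t))
    return ((1 - du) * (1 - dv) * lf[u0,     v0,     si, ti]
            + du * (1 - dv)     * lf[u0 + 1, v0,     si, ti]
            + (1 - du) * dv     * lf[u0,     v0 + 1, si, ti]
            + du * dv           * lf[u0 + 1, v0 + 1, si, ti])
```

Rendering a novel view amounts to intersecting each of its pixel rays with the two planes and performing this lookup.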

10.
11.
12.
Wang Wei, Yu Miao, Hu Zhanyi. Acta Automatica Sinica, 2014, 40(12): 2782-2796
We propose a high-accuracy dense depth map estimation algorithm based on match diffusion. The algorithm consists of two match diffusion stages, at the pixel level and at the region level. The former diffuses sparse feature matches between views to obtain relatively dense initial depth maps, while the latter, starting from multiple initial depth maps and under the assumption that scenes are piecewise smooth, infers depth in regions suffering from matching ambiguity (such as weakly textured regions) using plane fitting and multi-directional plane sweeping within an energy minimization framework. Experiments on standard and real-world datasets show that the algorithm adapts well to illumination changes and perspective distortion across views and effectively infers depth in weakly textured regions, yielding high-accuracy, dense depth maps.
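A hypothetical sketch of the pixel-level diffusion stage: best-first propagation of seed matches to neighboring pixels, scored by zero-mean normalized cross-correlation. The threshold and the constant-disparity propagation are our simplifications:

```python
import heapq
import numpy as np

def diffuse_matches(ref, tgt, seeds, win=2, zncc_min=0.6):
    """Grow sparse seed correspondences into a dense disparity map.
    seeds: iterable of (y, x, d) with ref[y, x] matching tgt[y, x - d];
    seed windows are assumed to lie inside both images."""
    h, w = ref.shape
    disp = np.full((h, w), -1, dtype=int)

    def zncc(y0, x0, y1, x1):
        a = ref[y0-win:y0+win+1, x0-win:x0+win+1].astype(float).ravel()
        b = tgt[y1-win:y1+win+1, x1-win:x1+win+1].astype(float).ravel()
        a, b = a - a.mean(), b - b.mean()
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return (a @ b) / n if n else -1.0

    heap = [(-zncc(y, x, y, x - d), y, x, d) for (y, x, d) in seeds]
    heapq.heapify(heap)
    while heap:                                   # best-first expansion
        _, y, x, d = heapq.heappop(heap)
        if disp[y, x] != -1:
            continue
        disp[y, x] = d
        for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
            if win <= ny < h-win and win+d <= nx < w-win and disp[ny, nx] == -1:
                s = zncc(ny, nx, ny, nx - d)
                if s >= zncc_min:                 # accept reliable extensions
                    heapq.heappush(heap, (-s, ny, nx, d))
    return disp
```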

13.
Advancements in three-dimensional (3D) display technology have led to wide interest in light-field displays. However, the need to simultaneously capture a large number of object views has made content generation for light-field displays a bottleneck. In this paper, we propose a method for light-field content generation based on a plane-depth-fused sweep volume (PDFSV), focusing on handling wide-baseline views and exhibiting scene generalization when the camera array remains unchanged. Specifically, the proposed PDFSV exploits the prior depth of the images captured by a 4 × 4 spherical camera array to represent the 3D information of scenes. Two optimized sequential convolutional neural networks (CNNs) are then employed for implicit depth modeling and final color calculation, respectively. In this way, the prior depth facilitates the synthesis of regions with complex textures in the target view. We produce a Wide-baseline Multi-view Image Set (WMIS) with a field-of-view (FOV) angle reaching 54°, which we make publicly available. In our experiments, we use only the 4 vertex views as input. Results demonstrate that the proposed approach can synthesize high-quality views at arbitrary positions between sparse views, outperforming existing neural-radiance-fields-based (NeRF-based) methods. Finally, we conduct autostereoscopic display experiments, achieving satisfactory results.

14.
Existing deep-learning-based saliency detection algorithms are mainly designed for two-dimensional RGB images and fail to exploit the three-dimensional visual information of the scene, while current light field saliency detection methods are mostly hand-crafted, with insufficient feature representation power; as a result, these methods perform poorly on various challenging natural scene images. We propose a multi-modal, multi-level feature refinement and fusion network based on convolutional neural networks that exploits the rich visual information in light field images to achieve accurate saliency detection for four-dimensional light field images. To fully exploit the 3D visual information, two parallel sub-networks process the all-in-focus image and the depth map respectively. On this basis, a cross-modal feature aggregation module aggregates multi-level visual features across three modalities, the all-in-focus image, the focal stack sequence, and the depth map, to highlight salient objects in the scene more effectively. Comparative experiments on the DUTLF-FS and HFUT-Lytro light field benchmark datasets show that the algorithm outperforms mainstream salient object detection algorithms such as MOLF, AFNet, and DMRA on all five authoritative evaluation metrics.
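As a loose, hypothetical sketch (layer sizes and the gating design are our assumptions, not the paper's architecture), a cross-modal aggregation block in PyTorch might look like:

```python
import torch
import torch.nn as nn

class CrossModalAggregation(nn.Module):
    """Fuse same-level features from the all-in-focus, focal-stack and
    depth branches: concatenate, mix with a 1x1 convolution, then
    re-weight channels with a squeeze-style attention gate."""
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(3 * ch, ch, 1),
                                  nn.ReLU(inplace=True))
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1),
                                  nn.Sigmoid())

    def forward(self, f_rgb, f_stack, f_depth):
        fused = self.fuse(torch.cat([f_rgb, f_stack, f_depth], dim=1))
        return fused * self.gate(fused)   # channel-wise re-weighting
```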

15.
3D video billboard clouds reconstruct and represent a dynamic three-dimensional scene using displacement-mapped billboards. They consist of geometric proxy planes augmented with detailed displacement maps and combine the generality of geometry-based 3D video with the regularization properties of image-based 3D video. 3D video billboards are an image-based representation placed in the disparity space of the acquisition cameras and thus provide a regular sampling of the scene with a uniform error model. We propose a general geometry filtering framework which generates time-coherent models and removes reconstruction and quantization noise as well as calibration errors. This replaces the complex and time-consuming sub-pixel matching process in stereo reconstruction with a bilateral filter. Rendering is performed using a GPU-accelerated algorithm which generates consistent view-dependent geometry and textures for each individual frame. In addition, we present a semi-automatic approach for modeling dynamic three-dimensional scenes with a set of multiple 3D video billboard clouds.
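A simple sketch of the bilateral-filter regularization idea, written here as a joint bilateral filter that smooths a noisy depth/displacement map guided by image intensity; the parameter names and the choice of guidance signal are our assumptions:

```python
import numpy as np

def bilateral_depth_filter(depth, guide, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Smooth depth without blurring across object edges: weights combine
    spatial proximity with similarity of the guide image's intensity."""
    h, w = depth.shape
    out = np.zeros((h, w))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    dp = np.pad(depth.astype(float), radius, mode='edge')
    gp = np.pad(guide.astype(float), radius, mode='edge')
    for y in range(h):
        for x in range(w):
            dwin = dp[y:y + 2*radius + 1, x:x + 2*radius + 1]
            gwin = gp[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-(gwin - guide[y, x])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = (wgt * dwin).sum() / wgt.sum()
    return out
```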

16.
Depth and visual hulls are useful for quick reconstruction and rendering of a 3D object based on a number of reference views. However, for many scenes, especially multi-object, these hulls may contain significant artifacts known as phantom geometry. In depth hulls the phantom geometry appears behind the scene objects in regions occluded from all the reference views. In visual hulls the phantom geometry may also appear in front of the objects because there is not enough information to unambiguously imply the object positions. In this work we identify which parts of the depth and visual hull might constitute phantom geometry. We define the notion of reduced depth hull and reduced visual hull as the parts of the corresponding hull that are phantom-free. We analyze the role of the depth information in identification of the phantom geometry. Based on this, we provide an algorithm for rendering the reduced depth hull at interactive frame-rates and suggest an approach for rendering the reduced visual hull. The rendering algorithms take advantage of modern GPU programming techniques. Our techniques bypass explicit reconstruction of the hulls, rendering the reduced depth or visual hull directly from the reference views.
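For context, a minimal voxel-carving sketch of the (unreduced) visual hull; voxels that pass the silhouette test in every view yet correspond to no real object are exactly the phantom geometry the paper sets out to remove:

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """A voxel belongs to the visual hull iff it projects inside the
    silhouette in every reference view.
    silhouettes: list of (h, w) boolean masks
    projections: list of 3x4 camera projection matrices
    grid:        (n, 3) voxel centers in world coordinates"""
    hom = np.hstack([grid, np.ones((len(grid), 1))])   # homogeneous coords
    inside = np.ones(len(grid), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        p = hom @ P.T                                  # project all voxels
        x = (p[:, 0] / p[:, 2]).round().astype(int)
        y = (p[:, 1] / p[:, 2]).round().astype(int)
        ok = (0 <= x) & (x < mask.shape[1]) & (0 <= y) & (y < mask.shape[0])
        inside &= ok                                   # off-image = carved
        inside[ok] &= mask[y[ok], x[ok]]               # silhouette test
    return grid[inside]
```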

17.
Objective: Methods that use a motion history point cloud (MHPC) for human action recognition suffer from high computational complexity in feature extraction because of the large volume of point cloud data, while methods based on depth motion maps (DMM) extract features easily but capture incomplete motion information, limiting the achievable recognition accuracy. To address these problems, we propose a human action recognition algorithm based on multi-view depth motion maps. Method: First, an MHPC is generated from the depth image sequence to represent the action, and the MHPC is rotated by specific angles to supplement motion information from additional viewpoints. The original and rotated MHPCs are then projected onto the Cartesian coordinate planes to generate multi-view depth motion maps, from which histograms of oriented gradients are extracted and concatenated into a feature vector. Finally, a support vector machine classifies the feature vectors. The algorithm is validated on the MSR Action3D dataset and a self-built database. Results: MSR Action3D has two experimental settings. Under setting 1, the recognition rate is 96.8%, which is 2.5% higher than the APS_PHOG (axonometric projections and PHOG feature) algorithm, 1.9% higher than DMM, and 1.1% higher than DMM_CRC (depth motion maps and collaborative representation classifier). Under setting 2, the recognition rate is 93.82%, which is 5.09% higher than DMM and 4.93% higher than HON4D (histogram of oriented 4D surface normals). On the self-built database the recognition rate reaches 97.98%, 3.98% higher than MHPC. Conclusion: The results show that multi-view depth motion maps not only avoid the costly feature extraction of MHPC but also enrich DMM with motion information from more viewpoints, effectively improving human action recognition accuracy.
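A minimal sketch of the underlying DMM computation for a single projection view; the multi-view variant would first rotate the point cloud and re-project before accumulating:

```python
import numpy as np

def depth_motion_map(depth_seq):
    """Accumulate absolute frame-to-frame depth differences over a
    sequence, so pixels swept by body motion accumulate energy.
    depth_seq: (n_frames, h, w) depth images of one projection view"""
    diffs = np.abs(np.diff(depth_seq.astype(float), axis=0))
    dmm = diffs.sum(axis=0)
    peak = dmm.max()
    return dmm / peak if peak > 0 else dmm   # normalize before HOG extraction
```

HOG descriptors of several such maps (original and rotated views) are then concatenated and classified with an SVM, as the abstract describes.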

18.
A vision-based 3-D scene analysis system is described that is capable of modeling complex real-world scenes such as streets and buildings automatically from stereoscopic image pairs. Input to the system is a sequence of stereoscopic images taken with two standard CCD cameras and TV lenses. The relative orientation of the two cameras to each other is known by calibration. The camera pair is then moved throughout the scene and a long sequence of closely spaced views is recorded. Each of the stereoscopic image pairs is rectified, and a dense map of 3-D surface points is obtained by area correlation, object segmentation, interpolation, and triangulation. 3-D camera motion relative to the scene coordinate system is tracked directly from the image sequence, which allows 3-D surface measurements from different viewpoints to be fused into a consistent 3-D model scene. The surface geometry of each scene object is approximated by a triangular surface mesh which stores the surface texture in a texture map. From the textured 3-D models, realistic-looking image sequences from arbitrary viewpoints can be synthesized using computer graphics.
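As an illustration of the triangulation step for a rectified, calibrated pair (symbols are the usual pinhole quantities, not the paper's notation):

```python
import numpy as np

def disparity_to_points(disparity, f, baseline, cx, cy):
    """Back-project a dense disparity map to 3-D surface points for a
    rectified stereo pair: Z = f * B / d, then X, Y via the pinhole model.
    disparity: (h, w) disparities in pixels (0 marks no match)"""
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    valid = disparity > 0
    z = np.zeros((h, w))
    z[valid] = f * baseline / disparity[valid]   # depth from disparity
    x = (xs - cx) * z / f                        # back-project to camera frame
    y = (ys - cy) * z / f
    return np.dstack([x, y, z])[valid]           # (n, 3) surface points
```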

19.
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multi-view approach to solving this problem. In our approach we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all the cameras into a synergistic framework, and detection and tracking results are propagated back to each view. Unlike other multi-view approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior to assign higher fusion weight to views with less clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results, with detailed qualitative and quantitative analysis, are demonstrated in challenging multi-view, crowded scenes.
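A hedged sketch of the homographic occupancy fusion for a single reference plane, using OpenCV for the warps; the clutter-based view weighting and the multi-plane extension described in the abstract are omitted here:

```python
import cv2
import numpy as np

def ground_plane_occupancy(fg_likelihoods, homographies, size):
    """Warp each view's foreground-likelihood map onto the reference
    ground plane and multiply: only locations where every view sees
    foreground at the plane (e.g., feet on the ground) stay strong.
    fg_likelihoods: list of (h, w) float maps in [0, 1]
    homographies:   list of 3x3 view-to-plane homographies
    size:           (width, height) of the plane grid"""
    occupancy = np.ones((size[1], size[0]), dtype=np.float32)
    for lik, H in zip(fg_likelihoods, homographies):
        warped = cv2.warpPerspective(lik.astype(np.float32), H, size)
        occupancy *= warped                     # fuse evidence across views
    return occupancy
```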

20.
We present an image-based rendering system to viewpoint-navigate through space and time of complex real-world, dynamic scenes. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. Inexpensive, consumer-grade camcorders suffice to acquire arbitrary scenes, for example in the outdoors, without elaborate recording setup procedures, allowing also for hand-held recordings. Instead of scene depth estimation, layer segmentation or 3D reconstruction, our approach is based on dense image correspondences, treating view interpolation uniformly in space and time: spatial viewpoint navigation, slow motion or freeze-and-rotate effects can all be created in the same way. Acquisition simplification, integration of moving cameras, generalization to difficult scenes and space-time symmetric interpolation amount to a widely applicable virtual video camera system.
