Similar Documents
1.
In this paper, we propose an interactive technique for constructing a 3D scene via sparse user inputs. We represent a 3D scene in the form of a Layered Depth Image (LDI), which is composed of a foreground layer and a background layer, each with a corresponding texture and depth map. Given user-specified sparse depth inputs, depth maps are computed on superpixels using interpolation with geodesic-distance weighting and an optimization framework. This computation is done immediately, which allows the user to edit the LDI interactively. Additionally, our technique automatically estimates depth and texture in occluded regions using the depth discontinuity. In our interface, the user paints strokes directly on the 3D model. The drawn strokes serve as 3D handles with which the user can pull out or push in the 3D surface easily and intuitively, with real-time feedback. We show that our technique enables efficient modeling of LDIs that produce convincing 3D effects.
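The geodesic-distance weighting at the core of the depth interpolation can be illustrated compactly. Below is a minimal sketch, not the authors' code: it assumes superpixels are given as an adjacency matrix plus mean colors, and spreads the user's sparse depths by distances computed with Dijkstra's algorithm; the name `interpolate_depth` and all parameters are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def interpolate_depth(adjacency, colors, seed_ids, seed_depths, sigma=10.0):
    """Spread sparse user-specified depths over a superpixel graph.

    adjacency   : (n, n) 0/1 matrix marking neighbouring superpixels
    colors      : (n, 3) mean colour per superpixel
    seed_ids    : indices of superpixels with a user depth stroke
    seed_depths : the corresponding depth values
    """
    n = adjacency.shape[0]
    rows, cols = np.nonzero(np.triu(adjacency))
    # Edge cost grows with colour difference, so the geodesic distance
    # respects image edges instead of plain spatial proximity.
    costs = 1.0 + np.linalg.norm(colors[rows] - colors[cols], axis=1)
    graph = csr_matrix((costs, (rows, cols)), shape=(n, n))
    dist = dijkstra(graph, directed=False, indices=seed_ids)  # (seeds, n)
    w = np.exp(-dist / sigma)                  # nearer seeds dominate
    w /= w.sum(axis=0, keepdims=True) + 1e-12
    return w.T @ np.asarray(seed_depths, float)  # (n,) depth per superpixel
```

The paper's optimization framework and occlusion handling are omitted; the sketch only shows the geodesic weighting itself.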

2.
Video remains the method of choice for capturing temporal events. However, without access to the underlying 3D scene models, it remains difficult to make object-level edits in a single video or across multiple videos. While it may be possible to explicitly reconstruct the 3D geometries to facilitate these edits, such a workflow is cumbersome, expensive, and tedious. In this work, we present a much simpler workflow to create plausible editing and mixing of raw video footage using only sparse structure points (SSPs) directly recovered from the raw sequences. First, we utilize user scribbles to structure the point representations obtained by applying structure-from-motion to the input videos. The resulting structure points, even when noisy and sparse, then enable various video edits in 3D, including view perturbation, keyframe animation, and object duplication and transfer across videos. Specifically, we describe how to synthesize object images from new views with a novel image-based rendering technique that uses the SSPs as a proxy for the missing 3D scene information. We propose a structure-preserving image warping on multiple input frames adaptively selected from the object's video, followed by spatio-temporally coherent image stitching to compose the final object image. Simple planar shadows and depth maps are synthesized for objects to generate plausible video sequences mimicking real-world interactions. We demonstrate our system on a variety of input videos to produce complex edits that are otherwise difficult to achieve.
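The "simple planar shadows" mentioned above come down to projecting each structure point onto a ground plane along the ray from a point light. A minimal sketch under that assumption; `planar_shadow` and its arguments are illustrative, not the paper's API:

```python
import numpy as np

def planar_shadow(points, light, plane_z=0.0):
    """Project 3D structure points onto the plane z = plane_z along rays
    from a point light: shadow = light + s * (point - light).
    Assumes no point lies at the light's own height (s would blow up)."""
    points = np.asarray(points, dtype=float)   # (n, 3) sparse structure points
    light = np.asarray(light, dtype=float)     # (3,) light position
    s = (plane_z - light[2]) / (points[:, 2] - light[2])
    return light + (points - light) * s[:, None]
```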

3.
For intelligent robots, correctly understanding the environment is an essential yet highly challenging capability, and has become a key problem in robotics. As service robots enter the home, the ability to perceive and understand the surrounding environment autonomously and reliably, relying only on onboard sensors and scene understanding algorithms, to recognize the objects in the environment and their mutual relations, and to build an environment model, is a prerequisite both for completing tasks autonomously and for intelligent human-robot interaction. In larger indoor spaces, the limited field of view of the RGB-D (RGB depth) sensors commonly used on robots (which capture color images and depth simultaneously) makes it difficult to cover the whole area in a single frame; the robot can, however, move to different positions and capture images from multiple viewpoints that together cover the entire scene. Against this background, we propose an indoor scene understanding algorithm based on fusing multi-view RGB-D frames: object detection and object-relation extraction are performed on single RGB-D frames, object instance detection is performed across multiple frames, and a topological graph model of object relations for the whole scene is constructed. By partitioning each RGB-D frame and extracting color-histogram features of the image cells, we propose a cross-frame object instance detection method based on the longest common subsequence (LCS) to establish object correspondences between frames, which resolves the problem that viewpoint changes of the RGB-D camera hinder frame fusion. Finally, the effectiveness of the algorithm is validated on the NYUv2 (NYU depth dataset v2) dataset.
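The longest-common-subsequence matching is the most code-like step. The sketch below, a simplification rather than the paper's implementation, assumes each frame is summarized as an ordered sequence of per-cell normalized color histograms and treats two cells as matching when their histogram intersection clears a threshold; all names are illustrative.

```python
import numpy as np

def hist_similarity(h1, h2):
    """Histogram intersection of two normalized colour histograms."""
    return np.minimum(h1, h2).sum()

def lcs_match(seq_a, seq_b, thresh=0.7):
    """Associate object instances across two RGB-D frames via the
    longest common subsequence of their colour-histogram sequences."""
    m, n = len(seq_a), len(seq_b)
    dp = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if hist_similarity(seq_a[i - 1], seq_b[j - 1]) >= thresh:
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    # Backtrack to recover matched pairs (the cross-frame associations).
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if (hist_similarity(seq_a[i - 1], seq_b[j - 1]) >= thresh
                and dp[i, j] == dp[i - 1, j - 1] + 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i - 1, j] >= dp[i, j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Because LCS preserves order, the association survives moderate viewpoint changes as long as the relative ordering of objects stays stable between frames.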

4.
We introduce a novel method for enabling stereoscopic viewing of a scene from a single pre-segmented image. Rather than attempting full 3D reconstruction or accurate depth map recovery, we hallucinate a rough approximation of the scene's 3D model using a number of simple depth and occlusion cues and shape priors. We begin by depth-sorting the segments, each of which is assumed to represent a separate object in the scene, resulting in a collection of depth layers. The shapes and textures of the partially occluded segments are then completed using symmetry and convexity priors. Next, each completed segment is converted to a union of generalized cylinders yielding a rough 3D model for each object. Finally, the object depths are refined using an iterative ground fitting process. The hallucinated 3D model of the scene may then be used to generate a stereoscopic image pair, or to produce images from novel viewpoints within a small neighborhood of the original view. Despite the simplicity of our approach, we show that it compares favorably with state-of-the-art depth ordering methods. A user study was conducted showing that our method produces more convincing stereoscopic images than existing semi-interactive and automatic single image depth recovery methods.

5.
An image caption generation model fusing scene and object prior knowledge
Objective: Existing image captioning methods based on deep convolutional neural networks (CNN) and long short-term memory (LSTM) networks generally use only object-category information as prior knowledge when extracting CNN features, ignoring scene priors; the generated sentences therefore lack accurate scene descriptions and tend to misjudge the positional relations between objects. To address this, we design an image caption generation model that fuses scene and object-category prior information (F-SOCPK), incorporating both priors so that they jointly generate the description and improve sentence quality. Method: First, the parameters of the CNN-S model are trained on the large-scale scene-category dataset Place205 so that CNN-S captures more scene prior information; these parameters are then transferred to CNNd-S to capture the scene information of the image to be described. In parallel, the CNN-O model is trained on the large-scale object-category dataset ImageNet and transferred to CNNd-O to capture object information. The extracted scene and object information is fed into the language models LM-S and LM-O respectively; their outputs are transformed by a softmax function into probability scores over the vocabulary; finally, a weighted fusion computes the final score of each word, and the word with the highest probability is emitted at the current time step, eventually producing the description sentence. Results: Experiments were conducted on three public datasets: MSCOCO, Flickr30k, and Flickr8k. Our model outperforms the model using object-category information alone on multiple metrics, including BLEU (sentence fluency and precision), METEOR (word-level precision and recall), and CIDEr (semantic richness). On Flickr8k in particular, it improves CIDEr by 9% over the object-only Object-based model and by nearly 11% over the scene-only Scene-based model. Conclusion: The proposed method is effective, substantially improving on the baseline models and comparing favorably with other mainstream methods, especially on larger datasets such as MSCOCO; on smaller datasets such as Flickr8k its performance still leaves room for improvement. In future work we will incorporate more visual priors, such as action categories and object-object relations, and combine further vision techniques such as deeper CNN models, object detection, and scene understanding to further improve caption accuracy.
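The weighted fusion of the two language models reduces to blending two softmax distributions per time step. A minimal sketch, assuming raw scores from LM-S and LM-O over the same vocabulary; the fusion weight `alpha` stands in for whatever weighting the paper tunes:

```python
import numpy as np

def fuse_word_scores(scores_scene, scores_object, alpha=0.5):
    """Blend scene-LM and object-LM outputs via softmax and pick the
    highest-scoring vocabulary word for the current time step."""
    def softmax(z):
        e = np.exp(z - z.max())        # subtract max for numerical stability
        return e / e.sum()
    p = alpha * softmax(scores_scene) + (1 - alpha) * softmax(scores_object)
    return int(np.argmax(p)), p        # word index and fused distribution
```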

6.
We present a data-driven method for synthesizing 3D indoor scenes by inserting objects progressively into an initial, possibly empty, scene. Instead of relying on a few hundred hand-crafted 3D scenes, we take advantage of existing large-scale annotated RGB-D datasets, in particular the SUN RGB-D database consisting of 10,000+ depth images of real scenes, to form the prior knowledge for our synthesis task. Our object insertion scheme follows a co-occurrence model and an arrangement model, both learned from the SUN dataset. The former selects a highly probable combination of object categories along with the number of instances per category, while a plausible placement is defined by the latter model. Compared to previous works on probabilistic learning for object placement, we make two contributions. First, we learn various classes of higher-order object-object relations, including symmetry, distinct orientation, and proximity, from the database. These relations make it possible to consider objects in semantically formed groups rather than individually. Second, while our algorithm inserts objects one at a time, it attains holistic plausibility of the whole current scene while offering controllability through progressive synthesis. We conducted several user studies to compare our scene synthesis performance to results obtained by manual synthesis, state-of-the-art object placement schemes, and variations of parameter settings for the arrangement model.
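As a toy illustration of the co-occurrence side of the pipeline (the arrangement model is much richer), the sketch below scores candidate categories by how often they co-occur with the categories already placed; the matrix `cooccurrence` stands in for counts learned from SUN RGB-D, and all names are illustrative.

```python
import numpy as np

def propose_next_category(cooccurrence, counts_in_scene, rng=None):
    """Sample the next object category to insert, weighted by its
    co-occurrence with the categories already present in the scene."""
    rng = rng or np.random.default_rng()
    present = np.flatnonzero(counts_in_scene)
    scores = cooccurrence[present].sum(axis=0).astype(float)
    if scores.sum() == 0:              # empty scene: fall back to uniform
        scores = np.ones_like(scores)
    return int(rng.choice(len(scores), p=scores / scores.sum()))
```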

7.
Collision detection is highly important in computer graphics and virtual reality. Most collision detection methods are object-based, relying on testing the geometrical interference of objects, and their performance therefore depends on the geometrical complexity of the objects. Recently, image-based methods have gained increasing acceptance for their simplicity of implementation, robustness with respect to object geometry, and potential to offload the computational burden onto graphics hardware. However, all existing image-based methods require direct calls to OpenGL, and there is so far no direct way to access OpenGL through the Java 3D API. Although Java 3D provides its own built-in collision detection classes, they are either incorrect or inefficient. In this paper, we present a hybrid image-based collision detection method in Java 3D, which combines the Java 3D built-in collision detection with image-based collision detection in our specially devised scene graph. In addition, we take advantage of the fact that the 3D position of successive offscreen views (i.e. virtual views perceived by the probing object) does not change significantly, and thereby reduce the occurrences of offscreen rendering, so that collision detection becomes even faster (up to 50% in our case). Experimental results demonstrate the correctness and efficiency of our method.

8.
We present an image-based rendering system for viewpoint navigation through the space and time of complex real-world, dynamic scenes. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. Inexpensive, consumer-grade camcorders suffice to acquire arbitrary scenes, for example outdoors, without elaborate recording setup procedures, also allowing for hand-held recordings. Instead of scene depth estimation, layer segmentation, or 3D reconstruction, our approach is based on dense image correspondences, treating view interpolation uniformly in space and time: spatial viewpoint navigation, slow motion, and freeze-and-rotate effects can all be created in the same way. Simplified acquisition, integration of moving cameras, generalization to difficult scenes, and space-time symmetric interpolation add up to a widely applicable virtual video camera system.

9.
张洛声, 童晶. 计算机应用 (Journal of Computer Applications), 2017, 37(8): 2302-2306
To quickly generate 3D models with relief textures, we propose a real-time interactive method for constructing relief-textured models. The method has two steps. First, the source model or image used to generate the relief is converted into an initial depth map and further into a gradient map; after compression and filtering in the gradient domain, a linear system is solved to reconstruct a globally continuous relief depth map. Second, a relief texture mapping algorithm based on mesh intersection pastes the relief depth map onto the surface of the target model, and the relief effect can be edited in the target model's 3D space in real time through translation, rotation, and scaling; finally, the target mesh is rebuilt to produce the relief-textured model. Experiments show that the method can quickly generate sunken reliefs, raised reliefs, and multiple reliefs on a target model; the resulting models need no further processing and can be 3D-printed directly with good results.
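The first step, gradient-domain compression followed by a linear solve, can be sketched directly. A minimal version, not the paper's implementation: gradients of the depth map are attenuated nonlinearly (large steps shrink most, fine details survive) and a sparse least-squares system rebuilds a continuous relief height field; all names and the attenuation formula are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def relief_depth(depth, alpha=5.0):
    """Compress a depth map's gradient field, then reconstruct a
    continuous relief height field by linear least squares."""
    h, w = depth.shape
    gx = np.diff(depth, axis=1)                      # (h, w-1)
    gy = np.diff(depth, axis=0)                      # (h-1, w)
    attenuate = lambda g: g / (1.0 + alpha * np.abs(g))
    gx, gy = attenuate(gx), attenuate(gy)

    idx = np.arange(h * w).reshape(h, w)
    def diff_op(src, dst):                           # sparse finite differences
        k = src.size
        return sparse.csr_matrix(
            (np.r_[-np.ones(k), np.ones(k)],
             (np.r_[np.arange(k), np.arange(k)],
              np.r_[src.ravel(), dst.ravel()])),
            shape=(k, h * w))
    Dx = diff_op(idx[:, :-1], idx[:, 1:])
    Dy = diff_op(idx[:-1, :], idx[1:, :])
    anchor = sparse.csr_matrix(([1.0], ([0], [0])), shape=(1, h * w))
    A = sparse.vstack([Dx, Dy, anchor])              # gradients + one pinned pixel
    b = np.r_[gx.ravel(), gy.ravel(), 0.0]
    return lsqr(A, b)[0].reshape(h, w)
```

The mesh-intersection texture mapping of the second step has no similarly compact form and is omitted.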

10.
Many casually taken 'tourist' photographs contain architectural objects such as houses and buildings. Reconstructing the 3D scene captured in a single such photograph is a very challenging problem. We propose a novel approach to reconstruct such architectural scenes with minimal and simple user interaction, with the goal of providing 3D navigational capability to an image rather than acquiring accurate geometric detail. Our system, Peek-in-the-Pic, is based on a sketch-based geometry reconstruction paradigm. Given an image, the user simply traces out objects from it. Our system regards these as perspective line drawings, automatically completes them, and reconstructs geometry from them. We make basic assumptions about the structure of traced objects and provide simple gestures for placing additional constraints. We also provide a simple sketching tool to progressively complete parts of the reconstructed buildings that are not visible in the image and cannot be completed automatically. Finally, we fill the holes created in the original image when reconstructed buildings are removed from it, using automatic texture synthesis. Users can spend more time with interactive texture synthesis to further refine the image. Thus, instead of looking at flat images, a user can fly through them after some simple processing. Minimal manual work, ease of use, and interactivity are the salient features of our approach.

11.
This paper presents a symbolic formalism for modeling and retrieving video data via the moving objects contained in the video images. The model integrates the representations of individual moving objects in a scene with the time-varying relationships between them by incorporating both the notions of object tracks and temporal sequences of PIRs (projection interval relationships). The model is supported by a set of operations which form the basis of a moving object algebra. This algebra allows one to retrieve scenes and information from scenes by specifying both spatial and temporal properties of the objects involved. It also provides operations to create new scenes from existing ones. A prototype implementation is described which allows queries to be specified either via an animation sketch or using the moving object algebra.
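A toy version of classifying the projection interval relationship between two objects' projections on one axis is shown below; the actual PIR formalism distinguishes a finer set of Allen-style relations, which this sketch collapses into a handful of cases.

```python
def interval_relation(a, b):
    """Coarse projection interval relationship between two 1-D object
    projections, each given as a (start, end) pair with start <= end."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return 'before'
    if b1 < a0:
        return 'after'
    if (a0, a1) == (b0, b1):
        return 'equal'
    if a0 >= b0 and a1 <= b1:
        return 'during'
    if b0 >= a0 and b1 <= a1:
        return 'contains'
    return 'overlaps'
```

A temporal sequence of such relations, one per frame and axis, is what the algebra's spatio-temporal queries range over.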

12.
We propose a method for converting a single image of a transparent object into multi-view photos that enable users to observe the object from multiple new angles, without requiring any 3D shape as input. The complex light paths formed by refraction and reflection make it challenging to compute the lighting effects of transparent objects from a new angle. We construct an encoder-decoder network for normal reconstruction and texture extraction, which enables synthesizing novel views of a transparent object under a set of new views and new environment maps using only one RGB image. By simultaneously considering optical transmission and perspective variation, our network learns the characteristics of optical transmission and the change of perspective as guidance for the conversion from RGB colours to surface normals. A texture extraction subnetwork is proposed to alleviate the contour-loss phenomenon during normal map generation. We test our method using 3D objects both within and outside our training data, including real 3D objects in our lab and completely new environment maps captured with our phones. The results show that our method performs better on view synthesis of transparent objects in complex scenes using only a single-view image.

13.
We present a method for synthesizing high reliefs, a sculpting technique that attaches 3D objects onto a 2D surface within a limited depth range. The main challenges are preserving distinct scene parts by preserving depth discontinuities, keeping the fine details of the shape, and maintaining the overall continuity of the scene. Bas-relief depth compression methods such as gradient compression and depth range compression are not applicable to high relief production. Instead, our method is based on differential coordinates: it brings scene elements to the relief plane while preserving depth discontinuities and surface details of the scene. We select a user-defined number of attenuation points within the scene, attenuate these points towards the relief plane, and recompute the positions of all scene elements so as to preserve the differential coordinates. Finally, if the desired depth range is not achieved, we apply a range compression. High relief synthesis is semi-automatic and can be controlled by user-defined parameters to adjust the depth range, as well as the placement of the scene elements with respect to the relief plane.
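In one dimension, the differential-coordinates step looks roughly like the sketch below (a simplification, not the authors' solver): a Laplacian operator records the profile's differential coordinates, soft positional constraints pull the chosen attenuation points towards the relief plane, and least squares reconciles the two; all names and weights are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def attenuate_profile(z, attenuation_ids, scale=0.3, w_pos=1.0):
    """Pull selected samples of a 1-D depth profile towards the relief
    plane (z = 0) while preserving its differential (Laplacian) coords."""
    n = z.size
    L = sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
    delta = L @ z                                  # original differential coords
    k = len(attenuation_ids)
    C = sparse.csr_matrix(                         # soft positional constraints
        (np.full(k, w_pos), (np.arange(k), attenuation_ids)), shape=(k, n))
    A = sparse.vstack([L, C])
    b = np.r_[delta, w_pos * scale * z[attenuation_ids]]
    return lsqr(A, b)[0]
```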

14.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth with a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks objects' bounding boxes using an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle both static and dynamic objects in the scene. We validate the scene depth accuracy of the sparse features quantitatively on KITTI against its ground-truth depth maps made from LiDAR observations, and the depth of detected objects qualitatively on the Hyundai driving datasets and satellite maps. We compare our depth map with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms which learn depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained on monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
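The core of sharing results between the two threads admits a tiny sketch: an object's depth can be estimated from the visual-odometry features that fall inside its bounding box. Everything here (the names, the median statistic) is an illustrative assumption, not the paper's published code.

```python
import numpy as np

def object_depth(feat_uv, feat_depth, bbox):
    """Median depth of the sparse VO features inside a detection box.

    feat_uv    : (n, 2) pixel coordinates of triangulated features
    feat_depth : (n,) depths of those features
    bbox       : (x0, y0, x1, y1) detected bounding box
    """
    x0, y0, x1, y1 = bbox
    inside = ((feat_uv[:, 0] >= x0) & (feat_uv[:, 0] <= x1) &
              (feat_uv[:, 1] >= y0) & (feat_uv[:, 1] <= y1))
    if not inside.any():
        return None    # no feature support; fall back to tracking history
    return float(np.median(feat_depth[inside]))
```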


15.
Under weak visible-light conditions, fusing the infrared and visible-light images of the same monitored scene lets the fused image show the infrared targets while retaining the structural details of the visible-light image, aiding scene observation and surveillance. We exploit a property of infrared imaging: the temperature difference between a hot target and the background gives the target larger gray values in the infrared image. A stable background model is built from the infrared sequence; differencing the current frame against the background yields the moving-target regions, and the infrared targets within those regions are then fused into the visible-light image, achieving infrared moving-target detection.
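A minimal sketch of this pipeline, assuming aligned single-channel infrared and visible frames of equal size; the running-average background model and the fixed threshold are illustrative simplifications of whatever model the paper builds.

```python
import numpy as np

def detect_and_fuse(ir_frames, ir_current, visible, thresh=25.0, lr=0.05):
    """Build a running-average background from the IR sequence, difference
    it against the current IR frame to find hot moving targets, and paste
    the target pixels into the visible-light image."""
    background = ir_frames[0].astype(float)
    for f in ir_frames[1:]:
        background = (1 - lr) * background + lr * f   # slow background update
    mask = np.abs(ir_current.astype(float) - background) > thresh
    fused = visible.copy()
    fused[mask] = ir_current[mask]                    # overlay IR targets
    return fused, mask
```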

16.
We present an interactive system for composing realistic images of an object under arbitrary pose and appearance specified by sketching. Our system draws inspiration from a traditional illustration workflow: The user first sketches rough 'masses' of the object, as ellipses, to define an initial abstract pose that can then be refined with more detailed contours as desired. The system is made robust to partial or inaccurate sketches using a reduced-dimensionality model of pose space learnt from a labelled collection of photos. Throughout the composition process, interactive visual feedback is provided to guide the user. Finally, the user's partial or complete sketch, complemented with appearance requirements, is used to constrain the automatic synthesis of a novel, high-quality, realistic image.

17.
Many high-level image processing tasks require an estimate of the positions, directions, and relative intensities of the light sources that illuminated the depicted scene. In image-based rendering, augmented reality, and computer vision, such tasks include matching image contents based on illumination, inserting rendered synthetic objects into a natural image, intrinsic images, shape from shading, and image relighting. Yet accurate and robust illumination estimation, particularly from a single image, is a highly ill-posed problem. In this paper, we present a new method to estimate the illumination in a single image as a combination of achromatic lights with their 3D directions and relative intensities. In contrast to previous methods, we base our azimuth angle estimation on curve fitting and recursive refinement of the number of light sources. Similarly, we present a novel surface normal approximation using an osculating arc for the estimation of zenith angles. By means of a new dataset of images with ground truth, we demonstrate that our approach produces more robust and accurate results, and show its versatility through novel applications such as image compositing and analysis.

18.
Boundary light fields
We propose a new image-based rendering method, the boundary light field. Built on the idea of the 3D plenoptic function and combined with scene geometry, the method overcomes several shortcomings of existing IBR walkthrough systems: an adaptive plenoptic sampling scheme organizes the sampled data according to scene complexity or user requirements, reducing the amount of scene data; the incorporation of scene geometry corrects large depth distortions; and the new organization of the sampled data removes restrictions on the navigation range. The method can be applied effectively in walkthrough systems for virtual or real scenes.

19.
Objective: In macro photography, the limited depth of field of macro lenses makes it difficult to obtain a fully sharp image of the subject from a single photograph. Obtaining an all-in-focus picture therefore requires capturing multiple macro photographs with different focus settings and fusing them. Method: Traditional macro photo fusion methods generally assume that the images to be fused are already registered and do not consider automatic acquisition. We therefore propose a multi-focus image acquisition and fusion system for macro photography consisting of three parts. The first is a macro image capture rig that photographs an object at different focus distances with high precision. The second is an invariant-feature-based image registration component that automatically registers and aligns the macro images taken at multiple focus settings. The third is a multi-focus image fusion component based on image pyramids that fuses the aligned macro photographs into a composite with a larger depth of field; it extends pyramid-based fusion with a filter-based weight computation strategy, and combining this weighting with the image pyramid yields a multiresolution multi-focus fusion method. Results: Several sets of experimental data were collected with the capture rig to verify the hardware and software design, and the system was evaluated both subjectively and objectively. Subjectively, the composited macro images not only have sufficient depth of field but also clearly render the minute details of the object at high resolution. Objectively, quantitative comparison with macro images composited by other methods shows our system is best under all three criteria: standard deviation, information entropy, and average gradient. Conclusion: The experiments show that the system is flexible and efficient; it can automatically acquire, register, and fuse multiple macro images with different focus settings, and its fusion quality is comparable to that of other methods.
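The fusion component lends itself to a compact sketch. The paper couples filter-based weights with an image pyramid; the version below keeps only a single-scale weighted blend, using smoothed gradient magnitude as the per-pixel sharpness weight, and assumes the grayscale inputs are already registered.

```python
import numpy as np
from scipy import ndimage

def fuse_multifocus(images, sigma=2.0):
    """Fuse aligned multi-focus shots: weight each image per pixel by a
    smoothed local sharpness measure, then blend."""
    stack = np.stack([im.astype(float) for im in images])
    sharpness = np.stack([
        ndimage.gaussian_filter(np.hypot(*np.gradient(im.astype(float))), sigma)
        for im in images
    ])
    weights = sharpness / (sharpness.sum(axis=0) + 1e-8)
    return (weights * stack).sum(axis=0)   # all-in-focus composite
```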

20.
Compared to still image editing, content-based video editing faces the additional challenge of maintaining spatiotemporal consistency with respect to geometry. This makes it difficult to modify video content seamlessly, for instance to insert or remove an object. In this paper, we present a new video editing system for creating spatiotemporally consistent and visually appealing refilming effects. Unlike typical filming practice, our system requires no labor-intensive construction of 3D models or surfaces mimicking the real scene. Instead, it is based on an unsupervised inference of view-dependent depth maps for all video frames. We provide interactive tools requiring only a small amount of user input to perform elementary video content editing, such as separating video layers, completing the background scene, and extracting moving objects. These tools can be used to produce a variety of visual effects in our system, including but not limited to video composition, the 'predator' effect, bullet time, depth of field, and fog synthesis. Some of the effects can be achieved in real time.
