Similar Documents
20 similar documents found.
1.
Convincing manipulation of objects in live action videos is a difficult and often tedious task. Skilled video editors achieve this with the help of modern professional tools, but complex motions might still lack physical realism since existing tools do not consider the laws of physics. On the other hand, physically based simulation promises a high degree of realism, but typically creates a virtual 3D scene animation rather than returning an edited version of an input live action video. We propose a framework that combines video editing and physics-based simulation. Our tool assists unskilled users in editing an input image or video while respecting the laws of physics and also leveraging the image content. We first fit a physically based simulation that approximates the object's motion in the input video. We then allow the user to edit the physical parameters of the object, generating a new physical behavior for it. The core of our work is the formulation of an image-aware constraint within physics simulations. This constraint manifests as external control forces that guide the object in a way that encourages proper texturing at every frame while still producing physically plausible motions. We demonstrate the generality of our method on a variety of physical interactions: rigid motion, multi-body collisions, cloth and elastic bodies.
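The image-aware constraint described above enters the simulation as external control forces. Below is a minimal sketch of that idea, assuming a single particle and per-frame target positions derived from the edited image; the PD-style gains `k_p`, `k_d` and the semi-implicit Euler integrator are illustrative choices, not the paper's actual formulation.

```python
import numpy as np

def simulate_with_image_guidance(x0, v0, targets, dt=1/30, mass=1.0,
                                 gravity=np.array([0.0, -9.81, 0.0]),
                                 k_p=50.0, k_d=5.0):
    """Integrate a particle while adding a PD-style control force that pulls it
    toward per-frame, image-derived target positions (illustrative only)."""
    x, v = x0.astype(float).copy(), v0.astype(float).copy()
    trajectory = [x.copy()]
    for target in targets:
        f_ext = mass * gravity                      # physical forces
        f_ctrl = k_p * (target - x) - k_d * v       # image-aware control force
        a = (f_ext + f_ctrl) / mass
        v = v + dt * a                              # semi-implicit Euler step
        x = x + dt * v
        trajectory.append(x.copy())
    return np.array(trajectory)

# Example: guide a particle along targets sampled from a hypothetical edited path.
targets = np.stack([np.array([0.1 * t, 1.0 - 0.05 * t, 0.0]) for t in range(60)])
traj = simulate_with_image_guidance(np.zeros(3), np.zeros(3), targets)
```

Raising `k_p` pulls the motion closer to the image-derived targets at the cost of physical plausibility, which is exactly the trade-off such a constraint has to balance.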

2.
Images/videos captured by portable devices (e.g., cellphones, DV cameras) often have limited fields of view. Image stitching, also referred to as mosaicing or panorama creation, can produce a wide-angle image by compositing several photographs together. Although various methods have been developed for image stitching in recent years, few works address the video stitching problem. In this paper, we present the first system to stitch videos captured by hand-held cameras. We first recover the 3D camera paths and a sparse set of 3D scene points using the CoSLAM system, and densely reconstruct the 3D scene in the overlapping regions. Then, we generate a smooth virtual camera path, which stays in the middle of the original paths. Finally, the stitched video is synthesized along the virtual path as if it were taken from this new trajectory. The warping required for the stitching is obtained by optimizing over both temporal stability and alignment quality, while leveraging the 3D information at our disposal. The experiments show that our method can produce high-quality stitching results for various challenging scenarios.
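As a rough illustration of the virtual-path step, the sketch below assumes the per-frame 3D camera centers recovered by CoSLAM are already available: it takes the midpoint of the two original paths and smooths it temporally. The Gaussian smoothing stands in for the paper's joint optimization of temporal stability and alignment quality.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def virtual_camera_path(centers_a, centers_b, sigma=5.0):
    """Given per-frame 3D camera centers of two hand-held cameras (N x 3 each),
    return a smoothed path that stays midway between the two originals."""
    midway = 0.5 * (centers_a + centers_b)                 # middle of the original paths
    return gaussian_filter1d(midway, sigma=sigma, axis=0)  # temporal smoothing

# Example with two noisy, roughly parallel camera paths.
t = np.linspace(0, 1, 100)[:, None]
path_a = np.hstack([t, np.zeros_like(t), np.zeros_like(t)]) + 0.01 * np.random.randn(100, 3)
path_b = path_a + np.array([0.0, 0.2, 0.0])
virtual = virtual_camera_path(path_a, path_b)
```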

3.
Novel view synthesis from sparse and unstructured input views faces challenges such as the difficulty of dense 3D reconstruction and large occlusions. This paper addresses these problems by estimating proper appearance flows from the target view to the input views to warp and blend the input views. Our method first estimates a sparse set of 3D scene points using an off-the-shelf 3D reconstruction method and calculates sparse flows from the target view to the input views. Our method then performs appearance flow completion to estimate the dense flows from the corresponding sparse ones. Specifically, we design a deep fully convolutional neural network that takes sparse flows and input views as input and outputs the dense flows. Furthermore, we estimate the optical flows between input views as references to guide the estimation of the dense flows between the target view and the input views. Besides the dense flows, our network also estimates the masks used to blend multiple warped inputs to render the target view. Experiments on the KITTI benchmark show that our method can generate high-quality novel views from sparse and unstructured input views.
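Once the network has produced dense flows and blending masks, rendering the target view reduces to backward warping each input view and taking a mask-weighted average. Below is a small sketch of that final stage only, assuming flows are stored as per-pixel (row, col) displacements; the flow-completion network itself is not reproduced.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_view(image, flow):
    """Backward-warp an H x W x C input view into the target view using a dense
    appearance flow (H x W x 2) that gives, for each target pixel, the (row, col)
    displacement to its source pixel."""
    h, w = image.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows + flow[..., 0], cols + flow[..., 1]])
    return np.stack([map_coordinates(image[..., c], coords, order=1, mode="nearest")
                     for c in range(image.shape[2])], axis=-1)

def blend_views(images, flows, masks):
    """Blend several warped input views using per-view soft masks (H x W each)."""
    warped = [warp_view(img, flw) for img, flw in zip(images, flows)]
    weights = np.stack(masks)
    weights = weights / np.maximum(weights.sum(axis=0, keepdims=True), 1e-6)
    return sum(w[..., None] * img for w, img in zip(weights, warped))
```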

4.
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered into fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize the ghosting artifacts known from conventional billboard rendering, but also alleviate the setup restrictions and error sensitivities of more complex 3D representations and multiview reconstruction techniques. Our results demonstrate the flexibility and robustness of our approach with high-quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

5.
6.
We propose an image editing system for repositioning objects in a single image based on the perspective of the scene. In our system, an input image is transformed into a layer structure that is composed of object layers and a background layer, and the scene depth is then computed from the ground region, which the user specifies with a simple boundary line. The object size and the order of overlapping are automatically determined during repositioning based on the scene depth. In addition, our system enables the user to move shadows along with objects naturally by extracting the shadow mattes using only a few user-specified scribbles. Finally, we demonstrate the versatility of our system through applications to depth-of-field effects, fog synthesis and 3D walkthrough in an image.
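The depth-based size adjustment follows directly from the pinhole camera model: for a fixed world size, apparent size scales inversely with depth. A tiny sketch of that rule (function name and values are illustrative):

```python
def rescale_for_depth(object_height_px, depth_old, depth_new):
    """Under a pinhole camera, an object of fixed world size that is moved from
    depth_old to depth_new is rescaled on screen by depth_old / depth_new."""
    return object_height_px * (depth_old / depth_new)

# An object layer 120 px tall at 4 m appears 60 px tall when repositioned to 8 m.
assert rescale_for_depth(120, 4.0, 8.0) == 60.0
```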

7.
We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in an MDL hypothesis selection framework, which allows it to recover from mismatches and temporarily lost tracks. Building upon a multi-view/multi-category object detector, it localizes cars and pedestrians in the input images. The 2D object detections are converted to 3D observations, which are accumulated in a world coordinate frame. Trajectory analysis in a spacetime window yields physically plausible trajectory candidates. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and all evidence collected so far, while satisfying the constraints that no two objects may occupy the same physical space, nor explain the same image pixels at any time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The resulting approach can initialize automatically and track a large and varying number of objects from both static and moving cameras. We evaluate our approach on several challenging video sequences with both a surveillance-type scenario and a scenario where the input videos are taken from a moving vehicle.

8.
9.
The ability to quickly and intuitively edit digital contents has become increasingly important in our everyday life. We propose a novel method for propagating a sparse set of user edits (e.g., changes in color, brightness, contrast, etc.) expressed as casual strokes to nearby regions in an image or video with similar appearances. Existing methods for edit propagation are typically based on optimization, whose computational cost can be prohibitive for large inputs. We re-formulate propagation as a function interpolation problem in a high-dimensional space, which we solve very efficiently using radial basis functions. While simple to implement, our method significantly improves the speed and reduces the space cost of existing methods, and provides instant feedback of propagation results even on large images and videos.
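A minimal sketch of the interpolation view of edit propagation, assuming a per-pixel feature vector (e.g., normalized position and color) and a Gaussian radial basis function; the paper's exact feature space and kernel may differ.

```python
import numpy as np

def propagate_edits(features, edit_features, edit_values, sigma=0.2, reg=1e-6):
    """Interpolate sparse user edits over all pixels with Gaussian RBFs.
    features:      (N, D) per-pixel feature vectors (e.g., normalized x, y, r, g, b)
    edit_features: (M, D) features at the stroked pixels
    edit_values:   (M,)   edit strength specified by the strokes
    Returns the propagated edit value for every pixel."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K = kernel(edit_features, edit_features) + reg * np.eye(len(edit_features))
    weights = np.linalg.solve(K, edit_values)          # fit RBF weights to the strokes
    return kernel(features, edit_features) @ weights   # evaluate at every pixel
```

Because only an M x M system (one row per stroked pixel) is solved, evaluation over millions of pixels stays cheap, which is what makes the instant feedback described above possible.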

10.
With the maturation of three-dimensional (3-D) technologies, display systems can provide higher visual quality to enrich the viewer experience. However, the depth information required for 3-D displays is not available in conventional 2-D recorded contents. Therefore, the conversion of existing 2-D video to 3-D video becomes an important issue for emerging 3-D applications. This paper presents a system which automatically converts 2-D videos to 3-D format. The proposed system combines three major depth cues: the depth from motion, the scene depth from geometrical perspective, and the fine-granularity depth from the relative position. The proposed system uses a block-based method incorporating a joint bilateral filter to efficiently generate visually comfortable depth maps and to diminish the blocky artifacts. By means of the generated depth map, 2-D videos can be readily converted into 3-D format. Moreover, for conventional 2-D displays, a 2-D image/video depth perception enhancement application is also presented. With the depth-aware adjustment of color saturation, contrast, and edge, the stereo effect of the 2-D content can be enhanced. A user study on subjective quality shows that the proposed method has promising results on depth quality and visual comfort.
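The joint bilateral filtering step can be sketched as follows: the block-based depth map is smoothed with weights computed from the color image, so depth discontinuities stay aligned with color edges while blocky artifacts are averaged out. This is a straightforward, unoptimized reference implementation with illustrative parameter values, not the paper's exact filter.

```python
import numpy as np

def joint_bilateral_filter(depth, guide, radius=4, sigma_s=3.0, sigma_r=0.1):
    """Smooth a block-based depth map (H x W) with weights taken from a color
    guide image (H x W x 3, values in [0, 1])."""
    h, w = depth.shape
    padded_d = np.pad(depth, radius, mode="edge")
    padded_g = np.pad(guide, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    out = np.zeros(depth.shape, dtype=float)
    for i in range(h):
        for j in range(w):
            d_patch = padded_d[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            g_patch = padded_g[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            range_w = np.exp(-((g_patch - guide[i, j]) ** 2).sum(-1) / (2 * sigma_r ** 2))
            weight = spatial * range_w
            out[i, j] = (weight * d_patch).sum() / weight.sum()
    return out
```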

11.
We present a technique for coupling simulated fluid phenomena that interact with real dynamic scenes captured as a binocular video sequence. We first process the binocular video sequence to obtain a complete 3D reconstruction of the scene, including velocity information. We use stereo for the visible parts of 3D geometry and surface completion to fill the missing regions. We then perform fluid simulation within a 3D domain that contains the object, enabling one-way coupling from the video to the fluid. In order to maintain temporal consistency of the reconstructed scene and the animated fluid across frames, we develop a geometry tracking algorithm that combines optic flow and depth information with a novel technique for "velocity completion". The velocity completion technique uses local rigidity constraints to hypothesize a motion field for the entire 3D shape, which is then used to propagate and filter the reconstructed shape over time. This approach not only generates smoothly varying geometry across time, but also simultaneously provides the necessary boundary conditions for one-way coupling between the dynamic geometry and the simulated fluid. Finally, we employ a GPU based scheme for rendering the synthetic fluid in the real video, taking refraction and scene texture into account.

12.
王伟  任国恒  陈立勇  张效尉 《自动化学报》2019,45(11):2187-2198
In image-based 3D reconstruction of urban scenes, piecewise-planar reconstruction algorithms can overcome weak texture, illumination changes and similar factors to quickly recover a complete approximate structure of the scene. However, their reliability is often low when the initial space points are sparse, the candidate plane set is incomplete, or the image over-segmentation is of poor quality. To address this problem, this paper exploits the structural characteristics of urban scenes to construct a novel plane reliability measure that fuses a scene-structure prior, space-point visibility and color similarity, and then infers the scene structure by jointly optimizing image regions and their corresponding planes. Experimental results show that the proposed algorithm can effectively reconstruct the complete scene structure from sparse space points, with high overall accuracy and efficiency.
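The abstract names the three cues entering the plane reliability measure but not its exact form; the sketch below simply combines them as a weighted sum, purely to illustrate how such a score could rank candidate planes for an image region. The weights and the linear combination are assumptions, not the paper's actual measure.

```python
import numpy as np

def plane_reliability(structure_prior, visibility, color_similarity,
                      weights=(0.4, 0.3, 0.3)):
    """Combine the three cues named in the abstract (scene-structure prior,
    space-point visibility, color similarity), each assumed normalized to [0, 1],
    into one reliability score for a candidate plane."""
    w_s, w_v, w_c = weights
    return w_s * structure_prior + w_v * visibility + w_c * color_similarity

def best_plane(region_cues):
    """Pick the most reliable candidate plane for an image region, where
    region_cues is a list of (plane_id, structure_prior, visibility, color_similarity)."""
    scored = [(plane_reliability(s, v, c), pid) for pid, s, v, c in region_cues]
    return max(scored)[1]
```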

13.
Despite considerable advances in natural image matting over the last decades, video matting still remains a difficult problem. The main challenges faced by existing methods are the large amount of user input required, and temporal inconsistencies in mattes between pairs of adjacent frames. We present a temporally-coherent matte-propagation method for videos based on PatchMatch and edge-aware filtering. Given an input video and trimaps for a few frames, including the first and last, our approach generates alpha mattes for all frames of the video sequence. We also present a user scribble-based interface for video matting that takes advantage of the efficiency of our method to interactively refine the matte results. We demonstrate the effectiveness of our approach by using it to generate temporally-coherent mattes for several natural video sequences. We perform quantitative comparisons against the state-of-the-art sparse-input video matting techniques and show that our method produces significantly better results according to three different metrics. We also perform qualitative comparisons against the state-of-the-art dense-input video matting techniques and show that our approach produces similar quality results while requiring only about 7% of the amount of user input required by such techniques. These results show that our method is both effective and user-friendly, outperforming state-of-the-art solutions.

14.
In this paper, we present a novel method to extract the motion of a dynamic object from a video that is captured by a handheld camera, and apply it to a 3D character. Unlike motion capture techniques, neither special sensors/trackers nor a controllable environment is required. Our system significantly automates motion imitation, which is traditionally conducted by professional animators via manual keyframing. Given the input video sequence, we track the dynamic reference object to obtain trajectories of both 2D and 3D tracking points. With them as constraints, we then transfer the motion to the target 3D character by solving an optimization problem to maintain the motion gradients. We also provide a user-friendly editing environment for users to fine-tune the motion details. As casual videos can be used, our system therefore greatly increases the supply source of motion data. Examples of imitating various types of animal motion are shown.
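Preserving motion gradients can be posed as a linear least-squares problem: the unknown target trajectory should reproduce the frame-to-frame differences of the tracked source trajectory while softly meeting user constraints. The 1-D sketch below illustrates that formulation only; it is not the paper's full solver, and the constraint weight is an assumption.

```python
import numpy as np

def transfer_motion(source, constrained_frames, constrained_values, w_constraint=100.0):
    """Solve for a 1-D target trajectory whose frame-to-frame gradients match those
    of `source` while softly passing through user-specified keyframe constraints."""
    n = len(source)
    rows, rhs = [], []
    for t in range(n - 1):                      # gradient terms: y[t+1] - y[t] = x[t+1] - x[t]
        r = np.zeros(n)
        r[t + 1], r[t] = 1.0, -1.0
        rows.append(r)
        rhs.append(source[t + 1] - source[t])
    for t, v in zip(constrained_frames, constrained_values):   # soft keyframe constraints
        r = np.zeros(n)
        r[t] = w_constraint
        rows.append(r)
        rhs.append(w_constraint * v)
    y, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return y

# Retarget a tracked trajectory so it starts at 0 and ends at 2 while keeping its shape.
src = np.sin(np.linspace(0, np.pi, 50))
out = transfer_motion(src, [0, 49], [0.0, 2.0])
```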

15.
Although 3D object detection methods based on feature fusion have made great progress, they still suffer from low precision due to sparse point clouds. In this paper, we propose a new feature fusion-based method, which can generate virtual point clouds and improve the precision of car detection. Considering that RGB images have rich semantic information, the method first segments the cars from the image, and then projects the raw point cloud onto the segmented car image to extract the point clouds of the cars. Furthermore, the segmented point clouds are input to the virtual point cloud generation module. The module regresses the direction of each car, then combines the foreground points to generate virtual point clouds, which are superimposed on the raw point cloud. Eventually, the processed point cloud is converted to a voxel representation, which is then fed into a 3D sparse convolutional network to extract features, and finally a region proposal network is used to detect cars in a bird's-eye view. Experimental results on the KITTI dataset show that our method is effective, and its precision has significant advantages compared to other similar feature fusion-based methods.
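The projection step, keeping only the raw points whose image projection falls inside the segmented car mask, can be sketched as follows. It assumes a 3x4 projection matrix `P` and points already expressed in the coordinate frame that `P` expects; the direction regression and virtual-point generation network are not reproduced here.

```python
import numpy as np

def segment_points_by_mask(points, P, mask):
    """Keep the point-cloud points whose projection lands inside a boolean car mask.
    points: (N, 3) 3D points, P: (3, 4) projection matrix, mask: (H, W) boolean image."""
    h, w = mask.shape
    homog = np.hstack([points, np.ones((len(points), 1))])
    proj = homog @ P.T                          # (N, 3) homogeneous pixel coordinates
    z = proj[:, 2]
    in_front = z > 1e-6                         # discard points behind the camera
    u = proj[:, 0] / np.where(in_front, z, 1.0)
    v = proj[:, 1] / np.where(in_front, z, 1.0)
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sel = np.where(inside)[0]
    keep = mask[v[sel].astype(int), u[sel].astype(int)]
    return points[sel[keep]]
```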

16.
Image storyboards of films and videos are useful for quick browsing and automatic video processing. A common approach for producing image storyboards is to display a set of selected key-frames in temporal order, which has been widely used for 2D video data. However, such an approach cannot be applied for 3D animation data because different information is revealed by changing parameters such as the viewing angle and the duration of the animation. Also, the interests of the viewer may be different from person to person. As a result, it is difficult to draw a single image that perfectly abstracts the entire 3D animation data. In this paper, we propose a system that allows users to interactively browse an animation and produce a comic sequence out of it. Each snapshot in the comic optimally visualizes a duration of the original animation, taking into account the geometry and motion of the characters and objects in the scene. This is achieved by a novel algorithm that automatically produces a hierarchy of snapshots from the input animation. Our user interface allows users to arrange the snapshots according to the complexity of the movements by the characters and objects, the duration of the animation and the page area to visualize the comic sequence. Our system is useful for quickly browsing through a large amount of animation data and semi-automatically synthesizing a storyboard from a long sequence of animation.

17.
Temporal coherence is an important problem in Non-Photorealistic Rendering for videos. In this paper, we present a novel approach to enhance temporal coherence in video painting. Instead of painting each video frame directly, our approach first partitions the video into multiple motion layers, and then places the brush strokes on the layers to generate the painted imagery. The extracted motion layers consist of one background layer and several object layers in each frame. Then, the background layers from all the frames are aligned into a panoramic image, on which brush strokes are placed to paint the background in one shot. The strokes used to paint object layers are propagated frame by frame using smooth transformations defined by thin plate splines. Once the background and object layers are painted, they are projected back to each frame and blended to form the final painting results. Thanks to painting a single image, our approach can completely eliminate the flickering in the background, and temporal coherence on object layers is also significantly enhanced due to the smooth transformation over frames. Additionally, by controlling the painting strokes on different layers, our approach can easily generate painted videos in multiple styles. Experimental results show that our approach is both robust and efficient in generating plausible video paintings.
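The thin-plate-spline propagation of strokes between frames can be sketched with SciPy's RBF interpolator (kernel `thin_plate_spline`, available in SciPy 1.7+), fitted to tracked correspondences on the object layer. The correspondences themselves are assumed to be given; this is an illustration of the warp, not the paper's full pipeline.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_strokes(corr_prev, corr_next, stroke_points):
    """Warp brush-stroke control points from one frame to the next with a thin plate
    spline fitted to tracked correspondences on the object layer.
    corr_prev, corr_next: (P, 2) matching 2D points in the two frames.
    stroke_points:        (S, 2) stroke control points in the previous frame."""
    tps = RBFInterpolator(corr_prev, corr_next, kernel="thin_plate_spline")
    return tps(stroke_points)

# Example: a pure translation of the layer moves the strokes by the same amount.
prev = np.random.rand(20, 2) * 100
nxt = prev + np.array([3.0, -1.0])
strokes = np.array([[10.0, 10.0], [50.0, 40.0]])
moved = propagate_strokes(prev, nxt, strokes)
```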

18.
Video capture is limited by the trade-off between spatial and temporal resolution: when capturing videos of high temporal resolution, the spatial resolution decreases due to bandwidth limitations in the capture system. Achieving both high spatial and temporal resolution is only possible with highly specialized and very expensive hardware, and even then the same basic trade-off remains. The recent introduction of compressive sensing and sparse reconstruction techniques allows for the capture of single-shot high-speed video, by coding the temporal information in a single frame, and then reconstructing the full video sequence from this single coded image and a trained dictionary of image patches. In this paper, we first analyse this approach, and find insights that help improve the quality of the reconstructed videos. We then introduce a novel technique, based on convolutional sparse coding (CSC), and show how it outperforms the state-of-the-art, patch-based approach in terms of flexibility and efficiency, due to the convolutional nature of its filter banks. The key idea for CSC high-speed video acquisition is extending the basic formulation by imposing an additional constraint in the temporal dimension, which enforces sparsity of the first-order derivatives over time.
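The temporal extension mentioned at the end of the abstract can be written as an extra L1 penalty on the first-order temporal differences of the coefficient maps. A hedged sketch of such an objective, where y is the coded frame, S_t the per-frame coding masks, d_k the learned convolutional filters and z_{k,t} the coefficient maps; the exact notation and weighting in the paper may differ:

```latex
\min_{\{z_{k,t}\}} \;
\tfrac{1}{2}\Big\lVert y - \sum_{t} S_{t}\Big(\sum_{k} d_{k} \ast z_{k,t}\Big)\Big\rVert_{2}^{2}
\;+\; \lambda \sum_{k,t} \lVert z_{k,t} \rVert_{1}
\;+\; \beta \sum_{k}\sum_{t>1} \lVert z_{k,t} - z_{k,t-1} \rVert_{1}
```

The last term is the temporal-sparsity constraint the abstract refers to: it penalizes changes of the coefficient maps between consecutive reconstructed frames.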

19.
We present a deep learning based technique that enables novel-view videos of human performances to be synthesized from sparse multi-view captures. While performance capture from a sparse set of videos has received significant attention, there has been relatively little progress on non-rigid objects (e.g., human bodies). The rich articulation modes of the human body make it rather challenging to synthesize and interpolate the model well. To address this problem, we propose a novel deep learning based framework that directly predicts novel-view videos of human performances without explicit 3D reconstruction. Our method is a composition of two steps: novel-view prediction and detail enhancement. We first learn a novel deep generative query network for view prediction, synthesizing novel-view performances from a sparse set of just five or fewer camera videos. Then, we use a new generative adversarial network to enhance fine-scale details of the first-step results. This opens up the possibility of high-quality, low-cost video-based performance synthesis, which is gaining popularity for VR and AR applications. We demonstrate a variety of promising results, where our method is able to synthesize more robust and accurate performances than existing state-of-the-art approaches when only sparse views are available.

20.
We propose a temporal video synchronization algorithm based on reconstructing the 3D trajectories of moving objects. The sequences to be synchronized are captured simultaneously in the same scene by different cameras, with no restrictive assumptions on the scene or the camera motion. Assuming the camera projection matrix of every frame is known, we first reconstruct the 3D trajectories of the moving objects on a discrete cosine transform basis. We then propose a rank constraint on the matrix of trajectory basis coefficients, which measures the degree of spatio-temporal alignment between sub-segments of different sequences. Finally, we build a cost matrix and use a graph-based method to achieve nonlinear temporal synchronization between the videos. Our method does not rely on known point correspondences; tracked points in different videos may even correspond to different 3D points, as long as the following assumption holds: the spatial position of each 3D point corresponding to a tracked point in the observed sequence can be described by a linear combination of a subset of the 3D points corresponding to all tracked points in the reference sequence, and this linear relationship remains unchanged over time. Unlike most existing methods, which require feature tracks to span the entire image sequence, our method can use image point tracks of varying lengths. We validate the robustness and performance of the proposed method on both synthetic data and real datasets.
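A rough numerical illustration of the two ingredients named above: trajectories are compressed onto a DCT basis, and an alignment cost between a reference and an observed sub-segment is read off the singular values of the stacked coefficient matrices (columns being per-trajectory coefficient vectors). The residual-energy cost used here is an illustrative surrogate for the paper's rank constraint, not its actual formulation.

```python
import numpy as np

def dct_basis(n_frames, n_basis):
    """First n_basis DCT-II style basis vectors sampled over n_frames."""
    t = np.arange(n_frames)
    return np.stack([np.cos(np.pi * (t + 0.5) * k / n_frames)
                     for k in range(n_basis)], axis=1)        # (n_frames, n_basis)

def trajectory_coefficients(traj, n_basis=10):
    """Least-squares DCT coefficients of a 3D trajectory (n_frames x 3)."""
    B = dct_basis(len(traj), n_basis)
    coeffs, *_ = np.linalg.lstsq(B, traj, rcond=None)
    return coeffs                                             # (n_basis, 3)

def alignment_cost(ref_coeffs, obs_coeffs):
    """Rank-style cost: if the observed trajectories lie in the span of the reference
    ones (i.e., the segments are aligned), stacking their coefficient matrices should
    not increase the rank, so the trailing singular-value energy stays small."""
    stacked = np.hstack([ref_coeffs, obs_coeffs])
    s = np.linalg.svd(stacked, compute_uv=False)
    r = np.linalg.matrix_rank(ref_coeffs)
    return s[r:].sum() / s.sum()
```

Evaluating this cost for every candidate pairing of sub-segments would fill the cost matrix mentioned in the abstract, on which the graph-based synchronization operates.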
