Similar articles
1.
Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain‐looking sky. Professionals may edit these videos using advanced and time‐consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative sky. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real‐time performance on mobile devices, and the user's involvement can remain as limited as simply selecting the replacement sky.

2.
Shadow removal for videos is an important and challenging vision task. In this paper, we present a novel shadow removal approach for videos captured by freely moving cameras using illumination transfer optimization. We first detect the shadows of the input video using interactive fast video matting. Then, based on the shadow detection results, we decompose the input video into overlapping 2D patches, and find coherent correspondences between the shadow and non‐shadow patches via a discrete optimization technique built on a patch similarity metric. We finally remove the shadows of the input video sequences using an optimized illumination transfer method, which reasonably recovers the illumination information of the shadow regions and produces spatio‐temporally coherent shadow‐free videos. We also process the shadow boundaries to make the transition between shadow and non‐shadow regions smooth. Compared with previous works, our method can handle videos captured by freely moving cameras and achieves better shadow removal results. We validate the effectiveness of the proposed algorithm via a variety of experiments.

3.
A practical way to generate a high dynamic range (HDR) video using off‐the‐shelf cameras is to capture a sequence with alternating exposures and reconstruct the missing content at each frame. Unfortunately, existing approaches are typically slow and are not able to handle challenging cases. In this paper, we propose a learning‐based approach to address this difficult problem. To do this, we use two sequential convolutional neural networks (CNNs) to model the entire HDR video reconstruction process. In the first step, we align the neighboring frames to the current frame by estimating the flows between them using a network that is specifically designed for this application. We then combine the aligned and current images using another CNN to produce the final HDR frame. We perform end‐to‐end training by minimizing the error between the reconstructed and ground truth HDR images on a set of training scenes. We produce our training data synthetically from existing HDR video datasets and simulate the imperfections of standard digital cameras using a simple approach. Experimental results demonstrate that our approach produces high‐quality HDR videos and is an order of magnitude faster than the state‐of‐the‐art techniques for sequences with two and three alternating exposures.

4.
We propose a novel approach to robot‐operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects from the scene. Our algorithm is built on top of a volumetric depth fusion framework and performs real‐time voxel‐based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. For each grid cell, the VSF stores the score of the corresponding view, which measures how much that view reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
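The view-selection idea in this abstract can be illustrated with a minimal sketch. It assumes the viewing score field is a dictionary keyed by hypothetical (x, y, azimuth) grid cells, and that a view's score is the entropy it is expected to remove. The function names and data layout are illustrative assumptions, not the authors' implementation.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy (in bits) of a discrete label distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

def view_score(before, after):
    """Uncertainty reduction of one view (hypothetical definition).

    before/after: lists of per-voxel label distributions, as estimated
    before and after observing the scene from that view.
    """
    return sum(entropy_bits(b) for b in before) - sum(entropy_bits(a) for a in after)

def next_best_view(score_field):
    """Return the (x, y, azimuth) cell with the highest viewing score."""
    return max(score_field, key=score_field.get)

# Toy score field with two candidate views (made-up values).
field = {(0, 0, 0): 0.2, (1, 0, 90): 0.7}
best = next_best_view(field)
```

The sketch only picks the single best cell; the joint optimization of path and camera trajectory between views is not modeled here.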

5.
We present a novel method to reconstruct a fluid's 3D density and motion based on just a single sequence of images. This is made possible by using powerful physical priors for this strongly under‐determined problem. More specifically, we propose a novel strategy to infer density updates strongly coupled to previous and current estimates of the flow motion. Additionally, we employ an accurate discretization and depth‐based regularizers to compute stable solutions. Using only one view for the reconstruction drastically reduces the complexity of the capturing setup and could even allow for online video databases or smartphone videos as inputs. The reconstructed 3D velocity can then be flexibly utilized, e.g., for re‐simulation, domain modification or guiding purposes. We demonstrate the capacity of our method with a series of synthetic test cases and the reconstruction of real smoke plumes captured with a Raspberry Pi camera.

6.
360° VR videos provide users with an immersive visual experience. To encode 360° VR videos, spherical pixels must be mapped onto a two‐dimensional domain to take advantage of the existing video encoding and storage standards. In the VR industry, standard cubemap projection is the most widely used projection method for encoding 360° VR videos. However, it exhibits pixel density variation across different regions due to projection distortion. We present a generalized algorithm to improve the efficiency of cubemap projection using polynomial approximation. In our algorithm, standard cubemap projection can be regarded as a special form with a 1st‐order polynomial. Our experiments show that the generalized cubemap projection can significantly reduce the projection distortion using higher‐order polynomials. As a result, pixel distribution can be well balanced in the resulting 360° VR videos. We use PSNR, S‐PSNR and CPP‐PSNR to evaluate the visual quality, and the experimental results demonstrate promising performance improvement over standard cubemap projection and Google's equi‐angular cubemap.
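The relationship between the projections named in this abstract can be sketched as follows. The sketch assumes a face coordinate p in [-1, 1] on the cube face's tangent plane. The equi-angular remapping (4/π)·atan(p) is the commonly cited form; the odd polynomial in `polynomial_map` and its coefficients are purely hypothetical stand-ins, since the abstract does not give the paper's actual polynomial.

```python
import math

def standard_cubemap(p):
    """Standard cubemap: the stored coordinate is the tangent-plane coordinate."""
    return p

def equi_angular(p):
    """Equi-angular cubemap: equal steps in the output cover equal angles."""
    return 4.0 / math.pi * math.atan(p)

def polynomial_map(p, coeffs):
    """Hypothetical odd polynomial c0*p + c1*p**3 + ..., normalized so 1 maps to 1.

    With coeffs == [1.0] this is a 1st-order polynomial and reduces to the
    standard cubemap mapping.
    """
    total = sum(coeffs)
    return sum(c * p ** (2 * k + 1) for k, c in enumerate(coeffs)) / total

# Compare the three mappings at the same face coordinate.
p = 0.5
values = (standard_cubemap(p), equi_angular(p), polynomial_map(p, [1.3, -0.3]))
```

All three mappings fix the face centre and the face edges; they differ only in how samples are distributed in between, which is what drives the pixel density variation.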

7.
We present a new video‐based performance cloning technique. After training a deep generative network using a reference video capturing the appearance and dynamics of a target actor, we are able to generate videos where this actor reenacts other performances. All of the training data and the driving performances are provided as ordinary video segments, without motion capture or depth information. Our generative model is realized as a deep neural network with two branches, both of which train the same space‐time conditional generator, using shared weights. One branch, responsible for learning to generate the appearance of the target actor in various poses, uses paired training data, self‐generated from the reference video. The second branch uses unpaired data to improve generation of temporally coherent video renditions of unseen pose sequences. Through data augmentation, our network is able to synthesize images of the target actor in poses never captured by the reference video. We demonstrate a variety of promising results, where our method is able to generate temporally coherent videos, for challenging scenarios where the reference and driving videos consist of very different dance performances.

8.
A camera's shutter controls the incoming light that reaches the camera sensor. Different shutters lead to wildly different results, and are often used as a tool in movies for artistic purposes, e.g., they can indirectly control the effect of motion blur. However, a physical camera is limited to a single shutter setting at any given moment. ShutterApp enables users to define spatio‐temporally varying virtual shutters that go beyond the options available in real‐world camera systems. A user provides a sparse set of annotations that define shutter functions at selected locations in key frames. From this input, our solution defines shutter functions for each pixel of the video sequence using a suitable interpolation technique, which are then employed to derive the output video. Our solution runs in real time on commodity hardware. Users can thereby explore different options interactively, leading to a new level of expressiveness without having to rely on specialized hardware or laborious editing.
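One way to picture a per-pixel virtual shutter is as a set of temporal weights applied to a window of frames. The sketch below assumes exactly that: `shutter[y][x]` is a hypothetical list of non-negative weights, one per frame, and the output pixel is the normalized weighted sum. The abstract does not specify how ShutterApp represents or interpolates shutter functions, so this is an illustration rather than its method.

```python
def apply_virtual_shutter(frames, shutter):
    """Integrate frames over time with a separate shutter function per pixel.

    frames[t][y][x]  : pixel intensity of frame t
    shutter[y][x][t] : non-negative weight of frame t at pixel (x, y)
    """
    height, width = len(frames[0]), len(frames[0][0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = shutter[y][x]
            total = sum(weights)
            if total <= 0.0:
                raise ValueError("shutter must let some light through at every pixel")
            acc = sum(w * frame[y][x] for w, frame in zip(weights, frames))
            output[y][x] = acc / total
    return output

# Two 1x2 frames: the left pixel averages both frames (long exposure),
# the right pixel only sees the last frame (short exposure).
frames = [[[0.0, 0.0]], [[10.0, 10.0]]]
shutter = [[[1.0, 1.0], [0.0, 1.0]]]
result = apply_virtual_shutter(frames, shutter)
```

A box-shaped weight list mimics an ordinary open shutter, while a list with a single non-zero entry freezes motion at that pixel.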

9.
Procedural modeling is used across many industries for rapid 3D content creation. However, professional procedural tools often lack artistic control, requiring manual edits on baked results, diminishing the advantages of a procedural modeling pipeline. Previous approaches to enable local artistic control require special annotations of the procedural system and manual exploration of potential edit locations. Therefore, we propose a novel approach to discover meaningful and non‐redundant good edit locations (GELs). We introduce a bottom‐up algorithm for finding GELs directly from the attributes in procedural models, without special annotations. To make attribute edits at GELs persistent, we analyze their local spatial context and construct a meta‐locator to uniquely specify their structure. Meta‐locators are calculated independently per attribute, making them robust against changes in the procedural system. Functions on meta‐locators enable intuitive and robust multi‐selections. Finally, we introduce an algorithm to transfer meta‐locators to a different procedural model. We show that our approach greatly simplifies the exploration of the local edit space, and we demonstrate its usefulness in a user study and multiple real‐world examples.

10.
Reproducing the appearance of real‐world materials using current printing technology is problematic. The reduced number of inks available defines the printer's limited gamut, creating distortions in the printed appearance that are hard to control. Gamut mapping refers to the process of bringing an out‐of‐gamut material appearance into the printer's gamut, while minimizing such distortions as much as possible. We present a novel two‐step gamut mapping algorithm that allows users to specify which perceptual attribute of the original material they want to preserve (such as brightness or roughness). In the first step, we work in the low‐dimensional intuitive appearance space recently proposed by Serrano et al. [SGM*16], and adjust achromatic reflectance via an objective function that strives to preserve certain attributes. From this intermediate representation, we then perform an image‐based optimization including color information, to bring the BRDF into gamut. We show, both objectively and through a user study, how our method yields superior results compared to the state of the art, with the additional advantage that the user can specify which visual attributes need to be preserved. Moreover, we show how this approach can also be used for attribute‐preserving material editing.

11.
We present a deep learning based technique that enables novel‐view videos of human performances to be synthesized from sparse multi‐view captures. While performance capture from a sparse set of videos has received significant attention, there has been relatively little progress on non‐rigid objects (e.g., human bodies). The rich articulation modes of the human body make it rather challenging to synthesize and interpolate the model well. To address this problem, we propose a novel deep learning based framework that directly predicts novel‐view videos of human performances without explicit 3D reconstruction. Our method is a composition of two steps: novel‐view prediction and detail enhancement. We first learn a novel deep generative query network for view prediction. We synthesize novel‐view performances from a sparse set of just five or fewer camera videos. Then, we use a new generative adversarial network to enhance fine‐scale details of the first‐step results. This opens up the possibility of high‐quality low‐cost video‐based performance synthesis, which is gaining popularity for VR and AR applications. We demonstrate a variety of promising results, where our method is able to synthesize more robust and accurate performances than existing state‐of‐the‐art approaches when only sparse views are available.

12.
Understanding the attentional behavior of the human visual system when visualizing a rendered 3D shape is of great importance for many computer graphics applications. Eye tracking remains the only solution to explore this complex cognitive mechanism. Unfortunately, despite the large number of studies dedicated to images and videos, only a few eye tracking experiments have been conducted using 3D shapes. Thus, potential factors that may influence the human gaze in the specific setting of 3D rendering are still to be understood. In this work, we conduct two eye‐tracking experiments involving 3D shapes, with both static and time‐varying camera positions. We propose a method for mapping eye fixations (i.e., where humans gaze) onto the 3D shapes with the aim of producing a benchmark of 3D meshes with fixation density maps, which is publicly available. First, the collected data is used to study the influence of shape, camera position, material and illumination on visual attention. We find that material and lighting have a significant influence on attention, as does the camera path in the case of dynamic scenes. Then, we compare the performance of four representative state‐of‐the‐art mesh saliency models in predicting ground‐truth fixations using two different metrics. We show that, even combined with a center‐bias model, the performance of 3D saliency algorithms remains poor at predicting human fixations. To explain their weaknesses, we provide a qualitative analysis of the main factors that attract human attention. We finally provide a comparison of human‐eye fixations and Schelling points and show that their correlation is weak.

13.
Palette‐based image decomposition has attracted increasing attention in recent years. A specific class of approaches has been proposed based on RGB‐space geometry, which construct convex hulls whose vertices act as palette colors. However, such palettes are not guaranteed to contain representative colors that actually appear in the image, which makes editing palette colors to perform recoloring less intuitive and less predictable. Hence, we propose an improved geometric approach to address this issue. We use a polyhedron, but not necessarily a convex hull, in the RGB space to represent the color palette. We then formulate the task of palette extraction as an optimization problem which can be solved in a few seconds. Our palette has a higher degree of representativeness and maintains a relatively similar level of accuracy compared with previous methods. For layer decomposition, we compute layer opacities via simple mean value coordinates, which can achieve instant feedback without precomputation. We demonstrate our method for image recoloring on a variety of examples. In comparison with state‐of‐the‐art works, our approach is generally more intuitive and efficient, with fewer artifacts.

14.
Computer graphics artists often resort to compositing to rework light effects in a synthetic image without requiring a new render. Shadows are primary subjects of artistic manipulation as they carry important stylistic information while our perception is tolerant of their editing. In this paper we formalize the notion of global shadow, generalizing the direct shadow found in previous work to a global illumination context. We define an object's shadow layer as the difference between two altered renders of the scene. A shadow layer contains the radiance lost on the camera film because of a given object. We translate this definition into the theoretical framework of Monte‐Carlo integration, obtaining a concise expression of the shadow layer. Building on it, we propose a path tracing algorithm that renders both the original image and any number of shadow layers in a single pass: the user may choose to separate shadows on a per‐object and per‐light basis, enabling intuitive and decoupled edits.

15.
Monte‐Carlo path tracing techniques can generate stunning visualizations of medical volumetric data. In a clinical context, such renderings have turned out to be valuable for communication, education, and diagnosis. Because a large number of computationally expensive lighting samples is required to converge to a smooth result, progressive rendering is the only option for interactive settings: low‐sampled, noisy images are shown while the user explores the data, and as soon as the camera is at rest the view is progressively refined. During interaction, the visual quality is low, which strongly impedes the user's experience. Even worse, when a data set is explored in virtual reality, the camera is never at rest, leading to constantly low image quality and strong flickering. In this work we present an approach to bring volumetric Monte‐Carlo path tracing to the interactive domain by reusing samples over time. To this end, we transfer the idea of temporal antialiasing from surface rendering to volume rendering. We show how to reproject volumetric ray samples even though they cannot be pinned to a particular 3D position, present an improved weighting scheme that makes longer history trails possible, and define an error accumulation method that downweights less appropriate older samples. Furthermore, we exploit reprojection information to adaptively determine the number of newly generated path tracing samples for each individual pixel. Our approach is designed for static medical data with both volumetric and surface‐like structures. It achieves good‐quality volumetric Monte‐Carlo renderings with little noise, and is also usable in a VR context.
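The basic idea of reusing samples over time can be sketched as a capped running average per pixel. This is the plain temporal-accumulation scheme familiar from temporal antialiasing, not the improved weighting or error accumulation the abstract describes; the function name, the `max_history` cap and its default are assumptions made for illustration.

```python
def accumulate(history, count, sample, max_history=32):
    """Blend a new noisy sample into the reprojected history of one pixel.

    history     : accumulated value fetched from the previous frame
    count       : number of samples already represented by `history`
    sample      : newly traced radiance estimate for this frame
    max_history : cap on the effective history length (assumed parameter)

    Returns the updated (value, count). While count is below the cap this is
    an exact running mean; afterwards it becomes an exponential moving average.
    """
    count = min(count + 1, max_history)
    alpha = 1.0 / count
    value = history + alpha * (sample - history)
    return value, count

# Feed three samples into an empty history (count == 0 discards `history`).
value, count = 0.0, 0
for s in (2.0, 4.0, 6.0):
    value, count = accumulate(value, count, s)
```

Resetting `count` to 0 whenever reprojection fails makes the next sample replace the stale history entirely, which is the simplest way to reject invalid history.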

16.
We introduce a bidirectional reflectance distribution function (BRDF) model for the rendering of materials that exhibit hazy reflections, whereby the specular reflections appear to be flanked by a surrounding halo. The focus of this work is on artistic control and ease of implementation for real‐time and off‐line rendering. We propose relying on a composite material based on a pair of arbitrary BRDF models; however, instead of controlling their physical parameters, we expose perceptual parameters inspired by visual experiments [VBF17]. Our main contribution then consists of a mapping from perceptual to physical parameters that ensures the resulting composite BRDF is valid in terms of reciprocity, positivity and energy conservation. The immediate benefit of our approach is to provide direct artistic control over both the intensity and extent of the haze effect, which is not only necessary for editing purposes, but also essential to vary haziness spatially over an object surface. Our solution is also simple to implement as it requires no new importance sampling strategy and relies on existing BRDF models. Such simplicity is key to approximating the method for the editing of hazy gloss in real time and for compositing.

17.
VIDOS is a Java-based server–client video management system that permits one to customize a personal version of any downloadable digital video file over the Internet or local intranet. It enables one, without purchasing expensive video editing software, to edit the video spatially and temporally, to specify the desired zoom factor, frame rate and video format, and to choose the nature and quality of digital compression before downloading the edited video. VIDOS permits videos to be adapted to suit their end uses. By potentially reducing their size, it can improve corporate and personal efficiency by speeding network transfers and cutting disc storage requirements.

18.
The ability to quickly and intuitively edit digital content has become increasingly important in our everyday life. We propose a novel method for propagating a sparse set of user edits (e.g., changes in color, brightness, contrast, etc.) expressed as casual strokes to nearby regions in an image or video with similar appearances. Existing methods for edit propagation are typically based on optimization, whose computational cost can be prohibitive for large inputs. We reformulate propagation as a function interpolation problem in a high‐dimensional space, which we solve very efficiently using radial basis functions. While simple to implement, our method significantly improves on the speed and space cost of existing methods, and provides instant feedback of propagation results even on large images and videos.
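Propagation as function interpolation can be sketched with a small exact radial basis function (RBF) fit. The sketch assumes each stroke sample is a feature vector (here an RGB triple) with a scalar edit strength, uses a Gaussian kernel, and solves the interpolation system by Gaussian elimination. The kernel choice, `sigma` and all function names are assumptions; the abstract does not state which basis functions or solver the authors use.

```python
import math

def gaussian_rbf(a, b, sigma):
    """Gaussian kernel on the squared distance between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve a small dense linear system by elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_rbf(centers, values, sigma):
    """Find weights so the interpolant reproduces the stroke values exactly."""
    kernel = [[gaussian_rbf(ci, cj, sigma) for cj in centers] for ci in centers]
    return solve(kernel, values)

def propagate(feature, centers, weights, sigma):
    """Evaluate the interpolated edit strength at an arbitrary pixel feature."""
    return sum(w * gaussian_rbf(feature, c, sigma) for w, c in zip(weights, centers))

# Two strokes: "edit reddish pixels fully" and "leave bluish pixels alone".
centers = [(0.9, 0.1, 0.1), (0.1, 0.1, 0.9)]
values = [1.0, 0.0]
weights = fit_rbf(centers, values, sigma=0.3)
strength = propagate((0.85, 0.1, 0.15), centers, weights, sigma=0.3)
```

Once the weights are fitted, evaluating `propagate` is independent per pixel, so the cost grows with the number of stroke samples rather than with a global optimization over the whole image.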

19.
20.
This paper proposes a scale‐adaptive filtering method to improve the performance of structure‐preserving texture filtering for image smoothing. With classical texture filters, it is usually challenging to smooth texture at multiple scales while preserving salient structures in an image. We address this issue within the framework of adaptive bilateral filtering, where the scales of Gaussian range kernels are allowed to vary from pixel to pixel. Based on direction‐wise statistics, our method distinguishes texture from structure effectively, identifies an appropriate scope around a pixel to be smoothed, and thus infers an optimal smoothing scale for it. By filtering an image with varying‐scale kernels, the image is smoothed adaptively according to the distribution of texture. With commendable experimental results, we show that, while needing fewer iterations, our proposed scheme boosts texture filtering performance in terms of preserving the geometric structures of multiple scales even after aggressive smoothing of the original image.
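A bilateral filter whose range kernel varies from pixel to pixel can be sketched in one dimension as follows. The per-sample range scales `sigma_r` are simply passed in; how the paper infers them from direction-wise statistics is not described in the abstract, so that step is left out, and the function name and parameters are illustrative assumptions.

```python
import math

def adaptive_bilateral_1d(signal, sigma_s, sigma_r, radius):
    """Bilateral filter with a separate range scale for every sample.

    signal  : list of intensities
    sigma_s : spatial Gaussian scale, shared by all samples
    sigma_r : list of range Gaussian scales, one per sample
    radius  : half-width of the filter window
    """
    output = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[j] - center) ** 2) / (2.0 * sigma_r[i] ** 2))
            num += w_space * w_range * signal[j]
            den += w_space * w_range
        output.append(num / den)
    return output

# A step edge filtered with a small and a large range scale.
step = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
preserved = adaptive_bilateral_1d(step, sigma_s=1.0, sigma_r=[0.1] * 6, radius=2)
blurred = adaptive_bilateral_1d(step, sigma_s=1.0, sigma_r=[100.0] * 6, radius=2)
```

A small range scale keeps the step intact, while a large one lets the filter average across it; choosing the scale per pixel is what allows texture to be smoothed while structure edges are preserved.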
