Street‐level imagery is now abundant but does not have sufficient capture density to be usable for Image‐Based Rendering (IBR) of facades. We present a method that exploits repetitive elements in facades, such as windows, to perform data augmentation, in turn improving camera calibration, reconstructed geometry and overall rendering quality for IBR. The main intuition behind our approach is that a few views of several instances of an element provide similar information to many views of a single instance of that element. We first select similar instances of an element from 3–4 views of a facade and transform them into a common coordinate system, creating a “platonic” element. We use this common space to refine the camera calibration of each view of each instance and to reconstruct a 3D mesh of the element with multi‐view stereo, which we regularize to obtain a piecewise‐planar mesh aligned with dominant image contours. Observing the same element under multiple views also allows us to identify reflective areas, such as glass panels, which we use at rendering time to generate plausible reflections using an environment map. Our detailed 3D mesh, augmented set of views, and reflection mask enable image‐based rendering of much higher quality than results obtained using the input images directly.

Power saving is a prevailing concern in desktop computers and, especially, in battery‐powered devices such as mobile phones. This is generating a growing demand for power‐aware graphics applications that can extend battery life while preserving good quality. In this paper, we address this issue by presenting a real‐time power‐efficient rendering framework, able to dynamically select the rendering configuration with the best quality within a given power budget. Unlike the current state of the art, our method requires neither precomputation of the whole camera‐view space nor Pareto curves to explore the vast power‐error space; as such, it can also handle dynamic scenes. Our algorithm is based on two key components: our novel power prediction model, and our runtime quality error estimation mechanism. These components allow us to search for the optimal rendering configuration at runtime, transparently to the user. We demonstrate the performance of our framework on two different platforms: a desktop computer and a mobile device. In both cases, we produce results close to the maximum quality, while achieving significant power savings.

We present a new video‐based performance cloning technique. After training a deep generative network using a reference video capturing the appearance and dynamics of a target actor, we are able to generate videos where this actor reenacts other performances. All of the training data and the driving performances are provided as ordinary video segments, without motion capture or depth information. Our generative model is realized as a deep neural network with two branches, both of which train the same space‐time conditional generator, using shared weights. One branch, responsible for learning to generate the appearance of the target actor in various poses, uses paired training data, self‐generated from the reference video. The second branch uses unpaired data to improve generation of temporally coherent video renditions of unseen pose sequences. Through data augmentation, our network is able to synthesize images of the target actor in poses never captured by the reference video. We demonstrate a variety of promising results, where our method is able to generate temporally coherent videos, for challenging scenarios where the reference and driving videos consist of very different dance performances.

We present a deep learning based technique that enables novel‐view videos of human performances to be synthesized from sparse multi‐view captures. While performance capture from a sparse set of videos has received significant attention, there has been relatively little progress on non‐rigid objects (e.g., human bodies). The rich articulation modes of the human body make it rather challenging to synthesize and interpolate the model well. To address this problem, we propose a novel deep learning based framework that directly predicts novel‐view videos of human performances without explicit 3D reconstruction. Our method is a composition of two steps: novel‐view prediction and detail enhancement. We first learn a novel deep generative query network for view prediction. We synthesize novel‐view performances from a sparse set of just five or fewer camera videos. Then, we use a new generative adversarial network to enhance fine‐scale details of the first step's results. This opens up the possibility of high‐quality low‐cost video‐based performance synthesis, which is gaining popularity for VR and AR applications. We demonstrate a variety of promising results, where our method is able to synthesize more robust and accurate performances than existing state‐of‐the‐art approaches when only sparse views are available.

The Bidirectional Texture Function (BTF) is a data‐driven solution to render materials with complex appearance. A typical capture contains tens of thousands of images of a material sample under varying viewing and lighting conditions. While capable of faithfully recording complex light interactions in the material, the main drawback is the massive memory requirement, both for storing and rendering, making effective compression of BTF data a critical component in practical applications. Common compression schemes used in practice are based on matrix factorization techniques, which preserve the discrete format of the original dataset. While this approach generalizes well to different materials, rendering with the compressed dataset still relies on interpolating between the closest samples. Depending on the material and the angular resolution of the BTF, this can lead to blurring and ghosting artifacts. An alternative approach uses analytic model fitting to approximate the BTF data, using continuous functions that naturally interpolate well, but whose expressive range is often not wide enough to faithfully recreate materials with complex non‐local lighting effects (subsurface scattering, inter‐reflections, shadowing and masking…). In light of these observations, we propose a neural network‐based BTF representation inspired by autoencoders: our encoder compresses each texel to a small set of latent coefficients, while our decoder additionally takes in a light and view direction and outputs a single RGB vector at a time. This allows us to continuously query reflectance values in the light and view hemispheres, eliminating the need for linear interpolation between discrete samples. We train our architecture on fabric BTFs with a challenging appearance and compare to standard PCA as a baseline. We achieve competitive compression ratios and high‐quality interpolation/extrapolation without blurring or ghosting artifacts.

Monte Carlo methods for physically‐based light transport simulation are broadly adopted in the feature film production, animation and visual effects industries. These methods, however, often result in noisy images and have slow convergence. As such, improving the convergence of Monte Carlo rendering remains an important open problem. Gradient‐domain light transport is a recent family of techniques that can accelerate Monte Carlo rendering by up to an order of magnitude, leveraging a gradient‐based estimation and a reformulation of the rendering problem as an image reconstruction. This state‐of‐the‐art report comprehensively frames the fundamentals of gradient‐domain rendering, as well as the pragmatic details behind practical gradient‐domain uni‐ and bidirectional path tracing and photon density estimation algorithms. Moreover, we discuss the various image reconstruction schemes that are crucial to accurate and stable gradient‐domain rendering. Finally, we benchmark various gradient‐domain techniques against the state‐of‐the‐art in denoising methods before discussing open problems.
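
The image reconstruction step mentioned in this abstract can be illustrated with a minimal 1D sketch. It assumes a screened Poisson formulation: given a noisy primal signal p and estimated finite differences g, find x minimizing alpha^2 * sum (x_i - p_i)^2 + sum (x_{i+1} - x_i - g_i)^2. The function name `reconstruct_1d`, the weight `alpha` and the 1D setting are assumptions made for illustration and are not taken from the report.

```python
def reconstruct_1d(primal, grads, alpha=0.2):
    """Least-squares (screened Poisson) reconstruction of a 1D signal.

    primal: noisy signal estimates p_0..p_{n-1}
    grads:  estimated differences g_i ~ x_{i+1} - x_i (length n-1)
    alpha:  hypothetical weight of the primal term (must be > 0)
    """
    n = len(primal)
    assert len(grads) == n - 1
    a2 = alpha * alpha

    # Normal equations form a tridiagonal system with -1 off-diagonals.
    diag, rhs = [], []
    for i in range(n):
        d = a2
        r = a2 * primal[i]
        if i > 0:            # term (x_i - x_{i-1} - g_{i-1})^2
            d += 1.0
            r += grads[i - 1]
        if i < n - 1:        # term (x_{i+1} - x_i - g_i)^2
            d += 1.0
            r -= grads[i]
        diag.append(d)
        rhs.append(r)

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] + x[i + 1]) / diag[i]
    return x
```

With accurate gradients, the reconstruction pulls a noisy primal signal toward the true one; a small `alpha` trusts the gradients more, a large one trusts the primal image more.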

We introduce an interactive tool for novice users to design mechanical objects made of 2.5D linkages. Users simply draw the shape of the object and a few key poses of its multiple moving parts. Our approach automatically generates a one‐degree‐of‐freedom linkage that connects the fixed and moving parts, such that the moving parts traverse all input poses in order without any collision with the fixed and other moving parts. In addition, our approach avoids common linkage defects and favors compact linkages and smooth motion trajectories. Finally, our system automatically generates the 3D geometry of the object and its links, allowing the rapid creation of a physical mockup of the designed object.

Presenting high‐fidelity 3D content on compact portable devices with low computational power is challenging. Smartphones, tablets and head‐mounted displays (HMDs) suffer from thermal and battery‐life constraints and thus cannot match the render quality of desktop PCs and laptops. Streaming rendering makes it possible to show high‐quality content but can suffer from potentially high latency. We propose an approach to efficiently capture shading samples in object space and pack them into a texture. Streaming this texture to the client, we support temporal frame up‐sampling with high fidelity, low latency and high mobility. We introduce two novel sample distribution strategies and a novel triangle representation in the shading atlas space. Since such a system requires dynamic parallelism, we propose an implementation exploiting the power of hardware‐accelerated tessellation stages. Our approach allows fast decoding and rendering of extrapolated views on a client device by using hardware‐accelerated interpolation between shading samples and a set of potentially visible geometry. A comparison to existing shading methods shows that our sample distributions allow better client shading quality than previous atlas streaming approaches and outperform image‐based methods in all relevant aspects.

Recent neural style transfer frameworks have obtained astonishing visual quality and flexibility in Single‐style Transfer (SST), but little attention has been paid to Multi‐style Transfer (MST), which refers to simultaneously transferring multiple styles to the same image. Compared to SST, MST has the potential to create more diverse and visually pleasing stylization results. In this paper, we propose the first MST framework to automatically incorporate multiple styles into one result based on regional semantics. We first improve the existing SST backbone network by introducing a novel multi‐level feature fusion module and a patch attention module to achieve better semantic correspondences and preserve richer style details. For MST, we design a conceptually simple yet effective region‐based style fusion module to insert into the backbone. It assigns corresponding styles to content regions based on semantic matching, and then seamlessly combines multiple styles together. Comprehensive evaluations demonstrate that our framework outperforms existing SST and MST methods.

The stochastic nature of Monte Carlo rendering algorithms inherently produces noisy images. Essentially, three approaches have been developed to solve this issue: improving the ray‐tracing strategies to reduce pixel variance, providing adaptive sampling by increasing the number of rays in regions that need it, and filtering the noisy image as a post‐process. Although the algorithms from the last category introduce bias, they remain highly attractive as they quickly improve the visual quality of the images, are compatible with all sorts of rendering effects, have a low computational cost and, for some of them, avoid deep modifications of the rendering engine. In this paper, we build upon recent advances in both non‐local and collaborative filtering methods to propose a new efficient denoising operator for Monte Carlo rendering. Starting from the local statistics which emanate from each pixel's sample distribution, we enrich the image with local covariance measures and introduce a non‐local Bayesian filter which is specifically designed to address the noise stemming from Monte Carlo rendering. The resulting algorithm only requires the rendering engine to provide for each pixel a histogram and a covariance matrix of its color samples. Compared to state‐of‐the‐art sample‐based methods, we obtain improved denoising results, especially in dark areas, with a large increase in speed and more robustness with respect to the main parameter of the algorithm. We provide a detailed mathematical exposition of our Bayesian approach, discuss extensions to multiscale execution, adaptive sampling and animated scenes, and experimentally validate it on a collection of scenes.

Color scribbling is a unique form of illustration where artists use compact, overlapping, and monochromatic scribbles at microscopic scale to create astonishing colorful images at macroscopic scale. The creation process is skill‐demanding and time‐consuming, and typically involves delicately drawing monochromatic scribbles layer‐by‐layer to depict true‐color subjects using a limited color palette. In this work, we present a novel computational framework for automatic generation of color scribble images from arbitrary raster images. The core contribution of our work lies in a novel color dithering model tailor‐made for synthesizing a smooth color appearance using multiple layers of overlapped monochromatic strokes. Specifically, our system reconstructs the appearance of the input image by (i) generating layers of monochromatic scribbles based on a limited color palette derived from the input image, and (ii) optimizing the drawing sequence among layers to minimize the visual color dissimilarity between the dithered image and the original image as well as the color banding artifacts. We demonstrate the effectiveness and robustness of our algorithm with various convincing results synthesized from a variety of input images with different stroke patterns. The experimental study further shows that our approach faithfully captures the scribble style and the color presentation at microscopic and macroscopic scales, respectively, which is otherwise difficult for state‐of‐the‐art methods.

Rendering materials such as metallic paints, scratched metals and rough plastics requires glint integrators that can capture all micro‐specular highlights falling into a pixel footprint, faithfully replicating surface appearance. Specular normal maps can be used to represent a wide range of arbitrary micro‐structures. The use of normal maps comes with important drawbacks though: the appearance is dark overall due to back‐facing normals and importance sampling is suboptimal, especially when the micro‐surface is very rough. We propose a new glint integrator relying on a multiple‐scattering patch‐based BRDF addressing these issues. To do so, our method uses a modified version of microfacet‐based normal mapping [SHHD17] designed for glint rendering, leveraging symmetric microfacets. To model multiple‐scattering, we re‐introduce the lost energy caused by a perfectly specular, single‐scattering formulation instead of using expensive random walks. This reflectance model is the basis of our patch‐based BRDF, enabling robust sampling and artifact‐free rendering with a natural appearance. Additional calculation costs amount to about 40% in the worst cases compared to previous methods [YHMR16, CCM18].

Palette‐based image decomposition has attracted increasing attention in recent years. A specific class of approaches has been proposed based on RGB‐space geometry, which construct convex hulls whose vertices act as palette colors. However, such palettes are not guaranteed to contain representative colors that actually appear in the image, which makes editing palette colors for recoloring less intuitive and less predictable. Hence, we propose an improved geometric approach to address this issue. We use a polyhedron, but not necessarily a convex hull, in the RGB space to represent the color palette. We then formulate the task of palette extraction as an optimization problem which can be solved in a few seconds. Our palette has a higher degree of representativeness and maintains a relatively similar level of accuracy compared with previous methods. For layer decomposition, we compute layer opacities via simple mean value coordinates, which can achieve instant feedback without precomputation. We have demonstrated our method for image recoloring on a variety of examples. In comparison with state‐of‐the‐art works, our approach is generally more intuitive and efficient with fewer artifacts.

This paper proposes a deep learning‐based image tone enhancement approach that can maximally enhance the tone of an image while preserving its naturalness. Our approach does not require ground‐truth images carefully generated by human experts for training. Instead, we train a deep neural network to mimic the behavior of a previous classical filtering method that produces drastic but possibly unnatural‐looking tone enhancement results. To preserve the naturalness, we adopt the generative adversarial network (GAN) framework as a regularizer. To suppress artifacts caused by the generative nature of the GAN framework, we also propose an imbalanced cycle‐consistency loss. Experimental results show that our approach can effectively enhance the tone and contrast of an image while preserving the naturalness compared to previous state‐of‐the‐art approaches.

Traditional pencil drawing rendering algorithms, when applied to video, may suffer from temporal inconsistency and the shower‐door effect due to the stochastic noise models employed. This paper attempts to resolve these problems with deep learning. Recently, many research endeavors have demonstrated that feed‐forward Convolutional Neural Networks (CNNs) are capable of using a reference image to stylize a whole video sequence while removing the shower‐door effect in video style transfer applications. Compared with video style transfer, pencil drawing video is more sensitive to the inconsistency of texture and requires a stronger expression of pencil hatching. Thus, in this paper we develop an approach that combines a recent Line Integral Convolution (LIC) based method, which specializes in realistically simulating pencil drawing images, with a new feed‐forward CNN that can eliminate the shower‐door effect successfully. Taking advantage of optical flow, we adopt a feature‐map‐level temporal loss function and propose a new framework to avoid the temporal inconsistency between consecutive frames, enhancing the visual impression of pencil strokes and tone. Experimental comparisons with the existing feed‐forward CNNs have demonstrated that our method can generate temporally more stable and visually more pleasant pencil drawing video results in a faster manner.

Emissive media are often challenging to render: in thin regions where only a few scattering events occur the emission is poorly sampled, while sampling events for emission can be disadvantageous due to absorption in dense regions. We extend the standard path space measurement contribution to also collect emission along path segments, not only at vertices. We apply this extension to two estimators: extending paths via scattering and distance sampling, and next event estimation. In order to do so, we unify the two approaches and derive the corresponding Monte Carlo estimators to interpret next event estimation as a solid angle sampling technique. We avoid connecting paths to vertices hidden behind dense absorbing layers of smoke by also including transmittance sampling in next event estimation. We demonstrate the advantages of our line integration approach, which generates estimators with lower variance since entire segments are accounted for. Also, our novel forward next event estimation technique yields faster run times compared to previous next event estimation as it penetrates less deeply into dense volumes.
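
The difference between collecting emission at vertices and along segments can be seen in a toy setting. The sketch below assumes a homogeneous, purely absorbing and emitting slab of length d with absorption coefficient sigma_a, extinction coefficient sigma_t and constant emitted radiance L_e; all function names are hypothetical and this is not the paper's estimator. Both estimators sample a free-flight distance t with density sigma_t * exp(-sigma_t * t). The vertex estimator scores emission only if a collision happens inside the slab, while the segment estimator scores the emission accumulated over the travelled length min(t, d). Both have the same expected value, (sigma_a / sigma_t) * L_e * (1 - exp(-sigma_t * d)).

```python
import math
import random

def analytic_emission(sigma_a, sigma_t, L_e, d):
    """Closed-form emitted radiance arriving at the start of the segment."""
    return sigma_a / sigma_t * L_e * (1.0 - math.exp(-sigma_t * d))

def vertex_estimate(sigma_a, sigma_t, L_e, d, rng):
    """Score emission only at a sampled collision vertex inside the slab."""
    t = rng.expovariate(sigma_t)
    return sigma_a * L_e / sigma_t if t < d else 0.0

def segment_estimate(sigma_a, sigma_t, L_e, d, rng):
    """Score emission along the whole travelled segment [0, min(t, d)]."""
    t = rng.expovariate(sigma_t)
    return sigma_a * L_e * min(t, d)

def mean_and_variance(estimator, n, seed, *args):
    """Sample mean and unbiased sample variance over n draws."""
    rng = random.Random(seed)
    samples = [estimator(*args, rng) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var
```

In a thin medium (sigma_t * d much smaller than 1) collisions are rare, so the vertex estimator is mostly zero with occasional large values, whereas the segment estimator returns nearly the same value for every sample.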

360° VR videos provide users with an immersive visual experience. To encode 360° VR videos, spherical pixels must be mapped onto a two‐dimensional domain to take advantage of the existing video encoding and storage standards. In the VR industry, standard cubemap projection is the most widely used projection method for encoding 360° VR videos. However, it exhibits pixel density variation at different regions due to projection distortion. We present a generalized algorithm to improve the efficiency of cubemap projection using polynomial approximation. In our algorithm, standard cubemap projection can be regarded as a special form with a 1st‐order polynomial. Our experiments show that the generalized cubemap projection can significantly reduce the projection distortion using higher order polynomials. As a result, pixel distribution can be well balanced in the resulting 360° VR videos. We use PSNR, S‐PSNR and CPP‐PSNR to evaluate the visual quality and the experimental results demonstrate promising performance improvement against standard cubemap projection and Google's equi‐angular cubemap.
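
The pixel density variation described in this abstract can be measured with a small 1D sketch across one cube face. A stored texture coordinate s in [-1, 1] is mapped to a face coordinate u = warp(s), and the viewing angle is atan(u). The identity warp gives the standard cubemap and warp(s) = tan(s * pi / 4) gives the equi-angular cubemap. The cubic warp below is a hypothetical example chosen for illustration (its coefficients come from the Taylor expansion of the tangent and are not the paper's polynomial).

```python
import math

def angular_spread(warp, n_pixels=256):
    """Ratio between the largest and smallest angle covered by one pixel.

    warp maps a texture coordinate s in [-1, 1] to a face coordinate u
    in [-1, 1] and must be monotonically increasing.
    """
    edges = [-1.0 + 2.0 * k / n_pixels for k in range(n_pixels + 1)]
    angles = [math.atan(warp(s)) for s in edges]
    extents = [angles[k + 1] - angles[k] for k in range(n_pixels)]
    return max(extents) / min(extents)

def standard_warp(s):
    """Standard cubemap: a 1st-order polynomial (identity)."""
    return s

def equi_angular_warp(s):
    """Equi-angular cubemap: uniform angle per pixel."""
    return math.tan(s * math.pi / 4.0)

def cubic_warp(s, a=math.pi / 4.0):
    """Hypothetical odd cubic with warp(1) = 1, for illustration only."""
    return a * s + (1.0 - a) * s ** 3

if __name__ == "__main__":
    for name, warp in [("standard", standard_warp),
                       ("cubic", cubic_warp),
                       ("equi-angular", equi_angular_warp)]:
        print(name, angular_spread(warp))
```

A spread of 1 means every pixel covers the same angle. The standard cubemap is close to 2 in this 1D setting because the derivative of atan(u) drops from 1 at the face center to 1/2 at the face edge.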

Distributions of samples play a very important role in rendering, affecting variance, bias and aliasing in Monte‐Carlo and Quasi‐Monte Carlo evaluation of the rendering equation. In this paper, we propose an original sampler which inherits many important features of classical low‐discrepancy sequences (LDS): a high degree of uniformity of the achieved distribution of samples, computational efficiency and progressive sampling capability. At the same time, we purposely tailor our sampler in order to improve its spectral characteristics, which in turn play a crucial role in variance reduction, anti‐aliasing and improving visual appearance of rendering. Our sampler can efficiently generate sequences of multidimensional points, whose power spectra approach the so‐called Blue‐Noise (BN) spectral property while preserving low discrepancy (LD) in certain 2‐D projections. In our tile‐based approach, we perform permutations on subsets of the original Sobol LDS. In a large space of all possible permutations, we select those which better approach the target BN property, using pair‐correlation statistics. We pre‐calculate such “good” permutations for each possible Sobol pattern, and store them in a lookup table efficiently accessible at runtime. We provide a complete and rigorous proof that such permutations preserve dyadic partitioning and thus the LDS properties of the point set in 2‐D projections. Our construction is computationally efficient, has a relatively low memory footprint and supports adaptive sampling. We validate our method by performing spectral/discrepancy/aliasing analysis of the achieved distributions, and provide variance analysis for several target integrands of theoretical and practical interest.
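
The dyadic partitioning property mentioned in this abstract can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the 2D Hammersley net in base 2 instead of Sobol subsets, and an XOR scramble of the second coordinate instead of the paper's permutations. All function names are hypothetical. A set of 2^m points satisfies the property when every dyadic box of area 2^-m contains exactly one point.

```python
def radical_inverse_base2(i, bits):
    """Reverse the lowest `bits` bits of i and return them as a fraction."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r / (1 << bits)

def hammersley_2d(m):
    """The 2^m points (i / 2^m, radical inverse of i)."""
    n = 1 << m
    return [(i / n, radical_inverse_base2(i, m)) for i in range(n)]

def xor_scramble_y(points, m, mask):
    """XOR the m-bit integer form of y with mask (0 <= mask < 2^m)."""
    n = 1 << m
    return [(x, (int(y * n) ^ mask) / n) for x, y in points]

def is_dyadic_net(points, m):
    """True if each 2^-a by 2^-(m-a) dyadic box holds exactly one point."""
    n = 1 << m
    if len(points) != n:
        return False
    for a in range(m + 1):
        nx, ny = 1 << a, 1 << (m - a)
        seen = set()
        for x, y in points:
            cell = (int(x * nx), int(y * ny))
            if cell in seen:       # two points share a box
                return False
            seen.add(cell)
    return True
```

Since there are as many boxes as points at every split, finding no shared box is enough to conclude that each box holds exactly one point. Points placed on the diagonal, by contrast, fail the check as soon as the boxes are roughly square.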

