Similar Documents
 Found 20 similar documents (search time: 31 ms)
1.
We present a novel approach to optimally retarget videos for varied displays with differing aspect ratios by preserving salient scene content discovered via eye tracking. Our algorithm performs editing with cut, pan and zoom operations by optimizing the path of a cropping window within the original video while seeking to (i) preserve salient regions, and (ii) adhere to the principles of cinematography. Our approach is (a) content agnostic, as the same methodology is employed to re-edit a wide-angle video recording or a close-up movie sequence captured with a static or moving camera, and (b) independent of video length, so it can in principle re-edit an entire movie in one pass. Our algorithm consists of two steps. The first step employs gaze transition cues to detect time stamps where new cuts are to be introduced in the original video via dynamic programming. A subsequent step optimizes the cropping window path (to create pan and zoom effects) while accounting for the original and new cuts. The cropping window path is designed to include maximum gaze information, and is composed of piecewise constant, linear and parabolic segments. It is obtained via L1-regularized convex optimization, which ensures a smooth viewing experience. We test our approach on a wide variety of videos and demonstrate significant improvement over the state of the art, both in terms of computational complexity and qualitative aspects. A study performed with 16 users confirms that our approach results in a superior viewing experience compared to gaze-driven re-editing [JSSH15] and letterboxing methods, especially for wide-angle static camera recordings.
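The path optimization described above is closely related to L1 trend filtering: penalizing the L1 norm of the first, second and third differences of the window trajectory favors piecewise constant, linear and parabolic segments, respectively. Below is a minimal sketch of that style of formulation using cvxpy; the synthetic gaze track, weights and frame bounds are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of an L1-regularized cropping-window path optimization in
# the spirit of the abstract: penalizing first, second and third differences
# encourages piecewise constant, linear and parabolic segments.
# All weights and the synthetic gaze track below are illustrative assumptions.
import numpy as np
import cvxpy as cp

n = 200                                   # number of frames
t = np.arange(n)
gaze_x = 100 + 30 * np.sin(t / 25.0) + np.random.randn(n) * 3  # synthetic gaze centers

x = cp.Variable(n)                        # horizontal center of the cropping window
lam1, lam2, lam3 = 10.0, 50.0, 100.0      # hypothetical smoothness weights

objective = cp.Minimize(
    cp.sum_squares(x - gaze_x)            # keep gaze inside the window
    + lam1 * cp.norm1(cp.diff(x, 1))      # piecewise constant (static camera)
    + lam2 * cp.norm1(cp.diff(x, 2))      # piecewise linear (constant-velocity pan)
    + lam3 * cp.norm1(cp.diff(x, 3))      # piecewise parabolic (ease-in/ease-out)
)
constraints = [x >= 50, x <= 150]         # window center must stay inside the frame
cp.Problem(objective, constraints).solve()
print("smoothed path:", x.value[:5])
```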

2.
Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies or a poorly exposed sky, or simply suffer from an uninteresting, plain-looking sky. Professionals may edit these videos using advanced and time-consuming tools unavailable to most users to replace the sky with a more expressive or imaginative one. In this work, we propose an algorithm for automatically replacing the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool for seamless sky replacement. The method is fast, achieving close to real-time performance on mobile devices, and user involvement can remain as limited as simply selecting the replacement sky.
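As a toy illustration of the compositing step (not the paper's actual segmentation method), the sketch below masks bright, blue-dominant pixels as sky, feathers the mask, and alpha-blends a replacement sky; the file names and thresholds are hypothetical.

```python
# A naive stand-in for automatic sky replacement: mask bright, blue-dominant
# pixels as "sky", feather the mask, and alpha-blend a replacement sky.
# The paper's segmentation is more sophisticated; this heuristic only
# illustrates the compositing step.
import cv2
import numpy as np

frame = cv2.imread("frame.jpg").astype(np.float32)        # hypothetical input frame
new_sky = cv2.imread("sky.jpg").astype(np.float32)
new_sky = cv2.resize(new_sky, (frame.shape[1], frame.shape[0]))

b, g, r = cv2.split(frame)
mask = ((b > 140) & (b > r + 15) & (b > g + 5)).astype(np.float32)  # crude sky heuristic
mask = cv2.GaussianBlur(mask, (31, 31), 0)                # feather the boundary

alpha = mask[..., None]
composite = alpha * new_sky + (1.0 - alpha) * frame
cv2.imwrite("composite.jpg", composite.astype(np.uint8))
```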

3.
We present a novel method to reconstruct a fluid's 3D density and motion from just a single sequence of images. This is made possible by using powerful physical priors for this strongly under-determined problem. More specifically, we propose a novel strategy to infer density updates strongly coupled to previous and current estimates of the flow motion. Additionally, we employ an accurate discretization and depth-based regularizers to compute stable solutions. Using only one view for the reconstruction drastically reduces the complexity of the capturing setup and could even allow online video databases or smartphone videos to serve as inputs. The reconstructed 3D velocity can then be flexibly utilized, e.g., for re-simulation, domain modification or guiding purposes. We demonstrate the capacity of our method with a series of synthetic test cases and the reconstruction of real smoke plumes captured with a Raspberry Pi camera.

4.
Radiometric compensation methods remove the effect of the underlying spatially varying surface reflectance when projecting on textured surfaces. All prior work samples the surface-reflectance-dependent radiometric transfer function from the projector to the camera at every pixel, which requires the camera to observe tens or hundreds of images projected by the projector. In this paper, we cast radiometric compensation as the sampling and reconstruction of a multi-dimensional radiometric transfer function that models, in a unified manner, the color transfer function from the projector to an observing camera together with the surface reflectance. Such a multi-dimensional representation makes no assumption about the linearity of the projector-to-camera color transfer function and can therefore handle projectors with non-linear color transfer functions (e.g., DLP, LCOS, LED-based or laser-based). We show that a well-curated sampling of this multi-dimensional function is adequate for its accurate representation, achieved by exploiting the following key properties: (a) the spectral reflectance of most real-world materials is smooth and can be well represented using a lower-dimensional function; (b) the reflectance properties of the underlying texture have strong redundancies; for example, multiple pixels or even regions can have similar surface reflectance; and (c) the color transfer function from the projector to the camera has strong input coherence. The proposed sampling allows us to reduce the number of projected images that need to be observed by a camera by up to two orders of magnitude, the minimum being only two. We then present a new multi-dimensional scattered data interpolation technique to reconstruct the radiometric transfer function at high spatial density (i.e., at every pixel) to compute the compensation image. We show that the accuracy of our interpolation technique is higher than that of any existing method.
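For intuition, the sketch below shows the compensation step at a single pixel under the simplifying assumption of an independent, monotone per-channel response; the paper's multi-dimensional scattered-data interpolation is far more general, and the sampled values here are made up.

```python
# Sketch of radiometric compensation at a single pixel, assuming an
# independent, monotone per-channel response. It shows only how a sparsely
# sampled response can be inverted to find the compensation input.
import numpy as np

# Hypothetical sampled response: camera value observed for a few projector inputs.
proj_levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
cam_observed = np.array([0.05, 0.22, 0.48, 0.71, 0.90])   # measured at this pixel

def compensate(target_cam_value: float) -> float:
    """Invert the sampled response: which projector input yields the target?"""
    # np.interp interpolates the projector input as a function of the
    # (monotone) observed camera values.
    return float(np.interp(target_cam_value, cam_observed, proj_levels))

desired = 0.6                              # desired camera-side appearance
print("projector input to send:", compensate(desired))
```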

5.
The use of deep learning models for tagging input data has increased over the past years because of their accuracy and high performance. A successful application is the scoring of sleep stages, where models are trained to predict the sleep stages of individuals. Although their predictive accuracy is high, there are still misclassifications that prevent doctors from properly diagnosing sleep-related disorders. This paper presents a system that allows users to explore the output of deep learning models in a real-life scenario to spot and analyze faulty predictions. These can be corrected by users to generate a sequence of sleep stages to be examined by doctors. Our approach addresses a real-life scenario in which ground truth is absent. It differs from others in that our goal is not to improve the model itself, but to correct the predictions it provides. We demonstrate that our approach is effective in identifying faulty predictions and helping users fix them in the proposed use case.
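One simple way such a system could surface candidate faulty predictions without ground truth is to flag epochs whose softmax output has high entropy; this criterion is an assumption chosen for illustration, not necessarily the paper's own.

```python
# Flag sleep-stage epochs whose model output is ambiguous (high entropy) as
# candidates for user review. This uncertainty criterion is an illustrative
# assumption, not necessarily the paper's mechanism.
import numpy as np

def flag_uncertain(probs: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """probs: (epochs, n_stages) softmax outputs; returns indices to review."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.where(entropy > threshold)[0]

probs = np.array([[0.9, 0.05, 0.03, 0.02],     # confident -> keep
                  [0.3, 0.3, 0.2, 0.2]])       # ambiguous -> flag for review
print("epochs to review:", flag_uncertain(probs))
```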

6.
We propose a novel framework to generate a global texture atlas for a deforming geometry. Our approach differs from prior art in two aspects. First, instead of generating a texture map for each timestamp to color a dynamic scene, our framework reconstructs a global texture atlas that can be consistently mapped to a deforming object. Second, our approach is based on a single RGB-D camera, without the need for a multi-camera setup surrounding the scene. In our framework, the input is a 3D template model with an RGB-D image sequence, and geometric warping fields are found using a state-of-the-art non-rigid registration method [GXW*15] to align the template mesh to noisy and incomplete input depth images. With these warping fields, our multi-scale approach for texture coordinate optimization generates a sharp and clear texture atlas that is consistent with multiple color observations over time. Our approach is accelerated by graphics hardware and provides a convenient configuration for capturing a dynamic geometry along with a clean texture atlas. We demonstrate our approach in practical scenarios, particularly human performance capture. We also show that our approach is resilient to misalignment caused by imperfect estimation of warping fields and inaccurate camera parameters.

7.
Monte Carlo path tracing techniques can generate stunning visualizations of medical volumetric data. In a clinical context, such renderings have proven valuable for communication, education, and diagnosis. Because a large number of computationally expensive lighting samples is required to converge to a smooth result, progressive rendering is the only option for interactive settings: low-sampled, noisy images are shown while the user explores the data, and as soon as the camera is at rest the view is progressively refined. During interaction, the visual quality is low, which strongly impedes the user's experience. Even worse, when a data set is explored in virtual reality, the camera is never at rest, leading to constantly low image quality and strong flickering. In this work we present an approach that brings volumetric Monte Carlo path tracing to the interactive domain by reusing samples over time. To this end, we transfer the idea of temporal antialiasing from surface rendering to volume rendering. We show how to reproject volumetric ray samples even though they cannot be pinned to a particular 3D position, present an improved weighting scheme that makes longer history trails possible, and define an error accumulation method that down-weights less appropriate older samples. Furthermore, we exploit reprojection information to adaptively determine the number of newly generated path tracing samples for each individual pixel. Our approach is designed for static medical data with both volumetric and surface-like structures. It achieves good-quality volumetric Monte Carlo renderings with little noise, and is also usable in a VR context.
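The core accumulation step can be sketched as an exponentially weighted blend between a reprojected history buffer and the new noisy frame, with the history down-weighted where reprojection is unreliable; the error metric and weights below are illustrative assumptions rather than the paper's exact scheme.

```python
# Minimal sketch of temporal sample reuse: blend each new, noisy path-traced
# frame into a reprojected history buffer, trusting the history less where
# reprojection error is large (e.g., disocclusions).
import numpy as np

def temporal_accumulate(history, new_frame, reproj_error, base_alpha=0.1):
    """history/new_frame: (H, W, 3); reproj_error: (H, W) in [0, 1]."""
    # Larger reprojection error -> trust the new sample more (alpha -> 1).
    alpha = base_alpha + (1.0 - base_alpha) * reproj_error[..., None]
    return (1.0 - alpha) * history + alpha * new_frame

H, W = 4, 4
history = np.zeros((H, W, 3))
noisy = np.random.rand(H, W, 3)
error = np.zeros((H, W)); error[0, 0] = 1.0   # e.g., a disoccluded pixel
accum = temporal_accumulate(history, noisy, error)
```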

8.
Power saving is a prevailing concern in desktop computers and, especially, in battery-powered devices such as mobile phones. This is generating a growing demand for power-aware graphics applications that can extend battery life while preserving good quality. In this paper, we address this issue by presenting a real-time power-efficient rendering framework able to dynamically select the rendering configuration with the best quality within a given power budget. Unlike the current state of the art, our method requires neither precomputation over the whole camera-view space nor Pareto curves to explore the vast power-error space; as such, it can also handle dynamic scenes. Our algorithm is based on two key components: a novel power prediction model and a runtime quality error estimation mechanism. These components allow us to search for the optimal rendering configuration at runtime while remaining transparent to the user. We demonstrate the performance of our framework on two different platforms: a desktop computer and a mobile device. In both cases, we produce results close to the maximum quality while achieving significant power savings.
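The runtime selection loop can be sketched as follows: evaluate a power model and a quality-error model for each candidate configuration, then pick the lowest-error configuration that fits the budget. The prediction functions below are stand-in stubs, not the paper's learned models.

```python
# Sketch of runtime configuration selection under a power budget: among
# candidate configs, pick the one with the lowest predicted quality error
# whose predicted power fits the budget. Both predictors are toy stubs.
from dataclasses import dataclass

@dataclass
class RenderConfig:
    resolution_scale: float
    shadow_quality: int

def predict_power(cfg: RenderConfig) -> float:        # stand-in power model
    return 2.0 * cfg.resolution_scale + 0.5 * cfg.shadow_quality

def predict_error(cfg: RenderConfig) -> float:        # stand-in quality error
    return (1.0 - cfg.resolution_scale) + 0.2 * (3 - cfg.shadow_quality)

def select_config(candidates, power_budget):
    feasible = [c for c in candidates if predict_power(c) <= power_budget]
    return min(feasible, key=predict_error) if feasible else None

candidates = [RenderConfig(s, q) for s in (0.5, 0.75, 1.0) for q in (1, 2, 3)]
print(select_config(candidates, power_budget=2.6))
```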

9.
360° VR videos provide users with an immersive visual experience. To encode 360° VR videos, spherical pixels must be mapped onto a two-dimensional domain to take advantage of existing video encoding and storage standards. In the VR industry, standard cubemap projection is the most widely used projection method for encoding 360° VR videos. However, it exhibits pixel density variation across regions due to projection distortion. We present a generalized algorithm that improves the efficiency of cubemap projection using polynomial approximation; standard cubemap projection can be regarded as the special case of a first-order polynomial. Our experiments show that the generalized cubemap projection can significantly reduce projection distortion using higher-order polynomials. As a result, pixel distribution is well balanced in the resulting 360° VR videos. We use PSNR, S-PSNR and CPP-PSNR to evaluate visual quality, and the experimental results demonstrate promising performance improvements over standard cubemap projection and Google's equi-angular cubemap.
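A sketch of the face-coordinate remapping idea: a stored coordinate s in [-1, 1] is passed through an odd polynomial before projection, where the first-order case (u = s) recovers the standard cubemap. The coefficients below are the low-order Taylor terms of the equi-angular mapping u = tan(πs/4), used purely for illustration; the paper fits its own polynomials.

```python
# Generalized cubemap face mapping sketch: remap the stored face coordinate s
# by an odd polynomial before projecting onto the cube face. u = s is the
# standard cubemap; the default coefficients approximate the equi-angular
# mapping u = tan(pi*s/4) via its Taylor expansion (illustrative only).
import numpy as np

def face_remap(s: np.ndarray, coeffs=(0.7854, 0.1615, 0.0397)) -> np.ndarray:
    """Polynomial remap u = c0*s + c1*s**3 + c2*s**5 (odd terms only)."""
    return coeffs[0] * s + coeffs[1] * s**3 + coeffs[2] * s**5

s = np.linspace(-1.0, 1.0, 5)
print("standard cubemap:", s)                       # identity mapping
print("polynomial remap:", face_remap(s))
print("equi-angular ref:", np.tan(np.pi * s / 4.0)) # target it approximates
```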

10.
Reproducing the appearance of real-world materials using current printing technology is problematic. The limited set of available inks defines the printer's gamut, creating distortions in the printed appearance that are hard to control. Gamut mapping refers to the process of bringing an out-of-gamut material appearance into the printer's gamut while minimizing such distortions as much as possible. We present a novel two-step gamut mapping algorithm that allows users to specify which perceptual attribute of the original material they want to preserve (such as brightness or roughness). In the first step, we work in the low-dimensional intuitive appearance space recently proposed by Serrano et al. [SGM*16] and adjust achromatic reflectance via an objective function that strives to preserve certain attributes. From this intermediate representation, we then perform an image-based optimization that includes color information to bring the BRDF into gamut. We show, both objectively and through a user study, that our method yields superior results compared to the state of the art, with the additional advantage that the user can specify which visual attributes need to be preserved. Moreover, we show how this approach can also be used for attribute-preserving material editing.
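As a loose analogy of attribute-preserving gamut mapping (in a simple Lab color setting rather than the paper's BRDF appearance space), the sketch below reduces chroma while holding lightness fixed until a hypothetical in-gamut predicate passes.

```python
# Attribute-preserving gamut mapping, in spirit: shrink chroma while holding
# lightness fixed until the color passes an in-gamut test. in_gamut() is a
# hypothetical printer-gamut predicate; the paper works on BRDF attributes.
import numpy as np

def in_gamut(lab: np.ndarray) -> bool:          # hypothetical gamut predicate
    return np.linalg.norm(lab[1:]) <= 60.0      # toy cylindrical gamut

def map_to_gamut(lab, steps=20):
    L, a, b = lab
    lo, hi = 0.0, 1.0                           # chroma scale factor bounds
    for _ in range(steps):                      # binary search on chroma
        mid = 0.5 * (lo + hi)
        if in_gamut(np.array([L, a * mid, b * mid])):
            lo = mid
        else:
            hi = mid
    return np.array([L, a * lo, b * lo])        # lightness is preserved

print(map_to_gamut(np.array([70.0, 80.0, 20.0])))
```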

11.
Cognitive biases are systematic errors in judgment due to an over-reliance on rule-of-thumb heuristics. Recent research suggests that cognitive biases, like numerical anchoring, transfer to visual analytics in the form of visual anchoring. However, it is unclear how visualization users can be visually anchored and how the anchors affect decision-making. To investigate, we performed a between-subjects laboratory experiment with 94 participants to analyze the effects of visual anchors and strategy cues using a visual analytics system. The decision-making task was to identify misinformation from Twitter news accounts. Participants were randomly assigned to conditions that modified the scenario video (visual anchor) and/or the strategy cues provided. Our findings suggest that such interventions affect user activity, speed, confidence, and, under certain circumstances, accuracy. We discuss the implications of our results for the forking paths problem and raise concerns about how visualization researchers train users, so as to avoid unintentionally anchoring them and affecting the end result.

12.
Indirect illumination involving visually rich participating media such as turbulent smoke and explosions contributes significantly to the appearance of other objects in a rendered scene. However, previous real-time techniques have focused only on the appearance of media directly visible from the viewer. Specifically, appearances seen indirectly over reflective surfaces have not attracted much attention. In this paper, we present a real-time rendering technique for such indirect views involving participating media. To achieve real-time performance when computing indirect views, we leverage layered polygonal area lights (LPALs), obtained by slicing the media into multiple flat layers. With this representation, the radiance entering each surface point from each slice of the volume is evaluated analytically, enabling instant calculation. The analytic solution can be derived for standard bidirectional reflectance distribution functions (BRDFs) based on microfacet theory. Accordingly, our method is sufficiently robust to work on surfaces with arbitrary shapes and roughness values. In addition, we propose a quadrature method for more accurate rendering of scenes with dense volumes, and a transformation of the volume domain that simplifies the calculation and implementation of the proposed method. By taking advantage of these computation techniques, the proposed method achieves real-time rendering of indirect illumination from emissive volumes.

13.
In this paper, we consider the problem of information transfer across shapes and propose an extension to the widely used functional map representation. Our main observation is that in addition to the vector space structure of the functional spaces, which has been heavily exploited in the functional map framework, the functional algebra (i.e., the ability to take pointwise products of functions) can significantly extend the power of this framework. Equipped with this observation, we show how to improve one of the key applications of functional maps, namely transferring real-valued functions without conversion to point-to-point correspondences. We demonstrate through extensive experiments that by decomposing a given function into a linear combination consisting not only of basis functions but also of their pointwise products, both the representation power and the quality of the function transfer can be improved significantly. Our modification, while computationally simple, allows us to achieve higher transfer accuracy while keeping the size of the basis and the functional map fixed. We also analyze the computational complexity of optimally representing functions through linear combinations of products in a given basis and prove NP-completeness in some general cases. Finally, we argue that the use of function products can have a wide-reaching effect in extending the power of functional maps in a variety of applications, in particular by enabling the transfer of high-frequency functions without changing the representation size or complexity.
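The benefit of the extended basis is easy to demonstrate numerically: fit a function with a product component by least squares in the original basis versus the basis augmented with pointwise products. The basis and target below are synthetic.

```python
# Demonstrate the abstract's core idea: augmenting a truncated basis with
# pointwise products of its elements improves least-squares representation
# of functions with product structure. Basis and target are synthetic.
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)
n, k = 500, 6
Phi = rng.standard_normal((n, k))           # stand-in basis (n points, k functions)
target = Phi[:, 0] * Phi[:, 1] + 0.3 * Phi[:, 2]   # has a product component

# Extended basis: original functions plus all pairwise pointwise products.
products = [Phi[:, i] * Phi[:, j]
            for i, j in combinations_with_replacement(range(k), 2)]
Phi_ext = np.column_stack([Phi] + products)

for name, B in (("basis only", Phi), ("basis + products", Phi_ext)):
    coeff, *_ = np.linalg.lstsq(B, target, rcond=None)
    err = np.linalg.norm(B @ coeff - target) / np.linalg.norm(target)
    print(f"{name}: relative error {err:.3f}")
```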

14.
Several visual representations have been developed over the years to visualize molecular structures and to enable a better understanding of the underlying chemical processes. Today, the most frequently used atom-based representations are the Space-filling, Solvent Excluded Surface, Balls-and-Sticks, and Licorice models. While each of these representations has its individual benefits, when applied to large-scale models, spatial arrangements can be difficult to interpret with current visualization techniques. It has been shown in the past that global illumination techniques improve the perception of molecular visualizations; unfortunately, existing approaches are tailored towards a single visual representation. We propose a general illumination model for molecular visualization that is valid across different representations. With our illumination model, it becomes possible, for the first time, to achieve consistent illumination among all atom-based molecular representations. The proposed model can furthermore be evaluated in real time, as it employs an analytical solution to simulate diffuse light interactions between objects. To derive such a solution for these rather complicated and diverse visual representations, we propose the use of regression analysis together with adapted parameter sampling strategies, as well as shape-parametrization-guided sampling, applied to the geometric building blocks of the targeted visual representations. We discuss the proposed sampling strategies and the derived illumination model, and demonstrate its capabilities when visualizing several dynamic molecules.
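The derivation strategy (sample a lighting response over shape parameters, then fit a cheap analytic model by regression) can be illustrated with a one-parameter toy; the sampled response below is synthetic, not the paper's measurements.

```python
# Sketch of the regression-based derivation: sample a lighting quantity
# (here a synthetic ambient-occlusion-like response) over a shape parameter,
# then fit a low-order polynomial that can be evaluated analytically per
# fragment at render time. The response is made up for illustration.
import numpy as np

radius = np.linspace(0.1, 2.0, 50)                 # shape parameter samples
occlusion = 1.0 / (1.0 + radius**2)                # synthetic sampled response
coeffs = np.polyfit(radius, occlusion, deg=4)      # regression fit

def occlusion_model(r: float) -> float:
    """Analytic stand-in usable in a shader-like setting."""
    return float(np.polyval(coeffs, r))

print(occlusion_model(1.0), "vs sampled", 1.0 / 2.0)
```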

15.
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGB-D reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. For each grid cell, the VSF stores the score of the corresponding view, which measures how much it reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
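Conceptually, NBV selection over the VSF reduces to an argmax over a discrete grid of candidate views scored by expected entropy reduction; the score combination and the synthetic gains below are illustrative assumptions.

```python
# Next-best-view selection over a discrete viewing score field (VSF): each
# candidate view (2D location x azimuth bin) scores the expected reduction
# in geometric and semantic uncertainty; the robot targets the argmax.
# The linear score combination is an illustrative assumption.
import numpy as np

def next_best_view(geo_entropy_gain, sem_entropy_gain, w_sem=0.5):
    """Inputs: (X, Y, A) grids of expected entropy reduction per view.
    Returns the grid index of the best view."""
    score = (1.0 - w_sem) * geo_entropy_gain + w_sem * sem_entropy_gain
    return np.unravel_index(np.argmax(score), score.shape)

geo = np.random.rand(10, 10, 8)     # synthetic geometric information gain
sem = np.random.rand(10, 10, 8)     # synthetic semantic information gain
print("next best view (x, y, azimuth bin):", next_best_view(geo, sem))
```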

16.
We propose a novel learning-based solution for motion planning of physically based humanoid climbing that allows for fast and robust planning of complex climbing strategies and movements, including extreme movements such as jumping. Similar to recent previous work, we combine a high-level graph-based path planner with low-level sampling-based optimization of climbing moves. Our contribution is to show that neural network models of move success probability, effortfulness, and control policy can make both the high-level and low-level components more efficient and robust. The models can be trained through random simulation practice, without any data, and they eliminate the need for laboriously hand-tuned heuristics for graph search. As a result, we are able to efficiently synthesize climbing sequences involving dynamic leaps and one-hand swings, i.e., there are no limits on movement complexity or on the number of limbs allowed to move simultaneously. Our supplemental video also provides comparisons between our AI climber and a real human climber.

17.
In this paper, we introduce the CPatch, a curved primitive that can be used to construct arbitrary vector graphics. A CPatch is a generalization of a 2D polygon: any number of curves of up to cubic degree bound a primitive. We show that a CPatch can be rasterized efficiently in a hierarchical manner on the GPU, locally discarding irrelevant portions of the curves. Our rasterizer is fast and scalable, works on all patches in parallel, and does not require any approximations. We present a parallel implementation of our rasterizer, which naturally supports all kinds of color spaces, blending and supersampling. Additionally, we show how vector graphics input can be converted efficiently to a CPatch representation, solving challenges such as patch self-intersections and false inside-outside classification. Results indicate that our approach is faster than the state of the art, more flexible, and could potentially be implemented in hardware.

18.
A practical way to generate a high dynamic range (HDR) video with off-the-shelf cameras is to capture a sequence with alternating exposures and reconstruct the missing content at each frame. Unfortunately, existing approaches are typically slow and unable to handle challenging cases. In this paper, we propose a learning-based approach to address this difficult problem. We use two sequential convolutional neural networks (CNNs) to model the entire HDR video reconstruction process. In the first step, we align the neighboring frames to the current frame by estimating the flows between them using a network specifically designed for this application. We then combine the aligned and current images using another CNN to produce the final HDR frame. We perform end-to-end training by minimizing the error between the reconstructed and ground truth HDR images on a set of training scenes. We produce our training data synthetically from existing HDR video datasets and simulate the imperfections of standard digital cameras using a simple approach. Experimental results demonstrate that our approach produces high-quality HDR videos and is an order of magnitude faster than state-of-the-art techniques for sequences with two and three alternating exposures.
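A toy skeleton of the two-stage pipeline (a flow network for alignment followed by a merge CNN) might look as follows in PyTorch; the architectures and the warping helper are minimal stand-ins, not the paper's networks.

```python
# Toy skeleton of a two-stage HDR video pipeline: a flow network aligns a
# neighboring exposure to the current frame, then a merge CNN fuses the
# aligned stack into an HDR frame. Architectures are minimal stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowNet(nn.Module):                       # toy alignment network
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))     # 2-channel flow field

    def forward(self, ref, neighbor):
        return self.conv(torch.cat([ref, neighbor], dim=1))

def warp(img, flow):
    """Backward-warp img by flow (channel 0 = x, channel 1 = y) via grid_sample."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    offs = flow.permute(0, 2, 3, 1) / torch.tensor([w / 2.0, h / 2.0])
    return F.grid_sample(img, grid + offs, align_corners=True)

class MergeNet(nn.Module):                      # toy fusion network
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(9, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, current, aligned_prev, aligned_next):
        return self.conv(torch.cat([current, aligned_prev, aligned_next], dim=1))
```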

19.
20.
The computer graphics and vision communities have dedicated long-standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years, rapid progress has been made, leading to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB-D camera. The range of applications is vast and steadily growing as these technologies improve in speed, accuracy, and ease of use. Motivated by this rapid progress, this state-of-the-art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms. We provide an in-depth overview of the underlying concepts of real-world image formation and discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors used to better constrain the under-constrained monocular reconstruction problem, and discuss the optimization techniques employed to recover dense, photo-geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, and image and video editing.
