Similar Articles
20 similar articles found.
1.
Tracking pedestrians is a vital component of many computer vision applications, including surveillance, scene understanding, and behavior analysis. Videos of crowded scenes present significant challenges to tracking due to the large number of pedestrians and the frequent partial occlusions that they produce. The movement of each pedestrian, however, contributes to the overall crowd motion (i.e., the collective motion of the scene's constituents over the entire video), which exhibits an underlying spatially and temporally varying structured pattern. In this paper, we present a novel Bayesian framework for tracking pedestrians in videos of crowded scenes using a space-time model of the crowd motion. We represent the crowd motion with a collection of hidden Markov models trained on local spatio-temporal motion patterns, i.e., the motion patterns exhibited by pedestrians as they move through local space-time regions of the video. Using this representation, we predict the next local spatio-temporal motion pattern a tracked pedestrian will exhibit based on the observed frames of the video, and use this prediction as a prior for tracking the individual's movement in videos of extremely crowded scenes. We show that leveraging the crowd motion enables tracking in complex scenes that pose unique difficulties for other approaches.
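A minimal sketch of such a motion-pattern prior, assuming the hmmlearn package and that per-region motion features (e.g. pooled optical-flow statistics) have already been extracted; it shows the generic HMM machinery, not the authors' exact models:

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency

def train_local_hmm(pattern_seq, n_states=4):
    """Fit one HMM to the motion-pattern features observed in a single
    local space-time region; pattern_seq has shape (n_frames, n_features)."""
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
    model.fit(pattern_seq)
    return model

def predict_next_pattern(model, observed):
    """Predict the expected next motion pattern by pushing the state
    posterior of the observed frames one step through the transitions."""
    posteriors = model.predict_proba(observed)      # (n_frames, n_states)
    next_state = posteriors[-1] @ model.transmat_   # one-step prediction
    return next_state @ model.means_                # expected feature vector
```

The predicted feature vector could then serve, for instance, as the mean of a Gaussian prior over the tracked pedestrian's next displacement.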

2.
Consistent segmentation is central to many applications based on dynamic geometric data. Directly segmenting a raw 3D point cloud sequence is a challenging task due to the low data quality and the large inter-frame variation across the sequence. We propose a local-to-global approach to co-segment point cloud sequences of articulated objects into near-rigid moving parts. Our method starts from a per-frame point clustering derived from a robust voting-based trajectory analysis. The local segments are then progressively propagated to neighboring frames with a cut propagation operation, and further merged across all frames using a novel space-time segment grouping technique, leading to a globally consistent and compact segmentation of the entire articulated point cloud sequence. Such progressive propagation and merging, in both the space and time dimensions, makes our co-segmentation algorithm especially robust in handling the noise, occlusions, and pose/view variations that are usually associated with raw scan data.
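As a rough illustration of the per-frame clustering stage, the following hedged sketch groups point trajectories by the similarity of their short-window motion using DBSCAN; the paper's actual voting-based trajectory analysis is more robust than this stand-in:

```python
import numpy as np
from sklearn.cluster import DBSCAN  # assumed dependency

def cluster_trajectories(traj, eps=0.05):
    """traj: (n_points, n_frames, 3) point positions tracked over a window.
    Points on the same near-rigid part move with similar displacement
    profiles, so density clustering on those profiles separates parts."""
    disp = np.diff(traj, axis=1)          # per-frame displacement vectors
    feats = disp.reshape(len(traj), -1)   # one flat motion descriptor per point
    return DBSCAN(eps=eps, min_samples=10).fit_predict(feats)  # -1 = noise
```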

3.
Rendering animations of scenes with deformable objects, camera motion, and complex illumination, including indirect lighting and arbitrary shading, is a long-standing challenge. Prior work has shown that complex lighting can be accurately approximated by a large collection of point lights. In this formulation, rendering an animation sequence becomes the problem of efficiently shading many surface samples from many lights across several frames. This paper presents a tensor formulation of the animated many-light problem, where each element of the tensor expresses the contribution of one light to one pixel in one frame. We sparsely sample rows and columns of the tensor, and introduce a clustering algorithm to select a small number of representative lights that efficiently approximate the animation. Our algorithm achieves efficiency by reusing representatives across frames while minimizing temporal flicker. We demonstrate our algorithm on a variety of scenes that include deformable objects, complex illumination, and arbitrary shading, and show that a surprisingly small number of representative lights is sufficient for high-quality rendering. We believe our algorithm will find practical use in applications that require fast previews of complex animation.
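A hedged sketch of the row/column sampling idea: the tensor is only evaluated entry-wise through a hypothetical shade(row, light) callback, the reduced columns are clustered with ordinary k-means (not the paper's clustering), and one scaled representative is kept per cluster:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed dependency

def pick_representative_lights(shade, n_rows_total, n_lights,
                               n_rows=300, k=30, rng=np.random.default_rng()):
    rows = rng.choice(n_rows_total, n_rows, replace=False)  # (pixel, frame) rows
    # Reduced column per light: its contribution to the sampled rows only.
    R = np.array([[shade(r, l) for l in range(n_lights)] for r in rows])
    norms = np.linalg.norm(R, axis=0) + 1e-12
    labels = KMeans(n_clusters=k, n_init=4).fit_predict((R / norms).T)
    reps, scales = [], []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        rep = members[np.argmax(norms[members])]  # strongest member stands in
        reps.append(rep)
        scales.append(norms[members].sum() / norms[rep])  # conserve energy
    return reps, scales
```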

4.
We present an efficient and scalable system that enables programmable motion effects on GPUs. Our system is based on the framework proposed by Schmid et al. [SSBG10] that extends the concept of a surface shader to that of a programmable motion effect. While capable of expressing a variety of motion depiction styles, the execution of motion effect programs requires global knowledge about all portions of an object's surface that pass in front of a pixel during an arbitrarily long period of time, resulting in extremely high memory usage and significantly restricting the degree of parallelism of typical GPU rendering algorithms, which parallelize computations over the pixels in each frame of an animation. To address this problem, we design our system to process multiple frames of a pixel in parallel. This new parallelization approach enables better utilization of GPU memory and also makes it possible to design the efficient out-of-core algorithm required for rendering real-world animations. We also develop an analytical visibility algorithm to resolve depth conflicts of objects, reducing the required temporal resampling rate and further exposing parallelism. Experiments show that we are able to handle very large scenes and improve runtime performance by up to an order of magnitude.

5.
We propose a new technique for in-core and out-of-core GPU ray tracing using a generalization of hierarchical occlusion culling in the style of the CHC++ method. Our method exploits the rasterization pipeline and hardware occlusion queries to create coherent batches of work for localized shader-based ray tracing kernels. By combining hierarchies in both ray space and object space, the method is able to share intermediate traversal results among multiple rays. We exploit temporal coherence among similar ray sets between frames and also within the given frame. Suitable management of the current visibility state makes it possible to benefit from occlusion culling even for less coherent ray types such as diffuse reflections. Since large scenes are still a challenge for modern GPU ray tracers, our method is most useful for scenes of medium to high complexity, especially since it inherently supports ray tracing highly complex scenes that do not fit in GPU memory. For in-core scenes our method is comparable to CUDA ray tracing and performs up to 5.94× better than pure shader-based ray tracing.

6.
Ambient occlusion is a cheap but effective approximation of global illumination. Recently, screen-space ambient occlusion (SSAO) methods, which sample the frame buffer as a discretization of the scene geometry, have become very popular for real-time rendering. We present temporal SSAO (TSSAO), a new algorithm that exploits temporal coherence to produce high-quality ambient occlusion in real time. Compared to conventional SSAO, our method reduces both the noise and the blurring artefacts caused by strong spatial filtering, faithfully representing fine-grained geometric structures. Our algorithm caches and reuses previously computed SSAO samples, and adaptively applies more samples and spatial filtering only in regions that do not yet have enough information available from previous frames. The method works well for both static and dynamic scenes.
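The temporal reuse can be sketched in a few lines of numpy, assuming the previous frame's AO values, per-pixel sample counts, and depths have already been reprojected into the current frame (the reprojection itself is omitted):

```python
import numpy as np

def temporal_ssao(ao_new, ao_prev, n_prev, depth, depth_prev,
                  max_n=32, eps=0.02):
    """Blend the current noisy AO estimate into a per-pixel running
    average; disoccluded pixels (depth mismatch) restart from scratch."""
    valid = np.abs(depth - depth_prev) < eps * depth       # reprojection test
    n = np.where(valid, np.minimum(n_prev + 1, max_n), 1)  # sample counts
    ao = np.where(valid, ao_prev + (ao_new - ao_prev) / n, ao_new)
    return ao, n
```

Pixels with a low count n are exactly the regions where such an algorithm would spend extra samples and spatial filtering.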

7.
Automatic camera control for scenes depicting human motion is an important problem in motion-capture-based animation, computer games, and other animation-based fields. This challenging control problem combines geometric constraints, visibility requirements, and aesthetic elements, so existing optimization-based approaches to human action overview are often too demanding for online computation. In this paper, we introduce an effective automatic camera control that is extremely efficient and allows online performance. Rather than optimizing a complex quality measure, at each time step it selects one active camera from a multitude of cameras that render the dynamic scene. The selection is based on the correlation between each view stream and the human motion in the scene. Two factors allow rapid selection among tens of candidate views in real time, even for complex multi-character scenes: the efficient rendering of the multitude of view streams, and optimized calculation of the correlations using a modified CCA. In addition to its simplicity and speed, the method exhibits good agreement with both cinematic idioms and previous work on camera control for human motion. Our evaluations show that the method is able to cope with the challenges posed by severe occlusions, multiple characters, and complex scenes.
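A hedged sketch of correlation-based view selection: score each candidate camera by the first canonical correlation between its rendered view-stream features and the scene's motion features. scikit-learn's plain CCA stands in here for the paper's modified CCA:

```python
import numpy as np
from sklearn.cross_decomposition import CCA  # assumed dependency

def select_camera(view_streams, motion_feats):
    """view_streams: list of (n_frames, d_view) arrays, one per camera;
    motion_feats: (n_frames, d_motion) array of character motion."""
    best, best_score = 0, -1.0
    cca = CCA(n_components=1)
    for i, X in enumerate(view_streams):
        Xc, Yc = cca.fit(X, motion_feats).transform(X, motion_feats)
        score = abs(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])
        if score > best_score:
            best, best_score = i, score
    return best  # index of the camera most correlated with the motion
```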

8.
We introduce a screen-space statistical filtering method for real-time rendering with global illumination. It is inspired by the statistical filtering proposed by Meyer et al., which reduces the noise in global illumination over a period of time by estimating the principal components of all rendered frames. Our work extends their method to achieve nearly real-time performance on modern GPUs. More specifically, our method employs candid covariance-free incremental PCA to overcome several limitations of the original algorithm by Meyer et al., such as the high computational cost and memory usage that hinder its implementation on GPUs. By combining reprojection and per-pixel weighting techniques, our method also handles view changes and object movement in dynamic scenes.
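For reference, a compact numpy version of the generic candid covariance-free incremental PCA update (after Weng et al.); the authors' GPU implementation and per-pixel bookkeeping are not reproduced here:

```python
import numpy as np

def ccipca_update(V, n, x, amnesia=1.0):
    """One CCIPCA step: update the k principal-direction estimates V (k, d)
    with the n-th sample x (d,), n counted from 1. No covariance matrix is
    ever formed, which is what makes the method cheap enough for GPUs."""
    u = x.astype(float).copy()
    for i in range(min(len(V), n)):
        if i == n - 1:
            V[i] = u                      # initialize from the current residual
        else:
            w_old = (n - 1 - amnesia) / n
            w_new = (1 + amnesia) / n
            vnorm = np.linalg.norm(V[i]) + 1e-12
            V[i] = w_old * V[i] + w_new * (u @ V[i]) / vnorm * u
        vhat = V[i] / (np.linalg.norm(V[i]) + 1e-12)
        u -= (u @ vhat) * vhat            # deflate before the next component
    return V
```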

9.
Detection and Tracking of Occluded People
We consider the problem of detection and tracking of multiple people in crowded street scenes. State-of-the-art methods perform well in scenes with relatively few people, but are severely challenged by scenes with many subjects that partially occlude each other. This limitation is due to the fact that current people detectors fail when persons are strongly occluded. We observe that typical occlusions are due to overlaps between people and propose a people detector tailored to various occlusion levels. Instead of treating partial occlusions as distractions, we leverage the fact that person/person occlusions result in very characteristic appearance patterns that can help to improve detection results. We demonstrate the performance of our occlusion-aware person detector on a new dataset of people with controlled but severe levels of occlusion and on two challenging publicly available benchmarks, outperforming single-person detectors in each case.

10.
Constructing a Multivalued Representation for View Synthesis
A fundamental problem in computer vision and graphics is that of arbitrary view synthesis for static 3-D scenes, whereby a user-specified viewpoint of the given scene may be created directly from a representation. We propose a novel compact representation for this purpose called the multivalued representation (MVR). Starting with an image sequence captured by a moving camera undergoing either unknown planar translation or orbital motion, an MVR is derived for each preselected reference frame and may then be used to synthesize arbitrary views of the scene. The representation itself comprises multiple depth and intensity levels in which the k-th level consists of points occluded by exactly k surfaces. To build an MVR with respect to a particular reference frame, dense depth maps are first computed for all the neighboring frames of the reference frame. The depth maps are then combined into a single map, where points are organized by occlusions rather than by coherent affine motions. This grouping facilitates an automatic process for determining the number of levels and helps to reduce the artifacts caused by occlusions in the scene. An iterative multiframe algorithm is presented for dense depth estimation that both handles low-contrast regions and produces piecewise smooth depth maps. Reconstructed views as well as arbitrary flyarounds of real scenes are presented to demonstrate the effectiveness of the approach.
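A small sketch of the level construction under a simplified reading of the abstract: once the depth maps are merged into one cloud, each point's level is the number of distinct surfaces in front of it along its pixel ray. The min_gap threshold that separates surfaces is a placeholder:

```python
import numpy as np

def occlusion_levels(pix, depth, min_gap=0.1):
    """pix: (n,) linear pixel index of each 3D point's projection into the
    reference frame; depth: (n,) depths. Points at the same pixel are
    ranked front-to-back; a depth gap marks the start of a new surface."""
    levels = np.zeros(len(pix), dtype=int)
    for p in np.unique(pix):
        idx = np.flatnonzero(pix == p)
        idx = idx[np.argsort(depth[idx])]     # front-to-back at this pixel
        lvl = 0
        for a, b in zip(idx[:-1], idx[1:]):
            if depth[b] - depth[a] > min_gap:  # crossed to a farther surface
                lvl += 1
            levels[b] = lvl
    return levels
```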

11.
Segmenting a moving foreground (fg) from its background (bg) is a fundamental step in many machine vision and computer graphics applications. Nevertheless, hardly any attempts have been made to tackle this problem in dynamic 3D scanned scenes, which are typically challenging due to noise and large missing parts. Here, we present a novel approach for motion segmentation in dynamic point-cloud scenes designed to cater to the unique properties of such data. Our key idea is to augment fg/bg classification with an active learning framework that refines the segmentation in an adaptive manner. Our method initially classifies the scene points as either fg or bg in an unsupervised manner, by training discriminative RBF-SVM classifiers on automatically labeled, high-certainty fg/bg points. Next, we adaptively detect unreliable classification regions (i.e., where fg/bg separation is uncertain), locally add more training examples to better capture the motion in these areas, and re-train the classifiers to fine-tune the segmentation. This not only improves segmentation accuracy, but also allows our method to operate in a coarse-to-fine manner, thereby efficiently processing high-density point clouds. Additionally, we present a unique interactive paradigm for enhancing this learning process using a manual editing tool: the user explicitly edits the RBF-SVM decision borders in unreliable regions in order to refine and correct the classification. We provide extensive qualitative and quantitative experiments on both real (scanned) and synthetic dynamic scenes.
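A hedged sketch of the adaptive loop using scikit-learn, with the seeding and labeling policies reduced to placeholders; get_more_labels is hypothetical and stands in for both the automatic labeling and the interactive editing tool:

```python
import numpy as np
from sklearn.svm import SVC  # assumed dependency

def refine_segmentation(feats, seed_idx, seed_labels, get_more_labels,
                        rounds=3, batch=200):
    """feats: (n_points, d) per-point motion features; seed_idx/seed_labels:
    initial high-certainty fg/bg examples; get_more_labels(idx) -> labels."""
    idx, y = list(seed_idx), list(seed_labels)
    clf = None
    for _ in range(rounds):
        clf = SVC(kernel="rbf", probability=True).fit(feats[idx], y)
        proba = clf.predict_proba(feats)[:, 1]
        # Points closest to the decision boundary are the unreliable ones.
        uncertain = np.argsort(np.abs(proba - 0.5))[:batch]
        idx += list(uncertain)
        y += list(get_more_labels(uncertain))
    return clf.predict(feats)  # final fg/bg labels for all points
```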

12.
The generation of inbetween frames that interpolate a given set of key frames is a major component in the production of a 2D feature animation. Our objective is to considerably reduce the cost of the inbetweening phase by offering an intuitive and effective interactive environment that automates inbetweening when possible while allowing the artist to guide, complement, or override the results. Tight inbetweens, which interpolate similar key frames, are particularly time-consuming and tedious to draw. Therefore, we focus on automating these high-precision and expensive portions of the process. We have designed a set of user-guided semi-automatic techniques that fit well with current practice and minimize the number of required artist gestures. We present a novel technique for stroke interpolation from only two keys that combines a stroke motion constructed from logarithmic spiral vertex trajectories with a stroke deformation based on curvature averaging and twisting warps. We discuss our system in the context of a feature animation production environment and evaluate our approach with real production data.
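The logarithmic-spiral motion component can be sketched with complex arithmetic: a 2D similarity z -> a*z + b has a fixed point, and raising its rotation/scale to a fractional power moves every vertex along a logarithmic spiral about that point. This is a generic construction consistent with the abstract, not the paper's full technique (the curvature-averaging deformation is omitted):

```python
import numpy as np

def spiral_inbetween(P0, P1, t):
    """P0, P1: (n, 2) corresponding stroke vertices at the two keys;
    returns the interpolated vertices at parameter t in [0, 1]."""
    z0 = P0[:, 0] + 1j * P0[:, 1]
    z1 = P1[:, 0] + 1j * P1[:, 1]
    # Least-squares 2D similarity z1 ~ a*z0 + b (rotation+scale+translation).
    A = np.stack([z0, np.ones_like(z0)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, z1, rcond=None)
    if abs(1 - a) < 1e-9:              # (near-)pure translation: no spiral
        zt = z0 + t * b
    else:
        c = b / (1 - a)                # fixed point = spiral center
        zt = c + a**t * (z0 - c)       # fractional power of the similarity
    return np.stack([zt.real, zt.imag], axis=1)
```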

13.
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multi-view approach to solving this problem. In our approach we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all the cameras into a synergistic framework, and detection and tracking results are propagated back to each view. Unlike other multi-view approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior to assign higher fusion weight to views with less clutter. Detection and tracking are performed simultaneously by graph-cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results, with detailed qualitative and quantitative analysis, are demonstrated in challenging multi-view, crowded scenes.
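A minimal OpenCV sketch of occupancy fusion on the reference plane, with the clutter-based confidence weights reduced to a given weight vector w:

```python
import numpy as np
import cv2  # assumed dependency

def fuse_occupancy(fg_maps, homographies, size, w=None):
    """fg_maps: per-view float32 foreground likelihoods in [0, 1];
    homographies: 3x3 view -> reference-plane mappings; size: (W, H)."""
    w = w if w is not None else np.ones(len(fg_maps))
    log_occ = np.zeros(size[::-1], np.float32)
    for fg, H, wi in zip(fg_maps, homographies, w):
        warped = cv2.warpPerspective(fg, H, size)
        log_occ += wi * np.log(warped + 1e-6)   # weighted likelihood product
    return np.exp(log_occ / np.sum(w))          # fused occupancy likelihood
```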

14.
Tracking in a Dense Crowd Using Multiple Cameras
Tracking people in a dense crowd is a challenging problem for a single-camera tracker due to occlusions and extensive motion that make human segmentation difficult. In this paper we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people's heads are tracked. Head detection is still difficult, since each foreground region may consist of multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, which are regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It was successful in tracking up to 21 people walking in a small area (2.5 people per m²), in spite of severe and persistent occlusions.
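The height sweep can be sketched as follows, assuming the view-to-plane homographies at each candidate height are given; pixels where the aligned intensities agree across cameras (low variance, i.e. high correlation) indicate a head top at that height:

```python
import numpy as np
import cv2  # assumed dependency

def head_score_at_height(frames, homographies_at_h, size):
    """frames: grayscale views; homographies_at_h: per-view 3x3 mappings
    onto a virtual plane at height h; size: (W, H) of the plane raster."""
    aligned = [cv2.warpPerspective(f.astype(np.float32), H, size)
               for f, H in zip(frames, homographies_at_h)]
    stack = np.stack(aligned)                 # (n_views, H, W)
    # Consistent texture across views -> low variance -> high score.
    return -np.var(stack, axis=0)
```

Sweeping h over plausible person heights and taking per-pixel maxima of this score yields candidate head tops together with their estimated heights.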

15.
Temporally consistent motion segmentation from RGB-D videos is challenging because of the limitations of current RGB-D sensors. We formulate segmentation as a motion assignment problem, where a motion is a sequence of rigid transformations through all frames of the input. We capture the quality of each potential assignment by defining an appropriate energy function that accounts for occlusions and a sensor-specific noise model. To make energy minimization tractable, we work with a discrete set instead of the continuous, high-dimensional space of motions, where the discrete motion set provides an upper bound for the original energy. We repeatedly minimize our energy, and in each step extend and refine the motion set to further lower the bound. A quantitative comparison to the current state of the art demonstrates the benefits of our approach in difficult scenarios.
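A simplified numpy sketch of the data term of such an assignment: each tracked point is labeled with the candidate motion that best explains its trajectory under a Gaussian noise model; the paper's full energy additionally models occlusions and the sensor-specific noise:

```python
import numpy as np

def assign_motions(traj, motions, sigma=0.01):
    """traj: (n_points, n_frames, 3) tracked points; motions: list of
    candidate motions, each a length-n_frames sequence of (R, t) pairs
    mapping frame-0 coordinates into frame f."""
    costs = []
    for seq in motions:
        pred = np.stack([traj[:, 0] @ R.T + t for R, t in seq], axis=1)
        res = np.linalg.norm(pred - traj, axis=2)           # per-frame residual
        costs.append((res ** 2).sum(axis=1) / sigma ** 2)   # Gaussian data term
    return np.argmin(np.stack(costs), axis=0)               # best motion label
```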

16.
We present an image-based rendering system for viewpoint navigation through space and time of complex real-world, dynamic scenes. Our approach accepts unsynchronized, uncalibrated multi-video footage as input. Inexpensive, consumer-grade camcorders suffice to acquire arbitrary scenes, for example outdoors, without elaborate recording setup procedures, also allowing for hand-held recordings. Instead of scene depth estimation, layer segmentation, or 3D reconstruction, our approach is based on dense image correspondences, treating view interpolation uniformly in space and time: spatial viewpoint navigation, slow motion, and freeze-and-rotate effects can all be created in the same way. Acquisition simplification, integration of moving cameras, generalization to difficult scenes, and space-time symmetric interpolation amount to a widely applicable virtual video camera system.
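A bare-bones sketch of correspondence-based inbetweening with OpenCV's Farnebäck flow; this is simple backward warping of one endpoint only, whereas the paper's space-time-symmetric interpolation is considerably more involved:

```python
import numpy as np
import cv2  # assumed dependency

def interpolate_view(img0, img1, t):
    """Synthesize an approximate frame at time t in (0, 1) between two frames."""
    g0, g1 = (cv2.cvtColor(i, cv2.COLOR_BGR2GRAY) for i in (img0, img1))
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Backward-warp img0 toward time t (smooth-flow approximation).
    map0 = np.dstack([gx - t * flow[..., 0], gy - t * flow[..., 1]])
    return cv2.remap(img0, map0, None, cv2.INTER_LINEAR)
    # Blending with a symmetrically warped img1 is omitted for brevity.
```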

17.
Multi-view reconstruction aims at computing the geometry of a scene observed by a set of cameras. Accurate 3D reconstruction of dynamic scenes is a key component for a large variety of applications, ranging from special effects to telepresence and medical imaging. In this paper we propose a method based on Moving Least Squares surfaces which robustly and efficiently reconstructs dynamic scenes captured by a calibrated set of hybrid color+depth cameras. Our reconstruction provides spatio-temporal consistency and seamlessly fuses color and geometric information. We illustrate our approach on a variety of real sequences and demonstrate that it compares favorably to state-of-the-art methods.
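The core MLS operation can be sketched as a weighted plane fit and projection; the paper additionally fuses color and multiple calibrated depth cameras:

```python
import numpy as np

def mls_project(q, pts, h=0.05):
    """Project query point q (3,) onto the MLS surface of the noisy cloud
    pts (n, 3); h is the Gaussian support radius (a placeholder value)."""
    w = np.exp(-np.sum((pts - q) ** 2, axis=1) / h ** 2)
    centroid = (w[:, None] * pts).sum(0) / w.sum()
    cov = (w[:, None] * (pts - centroid)).T @ (pts - centroid)
    normal = np.linalg.eigh(cov)[1][:, 0]     # smallest-eigenvalue direction
    return q - np.dot(q - centroid, normal) * normal
```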

18.
Tracking hundreds of people in large, high-density scenarios is a particularly challenging task due to frequent occlusions and merged measurements. In such circumstances, a strong dynamic model for prediction usually plays the more important role in the overall tracking process. In this paper, we propose an elaborate dynamic model for tracking multiple pedestrians in extremely crowded environments. The novelty of this model is that the global semantic scene structure, the local instantaneous crowd flow, and the social interactions among persons are taken into account together and combined into a unified approach, which makes the prediction of each person's motion more powerful and accurate. We apply the proposed model within an online "tracking-learning" framework, which not only performs robust tracking in extremely crowded scenarios but also keeps the entire process fully automatic and online. Testing was conducted at the JR subway station of Tokyo, and the experimental results show that the system with our tracking model can robustly track more than 180 targets at the same time, even though occlusions and merges/splits occur frequently.
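As a rough illustration of the interaction term in such a dynamic model, here is a classic social-force prediction step, with the goal direction standing in for the guidance that the paper derives from scene structure and crowd flow; all constants are placeholder values:

```python
import numpy as np

def predict_velocity(pos, vel, goal_dir, tau=0.5, A=2.0, B=0.3, dt=0.1):
    """pos, vel: (n, 2) pedestrian states; goal_dir: (n, 2) unit vectors
    toward each pedestrian's goal. Returns predicted velocities."""
    v_desired = 1.3 * goal_dir                   # preferred walking speed
    force = (v_desired - vel) / tau              # relaxation toward the goal
    for i in range(len(pos)):
        d = pos[i] - np.delete(pos, i, axis=0)   # vectors away from others
        dist = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        force[i] += (A * np.exp(-dist / B) * d / dist).sum(0)  # repulsion
    return vel + dt * force
```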

19.
Voxel-based rendering has recently received significant attention due to its potential in the context of efficiently rendering massively large and highly detailed scenes. Unfortunately, few scenes are available in the form of sparse voxel octrees. In this paper, we present an out-of-core algorithm for constructing a sparse voxel octree from a triangle mesh. Our algorithm allows the input triangle mesh, the output sparse voxel octree and, most importantly, the intermediate high-resolution 3D voxel grid, to be larger than available memory. We demonstrate that our out-of-core algorithm can construct sparse voxel octrees from triangle meshes using only a fraction of the memory required by an in-core algorithm in roughly the same time, and that our out-of-core algorithm can also handle extremely large triangle meshes.
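The bottom-up construction can be sketched with Morton codes, where a node's parent code is its own code shifted right by three bits; an out-of-core variant streams the sorted codes in chunks, but the merge logic is the same:

```python
import numpy as np

def morton3(x, y, z, bits=10):
    """Interleave the bits of integer voxel coordinates into a Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i & 1) << (3 * i)) \
              | ((y >> i & 1) << (3 * i + 1)) \
              | ((z >> i & 1) << (3 * i + 2))
    return code

def build_svo_levels(leaf_codes, levels):
    """leaf_codes: sorted unique Morton codes of occupied finest voxels.
    Returns per-level arrays of occupied node codes, leaves first."""
    out = [np.asarray(leaf_codes)]
    for _ in range(levels - 1):
        out.append(np.unique(out[-1] >> 3))   # parent = child code // 8
    return out
```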

20.
Designing a bas-relief from a 3D scene is an inherently interactive task in many scenarios: the user normally needs instant feedback to select a proper viewpoint, yet current methods are too slow to facilitate this interaction. This paper proposes a two-scale bas-relief modeling method that is computationally efficient and can easily produce different styles of bas-reliefs. The input 3D scene is first rendered into two textures, one recording the depth information and the other recording the normal information. The depth map is then compressed to produce a base surface with levels of depth, and the normal map is used to extract local details with two different schemes. One scheme provides a certain freedom to design bas-reliefs with different visual appearances, and the other provides control over the level of detail. Finally, the local feature details are added to the base surface to produce the final result. Our approach allows for real-time computation due to its implementation on graphics hardware. Experiments with a wide range of 3D models and scenes show that our approach can effectively generate digital bas-reliefs in real time.
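A small numpy/scipy sketch of the two-scale composition, with the depth compression and detail extraction reduced to simple stand-ins for the paper's schemes (the log compression, high-pass filtering, and crude integration are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # assumed dependency

def bas_relief(depth, normals, alpha=5.0, detail=0.3, sigma=2.0):
    """depth: (h, w) scene depth; normals: (h, w, 3) with z toward the viewer."""
    d = depth - depth.min()
    base = np.log1p(alpha * d) / np.log1p(alpha * d.max())  # compressed base
    # Per-pixel surface slopes implied by the normals.
    gx = -normals[..., 0] / np.clip(normals[..., 2], 1e-3, None)
    gy = -normals[..., 1] / np.clip(normals[..., 2], 1e-3, None)
    # High-pass the slopes and integrate crudely for a fine-detail height field.
    hx = gx - gaussian_filter(gx, sigma)
    hy = gy - gaussian_filter(gy, sigma)
    fine = np.cumsum(hx, axis=1) + np.cumsum(hy, axis=0)
    fine -= gaussian_filter(fine, sigma * 4)   # keep only local detail
    return base + detail * fine / (np.abs(fine).max() + 1e-9)
```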
