Similar Literature
20 similar documents found
1.

Real-time estimation of crowd size is a central task in civilian surveillance. In this paper we present a novel system for counting people in a crowd scene using overlapping cameras. The system fuses all single-view foreground information to localize each person present in the scene. The purpose of our fusion strategy is to use the foreground pixels of each single view to improve real-time object association between the cameras of the network. The foreground pixels are obtained with a codebook-based algorithm. We aggregate the resulting silhouettes over the camera network and compute a planar homography projection of each camera's visual hull onto the ground plane. The visual hull is obtained as the convex hull of the foreground pixels. After projection onto the ground plane, we fuse the resulting polygons using the geometric properties of the scene and the detection quality of each camera. We also propose a region-based tracking strategy that keeps track of people's movements and identities over time, while tolerating occasional misdetections. This tracking strategy operates on the result of the view fusion and estimates the crowd size in each frame. Experiments on public datasets proposed for the evaluation of people-counting systems demonstrate the performance of our fusion approach. The results show that the fusion strategy runs in real time and performs data association efficiently, and that combining our fusion approach with the proposed tracking improves people counting.
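As a minimal illustration of the projection step this abstract describes, the following sketch applies a 3x3 planar homography to 2D foreground points in homogeneous coordinates (the matrices and function names are illustrative, not taken from the paper):

```python
def apply_homography(H, points):
    """Project 2D image points onto the ground plane with a 3x3 homography H."""
    projected = []
    for x, y in points:
        # Homogeneous multiply: [x', y', w'] = H @ [x, y, 1]
        xp = H[0][0] * x + H[0][1] * y + H[0][2]
        yp = H[1][0] * x + H[1][1] * y + H[1][2]
        wp = H[2][0] * x + H[2][1] * y + H[2][2]
        projected.append((xp / wp, yp / wp))  # dehomogenize
    return projected

# Identity homography leaves points unchanged.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
hull = [(0.0, 0.0), (10.0, 5.0)]
assert apply_homography(I, hull) == hull
```

In the paper's pipeline the input points would be the vertices of each camera's foreground convex hull, and H would be that camera's image-to-ground-plane homography.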


2.
Visual Modeling with a Hand-Held Camera   (cited 10 times: 0 self-citations, 10 by others)
In this paper a complete system to build visual models from camera images is presented. The system can deal with uncalibrated image sequences acquired with a hand-held camera. Based on tracked or matched features, the relations between multiple views are computed. From this, both the structure of the scene and the motion of the camera are retrieved. The ambiguity of the reconstruction is reduced from projective to metric through self-calibration. A flexible multi-view stereo matching scheme is used to obtain a dense estimate of the surface geometry. From the computed data, different types of visual models are constructed. Besides the traditional geometry- and image-based approaches, a combined approach with view-dependent geometry and texture is presented. As an application, fusion of real and virtual scenes is also shown.

3.
An approach based on fuzzy logic is proposed for matching both articulated and non-articulated objects across multiple non-overlapping fields of view (FoVs) from multiple cameras. We call it the fuzzy logic matching algorithm (FLMA). The approach uses object motion, shape and camera topology to match objects across camera views. The motion and shape information of targets is obtained by tracking them with a combination of the ConDensation and CAMShift tracking algorithms. The camera topology is obtained by calculating the projective transformation of each view onto the common ground plane. The algorithm is suitable for tracking non-rigid objects with both linear and non-linear motion. We show videos of tracking objects across multiple cameras based on FLMA. In our experiments, the system correctly matches targets across views with high accuracy.
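The fuzzy-logic matching idea can be illustrated with a toy triangular membership function and a fuzzy AND (minimum) over two cue dissimilarities. This is a hypothetical sketch of the general technique, not the paper's actual FLMA:

```python
def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_match_score(motion_diff, shape_diff):
    """Combine cue memberships with the fuzzy AND (minimum) operator.

    Each diff is a normalized dissimilarity; 0 means identical cues.
    """
    m_motion = triangular_membership(motion_diff, -1.0, 0.0, 1.0)
    m_shape = triangular_membership(shape_diff, -1.0, 0.0, 1.0)
    return min(m_motion, m_shape)
```

A pair of observations from two cameras would be declared the same object when this score exceeds a chosen threshold.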

4.
Multi-Camera Coordination in Surveillance Systems   (cited 8 times: 0 self-citations, 8 by others)
This paper describes a distributed surveillance system for tracking multiple targets in indoor settings. The system consists of several inexpensive fixed-lens cameras, with multiple camera-processing modules and a central module that coordinates tracking tasks among the cameras. Since each moving target may be tracked by several cameras simultaneously, selecting the most suitable camera to track a given target becomes a problem, especially when system resources are strained. The proposed algorithm assigns targets to cameras according to the distance between target and camera while taking occlusion into account; when occlusion occurs, the system assigns the occluded target to the nearest camera that can still see it. Experiments show that the system coordinates multiple cameras well for target tracking and handles occlusion properly.
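The assignment rule in this abstract (give each target to the nearest camera that can still see it) might be sketched as follows; the names and data layout are assumptions for illustration:

```python
import math

def assign_camera(target, cameras, occluded):
    """Assign a target to the nearest camera whose view of it is not occluded.

    target: (x, y) position; cameras: {name: (x, y)} camera positions;
    occluded: set of camera names whose view of this target is blocked.
    Returns the chosen camera name, or None if every view is occluded.
    """
    best, best_dist = None, math.inf
    for name, pos in cameras.items():
        if name in occluded:
            continue
        d = math.dist(target, pos)
        if d < best_dist:
            best, best_dist = name, d
    return best
```

A real system would also weigh camera load when resources are strained, which this sketch omits.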

5.
How far can human detection and tracking go in crowded real-world scenes? Many algorithms fail in such scenes due to frequent and severe occlusions as well as viewpoint changes. To handle these difficulties, we propose Scene Aware Detection (SAD) and Block Assignment Tracking (BAT), which incorporate available scene models (e.g. background, layout, ground-plane and camera models). The SAD achieves accurate detection by utilizing 1) a camera model to deal with viewpoint changes by rectifying sub-images, 2) a structural filter approach to handle occlusions, based on a feature-sharing mechanism in which a three-level hierarchical structure is built for humans, and 3) foregrounds for pruning negative and false positive samples and merging intermediate detection results. Many detection- or appearance-based tracking systems are prone to errors in occluded scenes because of detector failures and interactions between multiple objects. In contrast, the BAT formulates tracking as a block assignment process, in which blocks with the same label form the appearance of one object. In the BAT, we model objects on two levels: the ensemble level, which measures how much a region resembles an object using discriminative models, and the block level, which measures how much it resembles a target object using appearance and motion models. The main advantage of BAT is that it can track an object even when all the part detectors fail, as long as the object has assigned blocks. Extensive experiments in many challenging real-world scenes demonstrate the efficiency and effectiveness of our approach.

6.
This paper presents a novel method for virtual view synthesis that allows viewers to virtually fly through real soccer scenes, which are captured by multiple cameras in a stadium. The proposed method generates images of arbitrary viewpoints by view interpolation of real camera images near the chosen viewpoints. In this method, cameras do not need to be strongly calibrated since projective geometry between cameras is employed for the interpolation. To avoid the complex and unreliable process of 3-D recovery, object scenes are segmented into several regions according to the geometric property of the scene. Dense correspondence between real views, which is necessary for intermediate view generation, is automatically obtained by applying projective geometry to each region. By superimposing intermediate images for all regions, virtual views for the entire soccer scene are generated. The effort for camera calibration is reduced and correspondence matching requires no manual operation; hence, the proposed method can be easily applied to dynamic events in a large space. An application for fly-through observations of soccer match replays is introduced along with the algorithm of view synthesis and experimental results. This is a new approach for providing arbitrary views of an entire dynamic event.

7.
We present an algorithm for the layered segmentation of video data in multiple views. The approach is based on computing the parameters of a layered representation of the scene in which each layer is modelled by its motion, appearance and occupancy, where occupancy describes, probabilistically, the layer's spatial extent and not simply its segmentation in a particular view. The problem is formulated as the MAP estimation of all layer parameters conditioned on those at the previous time step; i.e., a sequential estimation problem that is equivalent to tracking multiple objects in a given number of views. Expectation-Maximisation is used to establish layer posterior probabilities for both occupancy and visibility, which are represented distinctly. Evidence from areas in each view which are described poorly under the model is used to propose new layers automatically. Since these potential new layers often occur at the fringes of images, the algorithm is able to segment and track these in a single view until such time as a suitable candidate match is discovered in the other views. The algorithm is shown to be very effective at segmenting and tracking non-rigid objects and can cope with extreme occlusion. We demonstrate an application of this representation to dynamic novel view synthesis.

8.
Simultaneously tracking poses of multiple people is a difficult problem because of inter-person occlusions and self occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views. Image features from non-occluded views are given more weight than image features from occluded views. Self occlusion is handled by local occlusion estimation. The local occlusion estimation is used to update the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of the global and the local occlusion estimation leads to accurate tracking results at much lower computational costs. We evaluate the performance of our approach on a pose estimation data set in which inter-person and self occlusions are present. The results of our experiments show that our approach is able to robustly track multiple people during large movement with severe inter-person occlusions and self occlusions, whilst maintaining near real-time performance.

9.
We introduce a framework for unconstrained 3D human upper body pose estimation from multiple camera views in complex environments. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration and model texture adaptation. Single-frame pose recovery consists of a hypothesis generation stage, in which candidate 3D poses are generated based on probabilistic hierarchical shape matching in each camera view. In the subsequent hypothesis verification stage, the candidate 3D poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration consists of computing K-best trajectories combining a motion model and observations in a Viterbi-style maximum-likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape likelihood measure used for pose recovery. The multiple trajectory hypotheses are used to generate pose predictions, augmenting the 3D pose candidates generated at the next time step.
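The Viterbi-style temporal integration can be sketched, for the single best trajectory, as a standard Viterbi recursion over per-frame pose candidates. The log-scores and the transition function below are illustrative; the paper additionally keeps K-best trajectories rather than just the maximum:

```python
def viterbi_best_trajectory(obs_scores, transition):
    """Pick the maximum-likelihood sequence of pose candidates.

    obs_scores: list over time of {pose: observation log-likelihood};
    transition(p, q): log-probability of moving from pose p to pose q.
    """
    # Initialise with the first frame's observation scores.
    score = dict(obs_scores[0])
    back = [{}]
    for frame in obs_scores[1:]:
        new_score, ptr = {}, {}
        for q, obs in frame.items():
            # Best predecessor for candidate q.
            prev, best = max(((p, s + transition(p, q)) for p, s in score.items()),
                             key=lambda t: t[1])
            new_score[q] = best + obs
            ptr[q] = prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final pose.
    pose = max(score, key=score.get)
    path = [pose]
    for ptr in reversed(back[1:]):
        pose = ptr[pose]
        path.append(pose)
    return list(reversed(path))
```

Running the recursion once per tracked person yields the trajectory whose poses then feed the texture-adaptation step.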

10.
This paper proposes a method to locate and track people by combining evidence from multiple cameras using the homography constraint. The proposed method uses foreground pixels from simple background subtraction to compute evidence of the location of people on a reference ground plane. The algorithm computes the amount of support that basically corresponds to the "foreground mass" above each pixel. Therefore, pixels that correspond to ground points have more support. The support is normalized to compensate for perspective effects and accumulated on the reference plane for all camera views. The detection of people on the reference plane then becomes a search for regions of local maxima in the accumulator. Many false positives are filtered by checking the visibility consistency of the detected candidates against all camera views. The remaining candidates are tracked using Kalman filters and appearance models. Experimental results using challenging data from PETS'06 show good performance of the method in the presence of severe occlusion. Ground truth data also confirms the robustness of the method.
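Detecting people as regions of local maxima in the ground-plane accumulator, as described above, reduces to a simple peak search. A toy sketch with an assumed list-of-lists accumulator:

```python
def detect_people(accumulator, threshold):
    """Find local maxima above threshold in a 2D ground-plane accumulator."""
    rows, cols = len(accumulator), len(accumulator[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = accumulator[r][c]
            if v < threshold:
                continue
            # Compare against the 8-neighbourhood (clipped at borders).
            neighbours = [accumulator[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

Each returned cell would be a candidate person location handed to the visibility-consistency filter and the Kalman trackers.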

11.
In this paper, we propose to capture image-based rendering scenes using a novel approach called active rearranged capturing (ARC). Given the total number of available cameras, ARC moves them strategically on the camera plane in order to minimize the sum of squared rendering errors for a given set of light rays to be rendered. Assuming the scene changes slowly, so that the optimized camera locations remain valid at the next time instance, we formulate the problem as a recursive weighted vector quantization problem, which can be solved efficiently. The ARC approach is verified on both synthetic and real-world scenes. In particular, a large self-reconfigurable camera array is built to demonstrate ARC's performance on real-world scenes. The system renders virtual views at 5-10 frames/s depending on the scene complexity on a moderately equipped computer. Given the virtual view point, the cameras move on a set of rails to perform ARC and improve the rendering quality on the fly.

12.
In recent years, advances in GPU technology and the growing maturity of parallel algorithms have made real-time 3D reconstruction possible. This paper implements an interactive dense 3D reconstruction system for small scenes that uses advanced motion-tracking techniques to accurately estimate the camera's instantaneous position. An improved multi-view depth-generation algorithm is proposed that computes scene depth in real time under GPU acceleration. Sub-pixel semi-global matching cost aggregation in the improved algorithm raises the accuracy of multi-view stereo matching, and a global optimization method is combined with it to compute accurate scene depth. Depth maps are converted to distance fields, and real-time depth fusion is achieved with a globally optimized histogram-compression fusion algorithm and a parallel primal-dual algorithm. Experimental results demonstrate the feasibility of the reconstruction system and the correctness of the reconstruction algorithm.

13.
Occlusion is one of the most challenging scene attributes in visual object tracking, and studying effective occlusion-resistant model-learning schemes is of great significance for building robust long-term tracking models that adapt to complex scenes. This work dissects the root causes by which occlusion degrades tracking performance, takes state-of-the-art tracking algorithms with strong occlusion resistance as its subjects, systematically analyzes the effective anti-occlusion mechanisms in model learning, and comparatively analyzes their effectiveness in alleviating short- and long-term occlusion, including hard negative mining, effective sample…

14.
Tracking in a Dense Crowd Using Multiple Cameras   (cited 1 time: 0 self-citations, 1 by others)
Tracking people in a dense crowd is a challenging problem for a single-camera tracker due to occlusions and extensive motion that make human segmentation difficult. In this paper we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people's heads are tracked. Head detection is still difficult since each foreground region may consist of multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, which are regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It was successful in tracking up to 21 people walking in a small area (2.5 people per m²), in spite of severe and persistent occlusions.
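The intensity correlation used for head-top detection can be illustrated with zero-mean normalized cross-correlation between two aligned patches. Patches are flattened to 1D lists here for brevity; this is a generic sketch, not the paper's exact measure:

```python
import math

def normalized_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation between two equal-size patches.

    Returns a value in [-1, 1]; near 1 means the aligned patches agree,
    which is the evidence used to declare a head top at that height.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    den = math.sqrt(sum((a - mean_a) ** 2 for a in patch_a) *
                    sum((b - mean_b) ** 2 for b in patch_b))
    return num / den if den else 0.0  # constant patch: no evidence
```

In the multi-camera setting, frames are first warped into alignment at a hypothesized height, and high correlation across views supports a head at that height.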

15.
Multiple human tracking in high-density crowds   (cited 1 time: 0 self-citations, 1 by others)
In this paper, we introduce a fully automatic algorithm to detect and track multiple humans in high-density crowds in the presence of extreme occlusion. Typical approaches such as background modeling and body part-based pedestrian detection fail when most of the scene is in motion and most body parts of most of the pedestrians are occluded. To overcome this problem, we integrate human detection and tracking into a single framework and introduce a confirmation-by-classification method for tracking that associates detections with tracks, tracks humans through occlusions, and eliminates false positive tracks. We use a Viola and Jones AdaBoost detection cascade, a particle filter for tracking, and color histograms for appearance modeling. To further reduce false detections due to dense features and shadows, we introduce a method for estimation and utilization of a 3D head plane that reduces false positives while preserving high detection rates. The algorithm learns the head plane from observations of human heads incrementally, without any a priori extrinsic camera calibration information, and only begins to utilize the head plane once confidence in the parameter estimates is sufficiently high. In an experimental evaluation, we show that confirmation-by-classification and head plane estimation together enable the construction of an excellent pedestrian tracker for dense crowds.
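Head-plane estimation amounts to a least-squares fit of a plane z = a*x + b*y + c to observed head positions. The following is a minimal batch version under that assumption; the paper's incremental, confidence-gated variant is not reproduced here:

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to 3D head observations.

    Builds the 3x3 normal equations A^T A m = A^T z and solves them by
    Gauss-Jordan elimination with partial pivoting. Returns (a, b, c).
    """
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sx = sum(x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sy = sum(y for _, y, _ in points)
    n = len(points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)
    # Augmented system [A^T A | A^T z].
    M = [[sxx, sxy, sx, sxz],
         [sxy, syy, sy, syz],
         [sx, sy, n, sz]]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(3):
            if r != i and M[i][i]:
                f = M[r][i] / M[i][i]
                M[r] = [v - f * w for v, w in zip(M[r], M[i])]
    return tuple(M[i][3] / M[i][i] for i in range(3))
```

An incremental version would update the accumulated sums as each new head observation arrives, re-solving the same 3x3 system.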

16.
This paper explores the problem of multi-view feature matching from an unordered set of widely separated views. A set of local invariant features is extracted independently from each view. First, we propose a new view-ordering algorithm that organizes all the unordered views into clusters of related (i.e. same-scene) views by efficiently computing the view-similarity values of all view pairs, matching only a judiciously selected subset of the extracted features. Second, a robust two-view matching algorithm is developed to find initial matches, then detect the outliers, and finally, incrementally find more reliable feature matches under the epipolar constraint between two views, from dense to sparse, based on the assumption that changes in both motion and feature characteristics of one match are consistent with those of its neighbors. Third, we establish reliable multi-view matches across related views by reconstructing missing matches in a neighboring triple of views and efficiently determining the states of matches between view pairs. Finally, the reliable multi-view matches thus obtained are used to automatically track all the views using a self-calibration method. The proposed methods were tested on several sets of real images. Experimental results show that the approach is efficient and can track a large set of multi-view feature matches across multiple widely separated views.

17.
Monitoring of large sites requires coordination between multiple cameras, which in turn requires methods for relating events between distributed cameras. This paper tackles the problem of automatic external calibration of multiple cameras in an extended scene, that is, full recovery of their 3D relative positions and orientations. Because the cameras are placed far apart, brightness or proximity constraints cannot be used to match static features, so we instead apply planar geometric constraints to moving objects tracked throughout the scene. By robustly matching and fitting tracked objects to a planar model, we align the scene's ground plane across multiple views and decompose the planar alignment matrix to recover the 3D relative camera and ground plane positions. We demonstrate this technique both in a controlled lab setting, where we test the effects of errors in the intrinsic camera parameters, and in an uncontrolled outdoor setting. In the latter, we do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. In spite of noise in the intrinsic camera parameters and in the image data, the system successfully transforms multiple views of the scene's ground plane to an overhead view and recovers the relative 3D camera and ground plane positions.
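Aligning unsynchronized tracking data in time can be sketched as a search for the integer frame offset that minimizes the disagreement between two trajectories of the same object, once both are expressed on the common ground plane. This is a simplification of the paper's geometric alignment; all names are illustrative:

```python
def best_time_offset(track_a, track_b, max_shift):
    """Find the integer frame offset that best aligns two 2D trajectories.

    Minimizes the mean squared distance between overlapping samples when
    track_b is shifted by `shift` frames relative to track_a.
    """
    best_shift, best_err = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(track_a[i], track_b[i - shift])
                 for i in range(len(track_a))
                 if 0 <= i - shift < len(track_b)]
        if not pairs:
            continue
        err = sum((ax - bx) ** 2 + (ay - by) ** 2
                  for (ax, ay), (bx, by) in pairs) / len(pairs)
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift
```

In practice the search would run jointly with the planar alignment, since the homography must already map both tracks into a shared coordinate frame.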

18.
Fast, stable, adaptive tracking in complex scenes is one of the pressing problems in computer vision, and efficient fusion of a target's multiple feature cues is an important route to more robust tracking algorithms. This paper first designs a new combination strategy based on DST (Dempster-Shafer Theory) and PCR5 (Proportional Conflict Redistribution rule No. 5) to fuse the color and texture features of moving targets; it then builds an adaptive multi-target tracking model for complex scenes within a particle-filter framework, finally realizing adaptive visual tracking with multi-feature fusion in complex scenes. Experimental results and performance analysis show that, under adverse tracking conditions, the method markedly improves the adaptive handling of highly conflicting evidence, effectively raises particle efficiency and tracking robustness, and achieves accurate, stable multi-target tracking in complex scenes.
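The PCR5 rule mentioned above redistributes conflicting mass back to the hypotheses that generated it, in proportion to their masses. A toy two-source implementation over exclusive singleton hypotheses (the cue names in the docstring are hypothetical, and real DSmT frames can also contain composite hypotheses):

```python
def pcr5_fuse(m1, m2):
    """PCR5 fusion of two basic belief assignments over exclusive
    singleton hypotheses (e.g. {"match", "no-match"} from color/texture cues).

    The conjunctive products of compatible hypotheses are kept; each
    conflicting product m1(x)*m2(y), x != y, is split back between x and y
    proportionally to m1(x) and m2(y).
    """
    hyps = set(m1) | set(m2)
    fused = {h: m1.get(h, 0.0) * m2.get(h, 0.0) for h in hyps}
    for x in hyps:
        for y in hyps:
            if x == y:
                continue
            mx, my = m1.get(x, 0.0), m2.get(y, 0.0)
            conflict = mx * my  # mass assigned to the empty intersection
            if conflict:
                fused[x] += mx * conflict / (mx + my)
                fused[y] += my * conflict / (mx + my)
    return fused
```

Unlike Dempster's rule, no global normalization is needed: the redistributed conflict keeps the fused masses summing to one.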

19.
Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding   (cited 1 time: 0 self-citations, 1 by others)
We propose a novel approach to understanding activities from their partial observations monitored through multiple non-overlapping cameras separated by unknown time gaps. In our approach, each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time-delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning the time-delayed activity correlations offers important contextual information for (i) spatial and temporal topology inference of a camera network, (ii) robust person re-identification, and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach does not rely on either intra-camera or inter-camera object tracking; it thus can be applied to low-quality surveillance videos featuring severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of videos captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.
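Estimating a time-delayed correlation between two regional activity series can be sketched as picking the lag that maximizes their Pearson correlation. This is a deliberate simplification of the paper's Cross Canonical Correlation Analysis, which handles multivariate regional activity:

```python
def best_time_delay(series_a, series_b, max_lag):
    """Return the lag (in frames) at which series_b best correlates with
    series_a, i.e. the argmax over lag of corr(a[t], b[t + lag])."""
    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs) *
               sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den if den else 0.0
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        # Keep only the overlapping portion of the two series at this lag.
        xs = [series_a[t] for t in range(len(series_a))
              if 0 <= t + lag < len(series_b)]
        ys = [series_b[t + lag] for t in range(len(series_a))
              if 0 <= t + lag < len(series_b)]
        if len(xs) > 1:
            r = corr(xs, ys)
            if r > best_r:
                best_lag, best_r = lag, r
    return best_lag
```

The recovered lags between regions are what drive the camera-network topology inference described in the abstract.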

20.
In this paper, we propose an efficient and robust method for multiple targets tracking in cluttered scenes using multiple cues. Our approach combines the use of Monte Carlo sequential filtering for tracking and Dezert-Smarandache theory (DSmT) to integrate the information provided by the different cues. The use of DSmT provides the necessary framework to quantify and overcome the conflict that might appear between the cues due to the occlusion. Our tracking approach is tested with color and location cues on a cluttered scene where multiple targets are involved in partial or total occlusion.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号