Similar References
20 similar references found (search time: 31 ms)
1.
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly from a single view. We present a multi-view approach to solving this problem. In our approach we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all the cameras into a synergistic framework, and detection and tracking results are propagated back to each view. Unlike other multi-view approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior assigning higher fusion weight to views with less clutter. Detection and tracking are performed simultaneously by graph-cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results, with detailed qualitative and quantitative analysis, are demonstrated on challenging, crowded multi-view scenes.
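The clutter-weighted fusion of warped foreground likelihoods described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each view's likelihood map has already been warped onto a common reference-plane grid, and that per-view clutter-based confidence weights are given.

```python
import numpy as np

def fuse_occupancy(warped_likelihoods, clutter_weights):
    """Fuse per-view foreground likelihood maps (already warped onto a
    common reference-plane grid) into a single occupancy likelihood.

    warped_likelihoods : list of 2-D arrays with values in [0, 1], one per view
    clutter_weights    : per-view confidence weights (higher = less clutter)
    """
    w = np.asarray(clutter_weights, dtype=float)
    w = w / w.sum()                      # normalize confidence weights
    # Weighted geometric-mean style fusion: views with less clutter
    # contribute more, and a cell must look like foreground in all
    # views to receive a high occupancy score.
    log_fused = sum(wi * np.log(np.clip(L, 1e-6, 1.0))
                    for wi, L in zip(w, warped_likelihoods))
    return np.exp(log_fused)
```

A cell that is foreground in every view keeps a score near 1, while a single confident background observation suppresses it, which is the qualitative behavior the occupancy constraint relies on.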

2.
We present a novel framework for real-time multi-perspective rendering. While most existing approaches are based on ray tracing, we present an alternative that emulates multi-perspective rasterization on the classical perspective graphics pipeline. To render a general multi-perspective camera, we first decompose the camera into piecewise-linear primitive cameras called general linear cameras, or GLCs. We derive the closed-form projection equations for GLCs and show how to rasterize triangles onto GLCs via a two-pass rendering algorithm. In the first pass, we compute the GLC projection coefficients of each scene triangle using a vertex shader. The linear rasterizer on the graphics hardware then interpolates these coefficients at each pixel. Finally, we use these interpolated coefficients to compute the projected pixel coordinates using a fragment shader. In the second pass, we move the pixels to their actual projected positions. To avoid holes, we treat neighboring pixels as triangles and re-render them onto the GLC image plane. We demonstrate our real-time multi-perspective rendering framework in a wide range of applications, including synthesizing panoramic and omnidirectional views, rendering reflections on curved mirrors, and creating multi-perspective faux animations. Compared with GPU-based ray-tracing methods, our rasterization approach scales better with scene complexity and can render scenes with a large number of triangles at interactive frame rates.

3.
This paper proposes a method to locate and track people by combining evidence from multiple cameras using the homography constraint. The proposed method uses foreground pixels from simple background subtraction to compute evidence of the location of people on a reference ground plane. The algorithm computes an amount of support that corresponds, in essence, to the “foreground mass” above each pixel; pixels that correspond to ground points therefore receive more support. The support is normalized to compensate for perspective effects and accumulated on the reference plane over all camera views. Detecting people on the reference plane then becomes a search for regions of local maxima in the accumulator. Many false positives are filtered by checking the visibility consistency of the detected candidates against all camera views. The remaining candidates are tracked using Kalman filters and appearance models. Experimental results using challenging data from PETS’06 show good performance of the method in the presence of severe occlusion. Ground-truth data also confirms the robustness of the method.
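The accumulation of “foreground mass” on the reference plane and the search for local maxima can be sketched roughly as follows. This is a toy version with hypothetical inputs; the paper's perspective normalization and multi-view visibility check are omitted.

```python
import numpy as np

def accumulate_support(fg_points, H, grid_shape):
    """Project foreground pixel coordinates onto the reference ground
    plane with a 3x3 homography H and accumulate votes in a grid.

    fg_points : (N, 2) array of (x, y) image coordinates of foreground pixels
    """
    pts = np.hstack([fg_points, np.ones((len(fg_points), 1))])
    ground = (H @ pts.T).T
    ground = ground[:, :2] / ground[:, 2:3]       # dehomogenize
    acc = np.zeros(grid_shape)
    for gx, gy in ground:
        ix, iy = int(round(gx)), int(round(gy))
        if 0 <= iy < grid_shape[0] and 0 <= ix < grid_shape[1]:
            acc[iy, ix] += 1                      # one unit of "foreground mass"
    return acc

def local_maxima(acc, threshold):
    """Return grid cells whose support exceeds the threshold and is
    maximal in a 3x3 neighborhood -- candidate person locations."""
    peaks = []
    for y in range(1, acc.shape[0] - 1):
        for x in range(1, acc.shape[1] - 1):
            v = acc[y, x]
            if v >= threshold and v == acc[y-1:y+2, x-1:x+2].max():
                peaks.append((y, x))
    return peaks
```

In practice one accumulator would be filled per camera and the per-view accumulators summed before peak detection.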

4.
An approach based on fuzzy logic for matching both articulated and non-articulated objects across multiple non-overlapping fields of view (FOVs) from multiple cameras is proposed. We call it the fuzzy logic matching algorithm (FLMA). The approach uses information about object motion, shape, and camera topology to match objects across camera views. The motion and shape information of targets is obtained by tracking them with a combination of the ConDensation and CAMShift tracking algorithms. The camera topology information is obtained by calculating the projective transformation between each view and the common ground plane. The algorithm is suitable for tracking non-rigid objects with both linear and non-linear motion. We show videos of objects tracked across multiple cameras using FLMA. Our experiments show that the system correctly matches targets across views with high accuracy.

5.
Monitoring of large sites requires coordination between multiple cameras, which in turn requires methods for relating events between distributed cameras. This paper tackles the problem of automatic external calibration of multiple cameras in an extended scene, that is, full recovery of their relative 3D positions and orientations. Because the cameras are placed far apart, brightness or proximity constraints cannot be used to match static features, so we instead apply planar geometric constraints to moving objects tracked throughout the scene. By robustly matching and fitting tracked objects to a planar model, we align the scene's ground plane across multiple views and decompose the planar alignment matrix to recover the relative 3D camera and ground-plane positions. We demonstrate this technique both in a controlled lab setting, where we test the effects of errors in the intrinsic camera parameters, and in an uncontrolled outdoor setting. In the latter, we do not assume synchronized cameras, and we show that enforcing geometric constraints enables us to align the tracking data in time. In spite of noise in the intrinsic camera parameters and in the image data, the system successfully transforms multiple views of the scene's ground plane to an overhead view and recovers the relative 3D camera and ground-plane positions.

6.
Tracking in a Dense Crowd Using Multiple Cameras
Tracking people in a dense crowd is a challenging problem for a single-camera tracker due to occlusions and extensive motion that make human segmentation difficult. In this paper we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people’s heads are tracked. Head detection is still difficult, since each foreground region may contain multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, which are regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It was successful in tracking up to 21 people walking in a small area (2.5 people per m²), in spite of severe and persistent occlusions.
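The intensity-correlation test on height-aligned patches can be illustrated with standard normalized cross-correlation. This is a generic sketch, not the paper's detector, and `best_height` is a hypothetical helper standing in for the per-height sweep over aligned frames.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equally sized patches;
    a high value means the two views agree at the hypothesized height."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_height(patches_by_height):
    """Pick the hypothesized head height whose aligned patches from two
    cameras correlate best (a toy stand-in for the multi-camera sweep).

    patches_by_height : dict mapping height -> (patch_view1, patch_view2)
    """
    scores = {h: ncc(pa, pb) for h, (pa, pb) in patches_by_height.items()}
    return max(scores, key=scores.get)
```

A head top at the correct height projects consistently into all views, so the aligned patches correlate strongly only at that height.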

7.
This paper addresses the problem of localizing people in low- and high-density crowds with a network of heterogeneous cameras. The problem is recast as a linear inverse problem: it relies on deducing the discretized occupancy vector of people on the ground from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e., one made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. The proposed framework is (i) generic to any scene of people, i.e., it handles both low- and high-density crowds, (ii) scalable to any number of cameras and already working with a single camera, (iii) unconstrained by the scene surface to be monitored, and (iv) versatile with respect to the camera's geometry, e.g., planar or omnidirectional. Qualitative and quantitative results are presented on the APIDIS and PETS 2009 benchmark datasets. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.
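The sparsity-regularized inverse formulation can be illustrated with a generic iterative soft-thresholding (ISTA) solver on a synthetic dictionary. This is a sketch of the formulation only, not the authors' solver; the dictionary here is random, whereas the paper builds it from silhouette templates.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||D x - y||^2 + lam*||x||_1 by iterative
    soft-thresholding.

    x : the (sparse) ground-plane occupancy vector
    D : dictionary whose columns map grid cells to stacked silhouette pixels
    y : stacked foreground observations from all cameras
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - (D.T @ (D @ x - y)) / L    # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x
```

The l1 penalty drives most occupancy entries to exactly zero, matching the prior that only a few grid cells contain a person.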

8.
Pan–tilt–zoom (PTZ) cameras are well suited for object identification and recognition in far-field scenes. However, the effective use of PTZ cameras is complicated by the fact that continuous online camera calibration is needed, and the absolute pan, tilt, and zoom values provided by the camera actuators cannot be used because they are not synchronized with the video stream. Accurate calibration must therefore be extracted directly from the visual content of the frames. Moreover, the large and abrupt scale changes, the scene background changes due to camera operation, and the need for camera motion compensation make target tracking with these cameras extremely challenging. In this paper, we present a solution that provides continuous online calibration of PTZ cameras and is robust to rapid camera motion and to changes of the environment due to varying illumination or moving objects. The approach also scales beyond thousands of scene landmarks extracted with the SURF keypoint detector. The method directly derives the relationship between the position of a target in the ground plane and the corresponding scale and position in the image, and allows real-time tracking of multiple targets with a high and stable degree of accuracy even at far distances and any zoom level.

9.
Automated virtual camera control has been widely used in animation and interactive virtual environments. We have developed a free-view video system prototype based on multiple sparse cameras that allows users to control the position and orientation of a virtual camera, enabling observation of a real scene in three dimensions (3D) from any desired viewpoint. Automatic camera control can be activated to follow objects selected by the user. Our method combines a simple geometric model of the scene composed of planes (the virtual environment), augmented with visual information from the cameras and pre-computed tracking information of moving targets, to generate novel perspective-corrected 3D views of the virtual camera and moving objects. To achieve real-time rendering performance, view-dependent texture-mapped billboards are used to render the moving objects at their correct locations, and foreground masks are used to remove the moving objects from the projected video streams. The current prototype runs on a PC with a common graphics card and can generate virtual 2D views from three cameras at a resolution of 768×576, with several moving objects, at about 11 fps.

10.
We study the detection and tracking of moving human bodies in video sequences captured by a fixed camera. Human targets are detected using bidirectional projections of gray-level difference images, a segmentation algorithm based on fixed ratios of the geometric features of statistical motion regions is proposed, and nearest-neighbor matching is used to track people. A complete and effective real-time crowd counting system is implemented. Extensive indoor and outdoor experiments show that the algorithm offers good real-time performance (processing 25–30 frames per second, with four video streams handled in parallel), robustness to illumination changes, and high detection accuracy for sparse crowds.

11.

Detection-based pedestrian counting methods produce considerably accurate results in non-crowded scenes. However, the detection-based approach depends on the camera viewpoint. Map-based pedestrian counting methods, on the other hand, measure features that do not require separate detection of each pedestrian in the scene, and are thus more effective at high crowd densities. In this paper, we propose a hybrid map-based model: a new directional pedestrian counting model composed of a direction estimation module based on classified foreground motion vectors and a pedestrian counting module based on principal component analysis. Our contributions are twofold. First, we present a directional moving-pedestrian counting system that does not depend on object detection or tracking. Second, the number and major directions of pedestrian movements are detected by classifying foreground motion vectors. This representation handles noise better than simple features and counts moving pedestrians in images more accurately.
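The two modules described above can be sketched as follows. Both functions are toy stand-ins with assumed inputs (raw motion vectors and hypothetical per-frame feature vectors), not the paper's implementation.

```python
import numpy as np

def classify_directions(vectors, n_bins=8):
    """Bin foreground motion vectors by angle; the dominant bins give the
    major pedestrian movement directions (direction estimation module)."""
    angles = np.arctan2(vectors[:, 1], vectors[:, 0])            # [-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(bins, minlength=n_bins)

def pca_count_model(features, counts, k=2):
    """Fit a linear count regressor on the top-k principal components of
    per-frame feature vectors (counting module)."""
    mu = features.mean(axis=0)
    X = features - mu
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T                                    # principal directions
    Z = np.hstack([X @ P, np.ones((len(X), 1))])    # projected + bias term
    w, *_ = np.linalg.lstsq(Z, counts, rcond=None)
    return mu, P, w

def predict_count(mu, P, w, feature):
    """Predict the pedestrian count for one frame's feature vector."""
    z = np.append((feature - mu) @ P, 1.0)
    return float(z @ w)
```

Projecting onto a few principal components before regressing suppresses noisy feature dimensions, which is the role PCA plays in the counting module.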

12.
From depth sensors to thermal cameras, the increased availability of camera sensors beyond the visible spectrum has created many exciting applications. Most of these applications require combining information from such cameras with a regular RGB camera. Information fusion from multiple heterogeneous cameras can be a very complex problem: data can be fused at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication, and computation costs. In this paper, we propose a system for robust segmentation of human figures in video sequences by fusing visible-light and thermal imagery. Our system focuses on the geometric transformation between visual blobs corresponding to human figures observed by both cameras. This approach provides the most reliable fusion, at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure is first applied to rectify the two camera views without the complex procedure of estimating the intrinsic parameters of the cameras. To geometrically register different blobs at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity for each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that the proposed system provides significant improvements over existing schemes under various conditions.

13.
This paper presents an efficient image-based approach to navigating a scene from only three wide-baseline uncalibrated images, without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images, an accurate trifocal plane is extracted from the trifocal tensor of the three images. Next, based on a small number of feature marks entered through a simple GUI, accurate dense disparity maps are obtained using our trinocular-stereo algorithm. Employing a barycentric warping scheme with the computed disparity, we can generate an arbitrary novel view within the triangle spanned by the three camera centers. Furthermore, after self-calibration of the cameras, 3D objects can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. Three applications of the tri-view morphing algorithm are demonstrated. The first is 4D video synthesis, which can fill the gap between a few sparsely located video cameras to synthetically generate video from a virtual moving camera; this synthetic camera can view the dynamic scene from a novel viewpoint instead of the original static camera views. The second is multiple-view morphing, where we can seamlessly fly through the scene over a 2D space constructed from more than three cameras. The last is dynamic scene synthesis from three still images, where several rigid objects may move in any orientation or direction; after segmenting the three reference frames into several layers, novel views of the dynamic scene can be generated by applying our algorithm. Finally, experiments illustrate that a series of photo-realistic virtual views can be generated to fly through a virtual environment covered by several static cameras.

14.
Reliable, real-time crowd counting is one of the most important tasks in intelligent visual surveillance systems. Most previous works count passing people based on color information alone. Owing to the inherent limitations of color information, such methods are inevitably affected by unpredictable, complex environments (e.g., illumination, occlusion, and shadow). To overcome this bottleneck, we propose a new crowd counting algorithm based on multimodal joint information processing. In our method, we use color and depth information together from an ordinary depth camera (e.g., Microsoft Kinect). Specifically, we first detect the head of each passing or still person in the surveillance region using depth information, with adaptive modulation to varying scenes. Then, we track and count each detected head using color information. The characteristic advantage of our algorithm is that it is scene-adaptive: it can be applied directly to all kinds of scenes without additional conditions. Based on the proposed approach, we have built a practical system for robust, fast crowd counting in complicated scenes. Extensive experimental results show the effectiveness of the proposed method.

15.
When occlusion is minimal, a single camera is generally sufficient to detect and track objects. However, when the density of objects is high, the resulting occlusion and lack of visibility suggest the use of multiple cameras and collaboration between them, so that an object is detected using information available from all the cameras in the scene. In this paper, we present a system that is capable of segmenting, detecting, and tracking multiple people in a cluttered scene using multiple synchronized surveillance cameras located far from each other. The system is fully automatic and makes decisions about object detection and tracking using evidence collected from many pairs of cameras. Innovations that help us tackle the problem include a region-based stereo algorithm capable of finding 3D points inside an object knowing only the projections of the object (as a whole) in two views, a segmentation algorithm using Bayesian classification, and the use of occlusion analysis to combine evidence from different camera pairs. The system has been tested using different densities of people in the scene, which helps us determine the number of cameras required for a particular density of people. Experiments have also been conducted to verify and quantify the efficacy of the occlusion analysis scheme.

16.
Multi-Camera Coordination in Surveillance Systems
We describe a distributed surveillance system for tracking multiple targets in indoor settings. The system consists of several inexpensive fixed-lens cameras, with multiple camera-processing modules and a central module that coordinates the tracking tasks among the cameras. Since each moving target may be tracked by several cameras simultaneously, selecting the most suitable camera to track a given target, especially when system resources are scarce, becomes a problem. The proposed algorithm assigns targets to cameras based on the target-to-camera distance while taking occlusion into account, so that when occlusion occurs the system assigns the occluded target to the nearest camera that can still see it. Experiments show that the system coordinates multiple cameras well for target tracking and handles occlusion effectively.

17.
This paper applies binocular computer vision to crowd density estimation. First, a correction parameter for each position is computed from the mean disparity of a target across the two image planes; a correction function is then fitted from the correction parameters of targets at different positions. Image features such as foreground pixels and foreground edge pixels are used as the features for crowd density estimation and are corrected with the fitted function. Experiments show that, compared with existing methods, this approach eliminates the effect of projective distortion and greatly improves the effectiveness of the features, thereby improving the accuracy of crowd density estimation.
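The feature-correction idea can be sketched as follows, assuming a polynomial correction function over the image row fitted from the measured per-position parameters; the binocular disparity measurement itself is omitted, and all names and values here are illustrative.

```python
import numpy as np

def fit_correction(rows, params, deg=2):
    """Fit a polynomial correction function over the image row from
    per-position correction parameters measured on calibration targets."""
    return np.polyfit(rows, params, deg)

def corrected_feature(fg_mask, coeffs):
    """Sum foreground pixels, each weighted by the correction function at
    its row, to compensate for projective distortion (far-away rows,
    where people appear smaller, get larger weights)."""
    rows = np.arange(fg_mask.shape[0])
    weights = np.polyval(coeffs, rows)
    return float((fg_mask.sum(axis=1) * weights).sum())
```

The corrected foreground count then varies with the true crowd density rather than with where in the image the crowd happens to stand.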

18.
This paper proposes a new method for self-calibrating a set of stationary non-rotating zooming cameras. This is a realistic configuration, usually encountered in surveillance systems, in which each zooming camera is physically attached to a static structure (wall, ceiling, robot, or tripod). In particular, a linear yet effective method to recover the affine structure of the observed scene from two or more such stationary zooming cameras is presented. The proposed method relies solely on point correspondences across images; no knowledge about the scene is required. Our method exploits the mostly translational displacement of the so-called principal plane of each zooming camera to estimate the location of the plane at infinity. The principal plane of a camera, at any given zoom setting, is encoded in its perspective projection matrix, from which it can be easily extracted. Since a displacement of the principal plane under the effect of zooming identifies a pair of parallel planes, each zooming camera can be used to locate a line on the plane at infinity. Hence, two or more such zooming cameras in general position allow an estimate of the plane at infinity to be obtained, making it possible, under the assumption of zero skew and/or known aspect ratio, to linearly calculate each camera's parameters. Finally, the parameters of the cameras and the coordinates of the plane at infinity are refined through a nonlinear least-squares optimization procedure. The results of our extensive experiments with both simulated and real data are also reported.
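For reference, the principal plane the abstract relies on can be read directly off the projection matrix; the following is a standard multi-view-geometry fact, not a step from the paper's derivation.

```latex
% Write the rows of a camera's 3x4 projection matrix P as
%   P = \begin{pmatrix} p_1^{\top} \\ p_2^{\top} \\ p_3^{\top} \end{pmatrix}.
% A homogeneous world point X images at w\,(x, y, 1)^{\top} = P X, so the
% homogeneous scale is w = p_3^{\top} X. Points with w = 0 project to ideal
% image points, hence p_3 is the principal plane: the plane through the
% camera centre parallel to the image plane.
\pi_{\text{principal}} = p_3, \qquad
\pi \parallel \pi' \;\Longrightarrow\; \pi \cap \pi' \subset \pi_{\infty}
% Two (near-)translated copies of this plane are parallel, and parallel
% planes meet in a line on the plane at infinity -- the line each zooming
% camera contributes to the estimate of \pi_{\infty}.
```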

19.
To address the problem of tracking people under occlusion with two cameras, a method that exploits the 3D positions of human bodies is proposed. The method first samples pixels on a human body in one camera's video frame, then finds matching points for the sampled pixels in the other camera's frame and computes the 3D point in the world coordinate system corresponding to each matched pair. The 3D points are then clustered by 3D position; for each cluster, the corresponding group of image pixels is identified and a Gaussian-smoothed histogram model is built for it. On this basis, mutually occluding bodies are separated using the histogram models, and finally the correspondence of the same person across the cameras is determined from the pixel matching relations. Experimental results show that the method effectively tracks people under occlusion.

20.
Simultaneously tracking the poses of multiple people is a difficult problem because of inter-person occlusions and self-occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views: image features from non-occluded views are given more weight than image features from occluded views. Self-occlusion is handled by local occlusion estimation, which updates the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of the global and local occlusion estimation leads to accurate tracking results at much lower computational cost. We evaluate the performance of our approach on a pose estimation dataset in which inter-person and self-occlusions are present. The results of our experiments show that our approach robustly tracks multiple people during large movements with severe inter-person and self-occlusions, whilst maintaining near real-time performance.
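The global down-weighting of occluded views can be sketched as a simple convex combination of per-view log-likelihoods. This toy function assumes occlusion estimates in [0, 1] are already available per view; it is not the paper's exact likelihood model.

```python
import numpy as np

def weighted_log_likelihood(view_loglik, occlusion):
    """Combine per-view image log-likelihoods of a pose hypothesis,
    down-weighting views estimated to be occluded.

    view_loglik : per-view log-likelihoods of the hypothesis
    occlusion   : per-view occlusion estimates in [0, 1] (1 = fully occluded)
    """
    vis = 1.0 - np.asarray(occlusion, dtype=float)
    if vis.sum() == 0:
        return float(np.mean(view_loglik))   # no visible view: plain average
    w = vis / vis.sum()                      # convex weights favoring visible views
    return float(np.dot(w, view_loglik))
```

A view that sees the person clearly thus dominates the combined likelihood, while a fully occluded view contributes nothing, matching the qualitative behavior described in the abstract.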
