Similar documents
20 similar documents found (search time: 15 ms)
1.
A multi-target tracking method incorporating SPA occlusion segmentation   (Cited by 1; self-citations: 0; citations by others: 1)
Multi-target video tracking in complex environments is a difficult problem in computer vision, and handling inter-target occlusion effectively is the key to solving it. This paper introduces motion segmentation into target tracking and proposes a multi-target tracking method that incorporates skeleton point assignment (SPA) occlusion segmentation. Skeleton points are obtained from low-level optical flow, and their occlusion states are estimated; high-level semantic cues such as target appearance, motion, and color are combined to assign the skeleton points to individual targets; finally, using the skeleton points as kernels, the moving foreground is densely classified to obtain accurate foreground pixels for each target. Multi-target tracking is then performed with a probabilistic appearance model within a particle-filter tracking framework. Experiments on the PETS2009 dataset show that the method remedies the poor adaptability of existing multi-target trackers to interactions between targets and handles dynamic occlusion better.
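The tracking stage described above runs inside a standard particle-filter framework with a probabilistic appearance model. The sketch below is not the paper's SPA method; it is a generic bootstrap particle filter with a color-histogram likelihood, the kind of framework the abstract builds on, and the patch size, noise levels, and toy frames are illustrative assumptions.

```python
# Minimal sketch (not the paper's SPA method): a bootstrap particle filter with a
# color-histogram appearance likelihood. All parameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def color_hist(patch, bins=8):
    """Normalized joint RGB histogram of an image patch (H x W x 3, uint8)."""
    h, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3, range=[(0, 256)] * 3)
    h = h.ravel()
    return h / (h.sum() + 1e-12)

def likelihood(frame, state, ref_hist, half=15, sigma=0.2):
    """Appearance likelihood: Bhattacharyya similarity between the reference
    histogram and the histogram of the patch centred at the particle state."""
    x, y = int(state[0]), int(state[1])
    H, W = frame.shape[:2]
    x0, x1 = max(0, x - half), min(W, x + half)
    y0, y1 = max(0, y - half), min(H, y + half)
    if x1 <= x0 or y1 <= y0:
        return 1e-12
    bc = np.sum(np.sqrt(color_hist(frame[y0:y1, x0:x1]) * ref_hist))  # Bhattacharyya coeff.
    return np.exp(-(1.0 - bc) / sigma)

def track(frames, init_xy, n_particles=200, motion_std=5.0):
    """Track one target; returns the weighted-mean position per frame."""
    ref = color_hist(frames[0][init_xy[1] - 15:init_xy[1] + 15, init_xy[0] - 15:init_xy[0] + 15])
    particles = np.tile(np.asarray(init_xy, float), (n_particles, 1))
    estimates = []
    for frame in frames:
        particles += rng.normal(0.0, motion_std, particles.shape)      # predict (random walk)
        w = np.array([likelihood(frame, p, ref) for p in particles])   # weight by appearance
        w /= w.sum()
        estimates.append(w @ particles)                                 # posterior mean
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
    return np.array(estimates)

# Toy usage: ten synthetic frames with a bright square drifting to the right.
frames = []
for k in range(10):
    img = np.zeros((120, 160, 3), np.uint8)
    img[40:70, 20 + 5 * k:50 + 5 * k] = (0, 200, 255)   # the "target"
    frames.append(img)
print(track(frames, init_xy=(35, 55))[-1])               # estimate near the final position
```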

2.
This paper addresses the problem of estimating camera motion from a non-calibrated monocular camera. Compared to existing methods that rely on restrictive assumptions, we propose a method that can estimate camera motion under far fewer restrictions by adopting new example-based techniques that compensate for the lack of information. Specifically, we estimate the focal length of the camera by referring to visually similar training images with which focal lengths are associated. For one-step camera estimation, we refer to stationary points (landmark points) whose depths are estimated from RGB-D candidates. In addition to landmark points, moving objects can also be used as an information source to estimate the camera motion. Our method therefore simultaneously estimates the camera motion of a video and the 3D trajectories of the objects in it using Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) particle filtering. The method is evaluated on challenging datasets, demonstrating its effectiveness and efficiency.

3.
Advanced Robotics, 2013, 27(10): 1043-1058
Many applications in computer vision are based on a single static camera observing a scene which is static except for one or more figures (people, vehicles, etc.) moving through it. In these applications it is useful to understand whether the moving figure is partially occluded by some static element of the scene. Such partial occlusions, when undetected, confuse the analysis of the figure's pose and activity. We present an algorithm that uses only the information provided by moving figures to simply and efficiently derive the position of static occluding bodies. Once these occlusions are obtained, we demonstrate successful reasoning about the occlusion status of future figures within the same scene. The occlusion positions from multiple views of the same scene are used to extract an estimate of the three-dimensional position and shape of the occlusion. Experimental results validating the method are included.

4.
Simultaneously tracking poses of multiple people is a difficult problem because of inter-person occlusions and self occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views. Image features from non-occluded views are given more weight than image features from occluded views. Self occlusion is handled by local occlusion estimation. The local occlusion estimation is used to update the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of the global and the local occlusion estimation leads to accurate tracking results at much lower computational costs. We evaluate the performance of our approach on a pose estimation data set in which inter-person and self occlusions are present. The results of our experiments show that our approach is able to robustly track multiple people during large movement with severe inter-person occlusions and self occlusions, whilst maintaining near real-time performance.

5.
Constraints among homography matrices over multiple planes and multiple views   (Cited by 4; self-citations: 0; citations by others: 4)
The constraints among homography matrices over multiple planes and multiple views are discussed systematically with algebraic methods. The main conclusions are: (A) if the camera motion between views is a pure translation, then 1) the set of homographies of all planes between two views, and the set of homographies of a single plane across all views, each have rank 4; 2) the set of standard homographies over multiple planes and multiple views still has rank 4; 3) all results on "relative homography" constraints in the existing literature can be derived from the above conclusions; (B) if the camera motion between views is a general motion, then 1) the conclusion that the set of homographies of all planes between two views has rank 4 still holds; 2) in the other cases the rank is no longer 4 but 9.
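A quick numerical check of conclusion (A)1 is easy to set up: under a pure translation between two views, the homographies induced by arbitrarily many planes, flattened to 9-vectors, span a subspace of rank 4. The sketch below is not from the paper; the intrinsics, translation, and plane parameters are arbitrary illustrative values.

```python
# Numerical check of conclusion (A)1: pure translation t, no rotation (R = I).
# Each plane n.X = d induces H = K (I + t n^T / d) K^{-1}; stacked as 9-vectors,
# these homographies span a rank-4 subspace.
import numpy as np

rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])          # camera intrinsics (same for both views)
t = np.array([0.3, -0.1, 0.05])              # pure translation between the two views

homographies = []
for _ in range(20):                           # 20 random scene planes
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    d = rng.uniform(2.0, 10.0)
    H = K @ (np.eye(3) + np.outer(t, n) / d) @ np.linalg.inv(K)
    homographies.append(H.ravel())            # flatten to a 9-vector

rank = np.linalg.matrix_rank(np.stack(homographies))
print("rank of the stacked homographies:", rank)   # expected: 4
```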

6.
Motion segmentation using occlusions   (Cited by 4; self-citations: 0; citations by others: 4)
We examine the key role of occlusions in finding independently moving objects instantaneously in a video obtained by a moving camera with a restricted field of view. In this problem, the image motion is caused by the combined effect of camera motion (egomotion), structure (depth), and the independent motion of scene entities. For a camera with a restricted field of view undergoing a small motion between frames, there exists, in general, a set of 3D camera motions compatible with the observed flow field even if only a small amount of noise is present, leading to ambiguous 3D motion estimates. If separable sets of solutions exist, motion-based clustering can detect one category of moving objects. Even if a single inseparable set of solutions is found, we show that occlusion information can be used to find ordinal depth, which is critical in identifying a new class of moving objects. In order to find ordinal depth, occlusions must not only be known, but they must also be filled (grouped) with optical flow from neighboring regions. We present a novel algorithm for filling occlusions and deducing ordinal depth under general circumstances. Finally, we describe another category of moving objects which is detected using cardinal comparisons between structure from motion and structure estimates from another source (e.g., stereo).

7.
Objective: A light-field camera records both the spatial and the angular information of a scene in a single exposure, yielding multi-view and refocused images, and therefore has unique advantages for depth estimation. Occlusion is one of the difficult problems in light-field depth estimation: existing methods either ignore occlusion or handle only the single-occluder case, and they fail at scene points with multiple occluders. To address occlusion, an occlusion-robust light-field depth estimation algorithm is proposed within a multi-view stereo matching framework. Method: Refocused images are first obtained with a digital refocusing algorithm, the occlusion types of the scene are defined, and correlation cost volumes are constructed. The best cost volume is then selected adaptively according to a minimum-cost criterion, and a local depth map is solved. Finally, a Markov random field combines the cost volume with smoothness constraints, and a globally optimized depth map is obtained via graph cuts and weighted median filtering to improve accuracy. Results: Experiments on the HCI synthetic dataset and the Stanford Lytro Illum real-scene dataset cover both local and global depth estimation. Compared with other state-of-the-art methods, the proposed method handles occluded scenes better, lowering the mean squared error by about 26.8% on average. Conclusion: The method handles different occlusion cases effectively, preserves depth-map edges better, produces more accurate depth estimates, and runs faster. It is, however, intended for Lambertian planar scenes and has some limitations on non-Lambertian scenes containing specular highlights.
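The local depth estimation step in the Method paragraph follows the familiar cost-volume pattern: a per-pixel matching cost over depth (refocus) hypotheses, a winner-take-all label, and a filtering pass. The sketch below illustrates only that generic pattern on synthetic data; the paper's occlusion-type-dependent cost selection and graph-cut optimization are not reproduced, and a plain median filter stands in for the weighted median filtering.

```python
# Generic cost-volume / winner-take-all depth sketch (not the paper's method).
import numpy as np
from scipy.ndimage import median_filter

def local_depth(center_view, refocused_stack):
    """center_view: (H, W); refocused_stack: (D, H, W), one image per depth label.
    Returns the minimum-cost label map after median filtering."""
    # Crude photo-consistency cost: squared difference between each refocused
    # image and the central view.
    cost_volume = (refocused_stack - center_view[None]) ** 2          # (D, H, W)
    labels = np.argmin(cost_volume, axis=0)                           # winner take all
    return median_filter(labels, size=5)                              # stands in for weighted median

# Toy usage with random stand-in data for real refocused light-field images.
rng = np.random.default_rng(0)
center = rng.random((64, 64))
stack = center[None] + 0.1 * rng.random((16, 64, 64))
depth = local_depth(center, stack)
print(depth.shape, int(depth.min()), int(depth.max()))
```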

8.
Motion estimation at motion-occlusion boundaries is a difficult problem. The epipolar-plane image (EPI) approach converts motion estimation into the detection of trajectory lines; the trajectories of man-made objects are easy to obtain by edge tracking, but for natural scenes with complex texture, trajectory tracking is considerably harder.

9.
Multiple human tracking in high-density crowds   (Cited by 1; self-citations: 0; citations by others: 1)
In this paper, we introduce a fully automatic algorithm to detect and track multiple humans in high-density crowds in the presence of extreme occlusion. Typical approaches such as background modeling and body part-based pedestrian detection fail when most of the scene is in motion and most body parts of most of the pedestrians are occluded. To overcome this problem, we integrate human detection and tracking into a single framework and introduce a confirmation-by-classification method for tracking that associates detections with tracks, tracks humans through occlusions, and eliminates false positive tracks. We use a Viola and Jones AdaBoost detection cascade, a particle filter for tracking, and color histograms for appearance modeling. To further reduce false detections due to dense features and shadows, we introduce a method for estimation and utilization of a 3D head plane that reduces false positives while preserving high detection rates. The algorithm learns the head plane from observations of human heads incrementally, without any a priori extrinsic camera calibration information, and only begins to utilize the head plane once confidence in the parameter estimates is sufficiently high. In an experimental evaluation, we show that confirmation-by-classification and head plane estimation together enable the construction of an excellent pedestrian tracker for dense crowds.

10.
Robust detection and tracking of pedestrians in image sequences are essential for many vision applications. In this paper, we propose a method to detect and track multiple pedestrians using motion, color information, and the AdaBoost algorithm. Our approach detects pedestrians in a walking pose from a single camera on a mobile or stationary system. In the case of mobile systems, ego-motion of the camera is compensated for using corresponding feature sets. The region of interest is computed from the difference image between two consecutive frames after compensation. The pedestrian detector is learned by boosting a number of weak classifiers based on Histogram of Oriented Gradient (HOG) features. Pedestrians are tracked by a block-matching method using color information. Using information stored in advance, our tracking system can follow pedestrians through partial occlusions and recover them without misses once an occlusion ends. The proposed approach has been tested on a number of image sequences and was shown to detect and track multiple pedestrians very well.
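For a feel of the detection step, the sketch below uses OpenCV's stock HOG pedestrian detector (HOG features with a pre-trained linear SVM) rather than the boosted HOG classifier the paper trains; the motion compensation and color-based block-matching tracker are not reproduced, and the input path is a placeholder.

```python
# Hedged illustration of HOG-based pedestrian detection using OpenCV's built-in
# people detector (linear SVM, not the paper's AdaBoost classifier).
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.jpg")          # placeholder path; replace with a real frame
if frame is None:
    raise SystemExit("no input frame found")

rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
for (x, y, w, h), score in zip(rects, weights):
    print("pedestrian at", (x, y), "size", (w, h), "score", score)
```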

11.
In this paper, we address the problem of ego-motion estimation by fusing visual and inertial information. The hardware consists of an inertial measurement unit (IMU) and a monocular camera. The camera provides visual observations in the form of features on a horizontal plane. By incorporating the geometric constraint that the features lie on this plane into the visual and inertial data, we propose a novel closed-form measurement model for the system. Our first contribution is an observability analysis of the proposed planar-based visual inertial navigation system (VINS). In particular, we prove that the system has only three unobservable states, corresponding to global translations parallel to the plane and rotation around the gravity vector. Hence, compared to a general VINS, an advantage of using features on the horizontal plane is that the vertical translation along the normal of the plane becomes observable. As the second contribution, we present a state-space formulation for pose estimation in the analyzed system and solve it via a modified unscented Kalman filter (UKF). Finally, the findings of the theoretical analysis and the 6-DoF motion estimation are validated in simulation as well as on experimental data.

12.
This paper is about detecting bipedal motion in video sequences by using point trajectories in a classification framework. Given a number of point trajectories, we find a subset of points arising from feet in bipedal motion by analysing their spatio-temporal correlation in a pairwise fashion. To this end, we introduce probabilistic trajectories as our new features, which associate each point over a sufficiently long time period in the presence of noise. They are extracted from directed acyclic graphs whose edges represent temporal point correspondences and are weighted with their matching probability in terms of appearance and location. The benefit of the new representation is that it practically tolerates inherent ambiguity, for example due to occlusions. We then learn the correlation between the motion of two feet using the probabilistic trajectories in a decision forest classifier. The effectiveness of the algorithm is demonstrated in experiments on image sequences captured with a static camera, and extensions to deal with a moving camera are discussed.
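The final classification stage can be sketched with an off-the-shelf random forest standing in for the paper's decision forest: pairwise spatio-temporal features are computed for a pair of point trajectories and classified as bipedal or not. The feature set and the synthetic anti-phase "feet" data below are illustrative assumptions, not the paper's probabilistic-trajectory features.

```python
# Sketch of the classification stage only: a random forest over hand-made pairwise
# trajectory features (illustrative assumptions, not the paper's features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def pair_features(traj_a, traj_b):
    """traj_*: (T, 2) point tracks. Crude spatio-temporal correlation features."""
    d = np.linalg.norm(traj_a - traj_b, axis=1)              # inter-point distance over time
    va, vb = np.diff(traj_a, axis=0), np.diff(traj_b, axis=0)
    phase = np.sum(va * vb, axis=1)                            # velocity correlation (anti-phase for feet)
    return np.array([d.mean(), d.std(), phase.mean(), phase.std()])

def synth_pair(bipedal, T=60):
    t = np.arange(T)
    base = np.stack([0.5 * t, np.zeros(T)], axis=1)
    if bipedal:  # two feet: anti-phase oscillation around a common forward path
        a = base + np.stack([5 * np.sin(0.3 * t), np.zeros(T)], axis=1)
        b = base + np.stack([5 * np.sin(0.3 * t + np.pi), np.zeros(T)], axis=1)
    else:        # unrelated points: independent random walks
        a = np.cumsum(rng.normal(size=(T, 2)), axis=0)
        b = np.cumsum(rng.normal(size=(T, 2)), axis=0)
    return pair_features(a + rng.normal(0, 0.2, (T, 2)), b + rng.normal(0, 0.2, (T, 2)))

X = np.array([synth_pair(i % 2 == 0) for i in range(400)])
y = np.array([i % 2 == 0 for i in range(400)], dtype=int)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```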

13.
Shape and motion from image streams under orthography: a factorization method   (Cited by 56; self-citations: 18; citations by others: 56)
Inferring scene geometry and camera motion from a stream of images is possible in principle, but is an ill-conditioned problem when the objects are distant with respect to their size. We have developed a factorization method that can overcome this difficulty by recovering shape and motion under orthography without computing depth as an intermediate step. An image stream can be represented by the 2F×P measurement matrix of the image coordinates of P points tracked through F frames. We show that under orthographic projection this matrix is of rank 3. Based on this observation, the factorization method uses the singular-value decomposition technique to factor the measurement matrix into two matrices which represent object shape and camera rotation respectively. Two of the three translation components are computed in a preprocessing stage. The method can also handle and obtain a full solution from a partially filled-in measurement matrix that may result from occlusions or tracking failures. The method gives accurate results, and does not introduce smoothing in either shape or motion. We demonstrate this with a series of experiments on laboratory and outdoor image streams, with and without occlusions.
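The core of the factorization, as stated in the abstract, fits in a few lines of linear algebra: register the 2F×P measurement matrix by subtracting per-frame centroids, observe that it has rank 3 under orthography, and split a truncated SVD into motion and shape factors. The sketch below runs on synthetic noiseless data; the metric (orthonormality) upgrade and the handling of missing entries are omitted.

```python
# Compact sketch of the rank-3 factorization on synthetic orthographic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: P random 3D points observed by F orthographic cameras.
F, P = 12, 40
S_true = rng.normal(size=(3, P))                         # 3 x P shape
W = np.zeros((2 * F, P))                                 # 2F x P measurement matrix
for f in range(F):
    R, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # random rotation
    t = rng.normal(size=2)
    W[2 * f:2 * f + 2] = R[:2] @ S_true + t[:, None]     # orthographic projection + translation

# Step 1: register by subtracting each row's centroid (removes the translations).
W_reg = W - W.mean(axis=1, keepdims=True)

# Step 2: the registered matrix has rank 3; factor it with a truncated SVD.
U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
M = U[:, :3] * np.sqrt(s[:3])                            # 2F x 3 motion factor (up to a 3x3 ambiguity)
S = np.sqrt(s[:3])[:, None] * Vt[:3]                     # 3 x P shape factor

print("singular values:", np.round(s[:5], 3))            # only the first three are non-negligible
print("rank-3 reconstruction error:", np.linalg.norm(W_reg - M @ S))
```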

14.
This paper presents an approach to shape and motion estimation that integrates heterogeneous knowledge into a unique model-based framework. We describe the observed scenes in terms of structured geometric elements (points, line segments, rectangles, 3D corners) explicitly sharing Euclidean relationships (orthogonality, parallelism, colinearity, coplanarity). Camera trajectories are represented with adaptive models which account for the regularity of usual camera motions. Two different strategies of automatic model building lead us to reduced models for shape and motion estimation with a minimal number of parameters. These models increase the robustness to noise and occlusions, improve the reconstruction, and provide a high-level representation of the observed scene. The parameters are optimally computed within a sequential Bayesian estimation procedure that gives accurate and reliable results on synthetic and real video imagery.

15.
Objective: A light-field camera samples a single scene from multiple viewpoints in one exposure, which gives it unique advantages for depth estimation. Removing the influence of occlusion is one of the difficulties of light-field depth estimation. Existing methods detect the occlusion state of each view from a 2D scene model, but occlusion is determined by the 3D structure of the sampled scene; a 2D model alone cannot detect it precisely, and inaccurate occlusion detection degrades the subsequent depth estimate. To address this problem, a depth acquisition method for light-field images guided by a 3D occlusion model is proposed. Method: Foreground/background relationships and depth differences are added between the objects of the 2D model to obtain a 3D model of the scene; occlusion for every view is then inferred in this model from the light transport paths and recorded in an occlusion map. Guided by the occlusion map, different cost volumes are used for depth estimation in occluded and non-occluded regions. In occluded regions, the occluded views are masked out with the occlusion map and depth is computed from the photo-consistency of the remaining views; in non-occluded regions, exploiting the depth continuity of such regions, a new defocus grid-matching cost volume is designed that perceives a wider range of color texture than traditional cost volumes and therefore yields smoother depth maps. To further improve accuracy, a joint optimization framework based on the expectation-maximization (EM) algorithm is designed around the mutual dependence of occlusion detection and depth estimation; within this framework the occlusion map and the depth map guide each other and improve alternately. Results: Experiments show that, in most test scenes, the method achieves the best results in both occlusion detection and depth estimation for single occlusions, multiple occlusions, and low-contrast occlusions, lowering the mean squared error (MSE) by about 19.75% on average relative to the second-best result. Conclusion: For depth estimation in occluded scenes, theoretical analysis and experiments indicate that the 3D occlusion model has an advantage over the traditional 2D occlusion model for occlusion detection, and the proposed method is better suited to depth estimation in scenes with complex occlusions.
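The occlusion-guided cost for occluded regions can be illustrated schematically: at each reference pixel, photo-consistency is evaluated only over the views that the occlusion map marks as visible. The sketch below uses random stand-in data and a simple variance cost; the 3D occlusion model, the defocus grid-matching cost, and the EM loop of the paper are not reproduced.

```python
# Schematic occlusion-masked photo-consistency cost (not the paper's implementation).
import numpy as np

def occlusion_aware_depth(warped, occ_mask):
    """warped:   (V, D, H, W) view v re-projected onto the reference view for each
                 of D depth hypotheses.
       occ_mask: (V, H, W) True where view v is occluded at that reference pixel.
       Returns the minimum-variance depth label per pixel."""
    valid = ~occ_mask[:, None]                                  # (V, 1, H, W), broadcast over depth
    n = np.maximum(valid.sum(axis=0), 1)                         # views actually used per pixel
    mean = np.where(valid, warped, 0).sum(axis=0) / n            # masked mean over views
    var = np.where(valid, (warped - mean) ** 2, 0).sum(axis=0) / n
    return np.argmin(var, axis=0)                                # (H, W) depth labels

# Toy usage with random stand-in data.
rng = np.random.default_rng(0)
V, D, H, W = 9, 16, 32, 32
warped = rng.random((V, D, H, W))
occ = rng.random((V, H, W)) < 0.2                                # ~20% of views occluded per pixel
print(occlusion_aware_depth(warped, occ).shape)                  # (32, 32)
```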

16.
The metric reconstruction of a non-rigid object viewed by a generic camera poses new challenges, since current approaches to Structure from Motion assume the rigidity of the shape as an essential condition. In this work, we focus on the estimation of the 3-D Euclidean shape and motion of a non-rigid shape observed by a perspective camera. In such a case, deformation and perspective effects are difficult to decouple – the parametrization of the 3-D non-rigid body may mistakenly account for the perspective distortion. Our method relies on the fact that it is often reasonable to assume that some of the points on the object's surface deform throughout the sequence while others remain rigid. Thus, relying on the rigidity constraints of a subset of rigid points, we estimate the perspective-to-metric upgrade transformation. First, we use an automatic segmentation algorithm to identify the set of rigid points. These are then used to estimate the internal camera calibration parameters and the overall rigid motion. Finally, we formulate the problem of non-rigid shape and motion estimation as a non-linear optimization in which the objective function to be minimized is the image reprojection error. The prior information that some of the points on the object are rigid can also be added as a constraint to the non-linear minimization scheme in order to avoid ambiguous configurations. We perform experiments on different synthetic and real data sets which show that, even when using a minimal set of rigid points and when varying the intrinsic camera parameters, it is possible to obtain reliable metric information.
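The last step, minimizing image reprojection error with a nonlinear least-squares solver, can be illustrated on a much simpler problem: refining a single calibrated camera pose against known 3D points. This toy (assumed intrinsics, synthetic points) only shows the optimization machinery; the paper's non-rigid parametrization, rigid-point segmentation, and metric upgrade are not reproduced.

```python
# Toy reprojection-error minimization (PnP-style pose refinement), not the paper's method.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1.0]])     # assumed intrinsics

def project(points, rvec, tvec):
    """Project (N, 3) world points through pose (rotation vector, translation)."""
    cam = points @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Synthetic scene: 60 points, a ground-truth pose, and noisy image observations.
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(60, 3))
rvec_true, tvec_true = np.array([0.1, -0.2, 0.05]), np.array([0.2, 0.1, 0.3])
obs = project(X, rvec_true, tvec_true) + rng.normal(0, 0.5, (60, 2))

def residuals(pose):
    """Flattened image reprojection error for a 6-vector pose [rvec, tvec]."""
    return (project(X, pose[:3], pose[3:]) - obs).ravel()

sol = least_squares(residuals, np.zeros(6), method="lm")           # start from the identity pose
print("recovered rvec:", np.round(sol.x[:3], 3), "tvec:", np.round(sol.x[3:], 3))
```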

17.
Constructing a Multivalued Representation for View Synthesis   (Cited by 1; self-citations: 1; citations by others: 1)
A fundamental problem in computer vision and graphics is that of arbitrary view synthesis for static 3-D scenes, whereby a user-specified viewpoint of the given scene may be created directly from a representation. We propose a novel compact representation for this purpose called the multivalued representation (MVR). Starting with an image sequence captured by a moving camera undergoing either unknown planar translation or orbital motion, an MVR is derived for each preselected reference frame, and may then be used to synthesize arbitrary views of the scene. The representation itself comprises multiple depth and intensity levels in which the k-th level consists of points occluded by exactly k surfaces. To build an MVR with respect to a particular reference frame, dense depth maps are first computed for all the neighboring frames of the reference frame. The depth maps are then combined together into a single map, where points are organized by occlusions rather than by coherent affine motions. This grouping facilitates an automatic process to determine the number of levels and helps to reduce the artifacts caused by occlusions in the scene. An iterative multiframe algorithm is presented for dense depth estimation that both handles low-contrast regions and produces piecewise smooth depth maps. Reconstructed views as well as arbitrary flyarounds of real scenes are presented to demonstrate the effectiveness of the approach.

18.
Efficient visibility computation is a prominent requirement when designing automated camera control techniques for dynamic 3D environments; computer games, interactive storytelling, and 3D media applications all need to track 3D entities while ensuring their visibility and delivering a smooth cinematic experience. Addressing this problem requires sampling a large set of potential camera positions and estimating visibility for each of them, which in practice is intractable despite the efficiency of ray-casting techniques on recent platforms. In this work, we introduce a novel GPU-rendering technique to efficiently compute occlusions of tracked targets in Toric Space coordinates – a parametric space designed for cinematic camera control. We then rely on this occlusion evaluation to derive an anticipation map predicting occlusions for a continuous set of cameras over a user-defined time window. We finally design a camera motion strategy exploiting this anticipation map to minimize the occlusions of tracked entities over time. The key features of our approach are demonstrated through comparison with traditionally used ray-casting on benchmark scenes, and through an integration in multiple game-like 3D scenes with heavy, sparse and dense occluders.

19.
Current deep-learning approaches to multi-view depth estimation can be roughly divided into two classes according to the type of convolution used. Models based on 2D convolutional networks predict quickly but with lower accuracy, whereas models based on 3D convolutional networks achieve high accuracy at a high hardware cost. Moreover, variations in the extrinsic camera parameters across views prevent such models from producing accurate predictions at object edges and in occluded or weakly textured regions. To address these problems, this paper proposes a 3D-convolution-based, semantic-guided …

20.
The view-independent visualization of 3D scenes is most often based on rendering accurate 3D models or utilizes image-based rendering techniques. To compute the 3D structure of a scene from a moving vision sensor or to use image-based rendering approaches, we need to be able to estimate the motion of the sensor from the recorded image information with high accuracy, a problem that has been well-studied. In this work, we investigate the relationship between camera design and our ability to perform accurate 3D photography, by examining the influence of camera design on the estimation of the motion and structure of a scene from video data. By relating the differential structure of the time varying plenoptic function to different known and new camera designs, we can establish a hierarchy of cameras based upon the stability and complexity of the computations necessary to estimate structure and motion. At the low end of this hierarchy is the standard planar pinhole camera for which the structure from motion problem is non-linear and ill-posed. At the high end is a camera, which we call the full field of view polydioptric camera, for which the motion estimation problem can be solved independently of the depth of the scene which leads to fast and robust algorithms for 3D Photography. In between are multiple view cameras with a large field of view which we have built, as well as omni-directional sensors.
