Similar Articles (20 results)
1.
In this paper, we develop a two-dimensional articulated body tracking algorithm based on particle filtering with partitioned sampling and model constraints. Particle filtering has proven to be an effective approach to object tracking, especially single-object tracking. When applied to human body tracking, however, it faces a "particle explosion" problem: the number of particles required grows rapidly with the dimensionality of the articulated state space. We introduce partitioned sampling, applied to a new articulated human body model, to solve this problem. Furthermore, we develop a propagation method derived from belief propagation (BP) that enables a set of particles to carry several constraints. The proposed algorithm is applied to tracking articulated body motion in several test scenarios. The experimental results indicate that it is effective and reliable for 2D articulated pose tracking.
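As a hedged illustration of the basic particle-filter cycle that the abstract builds on (a minimal 1D bootstrap filter, not the authors' partitioned-sampling algorithm; all noise levels and observations are invented):

```python
import random, math

def particle_filter_step(particles, weights, z, motion_std, obs_std):
    """One predict-reweight-resample cycle of a bootstrap particle filter."""
    # Predict: diffuse each particle with Gaussian motion noise.
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # Reweight: Gaussian likelihood of the observation z under each particle.
    weights = [w * math.exp(-0.5 * ((z - p) / obs_std) ** 2)
               for p, w in zip(particles, weights)]
    total = sum(weights) or 1e-300
    weights = [w / total for w in weights]
    # Resample (multinomial) to avoid weight degeneracy.
    particles = random.choices(particles, weights=weights, k=len(particles))
    weights = [1.0 / len(particles)] * len(particles)
    return particles, weights

random.seed(0)
parts = [random.uniform(-5, 5) for _ in range(500)]
wts = [1.0 / 500] * 500
for z in [1.0, 1.1, 1.2, 1.3]:  # noisy observations of a slowly moving target
    parts, wts = particle_filter_step(parts, wts, z, motion_std=0.3, obs_std=0.5)
estimate = sum(p * w for p, w in zip(parts, wts))
print(abs(estimate - 1.15) < 0.6)
```

Partitioned sampling addresses the explosion problem by running this cycle limb by limb instead of over the full joint state at once.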

2.
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered into fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboards' textures. Our articulated billboards not only minimize the ghosting artifacts known from conventional billboard rendering, but also alleviate the setup restrictions and error sensitivities of more complex 3D representations and multi-view reconstruction techniques. Our results demonstrate the flexibility and robustness of our approach with high-quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

3.
Uncalibrated Motion Capture Exploiting Articulated Structure Constraints
We present an algorithm for 3D reconstruction of dynamic articulated structures, such as humans, from uncalibrated multiple views. The reconstruction exploits constraints associated with a dynamic articulated structure, specifically the conservation over time of length between rotational joints. These constraints admit reconstruction of metric structure from at least two different images in each of two uncalibrated parallel projection cameras. As a by-product, the calibration of the cameras can also be computed. The algorithm is based on a stratified approach, starting with affine reconstruction from factorization, followed by rectification to metric structure using the articulated structure constraints. The exploitation of these specific constraints admits reconstruction and self-calibration with fewer feature points and views than standard self-calibration. The method is extended to pairs of zooming cameras, where calibration of the cameras allows compensation for the changing scale factor in a scaled orthographic camera. Results are presented in the form of stick figures and animated 3D reconstructions using pairs of sequences from broadcast television. The technique shows promise as a means of creating 3D animations of dynamic activities such as sports events.
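The affine-reconstruction-from-factorization step mentioned above is classically a rank-3 SVD truncation of the centered measurement matrix (Tomasi-Kanade style). A sketch under that assumption, omitting the paper's metric upgrade from articulation constraints:

```python
import numpy as np

def affine_factorization(W):
    """Affine factorization sketch: W is a 2F x P matrix of tracked image
    points (two rows per view). Returns a rank-3 motion factor (2F x 3)
    and a shape factor (3 x P), defined up to an affine ambiguity."""
    # Center each row (subtract the per-row centroid of the tracked points).
    Wc = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    # Affine cameras make the centered W (ideally) rank 3: keep top 3 modes.
    M = U[:, :3] * s[:3]   # motion (camera) matrix
    S = Vt[:3]             # 3D shape, up to an affine transformation
    return M, S

# Synthetic check: project random 3D points with random affine cameras.
rng = np.random.default_rng(0)
S_true = rng.standard_normal((3, 20))
M_true = rng.standard_normal((6, 3))  # 3 views x 2 image rows each
W = M_true @ S_true
M, S = affine_factorization(W)
residual = float(np.linalg.norm((W - W.mean(axis=1, keepdims=True)) - M @ S))
print(residual < 1e-8)
```

The paper's contribution is the subsequent rectification: the articulated length-conservation constraints determine the affine-to-metric upgrade with fewer points and views than generic self-calibration.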

4.
Alternating Coordinate Approximation of a 3D Target Using Multiple Orthogonal Views
To achieve fast and accurate 3D localization and tracking of a target, we propose a video localization system with orthogonally arranged cameras and an iterative algorithm that approximates the target's coordinates one at a time. In the system, the optical axes of the cameras are arranged orthogonally, each pointing toward the origin. Unlike most existing computer-vision methods, the algorithm involves no image-registration step, an operation that limits both localization efficiency and accuracy. The convergence of the iterative algorithm is proven. Numerical verification and physical experiments show that the algorithm is computationally simple, has stable error behavior, and converges quickly, and therefore has good application potential.
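The coordinate-by-coordinate ("taking turns") iteration described above can be illustrated with a generic Gauss-Seidel style coordinate update, which likewise converges by repeatedly solving for one coordinate while holding the others fixed (a hedged stand-in, not the paper's algorithm; the system below is invented):

```python
def coordinate_sweeps(A, b, x0, sweeps):
    """Solve A x = b by cyclic coordinate updates (Gauss-Seidel):
    each pass updates one coordinate at a time, holding the rest fixed."""
    x = list(x0)
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant system, for which the iteration provably converges.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
x = coordinate_sweeps(A, b, [0.0, 0.0, 0.0], sweeps=50)
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
print(residual < 1e-9)
```

The paper's convergence proof plays the role that diagonal dominance plays here: it guarantees the per-coordinate updates drive the estimate to the true 3D position.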

5.
This paper presents a framework for visual scanning and target tracking with a set of independent pan-tilt cameras. The approach is systematic, based on Model Predictive Control (MPC), and inspired by our understanding of the chameleon visual system. We use advanced results from MPC theory to design the scanning and tracking controllers. The scanning algorithm combines information about the environment with a model of the target's motion to perform optimal scanning based on stochastic MPC. The target tracking controller is a switched controller combining smooth pursuit and saccades; min-max and minimum-time MPC theory is used to design the tracking control laws. The observed behavior of the chameleon guides the design of the scanning and tracking controllers, the way they are combined, and their tuning. Finally, simulative and experimental validation of the approach on a robotic chameleon head composed of two independent pan-tilt cameras is presented.

6.
This article proposes a statistical approach to fast articulated 3D body tracking, similar to the loose-limbed model but using a factor graph representation and a fast estimation algorithm. Fast nonparametric belief propagation on factor graphs is used to estimate the current marginal for each limb, with all belief propagation messages represented as sums of weighted samples. The resulting algorithm corresponds to a set of particle filters, one per limb, with an extra step that recomputes the weight of each sample taking into account the links between limbs. Applied to upper body tracking with stereo and colour images, the algorithm estimates the body pose in quasi real time (10 Hz). Results on sequences illustrate the effectiveness of this approach.

7.
This paper demonstrates techniques for estimating the trajectory of a soccer ball from multiple fixed cameras. Since the ball is nearly always moving and frequently occluded, its size and shape appearance vary over time and between cameras. Knowledge of the soccer domain is expressed in terms of field, object, and motion models to distinguish the ball from other moving objects in the tracking and matching processes. Using ground-plane velocity, longevity, normalized size, and color features, each track obtained from a Kalman filter is assigned a likelihood of representing the ball. This measure is further refined by reasoning through occlusions and back-tracking through the track history, which is shown to improve the accuracy and continuity of the results. Finally, a simple 3D trajectory model is presented, and the estimated 3D ball positions are fed back to constrain the 2D processing for more efficient and robust detection and tracking. Experimental results with quantitative evaluations on several long sequences are reported.
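The per-track Kalman filtering step can be illustrated with a minimal scalar filter (a hedged sketch using a random-walk motion model, simpler than the constant-velocity filter a ball tracker would typically use; all noise values and measurements are invented):

```python
def kalman_step(x, P, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.
    x: state estimate, P: its variance, z: measurement,
    q: process noise variance, r: measurement noise variance."""
    # Predict: state unchanged under a random walk; uncertainty grows by q.
    P = P + q
    # Update: blend prediction and measurement via the Kalman gain K.
    K = P / (P + r)
    x = x + K * (z - x)
    P = (1 - K) * P
    return x, P

x, P = 0.0, 1.0  # vague prior
for z in [1.0, 1.2, 0.9, 1.1]:  # noisy position measurements near 1.0
    x, P = kalman_step(x, P, z, q=0.01, r=0.25)
print(0.8 < x < 1.2, P < 0.25)
```

In the paper's setting, features such as ground-plane velocity and longevity are then computed from the resulting tracks to score how ball-like each one is.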

8.
Object Level Grouping for Video Shots
We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions that provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide-baseline matching of affine covariant regions. The implementation has three areas of novelty: first, we describe a method to repair short gaps in tracks; second, we show how to join tracks across occlusions (where many tracks terminate simultaneously); third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn, this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).

9.
Multiple human tracking in high-density crowds
In this paper, we introduce a fully automatic algorithm to detect and track multiple humans in high-density crowds in the presence of extreme occlusion. Typical approaches such as background modeling and body part-based pedestrian detection fail when most of the scene is in motion and most body parts of most pedestrians are occluded. To overcome this problem, we integrate human detection and tracking into a single framework and introduce a confirmation-by-classification method for tracking that associates detections with tracks, tracks humans through occlusions, and eliminates false positive tracks. We use a Viola-Jones AdaBoost detection cascade, a particle filter for tracking, and color histograms for appearance modeling. To further reduce false detections due to dense features and shadows, we introduce a method for estimating and exploiting a 3D head plane that reduces false positives while preserving high detection rates. The algorithm learns the head plane incrementally from observations of human heads, without any a priori extrinsic camera calibration information, and only begins to use the head plane once confidence in the parameter estimates is sufficiently high. In an experimental evaluation, we show that confirmation-by-classification and head plane estimation together enable the construction of an excellent pedestrian tracker for dense crowds.

10.
Simultaneously tracking the poses of multiple people is a difficult problem because of inter-person occlusions and self-occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views: image features from non-occluded views are given more weight than image features from occluded views. Self-occlusion is handled by local occlusion estimation, which updates the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of global and local occlusion estimation leads to accurate tracking results at much lower computational cost. We evaluate the performance of our approach on a pose estimation data set in which both inter-person and self-occlusions are present. The results of our experiments show that our approach robustly tracks multiple people during large movements with severe inter-person occlusions and self-occlusions, whilst maintaining near real-time performance.

11.
In this work we propose algorithms to learn the locations of static occlusions and to reason about both static and dynamic occlusion scenarios in multi-camera scenes for 3D surveillance (e.g., reconstruction, tracking). We show that this yields a system able to track objects in video more effectively when they are obstructed in some of the views. Because of the nature of the application area, our algorithm operates under the constraint of few cameras (no more than 3) in a wide-baseline configuration. The algorithm consists of a learning phase, in which a 3D probabilistic occlusion model is estimated per voxel, per view over time via an iterative framework. In this framework, at each frame the visual hull of each foreground object (person) is computed via a Markov random field that integrates the occlusion model; the model is then updated at each frame using this solution, providing an iterative process that can accurately estimate the occlusion model over time and overcome the few-camera constraint. We demonstrate the application of the model to several areas, including visual hull reconstruction, reconstruction of the occluding structures themselves, and 3D tracking.

12.
We propose a model-based tracking method for articulated objects in monocular video sequences under varying illumination conditions. The method uses estimates of optical flow constructed by projecting model textures into the camera images and comparing the projected textures with the recorded information. An articulated body is modelled in terms of 3D primitives, each possessing a specified texture on its surface. An important step in model-based tracking of 3D objects is estimating the pose of the object during tracking. The optimal pose is estimated by minimizing the errors between the computed optical flow and the projected 2D velocities of the model textures, using a least-squares method with kinematic constraints for the articulated object and a perspective camera model. We test our framework on an articulated robot and show results.

13.
Tracking in a Dense Crowd Using Multiple Cameras
Tracking people in a dense crowd is a challenging problem for a single-camera tracker, because occlusions and extensive motion make human segmentation difficult. In this paper we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people's heads are tracked. Head detection is still difficult, since each foreground region may contain multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It successfully tracked up to 21 people walking in a small area (2.5 people per m²), in spite of severe and persistent occlusions.
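The intensity-correlation test between aligned frames can be sketched with a normalized cross-correlation (NCC) score, which is invariant to brightness offsets between cameras (a hedged toy with flattened 1D patches and invented values, not the paper's exact matching criterion):

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity patches.
    Values near 1 indicate agreement, e.g. the same head top seen in two
    frames aligned to a common height plane."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da and db else 0.0

patch = [10, 12, 50, 52, 11, 9]
same = [20, 22, 60, 62, 21, 19]   # same pattern, constant brightness offset
other = [50, 10, 12, 9, 60, 11]   # different pattern
print(ncc(patch, same) > 0.99, ncc(patch, other) < 0.5)
```

High correlation at a hypothesized height supports the presence of a head top at that 3D position; low correlation rejects it.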

14.
We present a complete end-to-end framework to detect and exploit entry and exit regions in video using behavioral models of object trajectories. Using easily collected "weak" tracking data (short and frequently broken tracks) as input, we construct a set of entity tracks that provide more reliable entry and exit observations. These observations are then clustered to produce a set of potential entry and exit regions within the scene, and a behavior-based reliability metric is used to score each region and select the final zones. We also present an extension of our fixed-view approach to detect entry and exit regions within the entire viewspace of a pan-tilt-zoom camera. We additionally provide methods that employ the regions to learn scene occlusions and causal relationships from entry-exit pairs, along with exploitation algorithms (e.g., anomaly detection). Qualitative and quantitative experiments are presented using multiple outdoor surveillance cameras and demonstrate the reliability and usefulness of our approach.

15.
This paper develops the concept of a Panoramic Appearance Map (PAM) for person re-identification in a multi-camera setup. Each person is tracked in multiple cameras, and their position on the floor plan is determined by triangulation. Using the camera geometry and the person's location, a panoramic map centered at that location is created, with the horizontal axis representing azimuth angle and the vertical axis representing height. Each pixel in the map image receives color information from the cameras that can observe it. Maps from different tracks are compared using a distance measure based on weighted SSD in order to select the best match. Temporal integration, by registering multiple maps over the tracking period, improves the matching performance. Experimental results for matching persons between two camera sets show the effectiveness of the approach. This work was sponsored by the Technical Support Working Group (TSWG) of the US Department of Defense (DoD).
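The weighted-SSD comparison between maps can be sketched as follows (a hedged illustration with flattened pixel lists and invented values, not the paper's exact measure; here zero weight marks pixels unobserved by any camera):

```python
def weighted_ssd(map_a, map_b, weights):
    """Weighted sum-of-squared-differences between two appearance maps,
    flattened to pixel lists. Weights de-emphasize unreliable pixels;
    the score is normalized by the total weight."""
    num = sum(w * (x - y) ** 2 for x, y, w in zip(map_a, map_b, weights))
    den = sum(weights) or 1.0
    return num / den

a = [0.2, 0.5, 0.9, 0.0]
b_same = [0.25, 0.5, 0.85, 0.7]  # same person; last pixel unobserved
b_diff = [0.9, 0.1, 0.2, 0.0]    # different person
w = [1.0, 1.0, 1.0, 0.0]          # weight 0 masks the unobserved pixel
print(weighted_ssd(a, b_same, w) < weighted_ssd(a, b_diff, w))
```

The track pair with the smallest distance is declared the re-identification match.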

16.
《Real》1997,3(6):415-432
Real-time motion capture plays a very important role in various applications, such as 3D interfaces for virtual reality systems, digital puppetry, and real-time character animation. In this paper we address the problem of estimating and recognizing the motion of articulated objects using the optical motion capture technique, and we present an effective method to control an articulated human figure in real time. The heart of this problem is the estimation of the 3D motion and posture of an articulated, volumetric object using feature points from a sequence of multiple perspective views. Under some moderate assumptions, such as smooth motion and known initial posture, we develop a model-based technique for recovering the 3D location and motion of a rigid object using a variation of the Kalman filter. The posture of the 3D volumetric model is updated by the 2D image flow of the feature points in all views. Two novel concepts, the hierarchical Kalman filter (HKF) and the adaptive hierarchical structure (AHS), which incorporate the kinematic properties of the articulated object, are proposed to extend our formulation from rigid objects to articulated ones. Our formulation also allows us to avoid two classic problems in 3D tracking: the multi-view correspondence problem and the occlusion problem. By adding more cameras and placing them appropriately, our approach can deal with motion of the object over a very wide area. Furthermore, multiple objects can be handled by managing multiple AHSs and processing multiple HKFs. We show the validity of our approach using synthetic data acquired simultaneously from multiple virtual cameras in a virtual environment (VE) and real data derived from a moving light display of walking motion. The results confirm that the model-based algorithm works well for tracking multiple rigid objects.

17.
Video-Based 3D Human Motion Tracking
We propose a method for tracking human motion in a multi-camera environment that combines multiple image features. An objective function in an optimization framework is obtained by defining a human body model, a camera projection model, and a similarity measurement model, and is solved with the Gauss-Newton optimization algorithm. Experiments on both synthetic and real data show that the method improves tracking results over using intensity features alone, and its comparative results also outperform the probability-based annealed particle filter.
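The Gauss-Newton solve of such a least-squares objective can be sketched on a toy two-parameter problem (an illustration of the optimizer only, not the paper's body-model objective; the exponential model and all values are invented):

```python
import math

def gauss_newton(r_and_J, p, iters=20):
    """Two-parameter Gauss-Newton loop: at each step solve the 2x2
    normal equations (J^T J) d = -J^T r and update p <- p + d."""
    for _ in range(iters):
        r, J = r_and_J(p)
        a = sum(j[0] * j[0] for j in J)
        b = sum(j[0] * j[1] for j in J)
        c = sum(j[1] * j[1] for j in J)
        g0 = sum(j[0] * ri for j, ri in zip(J, r))
        g1 = sum(j[1] * ri for j, ri in zip(J, r))
        det = a * c - b * b
        d0 = (-c * g0 + b * g1) / det
        d1 = (b * g0 - a * g1) / det
        p = [p[0] + d0, p[1] + d1]
    return p

# Fit y = p0 * exp(p1 * t) to noiseless samples generated with p = (2, -0.5).
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-0.5 * t) for t in ts]

def r_and_J(p):
    r = [p[0] * math.exp(p[1] * t) - y for t, y in zip(ts, ys)]
    J = [[math.exp(p[1] * t), p[0] * t * math.exp(p[1] * t)] for t in ts]
    return r, J

p = gauss_newton(r_and_J, [1.0, 0.0])
print(abs(p[0] - 2.0) < 1e-4, abs(p[1] + 0.5) < 1e-4)
```

In the tracking setting, p would be the pose parameters and r the stacked per-feature differences between the model projection and the images.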

18.
We present a method for automatically estimating the motion of an articulated object filmed by two or more fixed cameras. We focus on the case where the quality of the images is poor and only an approximate geometric model of the tracked object is available. Our technique applies physical forces to each rigid part of a kinematic 3D model of the object being tracked; these forces guide the minimization of the differences between the pose of the 3D model and the pose of the real object in the video images. We use a fast recursive algorithm to solve the dynamical equations of motion of any 3D articulated model. We explain the key parts of our algorithms: how relevant information is extracted from the images, how the forces are created, and how the dynamical equations of motion are solved. A study of what kind of information should be extracted from the images, and of when our algorithms fail, is also presented. Finally, we present results on tracking a person, and we show the application of our method to tracking a hand in image sequences, demonstrating that the kind of information to extract from the images depends on their quality and on the configuration of the cameras.

19.
In this work a method is presented to track and estimate the pose of articulated objects using the motion of a sparse set of moving features. This is achieved by using a bottom-up generative approach based on the Pictorial Structures representation [1]. However, unlike previous approaches that rely on appearance, our method depends entirely on motion: initial low-level part detection is based on how a region moves rather than on its appearance. This work is best described as Pictorial Structures using motion. A standard feature tracker is used to automatically extract a sparse set of features. These features typically contain many tracking errors, but the presented approach overcomes both the errors and the sparsity. The proposed method is applied to two problems: 2D pose estimation of articulated objects walking side-on to the camera, and 3D pose estimation of humans walking and jogging at arbitrary orientations to the camera. In each domain, quantitative results are reported that improve on the state of the art. The motivation of this work is to illustrate the information present in low-level motion that can be exploited for the task of pose estimation.

20.
In this article, we present an approach for fusing 2D and 3D measurements for model-based person tracking, also known as human motion capture. The body model is defined geometrically with generalized cylinders and is set up hierarchically with connecting joints of different types. The joint model can be parameterized to control the degrees of freedom, adhesion, and stiffness, resulting in an articulated body model with constrained kinematic degrees of freedom. The fusion approach incorporates this model knowledge together with the measurements and tracks the target body iteratively with an extended Iterative Closest Point (ICP) approach. The ICP is based on the concept of correspondences between measurements and model, normally exploited to incorporate 3D point cloud measurements; we generalize this concept to represent and incorporate 2D image-space features as well. Together with the 3D point cloud from a 3D time-of-flight (ToF) camera, arbitrary features derived from 2D camera images are used in the fusion algorithm for tracking the body. This provides complementary information about the tracked body, enabling tracking not only of depth motions but also of turning movements of the human body, which is normally a hard problem for markerless human motion capture systems. The resulting tracking system, named VooDoo, is used to track humans in a Human-Robot Interaction (HRI) context. We rely only on sensors on board the robot, i.e. the color camera, the ToF camera, and a laser range finder. The system runs in real time (~20 Hz) and robustly tracks a human in the vicinity of the robot.
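The underlying ICP cycle that this fusion extends can be illustrated with a toy 2D point-to-point version: nearest-neighbour correspondences followed by a closed-form rigid update (a hedged sketch only; the paper's extended ICP adds 2D image features, a kinematic body model, and joint constraints):

```python
import math

def icp_2d(src, dst, iters=10):
    """Minimal 2D point-to-point ICP. Each iteration: (1) match every
    source point to its nearest destination point, (2) solve the
    closed-form rotation+translation over the matches, (3) apply it."""
    pts = list(src)
    for _ in range(iters):
        pairs = [(p, min(dst, key=lambda q: (p[0]-q[0])**2 + (p[1]-q[1])**2))
                 for p in pts]
        n = len(pairs)
        cx = sum(p[0] for p, _ in pairs) / n; cy = sum(p[1] for p, _ in pairs) / n
        dx = sum(q[0] for _, q in pairs) / n; dy = sum(q[1] for _, q in pairs) / n
        # Closed-form 2D rotation from centered correspondences (Kabsch in 2D).
        s_sin = sum((p[0]-cx)*(q[1]-dy) - (p[1]-cy)*(q[0]-dx) for p, q in pairs)
        s_cos = sum((p[0]-cx)*(q[0]-dx) + (p[1]-cy)*(q[1]-dy) for p, q in pairs)
        th = math.atan2(s_sin, s_cos)
        c, s = math.cos(th), math.sin(th)
        pts = [(c*(p[0]-cx) - s*(p[1]-cy) + dx,
                s*(p[0]-cx) + c*(p[1]-cy) + dy) for p in pts]
    return pts

dst = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
a = math.radians(20)  # source: the same square rotated 20 deg and shifted
src = [(math.cos(a)*x - math.sin(a)*y + 0.3,
        math.sin(a)*x + math.cos(a)*y - 0.2) for x, y in dst]
aligned = icp_2d(src, dst)
err = max(math.hypot(p[0]-q[0], p[1]-q[1]) for p, q in zip(aligned, dst))
print(err < 1e-6)
```

The paper's generalization keeps this correspondence-and-minimize loop but draws correspondences from both the ToF point cloud and 2D image features, subject to the articulated model's joint constraints.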
