Similar Documents
20 similar documents found (search time: 390 ms)
1.
Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a novel probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audiovisual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, which show that our framework 1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy, 2) can deal with cases of visual clutter and occlusion, and 3) significantly outperforms a traditional sampling-based approach.
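As a rough illustration of one ingredient of such an audio-visual tracker, the sketch below fuses per-hypothesis audio and visual log-likelihoods under a conditional-independence assumption and renormalizes particle weights. It is a generic Bayesian-fusion sketch, not the authors' observation model; all function names and the synthetic likelihood values are placeholders.

```python
# A minimal sketch (not the paper's implementation) of fusing independent audio and visual
# observation likelihoods per tracking hypothesis, as a particle filter with an audio-visual
# observation model might do.
import numpy as np

def fuse_av_log_likelihoods(audio_loglik, visual_loglik, particle_weights):
    """Combine per-particle audio and visual log-likelihoods (assumed conditionally
    independent given the state) and renormalize the particle weights."""
    log_w = np.log(particle_weights + 1e-12) + audio_loglik + visual_loglik
    log_w -= log_w.max()                 # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Example with synthetic likelihood values for 5 particles.
rng = np.random.default_rng(0)
weights = np.full(5, 1.0 / 5)
audio_ll = rng.normal(-2.0, 0.5, size=5)    # e.g. from a source-localization model
visual_ll = rng.normal(-3.0, 0.5, size=5)   # e.g. from a head shape/appearance model
print(fuse_av_log_likelihoods(audio_ll, visual_ll, weights))
```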

2.
Human Body Tracking Based on Image Sequences   (Total citations: 3; self-citations: 2; other citations: 3)
代凯乾  刘肖琳 《计算机仿真》2007,24(7):202-204,224
Because the human body is non-rigid and people frequently occlude one another, human tracking is a challenging problem. To address these characteristics, a method combining Kalman filtering and Bayesian inference is proposed for tracking multiple people. A simple background model is first built, background subtraction is then used to obtain the foreground regions and extract the moving bodies, and the EM (Expectation-Maximization) algorithm is used to build the corresponding human models. When no occlusion occurs between people, a Kalman filter tracks each person; when occlusion occurs, a Bayesian method is used to discriminate and track the corresponding people. Experiments show that the method is both fast and handles mutual occlusion between people well; the algorithm is robust and the tracking results are satisfactory.
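A minimal constant-velocity Kalman filter of the kind described for the non-occluded case is sketched below, driven by the centroid of a foreground blob. The state layout, matrices, and noise levels are illustrative assumptions, not the paper's values.

```python
# Constant-velocity Kalman filter sketch for tracking one person's image position.
# State = [x, y, vx, vy]; only the position is measured (e.g. a foreground-blob centroid).
import numpy as np

class ConstantVelocityKalman:
    def __init__(self, x0, dt=1.0, q=1.0, r=5.0):
        self.x = np.asarray(x0, dtype=float)            # [x, y, vx, vy]
        self.P = np.eye(4) * 100.0                      # initial state uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # measurement picks out position
        self.Q = np.eye(4) * q                          # process noise (assumed)
        self.R = np.eye(2) * r                          # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """z: measured centroid of the foreground blob assigned to this person."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

kf = ConstantVelocityKalman([10, 20, 0, 0])
kf.predict()
print(kf.update([12, 23]))   # filtered position after one measurement
```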

3.
Simultaneously tracking poses of multiple people is a difficult problem because of inter-person occlusions and self occlusions. This paper presents an approach that circumvents this problem by performing tracking based on observations from multiple wide-baseline cameras. The proposed global occlusion estimation approach can deal with severe inter-person occlusions in one or more views by exploiting information from other views. Image features from non-occluded views are given more weight than image features from occluded views. Self occlusion is handled by local occlusion estimation. The local occlusion estimation is used to update the image likelihood function by sorting body parts as a function of distance to the cameras. The combination of the global and the local occlusion estimation leads to accurate tracking results at much lower computational costs. We evaluate the performance of our approach on a pose estimation data set in which inter-person and self occlusions are present. The results of our experiments show that our approach is able to robustly track multiple people during large movement with severe inter-person occlusions and self occlusions, whilst maintaining near real-time performance.
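The core idea of down-weighting occluded views can be sketched in a few lines: per-view likelihoods of a pose hypothesis are combined with weights derived from an estimated occlusion fraction. The weighting function below is an assumption for illustration, not the paper's exact scheme.

```python
# Hedged sketch: weight per-view image likelihoods so that non-occluded views dominate.
import numpy as np

def combine_view_likelihoods(view_logliks, occlusion_fraction):
    """view_logliks: log-likelihood of one pose hypothesis in each camera view.
    occlusion_fraction: estimated fraction in [0, 1] of the person occluded in that view."""
    weights = 1.0 - np.clip(occlusion_fraction, 0.0, 1.0)
    if weights.sum() == 0:                 # fully occluded everywhere: fall back to uniform
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    return float(np.dot(weights, view_logliks))   # weighted combination of view evidence

print(combine_view_likelihoods(np.array([-4.0, -9.0, -5.0]),
                               np.array([0.1, 0.9, 0.2])))
```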

4.
We propose a novel framework to generate a global texture atlas for a deforming geometry. Our approach differs from prior art in two aspects. First, instead of generating a texture map for each timestamp to color a dynamic scene, our framework reconstructs a global texture atlas that can be consistently mapped to a deforming object. Second, our approach is based on a single RGB-D camera, without the need of a multiple-camera setup surrounding a scene. In our framework, the input is a 3D template model with an RGB-D image sequence, and geometric warping fields are found using a state-of-the-art non-rigid registration method [GXW*15] to align the template mesh to noisy and incomplete input depth images. With these warping fields, our multi-scale approach for texture coordinate optimization generates a sharp and clear texture atlas that is consistent with multiple color observations over time. Our approach is accelerated by graphics hardware and provides a handy configuration for capturing a dynamic geometry along with a clean texture atlas. We demonstrate our approach with practical scenarios, particularly human performance capture. We also show that our approach is resilient to misalignment issues caused by imperfect estimation of warping fields and inaccurate camera parameters.

5.
This paper presents a method which avoids the common practice of using a complex joint state-space representation and performing tedious joint data association for multiple object tracking applications. Instead, we propose a distributed Bayesian formulation using multiple interactive trackers that requires much lower complexity for real-time tracking applications. When the objects' observations do not interact with each other, our approach performs as multiple independent trackers. However, when the objects' observations exhibit interaction, defined as close proximity or partial and complete occlusion, we extend the conventional Bayesian tracking framework by modeling such interaction in terms of potential functions. The proposed "magnetic-inertia" model represents the cumulative effect of virtual physical forces that objects undergo while interacting with each other. It implicitly handles the "error merge" and "object labeling" problems and thus solves the difficult object occlusion and data association problems in an innovative way. Our preliminary simulations have demonstrated that the proposed approach is far superior to other methods in both robustness and speed.
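The sketch below shows the general shape of a pairwise interaction potential for multiple trackers: hypotheses that place two targets too close are penalized, discouraging trackers from coalescing during occlusion. This is a generic repulsion potential for illustration, not the paper's "magnetic-inertia" formulation; the distance threshold and strength are invented parameters.

```python
# Generic pairwise interaction potential sketch for multiple-object tracking hypotheses.
import numpy as np

def interaction_potential(positions, min_dist=30.0, strength=4.0):
    """positions: (N, 2) array of hypothesized target centers.
    Returns a value in (0, 1]; 1 means no interaction penalty."""
    n = len(positions)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            if d < min_dist:
                penalty += strength * (1.0 - d / min_dist)   # grows as targets overlap
    return float(np.exp(-penalty))

print(interaction_potential(np.array([[100.0, 100.0], [110.0, 105.0], [300.0, 50.0]])))
```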

6.
Augmented Reality (AR) composes virtual objects with real scenes in a mixed environment where human–computer interaction has more semantic meaning. To seamlessly merge virtual objects with real scenes, correct occlusion handling is a significant challenge. We present an approach to separate occluded objects into multiple layers by utilizing depth, color, and neighborhood information. Scene depth is obtained by stereo cameras, and two local Gaussian kernels are used to represent color and spatial smoothness. These three cues are fused in a probabilistic framework, where the occlusion information can be safely estimated. We apply our method to handle occlusions in video-based AR, where virtual objects are simply overlaid on real scenes. Experimental results show the approach can correctly register virtual and real objects in different depth layers and provide a spatially aware interaction environment. Copyright © 2009 John Wiley & Sons, Ltd.
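A tiny per-pixel illustration of fusing a depth cue and a color cue into a posterior over occlusion layers is given below. The Gaussian likelihoods, the independence assumption, and all parameter values are illustrative, not the paper's formulation (which also uses neighborhood smoothness).

```python
# Per-pixel fusion sketch: posterior over two layers (foreground/background) from depth + color.
import numpy as np

def layer_posterior(depth, color, layer_depth, layer_color, sigma_d=0.2, sigma_c=20.0):
    """depth: scalar stereo depth at a pixel; color: (3,) RGB at the same pixel.
    layer_depth: (L,) mean depth per layer; layer_color: (L, 3) mean color per layer."""
    lik_d = np.exp(-0.5 * ((depth - layer_depth) / sigma_d) ** 2)
    lik_c = np.exp(-0.5 * np.sum(((color - layer_color) / sigma_c) ** 2, axis=1))
    post = lik_d * lik_c                   # cues fused under an independence assumption
    return post / post.sum()

print(layer_posterior(1.1, np.array([200.0, 60.0, 60.0]),
                      np.array([1.0, 3.0]),
                      np.array([[210.0, 70.0, 60.0], [40.0, 40.0, 45.0]])))
```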

7.
This paper addresses the problem of analyzing human motion in an image sequence. This is of particular importance for video-surveillance applications. We have developed an original and efficient approach to track the apparent contours of a moving articulated structure, avoiding the use of 3D models. This method exploits spatio-temporal XT-slices from the image sequence volume XYT, where typical trajectory patterns can be associated with articulated motion such as human walking. We reconstruct these trajectories online by introducing an appropriate predictive model while correctly handling occlusion periods. This paradigm can lead to a simple trajectory recognition scheme. Experiments with real-world images depicting human walk or gesture are reported, and the obtained results validate the proposed approach.
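For readers unfamiliar with the representation, extracting an XT-slice from an XYT volume is just a slicing operation, as the small sketch below shows. The synthetic volume and the chosen scanline are placeholders.

```python
# Extracting a spatio-temporal XT-slice from an image-sequence volume XYT.
import numpy as np

T, H, W = 120, 240, 320                        # frames, rows (Y), columns (X)
volume = np.zeros((T, H, W), dtype=np.uint8)   # stack of grayscale frames, indexed [t, y, x]

y_row = 180                                    # a scanline crossing the walking person's legs
xt_slice = volume[:, y_row, :]                 # shape (T, W): time vs. horizontal position
print(xt_slice.shape)                          # trajectory patterns appear as stripes here
```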

8.
A Markov pixon information approach for low-level image description   (Total citations: 1; self-citations: 0; other citations: 1)
The problem of extracting information from an image, which corresponds to early-stage processing in vision, is addressed. We propose a new approach (the MPI approach) which simultaneously provides a restored image, a segmented image and a map which reflects the local scale for representing the information. Embedded in a Bayesian framework, this approach is based on an information prior, a pixon model and two Markovian priors. This model-based approach is oriented to detect and analyze small parabolic patches in a noisy environment. The number of clusters and their parameters are not required for the segmentation process. The MPI approach is applied to the analysis of statistical parametric maps obtained from fMRI experiments.

9.
To successfully interact with and learn from humans in cooperative modes, robots need a mechanism for recognizing, characterizing, and emulating human skills. In particular, our interest is to develop a mechanism for recognizing and emulating simple human actions, i.e., a simple activity in a manual operation where no sensory feedback is available. To this end, we have developed a method to model such actions using a hidden Markov model (HMM) representation. We propose an approach to address two critical problems in action modeling: classifying human action-intent and learning human skill, and we elaborate on the method, procedure, and implementation issues in this paper. This work provides a framework for modeling and learning human actions from observations. The approach can be applied to intelligent recognition of manual actions and high-level programming of control input within a supervisory control paradigm, as well as automatic transfer of human skills to robotic systems.
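A standard way to classify an observed action sequence with HMMs is to score it under each candidate model with the forward algorithm and pick the most likely model; the minimal sketch below does this for discrete observations. The two toy models and the observation symbols are invented for illustration and are not the paper's models.

```python
# Discrete-HMM forward-algorithm sketch for action classification by maximum likelihood.
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm. obs: list of symbol indices; pi: (N,) initial probabilities;
    A: (N, N) state transitions; B: (N, M) emission probabilities."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum()); alpha /= alpha.sum()
    return ll

pi = np.array([0.6, 0.4])
A1 = np.array([[0.9, 0.1], [0.2, 0.8]]); B1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]]); B2 = np.array([[0.3, 0.4, 0.3], [0.3, 0.3, 0.4]])

obs = [0, 0, 1, 2, 2]                       # quantized features of an observed action
scores = [log_likelihood(obs, pi, A1, B1), log_likelihood(obs, pi, A2, B2)]
print("recognized action model:", int(np.argmax(scores)))
```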

10.
A Behavioral Analysis of Computational Models of Visual Attention   (Total citations: 3; self-citations: 0; other citations: 3)
Robots often incorporate computational models of visual attention to streamline processing. Even though the number of visual attention systems employed on robots has increased dramatically in recent years, the evaluation of these systems has remained primarily qualitative and subjective. We introduce quantitative methods for evaluating computational models of visual attention by direct comparison with gaze trajectories acquired from humans. In particular, we focus on the need for metrics based not on distances within the image plane, but that instead operate at the level of underlying features. We present a framework, based on dimensionality-reduction over the features of human gaze trajectories, that can simultaneously be used for both optimizing a particular computational model of visual attention and for evaluating its performance in terms of similarity to human behavior. We use this framework to evaluate the Itti et al. (1998) model of visual attention, a computational model that serves as the basis for many robotic visual attention systems.

11.
朱琳  周杰  宋靖雁 《计算机学报》2008,31(1):151-160
Tracking multiple moving objects, especially during occlusion, is an important but challenging problem in computer vision. This paper proposes a new tracking framework based on online sampling, incremental learning and classification to handle multi-object tracking. First, patch samples are drawn from each object in the frames before an occlusion occurs and used as training samples to design online classifiers. The object regions of each frame are also sampled as patches online and labeled by these classifiers. If no occlusion occurs, new training samples are added to update the classifiers. When occlusion occurs, the foreground region is segmented into the individual target objects according to the labeling results. Compared with previous methods, the new method does not rely on assumptions such as scene depth information or prior object models (e.g., shape, category, color isotropy within a region, or motion patterns), and therefore adapts better. Experimental results verify the stability and effectiveness of the proposed method.
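The online patch-classification idea can be sketched with any classifier that supports incremental updates. Below, color-histogram features of patches sampled from each target before occlusion train scikit-learn's SGDClassifier via partial_fit, which later labels patches from the merged foreground region. The feature choice and classifier are illustrative assumptions, not the paper's exact design.

```python
# Online patch-labeling sketch: train incrementally before occlusion, label patches afterwards.
import numpy as np
from sklearn.linear_model import SGDClassifier

def patch_feature(patch, bins=8):
    """Joint RGB histogram of an image patch (H, W, 3), L1-normalized."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
red_patches = [rng.integers(0, 256, (16, 16, 3)) * [1.0, 0.3, 0.3] for _ in range(50)]
blue_patches = [rng.integers(0, 256, (16, 16, 3)) * [0.3, 0.3, 1.0] for _ in range(50)]

X = np.array([patch_feature(p) for p in red_patches + blue_patches])
y = np.array([0] * 50 + [1] * 50)                  # object labels before occlusion

clf = SGDClassifier(random_state=0)
clf.partial_fit(X, y, classes=np.array([0, 1]))    # online update; call again on later frames

test = patch_feature(rng.integers(0, 256, (16, 16, 3)) * [1.0, 0.3, 0.3])
print("patch assigned to object:", int(clf.predict(test[None, :])[0]))
```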

12.
Multiple human tracking in high-density crowds   (Total citations: 1; self-citations: 0; other citations: 1)
In this paper, we introduce a fully automatic algorithm to detect and track multiple humans in high-density crowds in the presence of extreme occlusion. Typical approaches such as background modeling and body part-based pedestrian detection fail when most of the scene is in motion and most body parts of most of the pedestrians are occluded. To overcome this problem, we integrate human detection and tracking into a single framework and introduce a confirmation-by-classification method for tracking that associates detections with tracks, tracks humans through occlusions, and eliminates false positive tracks. We use a Viola and Jones AdaBoost detection cascade, a particle filter for tracking, and color histograms for appearance modeling. To further reduce false detections due to dense features and shadows, we introduce a method for estimation and utilization of a 3D head plane that reduces false positives while preserving high detection rates. The algorithm learns the head plane from observations of human heads incrementally, without any a priori extrinsic camera calibration information, and only begins to utilize the head plane once confidence in the parameter estimates is sufficiently high. In an experimental evaluation, we show that confirmation-by-classification and head plane estimation together enable the construction of an excellent pedestrian tracker for dense crowds.
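The detection stage alone can be sketched with OpenCV's cascade classifier, scanned over a frame in the Viola-Jones fashion. The stock frontal-face cascade shipped with OpenCV stands in here for the head detector used in the paper, and the image path is a placeholder; the tracking, appearance model, and head-plane estimation are not reproduced.

```python
# AdaBoost cascade detection sketch with OpenCV (detection stage only).
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("crowd_frame.jpg")            # placeholder path
assert frame is not None, "replace 'crowd_frame.jpg' with a real frame"
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4, minSize=(20, 20))

for (x, y, w, h) in detections:                  # candidate heads, to be confirmed by tracking
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```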

13.
Tracking multiple objects is more challenging than tracking a single object. Some problems arise in multiple-object tracking that do not exist in single-object tracking, such as object occlusion, the appearance of a new object and the disappearance of an existing object, updating the occluded object, etc. In this article, we present an approach to handling multiple-object tracking in the presence of occlusions, background clutter, and changing appearance. The occlusion is handled by considering the predicted trajectories of the objects based on a dynamic model and likelihood measures. We also propose target-model-update conditions, ensuring the proper tracking of multiple objects. The proposed method is implemented in a probabilistic framework such as a particle filter in conjunction with a color feature. The particle filter has proven very successful for nonlinear and non-Gaussian estimation problems. It approximates the posterior probability density of the state, such as the object's position, using samples or particles, where each sample represents a hypothetical state of the tracked object together with its weight. The observation likelihood of the objects is modeled based on a color histogram. The sample weight is measured based on the Bhattacharyya coefficient, which measures the similarity between each sample's histogram and a specified target model. The algorithm can successfully track multiple objects in the presence of occlusion and noise. Experimental results show the effectiveness of our method in tracking multiple objects.
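The likelihood step described above is easy to sketch: each particle's color histogram is compared with the target model via the Bhattacharyya coefficient and converted into a weight. The histogram size and the likelihood bandwidth sigma below are illustrative assumptions.

```python
# Particle-weight update sketch based on the Bhattacharyya coefficient between color histograms.
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms (1.0 = identical)."""
    return float(np.sum(np.sqrt(p * q)))

def particle_weights(sample_hists, target_hist, sigma=0.1):
    """sample_hists: (N, B) normalized histograms extracted at the particle locations."""
    bc = np.array([bhattacharyya(h, target_hist) for h in sample_hists])
    d = np.sqrt(np.clip(1.0 - bc, 0.0, None))        # Bhattacharyya distance
    w = np.exp(-d ** 2 / (2 * sigma ** 2))           # Gaussian likelihood on the distance
    return w / w.sum()

rng = np.random.default_rng(1)
target = rng.random(16); target /= target.sum()
samples = rng.random((100, 16)); samples /= samples.sum(axis=1, keepdims=True)
print(particle_weights(samples, target)[:5])
```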

14.
For a moving visual target, how to avoid occluded regions is a challenging problem in vision. This paper proposes a novel method that uses occlusion information from depth images of a moving visual target to achieve dynamic occlusion avoidance. The method mainly relies on a best-viewpoint model of the occluded region and a motion-estimation equation for the visual target, and progressively observes the occluded region by reasonably planning the camera's viewing direction. The main contributions are: 1) the concept of key points on the occlusion boundary of a depth image is proposed, and key line segments built from them are used to rapidly model the occluded region; 2) based on the key line segments and the occlusion-region model, a method for constructing a best-viewpoint model of the occluded region is proposed; 3) a mixed-curvature feature is proposed; by computing the mixed-curvature matrix of the depth image, the number of feature points extracted during image matching is increased, which helps to estimate the target's motion accurately. Experimental results verify the feasibility and effectiveness of the proposed method.

15.
Faces in natural images are often occluded by a variety of objects. We propose a fully automated, probabilistic and occlusion-aware 3D morphable face model adaptation framework following an analysis-by-synthesis setup. The key idea is to segment the image into regions explained by separate models. Our framework includes a 3D morphable face model, a prototype-based beard model and a simple model for occlusions and background regions. The segmentation and all the model parameters have to be inferred from the single target image. Face model adaptation and segmentation are solved jointly using an expectation–maximization-like procedure. During the E-step, we update the segmentation and in the M-step the face model parameters are updated. For face model adaptation we apply a stochastic sampling strategy based on the Metropolis–Hastings algorithm. For segmentation, we apply loopy belief propagation for inference in a Markov random field. Illumination estimation is critical for occlusion handling. Our combined segmentation and model adaptation needs a proper initialization of the illumination parameters. We propose a RANSAC-based robust illumination estimation technique. By applying this method to a large face image database we obtain a first empirical distribution of real-world illumination conditions. The obtained empirical distribution is made publicly available and can be used as prior in probabilistic frameworks, for regularization or to synthesize data for deep learning methods.
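A generic random-walk Metropolis-Hastings sampler, of the kind used here for stochastic face-model adaptation, is sketched below. The target density is a toy 2D Gaussian standing in for the far more complex posterior over morphable-model parameters; the step size and chain length are arbitrary.

```python
# Random-walk Metropolis-Hastings sketch on a toy target density.
import numpy as np

def log_target(theta):
    return -0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)   # toy stand-in posterior

def metropolis_hastings(log_p, theta0, n_steps=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_p(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.normal(0.0, step, size=theta.shape)   # symmetric proposal
        lp_new = log_p(proposal)
        if np.log(rng.random()) < lp_new - lp:       # accept with prob min(1, p_new / p_old)
            theta, lp = proposal, lp_new
        samples.append(theta.copy())
    return np.array(samples)

chain = metropolis_hastings(log_target, [0.0, 0.0])
print(chain[1000:].mean(axis=0))    # posterior mean estimate, should approach [1, -2]
```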

16.
The majority of existing tracking algorithms are based on the maximum a posteriori solution of a probabilistic framework using a Hidden Markov Model, where the distribution of the object state at the current time instance is estimated based on current and previous observations. However, this approach is prone to errors caused by distractions such as occlusions, background clutter and multi-object confusion. In this paper, we propose a multiple object tracking algorithm that seeks the optimal state sequence that maximizes the joint multi-object state-observation probability. We call this algorithm trajectory tracking since it estimates the state sequence or "trajectory" instead of the current state. The algorithm is capable of tracking an unknown, time-varying number of objects. We also introduce a novel observation model which is composed of the original image, the foreground mask given by background subtraction and the object detection map generated by an object detector. The image provides the object appearance information. The foreground mask enables the likelihood computation to consider the multi-object configuration in its entirety. The detection map consists of pixel-wise object detection scores, which drives the tracking algorithm to perform joint inference on both the number of objects and their configurations efficiently. The proposed algorithm has been implemented and tested extensively in a complete CCTV video surveillance system to monitor entries and detect tailgating and piggy-backing violations at access points for over six months. The system achieved 98.3% precision in event classification. The violation detection rate is 90.4% and the detection precision is 85.2%. The results clearly demonstrate the advantages of the proposed detection-based trajectory tracking framework.
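One ingredient of the observation model above, the per-frame foreground mask from background subtraction, can be produced with OpenCV as sketched below. The MOG2 subtractor, its parameters, and the video path are illustrative stand-ins, not the system's actual components.

```python
# Foreground-mask sketch via background subtraction (one component of such an observation model).
import cv2

cap = cv2.VideoCapture("access_point.mp4")       # placeholder video path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)            # 255 = foreground, 127 = shadow, 0 = background
    fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels
    # fg_mask would be combined with the image and a detection map in the joint likelihood.
cap.release()
```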

17.
In this paper, we present a framework for visual object tracking based on clustering trajectories of image key points extracted from an image sequence. The main contribution of our method is that the trajectories are automatically extracted from the image sequence and they are provided directly to a model-based clustering approach. In most other methodologies, the latter constitutes a difficult part since the resulting feature trajectories have a short duration, as the key points disappear and reappear due to occlusion, illumination, viewpoint changes and noise. We present here a sparse, translation invariant regression mixture model for clustering trajectories of variable length. The overall scheme is converted into a maximum a posteriori approach, where the Expectation–Maximization (EM) algorithm is used for estimating the model parameters. The proposed method detects the different objects in the input image sequence by assigning each trajectory to a cluster, and simultaneously provides their motion. Numerical results demonstrate the ability of the proposed method to offer more accurate and robust solutions in comparison with other tracking approaches, such as the mean shift tracker, the camshift tracker and the Kalman filter.
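The trajectory-extraction front end can be sketched with OpenCV: key points are detected once and tracked frame-to-frame with pyramidal Lucas-Kanade, yielding variable-length trajectories like those the paper clusters. The video path is a placeholder, and the clustering itself (the regression mixture model) is not shown.

```python
# Key-point trajectory extraction sketch with Shi-Tomasi corners and pyramidal Lucas-Kanade.
import cv2
import numpy as np

cap = cv2.VideoCapture("sequence.mp4")                     # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)
trajectories = [[tuple(p.ravel())] for p in pts]           # one growing trajectory per key point
finished = []                                              # trajectories of lost key points

while True:
    ok, frame = cap.read()
    if not ok or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    kept_pts, kept_traj = [], []
    for p, s, traj in zip(new_pts, status.ravel(), trajectories):
        if s:                                              # key point still tracked this frame
            traj.append(tuple(p.ravel()))
            kept_pts.append(p)
            kept_traj.append(traj)
        else:
            finished.append(traj)                          # keep short trajectories for clustering
    pts = np.array(kept_pts, dtype=np.float32).reshape(-1, 1, 2)
    trajectories, prev_gray = kept_traj, gray

cap.release()
all_trajectories = finished + trajectories
print(len(all_trajectories), "trajectories of variable length extracted")
```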

18.
Objective: A light field camera records both the spatial and angular information of a scene in a single exposure, providing multi-view and refocused images, which gives it unique advantages for depth estimation. Occlusion is one of the difficult problems in light field depth estimation; existing methods either ignore occlusion or consider only a single occluder, and they fail for scene points with multiple occluders. To address occlusion, an occlusion-robust light field depth estimation algorithm is proposed within a multi-view stereo matching framework. Method: First, refocused images are obtained with a digital refocusing algorithm, the occlusion type of the scene is defined, and correlation cost volumes are constructed. Then, the best cost volume is adaptively selected according to a minimum-cost principle, and a local depth map is solved. Finally, a Markov random field combines the cost volume with smoothness constraints, and a globally optimized depth map is obtained via graph cuts and weighted median filtering, improving depth estimation accuracy. Results: Experiments were conducted on the HCI synthetic dataset and the Stanford Lytro Illum real-scene dataset, covering both local and global depth estimation. Compared with other state-of-the-art methods, the proposed method performs better on occluded scenes, reducing the mean squared error by about 26.8% on average. Conclusion: The proposed method handles different occlusion situations effectively, preserves depth-map edges better, produces more accurate depth estimates, and is more efficient. However, it is designed for Lambertian planar scenes and has limitations for non-Lambertian scenes containing specular highlights.
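The local depth step can be sketched in a few lines: given a correlation cost volume over candidate depth labels, the per-pixel depth is the winner-take-all (minimum-cost) label. The synthetic cost volume below is a placeholder, and the global MRF/graph-cut refinement is not shown.

```python
# Winner-take-all local depth sketch from a cost volume.
import numpy as np

rng = np.random.default_rng(0)
D, H, W = 32, 64, 64                       # depth labels, image height, width
cost_volume = rng.random((D, H, W))        # lower cost = better photo-consistency

local_depth = np.argmin(cost_volume, axis=0)       # winner-take-all label per pixel
confidence = cost_volume.min(axis=0)               # minimum cost, usable as a data term later
print(local_depth.shape, local_depth.dtype)
```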

19.
Detection and tracking of humans in video streams is important for many applications. We present an approach to automatically detect and track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving. A human body is represented as an assembly of body parts. Part detectors are learned by boosting a number of weak classifiers which are based on edgelet features. Responses of part detectors are combined to form a joint likelihood model that includes an analysis of possible occlusions. The combined detection responses and the part detection responses provide the observations used for tracking. Trajectory initialization and termination are both automatic and rely on the confidences computed from the detection responses. An object is tracked by data association and meanshift methods. Our system can track humans with both inter-object and scene occlusions with static or non-static backgrounds. Evaluation results on a number of images and videos and comparisons with some previous methods are given. Electronic Supplementary Material: Supplementary material is available in the online version of this article.
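The mean-shift tracking step that refines a detection window from frame to frame can be sketched with OpenCV's hue back-projection and meanShift call. The initial window would come from the part-based detector in the system above; here it is hard-coded, and the video path is a placeholder.

```python
# Mean-shift tracking sketch: hue-histogram appearance model plus back-projection.
import cv2

cap = cv2.VideoCapture("pedestrians.mp4")                 # placeholder path
ok, frame = cap.read()
assert ok, "replace 'pedestrians.mp4' with a real video"
x, y, w, h = 200, 150, 60, 120                            # detector-provided person window

hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])   # hue appearance model
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

track_window = (x, y, w, h)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
cap.release()
print("final window:", track_window)
```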

20.
Robust object tracking with background-weighted local kernels   (Total citations: 7; self-citations: 0; other citations: 7)
Object tracking is critical to visual surveillance, activity analysis and event/gesture recognition. The major issues to be addressed in visual tracking are illumination changes, occlusion, and appearance and scale variations. In this paper, we propose a weighted fragment-based approach that tackles partial occlusion. The weights are derived from the difference between the fragment and background colors. Further, a fast and yet stable model update method is described. We also demonstrate how edge information can be merged into the mean shift framework without having to use a joint histogram. This is used for tracking objects of varying sizes. The ideas presented here are computationally simple enough to be executed in real time and can be directly extended to a multiple-object tracking system.
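The fragment-weighting idea can be sketched as follows: fragments whose color histograms differ more from the surrounding background receive larger weights, so background-like fragments contribute less during partial occlusion. The exact weighting function below is an assumption, not the paper's formula.

```python
# Background-weighted fragment sketch: weights from dissimilarity to the background histogram.
import numpy as np

def bhattacharyya(p, q):
    return float(np.sum(np.sqrt(p * q)))

def fragment_weights(fragment_hists, background_hist):
    """fragment_hists: (F, B) normalized color histograms of the object fragments."""
    similarity = np.array([bhattacharyya(h, background_hist) for h in fragment_hists])
    w = 1.0 - similarity                    # more background-like => smaller weight
    w = np.clip(w, 1e-3, None)
    return w / w.sum()

rng = np.random.default_rng(2)
frags = rng.random((9, 16)); frags /= frags.sum(axis=1, keepdims=True)
bg = rng.random(16); bg /= bg.sum()
print(fragment_weights(frags, bg))
```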
