Related Articles
20 related articles found (search time: 31 ms)
1.
This paper addresses the problems of modeling the appearance of humans and distinguishing human appearance from the appearance of general scenes. We seek a model of appearance and motion that is generic in that it accounts for the ways in which people's appearance varies and, at the same time, is specific enough to be useful for tracking people in natural scenes. Given a 3D model of the person projected into an image, we model the likelihood of observing various image cues conditioned on the predicted locations and orientations of the limbs. These cues are taken to be steered filter responses corresponding to edges, ridges, and motion-compensated temporal differences. Motivated by work on the statistics of natural scenes, the statistics of these filter responses for human limbs are learned from training images containing hand-labeled limb regions. Similarly, the statistics of the filter responses in general scenes are learned to define a background distribution. The likelihood of observing a scene given a predicted pose of a person is computed, for each limb, using the likelihood ratio between the learned foreground (person) and background distributions. Adopting a Bayesian formulation allows cues to be combined in a principled way. Furthermore, the use of learned distributions obviates the need for hand-tuned image noise models and thresholds. The paper provides a detailed analysis of the statistics of how people appear in scenes and provides a connection between work on natural image statistics and the Bayesian tracking of people.
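As a rough illustration of the likelihood-ratio idea described above, the sketch below scores a hypothesized limb by summing log-ratios of learned foreground and background filter-response histograms, and combines independent cues additively in the log domain. The histogram layout, cue names, and all numbers are illustrative assumptions, not the paper's data or code:

```python
import numpy as np

def log_likelihood_ratio(responses, fg_hist, bg_hist, bin_edges, eps=1e-8):
    """Sum of log p_fg(r)/p_bg(r) over filter responses sampled along a limb.

    fg_hist / bg_hist are learned, normalized histograms of filter responses
    for limb pixels and for general scenes, respectively.
    """
    idx = np.clip(np.digitize(responses, bin_edges) - 1, 0, len(fg_hist) - 1)
    return np.sum(np.log(fg_hist[idx] + eps) - np.log(bg_hist[idx] + eps))

def pose_log_likelihood(cues, models):
    """Combine cues (edge, ridge, temporal difference) per limb by adding their
    log-likelihood ratios, i.e. the Bayesian combination obtained when the cues
    are assumed conditionally independent given the predicted pose."""
    return sum(
        log_likelihood_ratio(cues[name], m["fg"], m["bg"], m["edges"])
        for name, m in models.items()
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = np.linspace(-1.0, 1.0, 33)
    # Toy "learned" distributions: foreground responses are biased positive.
    fg = np.histogram(rng.normal(0.3, 0.2, 10000), bins=edges)[0].astype(float)
    bg = np.histogram(rng.normal(0.0, 0.4, 10000), bins=edges)[0].astype(float)
    fg, bg = fg / fg.sum(), bg / bg.sum()
    models = {"edge": {"fg": fg, "bg": bg, "edges": edges}}
    cues = {"edge": rng.normal(0.3, 0.2, 50)}   # responses along a predicted limb
    print("log-likelihood ratio:", pose_log_likelihood(cues, models))
```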

2.
In dynamic scenes, traditional feature-point-based visual SLAM systems are easily disturbed by moving objects, which produce large numbers of mismatches in the dynamic regions between consecutive frames and degrade the robot's localization accuracy. To address this, a dynamic-scene RGB-D SLAM algorithm combining an adaptive frame-interval matching model with a deep learning algorithm is proposed. A visual SLAM front-end framework based on the adaptive frame-interval matching model is built: after frames are selected, grid-based probabilistic motion statistics are used to screen matched points and obtain feature correspondences in static regions, and the pose is then estimated with a constant-velocity model or a reference-frame model. The semantic information provided by the deep learning algorithm Mask R-CNN is used to build a static dense 3D map of the dynamic scene. The algorithm is evaluated on the TUM dataset and in real environments; the results show that its localization accuracy and tracking speed in dynamic scenes are better than those of ORB-SLAM2 and DynaSLAM, reaching a localization accuracy of 1.475 cm on a highly dynamic sequence 6.62 m in total length, with an average tracking time of 0.024 s per frame.
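A minimal sketch of the dynamic-feature rejection step described above, assuming the Mask R-CNN output has already been merged into a single boolean mask of movable-object pixels; the array shapes and helper name are illustrative, not the paper's implementation:

```python
import numpy as np

def filter_dynamic_matches(kpts_prev, kpts_curr, dynamic_mask):
    """Keep only matches whose current-frame keypoint lies outside the union of
    instance masks for movable classes (e.g. 'person') predicted by Mask R-CNN.

    kpts_prev, kpts_curr : (N, 2) arrays of matched pixel coordinates (x, y)
    dynamic_mask         : boolean H x W array, True where a movable object is
    """
    x = np.round(kpts_curr[:, 0]).astype(int)
    y = np.round(kpts_curr[:, 1]).astype(int)
    h, w = dynamic_mask.shape
    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    static = inside & ~dynamic_mask[np.clip(y, 0, h - 1), np.clip(x, 0, w - 1)]
    return kpts_prev[static], kpts_curr[static]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mask = np.zeros((480, 640), bool)
    mask[100:300, 200:400] = True          # a detected movable object
    prev = rng.uniform([0, 0], [640, 480], size=(200, 2))
    curr = prev + rng.normal(0, 1, size=(200, 2))
    p, c = filter_dynamic_matches(prev, curr, mask)
    print(f"kept {len(c)} of {len(curr)} matches for pose estimation")
```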

3.
Moving Object Detection in Dynamic Scenes Based on Background Modeling  (total citations: 1; self-citations: 1; citations by others: 0)
周箴毅  胡福乔 《计算机工程》2008,34(24):203-205
Background modeling has long been a central topic in moving object detection. This paper proposes a nonparametric foreground-background contrast model suited to dynamic backgrounds. The model uses kernel density estimation to approximate the probability distribution of a five-dimensional feature vector per pixel (color intensities and image coordinates) and updates it in a rolling fashion over the image sequence. For each incoming frame, the global foreground-background segmentation is cast as a maximum a posteriori decision over a Markov random field and solved as a max-flow/min-cut problem. Experiments show that the algorithm performs well in general object detection and, in particular, in dynamic scenes (e.g., swaying branches).
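The per-pixel kernel density estimate at the core of this model can be sketched as follows; the bandwidths, the sample buffer, and the omission of the MRF/graph-cut segmentation step are simplifications for illustration:

```python
import numpy as np

def kde_likelihood(feature, samples, bandwidth):
    """Nonparametric (Parzen) density of a 5D feature (R, G, B, x, y) under a
    product Gaussian kernel, given a rolling buffer of model samples."""
    d = (feature - samples) / bandwidth                      # (N, 5)
    k = np.exp(-0.5 * np.sum(d * d, axis=1))
    norm = np.prod(bandwidth) * (2 * np.pi) ** (5 / 2)
    return np.mean(k) / norm

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Rolling background buffer for one pixel neighbourhood: colour jitters
    # around (120, 130, 110) near image location (50, 60), as with a swaying branch.
    bg_samples = np.column_stack([
        rng.normal(120, 8, 500), rng.normal(130, 8, 500), rng.normal(110, 8, 500),
        rng.normal(50, 2, 500), rng.normal(60, 2, 500),
    ])
    bw = np.array([10.0, 10.0, 10.0, 3.0, 3.0])
    background_pixel = np.array([122, 128, 112, 50, 61])
    foreground_pixel = np.array([30, 40, 200, 50, 60])       # a new object's colour
    print("p(background-like pixel):", kde_likelihood(background_pixel, bg_samples, bw))
    print("p(foreground-like pixel):", kde_likelihood(foreground_pixel, bg_samples, bw))
```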

4.
This contribution addresses the problem of pose estimation and tracking of vehicles in image sequences from traffic scenes recorded by a stationary camera. In a new algorithm, the vehicle pose is estimated by directly matching polyhedral vehicle models to image gradients without an edge segment extraction process. The new approach is significantly more robust than approaches that rely on feature extraction, since it exploits more information from the image data. We successfully tracked vehicles that were partially occluded by textured objects, e.g., foliage, where a previous approach based on edge segment extraction failed. Moreover, the new pose estimation approach is also used to determine the orientation and position of the road relative to the camera by matching an intersection model directly to image gradients. Results from various experiments with real-world traffic scenes are presented.
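A toy sketch of scoring a pose hypothesis directly against image gradients, without extracting edge segments: gradient components perpendicular to each projected model edge are accumulated along the edge. The synthetic image, segment format, and scoring details are assumptions for illustration, not the authors' matcher:

```python
import numpy as np

def gradient_match_score(grad_x, grad_y, segments, samples=25):
    """Score a projected model pose by how well its edge segments align with the
    image gradients: at points sampled along each segment, accumulate the
    magnitude of the gradient component perpendicular to the segment."""
    score = 0.0
    for (x0, y0), (x1, y1) in segments:
        t = np.linspace(0.0, 1.0, samples)
        xs = np.clip((x0 + t * (x1 - x0)).astype(int), 0, grad_x.shape[1] - 1)
        ys = np.clip((y0 + t * (y1 - y0)).astype(int), 0, grad_x.shape[0] - 1)
        nx, ny = -(y1 - y0), (x1 - x0)                      # segment normal
        n = np.hypot(nx, ny) + 1e-8
        score += np.sum(np.abs(grad_x[ys, xs] * nx + grad_y[ys, xs] * ny)) / n
    return score

if __name__ == "__main__":
    img = np.zeros((100, 100))
    img[30:70, 30:70] = 1.0                                 # a bright "vehicle" blob
    gy, gx = np.gradient(img)
    box_good = [((30, 30), (70, 30)), ((70, 30), (70, 70)),
                ((70, 70), (30, 70)), ((30, 70), (30, 30))]  # aligned pose hypothesis
    box_bad = [((10, 10), (50, 10)), ((50, 10), (50, 50)),
               ((50, 50), (10, 50)), ((10, 50), (10, 10))]   # misaligned hypothesis
    print("aligned pose score   :", round(gradient_match_score(gx, gy, box_good), 2))
    print("misaligned pose score:", round(gradient_match_score(gx, gy, box_bad), 2))
```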

5.
6.
We present an appearance model for establishing correspondence between tracks of people that may be taken at different places, at different times, or across different cameras. The appearance model is constructed by kernel density estimation. To incorporate structural information and to achieve invariance to motion and pose, an additional path-length feature is used besides color features. To achieve illumination invariance, two types of illumination-insensitive color features are discussed: a brightness color feature and an RGB rank feature. The similarity between a test image and an appearance model is measured by the information gain, or Kullback-Leibler distance. To represent the information contained in a video sequence thoroughly with as little data as possible, a key frame selection and matching scheme is proposed. Experimental results demonstrate the important role of the path-length feature in the appearance model and the effectiveness of the proposed appearance model and matching method.
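A simplified sketch of the matching idea, using a discrete joint (path-length, colour) histogram in place of the kernel density estimate and measuring similarity with the Kullback-Leibler distance; the bin counts and toy data are illustrative:

```python
import numpy as np

def appearance_histogram(path_length, hue, bins=(8, 16)):
    """Joint (path-length, colour) histogram as a discrete stand-in for the
    kernel-density appearance model; both inputs are normalized to [0, 1]."""
    h, _, _ = np.histogram2d(path_length, hue, bins=bins, range=[[0, 1], [0, 1]])
    h = h + 1e-6                                  # avoid empty bins
    return h / h.sum()

def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q) between two appearance models."""
    return float(np.sum(p * np.log(p / q)))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Two tracks of the same person: similar colour at similar normalized
    # path-length (distance along the body from the head), hence low divergence.
    a = appearance_histogram(rng.uniform(0, 1, 2000), rng.beta(2, 5, 2000))
    b = appearance_histogram(rng.uniform(0, 1, 2000), rng.beta(2, 5, 2000))
    c = appearance_histogram(rng.uniform(0, 1, 2000), rng.beta(5, 2, 2000))  # different clothing
    print("same person  KL:", kl_divergence(a, b))
    print("diff. person KL:", kl_divergence(a, c))
```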

7.
A tightly-coupled stereo vision-aided inertial navigation system is proposed in this work, as a synergistic incorporation of vision with other sensors. In order to avoid the loss of information that can result from visual preprocessing, a set of feature-based motion sensors and an inertial measurement unit are directly fused together to estimate the vehicle state. Two alternative feature-based observation models are considered within the proposed fusion architecture. The first model uses the trifocal tensor to propagate feature points by homography, so as to express geometric constraints among three consecutive scenes. The second one is derived by applying a rigid body motion model to three-dimensional (3D) reconstructed feature points. A kinematic model accounts for the vehicle motion, and a Sigma-Point Kalman filter is used to achieve robust state estimation in the presence of non-linearities. The proposed formulation is derived for a general, platform-independent 3D problem, and it is tested and demonstrated with a real dynamic indoor data-set alongside a simulation experiment. Results show better estimates than those of a classical visual odometry approach and of a loosely-coupled stereo vision-aided inertial navigation system, even in GPS (Global Positioning System)-denied conditions and when magnetometer measurements are not reliable.
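One ingredient of such a filter, the scaled sigma-point construction used to propagate a state mean and covariance through a nonlinear motion model, can be sketched as below; the parameter values and the toy motion model are illustrative, not the paper's vehicle model:

```python
import numpy as np

def merwe_sigma_points(x, P, alpha=0.5, beta=2.0, kappa=0.0):
    """Scaled sigma points and weights used by a Sigma-Point (unscented) Kalman
    filter to propagate mean x and covariance P through nonlinear models."""
    n = len(x)
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)          # matrix square root
    pts = np.vstack([x, x + S.T, x - S.T])         # 2n + 1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha ** 2 + beta)
    return pts, wm, wc

if __name__ == "__main__":
    x = np.array([0.0, 0.0, 1.0])                  # e.g. [position, heading, speed]
    P = np.diag([0.5, 0.5, 0.1])
    pts, wm, wc = merwe_sigma_points(x, P)
    # Toy nonlinear motion model: move forward along the current heading.
    f = lambda s: np.array([s[0] + s[2] * np.cos(s[1]), s[1], s[2]])
    prop = np.array([f(p) for p in pts])
    mean = wm @ prop                               # unscented estimate of the predicted mean
    print("predicted mean:", mean.round(3))
```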

8.
Visual SLAM (simultaneous localization and mapping) algorithms in dynamic scenes are easily affected by feature points on moving objects, which lowers pose-estimation accuracy and robustness. To address this, an RGB-D visual SLAM algorithm based on dynamic-region removal is proposed. First, semantic information is used to identify feature points belonging to movable objects, and multi-view geometry together with the camera's depth information is used to check whether those points are currently static. Then, feature points extracted from static objects, together with the static feature points derived from movable objects, are used to refine the camera pose estimate, so that the system runs accurately and robustly in dynamic scenes. Finally, the algorithm is validated on dynamic indoor sequences from the TUM dataset. The experiments show that, in dynamic indoor environments, the proposed algorithm effectively improves the accuracy of camera pose estimation and enables map updating in dynamic environments, improving both the robustness of the system and the accuracy of map construction.
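A toy version of the multi-view geometry check mentioned above: a feature on a movable object is kept as static only if it is consistent with the epipolar geometry of the two views. The fundamental-matrix construction and the threshold are illustrative assumptions:

```python
import numpy as np

def epipolar_distances(pts1, pts2, F):
    """Point-to-epipolar-line distances of pts2 w.r.t. the lines F @ pts1.
    Points that violate the constraint are likely on moving objects."""
    ones = np.ones((len(pts1), 1))
    p1 = np.hstack([pts1, ones])                   # homogeneous coordinates
    p2 = np.hstack([pts2, ones])
    lines = p1 @ F.T                               # epipolar lines in image 2
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / den

def static_mask(pts1, pts2, F, thresh=1.5):
    """Boolean mask of matches that satisfy the epipolar constraint."""
    return epipolar_distances(pts1, pts2, F) < thresh

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
    t = np.array([0.3, 0.0, 0.0])                  # pure camera translation
    tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
    F = np.linalg.inv(K).T @ tx @ np.linalg.inv(K)  # F for R = I (up to sign)
    X = rng.uniform([-2, -2, 4], [2, 2, 8], size=(60, 3))
    p1 = X @ K.T;        p1 = p1[:, :2] / p1[:, 2:]
    p2 = (X - t) @ K.T;  p2 = p2[:, :2] / p2[:, 2:]
    p2[:10] += rng.normal(0, 15, size=(10, 2))     # simulate points on a moving object
    print("fraction kept as static:", static_mask(p1, p2, F).mean())
```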

9.
Optical flow estimation is discussed based on a model for time-varying images that is more general than the one implied by Horn and Schunck (1981). The emphasis is on applications where low-contrast imagery, movement of nonrigid or evolving object patterns, and large interframe displacements are encountered. Template matching is identified as having advantages over point correspondence and the gradient-based approach in dealing with such applications. The two fundamental uncertainties in feature matching, whether by template matching or by feature point correspondence, are discussed. Correlation template matching procedures are established based on likelihood measurement. A method for determining optical flow is developed by combining template matching and relaxation labeling. A number of candidate displacements for each template and their respective likelihood measures are determined. Then, relaxation labeling is employed to iteratively update each candidate's likelihood by requiring smoothness within a motion field. Real cloud images from satellites are used to test the method.
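The candidate-displacement stage can be sketched as below: normalized cross-correlation scores over a search window are turned into per-candidate likelihoods (the subsequent relaxation-labeling update over the motion field is omitted; the sharpening factor and toy frames are assumptions):

```python
import numpy as np

def candidate_displacements(frame1, frame2, center, half, search, top_k=5):
    """Score candidate displacements of a template around `center` (row, col)
    by normalized cross-correlation and return the top_k candidates together
    with softmax-normalized pseudo-likelihoods."""
    y, x = center
    tpl = frame1[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    tpl = (tpl - tpl.mean()) / (tpl.std() + 1e-8)
    scores, disps = [], []
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            win = frame2[y + dy - half:y + dy + half + 1,
                         x + dx - half:x + dx + half + 1].astype(float)
            win = (win - win.mean()) / (win.std() + 1e-8)
            scores.append(np.mean(tpl * win))
            disps.append((dy, dx))
    scores = np.array(scores)
    order = np.argsort(scores)[::-1][:top_k]
    lik = np.exp(scores[order] * 10)               # sharpen scores into likelihoods
    return [(disps[i], float(p)) for i, p in zip(order, lik / lik.sum())]

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    f1 = rng.normal(size=(64, 64))
    f2 = np.roll(f1, shift=(2, 3), axis=(0, 1)) + 0.05 * rng.normal(size=(64, 64))
    # Expected best displacement is (2, 3).
    for d, p in candidate_displacements(f1, f2, center=(32, 32), half=7, search=5):
        print(d, round(p, 3))
```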

10.
Discriminative human pose estimation is the problem of inferring the 3D articulated pose of a human directly from an image feature. This is a challenging problem due to the highly non-linear and multi-modal mapping from the image feature space to the pose space. To address this problem, we propose a model employing a mixture of Gaussian processes, where each Gaussian process models a local region of the pose space. By employing the models in this way we are able to overcome the limitations of Gaussian processes applied to human pose estimation: their O(N³) time complexity and their uni-modal predictive distribution. Our model is able to give a multi-modal predictive distribution where each mode is represented by a different Gaussian process prediction. A logistic regression model is used to give a prior over each expert prediction, in a similar fashion to previous mixture-of-experts models. We show that this technique outperforms existing state-of-the-art regression techniques on human pose estimation data sets for ballet dancing and sign language, and on the HumanEva data set.
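A minimal sketch of a mixture of Gaussian-process experts with a logistic-regression gate, using scikit-learn as a stand-in; the clustering step, kernel, and toy multi-modal data are illustrative rather than the paper's setup:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Toy multi-modal mapping: one image-feature value can correspond to two poses.
x = rng.uniform(-3, 3, size=(400, 1))                                   # "image feature"
y = np.where(rng.random(400) < 0.5, np.sin(x[:, 0]), -np.sin(x[:, 0]))  # "pose"

# 1. Partition the pose space and fit one local GP expert per region.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(y.reshape(-1, 1))
experts = [
    GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-2).fit(x[labels == k], y[labels == k])
    for k in range(2)
]

# 2. A logistic-regression gate gives a prior over experts from the feature.
gate = LogisticRegression().fit(x, labels)

# 3. Multi-modal prediction: each mode is one expert's GP prediction,
#    weighted by the gate probability.
x_test = np.array([[1.2]])
for k, (gp, p) in enumerate(zip(experts, gate.predict_proba(x_test)[0])):
    mu, std = gp.predict(x_test, return_std=True)
    print(f"mode {k}: pose = {mu[0]:+.2f} +/- {std[0]:.2f}, gate prior = {p:.2f}")
```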

11.
Moving vehicles are detected and tracked automatically in monocular image sequences from road traffic scenes recorded by a stationary camera. In order to exploit the a priori knowledge about the shape and motion of vehicles in traffic scenes, a parameterized vehicle model is used for an intraframe matching process, and a recursive estimator based on a motion model is used for motion estimation. An interpretation cycle supports the intraframe matching process with a state MAP-update step. Initial model hypotheses are generated using an image segmentation component which clusters coherently moving image features into candidate representations of images of a moving vehicle. The inclusion of an illumination model allows shadow edges of the vehicle to be taken into account during the matching process. Only such an elaborate combination of various techniques has enabled us to track vehicles under complex illumination conditions and over long monocular image sequences (more than 400 frames). Results on various real-world road traffic scenes are presented, and open problems as well as future work are outlined.
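The recursive motion estimator referred to above is commonly a Kalman filter over a simple vehicle motion model; a generic constant-velocity sketch is shown below. The state layout, noise levels, and frame rate are illustrative assumptions, not the paper's motion model:

```python
import numpy as np

def kalman_cv_step(x, P, z, dt=0.04, q=1.0, r=4.0):
    """One predict/update cycle of a constant-velocity Kalman filter, the kind of
    recursive motion estimator used to track a vehicle's image position.
    State x = [px, py, vx, vy]; measurement z = matched model position [px, py]."""
    F = np.eye(4);  F[0, 2] = F[1, 3] = dt
    H = np.zeros((2, 4));  H[0, 0] = H[1, 1] = 1.0
    Q = q * np.eye(4);  R = r * np.eye(2)
    # Predict with the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the position obtained from model-to-image matching.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

if __name__ == "__main__":
    rng = np.random.default_rng(12)
    x, P = np.zeros(4), np.eye(4) * 100.0
    true_v = np.array([25.0, 5.0])                        # pixels per second
    for k in range(1, 51):                                # 2 s at an assumed 25 Hz
        z = true_v * k * 0.04 + rng.normal(0, 2.0, 2)     # noisy matched position
        x, P = kalman_cv_step(x, P, z)
    print("estimated velocity:", x[2:].round(1), " true:", true_v)
```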

12.
Objective: Object tracking is one of the key research directions in computer vision, with wide applications in intelligent transportation, human-computer interaction, and other fields. Although correlation-filter-based methods have made notable progress in this area thanks to their efficiency and robustness, the choice and representation of features remain the primary consideration when building the target appearance model. To improve the robustness of the appearance model, more and more trackers replace the single raw grayscale feature with gradient features, color features, or other combined features, but such methods do not consider, in light of the features themselves, the relative weight each feature should carry in the model. Method: This paper focuses on feature selection and fusion, introducing a weight vector to fuse features and designing a tracker based on a weighted multi-feature appearance model. Based on how the features are computed, a linear equation in two unknowns is constructed, turning the solution of the weight vector into the determination of the features' proportional coefficients; combining this with the dimensionality of the features yields a finite set of integer solutions to the equation. The final coefficients are determined experimentally and normalized to obtain the weight vector, which is then used to build a new weighted mixed-feature model of the target's appearance. Results: On the 100 video sequences of OTB-100, the proposed algorithm is compared with seven mainstream algorithms, including five correlation-filter-based methods, using precision, average center error, and real-time performance as evaluation metrics. While remaining real-time, the proposed algorithm yields better tracking results on several sequences, including Basketball, DragonBaby, Panda, and Lemming. Averaged over the 100 videos, precision improves by 1.2% compared with a scale-adaptive tracker based on multi-feature fusion. Conclusion: By introducing a weight vector into the appearance description within a correlation-filter tracking framework, the proposed weighted multi-feature fusion tracker tracks targets for longer in complex dynamic scenes and improves the robustness of the algorithm.
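A minimal sketch of the weighted multi-feature fusion idea: per-feature blocks are normalized and scaled by a normalized weight vector before being combined into one appearance vector. The feature types and integer coefficients below are illustrative stand-ins, not the values determined in the paper:

```python
import numpy as np

def fuse_features(features, weights):
    """Weighted multi-feature appearance vector: each per-feature block is
    L2-normalized and scaled by its (normalized) weight before concatenation."""
    w = np.asarray(weights, float)
    w = w / w.sum()                                   # normalized weight vector
    blocks = []
    for f, wi in zip(features, w):
        f = np.asarray(f, float)
        blocks.append(wi * f / (np.linalg.norm(f) + 1e-8))
    return np.concatenate(blocks)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    hog_like = rng.random(36)          # gradient-orientation histogram of a patch
    color_hist = rng.random(16)        # colour histogram of the same patch
    # Integer proportion coefficients (illustrative values), normalized into the
    # weight vector used to build the appearance model.
    model = fuse_features([hog_like, color_hist], weights=[2, 1])
    print(model.shape, round(float(np.linalg.norm(model)), 3))
```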

13.
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure, we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. Results of applying our method to three data sets in a variety of conditions demonstrate that, from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.

14.
Underwater images often exhibit severe color deviations and degraded visibility, which limits many practical applications in ocean engineering. Although extensive research has been conducted into underwater image enhancement, little of it demonstrates strong robustness and generalization across diverse real-world underwater scenes. In this paper, we propose an adaptive color correction algorithm based on the maximum likelihood estimation of Gaussian parameters, which effectively removes the color casts of a variety of underwater images. A novel algorithm for accurate background light estimation is proposed, using a weighted combination of gradient maps in HSV color space and the absolute difference of intensity; it circumvents the influence of white or bright regions that challenge existing physical-model-based methods. To enhance the contrast of the resulting images, a piece-wise affine transform is applied to the transmission map estimated via the background light differential. Finally, with the estimated background light and transmission map, the scene radiance is recovered by solving an inverse problem of the image formation model. Extensive experiments reveal that our results are characterized by natural appearance and genuine color, and our method achieves competitive performance with state-of-the-art methods in terms of objective evaluation metrics, which further validates the robustness and generalization ability of our enhancement model.
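A much-simplified sketch of the colour-correction step only: per-channel Gaussian parameters are estimated by maximum likelihood and each channel is remapped toward common statistics. The background-light and transmission estimation stages are omitted, and the cast simulation below is synthetic:

```python
import numpy as np

def gaussian_mle_color_correction(img):
    """Per-channel Gaussian MLE colour-cast removal (simplified): estimate the
    mean and standard deviation of each channel, then remap every channel to the
    average mean/std so the dominant (usually blue-green) cast is suppressed.

    img : float array in [0, 1], shape (H, W, 3)
    """
    mu = img.reshape(-1, 3).mean(axis=0)           # MLE of per-channel means
    sigma = img.reshape(-1, 3).std(axis=0) + 1e-8  # MLE of per-channel std devs
    target_mu, target_sigma = mu.mean(), sigma.mean()
    corrected = (img - mu) / sigma * target_sigma + target_mu
    return np.clip(corrected, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    scene = rng.random((120, 160, 3)) * 0.6 + 0.2
    underwater = scene * np.array([0.35, 0.8, 0.9])     # simulated blue-green cast
    out = gaussian_mle_color_correction(underwater)
    print("channel means before:", underwater.reshape(-1, 3).mean(0).round(2))
    print("channel means after: ", out.reshape(-1, 3).mean(0).round(2))
```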

15.
Tracking multiple objects is critical to automatic video content analysis and virtual reality. The major problem is how to solve the data association problem when ambiguous measurements are caused by objects in close proximity. To tackle this problem, we propose a multiple-information-fusion-based multiple hypotheses tracking algorithm that integrates an appearance feature, a local motion pattern feature, and a repulsion-inertia model for multi-object tracking. An appearance model based on an HSV / local binary pattern histogram and a local motion pattern based on optical flow are adopted to describe objects. A likelihood calculation framework is proposed to incorporate the similarities of appearance, the dynamic process, and the local motion pattern. To account for changes in appearance and motion pattern over time, an effective template updating strategy is used for each object. In addition, a repulsion-inertia model is adopted to extract more useful information from ambiguous detections. Experimental results show that the proposed approach generates better trajectories with fewer missed objects and identity switches.
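The likelihood calculation framework can be sketched as a product of per-cue terms, as below; the Gaussian motion term, the choice of similarity measures, and the numbers are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def fused_likelihood(app_sim, motion_dist, flow_sim, sigma_motion=10.0):
    """Combined detection-to-track likelihood: product of an appearance
    similarity, a Gaussian likelihood of the gating distance under the dynamic
    model, and a local-motion-pattern similarity (similarities in [0, 1];
    sigma_motion in pixels is illustrative)."""
    p_motion = np.exp(-0.5 * (motion_dist / sigma_motion) ** 2)
    return app_sim * p_motion * flow_sim

if __name__ == "__main__":
    # Two candidate detections competing for the same track hypothesis.
    print("close, similar appearance :", round(fused_likelihood(0.9, 4.0, 0.8), 3))
    print("far, dissimilar appearance:", round(fused_likelihood(0.3, 25.0, 0.4), 4))
```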

16.
In the compressive tracking (CT) algorithm, extracting sparse Haar-like features from background pixels when building the discriminative appearance model aggravates tracking drift. To address this, an improved algorithm that fuses a normalized gray-level histogram as a global feature template is proposed. Compared with local feature templates, a global feature template is better suited to discriminating between target and background. The improved algorithm extracts local sparse Haar-like features based on compressive sensing theory to build an appearance model M1, which yields a first estimate H(v) of the target, and extracts a normalized global gray-level histogram feature to build an appearance model M2, which yields a second estimate HD; a linear combination of H(v) and HD is then used as the appearance model, and tracking is performed with a Bayes classifier. Experimental results show that the improved algorithm is more robust and alleviates the drift problem.
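A bare-bones sketch of the score fusion described above: an illustrative global-histogram similarity HD is linearly combined with a stand-in local sparse-feature score H(v). The mixing weight, the Bhattacharyya-style similarity, and the toy histograms are assumptions, not values from the paper:

```python
import numpy as np

def histogram_score(candidate_hist, template_hist):
    """Global-template score HD: Bhattacharyya-style similarity between the
    candidate's normalized gray-level histogram and the target template."""
    return float(np.sum(np.sqrt(candidate_hist * template_hist)))

def fused_score(h_v, h_d, alpha=0.7):
    """Linear combination of the local sparse-feature score H(v) (from the
    compressive-tracking classifier) and the global histogram score HD;
    alpha is an illustrative mixing weight."""
    return alpha * h_v + (1.0 - alpha) * h_d

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    template = np.histogram(rng.normal(0.4, 0.1, 5000), bins=16, range=(0, 1))[0].astype(float)
    template /= template.sum()
    target = np.histogram(rng.normal(0.42, 0.1, 400), bins=16, range=(0, 1))[0].astype(float)
    clutter = np.histogram(rng.normal(0.7, 0.1, 400), bins=16, range=(0, 1))[0].astype(float)
    target /= target.sum();  clutter /= clutter.sum()
    # H(v) values below are stand-ins for the local sparse-feature classifier output.
    for name, patch, hv in [("target ", target, 2.1), ("clutter", clutter, 1.9)]:
        print(name, round(fused_score(hv, histogram_score(patch, template)), 3))
```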

17.
We present a novel approach for adapting a 3D human body shape model to a sequence of multi-view images, given an initial shape model and an initial pose sequence. In a first step, the most informative frames are determined by optimizing an objective function that maximizes a shape-texture likelihood function and a pose diversity criterion (i.e., the model surface area that lies close to the occluding contours) in the selected frames. Thereafter, a batch-mode optimization of the underlying shape and pose parameters is performed, by means of an objective function that includes both contour and texture cues over the selected multi-view frames. Using the above approach, we implement automatic pose and shape estimation in a three-step procedure: first, we recover initial poses over a sequence using an initial (generic) body model. Both the model and the poses then serve as input to the above-mentioned adaptation process. Finally, a more accurate pose recovery is obtained by means of the adapted model. We demonstrate the effectiveness of our frame selection, model adaptation, and integrated pose and shape recovery procedure in experiments using both challenging outdoor data and the HumanEva data set.

18.
The Lucas–Kanade tracker (LKT) is a commonly used method to track target objects over 2D images. The key principle behind the object tracking of an LKT is to warp the object appearance so as to minimize the difference between the warped object’s appearance and a pre-stored template. Accordingly, the 2D pose of the tracked object in terms of translation, rotation, and scaling can be recovered from the warping. To extend the LKT for 3D pose estimation, a model-based 3D LKT assumes a 3D geometric model for the target object in the 3D space and tries to infer the 3D object motion by minimizing the difference between the projected 2D image of the 3D object and the pre-stored 2D image template. In this paper, we propose an extended model-based 3D LKT for estimating 3D head poses by tracking human heads on video sequences. In contrast to the original model-based 3D LKT, which uses a template with each pixel represented by a single intensity value, the proposed model-based 3D LKT exploits an adaptive template with each template pixel modeled by a continuously updated Gaussian distribution during head tracking. This probabilistic template modeling improves the tracker’s ability to handle temporal fluctuation of pixels caused by continuous environmental changes such as varying illumination and dynamic backgrounds. Due to the new probabilistic template modeling, we reformulate the head pose estimation as a maximum likelihood estimation problem, rather than the original difference minimization procedure. Based on the new formulation, an algorithm to estimate the best head pose is derived. The experimental results show that the proposed extended model-based 3D LKT achieves higher accuracy and reliability than the conventional one does. Particularly, the proposed LKT is very effective in handling varying illumination, which cannot be well handled in the original LKT.
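The adaptive per-pixel Gaussian template can be sketched as a running mean and variance per pixel, with a log-likelihood score replacing the raw intensity difference; the update rate, initial variance, and toy illumination drift below are illustrative assumptions:

```python
import numpy as np

class GaussianTemplate:
    """Per-pixel Gaussian template: each template pixel keeps a running mean and
    variance, updated after every frame, and warped observations are scored by
    their log-likelihood instead of a squared intensity difference."""

    def __init__(self, first_frame, var0=25.0, rho=0.05):
        self.mean = first_frame.astype(float)
        self.var = np.full_like(self.mean, var0)
        self.rho = rho                               # template update rate

    def log_likelihood(self, warped):
        d = warped - self.mean
        return float(np.sum(-0.5 * (np.log(2 * np.pi * self.var) + d * d / self.var)))

    def update(self, warped):
        d = warped - self.mean
        self.mean += self.rho * d
        self.var = (1 - self.rho) * self.var + self.rho * d * d

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    face = rng.uniform(80, 170, size=(32, 32))       # initial warped head appearance
    tpl = GaussianTemplate(face)
    for t in range(30):                              # gradual illumination change
        obs = np.clip(face * (1 + 0.01 * t) + rng.normal(0, 2, face.shape), 0, 255)
        if t % 10 == 0:
            print(f"frame {t:2d}  log-likelihood {tpl.log_likelihood(obs):10.1f}")
        tpl.update(obs)
```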

19.
Human action recognition is a difficult and popular research topic in computer vision. The mainstream research framework covers three aspects: action feature extraction, action representation, and recognition algorithms. Recognition of simple human actions in simple scenes has largely been solved, while action recognition in complex scenes still faces many difficulties. This paper reviews recent developments in human action recognition in some detail, surveying current approaches to action recognition in complex scenes in terms of research scope, feature extraction, and action models. Unlike existing surveys, it incorporates new research topics and results of the past three years, both domestic and international, such as the extraction and representation of pose features and action representations based on sparse coding and convolutional neural networks. Finally, the remaining difficulties in this field and possible future directions are discussed.

20.
Support vector data description (SVDD) is one of the best methods for one-class classification and has been applied successfully to human pose estimation, where it has produced good part appearance models. However, existing SVDD-based part appearance models treat all training samples, and all features of a sample, equally. To overcome these two drawbacks, a sample- and feature-weighted SVDD algorithm is proposed and used to build a part appearance model. The weight of each sample is determined by its distance to the sample center, and the weight of each feature is determined by how often the image region corresponding to that feature is contained in the true body part across the training images. Simulation results show that the resulting part appearance model describes the appearance of real human body parts more accurately than one built with the standard SVDD algorithm and achieves higher pose-estimation accuracy.
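As a rough stand-in for the weighted SVDD, the sketch below uses scikit-learn's one-class SVM (equivalent to SVDD for an RBF kernel), applying feature weights by axis rescaling and sample weights through fit(); the weighting schemes are simplified versions of the ones described above, and all values are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(11)

# Toy "part appearance" descriptors for one body part.
X = rng.normal(0, 1, size=(300, 8))

# Sample weights: samples far from the sample center get smaller weight.
dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
sample_w = 1.0 / (1.0 + dist)

# Feature weights: e.g. proportional to how often each feature's image region
# overlapped the annotated part in training images (illustrative values).
feature_w = np.array([1.0, 1.0, 0.8, 0.8, 0.6, 0.6, 0.4, 0.4])

# With an RBF kernel the one-class SVM plays the role of SVDD here: feature
# weighting is applied by rescaling the axes, sample weighting via sample_weight.
model = OneClassSVM(kernel="rbf", gamma=0.2, nu=0.1)
model.fit(X * feature_w, sample_weight=sample_w)

inlier = rng.normal(0, 1, size=(1, 8))
outlier = rng.normal(4, 1, size=(1, 8))
print("inlier score :", model.decision_function(inlier * feature_w)[0])
print("outlier score:", model.decision_function(outlier * feature_w)[0])
```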
