Found 20 similar articles (search time: 0 ms)
1.
David Joseph Tan, Federico Tombari, Nassir Navab. International Journal of Computer Vision, 2018, 126(2-4): 158-183
We demonstrate how 3D head tracking and pose estimation can be effectively and efficiently achieved from noisy RGB-D sequences. Our proposal leverages a random forest framework, designed to regress the 3D head pose at every frame in a temporal tracking manner. One peculiarity of the algorithm is that it jointly exploits (1) a generic training dataset of 3D head models, which is learned once offline; and (2) an online refinement with subject-specific 3D data, which aims for the tracker to withstand slight facial deformations and to adapt its forest to the specific characteristics of an individual subject. This combination allows our algorithm to be robust even under extreme poses, where the user's face is no longer visible in the image. Finally, we also propose another solution that utilizes a multi-camera system, such that the data simultaneously acquired from multiple RGB-D sensors helps the tracker to handle challenging conditions that affect a subset of the cameras. Notably, the proposed multi-camera framework yields real-time performance of approximately 8 ms per frame given six cameras and one CPU core, and scales linearly, reaching 30 fps with 25 cameras.
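The two timing figures quoted at the end of this abstract are consistent with each other if the per-frame cost is taken to be strictly proportional to the number of cameras. The following is a minimal arithmetic sketch under that assumption; the function names are hypothetical and the proportional cost model is not taken from the paper itself.

```python
# Assumption: per-frame cost is proportional to the number of cameras,
# with no fixed overhead (an illustrative reading of "scales linearly").
MS_PER_FRAME_WITH_6_CAMERAS = 8.0  # figure quoted in the abstract


def frame_time_ms(num_cameras: int) -> float:
    """Estimated processing time per frame, in milliseconds."""
    per_camera_ms = MS_PER_FRAME_WITH_6_CAMERAS / 6
    return per_camera_ms * num_cameras


def frames_per_second(num_cameras: int) -> float:
    """Estimated frame rate for the given number of cameras."""
    return 1000.0 / frame_time_ms(num_cameras)


print(round(frame_time_ms(25), 1))   # 33.3
print(round(frames_per_second(25)))  # 30
```

At roughly 1.33 ms per camera, 25 cameras cost about 33.3 ms per frame, which matches the 30 fps stated in the abstract.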
2.
This paper introduces the use of a visual attention model to improve the accuracy of gaze tracking systems. Visual attention models simulate the selective attention part of the human visual system. For instance, in a bottom-up approach, a saliency map is defined for the image and gives an attention weight to every pixel of the image as a function of its colour, edge or intensity. Our algorithm uses an uncertainty window, defined by the gaze tracker accuracy, and located around the gaze point given by the tracker. Then, using a visual attention model, it searches for the most salient points, or objects, located inside this uncertainty window, and determines a new and, ideally, more accurate gaze point. This combination of a gaze tracker with a visual attention model is the main contribution of the paper. We demonstrate the promising results of our method by presenting two experiments conducted in two different contexts: (1) a free exploration of a visually rich 3D virtual environment without a specific task, and (2) a video game based on gaze tracking involving a selection task. Our approach can be used to improve real-time gaze tracking systems in many interactive 3D applications such as video games or virtual reality applications. The use of a visual attention model can be adapted to any gaze tracker, and the visual attention model can also be adapted to the application in which it is used.
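The search step described in this abstract can be sketched as follows. This is an illustration only: it assumes the saliency map is already available as a 2D grid of weights and that the uncertainty window is a square of a given radius around the tracker's gaze point. The function name `refine_gaze` and the square window shape are assumptions, not details from the paper.

```python
from typing import List, Tuple


def refine_gaze(saliency: List[List[float]],
                gaze: Tuple[int, int],
                radius: int) -> Tuple[int, int]:
    """Return the most salient pixel inside a square window around `gaze`.

    `gaze` is (row, col) and is assumed to lie inside the map.
    The tracker's own gaze point is kept unless a strictly more
    salient pixel exists inside the window.
    """
    rows, cols = len(saliency), len(saliency[0])
    gaze_row, gaze_col = gaze
    best, best_score = gaze, saliency[gaze_row][gaze_col]
    # Clamp the window to the image borders.
    for r in range(max(0, gaze_row - radius), min(rows, gaze_row + radius + 1)):
        for c in range(max(0, gaze_col - radius), min(cols, gaze_col + radius + 1)):
            if saliency[r][c] > best_score:
                best, best_score = (r, c), saliency[r][c]
    return best


# Toy 5x5 saliency map: one salient pixel near the gaze point,
# and a stronger one outside the uncertainty window.
saliency_map = [[0.0] * 5 for _ in range(5)]
saliency_map[1][3] = 0.9
saliency_map[4][4] = 1.0

print(refine_gaze(saliency_map, gaze=(2, 2), radius=1))  # (1, 3)
```

The stronger peak at (4, 4) is ignored because it lies outside the window, which is the role the abstract gives to the tracker's accuracy bound.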
3.
Tracking in a Dense Crowd Using Multiple Cameras (cited by: 1; self-citations: 0; citations by others: 1)
Tracking people in a dense crowd is a challenging problem for a single camera tracker due to occlusions and extensive motion that make human segmentation difficult. In this paper we suggest a method for simultaneously tracking all the people in a densely crowded scene using a set of cameras with overlapping fields of view. To overcome occlusions, the cameras are placed at a high elevation and only people's heads are tracked. Head detection is still difficult since each foreground region may consist of multiple subjects. By combining data from several views, height information is extracted and used for head segmentation. The head tops, which are regarded as 2D patches at various heights, are detected by applying intensity correlation to aligned frames from the different cameras. The detected head tops are then tracked using common assumptions on motion direction and velocity. The method was tested on sequences in indoor and outdoor environments under challenging illumination conditions. It was successful in tracking up to 21 people walking in a small area (2.5 people per m²), in spite of severe and persistent occlusions.
4.
Tesfaye, Yonatan Tariku; Zemene, Eyasu; Prati, Andrea; Pelillo, Marcello; Shah, Mubarak. International Journal of Computer Vision, 2019, 127(9): 1303-1320
In this paper, a unified three-layer hierarchical approach for solving the tracking problem in a setting with multiple non-overlapping cameras is proposed. Given a...
5.
Zhenbao Liu, Sicong Tang, Weiwei Xu, Shuhui Bu, Junwei Han, Kun Zhou. Computer Graphics Forum, 2014, 33(7): 269-278
Since indoor scenes change frequently in daily life, for example when furniture is rearranged, their 3D reconstructions should be flexible and easy to update. We present an automatic 3D scene update algorithm for indoor scenes that captures scene variation with RGBD cameras. We assume an initial scene has been reconstructed in advance, manually or semi-automatically, before the change, and automatically update the reconstruction according to newly captured RGBD images of the changed real scene. The algorithm starts with an automatic segmentation process without manual interaction, which benefits from training on accurate labels derived from the initial 3D scene. After the segmentation, objects captured by the RGBD camera are extracted to form a local updated scene. We formulate an optimization problem that compares this local scene with the initial scene to locate moved objects. The moved objects are then integrated with static objects in the initial scene to generate a new 3D scene. We demonstrate the efficiency and robustness of our approach by updating the 3D reconstructions of several real-world scenes.
6.
Sisil Kumarawadu, Keigo Watanabe, Kazuo Kiguchi, Kiyotaka Izumi. Journal of Intelligent and Robotic Systems, 2003, 36(2): 129-147
In this article we present a neurally-inspired self-adaptive active binocular tracking scheme and an efficient mathematical model for online computation of desired binocular-head trajectories. The self-adaptive neural network (NN) model is general and can be adopted in output tracking schemes of any partly known robotic system. The tracking scheme combines the conventional Resolved Velocity Control (RVC) technique with an adaptive compensating NN model constructed using SoftMax basis functions as the nonlinear activation function. Desired trajectories for the servo controller are computed online using a suitable linear kinematic model of the system. An online weight-tuning algorithm guarantees tracking with small errors and error rates as well as bounded NN weights.
7.
In this paper, we propose a system that can detect and track hair regions of heads automatically and runs at video rate (30 frames per second) by making use of both the color and the depth information obtained from a Kinect. Our system has three characteristics: (1) using a 6D feature vector to describe both the 3D color feature and the 3D geometric feature of a pixel uniformly; (2) classifying pixels into foreground (e.g., hair) and background with the K-means clustering algorithm; (3) selecting and updating the cluster centers of foreground and background automatically before and during hair tracking. Our system can robustly track hair of any color or style against cluttered backgrounds where some objects have a color similar to the hair, or in environments where the illumination changes. Moreover, it can also be used to track faces (or heads) if the face (= skin + hair) is selected as foreground.
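Characteristics (2) and (3) correspond to the two alternating steps of K-means: assign each pixel to its nearest cluster center, then recompute each center from its members. The sketch below illustrates those two steps on 6D feature vectors. The feature layout `(r, g, b, x, y, z)`, the unscaled Euclidean distance, and the function names are assumptions made for illustration; the abstract does not specify the color space, feature scaling, or how the initial centers are chosen.

```python
from math import dist
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]  # assumed layout: (r, g, b, x, y, z)


def assign_labels(features: Sequence[Feature],
                  centers: Sequence[Feature]) -> List[int]:
    """Assign each feature to the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda k: dist(f, centers[k]))
            for f in features]


def update_centers(features: Sequence[Feature],
                   labels: Sequence[int],
                   centers: Sequence[Feature]) -> List[Feature]:
    """Move each center to the mean of its members; keep empty clusters as-is."""
    updated = []
    for k, old_center in enumerate(centers):
        members = [f for f, label in zip(features, labels) if label == k]
        if not members:
            updated.append(tuple(old_center))
            continue
        updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
    return updated


# Cluster 0 stands for foreground (dark, near), cluster 1 for background.
centers = [(0.1, 0.1, 0.1, 0.0, 0.0, 1.0), (0.8, 0.8, 0.8, 0.0, 0.0, 3.0)]
pixels = [(0.12, 0.10, 0.09, 0.05, 0.0, 1.1), (0.70, 0.75, 0.80, 0.2, 0.1, 2.9)]

labels = assign_labels(pixels, centers)
print(labels)  # [0, 1]
centers = update_centers(pixels, labels, centers)
```

Repeating the two calls on each new frame is one way to keep the centers current during tracking, in the spirit of characteristic (3).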
8.
Michael Zollhöfer, Patrick Stotko, Andreas Görlitz, Christian Theobalt, Matthias Nießner, Reinhard Klein, Andreas Kolb. Computer Graphics Forum, 2018, 37(2): 625-652
The advent of affordable consumer grade RGB-D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers spend significant effort to develop entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB-D cameras. This led to significant advances of the state of the art along several dimensions. Some methods achieve very high reconstruction detail, despite limited sensor resolution. Others even achieve real-time performance, yet possibly at lower quality. New concepts were developed to capture scenes at larger spatial and temporal extent. Other recent algorithms flank shape reconstruction with concurrent material and lighting estimation, even in general scenes and unconstrained conditions. In this state-of-the-art report, we analyze these recent developments in RGB-D scene reconstruction in detail and review essential related work. We explain, compare, and critically analyze the common underlying algorithmic concepts that enabled these recent advancements. Furthermore, we show how algorithms are designed to best exploit the benefits of RGB-D data while suppressing their often non-trivial data distortions. In addition, this report identifies and discusses important open research questions and suggests relevant directions for future work.
9.
The eXtensible Markup Language (XML) has gained wide acceptance as the standard for representing and exchanging data on the Web. Unfortunately, XML covers the syntactic level but lacks semantics, and thus cannot be directly used for the Semantic Web. Finding a way to utilize XML data for the Semantic Web is currently a challenging research problem. Ontologies can formally represent shared domain knowledge and enable semantic interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. Firstly, we give formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming the XML data sources into ontologies, and we also discuss the correctness of the transformations and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that can automatically transform XML into ontologies. Finally, we apply the transformed ontologies to reasoning about XML, so that some reasoning problems of XML may be checked by existing ontology reasoners.
10.
11.
Q. Wen, D. Bradley, T. Beeler, S. Park, O. Hilliges, J. Yong, F. Xu. Computer Graphics Forum, 2020, 39(2): 475-485
3D gaze tracking from a single RGB camera is very challenging due to the lack of information in determining the accurate gaze target from a monocular RGB sequence. The eyes tend to occupy only a small portion of the video, and even small errors in estimated eye orientations can lead to very large errors in the triangulated gaze target. We overcome these difficulties with a novel lightweight eyeball calibration scheme that determines the user-specific visual axis, eyeball size and position in the head. Unlike previous calibration techniques, we do not need the ground truth positions of the gaze points. In the online stage, gaze is tracked by a new gaze fitting algorithm, and refined by a 3D gaze regression method to correct for bias errors. Our regression is pre-trained on several individuals and works well for novel users. After the lightweight one-time user calibration, our method operates in real time. Experiments show that our technique achieves state-of-the-art accuracy in gaze angle estimation, and we demonstrate applications of 3D gaze target tracking and gaze retargeting to an animated 3D character.
12.
Three-dimensional (3D) displays typically rely on stereo disparity, requiring specialized hardware to be worn or embedded in the display. We present a novel 3D graphics display system for volumetric scene visualization using only standard 2D display hardware and a pair of calibrated web cameras. Our computer vision-based system requires no worn or other special hardware. Rather than producing the depth illusion through disparity, we deliver a full volumetric 3D visualization, enabling users to interactively explore 3D scenes by varying their viewing position and angle according to the tracked 3D position of their face and eyes. We incorporate a novel wand-based calibration that allows the cameras to be placed at arbitrary positions and orientations relative to the display. The resulting system operates at real-time speeds (~25 fps) with low latency (120–225 ms), delivering a compelling natural user interface and immersive experience for 3D viewing. In addition to objective evaluation of display stability and responsiveness, we report on user trials comparing users' timings on a spatial orientation task.
13.
M. Zollhöfer, J. Thies, P. Garrido, D. Bradley, T. Beeler, P. Pérez, M. Stamminger, M. Nießner, C. Theobalt. Computer Graphics Forum, 2018, 37(2): 523-550
The computer graphics and vision communities have dedicated long-standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years rapid progress has been made, which led to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB-D camera. The range of applications is vast and steadily growing as these technologies are further improving in speed, accuracy, and ease of use. Motivated by this rapid progress, this state-of-the-art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms. We provide an in-depth overview of the underlying concepts of real-world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under-constrained monocular reconstruction problem, and discuss the optimization techniques that are employed to recover dense, photo-geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, as well as image and video editing.
14.
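Application of an Adaptive UKF Algorithm to Target Tracking (cited by: 14; self-citations: 0; citations by others: 14)
To address the problem that unknown statistical characteristics of the system noise in target tracking cause the filter to diverge or to lose accuracy, an adaptive unscented Kalman filter (UKF) algorithm is proposed. During filtering, the algorithm uses an improved Sage-Husa estimator to estimate the statistical characteristics of the unknown system noise online, and it detects and suppresses filter divergence. This effectively improves the numerical stability of the filter and reduces the state estimation error. Simulation results show that, compared with the standard UKF algorithm, the adaptive UKF algorithm markedly improves the accuracy and stability of target tracking.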
15.
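To reconstruct accurately from multiple self-calibrated images in which marker points are missing, a method is proposed for 3D reconstruction and camera pose recovery based on such images. Unlike earlier methods, it uses two kinds of marker points, coded and uncoded: coded points are used for self-calibration and pose recovery of a single CCD camera, while uncoded points are used for the 3D reconstruction of 3D points. The method has three main features: (1) because marker points are identified and matched automatically, it avoids the laborious and time-consuming manual, interactive selection of image point correspondences; (2) because marker points are matched precisely, the reconstruction accuracy of the 3D points improves and meets engineering requirements; (3) because noise has little effect on the image points of the markers, the method is more robust than earlier methods. Experimental results show that the 3D points reconstructed with this method are accurate and reliable and can meet the requirements of applications such as reverse engineering.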
16.
Nicolas Lehment, Moritz Kaiser, Gerhard Rigoll. International Journal of Computer Vision, 2013, 101(3): 482-497
The observation likelihood approximation is a central problem in stochastic human pose tracking. In this article we present a new approach to quantify the correspondence between hypothetical and observed human poses in depth images. Our approach is based on segmented point clouds, enabling accurate approximations even under conditions of self-occlusion and in the absence of color or texture cues. The segmentation step extracts small regions of high saliency such as hands or arms and ensures that the information contained in these regions is not marginalized by larger, less salient regions such as the chest. To enable the rapid, parallel evaluation of many poses, a fast ellipsoid body model is used which handles occlusion and intersection detection in an integrated manner. The proposed approximation function is evaluated on both synthetic and real camera data. In addition, we compare our approximation function against the corresponding function used by a state-of-the-art pose tracker. The approach is suitable for parallelization on GPUs or multicore CPUs.
17.
We present a method for automatically estimating the motion of an articulated object filmed by two or more fixed cameras. We focus our work on the case where the quality of the images is poor, and where only an approximation of a geometric model of the tracked object is available. Our technique uses physical forces applied to each rigid part of a kinematic 3D model of the object we are tracking. These forces guide the minimization of the differences between the pose of the 3D model and the pose of the real object in the video images. We use a fast recursive algorithm to solve the dynamical equations of motion of any 3D articulated model. We explain the key parts of our algorithms: how relevant information is extracted from the images, how the forces are created, and how the dynamical equations of motion are solved. A study of what kind of information should be extracted from the images and of when our algorithms fail is also presented. Finally, we present results on tracking a person. We also show the application of our method to the tracking of a hand in sequences of images, showing that the kind of information to extract from the images depends on their quality and on the configuration of the cameras.
18.
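A stable and fast method is proposed for 3D reconstruction from moving images in camera video, and the moving images are suitably virtualized to display the reconstruction results. 3D reconstruction is performed using camera calibration based on scale-invariant feature point matching. Scale-invariant features match features in video images exceptionally well, which greatly relaxes the equipment constraints on camera calibration and widens the applicability of real-time 3D reconstruction. A series of system optimizations not only improves reconstruction accuracy and reduces the effect of mismatches on camera calibration, but also further increases processing speed. Virtualization applied on top of the 3D reconstruction demonstrates the system's reconstruction results. Experimental results show that the system is widely applicable, fairly fast, and accurate, and that it achieves 3D reconstruction from video motion images.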
19.
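Logical reasoning is an important means of formally studying identity authentication, but existing work concentrates on single aspects such as authentication mechanisms or authentication protocols and does not consider the application environment. By introducing identity authentication domains, identity authentication in application systems is described formally. On this basis, a predicate-based method for modeling and reasoning about identity authentication is proposed, comprising 7 predicates, 8 inference rules, and a 4-step reasoning method. Identity authentication models based on static passwords, dynamic passwords, and digital certificates are analyzed as examples.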
20.
Tracking and visualizing turbulent 3D features (cited by: 2; self-citations: 0; citations by others: 2)
Visualizing 3D time-varying fluid datasets is difficult because of the immense amount of data to be processed and understood. These datasets contain many evolving amorphous regions, and it is difficult to observe patterns and visually follow regions of interest. In this paper, we present a technique which isolates and tracks full-volume representations of regions of interest from 3D regular and curvilinear computational fluid dynamics datasets. Connected voxel regions ("features") are extracted from each time step and matched to features in subsequent time steps. Spatial overlap is used to determine the matching. The features from each time step are stored in octree forests to speed up the matching process. Once the features have been identified and tracked, the properties of the features and their evolutionary history can be computed. This information can be used to enhance isosurface visualization and volume rendering by color coding individual regions. We demonstrate the algorithm on four 3D time-varying simulations from ongoing research in computational fluid dynamics and show how tracking can significantly improve and facilitate the processing of massive datasets.
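The overlap-based matching step described in this abstract can be illustrated with a small sketch. It assumes each feature is already available as a set of integer voxel coordinates and simply pairs each feature with the next-time-step feature sharing the most voxels. Plain Python sets stand in for the octree forests the paper uses to accelerate matching, and the function name `match_features` is hypothetical; the sketch does not handle features that split or merge.

```python
from typing import Dict, Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def match_features(prev: Dict[str, Set[Voxel]],
                   curr: Dict[str, Set[Voxel]]) -> Dict[str, Optional[str]]:
    """Map each feature in `prev` to the `curr` feature with the largest
    voxel overlap, or to None if it overlaps nothing."""
    matches: Dict[str, Optional[str]] = {}
    for prev_id, prev_voxels in prev.items():
        best_id, best_overlap = None, 0
        for curr_id, curr_voxels in curr.items():
            overlap = len(prev_voxels & curr_voxels)  # shared voxel count
            if overlap > best_overlap:
                best_id, best_overlap = curr_id, overlap
        matches[prev_id] = best_id
    return matches


# Feature A drifts by one voxel along x; feature B disappears.
step_t = {"A": {(0, 0, 0), (1, 0, 0), (2, 0, 0)}, "B": {(10, 10, 10)}}
step_t1 = {"X": {(1, 0, 0), (2, 0, 0), (3, 0, 0)}, "Y": {(20, 20, 20)}}

print(match_features(step_t, step_t1))  # {'A': 'X', 'B': None}
```

The nested loop compares every pair of features, which is the cost a spatial index such as the octree forest is meant to reduce.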