Similar Literature
20 similar documents found.
1.
Recent methods for 2D facial landmark localization perform well on close-to-frontal faces, but 2D landmarks are insufficient to represent the 3D structure of a facial shape. For applications that require better accuracy, such as facial motion capture and 3D shape recovery, 3DA-2D (2D Projections of 3D Facial Annotations) is preferred. Inferring the 3D structure from a single image is an ill-posed problem whose accuracy and robustness are not always guaranteed. This paper aims to solve accurate 2D facial landmark localization and the transformation between 2D and 3DA-2D landmarks. One way to increase the accuracy is to input more precisely annotated facial images. Traditional cascaded regressions cannot effectively handle large or noisy training data sets. In this paper, we propose a Mini-Batch Cascaded Regressions (MBCR) method that can iteratively train a robust model from a large data set. Benefiting from the incremental learning strategy and a small learning rate, MBCR is robust to noise in the training data. We also propose a new Cross-Dimension Annotations Conversion (CDAC) method to map facial landmarks from 2D to 3DA-2D coordinates and vice versa. The experimental results show that CDAC combined with MBCR outperforms state-of-the-art methods in 3DA-2D facial landmark localization. Moreover, CDAC can run efficiently at up to 110 fps on a 3.4 GHz CPU workstation. Thus, CDAC provides a solution for transforming existing 2D alignment methods into 3DA-2D ones without slowing down the speed. Training and testing code as well as the data set can be downloaded from https://github.com/SWJTU-3DVision/CDAC.
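The abstract attributes MBCR's noise robustness to incremental updates with a small learning rate. The paper's features and model are not given here, so the following is only a minimal sketch of that idea: one cascade stage fit as a linear regressor by mini-batch gradient descent on synthetic (feature, shape-increment) pairs. All data and hyperparameters are made-up illustration values, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (feature, shape-increment) training pairs.
# In a cascaded-regression setting the features would be image descriptors
# indexed by the current landmark estimate; here we use random linear data.
n_samples, n_feat, n_coords = 2000, 32, 10
W_true = rng.normal(size=(n_feat, n_coords))
X = rng.normal(size=(n_samples, n_feat))
Y = X @ W_true + 0.1 * rng.normal(size=(n_samples, n_coords))

def train_stage_minibatch(X, Y, lr=0.01, epochs=5, batch=64):
    """One cascade stage: fit a linear regressor by mini-batch SGD.

    Incremental updates with a small learning rate are what let the
    regressor absorb large/noisy training sets without a full-batch solve.
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            b = idx[start:start + batch]
            grad = X[b].T @ (X[b] @ W - Y[b]) / len(b)
            W -= lr * grad
    return W

W = train_stage_minibatch(X, Y)
err = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative fit error: {err:.3f}")
```

A full cascade would repeat this per stage, recomputing features at the shape estimate produced by the previous stage.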

2.
Liu Feng, Chen Zhigang, Wang Jie. Multimedia Tools and Applications, 2019, 78(4): 4527-4544

Traditional image object classification and detection algorithms cannot keep pace with the demands of video image acquisition and processing. Deep learning simulates the hierarchical structure of the human brain, establishing a mapping from low-level signals to high-level semantics and thereby achieving hierarchical feature representations of data. With its powerful visual-information processing capability, deep learning has become a leading technology and an active research topic, both domestically and internationally, for addressing this challenge. To solve the problems of target spatial localization in video surveillance systems, including their time-consuming nature, we propose an algorithm based on RNN-LSTM deep learning. In addition, following the principle that OpenGL perspective imaging is consistent with photogrammetry, we use 3D scene-simulation imaging to locate the target object through the correspondence between video images and simulated images. In the 3D virtual scene, a virtual camera simulates the imaging process of the actual camera; the pixel coordinates of the surveillance target in the video image are transferred to the simulated image, and the spatial coordinates of the target are then recovered by inverting the virtual imaging process. The experimental results show that target detection achieves high accuracy, providing a useful reference for outdoor target localization from video surveillance images.
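The core geometric step, inverting the virtual camera's imaging process, can be illustrated with a simpler model than the paper's OpenGL pipeline: back-project a pixel through a pinhole camera and intersect the viewing ray with the ground plane z = 0. The intrinsics, pose and height below are made-up illustration values, not taken from the paper.

```python
import numpy as np

K = np.array([[800.0, 0, 320],
              [0, 800.0, 240],
              [0, 0, 1]])            # intrinsics of the virtual camera
R = np.diag([1.0, -1.0, -1.0])       # rotation (world -> camera): camera looks straight down
C = np.array([0.0, 0.0, 10.0])       # camera centre, 10 m above the ground

def pixel_to_ground(u, v):
    """Invert the projection: pixel -> viewing ray -> ground-plane point."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                           # rotate into world frame
    s = -C[2] / ray_world[2]                            # scale at which z = 0
    return C + s * ray_world

p = pixel_to_ground(320, 240)  # the principal point maps to the spot below the camera
print(p)
```

The inversion is only well-posed because the target is constrained to a known surface (here the ground plane); this is what removes the depth ambiguity of a single view.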


3.
In current visual SLAM methods, point-like landmarks (as in Filliat and Meyer (Cogn Syst Res 4(4):243-282, 2003), we use this expression to denote a landmark generated by a point or an object considered as punctual) are used for representation on maps. As the observation of each point-like landmark gives only angular information about a bearing camera, a covariance matrix between point-like landmarks must be estimated in order to converge to a global scale estimation. However, as the computational complexity of covariance matrices scales quadratically with the number of landmarks, the maximum number of landmarks that it is possible to use is normally limited to a few hundred. In this paper, a visual SLAM system based on the use of what are called rigid-body 3D landmarks is proposed. A rigid-body 3D landmark represents the 6D pose of a rigid body in space (position and orientation), and its observation gives full-pose information about a bearing camera. Each rigid-body 3D landmark is created from a set of N point-like landmarks by collapsing 3N state components into seven state components plus a set of parameters that describe the shape of the landmark. Rigid-body 3D landmarks are represented and estimated using so-called point-quaternions, which are introduced here. By using rigid-body 3D landmarks, the computational time of an EKF-SLAM system can be reduced by up to 5.5% as the number of landmarks increases. The proposed visual SLAM system is validated on simulated and real (outdoor) video sequences. The proposed methodology can be extended to any SLAM system based on the use of point-like landmarks, including those generated by laser measurement.

4.
In this paper we propose a system for the localization of cephalometric landmarks. Localization is carried out in two steps: deriving a smaller expectation window for each landmark using a trained neuro-fuzzy system (NFS), then applying a template-matching algorithm to pinpoint the exact location of the landmark. Four points are located on each image using edge detection. These four points are used to extract further features such as distances, shifts and rotation angles of the skull. A limited number of representative groups to be used for training is selected based on k-means clustering. The most effective features are selected based on a Fisher discriminant for each feature set. Using fuzzy linguistic if-then rules, a membership degree is assigned to each of the selected features and fed to the NFS. The NFS is trained, using gradient descent, to learn the relation between the sizes, rotations and translations of landmarks and their locations. The training data are obtained manually from one image from each cluster; images whose features lie closest to the center of their cluster are used for extracting the training data. The expected locations on target images can then be predicted using the trained NFS. For each landmark a parametric template space is constructed from a set of templates extracted from several images based on the clarity of the landmark in each image. The template is matched to the search window to find the exact location of the landmark. Decomposition of landmark shapes is used to desensitize the algorithm to size differences. The system is trained to locate 20 landmarks on a database of 565 images. Preliminary results show a recognition rate of more than 90%.

5.
In this paper, we study the integration of a new source of a priori information: the virtual 3D city model. We study this integration for two tasks: vehicle geo-localization and obstacle detection. A virtual 3D city model is a realistic representation of a vehicle's operating environment; it is a database of geographical and textured 3D data. We describe an ego-localization method that combines measurements from a GPS (Global Positioning System) receiver, odometers, a gyrometer, a video camera and a virtual 3D city model. GPS is often considered the main sensor for vehicle localization, but in urban areas GPS is imprecise or may even be unavailable. GPS data are therefore fused with odometer and gyrometer measurements using an Unscented Kalman Filter (UKF). However, during long GPS outages, localization based only on odometers and a gyrometer drifts. We therefore propose a new observation of the vehicle's location, based on matching the current image acquired by an on-board camera against the virtual 3D city model of the environment. We also propose an obstacle detection method based on comparing the image acquired by the on-board camera with the image extracted from the 3D model, using the following principle: the image acquired by the on-board camera contains any dynamic obstacles, whereas they are absent from the 3D model. The two proposed concepts are tested on real data.
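The fusion structure described above, dead-reckoning prediction corrected by a GPS fix when one is available, can be sketched with a plain linear Kalman filter on a 2D position state. This is an illustration of the predict/update pattern only; the paper uses an Unscented Kalman Filter with a richer state, and all noise values below are made up.

```python
import numpy as np

def kf_step(x, P, u, z, Q, R_gps):
    """One predict/update cycle on a 2D position state."""
    # predict: dead-reckoning with odometry displacement u
    x_pred = x + u
    P_pred = P + Q
    if z is None:                 # GPS unavailable: prediction only -> drift grows
        return x_pred, P_pred
    # update: correct the prediction with the GPS fix z
    K = P_pred @ np.linalg.inv(P_pred + R_gps)
    x_new = x_pred + K @ (z - x_pred)
    P_new = (np.eye(2) - K) @ P_pred
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
Q, R_gps = 0.1 * np.eye(2), 0.5 * np.eye(2)
x, P = kf_step(x, P, u=np.array([1.0, 0.0]),
               z=np.array([1.2, 0.1]), Q=Q, R_gps=R_gps)
print(x)
```

Passing `z=None` repeatedly shows the drift the paper targets: the covariance grows without bound until an absolute observation (GPS, or the proposed image-to-3D-model match) is injected.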

6.
An algorithm for accurate localization of facial landmarks coupled with head pose estimation from a single monocular image is proposed. The algorithm is formulated as an optimization problem in which the sum of individual landmark scoring functions is maximized with respect to the camera pose by fitting a parametric 3D shape model. The landmark scoring functions are trained by a structured output SVM classifier that takes the distance to the true landmark position into account during learning. The optimization criterion is non-convex, and we propose a robust initialization scheme that employs a global method to detect a raw but reliable initial landmark position. Self-occlusions that render landmarks invisible are handled explicitly by excluding the corresponding contributions from the data term. This allows the algorithm to operate correctly over a large range of viewing angles. Experiments on standard “in-the-wild” datasets demonstrate that the proposed algorithm outperforms several state-of-the-art landmark detectors, especially for non-frontal face images. The algorithm achieves an average relative landmark localization error below 10% of the interocular distance on 98.3% of the 300-W dataset test images.

7.
In this paper, we propose a real-time vision-based localization approach for humanoid robots using a single camera as the only sensor. In order to obtain an accurate localization of the robot, we first build an accurate 3D map of the environment. In the map computation process, we use stereo visual SLAM techniques based on non-linear least squares optimization methods (bundle adjustment). Once we have computed a 3D reconstruction of the environment, which comprises a set of camera poses (keyframes) and a list of 3D points, we learn the visibility of the 3D points by exploiting all the geometric relationships between the camera poses and 3D map points involved in the reconstruction. Finally, we use the prior 3D map and the learned visibility prediction for monocular vision-based localization. Our algorithm is very efficient, easy to implement, and more robust and accurate than existing approaches. By means of visibility prediction, we predict for a query pose only the highly visible 3D points, thus tremendously speeding up the data association between 3D map points and perceived 2D features in the image. In this way, we can solve the Perspective-n-Point (PnP) problem very efficiently, providing robust and fast vision-based localization. We demonstrate the robustness and accuracy of our approach through several vision-based localization experiments with the HRP-2 humanoid robot.
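The speed-up above comes from shrinking the candidate set before data association. As a hedged stand-in for the authors' learned visibility model, the sketch below uses a simple geometric heuristic: keep only map points whose viewing direction from the query pose is within a cone of the direction from which they were originally observed. The map, keyframe centres and angle threshold are all invented illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(-5, 5, size=(500, 3))       # 3D map points
seen_from = rng.uniform(-1, 1, size=(500, 3))    # a keyframe centre per point

def predict_visible(query_centre, max_angle_deg=30.0):
    """Return indices of points plausibly visible from query_centre."""
    to_pts = points - query_centre               # query viewing directions
    train_dirs = points - seen_from              # training viewing directions
    cos = np.einsum('ij,ij->i', to_pts, train_dirs) / (
        np.linalg.norm(to_pts, axis=1) * np.linalg.norm(train_dirs, axis=1))
    return np.flatnonzero(cos > np.cos(np.radians(max_angle_deg)))

visible = predict_visible(np.zeros(3))
print(len(visible), "of", len(points), "points kept for PnP")
```

Only the surviving subset is then matched against detected 2D features and fed to a PnP solver, which is where the tremendous reduction in data-association cost comes from.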

8.
3D Motion Reconstruction under Perspective Projection
With a single camera, the recovery of 3D information is often ill-posed; how to recover high-quality 3D human motion under perspective-projection constraints is a challenging topic in computer vision. This paper proposes an extension model: starting from an initial point, the positions of the human body joints in 3D space are obtained by successive extension, thereby recovering the motion. In contrast to existing algorithms, a set of optimality criteria is used in the search space to obtain continuous and smooth extension starting points, and a global motion-smoothness assumption is exploited during the extension process, effectively eliminating the ambiguity inherent in depth recovery. Finally, the results are applied to animation production, validating the effectiveness of the proposed algorithm.
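The depth ambiguity this abstract refers to is the classic two-fold one: under (scaled) projection, a limb of known length whose endpoints project a given distance apart admits two relative depths of opposite sign. The sketch below, not the paper's extension model, just makes that ambiguity concrete; the limb length, 2D distance and scale are illustrative numbers, and a smoothness assumption over the motion is one way to pick a consistent sign per frame.

```python
import math

def relative_depth(L, d2d, scale=1.0):
    """Two candidate depth offsets for a limb of length L whose
    endpoints project d2d apart under scale factor `scale`."""
    dz2 = L * L - (d2d / scale) ** 2
    if dz2 < 0:
        raise ValueError("2D length exceeds limb length at this scale")
    dz = math.sqrt(dz2)
    return +dz, -dz

print(relative_depth(0.5, 0.3))  # two candidate depth offsets, +/- 0.4
```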

9.
To enable real-time, person-independent 3D registration from 2D video, we developed a 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60°. From a single 2D image of a person's face, a dense 3D shape is registered in real time for each frame. The algorithm utilizes a fast cascade regression framework trained on high-resolution 3D face-scans of posed and spontaneous emotion expression. The algorithm first estimates the location of a dense set of landmarks and their visibility, then reconstructs face shapes by fitting a part-based 3D model. Because no assumptions are required about illumination or surface properties, the method can be applied to a wide range of imaging conditions that include 2D video and uncalibrated multi-view video. The method has been validated in a battery of experiments that evaluate its precision of 3D reconstruction, extension to multi-view reconstruction, temporal integration for videos and 3D head-pose estimation. Experimental findings strongly support the validity of real-time, 3D registration and reconstruction from 2D video. The software is available online at http://zface.org.

10.
Recovering 3D human body configurations using shape contexts
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process would succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently; tracking just becomes repeated recognition. We present results on a variety of data sets.
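The matching technique named above, shape contexts, describes each point on a shape by a log-polar histogram of the other points' relative positions. The following is a minimal sketch of that descriptor (following the standard shape-context formulation, not code from this paper); bin counts and radius limits are conventional illustration choices.

```python
import numpy as np

def shape_context(points, ref, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points' positions relative to ref."""
    rel = points - ref
    rel = rel[np.any(rel != 0, axis=1)]          # drop the reference point itself
    r = np.linalg.norm(rel, axis=1)
    theta = np.arctan2(rel[:, 1], rel[:, 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    t_bin = (theta / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)
    return hist / hist.sum()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
h = shape_context(pts, pts[0])
print(h.shape, h.sum())
```

Two shapes are then matched by finding the point correspondence that minimises the total histogram distance, which is what lets an exemplar's labelled joints transfer to the test shape.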

11.
This paper presents a 3D contour reconstruction approach employing a wheeled mobile robot equipped with an active laser-vision system. With observation from an onboard CCD camera, a laser line projector fixed-mounted below the camera is used for detecting the bottom shape of an object, while an actively-controlled upper laser line projector is utilized for 3D contour reconstruction. The mobile robot is driven to move around the object by a visual servoing and localization technique while the 3D contour of the object is being reconstructed based on the 2D image of the projected laser line. Asymptotical convergence of the closed-loop system has been established. The proposed algorithm also has been used experimentally with a Dr Robot X80sv mobile robot upgraded with the low-cost active laser-vision system, thereby demonstrating effective real-time performance. This seemingly novel laser-vision robotic system can be applied further in unknown environments for obstacle avoidance and guidance control tasks. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

12.
We present a novel approach to the real-time non-photorealistic rendering of 3D models in which a single hand-drawn exemplar specifies its appearance. We employ guided patch-based synthesis to achieve high visual quality as well as temporal coherence. However, unlike previous techniques that maintain consistency in one dimension (temporal domain), in our approach, multiple dimensions are taken into account to cover all degrees of freedom given by the available space of interactions (e.g., camera rotations). To enable interactive experience, we precalculate a sparse latent representation of the entire interaction space, which allows rendering of a stylized image in real-time, even on a mobile device. To the best of our knowledge, the proposed system is the first that enables interactive example-based stylization of 3D models with full temporal coherence in predefined interaction space.

13.
14.
In computer vision and image analysis, image registration between 2D projections and a 3D image that achieves high accuracy and near real-time computation is challenging. In this paper, we propose a novel method that can rapidly detect an object’s 3D rigid motion or deformation from a 2D projection image or a small set thereof. The method is called CLARET (Correction via Limited-Angle Residues in External Beam Therapy) and consists of two stages: registration preceded by shape space and regression learning. In the registration stage, linear operators are used to iteratively estimate the motion/deformation parameters based on the current intensity residue between the target projection(s) and the digitally reconstructed radiograph(s) (DRRs) of the estimated 3D image. The method determines the linear operators via a two-step learning process. First, it builds a low-order parametric model of the image region’s motion/deformation shape space from its prior 3D images. Second, using learning-time samples produced from the 3D images, it formulates the relationships between the model parameters and the co-varying 2D projection intensity residues by multi-scale linear regressions. The calculated multi-scale regression matrices yield the coarse-to-fine linear operators used in estimating the model parameters from the 2D projection intensity residues in the registration. The method’s application to Image-guided Radiation Therapy (IGRT) requires only a few seconds and yields good results in localizing a tumor under rigid motion in the head and neck and under respiratory deformation in the lung, using one treatment-time imaging 2D projection or a small set thereof.
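The regression-learning step above, relating model parameters to co-varying projection intensity residues, can be sketched at a toy scale as a single linear least-squares solve (one scale of the paper's multi-scale cascade). The residue model, dimensions and noise level below are synthetic illustration values, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_params, n_pixels, n_train = 3, 200, 400
J = rng.normal(size=(n_pixels, n_params))       # unknown parameter-to-residue model
params = rng.normal(size=(n_train, n_params))   # learning-time parameter samples
residues = params @ J.T + 0.01 * rng.normal(size=(n_train, n_pixels))

# learn the linear operator A with a residue -> parameters least-squares fit
A, *_ = np.linalg.lstsq(residues, params, rcond=None)

# at registration time, a measured residue maps straight to parameter estimates
test_params = np.array([0.5, -1.0, 2.0])
est = (test_params @ J.T) @ A                   # apply the learned operator
print(est)
```

In the actual method this estimate is applied iteratively, recomputing the DRR and residue after each parameter update, with coarse-to-fine operators at multiple scales.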

15.
Given an Internet photo collection of a landmark, we compute a 3D time-lapse video sequence where a virtual camera moves continuously in time and space. While previous work assumed a static camera, the addition of camera motion during the time-lapse creates a very compelling impression of parallax. Achieving this goal, however, requires addressing multiple technical challenges, including solving for time-varying depth maps, regularizing 3D point color profiles over time, and reconstructing high quality, hole-free images at every frame from the projected profiles. Our results show photorealistic time-lapses of skylines and natural scenes over many years, with dramatic parallax effects.

16.
In wearable visual computing, maintaining a time-evolving representation of the 3D environment along with the pose of the camera provides the geometrical foundation on which person-centred processing can be built. In this paper, an established method for the recognition of feature clusters is used on live imagery to identify and locate planar objects around the wearer. Objects’ locations are incorporated as additional 3D measurements into a monocular simultaneous localization and mapping process, which routinely uses 2D image measurements to acquire and maintain a map of the surroundings, irrespective of whether objects are present or not. Augmenting the 3D maps with automatically recognized objects enables useful annotations of the surroundings to be presented to the wearer. After demonstrating the geometrical integrity of the method, experiments show its use in two augmented reality applications.

17.
Online shopping has become quite popular since its first arrival on the internet. Although numerous studies have investigated various issues related to internet stores, research issues concerning the spatial cognition of the elderly (the fastest-growing internet user group) when exploring a 3D virtual store still await further empirical investigation. The objective of this study was to examine how elderly users acquire spatial knowledge in an on-screen virtual store. Specifically, the impact of different types of landmarks on the acquisition of spatial knowledge was examined. In addition, this study treated goods classification as an implicit landmark associated with the acquisition of spatial knowledge, observing its impact on locating goods and examining its combined effect with explicit landmarks. The experimental results indicated that landmarks of all types are important for the elderly as they attempt to locate goods within a 3D virtual store. However, landmarks are not the only resource for constructing spatial knowledge in a 3D virtual store; the classification of goods is also a good resource and may be more important than landmarks. In addition, the combined effect of goods classification and landmarks in a 2D image was best for the elderly in terms of acquired spatial cognition and the location of goods within a 3D virtual store.

18.
In this paper, we present a 3D face photography system based on a facial expression training dataset composed of both facial range images (3D geometry) and facial textures (2D photographs). The proposed system allows one to obtain a 3D geometry representation of a given face provided as a 2D photograph, which undergoes a series of transformations through the estimated texture and geometry spaces. In the training phase of the system, the facial landmarks are obtained by an active shape model (ASM) extracted from the 2D gray-level photograph. Principal component analysis (PCA) is then used to represent the face dataset, thus defining an orthonormal basis of texture and another of geometry. In the reconstruction phase, the input is a face image to which the ASM is matched. The extracted facial landmarks and the face image are fed to the PCA basis transform, and a 3D version of the 2D input image is built. Experimental tests using as training set a new dataset of 70 facial expressions belonging to ten subjects show rapid reconstruction of 3D faces that maintain spatial coherence consistent with human perception, thus corroborating the efficiency and applicability of the proposed system.
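The PCA step named in the abstract, building an orthonormal basis from training vectors and reconstructing a new sample from a few coefficients, can be sketched as follows. The data here is synthetic (random vectors standing in for flattened texture or geometry), and the dimensions are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
train = rng.normal(size=(70, 50))        # 70 training faces, 50-dim each (synthetic)
mean = train.mean(axis=0)

# SVD of the centred data gives the orthonormal principal directions
U, S, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:10]                          # keep the first 10 directions

x = train[0]
coeffs = basis @ (x - mean)              # project onto the basis
x_rec = mean + coeffs @ basis            # reconstruct from 10 coefficients
rel_err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In the paper's setting two such bases are built, one for texture and one for geometry, and reconstruction maps the coefficients extracted from the 2D input into the geometry space.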

19.
20.
We investigate the localization of a camera subject to a planar motion with horizontal optical axis in the presence of known vertical landmarks. Under these assumptions, a calibrated camera can measure the distance to the viewed landmarks. We propose to replace the trilateration method by intersecting a pair of Chasles-Apollonius circles. In the case of square pixels but unknown focal length we introduce a new method to recover the camera position from one image with three vertical landmarks. To this end we consider virtual landmarks and Apollonius-like circles. We extend this method in order to deal with an unknown principal point by using four landmarks.
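The baseline this paper proposes to replace is classical trilateration: the camera position is one of the two intersection points of circles centred on known landmarks, with radii equal to the measured distances. A minimal sketch of that baseline (with made-up landmark positions and distances):

```python
import math

def circle_intersection(c1, r1, c2, r2):
    """Intersection points of two circles (assumes they intersect)."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # offset along the centre line
    h = math.sqrt(r1 * r1 - a * a)              # offset perpendicular to it
    mx, my = c1[0] + a * dx / d, c1[1] + a * dy / d
    return ((mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d))

# camera at (0, 3): distance 3 to landmark (0, 0) and 5 to landmark (4, 0)
p1, p2 = circle_intersection((0.0, 0.0), 3.0, (4.0, 0.0), 5.0)
print(p1, p2)
```

The two-fold ambiguity is resolved by a third landmark or prior knowledge of which side the camera is on; the paper's Chasles-Apollonius construction additionally copes with unknown intrinsics (focal length, principal point).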
