Similar Literature
A total of 20 similar documents were found.
1.
A real-time algorithm for affine-structure-based video compression of facial images is presented. The face undergoing motion is segmented and triangulated to yield a set of control points. The control points generated by triangulation are tracked across a few frames using an intensity-based correlation technique. For accurate motion and structure estimation, a Kalman-filter-based algorithm is used to track features on the facial image. The structure information of the control points is transmitted only during the bootstrapping stage; after that, only the motion information is transmitted to the decoder, which reduces the number of motion parameters associated with control points in each frame. The local motion of the eyes and lips is captured using local 2-D affine transformations. For real-time implementation, a quad-tree-based search technique is adopted to solve the local correlation. Any remaining reconstruction error is accounted for using predictive encoding. Results on real image sequences demonstrate the applicability of the method.
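The local 2-D affine motion of control points described above can be sketched as follows. The 6-parameter affine model and the least-squares estimation from tracked correspondences are standard techniques; the triangle of control points and the motion parameters below are invented for illustration, not taken from the paper.

```python
import numpy as np

def apply_affine(points, A, t):
    """Map 2-D control points through a local affine motion x' = A x + t."""
    return points @ A.T + t

def estimate_affine(src, dst):
    """Least-squares estimate of (A, t) from point correspondences."""
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous rows [x y 1]
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # solves X @ sol = dst
    return sol[:2].T, sol[2]

# Hypothetical triangle of control points around an eye region.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.5]])

# Illustrative motion: small rotation plus slight scaling and a translation.
theta, s = np.deg2rad(5.0), 1.02
A = s * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
t = np.array([0.3, -0.1])
moved = apply_affine(pts, A, t)
```

With three non-collinear points the least-squares fit is exact, so the decoder-side motion parameters can be recovered from the tracked positions alone.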

2.
Research on In-Plane Rotated Face Detection and Feature Localization   Cited: 4 (self-citations: 1; other citations: 3)
吴暾华  周昌乐 《电子学报》2007,35(9):1714-1718
A method for in-plane rotated face detection and feature localization based on corner detection, the AdaBoost algorithm, and the Chan-Vese (C-V) method is proposed. First, four detectors (face, eyes, nose, mouth) are trained from samples using AdaBoost. Corners are then taken as candidate eye points; every pair of corners is enumerated to construct possible face regions, within which the face detector is applied. Next, the eye, nose, and mouth detectors locate the rectangular regions containing the facial features. Finally, the C-V method segments the feature contours from each region, yielding the positions of the key facial feature points. On the CMU rotated test set, the method achieves a detection rate of 94.6% with 24 false alarms, and the extracted feature points are accurately located. Experimental results demonstrate the effectiveness of the method.
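The corner-pair enumeration step can be sketched as below. The distance bounds and the ratio of eye distance to face box size are assumed conventions for illustration; the abstract does not specify them.

```python
from itertools import combinations

def candidate_face_regions(corners, min_dist=10, max_dist=200):
    """Enumerate pairs of corner points as candidate eye pairs and derive a
    hypothetical face bounding box (x, y, w, h) from each plausible pair."""
    regions = []
    for (x1, y1), (x2, y2) in combinations(corners, 2):
        d = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        if not (min_dist <= d <= max_dist):
            continue  # implausible inter-eye distance
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Assumed geometry: face box about 2x eye distance wide, 2.4x tall,
        # with the eyes sitting a quarter of the way down the box.
        w, h = 2.0 * d, 2.4 * d
        regions.append((cx - w / 2, cy - h / 4, w, h))
    return regions
```

Each returned box would then be passed to the trained AdaBoost face detector for verification.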

3.
In this paper, we present a probabilistic approach to determining whether extracted facial features from a video sequence are appropriate for creating a 3D face model. In our approach, the distance between two feature points selected from the MPEG-4 facial object is defined as a random variable for each node of a probability network. To avoid generating an unnatural or non-realistic 3D face model, automatically extracted 2D facial features from a video sequence are fed into the proposed probabilistic network before a corresponding 3D face model is built. Simulation results show that the proposed probabilistic network can be used as a quality control agent to verify the correctness of extracted facial features.
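One simple way to realize such a quality-control check is to model each inter-feature distance as a Gaussian random variable and reject feature sets whose joint log-likelihood is too low. The distance names, means, standard deviations, and threshold below are invented for illustration; the paper's actual network structure is not reproduced here.

```python
import math

# Assumed statistics (mean, std) for normalized inter-feature distances,
# e.g. eye-to-eye and eye-to-mouth, measured in face-width units.
DISTANCE_STATS = {"eye_eye": (0.45, 0.05), "eye_mouth": (0.55, 0.06)}

def log_likelihood(distances):
    """Sum of Gaussian log-densities over the measured distances."""
    ll = 0.0
    for name, d in distances.items():
        mu, sigma = DISTANCE_STATS[name]
        ll += -0.5 * ((d - mu) / sigma) ** 2 \
              - math.log(sigma * math.sqrt(2 * math.pi))
    return ll

def features_plausible(distances, threshold=-5.0):
    """Accept the feature set only if its joint log-likelihood is high enough."""
    return log_likelihood(distances) >= threshold
```

A feature set with distances near the assumed means passes, while a grossly distorted one is rejected before any 3D model is built.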

4.
A Fast Facial Feature Localization Algorithm Based on Skin-Color Detection   Cited: 3 (self-citations: 1; other citations: 2)
Targeting the characteristics of video applications and combining facial skin color with the geometric distribution of facial features, a facial feature (eyes, nose, mouth) localization algorithm for faces in video sequences is proposed. Experiments show that the algorithm localizes features quickly with a low false-detection rate.
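The abstract gives no formula for the skin-color test; a commonly used explicit RGB skin rule (a standard heuristic with daylight-illumination thresholds, assumed here rather than taken from the paper) can sketch the idea:

```python
def is_skin_rgb(r, g, b):
    """Classic explicit RGB skin-color rule (uniform-daylight variant).
    Thresholds are a widely used heuristic, not this paper's model."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and   # enough color spread
            abs(r - g) > 15 and r > g and r > b)   # red-dominant pixel
```

Running this per pixel yields a binary skin mask; connected skin regions then serve as face candidates within which the geometric feature search proceeds.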

5.
Automatic facial feature extraction by genetic algorithms   Cited: 12 (self-citations: 0; other citations: 12)
An automatic facial feature extraction algorithm is presented. The algorithm is composed of two main stages: a face region estimation stage and a feature extraction stage. In the face region estimation stage, a second-chance region growing method is adopted to estimate the face region of a target image. In the feature extraction stage, genetic search algorithms are applied to extract the facial feature points within the face region. Simulation results show that the proposed algorithm can automatically and accurately extract facial features with limited computational complexity.
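A toy genetic search over candidate feature-point coordinates illustrates the second stage. The fitness function here is a stand-in for the image-based matching cost the paper would use, and all GA parameters (population size, mutation scale, target position) are assumptions.

```python
import random

def genetic_search(fitness, bounds, pop_size=30, generations=60, seed=0):
    """Minimal GA: keep the best half as elites, then refill the population
    with uniform-crossover children perturbed by Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [(x if rng.random() < 0.5 else y) for x, y in zip(a, b)]
            child = [min(max(v + rng.gauss(0, 0.5), lo), hi)
                     for v, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical fitness peaked at a "true" eye position (30, 12) in a 64x64 face region.
target = (30.0, 12.0)
fit = lambda p: -((p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2)
best = genetic_search(fit, bounds=[(0, 64), (0, 64)])
```

Because elites survive unchanged, the best fitness never decreases, and the search homes in on the peak within a few dozen generations.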

6.
Intelligently tracking objects with varied shapes, colors, lighting conditions, and backgrounds is extremely useful in many HCI applications, such as human body motion capture, hand gesture recognition, and virtual reality (VR) games. However, accurately tracking different objects under uncontrolled environments is a tough challenge due to possibly dynamic object parts, varied lighting conditions, and sophisticated backgrounds. In this work, we propose a novel semantically-aware object tracking framework whose key is a weakly-supervised learning paradigm that optimally transfers video-level semantic tags into various regions. More specifically, given a set of training video clips, each associated with multiple video-level semantic tags, we first propose a weakly-supervised learning algorithm to transfer the semantic tags into various video regions. The key is a MIL (Zhong et al., 2020) [1]-based manifold embedding algorithm that maps the video regions into a semantic space in which the video-level semantic tags are well encoded. Afterward, each video region is represented by its semantic feature combined with its appearance feature, and a multi-view learning algorithm is designed to optimally fuse the two types of features. Based on the fused feature, we learn a probabilistic Gaussian mixture model to predict the target probability of each candidate window, and the window with the maximal probability is output as the tracking result. Comprehensive comparative results on a challenging pedestrian tracking task as well as human hand gesture recognition demonstrate the effectiveness of our method. Moreover, visualized tracking results show that non-rigid objects with moderate occlusions can be well localized by our method.

7.
Scalable low bit-rate video coding is vital for the transmission of video signals over wireless channels, and a scalable model-based video coding scheme is proposed in this paper to achieve it. The paper mainly addresses automatic scalable face model design. First, a robust and adaptive face segmentation method based on piecewise skin-colour distributions is proposed; 43 million skin pixels from 900 images are used to train the skin-colour model, which can identify skin-colour pixels reliably under different lighting conditions. Next, reliable algorithms are proposed for detecting the eyes, mouth and chin, which are used to verify face candidates. Then, based on the detected facial features and human facial muscle distributions, a heuristic scalable face model is designed to represent the rigid and non-rigid motion of the head and facial features. A novel motion estimation algorithm is proposed to estimate the object model motion hierarchically. Experimental results illustrate the performance of the proposed facial feature detection algorithms and the accuracy of the designed scalable face model for representing face motion.

8.
Attention modules embedded in deep networks mediate the selection of informative regions for object recognition. In addition, combining features learned from different branches of a network can enhance their discriminative power. However, fusing features with inconsistent scales is a less-studied problem. In this paper, we first propose a multi-scale channel attention network with an adaptive feature fusion strategy (MSCAN-AFF) for face recognition (FR), which fuses the relevant feature channels and improves the network's representational power. In FR, face alignment is performed independently prior to recognition and requires efficient localization of facial landmarks, which might be unavailable in uncontrolled scenarios such as low resolution and occlusion. Therefore, we propose using our MSCAN-AFF to guide a Spatial Transformer Network (MSCAN-STN) to align feature maps learned from an unaligned training set in an end-to-end manner. Experiments on benchmark datasets demonstrate the effectiveness of the proposed MSCAN-AFF and MSCAN-STN.

9.
We present a novel and practical way to integrate techniques from computer vision into low bit-rate coding systems for video teleconferencing applications. Our focus is to locate and track the faces and selected facial features of persons in typical head-and-shoulders video sequences, and to exploit the location information in a 'classical' video coding/decoding system. The motivation is to enable the system to encode various image areas selectively and to produce perceptually pleasing coded images in which faces are sharper. We refer to this approach, a mix of classical waveform coding and model-based coding, as model-assisted coding. We propose two fully automatic algorithms which, respectively, detect a head outline and identify an 'eyes-nose-mouth' region, both from downsampled binary thresholded edge images. The algorithms operate accurately and robustly, even in cases of significant head rotation or partial occlusion by moving objects. We show how information about face and facial feature location can be exploited by low bit-rate waveform-based video coders. In particular, we describe a method of object-selective quantizer control in a standard coding system based on the motion-compensated discrete cosine transform (CCITT Recommendation H.261). The approach is based on two novel algorithms: buffer rate modulation and buffer size modulation. By forcing the rate control algorithm to transfer a fraction of the total available bit-rate from the coding of the non-facial area to that of the facial area, the coder produces images with better-rendered facial features, i.e. coding artefacts in the facial area are less pronounced and eye contact is preserved. The improvement was found to be perceptually significant on video sequences coded at the ISDN rate of 64 kbps, with 48 kbps for the input (color) video signal in QCIF format.
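The buffer rate modulation idea, transferring a fraction of the bit budget from non-facial to facial macroblocks, can be sketched as simple budget arithmetic. The 20% transfer fraction and 15% facial-area coverage are assumed parameters; the paper's actual rate-control coupling to the H.261 buffer is not modeled here.

```python
def modulate_rates(total_bits, facial_fraction, transfer=0.2):
    """Split a frame's bit budget between facial and non-facial areas,
    then move a fraction of the non-facial share to the facial area."""
    facial = total_bits * facial_fraction
    non_facial = total_bits - facial
    moved = non_facial * transfer
    return facial + moved, non_facial - moved

# 48 kbps video budget (of a 64 kbps ISDN channel); faces cover ~15% of the frame.
face_bits, bg_bits = modulate_rates(48_000, 0.15)
```

The total budget is conserved, but the facial area's share roughly doubles, which is what yields the sharper facial rendering described above.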

10.
The authors' goal is to generate a virtual space close to the real communication environment between network users or between humans and machines. There should be an avatar in cyberspace that projects the features of each user, with a realistic texture-mapped face, to generate facial expressions and actions controlled by a multimodal input signal. Users can also get a view of cyberspace through the avatar's eyes, so they can communicate with each other by gaze crossing. A face fitting tool based on multi-view camera images is introduced to build a realistic three-dimensional (3-D) face model with texture and geometry very close to the original. This fitting tool is a GUI-based system using simple mouse operations to pick each feature point on the face contour and facial parts, enabling easy construction of a 3-D personal face model. When an avatar is speaking, the voice signal is essential in determining the mouth shape. Therefore, a real-time mouth shape control mechanism is proposed that uses a neural network to convert speech parameters to lip shape parameters; the network realizes an interpolation between specific mouth shapes given as learning data. The emotional factor can sometimes be captured from speech parameters, and this media conversion mechanism is described. For dynamic modeling of facial expression, a muscle structure constraint is introduced to generate facial expressions naturally with few parameters. The authors also obtain muscle parameters automatically from local motion vectors on the face calculated by optical flow in a video sequence.

11.
12.
Automatic semantic video object extraction is an important step toward content-based video coding, indexing and retrieval. However, it is very difficult to design a generic semantic video object extraction technique that can provide varied semantic video objects using the same function. Since the presence or absence of persons in an image sequence provides important clues about video content, automatic face detection and human object generation are very attractive for content-based video database applications. For this reason, we propose a novel face detection and semantic human object generation algorithm. Homogeneous image regions with accurate boundaries are first obtained by integrating the results of color edge detection and region growing procedures. Human faces are detected from these homogeneous image regions using skin color segmentation and facial filters. The detected faces are then used as object seeds for semantic human object generation. The correspondences of the detected faces and semantic human objects along the time axis are further exploited by a contour-based temporal tracking procedure.

13.
A 3D facial reconstruction and expression modeling system which creates 3D video sequences of test subjects and facilitates interactive generation of novel facial expressions is described. Dynamic 3D video sequences are generated using computational binocular stereo matching with active illumination and are used for interactive expression modeling. An individual's 3D video set is annotated with control points associated with face subregions. Dragging a control point updates texture and depth in only the associated subregion so that the user generates new composite expressions unseen in the original source video sequences. Such an interactive manipulation of dynamic 3D face reconstructions requires as little preparation on the test subject as possible. Dense depth data combined with video-based texture results in realistic and convincing facial animations, a feature lacking in conventional marker-based motion capture systems.

14.
To improve multi-pose robust face recognition under complex lighting conditions, a method combining the wavelet transform with LBP is proposed. A two-level 2-D discrete wavelet decomposition of the face image extracts the low-frequency component; denoising is realized by reconstructing the initial image, and illumination compensation is completed after the low-frequency lighting component is filtered out. The compensated image is decomposed again, the LBP operator describes the robust local texture of each sub-image, and the histogram features of the sub-images are extracted and concatenated to obtain the face's LBP texture feature. Feature distances are computed statistically and classified with a K-nearest-neighbor classifier. Tests on the Yale-B and AR face databases show that the method is robust to complex illumination, recognizes faces with high accuracy and efficiency, and achieves good overall recognition performance.
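The LBP operator used above can be sketched as the standard 8-neighbour version: each neighbour is thresholded against the centre pixel and the results are packed into an 8-bit code. The clockwise bit ordering is an assumed convention; the paper's exact LBP variant is not specified in the abstract.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP over a grayscale array: threshold each neighbour
    against the centre pixel and pack the bits (clockwise from top-left)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (neigh >= center).astype(int) << (7 - bit)
    return code
```

Histogramming these codes per sub-image and concatenating the histograms yields the texture feature described in the abstract.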

15.
This paper describes a novel method for the segmentation of faces, extraction of facial features, and tracking of the face contour and features over time. Robust segmentation of faces out of complex scenes is done based on color and shape information. Additionally, face candidates are verified by searching for facial features in the interior of the face. As facial features of interest we employ eyebrows, eyes, nostrils, mouth and chin, and incomplete feature constellations are considered as well. Once a face and its features are detected reliably, we track the face contour and the features over time. Face contour tracking is done using deformable models such as snakes, while facial feature tracking is performed by block matching. The approach was verified by evaluating 38 different color image sequences containing beards, glasses and changing facial expressions.
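The block-matching step for feature tracking can be sketched with an exhaustive sum-of-absolute-differences (SAD) search. The block size, search radius, and synthetic test frames below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def track_block(prev, curr, top, left, size=8, radius=4):
    """Find the displacement (dy, dx) of a size x size block from `prev`
    in `curr` by exhaustive SAD search within +/- radius pixels."""
    template = prev[top:top + size, left:left + size].astype(float)
    best, best_dyx = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > curr.shape[0] or x + size > curr.shape[1]:
                continue  # candidate window falls outside the frame
            sad = np.abs(curr[y:y + size, x:x + size].astype(float) - template).sum()
            if sad < best:
                best, best_dyx = sad, (dy, dx)
    return best_dyx

# Synthetic demo: a bright 8x8 patch shifted down 2 px and right 1 px.
prev = np.zeros((32, 32)); prev[10:18, 10:18] = 255.0
curr = np.roll(np.roll(prev, 2, axis=0), 1, axis=1)
dy, dx = track_block(prev, curr, 10, 10)
```

In a real tracker the block would be re-anchored each frame at the previous match so the feature is followed through the sequence.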

16.
The Active Shape Model (ASM) is a powerful statistical tool for extracting the facial features of a face image under frontal view. It mainly relies on Principal Component Analysis (PCA) to statistically model the variability in the training set of example shapes. Independent Component Analysis (ICA) has been proven more efficient than PCA for extracting face features. In this paper, we combine PCA and ICA in a consecutive strategy to form a novel ASM. First, an initial model, which captures the global shape variability in the training set, is generated by the PCA-based ASM. Then the final shape model, which contains more local characteristics, is established by the ICA-based ASM. Experimental results verify that the accuracy of facial feature extraction is statistically significantly improved by applying the ICA modes after the PCA modes.
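The PCA stage of an ASM can be sketched with numpy: stack aligned training shapes, take the top singular vectors of the centred data matrix as variation modes, and synthesize a shape from a few mode weights. The toy training shapes (a square whose first coordinate varies) are invented for illustration.

```python
import numpy as np

def build_pca_shape_model(shapes, n_modes=2):
    """shapes: (N, 2k) array of aligned landmark vectors (x1, y1, ..., xk, yk).
    Returns the mean shape and the top n_modes principal variation modes."""
    mean = shapes.mean(axis=0)
    # SVD of the centred data matrix yields the PCA modes directly.
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_modes]

def synthesize(mean, modes, weights):
    """ASM shape equation: x = x_mean + sum_i w_i * p_i."""
    return mean + np.asarray(weights) @ modes

# Toy training set: a unit square whose first x-coordinate varies (assumed data).
base = np.array([0., 0., 1., 0., 1., 1., 0., 1.])
delta = np.zeros(8); delta[0] = 1.0
shapes = np.stack([base + t * delta for t in (-1., 0., 1.)])
mean, modes = build_pca_shape_model(shapes, n_modes=1)
```

The consecutive strategy in the paper would then re-model the PCA residual with ICA modes; that second stage is not sketched here.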

17.
Reliable tracking of facial features in semantic-based video coding   Cited: 1 (self-citations: 0; other citations: 1)
A new method of tracking the positions of important facial features for semantic-based moving image coding is presented. Reliable and fast tracking of facial features in head-and-shoulders scenes is of paramount importance for reconstructing the speaker's motion in videophone systems. The proposed method is based on eigenvalue decomposition of sub-images extracted from subsequent frames of the video sequence. The motion of each facial feature (the left eye, the right eye, the nose and the lips) is tracked separately, which means the algorithm can be easily adapted for a parallel machine. No restrictions, other than the presence of the speaker's face, were imposed on the actual contents of the scene. The algorithm was tested on numerous widely used head-and-shoulders video sequences containing moderate head pan, rotation and zoom, with remarkably good results; tracking was maintained even when the facial features were occluded. The algorithm can also be used in other semantic-based systems.

18.
A unified feature model is proposed that integrates two novel mid-level features based on dominant-color segmentation and motion-vector reliability analysis. The model adaptively selects different features and corresponding classification boundaries according to the dominant-color segmentation result, and adaptively determines classification thresholds in a support vector machine through temporal statistical features. Experiments on a large soccer video dataset show that the algorithm outperforms existing methods in detecting both cuts and gradual transitions, essentially meeting the application requirements of sports video retrieval.

19.
Facial Beauty Prediction Based on Dual-Activation-Layer Deep Convolutional Features   Cited: 2 (self-citations: 0; other citations: 2)
Facial beauty prediction currently suffers from small datasets, high classification difficulty, and insufficient study of deep features. This paper proposes a solution based on dual-activation-layer deep convolutional features. First, data augmentation and face alignment are used to enlarge the training set and improve database quality. Second, an improved CNN model with a dual activation layer is proposed, making it better suited to facial beauty prediction. Experimental results show that the proposed method substantially outperforms traditional facial beauty prediction methods in both classification and regression, and achieves good real-time performance and accuracy among mainstream CNN models, with a classification accuracy of 61.1% and a regression correlation of 0.8546 on a 2000-image test set. The dual activation layer thus plays an important role in learning deep facial beauty features and can be widely applied to face image recognition and processing.

20.
With the wide deployment of video surveillance equipment, person re-identification has become a key task in intelligent video surveillance, with broad application prospects. This paper proposes a person re-identification algorithm based on deep decomposition network (DDN) foreground extraction and mapping model learning. First, the DDN model segments the foreground of each pedestrian image; color histogram features of the foreground and Gabor texture features of the original image are then extracted. Using these features, cross-camera mapping models are learned, and the learned models transform the features of the query and gallery sets into a space with more consistent feature distributions for distance measurement and ranking. Experiments show that the algorithm extracts relatively robust pedestrian features, overcomes background interference, and effectively improves the re-identification matching rate.
