
Posed and spontaneous expression distinction through multi-task and adversarial learning
Cite this article: Zheng Zhuangqiang, Jiang Qisheng, Wang Shangfei. Posed and spontaneous expression distinction through multi-task and adversarial learning[J]. Journal of Image and Graphics, 2020, 25(11): 2370-2379
Authors: Zheng Zhuangqiang  Jiang Qisheng  Wang Shangfei
Affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; School of Data Science, University of Science and Technology of China, Hefei 230027, China
Funding: Key Research and Development Program of Anhui Province (1804a09020038)
Abstract: Objective Extracting facial features that are independent of subject identity and modeling the spatio-temporal patterns of facial behavior are the core problems in posed and spontaneous expression distinction, yet existing work has not addressed both at once. To this end, this paper proposes a method for posed and spontaneous expression distinction that combines multi-task learning and adversarial learning, using them to capture the spatio-temporal patterns of facial behavior and to learn identity-independent facial features, thereby enabling effective discrimination between posed and spontaneous expressions. Method The proposed method consists of four components: a feature extractor, a multi-task learner, an identity discriminator, and a multi-task discriminator. The feature extractor obtains features relevant to posed and spontaneous expressions. The identity discriminator supervises the feature extractor so that the learned features are uncorrelated with identity labels. The multi-task learner predicts both the landmark displacements between the expression apex frame and the onset frame and the expression category, and tries to fool the multi-task discriminator. The multi-task discriminator judges whether its input is a ground-truth or a predicted pair of landmark displacements and expression category. Adversarial learning between the multi-task learner and the multi-task discriminator captures the spatio-temporal patterns of facial behavior, while the joint training of the feature extractor with the multi-task learner and the identity discriminator yields facial features that are related to facial behavior but independent of subject identity. Result Experimental results on the MMI (M&M initiative), NVIE (natural visible and infrared facial expression), and BioVid (biopotential and video) datasets show that the proposed method learns features with low correlation to subject identity and, by jointly predicting landmark displacements and expression categories, effectively captures the spatio-temporal patterns of posed and spontaneous expressions, thereby achieving good distinction performance. Conclusion Experiments show that the proposed adversarial network not only learns subject-independent yet expression-related facial features but also captures the spatio-temporal patterns of facial behavior, and both substantially improve posed and spontaneous expression distinction.
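The landmark-displacement regression target described above, i.e. the offset of each facial landmark from the expression onset frame to the apex frame, can be sketched as follows. This is a minimal illustrative NumPy snippet under assumed 2D landmark arrays, not the authors' implementation:

```python
import numpy as np

def landmark_displacements(onset_landmarks, apex_landmarks):
    """Per-landmark (dx, dy) offsets from the onset frame to the apex frame.

    onset_landmarks, apex_landmarks: (N, 2) arrays of 2D facial landmarks.
    Returns an (N, 2) array used as the multi-task regression target.
    """
    onset = np.asarray(onset_landmarks, dtype=float)
    apex = np.asarray(apex_landmarks, dtype=float)
    assert onset.shape == apex.shape and onset.shape[1] == 2
    return apex - onset

# Toy example with 3 landmarks: only the third point (e.g. a mouth corner) moves.
onset = np.array([[10.0, 20.0], [30.0, 20.0], [20.0, 40.0]])
apex = np.array([[10.0, 20.0], [30.0, 20.0], [23.0, 38.0]])
disp = landmark_displacements(onset, apex)  # -> [[0, 0], [0, 0], [3, -2]]
```

In the full method these displacement vectors, together with the expression category, form the output the multi-task learner must predict and the multi-task discriminator must judge.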

Keywords: posed and spontaneous expression distinction  adversarial learning  multi-task learning  spatio-temporal patterns of facial behavior  subject-independent facial features
Received: 2020-06-05
Revised: 2020-09-03

Posed and spontaneous expression distinction through multi-task and adversarial learning
Zheng Zhuangqiang,Jiang Qisheng,Wang Shangfei. Posed and spontaneous expression distinction through multi-task and adversarial learning[J]. Journal of Image and Graphics, 2020, 25(11): 2370-2379
Authors:Zheng Zhuangqiang  Jiang Qisheng  Wang Shangfei
Affiliation:School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China;School of Data Science, University of Science and Technology of China, Hefei 230027, China
Abstract: Objective Posed and spontaneous expression distinction is a major problem in facial expression analysis. Posed expressions are deliberately performed to mislead or deceive others, whereas spontaneous expressions occur naturally. Because posed expressions are deliberately faked, the difference between the posed and spontaneous versions of the same expression by the same person is small. At the same time, posed and spontaneous expression distinction suffers from high intra-class variation caused by individual differences. These factors make the task difficult. However, behavioral studies have shown that significant differences exist between posed and spontaneous expressions in their spatial patterns. For example, compared with spontaneous smiles, contraction of the zygomatic muscles is more likely to be asymmetric in posed smiles. Moreover, contraction of the orbicularis oculi muscle is present in spontaneous smiles but absent in posed ones. Such inherent spatial patterns can be exploited to facilitate posed and spontaneous expression distinction. Therefore, modeling the spatial patterns inherent in facial behavior and extracting subject-independent facial features are both important for this task. Previous works typically focused on modeling spatial patterns in facial behavior. Because the motion patterns of facial muscles are difficult to obtain directly, researchers commonly use facial landmarks to approximate them and capture the spatial patterns inherent in facial behavior from landmark information. According to how they model these spatial patterns, studies on posed and spontaneous expression distinction can be categorized into two approaches: feature-based and probabilistic graphical model (PGM)-based methods.
Feature-based methods implicitly capture spatial patterns using handcrafted low-level features or deep features extracted by deep convolutional networks. However, handcrafted low-level features struggle to describe the complex spatial patterns inherent in facial behavior. PGM-based methods model the distribution over landmarks and explicitly capture the spatial patterns in facial behavior using PGMs. However, PGMs frequently simplify model inference and computation through independence or energy-distribution assumptions, which are sometimes inconsistent with the true data distribution. Moreover, PGM-based methods typically rely on handcrafted low-level features and thus suffer from similar drawbacks. To address these problems, an adversarial network for posed and spontaneous expression distinction is proposed. Method On the one hand, we use landmark displacements between onset frames and the corresponding apex frames to approximate the motion patterns of facial muscles and explicitly capture the spatial patterns inherent in facial behavior by modeling the joint distribution of expressions and landmark displacements. On the other hand, we alleviate the problem of high intra-class variation by extracting subject-independent features. Specifically, the proposed adversarial network consists of a feature extractor, a multitask learner, a multitask discriminator, and a feature discriminator. The feature extractor learns facial features that are discriminative for posed and spontaneous expression distinction and robust to subject variation. The multitask learner simultaneously classifies posed and spontaneous expressions and predicts facial landmark displacements. The multitask discriminator distinguishes the predicted expressions and landmark displacements from the ground-truth ones. The feature discriminator is a subject classifier used to measure how dependent the extracted facial features are on subject identities.
The feature extractor is trained cooperatively with the multitask learner but adversarially against the feature discriminator. Thus, the feature extractor learns facial features that are good for expression distinction and landmark displacement regression but not for subject recognition. The multitask learner competes with the multitask discriminator, and through this adversarial learning the distribution of predicted expressions and landmark displacements converges to that of the ground-truth labels. Spatial patterns can therefore be thoroughly exploited for posed and spontaneous expression distinction. Result Experimental results on three benchmark datasets, i.e., MMI (M&M initiative), NVIE (natural visible and infrared facial expression), and BioVid (biopotential and video), demonstrate that the proposed adversarial network not only effectively learns subject-independent, expression-discriminative facial features, improving the generalization of the model to unseen subjects, but also makes full use of the spatial and temporal patterns inherent in facial behavior, leading to superior performance compared with state-of-the-art methods. Conclusion Experiments demonstrate the effectiveness of the proposed method.
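The cooperative and adversarial objectives described above can be sketched at the level of loss composition, with toy linear maps standing in for the four components. Everything below, including the names, dimensions, and trade-off weight `lam`, is an illustrative assumption rather than the authors' implementation; the sign flip on the subject-classification loss plays the role of training the feature extractor adversarially against the feature discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the four components: each is a single linear map.
W_extract = rng.normal(size=(16, 8))   # feature extractor
W_expr = rng.normal(size=(8, 2))       # multitask head: posed vs. spontaneous
W_disp = rng.normal(size=(8, 10))      # multitask head: landmark displacements
W_subj = rng.normal(size=(8, 5))       # feature discriminator: subject classifier

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

x = rng.normal(size=(4, 16))           # a batch of 4 face representations
expr_labels = np.array([0, 1, 0, 1])   # 0 = posed, 1 = spontaneous
subj_labels = np.array([0, 1, 2, 3])   # subject identities
true_disp = rng.normal(size=(4, 10))   # ground-truth landmark displacements

feat = x @ W_extract
loss_expr = cross_entropy(softmax(feat @ W_expr), expr_labels)
loss_disp = np.mean((feat @ W_disp - true_disp) ** 2)
loss_subj = cross_entropy(softmax(feat @ W_subj), subj_labels)

lam = 0.1  # cooperation/adversary trade-off weight (an assumption)
# Extractor objective: cooperate with the multitask heads while opposing the
# subject classifier; minimizing this pushes features away from identity cues.
loss_extractor = loss_expr + loss_disp - lam * loss_subj
```

In the paper's full setup the multitask learner additionally competes with a multitask discriminator over the joint distribution of expressions and displacements; that GAN-style term is omitted here for brevity.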
Keywords:posed and spontaneous expression distinction  adversarial learning  multi-task learning  spatial and temporal patterns of facial behavior  subject-independent facial feature