Face reconstruction from voice based on age-supervised learning and face prior information


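Cite this article:	HE Li, PANG Shan-min. Face reconstruction from voice based on age-supervised learning and face prior information[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(5): 1006-1016. DOI: 10.3785/j.issn.1008-973X.2022.05.018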

Author names:	Li HE, Shan-min PANG

Affiliation:	School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China

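Funding:	National Natural Science Foundation of China (61972312); Key Research and Development Program of Shaanxi Province, General Industrial Project (2020GY-002)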

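Abstract:	Voice-to-face image reconstruction methods lack supervision constraints from different dimensions and do not exploit face prior information, so the generated images have low similarity to the real images. To address this problem, a voice-to-face image reconstruction method combining age supervision and face prior information was proposed. A pre-trained age estimation model was used to augment the current dataset with age data, compensating for the lack of age supervision. A voice-face cross-modal identity matching method was used to retrieve, for a given voice sample, face images close to the real face, and the retrieved images were used as face prior information. By defining a joint loss function that combines the cross-entropy loss and the adversarial loss, the method improves the quality of the reconstructed images in a balanced way in terms of perceived age, low-frequency content and local texture. The method was tested through face retrieval experiments on the Voxceleb 1 dataset and compared with current mainstream methods. The results show that the method effectively improves the similarity between generated and real images, and that the generated images achieve better subjective and objective evaluation results.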
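Keywords:	deep learning; image reconstruction; convolutional neural network; generative adversarial network; face prior information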
Face reconstruction from voice based on age-supervised learning and face prior information

Li HE, Shan-min PANG. Face reconstruction from voice based on age-supervised learning and face prior information[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(5): 1006-1016. DOI: 10.3785/j.issn.1008-973X.2022.05.018

Authors:	Li HE, Shan-min PANG

Abstract:	Existing voice-to-face image reconstruction methods lack effective supervision constraints from different dimensions and do not use face prior information, which can lead to low similarity between reconstructed and real images. To address this, a face reconstruction method based on age-supervised learning and face prior information was proposed. Age-related data were added to the dataset through a pre-trained age estimation model, which strengthened age supervision. For a given voice sample, voice-face cross-modal identity matching was applied to retrieve face images similar to the real speaker, and the retrieved images were used as face prior information. A joint loss function consisting of the cross-entropy loss and the adversarial loss was defined to improve the age consistency, low-frequency content and high-frequency textures of the reconstructed images. Face retrieval experiments on the Voxceleb 1 dataset showed that the proposed method improves the similarity between generated and ground-truth images. The images generated by the proposed method achieve better subjective and objective evaluation results than those of the compared methods.

Keywords:	deep learning; image reconstruction; convolutional neural network; generative adversarial network; face prior information
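
The abstract describes a joint loss that combines a cross-entropy term (age supervision) with an adversarial term. The following is a minimal sketch of how two such terms might be combined for the generator, not the authors' implementation. The function names, the weights `lambda_age` and `lambda_adv`, the use of discrete age groups, and the non-saturating form of the adversarial term are all assumptions made for illustration; the abstract does not specify them, and the full method may include further terms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def age_cross_entropy(age_logits, age_label):
    """Cross-entropy between predicted age-group logits and a pseudo-label.

    Assumption: age is treated as a class index, with the label produced
    by a pre-trained age estimator.
    """
    probs = softmax(age_logits)
    return -math.log(probs[age_label])


def generator_adversarial_loss(disc_prob_fake, eps=1e-12):
    """Non-saturating generator loss, -log D(G(voice)).

    Assumption: the discriminator outputs a probability in (0, 1] that
    the generated face is real.
    """
    return -math.log(max(disc_prob_fake, eps))


def joint_loss(age_logits, age_label, disc_prob_fake,
               lambda_age=1.0, lambda_adv=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (lambda_age * age_cross_entropy(age_logits, age_label)
            + lambda_adv * generator_adversarial_loss(disc_prob_fake))


# Example: four equally likely age groups and an undecided discriminator.
loss = joint_loss([0.0, 0.0, 0.0, 0.0], age_label=2, disc_prob_fake=0.5)
```

In this sketch the age term penalizes a generated face whose predicted age group disagrees with the pseudo-label, while the adversarial term rewards faces that the discriminator judges realistic; how the two should be weighted is left open here.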
