Cross-view Image Generation via Mixture Generative Adversarial Network |
| |
Cite this article: | Wei Xing, Li Jia, Sun Xiao, Liu Shao-Fan, Lu Yang. Cross-view image generation via mixture generative adversarial network[J]. Acta Automatica Sinica, 2021, 47(11): 2623-2636. |
| |
Authors: | Wei Xing, Li Jia, Sun Xiao, Liu Shao-Fan, Lu Yang |
| |
Affiliation: | 1. School of Computer and Information, Hefei University of Technology, Hefei 230601 |
| |
Funding: | Supported by the Joint Fund of the Anhui Provincial Natural Science Foundation (2008085UD08), the Key Research and Development Program of Anhui Province (201904d08020008, 202004a05020004), and the Intelligent Networking and New Energy Vehicle Technology Transfer and Industrialization Project of the Intelligent Manufacturing Institute of Hefei University of Technology (IMIWL2019003, IMIDC2019002) |
| |
Abstract: | Multi-view image generation, i.e., generating images of multiple other viewpoints from an image of one viewpoint, is a fundamental problem in areas such as multi-view display and object modeling in virtual reality, and has attracted wide attention from researchers. In recent years, generative adversarial networks (GAN) have achieved promising results on multi-view image generation tasks, but current mainstream methods are limited to fixed domains, are difficult to transfer to other scenes, and produce images that suffer from blur and distortion. To address this, this paper proposes ViewGAN, a multi-view image generation model based on a mixture generative adversarial network; it comprises multiple generators and one multi-class discriminator, and can be flexibly transferred to multiple multi-view generation scenarios. In ViewGAN, the multiple generators are trained simultaneously, each aiming to generate images of a different view. In addition, this paper proposes a penalty mechanism based on Monte Carlo search that pushes each generator to produce high-quality images, so that each generator focuses on generating images of its designated view. Extensive experiments on the DeepFashion, Dayton, and ICG Lab6 datasets show that our model outperforms current mainstream models on Inception score and Top-k accuracy, and improves structural similarity (SSIM) by 32.29%, peak signal-to-noise ratio (PSNR) by 14.32%, and sharpness difference (SD) by 10.18%.
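The abstract above describes ViewGAN's core idea: several view-specific generators trained against one multi-class discriminator, with a Monte Carlo search penalty steering each generator toward its assigned view. As a rough illustration only (this is not the authors' implementation; the rollout count, the toy generator/discriminator stand-ins, and the penalty form are invented placeholders), the penalty step might be sketched as:

```python
import random

def monte_carlo_penalty(generator, discriminator, view_id, n_rollouts=8):
    """Estimate how well `generator` covers its assigned view by sampling
    several candidate images and averaging the discriminator's confidence
    that each one is a real image of that view; the penalty grows as the
    average confidence drops."""
    scores = [discriminator(generator(), view_id) for _ in range(n_rollouts)]
    mean_score = sum(scores) / len(scores)
    return 1.0 - mean_score  # low confidence -> high penalty

# Toy stand-ins (the real model uses neural networks and image tensors):
random.seed(0)
fake_generator = lambda: random.random()              # "image" = a number
fake_discriminator = lambda img, view: min(1.0, img)  # confidence in [0, 1]

penalty = monte_carlo_penalty(fake_generator, fake_discriminator, view_id=0)
```

In this sketch the penalty would be added to the adversarial loss of the corresponding generator, so a generator whose rollouts rarely convince the discriminator for its view is pushed hardest.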
|
Keywords: | Deep learning, computer vision, image translation, multi-view image generation |
Received: | 2019-10-25 |
Cross-view Image Generation via Mixture Generative Adversarial Network |
| |
Affiliation: | 1.School of Computer and Information, Hefei University of Technology, Hefei 230601 |
| |
Abstract: | Cross-view image translation aims at synthesizing new images of a scene from one view to another. It is a fundamental issue in areas such as multi-view presentation and object modeling in virtual reality, and has been gaining a lot of interest from researchers around the world. Recently, generative adversarial networks (GAN) have shown promising results in image generation. However, the state-of-the-art methods are limited to fixed fields, are difficult to migrate to other scenes, and the images they generate are blurry and distorted. In this paper, we propose a novel framework, ViewGAN, that makes it possible to generate realistic-looking images with different views. ViewGAN has multiple generators and one multi-class discriminator, and can be flexibly migrated to multiple scenarios of the multi-view image generation task. The multiple generators are trained simultaneously, aiming at generating images from different views. Moreover, we propose a penalty mechanism based on Monte Carlo search to make each generator focus on accurately generating its own images of a specific view. Extensive experiments on the DeepFashion, Dayton and ICG Lab6 datasets demonstrate that our model performs better on Inception score and Top-k accuracy than several state-of-the-art methods, and that SSIM (structural similarity) increases by 32.29%, PSNR (peak signal-to-noise ratio) by 14.32%, and SD (sharpness difference) by 10.18%. |
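The abstract reports results in SSIM, PSNR, and sharpness difference. For readers unfamiliar with these image-quality metrics, here is a minimal NumPy sketch of PSNR and a single-window (global) SSIM; note the paper presumably uses the standard local-window SSIM variant, so this global form is a simplification for illustration:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Global (single-window) SSIM; 1.0 means structurally identical."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For example, comparing an image against itself gives PSNR = inf and SSIM = 1.0, while any distortion lowers both scores.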
| |
Keywords: | Deep learning, computer vision, image translation, multi-view image generation |
|
|
|