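Text-to-image synthesis method based on multi-level structure generative adversarial networks
Cite this article: SUN Yu, LI Linyan, YE Zihan, HU Fuyuan, XI Xuefeng. Text-to-image synthesis method based on multi-level structure generative adversarial networks[J]. Journal of Computer Applications, 2019, 39(11): 3204-3209.
Authors: SUN Yu  LI Linyan  YE Zihan  HU Fuyuan  XI Xuefeng
Affiliations: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009; Suzhou Key Laboratory for Big Data and Information Service, Suzhou, Jiangsu 215009; Suzhou Institute of Trade and Commerce, Suzhou, Jiangsu 215009; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou, Jiangsu 215009; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215009
Funding: National Natural Science Foundation of China (61876121, 61472267); Key Research and Development Program of Jiangsu Province (BE2017663); Suzhou Science and Technology Development Program (SZS201609); Postgraduate Research Innovation Program of Jiangsu Province (KYCX18_2549).
Abstract: In recent years, Generative Adversarial Networks (GANs) have achieved remarkable success in generating images from text descriptions, but problems remain: blurred image edges, unclear local textures, and low variance among generated samples. To address these shortcomings, a Multi-Level structure Generative Adversarial Network (MLGAN) model is proposed on the basis of the stacked generative adversarial network model (StackGAN++). The model consists of multiple generators and discriminators arranged in parallel in a hierarchical structure. First, a hierarchical structure encoding method and a word-vector constraint are introduced to change the condition vector of the generator at each level of the network, making the edge details and local textures of the images clearer and more vivid. Then, the generators and discriminators are trained jointly, so that the generated image distributions at multiple levels together approximate the real image distribution; this increases the variance of the generated samples and therefore their diversity. Finally, the generators at different levels produce images of the corresponding text at different scales. Experimental results show that the MLGAN model reaches Inception scores of 4.22 and 3.88 on the CUB and Oxford-102 datasets, respectively, which are 4.45% and 3.74% higher than those of StackGAN++. The MLGAN model brings some improvement on edge blurring and unclear local textures in generated images, and the images it generates are closer to real images.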
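The relative gains quoted in the abstract can be turned back into the StackGAN++ scores they imply. The sketch below assumes each percentage is a relative improvement over the StackGAN++ Inception score on the same dataset. The abstract does not state the baseline values, so the numbers computed here are derived from the quoted figures, not reported results, and the helper name `implied_baseline` is purely illustrative.

```python
def implied_baseline(score, relative_gain_percent):
    """Return the baseline score implied by a score and its relative gain.

    Assumes: score = baseline * (1 + relative_gain_percent / 100).
    """
    return score / (1.0 + relative_gain_percent / 100.0)


# Figures quoted in the abstract for MLGAN (score, relative gain in percent).
cub_baseline = implied_baseline(4.22, 4.45)
oxford_baseline = implied_baseline(3.88, 3.74)

print(f"Implied StackGAN++ score on CUB:        {cub_baseline:.2f}")
print(f"Implied StackGAN++ score on Oxford-102: {oxford_baseline:.2f}")
```

Because the quoted scores and percentages are themselves rounded, the implied baselines are only accurate to about two decimal places.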

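Keywords: generative adversarial network  text-to-image generation  multi-level structure generative adversarial network  multi-level image distribution  hierarchical structure encoding
Received: 2019-05-24
Revised: 2019-06-28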

Text-to-image synthesis method based on multi-level structure generative adversarial networks
SUN Yu, LI Linyan, YE Zihan, HU Fuyuan, XI Xuefeng. Text-to-image synthesis method based on multi-level structure generative adversarial networks[J]. Journal of Computer Applications, 2019, 39(11): 3204-3209.
Authors:SUN Yu  LI Linyan  YE Zihan  HU Fuyuan  XI Xuefeng
Abstract: In recent years, the Generative Adversarial Network (GAN) has achieved remarkable success in text-to-image synthesis, but problems remain, such as blurred image edges, unclear local textures, and low variance among generated samples. To address these shortcomings, a Multi-Level structure Generative Adversarial Network (MLGAN) model was proposed on the basis of the Stack Generative Adversarial Network model (StackGAN++); it is composed of multiple generators and discriminators arranged in parallel in a hierarchical structure. Firstly, a hierarchical structure coding method and a word vector constraint were introduced to change the condition vector of the generator at each level of the network, so that the edge details and local textures of the images became clearer and more vivid. Then, the generators and the discriminators were jointly trained so that the generated image distributions of multiple levels together approximated the real image distribution, which increased the variance of the generated samples and therefore their diversity. Finally, images of the corresponding text at different scales were generated by the generators of different levels. The experimental results show that the Inception scores of the MLGAN model reached 4.22 and 3.88 on the CUB and Oxford-102 datasets, respectively, which were 4.45% and 3.74% higher than those of StackGAN++. The MLGAN model shows some improvement on edge blurring and unclear local textures in the generated images, and the images generated by the model are closer to real images.
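For readers unfamiliar with the metric quoted above, the Inception score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image x and p(y) is the average of those distributions over all generated images. The following is a minimal sketch of that formula only. It assumes the class probabilities have already been computed (in practice they come from a pretrained Inception network, which is not included here); the function name and the toy inputs are illustrative and are not taken from the paper.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of lists; each inner list is p(y|x) for one generated image
           and is assumed to sum to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): the average of p(y|x) over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        # KL divergence for one image; zero-probability terms contribute 0.
        total_kl += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )
    return math.exp(total_kl / n)


# Toy inputs (not real data): identical, uncertain predictions give the
# minimum score, while confident and diverse predictions give a higher one.
low = inception_score([[0.5, 0.5], [0.5, 0.5]])
high = inception_score([[1.0, 0.0], [0.0, 1.0]])
print(low, high)
```

A higher score therefore rewards images that are individually classified with confidence (sharp p(y|x)) while covering many classes overall (broad p(y)), which is why the metric is used as a proxy for both image quality and diversity.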
Keywords:Generative Adversarial Network (GAN)  text-to-image synthesis  Multi-Level structure Generative Adversarial Networks (MLGAN)  multi-level image distribution  hierarchical coding  
This article is indexed by Wanfang Data and other databases.