HIGSA: Human image generation with self-attention
Affiliation:1. School of Reliability and Systems Engineering, Beijing University of Aeronautics and Astronautics, Beijing, PR China;2. Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, PR China;3. State Key Laboratory of Virtual Reality Technology and System, Beijing, PR China;1. School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, PR China;2. College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098, PR China;1. School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China;2. Beijing Xinghang Mechanical-Electrical Equipment Co., Ltd., Beijing 100074, China;3. AVIC Manufacturing Technology Institute, Beijing 100024, China;4. School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, China
Abstract:The goal of human image generation (HIG) is to synthesize a human image in a novel pose. HIG can potentially benefit various computer vision applications and engineering tasks. Recently developed CNN-based approaches apply attention architectures to vision tasks. However, owing to the locality of CNNs, extracting and maintaining long-range pixel interactions in input images is difficult. Thus, existing human image generation methods suffer from limited content representation. In this paper, we propose a novel human image generation framework called HIGSA that can utilize the position information of the input source image. The proposed HIGSA contains two complementary self-attention blocks to generate photo-realistic human images, named the stripe self-attention block (SSAB) and the content attention block (CAB), respectively. In SSAB, this paper establishes global dependencies across human images and computes the attention map for each pixel based on its spatial position relative to other pixels. In CAB, this paper introduces an effective feature extraction module to interactively enhance both the person’s appearance and shape feature representations. Therefore, the HIGSA framework inherently preserves better appearance consistency and shape consistency with sharper details. Extensive experiments on mainstream datasets demonstrate that HIGSA achieves state-of-the-art (SOTA) results.
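The abstract describes SSAB as computing, within each stripe, an attention map that depends on a pixel's spatial position relative to the other pixels. A minimal NumPy sketch of this idea is below; it is an illustration, not the authors' implementation, and the function name, the fixed bias weight, and the use of the raw features as queries/keys/values are all assumptions made here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stripe_self_attention(feat, stripe_h=2):
    """Toy stripe self-attention: attention is computed independently
    within each horizontal stripe of an (H, W, C) feature map, with a
    relative-position bias (hypothetical; details differ from the paper).
    """
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for s in range(0, H, stripe_h):
        # Flatten one stripe into a sequence of tokens: (stripe_h * W, C).
        stripe = feat[s:s + stripe_h].reshape(-1, C)
        n = stripe.shape[0]
        # In this sketch, queries/keys/values are the features themselves.
        scores = stripe @ stripe.T / np.sqrt(C)
        # Relative-position bias: nearer tokens receive higher scores.
        idx = np.arange(n)
        bias = -0.1 * np.abs(idx[:, None] - idx[None, :])
        attn = softmax(scores + bias, axis=-1)
        # Each output pixel is a convex combination of stripe features.
        out[s:s + stripe_h] = (attn @ stripe).reshape(-1, W, C)
    return out

feat = np.random.rand(4, 4, 8).astype(np.float32)
y = stripe_self_attention(feat)
print(y.shape)  # (4, 4, 8)
```

Because each attention row is a softmax, every output pixel stays within the value range of its own stripe, which is one way such a block can preserve appearance consistency while mixing long-range information along the stripe.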
Keywords:Deep learning  GAN  Human image generation  Attention
This article has been indexed by ScienceDirect and other databases.