CrowdFormer: Weakly-supervised crowd counting with improved generalizability
Affiliation: 1. Faculty of Information Science and Engineering, Ningbo University, Ningbo, China; 2. School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China; 1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, PR China; 2. Department of Computer Science and Engineering, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, PR China; 3. Business Intelligence and Visualization Research Center, National Engineering Laboratory for Big Data Distribution and Exchange Technologies, Shanghai, 200436, PR China; 4. Shanghai Engineering Research Center of Big Data & Internet Audience, Shanghai, 200072, PR China; 5. Innovation College North-Chiang Mai University, 169 Moo3, Nong Kaew, Hang Dong, Chiang Mai 50230 Thailand; 6. International College of Digital Innovation, Chiang Mai University, Chiang Mai, 50200, Thailand; 1. Department of Electronics and Communication, Dr B R Ambedkar National Institute of Technology, Jalandhar, Punjab 144011, India; 2. Department of Computer Science and Engg., UIET, Sector 25, Panjab University, Chandigarh 160023, India; 1. School of Computer and Software Engineer, Xihua University, Chengdu Sichuan 610039, China; 2. Department of Convergence Contents and Media Design, Kyungil University, Gyeongsan 38428, Gyeongsangbuk-do, South Korea
Abstract: Convolutional neural networks (CNNs) have dominated the field of computer vision for nearly a decade. However, because of their limited receptive field, CNNs struggle to model global context. Transformers, which are attention-based architectures, model global context readily, yet few studies have investigated their effectiveness in crowd counting. In addition, most existing crowd-counting methods regress density maps, which requires a point-level annotation for every person in the scene. This annotation task is laborious and error-prone, and it has led to an increased focus on weakly-supervised crowd-counting methods, which require only count-level annotations. In this paper, we propose a weakly-supervised method for crowd counting using a pyramid vision transformer. We have conducted extensive evaluations to validate the effectiveness of the proposed method. Our method achieves state-of-the-art performance. More importantly, it shows remarkable generalizability.
Keywords: Crowd counting; Vision transformers; Weakly-supervised method; Generalizability
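
The distinction the abstract draws between point-level and count-level annotation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data, and the choice of mean absolute error and root mean squared error as count-level measures are all assumptions made for illustration. The sketch shows only that a count-level label is a single number per image, whereas a point-level label needs one coordinate per person.

```python
from math import sqrt

# Hypothetical point-level annotations: one (x, y) coordinate per person.
# Density-map regression needs every one of these points.
point_annotations = [
    [(12, 40), (55, 41), (80, 90)],           # image 0: 3 people
    [(5, 5), (9, 7), (30, 60), (31, 64)],     # image 1: 4 people
]

def count_label(points):
    """Count-level label: only the number of people, no locations."""
    return len(points)

def mean_absolute_error(predicted, target):
    """Average absolute difference between predicted and annotated counts."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

def root_mean_squared_error(predicted, target):
    """Square root of the average squared count error."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))

# Weak supervision keeps only the counts.
true_counts = [count_label(points) for points in point_annotations]

# Assumed model outputs, standing in for a network's predicted counts.
predicted_counts = [3.5, 2.0]

mae = mean_absolute_error(predicted_counts, true_counts)
rmse = root_mean_squared_error(predicted_counts, true_counts)
print(true_counts, mae, rmse)
```

In a weakly-supervised setting of this kind, a count error such as `mae` could serve directly as the training signal, since no person locations are available to build a density map.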
This article is indexed in ScienceDirect and other databases.