
Crowd Counting Based on a Single-Column Multi-Scale Convolutional Neural Network (基于单列多尺度卷积神经网络的人群计数)

Cite this article:	彭贤, 彭玉旭, 汤强, 宋砚琪. 基于单列多尺度卷积神经网络的人群计数[J]. 计算机科学, 2020, 47(4): 150-156.

Authors:	彭贤, 彭玉旭, 汤强, 宋砚琪

Affiliation:	School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410000 (all four authors)

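Funding:	Young Teacher Growth Program of Changsha University of Science and Technology; Outstanding Youth Project of the Hunan Provincial Department of Education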

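Abstract:	Crowd counting in single images and surveillance video has attracted growing attention in recent years. Scale variation and crowd occlusion make crowd counting a highly challenging task, but deep convolutional neural networks have proved effective at it. This paper proposes a single-column multi-scale convolutional neural network, a data-driven deep learning method that can handle a wide variety of scenes and produce accurate count estimates. The network consists of a front end and a middle end, which together extract two-dimensional features, and a back end that recovers the density map. Stacked pooling replaces the max-pooling layers, increasing the scale invariance of the model without introducing additional parameters. The front end uses part of the VGG-16 architecture. The middle end uses FME (a feature aggregation module) to break the independence between columns and extract multi-scale features more effectively. The back end uses three-column, five-layer dilated convolutions with different dilation rates, which enlarge the receptive field while keeping the resolution unchanged and so generate higher-quality crowd density maps. A relative count loss is also introduced to improve performance on sparse crowds. The model performs well on two of the most challenging crowd counting datasets. Experiments show that on the two subsets of the public ShanghaiTech dataset and on UCF_CC_50, the mean absolute error (MAE) and mean square error (MSE) of the method are 66.2 and 103.0, 8.7 and 13.4, and 251.0 and 329.5, respectively, outperforming traditional crowd counting methods. Compared with other models, the proposed model is more accurate and more robust, and it counts images of sparse crowds better.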
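Keywords:	convolutional neural network; crowd counting; stacked pooling; dilated convolution; feature aggregation; relative count loss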
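
The abstract says the dilated-convolution back end enlarges the receptive field while keeping the resolution unchanged. The arithmetic behind that claim can be sketched as follows. This is a minimal illustration, not the paper's configuration: the 3×3 kernel, stride 1, the dilation rates 1, 2 and 3, and the helper names are all assumptions, since the abstract does not list the actual kernel sizes or rates.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a kernel whose taps are `dilation` apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def output_size(size: int, kernel_size: int, dilation: int,
                padding: int, stride: int = 1) -> int:
    """Standard convolution output-size formula along one axis."""
    return (size + 2 * padding - effective_kernel(kernel_size, dilation)) // stride + 1


def receptive_field(num_layers: int, kernel_size: int, dilation: int) -> int:
    """Receptive field of a stack of identical stride-1 dilated layers."""
    return 1 + num_layers * (effective_kernel(kernel_size, dilation) - 1)


if __name__ == "__main__":
    # Assumed setup: 3x3 kernels, five stacked layers, one dilation rate per column.
    for rate in (1, 2, 3):
        # Padding equal to the dilation rate keeps a 3x3 layer's output size unchanged.
        same = output_size(64, kernel_size=3, dilation=rate, padding=rate)
        print(rate, same, receptive_field(5, kernel_size=3, dilation=rate))
```

Under these assumptions every column keeps the 64-pixel side length, while the receptive field grows with the dilation rate at no cost in parameters, which is the trade-off the abstract describes.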
Crowd Counting Based on Single-column Multi-scale Convolutional Neural Network

PENG Xian, PENG Yu-xu, TANG Qiang, SONG Yan-qi. Crowd Counting Based on Single-column Multi-scale Convolutional Neural Network[J]. Computer Science, 2020, 47(4): 150-156.

Authors:	PENG Xian, PENG Yu-xu, TANG Qiang, SONG Yan-qi

Affiliation:	School of Computer and Communication Engineering, Changsha University of Science & Technology, Changsha 410000, China

Abstract:	The problem of crowd counting in single images and surveillance videos has received increasing attention in recent years. Because of scale variation and crowd occlusion, crowd counting is a very challenging problem, but deep convolutional neural networks have proved effective in solving it. In this paper, a single-column multi-scale convolutional neural network is proposed, which provides a data-driven deep learning method that can understand various scenes and estimate counts accurately. The proposed network is mainly composed of a front end and a middle end, which extract two-dimensional features, and a back end, which restores the density map. Stacked pooling replaces the max-pooling layers, increasing the scale invariance of the model without introducing additional parameters. A partial VGG-16 structure is adopted at the front end, and FME (feature aggregation module) is adopted in the middle end to break the independence between different columns and better extract multi-scale feature information. At the back end, three columns and five layers of dilated convolution with different dilation rates are adopted to enlarge the receptive field while keeping the resolution unchanged, generating a higher-quality crowd density map. A relative count loss is introduced to improve model performance on sparse crowds. The model works well on two of the most challenging crowd counting datasets. The results show that on the two subsets of ShanghaiTech and on UCF_CC_50, the mean absolute error (MAE) and mean square error (MSE) of the proposed method are 66.2 and 103.0, 8.7 and 13.4, and 251.0 and 329.5, respectively, achieving better performance than traditional crowd counting methods. Compared with other models, the proposed model has higher accuracy, better robustness and better counting performance on images of sparse crowds.
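
The results above are reported as MAE and MSE over per-image counts. A minimal sketch of how such figures are typically computed is given below. The abstract does not state whether its MSE is the plain mean of squared errors or its square root; many crowd-counting papers report the root form under the name MSE, so the `take_root` flag here is an assumption, and the sample counts are invented for illustration only.

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mse(predicted: Sequence[float], actual: Sequence[float],
        take_root: bool = True) -> float:
    """Mean squared count error; take_root=True returns its square root."""
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mean_sq) if take_root else mean_sq


if __name__ == "__main__":
    # Hypothetical per-image counts, not data from the paper.
    predicted = [102.0, 48.0, 310.0]
    actual = [100.0, 50.0, 300.0]
    print(mae(predicted, actual), mse(predicted, actual))
```

MAE reflects average accuracy, while the squared-error measure penalises large misses on individual images more heavily and is therefore often read as an indicator of robustness.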

Keywords:	Convolutional neural networks; Crowd counting; Stacked-pooling; Dilated convolution; Feature combination; Relative head loss
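
The abstract names a relative loss on the head count but gives no formula. The sketch below shows one plausible form, purely as an assumption: the count error is divided by the true count, so the same absolute miss costs more in a sparse scene than in a dense one. The function names, the `+ 1.0` smoothing term and the `weight` value are hypothetical and are not taken from the paper.

```python
from typing import List

DensityMap = List[List[float]]


def count(density: DensityMap) -> float:
    """A density map sums to the estimated number of people."""
    return sum(sum(row) for row in density)


def density_loss(pred: DensityMap, truth: DensityMap) -> float:
    """Pixel-wise squared error between predicted and ground-truth maps."""
    return sum(
        (p - t) ** 2
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )


def relative_count_loss(pred: DensityMap, truth: DensityMap) -> float:
    """Squared count error normalised by the true count (assumed form).

    The + 1.0 keeps the ratio finite for scenes with no people.
    """
    true_count = count(truth)
    return ((count(pred) - true_count) / (true_count + 1.0)) ** 2


def total_loss(pred: DensityMap, truth: DensityMap, weight: float = 0.1) -> float:
    """Density-map loss plus a weighted relative count term (weight is assumed)."""
    return density_loss(pred, truth) + weight * relative_count_loss(pred, truth)
```

With this form, missing one person out of two contributes far more to the count term than missing one out of two hundred, which matches the stated goal of improving performance on sparse crowds.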
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
