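Crowd density estimation based on multi-level feature fusion
Cite this article: Chen Peng, Tang Yiping, Wang Liran, He Xia. Crowd density estimation based on multi-level feature fusion[J]. Journal of Image and Graphics, 2018, 23(8): 1181-1192
Authors: Chen Peng  Tang Yiping  Wang Liran  He Xia
Affiliation: School of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (all four authors)
Funding: National Natural Science Foundation of China (61070134, 61379078)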
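Abstract: Objective Crowd counting and density estimation have very important applications in video surveillance, intelligent transportation, and public safety. Existing techniques still leave considerable room for improvement when estimating the density of large crowds in complex environments. To handle the visual estimation of crowds that are dense, unevenly distributed, and heavily occluded, this paper proposes a crowd density estimation method based on a multi-level feature fusion network. Method First, a multi-level feature fusion network extracts and fuses crowd features and generates a crowd density map. Then, the density map is integrated to obtain the corresponding crowd count. Finally, the spatial positions of the crowd are recovered from the density map and combined with the estimated count to make a quantitative judgment of the degree of crowding. Result The proposed method reduces the mean absolute error (MAE) to 2.35 on the Mall dataset and to 20.73 and 104.86, respectively, on the two parts of the ShanghaiTech dataset. Compared with existing methods, estimation accuracy improves considerably, especially in complex scenes with many people. Conclusion The proposed multi-level feature fusion method extracts features at different scales effectively, is little constrained by the scene, estimates crowd counts with high accuracy, and assesses the degree of crowding simply and reliably. Comparative experiments verify the effectiveness of the method.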
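The counting step described above (integrating the density map) and the MAE metric used to report the results can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the function names `count_from_density` and `mean_absolute_error` are hypothetical, and the toy density values are made up for illustration.

```python
import numpy as np

def count_from_density(density_map: np.ndarray) -> float:
    """Estimated head count = discrete integral (sum) of the density map over all pixels."""
    return float(density_map.sum())

def mean_absolute_error(predicted_counts, true_counts) -> float:
    """MAE over a test set: mean absolute difference between estimated and true counts."""
    pred = np.asarray(predicted_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))

if __name__ == "__main__":
    # Toy 4x4 density map (made-up values): two half-people and one whole person.
    toy = np.zeros((4, 4))
    toy[1, 1] = 0.5
    toy[1, 2] = 0.5
    toy[3, 0] = 1.0
    print(count_from_density(toy))                       # 2.0
    print(mean_absolute_error([2.0, 10.0], [3.0, 8.0]))   # 1.5
```

Because the count is a sum, a density map only yields a correct count if its values integrate to the number of people, which is why the ground-truth maps are built so that each person contributes a total mass of 1.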

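Keywords: crowd density estimation  crowding degree assessment  hierarchical feature fusion  convolutional neural network  deep learning  intelligent video analysis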
Received: 2018-01-08
Revised: 2018-03-07

Crowd density estimation based on multi-level feature fusion
Abstract: Objective With the rapid growth of the population, large-scale collective activities have become increasingly frequent, and in recent years overcrowding has made a series of social problems increasingly prominent. In particular, accidents occur frequently in densely populated places such as scenic spots, railway stations, and shopping malls. Crowd analysis has therefore become an important research topic in intelligent video surveillance, and crowd density estimation has become a focus of research on crowd safety control and management. Crowd density estimation gives staff statistics on the current crowd so that they can manage it better; its key contributions are preventing overcrowding and detecting potential safety hazards. However, many available techniques apply only to scenes with few people in relatively static environments. Targeting crowds that are dense, unevenly distributed, and heavily occluded, this study proposes a crowd density estimation method based on a multi-level feature fusion network. Method First, the network produces a feature map at each level through convolution and pooling. After five of the eight convolutional layers, a 128-channel feature map at 1/32 of the input size is generated, and three deconvolution operations are then applied. The resulting features are fused with the convolutional-layer features of the earlier stages. Finally, a 1×1 convolution kernel converts the fused features into a density map at 1/4 of the input size. Each convolution abstracts the image features of the previous layer, so layers at different depths correspond to different levels of semantics. Shallow layers have high resolution and retain more image detail.
Deep layers, by contrast, have low resolution but learn deep semantic and key features. Low-level features are therefore suited to extracting small targets and high-level features to extracting large ones, and combining the feature information of different layers addresses the problem of inconsistent scales within an image. Second, ground-truth density maps are generated from the manual head annotations of public datasets, and the network is trained to predict the density map of a test image on its own. Finally, the crowd count is obtained by integrating the predicted density map, and on the basis of this map we propose a method for quantifying the degree of crowding that recovers the crowd's spatial information from the density map and combines it with the count. Result The proposed method reduces the mean absolute error (MAE) to 2.35 on the Mall dataset and to 20.73 and 104.86 on the two parts of the ShanghaiTech dataset. Compared with existing methods, estimation accuracy improves, most noticeably in complex scenes with many people. Experiments with different network structures also show that adding deconvolution layers improves the results relative to a purely convolutional network. In the complex scenes of the ShanghaiTech dataset, the feature fusion network improves performance further, with fusion of the level-1 and level-2 features giving the most pronounced gain. Fusing the third level as well brings essentially no further improvement, mainly because that level contains too much detail and the redundant information impairs the network's ability to generalize. On the Mall dataset, whose scenes are standard, the network improvements have little effect; a purely convolutional network already performs well there.
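Generating ground-truth density maps from head annotations, and matching them to a network output at 1/4 of the input size, might look like the sketch below. It assumes the common construction of one normalized 2-D Gaussian per annotated head. The fixed `sigma` is an assumption because the abstract does not state the kernel. The labels are also assumed to be downsampled with sum pooling so that the total count is unchanged. `gaussian_density_map` and `sum_pool` are hypothetical names.

```python
import numpy as np

def gaussian_density_map(head_points, height, width, sigma=4.0):
    """Ground-truth density map: one normalized Gaussian per annotated head (x, y).

    ASSUMPTION: fixed-width Gaussian kernel (sigma in pixels). Each kernel is renormalized
    over the image, so a head near the border still contributes a total mass of exactly 1.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in head_points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()
    return density

def sum_pool(density, factor=4):
    """Downsample by `factor` with sum pooling, which preserves the total count.

    ASSUMPTION: height and width are divisible by `factor` (crop beforehand otherwise).
    """
    h, w = density.shape
    return density.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

if __name__ == "__main__":
    heads = [(10.0, 12.0), (40.5, 30.0), (63.0, 0.0)]  # made-up (x, y) head annotations
    gt = gaussian_density_map(heads, 64, 64)
    gt_quarter = sum_pool(gt, 4)                      # label at 1/4 resolution
    print(gt_quarter.shape, float(round(gt.sum(), 6)), float(round(gt_quarter.sum(), 6)))
    # (16, 16) 3.0 3.0
```

Sum pooling (rather than averaging or interpolation) is chosen here so that integrating the 1/4-resolution label still gives the annotated head count.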
Conclusion This study proposes a crowd density estimation method based on a multi-level feature fusion network. By extracting and fusing features from different semantic levels, the network captures people at different scales and sizes, which effectively improves the robustness of the algorithm. Using the complete image as input better preserves global image information and lets network training take the spatial location of features into account. Combining the predicted density map with its spatial information makes the estimation of the number of people and of the degree of congestion more principled and efficient. The algorithm is little constrained by the scene, estimates crowd counts with high accuracy, and assesses crowd congestion simply and reliably. Experiments verify the effectiveness of both the multi-level feature fusion network and the congestion assessment method.
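The multi-level fusion idea can be illustrated through the tensor shapes. The sketch starts from a 128-channel map at 1/32 resolution and applies three 2× upsampling steps, each optionally fused with an earlier feature map at the same resolution. A final 1×1 convolution then yields a single-channel map at 1/4 resolution. This is a shape-level sketch, not the paper's network. Nearest-neighbour upsampling stands in for the learned deconvolution. Fusion is assumed to be channel concatenation followed by a 1×1 convolution. The skip-feature channel counts are hypothetical, and the weights are random, so the output is not a meaningful density map.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map.
    Stand-in for the learned deconvolution (transposed convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, out_channels):
    """1x1 convolution = per-pixel linear map across channels (random weights, illustration only)."""
    w = rng.standard_normal((out_channels, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum("oc,chw->ohw", w, x)

def fuse_levels(deep, skips):
    """Upsample the deepest map once per entry in `skips` (ordered coarse to fine).

    A skip entry of None means "upsample without fusing" at that step.
    ASSUMPTION: fusion = channel concatenation followed by a 1x1 conv back to 128 channels.
    """
    x = deep
    for skip in skips:
        x = upsample2x(x)
        if skip is not None:
            assert x.shape[1:] == skip.shape[1:], "spatial sizes must match before fusion"
            x = np.concatenate([x, skip], axis=0)
            x = conv1x1(x, 128)
    return conv1x1(x, 1)[0]  # final 1x1 conv -> single-channel density map

if __name__ == "__main__":
    # Hypothetical 256x256 input: deepest map at 1/32 (8x8, 128 channels as in the abstract),
    # skip features at 1/16, 1/8 and 1/4 with made-up channel counts.
    deep = rng.standard_normal((128, 8, 8))
    skips = [rng.standard_normal((c, s, s)) for c, s in [(128, 16), (64, 32), (32, 64)]]
    density = fuse_levels(deep, skips)
    print(density.shape)  # (64, 64), i.e. 1/4 of 256x256
```

Passing `None` for some levels upsamples without fusing those features. This loosely mimics the network variants compared in the experiments, such as a deconvolution-only variant or fusion of only the first one or two levels.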