Crowd counting has become a hot topic because of its wide applications in video surveillance and public security. However, one main problem of deep learning methods for crowd counting is that location information about the crowd is irreversibly degraded by the spatial down-sampling of convolutional neural networks, which lowers the quality of the generated density maps. To remedy this problem, we propose an attention guided feature pyramid network (AG-FPN) for crowd counting, which adaptively generates a high-quality density map with accurate spatial locations by combining high- and low-level features. An attention block is added to each encoder layer to further emphasize the crowd regions and suppress background clutter during feature extraction. Experimental results on the ShanghaiTech, UCF_CC_50, WorldExpo’10 and UCF-QNRF datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.

The human visual system can rapidly identify important visual information and redirect attention to it in highly complex scenes such as a human crowd. Saliency prediction in human crowd scenes is the process of using computer vision techniques to imitate the human visual system and predict which areas of a crowd scene may attract human attention. However, the high complexity of crowd scenes makes it challenging to identify which factors attract attention. In this work, we propose Multiscale DenseNet - Dilated and Attention (MSDense-DAt), a convolutional neural network (CNN) that uses self-attention to integrate the result of knowledge-driven gaze in the human visual system and identify salient areas in human crowd scenes. Our method combines several state-of-the-art deep learning components to deal with the high complexity of crowd images: multiscale DenseNet for multiscale deep feature extraction, self-attention, and dilated convolution. The effectiveness of each component in our CNN architecture is then evaluated by comparing different component combinations. Finally, the proposed method is evaluated at different crowd density levels to assess the effect of crowd density on model performance.

Crowd counting with density estimation has been an active research area because of its significant applications in public security, video surveillance and traffic monitoring. However, crowd counting in congested scenes often suffers from obstacles including severe occlusions, large scale variations and noise interference. In this paper, using the first ten layers of a modified VGG16 and dilated convolution layers as the framework, we propose a CNN-based crowd counting and density estimation model improved by attention-aware modules with residual connections. To tackle noise interference, convolutional block attention modules are introduced into the deep network to separate foreground from background and focus on the information of interest, refining the deeper features of the input image. To improve information transmission and reuse, residual connections link the 3 attention blocks. Meanwhile, the dilated convolution layers maintain larger receptive fields and produce high-resolution density maps. The proposed method has been evaluated on three public benchmarks, ShanghaiTech Part A and Part B, UCF-QNRF and MALL, achieving mean absolute errors of 64.6 and 8.3, 113.8, and 1.68, respectively. These results outperform some strong existing approaches, which indicates that the proposed model has high accuracy and better robustness and is suitable for crowd counting and density estimation in various congested scenes.
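
The mean absolute errors quoted above are per-image count errors averaged over a test set. The following is a minimal sketch of that metric, together with the root-mean-squared count error that crowd counting papers often report alongside it under the name "MSE". The per-image counts are made-up values for illustration and are not taken from any of the benchmarks.

```python
import math


def mae(pred, truth):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def rmse(pred, truth):
    """Root of the mean squared count error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Hypothetical per-image counts, for illustration only.
predicted = [102.0, 48.0, 310.0]
ground_truth = [100.0, 50.0, 300.0]

print(mae(predicted, ground_truth))
print(rmse(predicted, ground_truth))
```

Because the squared error weights large misses more heavily, the second figure is more sensitive to a few badly miscounted images than the first.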

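Existing crowd counting methods are not fully applicable to rail transit scenes, so a crowd counting model based on a convolutional neural network is proposed. The model uses VGG16 as the front-end network to extract shallow features and introduces M-Inception, a structure improved from the Inception structure, which is combined with dilated convolution to form the back-end network; this enlarges the receptive field and adapts to pedestrian targets of different sizes under multiple surveillance angles. A weighted loss function that fuses the total pedestrian count estimation loss with the density map loss is also proposed. In comparative experiments against four existing models, the proposed crowd counting algorithm achieves a mean absolute error of only 1.46 and a mean squared error of only 2.13 in subway scenes, outperforming all four. With practical application in mind, the model is deployed on a HiSilicon embedded chip; measurements show that it achieves high computation speed and accuracy on the embedded chip and meets the needs of real application scenarios.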
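
The weighted loss described above combines a density map term with a total-count term. The sketch below shows one plausible form, assuming a pixel-wise mean squared error for the density map, an absolute difference for the count, and a hypothetical weight `lam`; the abstract does not give the exact terms or the weight, so all three are assumptions.

```python
def density_map_loss(pred_map, true_map):
    """Mean squared pixel-wise error between two density maps (lists of rows)."""
    n = sum(len(row) for row in true_map)
    return sum(
        (p - t) ** 2
        for prow, trow in zip(pred_map, true_map)
        for p, t in zip(prow, trow)
    ) / n


def count_loss(pred_map, true_map):
    """Absolute difference between total counts (the sums of the two maps)."""
    return abs(sum(map(sum, pred_map)) - sum(map(sum, true_map)))


def weighted_loss(pred_map, true_map, lam=0.1):
    # lam is a hypothetical weight; the abstract does not state its value.
    return density_map_loss(pred_map, true_map) + lam * count_loss(pred_map, true_map)


# Toy 2x2 maps: same total count (1.0), but the mass sits in different pixels.
pred = [[0.5, 0.0], [0.25, 0.25]]
true = [[1.0, 0.0], [0.0, 0.0]]
loss = weighted_loss(pred, true)
print(loss)
```

In this toy case the count term is zero while the density term is not, which shows why the two terms carry different information: one checks how many people are predicted, the other checks where.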

沈宁静  袁健 《电子科技》2022,35(6):6-12
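Existing crowd counting algorithms use multi-column fusion structures to handle the multi-scale problem within a single image, but this approach cannot make effective use of low-level feature information, which leads to inaccurate final counts. To address this shortcoming, this paper proposes a crowd counting algorithm based on residual dense connections and attention fusion. The front end of the algorithm uses an improved VGG16 network to extract low-level feature information. The main branch of the back end is based on a residual dense connection structure, which combines residual and dense networks to capture feature information between layers and can capture multi-scale information efficiently. The side branch introduces an attention mechanism to generate attention maps at the corresponding scales, effectively separating the background and foreground of the feature maps and reducing the influence of background noise. The algorithm is validated on three mainstream public datasets. Experimental results show that the algorithm counts effectively and that its counting accuracy is better than that of other algorithms.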

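Automatic crowd counting has attracted wide attention in the field of video surveillance. In recent years, convolutional neural network (CNN) models have achieved good results in crowd counting. However, current research on deep-learning-based crowd counting mostly stops at counting crowds in single still images on a PC; the network models have huge numbers of parameters, complex structures and heavy computational demands, which makes them difficult to deploy in practical surveillance-video crowd counting systems. This paper therefore adopts a deep learning approach, pruning and compressing the network model and accelerating it with TensorRT, to achieve near-real-time crowd counting on an embedded platform. The proposed method achieves a mean absolute error (MAE) of 21.6 at an average of 22 frames per second (FPS), striking a good balance between accuracy and speed; it runs fast on the embedded platform and can achieve real-time performance.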

亢洁  田野  杨刚 《红外技术》2022,44(12):1316-1323
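To address the high algorithmic complexity of crowd abnormal behavior detection and the low detection accuracy caused by overlap and occlusion, this paper proposes a crowd abnormal behavior detection algorithm based on an improved SSD (Single Shot MultiBox Detector). First, the lightweight network MobileNet v2 replaces the original feature extraction network VGG-16, and convolutional layers are built from deformable convolution modules to enlarge the receptive field. Then, features are enhanced by integrating position information into channel attention, which captures long-range dependencies between spatial positions and therefore handles overlap and occlusion better. Experimental results show that the proposed algorithm detects crowd abnormal behavior well.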

雷翰林  张宝华 《激光技术》2019,43(4):476-481
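To avoid interference from depth of field and occlusion and to improve the accuracy of crowd counting, three models, LeNet-5, AlexNet and VGG-16, are used to extract the features of targets at different depths of field in the image; the convolution kernel sizes and network structures of these models are adjusted, and the models are fused. The result is a deep convolutional neural network structure based on multi-model fusion, in which the last two layers use convolutional layers with 1×1 kernels in place of traditional fully connected layers to integrate the extracted feature maps and output a density map. This greatly reduces the number of network parameters while improving the results to some extent, balancing algorithm efficiency and accuracy. Theoretical analysis and experimental verification were carried out. The results show that on the two subsets of the public crowd counting dataset ShanghaiTech and on UCF_CC_50, the mean absolute error and mean squared error of the proposed counting method are 97.99 and 158.02, 23.36 and 41.86, and 354.27 and 491.68, respectively, which is better than existing traditional crowd counting methods; transfer experiments show that the proposed crowd counting model generalizes well. This study is helpful for improving the accuracy of crowd counting.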
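
The parameter saving from replacing a fully connected layer with a 1×1 convolution can be checked with simple arithmetic. The sketch below counts weights and biases for both layer types; the feature map size and channel counts are hypothetical and are not taken from the paper.

```python
def fc_params(height, width, in_channels, out_units):
    """Weights plus biases of a fully connected layer on a flattened H x W x C map."""
    return height * width * in_channels * out_units + out_units


def conv1x1_params(in_channels, out_channels):
    """Weights plus biases of a 1x1 convolution; independent of the spatial size."""
    return in_channels * out_channels + out_channels


# Hypothetical sizes: a 32 x 32 feature map with 64 channels, one output channel.
fc = fc_params(32, 32, 64, 1)
conv = conv1x1_params(64, 1)
print(fc, conv)
```

A 1×1 convolution applies the same small weight vector at every spatial position, so its parameter count does not grow with the feature map, and its output keeps the spatial layout that a density map needs.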

Crowd counting algorithms have recently made significant progress by incorporating attention mechanisms into convolutional neural networks (CNNs). The channel attention model (CAM), a popular attention mechanism, calculates a set of probability weights to select important channel-wise feature responses. However, most CAMs coarsely assign a single weight to the entire channel-wise map, so useful and useless information are treated indiscriminately, which limits the representational capacity of the network. In this paper, we propose a multi-scale and spatial position-based channel attention network (MS-SPCANet), which integrates spatial position-based channel attention models (SPCAMs) at multiple scales into a CNN. SPCAM assigns different channel attention weights to different positions of the channel-wise maps to capture more informative features. Furthermore, an adaptive loss, which uses adaptive coefficients to combine the density map loss and the headcount loss, is constructed to improve network performance in sparse crowd scenes. Experimental results on four public datasets verify the superiority of the scheme.
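
The limitation of plain channel attention noted above, one weight for a whole channel map, is easy to see in a toy version. The sketch below is a parameter-free simplification, assuming the weights are a softmax over per-channel global averages; real channel attention modules learn these weights with trainable layers, and this is not the SPCAM proposed in the paper.

```python
import math


def channel_weights(feature_maps):
    """Softmax over per-channel global averages: one scalar weight per channel."""
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    peak = max(means)  # subtract the maximum for numerical stability
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    return [e / total for e in exps]


def apply_channel_attention(feature_maps):
    """Scale every position of a channel by that channel's single weight."""
    weights = channel_weights(feature_maps)
    return [
        [[w * v for v in row] for row in ch]
        for w, ch in zip(weights, feature_maps)
    ]


# Two hypothetical 2x2 channels; the second has the stronger average response.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 0.0], [6.0, 3.0]],
]
weights = channel_weights(features)
attended = apply_channel_attention(features)
print(weights)
```

Every position in a channel is multiplied by the same scalar, including the zero-response position in the second channel; a position-based scheme would instead produce a separate weight for each location.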

闫昭宇  王晶 《信号处理》2020,36(6):863-870
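The goal of speech enhancement is to separate clean speech from noisy speech and thereby improve speech quality and intelligibility. In recent years, deep neural networks trained with supervised learning have become the mainstream approach to speech enhancement. The convolutional recurrent network is a new type of neural network structure consisting of three main modules, an encoder, an intermediate layer and a decoder, and it has achieved good results in speech enhancement tasks. The time-frequency attention mechanism is a simple network module made up of several connected convolutional layers joined by skip connections; during training it can compute non-local correlations in the feature maps of the speech magnitude spectrum, which helps the network attend to the harmonic characteristics of speech. This paper introduces the time-frequency attention mechanism into the encoder and decoder layers of the convolutional recurrent network. Experimental results show that, under different signal-to-noise ratio conditions, the method further improves speech quality and intelligibility over the baseline convolutional recurrent network, and that the enhanced speech signal retains more spectral harmonic information with less speech distortion.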

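To address the low learning efficiency, slow convergence and long training time of convolutional neural networks, this paper proposes an improved LeNet convolutional neural network model. The model replaces the original pooling layers with convolutional layers that have a kernel size of 3 and a stride of 2, and adds a batch normalization layer before the activation function of each layer. Experiments on the MNIST and CIFAR-10 datasets show that, compared with the traditional LeNet network, the proposed network improves classification accuracy and has faster convergence and a shorter training time.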
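
A stride-2 convolution can stand in for a pooling layer because both halve the spatial size. The sketch below applies the standard output-size formula for convolution and pooling layers; the 28-pixel input, the 2×2 pooling window and the padding of 1 are assumptions chosen for illustration, since the abstract gives only the kernel size and stride of the replacement layer.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed original layer: 2x2 pooling with stride 2 on a 28-pixel axis.
pooled = conv_output_size(28, kernel=2, stride=2)
# Replacement from the abstract: kernel 3, stride 2 (padding 1 is an assumption).
strided = conv_output_size(28, kernel=3, stride=2, padding=1)
print(pooled, strided)
```

Under these assumptions both layers map 28 pixels to 14, so the rest of the network keeps its shapes, while the strided convolution adds learnable weights where pooling has none.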

Although convolutional neural network-based stereo matching models have shown good accuracy and robustness, image features are still lost in texture-less regions, complex scenes and occlusions. In this paper, we present a dense convolutional neural network-based stereo matching method with multiscale feature connection, named Dense-CNN. First, we construct a novel densely connected network with multiscale convolutional layers to extract rich image features, in which the merged multiscale features with context information are used to estimate the cost volume for stereo matching. Second, we design a novel loss-function strategy to learn the network parameters more reasonably, which improves the performance of the proposed Dense-CNN model on disparity computation. Finally, we run our Dense-CNN model on the Middlebury and KITTI databases to conduct a comprehensive comparison with several state-of-the-art approaches. The experimental results demonstrate that the proposed method achieves superior accuracy and robustness in disparity estimation, with a particularly significant benefit in preserving features in ill-posed regions.

金鑫  胡英 《红外技术》2020,42(11):1103-1110
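To address the poor reliability and low accuracy of existing methods for detecting the number of vehicle occupants in HOV (high-occupancy vehicle) lanes, typified by radar and infrared thermal imaging, a method for detecting the number of vehicle occupants based on multispectral infrared images and an improved Faster R-CNN (Region-based Convolutional Neural Network) is proposed. Images of the vehicle interior are acquired with a multispectral infrared imaging system and combined with the Faster R-CNN deep learning algorithm to detect the number of occupants; the generalization ability of the network is strengthened by adopting a fully convolutional network structure, multi-scale feature prediction, and ROI-Align in place of ROI-Pooling. K-means clustering of the sample data yields a prior distribution of the aspect ratios of the target boxes, which improves the training speed and the position regression accuracy of the region proposal network (RPN). Test results show that the acquired vehicle interior images are fairly clear and that the algorithm can detect the number of occupants. After the improvements, the generalization ability of the network is enhanced, and the accuracy of single-occupant detection reaches 88.6%, which is 13.8% higher than before the improvements and meets the industry requirement of more than 80%.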
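
Clustering box shapes to obtain anchor priors can be sketched with a plain one-dimensional K-means over width-to-height ratios. This is a simplified illustration under stated assumptions: the boxes and the initial centers are made up, and the abstract does not say which distance measure or how many clusters the authors used.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd iterations on scalars; returns the final centers, sorted."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the previous center when a cluster receives no points.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)


# Hypothetical ground-truth boxes as (width, height) pairs.
boxes = [(40, 80), (42, 80), (60, 60), (58, 60), (90, 45), (88, 44)]
ratios = [w / h for w, h in boxes]
anchor_ratios = kmeans_1d(ratios, centers=[0.4, 1.1, 2.2])
print(anchor_ratios)
```

The resulting centers can serve as anchor aspect ratios, so the proposal network starts from shapes that already resemble the targets in the data.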

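Dense crowd counting is a classic problem in computer vision and is still limited by factors such as non-uniform scale, noise and occlusion. This paper proposes a dense crowd counting method based on a novel multi-scale attention mechanism. The deep network comprises a backbone network, a feature extraction network and a feature fusion network. The feature extraction network consists of a feature branch and an attention branch and adopts a novel multi-scale module composed of parallel convolution kernel functions, which can better capture crowd features at different scales, ...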

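The translation equivariance of convolutional layers is poorly matched to the linear spectrum, and convolutional networks have insufficient ability to model long-range dependencies in high-dimensional features. This paper proposes a double-logarithmic spectrum feature for classifying ship-radiated noise. By rearranging the frequency bins of the logarithmic spectrum, the double-logarithmic spectrum preserves resolution at the high-frequency end while avoiding the need for a very deep convolutional network. The convolutional network and the objective function are built on the prior knowledge that every row of the double-logarithmic spectrum represents the same target. Experiments on the DeepShip dataset show that, with the same feature dimension, the classification accuracy of the proposed algorithm is more than 2.4% higher than that of a convolutional network that takes the linear spectrum as input.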

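To address the insufficient accuracy and efficiency of existing methods for extracting residential-area information from high-resolution remote sensing images, a method for extracting residential areas from Gaofen-1 (GF-1) remote sensing imagery based on an improved fully convolutional network is proposed. First, a large number of residential-area training samples are prepared by professional visual interpretation. Then, a pre-trained deep convolutional neural network is converted into a fully convolutional network, and Inception modules with multi-scale convolution kernels replace the convolutional layers converted from fully connected layers, which reduces the number of model parameters and increases feature representation capacity. Finally, the prepared high-resolution remote sensing residential-area dataset is used for training and validation, producing a fully convolutional network that can extract residential-area information directly. Experimental results show that the method based on the improved fully convolutional network extracts residential-area information accurately and effectively, with a Kappa coefficient above 94%.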
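
The Kappa coefficient reported above measures agreement between predicted and reference labels after discounting agreement expected by chance. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the matrix values are invented for illustration and do not come from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa; confusion[i][j] counts samples of true class i predicted as j."""
    size = len(confusion)
    total = sum(map(sum, confusion))
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class result (residential area vs. background), 100 pixels.
matrix = [[45, 5], [5, 45]]
kappa = cohen_kappa(matrix)
print(kappa)
```

Here the raw agreement is 90%, but half of that would be expected by chance with balanced classes, so kappa is lower than the plain accuracy.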

It is becoming increasingly easy to obtain abundant hyperspectral images (HSIs). Despite this, achieving high resolution remains critical. In this paper, a method named hyperspectral image super-resolution generative adversarial network (HSI-RGAN) is proposed to enhance the spatial resolution of HSIs without decreasing their spectral resolution. Unlike existing methods with the same purpose, which are based on convolutional neural networks (CNNs) and driven by a pixel-level loss function, the new generative adversarial network (GAN) has a redesigned framework and a targeted loss function. Specifically, the discriminator uses the structure of the relativistic discriminator, which provides feedback on how closely the generated HSI resembles the ground truth. The generator achieves more authentic details and textures by removing the pooling layer and the batch normalization layer and by adopting a smaller filter size and two-step upsampling layers. In addition, the loss function is improved to take spectral distinctions into account, which avoids artifacts and minimizes the spectral distortion that neural networks may introduce. Pre-training with the visual geometry group (VGG) network also makes the entire model easier to initialize. Benefiting from these changes, the proposed method obtains significant advantages over the original GAN. Experimental results also show that the proposed method performs better than several state-of-the-art methods.

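To address the loss of road-edge detail and the inaccurate road extraction caused by the repeated downsampling operations in fully convolutional neural networks, this paper proposes a road extraction method for Gaofen-1 imagery based on a dilated convolution residual network with multi-scale feature fusion. First, a large amount of road extraction label data is produced by visual interpretation. Second, dilated convolution and multi-scale feature perception modules are introduced into each residual block of the residual network ResNet-101 to enlarge the receptive field of feature points and to avoid reducing the feature map resolution and losing road-edge detail. Then, the road feature maps of all sizes are fused through stacking and upsampling operations to obtain a feature map at the original resolution. Finally, the feature map is fed into a Sigmoid classifier for classification. Experimental results show that the extraction accuracy of the proposed method is better than that of the classical fully convolutional neural network model, with an accuracy above 98%, and that the method effectively preserves the completeness of roads and the detail of their edges.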

As with acyclic networks, cyclic networks admit four classes of optimal convolutional network codes, referred to as the basic convolutional network code (BCNC), convolutional dispersion (CD), convolutional broadcast (CB), and convolutional multicast (CM). In terms of linear independence among the global encoding kernels (GEKs), BCNC is the strongest of the four. In this paper, we present an efficient construction algorithm for BCNC over cyclic networks. Our algorithm explicitly gives the maximal required cardinality of the local encoding kernels (LEKs). Another advantage is that, when non-source nodes and their associated edges are added to a network with an existing code, the algorithm can modify the already assigned LEKs in a localized manner: only the LEKs along certain flow paths induced by the added nodes and edges need to be reset, rather than reconstructing the whole code for the expanded network.

Crowd counting is a challenging task in computer vision owing to scale variations, perspective distortions, and complex backgrounds. Existing research usually adopts dilated convolution networks to enlarge the receptive field and so address scale variations. However, these methods easily bring background information into the large receptive fields and generate poor-quality density maps. To address this problem, we propose a novel backbone called Context-guided Dense Attentional Dilated Network (CDADNet). CDADNet contains three components: an attentional module, a context-guided module and a dense attentional dilated module. The attentional module provides attention maps that remove background information, while the context-guided module extracts multi-scale contextual information. The dense attentional dilated module generates high-granularity density maps, and a cascaded strategy preserves information across changing scales. To verify the feasibility of our method, we compare it with existing approaches on five crowd counting datasets (ShanghaiTech Part_A and Part_B, WorldEXPO’10, UCSD, UCF_CC_50). The comparison results demonstrate that CDADNet is effective and robust across various scenes.
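
The way dilated convolution enlarges the receptive field without extra parameters can be worked out with the standard formula for the effective kernel size, k + (k - 1)(d - 1). The sketch below assumes a stack of stride-1 layers; the layer configurations are illustrative and are not those of CDADNet.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 layers given as (kernel, dilation) pairs."""
    field = 1
    for kernel, dilation in layers:
        field += effective_kernel(kernel, dilation) - 1
    return field


# Three 3x3 layers: ordinary convolution versus dilation rate 2 (hypothetical stacks).
plain = receptive_field([(3, 1)] * 3)
dilated = receptive_field([(3, 2)] * 3)
print(plain, dilated)
```

Both stacks have the same number of weights, but the dilated one sees a wider window per output pixel, which is also why more background can leak into each prediction.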
