
Integrated multi-scale features and global context in X-ray detection for prohibited items
Cite this article: Li Chen, Zhang Hui, Zhang Zouquan, Che Aibo, Wang Yaonan. Integrated multi-scale features and global context in X-ray detection for prohibited items[J]. Journal of Image and Graphics, 2022, 27(10): 3043-3057
Authors: Li Chen  Zhang Hui  Zhang Zouquan  Che Aibo  Wang Yaonan
Affiliation: Changsha University of Science and Technology, Changsha 410114, China; Changsha University of Science and Technology, Changsha 410114, China; Hunan University, Changsha 410082, China
Funding: National Key Research and Development Program of China (2018YFB1308200); National Natural Science Foundation of China (61971071, 62027810, 62133005); Hunan Provincial Science Fund for Distinguished Young Scholars (2021JJ10025); Key Research and Development Program of Hunan Province (2021GK4011, 2022GK2011); Major Science and Technology Project of Changsha (kh2003026)
Abstract: Objective Detecting prohibited items in X-ray images has long been a fundamental problem in security inspection. Prohibited items take many forms and vary greatly in scale, and the translucency of X-ray imaging produces heavy overlap and occlusion when many objects are stacked, so traditional image-processing models are prone to missed and false detections and low recall. To address these problems, a feature enhancement fusion network (FEFNet) that fuses multi-scale features with global context information is proposed for X-ray prohibited-item detection. Method First, a spatial coordinate attention mechanism is added to the darknet53 backbone: position information is embedded into the channel attention and features are aggregated along the two spatial directions separately, strengthening the extractor's ability to capture prohibited-item features and suppressing background noise. Next, the features output by the backbone are encoded into 1-D vectors, and self-supervised second-order fusion is used to obtain a pixel correlation matrix over the feature space, yielding complete global context information that guides visually occluded regions. To handle the varying scales of prohibited items, a multi-scale feature-pyramid fusion module adds an extra prediction layer with a small receptive field to improve detection of small-scale targets. Finally, fusing global context information with local multi-scale detail features resolves visual occlusion between prohibited items. Result On the SIXRay-Lite (security inspection X-ray) dataset...

Keywords: prohibited items detection  X-ray image  feature enhancement fusion  attention mechanism  multi-scale fusion  global context information
Received: 2021-06-01
Revised: 2021-11-03

Integrated multi-scale features and global context in X-ray detection for prohibited items
Li Chen, Zhang Hui, Zhang Zouquan, Che Aibo, Wang Yaonan. Integrated multi-scale features and global context in X-ray detection for prohibited items[J]. Journal of Image and Graphics, 2022, 27(10): 3043-3057
Authors:Li Chen  Zhang Hui  Zhang Zouquan  Che Aibo  Wang Yaonan
Affiliation:Changsha University of Science and Technology, Changsha 410114, China;Changsha University of Science and Technology, Changsha 410114, China;Hunan University, Changsha 410082, China
Abstract: Objective Detecting prohibited items in X-ray images is a fundamental problem in security inspection. Prohibited items appear in many forms and vary greatly in scale, and because X-ray imaging is translucent, stacks of many objects produce heavy overlap and occlusion. Traditional image-processing models are therefore prone to missed and false detections, giving low recall and unsatisfactory real-time performance. Unlike ordinary optical images, overlapping objects in X-ray images are hard for deep learning models to separate: several overlapping items may be detected as a single new object, degrading classification and detection accuracy. To address these problems, we propose a feature enhancement fusion network (FEFNet) that fuses multi-scale features with global context information for X-ray prohibited-item detection. Method First, FEFNet improves the feature extractor darknet53 of you only look once v3 (YOLOv3) by adding a spatial coordinate attention mechanism. The improved extractor, called coordinate darknet, embeds position information into channel attention and aggregates features along the two spatial directions separately, so it extracts more salient and discriminative information while suppressing background noise. Specifically, the coordinate attention module is inserted into the last four residual stages of darknet53 and contains two pooling branches. The width and height of the feature map are adaptively pooled to obtain feature vectors for each direction; these vectors then pass through batch-normalization and activation layers to produce direction-specific attention vectors. Finally, the attention vectors are applied to the input feature map so that the model attends to detailed information.
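The direction-wise aggregation described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the weight matrices `w_h` and `w_w` are hypothetical stand-ins for the learned direction-wise transforms, and batch normalization is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_h, w_w):
    """Minimal coordinate-attention sketch for a single feature map.

    x:   (C, H, W) input feature map
    w_h: (C, C) stand-in weights for the height-branch transform
    w_w: (C, C) stand-in weights for the width-branch transform
    """
    # Aggregate features along each spatial direction (adaptive average pooling)
    pooled_h = x.mean(axis=2)         # (C, H): pooled over the width axis
    pooled_w = x.mean(axis=1)         # (C, W): pooled over the height axis
    # Direction-wise transform + sigmoid -> positional attention vectors
    a_h = sigmoid(w_h @ pooled_h)     # (C, H)
    a_w = sigmoid(w_w @ pooled_w)     # (C, W)
    # Re-weight the input with both positional attention maps
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because both attention factors lie in (0, 1), the module can only suppress activations, which is how background responses are attenuated relative to salient target regions.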
Next, a bilinear second-order fusion module extracts global context features. The module encodes the highest-level semantic features output by the backbone into one-dimensional vectors, and bilinear pooling performs second-order fusion on them to obtain a spatial pixel correlation matrix. The correlation matrix is multiplied by the input features, and the result is up-sampled and spliced into the feature pyramid to output the final global context features. In the bilinear pooling operation, the two one-dimensional vectors at each position are first bilinearly fused (multiplied) into a fusion matrix, the matrices are then sum-pooled over all positions, and the fused feature is obtained after L2 normalization and a softmax operation. Finally, the feature pyramid is improved to address the varying scales of prohibited items: a cross-scale fusion feature-pyramid module strengthens the detection of multi-scale prohibited items. The pyramid outputs four prediction feature maps of different scales, sized 13×13, 26×26, 52×52, and 104×104 pixels from small to large. Small-scale feature maps predict large targets, and the added large-scale feature map improves the prediction of small targets. In addition, the concatenate operation is replaced with element-wise addition, which preserves more activation maps from coordinate darknet. Meanwhile, the global context feature obtained from second-order fusion is connected directly to the other local features, and this information alleviates blur and occlusion. Result The model is trained and evaluated on the SIXRay-Lite (security inspection X-ray) dataset, which contains 7 408 training samples and 1 500 test samples.
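The second-order fusion step can be sketched as follows. This is an assumed reading of the description above, not the authors' exact code: per-position descriptors are bilinearly fused via outer products, sum-pooled over all positions into a correlation matrix, normalized, and used to re-weight the input features.

```python
import numpy as np

def bilinear_global_context(x):
    """Sketch of the described bilinear second-order fusion.

    x: (C, H, W) highest-level backbone feature map.
    """
    C, H, W = x.shape
    v = x.reshape(C, H * W)                      # one C-dim vector per pixel
    # Outer product at every position, sum-pooled over positions == v @ v.T
    corr = v @ v.T                               # (C, C) correlation matrix
    corr = corr / (np.linalg.norm(corr) + 1e-8)  # L2 normalization
    e = np.exp(corr - corr.max())
    attn = e / e.sum()                           # softmax over all entries
    # Multiply the correlation matrix back onto the input features
    return (attn @ v).reshape(C, H, W)
```

Each output position thus mixes information from every channel's global statistics, which is what lets occluded regions borrow evidence from the whole image.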
FEFNet is compared with other object detectors, including single shot detector (SSD), Faster R-CNN, RetinaNet, YOLOv5, and the asymmetrical convolution multi-view neural network (ACMNet). Experimental results show that our method achieves 85.64% mean average precision (mAP) on the SIXRay-Lite dataset, 11.24% higher than the original YOLOv3. The per-class average precisions are 95.15% for gun, 81.43% for knife, 81.65% for wrench, 85.95% for plier, and 84.00% for scissor. The comparative analysis further demonstrates the advantage of the proposed model: 1) FEFNet's mAP is 13.97% higher than that of SSD; 2) 7.40% higher than that of RetinaNet; 3) 5.48% higher than that of Faster R-CNN; 4) 3.61% higher than that of YOLOv5; and 5) 1.34% higher than that of ACMNet. Conclusion FEFNet extracts more discriminative features, reduces background-noise interference, and improves the detection of multi-scale and small prohibited items. Combining global context information with multi-scale local features effectively alleviates visual occlusion and blur between prohibited items, and improves the overall detection accuracy of the model while maintaining real-time performance.
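As a quick arithmetic check of the figures above, the five per-class APs do average to the reported mAP, and the stated gains imply each competitor's mAP (the `implied` values are derived here, not reported in the paper):

```python
# Per-class average precision (AP) values reported above, in percent
aps = {"gun": 95.15, "knife": 81.43, "wrench": 81.65,
       "plier": 85.95, "scissor": 84.00}
map_score = sum(aps.values()) / len(aps)
print(round(map_score, 2))  # 85.64 -- matches the reported overall mAP

# Competitor mAPs implied by the stated gains over each detector
gains = {"YOLOv3": 11.24, "SSD": 13.97, "RetinaNet": 7.40,
         "Faster R-CNN": 5.48, "YOLOv5": 3.61, "ACMNet": 1.34}
implied = {name: round(85.64 - g, 2) for name, g in gains.items()}
```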
Keywords: prohibited items detection  X-ray image  feature enhancement fusion  attention mechanism  multi-scale fusion  global context features