Found 20 similar documents; search took 15 ms.
1.
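Application of the human visual attention mechanism to target detection (total citations: 22, self-citations: 1, citations by others: 22)
Drawing on theories of human visual perception, and after introducing two representative visual attention models, this paper proposes a target detection algorithm for automatic target recognition that combines a pre-attentive mechanism under a bottom-up control strategy with an attention mechanism under a top-down control strategy. Starting from the input image, Gabor operators are used to build multi-scale, multi-orientation, multi-channel images; full-wave rectification and contrast gain control between channels then yield multi-scale, multi-orientation orientation feature maps, whose linear combination forms the saliency map. Experimental results are given for ship target detection using only the bottom-up control strategy: the targets to be detected are clearly enhanced in the saliency map, which facilitates detection.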
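The bottom-up pipeline described in this abstract (Gabor channels, full-wave rectification, linear combination) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the kernel sizes, wavelengths, and equal channel weights are assumptions, and the abstract's contrast gain control is replaced here by a simple per-channel peak normalization because the abstract does not specify its form.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Even (cosine-phase) Gabor kernel with its mean removed."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that flat image regions produce no response.
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_and_rectify(image, kernel):
    """Correlate image with kernel (replicated borders), then take |response|."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, kv in enumerate(krow):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kv * image[yy][xx]
            out[y][x] = abs(acc)  # full-wave rectification
    return out

def saliency_map(image, wavelengths=(2.0, 4.0), n_orientations=4):
    """Average of peak-normalized, rectified Gabor channels (assumed weights)."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    n_channels = 0
    for wavelength in wavelengths:
        size = 2 * int(wavelength) + 1
        for i in range(n_orientations):
            theta = i * math.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=wavelength / 2.0)
            response = filter_and_rectify(image, kernel)
            peak = max(max(row) for row in response)
            scale = 1.0 / peak if peak > 0 else 0.0  # stand-in for gain control
            for y in range(h):
                for x in range(w):
                    total[y][x] += response[y][x] * scale
            n_channels += 1
    return [[v / n_channels for v in row] for row in total]
```

With a small bright square on a dark background, the flat regions far from the square receive zero saliency while the square's edges and corners respond, which mirrors the target enhancement the abstract reports.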
2.
Saliency detection has been studied for conventional images with standard aspect ratios; however, it remains a challenging problem for panoramic images with wide fields of view. In this paper, we propose a saliency detection algorithm for panoramic landscape images of outdoor scenes. We observe that a typical panoramic image includes several homogeneous background regions with horizontally elongated distributions, as well as multiple foreground objects at arbitrary locations. We first estimate the background of panoramic images by selecting homogeneous superpixels using geodesic similarity and analyzing their spatial distributions. We then iteratively refine an initial saliency map derived from the background estimate by computing the feature contrast only within a local surrounding area whose range and shape change adaptively. Experimental results demonstrate that the proposed algorithm detects multiple salient objects faithfully while suppressing the background, and that it yields significantly better panorama saliency detection performance than recent state-of-the-art techniques.
3.
To address the low sign language recognition rate when only small samples are available, a simple and effective static gesture recognition method based on an attention mechanism is proposed. The method enhances the features of both the details and the subject of the gesture image. Its input is the intermediate feature map generated by the original network. The proposed convolutional model is a lightweight, general module that can be seamlessly integrated into any convolutional neural network (CNN) architecture and achieves significant performance gains with minimal overhead. Experiments on two different datasets show that the proposed method is effective and improves the sign language recognition accuracy of the benchmark model, making its performance better than that of existing methods.
4.
5.
Recently, significant progress has been made in the field of person detection and tracking. However, crowded scenes remain particularly challenging and can deeply affect the results because of overlapping detections and dynamic occlusions. In this paper, we present a method to enhance human detection and tracking in crowded scenes. It is based on introducing additional information about crowds and integrating it into a state-of-the-art detector. This additional cue models the time-varying dynamics of the crowd density using local features as an observation of a probabilistic function. It also involves a feature tracking step that excludes feature points attached to the background. This step benefits the later density estimation because it removes the influence of features irrelevant to the underlying crowd density. Our proposed approach applies a scene-adaptive dynamic parametrization using this crowd density measure. It also includes self-adaptive learning of the human aspect ratio and perceived height in order to reduce false positive detections. The resulting improved detections are subsequently used to boost the efficiency of tracking in a tracking-by-detection framework. The proposed person detection approach is evaluated on videos from different datasets, and the results demonstrate the advantages of incorporating crowd density and geometrical constraints into the detection process. Its impact on tracking has also been validated experimentally, with good results.
6.
《Digital Communications & Networks》2023,9(1):14-21
Attacks on in-vehicle Controller Area Network (CAN) bus messages severely disrupt normal communication between vehicles. Research on intrusion detection models for CAN therefore has positive business value for vehicle security, and intrusion detection technology for CAN bus messages can effectively protect the in-vehicle network from unlawful attacks. Previous machine learning-based models are unable to identify intrusive abnormal messages effectively because of their inherent shortcomings. To address these shortcomings, we propose a novel method using Attention Mechanism and AutoEncoder for Intrusion Detection (AMAEID). The AMAEID model first converts the raw hexadecimal message data into binary format to obtain better input. It then encodes and decodes the binary message data using a multi-layer denoising autoencoder to obtain a hidden feature representation that captures the latent features behind the message data at a deeper level. Finally, the AMAEID model uses the attention mechanism and a fully connected network to infer whether or not the message is abnormal. In experiments with three evaluation metrics on a real in-vehicle CAN bus message dataset, the AMAEID model outperforms some traditional machine learning algorithms, demonstrating its effectiveness.
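The first preprocessing step mentioned in this abstract, converting hexadecimal message data into binary input, might look something like the sketch below. The function name, the fixed 8-byte width, and the zero padding of short payloads are assumptions for illustration; the abstract does not describe the exact encoding.

```python
def hex_payload_to_bits(payload_hex, n_bytes=8):
    """Convert a hexadecimal CAN payload into a fixed-length list of 0/1 values."""
    raw = bytes.fromhex(payload_hex.replace(" ", ""))
    if len(raw) > n_bytes:
        raise ValueError("payload is longer than n_bytes")
    # Assumption: right-pad short payloads with zero bytes so that every
    # message yields a vector of the same width (n_bytes * 8 bits).
    raw = raw.ljust(n_bytes, b"\x00")
    # Most significant bit first within each byte.
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]
```

A fixed-width 0/1 vector of this kind is a convenient input for a denoising autoencoder, since each bit becomes one input unit.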
7.
Recently, very deep convolutional neural networks (CNNs) have shown strong ability in single image super-resolution (SISR) and have achieved remarkable performance. However, most existing CNN-based SISR methods rarely use the high-frequency information of the image explicitly to assist reconstruction, which makes the reconstructed image look blurred. To address this problem, a novel contour-enhanced Image Super-Resolution by High and Low Frequency Fusion Network (HLFN) is proposed in this paper. Specifically, a contour learning subnetwork is designed to learn the high-frequency information, which helps it learn the texture of the image better. To reduce the redundancy of the contour information learned by this subnetwork during fusion, a spatial channel attention block (SCAB) is introduced, which adaptively selects the required high-frequency information. Moreover, a contour loss is designed and used together with the loss to optimize the network jointly. Comprehensive experiments demonstrate the superiority of our HLFN over state-of-the-art SISR methods.
8.
Existing deraining methods based on convolutional neural networks (CNNs) have achieved great success, but remaining rain streaks can still degrade images drastically. In this work, we propose an end-to-end multi-scale context information and attention network, called MSCIANet. The proposed network consists of multi-scale feature extraction (MSFE) and multi-receptive fields feature extraction (MRFFE). First, the MSFE picks up features of rain streaks at different scales and propagates deep features of the two layers across stages through skip connections. Second, the MRFFE refines background details using an attention mechanism and depthwise separable convolutions with different receptive fields at different scales. Finally, fusing the outputs of the two subnetworks reconstructs the clean background image. Extensive experimental results show that the proposed network performs well on the deraining task on synthetic and real-world datasets. The demo is available at https://github.com/CoderLi365/MSCIANet.
9.
Convolutional neural network (CNN) based methods have recently achieved extraordinary performance in single image super-resolution (SISR) tasks. However, most existing CNN-based approaches increase the model's depth by stacking massive kernel convolutions, which brings high computational cost and limits their application on mobile devices with limited resources. Furthermore, large kernel convolutions are rarely used in lightweight super-resolution designs. To alleviate these problems, we propose a multi-scale convolutional attention network (MCAN), a lightweight and efficient network for SISR. Specifically, a multi-scale convolutional attention (MCA) is designed to aggregate the spatial information of different large receptive fields. Since the contextual information of the image has a strong local correlation, we design a local feature enhancement unit (LFEU) to further enhance local feature extraction. Extensive experimental results show that the proposed MCAN achieves better performance with lower model complexity than other state-of-the-art lightweight methods.
10.
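To address the low detection accuracy, slow model convergence, and heavy computation of current panoramic image saliency detection methods, this paper proposes a U-shaped network based on robust vision transformers and multiple attention (URMNet). The model uses spherical convolution to extract multi-scale features of panoramic images, which reduces the distortion that panoramic images suffer after equirectangular projection. A robust vision transformer module extracts the salient information contained in feature maps at four scales, and convolutional embedding is used to reduce the feature-map resolution and strengthen the model's robustness. A multi-attention module selectively fuses multi-dimensional attention according to the relationship between spatial attention and channel attention. Finally, multi-layer features are fused step by step to form the saliency map of the panoramic image. A latitude-weighted loss function gives the model faster convergence. Experiments on two public datasets show that, because it uses the robust vision transformer module and the multi-attention module, the proposed model outperforms six other advanced methods and further improves the accuracy of panoramic image saliency detection.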
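The abstract does not give the form of its latitude-weighted loss. One common way to weight errors on an equirectangular image, shown here purely as an assumed illustration, is to scale each row by the cosine of its latitude so that the stretched polar rows count less. The function names and the choice of squared error are hypothetical.

```python
import math

def latitude_weights(height):
    """cos(latitude) for each row centre of an equirectangular image."""
    # Row r spans latitudes from -pi/2 (top) to +pi/2 (bottom).
    return [math.cos(((r + 0.5) / height - 0.5) * math.pi) for r in range(height)]

def latitude_weighted_mse(pred, target):
    """Weighted mean squared error; rows near the poles contribute less."""
    weights = latitude_weights(len(pred))
    num = 0.0
    den = 0.0
    for w, p_row, t_row in zip(weights, pred, target):
        for p, t in zip(p_row, t_row):
            num += w * (p - t) ** 2
            den += w
    return num / den
```

Under this weighting, an error of the same size costs more at the equator than near the poles, which matches the smaller solid angle that each polar pixel covers on the sphere.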
11.
12.
To translate in real time, a simultaneous translation system should determine when to stop reading source tokens and generate target tokens corresponding to the partial source sentence read up to that point. However, conventional attention-based neural machine translation (NMT) models cannot produce translations with adequate latency in online scenarios because they wait until a source sentence is complete before computing the alignment between source and target tokens. To address this issue, we propose a reinforcement learning (RL)-based attention mechanism, the reinforced attention mechanism, which allows a neural translation model to jointly train the stopping criterion and a partial translation model. The proposed attention mechanism comprises two modules, one to ensure translation quality and the other to address latency. Unlike previous RL-based simultaneous translation systems, which learn the stopping criterion from a fixed NMT model, the modules can be trained jointly with a novel reward function. In our experiments, the proposed model achieves better translation quality than previous models with comparable latency.
13.
Breast cancer is the most common cancer among women worldwide. Ultrasound is widely used as a harmless test for early breast cancer screening. This paper presents the ultrasound network (USNet) model, an improved object detection model designed specifically for breast nodule detection in ultrasound images. USNet improves the backbone network, optimizes the generation of feature maps, and adjusts the loss function. USNet was then trained on real clinical data. The evaluation results show that the trained model has strong nodule detection ability. The mean average precision (mAP) reaches 0.7349. The nodule detection rate is 95.11%, and the in situ cancer detection rate is 79.65%. At the same time, the detection speed reaches 27.3 frames per second (FPS), so video data can be processed in real time.
14.
15.
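The feature-point matching methods used in existing simultaneous localization and mapping (SLAM) schemes are generally affected by viewpoint changes, which makes feature points hard to match, degrades matching accuracy, and ultimately reduces the accuracy of three-dimensional (3D) point-cloud map construction and camera pose estimation. To address this, this paper proposes a SLAM method built on an attention-based feature-point matching network. Compared with existing SLAM methods, it replaces the feature-point matching in the visual odometry module with a new attention-based feature-point matching network and combines it with a traditional feature-point extraction method, forming a new visual odometry and, in turn, a new SLAM method. First, a traditional feature-point extraction algorithm extracts the feature points; the extracted feature points and their descriptor vectors are encoded, and a graph attention neural network learns matching descriptors from them. A score matrix is built from the matching descriptors, an optimal transport algorithm solves for the optimal score matrix, and the optimal matching point pairs are computed, which completes feature-point extraction and matching. Camera localization, mapping, and loop-closure detection are then carried out on the matched point pairs. Experiments on the public KITTI dataset show that, when the viewpoint changes unstably, the SLAM scheme with the attention-based feature-point matching network clearly improves accuracy in terms of camera trajectory error and camera pose estimation error.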
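The step from a score matrix to matched point pairs can be illustrated with Sinkhorn normalization, a standard way to approximate an optimal transport plan. The abstract only says "an optimal transport algorithm", so the use of Sinkhorn iterations, the mutual-best-match rule, and the omission of any unmatched ("dustbin") slot are all assumptions of this sketch.

```python
import math

def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores)."""
    plan = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Make every row sum to 1.
        new_plan = []
        for row in plan:
            row_sum = sum(row)
            new_plan.append([v / row_sum for v in row])
        # Make every column sum to 1.
        col_sums = [sum(col) for col in zip(*new_plan)]
        plan = [[v / c for v, c in zip(row, col_sums)] for row in new_plan]
    return plan

def mutual_matches(plan):
    """Keep pairs (i, j) where j is row i's best column and i is column j's best row."""
    row_best = [max(range(len(row)), key=row.__getitem__) for row in plan]
    columns = list(zip(*plan))
    col_best = [max(range(len(col)), key=col.__getitem__) for col in columns]
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]
```

For example, `mutual_matches(sinkhorn(scores))` on a square score matrix of descriptor similarities returns index pairs linking feature points in the first image to feature points in the second.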
16.
At present, the main super-resolution (SR) approach based on convolutional neural networks (CNNs) is to increase the number of network layers through skip connections so as to improve the nonlinear expressive power of the model. However, such networks also become difficult to train and to converge. To train a smaller but better-performing SR model, this paper constructs a novel image SR network with multiple attention mechanisms (MAMSR), which includes a channel attention mechanism and a spatial attention mechanism. By learning the relationship between the channels of the feature map and the relationship between the pixels at each position of the feature map, the network enhances its feature expression ability and brings the reconstructed image closer to the real image. Experiments on public datasets show that our network surpasses some current state-of-the-art algorithms in PSNR, SSIM, and visual quality.
17.
Camera-based transmission line detection (TLD) is a fundamental and crucial task for automatic power line patrol by aircraft. Motivated by instance segmentation, this paper proposes a TLD algorithm with a novel deep neural network, CableNet. The network structure is based on fully convolutional networks (FCNs) with two major improvements that take the specific appearance characteristics of transmission lines into account. First, overlaying dilated convolutional layers and spatial convolutional layers are configured to better represent continuous, long, thin cable shapes. Second, two output branches are arranged to generate multidimensional feature maps for instance segmentation. Thus, cable pixels can be detected and assigned cable IDs simultaneously. Multiple experiments are conducted on aerial images, and the results show that the proposed algorithm obtains reliable detection performance and is superior to traditional TLD methods. Meanwhile, segmented pixels can be accurately identified as cable instances, which contributes to line fitting for further applications.
18.
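Human ear features are highly unique and stable, and in recent years they have been widely used in identity recognition. Because ear images are easily occluded by hair, earrings, and other items during acquisition, this paper proposes an ear recognition method based on ERNet. Building on the IResNet network, the method introduces an improved SE module that fuses the statistical properties of max pooling and average pooling to strengthen the representation of identity-related features and suppress the influence of irrelevant features, thereby addressing the recognition difficulty caused by occlusion in uncontrolled environments. Extensive experiments show that the improved method recognizes markedly better than the original network. Under the same occlusion conditions, the proposed model shows good robustness.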
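An SE-style channel gate that fuses max-pooled and average-pooled statistics, as this abstract describes, could be sketched as below. The abstract does not say how the two statistics are fused; passing both through a shared two-layer bottleneck and summing before the sigmoid is an assumption, and the weight matrices `w1` and `w2` are hypothetical placeholders for learned parameters.

```python
import math

def squeeze(feature_map):
    """Per-channel average and maximum over all spatial positions."""
    averages, maxima = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        averages.append(sum(values) / len(values))
        maxima.append(max(values))
    return averages, maxima

def dense(vector, weights):
    """Matrix-vector product; weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def excite(descriptor, w1, w2):
    """Two-layer bottleneck with a ReLU in between."""
    hidden = [max(0.0, h) for h in dense(descriptor, w1)]
    return dense(hidden, w2)

def se_max_avg(feature_map, w1, w2):
    """Scale each channel by a gate computed from its avg and max statistics."""
    averages, maxima = squeeze(feature_map)
    # Assumed fusion: shared bottleneck on both descriptors, summed, then sigmoid.
    logits = [a + m for a, m in zip(excite(averages, w1, w2), excite(maxima, w1, w2))]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    scaled = [[[v * g for v in row] for row in channel]
              for channel, g in zip(feature_map, gates)]
    return scaled, gates
```

Here `feature_map` is a list of channels, each a 2D list; `w1` has shape hidden x channels and `w2` has shape channels x hidden. The max statistic lets a channel with a few strong activations keep a high gate even when occlusion lowers its average.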
19.
The involvement of external vendors in the semiconductor industry increases the chance of hardware Trojan (HT) insertion in different phases of integrated circuit (IC) design. Recently, several partial reverse engineering (RE) based HT detection techniques have been reported, which attempt to reduce the time and complexity of the full RE process by applying machine learning or image processing techniques to IC images. However, these techniques fail to extract the relevant image features, are not robust to image variations, are complicated and poorly generalizable, and have a low detection rate. To overcome these limitations, this paper proposes a new partial RE based HT detection technique that detects Trojans from IC layout images using a Deep Convolutional Neural Network (DCNN). The proposed DCNN model stacks several convolutional and pooling layers. It extracts and selects the most relevant and robust features from the IC images automatically, layer by layer, and eliminates the need to apply a separate feature extraction algorithm. To prevent over-training of the DCNN model, a new stopping condition method and two new metrics, namely Accuracy difference measure (ADM) and Loss difference measure (LDM), are proposed that halt the training only when the performance of the model genuinely drops. Further, to combat process variations and fabrication noise generated during the RE process, we include noisy images with varying parameters in the training of the model. We also apply data augmentation and regularization techniques to address underfitting and overfitting. Experimental evaluation shows that the proposed technique provides 99% and 97.4% accuracy on the Trust-Hub and synthetic ISCAS datasets, respectively, which is on average 15.83% and 21.69% higher than existing partial RE based techniques.
20.
Detection of salient objects in images and video is of great importance in many computer vision applications. Although the state of the art in saliency detection for still images has changed substantially over the last few years, there have been few improvements in video saliency detection. This paper proposes a novel non-local fully convolutional network architecture for capturing global dependencies more efficiently and investigates the use of recently introduced non-local neural networks in video salient object detection. The effect of non-local operations is studied separately on static and dynamic saliency detection in order to exploit both appearance and motion features. A novel deep non-local fully convolutional network architecture is introduced for video salient object detection and tested on two well-known datasets, DAVIS and FBMS. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.