Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Visual attention for the diagnosis of Autism Spectrum Disorder (ASD), a kind of mental disorder, has attracted the interest of an increasing number of researchers. Although multiple visual attention prediction models have been proposed, the problem remains open. In this paper, considering the shift of visual attention, we propose that an image can be viewed as a pseudo sequence. We also propose a novel visual attention prediction method for ASD with hierarchical semantic fusion (ASD-HSF). Specifically, the proposed model mainly contains a Spatial Feature Module (SFM) and a Pseudo Sequential Feature Module (PSFM). SFM is designed to extract spatial semantic features with a fully convolutional network, while PSFM, implemented by two Convolutional Long Short-Term Memory networks (ConvLSTMs), learns pseudo sequential features. The outputs of these two modules are fused to produce the final saliency map, which includes both spatial semantic information and pseudo sequential information. Experimental results show that the proposed model not only outperforms ten state-of-the-art general saliency prediction counterparts, but also ranks first under four metrics and second under the remaining metrics for ASD saliency prediction.

2.
3.
Dong Bo, Zhou Yan, Wang Yongxiong. Electronic Science and Technology (《电子科技》), 2009, 34(1): 23-30
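Current saliency detection algorithms struggle to segment complete salient regions and sharp edge details in complex scenes. To address this problem, this paper proposes a novel feature fusion algorithm. The method uses a fully convolutional neural network to obtain coarse initial features at multiple levels and combines them with a feature pyramid structure for deeper analysis. A progressive-structure receptive field module is designed to transform the features into spaces of different scales for refinement, achieving progressive fusion and propagation of features and selectively enhancing salient regions. A global attention mechanism is adopted to suppress background noise and to establish long-range dependencies between salient pixels, which improves the effectiveness of the salient regions and highlights salient objects; the saliency map is then obtained by learning to fuse the features of all levels. Comprehensive experiments show that, while the absolute error decreases, the F-measure far exceeds that of seven other mainstream methods. The proposed saliency model combines the advantages of the fully convolutional neural network and the feature pyramid structure and, together with the progressive-structure receptive field and global attention mechanism designed in the paper, brings the saliency map closer to the ground-truth map.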
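The two evaluation quantities named in this abstract, absolute error and F-measure, can be illustrated with a minimal sketch. The fixed threshold of 0.5 and the weighting `beta_sq=0.3` are assumptions: 0.3 is a common convention in the saliency literature, but the abstract does not state which settings the authors used, and the function names are hypothetical.

```python
def mean_absolute_error(pred, truth):
    """Average per-pixel absolute difference between two same-sized maps in [0, 1]."""
    total = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += abs(p - t)
            count += 1
    return total / count


def f_measure(pred, truth, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure of a thresholded saliency map against a binary ground truth.

    threshold and beta_sq are assumed defaults, not values from the abstract.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            predicted = p >= threshold
            actual = t >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

A lower MAE and a higher F-measure both indicate a saliency map closer to the ground truth, which is the sense in which the abstract reports the two together.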

4.
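In vehicle re-identification (Re-ID), jointly extracting global and local information has become the mainstream approach, but many Re-ID models attend only to the richness of local information and neglect its completeness. To address this problem, an algorithm based on relation fusion and feature decomposition is proposed. Starting from the spatial and channel dimensions, the algorithm splits the features extracted by the backbone network along three dimensions: vertical, horizontal, and channel. First, to better highlight the vehicle's foreground region, a mixed attention module (MAM) is proposed. Next, to mine rich feature information in the spatial dimension while making the network attend to more complete regions of interest, graph-based relation fusion is applied to the features split along the vertical and horizontal directions. To give the network the ability to capture more discriminative information, feature decomposition is applied to the local features split along the channel direction. Finally, vehicle re-identification is performed using the global-branch features together with the robust features extracted by the local branches. Experimental results show that the proposed algorithm achieves state-of-the-art performance on two mainstream vehicle re-identification datasets.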

5.
Recently, the vision transformer has achieved a breakthrough in image recognition. Its multi-head self-attention mechanism (MSA) can extract discriminative token information from different patches to improve image classification accuracy. However, the classification token in its deep layers ignores the local features between layers. In addition, the patch embedding layer feeds fixed-size patches into the network, which inevitably introduces additional image noise. Therefore, we propose a hierarchical attention vision transformer (HAVT) based on the transformer framework. First, we present an attention-cropping data augmentation method that crops and drops image noise and forces the network to learn key features. Second, the hierarchical attention selection (HAS) module is proposed, which improves the network's ability to learn discriminative tokens between layers by filtering and fusing tokens between layers. Experimental results show that the proposed HAVT outperforms state-of-the-art approaches and significantly improves the accuracy to 91.8% and 91.0% on CUB-200-2011 and Stanford Dogs, respectively. We have released our source code on GitHub: https://github.com/OhJackHu/HAVT.git.

6.
7.
Video summarization aims at selecting valuable clips so that videos can be browsed with high efficiency. Previous approaches typically focus on aggregating temporal features while ignoring the potential role of visual representations in summarizing videos. In this paper, we present a global difference-aware network (GDANet) that exploits the feature differences between frames and the video as guidance to enhance visual features. Initially, a difference optimization module (DOM) is devised to enhance the discrimina...

8.
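Current mainstream deep fusion methods use only convolution operations to extract local image features, but the interaction between the image and the convolution kernel is content-independent and cannot effectively establish long-range feature dependencies, which inevitably loses contextual image information and limits the fusion performance for infrared and visible images. To this end, this paper proposes a multi-scale Transformer fusion method for infrared and visible images. With the Swin Transformer as a component, a Conv Swin Transformer Block is constructed, using convolutional layers to enhance the representation of global image features. A multi-scale self-attention encoder-decoder network is built to extract and reconstruct global image features. A feature-sequence fusion layer is designed that uses the SoftMax operation to compute attention weight coefficients for the feature sequences, highlighting the salient features of each source image and achieving end-to-end infrared and visible image fusion. Experimental results on the TNO and Roadscene datasets show that the method outperforms other typical traditional and deep learning fusion methods in both subjective visual assessment and objective metrics. By incorporating the self-attention mechanism and using the Transformer to establish long-range dependencies in the image, the method builds a global image feature fusion model with better fusion performance and stronger generalization than other deep learning fusion methods.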
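The SoftMax-based fusion of two feature sequences can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's layer: the activity measure (the L1 norm of each token) and the function name `softmax_fuse` are hypothetical, since the abstract only says that SoftMax produces the attention weight coefficients.

```python
import math


def softmax_fuse(feat_ir, feat_vis):
    """Fuse two token sequences with per-token softmax weights.

    feat_ir, feat_vis: equally long lists of token vectors (lists of floats).
    Assumption: a token's "activity" is its L1 norm; the abstract does not
    specify how the scores fed to SoftMax are computed.
    """
    fused = []
    for tok_ir, tok_vis in zip(feat_ir, feat_vis):
        a_ir = sum(abs(v) for v in tok_ir)
        a_vis = sum(abs(v) for v in tok_vis)
        # Numerically stable two-way softmax over the activity scores.
        m = max(a_ir, a_vis)
        e_ir = math.exp(a_ir - m)
        e_vis = math.exp(a_vis - m)
        w_ir = e_ir / (e_ir + e_vis)
        w_vis = 1.0 - w_ir
        fused.append([w_ir * a + w_vis * b for a, b in zip(tok_ir, tok_vis)])
    return fused
```

When one source is much more active at a position, its weight approaches 1 and the fused token follows that source; when both are equally active, the fused token is their average.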

9.
In this paper, an end-to-end convolutional neural network named Attention-Based Multi-Stream Feature Fusion Network (AMSFF-Net) is proposed to recover haze-free images. The network is built on an encoder-decoder structure. The encoder generates features at three resolution levels. The multi-stream features are extracted using residual dense blocks and fused by feature fusion blocks. AMSFF-Net uses a pixel attention mechanism to pay more attention to informative features at different resolution levels. A sharp image can be recovered by the good kernel estimation. Further, AMSFF-Net can capture semantic and sharp textural details from the extracted features and retain a high-quality image from coarse to fine using a mixed-convolution attention mechanism at the decoder. The skip connections reduce the loss of image details from the larger receptive fields. Moreover, a deep semantic loss function emphasizes semantic information in deep features. Experimental findings show that the proposed method outperforms other methods on synthetic and real-world images.

10.
Quality assessment of natural images is influenced by perceptual mechanisms, e.g., attention and contrast sensitivity, and quality perception can be generated in a hierarchical process. This paper proposes an architecture of Attention Integrated Hierarchical Image Quality networks (AIHIQnet) for no-reference quality assessment. AIHIQnet consists of three components: a general backbone network, a perceptually guided neck network, and a head network. Multi-scale features extracted from the backbone network are fused to simulate image quality perception in a hierarchical manner. The attention and contrast sensitivity mechanisms modelled by an attention module capture essential information for quality perception. Considering that image rescaling potentially affects perceived quality, appropriate pooling methods are employed in the non-convolution layers of AIHIQnet to accept images with arbitrary resolutions. Comprehensive experiments on publicly available databases demonstrate the outstanding performance of AIHIQnet compared to state-of-the-art models. Ablation experiments were performed to investigate the variants of the proposed architecture and reveal the importance of individual components.

11.
Huang Chen, Pei Jihong, Zhao Yang. Journal of Signal Processing (《信号处理》), 2022, 38(1): 64-73
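At present, the vast majority of pedestrian attribute recognition tasks are based on a single image, which contains limited information, whereas an image sequence contains rich useful information and temporal features; exploiting sequence information is an important way to improve pedestrian attribute recognition performance. This paper proposes a multi-feature fusion network for attribute recognition from pedestrian image sequences that incorporates a temporal attention mechanism. In addition to extracting sequence features with the common spatial-temporal double average pooling feature aggregation and spatial-temporal average-max pooling feature aggregation...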

12.
Brain CT image classification is critical for assisting brain disease diagnosis. Brain CT images contain a lot of noise, and the lesions are unstable in shape and location, which makes the classification task more difficult for conventional CNN models. In this paper, we propose a novel Multi-scale Superpixel based Hierarchical Attention (MSHA) model for brain CT classification that introduces multi-scale superpixels into a hierarchical fusion structure to remove noise and help the model focus on the lesion areas. MSHA contains three modules: (1) a Semantic-level Information Extractor that extracts appearance and geometry information based on the superpixels of the image, (2) a Mixed Multi-head Attention module that obtains the mixed attention features from the semantic-level information, and (3) a Hierarchical Fusion Structure that fuses the multi-scale attention features from coarse to fine. Experiments on the brain CT dataset demonstrate the effectiveness of the proposed model.

13.
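To address the problem that large intra-class variation and high inter-class similarity in remote sensing image scene classification cause some scenes to be confused, this paper proposes a highly discriminative feature representation method based on a dual attention mechanism. Because different channels represent features of differing importance and different local regions differ in saliency, a channel-wise attention module and a spatial attention module are designed on top of the high-level features extracted by a convolutional neural network. Exploiting the ability of recurrent neural networks to extract contextual information, the modules learn and output, in turn, importance weights for the different channels and the different local regions, so that the network focuses on salient features and salient regions in the image and ignores non-salient features and regions, improving the discriminative power of the feature representation. The proposed dual attention module can be attached to any convolutional neural network, and the whole network can be trained end to end. Extensive comparative experiments on two public datasets, AID and NWPU45, verify the effectiveness of the proposed method, which achieves a clear improvement in classification accuracy over existing methods.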

14.
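Diabetic retinopathy (DR) is currently one of the leading causes of blindness in humans. To address problems that limit grading performance, such as small inter-class differences and imbalanced class distribution in DR datasets, this paper proposes a grading algorithm based on fusion of attention linear feature diversification (FALFD). The algorithm first uses an improved Res2Net residual network as the model backbone to enlarge the receptive field and further improve the network's ability to capture feature information. Second, an adaptive feature diversification module (AFDM) is introduced to identify the distinguishable minute pathological features in fundus images and obtain local features with high-level semantic information, avoiding restriction to a single feature region and thereby improving grading accuracy. Next, a bilinear attention fusion module (BAFM) is used to increase the share of network weight given to discriminative region features. Finally, a regularized focal loss (FL) is adopted to further improve the classification performance of the algorithm. On the IDRID dataset, the sensitivity and specificity are 94.20% and 97.05%, respectively, and the quadratic weighted coefficient is 87.83%; on APTO...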
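As background for the loss named in this abstract, the standard (unregularized) focal loss for one sample can be sketched as below. The regularization term used by the authors is not described in the abstract and is therefore not reproduced; `gamma=2.0` and `alpha=1.0` are assumed defaults, and the function name is hypothetical.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    probs:  predicted class probabilities (should sum to 1).
    target: index of the true class.
    gamma, alpha: assumed defaults; gamma=0 recovers plain cross-entropy.
    """
    # Clamp to avoid log(0) on a fully wrong, fully confident prediction.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor `(1 - p_t)**gamma` shrinks the loss of well-classified samples, so rare or hard classes contribute relatively more to training, which is why focal loss is a common choice for imbalanced class distributions such as those the abstract mentions.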

15.
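To address insufficient extraction of spatio-temporal features and the difficulty of capturing global context in skeleton-based action recognition, a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network is studied. First, a spatio-temporal attention module based on non-local operations is constructed to help the model focus on the most discriminative frames and regions in the skeleton sequence. Second, an adaptive graph convolutional network is built by using a Gaussian embedding function and the feature learning ability of a lightweight convolutional neural network, while taking into account the influence of prior knowledge of the human body at different stages. Finally, with the adaptive graph convolutional network as the basic framework and the spatio-temporal attention module embedded in it, a two-stream fusion model is built from joint information, bone information, and their respective motion information. The algorithm achieves accuracies of 90.2% and 96.2% under the two evaluation protocols of the NTU RGB+D dataset and demonstrates the generality of the model on the large-scale Kinetics dataset, verifying its advantage in extracting spatio-temporal features and capturing global context.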

16.
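To address the scale diversity of aircraft targets and interference from strong background scattering in synthetic aperture radar (SAR) images, a YOLOv4-based SAR aircraft detection algorithm using coordinate attention and adaptive feature fusion is proposed. First, a coordinate attention mechanism is introduced into the backbone network to strengthen its focus on the combined structure of aircraft scattering points and its resistance to background interference. Second, an adaptive feature fusion mechanism is introduced into the feature enhancement network, which improves feature extraction for aircraft of different sizes and mitigates the imbalance between recall and precision in YOLOv4. Finally, an improved K-Means clustering is used to adjust the prior (anchor) box sizes to aircraft targets, improving the localization accuracy of the model. Experimental results show that the improved algorithm reaches a recall of 91.01%, a precision of 90.09%, and an AP0.5 of 92.34%, which are 2.49%, 6.56%, and 3.62% higher than the original YOLOv4, respectively.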
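Anchor-box clustering of the kind mentioned here is commonly done with K-Means under a `1 - IoU` distance on box widths and heights. The sketch below shows that baseline only; the abstract does not say what its "improved" K-Means changes, so nothing here should be read as the authors' variant. The deterministic initialization and the function names are assumptions made for the example.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs into k anchors; requires len(boxes) >= k."""
    # Assumed deterministic init: pick boxes spread evenly across the area ranking.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when target sizes vary as widely as the abstract describes.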

17.
Crowd counting algorithms have recently incorporated attention mechanisms into convolutional neural networks (CNNs) to achieve significant progress. The channel attention model (CAM), as a popular attention mechanism, calculates a set of probability weights to select important channel-wise feature responses. However, most CAMs roughly assign a single weight to the entire channel-wise map, so useful and useless information is treated indiscriminately, which limits the representational capacity of networks. In this paper, we propose a multi-scale and spatial position-based channel attention network (MS-SPCANet), which integrates spatial position-based channel attention models (SPCAMs) with multiple scales into a CNN. SPCAM assigns different channel attention weights to different positions of channel-wise maps to capture more informative features. Furthermore, an adaptive loss, which uses adaptive coefficients to combine density map loss and headcount loss, is constructed to improve network performance in sparse crowd scenes. Experimental results on four public datasets verify the superiority of the scheme.

18.
In this paper, we exploit features extracted from a convolutional neural network (CNN) for visual tracking. It is observed that CNN features in higher levels provide semantic information which is robust to appearance variations. Thus we integrate the hierarchical features from different layers of a deep model into a correlation filter tracking framework. More specifically, correlation filters are learned on each layer to encode the object appearance. The peak-to-sidelobe ratio (PSR) is employed to measure the differences between image patches. To strengthen the robustness of our model, we develop an adaptive model updating scheme to train the correlation filters according to different response maps. Extensive experimental results on three large-scale benchmark datasets show that the proposed algorithm performs favorably against state-of-the-art methods.
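The peak-to-sidelobe ratio is conventionally computed from a correlation response map as (peak - mean of sidelobe) / (standard deviation of sidelobe), where the sidelobe is the map with a small window around the peak removed. A minimal sketch follows; the window size `exclude_radius` and the function name are assumptions, and how the authors apply PSR inside their update scheme is not specified in the abstract.

```python
import math


def peak_to_sidelobe_ratio(response, exclude_radius=1):
    """PSR = (peak - mean(sidelobe)) / std(sidelobe) for a 2-D response map.

    response: list of equally long rows of floats.
    exclude_radius: assumed half-width of the square window around the peak
    that is excluded from the sidelobe statistics.
    """
    rows, cols = len(response), len(response[0])
    # Locate the peak of the correlation response.
    peak_r, peak_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: response[rc[0]][rc[1]],
    )
    peak = response[peak_r][peak_c]
    # Sidelobe: every cell outside the square window centred on the peak.
    sidelobe = [
        response[r][c]
        for r in range(rows)
        for c in range(cols)
        if abs(r - peak_r) > exclude_radius or abs(c - peak_c) > exclude_radius
    ]
    if not sidelobe:
        raise ValueError("response map is too small for the exclusion window")
    mean = sum(sidelobe) / len(sidelobe)
    std = math.sqrt(sum((v - mean) ** 2 for v in sidelobe) / len(sidelobe))
    return float("inf") if std == 0 else (peak - mean) / std
```

A sharp, isolated peak gives a high PSR, while occlusion or drift flattens the response and lowers it, so a tracker can, for example, skip or damp the filter update when PSR falls below a chosen threshold.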

19.
Salient object detection has improved remarkably in recent years, and graph-based saliency detection algorithms have been proposed and have made advances. Nevertheless, most state-of-the-art graph-based approaches are designed with low-level features, misleading assumptions, a fixed predefined graph structure, and a weak affinity matrix, so they are not robust enough to handle images with complex or cluttered backgrounds. In this paper, we propose a robust label propagation-based mechanism for salient object detection throughout an adaptive graph to tackle the above issues. Low-level features as well as deep features are integrated into the proposed framework to measure the similarity between different nodes. In addition, a robust mechanism is presented to calculate seeds based on the distribution of salient regions, which can achieve desirable results even if the object is in contact with the image boundary and the image scene is complex. Then, an adaptive graph with multiview connections is constructed based on different cues to learn the graph affinity matrix, which can better capture the characteristics between spatially adjacent and distant regions. Finally, a novel RLP-AGMC model, i.e., robust label propagation throughout an adaptive graph with multiview connections, is put forward to calculate saliency maps in combination with the obtained seed vectors. Comprehensive experiments on six public datasets demonstrate that the proposed method outperforms fourteen existing state-of-the-art methods in terms of various evaluation metrics.
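For readers unfamiliar with label propagation on a graph, the classic iterative scheme can be sketched as below: scores spread from seed nodes along a symmetrically normalized affinity matrix. This is the textbook formulation, not the RLP-AGMC model itself, whose adaptive graph and robust seed selection are not detailed in the abstract; `alpha`, the iteration count, and the function name are assumptions.

```python
import math


def propagate_labels(affinity, seeds, alpha=0.9, iterations=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.

    affinity: symmetric non-negative matrix W as a list of rows.
    seeds:    initial label vector y (e.g. 1.0 for seed nodes, 0.0 otherwise).
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    # Symmetric normalization; zero entries stay zero (also avoids 0/0).
    norm = [
        [
            affinity[i][j] / math.sqrt(degree[i] * degree[j]) if affinity[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores
```

Nodes strongly connected to a seed end up with high scores and disconnected nodes stay at zero, which is why the quality of the affinity matrix and of the seeds, the two points the abstract focuses on, largely determines the resulting saliency map.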

20.
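By introducing classification algorithms based on convolutional neural networks (CNNs), the accuracy of hyperspectral image (HSI) classification has improved markedly, but current mainstream CNN algorithms tend to be complex and have many parameters, which makes the networks hard to train and prone to overfitting. To make the network lightweight while preserving classification performance, this paper proposes a lightweight CNN based on a spectral-spatial attention interaction mechanism for HSI classification. To extract the spectral-spatial features of HSIs, a lightweight dual-path backbone network is constructed to extract and fuse the two kinds of features. Second, to improve the representational power of the features, two attention modules are designed to reweight the spectral and spatial features, respectively. Meanwhile, to strengthen the association between the dual-path features and fuse them better, an attention interaction mechanism is introduced into the network to further improve its performance. Classification results on three real HSI datasets show that the proposed network reaches a classification accuracy of 99.5% and has at least 50% fewer parameters than other networks.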
