Similar Articles
20 similar articles found (search time: 31 ms)
1.
Saliency prediction on RGB-D images is an underexplored and challenging task in computer vision. We propose a channel-wise attention and contextual interaction asymmetric network for RGB-D saliency prediction. In the proposed network, a common feature extractor provides cross-modal complementarity between the RGB image and corresponding depth map. In addition, we introduce a four-stream feature-interaction module that fully leverages multiscale and cross-modal features for extracting contextual information. Moreover, we propose a channel-wise attention module to highlight the feature representation of salient regions. Finally, we refine coarse maps through a corresponding refinement block. Experimental results show that the proposed network achieves a performance comparable with state-of-the-art saliency prediction methods on two representative datasets.
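The abstract does not detail the channel-wise attention module; the following is a minimal PyTorch sketch of a generic squeeze-and-excitation-style channel attention block of the kind it names. The class name, pooling choice, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel-wise attention: reweights feature channels with a
    gate learned from globally pooled statistics (SE-style sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize channels that respond to salient regions

# usage: attn = ChannelAttention(256); y = attn(torch.randn(2, 256, 32, 32))
```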

2.
Schemes that complement context relationships through cross-scale feature fusion appear in many RGB-D scene parsing algorithms; however, most of these works perform multi-scale information interaction after multi-modal feature fusion, which ignores the information lost by the two modalities during the original encoding. Therefore, a cross-complementary fusion network (CCFNet) is designed in this paper to calibrate the multi-modal information before feature fusion, so as to improve the feature quality of each modality and the complementarity between RGB and depth information. First, we divide the features into low, middle, and high levels: the low-level features contain the global details of the image, and the main features learned at this level are texture, edges, and similar cues; the middle-level features contain some global detail features as well as some local semantic features; and the high-level features contain rich local semantic features. Then, the feature information lost during encoding of the low- and middle-level features is supplemented and extracted through the designed cross feature enhancement module, while the high-level features are extracted through the feature enhancement module. In addition, a cross-modal fusion module is designed to integrate multi-modal features at different levels. The experimental results verify that the proposed CCFNet achieves excellent performance on an RGB-D scene parsing dataset containing clothing images, and the generalization ability of the model is verified on the NYU Depth V2 dataset.

3.
Recently, Convolutional Neural Networks (CNNs) have achieved great success in Single Image Super-Resolution (SISR). In particular, recursive networks are now widely used. However, existing recursion-based SISR networks can only make use of multi-scale features in a layer-wise manner. In this paper, a Deep Recursive Multi-Scale Feature Fusion Network (DRMSFFN) is proposed to address this issue. Specifically, we propose a Recursive Multi-Scale Feature Fusion Block (RMSFFB) to make full use of multi-scale features. In addition, a Progressive Feature Fusion (PFF) technique is proposed to take advantage of the hierarchical features from the RMSFFB in a global manner. At the reconstruction stage, we use a deconvolutional layer to upscale the feature maps to the desired size. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed DRMSFFN in comparison with state-of-the-art methods in both quantitative and qualitative evaluations.
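As a rough illustration of the recursive idea (one fusion block whose weights are reused across recursions, with the intermediate outputs kept for global fusion), here is a hedged PyTorch sketch; the two-branch kernel sizes and the number of recursions are assumptions, and this is not the authors' RMSFFB/PFF design.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Two parallel kernel sizes fused by a 1x1 convolution, with a residual
    connection; a single shared instance is applied recursively below."""
    def __init__(self, ch: int):
        super().__init__()
        self.b3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b5 = nn.Conv2d(ch, ch, 5, padding=2)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.fuse(torch.cat([self.b3(x), self.b5(x)], dim=1))
        return self.act(y) + x

class RecursiveFusion(nn.Module):
    """Depth grows with the number of recursions while the parameter count
    stays that of one block; intermediate outputs are collected so a later
    stage can fuse the hierarchical features globally."""
    def __init__(self, ch: int, recursions: int = 4):
        super().__init__()
        self.block = MultiScaleBlock(ch)  # shared weights
        self.recursions = recursions

    def forward(self, x):
        feats = []
        for _ in range(self.recursions):
            x = self.block(x)  # same block reused each time
            feats.append(x)
        return torch.cat(feats, dim=1)
```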

4.
刘亚灵  郭敏  马苗 《光电子.激光》2021,32(12):1271-1277
To address the limitation of applying attention only along the time-frequency dimensions in sound event detection, as well as the insufficient feature extraction caused by using a single type of convolutional layer, this paper proposes a convolutional recurrent neural network (CRNN) model based on multi-scale attention feature fusion to improve sound event detection performance. First, a multi-scale attention module is proposed to model local time-frequency units and global channel features...

5.
Driver distraction is currently a global issue, causing a dramatic increase in road accidents and casualties. However, recognizing distracted driving actions remains a challenging task in the field of computer vision, since inter-class variations between different driver action categories are quite subtle. To overcome this difficulty, in this paper, a novel deep learning based approach is proposed to extract fine-grained feature representations for image-based driver action recognition. Specifically, we improve the existing convolutional neural network from two aspects: (1) we employ multi-scale convolutional blocks with different receptive fields (kernel sizes) to generate hierarchical feature maps and adopt a maximum selection unit to adaptively combine multi-scale information; (2) we incorporate an attention mechanism to learn pixel saliency and channel saliency between convolutional features so that it can guide the network to intensify local detail information and suppress global background information. For evaluation, we test the designed architecture on multiple driver action datasets. The quantitative experimental results show that the proposed multi-scale attention convolutional neural network (MSA-CNN) obtains state-of-the-art performance in image-based driver action recognition.
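The two mechanisms named in point (1) can be sketched as parallel convolutions with different kernel sizes whose outputs are combined by an element-wise maximum selection unit; the kernel sizes and channel counts below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MaxSelectMultiScale(nn.Module):
    """Parallel convolutions with different receptive fields; the maximum
    selection unit keeps, per position, the strongest branch response."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (1, 3, 5)
        ])

    def forward(self, x):
        responses = torch.stack([b(x) for b in self.branches], dim=0)
        return responses.max(dim=0).values  # element-wise maximum selection
```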

6.
To address the under-segmentation in 3D point cloud semantic segmentation caused by the lack of fine-grained contextual information, an algorithm based on a contextual attention CNN is proposed for 3D point cloud semantic segmentation. First, fine-grained features in local areas of the point cloud are mined through an attention coding mechanism. Second, contextual features between multi-scale local areas are captured by a contextual recurrent neural network coding mechanism and used to compensate the fine-grained local features. Finally, a multi-head mechanism is used to enhance the generalization ability of the network. Experiments show that the mIoU of the proposed algorithm on the three standard datasets ShapeNet Parts, S3DIS, and vKITTI is 85.4%, 56.7%, and 38.1% respectively, demonstrating good segmentation performance and good generalization ability.

7.
Aggregation of local and global contextual information by exploiting multi-level features in a fully convolutional network is a challenge for the pixel-wise salient object detection task. Most existing methods still suffer from inaccurate salient regions and blurry boundaries. In this paper, we propose a novel edge-aware global and local information aggregation network (GLNet) to fully exploit the integration of side-output local features and global contextual information, and the utilization of contour information of salient objects. The global guidance module (GGM) is proposed to learn discriminative multi-level information with the direct guidance of global semantic knowledge for more accurate saliency prediction. Specifically, the GGM consists of two key components: the global feature discrimination module exploits the inter-channel relationship of global semantic features to boost representation power, and the local feature discrimination module enables different side-output local features to selectively learn informative locations by fusing with global attentive features. In addition, we propose an edge-aware aggregation module (EAM) to employ the correlation between salient edge information and salient object information for generating estimated saliency maps with explicit boundaries. We evaluate our proposed GLNet on six widely-used saliency detection benchmark datasets by comparing with 17 state-of-the-art methods. Experimental results show the effectiveness and superiority of our proposed method on all six benchmark datasets.

8.
To address the loss of texture details, the inability to suppress noise, and the ringing artifacts that existing motion-deblurring networks produce during image restoration, a dynamic scene deblurring algorithm based on multi-scale dense connections and an improved U-Net is proposed. First, dilated-convolution downsampling in the U-Net effectively enlarges the receptive field, avoiding irreversible damage to the image without increasing the number of parameters, and sub-pixel convolution is used during upsampling to recover sharp image details with small kernels, reducing computational complexity. Second, a multi-scale dense feature extraction (MDFE) module is designed, in which densely connected convolutional layers strengthen deep feature extraction and reuse, and a spatial pyramid pooling (SPP) branch guides the propagation and fusion of multi-scale features, promoting the effective preservation of image texture details. Finally, a bidirectional convolutional LSTM unit (BCLU) is adopted to compensate, in a nonlinear manner, for the contextual features lost by simple cascading along the encoding path, promoting cross-stage interaction of deep features and weakening edge artifacts and noise interference. Comparisons with existing state-of-the-art methods verify the performance advantage of the proposed algorithm.
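The sub-pixel convolution mentioned for the upsampling path corresponds to the standard PixelShuffle operation; the sketch below shows the generic operation (kernel size and channel counts are assumptions), not the paper's exact decoder.

```python
import torch
import torch.nn as nn

class SubPixelUp(nn.Module):
    """Sub-pixel upsampling: a small-kernel conv expands channels by r^2,
    then PixelShuffle rearranges them into an r-times larger feature map."""
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))  # (B, out_ch, H*scale, W*scale)

# usage: up = SubPixelUp(64, 64); y = up(torch.randn(1, 64, 32, 32))  # -> 1 x 64 x 64 x 64
```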

9.
Mainstream deep fusion methods currently rely only on convolution to extract local image features; however, the interaction between the image and the convolution kernel is content-independent and cannot effectively model long-range feature dependencies, which inevitably loses contextual information and limits the fusion performance of infrared and visible images. To this end, this paper proposes a multi-scale Transformer fusion method for infrared and visible images. Using the Swin Transformer as a building block, a Conv Swin Transformer Block is constructed, in which convolutional layers enhance the representation of global image features. A multi-scale self-attention encoder-decoder network is built to extract and reconstruct global image features; a feature-sequence fusion layer is designed that computes attention weights over the feature sequences with a SoftMax operation, highlighting the salient features of each source image and achieving end-to-end infrared and visible image fusion. Experimental results on the TNO and Roadscene datasets show that the method outperforms other representative traditional and deep learning fusion methods in both subjective visual quality and objective metrics. By combining the self-attention mechanism and using the Transformer to model long-range dependencies in images, the method builds a global feature fusion model and achieves better fusion performance and stronger generalization than other deep learning fusion methods.
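One possible reading of the feature-sequence fusion layer (SoftMax-normalized weights over the two modalities' token sequences) is sketched below; the per-token activity score used to drive the SoftMax is an assumption made only to keep the example self-contained, not the authors' formulation.

```python
import torch

def fuse_sequences(ir_tokens: torch.Tensor, vis_tokens: torch.Tensor) -> torch.Tensor:
    """Softmax-weighted fusion of two feature sequences of shape (B, N, C):
    each token position receives per-modality weights that sum to 1."""
    ir_score = ir_tokens.abs().mean(dim=-1, keepdim=True)    # (B, N, 1), illustrative activity measure
    vis_score = vis_tokens.abs().mean(dim=-1, keepdim=True)  # (B, N, 1)
    weights = torch.softmax(torch.cat([ir_score, vis_score], dim=-1), dim=-1)
    return weights[..., :1] * ir_tokens + weights[..., 1:] * vis_tokens

# usage: fused = fuse_sequences(torch.randn(2, 196, 96), torch.randn(2, 196, 96))
```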

10.
To extract decisive features from gesture images and solve the problem of information redundancy in existing gesture recognition methods, we propose a new multi-scale feature extraction module named densely connected Res2Net (DC-Res2Net) and design a feature fusion attention module (FFA). First, based on the new dimension residual network (Res2Net), the DC-Res2Net uses channel grouping to extract fine-grained multi-scale features, and dense connections are adopted to extract stronger features at different scales. Then, we apply a selective kernel network (SK-Net) to enhance the representation of effective features. Afterwards, the FFA is designed to remove redundant information in features by fusing low-level location features with high-level semantic features. Finally, experiments are conducted to validate our method on the OUHANDS, ASL, and NUS-II datasets. The results demonstrate the superiority of DC-Res2Net and FFA, which can extract more decisive features and remove redundant information while ensuring high recognition accuracy and low computational complexity.
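A minimal sketch of the Res2Net-style channel grouping that DC-Res2Net builds on (split the channels into groups, convolve each group, and feed the previous group's output into the next); the dense connections and the SK-Net stage described in the abstract are omitted.

```python
import torch
import torch.nn as nn

class Res2NetSplit(nn.Module):
    """Res2Net-style multi-scale convolution: channel groups form a hierarchy,
    so later groups see progressively larger receptive fields."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.width = channels // groups
        # the first group passes through unchanged; each remaining group gets a 3x3 conv
        self.convs = nn.ModuleList(
            [nn.Conv2d(self.width, self.width, 3, padding=1) for _ in range(groups - 1)]
        )

    def forward(self, x):
        splits = torch.split(x, self.width, dim=1)
        out, prev = [splits[0]], None
        for i, conv in enumerate(self.convs):
            inp = splits[i + 1] if prev is None else splits[i + 1] + prev
            prev = conv(inp)
            out.append(prev)
        return torch.cat(out, dim=1)
```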

11.
This paper proposes a semantic segmentation model that combines regions with a deep residual network. Region-based semantic segmentation methods extract overlapping regions at multiple scales, which allows objects of various sizes to be recognized and yields fine segmentation boundaries. Fully convolutional methods use a convolutional neural network (CNN) to learn features automatically and can be trained end-to-end for pixel-wise classification, but they usually produce coarse segmentation boundaries. This paper combines the advantages of both: a region proposal network first generates candidate regions in the image; the image is then passed through a deep residual network with dilated convolutions to obtain feature maps; region features are computed by combining the candidate regions with the feature maps and mapped back onto every pixel in the region; finally, a global average pooling layer performs pixel-wise classification. A multi-model fusion strategy is also adopted: several models are trained with different inputs on the same network architecture, and their features are fused at the classification layer to produce the final segmentation result. Experimental results on the SIFT FLOW and PASCAL Context datasets show that the method achieves high mean accuracy.

12.
Because salient objects usually occupy only a small portion of a scene, the problem of class imbalance is often encountered in salient object detection (SOD). To address this issue and obtain consistent salient objects, we propose an adversarial focal loss network based on an improved generative adversarial network for RGB-D SOD (called AFLNet), in which the color and depth branches constitute the generator that produces the saliency map, and an adversarial branch with high-order potentials, instead of a pixel-wise loss function, refines the output of the generator to obtain contextual information about objects. We derive an adversarial focal loss function to solve the problem of foreground-background class imbalance. To sufficiently fuse the high-level features of the color and depth cues, an inception model is adopted in the deep layers. We conduct a large number of experiments using our proposed model and its variants, and compare them with state-of-the-art methods. Quantitative and qualitative experimental results show that our proposed approach can improve the accuracy of salient object detection and achieve consistent objects.
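The adversarial focal loss builds on the standard focal loss, which down-weights easy examples to counter foreground-background imbalance; a minimal binary version is shown below (the adversarial extension described in the abstract is not reproduced here).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    Easy pixels (p_t close to 1) contribute little to the loss."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```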

13.
杨勇  吴峥  张东阳  刘家祥 《信号处理》2020,36(9):1598-1606
To strike a good balance between image reconstruction quality and network parameters, this paper proposes a super-resolution (SR) reconstruction algorithm based on a progressive feature enhancement network. The method consists of two main modules: a shallow information enhancement module and a deep information enhancement module. In the shallow information enhancement module, a single convolutional layer first extracts shallow information from the low-resolution (LR) image, and the designed multi-scale attention block then extracts and enhances these features. The deep information enhancement module first learns deep image information with residual learning blocks, and the resulting deep features are passed through another multi-scale attention block to obtain enhanced deep multi-scale information. Finally, a skip connection adds the shallow information from the first layer and the deep multi-scale information pixel-wise to obtain a fused feature map, which is then upsampled to produce the final high-resolution (HR) image. Experimental results show that, compared with several mainstream deep learning super-resolution methods, the images reconstructed by the proposed method achieve better results in both subjective quality and objective metrics.

14.
To address the low detail resolution and blurred target edges of infrared images, an infrared image enhancement algorithm based on an improved generative adversarial network is proposed. First, the generator is built on the U-Net encoder-decoder network; the U-Net skip connections are optimized and a global context module is integrated to model the context of both global and local features. Then, the discriminator is built on a capsule network; the capsule network structure is improved with Res2Net, and its fully connected layers are reconstructed with deconvolution to achieve multi-scale image feature extraction and reduce parameter redundancy. Experiments show that, compared with current mainstream algorithms, the proposed algorithm effectively highlights detail information, suppresses noise, and improves image resolution and visual quality.

15.
Convolutional neural network (CNN) based methods have recently achieved extraordinary performance in single image super-resolution (SISR) tasks. However, most existing CNN-based approaches increase the model’s depth by stacking massive kernel convolutions, bringing expensive computational costs and limiting their application in mobile devices with limited resources. Furthermore, large kernel convolutions are rarely used in lightweight super-resolution designs. To alleviate the above problems, we propose a multi-scale convolutional attention network (MCAN), a lightweight and efficient network for SISR. Specifically, a multi-scale convolutional attention (MCA) is designed to aggregate the spatial information of different large receptive fields. Since the contextual information of the image has a strong local correlation, we design a local feature enhancement unit (LFEU) to further enhance the local feature extraction. Extensive experimental results illustrate that our proposed MCAN can achieve better performance with lower model complexity compared with other state-of-the-art lightweight methods.
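One plausible reading of the multi-scale convolutional attention (MCA): depthwise convolutions with several large kernels aggregate spatial context, and the aggregated map modulates the input features. The kernel sizes and the 1x1 projection below are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    """Aggregates spatial context from several large depthwise kernels and
    uses the result as an attention map over the input features."""
    def __init__(self, ch: int):
        super().__init__()
        self.dw_convs = nn.ModuleList([
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch) for k in (5, 7, 9)
        ])
        self.proj = nn.Conv2d(ch, ch, 1)  # mix channels after aggregation

    def forward(self, x):
        attn = sum(conv(x) for conv in self.dw_convs)
        return x * self.proj(attn)  # attention-weighted features
```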

16.
Current deep learning-based fusion methods rely on convolution kernels to extract local features, but the limitations of single-scale networks, kernel size, and network depth cannot capture the multi-scale and global characteristics of images. To this end, this paper proposes an attention-based generative adversarial fusion method for infrared and visible images. The method uses a generator composed of an encoder and a decoder together with two discriminators. A multi-scale module and a channel self-attention mechanism are designed in the encoder, which can effectively extract multi-scale features and establish long-range channel...

17.
To improve the accuracy of single-image dehazing and the visibility of details in the dehazed result, this paper proposes a single-image dehazing method that combines multi-scale features with detail recovery. First, based on the distribution of haze in images and the imaging model, a multi-scale feature extraction module and a multi-scale feature fusion module are designed to effectively extract haze-related multi-scale features from the hazy image and fuse them with nonlinear weighting. Second, an end-to-end dehazing network built on these two modules is constructed and used to obtain a preliminary dehazing result. Third, a patch-based detail recovery network is constructed to extract detail information. Finally, the detail information extracted by the detail recovery network is fused with the preliminary result from the dehazing network to obtain the final clear image, enhancing the visual quality of the dehazed image. Experimental results show that, compared with representative existing dehazing methods, the proposed method effectively removes haze from both synthetic and real images while fully preserving detail information.

18.
RGB-D saliency detection identifies the visually most salient object regions in a pair of RGB and depth images. Existing two-stream networks treat the multi-modal RGB and depth data equally and extract features in almost the same way. However, low-level depth features contain considerable noise and do not characterize the image well. This paper therefore proposes an RGB-D saliency detection network supervised by multi-modal feature fusion: two independent streams learn the RGB and depth data separately; a two-stream side-supervision module produces saliency maps from the RGB and depth features at each layer; and a multi-modal feature fusion module then fuses the high-dimensional RGB and depth information of the last three layers to generate the high-level saliency prediction. The network progressively generates RGB and depth features from layer 1 to layer 5; from layer 5 to layer 3, higher layers guide lower layers to produce multi-modal fused features; from layer 2 to layer 1, the fused features from layer 3 progressively refine the RGB features of the first two layers; the final output is a saliency map that contains low-level RGB information and fuses high-level RGB-D multi-modal information. Experiments on three public datasets show that, owing to the two-stream side-supervision module and the multi-modal feature fusion module, the proposed network outperforms current mainstream RGB-D saliency detection models and is highly robust.

19.
Bottom-up and top-down visual cues are two types of information that help visual saliency models. These salient cues can come from spatial distributions of the features (space-based saliency) or from contextual/task-dependent features (object-based saliency). Saliency models generally incorporate salient cues in either a bottom-up or a top-down manner, separately. In this work, we combine bottom-up and top-down cues from both space-based and object-based salient features on RGB-D data. In addition, we also investigate the ability of various pre-trained convolutional neural networks to extract top-down saliency on color images based on object-dependent feature activation. We demonstrate that combining salient features from color and depth through bottom-up and top-down methods gives a significant improvement in salient object detection with space-based and object-based salient cues. The RGB-D saliency integration framework yields promising results compared with several state-of-the-art models.

20.
赵倩  周冬明  杨浩  王长城  李淼 《红外与激光工程》2022,51(10):20220018-1-20220018-13
To handle non-uniform image blur caused by camera shake, fast object motion, and low shutter speed, a deblurring algorithm is proposed that combines multi-scale feature fusion with a multi-input multi-output encoder-decoder. First, a multi-scale feature extraction module extracts initial features from the smaller-scale blurred images; the module uses dilated convolutions to obtain a larger receptive field with fewer parameters. Second, a feature attention module adaptively learns the useful information in features at different scales; it uses the features of the small-scale images to generate attention maps, which effectively reduces redundant features. Finally, a multi-scale progressive feature fusion module gradually fuses features of different scales so that information at different scales can complement each other. Compared with previous multi-scale methods that stack multiple sub-networks, a single network suffices to extract multi-scale features, which reduces training difficulty. To evaluate the deblurring quality and generalization of the network, the proposed algorithm is tested on the benchmark datasets GoPro and HIDE and on the real-world dataset RealBlur. The PSNR on GoPro and HIDE reaches 31.73 dB and 29.39 dB, and the SSIM reaches 0.951 and 0.923, both higher than current state-of-the-art deblurring algorithms, and the best results are also obtained on the real-world RealBlur dataset. Experimental results show that, compared with existing algorithms, the proposed method removes blur more thoroughly, effectively restores edge contours and texture details, and improves the robustness of subsequent high-level computer vision tasks.
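A hedged sketch of a dilated-convolution multi-scale extraction module of the kind described above (parallel dilation rates enlarge the receptive field of a 3x3 kernel without adding parameters per branch, and a 1x1 convolution fuses the branches); the dilation rates and channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedMultiScale(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates capture
    context at several scales; a 1x1 convolution fuses the branches."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```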
