Similar Documents
A total of 20 similar documents were retrieved (search time: 15 ms).
1.
Recently, Convolutional Neural Networks (CNNs) have achieved great success in Single Image Super-Resolution (SISR). In particular, recursive networks are now widely used. However, existing recursion-based SISR networks can only make use of multi-scale features in a layer-wise manner. In this paper, a Deep Recursive Multi-Scale Feature Fusion Network (DRMSFFN) is proposed to address this issue. Specifically, we propose a Recursive Multi-Scale Feature Fusion Block (RMSFFB) to make full use of multi-scale features. Besides, a Progressive Feature Fusion (PFF) technique is proposed to take advantage of the hierarchical features from the RMSFFB in a global manner. At the reconstruction stage, we use a deconvolutional layer to upscale the feature maps to the desired size. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed DRMSFFN over state-of-the-art methods in both quantitative and qualitative evaluations.
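The abstract names the RMSFFB, PFF, and a deconvolutional upscaling layer but gives no architectural details. The following is a minimal PyTorch sketch of the general recipe only: a shared multi-scale block applied recursively, an averaged fusion as a crude stand-in for progressive fusion, and a transposed convolution for upscaling. All module names, channel widths, and recursion counts are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel 3x3 and 5x5 branches fused by a 1x1 conv (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return self.act(self.fuse(y)) + x          # residual connection

class RecursiveSR(nn.Module):
    """Apply one shared multi-scale block recursively, then upscale by deconvolution."""
    def __init__(self, channels=64, recursions=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.block = MultiScaleBlock(channels)       # weights shared across recursions
        self.recursions = recursions
        self.up = nn.ConvTranspose2d(channels, channels, scale * 2,
                                     stride=scale, padding=scale // 2)
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, lr):
        feat = self.head(lr)
        outs = []
        for _ in range(self.recursions):
            feat = self.block(feat)
            outs.append(feat)
        fused = sum(outs) / len(outs)                # crude stand-in for progressive fusion
        return self.tail(self.up(fused))

sr = RecursiveSR()
print(sr(torch.randn(1, 3, 32, 32)).shape)           # -> torch.Size([1, 3, 64, 64])
```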

2.
This paper proposes a scene text detection method to address the challenges of detecting text in complex natural scenes. The method adopts a dual-attention and multi-scale feature fusion strategy: a dual-attention fusion mechanism strengthens the correlations between text feature channels and improves overall detection performance. Considering the loss of semantic information that up- and down-sampling of deep feature maps may cause, a dilated convolution multi-scale feature fusion pyramid (MFPN) is proposed, which uses a dual fusion mechanism to enhance semantic features and overcome the effect of scale variation. To address the semantic conflicts caused by fusing information of different densities and the limited expressiveness of multi-scale features, a multi-scale feature fusion module (MFFM) is introduced. In addition, a feature refinement module (FRM) is introduced for small text that is easily masked by conflicting information. Experiments show that the method is effective for text detection in complex scenes, achieving F-measures of 85.6%, 87.1%, and 86.3% on the CTW1500, ICDAR2015, and Total-Text datasets, respectively.
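As a rough illustration of the "dilated convolution multi-scale feature fusion pyramid" idea, here is a hedged PyTorch sketch that fuses parallel dilated convolutions at several rates with a 1x1 projection. The actual MFPN, its dual fusion mechanism, and the MFFM/FRM modules are not reproduced; the channel sizes and dilation rates below are assumptions.

```python
import torch
import torch.nn as nn

class DilatedFusionPyramid(nn.Module):
    """Fuse parallel dilated convolutions (rates 1/2/4/8) with a 1x1 projection."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 256, 40, 40)                    # a backbone feature map
print(DilatedFusionPyramid(256, 128)(feat).shape)     # -> torch.Size([1, 128, 40, 40])
```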

3.
刘亚灵  郭敏  马苗 《光电子.激光》2021,32(12):1271-1277
To address the limitation that attention mechanisms are applied only along the time-frequency dimensions in sound event detection, and the insufficient feature extraction caused by using a single type of convolutional layer, this paper proposes a convolutional recurrent neural network (CRNN) model based on multi-scale attention feature fusion to improve sound event detection performance. First, a multi-scale attention module is proposed to attend to both local time-frequency units and global channel features...

4.
To address the difficulty of improving both the accuracy and the speed of pedestrian detection in complex road scenes, a lightweight pedestrian detection algorithm fusing multi-scale information and cross-dimensional feature guidance is proposed. First, taking the high-performance detector YOLOX as the base framework, a multi-scale lightweight convolution is constructed and embedded into the backbone to capture multi-scale feature information. Then an end-to-end lightweight feature-guided attention module is designed, which fuses spatial and channel information through cross-dimensional channel weighting to guide the model toward the visible regions of pedestrians. Finally, to reduce the loss of feature information during model lightweighting, the feature fusion network is built from depthwise separable convolutions with an enlarged receptive field. Experimental results show that, compared with other mainstream detectors, the proposed algorithm achieves 71.03% detection accuracy and 80 FPS on the KITTI dataset, and exhibits good robustness and real-time performance in scenes with complex backgrounds, dense occlusion, and scale variation.
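The abstract mentions building the feature fusion network from depthwise separable convolutions with an enlarged receptive field. The snippet below is a generic PyTorch sketch of such a convolution (a 5x5 depthwise filter followed by a 1x1 pointwise projection); the kernel size, activation, and channel widths are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise 5x5 conv (larger receptive field) followed by a pointwise 1x1 conv."""
    def __init__(self, in_ch, out_ch, kernel=5):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel,
                                   padding=kernel // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 128, 52, 52)
print(DWSeparableConv(128, 256)(x).shape)   # -> torch.Size([1, 256, 52, 52])
```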

5.
Visual domain adaptation has attracted much attention and made great achievements in recent years. It deals with the problem of distribution divergence between source and target domains. Current methods mostly focus on transforming images from different domains into a common space to minimize the distribution divergence. However, many source samples remain irrelevant to the target domain even after the transformation. In order to eliminate the irrelevant samples, we develop a sample selection algorithm using sparse coding theory. We perform the sample selection in a common subspace of the source and target data to find as many relevant source samples as possible. In the common subspace, data characteristics are preserved by using graph regularization. Therefore, we can select the most relevant samples for our target image classification task. Moreover, in order to build a discriminative classifier for the target domain, we use not only the common part of the source and target domains learned in the common subspace but also the specific part of the target domain. The algorithm can be extended to handle samples from multiple source domains. Experimental results on image classification tasks show that our visual domain adaptation method is very effective on standard benchmark datasets.

6.
Recent advances in unsupervised domain adaptation mainly focus on learning shared representations by global statistics alignment, such as the Maximum Mean Discrepancy (MMD), which matches the mean statistics across domains. The lack of class information, however, may lead to partial alignment (or even misalignment) and poor generalization performance. For robust domain alignment, we argue that the similarities across different features in the source domain should be consistent with those in the target domain. Based on this assumption, we propose a new domain discrepancy metric, i.e., Self-similarity Consistency (SSC), to enforce that the pairwise relationships between different features are consistent across domains. Gram matrix matching and Correlation Alignment are proven to be special cases, and sub-optimal measures, of the proposed SSC. Furthermore, we propose to mitigate the side effects of partial alignment and misalignment by incorporating the discriminative information of the deep representations. Specifically, a simple yet effective feature norm constraint is exploited to enlarge the discrepancy between inter-class samples. It relieves the requirement of strict alignment when performing adaptation, therefore improving the adaptation performance significantly. Extensive experiments on visual domain adaptation tasks demonstrate the effectiveness of our proposed SSC metric and feature discrimination approach.
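A hedged sketch of what a self-similarity consistency penalty could look like: compute a pairwise feature-similarity (Gram-like) matrix per domain and penalize their difference. The exact normalization and matching criterion used for SSC in the paper may differ from this minimal version.

```python
import torch
import torch.nn.functional as F

def self_similarity(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between feature dimensions (a 'self-similarity' matrix)."""
    f = features - features.mean(dim=0, keepdim=True)   # (batch, dim), centered
    f = F.normalize(f, dim=0)                            # unit-norm each feature column
    return f.t() @ f                                      # (dim, dim) similarity matrix

def ssc_loss(source_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """Penalize the difference between source and target self-similarity matrices."""
    s = self_similarity(source_feats)
    t = self_similarity(target_feats)
    return ((s - t) ** 2).mean()

src = torch.randn(64, 256)   # 64 source samples, 256-d deep features (placeholders)
tgt = torch.randn(64, 256)
print(ssc_loss(src, tgt))
```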

7.
While many efforts have been devoted to image denoising and results have improved continuously over the past few decades, it is fair to say that no stand-alone method is consistently better than all others. Nonetheless, many existing denoising methods, each having a different denoising capability, can yield various but complementary denoised images with respect to specific local areas. To effectively exploit the complementarity and diversity among the denoised images obtained with different denoisers, in this work we fuse them to produce an overall better result, which is fundamental to achieving robust and competitive denoising performance, especially for complex scenes. A framework called deep fusion network (DFNet) is proposed to generate a consistent estimate of the final denoised image, taking advantage of the complementarity of denoisers and suppressing their biases. Specifically, given a noisy image, we first apply a set of representative image denoisers to it and obtain the corresponding initial denoised images. These initial denoised images are then concatenated and fed into the proposed DFNet, which adjusts its network parameters to produce the fused image (as the final denoised image) with an unsupervised training strategy, by minimizing a carefully designed loss function. The experimental results show that our approach outperforms the stand-alone methods, as well as those using combination strategies, by a large margin in both objective and subjective evaluations. Compared to methods that are relatively close to our strategy, the proposed DFNet is extensible and parameter-free, which means it can cope with a variable number of different denoisers and avoids manual intervention during the fusion process. The proposed DFNet thus has greater flexibility and better practicality.
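The abstract only states that the initial denoised images are concatenated and fed to DFNet; the following is a minimal PyTorch sketch of that concatenate-and-fuse step with a tiny placeholder CNN. The network depth, width, and the unsupervised loss are not specified in the abstract and are assumptions here.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Tiny CNN that maps K concatenated denoised estimates to one fused image."""
    def __init__(self, num_denoisers, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_denoisers * channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, denoised_list):
        return self.net(torch.cat(denoised_list, dim=1))

# Suppose three off-the-shelf denoisers already produced these estimates:
estimates = [torch.rand(1, 3, 128, 128) for _ in range(3)]
fused = FusionNet(num_denoisers=3)(estimates)
print(fused.shape)   # -> torch.Size([1, 3, 128, 128])
```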

8.
Formulating steganalysis as a binary classification problem has been highly successful. However, existing detection algorithms find it difficult to achieve high detection accuracy in real-world circumstances, because the so-called model mismatch problem often occurs owing to unknown cover sources and embedding parameters. To avoid model mismatch, we propose a new unsupervised universal steganalysis framework to detect individual stego images. First, cover images with statistical properties similar to those of the given test image are retrieved from a cover database to establish an aided cover sample set. Second, unsupervised outlier detection is performed on a test set composed of the given test image and its aided cover sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called Similarity Retrieval of Image Statistical Properties (SRISP)-aided unsupervised outlier detection, requires no training and thus does not suffer from model mismatch. The framework employs standard steganalysis features and detects each test image individually. Experimental results illustrate that the framework substantially outperforms a one-class support vector machine and traditional unsupervised outlier detectors that do not consider SRISP; its detection performance is independent of the proportion of stego images in the test samples.
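The abstract does not name a specific outlier detector. As a stand-in, the sketch below runs scikit-learn's LocalOutlierFactor over the aided cover sample set plus the test image and flags the test image as stego if it is marked an outlier. The feature vectors are random placeholders, not real steganalysis features.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def is_stego(test_feature: np.ndarray, aided_cover_features: np.ndarray) -> bool:
    """Flag the test image as stego if it is an outlier among its retrieved cover samples.

    test_feature:          (d,)   steganalysis feature vector of the test image
    aided_cover_features:  (n, d) features of covers retrieved by statistical similarity
    """
    X = np.vstack([aided_cover_features, test_feature[None, :]])
    lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
    labels = lof.fit_predict(X)          # -1 marks outliers, +1 inliers
    return labels[-1] == -1              # decision for the test image (last row)

rng = np.random.default_rng(0)
covers = rng.normal(size=(200, 64))      # placeholder feature vectors
test = rng.normal(size=64) + 3.0         # shifted vector, likely flagged as an outlier
print(is_stego(test, covers))
```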

9.
In real industrial environments, dim lighting, irregular text, and limited hardware make text detection a challenging task. To address this, a feature-vector fusion module based on bilinear operations is designed and combined with feature enhancement and semi-convolution to form a lightweight text detection network, RGFFD (ResNet18 + Ghost module + feature pyramid enhancement module (FPEM) + feature fusion module (FFM) + differentiable binarization (DB)). The Ghost module embeds a feature enhancement module to improve feature extraction, the bilinear feature-vector fusion module fuses multi-scale information, and an adaptive threshold segmentation algorithm is added to improve the segmentation ability of the DB module. Deployed on the embedded device UP2 board for cargo-box number detection in a real factory environment, RGFFD reaches a detection speed of 6.5 f/s. On the public datasets ICDAR2015 and Total-Text it reaches 39.6 f/s and 49.6 f/s respectively, and on a custom dataset it achieves 88.9% accuracy at 30.7 f/s.
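The Ghost module the network builds on (from GhostNet) generates a few "primary" features with an ordinary convolution and derives additional "ghost" features with cheap depthwise operations; a generic PyTorch sketch follows. The embedded feature-enhancement part and the bilinear fusion module of RGFFD are not shown, and the ratio and kernel choices are assumptions.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Primary conv features plus cheap depthwise 'ghost' features, concatenated."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        primary_ch = out_ch // ratio
        ghost_ch = out_ch - primary_ch
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        p = self.primary(x)
        return torch.cat([p, self.cheap(p)], dim=1)

x = torch.randn(1, 64, 56, 56)
print(GhostModule(64, 128)(x).shape)   # -> torch.Size([1, 128, 56, 56])
```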

10.
To address unsupervised cross-domain transfer in person re-identification, a person re-identification algorithm based on a domain-discriminative network and domain adaptation is proposed. First, an improved ResNet-50 is used to train a supervised domain-discriminative network model, and a shared-space component is added to obtain invariant feature attributes for distinguishing images of different classes; contrastive loss and discrepancy loss are used to improve the classification performance of the model. Second, a domain-adaptive unsupervised transfer method derives the invariant feature attributes from the source-domain dataset and applies them to the unlabeled target-domain dataset. Finally, query images are matched against gallery images in the shared space to perform cross-domain person re-identification. To verify the effectiveness of the algorithm, experiments were conducted on the CUHK03, Market-1501, and DukeMTMC-reID datasets, where the algorithm reaches Rank-1 accuracies of 34.1%, 38.1%, and 28.3% and mAP of 34.2%, 17.1%, and 17.5%, respectively; the necessity of each model component during training is also verified. The results show that the algorithm outperforms some existing unsupervised person re-identification methods on large-scale datasets and even approaches the performance of some traditional supervised methods.

11.
To address the problem that arbitrarily shaped text in natural scenes is easily missed or falsely detected, a scene text detection method based on dual attention fusion and dilated residual feature enhancement is proposed. To strengthen the latent relationships between text feature channels, a dual attention fusion (DAF) module is proposed, and multi-layer feature fusion is performed by combining a bidirectional feature pyramid with the dual attention fusion module. In addition, to address the possible loss of semantic information when deep feature maps are reduced in dimension, a dilated...

12.
龙华  杨明亮  邵玉斌 《通信学报》2020,41(4):134-142
To address the poor onset detection of speech segments in voice calls and the resulting disruption of the detected speech continuity, a noisy speech detection algorithm based on feature-stream fusion is proposed. First, a time-domain feature stream, a spectrogram feature stream, and a statistical feature stream are extracted according to the characteristics of speech. Second, each feature stream is used separately to estimate the probability of speech segments in the noisy audio. Finally, the speech probabilities estimated from the individual feature streams are fused with weights, and a hidden Markov model is used to perform short-time state processing on the fused speech probability. Tests on a composite speech database under multiple noise types and different signal-to-noise ratios show that, compared with baseline models based on Bayesian and DNN classifiers, the proposed algorithm improves speech detection accuracy by 21.26% and 11.01%, respectively, and significantly improves the quality of the target speech.
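A minimal NumPy sketch of the fusion step: per-stream speech probabilities are combined with fixed weights, and the fused sequence is smoothed with a two-state Viterbi pass as a stand-in for the HMM post-processing. The stream weights, transition probability, and smoothing details are assumptions, not the paper's values.

```python
import numpy as np

def fuse_streams(probs, weights):
    """Weighted fusion of per-frame speech probabilities from several feature streams."""
    probs = np.asarray(probs)                 # shape: (num_streams, num_frames)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ probs                          # (num_frames,)

def hmm_smooth(p_speech, p_stay=0.95):
    """Two-state (non-speech / speech) Viterbi smoothing of fused probabilities."""
    eps = 1e-8
    log_emit = np.log(np.stack([1.0 - p_speech, p_speech]) + eps)   # (2, T)
    log_trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]) + eps)
    T = log_emit.shape[1]
    delta = log_emit[:, 0].copy()
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # scores[prev, cur]
        back[:, t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[path[t + 1], t + 1]
    return path                                    # 1 = speech frame

streams = np.random.rand(3, 100)                   # time / spectrogram / statistical streams
fused = fuse_streams(streams, weights=[0.4, 0.4, 0.2])
print(hmm_smooth(fused)[:20])
```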

13.
14.
Learning-based shadow detection methods have achieved impressive performance, yet they still struggle on complex scenes, especially ambiguous soft shadows. To tackle this issue, this work proposes an efficient shadow detection network (ESDNet) and then applies uncertainty analysis and graph convolutional networks for detection refinement. Specifically, we first aggregate global information from high-level features and harvest shadow details from low-level features to obtain an initial prediction. Secondly, we analyze the uncertainty of our ESDNet for an input shadow image and then take its intensity, expectation, and entropy into account to formulate a semi-supervised graph learning problem. Finally, we solve this problem by training a graph convolution network to obtain the refined detection result for every training image. To evaluate our method, we conduct extensive experiments on several benchmark datasets, i.e., SBU, UCF, and ISTD, and even on soft shadow scenes. Experimental results demonstrate that our strategy can improve shadow detection performance by suppressing the uncertainties of false positive and false negative regions, achieving state-of-the-art results.
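The abstract mentions using the intensity, expectation, and entropy of the prediction uncertainty. One common way to obtain such quantities is to average several stochastic forward passes and compute a per-pixel binary entropy, as sketched below; whether ESDNet derives its uncertainty this way is not stated, so treat this purely as an illustration.

```python
import torch

def prediction_uncertainty(probs: torch.Tensor):
    """Per-pixel expectation and entropy over repeated stochastic predictions.

    probs: (K, H, W) shadow probabilities from K stochastic forward passes
    (e.g., with dropout enabled at test time) -- a generic recipe, not ESDNet's.
    """
    eps = 1e-8
    expectation = probs.mean(dim=0)                              # (H, W)
    entropy = -(expectation * torch.log(expectation + eps)
                + (1 - expectation) * torch.log(1 - expectation + eps))
    return expectation, entropy

probs = torch.rand(8, 256, 256)          # 8 sampled prediction maps (placeholder data)
mean_map, entropy_map = prediction_uncertainty(probs)
print(mean_map.shape, entropy_map.shape, float(entropy_map.max()))
```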

15.
Single image deblurring aims to restore a blurry image to its sharp counterpart and remains an active topic of enduring interest. Recently, deep Convolutional Neural Network (CNN) based methods have achieved promising performance. However, two primary limitations exist in these CNN-based deblurring methods: most of them simply focus on increasing the complexity of the network and rarely make full use of the features extracted by the encoder; meanwhile, most of them reconstruct the deblurred image immediately after the decoder, so the role of the decoded features is underestimated. To address these issues, we propose a single image deblurring method with two specially designed modules: a Cross-layer Feature Fusion (CFF) module that fuses the multiple features learned in the encoder, and a Consecutive Attention Module (CAM) that manipulates the features after the decoder. The CFF module concatenates features from different encoder layers to pass rich structural information to the decoder, while the CAM module generates more important and correlated textures for the reconstructed sharp image. Besides, a ranking content loss is employed to further restore more realistic details in the deblurred images. Comprehensive experiments demonstrate that our proposed method generates less blur and more texture in the deblurred images on both synthetic datasets and real-world examples.
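A hedged PyTorch sketch of the cross-layer fusion idea: resize several encoder feature maps to a common resolution, concatenate them, and project with a 1x1 convolution. The real CFF module's internals, the channel sizes, and the CAM module are not described in the abstract and are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerFusion(nn.Module):
    """Resize encoder features to a common resolution, concatenate, and project."""
    def __init__(self, in_channels, out_ch):
        super().__init__()
        self.project = nn.Conv2d(sum(in_channels), out_ch, 1)

    def forward(self, feats):
        target = feats[0].shape[-2:]                      # use the finest map's size
        resized = [feats[0]] + [
            F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            for f in feats[1:]
        ]
        return self.project(torch.cat(resized, dim=1))

e1 = torch.randn(1, 64, 128, 128)     # encoder stage outputs (sizes assumed)
e2 = torch.randn(1, 128, 64, 64)
e3 = torch.randn(1, 256, 32, 32)
print(CrossLayerFusion([64, 128, 256], 128)((e1, e2, e3)).shape)  # -> (1, 128, 128, 128)
```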

16.
To address the low utilization of feature information and insufficient generalization of traditional encoder-decoder medical image segmentation networks, this paper proposes a multi-scale semantic perceptual attention network (MSPA-Net) built on the encoder-decoder paradigm. First, a dual-channel multi-information domain attention module (DMDA) is added to the decoding path to improve feature extraction. Second, a dense atrous convolution module (DAC) is added at the cascade connections to enlarge the convolutional receptive field. Finally, drawing on the idea of feature fusion, an adjustable multi-scale feature fusion module (AMFF) and a dual self-learning recycle connection module (DCM) are designed to improve the generalization and robustness of the network. The network is validated on the CVC-ClinicDB, ETIS-LaribPolypDB, COVID-19 CHEST X-RAY, Kaggle_3m, ISIC2017, and Fluorescent Neuronal Cells datasets, where the similarity coefficients reach 94.96%, 92.40%, 99.02%, 90.55%, 92.32%, and 75.32%, respectively. The new segmentation network therefore shows good generalization ability, outperforms existing networks overall, and can effectively segment general medical images.

17.
Remote sensing image detection has broad application prospects in environmental monitoring, the military, and homeland security, but remote sensing images have complex backgrounds, small target areas, and features that are difficult to extract, so small targets are easily missed during detection. This paper proposes a remote sensing image detection algorithm based on selective multi-scale feature fusion. The algorithm uses an improved ResNet50 as the backbone: the first convolution of ResNet50 is replaced with a dynamic convolution, and the convolutions in its ConvBlock modules are replaced with pyramid convolutions to improve feature extraction. Meanwhile, to avoid losing low-level information, the proposed efficient spatial-channel attention module is added after the dynamic convolution layer. Finally, features of different scales selected according to context information are fused, improving the model's ability to localize target objects. Experimental results show that the algorithm improves detection accuracy on remote sensing images while maintaining speed, reaching a mean average precision (mAP) of 91.88% and 90.23% on the public remote sensing datasets RSOD and NWPU VHR-10, respectively, with a detection speed of 33 FPS.

18.
Image shadow detection and removal can effectively recover image information lost due to the existence of shadows, which helps improve the accuracy of object detection, segmentation, and tracking. Aiming at the varying scale of shadows in an image, and the inconsistency between the shadowed area and the original non-shadowed area after the shadow is removed, the proposed method uses multi-scale and global features (MSGF), combined with a non-local network and a dense dilated convolution pyramid pooling network. Besides, aiming at the inaccurate detection of weak shadows and shadows with complicated shapes in existing methods, a direction feature (DF) module is adopted to enhance the features of shadow areas, thereby improving shadow segmentation accuracy. Based on these two components, an end-to-end shadow detection and removal network, SDRNet, is proposed. SDRNet handles the two tasks in a unified network with shared features and without adding additional computation. Experimental results on the two public datasets ISTD and SBU demonstrate that the proposed method achieves more than 10% improvement in the BER index for shadow detection and the RMSE index for shadow removal, which proves that the proposed SDRNet based on the MSGF and DF modules achieves the best results compared with other existing methods.

19.
The task of object tracking is very important owing to its various applications. However, most object tracking methods are based on visible images, which may fail when visible images are unreliable, for example when illumination conditions are poor. To address this issue, this paper presents a fusion tracking method that combines information from RGB and thermal infrared images (RGB-T), based on the fact that infrared images reveal the thermal radiation of objects and thus provide complementary features. In particular, a fusion tracking method based on dynamic Siamese networks with multi-layer fusion, termed DSiamMFT, is proposed. Visible and infrared images are first processed by two dynamic Siamese networks, namely the visible and infrared networks, respectively. Then, multi-layer feature fusion is performed to adaptively integrate multi-level deep features between the visible and infrared networks. Response maps produced from the different fused layer features are then combined through an elementwise fusion approach to produce the final response map, based on which the target can be located. Extensive experiments on large datasets with various challenging scenarios have been conducted. The results demonstrate that the proposed method shows very competitive performance against state-of-the-art RGB-T trackers. The proposed approach also improves tracking performance significantly compared to methods based on images of a single modality.
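A small PyTorch sketch of the final elementwise fusion step: response maps from several fused layers of the visible and infrared branches are combined into one map whose peak gives the target location. The uniform weighting is an assumption; DSiamMFT's exact fusion rule is not given in the abstract.

```python
import torch

def fuse_response_maps(visible_maps, infrared_maps, layer_weights=None):
    """Elementwise weighted fusion of per-layer correlation response maps.

    visible_maps / infrared_maps: lists of (H, W) response maps, one per fused layer.
    """
    n = len(visible_maps)
    if layer_weights is None:
        layer_weights = [1.0 / (2 * n)] * (2 * n)          # uniform over all maps (assumed)
    maps = list(visible_maps) + list(infrared_maps)
    final = torch.zeros_like(maps[0])
    for w, m in zip(layer_weights, maps):
        final += w * m
    return final

vis = [torch.rand(17, 17) for _ in range(3)]    # response maps from 3 visible layers
ir = [torch.rand(17, 17) for _ in range(3)]     # response maps from 3 infrared layers
response = fuse_response_maps(vis, ir)
peak = torch.nonzero(response == response.max())[0]
print(peak)                                      # target location = response peak
```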

20.
Objects that occupy a small portion of an image or a frame contain fewer pixels and less information, which makes small object detection a challenging task in computer vision. In this paper, an improved Single Shot multi-box Detector based on feature fusion and dilated convolution (FD-SSD) is proposed to address the difficulty of detecting small objects. The proposed network uses VGG-16 as the backbone and mainly comprises a multi-layer feature fusion module and a multi-branch residual dilated convolution module. In the multi-layer feature fusion module, the last two layers of the feature map are up-sampled and then concatenated at the channel level with the shallow feature map to enhance the semantic information of the shallow feature map. In the multi-branch residual dilated convolution module, three dilated convolutions with different dilation ratios based on the residual network are combined to obtain multi-scale context information without losing the original resolution of the feature map. In addition, deformable convolution is added to each detection layer to better adapt to the shape of small objects. The proposed FD-SSD achieves 79.1% mAP on the PASCAL VOC2007 dataset and 29.7% mAP on the MS COCO dataset. Experimental results show that FD-SSD can effectively improve the utilization of multi-scale information of small objects, thus significantly improving small object detection.
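A hedged PyTorch sketch of the multi-layer feature fusion step as described: the two deeper feature maps are reduced, up-sampled to the shallow map's resolution, and concatenated with it at the channel level. Channel sizes follow a VGG-16-like layout but are assumptions, and the residual dilated branches and deformable convolutions are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerFusion(nn.Module):
    """Upsample two deeper feature maps and concatenate them with a shallow one."""
    def __init__(self, shallow_ch=512, deep_chs=(1024, 512), out_ch=512):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, 256, 1) for c in deep_chs])
        self.fuse = nn.Sequential(
            nn.Conv2d(shallow_ch + 256 * len(deep_chs), out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow, deeps):
        size = shallow.shape[-2:]
        ups = [
            F.interpolate(conv(d), size=size, mode="bilinear", align_corners=False)
            for conv, d in zip(self.reduce, deeps)
        ]
        return self.fuse(torch.cat([shallow] + ups, dim=1))

shallow = torch.randn(1, 512, 38, 38)            # conv4_3-like feature map
deeps = [torch.randn(1, 1024, 19, 19), torch.randn(1, 512, 10, 10)]
print(MultiLayerFusion()(shallow, deeps).shape)  # -> torch.Size([1, 512, 38, 38])
```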
