Similar Literature
20 similar documents found
1.
Target recognition is a key step in synthetic aperture radar (SAR) image interpretation. Given the excellent performance of convolutional neural networks (CNNs) in natural image classification, CNN-based SAR target recognition has become a current research focus. The scattering characteristics of SAR targets are spread across multiple scales, and SAR images carry inherent speckle noise and redundant information, which makes intelligent SAR target recognition challenging. To address these problems, this paper proposes a multi-scale attention convolutional neural network that combines multi-scale feature extraction with an attention mechanism in an attention-based multi-scale residual feature-extraction module, achieving high-accuracy target recognition in SAR remote-sensing images. On the 10-class recognition task of the MSTAR dataset, the method achieves an overall accuracy of 99.84%, clearly outperforming competing algorithms. After four configuration variants are added to the test set, the overall accuracy on the 10-class task remains 99.28%, confirming the method's effectiveness under more difficult conditions.
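The abstract above pairs multi-scale features with an attention mechanism but gives no code. As an illustration only, here is a minimal numpy sketch of the SE-style channel attention such modules typically build on; the weights are random placeholders, not trained parameters from the paper.

```python
import numpy as np

def channel_attention(feat, reduction=2):
    """SE-style channel attention: squeeze (global average pool),
    excite (two small linear maps), then rescale each channel.
    w1/w2 are hypothetical random weights; real models learn them."""
    c, h, w = feat.shape
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    squeeze = feat.mean(axis=(1, 2))             # (c,) global channel descriptor
    hidden = np.maximum(w1 @ squeeze, 0)         # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]            # channel-wise rescaling

feat = np.ones((8, 4, 4))
out = channel_attention(feat)
```

In a residual module, this gated output would be added back to the block input before the next stage.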

2.
Salient region detection supports object recognition, image segmentation, and video/image compression, and is an important research topic in computer vision. However, detectors built on individual visual saliency cues often fail to locate salient objects accurately and are computationally expensive. Convolutional neural network models have recently achieved great success in image analysis and processing. To improve salient-region detection, this paper proposes a saliency detection method based on a supervised generative adversarial network (GAN). Deep CNNs form both the generator and the discriminator; through their continual adversarial training the network accurately learns the features of salient regions, so the generator outputs a precise saliency map. The loss function of the supervised GAN combines the network's own adversarial error with the L1 distance between the generator output and the ground-truth map, which further improves detection accuracy. Experiments on the MSRA10K and ECSSD datasets show precision of 94.19% and 96.24%, recall of 93.99% and 90.13%, and F-measure values of 94.15% and 94.76%, respectively, outperforming commonly used saliency models.
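The loss described above — an adversarial term plus an L1 distance to the ground-truth map — can be sketched numerically. This is a toy illustration, assuming a standard non-saturating GAN generator loss; the weighting `lam` is a hypothetical value, not taken from the paper.

```python
import numpy as np

def generator_loss(disc_score_on_fake, pred_map, truth_map, lam=100.0):
    """Combined objective: adversarial term (generator wants the
    discriminator to output 1 on its saliency map) plus the L1
    distance between predicted and ground-truth maps."""
    eps = 1e-12
    adv = -np.log(disc_score_on_fake + eps)   # BCE with target label 1
    l1 = np.abs(pred_map - truth_map).mean()  # pixel-wise L1 distance
    return adv + lam * l1

# a good prediction fools the discriminator and matches the truth map
loss_good = generator_loss(0.9, np.ones((4, 4)), np.ones((4, 4)))
# a bad prediction is caught and far from the truth map
loss_bad = generator_loss(0.1, np.zeros((4, 4)), np.ones((4, 4)))
```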

3.
缪冉  李菲菲  陈虬 《电子科技》2009,33(12):54-58
A scene image typically consists of foreground objects and a background environment in a particular spatial layout. Images of the same scene class show strong intra-class variation because of differences in sampling scale, viewpoint, and background, while objects shared across classes make images of different scene classes somewhat similar. Accordingly, this paper proposes a scene description and recognition method based on CNNs and multi-scale spatial encoding, combining multi-scale dense sampling, a convolutional network, and a multi-scale spatial encoding scheme. The encoding scheme partitions the sampling grid spatially at several levels and aggregates the CNN features of each sub-region into a multi-scale spatial VLAD representation. Experiments on the Scene15 dataset reach a test accuracy of 94.67%.
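The VLAD aggregation step mentioned above can be sketched in a few lines: assign each local descriptor to its nearest codebook centroid, accumulate residuals, and normalize. A minimal numpy sketch of the encoding step only; in the paper the descriptors are CNN activations from multiple spatial sub-regions, and the centroids come from clustering.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Minimal VLAD encoding: per-centroid residual accumulation
    followed by global L2 normalization."""
    k, d = centroids.shape
    enc = np.zeros((k, d))
    for x in descriptors:
        j = np.argmin(((centroids - x) ** 2).sum(axis=1))  # nearest centroid
        enc[j] += x - centroids[j]                         # residual vector
    enc = enc.ravel()
    n = np.linalg.norm(enc)
    return enc / n if n > 0 else enc

cents = np.array([[0.0, 0.0], [10.0, 10.0]])   # toy 2-word codebook
descs = np.array([[1.0, 0.0], [9.0, 10.0]])    # two local descriptors
code = vlad(descs, cents)
```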

4.
Driver distraction has become a global issue, causing a dramatic increase in road accidents and casualties. However, recognizing distracted driving actions remains a challenging task in computer vision, since inter-class variations between driver action categories are quite subtle. To overcome this difficulty, this paper proposes a novel deep-learning approach to extract fine-grained feature representations for image-based driver action recognition. Specifically, we improve the existing convolutional neural network in two ways: (1) we employ a multi-scale convolutional block with receptive fields of different kernel sizes to generate hierarchical feature maps, and adopt a maximum selection unit to adaptively combine the multi-scale information; (2) we incorporate an attention mechanism to learn pixel saliency and channel saliency over the convolutional features, guiding the network to intensify local detail information and suppress global background information. We evaluate the designed architecture on multiple driver action datasets. The quantitative results show that the proposed multi-scale attention convolutional neural network (MSA-CNN) achieves state-of-the-art performance in image-based driver action recognition.
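The maximum selection unit described in point (1) reduces, per position, to an element-wise max across the outputs of the differently sized kernel branches. A minimal numpy sketch of that combination step only, assuming the branch outputs are already shape-aligned:

```python
import numpy as np

def max_selection(scale_maps):
    """Maximum selection unit: element-wise max over feature maps from
    branches with different kernel sizes, so the most responsive
    scale wins at every spatial position."""
    stacked = np.stack(scale_maps, axis=0)
    return stacked.max(axis=0)

a = np.array([[1.0, 5.0], [0.0, 2.0]])   # e.g. response of a 3x3 branch
b = np.array([[2.0, 1.0], [3.0, 2.0]])   # e.g. response of a 5x5 branch
fused = max_selection([a, b])
```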

5.
汤磊  丁博  何勇军 《电子学报》2021,49(1):64-71
View-based 3D model retrieval has become a research focus. Such methods first represent a 3D model as a set of 2D views and then apply deep learning for classification and retrieval, but existing approaches leave room for improvement in both accuracy and efficiency. This paper proposes a new 3D model retrieval method comprising index construction and model retrieval. In the indexing stage, representative views are fed into a trained convolutional neural network (Convolutional ...

6.
To effectively recognize ship targets in visible-light images captured by a vision system, an improved 10-layer convolutional neural network (CNN) based on the YOLO (You Only Look Once) model is proposed for intelligent recognition of surface ships, and deconvolution is used to visualize the ship features extracted by different convolutional layers. Four types of typical hand-crafted features are extracted following traditional target-recognition practice, and the proposed CNN's recognition results are compared with those of the YOLO model and of support vector machines fed with the four hand-crafted feature types. Experiments show that, considering precision, recall, and efficiency together, the improved CNN outperforms the YOLO model, validating the proposed method. Comparisons between hand-crafted-feature recognition and deep CNN recognition under different data volumes illustrate the strengths and weaknesses of the two classes of algorithms, informing the construction of an integrated visual perception framework.

7.
The design, analysis and application of a volumetric convolutional neural network (VCNN) are studied in this work. Although many CNNs have been proposed in the literature, their design is empirical. In the design of the VCNN, we propose a feed-forward K-means clustering algorithm to determine the filter number and size at each convolutional layer systematically. For the analysis of the VCNN, the cause of confusing classes in the output of the VCNN is explained by analyzing the relationship between the filter weights (also known as anchor vectors) from the last fully-connected layer to the output. Furthermore, a hierarchical clustering method followed by a random forest classification method is proposed to boost the classification performance among confusing classes. For the application of the VCNN, we examine the 3D shape classification problem and conduct experiments on a popular ModelNet40 dataset. The proposed VCNN offers the state-of-the-art performance among all volume-based CNN methods.

8.
Convolutional neural network (CNN) based methods have recently achieved extraordinary performance in single image super-resolution (SISR) tasks. However, most existing CNN-based approaches increase the model’s depth by stacking massive kernel convolutions, bringing expensive computational costs and limiting their application in mobile devices with limited resources. Furthermore, large kernel convolutions are rarely used in lightweight super-resolution designs. To alleviate the above problems, we propose a multi-scale convolutional attention network (MCAN), a lightweight and efficient network for SISR. Specifically, a multi-scale convolutional attention (MCA) is designed to aggregate the spatial information of different large receptive fields. Since the contextual information of the image has a strong local correlation, we design a local feature enhancement unit (LFEU) to further enhance the local feature extraction. Extensive experimental results illustrate that our proposed MCAN can achieve better performance with lower model complexity compared with other state-of-the-art lightweight methods.

9.
Convolutional neural networks show strong feature-learning ability in high-level computer-vision tasks and have achieved notable results in semantic image segmentation, yet exploiting multi-scale feature information effectively remains difficult. This paper proposes a semantic segmentation method that fuses multi-scale features through four basic modules: a feature fusion module (FFM), a spatial information module (SIM), a global pooling module (GPM), and a boundary refinement module (BRM). The FFM uses an attention mechanism and a residual structure to fuse multi-scale features efficiently; the SIM, built from convolution and average pooling, supplies additional spatial detail to help localize object edges; the GPM extracts global image information, which markedly improves performance; and the BRM, centered on a residual structure, refines the boundaries of the feature maps. Adding these four modules to a fully convolutional network exploits multi-scale features effectively. On the PASCAL VOC 2012 dataset, the method improves mean intersection-over-union by 8.7% over the fully convolutional network, and comparisons with other methods in the same framework confirm its effectiveness.

10.

To develop an automated pulmonary fibrosis (PF) segmentation methodology using a 3D multi-scale convolutional encoder-decoder approach, following a robust atlas-based active volume model, in thoracic CT of Rhesus Macaques with radiation-induced lung damage. 152 thoracic computed tomography scans of Rhesus Macaques with radiation-induced lung damage were collected. The 3D input data are randomly augmented with Gaussian blurring when applying the 3D multi-scale convolutional encoder-decoder (3D MSCED) segmentation method. PF in each scan was manually segmented; 70% of the scans were used as training data, 20% as validation data, and 10% as testing data. Performance is assessed with 10-fold cross-validation. The workflow of the proposed method has two parts. First, the compromised lung volume with acute radiation-induced PF was segmented using a robust atlas-based active volume model. Next, a 3D multi-scale convolutional encoder-decoder segmentation method was developed which merged the higher spatial information from low-level features with the high-level object knowledge encoded in upper network layers. It included a bottom-up feed-forward convolutional neural network and a top-down learning mask refinement process. The quantitative results of our segmentation method achieved a mean Dice score of (0.769, 0.853), mean accuracy of (0.996, 0.999), and mean relative error of (0.302, 0.512) with 95% confidence intervals. The qualitative and quantitative comparisons show that our proposed method achieves better segmentation accuracy with less variance on testing data. The method was extensively validated on NHP datasets; the results demonstrate that it is more robust for PF segmentation than other methods. It is a general framework that can easily be applied to the segmentation of other lung lesions.
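The Dice score reported above is the standard overlap measure for binary segmentation masks, 2|A∩B| / (|A|+|B|). A minimal numpy sketch of how it is computed:

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient on binary masks:
    2 * |intersection| / (|pred| + |truth|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0   # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

p = np.array([[1, 1], [0, 0]])   # predicted mask, 2 voxels
t = np.array([[1, 0], [0, 0]])   # reference mask, 1 voxel, overlap 1
score = dice(p, t)
```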


11.
Face anti-spoofing helps a face recognition system judge whether a detected face is real or fake. Traditional face anti-spoofing methods describe the difference between living and fraudulent faces with hand-crafted features, but such features do not generalize across the variations of an unconstrained environment. Convolutional neural networks (CNNs) achieve considerable results on face deception detection; however, most existing neural-network-based methods simply extract single-scale features from single-modal data, ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale feature fusion (MMFF) is proposed. Specifically, ResNet-34 first extracts features of different scales from each modality; these features are then fused by a feature pyramid network (FPN); finally, a squeeze-and-excitation fusion (SEF) module and a self-attention network (SAN) are combined to fuse features from the different modalities for classification. Experiments on the CASIA-SURF dataset show that the MMFF-based method achieves better performance than most existing methods.

12.
Currently, video-based sign language recognition (SLR) is extensively studied using deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In addition, a multi-view attention mechanism combined with CNNs is an appealing way to make the machine interpretation process immune to finger self-occlusions. The proposed multi-stream CNN mixes spatially and motion-modelled video sequences to create a low-dimensional feature vector at multiple stages of the CNN pipeline, recasting the view-invariance problem as a video classification problem solved with attention-model CNNs. For superior network performance during training, the signs are learned through a motion attention network that focuses on the parts playing a major role in recognition, generating view-based paired pooling with a trainable view pair pooling network (VPPN). The VPPN pairs views to produce maximally distributed discriminative features from all the views for improved sign recognition. The results show increased recognition accuracy on 2D video sign language datasets. Similar results were obtained on benchmark action datasets such as NTU RGB+D, MuHAVi, WEIZMANN, and NUMA, as there is no multi-view sign language dataset other than ours.

13.
王小宇  李凡  曹琳  李军  张驰  彭圆  丛丰裕 《信号处理》2020,36(6):958-965
Because underwater acoustic signals are highly complex, traditional underwater target recognition based on feature engineering performs poorly. Recognition based on deep learning models reduces the information loss that feature extraction imposes on the acoustic signal, and thereby improves recognition. This paper proposes a convolutional neural network structure suited to underwater target recognition: 1x1 convolutional layers are introduced into the modular convolution design, preserving more of the local structure of the acoustic signal while reducing model complexity; and a global average pooling layer replaces the fully connected layer, so that classification is driven by feature vectors corresponding directly to the feature maps, making results more interpretable while the reduced number of trainable parameters lowers the overfitting risk. Experiments show that the method reaches a recognition accuracy of 91.7%, better than a conventional CNN (69.8%) and a traditional method based on higher-order statistical features (85%), indicating that the proposed model better preserves the temporal structure of underwater acoustic signals and thus improves classification.
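The global-average-pooling head described above replaces the fully connected classifier: with one feature map per class, each map is averaged to a scalar logit and softmaxed, so every class score is directly attributable to one map. A minimal numpy sketch of that head (the input maps here are synthetic, not model activations):

```python
import numpy as np

def gap_classifier(feature_maps):
    """Global average pooling head: average each of the
    (num_classes, H, W) feature maps to a scalar logit,
    then apply a numerically stable softmax."""
    logits = feature_maps.mean(axis=(1, 2))   # (num_classes,)
    e = np.exp(logits - logits.max())
    return e / e.sum()

maps = np.zeros((3, 4, 4))
maps[2] += 5.0                 # the map for class 2 responds strongly
probs = gap_classifier(maps)
```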

14.
In this paper we propose a novel deep spatial transformer convolutional neural network (Spatial Net) framework for the detection of salient and abnormal areas in images. The proposed method is general and has three main parts: (1) context information in the image is captured by using convolutional neural networks (CNN) to automatically learn high-level features; (2) to better adapt the CNN model to the saliency task, we redesign the feature sub-network structure to output a 6-dimensional transformation matrix for affine transformation based on the spatial transformer network. Several local features are extracted, which can effectively capture edge pixels in the salient area, meanwhile embedded into the above model to reduce the impact of highlighting background regions; (3) finally, areas of interest are detected by means of the linear combination of global and local feature information. Experimental results demonstrate that Spatial Nets obtain superior detection performance over state-of-the-art algorithms on two popular datasets, requiring less memory and computation to achieve high performance.

15.
Electroencephalography (EEG) has become the most widely used tool for diagnosing neurological disorders, and automatic recognition of epileptic EEG is important for the clinical diagnosis and treatment of epilepsy patients. To improve recognition accuracy, this paper proposes an automatic epileptic-EEG recognition model based on multi-scale convolutional feature fusion. Multi-scale convolutional feature fusion first extracts data features at several granularities, letting different levels of the convolutional neural network (CNN) complement one another; a long short-term memory (LSTM) network then extracts temporal features, and a softmax classifier gives the final recognition result. The method is evaluated on the epilepsy dataset of the University of Bonn and compared with a CNN-LSTM model, a plain LSTM, and other models. The proposed method achieves clearly higher recognition accuracy, averaging 99.19%, showing that it can effectively recognize epileptic EEG categories and has strong recognition performance and clinical application potential.

16.
This paper proposes a semantic segmentation model combining regions with a deep residual network. Region-based segmentation methods extract mutually overlapping regions at multiple scales, which recognizes objects at several scales and yields fine segmentation boundaries. Fully-convolutional methods let a convolutional neural network (CNN) learn features automatically and train end-to-end for per-pixel classification, but usually produce coarse boundaries. This work combines the advantages of both: a region proposal network first generates candidate regions in the image; the image is then passed through a deep residual network with dilated convolutions to obtain feature maps; region features are computed from the candidate regions together with the feature maps and mapped back onto each pixel of the region; finally, a global average pooling layer performs per-pixel classification. Multi-model fusion is also applied: several models are trained with different inputs to the same network architecture and their features are fused at the classification layer to produce the final segmentation. Experiments on the SIFT Flow and PASCAL Context datasets show that the method achieves high mean accuracy.
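The dilated convolutions mentioned above enlarge the receptive field without adding parameters by spacing the kernel taps apart. The paper uses the 2-D form inside a deep residual network; this is a toy 1-D numpy sketch of the sampling pattern only.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D dilated (atrous) convolution, 'valid' mode: kernel taps are
    spaced `dilation` samples apart, so a k-tap kernel covers a
    receptive field of (k - 1) * dilation + 1 samples."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(8, dtype=float)
y = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # taps at i, i+2, i+4
```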

17.
3-D object recognition using 2-D views
We consider the problem of recognizing 3-D objects from 2-D images using geometric models and assuming different viewing angles and positions. Our goal is to recognize and localize instances of specific objects (i.e., model-based) in a scene. This is in contrast to category-based object recognition methods where the goal is to search for instances of objects that belong to a certain visual category (e.g., faces or cars). The key contribution of our work is improving 3-D object recognition by integrating Algebraic Functions of Views (AFoVs), a powerful framework for predicting the geometric appearance of an object due to viewpoint changes, with indexing and learning. During training, we compute the space of views that groups of object features can produce under the assumption of 3-D linear transformations, by combining a small number of reference views that contain the object features using AFoVs. Unrealistic views (e.g., due to the assumption of 3-D linear transformations) are eliminated by imposing a pair of rigidity constraints based on knowledge of the transformation between the reference views of the object. To represent the space of views that an object can produce compactly while allowing efficient hypothesis generation during recognition, we propose combining indexing with learning in two stages. In the first stage, we sample the space of views of an object sparsely and represent information about the samples using indexing. In the second stage, we build probabilistic models of shape appearance by sampling the space of views of the object densely and learning the manifold formed by the samples. Learning employs the Expectation-Maximization (EM) algorithm and takes place in a "universal," lower-dimensional, space computed through Random Projection (RP). During recognition, we extract groups of point features from the scene and we use indexing to retrieve the most feasible model groups that might have produced them (i.e., hypothesis generation). 
The likelihood of each hypothesis is then computed using the probabilistic models of shape appearance. Only hypotheses ranked high enough are considered for further verification, with the most likely hypotheses verified first. The proposed approach has been evaluated using both artificial and real data, illustrating promising performance. We also present preliminary results illustrating extensions of the AFoVs framework to predict the intensity appearance of an object. In this context, we have built a hybrid recognition framework that exploits geometric knowledge to hypothesize the location of an object in the scene and both geometric and intensity information to verify the hypotheses.
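The geometric prediction at the heart of AFoVs — point coordinates in a novel view expressed as a fixed linear combination of the coordinates in two reference views, plus a constant — can be sketched as follows. The point sets and the coefficient matrix below are hypothetical examples for illustration; in practice the coefficients are solved for from point correspondences.

```python
import numpy as np

def predict_view(ref1, ref2, coeffs):
    """Algebraic Functions of Views, sketched: each coordinate of a
    novel view is a linear combination of the point's coordinates
    in two reference views plus a constant. `coeffs` is 2x5:
    one row for the novel x, one for the novel y."""
    x1, y1 = ref1.T
    x2, y2 = ref2.T
    basis = np.stack([x1, y1, x2, y2, np.ones_like(x1)], axis=0)  # (5, n)
    return (coeffs @ basis).T                                     # (n, 2)

ref1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # reference view 1
ref2 = np.array([[0.1, 0.0], [1.1, 0.0], [0.1, 1.0]])  # reference view 2
# degenerate choice of coefficients that reproduces view 1 exactly
coeffs = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0, 0.0]])
novel = predict_view(ref1, ref2, coeffs)
```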

18.
陈昊  郭文普  康凯 《电讯技术》2023,63(12):1869-1875
To address the low accuracy of automatic modulation recognition under low signal-to-noise ratios (SNRs), a channel-gated Res2Net convolutional neural network model is proposed. The model consists mainly of a two-dimensional convolutional neural network (2D-CNN), a multi-scale residual network (Res2Net), a squeeze-and-excitation network (SENet), and a long short-term memory (LSTM) network: convolution extracts multi-scale features from the raw I/Q data, a gating mechanism reweights the feature channels, and the LSTM models the convolutional features as a sequence, ensuring that the data's features are effectively mined and improving recognition accuracy. Experiments on the RML2016.10a benchmark dataset show that the model reaches 92.68% accuracy at an SNR of 12 dB and averages above 91% for SNRs above 2 dB, a higher modulation recognition accuracy than the classic CLDNN and LSTM models and the comparable PET-CGDNN and CGDNet models.
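Res2Net's multi-scale trick, used in the model above, splits a block's channels into groups and feeds each group's output into the next, so the receptive field grows within a single block. A numpy sketch of the splitting and hierarchical mixing only; the per-group transform here is a placeholder scaling, standing in for the learned 3x3 convolutions.

```python
import numpy as np

def res2net_split(feat, scales=4):
    """Res2Net-style hierarchical split: the first channel group
    passes through; each later group is mixed with the previous
    group's output before its (placeholder) transform."""
    groups = np.split(feat, scales, axis=0)
    outs = [groups[0]]                        # first group: identity
    for g in groups[1:]:
        outs.append(0.5 * (g + outs[-1]))     # hierarchical residual mixing
    return np.concatenate(outs, axis=0)

feat = np.arange(8, dtype=float).reshape(8, 1, 1)
out = res2net_split(feat, scales=4)
```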

19.
Video frame interpolation is a technology that generates high-frame-rate video from low-frame-rate video by exploiting the correlation between consecutive frames. Convolutional neural networks (CNNs) currently exhibit outstanding performance in image processing and computer vision, and many CNN variants have been proposed for video frame interpolation that estimate either dense motion flows or kernels for moving objects; however, most methods focus on estimating accurate motion. In this study, we exhaustively analyze the advantages of both motion-estimation schemes and propose a cascaded system that maximizes the advantages of both. The proposed cascaded network consists of three autoencoder networks that perform the initial frame interpolation and its refinement. Quantitative and qualitative evaluations demonstrate that the proposed cascaded structure exhibits promising performance compared with existing state-of-the-art methods.

20.
Real-valued convolutional neural network (CNN) models cannot fully exploit the rich phase information of polarimetric synthetic aperture radar (PolSAR) images, and per-pixel patch prediction involves heavy redundant computation, making classification inefficient. To address these problems, this paper proposes an improved encoder-decoder network: a complex-valued CNN is first constructed and trained at a low sampling rate; a complex-valued dual-channel encoder-decoder network is then built, introducing an improved atrous spatial pyramid pooling (...

