Similar literature
 20 similar documents found
1.
Recently, the vision transformer has achieved a breakthrough in image recognition. Its multi-head self-attention (MSA) mechanism can extract discriminative token information from different patches to improve image classification accuracy. However, the classification token in its deep layers ignores the local features between layers. In addition, the patch embedding layer feeds fixed-size patches into the network, which inevitably introduces additional image noise. Therefore, we propose a hierarchical attention vision transformer (HAVT) based on the transformer framework. First, we present an attention-cropping data augmentation method that crops and drops image noise and forces the network to learn key features. Second, we propose the hierarchical attention selection (HAS) module, which improves the network's ability to learn discriminative tokens between layers by filtering and fusing tokens between layers. Experimental results show that the proposed HAVT outperforms state-of-the-art approaches, improving accuracy to 91.8% and 91.0% on CUB-200-2011 and Stanford Dogs, respectively. We have released our source code on GitHub at https://github.com/OhJackHu/HAVT.git.

2.
With the development of multimedia technology, fine-grained image retrieval has gradually become a hot topic in computer vision, but its accuracy and speed are limited by high-dimensional real-valued embeddings with low discriminative power. To solve this problem, we propose an end-to-end framework named DFMH (Discriminative Feature Mining Hashing), which consists of the DFEM (Discriminative Feature Extracting Module) and the SHCM (Semantic Hash Coding Module). Specifically, DFEM explores more discriminative local regions by attention drop and obtains a finer local feature expression by attention re-sampling. SHCM generates high-quality hash codes by combining a quantization loss and a bit balance loss. In extensive experiments and ablation studies, our method consistently outperforms both state-of-the-art generic retrieval methods and fine-grained retrieval methods on three datasets: CUB Birds, Stanford Dogs, and Stanford Cars.
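
The abstract above does not define its two losses. As a rough sketch only, the following uses formulations that are common in deep hashing: a quantization loss that pulls each relaxed code value toward -1 or +1, and a bit balance loss that pushes each bit to be +1 for about half of the batch. The function names and the exact formulas are assumptions for illustration, not the DFMH authors' definitions.

```python
def quantization_loss(codes):
    """Mean squared gap between |value| and 1 over all relaxed code entries.

    Assumed formulation: zero when every entry is exactly -1 or +1.
    """
    total = sum((abs(v) - 1.0) ** 2 for row in codes for v in row)
    return total / (len(codes) * len(codes[0]))


def bit_balance_loss(codes):
    """Mean squared batch average of each bit.

    Assumed formulation: zero when each bit averages to 0 over the batch,
    i.e. it is +1 for half of the samples and -1 for the other half.
    """
    n_samples = len(codes)
    n_bits = len(codes[0])
    means = [sum(row[j] for row in codes) / n_samples for j in range(n_bits)]
    return sum(m * m for m in means) / n_bits


def hash_loss(codes, balance_weight=1.0):
    """Weighted sum of the two terms; the weight is a hypothetical hyperparameter."""
    return quantization_loss(codes) + balance_weight * bit_balance_loss(codes)
```

A batch such as `[[1.0, -1.0], [-1.0, 1.0]]` is already binary and balanced, so both terms vanish; a batch of all 0.5 values is penalized by both.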

3.
Fine-grained visual classification (FGVC) is a critical task in computer vision. However, FGVC is challenging because of the large intra-class variation and small inter-class variation among the classes to be distinguished. The key to the problem is to capture subtle visual differences in the image and effectively represent the discriminative features. Existing methods are often limited by insufficient localization accuracy and insufficient feature representation capability. In this paper, we propose a cross-layer progressive attention bilinear fusion (CPABF) method, which can efficiently express the characteristics of discriminative regions. The CPABF method involves three components: 1) Cross-Layer Attention (CLA) locates and reinforces the discriminative region at low computational cost; 2) the Cross-Layer Bilinear Fusion Module (CBFM) effectively integrates semantic information from the low level to the high level; 3) Progressive Training gradually optimizes the network parameters toward the best state. CPABF shows excellent performance on four FGVC datasets and outperforms some state-of-the-art methods.

4.
The underlying task of fine-grained image recognition is to capture both inter-class and intra-class discriminative features. Existing methods generally use auxiliary data to guide the network or a complex network comprising multiple sub-networks. They have two significant drawbacks: (1) using auxiliary data such as bounding boxes requires expert knowledge and expensive data annotation; (2) using multiple sub-networks makes the network architecture complex and requires complicated training or multiple training steps. We propose an end-to-end Spatial Self-Attention Network (SSANet) comprising a spatial self-attention (SSA) module and a self-attention distillation (Self-AD) technique. The SSA encodes contextual information into local features, improving intra-class representation. The Self-AD then distills knowledge from the SSA into a primary feature map, obtaining inter-class representation. Accumulating the classification losses from these two modules enables the network to learn both inter-class and intra-class features in one training step. The experimental findings demonstrate that SSANet is effective and achieves competitive performance.

5.
The key to fine-grained image classification is to find discriminative regions. Most existing methods use only simple baseline networks or low-recognition attention modules to discover object differences, which limits the model's ability to find discriminative regions hidden in images. This article proposes an effective method to solve this problem. The first step is a novel layered training method that enhances the feature extraction ability of the baseline model. The second step focuses on key regions of the image using an improved long short-term memory (LSTM) network and multi-head attention. In the third step, the feature map obtained by the dual attention network is spatially mapped by a multi-layer perceptron (MLP), and element-wise multiplication across channels then yields a feature map with finer granularity. Finally, the method achieves good performance on the CUB-200-2011, FGVC Aircraft, Stanford Cars, and MedMNIST v2 datasets.

6.
7.
Fine-grained recognition performance has improved greatly thanks to the rapid development of deep convolutional neural networks (DCNNs). However, DCNN methods often treat each image region equally, and researchers often rely on visual information for classification. To solve these problems, we propose a novel discriminative semantic region selection method for fine-grained recognition (DSRS). We first select a few image regions and then use pre-trained DCNN models to predict their semantic correlations with the corresponding classes. Each image region is represented by both visual and semantic representations, which are then linearly combined into a joint representation. The combination parameters are determined by considering both semantic distinctiveness and spatial-semantic correlations. We use the joint representations for classifier training. A test image is classified by obtaining its visual and semantic representations and encoding them into a joint representation. Experiments on several publicly available datasets demonstrate the proposed method's superiority.

8.
Existing point cloud classification research is usually conducted on datasets with complete structure and clear semantics. However, in real point cloud scenes, occlusion and truncation may destroy the completeness of objects and degrade classification performance. To solve this problem, we propose an incomplete point cloud classification network (IPC-Net) with data augmentation and similarity measurement. The proposed network learns the feature representation of incomplete point clouds and their semantic differences from the complete ones for classification. Specifically, IPC-Net adopts a random erasing-based data augmentation to deal with incomplete point clouds. IPC-Net also introduces an auxiliary loss function, weighted by attention scores, to measure the similarity between the incomplete and the complete point clouds. Extensive experiments verify that IPC-Net can classify incomplete point clouds and significantly improves the robustness of point cloud classification under different levels of completeness.
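
The abstract above does not say how points are erased. A minimal sketch, assuming that erasing means removing a contiguous local region (the points nearest a randomly chosen anchor) to imitate occlusion or truncation; the function name `random_erase` and the parameter `drop_ratio` are hypothetical, not taken from IPC-Net.

```python
import random


def random_erase(points, drop_ratio=0.3, seed=None):
    """Drop the `drop_ratio` fraction of points closest to a random anchor point.

    `points` is a list of (x, y, z) tuples. Removing a local neighborhood,
    rather than scattered points, is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    anchor = rng.choice(points)

    def squared_distance(p):
        return sum((a - b) ** 2 for a, b in zip(p, anchor))

    n_drop = int(len(points) * drop_ratio)
    # Indices ordered from nearest to farthest from the anchor.
    order = sorted(range(len(points)), key=lambda i: squared_distance(points[i]))
    dropped = set(order[:n_drop])
    return [p for i, p in enumerate(points) if i not in dropped]
```

Applying the function with a different seed at each training step would give the network a different incomplete view of the same object.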

9.
This paper presents a feature extraction scheme for classifying corals in submarine coral reef images. Texture features are extracted using the proposed Improved Local Derivative Pattern (ILDP), which determines diagonal directional pattern features based on local derivative variations and can capture full information. Three classifiers are used: a Convolutional Neural Network (CNN); K-Nearest Neighbor (KNN) with four distance metrics (Euclidean, Manhattan, Canberra, and Chi-Square); and a Support Vector Machine (SVM) with three kernel functions (polynomial, radial basis function, and sigmoid). The accuracy of the proposed method is compared with Local Binary Pattern (LBP), Local Tetra Pattern (LTrP), Local Derivative Pattern (LDP), and Robust Local Ternary Pattern (RLTP) on five coral data sets (EILAT, RSMAS, EILAT2, MLC2012, and SDMRI) and four texture data sets (KTH-TIPS, UIUCTEX, CURET, and LAVA). Experimental results indicate that ILDP achieves the highest overall classification accuracy and the lowest execution time among the compared methods.

10.
In this paper, a new classification method called locality-sensitive kernel sparse representation classification (LS-KSRC) is proposed for face recognition. LS-KSRC integrates both sparsity and data locality in the kernel feature space rather than in the original feature space, and can therefore learn more discriminative sparse representation coefficients for face recognition. The closed-form solution of the l1-norm minimization problem for LS-KSRC is also presented. LS-KSRC is compared with kernel sparse representation classification (KSRC), sparse representation classification (SRC), locality-constrained linear coding (LLC), support vector machines (SVM), the nearest neighbor (NN), and the nearest subspace (NS). Experimental results on three benchmark face databases (ORL, Extended Yale B, and CMU PIE) demonstrate the promising performance of the proposed method for face recognition; it outperforms the other methods compared.

11.
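To address the drawback that supervised classification of remote sensing images requires manually extracted training samples, a method is proposed that uses fuzzy K-means clustering (FCM) to extract training samples and a support vector machine (SVM) for classification. The algorithm first performs a preliminary classification with FCM to obtain the membership matrix and determine the class label of each sample. It then uses the membership matrix to extract, from each class, the samples with a high degree of concentration as training samples. Finally, it trains an SVM on these samples and classifies the data again. The method overcomes the SVM's need for manually selected samples and improves the performance of traditional unsupervised classification algorithms. Experimental results on the Iris data from the UCI standard database and on remote sensing data samples demonstrate its feasibility.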
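
The sample-selection step can be sketched as follows. The sketch assumes that the membership matrix has already been produced by FCM and that "high concentration" can be approximated by a high membership value in the assigned class; the function name `select_training_samples` and the parameter `keep_ratio` are hypothetical. The selected indices would then be passed to an SVM trainer of your choice.

```python
def select_training_samples(membership, keep_ratio=0.5):
    """Pick the most confidently clustered samples of each class.

    `membership[i][k]` is the degree to which sample i belongs to cluster k.
    Returns a dict mapping each class label to the indices of the samples
    kept as training data, ordered from highest to lowest membership.
    """
    by_class = {}
    for idx, row in enumerate(membership):
        # A sample's class label is the cluster with the largest membership.
        label = max(range(len(row)), key=row.__getitem__)
        by_class.setdefault(label, []).append((row[label], idx))

    selected = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)
        n_keep = max(1, int(len(scored) * keep_ratio))
        selected[label] = [idx for _, idx in scored[:n_keep]]
    return selected
```

Samples near a cluster boundary, whose memberships are spread across several clusters, are the ones left out of the training set.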

12.
Fine-grained Visual Categorization (FGVC) in computer vision aims to recognize images belonging to multiple subordinate categories of a super-category. The difficulty of FGVC lies in the close resemblance between classes and the large variations within classes. Most existing networks focus on only a few discriminative regions while ignoring many subtle complementary features. We therefore propose a Progressive Erasing Network (PEN). In PEN, a Multi-Grid Erasure mechanism augments data samples and assists in capturing local discriminative features, with the overall structure of the image destroyed indirectly through pixel-wise erasure. Cross-layer feature aggregation by extracting salient class features is of great significance in FGVC. However, cross-layer feature representation based on a simple aggregation strategy is still inefficient. To this end, the proposed Consistency loss explores the cross-layer semantic affinity, which guides the Cross-Layer Incentive (CLI) block to mine more efficient feature representations of different granularity. We also integrate Cross Entropy and Complementary Entropy to take the distribution of negative classes into account for better classification performance. Our method uses end-to-end training with only classification labels. Experimental results show that our model outperforms the state of the art on three fine-grained benchmarks.

13.
Most binary networks apply full-precision convolution at the first layer, because changing the first layer to binary convolution results in a significant loss of accuracy. In this paper, we propose a new approach to this problem: widening the data channel to reduce the information lost when the input to the first convolution passes through the sign function. Widening the channel also increases the computation of the first convolution layer, which is addressed by using group convolution. The experimental results show that applying this paper's method to state-of-the-art (SOTA) binarization methods significantly improves their accuracy, indicating that the method is effective and feasible.

14.
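Five-class classification of human peripheral blood leukocytes plays an important role in clinical diagnosis. This paper proposes a five-class classification method for microscopic images of human peripheral blood leukocytes based on a deep convolutional neural network (CNN). First, a deep CNN suited to leukocyte microscopic image classification is designed with ResNet as the prototype structure, and a new data augmentation method based on feature concentration is proposed to enrich the dataset. Because the image background strongly affects object recognition, new samples can be generated by using image processing to change the background of the same leukocyte. After data augmentation, the total number of samples is 42,300. Finally, to address the imbalance among the five leukocyte classes in the dataset, an improved mini-batch stochastic gradient descent algorithm (MBGD) is proposed as part of the training strategy. By setting the proportion of the five leukocyte classes in each batch to 1:1:1:1:1, the CNN can learn the features of the five classes in a balanced way. Experimental results show that the designed CNN structure, the proposed data augmentation method, and the improved mini-batch stochastic gradient descent algorithm all improve leukocyte image classification accuracy. The proposed method reaches a training accuracy of 95.7%. Tested on 8,400 leukocyte images, it achieves an average classification accuracy of 95.0%; the accuracies for neutrophils, lymphocytes, monocytes, eosinophils, and basophils are 92.2%, 91.5%, 94.6%, 93.3%, and 97.4%, respectively.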
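
The 1:1:1:1:1 batch composition described above can be sketched with a simple class-balanced sampler. This is an illustration only: the function name `balanced_batches` and its parameters are hypothetical, and sampling with replacement within each class is an assumption, since the abstract does not say how rare classes fill their share of a batch.

```python
import random
from collections import defaultdict


def balanced_batches(labels, per_class, num_batches, seed=0):
    """Build batches of sample indices with `per_class` samples from every class.

    `labels[i]` is the class of sample i. Each returned batch is a shuffled
    list of indices in which all classes appear equally often.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    batches = []
    for _ in range(num_batches):
        batch = []
        for label in sorted(by_class):
            # Sampling with replacement lets small classes still fill their quota.
            batch.extend(rng.choices(by_class[label], k=per_class))
        rng.shuffle(batch)
        batches.append(batch)
    return batches
```

With five classes and `per_class=4`, every batch holds 20 indices, four per class, regardless of how unevenly the classes are represented in the dataset.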

15.
Low-rank representation (LRR) is a useful tool for seeking the lowest-rank representation among all the coefficient matrices that represent the images as linear combinations of the basis in a given dictionary. However, it is an unsupervised method and has poor applicability and performance in real scenarios because of the lack of image information. In this paper, based on LRR, we propose a novel semi-supervised approach, called label constrained sparse low-rank representation (LCSLRR), which incorporates the label information as an additional hard constraint. Specifically, this paper develops an optimization process in which adding the label information constraint explicitly improves the discriminating power of the low-rank decomposition. We construct an LCSLRR-graph to represent data structures for semi-supervised learning and obtain the weights of the edges in the graph by seeking a low-rank and sparse matrix. We conduct extensive experiments on publicly available databases to verify the effectiveness of our algorithm in comparison with state-of-the-art approaches.

16.
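This paper reviews one of the main research directions in dictionary learning: sparse-representation dictionary learning algorithms aimed at image classification. From the two perspectives of spatial transformation methods and class indication methods, it analyzes the advantages and disadvantages of the various algorithms and compares the corresponding experimental results. It summarizes other key issues faced when using such algorithms for image classification, such as rotation invariance in pattern recognition and computational speed. Based on existing techniques and application needs, it explores future research directions in the field.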

17.
This paper introduces vector-scalar classification (VSC) for discrete cosine transform (DCT) coding of images. Two main characteristics of VSC differentiate it from previously proposed classification methods. First, pattern classification is effectively performed in the energy domain of the DCT subvectors using vector quantization. Second, the subvectors, instead of the DCT vectors, are mapped into a prescribed number of classes according to a pattern-to-class link established by scalar quantization. Simulation results demonstrate that the DCT coding systems based on VSC are superior to the other proposed DCT coding systems and are competitive compared to the best subband and wavelet coding systems reported in the literature.

18.
Wavelet feature selection for image classification   Cited by: 2 in total (self-citations: 0, citations by others: 2)
Energy distribution over wavelet subbands is a widely used feature for wavelet packet based texture classification. Due to the overcomplete nature of the wavelet packet decomposition, feature selection is usually applied for better classification accuracy and a compact feature representation. The majority of wavelet feature selection algorithms evaluate each subband separately, which implicitly assumes that the wavelet features from different subbands are independent. In this paper, the dependence between features from different subbands is investigated theoretically and simulated for a given image model. Based on the analysis and simulation, a wavelet feature selection algorithm based on statistical dependence is proposed. This algorithm is further improved by combining the dependence between wavelet features with the evaluation of individual feature components. Experimental results show the effectiveness of the proposed algorithms in incorporating dependence into wavelet feature selection.

19.
The census transform and its variants have gained popularity in image classification for their simplicity and good performance. To describe a texture pattern, these approaches generally use only sign information when comparing neighboring pixels. However, we observe that sign and magnitude, both within a single color channel and across different color channels, hold complementary information: the sign component captures texture in an image, and the magnitude component captures the saliency of that texture. Based on this observation, this paper proposes a multi-channel complementary census transform (MCCT) that combines all of this information to capture more discriminative features. Rigorous experiments on nine datasets from six applications (flower, gender, aerial orthoimagery, event, leaf, and indoor and outdoor scene classification) demonstrate that MCCT outperforms existing state-of-the-art techniques.
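
To make the sign and magnitude components concrete, here is a minimal single-channel sketch. The sign code is the ordinary 3x3 census comparison of each neighbor with the center pixel. The magnitude code, which thresholds each absolute difference against the neighborhood's mean absolute difference, is an assumption made for illustration; the abstract does not specify how MCCT encodes magnitude or combines color channels.

```python
# Neighbor offsets (row, column), clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def census_codes(img, r, c):
    """Return (sign_code, magnitude_code) for the interior pixel at (r, c).

    `img` is a 2D list of grey levels. Bit k of each 8-bit code corresponds
    to the k-th entry of OFFSETS.
    """
    center = img[r][c]
    diffs = [img[r + dr][c + dc] - center for dr, dc in OFFSETS]
    mean_magnitude = sum(abs(d) for d in diffs) / len(diffs)

    sign_code = 0
    magnitude_code = 0
    for bit, d in enumerate(diffs):
        if d >= 0:  # sign component: is the neighbor at least as bright?
            sign_code |= 1 << bit
        if abs(d) >= mean_magnitude:  # assumed magnitude component
            magnitude_code |= 1 << bit
    return sign_code, magnitude_code
```

Two neighborhoods with the same sign code can still differ in magnitude code, which is the sense in which the two components are complementary.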

20.
Multi-label classification with region-free labels is attracting more attention than classification with region-based labels because manual region labeling is time-consuming. Existing methods usually employ attention-based techniques to discover the conspicuous label-related regions in a weakly supervised manner with only image-level region-free labels, but the region coverage is imprecise because the global clues of multi-level features are not explored. To address this issue, a novel Global-guided Weakly-Supervised Learning (GWSL) method for multi-label classification is proposed. GWSL first extracts multi-level features to estimate their global correlation map, which is then used to guide feature disentanglement in the proposed Feature Disentanglement and Localization (FDL) networks. The FDL networks then adaptively combine the differently correlated features and localize the fine-grained features for identifying multiple labels. The proposed method is optimized end to end under weak supervision with only image-level labels. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods for multi-label learning on several publicly available image datasets. To facilitate similar research in the future, the code is available online at https://github.com/Yong-DAI/GWSL.
