Fine-grained visual classification method based on knowledge distillation and target regions selection
Cite this article: Zhao Tingting. Fine-grained visual classification method based on knowledge distillation and target regions selection[J]. Application Research of Computers, 2023, 40(9).
Author: Zhao Tingting
Affiliation: Tianjin University of Science & Technology
Foundation items: National Natural Science Foundation of China (61976156); Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560)
Abstract: Fine-grained visual classification (FGVC) is extremely challenging due to the subtle inter-class differences and the large intra-class differences. To learn the latent features of fine-grained images more effectively, this paper introduced knowledge distillation into FGVC and proposed TRS-DeiT, a method based on knowledge distillation and target region selection that combines the respective strengths of CNN and Transformer models. In addition, a novel target region selection module in TRS-DeiT captures the most discriminative regions, and a contrastive loss that measures the similarity between images of different classes distinguishes the easily confused categories. Trained and tested on three standard fine-grained datasets, CUB-200-2011, Stanford Cars and Stanford Dogs, TRS-DeiT achieves accuracies of 90.8%, 95.0% and 95.1%, respectively. The experimental results show that the proposed model is more accurate than traditional models, and the visualization results further confirm that its attention concentrates on the objects being recognized, which makes it well suited to fine-grained visual classification tasks.

Keywords: fine-grained visual classification; knowledge distillation; Transformer; deep learning
Received: 2022-12-04
Revised: 2023-08-06
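
The abstract gives no implementation details for the distillation component. As an illustration only, the sketch below shows a standard soft-label distillation loss in PyTorch in which a CNN teacher guides a Transformer student; the temperature T, the weight alpha, and the function name are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    """Soft-label knowledge distillation: a CNN teacher guides the Transformer student.

    Illustrative only; T (temperature) and alpha (loss weight) are assumed hyperparameters.
    """
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between the softened teacher and student distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```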

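The target region selection module is described only at a high level. A minimal sketch, assuming regions are chosen by ranking patch tokens by the attention they receive from the class token and keeping the top-k, could look as follows; the tensor layout, the top_k value, and the function name are hypothetical.

```python
import torch

def select_target_regions(attn, patch_tokens, top_k=12):
    """Keep the patch tokens that receive the most class-token attention.

    attn:         (batch, heads, 1 + num_patches, 1 + num_patches) attention weights
    patch_tokens: (batch, num_patches, dim) patch embeddings
    Returns the top_k most attended patch embeddings per image.
    """
    # Attention paid by the class token (index 0) to each patch, averaged over heads.
    cls_to_patch = attn[:, :, 0, 1:].mean(dim=1)            # (batch, num_patches)
    # Indices of the k most attended patches.
    idx = cls_to_patch.topk(top_k, dim=-1).indices          # (batch, top_k)
    # Gather the corresponding patch embeddings.
    idx = idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1))
    return patch_tokens.gather(1, idx)                      # (batch, top_k, dim)
```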
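
The loss that measures similarity between images of different classes is likewise not specified. One plausible, purely illustrative form is a hinge penalty on the cosine similarity of different-class pairs; the margin and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_class_similarity_loss(features, labels, margin=0.5):
    """Penalise high cosine similarity between images of different classes.

    features: (batch, dim) image embeddings; labels: (batch,) integer class ids.
    A purely illustrative stand-in for the similarity-based loss in the abstract.
    """
    feats = F.normalize(features, dim=-1)
    sim = feats @ feats.t()                                  # pairwise cosine similarity
    diff_class = (labels.unsqueeze(0) != labels.unsqueeze(1)).float()
    # Hinge: only similarity above the margin on different-class pairs is penalised.
    penalty = torch.clamp(sim - margin, min=0.0) * diff_class
    return penalty.sum() / diff_class.sum().clamp(min=1.0)
```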