
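Fine-grained classification based on modal correlation learning
Cite this article: Zhang Tianshu, Liu Fan, Dai Wenwen, Gao Ruizhuo. Fine-grained classification based on modal correlation learning[J]. Application Research of Computers, 2023, 40(11).
Authors: Zhang Tianshu, Liu Fan, Dai Wenwen, Gao Ruizhuo
Affiliation: School of Computer and Information, Hohai University (all four authors)
Funding: Joint Fund of the Ministry of Education for Equipment Pre-research (8091B032157); Open Fund of the Key Laboratory of Information System Requirements (LHZZ2021-M04); Research Fund of the Key Laboratory of Underwater Robot Technology (2021JCJQ-SYSJJ-LB06905)
Abstract: To address the difficulty that single-modal fine-grained classification methods have in distinguishing subtle differences between images, multimodal fusion is introduced into the fine-grained classification task. Making full use of the correlation and complementarity of multimodal data, a fine-grained classification method based on modal correlation learning is proposed. The method has two stages. In the first, the correspondence between image and text data is taken into account, and the degree to which they match is used as a constraint for pretraining the model. In the second, the network parameters obtained in the first stage are loaded, multimodal features are extracted, and text features are then used to guide the generation of image features; finally, fine-grained classification is performed on the fused features. Trained and tested on the UPMC-Food101, MEP-3M-MEATS, and MEP-3M-OUTDOORS datasets, the method reaches accuracies of 91.13%, 82.39%, and 93.17%, respectively. The experimental results show that the method outperforms traditional multimodal fusion methods and is an effective fine-grained classification method.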

Keywords: fine-grained classification; multimodal fusion; correlation learning
Received: 2023/3/28
Revised: 2023/10/13

Fine-grained classification based on modal correlation learning
Zhang Tianshu, Liu Fan, Dai Wenwen, Gao Ruizhuo. Fine-grained classification based on modal correlation learning[J]. Application Research of Computers, 2023, 40(11).
Authors: Zhang Tianshu, Liu Fan, Dai Wenwen, Gao Ruizhuo
Affiliation: School of Computer and Information, Hohai University
Abstract: Single-modal fine-grained classification methods struggle to distinguish subtle differences between images. To address this, this paper introduces multimodal fusion into the fine-grained classification task and, by exploiting the correlation and complementarity of multimodal data, proposes a fine-grained classification method based on modal correlation learning. The method has two stages. The first stage uses the correspondence between image and text data: the degree to which an image and a text match serves as a constraint for pretraining the model. The second stage loads the network parameters obtained in the first, extracts multimodal features, uses the text features to guide the generation of image features, and finally performs fine-grained classification on the fused features. Trained and tested on the UPMC-Food101, MEP-3M-MEATS, and MEP-3M-OUTDOORS datasets, the method achieves accuracies of 91.13%, 82.39%, and 93.17%, respectively. The experimental results show that it outperforms traditional multimodal fusion methods and is an effective approach to fine-grained classification.
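The abstract does not give the loss function or the fusion operator, so the following is only a minimal sketch of the two stages under stated assumptions, not the authors' implementation. It assumes the matching constraint is a binary cross-entropy over image-text pairs scored by cosine similarity, and that "text-guided" image features are a softmax-weighted sum of image region features with the text vector as the query. The function names, the `scale` parameter, and the concatenation used for fusion are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Assumes neither vector is all zeros.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def matching_loss(image_vecs, text_vecs, scale=5.0):
    """Stage 1 (assumed form): binary cross-entropy over all image-text pairs.

    Pair (i, j) counts as matched only when i == j.
    """
    total, count = 0.0, 0
    for i, img in enumerate(image_vecs):
        for j, txt in enumerate(text_vecs):
            # Sigmoid of the scaled similarity, read as a match probability.
            p = 1.0 / (1.0 + math.exp(-scale * cosine(img, txt)))
            total += -math.log(p) if i == j else -math.log(1.0 - p)
            count += 1
    return total / count


def text_guided_image_feature(regions, text_vec):
    """Stage 2 (assumed form): the text vector attends over image regions."""
    scores = [dot(r, text_vec) for r in regions]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(regions[0])
    guided = [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]
    return guided, weights


def fuse(guided_image_vec, text_vec):
    # Hypothetical fusion: concatenate the two feature vectors.
    return list(guided_image_vec) + list(text_vec)
```

In this sketch, correctly paired image and text vectors give a lower `matching_loss` than shuffled pairs, and the region most similar to the text receives the largest attention weight. A classifier head over the output of `fuse` would complete the second stage; it is omitted here because the abstract does not describe it.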
Keywords: fine-grained classification; multimodal fusion; correspondence learning