
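Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network
Cite this article: LIU Yu-Xin, MIN Wei-Qing, JIANG Shu-Qiang, RUI Yong. Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network[J]. Journal of Software, 2022, 33(11): 4379-4395
Authors: LIU Yu-Xin  MIN Wei-Qing  JIANG Shu-Qiang  RUI Yong
Affiliations: Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Lenovo Group, Beijing 100085, China
Funding: National Natural Science Foundation of China (61972378, U1936203, U19B2040)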
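Abstract: In recent years, food image recognition has attracted growing attention because of its wide use in areas such as healthy diet management and unmanned restaurants. Unlike other object recognition tasks, food images are fine-grained, with high intra-class variability and high inter-class similarity; they also lack fixed semantic patterns and spatial layouts. These characteristics make food image recognition more challenging. To address this, a multi-scale jigsaw and reconstruction network (MJR-Net) is proposed for food image recognition. MJR-Net consists of three parts: a jigsaw and reconstruction module, a feature pyramid module, and a channel attention module. The jigsaw and reconstruction module applies destruction and reconstruction learning, destroying and then reconstructing the original image to extract local discriminative details. The feature pyramid module fuses mid-level features of different sizes to capture multi-scale local discriminative features. The channel attention module models the importance of different feature channels to strengthen discriminative visual patterns and suppress noise. In addition, A-softmax loss and Focal loss are used to optimize the network, by enlarging inter-class differences and by reweighting samples, respectively. MJR-Net is evaluated on three food datasets, ETH Food-101, Vireo Food-172, and ISIA Food-500, where it reaches recognition accuracies of 90.82%, 91.37%, and 64.95%, respectively. The results show that MJR-Net is highly competitive with other food image recognition methods and achieves the best recognition performance on Vireo Food-172 and ISIA Food-500. Comprehensive ablation experiments and visual analysis demonstrate the effectiveness of the method.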

Keywords: food image recognition  deep learning  jigsaw and reconstruction  feature pyramid  attention mechanism
Received: 2020-09-23
Revised: 2021-01-11
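
The destruction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid size or the shuffling scheme, so the uniform random permutation, the grid parameter `n`, and all function names below are assumptions made for illustration. In the paper the network learns from the shuffled image; here the known permutation is simply inverted to show that the transformation loses no pixels.

```python
import random

def split_into_patches(image, n):
    """Split an H x W image (a list of rows) into n*n patches, row-major."""
    h, w = len(image), len(image[0])
    if h % n or w % n:
        raise ValueError("image sides must be divisible by n")
    ph, pw = h // n, w // n
    patches = []
    for gi in range(n):
        for gj in range(n):
            rows = image[gi * ph:(gi + 1) * ph]
            patches.append([row[gj * pw:(gj + 1) * pw] for row in rows])
    return patches

def assemble(patches, n):
    """Inverse of split_into_patches: stitch n*n patches back into an image."""
    ph = len(patches[0])
    image = []
    for gi in range(n):
        for r in range(ph):
            row = []
            for gj in range(n):
                row.extend(patches[gi * n + gj][r])
            image.append(row)
    return image

def destroy(image, n, rng):
    """Shuffle patches uniformly at random (assumed scheme, not the paper's)."""
    patches = split_into_patches(image, n)
    perm = list(range(n * n))
    rng.shuffle(perm)
    # Slot k of the shuffled image holds original patch perm[k].
    return assemble([patches[src] for src in perm], n), perm

def reconstruct(shuffled_image, n, perm):
    """Undo destroy() given the permutation it returned."""
    shuffled = split_into_patches(shuffled_image, n)
    restored = [None] * (n * n)
    for slot, src in enumerate(perm):
        restored[src] = shuffled[slot]
    return assemble(restored, n)
```

For example, `destroy(image, 2, random.Random(0))` on a 4 x 4 image returns a shuffled copy plus the permutation, and `reconstruct` with that permutation returns the original image.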

Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network
LIU Yu-Xin,MIN Wei-Qing,JIANG Shu-Qiang,RUI Yong. Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network[J]. Journal of Software, 2022, 33(11): 4379-4395
Authors:LIU Yu-Xin  MIN Wei-Qing  JIANG Shu-Qiang  RUI Yong
Affiliation:Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China; Lenovo Group, Beijing 100085, China
Abstract: Food image recognition has recently received growing attention because of its wide applications in healthy diet management, smart restaurants, and similar areas. Unlike other object recognition tasks, food images are fine-grained, with high intra-class variability and inter-class similarity. Furthermore, food images have no fixed semantic patterns or specific spatial layout. These properties make food recognition more challenging. This study proposes a multi-scale jigsaw and reconstruction network (MJR-Net) for food recognition. MJR-Net is composed of three parts. The jigsaw and reconstruction module uses destruction and reconstruction learning to destroy and then reconstruct the original image, extracting local discriminative details. The feature pyramid module fuses mid-level features of different sizes to capture multi-scale local discriminative features. The channel-wise attention module models the importance of different feature channels to enhance discriminative visual patterns and weaken noise patterns. The study also uses both A-softmax loss and Focal loss to optimize the network, by increasing inter-class variability and by reweighting samples, respectively. MJR-Net is evaluated on three food datasets (ETH Food-101, Vireo Food-172, and ISIA Food-500), where it achieves 90.82%, 91.37%, and 64.95% accuracy, respectively. Experimental results show that MJR-Net is highly competitive with other food recognition methods and achieves state-of-the-art recognition performance on Vireo Food-172 and ISIA Food-500. Comprehensive ablation studies and visual analysis also support the effectiveness of the proposed method.
Keywords:food image recognition  deep learning  jigsaw and reconstruction  feature pyramid  attention mechanism
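
The sample reweighting attributed to Focal loss in the abstract can be sketched as follows. This is the standard multi-class focal loss formula, FL = -alpha * (1 - p_t)^gamma * log(p_t), not necessarily the exact variant used in MJR-Net; the abstract gives no hyperparameters, so `gamma=2.0` and `alpha=1.0` are assumed defaults and the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample; gamma and alpha are assumed values."""
    p_t = softmax(logits)[target]  # predicted probability of the true class
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)
```

With `gamma > 0`, a confidently correct sample such as `focal_loss([5, 0, 0], 0)` contributes far less than a misclassified one such as `focal_loss([0, 5, 0], 0)`, so training concentrates on the hard samples.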