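A Zero-Shot Image Classification Method Based on Subspace Learning with the Fusion of Reconstruction
Cite this article: ZHAO Peng, WANG Chun-Yan, ZHANG Si-Ying, LIU Zheng-Yi. A Zero-Shot Image Classification Method Based on Subspace Learning with the Fusion of Reconstruction[J]. Chinese Journal of Computers, 2021, 44(2): 409-421.
Authors: ZHAO Peng  WANG Chun-Yan  ZHANG Si-Ying  LIU Zheng-Yi
Affiliations: Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601; School of Computer Science and Technology, Anhui University, Hefei 230601
Funding: This work was supported by the National Natural Science Foundation of China; the Natural Science Foundation of Anhui Province; the Key Project of Natural Science Research of Universities in Anhui Province; and the Key Research and Development Program of Anhui Province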
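Abstract: Image classification is an important research subfield of computer vision. Traditional image classification can only classify samples of categories that appeared in the training set. In real-world applications, however, new categories keep emerging, so large amounts of labeled data for the new categories must be collected and the classifier retrained. Unlike traditional image classification methods, zero-shot image classification can recognize samples of categories that were not seen during training, and it has attracted wide attention in recent years. Zero-shot image classification uses a semantic space to relate seen and unseen categories, transferring knowledge so that samples of categories unseen during training can be classified. Existing zero-shot image classification methods mainly learn a mapping function from the visual space to the semantic space based on the visual and semantic features of the seen categories, then use the learned function to map the visual features of unseen categories into the semantic space, and finally classify the unseen categories by nearest-neighbor search in the semantic space. However, because seen and unseen categories differ and their images follow different distributions, this easily leads to the domain shift problem. In addition, directly learning a mapping from the visual space to the semantic space causes information loss. To address information loss and domain shift in the knowledge transfer of zero-shot image classification, this paper proposes a zero-shot classification method for image classification based on subspace learning and reconstruction. In the training stage, the method makes full use of the known information about the unseen categories to reduce domain shift. It first transfers the relationships between seen and unseen categories from the semantic space to the visual space and learns visual feature prototypes for the unseen categories. Then, from the visual space containing the visual feature prototypes of all categories (seen and unseen) and the semantic space containing their semantic feature prototypes, it learns a latent category-prototype feature space and aligns the visual and semantic features in this latent subspace, so that the representation of every category in the latent subspace carries both the discriminative information of the visual space and the category-relationship information of the semantic space. A reconstruction constraint applied during subspace learning reduces information loss and also mitigates domain shift. Finally, in the zero-shot recognition stage, images of unseen categories are classified with the nearest-neighbor algorithm in different spaces. The main contributions are twofold. First, by transferring inter-category relationships from the semantic space, the method learns category prototypes of the unseen categories in the visual space, so that information about the unseen categories is fully used during training, which mitigates domain shift to some extent. Second, the method learns a shared latent subspace that contains both the rich discriminative information of the visual space and the inter-category relationship information of the semantic space, and reconstruction during subspace learning alleviates information loss in knowledge transfer. Comparative experiments on four public zero-shot classification datasets show that the proposed method achieves relatively high average classification accuracy, demonstrating its effectiveness.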

Keywords: zero-shot image classification  transfer learning  subspace learning  reconstruction  feature prototype
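
The conventional pipeline described in the abstract (learn a visual-to-semantic mapping on seen categories, project unseen-category features, then run nearest-neighbor search in the semantic space) can be sketched as follows. This is a minimal illustration, not the paper's code: the ridge-regression form of the mapping, the function names `fit_visual_to_semantic` and `classify_in_semantic_space`, the regularization weight `lam`, and the toy data are all assumptions made for the example.

```python
import numpy as np


def fit_visual_to_semantic(X_seen, S_seen, lam=1e-2):
    """Ridge regression (an assumed choice): W minimizing ||X W - S||^2 + lam ||W||^2.

    X_seen: (n, d) visual features of seen-category samples.
    S_seen: (n, k) semantic vectors of each sample's category.
    Returns W with shape (d, k).
    """
    d = X_seen.shape[1]
    gram = X_seen.T @ X_seen + lam * np.eye(d)
    return np.linalg.solve(gram, X_seen.T @ S_seen)


def classify_in_semantic_space(X_test, W, S_unseen):
    """Project test features into the semantic space and return, for each
    sample, the index of the nearest unseen-category semantic prototype."""
    projected = X_test @ W                                  # (m, k)
    diffs = projected[:, None, :] - S_unseen[None, :, :]    # (m, u, k)
    dists = np.linalg.norm(diffs, axis=2)                   # (m, u)
    return dists.argmin(axis=1)


# Toy data (hypothetical): 2-D semantic vectors, 3-D visual features that
# depend linearly on the semantic vectors.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
S_seen = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_seen = S_seen @ A
S_unseen = np.array([[2.0, 0.0], [0.0, 2.0]])
X_unseen = S_unseen @ A

W = fit_visual_to_semantic(X_seen, S_seen)
print(classify_in_semantic_space(X_unseen, W, S_unseen))
```

The mapping here is fitted on seen categories only, which is exactly the setting in which the abstract says domain shift and information loss arise; the toy data sidesteps both because the visual features are an exact linear function of the semantic vectors.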

A Zero-Shot Image Classification Method Based on Subspace Learning with the Fusion of Reconstruction
ZHAO Peng, WANG Chun-Yan, ZHANG Si-Ying, LIU Zheng-Yi. A Zero-Shot Image Classification Method Based on Subspace Learning with the Fusion of Reconstruction[J]. Chinese Journal of Computers, 2021, 44(2): 409-421.
Authors: ZHAO Peng  WANG Chun-Yan  ZHANG Si-Ying  LIU Zheng-Yi
Affiliation: (Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601; School of Computer Science and Technology, Anhui University, Hefei 230601)
Abstract: Image classification is an important research subfield of computer vision. Traditional image classification can only classify samples of seen categories, that is, categories that appear in the training dataset. However, new categories continue to emerge in real-world applications, so samples of the new categories must be collected and the classifier retrained. Unlike traditional classification methods, zero-shot image classification aims to classify samples of unseen categories that do not appear in the training dataset. Zero-shot classification is a very challenging task and has attracted much attention in recent years. Zero-shot image classification bridges the seen and unseen categories through the semantic embedding space, transferring knowledge from the seen categories to the unseen categories in order to classify samples from the unseen categories. Existing zero-shot classification methods typically first learn a mapping function from the visual space to the semantic embedding space using only the samples of the seen training categories. The learned mapping function is then used to map the visual features of test samples from the unseen categories into the semantic space. Finally, the test samples from the unseen categories are classified by a simple nearest neighbor search in the semantic embedding space. However, the seen and unseen categories are different, which leads to domain shift. Moreover, directly learning the mapping function from the visual space to the semantic embedding space leads to information loss. To address information loss and domain shift in the knowledge transfer of zero-shot image classification, we propose a zero-shot classification approach for image classification based on subspace learning and reconstruction (Zero-Shot Classification based on Subspace learning and Reconstruction, ZSCSR). First, ZSCSR makes full use of the unseen category information to mitigate the domain shift problem. It transfers the relationship between the seen and unseen categories from the semantic embedding space into the visual space and obtains the visual prototypes of the unseen categories. Then, from the visual prototypes and semantic prototypes of all categories, both seen and unseen, ZSCSR learns a latent subspace that aligns the visual and semantic spaces. The latent subspace contains both the discriminative information of the visual space and the category-relationship information of the semantic embedding space. Meanwhile, a reconstruction constraint reduces information loss during subspace learning. Finally, in zero-shot recognition, test samples of unseen classes are classified by nearest neighbor search in different spaces. This paper makes two main contributions. (1) ZSCSR learns the visual prototypes of the unseen categories by transferring the relationship between the seen and unseen categories from the semantic embedding space to the visual space, which relieves the domain shift problem. (2) ZSCSR learns a latent space through latent space learning and reconstruction, which reduces information loss. The proposed method is evaluated for zero-shot recognition on four benchmark datasets. The experimental results show that the proposed method achieves higher average accuracies, which demonstrates its effectiveness.
Keywords:zero-shot image classification  transfer learning  subspace learning  reconstruction  feature prototype
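
The first step of ZSCSR, transferring seen/unseen category relationships from the semantic space to the visual space to obtain visual prototypes for unseen categories, can be illustrated with a small sketch. The abstract does not specify how the relationships are computed, so everything below is an assumption for illustration only: the relationship weights are taken as a softmax over cosine similarities between semantic prototypes, and the names `synthesize_unseen_visual_prototypes`, `nearest_prototype`, and the `temperature` parameter are hypothetical. The subspace learning and reconstruction steps are not shown.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def synthesize_unseen_visual_prototypes(S_seen, S_unseen, P_seen, temperature=0.1):
    """Assumed transfer scheme, not taken from the paper.

    S_seen:   (c, k) semantic prototypes of seen categories.
    S_unseen: (u, k) semantic prototypes of unseen categories.
    P_seen:   (c, d) visual prototypes of seen categories (e.g. class means).
    Returns (u, d) visual prototypes for the unseen categories.
    """
    seen_n = S_seen / np.linalg.norm(S_seen, axis=1, keepdims=True)
    unseen_n = S_unseen / np.linalg.norm(S_unseen, axis=1, keepdims=True)
    similarity = unseen_n @ seen_n.T                     # cosine similarity, (u, c)
    weights = softmax(similarity / temperature, axis=1)  # each row sums to 1
    # Reuse the semantic-space weights to combine seen visual prototypes.
    return weights @ P_seen


def nearest_prototype(X, prototypes):
    """Index of the closest prototype (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Toy data (hypothetical): two seen categories, two unseen categories.
S_seen = np.array([[1.0, 0.0], [0.0, 1.0]])
P_seen = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
S_unseen = np.array([[0.9, 0.1], [0.1, 0.9]])

P_unseen = synthesize_unseen_visual_prototypes(S_seen, S_unseen, P_seen)
X_test = np.array([[1.0, 1.0, 1.0], [9.0, 9.0, 9.0]])
print(nearest_prototype(X_test, P_unseen))
```

Classifying directly against the synthesized prototypes corresponds to nearest-neighbor search in the visual space, which is one of the "different spaces" mentioned in the abstract; the latent subspace would be another.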
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).