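Transductive zero-shot image classification based on self-supervised enhancement feature
Cite this article: WANG Hao-yu, ZHANG Xin-ran, WANG Xue-song, CHENG Yu-hu. Transductive zero-shot image classification based on self-supervised enhancement feature[J]. Control and Decision, 2024, 39(5): 1707-1717
Authors: WANG Hao-yu, ZHANG Xin-ran, WANG Xue-song, CHENG Yu-hu
Affiliation: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
Funding: National Natural Science Foundation of China (62176259, 61976215); Natural Science Foundation of Jiangsu Province (BK20221116); Jiangsu Excellent Postdoctoral Program (2022ZB530).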
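Abstract: The visual features of images play a crucial role in zero-shot image classification. Deep features extracted by networks such as VGG, GoogLeNet and ResNet are widely used in image classification. However, their performance on zero-shot image classification is unsatisfactory and leaves considerable room for improvement. Moreover, the training and test sets are disjoint in the zero-shot learning setting, so the classification network inevitably suffers from domain shift. To address these problems, a transductive zero-shot image classification framework based on self-supervised enhanced features is proposed. First, pseudo-labels are constructed through an auxiliary task. Self-supervised learning then produces self-supervised features of the images, and these are fused with unsupervised deep features. Next, the fused features are embedded into the semantic space for zero-shot image classification, which yields initial predicted labels for the unseen classes. Finally, the unseen-class features and their predicted labels are used to iteratively optimize the visual-semantic mapping. The components of the framework are selectable. In the network constructed here, the self-supervised network, backbone network and dimensionality-reduction network are CFN, VGG16 and PCA, respectively. Experimental results on the CUB, SUN and AwA2 datasets show that the proposed network enhances the discriminative power of the features and performs well on zero-shot image classification.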

Keywords: zero-shot learning; self-supervised learning; transductive; visual-semantic mapping; feature fusion; image classification
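The abstract does not detail the auxiliary task that supplies the pseudo-labels. This sketch assumes that CFN refers to the context-free network trained on jigsaw puzzles. In that task, a 3×3 grid of image tiles is shuffled and the pseudo-label is the index of the permutation used. The NumPy code below shows only that label construction. The permutation set is sampled at random for brevity, whereas the original CFN selects its permutations by maximal Hamming distance. `make_permutation_set` and `jigsaw_sample` are hypothetical names, and the network that learns to predict the permutation index is not shown.

```python
import math

import numpy as np


def make_permutation_set(n_perms: int, n_tiles: int = 9, seed: int = 0) -> np.ndarray:
    """Return n_perms distinct tile orderings (hypothetical, randomly sampled).

    The original CFN chooses its permutation set by maximal Hamming distance;
    random sampling is a simplification for this sketch.
    """
    if n_perms > math.factorial(n_tiles):
        raise ValueError("more permutations requested than exist")
    rng = np.random.default_rng(seed)
    perms = set()
    while len(perms) < n_perms:
        perms.add(tuple(int(i) for i in rng.permutation(n_tiles)))
    return np.array(sorted(perms))


def jigsaw_sample(image: np.ndarray, perm_set: np.ndarray,
                  rng: np.random.Generator, grid: int = 3):
    """Cut `image` into grid x grid tiles, shuffle them with a randomly chosen
    permutation and return (shuffled tiles, pseudo-label).

    The pseudo-label is the row index of the permutation in `perm_set`; a
    network trained to predict it learns features without human labels.
    """
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid  # any remainder pixels are dropped
    tiles = [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    label = int(rng.integers(len(perm_set)))
    # shuffled[k] is original tile perm_set[label][k]
    shuffled = np.stack([tiles[i] for i in perm_set[label]])
    return shuffled, label
```

Indexing the shuffled tiles with `np.argsort(perm_set[label])` restores the original order. This shows that the pseudo-label fully determines the shuffle, which is what makes it a usable training target.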

Transductive zero-shot image classification based on self-supervised enhancement feature
WANG Hao-yu, ZHANG Xin-ran, WANG Xue-song, CHENG Yu-hu. Transductive zero-shot image classification based on self-supervised enhancement feature[J]. Control and Decision, 2024, 39(5): 1707-1717
Authors: WANG Hao-yu, ZHANG Xin-ran, WANG Xue-song, CHENG Yu-hu
Affiliation: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Abstract: The visual features of images play a crucial role in realizing zero-shot image classification. Although the deep features extracted by networks such as VGG, GoogLeNet, and ResNet have been widely used in image classification, their performance in zero-shot image classification is not ideal. In addition, the training and testing sets are disjoint in the zero-shot learning scenario, so the classification network inevitably suffers from domain shift. Therefore, a transductive zero-shot image classification framework based on self-supervised enhancement features is proposed. The main idea is as follows. First, pseudo-labels are constructed via an auxiliary task, and the self-supervised features of images are obtained by self-supervised learning and fused with the unsupervised deep features. Then, the fused features are embedded into the semantic space for zero-shot image classification, yielding the initial predicted labels for unseen classes. Finally, the features and predicted labels of unseen classes are used to iteratively optimize the visual-semantic mapping. The components of the proposed framework can be selected freely. Here the self-supervised network, backbone network and dimensionality-reduction network are CFN, VGG16 and PCA, respectively. Experiments on the CUB, SUN, and AwA2 datasets show that the proposed network enhances the discriminative capability of features and performs well on zero-shot image classification tasks.
Keywords: zero-shot learning; self-supervised learning; transductive; visual-semantic mapping; feature fusion; image classification
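The three stages named in the abstract are feature fusion, embedding into the semantic space to obtain initial unseen-class labels, and iterative refinement of the visual-semantic mapping. Below is a minimal NumPy sketch of these stages under stated assumptions; it is not the authors' implementation. Fusion is assumed to be concatenation of L2-normalised self-supervised and backbone features. PCA is computed via SVD, and the visual-semantic mapping is a ridge regression. Classification uses cosine similarity to class attribute vectors, and the refinement is plain self-training on the pseudo-labelled unseen samples. The function names `fuse_and_reduce`, `fit_mapping`, `predict` and `transductive_zsl`, and the parameters `lam` and `n_iters`, are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse_and_reduce(ssl_feats: np.ndarray, deep_feats: np.ndarray,
                    n_components: int) -> np.ndarray:
    """Assumed fusion: concatenate L2-normalised features, then PCA via SVD."""
    fused = np.hstack([l2_normalize(ssl_feats), l2_normalize(deep_feats)])
    centered = fused - fused.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T  # project onto top principal axes


def fit_mapping(x: np.ndarray, s: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Ridge regression W minimising ||XW - S||^2 + lam * ||W||^2."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ s)


def predict(x: np.ndarray, w: np.ndarray, class_attrs: np.ndarray) -> np.ndarray:
    """Label = class whose attribute vector is most cosine-similar to xW."""
    sims = l2_normalize(x @ w) @ l2_normalize(class_attrs).T
    return sims.argmax(axis=1)


def transductive_zsl(x_seen, y_seen, x_unseen, seen_attrs, unseen_attrs,
                     n_iters: int = 5, lam: float = 1.0):
    """Self-training sketch: fit on seen classes, pseudo-label unseen samples,
    refit on both, and repeat until the pseudo-labels stop changing."""
    w = fit_mapping(x_seen, seen_attrs[y_seen], lam)
    labels = predict(x_unseen, w, unseen_attrs)  # initial unseen predictions
    for _ in range(n_iters):
        x_all = np.vstack([x_seen, x_unseen])
        s_all = np.vstack([seen_attrs[y_seen], unseen_attrs[labels]])
        w = fit_mapping(x_all, s_all, lam)
        new_labels = predict(x_unseen, w, unseen_attrs)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, w
```

In the transductive setting the unlabelled unseen-class images are available during training. One reasonable choice under this sketch is therefore to fit `fuse_and_reduce` on seen and unseen samples together, then split the reduced features before calling `transductive_zsl`.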