Two-stage deep learning for supervised cross-modal retrieval
Authors: Jie Shao, Zhicheng Zhao, Fei Su
Affiliations: 1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China; 2. Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China
Abstract:

This paper addresses the problem of modeling internet images and their associated texts for cross-modal retrieval, i.e., text-to-image and image-to-text retrieval. Supervised cross-modal retrieval has recently attracted increasing attention. Inspired by a typical two-stage method, semantic correlation matching (SCM), we propose a novel two-stage deep learning method for supervised cross-modal retrieval. Because traditional canonical correlation analysis (CCA) is a two-view method, SCM can exploit supervised semantic information only in its second stage. To make full use of the semantics, we extend CCA from two views to three and apply supervised learning in both stages. In the first stage, we embed the three-view CCA into a deep architecture to learn non-linear correlations among images, texts, and semantics. To reduce over-fitting, we add the reconstruction loss of each view to the loss function, which also contains the correlation loss of every pair of views and a regularization term on the parameters. In the second stage, we build a novel fully-convolutional network (FCN) trained under the joint supervision of a contrastive loss and a center loss to learn more discriminative features. The proposed method is evaluated on two publicly available datasets, and the experimental results show that it is competitive with state-of-the-art methods.
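As a rough sketch of how the stage-1 objective described above might be written (the view symbols, encoder/decoder notation, and weights below are our own assumptions, not taken from the paper): with views $v \in \{I, T, S\}$ for image, text, and semantics, encoders $f_v$, decoders $g_v$, and network parameters $\Theta$, a loss of this form combines the pairwise correlation terms, the per-view reconstruction terms, and parameter regularization:

$$\mathcal{L}_{1} = -\sum_{(u,v)\in\{(I,T),(I,S),(T,S)\}} \operatorname{corr}\!\big(f_u(x_u),\, f_v(x_v)\big) \;+\; \lambda_1 \sum_{v\in\{I,T,S\}} \big\lVert x_v - g_v(f_v(x_v)) \big\rVert_2^2 \;+\; \lambda_2 \lVert \Theta \rVert_2^2$$

Maximizing the correlation of every pair of views generalizes two-view CCA to three views, while the reconstruction terms act as an autoencoder-style regularizer against over-fitting.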

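A minimal PyTorch sketch of the stage-2 joint supervision (contrastive loss plus center loss); the feature dimension, margin, loss weight, and pairing scheme are illustrative assumptions rather than the paper's exact configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    # Center loss (Wen et al., 2016): pulls each feature toward the
    # learnable center of its class.
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # Mean squared distance between each sample and its own class center.
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

def contrastive_loss(f1, f2, same, margin=1.0):
    # Standard pairwise contrastive loss: pull matched image-text pairs
    # together, push mismatched pairs apart up to the margin.
    d = F.pairwise_distance(f1, f2)
    return (same * d.pow(2) + (1.0 - same) * F.relu(margin - d).pow(2)).mean()

# Illustrative usage with dummy stage-2 features (dimensions assumed).
img_feats = torch.randn(8, 128)
txt_feats = torch.randn(8, 128)
img_labels = torch.randint(0, 10, (8,))
txt_labels = torch.randint(0, 10, (8,))
same = (img_labels == txt_labels).float()

center = CenterLoss(num_classes=10, feat_dim=128)
lambda_center = 0.01  # assumed weighting between the two losses
loss = contrastive_loss(img_feats, txt_feats, same) + lambda_center * (
    center(img_feats, img_labels) + center(txt_feats, txt_labels))

Here the contrastive term aligns the two modalities pairwise, while the center term compacts each semantic class in the shared feature space, which is one common way to combine the two supervision signals.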
Keywords:
Indexed by: SpringerLink and other databases.