
Fine-Grained Image Classification Based on Text and Visual Information
Cite this article: YUAN Jian-ping, CHEN Xiao-long, CHEN Xian-long, HE En-jie, ZHANG Jia-qi, GAO Yu-dou. Fine-Grained Image Classification Based on Text and Visual Information[J]. Journal of Graphics (图学学报), 2019, 40(3): 503. DOI: 10.11996/JG.j.2095-302X.2019030503
Authors: YUAN Jian-ping (袁建平)  CHEN Xiao-long (陈晓龙)  CHEN Xian-long (陈显龙)  HE En-jie (何恩杰)  ZHANG Jia-qi (张加其)  GAO Yu-dou (高宇豆)
Affiliations: 1. Beijing Forever Technology Co. Ltd, Beijing 100011, China; 2. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Funding: Beijing Municipal Science and Technology Program project (Z171100001217006)
Abstract: Fine-grained image classification usually relies only on local visual information in the image. In some problems, however, text appearing in local regions of the image bears directly on the classification result, and extracting the semantics of that text can further improve fine-grained classification. We consider both the visual information and the local text information of an image and propose an end-to-end classification model for fine-grained image classification. A deep convolutional neural network extracts the visual features of the image, while the proposed end-to-end text recognition network extracts its text information. A correlation calculation module then merges the visual and text features and feeds them to the classification network. We evaluate the method for fine-grained classification on the public Con-Text dataset and verify the end-to-end text recognition network on the SVT dataset; in both cases it obtains better results than previous methods.

Keywords: computer vision  fine-grained image classification  scene text recognition  convolutional neural network  attention mechanism
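
The abstract does not specify how the correlation calculation module combines the two feature streams. As a minimal sketch only, the following assumes a scaled dot-product attention: each recognized word's embedding is scored against the visual feature, the scores are normalized into weights, and the weighted text feature is concatenated with the visual feature before classification. The function name `fuse_features`, the `projection` matrix, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def fuse_features(visual, word_embeddings, projection):
    """Fuse one visual feature with the embeddings of recognized words.

    visual:          (d_v,)         feature from the visual CNN (assumed shape)
    word_embeddings: (n_words, d_t) one embedding per recognized word
    projection:      (d_t, d_v)     hypothetical map from text to visual space
    """
    # Project each word into the visual feature space.
    projected = word_embeddings @ projection            # (n_words, d_v)
    # Correlation of each word with the visual feature (scaled dot product).
    scores = projected @ visual / np.sqrt(visual.shape[0])
    weights = softmax(scores)                           # (n_words,), sums to 1
    # Weighted sum of word embeddings gives a single text feature.
    text_feature = weights @ word_embeddings            # (d_t,)
    # Concatenate both streams as input to a downstream classifier.
    fused = np.concatenate([visual, text_feature])      # (d_v + d_t,)
    return fused, weights


# Toy inputs standing in for real network outputs.
rng = np.random.default_rng(0)
visual = rng.normal(size=8)
words = rng.normal(size=(3, 4))
projection = rng.normal(size=(4, 8))
fused, weights = fuse_features(visual, words, projection)
```

Under these assumptions, words that correlate more strongly with the visual content receive larger weights, so text unrelated to the scene contributes less to the fused feature.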
This article is indexed in Wanfang Data (万方数据) and other databases.