
基于文本与视觉信息的细粒度图像分类 (Fine-Grained Image Classification Based on Text and Visual Information)

Cite this article (Chinese-language citation):	袁建平, 陈晓龙, 陈显龙, 何恩杰, 张加其, 高宇豆. 基于文本与视觉信息的细粒度图像分类[J]. 图学学报 (Journal of Graphics), 2019, 40(3): 503. DOI: 10.11996/JG.j.2095-302X.2019030503

Authors:	袁建平, 陈晓龙, 陈显龙, 何恩杰, 张加其, 高宇豆

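Affiliations:	Beijing Forever Technology Co., Ltd., Beijing 100011; School of Control and Computer Engineering, North China Electric Power University, Beijing 102206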

Funding:	Beijing Municipal Science and Technology Plan project (Z171100001217006)

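Abstract:	Fine-grained image classification typically attends only to local visual information in an image. In some problems, however, text appearing in local regions of the image bears directly on the classification result, and extracting the semantics of that text can further improve fine-grained classification. We jointly consider the visual information of an image and its local text information and propose an end-to-end classification model for fine-grained image classification. On one branch, a deep convolutional neural network extracts the visual features of the image; on the other, the proposed end-to-end text recognition network extracts the text in the image. A correlation computation module then merges the visual and textual features and feeds the result into a classification network. We evaluate fine-grained classification on the public Con-Text dataset and also evaluate the end-to-end text recognition network on the SVT dataset; in both cases the method outperforms previous methods.
Keywords:	computer vision; fine-grained image classification; scene text recognition; convolutional neural network; attention mechanism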
Fine-Grained Image Classification Based on Text and Visual Information

YUAN Jian-ping,CHEN Xiao-long,CHEN Xian-long,HE En-jie,ZHANG Jia-qi,GAO Yu-dou. Fine-Grained Image Classification Based on Text and Visual Information[J]. Journal of Graphics, 2019, 40(3): 503. DOI: 10.11996/JG.j.2095-302X.2019030503

Authors:	YUAN Jian-ping, CHEN Xiao-long, CHEN Xian-long, HE En-jie, ZHANG Jia-qi, GAO Yu-dou

Affiliation:	(1. Beijing Forever Technology Co. Ltd, Beijing 100011, China; 2. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)

Abstract:	Fine-grained image classification generally focuses only on local visual information in an image, but in some problems the text in local image regions is directly related to the classification result, and extracting the semantic information of that text can further improve classification. We jointly consider the visual information and the local text information of an image and propose an end-to-end classification model for fine-grained image classification. On the one hand, a deep convolutional neural network extracts the visual features of the image; on the other hand, the proposed end-to-end text recognition network extracts the text information of the image. A correlation calculation module then merges the visual and textual features and feeds them to the classification network. Finally, we evaluate the method for fine-grained classification on the public Con-Text dataset and evaluate the end-to-end text recognition network on the SVT dataset; in both cases the results are better than those of previous methods.

Keywords:	computer vision; fine-grained image classification; scene text recognition; convolutional neural network; attention mechanism
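The abstract describes the pipeline only at a high level: CNN visual features, an end-to-end text recognizer, a correlation module that merges the two, and a classifier. The following is a minimal sketch of one plausible form of the fusion and classification steps. It assumes the correlation module behaves like scaled dot-product attention (suggested by the "attention mechanism" keyword but not confirmed by the abstract), and that the CNN and the text recognizer have already produced fixed-length vectors of a shared dimension d. The function names `fuse_visual_text` and `classify`, the shared dimension, and the zero-vector fallback are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_visual_text(visual: Vector, words: List[Vector]) -> Vector:
    """Hypothetical correlation/fusion step (assumed attention-style).

    The visual feature acts as the query. Each recognized word embedding is
    scored by a scaled dot product, the scores are normalized with softmax,
    and the weighted sum of word embeddings is concatenated to the visual
    feature. Assumes word embeddings share the visual feature's dimension d.
    """
    d = len(visual)
    if not words:
        # No text recognized in the image: fall back to a zero text vector.
        return list(visual) + [0.0] * d
    if any(len(w) != d for w in words):
        raise ValueError("word embeddings must match the visual feature dimension")
    scores = [dot(visual, w) / math.sqrt(d) for w in words]
    attn = softmax(scores)
    text = [sum(a * w[i] for a, w in zip(attn, words)) for i in range(d)]
    return list(visual) + text


def classify(fused: Vector, weights: List[Vector], bias: Vector) -> int:
    """Hypothetical linear classifier head: index of the largest logit."""
    logits = [dot(row, fused) + b for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=lambda k: logits[k])


if __name__ == "__main__":
    visual = [1.0, 0.0]               # toy CNN feature for one image
    words = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of two recognized words
    fused = fuse_visual_text(visual, words)
    head = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # toy 2-class head
    print(fused, classify(fused, head, [0.0, 0.0]))
```

In this sketch, concatenation keeps the visual feature intact when no legible text is found, because the text half then falls back to zeros and the classifier still sees the full visual evidence. The paper's actual correlation module may combine the features differently.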
This article is indexed in Wanfang Data and other databases.
