Fine-grained image categorization with segmentation based on top-down attention map
Citation: Feng Yushan, Wang Zilei. Fine-grained image categorization with segmentation based on top-down attention map[J]. Journal of Image and Graphics, 2016, 21(9): 1147-1154.
Authors: Feng Yushan, Wang Zilei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Foundation items: National Natural Science Foundation of China (61233003); Natural Science Foundation of Anhui Province (1408085MF112); Fundamental Research Funds for the Central Universities (WK3500000002)
Abstract: Objective To address background interference in fine-grained image classification, a classification model using segmentation based on top-down attention maps is proposed. Method First, a convolutional neural network (CNN) is trained on the fine-grained image dataset to obtain a basis network model. Visualization analysis of this network shows that only some image regions contribute to the target class. The learned basis network is then used to compute the spatial support of the image pixels for the relevant class, producing a top-down attention map that detects the key regions in the image. The attention map is then used to initialize the GraphCut algorithm, which segments out the key object regions and thereby improves the discriminability of the image. Finally, CNN features are extracted from the segmented images for fine-grained classification. Result Using only class-label annotations, the model was evaluated on the public fine-grained datasets Cars196 and Aircrafts100, achieving average classification accuracies of 86.74% and 84.70%, respectively. These results show that introducing attention information on top of the GoogLeNet model further improves fine-grained classification accuracy. Conclusion The semantic segmentation strategy based on top-down attention maps improves fine-grained classification performance. Because no bounding-box or part annotations are required, the model is general and robust, and is applicable to salient object detection, foreground segmentation, and fine-grained image classification.
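The top-down attention map described above can be illustrated with a gradient-based class-saliency computation: backpropagate the predicted class score to the input pixels and take the per-pixel gradient magnitude as the spatial support of that class. This is a minimal sketch only; the tiny stand-in network, its layer sizes, and the max-over-channels reduction are assumptions for illustration, not the paper's exact GoogLeNet-based procedure.

```python
import torch
import torch.nn as nn

def top_down_attention(model, image, class_idx=None):
    """Gradient-based class saliency: the spatial support of the
    predicted (or given) class over the input pixels."""
    x = image.clone().unsqueeze(0).requires_grad_(True)  # 1 x C x H x W
    scores = model(x)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()  # top-1 predicted class
    scores[0, class_idx].backward()
    # Max absolute gradient across color channels -> H x W attention map
    attn = x.grad.detach().abs().max(dim=1)[0].squeeze(0)
    return attn / (attn.max() + 1e-12)  # normalize to [0, 1]

# Toy stand-in for the trained basis network (GoogLeNet in the paper)
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

image = torch.rand(3, 64, 64)
attn_map = top_down_attention(model, image)
print(attn_map.shape)  # torch.Size([64, 64])
```

High values in `attn_map` mark the pixels most responsible for the class score, which is what makes the map usable as a foreground seed for the segmentation step.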

Keywords: fine-grained image classification  convolutional neural network  top-down attention map  GraphCut  GoogLeNet
Received: 2016-03-15
Revised: 2016-05-12

Fine-grained image categorization with segmentation based on top-down attention map
Feng Yushan and Wang Zilei.Fine-grained image categorization with segmentation based on top-down attention map[J].Journal of Image and Graphics,2016,21(9):1147-1154.
Authors:Feng Yushan and Wang Zilei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Abstract: Objective This paper addresses fine-grained visual categorization, a task that requires domain-specific expert knowledge. The task is challenging because all the objects in a dataset belong to the same basic-level category, with only very fine differences between classes. These subtle differences are easily overwhelmed by background information, which is seldom discriminative and often acts as a distractor, reducing recognition performance in fine-grained image categorization. Therefore, segmenting out background regions and localizing discriminative image parts are important precursors to fine-grained image categorization. To this end, a new segmentation algorithm based on top-down attention maps is proposed to detect discriminative objects and discount the influence of background. Once the objects are localized, they are described by convolutional neural network (CNN) features to predict the corresponding class. Method The proposed method first trains a CNN with the GoogLeNet structure on the dataset to obtain a ConvNet basis. The ConvNet basis is visualized to build reliable intuitions about the visual information contained in the trained CNN representations; the visualization results show that the salient parts of the images correspond to object regions, whereas the activations of background regions are very small. Then, according to the learned ConvNet, we predict the top-1 class label of a given image and determine the spatial support of the predicted class among the image pixels. The spatial support is rearranged to produce a top-down attention map, which effectively locates the informative object regions. Next, given an image and its image-specific class attention map, we compute the object segmentation mask with the GraphCut segmentation algorithm.
The resulting high-quality foreground segmentations are then used to encode image appearance into a highly discriminative visual representation by finetuning the ConvNet basis into a new segmentation ConvNet. Finally, the ConvNet basis and the segmentation ConvNet are combined to conduct fine-grained image recognition; the original images are also used to finetune the segmentation ConvNet for improved accuracy. Result The proposed model was tested on two benchmark datasets for fine-grained image categorization, Cars196 and Aircrafts100, both of which provide public annotations including class labels, object bounding boxes, and part locations. Only the class-label annotation was used in our evaluation, and the final average accuracy rates on the Cars196 and Aircrafts100 databases were 86.74% and 84.70%, respectively. These results show that adding visual attention information yields higher accuracy than the GoogLeNet model alone. Conclusion A semantic segmentation strategy based on top-down attention maps was proposed to improve the accuracy of fine-grained image categorization. Our method needs no bounding-box or part annotations, making it robust and applicable to a variety of datasets. The experimental results show that the attention information is very useful for fine-grained image recognition, and the proposed model is applicable to salient object detection, foreground segmentation, and fine-grained image categorization.
Keywords:fine-grained visual categorization  convolutional neural network (CNN)  top-down attention map  GraphCut  GoogLeNet