
An image understanding method based on multi-level semantic features (一种基于多层语义特征的图像理解方法)

Cite this article:	莫宏伟, 田朋. 一种基于多层语义特征的图像理解方法[J]. 控制与决策, 2021, 36(12): 2881-2890.

Authors:	莫宏伟 (MO Hong-wei), 田朋 (TIAN Peng)

Affiliation:	College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China

Funding:	National Key Research and Development Program of China (2018AAA0102702).

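Abstract (translated from the Chinese):	Visual scene understanding comprises detecting and recognizing objects, inferring the visual relationships between the detected objects, and describing image regions with sentences. To understand scene images more comprehensively and accurately, object detection, visual relationship detection, and image captioning are treated as three visual tasks at different semantic levels of scene understanding. An image understanding model based on multi-level semantic features is proposed, in which the three semantic levels are connected to one another so that they solve the scene understanding tasks jointly. Through a message-passing graph, the model iteratively updates the semantic features of objects, relationship phrases, and image captions at the same time. The updated features are used to classify objects and visual relationships and to generate scene graphs and captions, and a fusion attention mechanism is introduced to improve caption accuracy. Experiments on the Visual Genome and COCO datasets show that the proposed method performs better than existing methods on scene graph generation and image captioning.
Keywords (translated from the Chinese):	image understanding; semantic layer; semantic features; visual relationship; scene graph; image captioning; attention mechanism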
An image understanding method based on multi-level semantic features

MO Hong-wei, TIAN Peng. An image understanding method based on multi-level semantic features[J]. Control and Decision, 2021, 36(12): 2881-2890.

Authors:	MO Hong-wei, TIAN Peng

Affiliation:	College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China

Abstract:	Visual scene understanding includes detecting and recognizing objects, reasoning about the visual relationships between the detected objects, and describing image regions with sentences. To achieve a more comprehensive and accurate understanding of scene images, we treat object detection, visual relationship detection, and image captioning as three visual tasks at different semantic levels of scene understanding. We propose an image understanding model based on multi-level semantic features that leverages the mutual connections across the three semantic levels to solve the scene understanding tasks jointly. The model iteratively and simultaneously updates the semantic features of objects, relationship phrases, and image captions through a message-passing graph. The updated semantic features are used to classify objects and visual relationships and to generate scene graphs and captions, and a fusion attention mechanism is introduced to improve the accuracy of the captions. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on the scene graph generation and image captioning tasks.

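Keywords:	image understanding; semantic layer; semantic features; visual relationship; scene graph; image captioning; attention mechanism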
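
The abstract only states that object, relationship-phrase, and caption features are updated jointly through a message-passing graph; it does not give the update rule. The following is a minimal sketch of what such cross-level refinement could look like, not the authors' implementation. The function names, the averaging of neighbor features, the mixing weight `alpha`, and the 0/1 link matrices are all assumptions made for illustration.

```python
import numpy as np


def normalize_rows(adj):
    """Row-normalize a 0/1 link matrix so each node averages its neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1.0)  # avoid division by zero for isolated nodes


def message_passing(obj, phr, cap, obj_phr, phr_cap, steps=2, alpha=0.5):
    """Toy refinement of features across three semantic levels (hypothetical rule).

    obj:     (n_obj, d) object features
    phr:     (n_phr, d) relationship-phrase features
    cap:     (n_cap, d) caption-region features
    obj_phr: (n_obj, n_phr) 0/1 links between objects and phrases
    phr_cap: (n_phr, n_cap) 0/1 links between phrases and caption regions
    """
    phr_from_obj = normalize_rows(obj_phr.T)  # (n_phr, n_obj)
    obj_from_phr = normalize_rows(obj_phr)    # (n_obj, n_phr)
    phr_from_cap = normalize_rows(phr_cap)    # (n_phr, n_cap)
    cap_from_phr = normalize_rows(phr_cap.T)  # (n_cap, n_phr)

    for _ in range(steps):
        # All three levels are updated from the previous step's features,
        # so the update is simultaneous rather than sequential.
        new_obj = (1 - alpha) * obj + alpha * (obj_from_phr @ phr)
        new_phr = (1 - alpha) * phr + alpha * 0.5 * (
            phr_from_obj @ obj + phr_from_cap @ cap
        )
        new_cap = (1 - alpha) * cap + alpha * (cap_from_phr @ phr)
        obj, phr, cap = new_obj, new_phr, new_cap
    return obj, phr, cap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    obj = rng.normal(size=(3, d))   # three detected objects
    phr = rng.normal(size=(2, d))   # two candidate relationship phrases
    cap = rng.normal(size=(1, d))   # one caption region
    obj_phr = np.array([[1, 0], [1, 1], [0, 1]])  # which objects take part in which phrase
    phr_cap = np.array([[1], [1]])                # which phrases fall inside the caption region
    obj, phr, cap = message_passing(obj, phr, cap, obj_phr, phr_cap)
    print(obj.shape, phr.shape, cap.shape)
```

In this sketch the phrase level sits between the other two, so it receives messages from both objects and caption regions, while objects and caption regions exchange information only through phrases. A node with no links receives a zero message and its feature simply shrinks by `1 - alpha` per step; a learned model would replace the fixed averaging with trained transformations.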
This article is indexed in Wanfang Data and other databases.
