
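Image captioning based on visual relationship reasoning and a context gate mechanism
Cite this article: Qiao-hong CHEN, Hao-lei PEI, Qi SUN. Image captioning based on visual relationship reasoning and a context gate mechanism [J]. 浙江大学学报(自然科学版 ), 2022, 56(3): 542-549.
Authors: Qiao-hong CHEN  Hao-lei PEI  Qi SUN
Affiliation: School of Information, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang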
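Abstract: To explore how to model and reason about the relationships between visual regions that image scene understanding requires, a visual relationship reasoning module is proposed. Based on the different semantic and spatial context information in an image, the module dynamically encodes the relationship patterns between related visual objects and infers the semantic features most relevant to the relation word currently being generated. A context gate mechanism is introduced to dynamically weigh the contributions of the visual attention module and the visual relationship reasoning module according to the type of word being generated. Experimental results show that the image captioning method based on visual relationship reasoning and the context gate mechanism outperforms previous attention-based image captioning methods; the proposed modules dynamically model and infer the features most relevant to each type of generated word, and describe the relationships between objects in the input image more accurately.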

Keywords: image semantic captioning  visual relationship reasoning  multimodal encoding  context gate mechanism  attention mechanism

Image caption based on relational reasoning and context gate mechanism
Qiao-hong CHEN, Hao-lei PEI, Qi SUN. Image caption based on relational reasoning and context gate mechanism [J]. Journal of Zhejiang University (Engineering Science), 2022, 56(3): 542-549.
Authors:Qiao-hong CHEN  Hao-lei PEI  Qi SUN
Abstract:A visual relationship reasoning module was proposed in order to explore the modeling and reasoning of the relationships between visual regions that image scene understanding requires. Using the module, the relationship patterns between related visual objects were encoded dynamically based on different semantic and spatial context information, and the semantic features most relevant to the relation word currently being generated were inferred. In addition, a context gate mechanism was introduced to dynamically weigh the contributions of the visual attention module and the visual relationship reasoning module according to the type of word being generated. Experimental results show that the method performs better than previous image caption methods based on the attention mechanism. The proposed modules dynamically model and infer the features most relevant to each type of generated word, and describe the relationships between objects in the input image more accurately.
Keywords:image caption  visual relationship reasoning  multimodal encoding  context gate mechanism  attention mechanism  
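
The abstract does not give the gate's equations, so the following is only a minimal sketch of how a context gate could weigh the two modules. It assumes a single scalar sigmoid gate computed from a decoder context vector; the names `context_gate`, `att_feat`, `rel_feat`, `w`, and `b` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(
    context: List[float],   # decoder context vector (assumed input)
    w: List[float],         # gate weights (hypothetical parameters)
    b: float,               # gate bias (hypothetical parameter)
    att_feat: List[float],  # output of the visual attention module
    rel_feat: List[float],  # output of the visual relationship reasoning module
) -> Tuple[float, List[float]]:
    """Blend the two feature vectors with a scalar gate g in (0, 1).

    g close to 1 favors the attention features; g close to 0 favors
    the relationship reasoning features.
    """
    if len(context) != len(w):
        raise ValueError("context and w must have the same length")
    if len(att_feat) != len(rel_feat):
        raise ValueError("att_feat and rel_feat must have the same length")
    g = sigmoid(sum(c * wi for c, wi in zip(context, w)) + b)
    fused = [g * a + (1.0 - g) * r for a, r in zip(att_feat, rel_feat)]
    return g, fused


# A zero context with zero bias gives g = 0.5, i.e. an equal blend.
g, fused = context_gate([0.0, 0.0], [1.0, 1.0], 0.0, [1.0, 2.0], [3.0, 4.0])
```

In this sketch the gate depends only on the context vector, so a context that signals a relation word (for example a preposition or verb) could push `g` toward 0 and let the relationship features dominate. A per-dimension gate vector would be an equally plausible design.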