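Objective Visual Attention Estimation Method via Progressive Learning and Multi-scale Enhancement
Cite this article: FENG Jiangfan, HE Zhongyu. Objective Visual Attention Estimation Method via Progressive Learning and Multi-scale Enhancement[J]. Journal of Electronics & Information Technology (电子与信息学报), 2023, 45(4): 1475-1484.
Authors: FENG Jiangfan (丰江帆), HE Zhongyu (何中鱼)
Affiliation: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funding: National Natural Science Foundation of China (41971365); Natural Science Foundation of Chongqing (cstc2020jcyj-msxmX0635)
Abstract: Visual attention mechanisms have attracted wide attention from academia and industry, but existing work detects attention mainly from the viewpoint of the scene observer. Emerging intelligent applications, however, require visual attention detection from the viewpoint of the subject within the scene (objective visual attention). For example, detecting the visual attention of a surveillance target helps predict its subsequent behavior, and an intelligent robot must understand the intentions of its interaction partner in order to interact effectively. Drawing on the cognitive mechanism of objective visual attention, this paper proposes an objective visual attention estimation method based on progressive learning and multi-scale enhancement. The method treats the subject's field of view as a combination of geometric structure and geometric details, and constructs a Hierarchical Self-Attention Module (HSAM) to capture long-distance dependencies among deep features and to adapt to the diversity of geometric features. It then uses direction vectors and a field-of-view generator to obtain the probability distribution of gaze points, and constructs a feature fusion module that shares structure across, fuses, and enhances multi-resolution features to better capture spatial context. Finally, a composite loss function is constructed to estimate the correlations among gaze direction, field of view, and focus prediction. Experimental results show that the proposed method outperforms current mainstream methods on different accuracy metrics for objective visual attention estimation on both public and self-built datasets.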

Keywords: objective visual attention; progressive learning; hierarchical self-attention; feature fusion
Received: 2022-03-02

Objective Visual Attention Estimation Method via Progressive Learning and Multi-scale Enhancement
FENG Jiangfan, HE Zhongyu. Objective Visual Attention Estimation Method via Progressive Learning and Multi-scale Enhancement[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1475-1484.
Authors: FENG Jiangfan, HE Zhongyu
Affiliation: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: The attention mechanism of the human visual system has attracted wide interest from academia and industry. Existing studies of attention focus mainly on the observer's viewpoint. However, a growing number of real-world intelligent applications require objective visual attention detection, that is, detecting the attention of a subject within the scene. Automating tasks such as surveillance or human-robot collaboration requires anticipating the behavior of the people involved. In such contexts, gaze and focus can be highly informative about participants' intentions, goals, and upcoming decisions. Here, a progressive method for objective visual attention estimation is developed by drawing on the underlying cognitive mechanisms. The subject's field of view is first treated as a combination of geometric structure and geometric details. A Hierarchical Self-Attention Module (HSAM) is constructed to capture long-distance dependencies between deep features and to adapt to the diversity of geometric features. Direction vectors and a field-of-view generator are then used to obtain the probability distribution of gaze points. Furthermore, a feature fusion module is designed for structure sharing, fusion, and enhancement of multi-resolution features, so that its output better captures spatial context. Finally, a composite loss function is constructed to estimate the correlations among gaze direction, field of view, and focus prediction. Experimental results show that the proposed method outperforms current mainstream methods on different evaluation metrics for objective visual attention estimation on both publicly available and self-built datasets.
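
The abstract does not say how the field-of-view generator turns a direction vector into a probability distribution over gaze points. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a 2D gaze direction in the image plane and scores each pixel by the cosine similarity between that direction and the head-to-pixel direction. The function name `field_of_view_map` and the `sharpness` parameter are hypothetical and do not come from the paper.

```python
import math


def field_of_view_map(head_xy, gaze_dir, width, height, sharpness=4.0):
    """Return a height x width grid of gaze-point probabilities summing to 1.

    Assumption (not from the paper): a pixel is more likely to be the gaze
    point the more closely the head-to-pixel direction matches gaze_dir.
    """
    hx, hy = head_xy
    dx, dy = gaze_dir
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("gaze_dir must be a non-zero vector")
    dx, dy = dx / norm, dy / norm  # unit gaze direction

    scores = []
    for y in range(height):
        row = []
        for x in range(width):
            vx, vy = x - hx, y - hy
            dist = math.hypot(vx, vy)
            if dist == 0:
                row.append(0.0)  # the head position itself is not a target
                continue
            cos_sim = (vx * dx + vy * dy) / dist
            # Discard pixels behind the subject, then narrow the cone.
            row.append(max(cos_sim, 0.0) ** sharpness)
        scores.append(row)

    total = sum(sum(row) for row in scores)
    if total == 0:
        raise ValueError("no pixel lies in front of the subject")
    # Normalize the scores so the grid is a probability distribution.
    return [[s / total for s in row] for row in scores]


# Usage: head at the left edge of a 5x5 grid, looking to the right.
fov = field_of_view_map(head_xy=(0, 2), gaze_dir=(1, 0), width=5, height=5)
```

A larger `sharpness` narrows the cone around the gaze direction. In a learned model, a map like this would typically be combined with image features before the gaze point is predicted, a step this sketch leaves out.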