Gaze-language cross-modal coreference resolution method (Object referring with language and human gaze)


Authors:	Zhang Junqian (张珺倩), Song Mingwu (宋明武), Xie Liang (谢良), Zhang Yakun (张亚坤), Yin Erwei (印二威), Yan Ye (闫野)

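Affiliations:	Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center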

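Funding:	National Natural Science Foundation of China (General Program, Key Program, Major Program): Research on cross-media fusion natural interaction technology for complex rescue scenarios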

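Abstract (translated from the Chinese):	Cross-modal coreference resolution is the task of locating, in a natural image, the target that a person's interaction intent refers to. As one of the key technologies of intelligent human-computer interaction, it can be applied in scenarios such as emergency rescue and disaster relief, home services, and care for the elderly and disabled. Existing object-referring methods generally express human intent through a single modality, such as language or gaze. However, single-modality user input conveys only limited interaction information, which makes natural and intelligent human-machine collaboration difficult to achieve. To address this problem, this paper fuses gaze and language information to build a cross-modal coreference resolution model that exploits the complementary strengths of multiple modalities to localize, in the image, the target of human intent. Comparative experiments verify that the proposed gaze-language cross-modal fusion method outperforms single-modality input.
Keywords:	deep learning; cross-modal; object localization; gaze; natural language processing
Received:	2022/7/20
Revised:	2022/9/6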
Object referring with language and human gaze

Authors:	Zhang Junqian, Song Mingwu, Xie Liang, Zhang Yakun, Yin Erwei, and Yan Ye

Abstract:	Object referring is the task of locating a target in an image according to human intention. As one of the key technologies of intelligent human-computer interaction, it can be applied to scenarios such as emergency rescue and disaster relief, home services, and care for the elderly and disabled. Existing work on object referring generally uses single-modality information, such as language or gaze, to express human intention. However, a single modality can convey only limited information, which makes natural and intelligent human-computer collaboration difficult. To address this problem, we propose a method for object referring that combines language and human gaze, exploiting the complementary advantages of multiple modalities to localize the target referred to by human intention. Comparative experiments verify that the proposed gaze-language cross-modal object referring method outperforms single-modality input.

Keywords:	deep learning; multi-modal; localization; gaze; natural language processing
