
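A Gaze-Language Cross-Modal Coreference Resolution Method
Authors: Zhang Junqian, Song Mingwu, Xie Liang, Zhang Yakun, Yin Erwei, Yan Ye
Affiliations: Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program): Research on Cross-Media Fusion Natural Interaction Technology for Complex Rescue Scenarios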
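Abstract: Cross-modal coreference resolution is the task of localizing, in a natural image, the target that a person refers to according to their interaction intent. It is one of the key technologies of intelligent human-computer interaction and can be applied to emergency rescue and disaster relief, household services, and care for the elderly and disabled. Existing object-referring methods generally express human intent through a single modality, such as language or eye movements. However, single-modality user input conveys only limited interaction information, which makes natural and intelligent human-machine collaboration difficult. To address this problem, this paper fuses eye-movement and language information to build a cross-modal coreference resolution model. The model exploits the complementary strengths of the two modalities to localize, in the image, the target referred to by human intent. Comparative experiments verify that the proposed gaze-language cross-modal fusion method outperforms single-modality input.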

Keywords: deep learning; cross-modal; object localization; eye movement; natural language processing
Received: 2022-07-20
Revised: 2022-09-06

Object referring with language and human gaze
Authors: Zhang Junqian, Song Mingwu, Xie Liang, Zhang Yakun, Yin Erwei and Yan Ye
Abstract: Object referring is the task of locating, in an image, the target that a person intends to refer to. It is one of the key technologies of intelligent human-computer interaction and can be applied to emergency rescue and disaster relief, household services, and assistance for the elderly and disabled. Existing object-referring methods generally express human intention through a single modality, such as language or gaze. However, a single modality conveys only limited information, which makes natural and intelligent human-computer collaboration difficult. To address this problem, we propose a method that performs object referring with both language and human gaze. It exploits the complementary advantages of multiple modalities to localize the target referred to by human intention. Comparative experiments show that the proposed gaze-language cross-modal object-referring method outperforms single-modal input methods.
Keywords: deep learning; multi-modal; localization; gaze; natural language processing
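The abstract does not describe the fusion architecture. The toy sketch below only illustrates why gaze and language can complement each other. It is a hypothetical weighted late fusion over candidate boxes, not the deep-learning model proposed in the paper. The box coordinates, per-box language scores, fixation points, the weight `alpha`, and the names `Box`, `gaze_scores`, and `refer` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical candidate region in image pixel coordinates."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def gaze_scores(boxes: list[Box], fixations: list[tuple[float, float]]) -> list[float]:
    """Fraction of gaze fixations that fall inside each candidate box."""
    if not fixations:
        return [0.0] * len(boxes)
    n = len(fixations)
    return [sum(b.contains(x, y) for x, y in fixations) / n for b in boxes]


def refer(boxes: list[Box], lang_scores: list[float],
          fixations: list[tuple[float, float]], alpha: float = 0.5) -> str:
    """Hypothetical late fusion: alpha * language score + (1 - alpha) * gaze score.

    alpha=1.0 uses language only, alpha=0.0 uses gaze only; ties go to the first box.
    """
    gaze = gaze_scores(boxes, fixations)
    fused = [alpha * s + (1 - alpha) * g for s, g in zip(lang_scores, gaze)]
    best = max(range(len(boxes)), key=lambda i: fused[i])
    return boxes[best].name


# Invented scene: two cups and a book side by side.
BOXES = [Box("red_cup", 0, 0, 10, 10), Box("blue_cup", 20, 0, 30, 10), Box("book", 40, 0, 50, 10)]

if __name__ == "__main__":
    # "the cup": language cannot separate the two cups; most fixations land on the blue one.
    fix1 = [(22, 5), (25, 4), (28, 6), (8, 5)]
    print(refer(BOXES, [0.45, 0.45, 0.10], fix1, alpha=1.0))  # red_cup (tie, first box wins)
    print(refer(BOXES, [0.45, 0.45, 0.10], fix1))             # blue_cup
    # "the book": gaze is split between the blue cup and the book; language resolves it.
    fix2 = [(25, 5), (45, 5)]
    print(refer(BOXES, [0.05, 0.05, 0.90], fix2, alpha=0.0))  # blue_cup (tie, first box wins)
    print(refer(BOXES, [0.05, 0.05, 0.90], fix2))             # book
```

In both invented cases, the single-modality setting ends in a tie that is broken arbitrarily, while the fused score picks the intended object. A learned model would replace the hand-set scores and the fixed `alpha`.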