
3D visual question answering based on sub-questions asymptotic reasoning
Cite this article: Li Changjian, Yang Yuwei, Xiao Xiao, Lei Yinjie. 3D visual question answering based on sub-questions asymptotic reasoning[J]. Application Research of Computers (计算机应用研究), 2023, 40(4): 987-990+995
Authors: Li Changjian (李长健), Yang Yuwei (杨昱威), Xiao Xiao (肖枭), Lei Yinjie (雷印杰)
Affiliation: College of Electronics and Information Engineering, Sichuan University (all four authors)
Funding: National Key Research and Development Program of China (2021YFC3300305)
Abstract: 3D visual question answering helps people understand spatial information and has broad application prospects in areas such as early childhood education. Because 3D scene information is complex and most existing methods answer the question directly, they tend to overlook contextual details when the question is complex, which degrades performance. To address this problem, this paper proposes a 3D visual question answering method based on sub-question asymptotic reasoning, which uses text analysis to construct multiple simple sub-questions for a complex original question. While answering the sub-questions, the model learns context information that helps it understand the meaning of the complex question, and it finally uses the accumulated joint information to derive the answer to the original question. The sub-questions build progressively toward the original question (the asymptotic reasoning relationship), which makes the model's errors explicitly interpretable and traceable. Experiments on the existing 3D dataset ScanQA show that the proposed method achieves 51.49% on EM@10 and 61.68% on CIDEr, exceeding other existing 3D visual question answering methods on both metrics and confirming its effectiveness.

Keywords: 3D visual question answering; original question; sub-question; asymptotic reasoning; context information
Received: 2022-08-12
Revised: 2023-03-07
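
The reasoning flow described in the abstract (answer simple sub-questions first, accumulate their answers as context, then answer the original question) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ReasoningTrace`, `progressive_answer`, and `toy_answer_fn` are hypothetical, the sub-questions are supplied by hand instead of being built by text analysis, and a lookup table stands in for the 3D question answering model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReasoningTrace:
    """Ordered (sub-question, answer) pairs, kept so errors can be traced."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def context(self) -> str:
        # Flatten the answered sub-questions into one context string.
        return " ".join(f"{q} {a}." for q, a in self.steps)


def progressive_answer(
    original_question: str,
    sub_questions: List[str],
    answer_fn: Callable[[str, str], str],
) -> Tuple[str, ReasoningTrace]:
    """Answer each sub-question in order, then the original question.

    answer_fn(question, context) is a stand-in for the QA model
    (an assumption of this sketch, not the paper's interface).
    """
    trace = ReasoningTrace()
    for sub_q in sub_questions:
        sub_a = answer_fn(sub_q, trace.context())
        trace.steps.append((sub_q, sub_a))
    final = answer_fn(original_question, trace.context())
    return final, trace


def toy_answer_fn(question: str, context: str) -> str:
    # Hypothetical stand-in model: simple questions are answered from a
    # table; the complex question is answerable only from the context.
    table = {
        "Where is the table?": "next to the window",
        "What is on the table?": "a lamp",
    }
    if question in table:
        return table[question]
    return "a lamp" if "a lamp" in context else "unknown"


original = "What is on the table next to the window?"
sub_questions = ["Where is the table?", "What is on the table?"]
final_answer, trace = progressive_answer(original, sub_questions, toy_answer_fn)
print(final_answer)
for q, a in trace.steps:
    print(q, "->", a)
```

Because the trace keeps every intermediate answer, a wrong final answer can be followed back to the sub-question where the reasoning first went wrong, which is the traceability property the abstract refers to.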