Remember and forget: video and text fusion for video question answering

Authors:	Feng Gao, Yuanyuan Ge, Yongge Liu

Abstract:	Video question answering (Video QA), the task of answering questions about the visual content of a video clip, has received much attention in recent years. The task can be solved from the video data alone, but when a clip comes with relevant text information, it can also be solved using fused video and text data. Two problems then arise: how to select the useful region features from the video frames and the useful text features from the text information, and how to fuse the video and text features. We propose a forget memory network to address these problems. The forget memory network with video framework solves the Video QA task from the video data alone: it selects the region features that are useful for the question and forgets the irrelevant region features in the video frames. The forget memory network with video and text framework extracts the text features that are useful for the question, forgets the irrelevant ones, and fuses the video and text data to solve the Video QA task. The fused video and text features can help improve experimental performance.
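
The abstract does not specify the network's architecture, so the following is only an illustrative sketch of the general idea of question-guided "remember and forget" selection followed by fusion. It is not the authors' forget memory network. All names (`remember_and_forget`, `fuse`, `threshold`) and the choice of a sigmoid gate with a hard cutoff and concatenation-based fusion are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def remember_and_forget(question, features, threshold=0.5):
    """Keep features relevant to the question and drop the rest.

    Assumption: relevance is sigmoid(question . feature); features whose
    gate is at or below `threshold` are "forgotten" (gate set to 0).
    Returns the gate-weighted average of the kept features and the gates.
    """
    gates = [sigmoid(dot(question, f)) for f in features]
    kept = [g if g > threshold else 0.0 for g in gates]
    total = sum(kept)
    dim = len(question)
    if total == 0.0:
        # Everything was forgotten: return an empty (zero) summary.
        return [0.0] * dim, kept
    weights = [g / total for g in kept]
    summary = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return summary, kept


def fuse(video_summary, text_summary):
    """Assumed fusion: simple concatenation of the two summaries."""
    return video_summary + text_summary


# Toy inputs: a 2-d question vector, three region features, two text features.
question = [1.0, 0.0]
regions = [[2.0, 0.0], [-2.0, 1.0], [1.0, 3.0]]
texts = [[3.0, 1.0], [-1.0, 0.0]]

video_summary, region_gates = remember_and_forget(question, regions)
text_summary, text_gates = remember_and_forget(question, texts)
fused = fuse(video_summary, text_summary)
```

In this toy run the second region and the second text feature point away from the question vector, so their gates are set to zero and they do not contribute to the fused representation. A real model would learn the gating and fusion parameters rather than use a fixed dot product and cutoff.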

Keywords:
This article is indexed in SpringerLink and other databases.