
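Micro-video event detection method with deep multimodal uncertainty
Cite this article:SU Yuting, WANG Fuyou, JING Peiguang. Micro-video event detection method with deep multimodal uncertainty[J]. Journal of Harbin Institute of Technology, 2024, 56(5): 36-45
Authors:SU Yuting  WANG Fuyou  JING Peiguang
Affiliation:School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Funding:National Natural Science Foundation of China (61802277)
Abstract:With the rapid growth of micro-videos, micro-video event detection has attracted increasing attention. Existing studies generally use deep neural networks to produce deterministic detection results, but these networks ignore uncertainty, so even wrong predictions are issued with excessive confidence. To address this problem, this paper proposes a micro-video event detection method based on a deep multimodal uncertainty network. First, the method embeds variational layers in a conventional domain separation network to obtain predictive distributions. Then, visual and audio modality information is fed into the network, and the independence and correlation losses constructed by the method yield uncertainty-aware predictive distributions for the shared and private domains of the audio modality and for the shared and private domains of the visual modality. Finally, an uncertainty discrimination rule is proposed to screen the predictive distributions of the four domains and obtain the final prediction. Experiments were conducted on public datasets (UCF-101 and HMDB51) and on a newly constructed micro-video event detection dataset. The results show that, across different deep classification methods and different datasets, the proposed method not only achieves higher classification accuracy but also provides uncertainty estimates for its outputs, and is fairly robust to audio interference.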

Keywords:deep neural network   micro-video event detection   domain separation network   variational layer   modal uncertainty
Received:2022-07-25

Micro-video event detection method with deep multimodal uncertainty
SU Yuting,WANG Fuyou,JING Peiguang. Micro-video event detection method with deep multimodal uncertainty[J]. Journal of Harbin Institute of Technology, 2024, 56(5): 36-45
Authors:SU Yuting  WANG Fuyou  JING Peiguang
Affiliation:School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract:With the rapid development of micro-videos, micro-video event detection is receiving increasing attention. Existing studies commonly use deep neural networks to obtain deterministic detection results, but these networks ignore the effect of uncertainty, so even incorrect predictions are reported with excessive confidence. To address this problem, this paper proposes a micro-video event detection method based on a deep multimodal uncertainty network. First, the method embeds a variational layer in a conventional domain separation network to obtain predictive distributions. Then, the visual and audio modality information is fed into the network, and independence and correlation losses are constructed to obtain uncertainty-aware predictive distributions for the shared and private domains of the audio modality and for the shared and private domains of the visual modality. Finally, an uncertainty discriminant is proposed to filter the predictive distributions of the four domains and produce the final prediction. Experiments were performed on public datasets (UCF-101 and HMDB51) and on a newly constructed micro-video event detection dataset. The results show that, across different deep classification methods and different datasets, the proposed method not only achieves higher classification accuracy but also estimates the uncertainty of its outputs. It is also robust to audio interference.
Keywords:deep neural network   micro-video event detection   domain separation network   variational layer   modal uncertainty
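
The abstract does not specify the uncertainty discriminant, so the following is only a minimal sketch of one plausible form of the final step, under stated assumptions: each of the four domains supplies repeated stochastic softmax outputs (for example, from sampling a variational layer), uncertainty is measured by the entropy of the mean prediction, and the least uncertain domain is kept. The entropy criterion, the function names, and the domain labels are illustrative assumptions, not the authors' rule.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy of the mean prediction.

    mc_probs: array of shape (n_samples, n_classes) holding softmax
    outputs from repeated stochastic forward passes (assumed input).
    """
    mean_probs = mc_probs.mean(axis=0)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum()
    return float(entropy), mean_probs

def select_least_uncertain(domain_samples):
    """Pick the domain whose mean prediction has the lowest entropy.

    domain_samples: dict mapping a hypothetical domain label to an
    (n_samples, n_classes) array. Returns (label, entropy, class index).
    """
    best = None
    for name, probs in domain_samples.items():
        entropy, mean_probs = predictive_entropy(np.asarray(probs, dtype=float))
        if best is None or entropy < best[1]:
            best = (name, entropy, int(mean_probs.argmax()))
    return best
```

Calling `select_least_uncertain` with four entries such as `"visual_shared"`, `"visual_private"`, `"audio_shared"`, and `"audio_private"` returns the label, entropy, and predicted class of the most confident domain. Under this assumed rule, a noisy audio track would tend to raise the entropy of the audio domains so that they are passed over.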