
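Learning Motion Guidance for Efficient Unsupervised Video Object Segmentation
Cite this article: Zhao Zi-Cheng, Zhang Kai-Hua, Fan Jia-Qing, Liu Qing-Shan. Learning motion guidance for efficient unsupervised video object segmentation. Acta Automatica Sinica, 2023, 49(4): 872−880 doi: 10.16383/j.aas.c210626
Authors: Zhao Zi-Cheng  Zhang Kai-Hua  Fan Jia-Qing  Liu Qing-Shan
Affiliation: 1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044
Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400), the National Natural Science Foundation of China (61876088, U20B2065, 61532009), and the Jiangsu Province 333 Project talent program (BRA2020291)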
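Abstract (translated from the Chinese): Many deep-learning-based unsupervised video object segmentation (UVOS) algorithms have large parameter counts and heavy computational cost, which significantly limits their practical use. This paper proposes a motion-guided video object segmentation network that substantially reduces model parameters and computation while improving segmentation performance. The model consists of three parts: a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module. First, the RGB image and the estimated optical flow are fed into the dual-stream network to extract object appearance features and motion features. Next, the motion guidance module uses local attention to extract semantic information from the motion features, which then guides the appearance features toward learning rich semantic information. Finally, the multi-scale progressive fusion module takes the features output at each stage of the dual-stream network and progressively merges deep features into shallow ones, which ultimately improves segmentation along object edges. Extensive evaluations on three standard datasets demonstrate the superior performance of the method.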

Keywords (translated from the Chinese): unsupervised video object segmentation   motion guidance   local attention   co-attention
Received: 2021-07-06

Learning Motion Guidance for Efficient Unsupervised Video Object Segmentation
Zhao Zi-Cheng, Zhang Kai-Hua, Fan Jia-Qing, Liu Qing-Shan. Learning motion guidance for efficient unsupervised video object segmentation. Acta Automatica Sinica, 2023, 49(4): 872−880 doi: 10.16383/j.aas.c210626
Authors:ZHAO Zi-Cheng  ZHANG Kai-Hua  FAN Jia-Qing  LIU Qing-Shan
Affiliation:1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044
Abstract: Many deep-learning-based unsupervised video object segmentation (UVOS) algorithms have superfluous model parameters and expensive computational overhead, which limits their application in practice. To address these issues, this paper proposes an unsupervised video object segmentation network based on motion guidance, which significantly reduces the number of model parameters and the amount of computation while improving segmentation performance. The model consists of three parts: a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module. Specifically, the RGB image and the estimated optical flow are first fed into the dual-stream network to extract object appearance features and motion features. Then, the motion guidance module extracts semantic information from the motion features through local attention and uses it to guide the appearance features toward learning rich semantic information. Finally, the multi-scale progressive fusion module takes the output features of each stage of the dual-stream network and progressively merges deep features into shallow features, which ultimately improves segmentation along object edges. Extensive evaluations are conducted on three standard datasets, and the results show the superior performance of the proposed method.
Keywords:Unsupervised video object segmentation (UVOS)  motion guidance  local attention  co-attention
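
The data flow described in the abstract (two feature streams, motion-guided gating of appearance features, then deep-to-shallow fusion) can be illustrated with a small NumPy sketch. This is not the authors' network: the random arrays stand in for learned backbone features, `motion_guide` is a simple sigmoid gate used as a placeholder for the paper's local attention, and all function names, channel counts, and stage sizes are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def motion_guide(appearance, motion):
    """Gate appearance features with a spatial map derived from motion features.

    Placeholder for the motion guidance module (NOT the paper's local
    attention): a per-pixel gate from the channel-averaged motion response.
    Both inputs have shape (C, H, W).
    """
    gate = sigmoid(motion.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return appearance + appearance * gate  # residual form keeps the original signal


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def progressive_fusion(features):
    """Merge deep features into shallow ones, one stage at a time.

    `features` is ordered shallow -> deep; each stage is assumed to have the
    same channel count and half the spatial resolution of the previous one.
    """
    fused = features[-1]
    for shallow in reversed(features[:-1]):
        fused = shallow + upsample2x(fused)
    return fused


def segment(appearance_stages, motion_stages):
    """Return an (H, W) foreground probability map at the shallowest resolution."""
    guided = [motion_guide(a, m) for a, m in zip(appearance_stages, motion_stages)]
    fused = progressive_fusion(guided)
    return sigmoid(fused.mean(axis=0))


# Hypothetical four-stage feature pyramids standing in for the two streams.
rng = np.random.default_rng(0)
sizes = [32, 16, 8, 4]
appearance = [rng.standard_normal((8, s, s)) for s in sizes]
motion = [rng.standard_normal((8, s, s)) for s in sizes]
mask = segment(appearance, motion)
print(mask.shape)  # (32, 32)
```

The fusion loop starts from the deepest stage and adds it into each shallower stage after upsampling, so the final map has the resolution of the shallowest features; that is the sense in which deep features are merged "progressively" into shallow ones in this sketch.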