Video frame interpolation method based on improved visual Transformer

Citation:	Shi Changtong, Shan Hongtao, Zheng Guangyuan, Zhang Yujin, Liu Huaiyuan, Zong Zhihao. Video frame interpolation method based on improved visual Transformer[J]. Application Research of Computers, 2024, 41(4): 1252-1257.

Authors:	Shi Changtong, Shan Hongtao, Zheng Guangyuan, Zhang Yujin, Liu Huaiyuan, Zong Zhihao

Affiliations:	1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science; 2. School of Information Technology, Shanghai Jian Qiao University

Funding:	National Natural Science Foundation of China (62173222)

Abstract:	To address the inability of existing video frame interpolation methods to handle large-motion and complex-motion scenes effectively, this paper proposes a video frame interpolation method based on an improved vision Transformer. The method fuses cross-scale window-based attention with separable spatial-temporal local attention, which enlarges the receptive field of the attention and aggregates multi-scale information. It jointly models spatial-temporal dependencies and long-range pixel dependencies, thereby enhancing the model's ability to handle large-motion scenes. Experimental results show that the method achieves PSNR values of 37.13 dB and 28.28 dB on the Vimeo90K test set and the DAVIS dataset, respectively, with SSIM values of 0.978 and 0.891, respectively. Visualization results further show that the method produces clear and plausible interpolated frames for videos containing large motion, complex motion, and occlusion.

Keywords:	video frame interpolation; Transformer; cross-scale window-based attention; large motion; complex motion

Received:	2023/7/16
Revised:	2024/3/13
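
The abstract reports interpolation quality as PSNR in dB. As a reminder of what that figure measures, the following is a minimal sketch of the standard PSNR formula, 10·log10(MAX² / MSE), in plain Python. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration. The paper's actual evaluation code is not part of this record and may differ, for example in colour-channel handling or in how scores are averaged over a dataset.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists (rows of pixel intensities). This layout and
    the default peak of 255 are illustrative assumptions, not taken from
    the paper.
    """
    ref_flat = [p for row in reference for p in row]
    est_flat = [p for row in estimate for p in row]
    if not ref_flat or len(ref_flat) != len(est_flat):
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref_flat, est_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [[100, 100], [100, 100]]
    interpolated = [[110, 90], [100, 100]]
    print(f"PSNR: {psnr(ground_truth, interpolated):.2f} dB")
```

Because the scale is logarithmic, the gap between the 37.13 dB reported on Vimeo90K and the 28.28 dB reported on DAVIS corresponds to a mean squared error on DAVIS that is roughly 7.7 times higher (10^0.885), assuming the same peak value on both datasets.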
