
A Dual-Channel Audio Generation Method Based on Multimodal Perception
Cite this article: GUAN Li, YIN Kang, FAN Meng-jia, XUE Kun, XIE Kai. A Dual-Channel Audio Generation Method Based on Multimodal Perception[J]. Computing Technology and Automation, 2022(4): 157-165.
Authors: GUAN Li  YIN Kang  FAN Meng-jia  XUE Kun  XIE Kai
Affiliations: (1. State Grid Beijing Electric Power Company, Beijing 100031; 2. NR Electric Co., Ltd., Nanjing, Jiangsu 211102)
Abstract: Most existing videos contain only mono audio and lack the sense of space that dual-channel audio provides. To address this, this paper proposes a dual-channel audio generation method based on multimodal perception. By analyzing the visual information in a video, the method fuses the video's spatial information with its audio content and automatically adds spatialization cues to the original mono audio, generating dual-channel audio that is closer to a real listening experience. We first encode the mono-audio video with an improved audio-video fusion analysis network that has an encoder-decoder structure. We then fuse the video features and audio features at multiple scales and analyze the video and audio information jointly, so that the generated dual-channel audio carries spatial information absent from the original mono audio. Finally, the network outputs the dual-channel audio corresponding to the video. Experiments on public datasets show that the method outperforms existing models at dual-channel audio generation, with improvements in both STFT distance and ENV distance.

Keywords: audio generation  convolutional neural network (CNN)  multimodal
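
The abstract reports results in STFT distance and ENV distance but does not define them. The sketch below shows one formulation that is common in mono-to-dual-channel evaluation: a mean squared difference between complex spectrograms, and a root mean squared difference between amplitude envelopes, each summed over the left and right channels. The function names, the frame length of 512, the hop of 160, and the exact normalization are assumptions made for illustration, not the paper's definitions.

```python
import numpy as np


def stft(signal, frame_len=512, hop=160):
    """Complex STFT with a Hann window; returns shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)


def envelope(signal):
    """Amplitude envelope: magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero the rest.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1: n // 2] = 2.0
    else:
        gain[1: (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))


def stft_distance(pred, target):
    """pred, target: arrays of shape (2, samples). Assumed definition, see lead-in."""
    return sum(
        np.mean(np.abs(stft(p) - stft(t)) ** 2) for p, t in zip(pred, target)
    )


def env_distance(pred, target):
    """pred, target: arrays of shape (2, samples). Assumed definition, see lead-in."""
    return sum(
        np.sqrt(np.mean((envelope(p) - envelope(t)) ** 2))
        for p, t in zip(pred, target)
    )


# Toy check: a 440 Hz tone that is louder in the left channel, compared with
# a "prediction" that simply copies the mono mix to both channels.
t = np.arange(4000) / 16000.0
left = np.sin(2 * np.pi * 440 * t)
right = 0.5 * left
target = np.stack([left, right])
mono_copy = np.stack([(left + right) / 2] * 2)
```

Under these assumed definitions both distances are zero when the prediction equals the ground truth and grow as the predicted channels lose the level difference between left and right, which is the kind of spatial cue the method is meant to restore.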
