Audio-visual Eye Fixation Prediction Based on Audio-visual Consistency
Cite this article: YUAN Meng, YU Xiaoyu. Audio-visual eye fixation prediction based on audio-visual consistency[J]. 计算机与现代化 (Computer and Modernization), 2022, 0(4): 103-109.
Authors: YUAN Meng, YU Xiaoyu
Abstract: Existing audio-visual eye fixation prediction algorithms use a two-stream structure to extract features from the audio and visual information separately, and then fuse the audio-visual features to obtain the final prediction map. However, the audio and visual information in a dataset may be uncorrelated, so directly fusing the audio and visual features when they are inconsistent allows the audio information to negatively affect the visual features. To address this problem, this paper proposes an Audio-visual Consistency Network (AVCN) for audio-visual eye fixation prediction. To verify the reliability of the network, AVCN is added to an existing audio-visual eye fixation prediction model. AVCN makes a binary consistency judgment on the extracted audio and video features: when the two are consistent, the fused audio-visual features are output as the final prediction map; otherwise, the visually dominant features are output as the final result. Experiments on six public datasets show that adding AVCN improves the overall metrics.

Keywords: computer vision; eye fixation prediction; audio-visual consistency
Received: 2022-05-07

Audio-visual Eye Fixation Prediction Based on Audio-visual Consistency
Abstract: Existing audio-visual eye fixation prediction algorithms use a two-stream structure to extract the features of the audio and visual information respectively, and then fuse the audio-visual features to obtain the final prediction map. However, the audio information and visual information in a dataset may be uncorrelated; therefore, directly fusing the audio and visual features when they are inconsistent allows the audio information to have a negative impact on the visual features. In view of the above problem, this paper proposes an audio-visual consistency network (AVCN) for eye fixation prediction based on audio-visual consistency. To verify the reliability of the network, the audio-visual consistency network is added to an existing audio-visual eye fixation prediction model. AVCN carries out a binary consistency judgment on the extracted audio and video features: when the two are consistent, the fused audio-visual features are output as the final prediction map; otherwise, the visually dominant features are output as the final result. The method is tested on six publicly available datasets, and the results show that the proposed AVCN model achieves better performance.
Keywords: computer vision; eye fixation prediction; audio-visual consistency
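The gating behavior described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's actual consistency judgment is a learned binary decision inside the network, whose architecture is not detailed here; this sketch substitutes a simple cosine-similarity threshold between flattened audio and visual feature vectors, and the function names (`avcn_gate`, `cosine_similarity`) and the threshold value are hypothetical, not from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flat feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def avcn_gate(audio_feat, visual_feat, fused_feat, threshold=0.5):
    """Illustrative stand-in for AVCN's binary consistency judgment.

    When the audio and visual features are judged consistent, the fused
    audio-visual features are passed on; otherwise the visually dominant
    features are used, so inconsistent audio cannot degrade the prediction.
    The cosine threshold replaces the paper's learned classifier.
    """
    consistent = cosine_similarity(audio_feat, visual_feat) >= threshold
    return fused_feat if consistent else visual_feat

# Aligned audio/visual features -> fused features are selected.
audio = np.array([1.0, 0.0, 1.0])
visual = np.array([1.0, 0.1, 0.9])
fused = (audio + visual) / 2.0
out_consistent = avcn_gate(audio, visual, fused)

# Near-orthogonal audio -> visual features pass through unchanged.
noise_audio = np.array([0.0, 1.0, 0.0])
out_inconsistent = avcn_gate(noise_audio, visual, fused)
```

Here `out_consistent` equals `fused` and `out_inconsistent` equals `visual`, mirroring the two branches the abstract describes.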