
Stereoscopic video saliency detection by fusing multi-dimensional binocular perceptual characteristics
Cite this article: Zhou Yang, He Yongjian, Tang Xianghong, Lu Yu, Jiang Gangyi. Stereoscopic video saliency detection by fusing multi-dimensional binocular perceptual characteristics[J]. Journal of Image and Graphics, 2017, 22(3): 305-314.
Authors: Zhou Yang  He Yongjian  Tang Xianghong  Lu Yu  Jiang Gangyi
Affiliations: Faculty of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China (Zhou, He, Tang, Lu); Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China (Jiang)
Funding: National Natural Science Foundation of China (61401132, 61471348); Natural Science Foundation of Zhejiang Province (LY17F020027)
Abstract: Objective Stereoscopic video offers an immersive, realistic viewing experience and has become increasingly popular, while visual saliency detection can automatically predict, locate, and mine important visual information, helping machines filter massive multimedia data effectively. To improve salient-region detection in stereoscopic video, a stereoscopic video saliency detection model that fuses multi-dimensional binocular perceptual characteristics is proposed. Method Saliency is computed along three dimensions of stereoscopic video: spatial, depth, and temporal. First, a 2D image saliency map is computed from spatial image features using a Bayesian model. Next, a depth saliency map is obtained from the binocular perceptual characteristics of the stereoscopic frames. Then, the Lucas-Kanade optical flow method computes motion features between local regions of adjacent frames to obtain the temporal saliency map. Finally, the three saliency maps are fused by a method based on global-regional difference to obtain the final distribution of salient regions in the stereoscopic video. Result Experiments on stereoscopic video sequences of various types show that the proposed model achieves 80% precision and 72% recall while maintaining relatively low computational complexity, outperforming existing saliency detection models. Conclusion The proposed model effectively extracts salient regions in stereoscopic video and can be applied to stereoscopic video/image coding and stereoscopic video/image quality assessment.
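The Bayesian spatial step described above treats saliency as the self-information of visual features: rare feature values are salient. A minimal sketch of that idea, assuming a simple global histogram as the probability estimate (the paper learns the statistics over natural stereoscopic frames rather than a single test frame):

```python
import numpy as np

def self_information_saliency(feature_map, n_bins=64):
    """Spatial saliency as self-information, -log p(feature).

    Illustrative assumption: p is estimated with one global histogram
    of the quantized feature values of this frame alone.
    """
    f = np.asarray(feature_map, dtype=np.float64)
    # Quantize feature values into n_bins bins.
    lo, hi = f.min(), f.max()
    bins = np.clip(((f - lo) / (hi - lo + 1e-12) * n_bins).astype(int),
                   0, n_bins - 1)
    # Estimate the probability of each bin.
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    p = counts / counts.sum()
    # Self-information of each pixel's feature value.
    sal = -np.log(p[bins] + 1e-12)
    # Normalize to [0, 1] for later fusion with other conspicuity maps.
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

With this estimate, a small bright patch on a uniform background receives high saliency simply because its feature values are improbable.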

关 键 词:立体视频  立体显著性检测  视觉注意力  双目感知特征  深度显著性  运动显著性
Received: 2016-08-03
Revised: 2016-11-23

Incorporation of multi-dimensional binocular perceptual characteristics to detect stereoscopic video saliency
Zhou Yang, He Yongjian, Tang Xianghong, Lu Yu, and Jiang Gangyi. Incorporation of multi-dimensional binocular perceptual characteristics to detect stereoscopic video saliency[J]. Journal of Image and Graphics, 2017, 22(3): 305-314.
Authors: Zhou Yang  He Yongjian  Tang Xianghong  Lu Yu  Jiang Gangyi
Affiliation: Faculty of Communication, Hangzhou Dianzi University, Hangzhou 310018, China (Zhou, He, Tang, Lu); Institute of Information Science and Engineering, Ningbo University, Ningbo 315211, China (Jiang)
Abstract: Objective Stereoscopic three-dimensional (3D) video services, which aim to provide realistic and immersive experiences, have gained considerable acceptance and interest. Visual saliency detection can automatically predict, locate, and identify important visual information, as well as help machines effectively filter valuable information from high-volume multimedia data. Saliency detection models have been widely studied for static and dynamic 2D scenes. However, the saliency problem of stereoscopic 3D videos has received less attention, and few studies address dynamic 3D scenes. Given that 3D characteristics, such as depth and visual fatigue, affect the visual attention of humans, the saliency models of static or dynamic 2D scenes are not directly applicable to 3D scenes. To address this gap in the literature, we propose a novel model for 3D salient-region detection in stereoscopic videos that utilizes multi-dimensional binocular perceptual characteristics. Methods The proposed model computes visual salient regions from the spatial, depth, and temporal domains of stereoscopic videos. The algorithm is partitioned into four blocks: the measures of spatial, depth, and temporal (motion) saliency, and the fusion of the three conspicuity maps. In the spatial saliency module, the algorithm treats the spatial saliency in each video frame as one visual attention dimension. A Bayesian probabilistic framework is adopted to calculate the 2D static conspicuity map, in which spatial saliency emerges naturally as the self-information of visual features. These visual features are obtained from the spatial natural statistics of each stereoscopic 3D video frame rather than from a single test frame. In the depth saliency module, the algorithm considers depth as an additional visual attention dimension. Depth signals have specific characteristics that differ from those of natural signals.
Therefore, the measure of depth saliency is derived from depth-perception characteristics. The model extracts foreground saliency from a disparity map and combines it with depth contrast to generate the depth conspicuity map. In the motion (temporal) saliency module, the algorithm considers motion as another visual dimension. An optical flow algorithm is applied to acquire inter-frame motion information between adjacent frames. To reduce the computational complexity of optical flow, the model first extracts the salient region of the current frame according to the previously obtained spatial and depth conspicuity maps. The Lucas-Kanade optical flow algorithm then calculates the motion characteristics between local salient regions of adjacent frames, and the motion conspicuity map is produced from the regional motion-vector map. In the fusion step, a new pooling approach combines the three conspicuity maps into the final saliency map for stereoscopic 3D videos. This approach follows the principle that the human visual system focuses on one uniquely salient region at a time while also diverting attention to several other salient regions in a map. Instead of the conventional average weighted sum for fusing different features, the proposed approach uses a fusion method based on global-regional difference. Results We evaluated the proposed scheme on stereoscopic video sequences with various scenarios and compared it with five state-of-the-art saliency detection models. The experimental results indicated that the proposed model is efficient and effective, achieving superior precision and recall (80% and 72%, respectively). Conclusion The proposed model demonstrated its efficiency and effectiveness in saliency detection for stereoscopic videos.
The model can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and object detection and recognition.
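The temporal module above solves the classic Lucas-Kanade least-squares system over local windows to estimate motion. A hedged sketch of that step, assuming dense per-window estimation for brevity (the paper restricts the computation to regions already found salient in the spatial and depth maps):

```python
import numpy as np

def lk_motion_saliency(prev, curr, win=8):
    """Motion conspicuity from windowed Lucas-Kanade optical flow.

    For each win x win window, solve A [u v]^T = b in the least-squares
    sense, where A stacks spatial gradients and b the negated temporal
    gradient; the flow magnitude serves as the window's motion saliency.
    """
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    # Spatial gradients of the previous frame, temporal gradient between frames.
    Ix = np.gradient(prev, axis=1)
    Iy = np.gradient(prev, axis=0)
    It = curr - prev
    h, w = prev.shape
    sal = np.zeros((h // win, w // win))
    for i in range(h // win):
        for j in range(w // win):
            s = (slice(i * win, (i + 1) * win), slice(j * win, (j + 1) * win))
            A = np.stack([Ix[s].ravel(), Iy[s].ravel()], axis=1)
            b = -It[s].ravel()
            # Least-squares Lucas-Kanade solution for this window.
            uv, *_ = np.linalg.lstsq(A, b, rcond=None)
            sal[i, j] = np.hypot(uv[0], uv[1])
    m = sal.max()
    # Normalize so the motion conspicuity map is comparable to the others.
    return sal / m if m > 0 else sal
```

Windows with no gradients and no frame difference yield zero flow, so static background regions contribute nothing to the motion conspicuity map.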
Keywords:stereoscopic video  stereoscopic saliency detection  visual attention  binocular perceptual characteristics  depth saliency  motion saliency
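The fusion step replaces an average weighted sum with weights driven by global-regional difference. The exact weighting in the paper is not reproduced here; this sketch assumes, for illustration, that a conspicuity map whose peak stands out strongly from its global mean earns a larger fusion weight than a flat, uninformative map:

```python
import numpy as np

def fuse_conspicuity_maps(maps):
    """Fuse spatial, depth, and motion conspicuity maps.

    Illustrative assumption: each map's weight is its peak response
    minus its global mean (a simple global-regional difference score),
    normalized so the weights sum to one.
    """
    arrays = [np.asarray(m, dtype=np.float64) for m in maps]
    # Global-regional difference score per map: peak vs. global mean.
    w = np.array([a.max() - a.mean() for a in arrays])
    w = w / (w.sum() + 1e-12)
    # Weighted sum of the conspicuity maps.
    fused = sum(wi * a for wi, a in zip(w, arrays))
    mx = fused.max()
    return fused / mx if mx > 0 else fused
```

A sharply peaked map thus dominates the result, while a uniform map (peak equal to mean) is effectively ignored, matching the intent that attention concentrates on regions that stand out globally.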
