Similar Literature
1.
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency respectively, and the two saliency maps are fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local motion and global motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, people, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested using three popular evaluation metrics over two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method in saliency detection of news videos compared to several state-of-the-art methods.
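As a rough illustration of the motion-contrast step, the sketch below builds a histogram of optical-flow magnitudes and treats rare magnitudes as salient, which suppresses dominant background motion; Farneback flow and equal fusion weights are assumptions, not the paper's exact choices.

```python
import cv2
import numpy as np

def motion_saliency(prev_gray, gray, bins=16):
    """Motion contrast from a histogram of optical-flow magnitudes; rare
    magnitudes (small histogram counts) are treated as salient, so dominant
    background motion is suppressed. The exact formulation is assumed."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    hist, edges = np.histogram(mag, bins=bins)
    rarity = 1.0 - hist / hist.sum()
    idx = np.clip(np.digitize(mag, edges[1:-1]), 0, bins - 1)
    return rarity[idx].astype(np.float32)

def fuse(static_map, motion_map):
    # Normalize each map to [0, 1] before fusion, as in the paper;
    # the 50/50 weighting is an assumption.
    n = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-8)
    return 0.5 * n(static_map) + 0.5 * n(motion_map)
```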

2.
Objective: Stereoscopic video is increasingly popular for the immersive, lifelike experience it provides, while visual saliency detection can automatically predict, locate, and mine important visual information, helping machines filter massive volumes of multimedia data effectively. To improve salient-region detection in stereoscopic video, a stereoscopic video saliency detection model that fuses multi-dimensional binocular perception characteristics is proposed. Method: Saliency is computed along three dimensions of stereoscopic video: the spatial, depth, and temporal domains. First, a 2D image saliency map is computed with a Bayesian model based on spatial image features. Next, a depth saliency map of the stereoscopic frames is obtained from binocular perception features. Then, the Lucas-Kanade optical flow method is used to compute motion features of local regions between frames, yielding a temporal saliency map. Finally, the three saliency maps are fused with a method based on global-regional difference measures to obtain the final distribution model of salient regions in the stereoscopic video. Results: Experiments on different types of stereoscopic video sequences show that the model achieves 80% precision and 72% recall while keeping computational complexity relatively low, outperforming existing saliency detection models. Conclusion: The proposed model effectively extracts salient regions from stereoscopic video and can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and related fields.
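A minimal sketch of the temporal-saliency step: sparse Lucas-Kanade flow on a regular grid, with flow magnitude spread back onto the frame. The grid sampling and magnitude-as-saliency mapping are assumptions standing in for the paper's formulation.

```python
import cv2
import numpy as np

def temporal_saliency(prev_gray, gray, grid=16):
    """Lucas-Kanade flow on a regular point grid; each tracked point's flow
    magnitude is painted over its grid cell and blurred into a map."""
    h, w = gray.shape
    ys, xs = np.mgrid[grid // 2:h:grid, grid // 2:w:grid]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    pts = pts.reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    sal = np.zeros((h, w), np.float32)
    for p, q, ok in zip(pts[:, 0], nxt[:, 0], status[:, 0]):
        if ok:
            x, y = int(p[0]), int(p[1])
            mag = float(np.linalg.norm(q - p))
            sal[max(0, y - grid // 2):y + grid // 2,
                max(0, x - grid // 2):x + grid // 2] = mag
    return cv2.GaussianBlur(sal, (0, 0), grid / 2)
```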

3.
This paper presents a spatio-temporal saliency model that predicts eye movement during video free viewing. This model is inspired by the biology of the first steps of the human visual system. The model extracts two signals from the video stream corresponding to the two main outputs of the retina: parvocellular and magnocellular. Then, both signals are split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps: a static one and a dynamic one. These maps are then fused into a spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame predicted by the spatio-temporal saliency map to the eye positions of different subjects during a free video viewing experiment over a large database (17,000 frames). In parallel, the static and dynamic pathways are analyzed to understand what is more or less salient and for which types of videos our model is a good or poor predictor of eye movement.
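OpenCV's contrib bioinspired module ships a retina model with the same parvocellular/magnocellular split this paper builds on; the sketch below shows how the two output signals can be pulled from a video stream. This is an analogous off-the-shelf model, not the authors' implementation, and it assumes an opencv-contrib-python build that exposes cv2.bioinspired; "video.mp4" is a hypothetical input file.

```python
import cv2

cap = cv2.VideoCapture("video.mp4")   # hypothetical input file
ok, frame = cap.read()
# Retina model from opencv-contrib; input size is (width, height).
retina = cv2.bioinspired.Retina_create((frame.shape[1], frame.shape[0]))
while ok:
    retina.run(frame)
    parvo = retina.getParvo()   # detail/color pathway -> static saliency input
    magno = retina.getMagno()   # transient/motion pathway -> dynamic saliency input
    ok, frame = cap.read()
cap.release()
```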

4.
A dynamic saliency attention model based on local complexity is proposed in this paper. Low-level visual features are extracted from the current frame and several previous frames, and each feature map is resized to several different scales. For each feature and scale, the feature maps across all frames are used to calculate a local complexity map; all local complexity maps are normalized and fused into a dynamic saliency map. At the same time, a static saliency map is computed from the current frame, and the dynamic and static saliency maps are fused into a final saliency map. Experimental results indicate that when there is noise or illumination change between frames, our model outperforms Marat's and Shi's models, and when the moving objects do not coincide with the static salient regions, our model is better than Ban's model.
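A sketch of one plausible local-complexity measure: Shannon entropy of each spatio-temporal block across the frame stack. The paper does not pin down the exact measure here, so the 8-bin histogram entropy and block size are assumptions; feature maps are assumed normalized to [0, 1].

```python
import numpy as np

def local_complexity(stack, win=8):
    """Entropy of each spatial block across a temporal stack of feature maps.
    stack: (T, H, W) array with values in [0, 1]."""
    t, h, w = stack.shape
    out = np.zeros((h // win, w // win), np.float32)
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            block = stack[:, i:i + win, j:j + win]
            hist, _ = np.histogram(block, bins=8, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]
            out[i // win, j // win] = -(p * np.log2(p)).sum()
    return out
```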

5.
Human vision has been studied deeply in past years, and several different models have been proposed to simulate it on a computer. Some of these models concern visual saliency, which is potentially very interesting in many applications such as robotics, image analysis, compression, and video indexing. Unfortunately, they are compute-intensive while facing tight real-time requirements. Among the existing models, we have chosen a spatio-temporal one combining static and dynamic information. In this paper we propose a very efficient multi-GPU implementation of this model that reaches real-time performance. We present the algorithms of the model as well as several parallel optimizations on the GPU, together with precision and execution-time results. The real-time execution of this multi-path model on multiple GPUs makes it a powerful tool to facilitate many vision-related applications.
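A minimal PyTorch sketch of the multi-GPU split, assuming two devices: the static pathway runs on cuda:0 and the dynamic pathway on cuda:1, with placeholder filters standing in for the model's actual Gabor-like feature extraction.

```python
import torch
import torch.nn.functional as F

def saliency_multi_gpu(frame, prev_frame):
    """frame, prev_frame: (N, C, H, W) float tensors on the CPU.
    Static pathway on cuda:0, dynamic pathway on cuda:1, fused on the CPU.
    The two pathway bodies below are placeholders, not the real filter bank."""
    f0 = frame.to("cuda:0", non_blocking=True)
    f1 = frame.to("cuda:1", non_blocking=True)
    p1 = prev_frame.to("cuda:1", non_blocking=True)
    static = F.avg_pool2d(f0.abs(), 9, stride=1, padding=4)  # placeholder filter
    dynamic = (f1 - p1).abs()                                # placeholder motion cue
    return 0.5 * static.cpu() + 0.5 * dynamic.cpu()
```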

6.
Many tone mapping algorithms have been proposed based on studies of the human visual system; however, they rarely address the effect of attention on contrast response. As attention plays an important role in the human visual system, we propose a local tone mapping method that respects both attention and adaptation effects. We adopt the High Dynamic Range (HDR) saliency map to compute an attention map, which predicts the attentive and non-attentive regions in an HDR image. The attention map is then used to locally adjust the contrast of the HDR image according to attention and adaptation models found in psychophysics. We applied our tone mapping approach to HDR images and videos and compared the results with those generated by three state-of-the-art tone mapping algorithms. Our experiments show that our approach produces results with better image quality in terms of preserving detail and chromaticity in visually salient regions.
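A toy version of attention-weighted local tone mapping: the attention map steers a per-pixel compression exponent so attended regions keep more contrast. The exponent range and the log-average key are illustrative choices, not the paper's fitted psychophysical models.

```python
import numpy as np

def attention_tone_map(hdr_lum, attention, a_lo=0.4, a_hi=0.9):
    """hdr_lum: HDR luminance (H, W); attention: saliency-derived map (H, W).
    Exponents closer to 1 compress less, preserving contrast where attended."""
    att = (attention - attention.min()) / (attention.max() - attention.min() + 1e-8)
    exponent = a_lo + (a_hi - a_lo) * att            # per-pixel gamma-like exponent
    key = np.exp(np.mean(np.log(hdr_lum + 1e-6)))    # log-average luminance
    return np.power(hdr_lum / key, exponent)
```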

7.
This work addresses the development of a computational model of visual attention to perform automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, effective solutions for content-based information retrieval from video recordings of its programs are still lacking. This stems from the high complexity of the content-based video retrieval problem, which involves several challenges, among them the usual demand for video summaries that facilitate indexing, browsing, and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired by the human visual system and based on computer vision methods (face detection, motion estimation, and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach is an effective solution to the problem of automatic video summarization, producing video summaries of similar quality to the ground truth manually created by a group of 50 users.
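A sketch of the key-frame scoring loop, using OpenCV's Haar face detector and spectral-residual saliency (from opencv-contrib) as stand-ins for the paper's attention model; the 0.2 face bonus and the top-k selection are assumptions.

```python
import cv2
import numpy as np

def summarize(video_path, k=10):
    """Score each frame by mean saliency plus a face bonus; keep the top-k
    frames as a static abstract. Buffers all frames, so short clips only."""
    face_det = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    sal_det = cv2.saliency.StaticSaliencySpectralResidual_create()
    cap, scores, frames = cv2.VideoCapture(video_path), [], []
    ok, frame = cap.read()
    while ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, sal = sal_det.computeSaliency(frame)
        faces = face_det.detectMultiScale(gray)
        scores.append(float(sal.mean()) + 0.2 * len(faces))
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    top = np.argsort(scores)[-k:]
    return [frames[i] for i in sorted(top)]
```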

8.
In this paper we present a computational model of dynamic visual attention on the sphere which combines static (intensity, chromaticity, orientation) and motion features in order to detect salient locations in omnidirectional image sequences while working directly in spherical coordinates. We build the motion pyramid on the sphere by applying block matching with varying block size. The spherical motion conspicuity map is obtained by fusing the spherical motion magnitude and phase conspicuities. Furthermore, we combine this map with the static spherical saliency map to obtain the dynamic saliency map on the sphere. Detection of the spots of attention based on the dynamic spherical saliency map is applied to a sequence of real spherical images. The effect of using only the spherical motion magnitude or phase to define the spots of attention on the sphere is examined as well. Finally, we compare spherical versus Euclidean spot detection on the omnidirectional image sequence.
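The planar core of the block-matching step might look like the following exhaustive SAD search; on the sphere the paper varies the block size with position, which this sketch omits.

```python
import numpy as np

def block_match(prev, cur, bs=16, search=8):
    """Exhaustive block matching (sum-of-absolute-differences criterion).
    Returns one integer (dy, dx) motion vector per block."""
    prev = prev.astype(np.int32)
    cur = cur.astype(np.int32)
    h, w = cur.shape
    motion = np.zeros((h // bs, w // bs, 2), np.int32)
    for by in range(0, h - bs + 1, bs):
        for bx in range(0, w - bs + 1, bs):
            block = cur[by:by + bs, bx:bx + bs]
            best, best_dv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - bs and 0 <= x <= w - bs:
                        sad = np.abs(prev[y:y + bs, x:x + bs] - block).sum()
                        if sad < best:
                            best, best_dv = sad, (dy, dx)
            motion[by // bs, bx // bs] = best_dv
    return motion
```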

9.

Saliency prediction models provide a probabilistic map of the relative likelihood of an image or video region to attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and emphasizing only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into one single saliency map that is highly correlated with the eye-fixation data, a random-forest-based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment, which involved 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database as well as the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to the state-of-the-art approaches.
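A compact sketch of the random-forest fusion stage: each pixel becomes one training sample whose features are the conspicuity-map values and whose target is the eye-fixation density. Hyperparameters are assumed; the paper's feature set is richer than raw map values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fuse_conspicuity(maps, fixation_density):
    """maps: list of (H, W) conspicuity maps; fixation_density: (H, W) target.
    Learns a per-pixel fusion and returns the fused saliency map."""
    X = np.stack([m.ravel() for m in maps], axis=1)   # (n_pixels, n_maps)
    y = fixation_density.ravel()
    rf = RandomForestRegressor(n_estimators=50, max_depth=12, n_jobs=-1)
    rf.fit(X, y)
    return rf.predict(X).reshape(fixation_density.shape)
```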


10.
Feng Xin, Yang Dan, Zhang Ling 《Acta Automatica Sinica》2011,37(11):1322-1331
A full-reference objective quality assessment method based on changes in visual attention is proposed for video impaired by network packet loss. Building on the application of visual saliency detection to video data, the method examines the spatial and temporal changes in visual attention between video degraded by packet loss and the pristine reference, and derives a set of objective quality metrics from the corresponding spatial and temporal differences in visual saliency. Seventeen packet-loss-impaired video sequences were used for testing, and a subjective evaluation experiment was conducted as the benchmark. Compared with traditional quality assessment methods that ignore the saliency characteristics of human vision, as well as current mainstream methods that weight distorted pixels by salient-region/region-of-interest maps, the experimental results show that the attention-change-based method correlates better with subjective quality scores and evaluates the quality of packet-loss-impaired video more effectively.
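One way to turn attention change into a quality score, sketched under assumptions (the paper's exact pooling is not reproduced here): normalize the reference and distorted saliency maps to distributions and penalize their total-variation distance.

```python
import numpy as np

def attention_shift_score(sal_ref, sal_dist):
    """sal_ref, sal_dist: sequences of per-frame saliency maps for the
    reference and distorted videos. Returns a score in [0, 1];
    1 means identical attention distributions."""
    scores = []
    for r, d in zip(sal_ref, sal_dist):
        r = r / (r.sum() + 1e-8)
        d = d / (d.sum() + 1e-8)
        scores.append(1.0 - 0.5 * np.abs(r - d).sum())  # 1 - total variation
    return float(np.mean(scores))
```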

11.
Objective: To study pedestrian detection across multiple scenes, a pedestrian detection method based on semantic features under a visual attention mechanism is proposed. Method: First, on top of low-level visual features, the semantic feature of pedestrian skin color is incorporated, and a static spatial visual attention model is built by organically combining bottom-up, data-driven attention with top-down, task-driven attention. Then, incorporating the semantic feature of motion information, motion saliency is computed from the entropy of motion vectors to build a dynamic temporal attention model. On this basis, a fused spatio-temporal visual attention model is constructed through weighted feature fusion, producing a visual saliency map, and pedestrian detection is completed by selecting the foci of attention. Results: Experiments were conducted in Matlab R2012a on standard datasets and self-captured videos. Compared with other visual attention models in simulation, the proposed method detects pedestrians well, reaching 93% detection accuracy on the test videos. Conclusion: The method is robust across different scenes and can be used to improve the intelligence of existing video surveillance systems.
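A sketch of the motion-vector-entropy cue used for the temporal attention model; the direction-histogram bin count and the magnitude weighting are assumptions.

```python
import numpy as np

def motion_entropy(flow, bins=12):
    """Entropy of the motion-vector direction histogram.
    flow: (H, W, 2) optical-flow field."""
    ang = np.arctan2(flow[..., 1], flow[..., 0])   # direction in [-pi, pi]
    mag = np.linalg.norm(flow, axis=2)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    p = hist / (hist.sum() + 1e-8)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```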

12.
Xiao  Feng  Liu  Baotong  Li  Runa 《Multimedia Tools and Applications》2020,79(21-22):14593-14607

In response to the problem that primary visual features alone struggle to handle pedestrian detection in complex scenes, we present a method that improves pedestrian detection using a visual attention mechanism with semantic computation. After determining a saliency map with a visual attention mechanism, we calculate saliency maps for human skin and the human head-shoulder region. Using a Laplacian pyramid, a static visual attention model is established to obtain a total saliency map and then complete pedestrian detection. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the INRIA dataset with 92.78% pedestrian detection accuracy at a very competitive time cost.
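The Laplacian-pyramid fusion step could be sketched as follows, fusing the skin and head-shoulder saliency maps with a per-level max rule. The max rule is an assumption, and the sketch assumes map sizes divisible by 2**levels so the pyramid shapes line up.

```python
import cv2
import numpy as np

def laplacian_fuse(maps, levels=4):
    """Fuse several cue maps through a Laplacian pyramid: take the per-level
    maximum across maps, then reconstruct the fused map."""
    pyramids = []
    for m in maps:
        g = [m.astype(np.float32)]
        for _ in range(levels):
            g.append(cv2.pyrDown(g[-1]))
        lap = [g[i] - cv2.pyrUp(g[i + 1], dstsize=g[i].shape[::-1])
               for i in range(levels)] + [g[-1]]
        pyramids.append(lap)
    fused = [np.maximum.reduce([p[i] for p in pyramids])
             for i in range(levels + 1)]
    out = fused[-1]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=fused[i].shape[::-1]) + fused[i]
    return out
```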


13.
Salient region extraction from color images based on a visual attention mechanism
Meng Lu 《Application Research of Computers》2013,30(10):3159-3161
Salient region extraction is an important step in computer vision processing. Drawing on psychological and physiological models of human vision, a salient-region extraction model for color images based on a visual attention mechanism is proposed. The color image is pre-segmented into sub-regions with an improved watershed algorithm, and the proposed regionalized spatial attention model is then applied to compute a saliency map for each sub-region, yielding the final salient-region extraction result. Experimental results show that the proposed algorithm extracts salient regions from color images that agree well with the visual attention mechanism while meeting real-time requirements; compared with traditional methods, the extracted regions are more complete and more accurate.
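A sketch of the regionalized attention computation after pre-segmentation, using scikit-image's watershed and scoring each region by the Lab-color distance of its mean to the global mean; the marker count and the distance-based score are assumptions, not the paper's exact model.

```python
import numpy as np
from skimage.segmentation import watershed
from skimage.filters import sobel
from skimage.color import rgb2lab

def region_saliency(rgb):
    """Watershed pre-segmentation, then one saliency value per region."""
    lab = rgb2lab(rgb)
    gradient = sobel(lab[..., 0])
    labels = watershed(gradient, markers=250, compactness=0.001)
    global_mean = lab.reshape(-1, 3).mean(axis=0)
    sal = np.zeros(labels.shape, np.float32)
    for r in np.unique(labels):
        mask = labels == r
        sal[mask] = np.linalg.norm(lab[mask].mean(axis=0) - global_mean)
    return sal / (sal.max() + 1e-8)
```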

14.
Objective: In dynamic-scene images, static noise such as stationary objects and background texture, and dynamic noise such as background motion and camera shake, easily cause moving-object detection to produce false or missed detections. To address this problem, a target detection method based on a motion saliency probability map is proposed. Method: The method first constructs, on the temporal scale, a time-series group containing both short-term and long-term motion information; saliency values are then computed with the temporal Fourier transform (TFT), yielding a conditional motion saliency probability map. Guided by the law of total probability, a motion saliency probability map is obtained, foreground candidate pixels are determined, the saliency of moving objects is enhanced, and background saliency is suppressed. Finally, the spatial information of pixels is modeled on this basis to detect moving objects. Results: The proposed method was evaluated against nine moving-object detection methods in three typical dynamic scenes: static-noise scenes, dynamic-noise scenes, and mixed static-dynamic-noise scenes. In static-noise scenes, the F-score rises to 92.91%, precision to 96.47%, and the false positive rate falls to 0.02%. In dynamic-noise scenes, the F-score rises to 95.52%, precision to 95.15%, and the false positive rate falls to 0.002%. In these two scene types recall does not reach the best value among the compared methods because, while the proposed method envelops the target region well, it occasionally misclassifies part of the target region as background, especially when the target region is small; however, the misclassification rate remains low and recall stays high, fully meeting practical needs and not offsetting the significant overall improvement. In mixed static-dynamic-noise scenes, all four metrics achieve the best performance. The method therefore effectively removes interference from stationary objects, suppresses dynamic noise such as background motion and camera shake, and accurately detects moving objects in video sequences. Conclusion: The method better suppresses static background noise and dynamic noise caused by background changes (rippling water, camera shake, etc.), accurately detects moving objects against complex noisy backgrounds, and improves the robustness and generality of moving-object detection.
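A hedged sketch of the TFT step, substituting the standard phase-spectrum recipe: take the FFT along time, keep only the phase, and measure how far the reconstruction deviates from its temporal mean. The paper's exact TFT formulation may differ.

```python
import numpy as np

def temporal_fourier_saliency(stack):
    """Phase-only temporal-Fourier saliency.
    stack: (T, H, W) grayscale frames; returns an (H, W) map in [0, 1]."""
    f = np.fft.fft(stack.astype(np.float32), axis=0)
    phase_only = np.exp(1j * np.angle(f))            # discard amplitude spectrum
    recon = np.real(np.fft.ifft(phase_only, axis=0))
    sal = np.abs(recon - recon.mean(axis=0, keepdims=True)).mean(axis=0)
    return sal / (sal.max() + 1e-8)
```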

15.
Previous saliency detection models for stereoscopic images do not fully account for the influence of stereoscopic visual comfort and disparity-map distribution characteristics on salient-region detection; to address this, a saliency computation model incorporating a stereoscopic visual comfort factor is proposed. For color-image saliency, the model first over-segments the input image into superpixels with the SLIC algorithm, merges regions of similar color, and then computes 2D image saliency. For depth saliency, the disparity map is first preprocessed and saliency is then computed from regional contrast. Finally, the 2D saliency map and the depth saliency map are fused under the stereoscopic visual comfort factor to produce the stereoscopic image saliency map. Experiments on different types of stereoscopic images show that the model achieves 85% precision and 78% recall, outperforms existing mainstream saliency detection models, and agrees well with the stereoscopic visual attention mechanism of the human eye.
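A sketch of the SLIC-plus-region-contrast portion, with uniform region weighting assumed (the paper additionally merges similar-color regions and preprocesses the disparity map before fusion).

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def slic_region_contrast(rgb, n_segments=300):
    """SLIC superpixels, then each region is scored by its summed Lab-color
    distance to all other regions."""
    labels = slic(rgb, n_segments=n_segments, compactness=10)
    lab = rgb2lab(rgb)
    ids = np.unique(labels)
    means = np.array([lab[labels == i].mean(axis=0) for i in ids])
    per_region = np.array([
        np.linalg.norm(means - means[k], axis=1).sum() for k in range(len(ids))])
    sal = np.zeros(labels.shape, np.float32)
    for k, i in enumerate(ids):
        sal[labels == i] = per_region[k]
    return sal / (sal.max() + 1e-8)
```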

16.
An image segmentation method using visual saliency
An image segmentation method using visual saliency is proposed. Low-level visual features are first extracted, and the visual saliency of each pixel in each feature image is computed from three aspects: local saliency, global saliency, and rarity, producing per-feature saliency maps. These feature saliency maps are then combined into a final composite saliency map. The composite saliency map is thresholded to obtain a binary image, which is overlaid on the original image to separate foreground from background and produce the segmentation result. The method is validated on a number of natural images, with the corresponding results and analysis reported. Experimental results show that the method is correct and effective, producing segmentations consistent with the characteristics of human vision.
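The thresholding-and-masking step can be sketched directly with OpenCV; Otsu's method is assumed here for the paper's unspecified threshold choice.

```python
import cv2
import numpy as np

def segment_by_saliency(image, saliency):
    """Threshold the composite saliency map and mask the original image to
    split foreground from background."""
    sal8 = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(sal8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    foreground = cv2.bitwise_and(image, image, mask=binary)
    background = cv2.bitwise_and(image, image, mask=cv2.bitwise_not(binary))
    return foreground, background
```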

17.
The human visual system allocates processing resources efficiently by detecting the saliency of the different things of interest in a scene. Saliency detection methods based on the visual attention mechanism can reduce the complexity of scene analysis and target interpretation in remote sensing imagery and save processing resources. Building on the visual attention mechanism, a scale-adaptive saliency detection method for SAR images is proposed: image saliency is measured by local complexity and self-dissimilarity at different scales, a saliency-scale selection algorithm is designed, and the saliency scale and saliency measure are fused to generate the saliency map, completing the saliency detection pipeline. Experimental results show that the method applies effectively to SAR image saliency detection and is better suited to SAR scene analysis than other mainstream salient-region detection algorithms.
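A sketch of a multi-scale self-dissimilarity measure in the spirit of the paper: each patch is scored by its mean absolute difference from its four neighbors, with the per-pixel maximum kept over scales. The neighborhood and the max rule are assumptions.

```python
import numpy as np

def self_dissimilarity(img, scales=(4, 8, 16)):
    """img: 2D grayscale array; returns an (H, W) saliency map in [0, 1]."""
    h, w = img.shape
    out = np.zeros((h, w), np.float32)
    for s in scales:
        m = np.zeros((h, w), np.float32)
        for y in range(s, h - 2 * s + 1, s):
            for x in range(s, w - 2 * s + 1, s):
                c = img[y:y + s, x:x + s].astype(np.float32)
                nbrs = [img[y - s:y, x:x + s], img[y + s:y + 2 * s, x:x + s],
                        img[y:y + s, x - s:x], img[y:y + s, x + s:x + 2 * s]]
                d = np.mean([np.abs(c - n.astype(np.float32)).mean() for n in nbrs])
                m[y:y + s, x:x + s] = d
        out = np.maximum(out, m / (m.max() + 1e-8))
    return out
```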

18.
Objective: Estimating the just noticeable difference (JND) threshold of an image matters for improving image compression ratios and information-hiding efficiency. Luminance adaptation and spatial masking are the two core factors that determine the JND threshold. Existing spatial masking models mainly consider contrast masking and texture masking; however, current texture masking models cannot effectively describe how masking effects related to texture roughness influence the JND threshold. This paper therefore proposes a JND threshold estimation model based on fractal theory. Method: First, considering that the human visual system has low sensitivity to content changes on rough surfaces, the fractal dimension of local image regions is computed with classical fractal theory and used as a measure of texture roughness, on which a new texture masking model based on texture roughness is built. The proposed texture masking model is then combined with conventional luminance adaptation to estimate a preliminary JND threshold. Finally, taking the visual attention mechanism of the human eye into account, the visual saliency of image content is further considered and the JND threshold is corrected for perceptual consistency, yielding the final JND threshold. Results: Four related methods were selected for comparison; the results show that, with the same or even more injected noise, relative to the best results among the compared methods, the proposed method's average VSI (visual saliency-induced index) and average MO...
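The fractal-dimension step rests on classical box counting, which is compact enough to sketch; the binarization threshold is an assumption.

```python
import numpy as np

def box_counting_dimension(gray_patch, threshold=128):
    """Box-counting estimate of fractal dimension for a square patch:
    the slope of log(box count) vs. log(1/box size)."""
    binary = gray_patch >= threshold
    n = min(binary.shape)
    sizes, counts = [], []
    s = n // 2
    while s >= 2:
        cnt = 0
        for y in range(0, n - s + 1, s):
            for x in range(0, n - s + 1, s):
                if binary[y:y + s, x:x + s].any():
                    cnt += 1
        sizes.append(s)
        counts.append(max(cnt, 1))
        s //= 2
    coeffs = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return float(coeffs[0])
```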

19.
Visual attention tends to avoid locations where it has previously focused. This phenomenon is called inhibition of return (IOR) and is known as one of the important dynamic properties of visual attention. Recently, several studies have reported that IOR occurs not only for locations but also for visual features. In this study, we propose a visual attention model that incorporates feature-based IOR by extending a recent model of the "saliency map." Our model is demonstrated by a computer simulation, and its neuronal basis is also discussed.
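A minimal winner-take-all scanpath with location-based IOR (the paper's contribution adds feature-based IOR on top of this); the sigma and decay values below are illustrative.

```python
import numpy as np

def scanpath_with_ior(saliency, n_fixations=5, sigma=20.0, decay=0.6):
    """Pick successive saliency maxima; after each fixation, suppress the
    saliency around the winner with a Gaussian so attention moves on."""
    sal = saliency.astype(np.float32).copy()
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        sal *= 1.0 - decay * g   # inhibition of return at the attended spot
    return fixations
```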

20.
For the purpose of extracting attention regions from distorted videos, a distortion-weighted spatiotemporal visual attention model is proposed. Based on the spatial and temporal saliency maps, visual attention regions are acquired in a bottom-up manner. Meanwhile, a blocking-artifact saliency map is detected from intensity-gradient features. Attention selection is then applied in a top-down manner to identify the visual attention region with the most severe blocking artifacts as the Focus of Attention (FOA). Experimental results show that, compared with Walther's and You's models, the proposed model can not only accurately analyze spatiotemporal saliency based on intensity, texture, and motion features, but also estimate the blocking artifacts of distortions.
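A sketch of the intensity-gradient blockiness cue: measure the luminance jump across 8-pixel block boundaries, which matches typical DCT coding grids (the paper's exact measure is assumed).

```python
import numpy as np

def blockiness_map(gray, block=8):
    """Blocking-artifact cue: the luminance jump across each block boundary,
    spread over the adjacent block. Returns an (H, W) map in [0, 1]."""
    g = gray.astype(np.float32)
    h, w = g.shape
    out = np.zeros_like(g)
    for x in range(block, w, block):          # vertical block boundaries
        jump = np.abs(g[:, x] - g[:, x - 1])
        out[:, x - block:x] += jump[:, None] / block
    for y in range(block, h, block):          # horizontal block boundaries
        jump = np.abs(g[y, :] - g[y - 1, :])
        out[y - block:y, :] += jump[None, :] / block
    return out / (out.max() + 1e-8)
```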
