Similar Documents (20 results)
1.

Saliency prediction models provide a probabilistic map of the relative likelihood that an image or video region will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models cannot accurately predict the attended regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, and high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and the emphasis on only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into a single saliency map that is highly correlated with the eye-fixation data, a random-forest-based algorithm is used. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment involving 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database and the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to state-of-the-art approaches.
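To make the fusion step concrete, here is a minimal sketch (not the authors' implementation) of random-forest fusion of conspicuity maps: per-pixel conspicuity values are stacked as feature vectors and a scikit-learn RandomForestRegressor is fit against an eye-fixation density map. All array names and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fuse_conspicuity_maps(maps, fixation_density):
    """Fit a random forest mapping per-pixel conspicuity values
    (one column per cue: depth, motion, face, ...) to fixation density."""
    X = np.stack([m.ravel() for m in maps], axis=1)  # (n_pixels, n_cues)
    y = fixation_density.ravel()                     # eye-tracking ground truth
    rf = RandomForestRegressor(n_estimators=100, max_depth=12, n_jobs=-1)
    rf.fit(X, y)
    return rf

def predict_saliency(rf, maps):
    X = np.stack([m.ravel() for m in maps], axis=1)
    s = rf.predict(X).reshape(maps[0].shape)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)  # normalize to [0, 1]

# Toy usage: random maps stand in for real conspicuity cues.
h, w = 64, 64
cues = [np.random.rand(h, w) for _ in range(6)]
density = np.random.rand(h, w)
saliency = predict_saliency(fuse_conspicuity_maps(cues, density), cues)
```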


2.

In recent years, significant progress has been achieved in the field of visual saliency modeling. Our work focuses on video saliency, which differs substantially from image saliency and can be detected more accurately by adding gaze information from viewers' eye movements while they watch the video. In this paper we propose a novel gaze-based saliency method to predict video attention, inspired by the widespread use of mobile smart devices with cameras. It is a non-contact method that predicts visual attention without imposing any extra burden on the hardware. Our method first extracts bottom-up saliency maps from the video frames, and then constructs a mapping from eye images, captured by the camera in synchronization with the video frames, to the screen region. Finally, the top-down gaze information and the bottom-up saliency maps are combined by point-wise multiplication to predict video attention. The proposed approach is validated on two datasets, the public MIT dataset and a dataset we collected, against four commonly used methods, and the experimental results show that our method achieves state-of-the-art performance.
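The abstract states that the fusion is a point-wise multiplication of the top-down gaze map and the bottom-up saliency map. A minimal sketch of that step follows; the min-max normalization is our assumption, not specified by the paper.

```python
import numpy as np

def fuse_gaze_and_saliency(bottom_up, gaze_map, eps=1e-8):
    """Point-wise product of a bottom-up saliency map and a top-down
    gaze map (both HxW arrays aligned to screen coordinates)."""
    bu = (bottom_up - bottom_up.min()) / (bottom_up.max() - bottom_up.min() + eps)
    td = (gaze_map - gaze_map.min()) / (gaze_map.max() - gaze_map.min() + eps)
    fused = bu * td                 # gaze gates the bottom-up response
    return fused / (fused.max() + eps)
```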


3.

Saliency detection mimics the natural visual attention mechanism, which identifies an image region as salient when it attracts more visual attention than the background. This image analysis task has many important applications in fields such as military science, ocean research, resource exploration, and disaster and land-use monitoring. Although hundreds of models have been proposed for saliency detection in colour images, there is still considerable room for improving saliency detection performance in hyperspectral image analysis. In the present study, an ensemble learning methodology for saliency detection in hyperspectral imagery datasets is presented. It enhances saliency assignments produced by a robust colour-based technique with new saliency information extracted by exploiting the abundance of spectral information in hyperspectral images. Experiments with the proposed methodology provide encouraging results, including in comparison with several competing methods.
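As a rough illustration of the ensemble idea (a sketch under our own assumptions, not the paper's method), per-band local-contrast maps can be averaged into a spectral saliency term and blended with a colour-based map; the window size and blending weight below are arbitrary.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def band_contrast(band, size=15):
    # Local center-surround contrast: |pixel - local mean|.
    return np.abs(band.astype(float) - uniform_filter(band.astype(float), size))

def hyperspectral_saliency(cube, colour_saliency, alpha=0.5):
    """cube: (H, W, B) hyperspectral image; colour_saliency: (H, W) map
    from any colour-based detector; alpha: blending weight (assumed)."""
    spectral = np.mean([band_contrast(cube[..., b]) for b in range(cube.shape[-1])], axis=0)
    spectral = (spectral - spectral.min()) / (spectral.max() - spectral.min() + 1e-8)
    return alpha * colour_saliency + (1 - alpha) * spectral
```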


4.
Objective: Visual saliency plays an important role in many vision-driven applications, and these application areas are shifting from 2D to 3D vision, so saliency models based on RGB-D data have attracted wide attention. Unlike saliency in 2D images, RGB-D saliency involves cues of many different modalities. Complementary and competing relationships exist among these multimodal cues, and how to exploit and fuse them effectively remains a challenge. Traditional fusion models struggle to take full advantage of multimodal cues, so this work studies multimodal cue fusion in the formation of RGB-D saliency. Method: We propose an RGB-D saliency detection model based on a conditional random field (CRF) over superpixels. Saliency cues of different modalities are extracted, including 2D planar cues, depth cues, and motion cues. A CRF model is built with superpixels as its units, and a global energy function, combining the influence of the multimodal cues with a smoothness constraint on the saliency values of neighboring regions, is designed as the model's optimization objective, characterizing the interaction mechanism among the multimodal cues. The weighting factors of the multimodal cues in the energy function are learned by a convolutional neural network. Results: The model was compared with six saliency detection methods on two public RGB-D video saliency datasets and outperforms the state-of-the-art models on all datasets and evaluation metrics. Relative to the second-best results, the proposed model improves AUC (area under curve), sAUC (shuffled AUC), SIM (similarity), PCC (Pearson correlation coefficient), and NSS (normalized scanpath saliency) by 2.3%, 2.3%, 18.9%, 21.6%, and 56.2% on the IRCCyN dataset, and by 2.0%, 1.4%, 29.1%, 10.6%, and 23.3% on the DML-iTrack-3D dataset. An internal comparison further verifies that the proposed fusion method outperforms other traditional fusion methods. Conclusion: The CRF and CNN in the proposed RGB-D saliency detection model fully exploit the strengths of the different modal cues and fuse them effectively, improving detection performance and making the model useful in vision-driven applications.
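The global energy described above can be sketched as a unary term pulling each superpixel toward a weighted combination of its multimodal cues plus a pairwise smoothness term over neighbors. The quadratic form and the fixed weights below are illustrative assumptions; in the paper the weights are learned by a CNN.

```python
import numpy as np

def crf_energy(s, cues, w, adjacency, lam=0.5):
    """s: (n_sp,) saliency per superpixel; cues: (n_sp, n_modalities)
    cue responses; w: modality weights; adjacency: (i, j) neighbor pairs."""
    unary = np.sum((s - cues @ w) ** 2)                       # fit to fused cues
    pairwise = sum((s[i] - s[j]) ** 2 for i, j in adjacency)  # smoothness
    return unary + lam * pairwise
```

With this quadratic form, minimizing the energy reduces to solving a sparse linear system over the superpixel graph.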

5.
Objective: Stereoscopic video offers an immersive, realistic viewing experience and has become increasingly popular, while visual saliency detection can automatically predict, locate, and mine important visual information, helping machines filter massive amounts of multimedia data. To improve salient-region detection in stereoscopic video, we propose a stereoscopic video saliency detection model that fuses multi-dimensional binocular perception characteristics. Method: Saliency is computed along three dimensions of stereoscopic video: spatial, depth, and temporal. First, a 2D saliency map is computed from spatial image features with a Bayesian model; next, a depth saliency map is obtained from binocular perception features; then, the Lucas-Kanade optical flow method is used to compute local inter-frame motion features and obtain a temporal saliency map; finally, the three maps are fused by a method based on global-region difference to produce the final distribution of salient regions in the stereoscopic video. Results: Experiments on stereoscopic video sequences of different types show that the model achieves 80% precision and 72% recall while keeping computational complexity relatively low, outperforming existing saliency detection models. Conclusion: The proposed model effectively identifies salient regions in stereoscopic video and can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and related fields.
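The temporal term can be pictured with a dense optical-flow sketch. The paper uses Lucas-Kanade on local regions; OpenCV's Farneback dense flow is used below purely as a readily available stand-in, and the frames are assumed to be uint8 grayscale.

```python
import cv2
import numpy as np

def motion_saliency(prev_gray, cur_gray):
    """Temporal saliency as normalized dense optical-flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag = np.linalg.norm(flow, axis=2)   # per-pixel motion magnitude
    return mag / (mag.max() + 1e-8)
```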

6.
7.
In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. In our approach, visual saliency estimation is modeled by simultaneously considering stimulus-driven and task-related factors in a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition of the input features. Unlike existing approaches, we propose a multi-task learning algorithm to learn the task-related “stimulus-saliency” mapping functions for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components to obtain the visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach clearly outperforms eight state-of-the-art approaches.
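The stimulus-driven component's multi-scale wavelet decomposition can be sketched with PyWavelets: the detail sub-bands at each level give one contrast map per scale. The wavelet choice, level count, and nearest-neighbour upsampling are our assumptions, not the authors' exact pipeline.

```python
import numpy as np
import pywt

def wavelet_feature_maps(gray, wavelet='db2', levels=3):
    """One multi-scale contrast map per decomposition level: the summed
    magnitude of the LH, HL, and HH detail sub-bands, upsampled back
    to the input resolution."""
    maps = []
    coeffs = pywt.wavedec2(gray.astype(float), wavelet, level=levels)
    for detail in coeffs[1:]:                 # skip the approximation band
        energy = sum(np.abs(d) for d in detail)
        iy = (np.arange(gray.shape[0]) * energy.shape[0] // gray.shape[0]).clip(0, energy.shape[0] - 1)
        ix = (np.arange(gray.shape[1]) * energy.shape[1] // gray.shape[1]).clip(0, energy.shape[1] - 1)
        maps.append(energy[np.ix_(iy, ix)])   # nearest-neighbour upsample
    return maps
```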

8.
This paper presents a computational method of feature evaluation for modeling saliency in visual scenes. This is highly relevant to visual search studies, since visual saliency underlies the deployment of visual attention. Visual saliency can also be important in computer vision applications, where it can reduce computational requirements by restricting processing to those regions of a scene that contain relevant information. The method is based on Bayesian theory to describe the interaction between top-down and bottom-up information. Unlike other approaches, it evaluates and selects visual features before saliency estimation, which can reduce the complexity and, potentially, improve the accuracy of the saliency computation. To this end, we present an algorithm for feature evaluation and selection. A two-color conjunction search experiment is used to illustrate the theoretical framework of the proposed model. The practical value of the method is demonstrated by video segmentation of instruments in a laparoscopic cholecystectomy operation.

9.
This paper presents a compressed-domain moving-object extraction algorithm based on optical flow approximation for MPEG-2 video streams. The discrete cosine transform (DCT) coefficients of P and B frames are estimated to reconstruct a DC + 2AC image using their motion vectors and the DCT coefficients of I frames, which can be extracted directly from the MPEG-2 compressed domain. Initial optical flow is estimated with Black's optical flow estimation framework, in which the DC image is replaced by the DC + 2AC image to provide more intensity information. A high-confidence measure is exploited to generate a dense and accurate motion vector field by removing noisy and false motion vectors. Global motion estimation and iterative rejection are further utilized to separate foreground from background motion vectors. Region growing with automatic seed selection is performed to extract accurate object boundaries using a motion-consistency model. The object boundary is further refined by partially decoding the boundary blocks to improve accuracy. Experimental results on several test sequences demonstrate that the proposed approach achieves compressed-domain video object extraction for CIF-format MPEG-2 video streams with real-time performance.
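The DC + 2AC representation mentioned above can be sketched directly: for each 8×8 block, keep the DC coefficient plus the first horizontal and vertical AC coefficients. The sketch below computes the DCT from pixels for clarity, whereas the paper reads these coefficients straight from the MPEG-2 stream.

```python
import numpy as np
from scipy.fft import dctn

def dc2ac_image(gray, block=8):
    """Per-block (DC, AC01, AC10) triple from an 8x8 block DCT."""
    h = (gray.shape[0] // block) * block
    w = (gray.shape[1] // block) * block
    out = np.zeros((h // block, w // block, 3))
    for by in range(0, h, block):
        for bx in range(0, w, block):
            c = dctn(gray[by:by + block, bx:bx + block].astype(float), norm='ortho')
            out[by // block, bx // block] = (c[0, 0], c[0, 1], c[1, 0])
    return out
```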

10.
The predictions of 13 computational bottom-up saliency models and a newly introduced Multiscale Contrast Conspicuity (MCC) metric are compared with human visual conspicuity measurements. The agreement between human visual conspicuity estimates and model saliency predictions is quantified through their rank-order correlation. The maximum of the computational saliency value over the target support area correlates most strongly with visual conspicuity for 12 of the 13 models. A simple multiscale contrast model and the MCC metric both yield the largest correlation with human visual target conspicuity (>0.84). Local image saliency largely determines human visual inspection and interpretation of static and dynamic scenes. Computational saliency models therefore have a wide range of important applications, such as adaptive content delivery, region-of-interest-based image compression, video summarization, progressive image transmission, image segmentation, image quality assessment, object recognition, and content-aware image scaling. However, current bottom-up saliency models do not incorporate important visual effects like crowding and lateral interaction. Additional knowledge about the exact nature of the interactions between the mechanisms mediating human visual saliency is required to develop these models further. The MCC metric and its associated psychophysical saliency measurement procedure are useful tools for systematically investigating the relative contribution of different feature dimensions to overall visual target saliency.
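The evaluation protocol described here is easy to state in code: the model score for each target is the maximum saliency over the target's support area, and agreement with human conspicuity is a rank-order (Spearman) correlation. A minimal sketch, with hypothetical array names:

```python
import numpy as np
from scipy.stats import spearmanr

def conspicuity_agreement(saliency_maps, target_masks, human_conspicuity):
    """Rank-order correlation between per-target model scores
    (max saliency over the target support) and human conspicuity."""
    model_scores = [s[m > 0].max() for s, m in zip(saliency_maps, target_masks)]
    rho, _ = spearmanr(model_scores, human_conspicuity)
    return rho
```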

11.
Zhu Chunbiao, Li Ge. Multimedia Tools and Applications, 2018, 77(19): 25181-25197

Saliency detection is an active topic in the multimedia field. Most previous work on saliency detection focuses on 2D images. However, these methods are not robust to complex scenes containing multiple objects or complex backgrounds. Recently, depth information has been shown to supply a powerful cue for saliency detection. In this paper, we propose a multilayer backpropagation saliency detection algorithm based on depth mining, in which the depth cue is exploited from three different layers of the image. The proposed algorithm performs well and remains robust in complex situations. Experimental results show that the proposed framework is superior to other existing saliency approaches. In addition, we present two novel applications of the algorithm: scene reconstruction from multiple images and small-target detection in video.


12.
This paper presents a generic framework in which images are modelled as order-less sets of weighted visual features. Each visual feature is associated with a weight factor that may reflect its relevance. This framework can be applied to various bag-of-features approaches, such as the bag-of-visual-words or Fisher kernel representations. We suggest that if dense sampling is used, different schemes for weighting local features can be evaluated, leading to results that are often better than the combination of multiple sampling schemes, at a much lower computational cost, because the features are extracted only once. This makes our framework a test-bed for saliency estimation methods in image categorisation tasks. We explored two main approaches to estimating local feature relevance. The first is based on saliency maps obtained from human feedback, either by gaze tracking or by mouse clicks. The method is able to profit from such maps, leading to a significant improvement in categorisation performance. The second is based on automatic saliency estimation methods, including Itti & Koch's method and SIFT's DoG. We evaluated the proposed framework and saliency estimation methods using an in-house dataset and the PASCAL VOC 2008/2007 datasets, showing that some of the saliency estimation methods lead to a significant performance improvement over the standard unweighted representation.
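A minimal sketch of the weighting idea (our own simplification, not the paper's full framework): each densely sampled local feature votes into the bag-of-visual-words histogram with a weight read off a saliency map at its keypoint location.

```python
import numpy as np

def weighted_bow_histogram(words, keypoints, saliency, vocab_size):
    """words: visual-word index per descriptor; keypoints: (x, y) per
    descriptor; saliency: HxW relevance map in [0, 1] (from gaze,
    clicks, or an automatic estimator)."""
    weights = np.array([saliency[int(y), int(x)] for x, y in keypoints])
    hist = np.bincount(words, weights=weights, minlength=vocab_size)
    return hist / (hist.sum() + 1e-8)   # L1-normalized weighted histogram
```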

13.
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency, respectively; the two saliency maps are then fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested using three popular evaluation metrics on two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method for saliency detection in news videos compared to several state-of-the-art methods.

14.
Objective: In visual object tracking, the target state at each time step is approximated as a linear combination of template data learned online. Because the target is affected by various complex disturbances from itself or the scene, the tracker's modeling capability depends heavily on how representative the template data are and on the accuracy of its error estimates. Many existing algorithms represent sample signals as vectors, which alters the original data structure and severely breaks the natural relationships among the elements of the sample data; moreover, this representation increases data dimensionality, bringing additional computational complexity and wasted resources. This paper studies data representation and modeling in video tracking from a multilinear-analysis perspective and provides a more compact and effective solution. Method: In the proposed tracking framework, candidate samples and their reconstructed signals are represented as tensors, preserving the original data structure. When the tracker outputs the appearance state of candidate samples, the modeling task is organized around the favorable multilinear properties of tensors: the relevant components of the objective function are regularized by the tensor nuclear norm and the L1 norm, and, under a multi-task learning assumption, both the independence of and the interdependence among the appearance-representation tasks of the candidate samples are fully exploited. Results: The structured tensor representation and its multi-task observation model effectively address the data-representation and computational-complexity difficulties of the tracking system, and provide a simpler and more effective way to jointly learn the appearance models of candidate samples. When the tracker encounters strongly destructive noise, the tensor-nuclear-norm-constrained error estimation mechanism, within the multi-task joint learning framework, mines more complete target information and adapts better to visual changes caused by intrinsic or extrinsic factors. Experiments on widely used benchmark videos show that the algorithm is more robust in representing candidate appearance models: across the test sequences, the tracked image patches achieve an average center-location error of 4.2 and an average overlap rate of 0.82, better tracking accuracy than several excellent algorithms of the same kind. Conclusion: Extensive experiments verify that the tensor nuclear-norm regression model and its error estimation mechanism construct sample signals closer to the target's true state at each time step and rigorously probe the true state of every candidate sample within the multi-task learning framework, thereby alleviating model degradation and tracking drift.
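The tensor nuclear-norm regularizer used here is typically handled, one mode unfolding at a time, by singular-value soft-thresholding, the proximal operator of the matrix nuclear norm. The sketch below shows that standard building block only, not the authors' full multi-task solver.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding:
    prox of tau * nuclear norm at M = U diag(max(S - tau, 0)) V^T."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(S - tau, 0.0)) @ Vt
```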

15.
2D-to-3D Conversion of Monocular Images Fusing Objectness and Visual Saliency
Inspired by objectness measures and visual saliency, this paper proposes a center-surround depth-distribution hypothesis for object windows, suited to 2D-to-3D conversion of monocular images, and presents a monocular depth estimation algorithm that fuses objectness measures and visual saliency. First, the visual saliency of the image is computed and mapped to depth; next, a number of windows are randomly sampled over the image and their objectness measures are computed; then, an energy function is defined to quantify how strongly depth and objectness influence each other, and the estimates of both are refined by iterative optimization; finally, 3D video is synthesized from the depth information. Experimental results show that incorporating objectness significantly improves the quality of saliency-based 2D-to-3D depth estimation, guaranteeing discontinuous depth transitions at object boundaries and smooth transitions elsewhere.

16.
A spatiotemporal saliency algorithm based on a center-surround framework is proposed. The algorithm is inspired by biological mechanisms of motion-based perceptual grouping and extends a discriminant formulation of center-surround saliency previously proposed for static imagery. Under this formulation, the saliency of a location is equated to the power of a predefined set of features to discriminate between the visual stimuli in a center and a surround window, centered at that location. The features are spatiotemporal video patches and are modeled as dynamic textures, to achieve a principled joint characterization of the spatial and temporal components of saliency. The combination of discriminant center-surround saliency with the modeling power of dynamic textures yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras. The related problem of background subtraction is treated as the complement of saliency detection, by classifying nonsalient (with respect to appearance and motion dynamics) points in the visual field as background. The algorithm is tested for background subtraction on challenging sequences, and shown to substantially outperform various state-of-the-art techniques. Quantitatively, its average error rate is almost half that of the closest competitor.
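The discriminant center-surround idea can be illustrated with a much simpler stand-in than the paper's dynamic-texture formulation: score a location by how far the feature histogram of a center window departs from that of its surround. Window radii, bin count, and the symmetric-KL measure below are all our assumptions.

```python
import numpy as np

def center_surround_saliency(feature, cy, cx, rc=8, rs=24, bins=16):
    """Symmetric KL divergence between center and surround feature
    histograms at (cy, cx); feature is an HxW map in [0, 1] and the
    location is assumed far enough from the image border."""
    def hist(patch):
        h, _ = np.histogram(patch, bins=bins, range=(0, 1))
        return (h + 1e-6) / (h.sum() + bins * 1e-6)   # smoothed, normalized
    p = hist(feature[cy - rc:cy + rc, cx - rc:cx + rc])
    q = hist(feature[cy - rs:cy + rs, cx - rs:cx + rs])
    return 0.5 * np.sum(p * np.log(p / q) + q * np.log(q / p))
```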

17.
Visual saliency is an important research topic in the field of computer vision due to its numerous possible applications. It helps to focus on regions of interest instead of processing the whole image or video data. Detecting visual saliency in still images has been widely addressed in the literature, with several formulations. However, visual saliency detection in videos has attracted little attention and is a more challenging task due to the additional temporal information. A common approach for obtaining a spatio-temporal saliency map is to combine a static saliency map and a dynamic saliency map. In our work, we model the dynamic textures in a dynamic scene with local binary patterns to compute the dynamic saliency map, and we use color features to compute the static saliency map. Both saliency maps are computed using a bio-inspired mechanism of the human visual system with a discriminant formulation known as center-surround saliency, and are fused appropriately. The proposed model has been extensively evaluated on diverse publicly available datasets containing several videos of dynamic scenes, and comparison with state-of-the-art methods shows that it achieves competitive results.
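A crude sketch of the LBP-based dynamic cue (a simplified stand-in for the paper's dynamic-texture modeling): compute LBP codes per frame and mark locations whose code changes between consecutive frames as dynamic. Frames are assumed to be uint8 grayscale.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_change_map(prev_gray, cur_gray, P=8, R=1.0):
    """Binary map of locations whose uniform-LBP code changed
    between two consecutive frames."""
    lbp0 = local_binary_pattern(prev_gray, P, R, method='uniform')
    lbp1 = local_binary_pattern(cur_gray, P, R, method='uniform')
    return (lbp0 != lbp1).astype(float)
```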

18.
Human vision has been studied deeply in recent years, and several models have been proposed to simulate it on computers. Some of these models concern visual saliency, which is potentially very interesting for many applications such as robotics, image analysis, compression, and video indexing. Unfortunately, they are compute-intensive, with tight real-time requirements. Among the existing models, we have chosen a spatio-temporal one combining static and dynamic information. In this paper we propose a very efficient multi-GPU implementation of this model that reaches real-time performance. We present the algorithms of the model as well as several parallel optimizations on the GPU, together with precision and execution-time results. The real-time execution of this multi-path model on multiple GPUs makes it a powerful tool for many vision-related applications.

19.
Zhou Quan, Cheng Jie, Lu Huimin, Fan Yawen, Zhang Suofei, Wu Xiaofu, Zheng Baoyu, Ou Weihua, Latecki Longin Jan. Multimedia Tools and Applications, 2020, 79(21-22): 14419-14447

Visual saliency detection plays a significant role in the field of computer vision. In this paper, we introduce a novel saliency detection method based on a weighted linear multiple kernel learning (WLMKL) framework, which is able to adaptively combine different contrast measurements in a supervised manner. Since contrast is the most influential operation in bottom-up visual saliency, an average weighted corner-surround contrast (AWCSC) is first designed to measure local visual saliency. Combined with the commonly used center-surround contrast (CESC) and global contrast (GC), three types of contrast operations are fed into our WLMKL framework to produce the final saliency map. We show that the weights assigned to the contrast feature maps are always normalized in our WLMKL formulation. In addition, the proposed approach benefits from the contribution of each individual contrast feature map, yielding more robust and accurate saliency maps. We evaluated our method on two main visual saliency detection tasks: human eye-fixation prediction and salient object detection. The extensive experimental results show the effectiveness of the proposed model and demonstrate that the integration is superior to each individual subcomponent.
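The final combination step reduces to a weighted sum of normalized contrast maps with weights that sum to one. A minimal sketch with fixed weights follows; in the paper the weights are learned in a supervised manner.

```python
import numpy as np

def combine_contrast_maps(awcsc, cesc, gc, weights=(0.4, 0.3, 0.3)):
    """Weighted linear combination of the AWCSC, CESC, and GC maps;
    the weights are re-normalized to sum to one, mirroring the
    normalization property stated in the abstract."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    norm = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-8)
    return sum(wi * norm(m) for wi, m in zip(w, (awcsc, cesc, gc)))
```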


20.
There are many “machine vision” models of the visual saliency mechanism, which controls the process of selecting and allocating attention to the most “prominent” locations in a scene and helps humans interact with the visual environment efficiently (Itti and Koch, 2001; Gao et al., 2000). It is important to know which models best mimic the saliency mechanism of the human visual system. Several metrics exist for comparing saliency models; however, results from different metrics vary widely when evaluating models. In this paper, a procedure is proposed for evaluating metrics that compare saliency maps, using a database of human fixations on approximately 1000 images. This procedure is then employed to identify the best metric, which is in turn used to evaluate ten published bottom-up saliency models. An optimized level of blurring and center bias is found for each visual saliency model. The performance of the models is also analyzed on a dataset of 54 synthetic images.
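The blur and center-bias optimization mentioned above can be sketched as a two-parameter post-processing of a model's saliency map; the blend form and parameter names below are assumptions, chosen only to illustrate what is being optimized.

```python
import numpy as np
import cv2

def blur_and_center_bias(saliency, sigma_blur=8.0, sigma_center=0.3, alpha=0.3):
    """Gaussian-blur a saliency map, then blend it with a centered
    Gaussian prior; sweeping these parameters per model reproduces
    the kind of optimization described in the paper."""
    s = cv2.GaussianBlur(saliency.astype(np.float32), (0, 0), sigma_blur)
    h, w = s.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
                    / (2 * (sigma_center * min(h, w)) ** 2))
    out = (1 - alpha) * s + alpha * center
    return out / (out.max() + 1e-8)
```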
