首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Detection of salient objects in image and video is of great importance in many computer vision applications. In spite of the fact that the state of the art in saliency detection for still images has been changed substantially over the last few years, there have been few improvements in video saliency detection. This paper proposes a novel non-local fully convolutional network architecture for capturing global dependencies more efficiently and investigates the use of recently introduced non-local neural networks in video salient object detection. The effect of non-local operations is studied separately on static and dynamic saliency detection in order to exploit both appearance and motion features. A novel deep non-local fully convolutional network architecture is introduced for video salient object detection and tested on two well-known datasets DAVIS and FBMS. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.  相似文献   

2.
Computer vision applications often need to process only a representative part of the visual input rather than the whole image/sequence. Considerable research has been carried out into salient region detection methods based either on models emulating human visual attention (VA) mechanisms or on computational approximations. Most of the proposed methods are bottom-up and their major goal is to filter out redundant visual information. In this paper, we propose and elaborate on a saliency detection model that treats a video sequence as a spatiotemporal volume and generates a local saliency measure for each visual unit (voxel). This computation involves an optimization process incorporating inter- and intra-feature competition at the voxel level. Perceptual decomposition of the input, spatiotemporal center-surround interactions and the integration of heterogeneous feature conspicuity values are described and an experimental framework for video classification is set up. This framework consists of a series of experiments that shows the effect of saliency in classification performance and let us draw conclusions on how well the detected salient regions represent the visual input. A comparison is attempted that shows the potential of the proposed method.  相似文献   

3.
We evaluate the applicability of a biologically-motivated algorithm to select visually-salient regions of interest in video streams for multiply-foveated video compression. Regions are selected based on a nonlinear integration of low-level visual cues, mimicking processing in primate occipital, and posterior parietal cortex. A dynamic foveation filter then blurs every frame, increasingly with distance from salient locations. Sixty-three variants of the algorithm (varying number and shape of virtual foveas, maximum blur, and saliency competition) are evaluated against an outdoor video scene, using MPEG-1 and constant-quality MPEG-4 (DivX) encoding. Additional compression radios of 1.1 to 8.5 are achieved by foveation. Two variants of the algorithm are validated against eye fixations recorded from four to six human observers on a heterogeneous collection of 50 video clips (over 45 000 frames in total). Significantly higher overlap than expected by chance is found between human and algorithmic foveations. With both variants, foveated clips are, on average, approximately half the size of unfoveated clips, for both MPEG-1 and MPEG-4. These results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video.  相似文献   

4.
We assess whether salient auditory events contained in soundtracks modify eye movements when exploring videos. In a previous study, we found that, on average, nonspatial sound contained in video soundtracks impacts on eye movements. This result indicates that sound could play a leading part in visual attention models to predict eye movements. In this research, we go further and test whether the effect of sound on eye movements is stronger just after salient auditory events. To automatically spot salient auditory events, we used two auditory saliency models: the discrete energy separation algorithm and the energy model. Both models provide a saliency time curve, based on the fusion of several elementary audio features. The most salient auditory events were extracted by thresholding these curves. We examined some eye movement parameters just after these events rather than on all the video frames. We showed that the effect of sound on eye movements (variability between eye positions, saccade amplitude, and fixation duration) was not stronger after salient auditory events than on average over entire videos. Thus, we suggest that sound could impact on visual exploration not only after salient events but in a more global way.  相似文献   

5.
The huge amount of video data on the internet requires efficient video browsing and retrieval strategies. One of the viable solutions is to provide summaries of the videos in the form of key frames. The video summarization using visual attention modeling has been used of late. In such schemes, the visually salient frames are extracted as key frames on the basis of theories of human attention modeling. The visual attention modeling schemes have proved to be effective in video summarization. However, the high computational costs incurred by these techniques limit their applicability in practical scenarios. In this context, this paper proposes an efficient visual attention model based key frame extraction method. The computational cost is reduced by using the temporal gradient based dynamic visual saliency detection instead of the traditional optical flow methods. Moreover, for static visual saliency, an effective method employing discrete cosine transform has been used. The static and dynamic visual attention measures are fused by using a non-linear weighted fusion method. The experimental results indicate that the proposed method is not only efficient, but also yields high quality video summaries.  相似文献   

6.
We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can be used as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach that involves a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevancy of a video frame to the GMM-based spatiotemporal video modeling. This helps us use a small set of salient frames to facilitate the model training by reducing data redundancy and irrelevance. A modified expectation maximization algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimation for video segmentation. Moreover, it is interesting to find that frame saliency can imply some object behaviors. This makes the proposed method also applicable to other frame-related video analysis tasks, such as key-frame extraction, video skimming, etc. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method.  相似文献   

7.
Key frame extraction based on visual attention model   总被引:2,自引:0,他引:2  
Key frame extraction is an important technique in video summarization, browsing, searching and understanding. In this paper, we propose a novel approach to extract the most attractive key frames by using a saliency-based visual attention model that bridges the gap between semantic interpretation of the video and low-level features. First, dynamic and static conspicuity maps are constructed based on motion, color and texture features. Then, by introducing suppression factor and motion priority schemes, the conspicuity maps are fused into a saliency map that includes only true attention regions to produce attention curve. Finally, after time-constraint cluster algorithm grouping frames with similar content, the frames with maximum saliency value are selected as key-frames. Experimental results demonstrate the effectiveness of our approach for video summarization by retrieving the meaningful key frames.  相似文献   

8.
With the fast evolution of digital video, research and development of new technologies are greatly needed to lower the cost of video archiving, cataloging and indexing, as well as improve the efficiency and accessibility of stored video sequences. A number of methods to respectively meet these requirements have been researched and proposed. As one of the most important research topics, video abstraction helps to enable us to quickly browse a large video database and to achieve efficient content access and representation. In this paper, a video abstraction algorithm based on the visual attention model and online clustering is proposed. First, shot boundaries are detected and key frames in each shot are extracted so that consecutive key frames in a shot have the same distance. Second, the spatial saliency map indicating the saliency value of each region of the image is generated from each key frame and regions of interest (ROI) is extracted according to the saliency map. Third, key frames, as well as their corresponding saliency map, are passed to a specific filter, and several thresholds are used so that the key frames containing less information are discarded. Finally, key frames are clustered using an online clustering method based on the features in ROIs. Experimental results demonstrate the performance and effectiveness of the proposed video abstraction algorithm.  相似文献   

9.
As human vision system is highly sensitive to motion present in a scene, motion saliency forms an important feature in a video sequence. Motion information is used for video compression, object segmentation, object tracking and in many other applications. Though its applications are extensive, accurate detection of motion in a given video is complex and computationally expensive for the solutions reported in the literature. Decomposing a video into visually similar and residual videos is a robust way to detect motion salient regions. The existing decomposition techniques require large execution time as the standard form of the problem is NP-hard. We propose a novel algorithm which detects the motion salient regions by decomposing the input video into background and residual videos in much lesser time without sacrificing the accuracy of the decomposition. In addition, the proposed algorithm is completely parallelizable that ensures further reduction in computational time with the use of advanced multicore processors.  相似文献   

10.
Saliency detection has gained popularity in many applications, and many different approaches have been proposed. In this paper, we propose a new approach based on singular value decomposition (SVD) for saliency detection. Our algorithm considers both the human-perception mechanism and the relationship between the singular values of an image decomposed by SVD and its salient regions. The key concept of our proposed algorithms is based on the fact that salient regions are the important parts of an image. The singular values of an image are divided into three groups: large, intermediate, and small singular values. We propose the hypotheses that the large singular values mainly contain information about the non-salient background and slight information about the salient regions, while the intermediate singular values contain most or even all of the saliency information. The small singular values contain little or even none of the saliency information. These hypotheses are validated by experiments. By regularization based on the average information, regularization using the leading largest singular values or regularization based on machine learning, the salient regions will become more conspicuous. In our proposed approach, learning-based methods are proposed to improve the accuracy of detecting salient regions in images. Gaussian filters are also employed to enhance the saliency information. Experimental results prove that our methods based on SVD achieve superior performance compared to other state-of-the-art methods for human-eye fixations, as well as salient-object detection, in terms of the area under the receiver operating characteristic (ROC) curve (AUC) score, the linear correlation coefficient (CC) score, the normalized scan-path saliency (NSS) score, the F-measure score, and visual quality.  相似文献   

11.
In this paper we describe a novel neural network technique for video compression, using a “point-process” type neural network model we have developed which is closer to biophysical reality and is mathematically much more tractable than standard models. Our algorithm uses an adaptive approach based upon the users' desired video quality Q, and achieves compression ratios of up to 500:1 for moving gray-scale images, based on a combination of motion detection, compression, and temporal subsampling of frames. This leads to a compression ratio of over 1000:1 for full-color video sequences with the addition of the standard 4:1:1 spatial subsampling ratios in the chrominance images. The signal-to-noise ratio ranges from 29 dB to over 34 dB. Compression is performed using a combination of motion detection, neural networks, and temporal subsampling of frames. A set of neural networks is used to adaptively select the desired compression of each picture block as a function of the reconstruction quality. The motion detection process separates out regions of the frame which need to be retransmitted. Temporal subsampling of frames, along with reconstruction techniques, lead to the high compression ratios  相似文献   

12.
Saliency detection is widely used to pick out relevant parts of a scene as visual attention regions for various image/video applications. Since video is increasingly being captured, moved and stored in compressed form, there is a need for detecting video saliency directly in compressed domain. In this study, a compressed video saliency detection algorithm is proposed based on discrete cosine transformation (DCT) coefficients and motion information within a visual window. Firstly, DCT coefficients and motion information are extracted from H.264 video bitstream without full decoding. Due to a high quantization parameter setting in encoder, skip/intra is easily chosen as the best prediction mode, resulting in a large number of blocks with zero motion vector and no residual existing in video bitstream. To address these problems, the motion vectors of skip/intra coded blocks are calculated by interpolating its surroundings. In addition, a visual window is constructed to enhance the contrast of features and to avoid being affected by encoder. Secondly, after spatial and temporal saliency maps being generated by the normalized entropy, a motion importance factor is imposed to refine the temporal saliency map. Finally, a variance-like fusion method is proposed to dynamically combine these maps to yield the final video saliency map. Experimental results show that the proposed approach significantly outperforms other state-of-the-art video saliency detection models.  相似文献   

13.
显著区域检测可应用在对象识别、图像分割、视 频/图像压缩中,是计算机视觉领域的重要研究主题。然而,基于不 同视觉显著特征的显著区域检测法常常不能准确地探测出显著对象且计算费时。近来,卷积 神经网络模型在图像分析和处理 领域取得了极大成功。为提高图像显著区域检测性能,本文提出了一种基于监督式生成对抗 网络的图像显著性检测方法。它 利用深度卷积神经网络构建监督式生成对抗网络,经生成器网络与鉴别器网络的不断相互对 抗训练,使卷积网络准确学习到 图像显著区域的特征,进而使生成器输出精确的显著对象分布图。同时,本文将网络自身误 差和生成器输出与真值图间的 L1距离相结合,来定义监督式生成对抗网络的损失函数,提升了显著区域检测精度。在MSRA 10K与ECSSD数据库上的实 验结果表明,本文方法 分别获得了94.19%与96.24%的准确率和93.99%与90.13%的召回率,F -Measure值也高达94.15%与94.76%,优于先 前常用的显著性检测模型。  相似文献   

14.
This paper addresses a novel approach to automatically extract video salient objects based on visual attention mechanism and seeded object growing technique. First, a dynamic visual attention model to capture the object motions by global motion estimation and compensation is constructed. Through combining it with a static attention model, a saliency map is formed. Then, with a modified inhibition of return (MIOR) strategy, the winner-take-all (WTA) neural network is used to scan the saliency map for the most salient locations selected as attention seeds. Lastly, the particle swarm optimization (PSO) algorithm is employed to grow the attention objects modeled by Markov random field (MRF) from the seeds. Experiments verify that our presented approach could extract both of stationary and moving salient objects efficiently.  相似文献   

15.
State of the art methods for video resizing usually produce perceivable visual discontinuities. Therefore, how to preserve the visual continuity in video frames is one of the most critical issues. In this paper, we propose a novel approach for modeling dynamic visual attention based on spatiotemporal analysis in order to detect the focus of interest automatically. The continuously varied co-sited blocks in a video cube are first detected and their variations are characterized as visual cubes, which are further employed to determine a proper extent of salient regions in video frames. Once the proper extent through video cubes is determined, the resizing process then can be conducted to find the global optimum. Our experiment shows that the proposed content-aware video resizing based on spatiotemporal visual cubes can effectively generate resized videos while keeping their isotropic manipulation and the continuous dynamics of visual perception.  相似文献   

16.
梁永生  柳伟  周莺  魏泽锋  张基宏 《电子学报》2017,45(7):1567-1575
为了有效解决视频流媒体传输网络带宽、播出视频质量和用户实时性访问之间的矛盾,本文提出了一种基于视觉显著计算的视频流媒体渐进式表达方法.在视频内容分析和理解的基础上,首先进行场景分类和视觉敏感区域提取;然后根据编码信息确定视频序列中各帧的重要性,估计帧内片层数据重要性;最后基于视觉显著计算的结果提出一种适应网络带宽和质量可伸缩的视频流媒体渐进式表达方法.采用中粒度质量可伸缩(MGS)编码,在模拟网络测试平台上分别针对集中式和分散式视觉敏感区域视频序列进行实验研究,实验结果验证了本文提出的基于视觉显著计算的视频流媒体渐进式表达方法的正确性和有效性.  相似文献   

17.
This paper proposes a technique for generating the quantization values for 3D-DCT coefficients. The distribution of AC coefficients inside a transform cube is characterized by two regions, theshifted complement hyperboloidand theshifted hyperboloid,which capture the dominant and the less significant coefficients, respectively. An exponential function is used to determine the appropriate quantization values for the two regions. A quantization volume for the 3D-DCT is generated by using the function. The paper also describes a novel procedure for deriving the scan order for the quantized 3D-DCT coefficients. The proposed quantization volume has been tested on various standard test video sequences. The experiments show that the 3D-DCT video compression using the proposed quantization values produce high compression ratios with good visual quality for the reconstructed video frames. If desired, the parameter settings of the function can be further tuned for better visual quality. The proposed scan order was also found to be superior, in terms of compression ratio, to the 3D zig zag approach, which is an extension of the traditional 2D zig zag.  相似文献   

18.
In this paper, we present an approach for medical ultrasound (US) image enhancement. It is based on a novel perceptual saliency measure which favors smooth, long curves with constant curvature. The perceptual salient boundaries of tissues in US images are enhanced by computing the saliency of directional vectors in the image space, via a local searching algorithm. Our measure is generally determined by curvature changes, intensity gradient and the interaction of neighboring vectors. To restrain speckle noise during the enhancement process, an adaptive speckle suspension term is also combined into the proposed saliency measure. The results obtained on both simulated images and medical US data reveal superior performance of the novel approach over a number of commonly used speckle filters. Applications of US image segmentation show that although the proposed algorithm cannot remove the speckle noise completely and may discard weak anatomical structures in some case, it still provides a considerable gain to US image processing for computer-aided diagnosis.  相似文献   

19.
Many salient object detection approaches share the common drawback that they cannot uniformly highlight heterogeneous regions of salient objects, and thus, parts of the salient objects are not discriminated from background regions in a saliency map. In this paper, we focus on this drawback and accordingly propose a novel algorithm that more uniformly highlights the entire salient object as compared to many approaches. Our method consists of two stages: boosting the object-level distinctiveness and saliency refinement. In the first stage, a coarse object-level saliency map is generated based on boosting the distinctiveness of the object proposals in the test images, using a set of object-level features and the Modest AdaBoost algorithm. In the second stage, several saliency refinement steps are executed to obtain a final saliency map in which the boundaries of salient objects are preserved. Quantitative and qualitative comparisons with state-of-the-art approaches demonstrate the superior performance of our approach.  相似文献   

20.
Salient object detection is essential for applications, such as image classification, object recognition and image retrieval. In this paper, we design a new approach to detect salient objects from an image by describing what does salient objects and backgrounds look like using statistic of the image. First, we introduce a saliency driven clustering method to reveal distinct visual patterns of images by generating image clusters. The Gaussian Mixture Model (GMM) is applied to represent the statistic of each cluster, which is used to compute the color spatial distribution. Second, three kinds of regional saliency measures, i.e, regional color contrast saliency, regional boundary prior saliency and regional color spatial distribution, are computed and combined. Then, a region selection strategy integrating color contrast prior, boundary prior and visual patterns information of images is presented. The pixels of an image are divided into either potential salient region or background region adaptively based on the combined regional saliency measures. Finally, a Bayesian framework is employed to compute the saliency value for each pixel taking the regional saliency values as priority. Our approach has been extensively evaluated on two popular image databases. Experimental results show that our approach can achieve considerable performance improvement in terms of commonly adopted performance measures in salient object detection.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号