20 similar documents were retrieved (search time: 31 ms)
1.
A novel on-line video object segmentation scheme based on illumination-invariant color-texture feature extraction and marker prediction is proposed in this paper. First, the location of the object of interest is initialized from user-specified markers. Superpixels are then generated in the next available frame of the input video to extract the illumination-invariant color-texture features of the object of interest. The proposed marker prediction scheme estimates the user-specified markers and locates the object of interest in the next frame through three steps: superpixel motion prediction using illumination-invariant optical flow, marker superpixel candidate generation using short-term superpixel affinity, and maximum-likelihood computation using long-term superpixel affinity. The experimental results obtained when the proposed method is applied to several challenging video clips demonstrate that the proposed approach is competitive with several other state-of-the-art methods, especially when the illumination and object motion change dramatically.
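The abstract does not give implementation details, but the motion-prediction step can be pictured as follows. This is a minimal sketch, assuming per-pixel superpixel labels, a dense optical-flow field, and the set of marked superpixel ids are already available; all names are illustrative, and the illumination-invariant features and affinity terms are omitted.

```python
import numpy as np

def predict_marker_positions(labels_t, flow_t, marker_ids):
    """Propagate each marked superpixel's centroid along its mean optical flow.

    labels_t: (H, W) integer superpixel labels for frame t.
    flow_t:   (H, W, 2) dense flow from frame t to t+1, stored as (dx, dy).
    marker_ids: iterable of superpixel ids currently marked as "object".
    """
    predictions = {}
    for sp in marker_ids:
        ys, xs = np.nonzero(labels_t == sp)        # pixels of this superpixel
        mean_flow = flow_t[ys, xs].mean(axis=0)    # average (dx, dy) inside it
        centroid = np.array([xs.mean(), ys.mean()])
        predictions[sp] = centroid + mean_flow     # predicted centroid in frame t+1
    return predictions
```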
2.
This paper presents a novel key-frame selection method for video summarization based on multidimensional time-series analysis. In the proposed scheme, the given video is first segmented into a set of sequential clips, each containing a number of similar frames. The key frames are then selected by a clustering procedure as the frames closest to the cluster centres in each resulting video clip. The proposed algorithm is evaluated experimentally on a wide range of test data and compared with state-of-the-art approaches in the literature; the results demonstrate excellent performance, outperforming existing frame selection methods in terms of a fidelity-based metric and subjective perception.
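As a rough illustration of this clip-then-cluster idea (not the paper's exact features or distance measure), the sketch below clusters per-frame feature vectors within a clip and keeps the frame closest to each cluster centre; the feature choice and parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def key_frames_for_clip(frame_features, n_keys=3, seed=0):
    """frame_features: (n_frames, d) array of per-frame features (e.g. colour histograms).
    Returns indices of the frames closest to the cluster centres."""
    n_keys = min(n_keys, len(frame_features))
    km = KMeans(n_clusters=n_keys, n_init=10, random_state=seed).fit(frame_features)
    key_idx = []
    for c in km.cluster_centers_:
        dists = np.linalg.norm(frame_features - c, axis=1)
        key_idx.append(int(np.argmin(dists)))      # frame nearest to this centre
    return sorted(set(key_idx))

# Toy usage: 100 frames with 32-bin histogram features
rng = np.random.default_rng(0)
features = rng.random((100, 32))
print(key_frames_for_clip(features, n_keys=4))
```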
3.
4.
5.
O. Le Meur, A. Ninassi, P. Le Callet, D. Barba, Signal Processing: Image Communication, 2010, 25(8): 597-609
The deployment of visual attention in a visual scene depends on a number of factors. This paper investigates the relationship between the observer's attention and the visual quality of the scene: can a video artifact disturb the observer's attention? To answer this question, two experiments were conducted. First, the eye movements of human observers were recorded while they watched ten video clips of natural scenes under a free-viewing task. These clips were impaired to varying degrees by an H.264/AVC video encoder. The second experiment relied on subjective rating of the quality of the video clips: a quality score was assigned to each clip, indicating the extent to which the impairments were visible. The standardized double stimulus impairment scale (DSIS) method was used, meaning that each observer viewed the original clip followed by its impaired version. The results of both experiments were analyzed jointly. Our results suggest that video artifacts have no influence on the deployment of visual attention, even though observers judged these artifacts to be at least annoying.
6.
We address the problem of learning representations from videos without manual annotation. Different video clips sampled from the same video usually have a similar background and consistent motion. A novel self-supervised task is designed to learn such temporal coherence, which is measured by mutual information in our work. First, we maximize the mutual information between features extracted from clips sampled from the same video. This encourages the network to learn the content shared by these clips. As a result, the network may focus on the background and ignore the motion, because different clips from the same video normally have the same background. Second, to address this issue, we simultaneously maximize the mutual information between the feature of the video clip and the local regions where salient motion exists. Our approach, referred to as Deep Video Infomax (DVIM), strikes a balance between the background and the motion when learning temporal coherence. We conduct extensive experiments to test the performance of the proposed DVIM on various tasks. Fine-tuning results on high-level action recognition validate the effectiveness of the learned representations, and additional experiments on action similarity labeling demonstrate their generalization.
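The abstract does not spell out the estimator; the fragment below is only an InfoNCE-style sketch of the first term (agreement between two clips sampled from the same video), a common lower bound on mutual information. The encoder, batch layout, and temperature are assumptions, and the motion-region term is omitted.

```python
import torch
import torch.nn.functional as F

def clip_agreement_loss(encoder, clips_a, clips_b, temperature=0.1):
    """clips_a[i] and clips_b[i] are two clips sampled from the same video.
    Each clip must identify its sibling among the batch (InfoNCE), which
    lower-bounds the mutual information between their features."""
    za = F.normalize(encoder(clips_a), dim=1)       # (B, D) features of first clips
    zb = F.normalize(encoder(clips_b), dim=1)       # (B, D) features of sibling clips
    logits = za @ zb.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, targets)         # diagonal entries are positives
```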
7.
G. V. Varatkar, R. Marculescu, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2004, 12(1): 108-119
The objective of this paper is to introduce self-similarity as a fundamental property exhibited by the bursty traffic between on-chip modules in typical MPEG-2 video applications. Statistical tests performed on relevant traces extracted from common video clips establish unequivocally the existence of self-similarity in video traffic. Using a generic tile-based communication architecture, we discuss the implications of our findings for on-chip buffer space allocation and present quantitative evaluations for typical video streams. We also describe a technique for synthetically generating traces with statistical properties similar to those obtained from real video clips. The proposed technique speeds up buffer simulations and allows media system designers to explore architectures rapidly and use large media benchmarks more efficiently. We believe that our findings open new directions of research with deep implications for some fundamental issues in on-chip network design for multimedia applications.
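One common way to check a trace for self-similarity (not necessarily the specific statistical tests used in the paper) is the aggregated-variance estimate of the Hurst exponent. The sketch below assumes the trace is a sequence of per-frame byte counts; the example data are synthetic.

```python
import numpy as np

def hurst_aggregated_variance(trace, block_sizes=None):
    """Aggregated-variance method: for self-similar traffic Var(X^(m)) ~ m^(2H-2),
    so the slope of log Var versus log m gives 2H - 2."""
    x = np.asarray(trace, dtype=float)
    n = len(x)
    if block_sizes is None:
        block_sizes = np.unique(np.logspace(0, np.log10(n // 10), 20).astype(int))
    log_m, log_var = [], []
    for m in block_sizes:
        k = n // m
        if k < 2:
            continue
        agg = x[: k * m].reshape(k, m).mean(axis=1)   # aggregate over blocks of size m
        log_m.append(np.log(m))
        log_var.append(np.log(agg.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0   # H clearly above 0.5 indicates self-similarity

# Example with synthetic (hypothetical) bytes-per-frame counts
rng = np.random.default_rng(0)
trace = rng.lognormal(mean=8.0, sigma=1.0, size=4096)
print(f"Estimated Hurst exponent: {hurst_aggregated_variance(trace):.2f}")
```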
8.
9.
10.
A video signature is a set of feature vectors that compactly represents one video clip and uniquely distinguishes it from others for fast matching. To find a short duplicated region, the video signature must be robust against common video modifications and have high discriminability, and the matching method must be fast and successful at localizing matches. In this paper, a frame-based video signature that uses spatial information and a two-stage matching method is presented. The proposed signature is pair-wise independent and robust against common video modifications, and the proposed two-stage matching method is fast and works very well at finding locations. In addition, the proposed matching structure and strategy can handle the case in which a part of the query video matches a part of the target video. The proposed method is verified using video modified under the VCE7 experimental conditions of MPEG-7. The proposed video signature method achieves a robustness of 88.7% under an independence condition of 5 parts per million with over 1,000 clips being matched per second.
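The exact signature and matching rules are defined in the paper; as a generic illustration of a two-stage scheme, the sketch below first prunes target offsets with sparsely sampled frame comparisons and then verifies the survivors frame by frame. The Hamming distance, thresholds, and sampling step are assumptions.

```python
import numpy as np

def hamming(a, b):
    """Bitwise Hamming distance between two binary signature vectors."""
    return int(np.count_nonzero(a != b))

def two_stage_match(query_sigs, target_sigs, coarse_step=8,
                    coarse_thresh=40, fine_thresh=25):
    """Stage 1: compare sparsely sampled query frames against every target offset
    to prune candidates cheaply. Stage 2: verify survivors frame by frame."""
    q_len, t_len = len(query_sigs), len(target_sigs)
    candidates = []
    for offset in range(t_len - q_len + 1):
        d = np.mean([hamming(query_sigs[i], target_sigs[offset + i])
                     for i in range(0, q_len, coarse_step)])
        if d < coarse_thresh:
            candidates.append(offset)
    matches = []
    for offset in candidates:
        d = np.mean([hamming(query_sigs[i], target_sigs[offset + i])
                     for i in range(q_len)])
        if d < fine_thresh:
            matches.append((offset, d))   # (location in target, average distance)
    return matches
```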
11.
A video matching method based on the "bag of words" model
A method for modeling and matching video content with the "bag of words" model is proposed. A visual-word dictionary is built by quantizing the local features of video frames, and each sub-shot of a video is represented as a set of visual words. On this basis, an inverted index of visual-word phrases over sub-shots is constructed for matching and retrieving video segments. This approach preserves the saliency of the local features and their relative spatial relationships, effectively compresses the video representation, and accelerates the matching and retrieval process. Experimental results show that, compared with existing methods, the "bag of words" based video matching method achieves higher retrieval accuracy and speed on a large video corpus.
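As a bare-bones illustration of the inverted-index idea (ignoring the visual-word-phrase and spatial-relation aspects described above), the sketch below assumes local descriptors have already been quantized into visual-word ids; the names and voting score are illustrative.

```python
from collections import defaultdict

def build_inverted_index(subshot_words):
    """subshot_words: dict mapping sub-shot id -> list of visual-word ids."""
    index = defaultdict(set)
    for shot_id, words in subshot_words.items():
        for w in words:
            index[w].add(shot_id)
    return index

def match_query(query_words, index):
    """Score candidate sub-shots by how many query visual words they share."""
    votes = defaultdict(int)
    for w in set(query_words):
        for shot_id in index.get(w, ()):
            votes[shot_id] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

# Usage with toy data
database = {"clip1_shot3": [12, 7, 91, 7], "clip2_shot1": [5, 12, 44]}
index = build_inverted_index(database)
print(match_query([12, 91, 99], index))   # clip1_shot3 ranks first
```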
12.
13.
A video genre classification method based on rough set theory
Video classification provides an effective means of managing and exploiting video data. Existing video genre classification methods tend to use as many features as possible to represent the video content and guarantee classification performance, but extracting these features is usually costly, so feature selection deserves consideration in genre classification. A rough-set-based method is proposed for video feature selection and genre classification: by analyzing the features used in the related literature, a set of effective features is extracted as the basis for classification; a heuristic search is used to find an optimal reduct, thereby achieving feature selection; and classification rules derived from the reduct determine the genre label. Comparisons with the classification results of existing methods and with a decision-tree approach demonstrate the effectiveness of the proposed method.
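The heuristic reduct search can be pictured with a toy decision table. The sketch below greedily adds the condition attribute that most increases the rough-set dependency (positive-region fraction) until the full dependency is reached; this is a generic greedy reduct, not necessarily the heuristic used in the paper.

```python
def gamma(rows, attrs):
    """Dependency of the decision on the attribute subset `attrs`: the fraction of
    rows whose equivalence class (w.r.t. attrs) has a single decision label."""
    classes = {}
    for cond, dec in rows:
        key = tuple(cond[a] for a in attrs)
        classes.setdefault(key, set()).add(dec)
    pos = sum(1 for cond, dec in rows
              if len(classes[tuple(cond[a] for a in attrs)]) == 1)
    return pos / len(rows)

def greedy_reduct(rows, n_attrs):
    """Add the attribute that most increases dependency until it matches gamma(all)."""
    full = gamma(rows, range(n_attrs))
    reduct = []
    while gamma(rows, reduct) < full:
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: gamma(rows, reduct + [a]))
        reduct.append(best)
    return reduct

# Toy decision table: (discrete feature values, genre label)
rows = [((0, 1, 0), "news"), ((0, 1, 1), "news"),
        ((1, 0, 0), "sport"), ((1, 1, 0), "sport")]
print(greedy_reduct(rows, n_attrs=3))   # -> [0]: the first attribute alone suffices
```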
14.
A new shape-sequence-based algorithm that can effectively generate key images from video clips is introduced. The generated key images can be used as feature information for browsing and retrieving video clips from a multimedia database. Experiments with MPEG-7 data sets were performed, and the results were compared with those of existing methods.
15.
Combining the differing importance of individual frames with the principles of video data partitioning, a frame-hierarchy-based scheme for partitioning a video bitstream is proposed. Unequal error protection simulations were performed on the resulting classes of information. The simulation results show that, after unequal error protection, the proposed hierarchy-based partitioning yields better video reconstruction quality than the traditional frame-based partitioning.
16.
17.
A video bitstream partitioning scheme based on the significance of numerical bits (high-order versus low-order bits) is proposed. According to the position and importance of the information in the bitstream, the video sequence is classified, and unequal error protection simulations are performed on the resulting classes of information. The simulations show that, after unequal error protection, the bit-significance-based partitioning achieves better video reconstruction quality than both the hierarchy-based partitioning and the traditional frame-based partitioning.
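A crude way to picture bit-significance partitioning (the paper's exact partitioning of the coded stream is not reproduced here): split each byte into high-order and low-order bits so the two classes can receive different levels of error protection. The split point and protection policy below are assumptions.

```python
import numpy as np

def partition_by_bit_significance(byte_stream):
    """Split a coded stream into a high-nibble class and a low-nibble class."""
    data = np.frombuffer(byte_stream, dtype=np.uint8)
    high = data >> 4          # high-order bits: candidates for strong FEC protection
    low = data & 0x0F         # low-order bits: weak or no protection
    return high, low

def reassemble(high, low):
    """Recombine the two classes into the original byte stream."""
    return ((high << 4) | low).astype(np.uint8).tobytes()

stream = bytes(range(16))
h, l = partition_by_bit_significance(stream)
assert reassemble(h, l) == stream   # lossless split/merge
```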
18.
Ying Chen, J. A. Zhang, A. D. S. Jayalath, IEEE Transactions on Wireless Communications, 2010, 9(2): 523-527
We propose an efficient, low-complexity scheme for estimating and compensating clipping noise in OFDMA systems. Conventional clipping noise estimation schemes, which need all demodulated data symbols, may become infeasible in OFDMA systems, where a specific user may only know its own modulation scheme. The proposed scheme first uses the equalized output to identify a limited number of candidate clips, and then exploits the information on known subcarriers to reconstruct the clipped signal. Simulation results show that the proposed scheme can significantly improve system performance.
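As a toy illustration of the setting (not the proposed estimation algorithm), the snippet below clips an OFDM time-domain signal and flags candidate clip positions as samples whose magnitude is near the clipping level; the signal parameters and thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256                                            # number of subcarriers
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # QPSK symbols
x = np.fft.ifft(X) * np.sqrt(N)                    # time-domain OFDM signal

A = 1.2 * np.sqrt(np.mean(np.abs(x) ** 2))         # clipping level (~1.2x RMS)
clipped = np.where(np.abs(x) > A, A * np.exp(1j * np.angle(x)), x)   # envelope limiter

# Candidate clips: samples whose magnitude sits near the clipping level
candidates = np.nonzero(np.abs(clipped) > 0.98 * A)[0]
print(f"{len(candidates)} candidate clip positions out of {N} samples")
```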
19.
20.
In this paper, a parametric model is proposed that estimates the perceived quality of video coded with different codecs, at any bit rate and display format. The video quality metric used is one of the Full Reference models standardized in Recommendations ITU-T J.144 and ITU-R BT.1683. The proposed model is based on the video quality estimation described in Recommendation ITU-T G.1070, but incorporates several enhancements that allow a much better estimation of the perceptual MOS values, especially at low bit rates. The error of the proposed model with respect to the ITU models lies within the error margins of the ITU algorithms, according to the subjective tests developed by the VQEG. The study covered more than 1,500 processed video clips, coded in MPEG-2 and H.264/AVC, at bit rates from 50 kb/s to 12 Mb/s, in SD, VGA, CIF and QCIF display formats.
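As a purely generic illustration of what a parametric bit-rate-to-MOS curve looks like (not the paper's model and not the actual ITU-T G.1070 equations), the snippet below uses a saturating exponential with placeholder coefficients; per-codec and per-format fitting, as implied by the abstract, is not shown.

```python
import numpy as np

def estimated_mos(bitrate_kbps, mos_max=4.5, tau=900.0):
    """MOS rises from 1 toward mos_max as the bit rate increases; tau sets how fast.
    mos_max and tau are placeholder values, not fitted coefficients."""
    return 1.0 + (mos_max - 1.0) * (1.0 - np.exp(-bitrate_kbps / tau))

for r in (50, 500, 2000, 12000):
    print(f"{r:>6} kb/s -> predicted MOS {estimated_mos(r):.2f}")
```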