Similar Documents
20 similar documents found (search time: 15 ms)
1.
In this paper, a unified and adaptive web video thumbnail recommendation framework is proposed, which recommends thumbnails for both video owners and browsers on the basis of image quality assessment, image accessibility analysis, video content representativeness analysis and query-sensitive matching. First, video shot detection is performed and the frame with the highest image quality is extracted as the key frame of each shot using our proposed image quality assessment method; these key frames serve as the thumbnail candidates for the subsequent processes. In the image quality assessment, the normalized variance autofocusing function is employed to evaluate image blur, ensuring that the selected thumbnail candidates are clear and of high image quality. For accessibility analysis, color moment, visual salience and texture features are fed to a support vector regression model to predict each candidate's accessibility score, which ensures that the recommended thumbnail's regions of interest are large enough to be easily perceived by users. For content representativeness analysis, the mutual reinforcement algorithm is applied over the entire video to obtain each candidate's representativeness score, which ensures that the final thumbnail is representative enough for users to grasp the main video content at a glance. To account for browsers' query intent, a relevance model is designed to recommend more personalized thumbnails to individual browsers. Finally, the adaptive recommendation is produced by flexibly fusing the above analysis results. Experimental results and subjective evaluations demonstrate the effectiveness of the proposed approach. Compared with existing web video thumbnail generation methods, the thumbnails recommended for video owners not only reflect the video content better but also look more pleasing to users, while the thumbnails recommended for video browsers directly reflect their preferences, which greatly enhances the user experience.
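A minimal sketch of the normalized-variance autofocus measure mentioned above, used to score candidate key frames by sharpness; the shot-level selection loop and the exact normalization are assumptions, not the paper's implementation:

```python
import numpy as np

def normalized_variance(gray: np.ndarray) -> float:
    """Normalized-variance focus measure: intensity variance divided by
    the mean, so the score is less sensitive to overall brightness.
    Higher values indicate sharper (less blurred) frames."""
    mu = gray.mean()
    if mu == 0:
        return 0.0
    return float(((gray - mu) ** 2).mean() / mu)

def pick_key_frame(frames):
    """Return the frame of a shot with the highest focus score."""
    scores = [normalized_variance(f.astype(np.float64)) for f in frames]
    return frames[int(np.argmax(scores))]
```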

2.
Recognizing scene information in images or videos, such as locating objects and answering "Where am I?", has attracted much attention in the computer vision research field. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. More specifically, the movie is first efficiently segmented into video shots and scenes. Secondly, we introduce a novel key-frame extraction method using panoramic frames, and a local feature extraction process is applied to obtain the representative feature patches (RFPs) of each video shot. Thirdly, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are also considered to enhance recognition performance. When the proposed approach is applied to scene recognition in realistic movies, the experimental results show that it achieves satisfactory performance.

3.
A fast compressed-domain video retrieval method based on frame-size fluctuation characteristics   Cited by: 1 (self-citations: 0, others: 1)
To enable fast retrieval of compressed-domain video, a retrieval method based on the fluctuation characteristics of per-frame data sizes is proposed. The method first computes the data size of each frame in the compressed domain, yielding data-size curves for the query clip and for equal-length windows of the target video. With I-frames aligned, the query clip is then slid over the target video with a step of one GOP (group of pictures). After each slide, the difference between the fluctuations of the query and target data-size curves is computed, and the target video's data-size curve is updated. Finally, a similarity decision is made against a preset threshold and the results are returned. The method does not need to extract a high-dimensional feature vector for every frame; a single vector, rather than a set of high-dimensional vectors, describes a whole video segment. Experimental results show that, compared with existing fast retrieval algorithms, the method improves retrieval speed while maintaining high accuracy. Moreover, it can be used both for fast retrieval over compressed-domain video databases and for online video clip matching, detecting videos similar to a given target in real time.
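The sliding comparison of frame-size curves might look like the following sketch; the specific fluctuation distance (zero-mean, unit-norm L2) and the threshold are illustrative assumptions, not the paper's exact measure:

```python
import numpy as np

def fluctuation_distance(query_sizes, window_sizes):
    """Compare the *shape* of two frame-size curves by normalizing each
    to zero mean / unit norm before taking the L2 distance."""
    q = np.asarray(query_sizes, dtype=np.float64)
    w = np.asarray(window_sizes, dtype=np.float64)
    q = (q - q.mean()) / (np.linalg.norm(q - q.mean()) + 1e-9)
    w = (w - w.mean()) / (np.linalg.norm(w - w.mean()) + 1e-9)
    return float(np.linalg.norm(q - w))

def search(query_sizes, target_sizes, gop_len, threshold):
    """Slide the query curve over the target in GOP-length steps
    (I-frame aligned) and report window starts below the threshold."""
    n, m = len(query_sizes), len(target_sizes)
    hits = []
    for start in range(0, m - n + 1, gop_len):
        d = fluctuation_distance(query_sizes, target_sizes[start:start + n])
        if d < threshold:
            hits.append((start, d))
    return hits
```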

4.
Video summarization via exploring the global and local importance   Cited by: 1 (self-citations: 0, others: 1)
Video summarization generates a short video that captures the important or interesting parts of a long video. It reduces the time required to analyze archived video by removing unnecessary video data. This work proposes a novel method that generates dynamic video summaries by fusing global importance and local importance computed from multiple features and image quality. First, the video is split into several suitable clips. Second, frames are extracted from each clip, along with the center region of each frame. Third, for each frame and its center region, global and local importance scores are calculated from a set of features and the image quality. Finally, the global and local importance scores are fused to select an optimal subset of frames for the summary. Extensive experiments demonstrate that the proposed method generates high-quality video summaries.
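A toy sketch of the fusion step, assuming a simple convex combination; the weight `alpha` and the greedy clip selection are illustrative, not the paper's formulation:

```python
import numpy as np

def fuse_importance(global_scores, local_scores, alpha=0.5):
    """Per-frame fused importance as a convex combination of the global
    and local scores; the weight alpha is a hypothetical parameter."""
    g = np.asarray(global_scores, dtype=np.float64)
    loc = np.asarray(local_scores, dtype=np.float64)
    return alpha * g + (1.0 - alpha) * loc

def select_clips(clip_scores, budget):
    """Greedily keep the highest-scoring clips (returned in
    chronological order) until the clip budget is reached."""
    order = np.argsort(clip_scores)[::-1]
    return sorted(order[:budget].tolist())
```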

5.
A video copy detection scheme based on robust hashing is proposed. Feature points are classified and the stable points that persist in the spatio-temporal domain are selected; local features are constructed by differential computation over neighboring points. The multi-dimensional feature data are Hilbert-encoded, and the effective bits are selected as the detection hash code. To locate suspect content in the target video accurately, a hash matching scheme is proposed that uses sequence similarity as the matching criterion, improving matching precision. Experimental results show that the scheme achieves good detection performance and is suitable for video content copy detection.
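The sequence-similarity matching could be sketched as follows, assuming fixed-width binary hash codes compared by Hamming distance (both assumptions; the paper's Hilbert-encoded codes may differ):

```python
def hamming(a: int, b: int, bits: int = 64) -> int:
    """Hamming distance between two fixed-width binary hash codes."""
    return bin((a ^ b) & ((1 << bits) - 1)).count("1")

def sequence_similarity(query_hashes, target_hashes, bits=64):
    """Mean per-frame hash similarity between a query hash sequence and
    an equally long window of the target sequence."""
    assert len(query_hashes) == len(target_hashes)
    total = sum(hamming(q, t, bits) for q, t in zip(query_hashes, target_hashes))
    return 1.0 - total / (bits * len(query_hashes))

def locate(query_hashes, target_hashes, threshold=0.9):
    """Slide the query over the target; return starts of suspect copies."""
    n = len(query_hashes)
    return [s for s in range(len(target_hashes) - n + 1)
            if sequence_similarity(query_hashes, target_hashes[s:s + n]) >= threshold]
```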

6.
In this paper we study effective approaches to creating thumbnails from input images. Since a thumbnail will eventually be presented to and perceived by the human visual system, a thumbnailing algorithm should consider several important issues, including thumbnail scale, object completeness and local structure smoothness. To address these issues, we propose a new thumbnailing framework named scale and object aware thumbnailing (SOAT), which contains two components focusing respectively on saliency measurement and thumbnail warping/cropping. The first component, named scale and object aware saliency (SOAS), models the human perception of thumbnails using visual acuity theory, which takes thumbnail scale into consideration. In addition, the "objectness" measurement (Alexe et al. 2012) is integrated into SOAS to preserve object completeness. The second component uses SOAS to guide the thumbnailing based on either retargeting or cropping. The retargeting version uses thin-plate-spline (TPS) warping to preserve structure smoothness, with an extended seam carving algorithm developed to sample the control points used for TPS model estimation. The cropping version searches for a cropping window that balances spatial efficiency and SOAS-based content preservation. The proposed algorithms were evaluated in three experiments: a quantitative user study of thumbnail browsing efficiency, a quantitative user study of subject preference, and a qualitative study on the RetargetMe dataset. In all studies, SOAT demonstrated promising performance in comparison with state-of-the-art algorithms.

7.
8.
Video copy detection based on local ordinal ranking   Cited by: 2 (self-citations: 0, others: 2)
Ordinal ranking is a commonly used approach to video copy detection. To obtain better detection performance, a video copy detection scheme based on ordinal features is proposed. The scheme partitions each frame into blocks and, following the Hilbert curve order, computes ordinal features from the gray-level relations of adjacent blocks along the curve, generating the hash codes used for detection. To locate suspect content in the target video accurately, a hash matching scheme is proposed that uses sequence similarity as the matching criterion and introduces dynamic programming to improve matching precision. Finally, copy test samples are constructed and the scheme is compared experimentally with the traditional ordinal-signature detection scheme. The results show that the proposed scheme achieves good detection performance and is suitable for video content copy detection.
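A minimal sketch of ordinal hashing over Hilbert-ordered blocks; the grid size and the precomputed `hilbert_order` index list (length grid*grid) are assumptions:

```python
import numpy as np

def block_means(gray, grid=8):
    """Mean intensity of each cell in a grid x grid partition."""
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    return np.array([gray[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean()
                     for i in range(grid) for j in range(grid)])

def rank_hash(gray, hilbert_order):
    """One bit per adjacent pair of blocks along the (precomputed)
    Hilbert curve order: 1 if the gray level increases, else 0."""
    means = block_means(gray)[hilbert_order]
    bits = 0
    for a, b in zip(means, means[1:]):
        bits = (bits << 1) | int(b > a)
    return bits
```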

9.
An efficient video retrieval system is essential for finding relevant video content in a large set of heterogeneous video clips. In this paper, we introduce a content-based video matching system that finds the most relevant video segments in a video database for a given query video clip. Finding relevant video clips is not a trivial task, because objects in a video clip constantly move over time. To perform this task efficiently, we propose a novel video matching method called Spatio-Temporal Pyramid Matching (STPM). Considering object features in 2D space and time, STPM recursively divides a video clip into a 3D spatio-temporal pyramid and compares the features at different resolutions. To improve retrieval performance, we consider both static and dynamic object features. We also provide a sufficient condition under which the matching gains additional benefit from temporal information. The experimental results show that STPM performs better than other video matching methods.
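A toy illustration of pyramid matching over a 3D spatio-temporal volume, using histogram intersection with level-dependent weights in the manner of pyramid match kernels; the dense feature grids and the weighting scheme are assumptions:

```python
import numpy as np

def cell_sums(feat, k):
    """Sum features inside each of the k x k x k spatio-temporal cells
    of a dense (T, H, W) feature grid."""
    t, h, wdt = feat.shape
    out = np.zeros((k, k, k))
    for i in range(k):
        for j in range(k):
            for m in range(k):
                out[i, j, m] = feat[i*t//k:(i+1)*t//k,
                                    j*h//k:(j+1)*h//k,
                                    m*wdt//k:(m+1)*wdt//k].sum()
    return out.ravel()

def pyramid_match(feat_a, feat_b, levels=3):
    """Match two (T, H, W) feature grids: at level l the volume is cut
    into 2^l bins per axis; matched mass uses histogram intersection,
    and finer levels receive higher weight."""
    score = 0.0
    for l in range(levels):
        k = 2 ** l
        weight = 1.0 / (2 ** (levels - l))     # coarse levels weigh less
        ha, hb = cell_sums(feat_a, k), cell_sums(feat_b, k)
        score += weight * np.minimum(ha, hb).sum()
    return score
```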

10.
Objective: Deepfake is an emerging technique that uses deep learning to tamper with images and videos, and face-video forgeries in particular pose a great threat to society and individuals. Detection methods that exploit temporal or multi-frame information are still at an early research stage, and existing work often ignores how frames are extracted from a video and what that means for detection effectiveness and efficiency. For face-swap forgery videos, we propose an efficient detection framework that extracts per-frame features from multiple key frames and models inter-frame interaction. Method: A number of key frames are extracted directly from the video stream, avoiding inter-frame decoding; a convolutional neural network maps each single-frame face image into a unified feature space; multiple self-attention-based encoder layers with linear and nonlinear transformations let each frame's features aggregate information from the other frames for learning and updating, extracting the anomalies of tampered frames in the feature space; and an additional indicator token aggregates global information to make the final detection decision. Results: The proposed framework achieves detection accuracy above 96.79% on the three face-swap datasets of FaceForensics++ and 99.61% on the Celeb-DF dataset. Comparative experiments on detection time also confirm the efficiency gain of using key frames as samples and the efficiency of the proposed framework. Conclusion: The proposed detection framework for face-swap forgery videos reduces the computational cost and time consumption of video-level detection by extracting key frames, and uses convolutional…
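A PyTorch sketch of the described architecture: a small stand-in CNN per key frame, a self-attention encoder across frame tokens, and a learnable indicator token (CLS-style) that aggregates global evidence for the real/fake decision; all layer sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class KeyFrameDetector(nn.Module):
    """Per-frame CNN features -> inter-frame self-attention ->
    indicator-token classification (real vs. fake)."""
    def __init__(self, dim=256, heads=4, layers=4):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in CNN per frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # indicator token
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 2)

    def forward(self, clips):                     # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        tokens = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        tokens = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        return self.head(self.encoder(tokens)[:, 0])  # decide on indicator
```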

11.
As mobile devices have become part of everyday life, mobile social apps are widely used, and they often rely on large numbers of images, such as WeChat Moments and photo sharing on Instagram. Browsing images in an app consumes considerable network traffic and slows loading, so most apps first display a thumbnail and load the original image only on user demand. On the server side, caching is used to speed up thumbnail generation and reduce disk I/O. However, current cache mechanisms mostly consider factors such as access frequency and recency; they pay little attention to the social relationships between the users who generate the data, or to mobile users' different access patterns for thumbnails versus original images. This paper divides the cache into two regions, a thumbnail region and an original-image region, and proposes a social-relationship-based image cache replacement algorithm: on top of traditional replacement policies it adds users' social relationships and the association between a thumbnail and its original image, and performs replacement according to a computed cache value for each image. Experiments show that the proposed algorithm clearly improves the cache hit rate for both thumbnails and original images.
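A minimal sketch of a cache value that mixes frequency, recency and social closeness; the weights and the linear form are illustrative assumptions, since the paper's exact formula is not given here:

```python
import time

def cache_value(freq, last_access, social_closeness,
                w_freq=0.4, w_recency=0.3, w_social=0.3):
    """Hypothetical cache value combining classic frequency/recency
    terms with the social closeness between the image owner and the
    requesting user; weights are illustrative."""
    recency = 1.0 / (1.0 + time.time() - last_access)
    return w_freq * freq + w_recency * recency + w_social * social_closeness

def evict(cache):
    """Evict the entry with the lowest cache value; `cache` maps
    key -> (freq, last_access, social_closeness)."""
    victim = min(cache, key=lambda k: cache_value(*cache[k]))
    del cache[victim]
    return victim
```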

12.
We present a method for the efficient retrieval and browsing of immense amounts of realistic 3D human body motion capture data. The proposed method organizes motion capture data based on statistical K-means (SK-means), democratic decision making, unsupervised learning, and visual key frame extraction, achieving intuitive retrieval by browsing thumbnails of semantic key frames. Retrieval proceeds in three steps. The first is obtaining the basic type clusters by clustering the motion capture data with the novel SK-means algorithm and then immediately performing character matching. The second is learning users' retrieval behavior during the retrieval process and updating the successful retrieval rate of each data item; the search results are then ranked by successful retrieval rate through democratic decision making to improve accuracy. The last step is generating thumbnails with semantic generalization, using a novel key frame extraction algorithm based on visualized data analysis. The experiments demonstrate that this method can be used for the efficient organization and retrieval of enormous amounts of motion capture data.

13.
Query by video clip   Cited by: 15 (self-citations: 0, others: 15)
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes. (i) Retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from the video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video; retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as queries and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.
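Scheme (ii) might be sketched as below, assuming RGB frames and histogram-intersection similarity; the sub-sampling stride and the choice of color histograms as the only feature are illustrative:

```python
import numpy as np

def color_hist(frame, bins=16):
    """Concatenated per-channel color histograms, L1-normalized."""
    hs = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
          for c in range(3)]
    h = np.concatenate(hs).astype(np.float64)
    return h / (h.sum() + 1e-9)

def clip_similarity(query_frames, db_frames, step=10):
    """Uniformly sub-sample both clips and average histogram-
    intersection similarity over aligned sub-sampled frames."""
    q = [color_hist(f) for f in query_frames[::step]]
    d = [color_hist(f) for f in db_frames[::step]]
    n = min(len(q), len(d))
    if n == 0:
        return 0.0
    return float(np.mean([np.minimum(q[i], d[i]).sum() for i in range(n)]))
```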

14.
Key frame extraction is a prerequisite for video content analysis. Existing key frame extraction algorithms usually require substantial computation, which does not suit the processing of massive video data. Targeting the monitoring of Internet data streams, this paper analyzes the characteristics of MPEG compressed video streams and proposes a new fast key frame extraction method. The method takes into account both the coverage of the extracted key frames and the needs of video dynamics detection: multiple segments of key frames are extracted according to the video length, the head frame of each segment is located by feedback, and frames within a segment are sampled at a sparse stride. Detection experiments on a video library and on the network traffic of an IDC machine room show that the proposed method is fast and effective, and can be applied to video monitoring on high-speed networks.
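A sketch of the segment-based sampling under stated assumptions: equal-length segments and a fixed sparse stride, with the paper's feedback-based head-frame positioning simplified to taking each segment's first frame:

```python
def fast_key_frames(num_frames, num_segments=5, sparse_step=30):
    """Split the stream into equal segments, take each segment's head
    frame, then sample inside the segment at a sparse stride; the
    stride and segment count are illustrative."""
    seg_len = max(1, num_frames // num_segments)
    picks = []
    for s in range(0, num_frames, seg_len):
        picks.append(s)                                   # segment head frame
        picks.extend(range(s + sparse_step,
                           min(s + seg_len, num_frames),
                           sparse_step))                  # sparse in-segment picks
    return picks
```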

15.
石念峰  侯小静  张平 《计算机应用》2017,37(9):2605-2609
To improve the motion expressiveness and compression ratio of key frames extracted from sports videos, a key frame extraction technique combining flexible pose estimation and spatio-temporal feature embedding is proposed. First, exploiting the temporal continuity of human actions, a spatio-temporally constrained flexible mixture-of-parts articulated human model (ST-FMP) is built, and with continuity constraints on uncertain body parts, the N-best algorithm estimates the human pose parameters in single frames. Next, human motion features are described by the relative positions and movement directions of body parts, and the Laplacian score is used for dimensionality reduction, producing discriminative motion feature vectors with strong local topological expressiveness. Finally, the Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm determines the key frames dynamically. In key frame extraction experiments on aerobics videos, the ST-FMP model improves the recognition accuracy of uncertain body parts by about 15 percentage points over the flexible mixture-of-parts model (FMP) and achieves 81% key frame extraction accuracy, outperforming KFE and motion-block-based key frame algorithms. The proposed algorithm is sensitive to human motion features and poses, making it suitable for annotating and reviewing sports videos.

16.
Objective: Compared with static facial expression image recognition, the expression intensity varies considerably across the frames of a video sequence, and many frames show a neutral expression, yet existing models cannot assign appropriate weights to the individual frames of a sequence. To fully exploit the spatio-temporal information in video sequences and the differing contributions of individual frames to video expression recognition, this paper proposes a Transformer-based video sequence expression recognition method. Method: First, a video sequence is divided into short clips with a fixed number of frames, and a deep residual network learns high-level facial expression features for each frame in a clip, producing a fixed-dimensional spatial feature sequence for the clip. Then, a suitably designed long short-term memory network (LSTM) and a Transformer model learn high-level temporal features and attention features, respectively, from this spatial feature sequence; the two are concatenated and fed into a fully connected layer that outputs the clip's expression classification scores. Finally, max pooling over the scores of all clips of a video yields the video's final expression classification. Results: Experimental results on the public BAUM-1s (Bahcesehir University multimodal) and RML (Ryerson Multimedia Lab) video emotion datasets show that the…
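A PyTorch sketch of the clip-level fusion described above, with an LSTM branch, a Transformer encoder branch, concatenation into a classifier, and max pooling over clip scores; all dimensions and the 7-class head are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ClipExpressionHead(nn.Module):
    """LSTM and Transformer branches over per-frame spatial features,
    concatenated and classified per clip."""
    def __init__(self, feat_dim=512, hidden=256, heads=4, classes=7):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        enc = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=heads,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.fc = nn.Linear(hidden + feat_dim, classes)

    def forward(self, feats):                 # feats: (B, T, feat_dim)
        temporal, _ = self.lstm(feats)        # (B, T, hidden)
        attn = self.encoder(feats)            # (B, T, feat_dim)
        fused = torch.cat([temporal[:, -1], attn.mean(dim=1)], dim=-1)
        return self.fc(fused)                 # clip-level class scores

def video_scores(clip_scores):
    """Max-pool expression scores over all clips of one video."""
    return torch.stack(clip_scores).max(dim=0).values
```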

17.
Key frame extraction is an important step in content-based video retrieval. To extract key frames effectively from different types of video, a particle swarm optimization based key frame extraction algorithm is proposed. The method first extracts the global motion and local motion features of each frame, and then adaptively extracts key frames with a particle swarm algorithm. Experimental results show that the key frames extracted by this algorithm are well representative across different types of video.
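A toy PSO for key frame selection; the coverage-based fitness stands in for the paper's global/local motion criterion, and all PSO hyperparameters are illustrative:

```python
import numpy as np

def pso_select(frame_feats, k=5, particles=20, iters=50,
               w=0.7, c1=1.4, c2=1.4):
    """Each particle encodes k candidate key-frame indices as continuous
    positions in [0, n); fitness rewards selections whose features cover
    the whole video well (a stand-in criterion)."""
    rng = np.random.default_rng(0)
    n = len(frame_feats)

    def fitness(pos):
        idx = np.clip(pos.astype(int), 0, n - 1)
        sel = frame_feats[idx]                         # (k, d)
        # negated mean distance from each frame to its nearest selection,
        # so better coverage -> higher fitness
        d = np.linalg.norm(frame_feats[:, None] - sel[None], axis=-1)
        return -d.min(axis=1).mean()

    x = rng.uniform(0, n, (particles, k))
    v = np.zeros_like(x)
    pbest, pfit = x.copy(), np.array([fitness(p) for p in x])
    g = pbest[pfit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, 0, n - 1)
        fit = np.array([fitness(p) for p in x])
        better = fit > pfit
        pbest[better], pfit[better] = x[better], fit[better]
        g = pbest[pfit.argmax()].copy()
    return sorted(set(np.clip(g.astype(int), 0, n - 1).tolist()))
```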

18.
Motion, as a feature of video that changes in temporal sequences, is crucial to visual understanding. Powerful video representation and extraction models are typically able to focus attention on motion features in challenging dynamic environments to complete more complex video understanding tasks. However, previous approaches discriminate mainly based on similar features in the spatial or temporal domain, ignoring the interdependence of consecutive video frames. In this paper, we propose the motion sensitive self-supervised collaborative network, a video representation learning framework that exploits a pretext task to assist feature comparison and strengthen the spatiotemporal discrimination power of the model. Specifically, we first propose the motion-aware module, which extracts consecutive motion features from the spatial regions by frame difference. The global–local contrastive module is then introduced, with context and enhanced video snippets defined as appropriate positive samples for a broader feature similarity comparison. Finally, we introduce the snippet operation prediction module, which further assists contrastive learning to obtain more reliable global semantics by sensing changes in continuous frame features. Experimental results demonstrate that our work effectively extracts robust motion features and achieves competitive performance compared with other state-of-the-art self-supervised methods on downstream action recognition and video retrieval tasks.
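A minimal numpy sketch of a frame-difference motion cue in the spirit of the motion-aware module; the pooling grid and grayscale input are assumptions:

```python
import numpy as np

def motion_aware(frames, grid=4):
    """Absolute difference of consecutive frames, average-pooled over a
    coarse grid to give one motion descriptor per time step."""
    diffs = np.abs(np.diff(np.asarray(frames, dtype=np.float64), axis=0))
    t, h, w = diffs.shape[:3]
    feats = np.zeros((t, grid * grid))
    for i in range(grid):
        for j in range(grid):
            cell = diffs[:, i*h//grid:(i+1)*h//grid,
                            j*w//grid:(j+1)*w//grid]
            feats[:, i * grid + j] = cell.reshape(t, -1).mean(axis=1)
    return feats
```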

19.
Current video analysis is usually based on video frames, but frames are typically highly redundant, so key frame extraction is essential. Traditional hand-crafted extraction methods often miss frames or retain redundant ones. With the development of deep learning, deep convolutional networks can extract image features far better than traditional hand-crafted methods. This paper therefore proposes extracting key frames by combining the deep features extracted by a deep convolutional network with the hand-crafted features extracted by traditional methods: a convolutional neural network first extracts deep features from the video frames, content features are then extracted with a traditional hand-crafted method, and finally the content features and deep features are fused to extract the key frames. Experimental results show that the proposed method outperforms previous key frame extraction methods.
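A minimal sketch of fusing L2-normalized deep and hand-crafted features, followed by a simple distance-threshold key frame pass; both the fusion rule and the selection rule are assumptions:

```python
import numpy as np

def fuse_features(deep_feats, hand_feats):
    """L2-normalize each modality before concatenating, so neither the
    CNN features nor the hand-crafted ones dominate the distance."""
    d = deep_feats / (np.linalg.norm(deep_feats, axis=1, keepdims=True) + 1e-9)
    h = hand_feats / (np.linalg.norm(hand_feats, axis=1, keepdims=True) + 1e-9)
    return np.hstack([d, h])

def key_frames_by_distance(fused, threshold=0.5):
    """Keep a frame whenever it is sufficiently far from the last kept
    key frame in the fused feature space."""
    keep = [0]
    for i in range(1, len(fused)):
        if np.linalg.norm(fused[i] - fused[keep[-1]]) > threshold:
            keep.append(i)
    return keep
```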

20.
Based on the current state of research and the characteristics of home videos, a video summarization method for home videos based on scene representative frames is proposed. The representative frame of each scene is obtained by a frame extraction method. Finally, a content-based home video summarization system is presented.
