Similar Documents
10 similar documents found.
1.
Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access. In this paper, we introduce InsightVideo, a video analysis and retrieval system that combines a video content hierarchy with hierarchical browsing and retrieval for efficient video access. We propose several video processing techniques to organize the content hierarchy of the video. We first apply a camera motion classification and key-frame extraction strategy that operates in the compressed domain to extract video features. Then, shot grouping, scene detection and pairwise scene clustering strategies are applied to construct the video content hierarchy. We introduce a video similarity evaluation scheme at different levels (key-frame, shot, group, scene, and video). By integrating the video content hierarchy and the video similarity evaluation scheme, hierarchical video browsing and retrieval are seamlessly combined for efficient content access. We construct a progressive video retrieval scheme to refine user queries through the interaction of browsing and retrieval. Experimental results and comparisons of camera motion classification, key-frame extraction, scene detection, and video retrieval are presented to validate the effectiveness and efficiency of the proposed algorithms and the performance of the system.
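The multi-level similarity scheme can be illustrated with a minimal sketch (not the authors' implementation): key-frame features are compared by cosine similarity, shots by their best-matching key-frames, and videos by aggregating the best shot matches. The feature dimensionality, the max/mean aggregation, and the equal level weights below are assumptions for illustration.

```python
import numpy as np

def keyframe_sim(f1, f2):
    """Cosine similarity between two key-frame feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-9))

def shot_sim(shot_a, shot_b):
    """Similarity of two shots: best-matching pair of their key-frames."""
    return max(keyframe_sim(a, b) for a in shot_a for b in shot_b)

def video_sim(video_a, video_b, level_weights=(0.5, 0.5)):
    """Video-level similarity: mean of the best shot matches in both directions.
    Aggregation rule and weights are illustrative assumptions."""
    a_to_b = np.mean([max(shot_sim(sa, sb) for sb in video_b) for sa in video_a])
    b_to_a = np.mean([max(shot_sim(sb, sa) for sa in video_a) for sb in video_b])
    return level_weights[0] * a_to_b + level_weights[1] * b_to_a

# Toy example: two videos, each a list of shots, each shot a list of key-frame features.
rng = np.random.default_rng(0)
video_a = [[rng.random(64) for _ in range(3)] for _ in range(4)]
video_b = [[rng.random(64) for _ in range(2)] for _ in range(5)]
print(f"video similarity: {video_sim(video_a, video_b):.3f}")
```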

2.
We describe a system for browsing and interactively retrieving video over the Internet at multiple spatial and temporal resolutions. The VideoZoom system enables users to start with coarse, low-resolution views of the sequences and selectively zoom in, in space and time. VideoZoom decomposes the video sequences into a hierarchy of view elements, which are retrieved in a progressive fashion. The client browser incrementally builds the views by retrieving, caching, and assembling the view elements as needed. By integrating browsing and retrieval into a single progressive retrieval paradigm, VideoZoom provides a new and useful system for accessing video over the Internet. VideoZoom is suitable for digital video libraries and a number of other applications in which streaming methods provide insufficient video quality, video downloading introduces large latencies, and generating video summaries is difficult or not well integrated with video retrieval tasks.
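A minimal sketch of the progressive, zoomable retrieval idea follows. The (level, time, tile) addressing of view elements and the in-memory "server" are illustrative assumptions, not the paper's protocol; the point is that the client fetches only missing elements and reuses its cache as the user zooms.

```python
# Sketch of progressive view-element retrieval; addressing scheme is an assumption.

class ViewElementStore:
    """Pretend server: view elements indexed by (level, time_index, tile_index)."""
    def __init__(self, levels, times_per_level, tiles_per_level):
        self.data = {
            (lv, t, x): f"element(level={lv}, t={t}, tile={x})"
            for lv in range(levels)
            for t in range(times_per_level[lv])
            for x in range(tiles_per_level[lv])
        }

    def fetch(self, key):
        return self.data[key]

class ZoomClient:
    """Client that caches elements and only fetches what a zoomed view needs."""
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def view(self, level, time_range, tile_range):
        keys = [(level, t, x) for t in time_range for x in tile_range]
        missing = [k for k in keys if k not in self.cache]
        for k in missing:                      # incremental retrieval
            self.cache[k] = self.store.fetch(k)
        print(f"level {level}: fetched {len(missing)} new, reused {len(keys) - len(missing)} cached")
        return [self.cache[k] for k in keys]

store = ViewElementStore(levels=3, times_per_level=[4, 8, 16], tiles_per_level=[1, 4, 16])
client = ZoomClient(store)
client.view(0, range(4), range(1))        # coarse overview first
client.view(1, range(2, 6), range(4))     # zoom in on a time span
client.view(1, range(2, 6), range(4))     # repeated view: served from cache
```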

3.
4.
李华北, 胡卫明, 罗冠. 《自动化学报》, 2008, 34(10): 1243-1249
In recent years, content-based video retrieval has attracted increasing attention. This paper proposes an interactive video retrieval framework based on semantic matching, whose main contributions are threefold: 1) a new high-level video feature, the semantic histogram, is defined to describe the high-level semantic information of a video; 2) a dominant-set clustering algorithm is used to build an unsupervised-learning retrieval mechanism that reduces online computational complexity and improves retrieval efficiency; 3) a new relevance feedback mechanism, semantics-based branch feedback, is proposed, which uses a branch feedback structure and a branch update strategy to improve retrieval performance. Experimental results demonstrate the effectiveness of the framework.
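The semantic-histogram representation can be sketched as follows; the concept bank, the per-frame aggregation, and the histogram-intersection measure are assumptions for illustration rather than the paper's exact definitions.

```python
import numpy as np

# Sketch: each video is summarized by a histogram over a fixed bank of concept
# detectors; videos are compared by histogram intersection. Concept names and
# the normalization are illustrative assumptions.

CONCEPTS = ["person", "car", "building", "water", "sky", "indoor"]

def semantic_histogram(frame_concept_scores):
    """Aggregate per-frame concept scores (frames x concepts) into one normalized histogram."""
    hist = np.asarray(frame_concept_scores, dtype=float).sum(axis=0)
    return hist / (hist.sum() + 1e-9)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical semantic distributions."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(1)
video_a = semantic_histogram(rng.random((30, len(CONCEPTS))))
video_b = semantic_histogram(rng.random((45, len(CONCEPTS))))
print(f"semantic similarity: {histogram_intersection(video_a, video_b):.3f}")
```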

5.
The volume of surveillance video is increasing rapidly, and humans are the major objects of interest in it. Rapid human retrieval in surveillance videos is therefore desirable and applicable to a broad spectrum of applications. Existing big data processing tools that mainly target textual data cannot be applied directly to timely processing of large video data, due to three main challenges: videos are more data-intensive than textual data; visual operations have higher computational complexity than textual operations; and traditional segmentation may damage video data's continuous semantics. In this paper, we design SurvSurf, a human retrieval system for large surveillance video data that exploits the characteristics of these data and of big data processing tools. We propose using the motion information contained in videos for video data segmentation. The basic data unit after segmentation is called an M-clip. M-clips help remove redundant video content and reduce data volumes. We use the MapReduce framework to process M-clips in parallel for human detection and appearance/motion feature extraction. We further accelerate the vision algorithms by processing only sub-areas with significant motion vectors rather than entire frames. In addition, we design a distributed data store called V-BigTable to structure the M-clips' semantic information. V-BigTable enables efficient retrieval over a huge number of M-clips. We implement the system on Hadoop and HBase. Experimental results show that our system outperforms basic solutions by one order of magnitude in computational time with satisfactory human retrieval accuracy.
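The motion-based segmentation into M-clips can be sketched with simple frame differencing; the motion threshold and minimum clip length below are illustrative assumptions, and SurvSurf itself exploits motion information from the encoded stream and runs detection in parallel with MapReduce.

```python
import cv2
import numpy as np

# Sketch: split a surveillance video into motion segments ("M-clips") by
# thresholding mean frame difference. Threshold and minimum length are assumptions.

def segment_m_clips(video_path, motion_thresh=8.0, min_len=15):
    cap = cv2.VideoCapture(video_path)
    clips, start, prev_gray, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            motion = float(np.mean(cv2.absdiff(gray, prev_gray)))
            if motion >= motion_thresh and start is None:
                start = idx                      # motion begins: open an M-clip
            elif motion < motion_thresh and start is not None:
                if idx - start >= min_len:       # motion ends: close the M-clip
                    clips.append((start, idx))
                start = None
        prev_gray, idx = gray, idx + 1
    if start is not None and idx - start >= min_len:
        clips.append((start, idx))
    cap.release()
    return clips  # frame-index ranges with significant motion; static spans are dropped

# clips = segment_m_clips("camera01.mp4")  # hypothetical file name
```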

6.
Motion, as a feature of video that changes across temporal sequences, is crucial to visual understanding. Powerful video representation and extraction models are typically able to focus attention on motion features in challenging dynamic environments to complete more complex video understanding tasks. However, previous approaches discriminate mainly based on similar features in the spatial or temporal domain, ignoring the interdependence of consecutive video frames. In this paper, we propose the motion-sensitive self-supervised collaborative network, a video representation learning framework that exploits a pretext task to assist feature comparison and strengthen the spatiotemporal discrimination power of the model. Specifically, we first propose the motion-aware module, which extracts consecutive motion features from spatial regions by frame difference. The global–local contrastive module is then introduced, with context and enhanced video snippets being defined as appropriate positive samples for a broader feature similarity comparison. Finally, we introduce the snippet operation prediction module, which further assists contrastive learning to obtain more reliable global semantics by sensing changes in continuous frame features. Experimental results demonstrate that our work can effectively extract robust motion features and achieves competitive performance compared with other state-of-the-art self-supervised methods on downstream action recognition and video retrieval tasks.
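The contrastive ingredient of such frameworks can be sketched with an InfoNCE loss over two augmented views of the same clips; the encoder is stubbed out with random features here, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

# Sketch: embeddings of two views of the same clips are pulled together and pushed
# away from other clips in the batch. Encoder, motion branch, and temperature are
# illustrative assumptions, not the paper's exact architecture.

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # similarity of every pair in the batch
    targets = torch.arange(z1.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

batch, dim = 8, 128
view_a = torch.randn(batch, dim)                # stand-ins for encoder outputs
view_b = torch.randn(batch, dim)
print(f"InfoNCE loss: {info_nce(view_a, view_b).item():.3f}")
```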

7.
Hashing is a common solution for content-based multimedia retrieval that encodes high-dimensional feature vectors into short binary codes. Previous works mainly focus on the image hashing problem. However, these methods cannot be used directly for video hashing, as videos contain not only spatial structure within each frame but also temporal correlation between successive frames. Several researchers proposed to handle this by encoding extracted key frames, but these frame-based methods are time-consuming in real applications. Other researchers proposed to characterize the video by averaging the spatial features of its frames so that existing hashing methods can be adopted. Unfortunately, this sort of “video” feature does not take the correlation between frames into consideration and may lead to the loss of temporal information. Therefore, in this paper, we propose a novel unsupervised video hashing framework based on deep neural networks, which performs video hashing by incorporating the temporal structure as well as the conventional spatial structure. Specifically, the spatial features of videos are obtained with a convolutional neural network, and the temporal features are established via long short-term memory. After that, a time-series pooling strategy is employed to obtain a single feature vector for each video. The obtained spatio-temporal feature can be applied to many existing unsupervised hashing methods. Experimental results on two real datasets indicate that by employing the spatio-temporal features, our hashing method significantly improves the performance of existing methods that deploy only spatial features, and meanwhile obtains higher mean average precision compared with state-of-the-art video hashing methods.
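A minimal sketch of the spatio-temporal hashing pipeline (per-frame features, then an LSTM, then time pooling and binarization) is shown below; the layer sizes, mean pooling, and sign binarization are illustrative assumptions, not the paper's trained model.

```python
import torch
import torch.nn as nn

# Sketch: per-frame CNN features -> LSTM -> time pooling -> sign-binarized code.
# Sizes and binarization rule are assumptions for illustration.

class VideoHasher(nn.Module):
    def __init__(self, frame_dim=512, hidden=256, code_bits=64):
        super().__init__()
        self.temporal = nn.LSTM(frame_dim, hidden, batch_first=True)
        self.project = nn.Linear(hidden, code_bits)

    def forward(self, frame_feats):              # (batch, frames, frame_dim)
        seq, _ = self.temporal(frame_feats)      # temporal modeling of frame features
        pooled = seq.mean(dim=1)                 # time-series pooling -> one vector per video
        return torch.sign(self.project(pooled))  # binary code in {-1, +1}^code_bits

def hamming_distance(code_a, code_b):
    return int((code_a != code_b).sum().item())

hasher = VideoHasher()
feats = torch.randn(2, 30, 512)                  # stand-in for CNN features of 2 videos
codes = hasher(feats)
print("Hamming distance:", hamming_distance(codes[0], codes[1]))
```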

8.
Video can naturally be encoded in a multi-resolution format. A multi-resolution or scalable video stream is a video sequence encoded such that subsets of the full-resolution video bit stream can be decoded to recreate lower-resolution video streams. Employing scalable video enables a video server to provide multiple resolution services for a variety of clients with different decoding capabilities and network bandwidths. The inherent advantages of a multi-resolution video server include heterogeneous client support, storage efficiency, adaptable service, and support for interactive operations. For designing a video server, several issues should be dealt with under a unified framework, including data placement/retrieval, buffer management, and admission control schemes for deterministic service guarantees. In this paper, we present a general framework for designing a large-scale multi-resolution video server. First, we propose a general multi-resolution video stream model which can be implemented with various scalable compression techniques. Second, given the proposed stream model, we devise a hybrid data placement scheme to store scalable video data across the disks in the server. The scheme exploits both the concurrency and the parallelism offered by striping data across the disks and achieves disk load balancing for any resolution of video service. Next, the retrieval of multi-resolution video is described. The deterministic access property of the placement scheme permits retrieval scheduling to be performed on each disk independently and supports interactive operations (e.g. pause, resume, slow playback, fast-forward and rewind) simply by reconstructing the input parameters to the scheduler. We also present an efficient admission control algorithm which precisely estimates the actual disk workload for the given resolution services and hence permits a much smaller buffer requirement. The proposed schemes are verified through detailed simulation and implementation.
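The load-balancing effect of striping scalable video across disks can be sketched as follows; the (segment, layer) block addressing and the round-robin rule are assumptions for illustration rather than the hybrid placement scheme itself.

```python
# Sketch: assign (segment, layer) blocks to disks round-robin and check that
# the per-disk load stays balanced even when only a subset of layers is read.

def place_blocks(num_segments, num_layers, num_disks):
    """Assign each (segment, layer) block to a disk in round-robin order."""
    placement = {}
    for seg in range(num_segments):
        for layer in range(num_layers):
            placement[(seg, layer)] = (seg * num_layers + layer) % num_disks
    return placement

def disk_load(placement, layers_requested, num_disks):
    """Blocks read per disk when a client plays back only the lowest `layers_requested` layers."""
    load = [0] * num_disks
    for (seg, layer), disk in placement.items():
        if layer < layers_requested:
            load[disk] += 1
    return load

placement = place_blocks(num_segments=12, num_layers=3, num_disks=4)
print("full resolution load per disk:", disk_load(placement, 3, 4))
print("base layer only load per disk:", disk_load(placement, 1, 4))
```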

9.
In recent years, deep learning has shown excellent performance in artificial intelligence. Deep-learning-based face generation and manipulation techniques can now synthesize realistic forged face videos, also known as deepfakes, that are hard for the human eye to tell from real ones. These forged face videos may pose serious threats to society, for example by being used to fabricate political fake news that incites political violence or interferes with elections. Detection methods that can proactively identify forged face videos are therefore urgently needed. Existing forgery methods tend to leave subtle spatial and temporal traces, such as texture and color distortions or facial flickering. Mainstream detection methods also rely on deep learning and fall into two categories: frame-based methods and clip-based methods. The former use convolutional neural networks (CNNs) to find spatial forgery traces in individual frames, while the latter additionally use recurrent neural networks (RNNs) to capture temporal forgery traces across frames. Both kinds of methods make decisions from global image information, yet forgery traces usually appear in local regions around facial features. This paper therefore proposes a unified forged-face-video detection framework that exploits global temporal features and local spatial features. The framework consists of an image feature extraction module, a global temporal feature classification module, and a local spatial feature classification module. Experimental results on the FaceForensics++ dataset show that the proposed method detects forged face videos better than previous methods.
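The two-branch idea (global temporal traces plus local spatial traces) can be sketched as follows; the backbones, the number of facial regions, and the equal fusion weights are illustrative assumptions, not the framework's actual components.

```python
import torch
import torch.nn as nn

# Sketch: a global branch models temporal traces over per-frame features, a local
# branch scores per-region (e.g., eyes, mouth) features, and the scores are fused.
# Architecture details and fusion weights are assumptions.

class TwoBranchDetector(nn.Module):
    def __init__(self, feat_dim=256, num_regions=4):
        super().__init__()
        self.global_temporal = nn.GRU(feat_dim, 128, batch_first=True)
        self.global_head = nn.Linear(128, 1)
        self.local_head = nn.Sequential(nn.Linear(num_regions * feat_dim, 128),
                                        nn.ReLU(), nn.Linear(128, 1))

    def forward(self, frame_feats, region_feats):
        # frame_feats: (batch, frames, feat_dim); region_feats: (batch, regions, feat_dim)
        _, h = self.global_temporal(frame_feats)
        global_score = self.global_head(h[-1])                        # temporal traces
        local_score = self.local_head(region_feats.flatten(1))        # spatial traces
        return torch.sigmoid(0.5 * global_score + 0.5 * local_score)  # fused fake probability

model = TwoBranchDetector()
prob = model(torch.randn(2, 16, 256), torch.randn(2, 4, 256))
print(prob.squeeze(1))
```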

10.
Evaluation of key frame-based retrieval techniques for video
We investigate the application of a variety of content-based image retrieval techniques to the problem of video retrieval. We generate large numbers of features for each of the key frames selected by a highly effective shot boundary detection algorithm to facilitate query-by-example search. The retrieval performance of two learning methods, boosting and k-nearest neighbours, is compared against a vector space model. We carry out a novel and extensive evaluation to demonstrate and compare the usefulness of these algorithms for video retrieval tasks, using a carefully created test collection of over 6000 still images, where performance is measured against relevance judgements based on human image annotations. Three types of experiment are carried out: classification tasks, category searches (both related to automated annotation and summarisation of video material) and real-world searches (for navigation and entry point finding). We also show graphical results of real video search tasks using the algorithms, which have not previously been applied to video material in this way.
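The query-by-example k-nearest-neighbour search over key-frame features can be sketched as follows; the feature dimensionality and the Euclidean metric are assumptions for illustration.

```python
import numpy as np

# Sketch: rank key-frames by distance to a query feature vector and return the
# k closest, i.e. query-by-example retrieval with k-nearest neighbours.

def knn_retrieve(query_feat, keyframe_feats, k=5):
    """Return indices of the k key-frames closest to the query feature vector."""
    dists = np.linalg.norm(keyframe_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(2)
keyframes = rng.random((6000, 128))       # one feature vector per key-frame
query = rng.random(128)                   # example image / key-frame used as query
print("top-5 key-frames:", knn_retrieve(query, keyframes))
```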
