Similar literature
 20 similar documents found (search time: 15 ms)
1.
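A survey of video summarization techniques   Total citations: 2, self-citations: 0, citations by others: 2
Objective: Like text summarization, video summarization condenses the content of a video. To assess research progress in video summarization soundly and to guide further work, this paper surveys the main research methods and notable results in the field. Method: The main research methods are introduced according to the two principal steps of producing a video summary: video content analysis and summary generation. The paper also analyzes the state of research over the past five years and describes two new trends: real-time video summarization and multi-view video summarization. Finally, evaluation systems for video summaries are classified and summarized. Results: The survey offers two guiding suggestions for the problem of capturing semantics in summaries and, based on the analysis, outlines future directions for video summarization. Conclusion: As an important part of video content understanding, video summarization has considerable research value. At present, however, semantic representation and summary evaluation remain imprecise and incomplete, and need further in-depth study.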

2.
Automatic composition of broadcast sports video   Total citations: 1, self-citations: 0, citations by others: 1
This study examines an automatic broadcast soccer video composition system. The research is important because the ability to compose broadcast sports video automatically will not only improve the efficiency of broadcast video generation, but also make it possible to customize sports video broadcasting. We present a novel approach to the two major issues in the system's implementation: the camera view selection/switching module and the automatic replay generation module. In our implementation, we use a multi-modal framework to perform video content analysis and event and event-boundary detection on the raw, unedited main/sub-camera captures. This framework explores the possible cues using mid-level representations to bridge the gap between low-level features and high-level semantics. The video content analysis results are used for camera view selection/switching in the generated video composition, and the event detection results and mid-level representations are used to generate replays, which are automatically inserted into the broadcast soccer video. Our experimental results are promising and comparable to compositions produced by broadcast professionals.

3.
4.
5.
Although the Metadata Editor is an important part of any digital library, it becomes fundamental in the presence of audiovisual content. This is because the metadata produced by automated support tools (such as speech recognizers and shot detection procedures) is error-prone and often needs correction. In addition, scenes are manually annotated. This paper describes Regia, a prototype application for manually editing metadata for audiovisual documents developed in the ECHO project. Regia allows the user to manually edit textual metadata and to hierarchically organize the segmentation of the audiovisual content. An important feature of this metadata editor is that it is not hard-wired with a particular metadata attributes set. To achieve this feature the XML schema of the metadata model is used by the editor as a configuration file.
Claudio Gennaro

6.
A video streaming proxy server needs to handle hundreds of simultaneous connections between media servers and clients. Inside, every video arrived at the server and delivered from it follows a specific arrival and delivery schedule. While arrival schedules compete for incoming network bandwidth, delivery schedules compete for outgoing network bandwidth. As a result, a proxy server has to provide sufficient buffer and disk cache for storage, together with memory space, disk space and disk bandwidth. In order to optimize the throughput, a proxy server has to govern the usage of these resources. In this paper, we first analyze the property of a traditional smoothing algorithm and a video staging algorithm. Then we develop, based on the smoothing algorithm, a video staging algorithm for video streaming proxy servers. This algorithm allows us to devise an arrival schedule based on the delivery schedule. Under this arrival and delivery schedule pair, we can achieve a better resource utilization rate gracefully between different parameter sets. It is also interesting to note that the usage of the resources such as network bandwidth, disk bandwidth and memory space becomes interchangeable. It provides the basis for inter-resource scheduling to further improve the throughput of a video streaming proxy server system.
Daniel P. K. Lun
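The relationship between an arrival schedule and a delivery schedule described in this abstract can be illustrated with a small sketch. It is not the paper's smoothing or staging algorithm: the per-slot byte counts, the function names, and the constant-rate arrival rule are all assumptions chosen only to show how a delivery schedule constrains arrival and determines buffer occupancy.

```python
from itertools import accumulate


def buffer_occupancy(arrival, delivery):
    """Buffer level after each time slot, given bytes arriving and bytes delivered per slot."""
    cum_in = list(accumulate(arrival))
    cum_out = list(accumulate(delivery))
    levels = [a - d for a, d in zip(cum_in, cum_out)]
    if any(level < 0 for level in levels):
        raise ValueError("underflow: data would be delivered before it arrives")
    return levels


def constant_rate_arrival(delivery):
    """Hypothetical arrival schedule derived from the delivery schedule.

    Uses the smallest constant per-slot rate that never underflows,
    and stops once the whole video has arrived.
    """
    cum_out = list(accumulate(delivery))
    # Ceiling division: the rate must cover cumulative delivery at every slot.
    rate = max(-(-c // (t + 1)) for t, c in enumerate(cum_out))
    total, sent, schedule = cum_out[-1], 0, []
    for _ in delivery:
        amount = min(rate, total - sent)
        schedule.append(amount)
        sent += amount
    return schedule


delivery = [4, 1, 1, 6]
arrival = constant_rate_arrival(delivery)
levels = buffer_occupancy(arrival, delivery)
print(arrival, levels, max(levels))
```

In this toy case the peak of `levels` is the buffer space the proxy would need for this one stream, which is the kind of quantity a scheduler would trade off against incoming bandwidth.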

7.
Video is an information-intensive media with much redundancy. Therefore, it is desirable to be able to mine structure or semantics of video data for efficient browsing, summarization and highlight extraction. In this paper, we propose a mosaic based approach to key-event as well as structure mining, which is regarded as a complementary view for sports video analysis. Mosaic is generated for each shot by a novel efficient mosaicing scheme, which constructs a global motion path and selects a best subset of frames for mosaicing. These improved mosaics are then used as the representative image of shot content. Based on mosaic, the structure and event in sports video are mined by the methods with prior knowledge and without prior knowledge. Without prior knowledge, our system is able to locate global view shots taken by dominant camera. If prior knowledge is available, the events in these global view shots are detected using robust features extracted from mosaics. For global view mining, the experiments compared with key-frame-based scheme have demonstrated that this mosaic-based scheme presents better results in several kinds of sports videos; for events mining, the detection of key-plays and key-events in the specific-domain of soccer videos have proved its effectiveness.
Xian-Sheng Hua

8.
The fast evolution of digital video has brought many new multimedia applications and, as a consequence, has increased the amount of research into new technologies that aim at improving the effectiveness and efficiency of video acquisition, archiving, cataloging and indexing, as well as increasing the usability of stored videos. Among possible research areas, video summarization is an important topic that potentially enables faster browsing of large video collections and also more efficient content indexing and access. Essentially, this research area consists of automatically generating a short summary of a video, which can be either a static summary or a dynamic summary. In this paper, we present VSUMM, a methodology for the production of static video summaries. The method is based on color feature extraction from video frames and the k-means clustering algorithm. As an additional contribution, we also develop a novel approach for the evaluation of static video summaries. In this evaluation methodology, video summaries are manually created by users. Then, several user-created summaries are compared both to our approach and to a number of different techniques in the literature. Experimental results show, with a confidence level of 98%, that the proposed solution provided static video summaries of superior quality relative to the approaches to which it was compared.
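The general idea of combining color features with k-means to pick keyframes can be sketched as follows. This is a minimal illustration under stated assumptions, not the VSUMM implementation: the RGB histogram, the evenly spaced centroid initialization, the fixed iteration count, and all function names are hypothetical choices made for the example.

```python
def color_histogram(frame, bins=4):
    """Normalized histogram of quantized colors; a frame is a list of (r, g, b) tuples."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1.0
    return [h / len(frame) for h in hist]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at evenly spaced points (an assumption)."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def static_summary(frames, k):
    """Indices of the frames closest to each cluster centroid, in temporal order."""
    feats = [color_histogram(f) for f in frames]
    centroids, labels = kmeans(feats, k)
    keyframes = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            keyframes.append(min(members, key=lambda i: sq_dist(feats[i], centroids[c])))
    return sorted(keyframes)


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
print(static_summary([red, red, red, blue, blue, blue], k=2))
```

With three all-red frames followed by three all-blue frames, the sketch keeps one frame per color cluster, which is the behavior a static summary is meant to have on visually redundant content.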

9.
10.
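A new method for generating home video summaries   Total citations: 1, self-citations: 1, citations by others: 1
智敏 (Zhi Min), 蔡安妮 (Cai Anni). 《计算机工程》 (Computer Engineering), 2006, 32(6): 226-227
Advances in computer hardware have given home computers the ability to process and store video, and the spread of consumer digital cameras has steadily increased the amount of home video, so home users' demand for video summarization keeps growing. After reviewing existing concepts, classifications, and techniques related to video summarization and analyzing the characteristics of home video, this paper describes the characteristics of home video summaries and proposes a video summarization algorithm oriented toward home video.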

11.
Automated virtual camera control has been widely used in animation and interactive virtual environments. We have developed a prototype free-view video system, based on multiple sparse cameras, that allows users to control the position and orientation of a virtual camera, enabling the observation of a real scene in three dimensions (3D) from any desired viewpoint. Automatic camera control can be activated to follow objects selected by the user. Our method combines a simple geometric model of the scene composed of planes (virtual environment), augmented with visual information from the cameras and pre-computed tracking information of moving targets, to generate novel perspective-corrected 3D views of the virtual camera and moving objects. To achieve real-time rendering performance, view-dependent texture-mapped billboards are used to render the moving objects at their correct locations, and foreground masks are used to remove the moving objects from the projected video streams. The current prototype runs on a PC with a common graphics card and can generate virtual 2D views from three cameras of resolution 768×576 with several moving objects at about 11 fps.

12.
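This paper proposes a method, based on short-time slices, for extracting game shots from racket-sports video. The method divides the spatio-temporal slices of a video into slice frames, then obtains game shots by clustering, merging, boundary detection, and mapping of the slice frames. Experiments show that the method is robust and accurate.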

13.
Exploring video content structure for hierarchical summarization   Total citations: 4, self-citations: 0, citations by others: 4
In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide the users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Since video shots with high similarities do not necessarily imply that they belong to the same story unit, temporal information is adopted by merging temporally adjacent shots (within a specified distance) from the supergroup into each video group. A video-scene-detection algorithm is thus proposed to merge temporally or spatially correlated video groups into scenario units. This is followed by a scene-clustering algorithm that eliminates visual redundancy among the units. A hierarchical video content structure with increasing granularity is constructed from the clustered scenes, video scenes, and video groups to keyframes. Finally, we introduce a hierarchical video summarization scheme by executing various approaches at different levels of the video content hierarchy to statically or dynamically construct the video summary. Extensive experiments based on real-world videos have been performed to validate the effectiveness of the proposed approach.
Published online: 15 September 2004
Correspondence to: Xingquan Zhu
This research has been supported by the NSF under grants 9972883-EIA, 9974255-IIS, 9983248-EIA, and 0209120-IIS, a grant from the state of Indiana 21st Century Fund, and by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant DAAD19-02-1-0178.
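The first two grouping steps in this abstract (an affinity matrix that merges visually similar shots into supergroups, then a temporal split of each supergroup into video groups) can be sketched as below. The similarity measure, the threshold, the single-link merging, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def affinity_matrix(features):
    """Pairwise shot similarity in (0, 1]; 1 / (1 + distance) is an assumed measure."""
    return [[1.0 / (1.0 + math.dist(a, b)) for b in features] for a in features]


def supergroups(aff, threshold):
    """Merge shots whose affinity reaches the threshold (single-link, via union-find)."""
    n = len(aff)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if aff[i][j] >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def temporal_groups(shots, max_gap):
    """Split one supergroup (sorted shot indices) wherever shots are too far apart in time."""
    groups = [[shots[0]]]
    for s in shots[1:]:
        if s - groups[-1][-1] <= max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


# One hypothetical 1-D feature per shot, listed in temporal order.
features = [[0.0], [0.1], [5.0], [0.05], [5.1]]
for sg in supergroups(affinity_matrix(features), threshold=0.8):
    print(sg, temporal_groups(sg, max_gap=1))
```

Shots 0, 1, and 3 look alike, but shot 3 is not adjacent to the other two, so the temporal step separates it. This mirrors the abstract's point that visual similarity alone does not imply the same story unit.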

14.
This paper presents an approach for detecting suspicious events in videos by using only the video itself as the training samples for valid behaviors. These salient events are obtained in real-time by detecting anomalous spatio-temporal regions in a densely sampled video. The method codes a video as a compact set of spatio-temporal volumes, while considering the uncertainty in the codebook construction. The spatio-temporal compositions of video volumes are modeled using a probabilistic framework, which calculates their likelihood of being normal in the video. This approach can be considered as an extension of the Bag of Video words (BOV) approaches, which represent a video as an order-less distribution of video volumes. The proposed method imposes spatial and temporal constraints on the video volumes so that an inference mechanism can estimate the probability density functions of their arrangements. Anomalous events are assumed to be video arrangements with very low frequency of occurrence. The algorithm is very fast and does not employ background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. Experiments were performed on four video datasets of abnormal activities in both crowded and non-crowded scenes and under difficult illumination conditions. The proposed method outperformed all other approaches based on BOV that do not account for contextual information.

15.
16.
Video compositing, the editing and integrating of many video sequences into a single presentation, is an integral part of advanced multimedia services. Single-user compositing systems have been suggested in the past, but when they are extended to accommodate many users, the amount of memory required quickly grows out of hand. We propose two new architectures for digital video compositing in a multiuser environment that are memory-efficient and can operate in real time. Both architectures decouple the task of memory management from compositing processing. We show that under hard throughput and bandwidth constraints, a memoryless solution for transferring data from many video sources to many users does not exist. We overcome this using (i) a dynamic memory buffering architecture and (ii) a constant memory bandwidth solution that transforms the sources-to-users transfer schedule into two schedules, then pipelines the computation. The architectures support opaque overlapping of images, arbitrarily shaped images, and images whose shapes dynamically change from frame to frame.

17.
This paper presents a two-level queueing system for dynamic summarization and interactive searching of video content. Video frames enter the queueing system; some insignificant and redundant frames are removed; the remaining frames are pulled out of the system as top-level key frames. Using an energy-minimization method, the first queue removes the video frames that constitute the gradual transitions of video shots. The second queue measures the content similarity of video frames and reduces redundant frames. In the queueing system, all key frames are linked in a directed-graph index structure, allowing video content to be accessed at any level-of-detail. Furthermore, this graph-based index structure enables interactive video content exploration, and the system is able to retrieve the video key frames that complement the video content already viewed by users. Experimental results on four full-length videos show that our queueing system performs much better than two existing methods on video key frame selection at different compression ratios. The evaluation on video content search shows that our interactive system is more effective than other systems on eight video searching tasks. Compared with the regular media player, our system reduces the average content searching time by half.
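The role of the second queue, reducing redundant frames by content similarity, can be illustrated with a bounded queue that evicts the most redundant frame whenever it overflows. The eviction rule (drop the later frame of the most similar adjacent pair), the distance measure, and the function name are assumptions for this sketch; the abstract does not give the paper's actual criterion.

```python
import math


def reduce_redundant(features, capacity):
    """Stream frames through a bounded queue and return the surviving frame indices.

    When the queue exceeds `capacity`, the frame most similar to its
    predecessor in the queue is removed (an assumed redundancy rule).
    """
    queue = []  # (frame index, feature vector) pairs in temporal order
    for idx, feat in enumerate(features):
        queue.append((idx, feat))
        if len(queue) > capacity:
            # Position of the frame closest to the one just before it.
            victim = min(range(1, len(queue)),
                         key=lambda i: math.dist(queue[i - 1][1], queue[i][1]))
            del queue[victim]
    return [idx for idx, _ in queue]


# Hypothetical 1-D features: frames 0/1 and 2/3 are near-duplicates.
features = [[0], [1], [50], [51], [90]]
print(reduce_redundant(features, capacity=3))
```

Changing `capacity` changes how many key frames survive, which loosely corresponds to selecting key frames at different compression ratios.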

18.
Wireless capsule endoscopy (WCE) has several benefits over traditional endoscopy, such as portability and ease of use, particularly for remote internet of things (IoT)-assisted healthcare services. During the WCE procedure, a significant amount of redundant video data is generated. Transmitting this data securely to healthcare centers and gastroenterologists for analysis is challenging and wastes resources including energy, memory, computation, and bandwidth. In addition, it is inherently difficult and time-consuming for gastroenterologists to analyze this huge volume of gastrointestinal video data for desired contents. To surmount these issues, we propose a secure video summarization framework for outdoor patients going through the WCE procedure. In the proposed system, keyframes are extracted using a lightweight video summarization scheme, making it more suitable for WCE. Next, a cryptosystem based on the 2D Zaslavsky chaotic map is presented to secure the extracted keyframes. Experimental results validate the performance of the proposed cryptosystem in terms of robustness and high-level security, compared to other recent image encryption schemes, during dissemination of important keyframes to healthcare centers and gastroenterologists for personalized WCE.

19.
A novel software-based video compression algorithm, the Popular Video Coder (PVC), is presented in this paper, and a video phone system, the Popular Phone, is also implemented based on the PVC. The PVC simplifies the traditional video coder by removing the transform and the motion estimation parts and modifies the quantizer and entropy coder. Two novel coding algorithms, the adaptive quantizer and the modified windowed Huffman-like coder, are used in the PVC to encode the video data with a quality picture at a high compression ratio. The video quality of the proposed coder is as good as that of the Moving Picture Experts Group (MPEG) coder when the input is a low-resolution and slow-motion video, and the computational complexity of the PVC is much lower than that of the MPEG coder. Since no compression hardware is needed for the PVC to encode and decode video data, the cost and complexity of developing multimedia applications, such as video phone and multimedia e-mail systems, can be greatly reduced. Furthermore, some networking issues, such as error control and flow control, are discussed in connection with applying the PVC to implement the Popular Phone.

20.
This paper addresses the problem of ensuring the integrity of a digital video and presents a scalable signature scheme for video authentication based on cryptographic secret sharing. The proposed method detects spatial cropping and temporal jittering in a video, yet is robust against frame dropping in the streaming video scenario. In our scheme, the authentication signature is compact and independent of the size of the video. Given a video, we identify the key frames based on differential energy between the frames. Considering video frames as shares, we compute the corresponding secret at three hierarchical levels. The master secret is used as digital signature to authenticate the video. The proposed signature scheme is scalable to three hierarchical levels of signature computation based on the needs of different scenarios. We provide extensive experimental results to show the utility of our technique in three different scenarios—streaming video, video identification and face tampering.
Mohan S. Kankanhalli
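The idea of treating frames as shares and computing the corresponding secret can be illustrated with Shamir-style Lagrange interpolation over a prime field. This is only a sketch under loud assumptions: deriving each share from a SHA-256 hash of the frame bytes, the choice of prime, and the function names are all hypothetical. It does not reproduce the paper's key-frame selection, its three hierarchical levels, or its robustness to frame dropping.

```python
import hashlib

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an assumption


def frame_share(index, frame_bytes):
    """Hypothetical share (x, y): x from the frame position, y from a hash of its content."""
    y = int.from_bytes(hashlib.sha256(frame_bytes).digest(), "big") % PRIME
    return (index + 1, y)  # x must be non-zero and distinct per share


def secret_from_shares(shares):
    """Evaluate at x = 0 the unique polynomial through the shares (Lagrange interpolation)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def video_signature(key_frames):
    """Compact signature: the secret implied by all key-frame shares."""
    return secret_from_shares([frame_share(i, f) for i, f in enumerate(key_frames)])


original = [b"frame-a", b"frame-b", b"frame-c"]
tampered = [b"frame-a", b"frame-X", b"frame-c"]
print(video_signature(original) == video_signature(original))
print(video_signature(original) == video_signature(tampered))
```

The signature is a single field element regardless of how many frames contribute, which matches the abstract's claim that the signature size is independent of the video size. Verification here simply recomputes the secret and compares.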
