Similar Documents (20 results)
1.
We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of video frames can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next, we describe how each segment can be inserted into a video database using an indexing scheme built around a wavelet-based “signature.” Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we experimentally compare our shot-transition mechanism to a color-histogram implementation and evaluate the effectiveness of our database-searching scheme. Our algorithms are very efficient and run in real time on a desktop computer. We describe how this technology could be employed to construct a “smart VCR” capable of alerting the viewer to the beginning of a specific program or identifying…
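For contrast with the wavelet approach above, here is a minimal sketch of the kind of color/intensity-histogram shot-boundary detector the authors compare against (not their method); the frame format, bin count, and threshold are illustrative assumptions:

```python
import numpy as np

def histogram_shot_boundaries(frames, bins=64, threshold=0.4):
    """Flag indices where a frame's histogram differs sharply from its predecessor's.

    `frames` is an iterable of grayscale frames as 2-D uint8 NumPy arrays;
    the L1 threshold is a placeholder that would need tuning per corpus.
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()                      # normalize to a probability distribution
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            boundaries.append(i)                      # likely shot transition at frame i
        prev_hist = hist
    return boundaries
```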

2.
With the widespread adoption of smartphones, a large number of user-generated videos are produced every day. The embedded sensors, e.g., GPS and the digital compass, make it possible to access videos based on their geo-properties. In our previous work, we created a framework for integrated, sensor-rich video acquisition (with one instantiation implemented in the form of smartphone applications) which associates a continuous stream of location and viewing-direction information with the collected videos, hence allowing them to be expressed and manipulated as spatio-temporal objects. These sensor meta-data are considerably smaller in size than the visual content and are helpful in effectively and efficiently searching for geo-tagged videos in large-scale repositories. In this study, we propose a novel three-level grid-based index structure and introduce a number of related query types, including typical spatial queries and ones based on a bounded radius and a viewing-direction restriction. These two criteria are important in many video applications, and we demonstrate their importance with a real-world dataset. Moreover, experimental results on a large-scale synthetic dataset show that our approach provides a significant speed improvement of at least 30%, over a mix of queries, compared to a multi-dimensional R-tree implementation.
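A toy, single-level sketch of grid-based lookup with the two query criteria highlighted above (bounded radius and viewing-direction restriction); the paper's index has three levels, and the field names, cell size, and angular tolerance here are assumptions:

```python
import math
from collections import defaultdict

class GridVideoIndex:
    """Single-level grid over geo-tagged video segments (illustrative only)."""

    def __init__(self, cell_deg=0.01):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def insert(self, video_id, lat, lon, heading_deg):
        self.cells[self._cell(lat, lon)].append((video_id, lat, lon, heading_deg))

    def query(self, lat, lon, radius_deg, direction_deg, tolerance_deg=45.0):
        """Return ids within `radius_deg` whose heading is within +/- tolerance of `direction_deg`."""
        results = []
        reach = math.ceil(radius_deg / self.cell_deg)
        cx, cy = self._cell(lat, lon)
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for vid, vlat, vlon, vhead in self.cells[(cx + dx, cy + dy)]:
                    if math.hypot(vlat - lat, vlon - lon) > radius_deg:
                        continue                      # bounded-radius test (planar approximation)
                    diff = abs((vhead - direction_deg + 180.0) % 360.0 - 180.0)
                    if diff <= tolerance_deg:         # viewing-direction restriction
                        results.append(vid)
        return results
```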

3.
Visual (image and video) database systems require efficient indexing to enable fast access to the images they store. In addition, the large memory capacity and channel bandwidth required for the storage and transmission of visual data necessitate the use of compression techniques. We note that image/video indexing and compression are typically pursued independently. This reduces the storage efficiency and may degrade system performance. In this paper, we present novel algorithms based on vector quantization (VQ) for indexing of compressed images and video. To start with, the images are compressed using VQ. In the first technique, for each codeword in the codebook, a histogram is generated and stored along with the codeword. We note that the superposition of the histograms of the codewords used to represent an image is a close approximation of the histogram of the image. This histogram is used as an index to store and retrieve the image. In the second technique, the histogram of the labels of an image is used as an index to access the image. We also propose an algorithm for indexing compressed video sequences. Here, each frame is encoded in the intraframe mode using VQ. The labels are used for the segmentation of a video sequence into shots, and for indexing the representative frame of each shot. The proposed techniques not only provide fast access to stored visual data, but also combine compression and indexing. The average retrieval rates are 95% and 94% at compression ratios of 16:1 and 64:1, respectively. The corresponding cut-detection rates are 97% and 90%, respectively.
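A minimal sketch of the second technique described above, where the histogram of an image's VQ codeword labels serves as its index; the L1 ranking and function names are assumptions, not the paper's exact procedure:

```python
import numpy as np

def label_histogram(labels, codebook_size):
    """Index an image by the normalized histogram of its VQ codeword labels."""
    hist = np.bincount(labels.ravel(), minlength=codebook_size).astype(float)
    return hist / hist.sum()

def retrieve(query_hist, index, top_k=5):
    """Rank stored images by L1 distance between label histograms.

    `index` maps image ids to histograms produced by label_histogram().
    """
    ranked = sorted(index.items(),
                    key=lambda item: np.abs(item[1] - query_hist).sum())
    return [image_id for image_id, _ in ranked[:top_k]]
```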

4.
Automatic parsing and indexing of news video (total citations: 9; self-citations: 0; citations by others: 9)
Automatic construction of content-based indices for video source material requires general semantic interpretation of both images and their accompanying sounds; but such a broadly-based semantic analysis is beyond the capabilities of the current technologies of machine vision and audio signal analysis. However, if one can assume a limited and well-demarcated body of domain knowledge for describing the content of a body of video, then it becomes easier to interpret a video source in terms of that domain knowledge. This paper presents our work on using domain knowledge to parse news video programs and to index them on the basis of their visual content. Models based on both the spatial structure of image frames and the temporal structure of the entire program have been developed for news videos, along with algorithms that apply these models by locating and identifying instances of their elements. Experimental results are also discussed in detail to evaluate both the models and the algorithms that use them. Finally, proposals for future work are summarized.

5.
6.
A content-based video indexing method is proposed. Video content is divided into three modes (face, landscape, and object motion) to enable fast video indexing on a mobile-phone platform. First, a sliding-window method computes an interest level for each time slice under each mode. In face mode, a horizontal projection histogram is used to estimate the approximate location of faces, and changes in the face region are aggregated; in landscape mode, histogram statistics are computed in HSV color space; in object motion mode, an optical-flow-based tracking algorithm derives the motion vectors of the target. After the user selects the corresponding mode, for the video within each time slice the method computes, for each pair of consecutive frames, the…
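A rough sketch of the landscape-mode interest computation described above (sliding-window HSV histogram statistics); the OpenCV-style hue range, window length, and the change-based interest score are assumptions:

```python
import numpy as np

def landscape_interest(frames_hsv, window=30, bins=16):
    """Per-window interest score based on change in the hue histogram.

    `frames_hsv` is a list of HSV frames (H, W, 3) with the hue channel in [0, 180).
    A large histogram change between consecutive windows is treated as "interesting".
    """
    scores = []
    prev = None
    for start in range(0, len(frames_hsv) - window + 1, window):
        hues = np.concatenate([f[..., 0].ravel() for f in frames_hsv[start:start + window]])
        hist, _ = np.histogram(hues, bins=bins, range=(0, 180))
        hist = hist / max(hist.sum(), 1)
        if prev is not None:
            scores.append(float(np.abs(hist - prev).sum()))
        prev = hist
    return scores
```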

7.
8.
9.
Multimedia event-based video indexing using time intervals (total citations: 6; self-citations: 0; citations by others: 6)
We propose the time interval multimedia event (TIME) framework as a robust approach for classification of semantic events in multimodal video documents. The representation used in TIME extends the Allen temporal interval relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, it was evaluated on the domains of soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, i.e., the C4.5 decision tree, maximum entropy, and the support vector machine. The results show that semantic video indexing results benefit significantly from using the TIME framework.
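A small sketch of the kind of Allen temporal interval predicates the TIME framework extends (generic, not the authors' implementation):

```python
def allen_relation(a, b):
    """Return the basic Allen relation of interval a=(a1, a2) relative to b=(b1, b2).

    Only the seven forward relations are named; swap the arguments for inverses.
    """
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1 and a2 < b2:
        return "starts"
    if a1 > b1 and a2 == b2:
        return "finishes"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    return "inverse-or-other"
```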

10.
This paper presents an efficient automatic face recognition scheme for video indexing applications. In particular, the following problem is addressed: given a set of known face images and a complex video sequence to be indexed, find where the corresponding faces appear in the shots of the sequence. The main objective is to develop a tool, within the MPEG-7 standardization effort, to support video indexing activities. Conventional face recognition schemes are not well suited for this application, so alternative and more efficient schemes have to be developed. In this paper, in the context of Principal Component Analysis for face recognition, the concept of self-eigenfaces is introduced. In addition, color information is incorporated in the face recognition stage. The face recognition scheme is used in combination with an automatic face detection scheme, which makes the overall approach highly practical. The resulting scheme is very efficient at finding specific face images and at coping with the different face conditions present in a complex video sequence. Results are presented using the test sequences accepted in the MPEG-7 video content sequence set.
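A generic eigenface (PCA) sketch to illustrate the projection-and-match step such schemes build on; it does not include the paper's self-eigenfaces variant or its use of color, and the API is an assumption:

```python
import numpy as np

def train_eigenfaces(face_matrix, n_components=20):
    """Compute a mean face and eigenface basis from a (num_faces x num_pixels) matrix."""
    mean = face_matrix.mean(axis=0)
    centered = face_matrix - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # rows of vt are eigenfaces
    return mean, vt[:n_components]

def project(face, mean, eigenfaces):
    """Project a flattened face image onto the eigenface subspace."""
    return eigenfaces @ (face - mean)

def recognize(face, mean, eigenfaces, gallery):
    """Return the id of the gallery face whose projection is nearest to the query.

    `gallery` maps person ids to flattened face vectors.
    """
    q = project(face, mean, eigenfaces)
    return min(gallery,
               key=lambda pid: np.linalg.norm(project(gallery[pid], mean, eigenfaces) - q))
```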

11.
Video indexing is employed to represent the features of video sequences. Motion vectors derived from compressed video are preferred for video indexing because they can be accessed by partial decoding; thus, they are used extensively in various video analysis and indexing applications. In this study, we introduce an efficient compressed-domain video indexing method and implement it on H.264/AVC-coded videos. The video retrieval evaluations indicate that retrieval based on the proposed indexing method outperforms motion-vector-based video retrieval in 74% of queries, with little increase in computation time. Furthermore, we compared our method with a pixel-level video indexing method that employs both temporal and spatial features. Experimental results indicate that our method outperforms the pixel-level method in both performance and speed. Hence, considering speed and precision together, the proposed method is an efficient indexing method that can be used in various video indexing and retrieval applications.
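A sketch of one plausible compressed-domain signature built from partially decoded motion vectors, a joint magnitude/direction histogram; the bin counts and magnitude cap are assumptions, and this is not the paper's indexing method:

```python
import numpy as np

def motion_vector_signature(mvs, mag_bins=8, ang_bins=8, max_mag=32.0):
    """Joint magnitude/direction histogram over an (N, 2) array of (dx, dy) motion vectors."""
    dx, dy = mvs[:, 0].astype(float), mvs[:, 1].astype(float)
    mag = np.clip(np.hypot(dx, dy), 0.0, max_mag - 1e-9)
    ang = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)
    hist, _, _ = np.histogram2d(mag, ang,
                                bins=(mag_bins, ang_bins),
                                range=[[0.0, max_mag], [0.0, 2.0 * np.pi]])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)          # normalized signature usable as an index key
```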

12.
This paper describes methods for indexing massive volumes of video data and analyzes metadata, Dublin Core, and OAIS. By studying the advantages these technologies offer for video mining, it proposes an architecture for a massive video data indexing platform, implements the related functional modules, and presents an indexing-based approach to video data search. Experimental results show that the platform offers a faster, more convenient, and more accurate indexing and retrieval model for the development of Internet video search, effectively reducing the time users need to obtain relevant video data.
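For concreteness, a hypothetical Dublin Core-style record for one indexed video; the element set (dc:*) is standard, but all values and the flat-dict representation are illustrative, not the platform's actual schema:

```python
# One indexed video described with Dublin Core elements (values are made up).
video_record = {
    "dc:title": "Campus lecture, session 12",
    "dc:creator": "Example University",
    "dc:date": "2010-06-01",
    "dc:format": "video/mp4",
    "dc:identifier": "urn:example:video:0001",
    "dc:subject": ["lecture", "computer science"],
    "dc:description": "Automatically segmented into shots; keyframes indexed for search.",
}
```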

13.
In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which constitute a very useful information source for video content. Image binarization, performed by combining a global threshold method and a window-based method, is used to obtain clearer images of characters, and a caption-tracking scheme is used to locate caption regions and detect caption changes. The separation of characters from possibly complex backgrounds is achieved by using size and color constraints and by cross examination of multiframe images. To segment individual characters, we use a dynamic split-and-merge strategy. Finally, we propose a character recognition process using a prototype classification method, supplemented by a disambiguation process using support vector machines, to improve recognition outcomes. This is followed by a postprocess that integrates multiple recognition results. The overall accuracy rate for the entire process applied to test video films is 94.11%. Published online: 2 February 2005
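A simplified sketch of combining a global threshold with a window-based one for caption binarization, loosely mirroring the combination described above; the window size and the rule that both tests must agree are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def otsu_threshold(gray):
    """Global (Otsu) threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # class-0 probability per candidate threshold
    mu = np.cumsum(p * np.arange(256))         # class-0 cumulative mean
    sigma_b = np.zeros(256)
    valid = (omega > 0) & (omega < 1)
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / (omega[valid] * (1 - omega[valid]))
    return int(np.argmax(sigma_b))

def binarize_caption(gray, win=15):
    """Combine a global threshold with a window-based (local-mean) threshold.

    A pixel is kept as text only if it passes both tests.
    """
    global_t = otsu_threshold(gray)
    local_mean = uniform_filter(gray.astype(float), size=win)
    return (gray > global_t) & (gray > local_mean)
```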

14.
Targeting the characteristics of interactive multimedia learning systems, this paper proposes a natural-language-based approach to content-based video retrieval: users interact with the system in natural language and can thus quickly and conveniently find the video clips they want. The approach integrates natural language processing, named-entity extraction, frame-based indexing, and information retrieval, so that the system can process natural-language questions posed by the user, build concise question templates from them, and match these templates against the video-description templates already stored in the system. This reduces the complexity of the video retrieval problem and improves the usability of the system.
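A toy sketch of matching a parsed question template against stored video-description templates; the slot-dictionary representation and scoring rule are assumptions, not the system's actual schema:

```python
def template_match_score(question_template, video_template):
    """Fraction of the question's slots that the video-description template satisfies.

    Both templates are dicts of slot -> value, e.g. {"action": "goal", "actor": "..."}.
    """
    if not question_template:
        return 0.0
    hits = sum(1 for slot, value in question_template.items()
               if video_template.get(slot) == value)
    return hits / len(question_template)

def best_matching_video(question_template, video_templates):
    """Return the id of the stored video whose template best matches the question."""
    return max(video_templates,
               key=lambda vid: template_match_score(question_template, video_templates[vid]))
```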

15.
16.
We are witnessing significant growth in the number of smartphone users and advances in phone hardware and sensor technology. In conjunction with the popularity of video applications such as YouTube, an unprecedented number of user-generated videos (UGVs) are being generated and consumed by the public, which leads to a Big Data challenge in social media. In a very large video repository, it is difficult to index and search videos in their unstructured form. However, due to recent developments, videos can be geo-tagged at acquisition time (e.g., with locations from a GPS receiver and viewing directions from a digital compass), which provides the potential for efficient management of video data. Ideally, each video frame can be tagged by the spatial extent of its coverage area, termed its Field-Of-View (FOV). This effectively converts a challenging video management problem into a spatial database problem. This paper attacks the challenges of large-scale video data management using spatial indexing and querying of FOVs, especially by maximally harnessing the geographical properties of FOVs. Since FOVs are shaped like slices of pie and contain both location and orientation information, conventional spatial indexes, such as the R-tree, cannot index them efficiently. The distribution of UGVs' locations is non-uniform (e.g., more FOVs in popular locations). Consequently, even multilevel grid-based indexes, which can handle both location and orientation, have limitations in managing the skewed distribution. Additionally, since UGVs are usually captured in a casual way with diverse setups and movements, no a priori assumption can be made to condense them in an index structure. To overcome these challenges, we propose a class of new R-tree-based index structures that effectively harness FOVs' camera locations, orientations, and view distances, in tandem, for both filtering and optimization. We also present novel search strategies and algorithms for efficient range and directional queries on our indexes. Our experiments using both real-world and large synthetic video datasets (over 30 years' worth of videos) demonstrate the scalability and efficiency of our proposed indexes and search algorithms.
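A minimal geometric sketch of the pie-slice FOV model described above, testing whether a point falls inside an FOV given the camera location, viewing direction, and visible distance; planar coordinates and the default viewable half-angle are simplifying assumptions:

```python
import math

def point_in_fov(px, py, cam_x, cam_y, heading_deg, view_dist, half_angle_deg=30.0):
    """True if point (px, py) lies inside the pie-slice FOV of a camera."""
    dx, dy = px - cam_x, py - cam_y
    dist = math.hypot(dx, dy)
    if dist > view_dist:
        return False                                    # beyond the visible distance
    if dist == 0.0:
        return True                                     # the camera location itself
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # 0 degrees points along +y ("north")
    diff = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= half_angle_deg                       # within the viewable angle
```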

17.
Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos, which enables content-based browsing. We present new methods for automatic segmentation of text in digital videos. The proposed algorithms exploit typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then passed directly to a standard OCR software package in order to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced and used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
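A simplified stand-in for the multi-frame bitmap integration described above, assuming the per-frame character bitmaps have already been tracked and aligned:

```python
import numpy as np

def integrate_character_bitmaps(bitmaps):
    """Fuse multiple aligned boolean bitmaps of one tracked character into a single bitmap.

    Pixel-wise averaging plus a majority vote suppresses background clutter that
    changes from frame to frame while the character itself stays fixed.
    """
    stack = np.stack([b.astype(float) for b in bitmaps], axis=0)
    return stack.mean(axis=0) > 0.5    # True where the pixel is text in most frames
```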

18.
K. Namitha, Athi Narayanan 《Multimedia Tools and Applications》 2020, 79(43-44): 32331-32360

Video synopsis is an effective solution for fast browsing and retrieval of long surveillance videos. It aims to shorten a long video sequence into an equivalent compact representation by rearranging the video events in the temporal and/or spatial domain. Conventional video synopsis methods focus on reducing the collisions between tubes and maintaining their chronological order, which may alter the original interactions between tubes due to improper tube rearrangement. In this paper, we present an approach that preserves the relationships among the tubes (tracks of moving objects) of the original video in the synopsis video. First, a recursive tube-grouping algorithm is proposed to determine the behavioral interactions among tubes in a video and group related tubes together into tube sets. Second, to preserve the discovered relationships, a spatio-temporal cube-voting algorithm is proposed. This cube-voting method optimally rearranges the tube sets in the synopsis video, minimizing false collisions between tubes. Third, a method to estimate the duration of the synopsis video is proposed based on an entropy measure of tube collisions. Extensive experimental results demonstrate that the proposed video synopsis framework condenses videos while preserving the original tube interactions and reducing false tube collisions.
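A generic helper for counting spatio-temporal collisions between two rescheduled tubes, the kind of quantity the synopsis rearrangement seeks to keep small; the tube representation and shift convention are assumptions, not the paper's cube-voting optimization:

```python
def tube_collisions(tube_a, tube_b, shift_a=0, shift_b=0):
    """Count frames in which two rescheduled tubes overlap spatially.

    A tube is a dict mapping original frame index -> bounding box (x1, y1, x2, y2);
    `shift_a`/`shift_b` are the temporal offsets assigned in the synopsis.
    """
    def boxes_overlap(b1, b2):
        return not (b1[2] <= b2[0] or b2[2] <= b1[0] or
                    b1[3] <= b2[1] or b2[3] <= b1[1])

    collisions = 0
    for t, box_a in tube_a.items():
        # frame t of tube_a appears at synopsis time t + shift_a; find tube_b's frame there
        box_b = tube_b.get(t + shift_a - shift_b)
        if box_b is not None and boxes_overlap(box_a, box_b):
            collisions += 1
    return collisions
```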


19.
Nowadays, a tremendous amount of video is captured continuously by an increasing number of video cameras distributed around the world. Since raw videos contain abundant redundant information, video browsing and retrieval are inefficient and time consuming. Video synopsis is an effective way to browse and index such video: it produces a short video representation while keeping the essential activities of the original video. However, video synopsis for a single camera is limited in its view scope, whereas understanding and monitoring the overall activity of large scenarios is valuable and demanding. To address these issues, we propose a novel video synopsis algorithm for a partially overlapping camera network. Our main contributions are threefold. First, our algorithm can generate a video synopsis for large scenarios, which facilitates understanding of overall activities. Second, to generate the overall activity, we adopt a novel unsupervised graph-matching algorithm to associate trajectories across cameras. Third, a novel multiple-kernel similarity is adopted for selecting key observations to eliminate content redundancy in the synopsis. We demonstrate the effectiveness of our approach on real surveillance videos captured by our camera network.
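A simplified stand-in for the cross-camera trajectory association step: the paper uses an unsupervised graph-matching formulation, whereas this sketch solves a plain bipartite assignment over a user-supplied dissimilarity matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_trajectories(cost):
    """Match trajectories of camera A (rows) to camera B (columns).

    `cost[i, j]` is a dissimilarity (e.g. appearance plus spatio-temporal distance)
    between trajectory i in camera A and trajectory j in camera B.
    """
    rows, cols = linear_sum_assignment(np.asarray(cost))
    return list(zip(rows.tolist(), cols.tolist()))
```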

20.
Multimedia Tools and Applications - The amount of digital material in video lecture archives is growing rapidly, causing the search-and-retrieval process to be time-consuming and almost…
