Similar Literature
20 similar records found (search time: 31 ms)
1.
This work addresses the development of a computational model of visual attention to perform automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, there is still no effective solution for content-based information retrieval from video recordings of the programs this medium produces. This reflects the high complexity of the content-based video retrieval problem, which involves several challenges, among which we may highlight the usual demand for video summaries that facilitate indexing, browsing and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired by the human visual system and based on computer vision methods (face detection, motion estimation and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach is an effective solution to the problem of automatic video summarization, producing video summaries of quality similar to the ground truth manually created by a group of 50 users.
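The final stage of such a pipeline, selecting key frames from per-frame attention scores, can be sketched as follows. This is a minimal illustration, not the paper's model; the scores are toy values standing in for the fused face/motion/saliency output:

```python
# Sketch of key-frame selection: given one attention score per frame,
# keep frames that score highest while staying at least `min_gap`
# frames apart, so the abstract spreads over the whole video.
def select_keyframes(scores, min_gap=3):
    # Rank frames by score, then greedily accept those far from chosen ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Applied to a score sequence, the function returns a small, temporally spread set of frame indices to serve as the static abstract.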

2.
We design and implement a semantics-aware distributed video retrieval system, "Yuxun" (语寻). The system uses an improved video semantic-processing tool (based on the IBM VideoAnnEx annotation tool, extended with shot semantic-graph annotation and natural-language processing) to semantically analyse and annotate videos, generating MPEG-7 description files that carry the semantic information. It then builds a distributed index over the videos' MPEG-7 description files while storing the video files themselves in a distributed fashion. The system offers rich Web query interfaces, including keyword queries with semantic expansion, semantic-graph queries, and natural-language queries; once a user submits a semantic query, the videos and segments of interest are retrieved quickly and can be browsed on demand. The whole system adopts a distributed architecture with good scalability and can support indexing and retrieval over massive volumes of video.

3.
4.
With the recent popularization of mobile video cameras, including camera phones, a new technology, mobile video surveillance, which uses mobile video cameras for video surveillance, has been emerging. Such videos, however, may infringe upon the privacy of others by disclosing privacy-sensitive information (PSI), i.e., their appearances. To prevent videos from infringing on the right to privacy, new techniques are required that automatically obscure PSI regions. The problem is how to determine the PSI regions to be obscured while maintaining enough video content to convey the camera persons' capture-intentions, i.e., what they want to record in their videos to achieve their surveillance tasks. To this end, we introduce a new concept called intended human objects, defined as the human objects essential to the capture-intention, and develop a new method called intended human object detection that automatically detects the intended human objects in videos taken by different camera persons. Building on intended human object detection, we develop a system for automatically obscuring PSI regions. We experimentally show the performance of intended human object detection and the contributions of the features used. Our user study shows the potential applicability of the proposed system.
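The obscuring step itself can be illustrated with a tiny pixelation pass. This is an illustrative sketch, not the paper's pipeline; the frame is a plain list-of-lists grayscale image and `region` is an assumed `(top, left, height, width)` tuple:

```python
# Once non-intended human regions (PSI) are located, obscure them by
# replacing each small block inside the region with its mean intensity.
def pixelate_region(frame, region, block=2):
    top, left, h, w = region
    out = [row[:] for row in frame]  # leave the input frame untouched
    for by in range(top, top + h, block):
        for bx in range(left, left + w, block):
            ys = range(by, min(by + block, top + h))
            xs = range(bx, min(bx + block, left + w))
            vals = [frame[y][x] for y in ys for x in xs]
            mean = sum(vals) // len(vals)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

A real system would run this per frame on the detected PSI bounding boxes, leaving intended human objects untouched.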

5.
We report the recognition in video streams of isolated alphabetic characters and connected cursive textual characters, such as alphabetic, hiragana and kanji characters, that are drawn in the air. This topic involves a number of difficult problems in computer vision, such as the segmentation and recognition of complex motion on videos. We use an algorithm called time–space continuous dynamic programming (TSCDP), which can realize both time- and location-free (spotting) recognition. Spotting means that the prior segmentation of input video is not required. Each reference (model) character is represented by a single stroke that is composed of pixels. We conducted two experiments involving the recognition of 26 isolated alphabetic characters and 23 Japanese hiragana and kanji air-drawn characters. We also conducted gesture recognition experiments based on TSCDP, which showed that TSCDP was free from many of the restrictions imposed by conventional methods.
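The spotting idea, matching a reference pattern that may start anywhere in an unsegmented stream, can be shown with a simplified one-dimensional dynamic program. This is a toy illustration in the spirit of TSCDP, not the paper's two-dimensional time-space formulation:

```python
# Spotting match: the reference may begin at any stream position, so the
# DP's first row carries no cumulative penalty for the starting point.
def spot(reference, stream):
    R, T = len(reference), len(stream)
    INF = float("inf")
    # D[r][t]: best cost of matching reference[:r+1] ending at stream[t].
    D = [[INF] * T for _ in range(R)]
    for t in range(T):
        D[0][t] = abs(reference[0] - stream[t])  # free start anywhere
    for r in range(1, R):
        for t in range(1, T):
            step = min(D[r - 1][t - 1], D[r][t - 1], D[r - 1][t])
            D[r][t] = abs(reference[r] - stream[t]) + step
    best_end = min(range(T), key=lambda t: D[R - 1][t])
    return best_end, D[R - 1][best_end]
```

Because no prior segmentation is needed, the minimum-cost end position locates the pattern even when it is embedded in unrelated motion.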

6.
7.
Personalization is one of the most important mechanisms for making multimedia systems easy to use. In video applications, it means tailoring video content to a particular viewer. For this purpose, we are developing a system for retrieving and browsing video segments called video portal with personalization (VIPP). VIPP is characterized by 1) supporting the viewer's access to video content and producing a summarized video clip that takes his/her preferences into account, and 2) acquiring the viewer's profile automatically from his/her operations. In this paper, we propose a method for learning to personalize from the viewer's operations, such as retrieval and browsing, and describe how personalized retrieval and summarization of videos can be realized. Experiments clarify the effect of personalization on the retrieval and summarization of baseball videos in VIPP.
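A toy sketch of profile acquisition in this spirit (an assumed interface, not the paper's actual algorithm): each retrieval or browsing operation on a segment votes for that segment's annotation keywords, and the normalized counts become the viewer's preference profile.

```python
from collections import Counter

def build_profile(operations):
    # operations: one keyword list per segment the viewer retrieved/browsed.
    votes = Counter(k for keywords in operations for k in keywords)
    total = sum(votes.values())
    # Normalized counts act as preference weights for later retrieval ranking.
    return {k: v / total for k, v in votes.items()}
```

Such weights could then bias segment ranking and summary length toward the viewer's interests.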

8.
We present an original approach for motion-based video retrieval involving partial queries. More precisely, we propose a unified statistical framework that simultaneously extracts entities of interest in video shots and supplies the associated content-based characterization, which can be used to satisfy partial queries. It relies on the analysis of motion activity in video sequences based on a non-parametric probabilistic modeling of motion information. Areas comprising relevant types of motion activity are extracted from a Markovian region-level labeling applied to the adjacency graph of an initial block-based partition of the image. As a consequence, given a set of videos, we are able to construct a structured base of samples of entities of interest represented by their associated statistical models of motion activity. The retrieval operation is then formulated as a Bayesian inference problem using the MAP criterion. We report several results of extracting entities of interest in video sequences and examples of retrieval operations performed on a base of one hundred video samples.

9.
This paper tackles the problem of surveillance video content modelling. Given a set of surveillance videos, the aims of our work are twofold: firstly a continuous video is segmented according to the activities captured in the video; secondly a model is constructed for the video content, based on which an unseen activity pattern can be recognised and any unusual activities can be detected. To segment a video based on activity, we propose a semantically meaningful video content representation method and two segmentation algorithms, one being offline offering high accuracy in segmentation, and the other being online enabling real-time performance. Our video content representation method is based on automatically detected visual events (i.e. ‘what is happening in the scene’). This is in contrast to most previous approaches which represent video content at the signal level using image features such as colour, motion and texture. Our segmentation algorithms are based on detecting breakpoints on a high-dimensional video content trajectory. This differs from most previous approaches which are based on shot change detection and shot grouping. Having segmented continuous surveillance videos based on activity, the activity patterns contained in the video segments are grouped into activity classes and a composite video content model is constructed which is capable of generalising from a small training set to accommodate variations in unseen activity patterns. A run-time accumulative unusual activity measure is introduced to detect unusual behaviour while usual activity patterns are recognised based on an online likelihood ratio test (LRT) method. This ensures robust and reliable activity recognition and unusual activity detection at the shortest possible time once sufficient visual evidence has become available. 
Comparative experiments have been carried out using over 10 h of challenging outdoor surveillance video footage to evaluate the proposed segmentation algorithms and modelling approach.
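The run-time accumulative unusual-activity measure can be sketched as a sequential log-likelihood ratio test. This is illustrative only; the paper's models are learned from activity classes, while here both hypotheses are fixed one-dimensional Gaussians:

```python
import math

def gauss_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def accumulative_lrt(observations, mu_usual, mu_unusual, sigma=1.0, threshold=5.0):
    # Accumulate evidence frame by frame; decide as soon as the
    # log-likelihood ratio crosses a threshold, enabling the earliest
    # reliable decision once sufficient visual evidence is available.
    llr = 0.0
    for t, x in enumerate(observations):
        llr += gauss_logpdf(x, mu_unusual, sigma) - gauss_logpdf(x, mu_usual, sigma)
        if llr >= threshold:
            return "unusual", t
        if llr <= -threshold:
            return "usual", t
    return "undecided", len(observations) - 1
```

The threshold trades decision speed against robustness, mirroring the paper's goal of detection at the shortest possible time.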

10.
Image and video analysis requires rich features that can characterize various aspects of visual information. These rich features are typically extracted from the pixel values of the images and videos, which requires a huge amount of computation and is seldom useful for real-time analysis. In contrast, compressed-domain analysis offers relevant information about the visual content, in the form of transform coefficients, motion vectors, quantization steps and coded block patterns, with minimal computational burden. Far less work has been done in the compressed domain than in the pixel domain. This paper surveys the video analysis efforts published during the last decade across the spectrum of video compression standards. The survey covers only the analysis side, excluding the processing aspect of the compressed domain, and spans computer vision applications such as moving object segmentation, human action recognition, indexing, retrieval, face detection, video classification and object tracking in compressed videos.
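A minimal illustration of why compressed-domain analysis is cheap: a frame's motion activity can be scored directly from its decoder motion vectors, with no pixel decoding at all (toy `(dx, dy)` tuples here):

```python
# Average motion-vector magnitude as a per-frame activity score, a
# common compressed-domain cue for segmentation and classification.
def frame_activity(motion_vectors):
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in motion_vectors) / len(motion_vectors)
```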

11.
刘云恒, 刘耀宗. 《计算机科学》 (Computer Science), 2016, 43(Z6): 448-451, 475
Police video-surveillance technology has advanced from the stage of network integration to that of deep operational use, and the continuous stream of police video big data calls for new big-data processing methods. Driven by the application requirements of police video big data, this work adopts a Hadoop-based video big-data processing platform, together with a face retrieval and recognition algorithm built on the Map-Reduce model, to realise intelligent information processing of police video big data for practical policing applications.
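The Map-Reduce face-retrieval pattern described above can be sketched conceptually with plain Python stand-ins. These are not the Hadoop API; the Euclidean distance function and the face feature vectors are toy assumptions:

```python
# Map: each worker scores its shard of stored faces against the query.
def map_phase(query, faces):
    # faces: {face_id: feature_vector}; emit (face_id, distance) pairs.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [(fid, dist(query, vec)) for fid, vec in faces.items()]

# Reduce: merge the scored pairs and keep the closest matches.
def reduce_phase(scored, top_k=2):
    return sorted(scored, key=lambda kv: kv[1])[:top_k]
```

In a real deployment the map phase runs in parallel over HDFS shards and the reduce phase merges partial top-k lists.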

12.
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in videos, which enables content-based browsing. We present new methods for the automatic segmentation of text in digital videos. The proposed algorithms exploit typical characteristics of text in videos to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is passed directly to a standard OCR software package to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced and used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
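The temporal-integration idea can be sketched as follows: the same character is tracked over several frames and its noisy bitmaps are combined into one cleaner bitmap before OCR. This uses toy binary bitmaps and a simple per-pixel vote; the real system works on grayscale video:

```python
# Integrate a tracked character's bitmaps over time: a pixel is kept
# only if it appears in at least `vote_threshold` of the frames,
# suppressing per-frame noise before OCR.
def integrate_bitmaps(bitmaps, vote_threshold=0.5):
    h, w = len(bitmaps[0]), len(bitmaps[0][0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mean = sum(b[y][x] for b in bitmaps) / len(bitmaps)
            out[y][x] = 1 if mean >= vote_threshold else 0
    return out
```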

13.
In this paper, we propose a novel motion-based video retrieval approach that finds desired videos in video databases through trajectory matching. The main component of our approach is extracting representative motion features from the video, which breaks down into three steps. First, we extract the motion vectors from each frame and use Harris corner points to compensate for the effect of camera motion. Second, we find interesting motion flows across frames using a sliding-window mechanism and a clustering algorithm. Third, we merge the generated motion flows and select representative ones to capture the motion features of the video. Furthermore, we design a symbol-based trajectory matching method for effective video retrieval. The experimental results show that our algorithm effectively extracts motion flows with high accuracy and outperforms existing approaches for video retrieval.
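Symbolic trajectory matching can be sketched with an assumed four-symbol encoding (the paper's actual alphabet may differ): quantize each motion step into a direction symbol, then compare trajectories by edit distance.

```python
# Quantize each step of a point trajectory into L/R/U/D symbols.
def to_symbols(points):
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if abs(x1 - x0) >= abs(y1 - y0):
            symbols.append("R" if x1 >= x0 else "L")
        else:
            symbols.append("D" if y1 >= y0 else "U")
    return "".join(symbols)

# Standard edit distance: fewer edits means more similar trajectories.
def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]
```

Ranking database trajectories by edit distance to the query's symbol string yields the retrieval order.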

14.

Saliency prediction models provide a probabilistic map of the relative likelihood of an image or video region to attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, the size and compactness of the salient regions, and the emphasis on only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into one single saliency map that is highly correlated with the eye-fixation data, a random forest based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment, which involved 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database as well as the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to the state-of-the-art approaches.
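The fusion stage has the following shape. Note the paper trains a random forest against eye-fixation data; here a fixed weighted sum stands in as a toy illustration of collapsing several per-pixel conspicuity maps into one saliency map:

```python
# Fuse conspicuity maps (lists of lists, same shape) into one map,
# then normalize to [0, 1] so the result reads as a relative likelihood.
def fuse_maps(maps, weights):
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[sum(wgt * m[y][x] for m, wgt in zip(maps, weights))
              for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in fused)
    return [[v / peak for v in row] for row in fused] if peak > 0 else fused
```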


15.
Implementing new strategies to achieve higher work efficiency is essential to improving productivity in monotonous, exhausting work tasks. The Modular Arrangement of Predetermined Time Standard (MODAPTS) concept is widely used for these goals in the field of human factors. However, MODAPTS is difficult for engineers to master: it requires substantial learning time and even longer hours of effort in the manual coding process. To overcome this deficiency of the traditional method, this paper proposes a modern method for motion analysis, called PCA-based motion analysis. Motion analysis is already the main approach to analysing human motion in sports science and biometrics, but it remains unfamiliar to human factors analysts. This paper discusses a potential connection between motion analysis technology and MODAPTS analysis that would help make MODAPTS more efficient and reliable. In the experiment, fifteen participants were asked to watch a motion sequence and then analyze it using MODAPTS; meanwhile, the motion-captured data were segmented into motion elements with the PCA approach. Motion segmentation was then compared between manual MODAPTS analysis and automatic motion-element segmentation using PCA. The accuracy rate of segmentation by the PCA approach was 80.08%, and the primitive frames of the two methods indicated that the segmentation is acceptable. In addition, PCA-based motion analysis showed a substantial saving in processing time: approximately 3 min for motion analysis versus over 1 h for MODAPTS. Motion analysis thus provides high efficiency and reliability for motion segmentation, with sufficient precision compared to MODAPTS, and lets analysts focus on assessing the rationality of operations and optimizing production-line design instead of repetitive motion segmentation.
Integrating motion analysis technology into traditional MODAPTS is a useful advancement that permits significant progress in human factors analysis. In the future, the accuracy of the automatic segmentation techniques should be improved.
Relevance to industry: In recent years, motion analysis technology has become increasingly popular in various fields, but not yet in human factors. The automatic method presented in this paper may allow industrial workers to optimize unreasonable motions, remove unnecessary operations, and formulate standard working times more conveniently and accurately.
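One way to sketch the PCA-based segmentation idea (toy data; the study used real motion-capture sequences): project each frame's pose vector onto the first principal component, then cut the sequence where the projected velocity changes sign, treating those instants as motion-element boundaries.

```python
import numpy as np

def segment_motion(frames):
    # frames: one pose vector per frame (toy 2-D vectors here).
    X = np.asarray(frames, dtype=float)
    X = X - X.mean(axis=0)
    # First principal component via SVD of the centered data.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[0]
    # Boundaries where the projected velocity reverses direction.
    vel = np.diff(proj)
    return [i + 1 for i in range(len(vel) - 1) if vel[i] * vel[i + 1] < 0]
```

The sign of the principal component is arbitrary, but a simultaneous flip of all projections leaves the detected boundaries unchanged.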

16.
In order to analyse surveillance video, we need to efficiently explore large datasets containing videos of walking humans. Effective analysis of such data relies on retrieval of video data that has been enriched using semantic annotations. A manual annotation process is time-consuming and prone to error due to subject bias; however, at surveillance-image resolution, the human walk (gait) can be analysed automatically. We explore the content-based retrieval of videos containing walking subjects using semantic queries. We evaluate current research in gait biometrics, which is unique in its effectiveness at recognising people at a distance, and introduce a set of semantic traits discernible by humans at a distance, outlining their psychological validity. Working under the premise that similarity of the chosen gait signature implies similarity of certain semantic traits, we perform a set of semantic retrieval experiments using popular Latent Semantic Analysis techniques on a dataset of 2000 videos of people walking in laboratory conditions, achieving promising retrieval results for features such as sex (mAP = 14% above random), age (mAP = 10% above random) and ethnicity (mAP = 9% above random).
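A minimal latent-semantic retrieval sketch follows. The feature-by-video matrix and query here are toy term counts, not the paper's gait signatures; the point is the truncated SVD and cosine ranking in the latent space:

```python
import numpy as np

def lsa_rank(feature_doc, query, k=2):
    # feature_doc: features x documents matrix; query: feature vector.
    A = np.asarray(feature_doc, dtype=float)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    docs = (np.diag(S[:k]) @ Vt[:k]).T            # documents in latent space
    q = U[:, :k].T @ np.asarray(query, dtype=float)  # fold the query in
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)                      # best match first
```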

17.
Smart cities have become a basic goal of urban construction in the information age, and intelligent video surveillance is a key part of this: the aim is to extract useful information from video images to support public-security prevention and control. Because video surveillance systems are deployed across all kinds of industries, surveillance video has become a typical class of big data, and processing it efficiently poses a major challenge. Based on an analysis of the characteristics of video processing, this paper proposes and implements a distributed offline video-processing method built on the Hadoop MapReduce framework; the method is optimised for those characteristics and improves the processing efficiency of surveillance video big data.

18.
Many interesting and promising prototypes for visualizing video data have been proposed, including those that combine videos with their spatial context (contextualized videos). However, relatively little work has investigated the fundamental design factors behind these prototypes in order to provide general design guidance. Focusing on real-time video data visualization, we evaluated two important design factors, video placement method and spatial context presentation method, through a user study. In addition, we evaluated the effect of spatial knowledge of the environment. Participants' performance was measured through path reconstruction tasks, in which the participants followed a target through simulated surveillance videos and marked the target's path on the environment model. We found that embedding videos inside the model enabled real-time strategies and led to faster performance. With the help of contextualized videos, participants unfamiliar with the real environment achieved task performance similar to that of participants who worked in that environment. We discuss design implications and provide general design recommendations for traffic and security surveillance system interfaces.

19.
Human Motion Analysis: A Review
Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man–machine interfaces, content-based image storage and retrieval, and video conferencing. This paper gives an overview of the various tasks involved in motion analysis of the human body. We focus on three major areas related to interpreting human motion: (1) motion analysis involving human body parts, (2) tracking a moving human from a single view or multiple camera perspectives, and (3) recognizing human activities from image sequences. Motion analysis of human body parts involves the low-level segmentation of the human body into segments connected by joints and recovers the 3D structure of the human body using its 2D projections over a sequence of images. Tracking human motion from a single view or multiple perspectives focuses on higher-level processing, in which moving humans are tracked without identifying their body parts. After successfully matching the moving human image from one frame to another in an image sequence, understanding the human movements or activities comes naturally, which leads to our discussion of recognizing human activities.

20.
We present a novel technique, called the 2-Phase Service Model, for streaming videos to home users in a limited-bandwidth environment. The scheme first delivers a number of non-adjacent data fragments to the client in Phase 1. The missing fragments are then transmitted in Phase 2 as the client plays back the video. This approach offers many benefits. The isochronous bandwidth required for Phase 2 can be kept within the capability of the transport medium. The data fragments received during Phase 1 can be used to provide an excellent preview of the video, and also to facilitate VCR-style operations such as fast-forward and fast-reverse. Systems designed on this method are less expensive because fast-forward and fast-reverse versions of the video files are no longer needed; eliminating these files also improves system performance because mapping between the regular files and their fast-forward and fast-reverse versions is no longer part of the VCR operations. Furthermore, since each client machine handles its own VCR-style interaction, the technique is very scalable. We provide simulation results to show that the 2-Phase Service Model handles VCR functions efficiently. We also implement a video player called FRVplayer. With this prototype, we judge that the visual quality of the previews and VCR-style operations is excellent. These features are essential to many important applications; we discuss the application of FRVplayer in the design of a video management system, called VideoCenter, intended for Internet applications such as digital video libraries.
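The fragment split behind the two phases can be sketched as simple index scheduling (illustrative only; the real system also sizes fragments against available bandwidth):

```python
# Phase 1 predelivers every `stride`-th fragment (non-adjacent, so they
# double as a preview and as fast-forward material); Phase 2 streams the
# remaining interleaved fragments during playback.
def split_phases(num_fragments, stride=3):
    phase1 = list(range(0, num_fragments, stride))
    phase2 = [i for i in range(num_fragments) if i % stride != 0]
    return phase1, phase2
```

Because the Phase 1 fragments are evenly spaced, skimming them back to back approximates a fast-forward pass without a separate fast-forward file.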


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) | 京ICP备09084417号