Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Video is an information-intensive medium with much redundancy. Therefore, it is desirable to be able to mine the structure or semantics of video data for efficient browsing, summarization and highlight extraction. In this paper, we propose a mosaic-based approach to key-event and structure mining, which is regarded as a complementary view for sports video analysis. A mosaic is generated for each shot by a novel, efficient mosaicing scheme, which constructs a global motion path and selects the best subset of frames for mosaicing. These improved mosaics are then used as the representative images of shot content. Based on the mosaics, the structure and events in sports video are mined by methods that work either with or without prior knowledge. Without prior knowledge, our system is able to locate global view shots taken by the dominant camera. If prior knowledge is available, the events in these global view shots are detected using robust features extracted from the mosaics. For global view mining, experiments comparing against a key-frame-based scheme have demonstrated that the mosaic-based scheme gives better results on several kinds of sports videos; for event mining, the detection of key plays and key events in the specific domain of soccer videos has proved its effectiveness.
Xian-Sheng Hua

2.
In this paper, we propose an innovative architecture to segment a news video into so-called “stories” using both the video and the audio information it contains. Segmentation of news into stories is one of the key issues for achieving efficient treatment of news-based digital libraries. While the relevance of this research problem is widely recognized in the scientific community, there are only a few established solutions in the field. In our approach, the segmentation is performed in two steps: first, shots are classified by combining three different anchor shot detection algorithms using video information only. Then, the shot classification is improved by using a novel anchor shot detection method based on features extracted from the audio track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their combination.
Mario Vento

3.
Detecting and tracking human faces in video sequences is useful in a number of applications such as gesture recognition and human-machine interaction. In this paper, we show that online appearance models (holistic approaches) can be used for simultaneously tracking the head, the lips, the eyebrows, and the eyelids in monocular video sequences. Unlike previous approaches to eyelid tracking, we show that the online appearance models can be used for this purpose. Neither color information nor intensity edges are used by our proposed approach. More precisely, we show how the classical appearance-based trackers can be upgraded in order to deal with fast eyelid movements. The proposed eyelid tracking is made robust by avoiding eye feature extraction. Experiments on real videos show the usefulness of the proposed tracking schemes as well as their enhancement to our previous approach.
Javier Orozco

4.
Grouping video content into semantic segments and classifying semantic scenes into different types are crucial processes for content-based video organization, management and retrieval. In this paper, a novel approach to automatically segmenting scenes and semantically representing them is proposed. Firstly, video shots are detected using a rough-to-fine algorithm. Secondly, key-frames within each shot are selected adaptively with hybrid features, and redundant key-frames are removed by template matching. Thirdly, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video content and the visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, scene content is semantically represented to satisfy human demands on video retrieval. The proposed algorithm has been tested on various genres of films and TV programs. Promising experimental results show that the proposed method supports efficient retrieval of interesting video content.
Yuncai Liu

5.
This paper presents a framework that explicitly detects events in broadcast baseball videos and facilitates the development of many practical applications. Three phases of contributions are included in this work: reliable shot classification, explicit event detection, and elaborate applications. At the shot classification stage, color and geometric information are utilized to classify video shots into several canonical views. To explicitly detect semantic events, rule-based decision and model-based decision methods are developed. We emphasize that this system efficiently and exactly identifies what happened in baseball games rather than roughly finding some interesting parts. On the basis of explicit event detection, many accurate and practical applications such as automatic box score generation and game summarization could be built. The reported results show the effectiveness of the proposed framework and point to some research opportunities for bridging the semantic gap for sports videos.
Ja-Ling Wu

6.
This paper proposes a framework to aid video analysts in detecting suspicious activity within the tremendous amounts of video data that exist in today’s world of omnipresent surveillance video. Ideas and techniques for closing the semantic gap between low-level machine-readable features of video data and high-level events seen by a human observer are discussed. An evaluation of the event classification and detection technique is presented and a future experiment to refine this technique is proposed. These experiments lead into a discussion of the most suitable machine learning algorithm for learning the event representation scheme proposed in this paper.
Bhavani Thuraisingham

7.
8.
Audiovisual integration for tennis broadcast structuring   (total citations: 2; self-citations: 2; citations by others: 0)
This paper focuses on the integration of multimodal features for sport video structure analysis. The method relies on a statistical model which takes into account both the shot content and the interleaving of shots. This stochastic modelling is performed in the global framework of Hidden Markov Models (HMMs) that can be efficiently applied to merge audio and visual cues. Our approach is validated in the particular domain of tennis videos. The model integrates prior information about tennis content and editing rules. The basic temporal unit is the video shot. Visual features are used to characterize the type of shot view. Audio features describe the audio events within a video shot. Two sets of audio features are used in this study: the first one is extracted from a manual segmentation of the soundtrack and is more reliable. The second one is provided by an automatic segmentation and classification process. As a result of the overall HMM process, typical tennis scenes are simultaneously segmented and identified. The experiments illustrate the improvement of HMM-based fusion over indexing using only the best single media, when both media are of similar quality.
Ewa Kijak
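The abstract above does not give the model's states, features or probabilities, so the following is only a minimal sketch of how an HMM can label a sequence of shots: standard Viterbi decoding in the log domain. The function name `viterbi` and all parameter names are assumptions made for illustration; each observation is taken to be a single fused audio-visual label per shot, which is a simplification of the feature sets the abstract describes.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations.

    All probability tables are plain dictionaries supplied by the caller;
    nothing here is taken from the paper itself.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Score of the best path ending in each state after the first observation.
    scores = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back_pointers = []

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            best_prev = max(states, key=lambda r: scores[r] + log(trans_p[r][s]))
            new_scores[s] = (scores[best_prev] + log(trans_p[best_prev][s])
                             + log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        back_pointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Decoding yields a segmentation and a labelling at the same time, since consecutive shots assigned to the same state form one segment. The state and observation names used to exercise the function (for example "rally" and "break") would be hypothetical, not the paper's.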

9.
In the conventional motion-compensated temporal filtering based wavelet coding scheme, where the group-of-pictures structure and low-pass frame position are fixed, variations in the motion activity of video sequences are not considered. In this paper, we propose an adaptive group-of-pictures structure selection scheme, in which the group-of-pictures size and low-pass frame position are selected based on mutual information. Furthermore, the temporal decomposition process is determined adaptively according to the selected group-of-pictures structure. Extensive experiments are carried out to compare the compression performance of the proposed method with the conventional motion-compensated temporal filtering encoding scheme and with the adaptive group-of-pictures structure in the standard scalable video coding model. The proposed low-pass frame selection can improve the compression quality by about 0.3–0.5 dB compared to the conventional scheme in video sequences with high motion activity. In scenes with uneven variation of motion activity, e.g. frequent shot cuts, the proposed adaptive group-of-pictures size can achieve better compression than the conventional scheme. Compared to the adaptive group of pictures in the standard scalable video coding model, the proposed group-of-pictures structure scheme can lead to improvements of about 0.2–0.8 dB in sequences with high motion activity or shot cuts.
Zhao-Guang Liu
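The abstract above states that the selection is based on mutual information but does not give the decision rule. As a hedged illustration of the quantity itself, the sketch below computes the mutual information between two frames from their joint intensity histogram. The function name `mutual_information` and the representation of a frame as a flat sequence of quantized intensities are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b):
    """Mutual information, in bits, between two equally sized frames.

    Each frame is a flat sequence of quantized pixel intensities
    (an assumption of this sketch, not a detail from the paper).
    """
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and of equal length")

    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))  # joint histogram of co-located pixels
    hist_a = Counter(frame_a)               # marginal histogram of frame A
    hist_b = Counter(frame_b)               # marginal histogram of frame B

    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_a[a] * hist_b[b]))
    return mi
```

A frame compared with itself gives its own entropy, while statistically unrelated frames give a value near zero, so a drop in this value between consecutive frames is one plausible signal of a large content change such as a shot cut.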

10.
This paper addresses the problem of ensuring the integrity of a digital video and presents a scalable signature scheme for video authentication based on cryptographic secret sharing. The proposed method detects spatial cropping and temporal jittering in a video, yet is robust against frame dropping in the streaming video scenario. In our scheme, the authentication signature is compact and independent of the size of the video. Given a video, we identify the key frames based on differential energy between the frames. Considering video frames as shares, we compute the corresponding secret at three hierarchical levels. The master secret is used as digital signature to authenticate the video. The proposed signature scheme is scalable to three hierarchical levels of signature computation based on the needs of different scenarios. We provide extensive experimental results to show the utility of our technique in three different scenarios—streaming video, video identification and face tampering.
Mohan S. Kankanhalli
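The abstract above does not name the secret sharing scheme it builds on. Assuming a Shamir-style polynomial scheme, the following sketch shows how a secret is recovered from a set of shares by Lagrange interpolation at zero. The names `PRIME`, `make_shares` and `recover_secret` are hypothetical, and how frame data is mapped to share values is not specified in the abstract and is not modelled here.

```python
PRIME = 2**61 - 1  # a Mersenne prime; the field size is an arbitrary choice for this sketch


def make_shares(secret, coefficients, num_shares):
    """Evaluate the polynomial secret + c1*x + c2*x^2 + ... at x = 1..num_shares.

    In a real scheme the coefficients are random; they are passed in
    explicitly here so that the example is reproducible.
    """
    poly = [secret] + list(coefficients)
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(poly):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the polynomial's constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem, valid because PRIME is prime.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With a degree-2 polynomial, any three shares recover the same secret, which illustrates why a signature of this kind can tolerate some missing shares.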

11.
Real-time 2D to 3D video conversion   (total citations: 1; self-citations: 0; citations by others: 1)
We present a real-time implementation of 2D to 3D video conversion using compressed video. In our method, compressed 2D video is analyzed by extracting motion vectors. Using the motion vector maps, depth maps are built for each frame and the frames are segmented to provide object-wise depth ordering. These data are then used to synthesize stereo pairs. 3D video synthesized in this fashion can be viewed using any stereoscopic display. In our implementation, anaglyph projection was selected as the 3D visualization method, because it is the method best suited to standard displays.
Ianir Ideses
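As a small illustration of the last step mentioned above, the sketch below combines a stereo pair into a conventional red-cyan anaglyph: the red channel comes from the left view and the green and blue channels from the right view. The function name `make_anaglyph` and the image representation (rows of `(r, g, b)` tuples) are assumptions for this example; the abstract does not say which anaglyph variant the authors use, and the depth estimation and stereo synthesis stages are not covered here.

```python
def make_anaglyph(left, right):
    """Build a red-cyan anaglyph from a stereo pair.

    Both images are lists of rows, each row a list of (r, g, b) tuples.
    """
    if len(left) != len(right):
        raise ValueError("images must have the same height")

    anaglyph = []
    for row_left, row_right in zip(left, right):
        if len(row_left) != len(row_right):
            raise ValueError("images must have the same width")
        # Red from the left view, green and blue from the right view.
        anaglyph.append([(l[0], r[1], r[2]) for l, r in zip(row_left, row_right)])
    return anaglyph
```

Viewed through red-cyan glasses, each eye then sees mostly its own view, which is why this method needs no special display hardware.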

12.
We present a study of using camera-phones and visual-tags to access mobile services. Firstly, a user-experience study is described in which participants were both observed learning to interact with a prototype mobile service and interviewed about their experiences. Secondly, a pointing-device task is presented in which quantitative data was gathered regarding the speed and accuracy with which participants aimed and clicked on visual-tags using camera-phones. We found that participants’ attitudes to visual-tag-based applications were broadly positive, although they had several important reservations about camera-phone technology more generally. Data from our pointing-device task demonstrated that novice users were able to aim and click on visual-tags quickly (well under 3 s per pointing-device trial on average) and accurately (almost all meeting our defined speed/accuracy tradeoff of 6% error-rate). Based on our findings, design lessons for camera-phone and visual-tag applications are presented.
Eleanor Toye (Corresponding author)
Richard Sharp
Anil Madhavapeddy
David Scott
Eben Upton
Alan Blackwell

13.
Clustering multidimensional sequences in spatial and temporal databases   (total citations: 3; self-citations: 2; citations by others: 1)
Many environmental, scientific, technical or medical database applications require effective and efficient mining of time series, sequences or trajectories of measurements taken at different time points and positions forming large temporal or spatial databases. Particularly the analysis of concurrent and multidimensional sequences poses new challenges in finding clusters of arbitrary length and varying number of attributes. We present a novel algorithm capable of finding parallel clusters in different subspaces and demonstrate our results for temporal and spatial applications. Our analysis of structural quality parameters in rivers is successfully used by hydrologists to develop measures for river quality improvements.
Thomas Seidl

14.
There are only a few ethical regulations that deal explicitly with robots, in contrast to a vast number of regulations that may be applied. We will focus on ethical issues with regard to “responsibility and autonomous robots”, “machines as a replacement for humans”, and “tele-presence”. Furthermore, we will examine examples from special fields of application (medicine and healthcare, armed forces, and entertainment). We do not claim to present a complete list of ethical issues or of regulations in the field of robotics, but we will demonstrate that there are legal challenges with regard to these issues.
Michael Nagenborg (Corresponding author), URL: www.michaelnagenborg.de
Rafael Capurro
Jutta Weber
Christoph Pingel

15.
In this paper, we aim to provide adaptive multimedia services, especially video services, to end-users in an efficient and secure manner. Users moving outside the office should be able to maintain an office-like environment at their current locations. First, the agents within our proposed architecture negotiate the different communication and interaction factors autonomously and dynamically. Moreover, we needed to develop a user agent, in addition to service and system agents, that could negotiate the requirements and capabilities at run time to furnish the best possible service results. Thus we designed and integrated a video indexing and key framing service within our overall agent-based architecture. We integrated this video indexing and content-based analysis service to adapt the video content according to run-time conditions. We designed a video XML schema to validate the media content produced by this multimedia service according to specific requirements and features, as we will describe later.
Ahmed Karmouch

16.
Multimodal support to group dynamics   (total citations: 1; self-citations: 1; citations by others: 0)
The complexity of group dynamics occurring in small group interactions often hinders the performance of teams. The availability of rich multimodal information about what is going on during the meeting makes it possible to explore the possibility of providing support to dysfunctional teams from facilitation to training sessions addressing both the individuals and the group as a whole. A necessary step in this direction is that of capturing and understanding group dynamics. In this paper, we discuss a particular scenario, in which meeting participants receive multimedia feedback on their relational behaviour, as a first step towards increasing self-awareness. We describe the background and the motivation for a coding scheme for annotating meeting recordings partially inspired by the Bales’ Interaction Process Analysis. This coding scheme was aimed at identifying suitable observable behavioural sequences. The study is complemented with an experimental investigation on the acceptability of such a service.
Fabio Pianesi (Corresponding author)
Massimo Zancanaro
Elena Not
Chiara Leonardi
Vera Falcon
Bruno Lepri

17.
The paper presents a real-time algorithm that compensates for image distortions due to atmospheric turbulence in video sequences, while keeping the real moving objects in the video unharmed. The algorithm involves (1) generation of a “reference” frame; (2) estimation, for each incoming video frame, of a local image displacement map with respect to the reference frame; (3) segmentation of the displacement map into two classes, stationary and moving objects; and (4) turbulence compensation of stationary objects. Experiments with both simulated and real-life sequences have shown that the restored videos, generated in real time using standard computer hardware, exhibit excellent stability for stationary objects while retaining real motion.
Barak Fishbain
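The four steps listed in the abstract above can be sketched roughly as follows. The abstract does not say how each step is implemented, so every choice here is an assumption: the reference frame is a per-pixel temporal median, the displacement map of step (2) is taken as given input rather than estimated, moving pixels are those whose displacement magnitude exceeds a threshold, and compensation simply substitutes the reference value for stationary pixels. All function names are hypothetical.

```python
import math
from statistics import median


def reference_frame(frames):
    """Step 1 (assumed): per-pixel temporal median over a list of grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def segment_motion(displacements, threshold):
    """Step 3 (assumed rule): True where the (dx, dy) displacement is large.

    The idea is that turbulence causes small local shifts, while real
    motion causes larger ones; the threshold is a free parameter.
    """
    return [[math.hypot(dx, dy) > threshold for dx, dy in row]
            for row in displacements]


def compensate(frame, reference, moving_mask):
    """Step 4 (simplified): keep moving pixels, stabilize stationary ones."""
    return [
        [pix if moving else ref for pix, ref, moving in zip(f_row, r_row, m_row)]
        for f_row, r_row, m_row in zip(frame, reference, moving_mask)
    ]
```

A real-time implementation would estimate the displacement map with an optical-flow or block-matching method and would operate on arrays rather than nested lists; the nested lists here only keep the sketch dependency-free.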

18.
This paper describes the simulated car racing competition that was arranged as part of the 2007 IEEE Congress on Evolutionary Computation. The game that was used as the domain for the competition, the controllers submitted as entries, and the results of the competition are presented. With this paper, we hope to provide some insight into the efficacy of various computational intelligence methods on a well-defined game task, as well as an example of one way of running a competition. In the process, we provide a set of reference results for those who wish to use the simplerace game to benchmark their own algorithms. The paper is co-authored by the organizers and participants of the competition.
Julian Togelius (Corresponding author)
Simon Lucas
Ho Duc Thang
Jonathan M. Garibaldi
Tomoharu Nakashima
Chin Hiong Tan
Itamar Elhanany
Shay Berant
Philip Hingston
Robert M. MacCallum
Thomas Haferlach
Aravind Gowrisankar
Pete Burrow

19.
Connecting the family with awareness systems   (total citations: 1; self-citations: 1; citations by others: 0)
Awareness systems have attracted significant research interest for their potential to support interpersonal relationships. Investigations of awareness systems for the domestic environment have suggested that such systems can help individuals stay in touch with dear friends or family and provide affective benefits to their users. Our research provides empirical evidence to refine and substantiate such suggestions. We report our experience with designing and evaluating the ASTRA awareness system for connecting households and mobile family members. We introduce the concept of connectedness and its measurement through the Affective Benefits and Costs of communication questionnaire (ABC-Q). We report results that testify to the benefits of sharing experiences at the moment they happen without interrupting potential receivers. Finally, we document the role that lightweight, picture-based communication can play in the range of communication media available.
Natalia Romero (Corresponding author)
Panos Markopoulos
Joy van Baren
Boris de Ruyter
Wijnand IJsselsteijn
Babak Farshchian

20.
In the age of speech and voice recognition technologies, sign language recognition is an essential part of ensuring equal access for deaf people. To date, sign language recognition research has mostly ignored facial expressions that arise as part of a natural sign language discourse, even though they carry important grammatical and prosodic information. One reason is that tracking the motion and dynamics of expressions in human faces from video is a hard task, especially with the high number of occlusions from the signers’ hands. This paper presents a 3D deformable model tracking system to address this problem, and applies it to sequences of native signers, taken from the National Center of Sign Language and Gesture Resources (NCSLGR), with a special emphasis on outlier rejection methods to handle occlusions. The experiments conducted in this paper validate the output of the face tracker against expert human annotations of the NCSLGR corpus, demonstrate the promise of the proposed face tracking framework for sign language data, and reveal that the tracking framework picks up properties that ideally complement human annotations for linguistic research.
Christian Vogler (Corresponding author)
Siome Goldenstein
