Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
Automatic annotation of semantic events allows effective retrieval of video content. In this work, we present solutions for highlights detection in sports videos. This application is particularly interesting for broadcasters, since they extensively use manual annotation to select interesting highlights that are edited to create new programmes. The proposed approach exploits the typical structure of a wide class of sports videos, namely, those related to sports which are played in delimited venues with playfields of well-known geometry, like soccer, basketball, swimming, track and field disciplines, and so on. For this class of sports, a modeling scheme based on a limited set of visual cues and on finite state machines (FSM) that encode the temporal evolution of highlights is presented. Algorithms for model checking and for visual cues estimation are discussed, as well as applications of the representation to different sport domains.
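The FSM-based highlight modeling described above can be sketched as follows; the states, cue names, and transitions are illustrative assumptions for this sketch, not the paper's actual models.

```python
# Hypothetical FSM encoding the temporal evolution of one highlight type.
# Visual cues drive transitions; reaching the accept state flags a highlight.

HIGHLIGHT_FSM = {
    # (current state, observed cue) -> next state
    ("idle", "play_start"): "in_play",
    ("in_play", "fast_camera_pan"): "attack",
    ("attack", "goal_area_framed"): "candidate",
    ("candidate", "crowd_cut"): "highlight",
}

def detect_highlight(cues, fsm=HIGHLIGHT_FSM, start="idle", accept="highlight"):
    """Run a sequence of estimated visual cues through the FSM;
    return True if the accept state is reached."""
    state = start
    for cue in cues:
        state = fsm.get((state, cue), state)  # unmatched cues leave state unchanged
        if state == accept:
            return True
    return False

print(detect_highlight(["play_start", "fast_camera_pan",
                        "goal_area_framed", "crowd_cut"]))  # True
print(detect_highlight(["play_start", "crowd_cut"]))        # False
```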

2.
In this paper, we present a large database of over 50,000 user-labeled videos collected from YouTube. We develop a compact representation called "tiny videos" that achieves high video compression rates while retaining the overall visual appearance of the video as it varies over time. We show that frame sampling using affinity propagation, an exemplar-based clustering algorithm, achieves the best trade-off between compression and video recall. We use this large collection of user-labeled videos in conjunction with simple data mining techniques to perform related video retrieval, as well as classification of images and video frames. The classification results achieved by tiny videos are compared with the tiny images framework [24] for a variety of recognition tasks. The tiny images data set consists of 80 million images collected from the Internet. These are the largest labeled research data sets of videos and images available to date. We show that tiny videos are better suited for classifying scenery and sports activities, while tiny images perform better at recognizing objects. Furthermore, we demonstrate that combining the tiny images and tiny videos data sets improves classification precision in a wider range of categories.
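Exemplar-based frame sampling of the kind described above can be sketched as follows. The paper uses affinity propagation; to stay self-contained, this sketch swaps in a simple greedy medoid selection (a k-medoids-style heuristic, not affinity propagation), and the 1-D frame feature vectors are toy values.

```python
# Pick k exemplar frames whose summed (squared) distance to every frame's
# nearest exemplar is minimized -- a greedy stand-in for affinity propagation.

def frame_distance(a, b):
    """Squared Euclidean distance between two frame feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pick_exemplars(frames, k):
    """Greedily pick k frame indices that minimize the total distance
    from every frame to its nearest chosen exemplar."""
    exemplars = []
    for _ in range(k):
        best, best_cost = None, float("inf")
        for idx in range(len(frames)):
            if idx in exemplars:
                continue
            trial = exemplars + [idx]
            cost = sum(min(frame_distance(f, frames[e]) for e in trial)
                       for f in frames)
            if cost < best_cost:
                best, best_cost = idx, cost
        exemplars.append(best)
    return sorted(exemplars)

frames = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (9.0,)]
print(pick_exemplars(frames, 3))  # [1, 3, 5]
```

One exemplar is chosen per visually distinct group of frames, which is the trade-off the paper tunes between compression and recall.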

3.
4.
Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naive users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, in this paper, we first present a knowledge-based video indexing and content management framework for domain-specific videos (using basketball video as an example). We provide a solution for exploring video knowledge by mining associations from video data. Explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the distinct features of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each of them a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.
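Temporal support and confidence for cue associations might look like the following simplified sketch; the sliding-window definitions and cue names here are our assumptions, not the paper's exact measures.

```python
# Simplified temporal support/confidence over a sequence of detected cues.
# A pattern "occurs" in a window if its cues appear there in order.

def contains_in_order(window, pattern):
    """True if `pattern` is an ordered subsequence of `window`."""
    it = iter(window)
    return all(p in it for p in pattern)

def temporal_support(cues, pattern, window=4):
    """Fraction of sliding windows of length `window` containing `pattern`."""
    wins = [cues[i:i + window] for i in range(len(cues) - window + 1)]
    if not wins:
        return 0.0
    return sum(contains_in_order(w, pattern) for w in wins) / len(wins)

def temporal_confidence(cues, antecedent, consequent, window=4):
    """support(antecedent followed by consequent) / support(antecedent)."""
    num = temporal_support(cues, antecedent + consequent, window)
    den = temporal_support(cues, antecedent, window)
    return num / den if den else 0.0

cues = ["court", "camera_pan", "applause", "court", "camera_pan",
        "whistle", "court", "camera_pan", "applause"]
print(temporal_confidence(cues, ["camera_pan"], ["applause"], window=3))
```

A mined association such as "camera_pan is followed by applause" would then be kept if its support and confidence exceed chosen thresholds.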

5.
6.
The paper proposes measures for weighted indexing of sports news videos. Content-based analyses of sports news videos lead to the classification of frames or shots into sports categories. The set of sports categories reported in a given news video can be used as a video representation in a visual information retrieval system. However, such an approach does not take into account how many sports events of a given category have been reported and how long these events have been presented to televiewers. Weighting the sports categories in a video representation to reflect their importance in a given video, or in a whole video database, is therefore desirable. The effects of applying the proposed measures have been demonstrated on a test video collection. The experiments and evaluations performed on this collection have also shown that perfect content-based analyses are not needed to ensure proper weighted indexing of sports news videos. It is sufficient to recognize the content of only some frames and to determine the number of shots, scenes, or pseudo-scenes detected in the temporal aggregation process, or even only the number of events of a given sports category in the sports news video being indexed.
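One plausible weighting of this kind (our own illustration, not the paper's proposed measures) scores each sports category by its share of reported airtime:

```python
# Weight each sports category by the fraction of total shot duration it
# occupies in the news video -- a toy category-weighting measure.

def category_weights(shots):
    """shots: list of (category, duration_seconds) for classified shots.
    Returns a dict mapping each category to its airtime share."""
    totals = {}
    for category, duration in shots:
        totals[category] = totals.get(category, 0) + duration
    grand_total = sum(totals.values())
    return {c: d / grand_total for c, d in totals.items()}

shots = [("soccer", 30), ("ski_jumping", 10)]
print(category_weights(shots))  # {'soccer': 0.75, 'ski_jumping': 0.25}
```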

7.
Based on the analysis of temporal slices, we propose novel approaches for clustering and retrieval of video shots. Temporal slices are a set of two-dimensional (2-D) images extracted along the time dimension of an image volume. They encode a rich set of visual patterns for similarity measurement. In this paper, we first demonstrate that tensor histogram features extracted from temporal slices are suitable for motion retrieval. Subsequently, we integrate both tensor and color histograms to construct a two-level hierarchical clustering structure. Each cluster in the top level contains shots with similar color, while each cluster in the bottom level consists of shots with similar motion. The constructed structure is then used for cluster-based retrieval. The proposed approaches are found to be useful particularly for sports games, where motion and color are important visual cues when searching and browsing for desired video shots.
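The two-level structure can be sketched as follows; the quantized color and motion bins stand in for the paper's color and tensor histograms, and the shot labels are invented for this sketch.

```python
# Toy two-level hierarchy: top level groups shots by color, bottom level
# by motion. Retrieval drills down the color level, then the motion level.

def build_hierarchy(shots):
    """shots: list of (shot_id, color_bin, motion_bin)."""
    tree = {}
    for shot_id, color, motion in shots:
        tree.setdefault(color, {}).setdefault(motion, []).append(shot_id)
    return tree

def retrieve(tree, color, motion):
    """Cluster-based retrieval: shots matching both color and motion bins."""
    return tree.get(color, {}).get(motion, [])

shots = [("s1", "green", "pan"), ("s2", "green", "zoom"),
         ("s3", "blue", "pan"), ("s4", "green", "pan")]
tree = build_hierarchy(shots)
print(retrieve(tree, "green", "pan"))  # ['s1', 's4']
```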

8.
The majority of existing work on sports video analysis concentrates on highlight extraction; little work focuses on the important issue of how the extracted highlights should be organized. In this paper, we present a multimodal approach to organizing the highlights extracted from racket sports video, grounded in human behavior analysis using a nonlinear affective ranking model. Two research challenges of highlight ranking are addressed, namely affective feature extraction and ranking model construction. The basic principle of our affective feature extraction is to extract sensitive features which can stimulate users' emotions. Since users pay most attention to player behavior and audience response in racket sport highlights, we extract affective features from player behavior, including action and trajectory, and from game-specific audio keywords. We propose a novel motion analysis method to recognize player actions, and employ support vector regression to construct the nonlinear highlight ranking model from the affective features. A new subjective evaluation criterion is proposed to guide the model construction. To evaluate the performance of the proposed approaches, we have tested them on more than ten hours of broadcast tennis and badminton videos. The experimental results demonstrate that our action recognition approach significantly outperforms the existing appearance-based method. Moreover, our user study shows that the affective highlight ranking approach is effective.

9.
Video annotation refers to labeling video content with semantic index information, with the goal of making videos easy to retrieve. Existing video annotation work relies on low-level visual features, which are difficult to use directly for annotating the professional human actions in sports videos. To address this problem, 2-D human joint-point features extracted from video image sequences are used to build a knowledge base of professional actions for annotating the professional actions in sports videos. A dynamic programming algorithm compares human-action differences between videos, and a co-training learning algorithm is incorporated for semi-automatic annotation of sports videos. Experiments on tennis match videos show that the action annotation accuracy of the algorithm reaches 81.4%, an improvement of 30.5% over the professional-action annotation of existing algorithms.
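A common dynamic programming approach for comparing action sequences is dynamic time warping (DTW); the abstract does not name its exact algorithm, so DTW and the toy joint-point values below are our assumptions.

```python
# DTW distance between two pose sequences: each element is a joint-point
# feature (here a single 2-D joint per frame, for brevity).

def dtw(seq_a, seq_b, dist):
    """Classic DTW: minimal cumulative frame-to-frame distance over all
    monotone alignments of the two sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

pose_a = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
pose_b = [(0.0, 0.0), (0.5, 1.0), (0.5, 1.0), (1.0, 2.0)]  # same action, slower
print(dtw(pose_a, pose_b, euclid))  # 0.0
```

Because DTW warps time, the same action performed at different speeds yields a small distance, which is what allows comparing professional actions across videos.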

10.
In instructional videos of chalk board presentations, the visual content refers to the text and figures written on the boards. Existing methods on video summarization are not effective for this video domain because they are mainly based on low-level image features such as color and edges. In this work, we present a novel approach to summarizing the visual content in instructional videos using middle-level features. We first develop a robust algorithm to extract content text and figures from instructional videos by statistical modelling and clustering. This algorithm addresses the image noise, nonuniformity of the board regions, camera movements, occlusions, and other challenges in the instructional videos that are recorded in real classrooms. Using the extracted text and figures as the middle level features, we retrieve a set of key frames that contain most of the visual content. We further reduce content redundancy and build a mosaicked summary image by matching extracted content based on K-th Hausdorff distance and connected component decomposition. Performance evaluation on four full-length instructional videos shows that our algorithm is highly effective in summarizing instructional video content.
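The K-th Hausdorff matching step can be illustrated as follows; the point sets stand in for extracted board-content components, and taking the K-th ranked distance (rather than the maximum) makes the match robust to a few outlier points.

```python
# Directed K-th ranked Hausdorff distance between two 2-D point sets.
# With k = len(A) this reduces to the classic directed Hausdorff distance.

def kth_hausdorff(A, B, k):
    """k-th smallest of each A-point's distance to its nearest B-point."""
    nearest = sorted(
        min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for bx, by in B)
        for ax, ay in A)
    return nearest[k - 1]

A = [(0, 0), (1, 0), (10, 0)]  # last point is an outlier
B = [(0, 0), (1, 0)]
print(kth_hausdorff(A, B, 3))  # 9.0  (classic directed Hausdorff)
print(kth_hausdorff(A, B, 2))  # 0.0  (outlier ignored)
```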

11.
Semantic annotation of soccer videos: automatic highlights identification
Automatic semantic annotation of video streams allows both to extract significant clips for production logging and to index video streams for posterity logging. Automatic annotation for production logging is particularly demanding, as it is applied to non-edited video streams and must rely only on visual information. Moreover, annotation must be computed in quasi real-time. In this paper, we present a system that performs automatic annotation of the principal highlights in soccer video, suited for both production and posterity logging. The knowledge of the soccer domain is encoded into a set of finite state machines, each of which models a specific highlight. Highlight detection exploits visual cues that are estimated from the video stream, and particularly, ball motion, the currently framed playfield zone, players’ positions and colors of players’ uniforms. The highlight models are checked against the current observations, using a model checking algorithm. The system has been developed within the EU ASSAVID project.

12.
Most existing sports video analysis methods concentrate on extracting important events while neglecting how to organize these events and analyze their semantics. This paper proposes a shot classification algorithm for track-and-field videos based on sequential pattern mining. The work revolves around two problems: feature extraction and the definition of semantic rules. In the feature extraction stage, track-and-field video shots are automatically segmented into a series of recognizable motion-event sequences, and machine learning algorithms are used to recognize each class of action events. In the semantic rule definition stage, sequential pattern mining is used to discover the frequent sequences, on the basis of which classification is performed. Experiments on over a thousand track-and-field video shots demonstrate the effectiveness of the proposed shot classification algorithm.

13.
Automatic content analysis of sports videos is a valuable and challenging task. Motivated by analogies between a class of sports videos and languages, the authors propose a novel approach for sports video analysis based on compiler principles. It integrates both semantic analysis and syntactic analysis to automatically create an index and a table of contents for a sports video. Each shot of the video sequence is first annotated and indexed with semantic labels through detection of events using domain knowledge. A grammar-based parser is then constructed to identify the tree structure of the video content based on the labels. Meanwhile, the grammar can be used to detect and recover errors during the analysis. As a case study, a sports video parsing system is presented in the particular domain of diving. Experimental results indicate the proposed approach is effective.
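Grammar-based parsing of shot labels can be sketched as a toy example; the diving "grammar" and shot-label names below are invented for illustration, not the paper's actual grammar.

```python
import re

# A diving event is: prepare, jump, splash, optionally followed by a replay.
# A valid video is one or more such events (regular grammar as a regex).
GRAMMAR = re.compile(r"(prepare jump splash( replay)? )+")

def parse_video(shot_labels):
    """True if the sequence of semantic shot labels conforms to the grammar."""
    return GRAMMAR.fullmatch(" ".join(shot_labels) + " ") is not None

print(parse_video(["prepare", "jump", "splash", "replay",
                   "prepare", "jump", "splash"]))  # True
print(parse_video(["jump", "splash"]))             # False (missing 'prepare')
```

A real system would use a context-free grammar and a parser that builds the table-of-contents tree and recovers from labeling errors; the regex here only shows the acceptance check.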

14.
谭洁  吴玲达  应龙 《计算机应用研究》2009,26(10):3960-3962
Considering how animation videos differ from news videos and sports videos, a summarization technique suited to animation videos is proposed. First, structural analysis of the animation video yields its visual features and hierarchical structure; then, the important segments are selected according to the importance of the video content; finally, through granularity selection, the segments are combined in temporal order to produce video summaries in the form of a storyboard and a video skim. Experiments show that the method can effectively obtain animation video summaries.

15.
Online platforms are frequently used as an alternative environment for individuals to meet and engage in a variety of activities, like attending courses online. We examined the effect of adding social presence cues in online video lectures, together with technological efficacy, on college students’ perceived learning, class social presence, and perception that the videos aided learning. Participants rated their technological efficacy and completed an online class with video lectures that either included the video (image) of the instructor or not. The interaction between technological efficacy and the video manipulation predicted lower ratings of perceived learning, social presence, and video usefulness, particularly for students with lower technological efficacy. A mediated-moderation analysis showed that the interaction between person (efficacy) and media (instructor image in video vs. no image) predicted greater perceived learning through the mediators of perceived usefulness of videos, class interactivity, and felt comfort in the class.

16.
To increase the performance of a sports team, tactical analysis of the team from game video is essential, and the trajectories of the players are the most useful cues in a sports video for such analysis. In this paper, we propose a technique to reconstruct the trajectories of players from broadcast basketball videos. We first propose a mosaic-based approach to detect the boundary lines of the court. Then, the locations of players are determined by integrating shape and color information. A layered graph is constructed for the detected players, which includes all possible trajectories, and a dynamic programming based algorithm is applied to find the trajectory of each player. Finally, the trajectories of players are displayed on a standard basketball court model by a homography transformation. In contrast to related works, our approach exploits more spatio-temporal information in the video. Experimental results show that the proposed approach works well and outperforms some existing techniques.
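The final mapping step, projecting image-plane player positions onto the court model via a homography, can be sketched as follows; the 3x3 matrix values here are arbitrary placeholders, not an estimated court homography.

```python
# Apply a 3x3 homography H to an image point (x, y): multiply the
# homogeneous coordinate (x, y, 1) by H and divide by the last component.

def apply_homography(H, pt):
    """Map an image point to court-model coordinates under homography H."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

# Toy homography: scale by 2 and shift x by 1 (affine special case, w == 1).
H = [[2, 0, 1],
     [0, 2, 0],
     [0, 0, 1]]
print(apply_homography(H, (3, 4)))  # (7.0, 8.0)
```

In practice H would be estimated from correspondences between detected court boundary lines and the standard court model.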

17.
In the last half century, the most used video storage devices have been magnetic tapes, where the information is stored in analog format based on electromagnetic principles. When digital techniques became dominant, it was necessary to convert analog information to digital format in order to preserve these data. Unfortunately, analog videos may be affected by drops that produce visual defects which can be acquired during the digitization process. Although much hardware exists to perform the digitization, only a few devices implement automatic correction of these defects. In some cases, drop removal is possible through the analog device. However, when one owns a damaged, already-converted video, correction based on image processing techniques is the only way to enhance it. In this paper, the drop, also known as “Tracking Error” or “Mistracking,” is analyzed. We propose an algorithm to detect the drop's visual artifacts in converted videos, as well as a digital restoration method.

18.
Saliency prediction models provide a probabilistic map of relative likelihood of an image or video region to attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and emphasizing only a few salient objects in a scene. A new fovea-based model of spatial distance between the image regions is adopted for considering local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method to one single saliency map that is highly correlated with the eye-fixation data, a random forest based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment, which involved 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database as well as the eye-tracking data are publicly available along with this paper. Experiment results show that the proposed saliency prediction method achieves competitive performance compared to the state-of-the-art approaches.

19.
Learning the interactions between the activities of small groups is a key step in understanding team sports videos. Recent research on team sports videos has largely taken the perspective of the audience rather than the athlete. In team sports videos, such as volleyball and basketball videos, there are plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are divided into two teams, the features of the two teams' activities and interactions are enhanced by Graph Convolutional Networks and finally recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels. A baseline is set for comparison, and our experimental results demonstrate the effectiveness of our method. Moreover, the idea behind our method can be applied directly to another video-based task, Group Activity Recognition; experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete's view, we present an interpretation that shows how to use the Group Scene Graph to analyze teams' activities and provide professional gaming suggestions.

20.
The dramatic growth of video content over modern media channels (such as the Internet and mobile phone platforms) directs the interest of media broadcasters towards the topics of video retrieval and content browsing. Several video retrieval systems benefit from the use of semantic indexing based on content, since it allows an intuitive categorization of videos. However, indexing is usually performed through manual annotation, thus introducing potential problems such as ambiguity, lack of information, and non-relevance of index terms. In this paper, we present SHIATSU, a complete system for video retrieval which is based on the (semi-)automatic hierarchical semantic annotation of videos exploiting the analysis of visual content; videos can then be searched by means of attached tags and/or visual features. We experimentally evaluate the performance of SHIATSU on two different real video benchmarks, proving its accuracy and efficiency.
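Retrieval over hierarchical semantic tags, as in SHIATSU, can be illustrated with a toy tag tree; the tag names, hierarchy, and video IDs below are hypothetical.

```python
# Querying a parent tag also returns videos annotated with descendant tags.

PARENT = {"soccer": "sports", "tennis": "sports",
          "sports": None, "news": None}  # toy tag hierarchy

def matches(tag, query):
    """True if `tag` equals `query` or `query` is one of its ancestors."""
    while tag is not None:
        if tag == query:
            return True
        tag = PARENT.get(tag)
    return False

def search(index, query):
    """index: dict mapping video_id -> set of annotated tags."""
    return sorted(v for v, tags in index.items()
                  if any(matches(t, query) for t in tags))

index = {"v1": {"soccer"}, "v2": {"news"}, "v3": {"tennis"}}
print(search(index, "sports"))  # ['v1', 'v3']
```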

