Similar Literature
20 similar documents found
1.
This paper describes a novel framework for automatic lecture video editing based on gesture, posture, and video text recognition. In content analysis, the trajectory of hand movement is tracked and intentional gestures are automatically extracted for recognition. In addition, head pose is estimated by overcoming the difficulties caused by the complex lighting conditions in classrooms. The aim of recognition is to characterize the flow of lecturing as a series of regional focuses depicted by human postures and gestures. The regions of interest (ROIs) in videos are semantically structured with text recognition and the aid of external documents. By tracing the flow of lecturing, a finite state machine (FSM), which incorporates gestures, postures, ROIs, and general editing rules and constraints, is proposed to edit videos with novel views. The FSM is designed to generate appropriate simulated camera motion and cutting effects that suit the pace of the presenter's gestures and postures. To remedy the undesirable visual effects of poor lighting conditions, we also propose approaches to automatically enhance the visibility and readability of slides and whiteboard images in the edited videos.
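As a toy illustration of the editing logic, the following Python sketch drives camera actions from a table of (state, event) rules; the state and event names are hypothetical, and the paper's actual FSM incorporates many more rules and constraints.

```python
# Minimal table-driven editing FSM sketch. State/event/action names are
# assumptions for illustration, not the paper's rule set.
EDIT_RULES = {
    # (current_state, observed_event) -> (next_state, camera_action)
    ("overview",    "point_at_slide"): ("slide_focus", "zoom_to_slide_roi"),
    ("overview",    "write_on_board"): ("board_focus", "pan_to_whiteboard"),
    ("slide_focus", "face_audience"):  ("overview",    "cut_to_wide_shot"),
    ("board_focus", "point_at_slide"): ("slide_focus", "cut_to_slide_roi"),
}

def edit(events, state="overview"):
    """Map a stream of recognized gesture/posture events to camera actions."""
    actions = []
    for ev in events:
        state, action = EDIT_RULES.get((state, ev), (state, "hold_current_view"))
        actions.append(action)
    return actions

print(edit(["point_at_slide", "face_audience", "write_on_board"]))
```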

2.
3.
4.
Text displayed in a video is an essential part of the high-level semantic information of the video content. Video text can therefore be used as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. The detected candidate text lines are then refined using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method to separate text from complex backgrounds and make it processable by standard OCR (Optical Character Recognition) software. The operability and accuracy of the proposed text detection and binarization methods have been evaluated on publicly available test data sets.
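The entropy-based refinement step can be pictured with a short sketch like the one below; the entropy threshold is an assumed value rather than the paper's. Candidate boxes whose intensity histograms are too "flat" to contain strokes are discarded.

```python
import numpy as np

def patch_entropy(gray_patch, bins=256):
    """Shannon entropy of a grayscale patch's intensity histogram."""
    hist, _ = np.histogram(gray_patch, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def filter_candidates(gray, boxes, min_entropy=3.5):
    """Keep candidate text boxes busy enough to plausibly hold strokes.

    `min_entropy` is an assumed value; the paper tunes its own threshold.
    `boxes` are (x, y, w, h) tuples on a uint8 grayscale image `gray`.
    """
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if patch_entropy(gray[y:y + h, x:x + w]) >= min_entropy]
```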

5.

This work introduces a novel approach to extract meaningful content information from video by collaborative integration of image understanding and natural language processing. We developed a person browser system that associates faces and overlaid name texts in videos. This approach takes news videos as a knowledge source, then automatically extracts face and assoicated name text as content information. The proposed framework consists of the text detection module, the face detection module, and the person indexing database module. The successful results of person extraction reveal that the proposed methodology of integrated use of image understanding techniques and natural language processing technique is headed in the right direction to achieve our goal of accessing real content of multimedia information.
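One plausible reading of the association step is matching face tracks to name texts that overlap them in time; the sketch below rests on that assumption and is not the paper's exact indexing scheme.

```python
# Hypothetical face/name association by temporal overlap. Detector outputs
# and the scoring rule are assumptions for illustration only.

def overlap(a, b):
    """Length of overlap between two (start, end) time intervals in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def associate(face_tracks, name_texts, min_overlap=1.0):
    """face_tracks: [(face_id, (t0, t1))]; name_texts: [(name, (t0, t1))]."""
    pairs = []
    for face_id, f_span in face_tracks:
        best = max(name_texts, key=lambda nt: overlap(f_span, nt[1]), default=None)
        if best and overlap(f_span, best[1]) >= min_overlap:
            pairs.append((face_id, best[0]))
    return pairs

print(associate([("face_1", (10.0, 18.0))], [("John Doe", (12.0, 16.0))]))
```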


6.
Objective: Soccer video shot types and field regions are prerequisites for soccer video event detection and play an important role in the semantic analysis of soccer video. To address the shortcomings of existing shot classification methods, a fluctuation detection method is proposed for identifying shot types in soccer video. Method: The method slides a window across the video frame, records how many times the proportion of field pixels inside the window fluctuates above and below the far-shot threshold, and determines the shot type from the fluctuation count. For field region classification, a method is proposed that identifies the region type from the positional relationship between the upper-left and upper-right corner points of the field area: a Gaussian mixture model first segments the field, and the region type is then judged from the relative heights of the field's left and right boundary coordinates in the frame; the method is simple and efficient. Results: Compared with existing classification methods, the two proposed methods achieve considerably higher precision and recall, are computationally efficient, and meet real-time requirements. Conclusion: The proposed methods correctly handle frames in which the field is tilted too steeply for the traditional sliding-window method, and reduce the accuracy loss that traditional field region detection suffers from its dependence on field-line detection.
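The fluctuation-detection idea reduces to counting threshold crossings of the in-window field-pixel ratio; a minimal sketch follows, with the threshold values assumed rather than taken from the paper.

```python
import numpy as np

def fluctuation_count(field_ratios, far_threshold=0.6):
    """Count crossings of the far-shot threshold by the field-pixel ratio.

    `field_ratios` holds the field-pixel proportion measured at each sliding
    window position; `far_threshold` is an assumed value, not the paper's.
    """
    above = np.asarray(field_ratios) > far_threshold
    return int(np.count_nonzero(above[1:] != above[:-1]))

def classify_shot(field_ratios, max_far_fluctuations=2):
    """Few crossings suggest a far (global) shot; many suggest a closer view."""
    return "far" if fluctuation_count(field_ratios) <= max_far_fluctuations else "close/medium"
```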

7.
Text embedded in multimedia documents carries important semantic information that helps in automatically accessing the content. This paper proposes two neural-based optical character recognition (OCR) systems that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform state-of-the-art methods.
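The multi-scale scanning scheme can be sketched as a generator of windows over positions and scales; window size, stride, and the scale set are assumptions, and the per-patch character classifier is omitted.

```python
import numpy as np

def multiscale_windows(gray, win=32, step=8, scales=(1.0, 0.75, 0.5)):
    """Yield (x, y, scale, patch) for a joint localize-and-recognize scan.

    A sketch of the scanning scheme only; a character classifier would be
    applied to each yielded patch. `gray` is a 2-D grayscale array.
    """
    h, w = gray.shape
    for s in scales:
        sw = int(round(win / s))  # smaller scale -> larger window in pixels
        for y in range(0, h - sw + 1, step):
            for x in range(0, w - sw + 1, step):
                yield x, y, s, gray[y:y + sw, x:x + sw]
```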

8.
Text in videos and images plays an important role in content-based video database retrieval, web video search, image segmentation, and image inpainting. To improve the efficiency of text detection, a video text detection method based on adaptive thresholds computed from multiple features is presented. Building on Michael's algorithm, the method computes an adaptive local threshold from three features of text edges (strength, density, and the ratio of horizontal to vertical edges) and uses this threshold to remove most non-text regions, extract text edges, and detect and locate text, thereby reducing the adverse effect of the single-feature threshold used in Michael's algorithm. A merging mechanism is introduced in the text localization stage to reduce the occurrence of incomplete regions. Experimental results show high precision and recall, making the method applicable to video search, image segmentation, and image inpainting.
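A rough sketch of deriving a threshold for a local window from the three edge features follows; the combination rule is an assumption, not Michael's algorithm or the paper's exact formula.

```python
import numpy as np

def edge_features(gray):
    """Edge strength map, edge density, and horizontal/vertical edge ratio."""
    gy, gx = np.gradient(gray.astype(float))
    strength = np.hypot(gx, gy)
    edges = strength > strength.mean()  # crude edge mask
    density = edges.mean()
    ratio = (np.abs(gx)[edges].sum() + 1e-6) / (np.abs(gy)[edges].sum() + 1e-6)
    return strength, density, ratio

def adaptive_threshold(window, k=1.0):
    """Threshold for one local analysis window; the formula is an assumed
    illustration of combining the three features, not the paper's."""
    strength, density, ratio = edge_features(window)
    return strength.mean() + k * strength.std() * density * min(ratio, 1.0 / ratio)
```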

9.
With the continuous development of networking and multimedia technology, content-based multimedia information retrieval is becoming increasingly important. Compared with mature text retrieval techniques, video retrieval is still at the research and exploration stage. An effective approach to video retrieval is to segment unstructured video programs into shots and index the video by the key frames of each shot. Shot segmentation is therefore a basic step in content-based video retrieval, and among the various types of shot detection algorithms, dissolve shots are the hardest to detect. Based on the distribution characteristics of the prediction-error energy and motion vectors of predicted frames inside a dissolve, a new algorithm for segmenting dissolve shots in the compressed domain is proposed. Compared with previously published algorithms of the same kind, it has the following advantages: it works directly in the compressed domain, and it is faster, more robust, and more accurate.
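The core cue, sustained elevated prediction-error energy across a dissolve, can be sketched as a simple interval detector over per-frame energies, with both parameters assumed.

```python
def dissolve_candidates(residual_energy, high=2.0, min_len=8):
    """Flag frame intervals where per-frame prediction-error energy stays
    elevated long enough to look like a dissolve.

    `residual_energy` holds one value per predicted frame, normalized so 1.0
    is the clip average; `high` and `min_len` are assumed parameters.
    """
    spans, start = [], None
    for i, e in enumerate(residual_energy):
        if e >= high and start is None:
            start = i
        elif e < high and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    if start is not None and len(residual_energy) - start >= min_len:
        spans.append((start, len(residual_energy)))
    return spans
```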

10.
The increased availability and usage of multimedia information have created a critical need for efficient multimedia processing algorithms. These algorithms must offer capabilities related to browsing, indexing, and retrieval of relevant data. A crucial step in multimedia processing is reliable video segmentation into visually coherent video shots through scene change detection. Video segmentation enables subsequent processing operations on video shots, such as video indexing, semantic representation, or tracking of selected video information. Since video sequences generally contain both abrupt and gradual scene changes, video segmentation algorithms must be able to detect a large variety of changes. While existing algorithms perform relatively well at detecting abrupt transitions (video cuts), reliable detection of gradual changes is much more difficult. A novel one-pass, real-time approach to video scene change detection, based on statistical sequential analysis and operating on a compressed multimedia bitstream, is proposed. Our approach models video sequences as stochastic processes, with scene changes being reflected by changes in the characteristics (parameters) of the process. Statistical sequential analysis is used to provide a unified framework for the detection of both abrupt and gradual scene changes.
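A classical CUSUM detector is one minimal, one-pass instance of statistical sequential analysis; the sketch below illustrates the idea on a per-frame feature stream without reproducing the paper's parametric model or thresholds.

```python
import numpy as np

def cusum_change_points(x, target=0.0, drift=0.05, threshold=1.0):
    """One-pass CUSUM over a per-frame feature stream (e.g., a histogram
    distance). `target`, `drift`, and `threshold` are assumed parameters.
    """
    changes, pos, neg = [], 0.0, 0.0
    for i, v in enumerate(np.asarray(x, dtype=float)):
        pos = max(0.0, pos + (v - target) - drift)  # upward-shift statistic
        neg = max(0.0, neg - (v - target) - drift)  # downward-shift statistic
        if pos > threshold or neg > threshold:
            changes.append(i)
            pos = neg = 0.0  # restart after a detected change
    return changes
```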

11.
Video lectures are a long-standing distance learning approach that offers only basic interaction and retrieval features to the user. Thus, to follow the new learning paradigms, we need to re-engineer the e-learning processes while preserving the investments made in the past. In this paper we present an approach for migrating video lectures to multimedia learning objects. Two essential problems are tackled: the detection of slide transitions and the generation of the learning objects. To this aim, the video of the lecture is scanned to detect slide changes, while the learning object metadata and the slide pictures are extracted from the presentation document. A tool named VLMigrator (video lecture migrator) has been developed to support the migration of video lectures and the restructuring of their contents in terms of learning objects. Both the migration strategy and the tool have been evaluated in a case study. Copyright © 2008 John Wiley & Sons, Ltd.
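Slide transition detection can be approximated by frame differencing with a changed-pixel ratio test; the thresholds below are assumptions, not the detector actually used by VLMigrator.

```python
import numpy as np

def slide_transitions(frames, changed_pixel_ratio=0.2, diff_level=30):
    """Indices where consecutive frames differ enough to suggest a slide
    change. `frames` is an iterable of equal-shape uint8 grayscale arrays;
    both thresholds are assumed values."""
    cuts, prev = [], None
    for i, f in enumerate(frames):
        f = f.astype(np.int16)  # avoid uint8 wrap-around in the difference
        if prev is not None:
            changed = np.abs(f - prev) > diff_level
            if changed.mean() > changed_pixel_ratio:
                cuts.append(i)
        prev = f
    return cuts
```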

12.
To automatically annotate instructional videos as a specific video category, this paper first extracts the subtitle text from the video. After text preprocessing, the subtitle text is combined with the Latent Dirichlet allocation (LDA) topic model to obtain the probability distribution of each video shot over topics, and shots are segmented at the semantic level by computing the divergence between topic distributions. Then, taking shots as samples, a safe semi-supervised support vector machine (S4VM) is used to automatically annotate the unlabeled shots from a small number of labeled shot samples. Experimental results show that the proposed method, using subtitle text and the LDA model, effectively performs semantic shot segmentation, and can annotate not only individual shots but also the whole video with keywords.
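A compact sketch of the topic-divergence segmentation, using scikit-learn's LDA and the Jensen-Shannon distance; the topic count and split threshold are assumed values, not the paper's settings.

```python
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def shot_topic_boundaries(shot_texts, n_topics=10, split_at=0.5):
    """Fit LDA on per-shot subtitle text, then split where adjacent shots'
    topic distributions diverge. `shot_texts` is one preprocessed subtitle
    string per shot; `n_topics` and `split_at` are assumptions."""
    counts = CountVectorizer().fit_transform(shot_texts)
    theta = LatentDirichletAllocation(n_components=n_topics,
                                      random_state=0).fit_transform(counts)
    theta /= theta.sum(axis=1, keepdims=True)  # per-shot topic distribution
    return [i + 1 for i in range(len(theta) - 1)
            if jensenshannon(theta[i], theta[i + 1]) > split_at]
```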

13.
In this paper, a subspace-based multimedia data mining framework is proposed for video semantic analysis, specifically video event/concept detection, addressing two basic issues: the semantic gap and rare event/concept detection. The proposed framework achieves full automation via multimodal content analysis and the intelligent integration of distance-based and rule-based data mining techniques. The content analysis process facilitates comprehensive video analysis by extracting low-level and mid-level features from the audio/visual channels. The integrated data mining techniques effectively address the two basic issues by alleviating the class imbalance problem throughout the process and by reconstructing and refining the feature dimensions automatically. The promising experimental performance on goal/corner event detection and on extracting sports/commercials/building concepts from soccer videos and TRECVID news collections demonstrates the effectiveness of the proposed framework. Furthermore, its domain-free characteristic indicates the great potential of extending the proposed multimedia data mining framework to a wide range of application domains.

14.
15.
This paper presents a contextual video advertising system, called AdOn, which supports intelligent overlay in-video advertising. Unlike most current ad networks such as YouTube, which overlay ads at fixed locations in the videos (e.g., on the bottom fifth of the video, 15 s in), AdOn is able to automatically detect a set of spatio-temporally non-intrusive locations and associate contextually relevant ads with them. The overlay ad locations are obtained on the basis of video structuring, face and text detection, and visual saliency analysis, so that intrusiveness to the users is minimized. The ads are selected according to content-based multimodal relevance so that relevance is maximized. AdOn represents one of the first attempts at contextual overlay video advertising that leverages information retrieval and multimedia content analysis techniques. Experiments conducted on a video database with more than 100 video programs and 7,000 ad products indicate that AdOn is superior to existing advertising approaches in terms of ad relevance and user experience.
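The placement logic can be caricatured as choosing the least salient candidate slot that avoids detected faces and text; the scoring rule below is an assumption, not AdOn's exact formulation.

```python
def best_overlay_slot(saliency, face_mask, text_mask, slots):
    """Pick the least intrusive overlay location: lowest mean saliency among
    candidate slots containing no detected face or text pixels.

    `saliency` is an HxW float map, the masks are HxW booleans, and `slots`
    are (x, y, w, h) candidate rectangles."""
    def clear(x, y, w, h):
        return not (face_mask[y:y + h, x:x + w].any() or
                    text_mask[y:y + h, x:x + w].any())
    usable = [s for s in slots if clear(*s)]
    if not usable:
        return None  # no non-intrusive slot in this segment
    return min(usable, key=lambda s: saliency[s[1]:s[1] + s[3],
                                              s[0]:s[0] + s[2]].mean())
```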

16.
Most existing approaches to sports video analysis have concentrated on semantic event detection. Sports professionals, however, are more interested in tactic analysis that helps improve their performance. In this paper, we propose a novel approach to extract tactic information from the attack events in broadcast soccer video and present the events in a tactic mode to coaches and sports professionals. We extract attack events with far-view shots using the analysis and alignment of web-casting text and broadcast video. For each detected event, two tactic representations, an aggregate trajectory and a play region sequence, are constructed from the multi-object trajectories and field locations in the event shots. Based on the multi-object trajectories tracked in the shot, a weighted graph is constructed via the analysis of the temporal-spatial interaction among the players and the ball. Using the Viterbi algorithm, the aggregate trajectory is computed on this weighted graph. The play region sequence is obtained by identifying the active field locations in the event based on line detection and a competition network. The interaction of the aggregate trajectory with the play region information, together with hypothesis testing on the trajectory's temporal-spatial distribution, is employed to discover tactic patterns in a hierarchical coarse-to-fine framework. Extensive experiments on FIFA World Cup 2006 videos show that the proposed approach is highly effective.
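A generic Viterbi over per-frame candidate nodes conveys how the aggregate trajectory could be computed on the weighted graph; the actual node and edge weighting derived from player/ball interaction is not reproduced here.

```python
import numpy as np

def viterbi_path(node_scores, edge_scores):
    """Best path through per-frame candidate nodes, maximizing summed scores.

    `node_scores[t][i]` scores candidate i at frame t; `edge_scores(t, i, j)`
    scores the transition from candidate i at frame t-1 to j at frame t."""
    best = [np.asarray(node_scores[0], dtype=float)]
    back = []
    for t in range(1, len(node_scores)):
        cur = np.asarray(node_scores[t], dtype=float)
        trans = np.array([[best[-1][i] + edge_scores(t, i, j)
                           for i in range(len(best[-1]))]
                          for j in range(len(cur))])
        back.append(trans.argmax(axis=1))      # best predecessor for each j
        best.append(cur + trans.max(axis=1))
    path = [int(np.argmax(best[-1]))]
    for bp in reversed(back):                  # backtrack to frame 0
        path.append(int(bp[path[-1]]))
    return path[::-1]
```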

17.
While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archiving, and marketing, is typically still performed manually and is thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition, and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach that shares network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches: for example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.
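The weight-sharing idea amounts to a shared trunk with task-specific heads, so one forward pass serves both tasks; a minimal PyTorch sketch with placeholder layer sizes (not the network described in the paper) follows.

```python
import torch
import torch.nn as nn

class SharedAnalysisNet(nn.Module):
    """Shared trunk with two heads: concept detection logits and an
    embedding for similarity search. Sizes are illustrative placeholders."""
    def __init__(self, in_dim=2048, n_concepts=100, embed_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.concept_head = nn.Linear(512, n_concepts)  # multi-label logits
        self.embed_head = nn.Linear(512, embed_dim)     # similarity features

    def forward(self, x):
        h = self.trunk(x)                # computed once, used by both heads
        return self.concept_head(h), self.embed_head(h)

logits, emb = SharedAnalysisNet()(torch.randn(4, 2048))
```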

18.
Web video categorization is a fundamental task for web video search. In this paper, we explore web video categorization from a new perspective, integrating model-based and data-driven approaches to boost performance. The boost comes from two aspects. The first is improved performance of the text classifiers through query expansion from related videos and user videos: the model-based classifiers are built on text features extracted from titles and tags, while related videos and user videos act as external resources that compensate for the limited and noisy text features, with query expansion adopted to reinforce the classification performance. The second improvement derives from integrating model-based classification with data-driven majority voting over related videos and user videos. From the data-driven viewpoint, related videos and user videos are treated as voting sources from the perspectives of video relevance and user interest, respectively. Semantic meaning from text, video relevance from related videos, and user interest induced from user videos are combined to robustly determine the video category; this combination of semantics, relevance, and interest further improves the performance of web video categorization. Experiments on YouTube videos demonstrate the significant improvement of the proposed approach over traditional text-based classifiers.
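The model/data fusion can be sketched as a weighted combination of classifier scores with vote shares from related and user videos; the weights below are assumptions, not the paper's learned combination.

```python
from collections import Counter

def fused_category(model_scores, related_cats, user_cats, w=(0.5, 0.3, 0.2)):
    """Combine model-based text classification with data-driven majority
    votes from related videos and the uploader's other videos.

    `model_scores` maps category -> classifier score in [0, 1]; the two
    lists hold the categories of related/user videos."""
    def vote_share(cats):
        n = max(len(cats), 1)
        return {c: k / n for c, k in Counter(cats).items()}
    rel, usr = vote_share(related_cats), vote_share(user_cats)
    cats = set(model_scores) | set(rel) | set(usr)
    return max(cats, key=lambda c: w[0] * model_scores.get(c, 0.0)
                                 + w[1] * rel.get(c, 0.0)
                                 + w[2] * usr.get(c, 0.0))
```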

19.
20.
Sports video annotation is important for sports video semantic analysis such as event detection and personalization. In this paper, we propose a novel approach for sports video semantic annotation and personalized retrieval. Unlike state-of-the-art sports video analysis methods, which rely heavily on audio/visual features, the proposed approach incorporates web-casting text into sports video analysis. Compared with previous approaches, the contributions of our approach include the following. 1) The event detection accuracy is significantly improved by incorporating web-casting text analysis. 2) The approach is able to detect exact event boundaries and extract event semantics that are very difficult or impossible to handle with previous approaches. 3) The approach is able to create a personalized summary, from both a general and a specific point of view, related to a particular game, event, player, or team according to the user's preferences. We present the framework of our approach and the details of text analysis, video analysis, text/video alignment, and personalized retrieval. The experimental results on event boundary detection in sports video are encouraging and comparable to manually selected events. The evaluation on personalized retrieval shows that it is effective in helping meet users' expectations.
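At its simplest, text/video alignment maps the web-casting game clock to video time by a known offset and searches a window around it for the exact event boundary; the sketch below rests on that assumption, and the paper's alignment and boundary refinement are more involved.

```python
def event_video_time(game_clock_s, kickoff_video_s, search_margin_s=15.0):
    """Map a web-casting timestamp (game clock, seconds) to a video-time
    window in which to search for the exact event boundary.

    Assumes a single known offset between game clock and video time;
    `search_margin_s` is an assumed value."""
    center = kickoff_video_s + game_clock_s
    return center - search_margin_s, center + search_margin_s

# e.g., an event logged at 23'14" of a match whose kickoff is 180 s into the video
print(event_video_time(game_clock_s=23 * 60 + 14, kickoff_video_s=180.0))
```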
