Similar Documents
20 similar documents found.
1.
We propose an efficient real-time automatic license plate recognition (ALPR) framework, designed in particular to work on CCTV footage from cameras that are not dedicated to ALPR. License plate detection, tracking and recognition are at present reasonably well-tackled problems, with many successful commercial solutions available. However, existing ALPR algorithms assume that the input video is obtained via a dedicated, high-resolution, high-speed camera and/or supported by a controlled capture environment, with appropriate camera height, focus, exposure/shutter speed and lighting settings. Typical video forensic applications, in contrast, may require searching for a vehicle with a particular number plate in noisy CCTV footage obtained from non-dedicated, medium-to-low resolution cameras operating under poor illumination. ALPR in such video content faces severe challenges in the license plate localization, tracking and recognition stages. This paper proposes a novel approach for efficient localization of license plates in video sequences, together with a revised version of an existing technique for tracking and recognition. A special feature of the proposed approach is that it automatically adjusts for varying camera distances and diverse lighting conditions, a requirement for a video forensic tool that may operate on videos obtained by a diverse set of unspecified, distributed CCTV cameras.

2.
This paper presents a unified approach to analyzing and structuring the content of videotaped lectures for distance learning applications. By structuring lecture videos, we can support topic indexing and semantic querying of multimedia documents captured in traditional classrooms. Our goal is to automatically construct cross references between lecture videos and textual documents so as to facilitate synchronized browsing and presentation of multimedia information. The major issues involved are topical event detection, video text analysis, and the matching of slide shots with external documents. In topical event detection, a novel transition detector rapidly locates slide shot boundaries by computing the changes of text and background regions in the video. For each detected topical event, multiple keyframes are extracted for video text detection, super-resolution reconstruction, binarization and recognition. A new approach to reconstructing high-resolution textboxes, based on linear interpolation and multi-frame integration, is also proposed for effective binarization and recognition. The recognized characters are used to match video slide shots with external documents via our proposed title and content similarity measures.

3.
Gesture plays an important role in recognizing lecture activities in video content analysis. In this paper, we propose a real-time gesture detection algorithm that integrates cues from video, speech, and electronic slides. In contrast to conventional "complete gesture" recognition, we emphasize detection by prediction from "incomplete gestures". Specifically, intentional gestures are predicted by a modified hidden Markov model (HMM) that can recognize incomplete gestures before the whole gesture path is observed. The multimodal correspondence between speech and gesture is exploited to increase the accuracy and responsiveness of detection. In lecture presentation, the algorithm enables on-the-fly editing of lecture slides by simulating appropriate camera motion to highlight the intention and flow of lecturing. We develop a real-time application, a simulated smartboard, and demonstrate the feasibility of our prediction algorithm using hand gestures and a laser pen with a simple setup that involves no expensive hardware.

4.
During the last decade, many natural interaction methods between humans and computers have been introduced. They were developed as substitutes for keyboard and mouse devices, providing more convenient interfaces. Recently, vision-based gestural control methods for Human-Computer Interaction (HCI) have attracted attention because of their convenience and simplicity. Two key issues for such interfaces are robustness and real-time processing. This paper presents a hand gesture based virtual mouse interface and a Two-layer Bayesian Network (TBN) for robust, real-time hand gesture recognition. The TBN provides an efficient framework for inferring hand postures and gestures not only from information at the current time frame but also from preceding and following frames, so that it compensates for erroneous postures and their locations in cluttered background environments. Experiments demonstrate that the proposed model recognizes hand gestures with rates of 93.76% and 85.15% on simple and cluttered background video data, respectively, outperforming previous methods based on the Hidden Markov Model (HMM) and the Finite State Machine (FSM).
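The core idea of exploiting preceding and following frames to correct per-frame posture estimates can be illustrated with a generic forward-backward smoothing pass. This is only a minimal sketch of that idea, not the paper's TBN; the posture classes, transition matrix, and per-frame posteriors are all hypothetical inputs.

```python
import numpy as np

def smooth_posture_posteriors(frame_posteriors, transition):
    """Forward-backward smoothing over per-frame posture posteriors.

    frame_posteriors: (T, K) array of per-frame posture probabilities.
    transition: (K, K) posture transition matrix.
    Returns a (T, K) array combining past and future evidence, so an
    erroneous single-frame estimate is pulled toward its neighbors.
    """
    T, K = frame_posteriors.shape
    fwd = np.zeros((T, K))
    bwd = np.ones((T, K))
    fwd[0] = frame_posteriors[0] / frame_posteriors[0].sum()
    for t in range(1, T):                      # forward pass: past evidence
        fwd[t] = frame_posteriors[t] * (fwd[t - 1] @ transition)
        fwd[t] /= fwd[t].sum()
    for t in range(T - 2, -1, -1):             # backward pass: future evidence
        bwd[t] = transition @ (frame_posteriors[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    smoothed = fwd * bwd
    return smoothed / smoothed.sum(axis=1, keepdims=True)
```

A frame whose raw posterior disagrees with both its neighbors ends up with a smoothed posterior closer to the neighboring estimates, which is the compensation effect the abstract describes.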

5.
To improve the accuracy and real-time performance of dynamic hand gesture recognition for human-computer interaction in realistic, complex scenes, this paper proposes a new feature, Temporal Locality Sensitive Histograms of Oriented Gradients (TLSHOG), which describes both the temporal evolution and the spatial posture of hand motion and enables fast, accurate dynamic gesture recognition. Two-dimensional image sequences of the hand, captured with an ordinary webcam, serve as training samples. Single-frame features are constructed to describe hand posture and combined with a Temporal Pyramid (TP) to describe the spatio-temporal characteristics of the gesture trajectory; a multi-class Support Vector Machine (SVM) is then trained to accurately classify the various gestures in the test samples. Experimental results show that the method achieves high accuracy and good real-time performance, and is robust to cluttered background interference and changes in illumination intensity.
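The temporal pyramid step described above can be sketched generically: per-frame descriptors are average-pooled over 1, 2, 4, ... temporal segments and the pooled vectors concatenated into one fixed-length vector for the SVM. This is an illustrative sketch of the general TP technique, not the paper's exact feature; the descriptor contents and pyramid depth are assumptions.

```python
import numpy as np

def temporal_pyramid(features, levels=3):
    """Pool a (T, D) sequence of per-frame descriptors into a fixed-length
    vector by averaging over 1, 2, 4, ... temporal segments and
    concatenating the segment means (length D * (2**levels - 1))."""
    T, D = features.shape
    pooled = []
    for level in range(levels):
        n_segments = 2 ** level
        bounds = np.linspace(0, T, n_segments + 1).astype(int)
        for i in range(n_segments):
            # guard against empty segments when T < n_segments
            seg = features[bounds[i]:max(bounds[i] + 1, bounds[i + 1])]
            pooled.append(seg.mean(axis=0))
    return np.concatenate(pooled)
```

The fixed-length output is what makes variable-length gesture clips comparable inside a standard SVM.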

6.
This paper targets the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system that performs automatic semantic annotation of news video archives and provides access to the archives via these annotations. The system relies on video texts as the information source and exploits several information extraction techniques on these texts to arrive at representative semantic information about the underlying videos. These techniques include named entity recognition, person entity extraction, coreference resolution, and semantic event extraction. Apart from the information extraction components, the system also encompasses modules for news story segmentation, text extraction, and video retrieval, along with a news video database, making it a full-fledged system for practical settings. The proposed system is generic, employing a wide range of techniques to automate the semantic video indexing process and to bridge the semantic gap between what can be automatically extracted from videos and what people perceive as video semantics. Based on the proposed system, a novel automatic semantic annotation and retrieval system is built for Turkish and evaluated on a broadcast news video collection, providing evidence for its feasibility and convenience for news videos, with satisfactory overall performance.

7.
Sign language is a method of communication for the deaf. Articulated gestures and postures of the hands and fingers are commonly used in sign language. This paper presents a system that recognizes Korean sign language (KSL) and translates it into normal Korean text. A pair of data-gloves serves as the sensing device for detecting the motions of hands and fingers. For efficient recognition of gestures and postures, a technique for efficient classification of motions is proposed, and a fuzzy min-max neural network is adopted for on-line pattern recognition.

8.
Locating content in existing video archives is a time- and bandwidth-consuming process, since users may have to download and manually watch large portions of superfluous video. In this paper, we present two novel prototypes built on an Internet-based video composition and streaming system with a keyword-based search interface that collects, converts, analyzes, indexes, and ranks video content. At the user's request, the system can automatically sequence out portions of single videos or aggregate content from multiple videos to produce a single, personalized video stream on-the-fly.

9.
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in videos, which enables content-based browsing. We present new methods for automatic segmentation of text in digital videos. The proposed algorithms exploit typical characteristics of text in videos to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video, and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is passed directly to a standard OCR software package to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced and used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
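The multi-frame integration idea, fusing the multiple bitmaps of a tracked character into a single bitmap before OCR, can be sketched as a simple temporal average followed by thresholding. This is only an illustration of the general principle (frame-varying background noise averages out while the static glyph persists), not the paper's algorithm; the fixed threshold and the dark-text-on-light-background assumption are mine.

```python
import numpy as np

def integrate_bitmaps(bitmaps, threshold=128):
    """Fuse aligned grayscale crops of the same character taken from
    successive frames. Averaging suppresses background clutter that varies
    from frame to frame; a global threshold then binarizes for OCR.

    Assumes dark text on a lighter background (1 = text pixel)."""
    stack = np.stack(bitmaps).astype(np.float64)  # (N, H, W)
    mean_img = stack.mean(axis=0)
    return (mean_img < threshold).astype(np.uint8)
```

In practice the crops must first be registered by the character tracker, and the threshold would be chosen adaptively rather than fixed.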

10.
Text extraction techniques for images and videos
Many images contain rich textual information, such as banners stored as images in web-page design and captions in video frames. Automatic detection, segmentation, extraction and recognition of such text is highly valuable for automatic understanding, indexing and retrieval of high-level image semantics, and has therefore attracted the interest of many researchers at home and abroad. To give readers a systematic view of the field and a reference for its researchers, this survey reviews the state of the art, based on a comprehensive reading of the domestic and international literature on text extraction from images and videos. It discusses the main techniques and their respective strengths and weaknesses from two aspects, text detection and extraction on the one hand and text recognition on the other, and, in light of the current open problems, points out directions for further research.

11.
Due to the prevalence of digital video camcorders, home videos have become an important part of life-logs of personal experiences. To enable efficient video parsing, a critical step is to automatically extract the objects, events and scene characteristics present in videos. This paper addresses the problem of extracting objects from home videos. Automatic detection of objects is a classical yet difficult vision problem, particularly for videos with complex scenes and unrestricted domains. Compared with edited and surveillance videos, home videos captured in uncontrolled environments usually exhibit notable characteristics such as shaking artifacts, irregular motion, and arbitrary settings. These characteristics have prohibited effective parsing of semantic video content using conventional vision analysis. In this paper, we propose a new approach to automatically locating multiple objects in home videos by taking into account both how and when to initialize objects. Previous approaches mostly consider how but not when, due to efficiency or real-time requirements. In home-video indexing, online processing is optional; by considering when, some difficult problems can be alleviated and, most importantly, the possibility of parsing semantic video objects opens up. In our approach, the how part is formulated as an object detection and association problem, while the when part is a saliency measurement that determines the best few locations at which to start multiple object initialization.

12.

This paper proposes a novel approach for recognizing faces in videos with a high recognition rate. First, a feature vector based on Normalized Local Binary Patterns is obtained for the face region. A set of training and testing videos is used in the face recognition procedure. Each frame of the query video is matched against the signatures of the faces in the database using Euclidean distance, and a ranked list is formed. Each ranked list is clustered and its reliability analyzed for re-ranking. The multiple re-ranked lists of the query video are fused to form a video signature. This video signature embeds diverse intra-personal variations such as poses and expressions, and facilitates matching two videos with large variations. To match two videos, their composite ranked lists are compared using a Kendall Tau distance measure. The methods are evaluated on the YouTube and ChokePoint videos and exhibit significant performance improvement over existing techniques.
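The Kendall Tau distance used to compare two composite ranked lists simply counts how many pairs of identities the two lists order differently. A minimal sketch of that standard measure (the identity labels are hypothetical):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count pairwise disagreements between two ranked lists over the same
    set of identities; 0 means identical orderings, n*(n-1)/2 means one
    list is the exact reverse of the other."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    disagreements = 0
    for x, y in combinations(rank_a, 2):
        # a pair disagrees when the two lists order x and y oppositely
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            disagreements += 1
    return disagreements
```

Two videos of the same person should rank the gallery similarly, giving a small distance even when individual frames vary in pose or expression.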

13.
A well-produced video always creates a strong impression on the viewer. However, due to the limitations of the camera, the ambient conditions or the skills of the videographer, the quality of captured videos sometimes falls short of one's expectations. On the other hand, a vast amount of superbly captured video is available on the web and in digital libraries. In this paper, we propose the novel approach of video analogies, which improves the quality of a video by borrowing features from a higher quality video. During the matching phase, we find the correspondence between the source and target videos by feature matching. We then use this correspondence to transfer desired traits of the source video into the target video, producing a new video that acquires the desired features of the source while retaining the merits of the target. The video analogies technique provides an intuitive mechanism for automatic editing of videos. We demonstrate its utility in three applications: colorizing videos, reducing video blur, and video rhythm adjustment. We describe each application in detail and provide experimental results establishing the efficacy of the proposed approach.

14.
Most of the studies establishing factors affecting digital text and multimedia comprehension have been conducted in controlled conditions. The present study sought to test and extend the modality and seductive details effects, and the role of verbal ability and working memory capacity, to a remote, self-paced, E-learning scenario. Two hundred and thirteen first-year undergraduates read or watched videos about scientific expository content in three formats: digital text (written expository texts, navigated in seven screens), presentation video (audio explanation, with written keywords), and presentation video with dynamic decorative images (audio explanation, written keywords, and dynamic decorative and irrelevant images). In a face-to-face session, they completed working memory and verbal ability tests. Comprehension performance was similar for the three conditions. For the multimedia videos with dynamic decorative irrelevant images, comprehension depended on working memory capacity. Verbal ability was relevant for both expository text and videos.

15.

This work introduces a novel approach to extracting meaningful content information from video through collaborative integration of image understanding and natural language processing. We developed a person browser system that associates faces with overlaid name texts in videos. The approach takes news videos as a knowledge source and automatically extracts faces and associated name texts as content information. The proposed framework consists of a text detection module, a face detection module, and a person indexing database module. The successful results of person extraction show that the proposed integrated use of image understanding and natural language processing techniques is headed in the right direction toward our goal of accessing the real content of multimedia information.


16.
Video in digital format is now commonplace and widespread both in professional use and in domestic consumer products, from camcorders to mobile phones. Video content is growing in volume, and while we can capture, compress, store, transmit and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper, we give a brief review of the state of the art in video analysis, indexing and retrieval, and we point to research directions that we think are promising and could make searching and browsing video archives based on video content as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area.

17.
A novel approach is proposed for the recognition of moving hand gestures based on representing hand motions as contour-based similarity images (CBSIs). A CBSI is constructed by calculating the similarity between hand contours in different frames. The input CBSI is then matched against the CBSIs in the database to recognize the hand gesture. The proposed continuous hand gesture recognition algorithm can simultaneously divide continuous gestures into disjoint gestures and recognize them. No restrictive assumptions are made about the motion of the hand between the disjoint gestures. The algorithm was tested on hand gestures from American Sign Language, achieving recognition rates of 91.3% for disjoint gestures and 90.4% for continuous gestures. The experimental results illustrate the efficiency of the algorithm on noisy videos.
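The general shape of a CBSI, a T x T image whose entry (i, j) measures how similar the hand contour in frame i is to the contour in frame j, can be sketched as follows. The paper does not specify its similarity function here, so this sketch assumes each frame's contour has already been reduced to a fixed-length descriptor vector and uses cosine similarity, which is purely illustrative.

```python
import numpy as np

def contour_similarity_image(descriptors):
    """Build a T x T similarity image from per-frame contour descriptors.

    descriptors: iterable of T fixed-length vectors, one per frame.
    Entry (i, j) is the cosine similarity between frames i and j, so the
    diagonal is 1 and repeated hand shapes show up as bright off-diagonal
    blocks.
    """
    X = np.asarray(descriptors, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # avoid division by zero
    return Xn @ Xn.T
```

The block structure of such an image is what lets one representation both segment a continuous stream into disjoint gestures and match it against stored templates.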

18.
19.
Text in videos and images plays an important role in content-based video database retrieval, web video search, image segmentation and image inpainting. To improve the efficiency of text detection, this paper presents a video text detection method based on adaptive thresholds computed from multiple features. Building on Michael's algorithm, it computes adaptive local thresholds from three features of text edges: edge strength, edge density, and the ratio of horizontal to vertical edges. These thresholds effectively remove non-text regions, extract text edges, and detect and localize text, reducing the adverse effects of the single-feature threshold in Michael's algorithm. A merging mechanism is introduced in the text localization stage to reduce the occurrence of incomplete regions. Experimental results show high precision and recall, so the method can be applied to video search, image segmentation and image inpainting.
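The idea of an adaptive local threshold, judging each region against its own local statistics rather than one global value, can be sketched on a single edge-strength map. This is a simplified illustration only: the paper combines three features, while this sketch thresholds one feature per block, and the block size and the mean-plus-k-sigma rule are assumptions.

```python
import numpy as np

def adaptive_block_threshold(edge_strength, block=16, k=0.5):
    """Per-block adaptive thresholding of an edge-strength map.

    Each block is thresholded at its own mean + k * std, so busy text
    regions and flat background regions are judged locally instead of
    against a single global threshold. Returns a boolean candidate mask.
    """
    H, W = edge_strength.shape
    mask = np.zeros((H, W), dtype=bool)
    for y in range(0, H, block):
        for x in range(0, W, block):
            patch = edge_strength[y:y + block, x:x + block]
            t = patch.mean() + k * patch.std()
            mask[y:y + block, x:x + block] = patch > t
    return mask
```

A full detector would intersect masks from several features (strength, density, edge-direction ratio) and then merge adjacent candidate regions, as the localization stage above describes.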

20.