Similar Documents
20 similar documents found.
1.
Multimedia Tools and Applications - Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information,...

2.
Multimedia Tools and Applications - Text detection in video/images is challenging due to the presence of multiple blur caused by defocus and motion. In this paper, we present a new method for...

3.
Multimedia Tools and Applications - Video scene text contains valuable information for scene understanding, as scene text in video provides important semantic clues for human beings to sense the...

4.
Given the importance of textual information for video sequences and video indexing, this paper proposes a text localization algorithm based on mixed text features. The algorithm first applies edge detection and projection analysis to one frame sampled every 25 frames of the video sequence to extract candidate text blocks, then filters them with a support vector machine to remove non-text blocks, and finally exploits the correlation between adjacent frames to locate text blocks in the remaining frames. The algorithm improves detection speed while maintaining high detection accuracy.
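To make the edge-detection and projection step above concrete, here is a minimal Python sketch (not the paper's implementation): Sobel edge density is projected onto image rows, and consecutive high-density rows are grouped into candidate text bands. The function name candidate_text_bands and the threshold values are illustrative assumptions; the SVM filtering and inter-frame tracking stages are omitted.

```python
import cv2
import numpy as np

def candidate_text_bands(gray, density_thresh=0.15, min_height=8):
    """Return (top, bottom) row ranges of a grayscale frame that may contain text."""
    # Vertical-edge magnitude via Sobel, binarized with a fixed threshold.
    edges = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    edge_mask = (np.abs(edges) > 50).astype(np.uint8)

    # Horizontal projection profile: fraction of edge pixels in each row.
    profile = edge_mask.mean(axis=1)
    is_text_row = profile > density_thresh

    # Group consecutive text-like rows into candidate bands.
    bands, start = [], None
    for y, flag in enumerate(is_text_row):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    if start is not None and len(is_text_row) - start >= min_height:
        bands.append((start, len(is_text_row)))
    return bands
```

In a full pipeline, each band would then be cropped and passed to a text/non-text classifier such as the SVM described in the abstract.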

5.
Text contained in scene images provides the semantic context of the images. For that reason, robust extraction of text regions is essential for successful scene text understanding. However, separating text pixels from scene images remains a challenging issue because of uncontrolled lighting conditions and complex backgrounds. In this paper, we propose a two-stage conditional random field (TCRF) approach to robustly extract text regions from scene images. The proposed approach models the spatial and hierarchical structures of the scene text, and it finds text regions based on the scene text model. In the first stage, the system generates multiple character proposals for the given image by using multiple image segmentations and a local CRF model. In the second stage, the system selectively integrates the generated character proposals to determine proper character regions by using a holistic CRF model. Through the TCRF approach, we cast the scene text separation problem as a probabilistic labeling problem, which yields the optimal label configuration of pixels that maximizes the conditional probability of the given image. Experimental results indicate that our framework exhibits good performance on public databases.

6.
In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and then all text candidates are mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest-neighbor concept. APBG is then applied to the text representatives to fix the bounding boxes of multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets such as non-horizontal data, horizontal data, publicly available data (Hua's data) and ICDAR-03 competition data (camera images) show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.
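A hedged sketch of the block-level clustering idea follows: each block of the frame is described by a small feature vector, and 2-means clustering separates probable text blocks from background blocks. Plain gradient-energy statistics stand in for the wavelet and median-moment features used in the paper, and the function name probable_text_blocks is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def probable_text_blocks(frame_gray, grid=4):
    """Split a grayscale frame into grid x grid blocks and return the
    (row, col) indices of blocks assigned to the higher-energy cluster."""
    h, w = frame_gray.shape
    bh, bw = h // grid, w // grid
    feats, coords = [], []
    for i in range(grid):
        for j in range(grid):
            block = frame_gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].astype(np.float32)
            gy, gx = np.gradient(block)
            energy = np.hypot(gx, gy)
            feats.append([energy.mean(), energy.std()])
            coords.append((i, j))
    feats = np.array(feats)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    # Assume the cluster with the higher mean edge energy holds the text blocks.
    cluster_means = [feats[labels == c, 0].mean() for c in (0, 1)]
    text_cluster = int(np.argmax(cluster_means))
    return [coords[k] for k, lab in enumerate(labels) if lab == text_cluster]
```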

7.
8.
Objective: Current convolutional neural network (CNN) based text detection methods have great difficulty localizing small-scale text in natural scenes. However, text in natural scene images is strongly associated with other objects: scene text usually appears together with specific objects such as billboards and road signs. Based on this observation, this paper proposes a cascaded CNN scene text detection method that takes object association into account. Method: First, a CNN detects both text objects and the associated text-bearing objects, yielding text candidate boxes and candidate boxes for text-bearing objects. The text-bearing object candidate regions are then enlarged and cropped from the original image, and the cropped images are fed to the CNN again to detect text candidate boxes more precisely. Finally, non-maximum suppression fuses the text candidate boxes produced by the two steps to obtain the detection result. Results: The method detects small-scale text effectively, achieving a recall of 0.817, a precision of 0.880 and an F-measure of 0.847 on the ICDAR-2013 dataset. Conclusion: By exploiting the strong association between scene text and the objects that contain it, the method improves the recall of small-scale text detection in natural scene images.
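The final fusion step lends itself to a short sketch. The snippet below pools text boxes from the full-image pass and the cropped associated-object pass and merges them with standard non-maximum suppression; the IoU threshold and the helper names iou and nms are illustrative, not the authors' code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

# Usage: pool candidates from both detection passes, then suppress duplicates.
# boxes = full_image_boxes + cropped_region_boxes
# scores = full_image_scores + cropped_region_scores
# kept = nms(boxes, scores)
```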

9.
A memory leak in a managed language occurs when the program inadvertently maintains references to objects that it no longer needs. Memory leaks cause systematic heap growth that degrades performance and can result in program crashes after perhaps days or weeks of execution. Prior approaches for detecting memory leaks rely on heap differencing or detailed object statistics which store state proportional to the number of objects in the heap. These overheads preclude their use on the same processor for deployed long-running applications. This paper introduces Cork as a tool that accurately identifies heap growth caused by leaks. It is space efficient (adding less than 1% to the heap) and time efficient (adding 2.3% on average to total execution time). We implement this approach of examining and summarizing the class of live objects during garbage collection in a class points-from graph (CPFG). Each node in the CPFG represents a class and edges between nodes represent references between objects of the specific classes. Cork annotates nodes and edges with the corresponding volume of live objects. Cork identifies growing data structures across multiple collections and computes a class slice to identify leaks for the user. We experiment with two functions for identifying growth and show that Cork is accurate: it identifies systematic heap growth with no false positives in 4 of 15 benchmarks we tested. Cork's slice report enabled us to quickly identify and eliminate growing data structures in large and unfamiliar programs, something their developers had not previously done. Copyright © 2009 John Wiley & Sons, Ltd.
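As a toy illustration of the growth-detection idea (not Cork itself, which works inside the garbage collector on the class points-from graph), the Python sketch below records live-object volume per class at each collection and flags classes whose volume grows monotonically over a sliding window. The class name GrowthTracker and the window size are assumptions for illustration.

```python
from collections import defaultdict

class GrowthTracker:
    """Flag classes whose live-object volume grows monotonically across collections."""

    def __init__(self, window=3):
        self.history = defaultdict(list)   # class name -> volume observed at each GC
        self.window = window

    def record_collection(self, volumes_by_class):
        """volumes_by_class: dict mapping class name -> live bytes at this collection."""
        for cls, vol in volumes_by_class.items():
            self.history[cls].append(vol)

    def leak_suspects(self):
        suspects = []
        for cls, vols in self.history.items():
            recent = vols[-self.window:]
            if len(recent) == self.window and all(b > a for a, b in zip(recent, recent[1:])):
                suspects.append(cls)
        return suspects
```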

10.
International Journal on Document Analysis and Recognition (IJDAR) - Recently scene text detection has become a hot research topic. Arbitrary-shaped text detection is more challenging due to the...

11.
Text recognition in natural scene images is a challenging task that has recently been garnering increased research attention. In this paper, we propose a method for recognizing text by utilizing the layout consistency of a text string. We estimate the layout (four lines of a text string) using initial character extraction and recognition results. On the basis of the layout consistency across a word, we perform character extraction and recognition again using the four lines, which is more accurate than the first process. Our layout estimation method is different from previous methods in terms of exploiting character recognition results and its use of a class-conditional layout model. More accurate and robust estimation is achieved, and it can be used to refine character extraction and recognition. We call this two-way process, from extraction and recognition to layout and from layout back to extraction and recognition, "bidirectional" to discriminate it from previous feedback refinement approaches. Experimental results demonstrate that our bidirectional processes provide a boost to the performance of word recognition.
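A minimal sketch of the line-fitting part of the layout estimation, under simplifying assumptions: given initial character bounding boxes, a shared slope and two of the four text lines (top line and baseline) are fit by least squares. The real method also exploits recognition results and a class-conditional layout model, which are not modeled here; fit_text_lines is a hypothetical helper name.

```python
import numpy as np

def fit_text_lines(char_boxes):
    """char_boxes: list of (x1, y1, x2, y2) character boxes of one word.
    Returns (slope, top_intercept, baseline_intercept) for lines y = slope * x + intercept."""
    xs = np.array([(x1 + x2) / 2.0 for x1, _, x2, _ in char_boxes])
    tops = np.array([y1 for _, y1, _, _ in char_boxes])
    bottoms = np.array([y2 for _, _, _, y2 in char_boxes])
    # Fit one shared slope to the vertical centers, then place the two lines
    # at the average offsets of the box tops and bottoms from that slope.
    slope = np.polyfit(xs, (tops + bottoms) / 2.0, 1)[0]
    top_intercept = float(np.mean(tops - slope * xs))
    baseline_intercept = float(np.mean(bottoms - slope * xs))
    return float(slope), top_intercept, baseline_intercept
```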

12.
13.
Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.
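The temporal integration of character bitmaps can be sketched briefly: aligned binary crops of one tracked character from successive frames are averaged and re-thresholded, so pixels that appear consistently survive while transient background clutter is suppressed. Alignment and tracking are assumed to have been done already; the function name and the vote_ratio parameter are illustrative.

```python
import numpy as np

def integrate_character_bitmaps(bitmaps, vote_ratio=0.5):
    """bitmaps: equally sized binary (0/1) arrays of one character from several frames.
    Returns a single integrated binary bitmap."""
    stack = np.stack([b.astype(np.float32) for b in bitmaps])
    mean = stack.mean(axis=0)
    # Keep pixels that are "on" in at least vote_ratio of the frames.
    return (mean >= vote_ratio).astype(np.uint8)
```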

14.
We consider some natural variations on the following classic pattern-matching problem: given an NFA M over the alphabet Σ and a pattern p over some alphabet Δ, does there exist a word x ∈ L(M) such that x matches p? We consider the restricted problem where M only accepts a finite language. We also consider the variation where only some factor of x is required to match the pattern p. We show that both of these problems are NP-complete. We also consider the same problems for context-free grammars; in this case the problems become PSPACE-complete.

15.
Multimedia Tools and Applications - Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role for developing real-time indexing and...

16.
17.
Existing anchor-free text detection methods exploit only the geometric properties of text boxes, ignore their positional properties and lack an effective filtering mechanism. To address this, an anchor-free natural scene text detection method that mines the positional properties of text boxes is proposed. The method uses ResNet50 as the backbone of the convolutional neural network, fuses feature maps of several different scales and then predicts both the geometric and positional properties of text boxes; a two-stage filtering mechanism finally yields the detected text boxes. F-measures of 0.870 and 0.861 on the public ICDAR2013 and ICDAR2011 datasets demonstrate the effectiveness of the method.

18.
We consider a variant of Gold’s learning paradigm where a learner receives as input n different languages (in the form of one text where all input languages are interleaved). Our goal is to explore the situation when a more “coarse” classification of input languages is possible, whereas a more refined classification is not. More specifically, we answer the following question: under which conditions can a learner, being fed n different languages, produce m grammars covering all input languages, but cannot produce k grammars covering the input languages for any k > m. We also consider a variant of this task, where each of the output grammars may not cover more than r input languages. Our main results indicate that the major factor affecting classification capabilities is the difference n − m between the number n of input languages and the number m of output grammars. We also explore the relationship between classification capabilities for smaller and larger groups of input languages. For the variant of our model with an upper bound on the number of languages allowed to be represented by one output grammar, for classes consisting of disjoint languages, we found a complete picture of the relationship between classification capabilities for different parameters n (the number of input languages), m (the number of output grammars), and r (the bound on the number of languages represented by each output grammar). This picture includes a combinatorial characterization of classification capabilities for the parameters n, m, r of certain types.

19.
Artificial Intelligence Review - Nowadays a huge amount of information is available from both online and offline sources. For a single topic, hundreds of articles are available, containing...

20.
Pattern Analysis and Applications - The aim of this article is twofold. First, we propose an effective methodology for binarization of scene images. For our present study, we use the publicly...
