20 similar documents found for this query (search time: 15 ms)
1.
Hideaki Goto, Hirotomo Aso. International Journal on Document Analysis and Recognition, 1999, 2(2-3): 111-119
To enhance the capability of document analysis systems, we need a text line extraction method that can handle not only straight text lines but also text lines of various shapes. This paper proposes a new method called Extended Linear Segment Linking (ELSL), which can extract both text lines in arbitrary orientations and curved text lines. We also consider the coexistence of horizontally and vertically printed text lines on the same page. The new method can produce text line candidates for multiple orientations. Experiments verify the ability of the method.
Received December 21, 1998 / Revised version September 2, 1999
2.
Carlos Merino-Gracia, Majid Mirmehdi, José Sigut, José L. González-Mora. Image and Vision Computing, 2013
Cheap, ubiquitous, high-resolution digital cameras have led to opportunities that demand camera-based text understanding, such as wearable computing or assistive technology. Perspective distortion is one of the main challenges for text recognition in camera captured images since the camera may often not have a fronto-parallel view of the text. We present a method for perspective recovery of text in natural scenes, where text can appear as isolated words, short sentences or small paragraphs (as found on posters, billboards, shop and street signs, etc.). It relies on the geometry of the characters themselves to estimate a rectifying homography for every line of text, irrespective of the view of the text over a large range of orientations. The horizontal perspective foreshortening is corrected by fitting two lines to the top and bottom of the text, while the vertical perspective foreshortening and shearing are estimated by performing a linear regression on the shear variation of the individual characters within the text line. The proposed method is efficient and fast. We present comparative results with improved recognition accuracy against the current state-of-the-art.
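The shear-regression step described in this abstract can be pictured with a minimal sketch: given per-character shear estimates along a text line, a linear fit over their horizontal positions captures the perspective-induced variation, and a simple shear correction can be built from the fitted value. The character positions, shear angles, and the affine construction below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_shear_model(char_x, char_shear_deg):
    """Fit shear(x) = a*x + b over per-character shear estimates.

    char_x         : horizontal centre of each character (pixels)
    char_shear_deg : estimated shear angle of each character (degrees)
    Both inputs are assumed to come from character geometry, as in the paper.
    """
    a, b = np.polyfit(np.asarray(char_x, float),
                      np.asarray(char_shear_deg, float), deg=1)
    return a, b

def shear_matrix(shear_deg):
    """2x3 affine that removes a horizontal shear of `shear_deg` degrees."""
    s = np.tan(np.radians(shear_deg))
    return np.array([[1.0, -s, 0.0],
                     [0.0, 1.0, 0.0]])

# Toy usage: five characters whose shear grows along the line, as under perspective.
xs = [10, 60, 110, 160, 210]
shears = [2.0, 3.1, 3.9, 5.2, 6.1]
a, b = fit_shear_model(xs, shears)
print(shear_matrix(a * np.mean(xs) + b))
```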
3.
David Crandall, Sameer Antani, Rangachar Kasturi. International Journal on Document Analysis and Recognition, 2003, 5(2-3): 138-157
The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically
index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's
semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary
color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about
the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video.
In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However,
text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once
in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial
problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television
programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing,
and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared
with existing work found in the literature.
Received: January 29, 2002 / Accepted: September 13, 2002
D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com
S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov
Correspondence to: David Crandall
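Determining the temporal extent of a text event, as discussed in this abstract, can be sketched generically as follows: per-frame text boxes are linked across frames when they overlap sufficiently, and each chain of linked boxes becomes one event for the index. The IoU-based matching and box format are assumptions for illustration, not the tracking algorithm proposed in the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def group_text_events(frame_boxes, iou_thresh=0.5):
    """frame_boxes: list over frames, each a list of text boxes.
    Returns events as (first_frame, last_frame, last_box) tuples."""
    events, active = [], []            # active: [first_frame, last_frame, box]
    for t, boxes in enumerate(frame_boxes):
        still_active = []
        for ev in active:
            match = next((b for b in boxes if iou(ev[2], b) >= iou_thresh), None)
            if match is not None:
                ev[1], ev[2] = t, match
                boxes = [b for b in boxes if b is not match]
                still_active.append(ev)
            else:
                events.append(tuple(ev))       # event ended in previous frame
        active = still_active + [[t, t, b] for b in boxes]   # new events
    events.extend(tuple(ev) for ev in active)
    return events
```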
4.
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text that appears in the videos, since it enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
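The integration of multiple bitmaps of a character over time into a single bitmap, mentioned in this abstract, could in principle look like the following sketch, which averages the aligned bitmaps and re-thresholds them. The averaging-and-threshold rule is an assumption, not necessarily the integration used by the authors.

```python
import numpy as np

def integrate_character_bitmaps(bitmaps, threshold=0.5):
    """Fuse several aligned binary bitmaps (0 = background, 1 = ink) of the
    same character, taken from consecutive frames, into one cleaner bitmap.

    Averaging suppresses background clutter that changes from frame to frame,
    while true character pixels stay close to 1 in every frame.
    """
    stack = np.stack([np.asarray(b, dtype=float) for b in bitmaps], axis=0)
    return (stack.mean(axis=0) >= threshold).astype(np.uint8)

# Toy usage: the character pixel survives, the flickering noise pixel does not.
f1 = [[1, 0], [1, 1]]
f2 = [[1, 1], [1, 1]]
f3 = [[1, 0], [1, 1]]
print(integrate_character_bitmaps([f1, f2, f3]))
```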
5.
6.
Pietro Parodi, Roberto Fontana. International Journal on Document Analysis and Recognition, 1999, 2(2-3): 67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping vertical columns of fixed width, shifted with respect to each other by a fixed number of image elements (a good default column width is a small fraction of the image width), and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
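A rough sketch of the column-based detection of text-line pieces described in this abstract: within each narrow vertical strip, rows containing ink are found from a horizontal projection profile, and maximal runs of such rows become line-piece candidates. Strip width, step, and the minimum run length below are illustrative choices, not the defaults reported by the authors.

```python
import numpy as np

def line_pieces_in_columns(binary_img, col_width=32, step=16, min_rows=3):
    """binary_img: 2-D array, 1 = ink.
    Returns (col_start, row_start, row_end) candidate text-line pieces."""
    h, w = binary_img.shape
    pieces = []
    for x0 in range(0, max(1, w - col_width + 1), step):
        strip = binary_img[:, x0:x0 + col_width]
        inked = strip.sum(axis=1) > 0          # rows containing ink
        row = 0
        while row < h:
            if inked[row]:
                start = row
                while row < h and inked[row]:
                    row += 1
                if row - start >= min_rows:
                    pieces.append((x0, start, row - 1))
            else:
                row += 1
    return pieces
```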
7.
8.
Radiosity for scenes with many mirror reflections
9.
Dot-matrix text recognition is a difficult problem, especially when characters are broken into several disconnected components.
We present a dot-matrix text recognition system which uses the fact that dot-matrix fonts are fixed-pitch, in order to overcome
the difficulty of the segmentation process. After finding the most likely pitch of the text, a decision is made as to whether
the text is written in a fixed-pitch or proportional font. Fixed-pitch text is segmented using a pitch-based segmentation
process that can successfully segment both touching and broken characters. We report performance results for the pitch estimation,
fixed-pitch decision and segmentation, and recognition processes.
Received October 18, 1999 / Revised April 21, 2000
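One simple way to picture the pitch-estimation step described above is via the autocorrelation of the column ink profile of a text line: the lag with the strongest repetition is taken as the character pitch. This is a generic sketch under that assumption, not the estimator used by the authors.

```python
import numpy as np

def estimate_pitch(binary_line, min_pitch=4, max_pitch=64):
    """binary_line: 2-D array of a single text line, 1 = ink.
    Returns the lag (in pixels) whose autocorrelation of the vertical
    projection profile is strongest, interpreted as the character pitch."""
    profile = binary_line.sum(axis=0).astype(float)
    profile -= profile.mean()                     # remove DC component
    best_lag, best_score = None, -np.inf
    for lag in range(min_pitch, min(max_pitch, len(profile) // 2)):
        score = np.dot(profile[:-lag], profile[lag:])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```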
10.
11.
Gregory K. Myers, Robert C. Bolles, Quang-Tuan Luong, James A. Herson, Hrishikesh B. Aradhye. International Journal on Document Analysis and Recognition, 2005, 7(2-3): 147-158
Real-world text on street signs, nameplates, etc. often lies in an oblique plane and hence cannot be recognized by traditional OCR systems due to perspective distortion. Furthermore, such text often comprises only one or two lines, preventing the use of existing perspective rectification methods that were primarily designed for images of document pages. We propose an approach that reliably rectifies and subsequently recognizes individual lines of text. Our system, which includes novel algorithms for extraction of text from real-world scenery, perspective rectification, and binarization, has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time.
Received: 15 December 2003 / Published online: 14 December 2004
Correspondence to: Gregory K. Myers
12.
In this paper, we discuss an appearance-matching approach to the difficult problem of interpreting color scenes containing
occluded objects. We have explored the use of an iterative, coarse-to-fine sum-squared-error method that uses information
from hypothesized occlusion events to perform run-time modification of scene-to-template similarity measures. These adjustments
are performed by using a binary mask to adaptively exclude regions of the template image from the squared-error computation.
At each iteration higher resolution scene data as well as information derived from the occluding interactions between multiple
object hypotheses are used to adjust these masks. We present results which demonstrate that such a technique is reasonably
robust over a large database of color test scenes containing objects at a variety of scales, and tolerates minor 3D object
rotations and global illumination variations.
Received: 21 November 1996 / Accepted: 14 October 1997
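The mask-adjusted similarity measure described in this abstract can be sketched as a sum-squared error in which a binary mask excludes template pixels hypothesised to be occluded. Representing the images and mask as plain numpy arrays is an assumption for illustration only.

```python
import numpy as np

def masked_sse(scene_patch, template, mask):
    """Sum-squared error between a scene patch and a template, counting only
    template pixels where mask == 1 (pixels believed to be occluded are
    excluded by setting mask to 0 there).  Returns the error normalised by
    the number of pixels actually compared."""
    scene_patch = np.asarray(scene_patch, dtype=float)
    template = np.asarray(template, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n = mask.sum()
    return ((scene_patch - template) ** 2)[mask].sum() / n if n else np.inf
```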
13.
We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy exploits the linguistic observation that over half of all words in a typical English passage come from a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary
text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the
most likely candidates for those words, using only widths of the word images. The identity of each word is determined using
a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using
a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page
images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on
90% of all the pages. These can serve as useful seeds to bootstrap font learning.
Received October 8, 1999 / Revised March 29, 2000
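The fast width-based candidate search described above might be sketched as follows: each segmented word image is compared, by width alone, against the expected widths of the stop words, and near matches become candidates for the word-shape classifier. The width model (average character width times word length) and the tolerance are illustrative assumptions, not the paper's procedure.

```python
def stopword_candidates(word_widths, stopwords, avg_char_width=12.0, tol=0.15):
    """word_widths : pixel widths of segmented word images
    stopwords   : stop-word strings (e.g. drawn from the Brown corpus)
    Returns, for each word image, the stop words whose expected width
    (characters * avg_char_width) lies within relative tolerance `tol`."""
    expected = {w: len(w) * avg_char_width for w in stopwords}
    candidates = []
    for width in word_widths:
        close = [w for w, ew in expected.items()
                 if abs(width - ew) <= tol * ew]
        candidates.append(close)
    return candidates

# Toy usage with a handful of common stop words.
print(stopword_candidates([36, 60], ["the", "and", "of", "which"]))
```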
14.
Kazem Taghva, Eric Stofsky. International Journal on Document Analysis and Recognition, 2001, 3(3): 125-137
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate
words through the use of information gathered from multiple knowledge sources. This system for text correction is based on
static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system
incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of
the new system is presented as well.
Received August 16, 2000 / Revised October 6, 2000
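A minimal sketch of the kind of candidate selection such a corrector performs: OCR output words are matched against a lexicon by approximate string matching, with a small bonus when the difference looks like a known OCR confusion (e.g. 'rn' vs 'm'). The tiny confusion table and scoring below are assumptions for illustration, not the system's actual device mappings or Bayesian model.

```python
from difflib import SequenceMatcher

# A tiny, purely illustrative table of common OCR confusions.
CONFUSIONS = {("rn", "m"), ("l", "1"), ("O", "0"), ("cl", "d")}

def candidate_corrections(ocr_word, lexicon, max_candidates=3):
    """Rank lexicon words by string similarity to the OCR output, giving a
    small bonus when the difference matches a known OCR confusion."""
    scored = []
    for word in lexicon:
        score = SequenceMatcher(None, ocr_word, word).ratio()
        for a, b in CONFUSIONS:
            if a in ocr_word and ocr_word.replace(a, b, 1) == word:
                score += 0.1            # plausible device confusion
        scored.append((score, word))
    scored.sort(reverse=True)
    return [w for _, w in scored[:max_candidates]]

print(candidate_corrections("docurnent", ["document", "documents", "ornament"]))
```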
15.
Markus Junker, Rainer Hoch. International Journal on Document Analysis and Recognition, 1998, 1(2): 116-122
In the literature, many feature types are proposed for document classification. However, an extensive and systematic evaluation
of the various approaches has not yet been done. In particular, evaluations on OCR documents are very rare. In this paper
we investigate seven text representations based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts
in two domains: business letters and abstracts of technical reports. Our results indicate that n-grams are an attractive representation that is competitive even with techniques relying on morphological analysis. This holds for OCR texts as well as for correct ASCII texts.
Received February 17, 1998 / Revised April 8, 1998
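As a concrete illustration of the n-gram representations compared above, character n-grams can be extracted as follows; they tolerate OCR errors better than whole words because a single misrecognized character only corrupts the few n-grams that span it. The exact n-gram definition (character vs. word n-grams, padding) is an assumption here, not necessarily the one evaluated in the paper.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts of a text, word by word, with '#' padding so
    that word boundaries are represented."""
    counts = Counter()
    for word in text.lower().split():
        padded = "#" + word + "#"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

# An OCR error ("busines5") still shares most trigrams with "business".
print(char_ngrams("business letters"))
print(char_ngrams("busines5 letters"))
```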
16.
Gyeonghwan Kim, Venu Govindaraju, Sargur N. Srihari. International Journal on Document Analysis and Recognition, 1999, 2(1): 37-44
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system
are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation
of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line
detection and the extraction of text line images from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a text line image efficiently and intelligently; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed
for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are
described in this paper. Preliminary experiments show promising results in terms of speed and accuracy.
Received October 30, 1998 / Revised January 15, 1999
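The word-segmentation step (locating word gaps within a text line) can be pictured with a simple projection-profile sketch: columns with no ink form gaps, and gaps wider than a threshold are treated as word boundaries. The fixed threshold is an illustrative assumption; the system described above uses more intelligent, adaptive criteria.

```python
import numpy as np

def word_boundaries(line_img, min_gap=8):
    """line_img: 2-D array of a text line, 1 = ink.
    Returns (start_col, end_col) spans of words, where any run of at least
    `min_gap` empty columns is taken as an inter-word gap."""
    ink_cols = np.asarray(line_img).sum(axis=0) > 0
    words, start, gap = [], None, 0
    for x, has_ink in enumerate(ink_cols):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap))   # close word at last ink column
                start, gap = None, 0
    if start is not None:
        words.append((start, len(ink_cols) - 1 - gap))
    return words
```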
17.
Analysis of locking behavior in three real database systems
Vigyan Singhal, Alan Jay Smith. The VLDB Journal (The International Journal on Very Large Data Bases), 1997, 6(1): 40-52
Concurrency control is essential to the correct functioning of a database due to the need for correct, reproducible results.
For this reason, and because concurrency control is a well-formulated problem, an enormous body of literature has developed studying the performance of concurrency control algorithms. Most of this literature uses either analytic modeling or random-number-driven simulation, and explicitly or implicitly makes certain assumptions about the behavior of transactions and the patterns by which they set and release locks. Because of the difficulty of collecting suitable measurements, there have been only a few studies that use trace-driven simulation, and even less work directed toward characterizing the concurrency
control behavior of real workloads. In this paper, we present a study of three database workloads, all taken from IBM DB2
relational database systems running commercial applications in a production environment. This study considers topics such
as frequency of locking and unlocking, deadlock and blocking, duration of locks, types of locks, correlations between applications
of lock types, two-phase versus non-two-phase locking, when locks are held and released, etc. In each case, we evaluate the
behavior of the workload relative to the assumptions commonly made in the research literature and discuss the extent to which
those assumptions may or may not lead to erroneous conclusions.
Edited by H. Garcia-Molina. Received April 5, 1994 / Accepted November 1, 1995
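One of the quantities studied above, the duration for which locks are held, can be computed from a lock trace with a sketch like the following. The trace record format (timestamp, transaction, action, lock id) is an assumption for illustration, not DB2's actual trace format.

```python
def lock_hold_durations(trace):
    """trace: iterable of (timestamp, txn_id, action, lock_id) records,
    where action is 'lock' or 'unlock'.  Returns a list of
    (txn_id, lock_id, duration) for every matched lock/unlock pair."""
    held = {}                      # (txn_id, lock_id) -> acquisition time
    durations = []
    for ts, txn, action, lock in trace:
        key = (txn, lock)
        if action == "lock":
            held[key] = ts
        elif action == "unlock" and key in held:
            durations.append((txn, lock, ts - held.pop(key)))
    return durations

# Toy trace: txn 1 holds lock A for 5 time units and lock B for 2.
trace = [(0, 1, "lock", "A"), (1, 1, "lock", "B"),
         (3, 1, "unlock", "B"), (5, 1, "unlock", "A")]
print(lock_hold_durations(trace))
```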
18.
Published online: 19 July 2001
19.
E. Kavallieratou, N. Fakotakis, G. Kokkinakis. International Journal on Document Analysis and Recognition, 2002, 4(4): 226-242
In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists of seven main modules: skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, slant removal, word segmentation, character segmentation, and character recognition, built from both already existing and novel algorithms. The system has been tested on the NIST, IAM-DB, and GRUHD databases and has achieved accuracy that varies from 65.6% to 100%, depending on the database and the experiment.
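The skew-angle estimation step listed above is commonly realised, in sketches like this one, by testing candidate angles and keeping the one that maximises the variance of the deskewed horizontal projection profile; the candidate range and criterion here are generic assumptions, not necessarily the algorithm used in this system.

```python
import numpy as np

def estimate_skew(binary_img, angle_range=10.0, angle_step=0.5):
    """binary_img: 2-D array, 1 = ink.  Tests candidate skew angles (degrees)
    and returns the one whose deskewed horizontal projection profile has the
    largest variance (sharpest separation between text lines)."""
    ys, xs = np.nonzero(binary_img)
    h = binary_img.shape[0]
    best_angle, best_var = 0.0, -1.0
    for angle in np.arange(-angle_range, angle_range + 1e-9, angle_step):
        # Deskewed row index of each ink pixel under this candidate angle.
        rows = np.round(ys - xs * np.tan(np.radians(angle))).astype(int)
        profile = np.bincount(rows - rows.min(), minlength=h)
        if profile.var() > best_var:
            best_angle, best_var = angle, profile.var()
    return best_angle
```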
20.
Published online: 15 March 2002