20 similar documents found; search took 15 ms.
1.
An architecture for handwritten text recognition systems (total citations: 1; self-citations: 1; other citations: 0)
Gyeonghwan Kim Venu Govindaraju Sargur N. Srihari 《International Journal on Document Analysis and Recognition》1999,2(1):37-44
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system
are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation
of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line
detection and the extraction of line images from a page image; (iii) word segmentation, which concerns locating word
gaps and isolating words from a text-line image efficiently and intelligently; (iv) word recognition,
concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic
constraints to intelligently parse and recognize text. The key ideas employed in each functional module, developed to handle
the diversity of handwriting with the goal of system reliability and robustness, are described in this paper. Preliminary
experiments show promising results in terms of speed and accuracy.
Received October 30, 1998 / Revised January 15, 1999
2.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content
of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images
into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure
of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is
reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font,
face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the
performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate
lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection
of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection
and specificity of the lexicon.
Received May 1, 1998 / Revised October 20, 1998
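The character shape codes described above can be illustrated with a minimal Python sketch. The three-way split into ascender, descender, and x-height classes is an assumption for illustration; the engine's actual code set may be finer-grained:

```python
# Illustrative shape classes; the engine's actual code set may differ.
ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DESCENDERS = set("gjpqy")

def shape_code(word: str) -> str:
    """Map each character to a coarse shape class:
    'A' = ascender, 'D' = descender, 'x' = x-height, '.' = other."""
    out = []
    for ch in word:
        if ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("D")
        elif ch.islower():
            out.append("x")
        else:
            out.append(".")
    return "".join(out)
```

Many distinct words share a shape code, which is why a shape-keyed lexicon plus template matching, as the abstract describes, is needed to resolve the remaining ambiguity.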
3.
B.B. Chaudhuri U. Garain 《International Journal on Document Analysis and Recognition》2001,3(3):138-149
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered.
It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital
style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these
styles of any font. Important words in a document are sometimes printed in larger size as well. A smart approach for the determination
of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized
text. A further advantage of identifying word type styles and font size is discussed in the context of extracting (i)
different logical labels; and (ii) important terms from the document. Experimental results on the performance of the approach
on a large number of good quality, as well as degraded, document images are presented.
Received July 12, 2000 / Revised October 1, 2000
4.
Paul Clark Majid Mirmehdi 《International Journal on Document Analysis and Recognition》2002,4(4):243-257
We present two different approaches to the location and recovery of text in images of real scenes. The techniques we describe
are invariant to the scale and 3D orientation of the text, and allow recovery of text in cluttered scenes. The first approach
uses page edges and other rectangular boundaries around text to locate a surface containing text, and to recover a fronto-parallel
view. This is performed using line detection, perceptual grouping, and comparison of potential text regions using a confidence
measure. The second approach uses low-level texture measures with a neural network classifier to locate regions of text in
an image. Then we recover a fronto-parallel view of each located paragraph of text by separating the individual lines of text
and determining the vanishing points of the text plane. We illustrate our results using a number of images.
Received May 20, 2001 / Accepted June 19, 2001
5.
Klaus U. Schulz Stoyan Mihov 《International Journal on Document Analysis and Recognition》2002,5(1):67-85
The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions that are needed
to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite state automata that recognize the set of all words V whose Levenshtein distance from W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein automaton
for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text
using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein
automata and achieves even better efficiency. Evaluation results are given that also address variants of both methods that
are based on modified Levenshtein distances where further primitive edit operations (transpositions, merges and splits) are
used.
Received: 13 February 2002 / Accepted: 13 March 2002
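For reference, the Levenshtein distance defined at the start of this abstract can be computed with the classic Wagner–Fischer dynamic program, sketched below; this per-pair computation is exactly what the automaton-based dictionary search avoids repeating for every lexicon entry:

```python
def levenshtein(v: str, w: str) -> int:
    """Minimal number of insertions, deletions, and substitutions
    needed to transform v into w (Wagner-Fischer dynamic program)."""
    prev = list(range(len(w) + 1))  # distances from "" to each prefix of w
    for i, cv in enumerate(v, start=1):
        curr = [i]                  # distance from v[:i] to ""
        for j, cw in enumerate(w, start=1):
            cost = 0 if cv == cw else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]
```

A Levenshtein automaton of degree n for W accepts exactly the words V with `levenshtein(V, W) <= n`, so intersecting it with a dictionary automaton enumerates all candidate corrections in one traversal.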
6.
V. Bansal R.M.K. Sinha 《International Journal on Document Analysis and Recognition》2002,4(4):269-280
Abstract. This paper describes a method for the correction of optically read Devanagari character strings using a Hindi word dictionary.
The word dictionary is partitioned in order to reduce the search space besides preventing forced matching to an incorrect
word. The dictionary partitioning strategy takes into account the underlying OCR process. The dictionary words at the top
level are divided into two partitions: a short-words partition and a remaining-words partition. The short-words partition
is sub-partitioned using the envelope information of the words; the envelope consists of the numbers of top, lower, and core
modifiers along with the number of core characters. Devanagari characters are written in three strips; most characters,
referred to as core characters, are written in the middle strip. The remaining-words partition is further partitioned using
tags, where a tag is a fixed-length string associated with each partition. The correction process uses a distance matrix for assigning
penalty for a mismatch. The distance matrix is based on the information about errors that the classification process is known
to make and the confidence figure that the classification process associates with its output. An improvement of approximately
20% in recognition performance is obtained. For a short word, 590 words are searched on average from 14 sub-partitions of
the short-words partition before an exact match is found. The average number of partitions and the average number of words
increase to 20 and 1585, respectively, when an exact match is not found. For tag-based partitions, on average, 100 words
from 30 partitions are compared when either an exact match is found or a word within the preset threshold distance is found.
If an exact match or a match within the preset threshold is not found, the average number of partitions rises to 75 and 450 words
on an average are compared. To the best of our knowledge this is the first work on the use of a Hindi word dictionary for
OCR post-processing.
Received August 6, 2001 / Accepted August 22, 2001
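The partitioning scheme can be sketched in Python. The `envelope` and `tag` functions below are simplified stand-ins (the paper's envelope counts Devanagari modifiers and core characters, and its tags are derived from the OCR process; a word-length envelope and a two-character prefix tag are assumed here purely for illustration):

```python
from collections import defaultdict

SHORT_LEN = 3  # assumed threshold for "short" words

def envelope(word):
    # Stand-in for the (top-modifier, lower-modifier, core-character) counts.
    return len(word)

def tag(word):
    # Stand-in for the paper's fixed-length partition tags.
    return word[:2]

def build_partitions(words):
    """Split the dictionary into envelope-keyed short-word sub-partitions
    and tag-keyed partitions for the remaining words."""
    short, rest = defaultdict(list), defaultdict(list)
    for w in words:
        (short[envelope(w)] if len(w) <= SHORT_LEN else rest[tag(w)]).append(w)
    return short, rest

def candidates(ocr_word, short, rest):
    """Only the matching partition is searched, shrinking the search space
    and avoiding forced matches to dissimilar words."""
    if len(ocr_word) <= SHORT_LEN:
        return short.get(envelope(ocr_word), [])
    return rest.get(tag(ocr_word), [])
```

In the full system each candidate would then be scored against the OCR output with the error-model distance matrix rather than matched exactly.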
7.
Hwan-Chul Park Se-Young Ok Young-Jung Yu Hwan-Gue Cho 《International Journal on Document Analysis and Recognition》2001,4(2):115-130
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer
vision field. For these problems, a basic step is to isolate characters and to group the isolated characters into words. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
8.
9.
In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen
(LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066
forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words
occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting
recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic
knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus.
The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation
of the text into lines and words.
Received September 28, 2001 / Revised October 10, 2001
10.
Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000
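A minimal sketch of the kind of layout statistics used as features, assuming the page has already been segmented into a grid of labeled content tiles (the labels and tiling are assumptions for illustration; the authors additionally use column structure, relative font sizes, and connected-component statistics):

```python
def layout_features(page):
    """page: 2-D list of region labels per tile, e.g. 'text', 'image',
    'table', 'blank' (an assumed discretisation of the page).
    Returns class-knowledge-free features for a layout classifier."""
    tiles = [lab for row in page for lab in row]
    n = len(tiles)
    pct_text = sum(lab == "text" for lab in tiles) / n
    pct_non_text = sum(lab in ("image", "table", "graphic", "ruling")
                       for lab in tiles) / n
    return {
        "pct_text": pct_text,
        "pct_non_text": pct_non_text,
        "content_density": pct_text + pct_non_text,  # occupied fraction
    }
```

Feature vectors of this kind can be fed to any supervised classifier (the paper uses decision trees and self-organizing maps) once training pages have class labels.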
11.
Dot-matrix text recognition is a difficult problem, especially when characters are broken into several disconnected components.
We present a dot-matrix text recognition system which uses the fact that dot-matrix fonts are fixed-pitch, in order to overcome
the difficulty of the segmentation process. After finding the most likely pitch of the text, a decision is made as to whether
the text is written in a fixed-pitch or proportional font. Fixed-pitch text is segmented using a pitch-based segmentation
process that can successfully segment both touching and broken characters. We report performance results for the pitch estimation,
fixed-pitch decision and segmentation, and recognition processes.
Received October 18, 1999 / Revised April 21, 2000
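One plausible way to estimate the most likely pitch, sketched here under the assumption that a vertical-projection profile of the text line is available, is to score candidate pitches by the autocorrelation of the profile; the paper's actual estimator may differ:

```python
def estimate_pitch(col_profile, min_p=4, max_p=40):
    """Return the candidate pitch (in columns) whose lagged
    autocorrelation of the mean-centered projection profile is
    strongest -- an assumed stand-in for the paper's estimator."""
    n = len(col_profile)
    mean = sum(col_profile) / n
    centered = [v - mean for v in col_profile]
    best_p, best_score = min_p, float("-inf")
    for p in range(min_p, min(max_p, n // 2) + 1):
        # average product of the profile with itself shifted by p
        score = sum(centered[i] * centered[i + p]
                    for i in range(n - p)) / (n - p)
        if score > best_score:
            best_p, best_score = p, score
    return best_p
```

Once the pitch is known, segmentation cut points can be placed at pitch-spaced column boundaries, which is what lets touching and broken dot-matrix characters be separated consistently.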
12.
Yiming Ye John K. Tsotsos Eric Harley Karen Bennet 《Machine Vision and Applications》2000,12(1):32-43
Abstract. This paper proposes a novel tracking strategy that can robustly track a person or other object within a fixed environment
using a pan, tilt, and zoom camera with the help of a pre-recorded image database. We define a set of camera states which
is sufficient to survey the environment for the target. Background images for these camera states are stored as an image database.
During tracking, camera movements are restricted to these states. Tracking and segmentation are simplified, as each tracking
image can be compared with the corresponding pre-recorded background image.
Received: 26 August 1999 / Accepted: 22 February 2000
13.
Sonia Garcia-Salicetti Bernadette Dorizzi Patrick Gallinari Zsolt Wimmer 《International Journal on Document Analysis and Recognition》2001,4(1):56-68
In this paper, we present a hybrid online handwriting recognition system based on hidden Markov models (HMMs). It is devoted
to word recognition using large vocabularies. An adaptive segmentation of words into letters is integrated with recognition,
and is at the heart of the training phase. A word-model is a left-right HMM in which each state is a predictive multilayer
perceptron that performs local regression on the drawing (i.e., the written word) relying on a context of observations. A
discriminative training paradigm related to maximum mutual information is used, and its potential is shown on a database of
9,781 words.
Received June 19, 2000 / Revised October 16, 2000
14.
Abstract. The purpose of this study is to discuss existing fractal-based algorithms and propose novel improvements of these algorithms
to identify tumors in brain magnetic-resonance (MR) images. Considerable research has been pursued on fractal geometry in various
aspects of image analysis and pattern recognition. MR images typically exhibit a degree of noise and randomness associated
with the naturally random nature of anatomical structure, so fractal analysis is appropriate for MR image analysis. For tumor
detection, we describe existing fractal-based techniques and propose three modified algorithms using fractal analysis models.
For each new method, the brain MR images are divided into a number of pieces. The first method involves thresholding the pixel
intensity values; hence, we call the technique piecewise-threshold-box-counting (PTBC) method. For the subsequent methods,
the intensity is treated as the third dimension. We implement the improved piecewise-modified-box-counting (PMBC) and piecewise-triangular-prism-surface-area
(PTPSA) methods, respectively. With the PTBC method, we find the differences in intensity histogram and fractal dimension
between normal and tumor images. Using the PMBC and PTPSA methods, we may detect and locate the tumor in the brain MR images
more accurately. Thus, the novel techniques proposed herein offer satisfactory tumor identification.
Received: 13 October 2001 / Accepted: 28 May 2002
Correspondence to: K.M. Iftekharuddin
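The box-counting estimate of fractal dimension underlying the PTBC-family methods can be sketched as follows for a binary mask; the piecewise and intensity-as-third-dimension extensions (PMBC, PTPSA) build on the same idea:

```python
from math import log

def box_count_dimension(mask):
    """Estimate the fractal dimension of a binary image by box counting:
    count occupied boxes at shrinking box sizes and regress log(count)
    on log(1/size).  `mask` is a square 0/1 grid, power-of-two side."""
    n = len(mask)
    sizes, counts = [], []
    s = n
    while s >= 1:
        boxes = 0
        for by in range(0, n, s):
            for bx in range(0, n, s):
                if any(mask[y][x]
                       for y in range(by, by + s)
                       for x in range(bx, bx + s)):
                    boxes += 1
        sizes.append(s)
        counts.append(boxes)
        s //= 2
    # least-squares slope of log(count) against log(1/size)
    xs = [log(1.0 / s) for s in sizes]
    ys = [log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A filled region gives a dimension near 2 and a thin curve near 1; the tumor-detection methods compare such dimensions (and intensity histograms) between image pieces.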
15.
Hideaki Goto Hirotomo Aso 《International Journal on Document Analysis and Recognition》1999,2(2-3):111-119
In order to enhance the ability of document analysis systems, we need a text line extraction method which can handle not
only straight text lines but also text lines in various shapes. This paper proposes a new method called Extended Linear Segment
Linking (ELSL for short), which is able to extract text lines in arbitrary orientations and curved text lines. We also consider
the existence of both horizontally and vertically printed text lines on the same page. The new method can produce text line
candidates for multiple orientations. We verify the ability of the method by some experiments as well.
Received December 21, 1998 / Revised version September 2, 1999
16.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting
pieces of text lines in small overlapping columns of width , shifted with respect to each other by image elements (good default values are: of the image width, ) and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires
about 1.3 s for a 300-dpi image on a PC with a 300-MHz Pentium II CPU (Intel 440LX motherboard). The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
17.
18.
We consider the problem of scheduling a set of pages on a single broadcast channel using time-multiplexing. In a perfectly periodic schedule, time is divided into equal size slots, and each page is transmitted in a time slot precisely every fixed interval of time (the period of the page). We study the case in which each page i has a given demand probability , and the goal is to design a perfectly periodic schedule that minimizes the average time a random client waits until its
page is transmitted. We seek approximate polynomial solutions. Approximation bounds are obtained by comparing the costs of
a solution provided by an algorithm and a solution to a relaxed (non-integral) version of the problem. A key quantity in our
methodology is a fraction we denote by , that depends on the maximum demand probability: . The best known polynomial algorithm to date guarantees an approximation of . In this paper, we develop a tree-based methodology for perfectly periodic scheduling, and using new techniques, we derive
algorithms with better bounds. For small values, our best algorithm guarantees approximation of . On the other hand, we show that the integrality gap between the cost of any perfectly periodic schedule and the cost of
the fractional problem is at least . We also provide algorithms with good performance guarantees for large values of .
Received: December 2001 / Accepted: September 2002
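The objective can be made concrete with a small sketch: in a perfectly periodic schedule, a client arriving uniformly at random waits about T/2 slots for a page of period T (a continuous-arrival approximation), so the expected wait is half the demand-weighted average period:

```python
def average_wait(pages):
    """pages: list of (demand_probability, period) pairs for a perfectly
    periodic schedule.  Expected wait = sum(p_i * T_i) / 2 under the
    uniform-arrival approximation."""
    total_p = sum(p for p, _ in pages)
    assert abs(total_p - 1.0) < 1e-9, "demand probabilities must sum to 1"
    return sum(p * t for p, t in pages) / 2.0
```

Shortening a popular page's period lowers this cost but consumes more slots, and the requirement that the periods fit an integral slot assignment is what separates achievable schedules from the relaxed (fractional) lower bound the approximation ratios are measured against.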
19.
Henry S. Baird Allison L. Coates Richard J. Fateman 《International Journal on Document Analysis and Recognition》2003,5(2-3):158-163
Abstract. We exploit the gap in ability between human and machine vision systems to craft a family of automatic challenges that tell
human and machine users apart via graphical interfaces including Internet browsers. Turing proposed [Tur50] a method whereby
human judges might validate “artificial intelligence” by failing to distinguish between human and machine interlocutors. Stimulated
by the “chat room problem” posed by Udi Manber of Yahoo!, and influenced by the CAPTCHA project [BAL00] of Manuel Blum et
al. of Carnegie-Mellon Univ., we propose a variant of the Turing test using pessimal print: that is, low-quality images of machine-printed text synthesized pseudo-randomly over certain ranges of words, typefaces,
and image degradations. We show experimentally that judicious choice of these ranges can ensure that the images are legible
to human readers but illegible to several of the best present-day optical character recognition (OCR) machines. Our approach
is motivated by a decade of research on performance evaluation of OCR machines [RJN96,RNN99] and on quantitative stochastic
models of document image quality [Bai92,Kan96]. The slow pace of evolution of OCR and other species of machine vision over
many decades [NS96,Pav00] suggests that pessimal print will defy automated attack for many years. Applications include 'bot'
barriers and database rationing.
Received: February 14, 2002 / Accepted: March 28, 2002
An expanded version of: A.L. Coates, H.S. Baird, R.J. Fateman (2001) Pessimal Print: a reverse Turing Test. In: Proc.
6th Int. Conf. on Document Analysis and Recognition, Seattle, Wash., USA, September 10–13, pp. 1154–1158
Correspondence to: H. S. Baird