1.
Abstract. We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given
page. The strategy uses the linguistic observation that over half of all words in a typical English passage come from a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary
text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the
most likely candidates for those words, using only widths of the word images. The identity of each word is determined using
a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using
a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page
images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on
90% of all the pages. These can serve as useful seeds to bootstrap font learning.
Received October 8, 1999 / Revised March 29, 2000
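The width-based stop-word search described in this abstract can be sketched as follows; the stop-word subset, character-width estimate, and tolerance are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: shortlist word images whose pixel widths are
# consistent with a small stop-word dictionary, given an estimated
# average character width for the dominant font.
STOP_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is"]  # tiny subset

def candidate_stop_words(word_widths, char_width, tol=0.25):
    """For each word-image width, return the stop words whose expected
    width (length * char_width) lies within a relative tolerance."""
    candidates = []
    for w in word_widths:
        matches = [s for s in STOP_WORDS
                   if abs(len(s) * char_width - w) <= tol * w]
        candidates.append(matches)
    return candidates
```

A word-shape classifier would then disambiguate among the surviving matches, as the abstract describes.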
2.
This paper describes a method for recognizing partially occluded objects under different levels of illumination brightness
by using eigenspace analysis. In our previous work, we developed the “eigenwindow” method to recognize partially occluded objects in an assembly task, and demonstrated, with performance sufficiently high for industrial use, that the method works successfully for multiple objects with specularity under constant illumination. In this paper, we modify the eigenwindow method
for recognizing objects under different illumination conditions, as is sometimes the case in manufacturing environments, by
using additional color information. In the proposed method, a measured color in the RGB color space is transformed into one
in the HSV color space. Then, the hue of the measured color, which is invariant to change in illumination brightness and direction,
is used for recognizing multiple objects under different illumination conditions. The proposed method was applied to real
images of multiple objects under various illumination conditions, and the objects were recognized and localized successfully.
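The hue-invariance cue used above can be illustrated with a short sketch using Python's standard colorsys module; the function name and test colors are ours, not the paper's.

```python
import colorsys

def hue_of(r, g, b):
    """Hue component in [0, 1) of an 8-bit RGB measurement; hue is
    comparatively stable under changes in illumination brightness."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h
```

A bright and a dark observation of the same red surface, e.g. (200, 40, 40) and (100, 20, 20), yield the same hue, which is why matching on hue survives brightness changes.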
3.
Cheng-Lin Liu Hiroshi Sako Hiromichi Fujisawa 《International Journal on Document Analysis and Recognition》2002,4(3):191-204
This paper describes a performance evaluation study in which some efficient classifiers are tested in handwritten digit recognition.
The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural
classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved
at moderate memory space and computation cost. The performance is measured in terms of classification accuracy, sensitivity
to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers is enhanced
by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. The results show that the test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and regularized discriminant analysis (RDA). Neural classifiers are shown to be more susceptible to small training sample sizes than MQDF, although they yield higher accuracies for large sample sizes. Of the neural classifiers, the polynomial classifier (PC)
gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection
even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages
and they should be appropriately combined to achieve higher performance.
Received: July 18, 2001 / Accepted: September 28, 2001
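For readers unfamiliar with MQDF, a minimal pure-Python sketch of the MQDF2 discriminant follows; it assumes a per-class mean, the k leading eigenvalue/eigenvector pairs of the class covariance, and a constant minor-subspace variance delta. All names are ours, not the paper's.

```python
import math

def mqdf2(x, mean, eigvals, eigvecs, delta):
    """MQDF2 class distance: the k leading eigenpairs model the class
    covariance exactly; variance outside that subspace is a constant delta.
    eigvecs is a list of k unit eigenvectors, each of dimension D."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    projs = [sum(vi * di for vi, di in zip(v, d)) for v in eigvecs]
    maha = sum(p * p / lam for p, lam in zip(projs, eigvals))
    residual = sum(di * di for di in d) - sum(p * p for p in projs)
    D, k = len(x), len(eigvecs)
    return (maha + residual / delta
            + sum(math.log(lam) for lam in eigvals)
            + (D - k) * math.log(delta))
```

With unit eigenvalues and delta = 1 the distance reduces to the squared Euclidean distance, a useful sanity check.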
4.
Giovanni Seni John Seybold 《International Journal on Document Analysis and Recognition》1999,2(1):24-29
Out-of-order diacriticals introduce significant complexity to the design of an online handwriting recognizer, because they
require some reordering of the time domain information. It is common in cursive writing to write the body of an `i' or `t'
during the writing of the word, and then to return and dot or cross the letter once the word is complete. The difficulty arises
because we have to look ahead, when scoring one of these letters, to find the mark occurring later in the writing stream that
completes the letter. We must also remember that we have used this mark, so that we do not use it again for a different letter, and we should penalize a word that leaves diacritical-like marks unused. One approach to
this problem is to scan the writing some distance into the future to identify candidate diacriticals, remove them in a preprocessing
step, and associate them with the matching letters earlier in the word. If done as a preliminary operation, this approach
is error-prone: marks that are not diacriticals may be incorrectly identified and removed, and true diacriticals may be skipped.
This paper describes a novel extension to a forward search algorithm that provides a natural mechanism for considering alternative
treatments of potential diacriticals, to see whether it is better to treat a given mark as a diacritical or not, and directly
compare the two outcomes by score.
Received October 30, 1998 / Revised January 25, 1999
5.
6.
S. Jaeger S. Manke J. Reichert A. Waibel 《International Journal on Document Analysis and Recognition》2001,3(3):169-180
This paper presents the online handwriting recognition system NPen++ developed at the University of Karlsruhe and Carnegie
Mellon University. The NPen++ recognition engine is based on a multi-state time delay neural network and yields recognition
rates of 96% for a 5,000-word dictionary, 93.4% for a 20,000-word dictionary, and 91.2% for a 50,000-word dictionary. The
proposed tree search and pruning technique reduces the search space considerably without losing too much recognition performance
compared to an exhaustive search. This enables the NPen++ recognizer to be run in real-time with large dictionaries. Initial
recognition rates for whole sentences are promising and show that the MS-TDNN architecture is suited to recognizing handwritten
data ranging from single characters to whole sentences.
Received September 3, 2000 / Revised October 9, 2000
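The dictionary tree search with pruning mentioned above can be sketched as a generic beam search over a trie; the data layout and scoring callback are illustrative inventions, not the actual NPen++ interfaces.

```python
# Beam search over a dictionary prefix tree: at each depth only the
# beam_width best-scoring prefixes survive, so the search space shrinks
# dramatically compared to exhaustive expansion.
def beam_search(trie, score_fn, beam_width=3):
    """trie: nested dicts, with the key None marking end-of-word.
    score_fn(prefix, ch) returns the score gain for extending a prefix."""
    beam = [("", trie, 0.0)]  # (prefix, subtree, accumulated score)
    results = []
    while beam:
        nxt = []
        for prefix, node, score in beam:
            if node.get(None):  # complete dictionary word reached
                results.append((prefix, score))
            for ch, child in node.items():
                if ch is None:
                    continue
                nxt.append((prefix + ch, child, score + score_fn(prefix, ch)))
        nxt.sort(key=lambda h: h[2], reverse=True)
        beam = nxt[:beam_width]  # prune to the best hypotheses
    return sorted(results, key=lambda r: r[1], reverse=True)
```

With a generous beam the result matches exhaustive search; shrinking the beam trades a little accuracy for speed, which is the trade-off the abstract describes.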
7.
Mathematical expression recognition: a survey
Kam-Fai Chan Dit-Yan Yeung 《International Journal on Document Analysis and Recognition》2000,3(1):3-15
Abstract. Automatic recognition of mathematical expressions is one of the key vehicles in the drive towards transcribing documents
in scientific and engineering disciplines into electronic form. This problem typically consists of two major stages, namely,
symbol recognition and structural analysis. In this survey paper, we will review most of the existing work with respect to
each of the two major stages of the recognition process. In particular, we emphasize the similarities and differences between systems. Moreover, some important issues in mathematical expression recognition will be addressed in depth. All these
together serve to provide a clear overall picture of how this research area has been developed to date.
Received February 22, 2000 / Revised June 12, 2000
8.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content
of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images
into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure
of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is
reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font,
face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the
performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate
lexicon. Recognition speed is shown to be essentially independent of the details of lexical content, provided the overlap between the words occurring in the document and the lexicon is high. Word recognition accuracy depends on both the overlap and the specificity of the lexicon.
Received May 1, 1998 / Revised October 20, 1998
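The character-shape-code idea can be illustrated with a toy coding in which letters collapse into a few shape classes, so many distinct words share one code and a shape lexicon resolves the remaining ambiguity. The class alphabet below is a simplification, not the paper's actual code set.

```python
# Toy shape coding: 'A' for ascenders/capitals/digits, 'g' for
# descenders, 'x' for plain x-height letters; other characters pass through.
ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
DESCENDERS = set("gjpqy")

def shape_code(word):
    out = []
    for ch in word:
        if ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("g")
        elif ch.isalpha():
            out.append("x")
        else:
            out.append(ch)
    return "".join(out)
```

Because the coding is many-to-one, it is robust to poor image quality, and a shape lexicon plus local template matching (as in the abstract) recovers the exact word.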
9.
John F. Pitrelli Amit Roy 《International Journal on Document Analysis and Recognition》2003,5(2-3):126-137
We discuss development of a word-unigram language model for online handwriting recognition. First, we tokenize a text corpus
into words, contrasting with tokenization methods designed for other purposes. Second, we select for our model a subset of
the words found, discussing deviations from an N-most-frequent-words approach. From a 600-million-word corpus, we generated a 53,000-word model which eliminates 45% of word-recognition
errors made by a character-level-model baseline system. We anticipate that our methods will be applicable to offline recognition
as well, and to some extent to other recognizers, such as speech recognizers and video retrieval systems.
Received: November 1, 2001 / Revised version: July 22, 2002
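The frequency-based vocabulary selection can be sketched as follows; the tokenizer and the plain N-most-frequent cutoff are placeholder simplifications of the paper's more careful rules.

```python
import re
from collections import Counter

def build_vocabulary(corpus_text, size):
    """Tokenize a corpus into words, keep the `size` most frequent ones,
    and return them with their relative frequencies (a unigram model)."""
    tokens = re.findall(r"[A-Za-z']+", corpus_text.lower())
    counts = Counter(tokens)
    vocab = [w for w, _ in counts.most_common(size)]
    total = sum(counts.values())
    unigram = {w: counts[w] / total for w in vocab}
    return vocab, unigram
```

A recognizer would then rescore its character-level hypotheses with these unigram probabilities, which is the mechanism behind the error reduction the abstract reports.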
10.
Morphological shared-weight neural networks (MSNN) combine the feature extraction capability of mathematical morphology with
the function-mapping capability of neural networks in a single trainable architecture. The MSNN method has been previously
demonstrated using a variety of imaging sensors, including TV, forward-looking infrared (FLIR) and synthetic aperture radar
(SAR). In this paper, we provide experimental results with laser radar (LADAR). We present three sets of experiments. In the
first set of experiments, we use the MSNN to detect different types of targets simultaneously. In the second set, we use the
MSNN to detect only a particular type of target. In the third set, we test a novel scenario, referred to as the Sims scenario:
we train the MSNN to recognize a particular type of target using very few examples. A detection rate of 86% with a reasonable
number of false alarms was achieved in the first set of experiments and a detection rate of close to 100% with very few false
alarms was achieved in the second and third sets of experiments. In all the experiments, a novel pre-processing method is used to create pseudo-intensity images from the original LADAR range images.
11.
12.
The Dempster-Shafer theory and the convex Bayesian theory have recently been proposed as alternatives to the (strict) Bayesian
theory in the field of reasoning with uncertainty. These relatively new formalisms claim that missing information in the probabilistic model of a process does not necessarily preclude uncertainty reasoning. However, this paper shows that this does not hold for processes where the reasoning is part of a decision-making process, such as object recognition. In such cases, a complete probabilistic model is required and can be obtained by estimating missing probabilistic information. An exemplary approach to the estimation
of uncertain probabilistic information is described in this paper for a multi-sensor system for recognition of electronic
components on printed circuit boards.
Received: 21 June 1998 / Accepted: 23 May 2000
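For reference, Dempster's rule of combination, the core of the Dempster-Shafer formalism discussed above, can be written compactly over frozenset focal elements. This is a textbook sketch of the rule, not the paper's multi-sensor system.

```python
# Combine two basic mass assignments m1, m2 (dicts mapping frozenset
# focal elements to masses). Intersecting focal elements reinforce each
# other; disjoint pairs contribute to the conflict mass, which is
# normalized away.
def combine(m1, m2):
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    k = 1.0 - conflict
    return {s: w / k for s, w in combined.items()}
```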
13.
Gyeonghwan Kim Venu Govindaraju Sargur N. Srihari 《International Journal on Document Analysis and Recognition》1999,2(1):37-44
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system
are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation
of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line
detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and efficiently isolating words from a text-line image; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed
for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are
described in this paper. Preliminary experiments show promising results in terms of speed and accuracy.
Received October 30, 1998 / Revised January 15, 1999
14.
E. Kavallieratou N. Fakotakis G. Kokkinakis 《International Journal on Document Analysis and Recognition》2002,4(4):226-242
In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists
of seven main modules: skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, slant removal, word segmentation, character segmentation, and recognition, stemming from the implementation of already existing
algorithms as well as novel algorithms. This system has been tested on the NIST, IAM-DB, and GRUHD databases and has achieved
accuracy that varies from 65.6% to 100% depending on the database and the experiment.
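A toy version of projection-profile skew estimation, one common way to implement the first module above, is sketched below; the angle grid, scoring rule, and data layout are illustrative assumptions, not the paper's algorithm.

```python
import math

def estimate_skew(points, angles):
    """Shear foreground pixels (x, y) by each candidate angle and pick
    the one whose row profile is most concentrated (largest sum of
    squared row counts): aligned text lines pack ink into few rows."""
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = math.tan(math.radians(a))
        rows = {}
        for x, y in points:
            r = round(y - x * t)
            rows[r] = rows.get(r, 0) + 1
        score = sum(c * c for c in rows.values())
        if score > best_score:
            best_score, best_angle = score, a
    return best_angle
```

De-skewing then rotates the page by the negated estimate before line segmentation.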
15.
Segmentation and recognition of Chinese bank check amounts
M.L. Yu P.C.K. Kwok C.H. Leung K.W. Tse 《International Journal on Document Analysis and Recognition》2001,3(4):207-217
This paper describes a system for the recognition of legal amounts on bank checks written in the Chinese language. It consists
of subsystems that perform preprocessing, segmentation, and recognition of the legal amount. In each step of the segmentation
and recognition phases, a list of possible choices is obtained. An approach is adopted whereby a large number of choices
can be processed effectively and efficiently in order to achieve the best recognition result. The contribution of this paper
is the proposal of a grammar checker for Chinese bank check amounts. It is found to be very effective in reducing the substitution
error rate. The recognition rate of the system is 74.0%, the error rate is 10.4%, and the reliability is 87.7%.
Received June 9, 2000 / Revised January 10, 2001
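The grammar-checker idea can be illustrated with a toy validator for legal-amount strings: digit characters and magnitude units must alternate, and the amount must end in a currency marker. The real grammar (e.g. handling of 零 and compound units like 拾万) is far richer than this sketch.

```python
DIGITS = set("壹贰叁肆伍陆柒捌玖")  # zero (零) handling omitted for brevity
UNITS = set("拾佰仟万亿")

def plausible_amount(s):
    """Toy grammar check for a Chinese legal amount: digits and units
    alternate, ending in 元/圆, optionally followed by 整."""
    if s.endswith("整"):
        s = s[:-1]
    if not s or s[-1] not in "元圆":
        return False
    s = s[:-1]
    prev_digit = False
    for ch in s:
        if ch in DIGITS:
            if prev_digit:
                return False  # two digit characters in a row
            prev_digit = True
        elif ch in UNITS:
            if not prev_digit:
                return False  # magnitude unit without a preceding digit
            prev_digit = False
        else:
            return False
    return True
```

Such a checker cheaply rejects implausible segmentation/recognition hypotheses, which is how it reduces the substitution error rate.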
16.
In extra-corporeal shock wave lithotripsy (ESWL), focused acoustic waves are used to fragment urinary stones. The success
of the treatment depends on coincidence between the stone position and the point of convergence of the waves. However, the
stone may move during the treatment. We developed software called Echotrack, which performs real-time tracking of the stone
in ultrasound images and automatically adjusts the position of the generator of shock waves. Clinical tests carried out in
65 patients showed that the Echotrack is able to track the stones as long as they are visible in the images. The number of
shocks necessary to fragment the stones is reduced by 40%.
Received: 28 April 1998 / Accepted: 10 September 1998
17.
Farzin Mokhtarian 《Machine Vision and Applications》1997,10(3):87-97
A complete and practical system for occluded object recognition has been developed which is very robust with respect to noise
and local deformations of shape (due to weak perspective distortion, segmentation errors and non-rigid material) as well as
scale, position and orientation changes of the objects. The system has been tested on a wide variety of free-form 3D objects.
An industrial application is envisaged where a fixed camera and a light-box are utilized to obtain images. Within the constraints
of the system, every rigid 3D object can be modeled by a limited number of classes of 2D contours corresponding to the object's
resting positions on the light-box. The contours in each class are related to each other by a 2D similarity transformation.
The Curvature Scale Space technique [26, 28] is then used to obtain a novel multi-scale segmentation of the image and the model contours. Object indexing [16, 32, 36] is used to narrow down the search space. An efficient local matching algorithm is utilized to select the best
matching models.
Received: 5 August 1996 / Accepted: 19 March 1997
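The curvature computation underlying the Curvature Scale Space technique cited above can be sketched with finite differences on a closed contour. The full CSS method additionally smooths the contour with Gaussians of increasing width and tracks curvature zero crossings; this sketch shows only the base curvature.

```python
import math

def curvature(xs, ys):
    """Finite-difference curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^1.5
    at each point of a closed contour given as coordinate lists."""
    n = len(xs)
    ks = []
    for i in range(n):
        xm, xp = xs[i - 1], xs[(i + 1) % n]  # wrap around the contour
        ym, yp = ys[i - 1], ys[(i + 1) % n]
        dx, dy = (xp - xm) / 2.0, (yp - ym) / 2.0
        ddx = xp - 2.0 * xs[i] + xm
        ddy = yp - 2.0 * ys[i] + ym
        ks.append((dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5)
    return ks
```

On a densely sampled circle of radius r the result is approximately 1/r everywhere, a convenient correctness check.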
18.
In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen
(LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066
forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words
occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting
recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic
knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus.
The database also includes a few image-processing procedures for extracting the handwritten text from the forms and for segmenting the text into lines and words.
Received September 28, 2001 / Revised October 10, 2001
19.
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos, as it enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
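The temporal integration of character bitmaps described above can be illustrated with a toy sketch: bitmaps of the same character from several frames are averaged and re-thresholded, suppressing pixels that are foreground in only a few frames. The data layout and threshold are our assumptions, not the paper's implementation.

```python
# frames: list of binary bitmaps (nested lists of 0/1), all the same size,
# assumed already aligned by the character-tracking step.
def integrate_bitmaps(frames, threshold=0.5):
    """Average corresponding pixels across frames and re-binarize, so
    transient background noise is suppressed before OCR."""
    h, w = len(frames[0]), len(frames[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            mean = sum(f[y][x] for f in frames) / len(frames)
            row.append(1 if mean >= threshold else 0)
        out.append(row)
    return out
```

The integrated bitmap, cleaner than any single frame, is what gets passed to the OCR package.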
20.
A model-driven approach for real-time road recognition
This article describes a method designed to detect and track road edges starting from images provided by an on-board monocular
monochromic camera. Its implementation on specific hardware is also presented in the framework of the VELAC project. The method
is based on four modules: (1) detection of the road edges in the image by a model-driven algorithm that uses a statistical model of the lane sides to handle occlusions or imperfections of the road markings, initialized by an off-line training step; (2) localization of the vehicle in the lane in which it is travelling; (3) tracking to define a new
search space of road edges for the next image; and (4) management of the lane numbers to determine the lane in which the vehicle
is travelling. The algorithm is implemented in order to validate the method in a real-time context. Results obtained on marked
and unmarked road images show the robustness and precision of the method.
Received: 18 November 2000 / Accepted: 7 May 2001