20 similar documents found; search took 31 ms.
1.
Hideaki Goto Hirotomo Aso 《International Journal on Document Analysis and Recognition》2002,4(4):258-268
Recent remarkable progress in computer systems and printing devices has made it easier to produce printed documents with
various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs,
computer graphics, etc. Some methods have been developed for character pattern extraction from document images and scene images
with complex backgrounds. However, the previous methods are suitable only for extracting rather large characters, and the
processes often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns
can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel
labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document
images. The performance of extracting small character patterns has been improved by suppressing the influence of mixed-color
pixels around character edges. Experimental results show that the method is capable of extracting very small character patterns
from main text blocks in various documents, separating characters and complex backgrounds, as long as the thickness of the
character strokes is more than about 1.5 pixels.
Received July 23, 2001 / Accepted November 5, 2001
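The combination of local multilevel thresholding and pixel labeling that the abstract describes can be sketched roughly as follows. This is a hypothetical simplification, not the authors' implementation: the band count `k`, the quantile-based thresholds, and the 4-connected labeling rule are all illustrative choices.

```python
from collections import deque

def multilevel_threshold(tile, k):
    """Quantize a grayscale tile (list of pixel rows) into k bands using
    quantile cut points from the tile's local histogram."""
    flat = sorted(v for row in tile for v in row)
    cuts = [flat[len(flat) * i // k] for i in range(1, k)]
    def band(v):
        return sum(v >= c for c in cuts)
    return [[band(v) for v in row] for row in tile]

def label_regions(bands):
    """4-connected labeling: touching pixels in the same band share a label."""
    h, w = len(bands), len(bands[0])
    labels = [[-1] * w for _ in range(h)]
    nxt = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            q = deque([(y, x)])
            labels[y][x] = nxt
            while q:
                cy, cx = q.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1 \
                            and bands[ny][nx] == bands[cy][cx]:
                        labels[ny][nx] = nxt
                        q.append((ny, nx))
            nxt += 1
    return labels, nxt
```

Region growing in the actual method would then merge labeled regions across tile boundaries; here each tile is processed independently.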
2.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content
of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images
into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure
of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is
reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font,
face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the
performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate
lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection
of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection
and specificity of the lexicon.
Received May 1, 1998 / Revised October 20, 1998
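A minimal version of the character-to-shape-code mapping can be sketched as below. The grouping into ascender, descender, and x-height classes is illustrative; the paper's actual shape alphabet and its treatment of punctuation and digits differ.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_code(word):
    """Map a word to a coarse shape-code string: A = ascender or capital,
    g = descender, x = x-height body. Illustrative grouping only."""
    out = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("g")
        else:
            out.append("x")
    return "".join(out)

def shape_lexicon(words):
    """Group a lexicon by shape code; words sharing one code are the
    ambiguity that template matching must later resolve."""
    lex = {}
    for w in words:
        lex.setdefault(shape_code(w), []).append(w)
    return lex
```

For example, "the" and "tho" collapse to the same code, so a shape-code lookup narrows the word image to a small candidate set rather than a single identity.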
3.
4.
B.B. Chaudhuri U. Garain 《International Journal on Document Analysis and Recognition》2001,3(3):138-149
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered.
It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital
style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these
styles of any font. Important words in a document are sometimes printed in larger size as well. A smart approach for the determination
of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized
text. Another advantage of identifying word type styles and font size is discussed in the context of extracting: (i)
different logical labels; and (ii) important terms from the document. Experimental results on the performance of the approach
on a large number of good quality, as well as degraded, document images are presented.
Received July 12, 2000 / Revised October 1, 2000
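The font-size determination could, under simple assumptions, amount to comparing each word's bounding-box height against the page's dominant height. A hypothetical sketch (the 1.4 cutoff ratio and the median baseline are assumptions, not the authors' choices):

```python
from statistics import median

def flag_large_words(word_heights, ratio=1.4):
    """Flag words printed noticeably larger than the page's dominant size.
    word_heights: {word_id: bounding-box height in pixels}.
    The ratio cutoff is an assumed value, not the paper's."""
    base = median(word_heights.values())
    return {w for w, h in word_heights.items() if h > ratio * base}
```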
5.
In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen
(LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066
forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words
occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting
recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic
knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus.
The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation
of the text into lines and words.
Received September 28, 2001 / Revised October 10, 2001
6.
An architecture for handwritten text recognition systems
Gyeonghwan Kim Venu Govindaraju Sargur N. Srihari 《International Journal on Document Analysis and Recognition》1999,2(1):37-44
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system
are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation
of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line
detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word
gaps and isolating words from a text-line image efficiently and intelligently; (iv) word recognition, concerning
handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic
constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed
for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are
described in this paper. Preliminary experiments show promising results in terms of speed and accuracy.
Received October 30, 1998 / Revised January 15, 1999
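The word-segmentation module's gap-based splitting can be illustrated with a small sketch. Splitting at gaps larger than a multiple of the median gap is an assumed rule here; the system's actual gap metric is more sophisticated.

```python
def segment_words(boxes, gap_factor=1.5):
    """Split a text line into words by thresholding inter-component gaps.
    boxes: list of (x_left, x_right) extents of connected components.
    A gap larger than gap_factor * median gap starts a new word
    (an assumed rule, not the paper's)."""
    boxes = sorted(boxes)
    gaps = [boxes[i + 1][0] - boxes[i][1] for i in range(len(boxes) - 1)]
    if not gaps:
        return [boxes]
    med = sorted(gaps)[len(gaps) // 2]
    words, cur = [], [boxes[0]]
    for g, b in zip(gaps, boxes[1:]):
        if g > gap_factor * max(med, 1):
            words.append(cur)
            cur = []
        cur.append(b)
    words.append(cur)
    return words
```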
7.
V. Bansal R.M.K. Sinha 《International Journal on Document Analysis and Recognition》2002,4(4):269-280
Abstract. This paper describes a method for the correction of optically read Devanagari character strings using a Hindi word dictionary.
The word dictionary is partitioned in order to reduce the search space besides preventing forced matching to an incorrect
word. The dictionary partitioning strategy takes into account the underlying OCR process. The dictionary words at the top
level are divided into two partitions: a short-words partition and a remaining-words partition. The short-words partition is sub-partitioned using the envelope information of the words. The envelope consists of the number of top, lower, and core modifiers along with the number of core characters. (Devanagari characters are written in three strips; most characters, referred to as core characters, are written in the middle strip.) The remaining words are further partitioned using tags. A tag is a string of fixed length associated with each partition. The correction process uses a distance matrix for assigning a
penalty for a mismatch. The distance matrix is based on the information about errors that the classification process is known
to make and the confidence figure that the classification process associates with its output. An improvement of approximately
20% in recognition performance is obtained. For a short word, 590 words are searched on average from 14 sub-partitions of
the short-words partition before an exact match is found. The average number of partitions and the average number of words
increase to 20 and 1585, respectively, when an exact match is not found. For tag-based partitions, on average, 100 words
from 30 partitions are compared when either an exact match is found or a word within the preset threshold distance is found.
If an exact match or a match within a preset threshold is not found, the average number of partitions becomes 75, and 450
words on average are compared. To the best of our knowledge, this is the first work on the use of a Hindi word dictionary for
OCR post-processing.
Received August 6, 2001 / Accepted August 22, 2001
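The distance matrix driving the correction can be folded into a weighted edit distance, sketched below. The `penalty` function standing in for the OCR confusion information is hypothetical.

```python
def weighted_edit_distance(ocr, word, penalty):
    """Edit distance where the substitution cost comes from an OCR
    confusion table: penalty(a, b) is low for confusions the classifier
    is known to make. Insertions and deletions cost 1. A hypothetical
    stand-in for the paper's distance matrix."""
    m, n = len(ocr), len(word)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ocr[i - 1] == word[j - 1] else penalty(ocr[i - 1], word[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,       # delete from OCR string
                          d[i][j - 1] + 1,       # insert dictionary character
                          d[i - 1][j - 1] + sub) # substitute (confusion-weighted)
    return d[m][n]
```

A dictionary word within a small weighted distance of the OCR output then becomes a correction candidate, with known confusions (e.g. a digit read for a letter) penalized lightly.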
8.
Partha Pratim Roy Umapada Pal Josep Lladós 《International Journal on Document Analysis and Recognition》2014,17(4):343-358
Word searching in documents with a non-structured layout, such as graphical documents, is a difficult task due to the arbitrary orientations of text words and the presence of graphical symbols. This paper presents an efficient approach to word searching in such documents using an indexing and retrieval scheme. The proposed indexing scheme stores the spatial information of the text characters of a document in a character spatial feature table (CSFT). The spatial feature of a text component is derived from its neighboring component information. Character labeling of multi-scaled and multi-oriented components is performed using support vector machines. For searching, the positional information of characters is obtained from the query string by splitting it into its possible combinations of character pairs. Each character pair is used to search for the position of the corresponding text in the document with the help of the CSFT. Next, the retrieved text components are joined into sequences by spatial information matching, and a string matching algorithm matches the query word against the character-pair sequences in the documents. Experimental results are presented on two datasets of graphical documents: a map dataset and a seal/logo image dataset. The results show that the method efficiently searches for query words in unconstrained document layouts of arbitrary orientation.
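A toy version of the pair-based lookup against the character index might look like this. `build_index` is a drastically simplified stand-in for the CSFT, which in the paper stores richer spatial features:

```python
from math import dist

def build_index(chars):
    """chars: list of (label, (x, y)) recognized character components.
    Returns label -> list of positions (a toy stand-in for the CSFT)."""
    idx = {}
    for label, pos in chars:
        idx.setdefault(label, []).append(pos)
    return idx

def find_pairs(idx, a, b, max_gap):
    """All (pos_a, pos_b) where the two characters lie within max_gap of
    each other, in any direction (the documents are multi-oriented)."""
    return [(p, q) for p in idx.get(a, []) for q in idx.get(b, [])
            if dist(p, q) <= max_gap]
```

A query word such as "map" would be split into the pairs ("m","a") and ("a","p"), each pair located via the index, and the hits chained into candidate character sequences for string matching.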
9.
Paul Clark Majid Mirmehdi 《International Journal on Document Analysis and Recognition》2002,4(4):243-257
We present two different approaches to the location and recovery of text in images of real scenes. The techniques we describe
are invariant to the scale and 3D orientation of the text, and allow recovery of text in cluttered scenes. The first approach
uses page edges and other rectangular boundaries around text to locate a surface containing text, and to recover a fronto-parallel
view. This is performed using line detection, perceptual grouping, and comparison of potential text regions using a confidence
measure. The second approach uses low-level texture measures with a neural network classifier to locate regions of text in
an image. Then we recover a fronto-parallel view of each located paragraph of text by separating the individual lines of text
and determining the vanishing points of the text plane. We illustrate our results using a number of images.
Received May 20, 2001 / Accepted June 19, 2001
10.
Abstract. We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given
page. The strategy uses the linguistic observation that over half of all words in a typical English passage are contained in
a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary
text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the
most likely candidates for those words, using only widths of the word images. The identity of each word is determined using
a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using
a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page
images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on
90% of all the pages. These can serve as useful seeds to bootstrap font learning.
Received October 8, 1999 / Revised March 29, 2000
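The width-only candidate search can be sketched as a tolerance match between observed word-image widths and expected stop-word widths. The relative tolerance and the width dictionaries are illustrative assumptions:

```python
def width_candidates(word_widths, stop_word_widths, tol=0.1):
    """For each segmented word image, list the stop words whose expected
    width lies within tol (relative) of the observed width. Expected
    widths are assumed to be estimated from the page; the tolerance is
    illustrative."""
    cands = {}
    for wid, w in word_widths.items():
        cands[wid] = [s for s, sw in stop_word_widths.items()
                      if abs(w - sw) <= tol * sw]
    return cands
```

Surviving candidates would then go to the word shape classifier, and the resolved identities yield labeled character images for prototype extraction.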
11.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting
pieces of text lines in small overlapping columns shifted with respect to each other, and by merging these pieces in a bottom-up
fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC
with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
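The bottom-up merging of line pieces can be illustrated with a simplified sketch in which each piece carries only its horizontal extent and vertical centre. The abut/agree thresholds are assumptions, not the algorithm's defaults:

```python
def merge_pieces(pieces, max_dx=3, max_dy=2):
    """Bottom-up merge of text-line pieces into complete lines.
    Each piece is (x0, x1, y) with y its vertical centre. Pieces that
    (nearly) abut horizontally and agree vertically join one line.
    Thresholds are illustrative."""
    lines = []
    for x0, x1, y in sorted(pieces):
        for line in lines:
            _, lx1, ly = line
            if x0 <= lx1 + max_dx and abs(y - ly) <= max_dy:
                line[1] = max(lx1, x1)   # extend the line rightward
                line[2] = (ly + y) / 2   # track the drifting centre (skew tolerance)
                break
        else:
            lines.append([x0, x1, y])
    return lines
```

Tracking the running vertical centre rather than a fixed baseline is what gives this style of merging its tolerance to a few degrees of skew.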
12.
13.
Doe-Wan Kim Tapas Kanungo 《International Journal on Document Analysis and Recognition》2002,5(1):47-66
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition
(OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned
document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically
registered the original image to the rescanned one using four corner points and then transformed the original groundtruth
using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing
the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance
between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout
complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using
the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth
for microfilmed and FAXed versions of the University of Washington dataset documents.
Received: July 24, 2001 / Accepted: May 20, 2002
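Using the Euclidean distance between character centroids as the error metric can be illustrated with a greedy matcher. Note this is a simple stand-in: the paper uses an attributed branch-and-bound search over blob corners, not greedy nearest-neighbour assignment.

```python
from math import hypot

def match_centroids(orig, scanned, max_err=5.0):
    """Greedy nearest-neighbour matching of character centroids between
    the original and rescanned page. Returns (i, j, distance) triples and
    the mean Euclidean error over matched pairs. A simplified stand-in
    for the paper's branch-and-bound correspondence search."""
    pairs, used = [], set()
    for i, (x, y) in enumerate(orig):
        best, best_d = None, max_err
        for j, (u, v) in enumerate(scanned):
            if j in used:
                continue
            d = hypot(x - u, y - v)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best, best_d))
    err = sum(d for _, _, d in pairs) / len(pairs) if pairs else 0.0
    return pairs, err
```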
14.
The next generation of interactive multimedia documents can contain both static media (e.g., text, graphics, and images) and
continuous media (e.g., audio and video), and can provide user interaction in distributed environments. However, the temporal information
of multimedia documents cannot be described using traditional document structures, e.g., Open Document Architecture (ODA)
and Standard Generalized Markup Language (SGML); the continuous transmission of media units also raises new synchronization
problems, not encountered before, in processing user interactions. Thus, developing a distributed interactive multimedia
document system should resolve the issues of document model, presentation control architecture, and control scheme. In this
paper, we (i) propose a new multimedia document model that contains the logical structure, the layout structure, and the temporal
structure to formally describe multimedia documents, and (ii) point out main interaction-based synchronization problems, and
propose a control architecture and a token-based control scheme to solve these interaction-based synchronization problems.
Based on the proposed document model, control architecture, and control scheme, a distributed interactive multimedia document
development mechanism, called MING-I, is developed on Sun workstations.
15.
Kazem Taghva Eric Stofsky 《International Journal on Document Analysis and Recognition》2001,3(3):125-137
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate
words through the use of information gathered from multiple knowledge sources. This system for text correction is based on
static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system
incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of
the new system is presented as well.
Received August 16, 2000 / Revised October 6, 2000
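Candidate selection via device mappings can be sketched as below; the confusion table entries (e.g. `rn` read as `m`) are illustrative, and the real system additionally scores candidates with n-gram and Bayesian evidence.

```python
def candidates(token, lexicon, confusions):
    """Candidate corrections for an OCR token: apply single device
    mappings from a confusion table (each entry is (wrong, right)) and
    keep results found in the lexicon. Table entries are illustrative."""
    found = set()
    if token in lexicon:
        found.add(token)
    for wrong, right in confusions:
        i = token.find(wrong)
        while i != -1:
            cand = token[:i] + right + token[i + len(wrong):]
            if cand in lexicon:
                found.add(cand)
            i = token.find(wrong, i + 1)
    return found
```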
16.
17.
Character extraction from complex color text images
Extracting and recognizing characters from complex color text images has become a difficult and interesting problem. This paper presents a novel and practical region-growing algorithm for color image segmentation: the color run-length adjacency graph (CRAG) algorithm. Applied to a color text image, the algorithm first obtains the color connected components of the image, then clusters the average colors of these components to obtain a number of cluster centers; the image is then separated into color layers according to these centers, and the text layer is finally identified through connected-component analysis. The growing algorithm modifies and extends the traditional BAG algorithm and applies it to color printed text images, making full use of the color and positional information in the image. Experimental results show that the new method extracts many common decorative typefaces from color printed images well and at high speed, while preserving the original colors of the text and the background image for later image restoration.
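The color-clustering step, in which connected components are grouped by their average colors, might be sketched as a one-pass clustering in RGB space. The distance threshold and the running-mean centre update are illustrative simplifications of the clustering actually used:

```python
def cluster_colors(mean_colors, thresh=60.0):
    """One-pass clustering of connected-component mean RGB colors: a
    component joins the first cluster centre within thresh (Euclidean in
    RGB), else starts a new cluster. A simplified stand-in for the
    color-clustering step of the CRAG pipeline."""
    centers, members = [], []
    for c in mean_colors:
        for k, ctr in enumerate(centers):
            if sum((a - b) ** 2 for a, b in zip(c, ctr)) ** 0.5 <= thresh:
                members[k].append(c)
                # update the centre as the running mean of member colors
                n = len(members[k])
                centers[k] = tuple(sum(m[i] for m in members[k]) / n
                                   for i in range(3))
                break
        else:
            centers.append(c)
            members.append([c])
    return centers, members
```

Each resulting cluster corresponds to one color layer of the image; connected-component analysis on each layer then decides which layer holds the text.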
18.
Identifying facsimile duplicates using radial pixel densities
P. Chatelain 《International Journal on Document Analysis and Recognition》2002,4(4):219-225
A method for detecting full layout facsimile duplicates based on radial pixel densities is proposed. It caters for facsimiles,
including text and/or graphics. Pages may be positioned upright or inverted on the scanner bed. The method is not dependent
on the computation of text skew or text orientation. Using a database of original documents, 92% of non-duplicates and upright
duplicates as well as 89% of inverted duplicates could be correctly identified. The method is vulnerable to double scanning.
This occurs when documents are copied using a photocopier and the copies are subsequently transmitted using a facsimile machine.
Received September 29, 2000 / Revised August 23, 2001
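The appeal of radial pixel densities is that a 180-degree rotation maps every concentric ring onto itself, so upright and inverted scans yield the same signature. A minimal sketch under that reading (the ring count and normalization are assumptions):

```python
from math import hypot

def radial_density(black_pixels, width, height, n_rings=4):
    """Fraction of black pixels falling in each concentric ring about the
    page centre. Rotating the page by 180 degrees maps each ring onto
    itself, so upright and inverted copies give the same signature
    (a simplified reading of the radial-density idea)."""
    cx, cy = width / 2, height / 2
    rmax = hypot(cx, cy)
    counts = [0] * n_rings
    for x, y in black_pixels:
        r = hypot(x + 0.5 - cx, y + 0.5 - cy)
        counts[min(int(n_rings * r / rmax), n_rings - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

Comparing the signatures of two facsimiles (e.g. by a distance threshold) then flags duplicates without computing text skew or orientation.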
19.
Textual data is very important in a number of applications such as image database indexing and document understanding. The goal of automatic text location without character recognition capabilities is to extract image regions that contain only text. These regions can then be either fed to an optical character recognition module or highlighted for a user. Text location is a very difficult problem because the characters in text can vary in font, size, spacing, alignment, orientation, color, and texture. Further, characters are often embedded in a complex background in the image. We propose a new text location algorithm that is suitable for a number of applications, including conversion of newspaper advertisements from paper documents to their electronic versions, World Wide Web search, color image indexing, and video indexing. In many of these applications it is not necessary to extract all the text, so we emphasize extracting important text with large size and high contrast. Our algorithm is very fast and has been shown to be successful in extracting important text in a large number of test images.
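The stated emphasis on large, high-contrast text suggests a final filtering stage over candidate components; a hypothetical sketch (the field names and thresholds are assumptions, not the paper's):

```python
def important_text(components, min_height=20, min_contrast=80):
    """Keep only candidate text components that are large and
    high-contrast, matching the stated emphasis on important text.
    components: list of dicts with 'height' (pixels) and 'contrast'
    (foreground/background gray-level difference). Thresholds are
    illustrative."""
    return [c for c in components
            if c["height"] >= min_height and c["contrast"] >= min_contrast]
```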
20.
E. Appiani F. Cesarini A.M. Colla M. Diligenti M. Gori S. Marinai G. Soda 《International Journal on Document Analysis and Recognition》2001,4(2):69-83
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described.
This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine that overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes.
The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically
index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled
users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying
reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents
automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to
dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning
passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing
strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining
to the specific document class. Experimental results are encouraging overall; in particular, document classification results
fulfill the requirements of high-volume applications. Integration into production lines is under way.
Received March 30, 2000 / Revised June 26, 2001