20 similar documents found; search time: 15 ms
1.
This paper presents a morphology-based text line extraction algorithm for extracting text regions from cluttered images. First, the method defines a novel set of morphological operations for extracting important contrast regions as possible text
line candidates. The contrast feature is robust to lighting changes and invariant against different image transformations
like image scaling, translation, and skewing. In order to detect skewed text lines, a moment-based method is then used for
estimating their orientations. According to the orientation, an x-projection technique can be applied to extract various text geometries from the text-analogue segments for text verification.
However, due to noise, a text line region is often fragmented into separate segments. Therefore, after the projection, a recovery algorithm is proposed to reassemble a complete text line from its fragments. Finally, a verification scheme checks each extracted candidate text line against its text geometry.
Experimental results show that the proposed method improves on state-of-the-art work in effectiveness and robustness of text line detection.
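The projection and fragmentation steps above can be sketched in a few lines (an illustrative sketch only, not the authors' implementation: the projection here is axis-aligned for simplicity, whereas the paper projects along the estimated orientation, and the `min_gap` parameter is an assumption):

```python
import numpy as np

def x_projection_profile(binary):
    """Column-wise count of foreground (text-candidate) pixels."""
    return binary.sum(axis=0)

def split_segments(profile, min_gap=2):
    """Split the profile into runs separated by at least min_gap empty
    columns; these runs are the fragments a recovery step would rejoin."""
    segments, start, gap = [], None, 0
    for x, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, x - gap))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments
```

A noisy text line typically yields several such runs, which is why the recovery stage in the abstract is needed.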
2.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting
pieces of text lines in small overlapping columns of a fixed width (a fraction of the image width), shifted with respect to each other by a fixed number of image elements, and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires
about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
3.
Hwan-Chul Park Se-Young Ok Young-Jung Yu Hwan-Gue Cho 《International Journal on Document Analysis and Recognition》2001,4(2):115-130
Automatic character recognition and image understanding of a given paper document are main objectives of computer vision. A basic step toward both is to isolate characters and then group the isolated characters into words. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
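The run-length-based characteristic value mentioned above can be sketched roughly as follows (an assumed formulation for illustration; the paper's exact characteristic value and classification rule are not reproduced here):

```python
from collections import Counter

def run_length_frequencies(rows):
    """Count how often each horizontal foreground run length occurs.

    rows: iterable of 0/1 sequences, one per scan line of an image
    component. Characters tend to produce many short runs; graphics
    produce long ones, which makes the histogram a usable feature."""
    freq = Counter()
    for row in rows:
        run = 0
        for v in row:
            if v:
                run += 1
            elif run:
                freq[run] += 1
                run = 0
        if run:                 # run touching the right border
            freq[run] += 1
    return freq
```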
4.
In this paper, we present a new approach to extract characters on a license plate of a moving vehicle, given a sequence of
perspective-distortion-corrected license plate images. Different from many existing single-frame approaches, our method simultaneously
utilizes spatial and temporal information. We first model the extraction of characters as a Markov random field (MRF), where
the randomness is used to describe the uncertainty in pixel label assignment. With the MRF modeling, the extraction of characters
is formulated as the problem of maximizing the a posteriori probability given prior knowledge and observations. A genetic algorithm with a local greedy mutation operator is
employed to optimize the objective function. Experiments and a comparison study were conducted, and representative experimental results are presented in the paper. It is shown that our approach provides better performance than other single-frame methods.
Received: 13 August 1997 / Accepted: 7 October 1997
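Viewed as an energy (a negative log-posterior), the MAP objective can be sketched with a generic Potts-style model (a toy formulation, not the authors' exact model; the class means `mu` and smoothness weight `beta` are assumptions). The genetic algorithm then searches for the label field that minimizes this energy:

```python
import numpy as np

def mrf_energy(labels, image, mu=(0.2, 0.8), beta=1.0):
    """Energy of a binary character/background labelling under a simple
    Potts MRF: a data term (deviation from assumed class means) plus a
    smoothness term counting disagreeing 4-neighbours."""
    mu = np.asarray(mu)
    data = ((image - mu[labels]) ** 2).sum()        # observation term
    h = (labels[:, 1:] != labels[:, :-1]).sum()     # horizontal boundaries
    v = (labels[1:, :] != labels[:-1, :]).sum()     # vertical boundaries
    return data + beta * (h + v)
```

Lower energy means higher posterior probability, so the optimizer's fitness can simply be the negated energy.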
5.
David Crandall Sameer Antani Rangachar Kasturi 《International Journal on Document Analysis and Recognition》2003,5(2-3):138-157
The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically
index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's
semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary
color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about
the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video.
In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However,
text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once
in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial
problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television
programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing,
and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared
with existing work found in the literature.
Received: January 29, 2002 / Accepted: September 13, 2002
D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com
S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov
Correspondence to: David Crandall
6.
Paul Clark Majid Mirmehdi 《International Journal on Document Analysis and Recognition》2002,4(4):243-257
We present two different approaches to the location and recovery of text in images of real scenes. The techniques we describe
are invariant to the scale and 3D orientation of the text, and allow recovery of text in cluttered scenes. The first approach
uses page edges and other rectangular boundaries around text to locate a surface containing text, and to recover a fronto-parallel
view. This is performed using line detection, perceptual grouping, and comparison of potential text regions using a confidence
measure. The second approach uses low-level texture measures with a neural network classifier to locate regions of text in
an image. Then we recover a fronto-parallel view of each located paragraph of text by separating the individual lines of text
and determining the vanishing points of the text plane. We illustrate our results using a number of images.
Received May 20, 2001 / Accepted June 19, 2001
7.
Efficient extraction of primitives from line drawings composed of horizontal and vertical lines (total citations: 6; self-citations: 0; by others: 6)
The performance of the algorithms for the extraction of primitives for the interpretation of line drawings is usually affected
by the degradation of the information contained in the document due to factors such as low print contrast, defocusing, skew,
etc. In this paper, we propose two algorithms for the extraction of primitives with good performance under degradation.
The application of the algorithms is restricted to line drawings composed of horizontal and vertical lines. The performance
of the algorithms has been evaluated by using a protocol described in the literature.
Received: 6 August 1996 / Accepted: 16 July 1997
8.
Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos, which enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
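The integration of a character's multiple bitmaps over time can be sketched as a simple temporal vote (an assumed fusion rule for illustration; the paper's actual integration method may differ):

```python
import numpy as np

def integrate_bitmaps(frames, threshold=0.5):
    """Fuse several aligned binary bitmaps of the same character into one:
    a pixel survives if it is foreground in at least `threshold` of the
    frames, so transient background clutter is voted away."""
    stack = np.stack(frames).astype(float)
    return (stack.mean(axis=0) >= threshold).astype(np.uint8)
```

The fused bitmap, being cleaner than any single frame, is what would be handed to the OCR package.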
9.
Identifying facsimile duplicates using radial pixel densities (total citations: 2; self-citations: 0; by others: 2)
P. Chatelain 《International Journal on Document Analysis and Recognition》2002,4(4):219-225
A method for detecting full layout facsimile duplicates based on radial pixel densities is proposed. It caters for facsimiles,
including text and/or graphics. Pages may be positioned upright or inverted on the scanner bed. The method is not dependent
on the computation of text skew or text orientation. Using a database of original documents, 92% of non-duplicates and upright
duplicates as well as 89% of inverted duplicates could be correctly identified. The method is vulnerable to double scanning.
This occurs when documents are copied using a photocopier and the copies are subsequently transmitted using a facsimile machine.
Received September 29, 2000 / Revised August 23, 2001
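One plausible reading of radial pixel densities, which also explains why inverted pages need no special handling, is a ring histogram around the page centre (a sketch under assumptions; the number of rings and the binning are illustrative, not the paper's parameters):

```python
import numpy as np

def radial_densities(binary, n_rings=8):
    """Fraction of foreground pixels in concentric rings around the image
    centre. The feature is exactly invariant to a 180-degree rotation,
    i.e., to a page scanned upside down."""
    h, w = binary.shape
    ys, xs = np.indices((h, w))
    r = np.hypot(ys - (h - 1) / 2, xs - (w - 1) / 2)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    dens = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)
        dens.append(binary[mask].mean() if mask.any() else 0.0)
    return np.array(dens)
```

Comparing two pages then reduces to comparing two short density vectors, which is what makes duplicate detection fast.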
10.
11.
Markus Junker Rainer Hoch 《International Journal on Document Analysis and Recognition》1998,1(2):116-122
In the literature, many feature types are proposed for document classification. However, an extensive and systematic evaluation
of the various approaches has not yet been done. In particular, evaluations on OCR documents are very rare. In this paper
we investigate seven text representations based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts
in two domains: business letters and abstracts of technical reports. Our results indicate that n-grams are an attractive technique that compares well even with techniques relying on morphological analysis. This holds for OCR texts as well as for correct ASCII texts.
Received February 17, 1998 / Revised April 8, 1998
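The robustness of character n-grams to OCR noise is easy to see in a sketch (illustrative only; the paper's seven representations and its classifiers are not reproduced here, and the `overlap` similarity is an assumption):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character n-grams. A single OCR error corrupts at most n
    grams, so most features of a word survive recognition noise."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def overlap(a, b):
    """Similarity of two n-gram bags: shared gram mass over total mass."""
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 0.0
```

An OCR-corrupted word still shares grams with its correct form, while unrelated words share almost none:

```python
# "rep0rt" (OCR error) still overlaps "report"; "invoice" does not.
```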
12.
Automatic acquisition of CAD models from existing objects requires accurate extraction of geometric and topological information
from the input data. This paper presents a range image segmentation method based on local approximation of scan lines. The
method employs edge models that are capable of detecting noise pixels as well as position and orientation discontinuities
of varying strengths. Region-based techniques are then used to achieve a complete segmentation. Finally, a geometric representation
of the scene, in the form of a surface CAD model, is produced. Experimental results on a large number of real range images
acquired by different range sensors demonstrate the efficiency and robustness of the method.
Received: 1 August 2000 / Accepted: 23 January 2002
Correspondence to: I. Khalifa
13.
Amit Kumar Das Sanjoy Kumar Saha Bhabatosh Chanda 《International Journal on Document Analysis and Recognition》2002,4(3):183-190
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on
the performance analysis of evolving segmentation algorithms. The use of standard document databases maintained at universities and research laboratories solves the problem of obtaining authentic data sources, but sound methodologies are still needed for performance analysis of the segmentation. We describe a new document model in terms
of a bounding box representation of its constituent parts and suggest an empirical measure of performance of a segmentation
algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also
produces segment-wise details of common segmentation problems such as horizontal and vertical split and merge as well as invalid
and mismatched regions.
Received July 14, 2000 / Revised June 12, 2001
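The segment-wise split and merge detection can be sketched with bounding-box overlaps (a simplified reading of the proposed measure; a real evaluation would also weigh overlap areas and handle the invalid/mismatched cases the abstract mentions):

```python
def overlaps(a, b):
    """Axis-aligned boxes (x0, y0, x1, y1); True if interiors intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def split_and_merge(truth, result):
    """Count split errors (one true region covered by several output
    segments) and merge errors (one output segment covering several
    true regions)."""
    splits = sum(1 for t in truth
                 if sum(overlaps(t, r) for r in result) > 1)
    merges = sum(1 for r in result
                 if sum(overlaps(r, t) for t in truth) > 1)
    return splits, merges
```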
14.
15.
Claudia Wenzel Heiko Maus 《International Journal on Document Analysis and Recognition》2001,3(4):248-260
Knowledge-based systems for document analysis and understanding (DAU) are quite useful whenever analysis has to deal with
the changing of free-form document types which require different analysis components. In this case, declarative modeling is
a good way to achieve flexibility. An important application domain for such systems is the business letter domain. Here, high
accuracy and correct assignment to the right people and processes are crucial success factors. Our solution is a comprehensive knowledge-centered approach: we model not only comparatively static knowledge concerning
document properties and analysis results within a declarative formalism, but also the analysis task and the current context of the system environment. This allows an easy definition of new analysis tasks
and also an efficient and accurate analysis by using expectations about incoming documents as context information. The approach
described has been implemented within the VOPR (VOPR is an acronym for the Virtual Office PRototype.) system. This DAU system
gains the required context information from a commercial workflow management system (WfMS) by constant exchanges of expectations
and analysis tasks. Further interaction between these two systems covers the delivery of results from DAU to the WfMS and
the delivery of corrected results vice versa.
Received June 19, 1999 / Revised November 8, 2000
16.
Using local deviations of vectorization to enhance the performance of raster-to-vector conversion systems (total citations: 1; self-citations: 0; by others: 1)
Eugene Bodansky Morakot Pilouk 《International Journal on Document Analysis and Recognition》2000,3(2):67-72
This paper presents a method of quantitatively measuring local vectorization errors that evaluates the deviation of the vectorization
of arbitrary (regular and irregular) raster linear objects. This measurement of the deviation does not depend on the thickness
of the linear object. One of the most time-consuming procedures of raster-to-vector conversion of large linear drawings is
manually verifying the results. Performance of raster-to-vector conversion systems can be enhanced with auto-localization
of places that have to be corrected. The local deviations can be used for testing results and automatically showing the parts
of resulting curves where deviations are greater than a threshold value and have to be corrected.
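The local deviation measure can be sketched as the distance from each raster centre-line point to the approximating polyline, flagging points beyond a tolerance (an assumed formulation consistent with the abstract; note that, as claimed, it does not depend on line thickness):

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def local_deviations(points, polyline):
    """Distance from each raster centre-line point to the nearest
    polyline segment."""
    return [min(point_segment_dist(p, a, b)
                for a, b in zip(polyline, polyline[1:]))
            for p in points]

def flagged(points, polyline, tol):
    """Indices of points whose deviation exceeds the tolerance, i.e.,
    the places an operator would be directed to correct."""
    return [i for i, d in enumerate(local_deviations(points, polyline)) if d > tol]
```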
17.
18.
Shuhua Wang Yang Cao Shijie Cai 《International Journal on Document Analysis and Recognition》2001,4(1):27-34
The most noticeable characteristic of a construction tender document is that its hierarchical architecture is not obviously
expressed but is implied in the citing information. Currently available methods cannot deal with such documents. In this paper,
the intra-page and inter-page relationships are analyzed in detail. The creation of citing relationships is essential to extracting
the logical structure of tender documents. The hierarchy of tender documents naturally leads to extracting and displaying
the logical structure as a tree structure. This method is successfully implemented in VHTender and is the key to the efficiency
and flexibility of the whole system.
Received February 28, 2000 / Revised October 20, 2000
19.
Stefan Klink Thomas Kieninger 《International Journal on Document Analysis and Recognition》2001,4(1):18-26
Document image processing is a crucial process in office automation; it begins with OCR, but the real difficulties lie in document analysis and understanding. This paper presents a hybrid and comprehensive approach to document structure analysis. Hybrid
in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are
the basis for conditions which in turn are used to express fuzzy-matched rules of an underlying rule base. Rules
can be formulated based on features which might be observed within one specific layout object. However, rules can also express
dependencies between different layout objects. In addition to its rule driven analysis, which allows an easy adaptation to
specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common
objects (e.g., lists).
Received June 19, 2000 / Revised November 8, 2000
20.
Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000