20 similar documents found; search time: 203 ms
1.
Hon-Son Don 《International Journal on Document Analysis and Recognition》2001,4(2):131-138
A new thresholding method, called the noise attribute thresholding method (NAT), for document image binarization is presented
in this paper. This method utilizes the noise attribute features extracted from the images to make the selection of threshold
values for image thresholding. These features are based on the properties of noise in the images and are independent of the
strength of the signals (objects and background) in the image. A simple noise model is given to explain these noise properties.
The NAT method has been applied to the problem of removing text and figures printed on the back of the paper. Conventional
global thresholding methods cannot solve this kind of problem satisfactorily. Experimental results show that the NAT method
is very effective.
Received July 05, 1999 / Revised July 07, 2000
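The NAT abstract contrasts noise-based threshold selection with conventional global thresholding. As a point of reference, here is a minimal sketch of a classic global method, Otsu's between-class-variance criterion, applied to a toy bimodal histogram. This is the kind of signal-dependent baseline the paper argues against, not the NAT algorithm itself.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes between-class variance.

    hist[i] is the pixel count for gray level i. Pixels <= t form one
    class (e.g., ink), pixels > t the other (e.g., paper).
    """
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the dark class so far
    sum0 = 0.0  # intensity sum of the dark class so far
    for t in range(len(hist) - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy histogram: dark ink around level 2, bright paper around level 7.
hist = [0, 5, 20, 5, 0, 0, 4, 30, 6, 0]
print(otsu_threshold(hist))  # 3: splits the two modes
```

Because the chosen threshold depends entirely on the signal modes, show-through text printed on the back of the page adds a third, intermediate mode that no single global threshold can separate cleanly, which is the failure mode the NAT method targets.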
2.
Lixin Fan Liying Fan Chew Lim Tan 《International Journal on Document Analysis and Recognition》2003,5(2-3):88-101
Abstract. For document images corrupted by various kinds of noise, images produced by direct binarization may be severely blurred and degraded.
A common treatment for this problem is to pre-smooth input images using noise-suppressing filters. This article proposes an
image-smoothing method for prefiltering document images before binarization. Conceptually, we propose that the influence
range of each pixel affecting its neighbors should depend on local image statistics. Technically, we suggest using coplanar matrices to capture the structural and textural distribution of similar pixels at each site. This property adapts the smoothing process
to the contrast, orientation, and spatial size of local image structures. Experimental results demonstrate the effectiveness
of the proposed method, which compares favorably with existing methods in reducing noise and preserving image features. In
addition, due to the adaptive nature of the similar pixel definition, the proposed filter output is more robust regarding
different noise levels than existing methods.
Received: October 31, 2001 / October 09, 2002
Correspondence to: L. Fan (e-mail: fanlixin@ieee.org)
3.
Xiangyun Ye Mohamed Cheriet Ching Y. Suen 《International Journal on Document Analysis and Recognition》2001,4(2):84-96
The automation of business form processing is attracting intensive research interest due to its wide applicability and its
potential to reduce the heavy workload of manual processing. Preparing clean and clear images for the recognition engines is
often taken for granted as a trivial task that requires little attention. In reality, handwritten data usually touch or cross
the preprinted form frames and texts, creating tremendous problems for the recognition engines. In this paper, we contribute
answers to two questions: “Why do we need cleaning and enhancement procedures in form processing systems?” and “How can we
clean and enhance the hand-filled items with easy implementation and high processing speed?” Here, we propose a generic system
including only cleaning and enhancing phases. In the cleaning phase, the system registers a template to the input form by
aligning corresponding landmarks. A unified morphological scheme is proposed to remove the form frames and restore the broken
handwriting from gray or binary images. When the handwriting is found touching or crossing preprinted texts, morphological
operations based on statistical features are used to clean it. In applications where a black-and-white scanning mode is adopted,
handwriting may contain broken or hollow strokes due to improper thresholding parameters. Therefore, we have designed a module
to enhance the image quality based on morphological operations. Subjective and objective evaluations have been studied to
show the effectiveness of the proposed procedures.
Received January 19, 2000 / Revised March 20, 2001
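The "unified morphological scheme" for removing form frames and restoring broken handwriting is not spelled out in the abstract, but the operators it builds on can be sketched. Below is a hypothetical pure-Python morphological closing (dilation followed by erosion) that repairs a one-pixel break in a horizontal stroke, the kind of damage left behind after a form frame line is deleted. The structuring element and the padding convention are illustrative choices, not the authors'.

```python
def dilate(img, se):
    """Binary dilation: each foreground pixel stamps the structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in se:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out

def erode(img, se):
    """Binary erosion: a pixel survives only if the whole element fits."""
    h, w = len(img), len(img[0])
    def at(y, x):
        # Out-of-bounds counts as foreground so that closing never
        # shrinks strokes touching the image border.
        return img[y][x] if 0 <= y < h and 0 <= x < w else 1
    return [[1 if all(at(y + dy, x + dx) for dy, dx in se) else 0
             for x in range(w)] for y in range(h)]

HORIZ = [(0, -1), (0, 0), (0, 1)]  # 1x3 element: bridges horizontal breaks

def close_horizontal(img):
    """Closing = dilation then erosion; fills 1-pixel gaps in horizontal strokes."""
    return erode(dilate(img, HORIZ), HORIZ)

broken = [[0, 0, 0, 0, 0],
          [1, 1, 0, 1, 1],   # stroke with a one-pixel break
          [0, 0, 0, 0, 0]]
print(close_horizontal(broken)[1])  # [1, 1, 1, 1, 1]: gap bridged
```

The same two primitives, with element shapes and sizes chosen from statistics of the handwriting, are the typical building blocks of the frame-removal and stroke-restoration steps the abstract describes.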
4.
Image-based animation of facial expressions (Cited by 1: 0 self-citations, 1 by others)
Gideon Moiza Ayellet Tal Ilan Shimshoni David Barnett Yael Moses 《The Visual computer》2002,18(7):445-467
We present a novel technique for creating realistic facial animations given a small number of real images and a few parameters
for the in-between images. This scheme can also be used for reconstructing facial movies where the parameters can be automatically
extracted from the images. The in-between images are produced without ever generating a three-dimensional model of the face.
Since facial motion due to expressions is not well defined mathematically, our approach is based on utilizing image patterns
in facial motion. These patterns were revealed by an empirical study which analyzed and compared image motion patterns in
facial expressions. The major contribution of this work is showing how parameterized “ideal” motion templates can generate
facial movies for different people and different expressions, where the parameters are extracted automatically from the image
sequence. To test the quality of the algorithm, image sequences (one of which was taken from a TV news broadcast) were reconstructed,
yielding movies hardly distinguishable from the originals.
Published online: 2 October 2002
Correspondence to: A. Tal
Work has been supported in part by the Israeli Ministry of Industry and Trade, The MOST Consortium
5.
Hideaki Goto Hirotomo Aso 《International Journal on Document Analysis and Recognition》2002,4(4):258-268
Recent remarkable progress in computer systems and printing devices has made it easier to produce printed documents with
various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs,
computer graphics, etc. Some methods have been developed for character pattern extraction from document images and scene images
with complex backgrounds. However, the previous methods are suitable only for extracting rather large characters, and the
processes often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns
can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel
labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document
images. The performance of extracting small character patterns has been improved by suppressing the influence of mixed-color
pixels around character edges. Experimental results show that the method is capable of extracting very small character patterns
from main text blocks in various documents, separating characters and complex backgrounds, as long as the thickness of the
character strokes is more than about 1.5 pixels.
Received July 23, 2001 / Accepted November 5, 2001
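The paper combines local multilevel thresholding with pixel labeling and region growing; the local-window idea alone can be illustrated with a much simpler single-level variant, a mean-minus-offset rule in the spirit of Niblack's method. This is an illustrative stand-in, not the authors' algorithm.

```python
def local_threshold(img, win=1, offset=5):
    """Per-pixel threshold = mean of the surrounding (2*win+1)^2 window
    minus a fixed offset; pixels darker than that become ink (1)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - win), min(h, y + win + 1))
                    for xx in range(max(0, x - win), min(w, x + win + 1))]
            if img[y][x] < sum(vals) / len(vals) - offset:
                out[y][x] = 1
    return out

# Background brightens left to right (10..40), as in a badly illuminated
# page; two stroke pixels, one on the dark side (2), one on the bright (25).
img = [[10, 20, 30, 40],
       [ 2, 20, 30, 25],
       [10, 20, 30, 40]]
print(local_threshold(img))  # [[0,0,0,0], [1,0,0,1], [0,0,0,0]]
```

No single global threshold separates both strokes here: any cutoff above 25 also marks the dark-side background (10, 20), while any cutoff below 10 misses the bright-side stroke. The local rule recovers both, which is the property that makes window-based thresholding effective on unevenly lit documents.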
6.
To meet the need of extracting specific targets during the preprocessing stage of Chinese bank check recognition, the effectiveness of several binarization algorithms in preprocessing was studied. By analyzing the histograms of 2,000 gray-level images of Chinese checks, histogram gradient information useful for image segmentation was identified. Based on this gradient information, a binarization algorithm is proposed for extracting the outer frame lines of the amount field in check images, making the frame lines to be extracted clearer and more prominent in the gray-level check image and thus easier to locate. In comparative tests against several other binarization algorithms in the Chinese check preprocessing setting, the proposed algorithm showed better results and higher efficiency.
7.
Hwan-Chul Park Se-Young Ok Young-Jung Yu Hwan-Gue Cho 《International Journal on Document Analysis and Recognition》2001,4(2):115-130
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer
vision field. For these problems, a basic step is to isolate characters and group words from these isolated characters. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than ten oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
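The "run-length frequency of the image component" used as a characteristic value for classification can be sketched as follows. This is an illustrative reconstruction of the feature, not the paper's exact definition.

```python
from collections import Counter

def run_lengths(row):
    """Lengths of consecutive runs of foreground (1) pixels in one scanline."""
    runs, n = [], 0
    for px in row + [0]:          # trailing 0 flushes the final run
        if px:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    return runs

def run_length_histogram(img):
    """Frequency of each foreground run length over all scanlines.

    Text components concentrate mass at short, stroke-width-sized runs,
    while large graphic blobs produce long runs, so this histogram helps
    separate characters from graphics.
    """
    c = Counter()
    for row in img:
        c.update(run_lengths(row))
    return dict(c)

print(run_length_histogram([[1, 1, 0, 1, 0, 0, 1, 1, 1],
                            [0, 1, 1, 0, 0, 1, 1, 1, 1]]))
```

Combined with the size, elongation, and density features the abstract lists, a statistic of this histogram (e.g., its mode or mean) gives a scale-sensitive cue for the text/graphics decision.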
8.
A model-driven approach for real-time road recognition (Cited by 6: 0 self-citations, 6 by others)
This article describes a method designed to detect and track road edges starting from images provided by an on-board monocular
monochromic camera. Its implementation on specific hardware is also presented in the framework of the VELAC project. The method
is based on four modules: (1) detection of the road edges in the image by a model-driven algorithm, which uses a statistical
model of the lane sides which manages the occlusions or imperfections of the road marking – this model is initialized by an
off-line training step; (2) localization of the vehicle in the lane in which it is travelling; (3) tracking to define a new
search space of road edges for the next image; and (4) management of the lane numbers to determine the lane in which the vehicle
is travelling. The algorithm is implemented in order to validate the method in a real-time context. Results obtained on marked
and unmarked road images show the robustness and precision of the method.
Received: 18 November 2000 / Accepted: 7 May 2001
9.
Amit Kumar Das Sanjoy Kumar Saha Bhabatosh Chanda 《International Journal on Document Analysis and Recognition》2002,4(3):183-190
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on
the performance analysis of the evolving segmentation algorithms. The use of a standard document database maintained at the
Universities/Research Laboratories helps to solve the problem of getting authentic data sources and other information, but
some methodologies have to be used for performance analysis of the segmentation. We describe a new document model in terms
of a bounding box representation of its constituent parts and suggest an empirical measure of performance of a segmentation
algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also
produces segment-wise details of common segmentation problems such as horizontal and vertical split and merge as well as invalid
and mismatched regions.
Received July 14, 2000 / Revised June 12, 2001
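The "segment-wise details of common segmentation problems such as horizontal and vertical split and merge" suggest bounding-box bookkeeping along the following lines. This is a hypothetical sketch of the idea, not the authors' measure.

```python
def overlap_area(a, b):
    """a, b are boxes (x0, y0, x1, y1); returns their intersection area."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def classify_region(gt_box, seg_boxes, thresh=0.9):
    """Label one ground-truth region against segmenter output:
    'match'  if a single segment covers it,
    'split'  if only the union of several segments does,
    'missed' otherwise."""
    covering = [s for s in seg_boxes if overlap_area(gt_box, s) > 0]
    # Sum of overlaps; assumes the segmenter's boxes are disjoint.
    covered = sum(overlap_area(gt_box, s) for s in covering)
    if any(overlap_area(gt_box, s) >= thresh * area(gt_box) for s in covering):
        return "match"
    if covered >= thresh * area(gt_box) and len(covering) > 1:
        return "split"
    return "missed"

gt = (0, 0, 10, 4)                                   # one ground-truth text line
print(classify_region(gt, [(0, 0, 5, 4), (5, 0, 10, 4)]))  # split
print(classify_region(gt, [(0, 0, 10, 4)]))                # match
```

The symmetric test, one segment box covering several ground-truth boxes, flags a merge; accumulating these labels over a page yields both the global error measure and the per-segment diagnostics the abstract mentions.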
10.
Anand Mishra Karteek Alahari C. V. Jawahar 《International Journal on Document Analysis and Recognition》2017,20(2):105-121
Color and strokes are the salient features of text regions in an image. In this work, we use both these features as cues, and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background as we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image and street view text datasets, as well as full scene images containing text from ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements in performance under a variety of performance measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
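The method refits Gaussian mixture models for foreground and background on every graph-cut iteration. The alternating estimate-and-reassign structure behind that loop can be hinted at with a far simpler stand-in: two-means clustering of scalar intensities (no GMM, no graph cut; purely illustrative).

```python
def two_means(pixels, iters=10):
    """Lloyd's k-means with k=2 on scalar intensities.

    Returns (dark_center, bright_center). Assigning each pixel to the
    nearer center yields a binarization; re-estimating the centers from
    the current assignment loosely mirrors how the paper refits its
    per-class color models on each iteration.
    """
    c0, c1 = min(pixels), max(pixels)
    for _ in range(iters):
        g0 = [p for p in pixels if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in pixels if abs(p - c0) > abs(p - c1)]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return c0, c1

# Dark ink pixels near 5, bright background pixels near 200.
print(two_means([5, 6, 7, 4, 200, 210, 190, 205]))  # (5.5, 201.25)
```

The real algorithm replaces the hard nearest-center assignment with a graph cut that also enforces spatial smoothness, and replaces the single centers with full mixture models over color and stroke features.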
11.
Wolfgang Krüger 《Machine Vision and Applications》2001,13(1):38-50
This work investigates map-to-image registration for planar scenes in the context of robust parameter estimation. Registration
is posed as the problem of estimating a projective transformation which optimally aligns transformed model line segments from
a map with data line segments extracted from an image. Matching and parameter estimation is solved simultaneously by optimizing
an objective function which is based on M-estimators, and depends on overlap and the weighted orthogonal distance between
transformed model segments and data segments. An extensive series of registration experiments was conducted to test the performance
of the proposed parameter estimation algorithm. More than 200 000 registration experiments were run with different objective
functions for 12 aerial images and randomly corrupted maps distorted by randomly selected projective transformations.
Received: 10 August 2000 / Accepted: 29 January 2001
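An objective function "based on M-estimators" downweights gross mismatches between model and data segments. The standard Huber estimator, one common choice (the paper's exact rho-function is not given in the abstract), makes the mechanism concrete:

```python
def huber_rho(r, k=1.345):
    """Huber M-estimator loss: quadratic for small residuals, linear for
    large ones, so gross outliers contribute bounded influence rather
    than the unbounded quadratic penalty of least squares."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * (a - 0.5 * k)

def huber_weight(r, k=1.345):
    """Equivalent IRLS weight w(r) = psi(r) / r: 1 for inliers,
    decaying as k/|r| for outliers."""
    a = abs(r)
    return 1.0 if a <= k else k / a

print(huber_weight(0.5))    # 1.0  -- an inlier residual keeps full weight
print(huber_weight(13.45))  # 0.1  -- a gross outlier is heavily downweighted
```

In the registration setting, the residual r would be the overlap-weighted orthogonal distance between a transformed model segment and a data segment, so badly corrupted map segments cannot dominate the estimated projective transformation.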
12.
Abstract. Providing a customized result set based upon a user preference is the ultimate objective of many content-based image retrieval
systems. There are two main challenges in meeting this objective: First, there is a gap between the physical characteristics
of digital images and the semantic meaning of the images. Second, different people may have different perceptions of the
same set of images. To address both these challenges, we propose a model, named Yoda, that conceptualizes content-based querying
as the task of soft classifying images into classes. These classes can overlap, and their members are different for different
users. The “soft” classification is hence performed for each and every image feature, including both physical and semantic
features. Subsequently, each image will be ranked based on the weighted aggregation of its classification memberships. The
weights are user-dependent, and hence different users would obtain different result sets for the same query. Yoda employs
a fuzzy-logic based aggregation function for ranking images. We show that, in addition to some performance benefits, fuzzy
aggregation is less sensitive to noise and can support disjunctive queries as compared to weighted-average aggregation used
by other content-based image retrieval systems. Finally, since Yoda heavily relies on user-dependent weights (i.e., user profiles)
for the aggregation task, we utilize the users' relevance feedback to improve the profiles using genetic algorithms (GA).
Our learning mechanism requires fewer user interactions, and results in a faster convergence to the user's preferences as
compared to other learning techniques.
Correspondence to: Y.-S. Chen (E-mail: yishinc@usc.edu)
This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and
USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from NCR, Microsoft, and Okawa Foundation.
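The claim that fuzzy aggregation "can support disjunctive queries" where weighted averaging cannot is easy to make concrete. The scores below are invented for illustration; this is not Yoda's actual aggregation function.

```python
def weighted_average(scores, weights):
    """Weighted-average aggregation used by many retrieval systems."""
    return sum(w * x for w, x in zip(weights, scores)) / sum(weights)

def fuzzy_or(scores):
    """Standard fuzzy disjunction: an image matches if ANY class does."""
    return max(scores)

# Disjunctive query: "red OR striped". Membership scores for an image
# that is perfectly red but not striped at all:
scores = [0.9, 0.0]
print(weighted_average(scores, [1, 1]))  # 0.45 -- unfairly penalized
print(fuzzy_or(scores))                  # 0.9  -- matches the query intent
```

A weighted average dilutes a perfect match on one disjunct with the zero on the other, while the fuzzy max keeps it at the top of the result set; fuzzy min plays the dual role for conjunctive queries.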
13.
An architecture for handwritten text recognition systems (Cited by 1: 1 self-citation, 0 by others)
Gyeonghwan Kim Venu Govindaraju Sargur N. Srihari 《International Journal on Document Analysis and Recognition》1999,2(1):37-44
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system
are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation
of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line
detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word
gaps and isolating words from a line of text image obtained efficiently and in an intelligent manner; (iv) word recognition,
concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic
constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed
for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are
described in this paper. Preliminary experiments show promising results in terms of speed and accuracy.
Received October 30, 1998 / Revised January 15, 1999
14.
Katrin Franke Mario Köppen 《International Journal on Document Analysis and Recognition》2001,3(4):218-231
Computer-based forensic handwriting analysis requires sophisticated methods for the pre-processing of digitized paper documents,
in order to provide high-quality digitized handwriting, which represents the original handwritten product as accurately as
possible. Due to the requirement of processing a wide variety of document types, neither a standardized sequence of
processing stages, fixed parameter sets, nor fixed image operations are suitable for such pre-processing methods. Thus, we
present an open layered framework that covers adaptation abilities at the parameter, operator, and algorithm levels. Moreover,
an embedded module, which uses genetic programming, might generate specific filters for background removal on-the-fly. The
framework is understood as an assistance system for forensic handwriting experts and has been in use by the Bundeskriminalamt,
the federal police bureau in Germany, for two years. In the following, the layered framework will be presented, fundamental
document-independent filters for textured, homogeneous background removal and for foreground removal will be described, as
well as aspects of the implementation. Results of the framework-application will also be given.
Received July 12, 2000 / Revised October 13, 2000
15.
Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000
16.
Doe-Wan Kim Tapas Kanungo 《International Journal on Document Analysis and Recognition》2002,5(1):47-66
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition
(OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned
document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically
registered the original image to the rescanned one using four corner points and then transformed the original groundtruth
using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing
the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance
between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout
complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using
the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth
for microfilmed and FAXed versions of the University of Washington dataset documents.
Received: July 24, 2001 / Accepted: May 20, 2002
17.
Abstract. This paper presents a novel technique for detecting possible defects in two-dimensional wafer images with repetitive patterns
using prior knowledge. The technique has a learning ability that can create a golden-block database from the wafer image itself,
then modify and refine its content when used in further inspections. The extracted building block is stored as a golden block
for the detected pattern. When new wafer images with the same periodical pattern arrive, we do not have to recalculate their
periods and building blocks. A new building block can be derived directly from the existing golden block after eliminating
alignment differences. If the newly derived building block has better quality than the stored golden block, then the golden
block is replaced with the new building block. With the proposed algorithm, our implementation shows that a significant amount
of processing time is saved. Also, the storage overhead of golden templates is reduced significantly by storing golden blocks
only.
Received: 21 February 2001 / Accepted: 21 April 2002
Correspondence to: S.-U. Guan
18.
Jin Chen Daniel Lopresti George Nagy 《International Journal on Document Analysis and Recognition》2016,19(4):321-333
Many preprocessing techniques intended to normalize artifacts and clean noise induce anomalies in part due to the discretized nature of the document image and in part due to inherent ambiguity in the input image relative to the desired transformation. The potentially deleterious effects of common preprocessing methods are illustrated through a series of dramatic albeit contrived examples and then shown to affect real applications of ongoing interest to the community through three writer identification experiments conducted on Arabic handwriting. Retaining ruling lines detected by multi-line linear regression instead of repairing strokes broken by deleting ruling lines reduced the error rate by 4.5 %. Exploiting word position relative to detected rulings instead of ignoring it decreased errors by 5.5 %. Counteracting page skew by rotating extracted contours during feature extraction instead of rectifying the page image reduced the error by 1.4 %. All of these accuracy gains are shown to be statistically significant. Analogous methods are advocated for other document processing tasks as topics for future research.
19.
Nikolai Gorski Valery Anisimov Emmanuel Augustin Olivier Baret Sergey Maximov 《International Journal on Document Analysis and Recognition》2001,3(4):196-206
This paper presents the current state of the A2iA CheckReader™ – a commercial bank check recognition system. The system is designed to process the flow of payment documents associated
with the check clearing process: checks themselves, deposit slips, money orders, cash tickets, etc. It processes document
images and recognizes document amounts whatever their style and type – cursive, hand- or machine printed – expressed as numerals
or as phrases. The system is adapted to read payment documents issued in different English- or French-speaking countries.
It is currently in use at more than 100 large sites in five countries and processes daily over 10 million documents. The average
read rate at the document level varies from 65 to 85% with a misread rate corresponding to that of a human operator (1%).
Received October 13, 2000 / Revised December 4, 2000
20.
NeTra: A toolbox for navigating large image databases (Cited by 17: 0 self-citations, 17 by others)
We present here an implementation of NeTra, a prototype image retrieval system that uses color, texture, shape and spatial
location information in segmented image regions to search and retrieve similar regions from the database. A distinguishing
aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object- or region-based
search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects.
Images are segmented into homogeneous regions at the time of ingest into the database, and image attributes that represent
each of these regions are computed. In addition to image segmentation, other important components of the system include an
efficient color representation, and indexing of color, texture, and shape features for fast search and retrieval. This representation
allows the user to compose interesting queries such as “retrieve all images that contain regions that have the color of object
A, texture of object B, shape of object C, and lie in the upper part of the image”, where the individual objects could be regions
belonging to different images. A Java-based web implementation of NeTra is available at http://vivaldi.ece.ucsb.edu/Netra.