Similar Documents (20 results)
1.
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on the performance analysis of evolving segmentation algorithms. Using a standard document database maintained at universities and research laboratories helps to solve the problem of obtaining authentic data sources and other information, but methodologies are still needed for the performance analysis of segmentation. We describe a new document model in terms of a bounding-box representation of its constituent parts and suggest an empirical measure of the performance of a segmentation algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also produces segment-wise details of common segmentation problems such as horizontal and vertical splits and merges as well as invalid and mismatched regions. Received July 14, 2000 / Revised June 12, 2001
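To illustrate how a bounding-box overlap measure can flag split, merge, and invalid regions, here is a minimal sketch. It is not the authors' implementation: the 90% coverage threshold is an illustrative assumption, and the paper's distinction between horizontal and vertical splits is collapsed into a single "split" label.

```python
def area(b):
    x0, y0, x1, y1 = b
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap(a, b):
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    return area((ix0, iy0, ix1, iy1))

def classify(gt_boxes, det_boxes, cover=0.9):
    """Label ground-truth boxes as correct/split/missed and detected
    boxes as merged/invalid, based on mutual area overlap."""
    report = {"correct": 0, "split": 0, "missed": 0, "merged": 0, "invalid": 0}
    for g in gt_boxes:
        hits = [d for d in det_boxes if overlap(g, d) > 0]
        covered = sum(overlap(g, d) for d in hits) / area(g)
        if len(hits) == 1 and covered >= cover:
            report["correct"] += 1
        elif len(hits) > 1 and covered >= cover:
            report["split"] += 1    # one region broken into several detections
        else:
            report["missed"] += 1
    for d in det_boxes:
        hits = [g for g in gt_boxes if overlap(d, g) > 0]
        if len(hits) > 1:
            report["merged"] += 1   # one detection spans several regions
        elif not hits:
            report["invalid"] += 1  # detection with no ground-truth support
    return report
```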

2.
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements (the paper reports good default values for both, expressed relative to the image width), and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the background be uniform and that the text sit approximately horizontally; for a skew of up to about 10 degrees, no skew correction mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington and its performance has been evaluated by a suitable measure of segmentation accuracy. A detailed analysis of the segmentation accuracy achieved by the algorithm as a function of noise and skew has also been carried out. Received April 4, 1999 / Revised June 1, 1999
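A minimal sketch of the strip-wise detection step, assuming a binary NumPy image (1 = ink); the 50% strip overlap and the ink threshold below are illustrative guesses, not the paper's defaults (which were lost in extraction above):

```python
def line_pieces(binary, n_cols=16):
    """Detect candidate text-line pieces in overlapping vertical strips:
    within each strip, rows whose ink count exceeds a threshold form runs
    that are taken as pieces of text lines (to be merged bottom-up).
    binary: 2-D NumPy 0/1 array, 1 = ink."""
    h, w = binary.shape
    col_w = w // n_cols
    pieces = []
    for x in range(0, w - col_w + 1, col_w // 2):    # 50% overlap of strips
        strip = binary[:, x:x + col_w]
        profile = strip.sum(axis=1)                  # horizontal projection
        on = profile > 0.05 * col_w                  # rows with enough ink
        y = 0
        while y < h:
            if on[y]:
                y0 = y
                while y < h and on[y]:
                    y += 1
                pieces.append((x, y0, x + col_w, y))  # (x0, y0, x1, y1)
            y += 1
    return pieces
```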

3.
Document image processing is a crucial process in office automation and begins at the OCR phase, with the difficulties continuing into document 'analysis' and 'understanding'. This paper presents a hybrid and comprehensive approach to document structure analysis: hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features form the basis for potential conditions, which in turn are used to express fuzzy-matched rules in an underlying rule base. Rules can be formulated based on features observed within one specific layout object, but they can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows easy adaptation to specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common objects (e.g., lists). Received June 19, 2000 / Revised November 8, 2000

4.
Existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is based on the determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction, connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large set of various document images and a performance comparison with two Hough transform-based methods show good accuracy and robustness for our method. Received October 10, 1998 / Revised version September 9, 1999
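The core eigenvector step can be sketched in a few lines. This omits the paper's resolution reduction and fuzzy component classification, which filter the pixels before the estimate is taken:

```python
import numpy as np

def estimate_skew(binary):
    """Skew from the first eigenvector of the data covariance matrix:
    foreground pixel coordinates are treated as 2-D samples, and the
    dominant eigenvector gives the text-line direction.
    binary: 2-D 0/1 array, 1 = ink."""
    ys, xs = np.nonzero(binary)
    pts = np.stack([xs, ys]).astype(float)    # 2 x N coordinate samples
    evals, evecs = np.linalg.eigh(np.cov(pts))
    v = evecs[:, np.argmax(evals)]            # first (dominant) eigenvector
    return np.degrees(np.arctan2(v[1], v[0]))
```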

5.
6.
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. Received July 18, 2000 / Accepted October 4, 2001
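Graph probing can be illustrated with one simple probe family. The specific probes below (each node's label paired with its out-degree) and the dictionary encoding of the table DAG are illustrative assumptions, not the paper's definitions:

```python
from collections import Counter

def probe_signature(dag):
    """Summarize a table DAG by the multiset of answers to simple
    probes -- here, each node's (label, out-degree) pair.
    Hypothetical format: dag = {node_id: (label, [child_ids, ...])}."""
    return Counter((label, len(children)) for label, children in dag.values())

def probe_distance(dag_a, dag_b):
    """Number of probe answers on which the two graphs disagree
    (size of the symmetric difference of the two multisets)."""
    sig_a, sig_b = probe_signature(dag_a), probe_signature(dag_b)
    return sum(((sig_a - sig_b) + (sig_b - sig_a)).values())
```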

7.
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words using information gathered from multiple knowledge sources. The system is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is also presented. Received August 16, 2000 / Revised October 6, 2000
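Of the listed ingredients, approximate string matching via character n-grams is easy to sketch. The padding, n = 3, and Dice-coefficient scoring below are conventional choices, not necessarily the paper's; the full system additionally applies device mappings and Bayesian confusion statistics:

```python
def char_ngrams(word, n=3):
    """Character n-grams of a padded word."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Dice coefficient over the two character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / max(len(ga) + len(gb), 1)

def candidates(ocr_word, lexicon, k=5):
    """Rank lexicon words as correction candidates for an OCR token."""
    return sorted(lexicon, key=lambda w: ngram_similarity(ocr_word, w),
                  reverse=True)[:k]

# e.g. candidates("cornputer", ["computer", "commuter", "corner"])
# ranks "computer" first.
```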

8.
In this paper a system for the analysis and automatic indexing of imaged documents in high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine which overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in allowing unskilled users to define the indexes relevant to the document domains of their interest simply by presenting visual examples, and in applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to adapt dynamically to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents; several classes of documents are involved in these applications. The indexing strategy first classifies the document automatically, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, the document classification results fulfill the requirements of high-volume applications. Integration into production lines is under way. Received March 30, 2000 / Revised June 26, 2001

9.
The extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital style. A quick approach to detecting them is proposed here, based on global shape heuristics that hold for these styles in any font. Important words in a document are sometimes printed in a larger size as well, so a smart approach for determining font size is also presented. Detecting type styles helps to improve OCR performance, especially for reading italicized text. A further advantage of identifying word type styles and font size is discussed in the context of extracting (i) different logical labels and (ii) important terms from the document. Experimental results on the performance of the approach on a large number of good-quality as well as degraded document images are presented. Received July 12, 2000 / Revised October 1, 2000
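For flavour, here is a crude sketch of global-shape heuristics of this kind: ink density hints at bold, the slant of the per-row ink centroid hints at italics, and the box height proxies font size. The thresholds are illustrative guesses, not the paper's values:

```python
import numpy as np

def word_style(word_img, bold_density=0.35, italic_deg=8.0):
    """Heuristic style flags for a word image.
    word_img: 2-D 0/1 array, rows increasing downward."""
    h, w = word_img.shape
    density = word_img.sum() / float(h * w)        # bold text is denser
    rows, cols = np.nonzero(word_img)
    slope = np.polyfit(rows, cols, 1)[0] if len(rows) > 1 else 0.0
    slant = np.degrees(np.arctan(slope))           # centroid drift with height
    return {
        "bold": density > bold_density,
        "italic": abs(slant) > italic_deg,
        "size_px": h,                              # box height ~ font size
    }
```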

10.
Automatic acquisition of CAD models from existing objects requires accurate extraction of geometric and topological information from the input data. This paper presents a range image segmentation method based on the local approximation of scan lines. The method employs edge models that are capable of detecting noise pixels as well as position and orientation discontinuities of varying strengths. Region-based techniques are then used to achieve a complete segmentation. Finally, a geometric representation of the scene, in the form of a surface CAD model, is produced. Experimental results on a large number of real range images acquired by different range sensors demonstrate the efficiency and robustness of the method. Received: 1 August 2000 / Accepted: 23 January 2002 / Correspondence to: I. Khalifa

11.
A new thresholding method for document image binarization, called the noise attribute thresholding method (NAT), is presented in this paper. The method uses noise attribute features extracted from the images to select threshold values for image thresholding. These features are based on the properties of noise in the images and are independent of the strength of the signals (objects and background) in the image. A simple noise model is given to explain these noise properties. The NAT method has been applied to the problem of removing text and figures printed on the back of the paper. Conventional global thresholding methods cannot solve this kind of problem satisfactorily; experimental results show that the NAT method is very effective. Received July 05, 1999 / Revised July 07, 2000

12.
This paper presents the current state of the A2iA CheckReader™, a commercial bank check recognition system. The system is designed to process the flow of payment documents associated with the check clearing process: checks themselves, deposit slips, money orders, cash tickets, etc. It processes document images and recognizes document amounts whatever their style and type (cursive, hand-printed, or machine-printed), expressed as numerals or as phrases. The system is adapted to read payment documents issued in different English- or French-speaking countries. It is currently in use at more than 100 large sites in five countries and processes over 10 million documents daily. The average read rate at the document level varies from 65% to 85%, with a misread rate corresponding to that of a human operator (1%). Received October 13, 2000 / Revised December 4, 2000

13.
Transforming paper documents into XML format with WISDOM++
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems; applying OCR to some parts of the document image is only one of them. In fact, the generation of documents in HTML format is easier once the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using machine learning techniques, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported. Received June 15, 2000 / Revised November 7, 2000

14.
Computer-based forensic handwriting analysis requires sophisticated methods for pre-processing digitized paper documents, in order to provide high-quality digitized handwriting that represents the original handwritten product as accurately as possible. Because a huge number of different document types must be processed, neither a standardized queue of processing stages, nor fixed parameter sets, nor fixed image operations are adequate for such pre-processing. We therefore present an open layered framework that supports adaptation at the parameter, operator, and algorithm levels. Moreover, an embedded module that uses genetic programming can generate specific filters for background removal on the fly. The framework is understood as an assistance system for forensic handwriting experts and has been in use for two years at the Bundeskriminalamt, the federal police bureau in Germany. In the following, the layered framework is presented, fundamental document-independent filters for the removal of textured and homogeneous backgrounds and for foreground removal are described, and aspects of the implementation are discussed, together with results of applying the framework. Received July 12, 2000 / Revised October 13, 2000

15.
In this paper, we present a correlation scheme that incorporates a color ring-projection representation for the automatic inspection of defects in textured surfaces. The proposed color ring projection transforms a 2-D color image into a 1-D color pattern as a function of radius. For a search window of width W, data dimensionality is reduced from O(W²) in the 2-D image to O(W) in the 1-D ring-projection space, and the complexity of computing a correlation function is reduced accordingly. Since the color ring-projection representation is invariant to rotation, the proposed method can be applied to both isotropic and oriented textures at arbitrary orientations. Experiments on regular textured surfaces have shown the efficacy of the proposed method. Received: 30 March 2000 / Accepted: 24 July 2001 / Correspondence to: D.-M. Tsai (e-mail: iedmtsai@saturn.yzu.edu.tw)
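The ring-projection transform itself is compact. A minimal grayscale sketch follows; the paper's version operates per color channel and feeds the resulting 1-D patterns into a correlation function, both of which are omitted here:

```python
import numpy as np

def ring_projection(img, n_rings=None):
    """Average pixel values over concentric rings, turning a 2-D window
    into a 1-D pattern as a function of radius; rotating the window
    leaves the pattern unchanged. img: 2-D grayscale window."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx).astype(int)   # integer ring index
    n = n_rings or r.max() + 1
    sums = np.bincount(r.ravel(), weights=img.ravel().astype(float),
                       minlength=n)
    counts = np.bincount(r.ravel(), minlength=n)
    return sums[:n] / np.maximum(counts[:n], 1)  # mean value per ring
```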

16.
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of connected-component features, all of which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study in which subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000
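Since the scheme reduces to a feature vector per page plus a standard decision tree, the training loop is short to sketch. The feature names and values below are hypothetical, and scikit-learn stands in for whatever classifier implementation the authors used:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-page layout-feature vectors (names illustrative):
# [text_frac, nontext_frac, n_columns, median_font_pt, content_density]
X_train = [
    [0.78, 0.05, 2, 10, 0.61],   # e.g. a journal page
    [0.30, 0.55, 1, 14, 0.74],   # e.g. an advertisement
]
y_train = ["journal", "advertisement"]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[0.75, 0.08, 2, 10, 0.60]]))   # -> ['journal']
```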

17.
The performance of algorithms for extracting primitives for the interpretation of line drawings is usually affected by degradation of the information contained in the document, due to factors such as low print contrast, defocusing, and skew. In this paper, we propose two algorithms for the extraction of primitives that perform well under degradation. The application of the algorithms is restricted to line drawings composed of horizontal and vertical lines. The performance of the algorithms has been evaluated using a protocol described in the literature. Received: 6 August 1996 / Accepted: 16 July 1997

18.
Identifying facsimile duplicates using radial pixel densities
A method for detecting full-layout facsimile duplicates based on radial pixel densities is proposed. It caters for facsimiles including text and/or graphics, and pages may be positioned upright or inverted on the scanner bed. The method does not depend on the computation of text skew or text orientation. Using a database of original documents, 92% of non-duplicates and upright duplicates, as well as 89% of inverted duplicates, could be correctly identified. The method is vulnerable to double scanning, which occurs when documents are copied using a photocopier and the copies are subsequently transmitted using a facsimile machine. Received September 29, 2000 / Revised: August 23, 2001
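A minimal sketch of such a radial-density signature follows. Because the rings are concentric, an inverted (180-degree rotated) page yields the same signature, which matches the method's tolerance of inverted placements; the ring count and matching tolerance are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def radial_density_signature(page, n_rings=64):
    """Fraction of black pixels in concentric rings around the page
    centre. page: 2-D 0/1 array, 1 = ink."""
    h, w = page.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    ring = np.minimum((r / r.max() * n_rings).astype(int), n_rings - 1)
    ink = np.bincount(ring.ravel(), weights=page.ravel().astype(float),
                      minlength=n_rings)
    area = np.bincount(ring.ravel(), minlength=n_rings)
    return ink / np.maximum(area, 1)

def is_duplicate(sig_a, sig_b, tol=0.02):
    """Hypothetical decision rule: mean absolute signature difference."""
    return np.mean(np.abs(sig_a - sig_b)) < tol
```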

19.
Binarization of document images with poor contrast, strong noise, complex patterns, and variable modalities in the gray-scale histograms is a challenging problem. A new binarization algorithm has been developed to address this problem for personal cheque images. The main contribution of this approach is to optimize the binarization of the part of the document image that suffers from noise interference, referred to as the Target Sub-Image (TSI), using information easily extracted from another, noise-free part of the same image, referred to as the Model Sub-Image (MSI). Simple spatial features extracted from the MSI are used as a model for handwriting strokes. This model captures the underlying characteristics of the writing strokes, is invariant to handwriting style and content, and is used to guide the binarization in the TSI. Another contribution is a new technique for the structural analysis of document images, which we call “Wavelet Partial Reconstruction” (WPR). The algorithm was tested on 4,200 cheque images, and the results show a significant improvement in binarization quality in comparison with other well-established algorithms. Received: October 10, 2001 / Accepted: May 7, 2002. This research was supported in part by NCR and NSERC's industrial postgraduate scholarship No. 239464. A simplified version of this paper has been presented at ICDAR 2001 [3].
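The guiding idea, stripped to its simplest form, can be sketched as follows. This is only the MSI-to-TSI transfer in caricature: the actual algorithm uses richer spatial stroke features and wavelet partial reconstruction, and the parameter k below is an illustrative assumption:

```python
import numpy as np

def msi_guided_binarize(tsi, msi_stroke_pixels, k=2.0):
    """Threshold a noisy Target Sub-Image using stroke gray-level
    statistics estimated from the noise-free Model Sub-Image.
    tsi: grayscale Target Sub-Image (ink darker than background).
    msi_stroke_pixels: 1-D array of gray values sampled on MSI strokes."""
    mu, sigma = msi_stroke_pixels.mean(), msi_stroke_pixels.std()
    return tsi <= mu + k * sigma   # keep pixels consistent with the model
```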

20.
This paper provides an overview of a project aimed at using knowledge-based technology to improve the accessibility of the Web for visually impaired users. The focus is on the multi-dimensional components of Web pages (tables and frames); our cognitive studies demonstrate that spatial information is essential in comprehending tabular data, an aspect that has been largely overlooked in the existing literature. Our approach addresses these issues by using explicit representations of the navigational semantics of documents and a domain-specific language to query the semantic representation and derive navigation strategies. Navigational knowledge is explicitly generated and associated with the tabular and multi-dimensional HTML structures of documents. This semantic representation provides the blind user with an abstract representation of the layout of the document; the user can then issue commands from the domain-specific language to access and traverse the document according to its abstract layout. Published online: 6 November 2002
