Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
In this paper we consider a statistical approach to augmenting a limited database of ground-truth documents for use in evaluating optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents that prove effective for this purpose and, in some respects, indistinguishable from ground truth. The proposed method is validated through a rigorous statistical procedure. Received: March 30, 2000 / Revised: September 14, 2001
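The abstract leaves the resampling details out; as a rough, hypothetical illustration (block length and input sequence invented), a minimal moving-blocks bootstrap over a token sequence looks like this:

```python
import random

def moving_blocks_bootstrap(tokens, block_len, rng=random.Random(0)):
    """Resample overlapping blocks of consecutive tokens (with
    replacement) and concatenate them until the surrogate sequence
    reaches the original length."""
    n = len(tokens)
    blocks = [tokens[i:i + block_len] for i in range(n - block_len + 1)]
    surrogate = []
    while len(surrogate) < n:
        surrogate.extend(rng.choice(blocks))
    return surrogate[:n]

# Hypothetical example: build a surrogate from a ground-truth word sequence.
words = "the quick brown fox jumps over the lazy dog".split()
print(moving_blocks_bootstrap(words, block_len=3))
```

Resampling whole blocks preserves local structure (here, short word contexts), which is what lets the surrogates behave like real documents.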

2.
This paper describes a performance evaluation study in which several efficient classifiers are tested on handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved at moderate memory and computation cost. Performance is measured in terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of the neural classifiers is enhanced by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. The test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest-neighbor (1-NN) rule and regularized discriminant analysis (RDA). Neural classifiers are shown to be more susceptible to small sample sizes than MQDF, although they yield higher accuracies on large sample sizes. Among the neural classifiers, the polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages and should be appropriately combined to achieve higher performance. Received: July 18, 2001 / Accepted: September 28, 2001
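For readers unfamiliar with MQDF, here is a minimal sketch of MQDF2 as it is commonly formulated: per-class mean, top-k covariance eigenpairs, and minor eigenvalues replaced by a constant. The choice of k and of the constant delta below is an assumption, not the paper's.

```python
import numpy as np

def fit_mqdf(X, y, k, delta=None):
    """Per class: mean and top-k eigenpairs of the covariance; the
    minor eigenvalues are replaced by a constant delta."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))   # ascending order
        w, V = w[::-1][:k], V[:, ::-1][:, :k]
        models[c] = (mu, w, V, delta if delta is not None else w[-1])
    return models

def mqdf_score(x, mu, w, V, delta):
    """Smaller is better: Mahalanobis terms on the principal axes plus
    a uniformly weighted residual in the minor subspace."""
    diff = x - mu
    proj = V.T @ diff
    maj = np.sum(proj ** 2 / w)
    res = (diff @ diff - np.sum(proj ** 2)) / delta
    return maj + res + np.sum(np.log(w)) + (diff.size - w.size) * np.log(delta)

def predict(models, x):
    return min(models, key=lambda c: mqdf_score(x, *models[c]))

# Toy usage on well-separated synthetic Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(2, 1, (200, 8))])
y = np.array([0] * 200 + [1] * 200)
models = fit_mqdf(X, y, k=3)
print(predict(models, X[0]), predict(models, X[-1]))   # expected: 0 1
```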

3.
Perceptual correction for colour grading of random textures
We present a method of colour shade grading for industrial inspection of random textures whose differences are at the threshold of human perception. The method uses image restoration techniques to recover an unblurred version of the image, and then blurs it the way the human visual system does, to emulate the image being captured by the human sensor. Subsequently, the colour image is transformed into a perceptually uniform colour space, where colour grading takes place. Received: 10 October 1998 / Accepted: 21 March 2000
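The abstract does not name the uniform colour space; as an illustrative assumption, the sketch below uses CIELAB with the Euclidean ΔE*ab difference, a common choice for near-threshold colour comparisons (the sample colours are invented):

```python
import numpy as np

def srgb_to_lab(rgb):
    """sRGB in [0, 1] -> CIELAB (D65 white point); Euclidean distance
    in Lab approximates perceptual colour difference (Delta E*ab)."""
    rgb = np.asarray(rgb, dtype=float)
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ M.T / np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e(rgb1, rgb2):
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))

print(delta_e([0.60, 0.45, 0.30], [0.61, 0.45, 0.30]))   # near-threshold shades
```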

4.
In the literature, many feature types have been proposed for document classification, but an extensive and systematic evaluation of the various approaches has not yet been done. In particular, evaluations on OCR documents are very rare. In this paper we investigate seven text representations based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts in two domains: business letters and abstracts of technical reports. Our results indicate that n-grams are an attractive technique that is competitive even with techniques relying on morphological analysis. This holds for OCR texts as well as for correct ASCII texts. Received: February 17, 1998 / Revised: April 8, 1998
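A toy illustration of why character n-grams tolerate the isolated errors OCR introduces (texts and errors invented):

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-gram counts over the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

clean = "annual report on revenue"
ocr   = "annuai repor+ on revenue"   # hypothetical OCR misreads
a, b = char_ngrams(clean, 3), char_ngrams(ocr, 3)
shared = sum((a & b).values())       # multiset intersection
print(f"{shared} of {sum(a.values())} trigrams survive the OCR noise")
```

Each single-character error corrupts at most n of the n-grams, so most entries of the document's feature vector are preserved, whereas a whole-word feature is lost outright.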

5.
Stop word location and identification for adaptive text recognition
We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy exploits the linguistic observation that over half of all words in a typical English passage come from a small set of fewer than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary text page first goes through layout analysis, which produces a word segmentation. A fast procedure is then applied to locate the most likely candidates for those words, using only the widths of the word images. The identity of each word is determined using a word shape classifier. From the word images together with their identities, character prototypes can be extracted using a previously proposed method. We describe experiments using simulated and real images. In an experiment on 400 real page images, we show that on average eight distinct characters can be learned from each page, and the method succeeds on 90% of the pages. These characters can serve as useful seeds to bootstrap font learning. Received: October 8, 1999 / Revised: March 29, 2000
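The paper's width-based location step is certainly more refined than this, but as a rough sketch under strong, invented assumptions (expected word width ≈ character count × an average character width; all numbers made up):

```python
# Tiny stop-word dictionary with expected widths in average-character units.
STOP_WORDS = {"the": 3, "of": 2, "and": 3, "to": 2, "in": 2, "that": 4}

def locate_candidates(word_widths_px, unit_px, tol=0.25):
    """Flag word images whose width matches some stop word's expected
    width to within `tol` average-character widths."""
    hits = []
    for idx, w in enumerate(word_widths_px):
        chars = w / unit_px
        matches = [s for s, n in STOP_WORDS.items() if abs(chars - n) <= tol]
        if matches:
            hits.append((idx, matches))
    return hits

# Hypothetical word-image widths from layout analysis, 11 px per character.
print(locate_candidates([34, 61, 22, 88, 30], unit_px=11.0))
```

The candidates flagged this way would then go to the word shape classifier, which resolves which stop word, if any, each image actually is.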

6.
In this paper, we present a two-stage HMM-based recognition method that compensates for the loss in recognition performance caused by the necessary trade-off between segmentation and recognition in an implicit segmentation-based strategy. The first stage is an implicit segmentation process that takes contextual information into account to produce multiple segmentation-recognition hypotheses for a given preprocessed string. These hypotheses are verified and re-ranked in a second stage by an isolated-digit classifier. The method enables the use of two sets of features and numeral models: one accounting for both the segmentation and recognition aspects of the implicit segmentation-based strategy, and the other considering only the recognition of isolated digits. The two stages are complementary: the verification stage recovers the performance lost to the first stage's segmentation-recognition trade-off. Experiments on 12,802 handwritten numeral strings of different lengths show that the two-stage recognition strategy is promising. The verification stage brought an average improvement of 9.9% in the string recognition rates; on touching digit pairs, the method achieved a recognition rate of 89.6%. Received: June 28, 2002 / Revised: July 3, 2002
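The abstract does not give the combination rule; the sketch below shows a hypothetical verification-style re-ranking in which an isolated-digit classifier's probabilities are blended with the HMM path score (the weight `alpha`, the toy verifier, and all scores are invented):

```python
def rerank(hypotheses, verify, alpha=0.5):
    """Re-rank segmentation-recognition hypotheses by blending the HMM
    path score with the isolated-digit verifier's probability for each
    segmented digit (alpha is a made-up interpolation weight)."""
    ranked = []
    for digits, images, hmm_score in hypotheses:
        v = 1.0
        for d, img in zip(digits, images):
            v *= verify(img).get(d, 1e-9)          # P(digit | image)
        ranked.append((hmm_score ** alpha * v ** (1 - alpha), digits))
    return sorted(ranked, reverse=True)

# Toy verifier: image "i1" is an ambiguous 1/7, image "i7" a clear 7.
toy_scores = {"i1": {"1": 0.55, "7": 0.40}, "i7": {"7": 0.95}}
hyps = [("17", ["i1", "i7"], 0.8), ("77", ["i1", "i7"], 0.9)]
print(rerank(hyps, lambda img: toy_scores[img]))   # verification prefers "17"
```

In this toy run the HMM alone prefers "77", but the isolated-digit verifier's evidence flips the ranking, which is exactly the complementary behaviour the paper reports.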

7.
We present a method of colour shade grading for industrial inspection of surfaces whose differences are at the threshold of human perception. The method converts the input data from the electronic sensor into the corresponding data as they would have been seen by the human visual system. Their differences are then computed in a perceptually uniform colour space, approximating the way human experts would grade the product. The transformation from the electronic sensor to the human sensor uses synthetic metameric data to determine the transformation parameters. The method has been tested on real data. Received: 17 November 1997 / Accepted: 15 September 1998

8.
We present a novel technique, called the 2-Phase Service Model, for streaming videos to home users in a limited-bandwidth environment. The scheme first delivers a number of non-adjacent data fragments to the client in Phase 1; the missing fragments are then transmitted in Phase 2 while the client plays back the video. This approach offers many benefits. The isochronous bandwidth required for Phase 2 can be kept within the capability of the transport medium. The data fragments received during Phase 1 can be used to provide an excellent preview of the video, and they also facilitate VCR-style operations such as fast-forward and fast-reverse. Systems designed on this basis are less expensive because separate fast-forward and fast-reverse versions of the video files are no longer needed; eliminating these files also improves system performance because mapping between the regular files and their fast-forward and fast-reverse versions is no longer part of the VCR operations. Furthermore, since each client machine handles its own VCR-style interaction, the technique is very scalable. We provide simulation results showing that the 2-Phase Service Model handles VCR functions efficiently. We also implement a video player called FRVplayer. With this prototype, we judge the visual quality of the previews and VCR-style operations to be excellent. These features are essential to many important applications, and we discuss the use of FRVplayer in the design of a video management system, called VideoCenter, intended for Internet applications such as digital video libraries.
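The fragment-selection policy is not spelled out in the abstract; as a hypothetical illustration, the sketch below prefetches every k-th fragment in Phase 1 and streams the rest in playback order in Phase 2 (the spacing k is invented):

```python
def schedule_fragments(n_fragments, k):
    """Phase 1 prefetches every k-th fragment; Phase 2 streams the
    remaining fragments in playback order while the video plays."""
    phase1 = list(range(0, n_fragments, k))
    phase2 = [i for i in range(n_fragments) if i % k != 0]
    return phase1, phase2

p1, p2 = schedule_fragments(n_fragments=12, k=4)
print("prefetched:", p1)   # evenly spaced fragments double as a preview
print("streamed:  ", p2)
```

Evenly spaced prefetched fragments are what make the preview and the fast-forward/fast-reverse emulation possible without separate trick-play files.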

9.
Dot-matrix text recognition is a difficult problem, especially when characters are broken into several disconnected components. We present a dot-matrix text recognition system that exploits the fact that dot-matrix fonts are fixed-pitch in order to overcome the difficulty of the segmentation process. After finding the most likely pitch of the text, a decision is made as to whether the text is written in a fixed-pitch or proportional font. Fixed-pitch text is segmented using a pitch-based segmentation process that can successfully segment both touching and broken characters. We report performance results for the pitch estimation, fixed-pitch decision, segmentation, and recognition processes. Received: October 18, 1999 / Revised: April 21, 2000
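The pitch estimator itself is not described in the abstract; one plausible minimal sketch autocorrelates the vertical projection profile of a text line (the profile and pitch range below are invented):

```python
import numpy as np

def estimate_pitch(column_ink, min_pitch=4, max_pitch=60):
    """Estimate character pitch as the lag maximising the
    autocorrelation of the vertical projection profile."""
    x = np.asarray(column_ink, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lags = np.arange(min_pitch, min(max_pitch, len(ac) - 1))
    return int(lags[np.argmax(ac[lags])])

# Hypothetical fixed-pitch line: ink peaks repeating every 8 columns.
profile = np.tile([0, 1, 3, 5, 5, 3, 1, 0], 10)
print(estimate_pitch(profile))   # -> 8
```

A periodicity estimate of this kind works on broken characters too, since it needs only the ink distribution per column, not connected components.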

10.
In this paper we describe the connectionist classification engine of an OCR system. The engine is based on a new modular connectionist architecture in which a multilayer perceptron (MLP) acting as a classifier is combined with a set of autoassociators, one per class, trained to copy the input to the output layer. The MLP-based classifier selects a small group of high-scoring classes, which are then verified by the corresponding autoassociators. The learning samples used to train the classifiers are constructed by a synthetic noise generator, starting from a few grey-level characters labeled by the user. We report experimental results comparing three neural architectures: an MLP-based classifier, an autoassociator-based classifier, and the proposed combined architecture. The experiments show that the proposed architecture performs best, without significantly increasing the computational burden. Received: March 6, 2000 / Revised: July 12, 2000
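As a sketch of the verification idea only (the "autoassociators" here are toy projections, not trained networks), the shortlisted class whose autoassociator reconstructs the input best wins:

```python
import numpy as np

def verify_with_autoassociators(x, shortlist, autoassociators):
    """Accept the shortlisted class whose autoassociator reconstructs
    the input with the lowest error."""
    errors = {c: np.linalg.norm(x - autoassociators[c](x)) for c in shortlist}
    return min(errors, key=errors.get)

# Toy stand-ins: each "autoassociator" projects onto its class prototype.
protos = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
autoassoc = {c: (lambda v, p=p: p * (v @ p)) for c, p in protos.items()}
print(verify_with_autoassociators(np.array([0.9, 0.2]), ["a", "b"], autoassoc))
```

A class-specific autoassociator reconstructs inputs from its own class well and everything else poorly, which is why reconstruction error is a usable verification score.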

11.
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words using information gathered from multiple knowledge sources. The system is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is also presented. Received: August 16, 2000 / Revised: October 6, 2000
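The paper's Bayesian combination of knowledge sources is not spelled out in the abstract; a generic noisy-channel stand-in (toy channel model and word frequencies invented) scores each candidate by confusion likelihood times prior:

```python
def rank_candidates(ocr_word, candidates, channel_prob, word_freq):
    """Noisy-channel ranking: score candidate c by P(ocr_word | c) * P(c)."""
    total = sum(word_freq.values())
    scored = [(channel_prob(ocr_word, c) * word_freq.get(c, 1) / total, c)
              for c in candidates]
    return sorted(scored, reverse=True)

def channel_prob(observed, candidate):
    """Toy channel model: likelihood decays with character mismatches."""
    if len(observed) != len(candidate):
        return 1e-6
    mismatches = sum(a != b for a, b in zip(observed, candidate))
    return 0.9 ** (len(observed) - mismatches) * 0.05 ** mismatches

freqs = {"form": 120, "farm": 40, "foam": 15}
print(rank_candidates("forn", ["form", "farm", "foam"], channel_prob, freqs))
```

In a full system the flat per-character penalty would be replaced by learned confusion probabilities, which is what the collection- and document-level learning feature supplies.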

12.
13.
Automatic text segmentation and text recognition for video indexing
Efficient indexing and retrieval of digital video is an important function of video databases, and the text appearing in videos is a powerful index for retrieval that enables content-based browsing. We present new methods for the automatic segmentation of text in digital videos. The proposed algorithms exploit typical characteristics of text in videos to improve segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video, and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is passed directly to a standard OCR software package to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced and used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
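The integration of a character's bitmaps over time is only summarized here; as a minimal sketch, assuming the per-frame bitmaps are already tracked and aligned, a pixel-wise median suppresses transient background clutter (data and noise model invented):

```python
import numpy as np

def integrate_frames(frames):
    """Combine aligned bitmaps of the same text region into one cleaner
    bitmap: the pixel-wise median keeps static text pixels and rejects
    clutter that appears in only a minority of frames."""
    return np.median(np.stack(frames, axis=0).astype(float), axis=0)

# Hypothetical 4x6 text patch seen in five frames with sparse additive noise.
rng = np.random.default_rng(0)
clean = (rng.random((4, 6)) > 0.5).astype(float)
frames = [np.clip(clean + (rng.random((4, 6)) < 0.15), 0, 1) for _ in range(5)]
print((integrate_frames(frames) == clean).mean())   # fraction of pixels recovered
```

Because the text is static over its duration while the background moves, every extra frame strengthens the text pixels relative to the clutter.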

14.
Optical character reader (OCR) misrecognition is a serious problem when OCR-recognized text is used for retrieval in digital libraries. We have proposed fuzzy retrieval methods that, instead of correcting the errors manually, assume that errors remain in the recognized text, thereby reducing costs. The proposed methods generate multiple search terms for each input query term by referring to confusion matrices, which store all characters likely to be misrecognized together with the probability of each misrecognition. The methods can improve recall without decreasing precision. However, a few million search terms are occasionally generated in English-text fuzzy retrieval, with an intolerable effect on retrieval speed. This paper therefore presents two remedies that reduce the number of generated search terms while maintaining retrieval effectiveness: the first restricts the number of errors allowed in each expanded search term, while the second introduces a validity value different from our conventional one. Experimental results indicate that the former remedy reduces the number of terms to about 50 and the latter to no more than 20. Received: 18 December 1998 / Revised: 31 May 1999
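As an invented toy version of the first remedy, the sketch below expands a query term through a character confusion matrix, capping the number of substitutions per variant and pruning low-probability variants (the matrix, cap, and threshold are all assumptions):

```python
def expand_term(term, confusions, max_errors=2, min_prob=0.01):
    """Generate search-term variants by substituting characters the OCR
    engine is known to confuse, with at most `max_errors` substitutions
    per variant and a probability floor of `min_prob`."""
    variants = {term: 1.0}
    for i, ch in enumerate(term):
        new = {}
        for v, p in variants.items():
            errs = sum(a != b for a, b in zip(v, term))
            for sub, q in confusions.get(ch, {}).items():
                if errs < max_errors and p * q >= min_prob:
                    new[v[:i] + sub + v[i + 1:]] = p * q
        variants.update(new)
    return variants

# Hypothetical confusion matrix: P(recognised character | true character).
conf = {"l": {"1": 0.2, "i": 0.1}, "o": {"0": 0.15}}
print(expand_term("logo", conf, max_errors=1))   # the original term plus 4 variants
```

The error cap bites hard because the number of variants otherwise grows roughly multiplicatively in the number of confusable positions, which is how millions of terms arise.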

15.
A labelling approach for the automatic recognition of tables of contents (ToC) is described in this paper. A prototype is used for electronic consultation of scientific papers in a digital library system named Calliope. The method operates on a roughly structured ASCII file produced by OCR. The recognition approach labels the text without using any a priori model. Labelling is based on part-of-speech (PoS) tagging, which is initiated by a primary labelling of text components using specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to the article fields "title" and "authors". Non-labelled tokens are integrated into one field or the other either by applying PoS correction rules or by using a structure model generated from well-detected articles. The prototype performs very well on different ToC layouts and character recognition qualities. Without manual intervention, a 96.3% rate of correct segmentation was obtained on 38 journals comprising 2,020 articles, together with a 93.0% rate of correct field extraction. Received: April 5, 2000 / Revised: February 19, 2001

16.
The Levenshtein distance between two words is the minimal number of insertions, deletions, or substitutions needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite-state automata that recognize the set of all words V whose Levenshtein distance from W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary implemented as a trie or a finite-state automaton, the Levenshtein automaton for W can be used to control search in the lexicon so that exactly those lexical words V are generated whose Levenshtein distance from W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and achieves even better efficiency. Evaluation results are given, including variants of both methods based on modified Levenshtein distances with further primitive edit operations (transpositions, merges, and splits). Received: 13 February 2002 / Accepted: 13 March 2002
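A minimal sketch of the effect (not of the paper's automaton construction): walking a dictionary trie while keeping one Levenshtein DP row per node, and pruning any branch whose row minimum exceeds n, generates exactly the lexicon words within distance n of W. The lexicon and query are invented:

```python
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = True                      # end-of-word marker
    return root

def search(trie, word, n):
    """Trie walk with per-node Levenshtein DP rows, pruned at bound n."""
    results = []

    def visit(node, prefix, prev):
        if None in node and prev[-1] <= n:
            results.append((prefix, prev[-1]))
        for ch, child in node.items():
            if ch is None:
                continue
            row = [prev[0] + 1]
            for j, wc in enumerate(word, 1):
                row.append(min(row[j - 1] + 1,             # insertion
                               prev[j] + 1,                # deletion
                               prev[j - 1] + (wc != ch)))  # substitution / match
            if min(row) <= n:                  # some extension may still succeed
                visit(child, prefix + ch, row)

    visit(trie, "", list(range(len(word) + 1)))
    return results

lexicon = build_trie(["form", "from", "forum", "farm", "fort"])
print(search(lexicon, "frm", 1))   # words within Levenshtein distance 1
```

The precomputed deterministic automaton of the paper replaces the per-node DP row with a single table lookup, which is where its speed advantage comes from.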

17.
In this paper, we consider the general problem of technical document interpretation, as applied to the documents of the French telephone operator France Télécom. More precisely, we focus on the computation of a new set of features allowing the classification of multioriented and multiscaled patterns. This set of invariants is based on the Fourier–Mellin transform. The interest of this approach lies in the excellent classification rate obtained with the method, and in using the Fourier–Mellin transform in a "filtering mode", with which we can address the well-known and difficult problem of connected character recognition.
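The discrete approximation below is a common way to realise Fourier–Mellin-style invariants, shown here as a sketch (grid sizes and test image invented): resample the image to log-polar coordinates, where rotation and scaling become shifts, then take the 2-D FFT magnitude, which is invariant to circular shifts:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def fourier_mellin_invariants(img, n_r=32, n_theta=32):
    """Log-polar resampling turns rotation into a shift along theta and
    scaling into a shift along log-r; the 2-D FFT magnitude of the
    log-polar image is therefore (approximately) invariant to both."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    log_r = np.linspace(0, np.log(min(cy, cx)), n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr = np.exp(log_r)[:, None]
    ys = cy + rr * np.sin(theta)[None, :]
    xs = cx + rr * np.cos(theta)[None, :]
    logpolar = map_coordinates(img, [ys, xs], order=1)
    return np.abs(np.fft.fft2(logpolar))

img = np.random.default_rng(0).random((64, 64))
f1 = fourier_mellin_invariants(img)
f2 = fourier_mellin_invariants(np.rot90(img))   # rotate by 90 degrees
print(np.allclose(f1, f2))                      # -> True (rotation invariance)
```

Scale invariance is only approximate on a finite grid, since scaling shifts content off the ends of the log-r axis rather than wrapping it around.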

18.
Extraction of meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital styles. A quick approach to detecting them is proposed here, based on global shape heuristics for these styles in any font. Important words in a document are sometimes printed in a larger size as well, so a smart approach to determining font size is also presented. Detection of type styles helps improve OCR performance, especially for reading italicized text. A further advantage of identifying word type styles and font size is discussed in the context of extracting (i) different logical labels and (ii) important terms from the document. Experimental results on a large number of good-quality as well as degraded document images are presented. Received: July 12, 2000 / Revised: October 1, 2000
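The shape heuristics themselves are not given in the abstract; one common italics cue is dominant stroke slant. The sketch below estimates slant by shearing a binary word image and keeping the angle that yields the most concentrated vertical projection (the image, angle range, and scoring are all invented):

```python
import numpy as np

def dominant_slant(bitmap, angles=np.linspace(-30, 30, 61)):
    """Shear a binary word image over a range of angles and return the
    angle whose vertical projection is most concentrated; a clearly
    non-zero result is a cue for an italic style."""
    h, w = bitmap.shape
    ys = np.arange(h)[:, None]
    best, best_score = 0.0, -np.inf
    for a in angles:
        shift = np.tan(np.radians(a)) * (ys - h / 2)
        cols = (np.arange(w)[None, :] - shift).round().astype(int)
        proj = np.zeros(w + h)                       # generous margin
        np.add.at(proj, np.clip(cols + h // 2, 0, w + h - 1), bitmap)
        if (score := np.sum(proj ** 2)) > best_score:
            best, best_score = a, score
    return best

# Hypothetical glyph: a vertical bar sheared by ~15 degrees (italic-like).
img = np.zeros((20, 20))
for y in range(20):
    img[y, 10 + int(round(np.tan(np.radians(15)) * (y - 10)))] = 1.0
print(dominant_slant(img))   # -> about 15.0
```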

19.
Recognizing acronyms and their definitions
This paper introduces an automatic method for finding acronyms and their definitions in free text. The method is based on an inexact pattern-matching algorithm applied to the text surrounding each possible acronym. Evaluation shows both high recall and high precision on a set of documents randomly selected from a larger set of full-text documents. Received: October 1, 1997 / Revised: September 8, 1998
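A deliberately simplified stand-in for the paper's inexact matching (the regex, window size, and greedy alignment are assumptions): align the letters of a parenthesized acronym with the initial letters of the preceding words.

```python
import re

def find_acronyms(text, window=2):
    """Match each letter of an (ACRONYM) candidate, right to left,
    against initial letters of the preceding words."""
    results = {}
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        acro = m.group(1)
        words = text[:m.start()].split()[-(len(acro) * window):]
        i, defn_start = len(acro) - 1, None
        for j in range(len(words) - 1, -1, -1):
            if i >= 0 and words[j][:1].lower() == acro[i].lower():
                defn_start, i = j, i - 1
        if i < 0 and defn_start is not None:
            results[acro] = " ".join(words[defn_start:])
    return results

print(find_acronyms("The optical character recognition (OCR) module ..."))
```

An inexact matcher generalizes this by allowing letters to match inside words and tolerating skipped words, which is what lifts recall on free text.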

20.
The automatic extraction and recognition of news captions and annotations can be of great help in locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address the associated problems, describe how Video OCR operates, and suggest applications in digital news archives. To solve two problems of character recognition in video, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multi-frame integration, and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties, and a language-based post-processing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
