Similar Documents
20 similar documents found.
1.
A new thresholding method, called the noise attribute thresholding (NAT) method, for document image binarization is presented in this paper. The method uses noise attribute features extracted from the images to select threshold values for binarization. These features are based on the properties of noise in the images and are independent of the strength of the signals (objects and background) in the image. A simple noise model is given to explain these noise properties. The NAT method has been applied to the problem of removing text and figures printed on the back of the paper, a problem that conventional global thresholding methods cannot solve satisfactorily. Experimental results show that the NAT method is very effective. Received July 05, 1999 / Revised July 07, 2000
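The abstract does not give the exact noise attribute features, so the following is only a minimal Python sketch of the general pattern it describes: score each candidate global threshold by a noise statistic of the resulting binarization rather than by signal strength, and keep the least noisy one. The speckle-count measure here is a hypothetical stand-in, not the authors' feature.

```python
import numpy as np

def noise_score(binary):
    # Assumed noise attribute: the number of isolated foreground pixels
    # (set pixels with no 4-connected foreground neighbor), i.e., speckle.
    b = binary.astype(np.uint8)
    up = np.roll(b, 1, axis=0);    up[0, :] = 0
    down = np.roll(b, -1, axis=0); down[-1, :] = 0
    left = np.roll(b, 1, axis=1);  left[:, 0] = 0
    right = np.roll(b, -1, axis=1); right[:, -1] = 0
    return int(np.sum((b == 1) & ((up + down + left + right) == 0)))

def select_threshold(gray, candidates=range(16, 240, 8)):
    # Score each candidate threshold by the noisiness of its binarization
    # and keep the least noisy one, independent of signal strength.
    return min(candidates, key=lambda t: noise_score(gray < t))
```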

2.
For document images corrupted by various kinds of noise, directly binarized images may be severely blurred and degraded. A common treatment for this problem is to pre-smooth input images using noise-suppressing filters. This article proposes an image-smoothing method for prefiltering document images before binarization. Conceptually, we propose that the influence range of each pixel on its neighbors should depend on local image statistics. Technically, we suggest using coplanar matrices to capture the structural and textural distribution of similar pixels at each site. This property adapts the smoothing process to the contrast, orientation, and spatial size of local image structures. Experimental results demonstrate the effectiveness of the proposed method, which compares favorably with existing methods in reducing noise and preserving image features. In addition, due to the adaptive definition of similar pixels, the filter output is more robust to different noise levels than that of existing methods. Received: October 31, 2001 / October 09, 2002 Correspondence to: L. Fan (e-mail: fanlixin@ieee.org)

3.
The automation of business form processing is attracting intensive research interest because of its wide applicability and its potential to reduce the heavy workload of manual processing. Preparing clean and clear images for the recognition engines is often taken for granted as a trivial task that requires little attention. In reality, handwritten data usually touch or cross the preprinted form frames and text, creating tremendous problems for the recognition engines. In this paper, we contribute answers to two questions: “Why do we need cleaning and enhancement procedures in form processing systems?” and “How can we clean and enhance the hand-filled items with easy implementation and high processing speed?” We propose a generic system comprising only cleaning and enhancing phases. In the cleaning phase, the system registers a template to the input form by aligning corresponding landmarks. A unified morphological scheme is proposed to remove the form frames and restore the broken handwriting from gray or binary images. When the handwriting touches or crosses preprinted text, morphological operations based on statistical features are used to clean it. In applications where a black-and-white scanning mode is adopted, handwriting may contain broken or hollow strokes due to improper thresholding parameters, so we have designed a module that enhances the image quality using morphological operations. Subjective and objective evaluations show the effectiveness of the proposed procedures. Received January 19, 2000 / Revised March 20, 2001
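The abstract names the operations but not their parameters; a rough OpenCV sketch of morphology-based frame removal followed by stroke restoration might look like the following. The kernel shapes and sizes are assumptions, not the paper's unified scheme.

```python
import cv2

def remove_frames_and_restore(binary):
    # binary: uint8 image with foreground = 255.
    # Extract long horizontal and vertical runs (the preprinted frame lines).
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (51, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 51))
    frames = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel) | \
             cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    # Remove the frames; handwriting crossing them is cut at this point.
    cleaned = cv2.subtract(binary, frames)
    # Restore strokes broken by the removal with a small closing.
    patch = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, patch)
```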

4.
Image-based animation of facial expressions
We present a novel technique for creating realistic facial animations given a small number of real images and a few parameters for the in-between images. This scheme can also be used for reconstructing facial movies, where the parameters can be automatically extracted from the images. The in-between images are produced without ever generating a three-dimensional model of the face. Since facial motion due to expressions is not well defined mathematically, our approach is based on utilizing image patterns in facial motion. These patterns were revealed by an empirical study which analyzed and compared image motion patterns in facial expressions. The major contribution of this work is showing how parameterized “ideal” motion templates can generate facial movies for different people and different expressions, where the parameters are extracted automatically from the image sequence. To test the quality of the algorithm, image sequences (one of which was taken from a TV news broadcast) were reconstructed, yielding movies hardly distinguishable from the originals. Published online: 2 October 2002 Correspondence to: A. Tal Work has been supported in part by the Israeli Ministry of Industry and Trade, The MOST Consortium

5.
Recent remarkable progress in computer systems and printing devices has made it easier to produce printed documents with various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs, computer graphics, etc. Some methods have been developed for extracting character patterns from document images and scene images with complex backgrounds. However, the previous methods are suitable only for extracting rather large characters, and they often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document images. The performance on small character patterns has been improved by suppressing the influence of mixed-color pixels around character edges. Experimental results show that the method is capable of extracting very small character patterns from main text blocks in various documents, separating characters from complex backgrounds, as long as the character strokes are thicker than about 1.5 pixels. Received July 23, 2001 / Accepted November 5, 2001
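The paper's exact local thresholding is not reproduced here; as a rough Python sketch of the named pipeline, assuming per-tile multi-Otsu as a stand-in for the local multilevel thresholding and leaving region growing as the next step:

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_multiotsu

def extract_character_candidates(gray, tile=64):
    # Sketch: per-tile three-level thresholding; the darkest level is kept
    # as character-pixel candidates, then labeled into connected regions.
    # The tile size and the use of multi-Otsu are assumptions.
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = gray[y:y + tile, x:x + tile]
            if np.unique(block).size < 3:      # flat block: skip
                continue
            levels = threshold_multiotsu(block, classes=3)
            mask[y:y + tile, x:x + tile] = block < levels[0]
    labels, n = ndimage.label(mask)            # pixel labeling
    return labels, n                           # regions for growing/merging
```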

6.
李良华, 罗彬杰. 《计算机科学》, 2009, 36(12): 282-284
Motivated by the need to extract specific targets during the preprocessing stage of Chinese check recognition, this paper studies the effectiveness of several binarization algorithms for preprocessing. By analyzing the histograms of 2,000 grayscale images of Chinese checks, histogram gradient information usable for image segmentation was identified. Based on this gradient information, a binarization algorithm is proposed for extracting the frame lines surrounding the amount field in check images, making the target frame lines in the grayscale check image clearer, more prominent, and easier to locate. In comparative tests against several other binarization algorithms in the Chinese check preprocessing setting, the proposed algorithm showed better results and higher efficiency.
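The paper's specific gradient criterion is not given in the abstract; as a minimal sketch of the idea, assuming the threshold is placed at the steepest falling edge of the smoothed gray-level histogram, between the dark frame-line mode and the background mode:

```python
import numpy as np

def frame_line_threshold(gray):
    # Smooth the 256-bin gray-level histogram, take its discrete gradient,
    # and place the threshold at the steepest descent. The exact criterion
    # used by the paper may differ; this is an illustrative assumption.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    smooth = np.convolve(hist, np.ones(5) / 5, mode="same")
    grad = np.diff(smooth.astype(float))
    t = int(np.argmin(grad[32:224])) + 32   # search away from the extremes
    return gray < t, t                      # frame-line mask and threshold
```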

7.
Automatic character recognition and image understanding of a given paper document are main objectives of the computer vision field. For these problems, a basic step is to isolate characters and to group the isolated characters into words. In this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm for grouping the isolated characters into words. For extracting characters, we exploit several character features (size, elongation, and density) and propose a characteristic value for classification based on the run-length frequency of the image component. In the context of word grouping, previous works have largely been concerned with words placed on a horizontal or vertical line. Our word grouping algorithm can also group words on inclined, intersecting, and even curved lines. To do this, we introduce the 3D neighborhood graph model, which is very useful and efficient for character classification and word grouping. In this model, each connected component of a text image segment is mapped onto 3D space according to the area of its bounding box and its position in the document. We conducted tests with more than 20 English documents and more than 10 oriental documents scanned from books, brochures, and magazines. Experimental results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental documents. Received August 3, 2001 / Accepted August 8, 2001

8.
A model-driven approach for real-time road recognition
This article describes a method for detecting and tracking road edges from images provided by an on-board monocular monochrome camera. Its implementation on specific hardware is also presented, in the framework of the VELAC project. The method is based on four modules: (1) detection of the road edges in the image by a model-driven algorithm, which uses a statistical model of the lane sides to handle occlusions or imperfections of the road marking; this model is initialized by an off-line training step; (2) localization of the vehicle in the lane in which it is travelling; (3) tracking, to define a new search space for the road edges in the next image; and (4) management of lane numbers, to determine the lane in which the vehicle is travelling. The algorithm is implemented so as to validate the method in a real-time context. Results obtained on marked and unmarked road images show the robustness and precision of the method. Received: 18 November 2000 / Accepted: 7 May 2001

9.
Document image segmentation is the first step in document image analysis and understanding. One major problem centres on the performance analysis of evolving segmentation algorithms. The use of standard document databases maintained at universities and research laboratories helps to solve the problem of obtaining authentic data sources and other information, but methodologies are still needed for performance analysis of the segmentation itself. We describe a new document model in terms of a bounding-box representation of its constituent parts and suggest an empirical measure of the performance of a segmentation algorithm based on this new graph-like model of the document. Besides global error measures, the proposed method also produces segment-wise details of common segmentation problems such as horizontal and vertical splits and merges, as well as invalid and mismatched regions. Received July 14, 2000 / Revised June 12, 2001

10.
Color and strokes are the salient features of text regions in an image. In this work, we use both of these features as cues and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph-cut-based algorithm. Our model is robust to variations in foreground and background, as we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image, and street view text datasets, as well as on full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements in performance under a variety of measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
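The abstract does not spell the energy out; a standard form for this kind of iterative graph-cut binarization, with GMM likelihood unaries and a contrast-sensitive smoothness term, would be the conventional GrabCut-style energy below, which is not necessarily the authors' exact formulation:

```latex
E(\mathbf{x}) = \sum_{i} -\log P\!\left(z_i \mid \theta_{x_i}\right)
  + \lambda \sum_{(i,j)\in\mathcal{N}} [x_i \neq x_j]\,
    \exp\!\left(-\beta \lVert z_i - z_j \rVert^2\right)
```

Here x_i ∈ {text, background} is the label of pixel i, z_i its color/stroke feature, θ the GMM parameters re-estimated after every graph cut, and N the pixel neighborhood system.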

11.
This work investigates map-to-image registration for planar scenes in the context of robust parameter estimation. Registration is posed as the problem of estimating a projective transformation that optimally aligns transformed model line segments from a map with data line segments extracted from an image. Matching and parameter estimation are solved simultaneously by optimizing an objective function which is based on M-estimators and depends on the overlap and the weighted orthogonal distance between transformed model segments and data segments. An extensive series of registration experiments was conducted to test the performance of the proposed parameter estimation algorithm. More than 200,000 registration experiments were run with different objective functions for 12 aerial images and randomly corrupted maps distorted by randomly selected projective transformations. Received: 10 August 2000 / Accepted: 29 January 2001
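As a minimal sketch of the robust-fitting idea, assuming point correspondences and a soft-L1 loss in place of the paper's M-estimators and weighted orthogonal segment distance:

```python
import numpy as np
from scipy.optimize import least_squares

def project(h_params, pts):
    # Apply a projective transform parameterized by 8 numbers (h33 = 1).
    H = np.append(h_params, 1.0).reshape(3, 3)
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]

def register(model_pts, data_pts):
    # model_pts, data_pts: (N, 2) arrays of corresponding points.
    # A robust soft-L1 loss downweights gross mismatches, roughly in the
    # spirit of the paper's M-estimator objective.
    def residuals(p):
        return (project(p, model_pts) - data_pts).ravel()
    p0 = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)  # identity start
    fit = least_squares(residuals, p0, loss="soft_l1", f_scale=3.0)
    return np.append(fit.x, 1.0).reshape(3, 3)
```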

12.
Providing a customized result set based upon user preference is the ultimate objective of many content-based image retrieval systems. There are two main challenges in meeting this objective: first, there is a gap between the physical characteristics of digital images and their semantic meaning; second, different people may have different perceptions of the same set of images. To address both challenges, we propose a model, named Yoda, that conceptualizes content-based querying as the task of softly classifying images into classes. These classes can overlap, and their members differ from user to user. The soft classification is performed for each and every image feature, both physical and semantic. Each image is then ranked based on the weighted aggregation of its classification memberships. The weights are user-dependent, so different users obtain different result sets for the same query. Yoda employs a fuzzy-logic-based aggregation function for ranking images. We show that, in addition to some performance benefits, fuzzy aggregation is less sensitive to noise and can support disjunctive queries, compared to the weighted-average aggregation used by other content-based image retrieval systems. Finally, since Yoda relies heavily on user-dependent weights (i.e., user profiles) for the aggregation task, we utilize the users' relevance feedback to improve the profiles using genetic algorithms (GA). Our learning mechanism requires fewer user interactions and converges faster to the user's preferences than other learning techniques. Correspondence to: Y.-S. Chen (E-mail: yishinc@usc.edu) This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from NCR, Microsoft, and Okawa Foundation.
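Yoda's actual aggregation function is not given in the abstract; as a toy sketch of user-weighted fuzzy aggregation, a weighted-minimum t-norm is assumed here purely for illustration:

```python
import numpy as np

def fuzzy_rank(memberships, weights):
    # memberships: (images x features) array of soft class memberships
    # in [0, 1]; weights: the user profile, one weight per feature.
    w = np.asarray(weights, dtype=float) / np.max(weights)
    # Weighted-minimum t-norm: a feature can only pull an image's score
    # down as far as its weight demands; zero-weight features are ignored.
    scores = np.min(1.0 - w * (1.0 - memberships), axis=1)
    return np.argsort(-scores)              # best-matching images first
```

Unlike a weighted average, the min-based aggregation never lets a high score on one feature compensate for a disqualifying score on another, which is one way fuzzy aggregation can behave differently under noise.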

13.
An architecture for handwritten text recognition systems
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using that representation; (ii) line separation, concerning text line detection and the extraction of images of text lines from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a line-of-text image efficiently and intelligently; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, developed to deal with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999

14.
Computer-based forensic handwriting analysis requires sophisticated methods for the pre-processing of digitized paper documents, in order to provide high-quality digitized handwriting that represents the original handwritten product as accurately as possible. Because a huge variety of document types must be processed, neither a standardized sequence of processing stages, nor fixed parameter sets, nor fixed image operations are suitable for such pre-processing. Thus, we present an open layered framework that covers adaptation abilities at the parameter, operator, and algorithm levels. Moreover, an embedded module, which uses genetic programming, can generate specific filters for background removal on the fly. The framework is understood as an assistance system for forensic handwriting experts and has been in use by the Bundeskriminalamt, the federal police bureau in Germany, for two years. In the following, the layered framework is presented, fundamental document-independent filters for the removal of textured and homogeneous backgrounds and for foreground removal are described, as well as aspects of the implementation. Results of applying the framework are also given. Received July 12, 2000 / Revised October 13, 2000

15.
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000
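As a toy sketch of the decision-tree variant, with made-up layout feature vectors in the spirit of those the abstract lists (the feature values and class names below are illustrative, not from the paper):

```python
from sklearn.tree import DecisionTreeClassifier

# Each row describes one page: [text %, non-text %, number of columns,
# mean font point size, content-area density]. Values are invented.
X_train = [
    [0.82, 0.05, 2, 10.0, 0.61],   # journal article
    [0.35, 0.50, 3,  9.0, 0.78],   # magazine page
    [0.90, 0.00, 1, 12.0, 0.55],   # letter
]
y_train = ["article", "magazine", "letter"]

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict([[0.80, 0.07, 2, 10.5, 0.60]]))  # -> ['article']
```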

16.
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically registered the original image to the rescanned one using four corner points and then transformed the original groundtruth using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth for microfilmed and FAXed versions of the University of Washington dataset documents. Received: July 24, 2001 / Accepted: May 20, 2002
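For the closed-loop step the paper builds on — registering the original image to the rescan from four corner points and carrying the groundtruth through the estimated transformation — a minimal OpenCV sketch follows; all coordinates are illustrative:

```python
import numpy as np
import cv2

# Four corner correspondences: original page -> rescanned page.
src = np.float32([[0, 0], [2480, 0], [2480, 3508], [0, 3508]])    # original
dst = np.float32([[12, 9], [2471, 15], [2462, 3499], [5, 3490]])  # rescan
H = cv2.getPerspectiveTransform(src, dst)

# Transform groundtruth points (e.g., character box corners) to the rescan.
boxes = np.float32([[[100, 200]], [[400, 200]]])     # shape (N, 1, 2)
mapped = cv2.perspectiveTransform(boxes, H)
```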

17.
This paper presents a novel technique for detecting possible defects in two-dimensional wafer images with repetitive patterns using prior knowledge. The technique has a learning ability that can create a golden-block database from the wafer image itself, then modify and refine its content when used in further inspections. The extracted building block is stored as a golden block for the detected pattern. When new wafer images with the same periodical pattern arrive, we do not have to recalculate their periods and building blocks. A new building block can be derived directly from the existing golden block after eliminating alignment differences. If the newly derived building block has better quality than the stored golden block, then the golden block is replaced with the new building block. With the proposed algorithm, our implementation shows that a significant amount of processing time is saved. Also, the storage overhead of golden templates is reduced significantly by storing golden blocks only. Received: 21 February 2001 / Accepted: 21 April 2002 Correspondence to: S.-U. Guan

18.
Many preprocessing techniques intended to normalize artifacts and clean noise induce anomalies, in part due to the discretized nature of the document image and in part due to inherent ambiguity in the input image relative to the desired transformation. The potentially deleterious effects of common preprocessing methods are illustrated through a series of dramatic albeit contrived examples and then shown to affect real applications of ongoing interest to the community through three writer identification experiments conducted on Arabic handwriting. Retaining ruling lines detected by multi-line linear regression, instead of repairing strokes broken by deleting ruling lines, reduced the error rate by 4.5%. Exploiting word position relative to detected rulings, instead of ignoring it, decreased errors by 5.5%. Counteracting page skew by rotating extracted contours during feature extraction, instead of rectifying the page image, reduced the error by 1.4%. All of these accuracy gains are shown to be statistically significant. Analogous methods are advocated for other document processing tasks as topics for future research.
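As a small sketch of the third finding's mechanism — rotating extracted contour coordinates instead of resampling the page image, which avoids introducing interpolation artifacts into the pixels — the function below is illustrative, not the paper's implementation:

```python
import numpy as np

def rotate_contours(contours, angle_deg, center):
    # contours: list of (N, 2) coordinate arrays extracted from the page.
    # Rotating the coordinates leaves the underlying image untouched,
    # so no pixels are ever interpolated.
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    c = np.asarray(center, dtype=float)
    return [(pts - c) @ R.T + c for pts in contours]
```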

19.
This paper presents the current state of the A2iA CheckReader™ – a commercial bank check recognition system. The system is designed to process the flow of payment documents associated with the check clearing process: checks themselves, deposit slips, money orders, cash tickets, etc. It processes document images and recognizes document amounts whatever their style and type – cursive, hand- or machine-printed – expressed as numerals or as phrases. The system is adapted to read payment documents issued in different English- or French-speaking countries. It is currently in use at more than 100 large sites in five countries and processes over 10 million documents daily. The average read rate at the document level varies from 65 to 85%, with a misread rate corresponding to that of a human operator (1%). Received October 13, 2000 / Revised December 4, 2000

20.
NeTra: A toolbox for navigating large image databases
We present an implementation of NeTra, a prototype image retrieval system that uses color, texture, shape, and spatial location information in segmented image regions to search and retrieve similar regions from the database. A distinguishing aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object- or region-based search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects. Images are segmented into homogeneous regions at the time of ingest into the database, and image attributes that represent each of these regions are computed. In addition to image segmentation, other important components of the system include an efficient color representation and the indexing of color, texture, and shape features for fast search and retrieval. This representation allows the user to compose interesting queries such as “retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper one-third of the image”, where the individual objects could be regions belonging to different images. A Java-based web implementation of NeTra is available at http://vivaldi.ece.ucsb.edu/Netra.
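As a toy sketch of region-based matching at query time, assuming each region is stored as per-feature vectors computed at ingest and compared with weighted Euclidean distances (the weights and the metric are illustrative, not NeTra's actual similarity measures):

```python
import numpy as np

def region_distance(q, r, weights=(1.0, 1.0, 1.0, 0.5)):
    # q, r: dicts mapping feature name -> NumPy feature vector for a
    # query region and a database region, respectively.
    parts = ["color", "texture", "shape", "location"]
    return sum(w * np.linalg.norm(q[k] - r[k])
               for w, k in zip(weights, parts))

# Ranking a database of regions then reduces to sorting by this distance:
# best = sorted(database_regions, key=lambda r: region_distance(query, r))
```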
