Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Automated recognition of unconstrained handwriting continues to be a challenging research task. In contrast to the traditional role of handwriting recognition in applications such as postal automation and bank check reading, in this paper, we explore the use of handwriting recognition in designing CAPTCHAs for cyber security. CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are automatic reverse Turing tests designed so that virtually all humans can pass the test, but state-of-the-art computer programs will fail. Machine-printed, text-based CAPTCHAs are now commonly used to defend against bot attacks. Our focus is on exploring the generation and use of handwritten CAPTCHAs. For this purpose, we have used a large repository of handwritten word images that current handwriting recognizers cannot read (even when provided with a lexicon), as well as synthetic handwritten samples. We take advantage of both our knowledge of the common sources of error in automated handwriting recognition systems and the salient aspects of human reading. The simultaneous interplay of several Gestalt laws of perception and the geon theory of pattern recognition (which implies that object recognition occurs by components) allows us to explore the parameters that truly separate human and machine abilities.

2.
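Sub-image CAPTCHA
This paper proposes a simple method for distinguishing human users from computer programs, called the sub-image CAPTCHA. A sub-image CAPTCHA uses a random Chinese feature code and is preprocessed by setting the font, adding background noise, and distorting the image. The CAPTCHA is then presented in the form of sub-images. Because computer programs are weak at recognizing Chinese characters, noise, touching characters, distorted images, and split images, human users are easily distinguished; the sub-image CAPTCHA exploits exactly this property to tell human users and computer programs apart. Finally, the sub-image CAPTCHA is implemented in C# with ASP.NET.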

3.
Offline handwritten Amharic word recognition   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper describes two approaches for Amharic word recognition in unconstrained handwritten text using HMMs. The first approach builds word models from concatenated features of constituent characters; in the second, HMMs of constituent characters are concatenated to form a word model. In both cases, the features used for training and recognition are a set of primitive strokes and their spatial relationships. The recognition system does not require segmentation of characters but requires text line detection and extraction of structural features, which is done by making use of the direction field tensor. The performance of the recognition system is tested on a dataset of unconstrained handwritten documents collected from various sources, and promising results are obtained.

4.
5.
Reference line information has been used for diverse purposes in handwriting research, including word case classification, OCR, and holistic word recognition. In this paper, we argue that the commonly used global reference lines are inadequate for many handwritten phrase recognition applications. Individual words may be written at different orientations or vertically displaced with respect to one another. A function used to approximate the implicit baseline will not be differentiable or even continuous at some points. We present the case for local reference lines and illustrate their successful use in a system that verifies street name phrases in a postal application.

6.
Xian  Venu  Sargur 《Pattern recognition》2000,33(12):1967-1973
Researchers have thus far focused on the recognition of alpha and numeric characters in isolation as well as in context. In this paper we introduce a new genre of problems where the input pattern is taken to be a pair of characters. This adds to the complexity of the classification task. The 10 class digit recognition problem is now transformed into a 100 class problem where the classes are {00,…, 99}. Similarly, the alpha character recognition problem is transformed to a 26×26 class problem, where the classes are {AA,…, ZZ}. If lower-case characters are also considered the number of classes increases further. The justification for adding to the complexity of the classification task is described in this paper. There are many applications where the pairs of characters occur naturally as an indivisible unit. Therefore, an approach which recognizes pairs of characters, whether or not they are separable, can lead to superior results. In fact, the holistic method described in this paper outperforms the traditional approaches that are based on segmentation. The correct recognition rate on a set of US state abbreviations and digit pairs, touching in various ways, is above 86%.
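
To make the class counts in this abstract concrete, the following minimal sketch enumerates the pair-class label sets it describes. It only illustrates the arithmetic (10×10 and 26×26); the variable names are hypothetical, and nothing here reflects the paper's actual recognizer.

```python
from itertools import product
from string import ascii_uppercase, digits

# Each class label is an ordered pair of symbols, so the label set is the
# Cartesian product of the single-character alphabet with itself.
digit_pairs = ["".join(p) for p in product(digits, repeat=2)]
alpha_pairs = ["".join(p) for p in product(ascii_uppercase, repeat=2)]

print(len(digit_pairs), digit_pairs[0], digit_pairs[-1])  # 100 00 99
print(len(alpha_pairs), alpha_pairs[0], alpha_pairs[-1])  # 676 AA ZZ
```

Adding lower-case letters to the alphabet would grow the label set quadratically in the same way, which is the increase in complexity the abstract refers to.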

7.
This paper investigates the automatic reading of unconstrained omni-writer handwritten texts. It shows how to endow the reading system with the learning faculties necessary to adapt the recognition to each writer's handwriting. In the first part of this paper, we explain how the recognition system can be adapted to the current handwriting by exploiting the graphical context defined by the writer's invariants. This adaptation is guaranteed by activating interaction links over the whole text between the recognition procedures of word entities and those of letter entities. In the second part, we justify the need for an open multiple-agent architecture to support the implementation of such a principle of adaptation. The proposed platform allows expert treatments dedicated to handwriting analysis to be plugged in. We show that this platform helps to implement specific collaboration or cooperation schemes between agents, which bring out new trends in the automatic reading of handwritten texts.

8.
9.
10.
A new paradigm, which models the relationships between handwriting and topic categories in the context of medical forms, is presented. The ultimate goals are: (1) a robust method which categorizes medical forms into specified categories, and (2) the use of such information for practical applications such as improved recognition of medical handwriting or retrieval of medical forms as in a search engine. Medical forms have diverse, complex and large lexicons consisting of English, medical and pharmacology corpora. Our technique shows that a few recognized characters, returned by handwriting recognition, can be used to construct a linguistic model capable of representing a medical topic category. This allows (1) a reduced lexicon to be constructed, thereby improving handwriting recognition performance, and (2) PCR (Pre-Hospital Care Report) forms to be tagged with a topic category and subsequently searched by information retrieval systems. We present an improvement of over 7% in raw recognition rate and a mean average precision of 0.28 over a set of 1,175 queries on a data set of unconstrained handwritten medical forms filled in emergency environments. This work was supported by the National Science Foundation.

11.
The state-of-the-art modified quadratic discriminant function (MQDF) based approach for online handwritten Chinese character recognition (HCCR) assumes that the feature vectors of each character class can be modeled by a Gaussian distribution with a mean vector and a full covariance matrix. To achieve high recognition accuracy, a sufficient number of leading eigenvectors of the covariance matrix must be retained in MQDF. This paper presents a new approach to modeling each inverse covariance matrix by basis expansion, where the expansion coefficients are character-dependent while a common set of basis matrices is shared by all the character classes. Consequently, our approach can achieve a much better accuracy–memory tradeoff. The usefulness of the proposed approach to designing compact HCCR systems has been confirmed and demonstrated by comparative experiments on the popular Nakayosi and Kuchibue Japanese character databases.

12.
This paper presents a genetic programming based approach for optimizing the feature extraction step of a handwritten character recognizer. This recognizer uses a simple multilayer perceptron as a classifier and operates on a hierarchical feature space of orientation, curvature, and center of mass primitives. The nodes of the hierarchy represent rectangular sub-regions of their parent node, the tree root corresponding to the character's bounding box. Within each sub-region, a variable number of fuzzy features are extracted. Genetic programming is used to simultaneously learn the best hierarchy and the best combination of fuzzy features. Moreover, the fuzzy features are not predetermined; they are inferred from the evolution process, which runs a two-objective selection operator. The first objective maximizes the recognition rate, and the second minimizes the feature space size. Results on Unipen data show that, using this approach, robust representations could be obtained that outperformed comparable human-designed hierarchical fuzzy regional representations.

13.
14.
Several automation tools have been developed over the years for forensic document examination (FDE) of handwritten items. Integrating the developed tools into a unified framework is considered and the essential role of the human in the process is discussed. The task framework is developed by considering the approach of computational thinking whose components are abstraction, algorithms, mathematical models and ability to scale. Beginning with the human FDE procedure expressed in algorithmic form, mathematical and software implementations of individual steps of the algorithm are described. Advantages of the framework are discussed, including efficiency (ability to scale to tasks with many handwritten items), reproducibility and validation/improvement of existing manual procedures. It is indicated that as with other expert systems, such as for medical diagnosis, current automation tools are useful only as part of a larger manually intensive procedure. This viewpoint is illustrated with a well-known FDE case, concerning the Lindbergh kidnapping with a new hypothesis – in this case, there are multiple questioned documents, possibility of multiple writers of the same document, determining whether the writing is disguised, known writing is formal while questioned writing is informal, etc. Observations are made for future developments, where human examiners provide handwriting characteristics while computational methods provide the necessary statistical analysis.

15.
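To address questions about the essential characteristics, formal definition, future directions, and research priorities of CAPTCHAs, this paper analyzes a large number of existing CAPTCHAs in depth and gives a description of their essential characteristics and a formal definition. It derives 20 kinds of CAPTCHA along three dimensions: information type (five categories), recognition mode (two categories), and interactivity (two categories). It analyzes the technical features of the 20 CAPTCHA types, studies attack and defense countermeasures for them, and identifies the future research priorities, difficulties, and directions for each type. The discussion focuses on dynamic CAPTCHAs and implicit CAPTCHAs (including semantic CAPTCHAs); in particular, with regard to attack and defense countermeasures for general-purpose CAPTCHA attacks, it proposes some new ideas and new research methods for the CAPTCHA field.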

16.
The novel prototype extraction method presented in this paper aims to advance the understanding of handwriting generation and to improve on-line recognition systems. The extraction process is performed in two stages. First, using Fuzzy ARTMAP, we group character instances according to classification criteria. Then, an algorithm refines these groups and computes the prototypes. Experimental results on the UNIPEN international database show that the proposed system is able to extract a low number of prototypes that are easily recognizable. In addition, the extraction method is able to condense knowledge that can be successfully used to initialize an LVQ-based recognizer, achieving an average recognition rate of 90.15%, comparable to that reached by human readers.

17.
18.
This paper introduces the theoretical foundation for the development of a pen-based system dedicated to helping to teach handwriting in primary schools. Knowledge given by a kinematic theory of rapid human movements is used. The proposed system includes a letter model generator which is used to create letter shapes with human-like kinematics. The system generates feedback to pupils after a multilevel analysis of the handwriting. The analysis presented deals with shape conformity, shape error identification, fluency analysis and kinematic parameter evaluation. A discussion of how fluency measurement and error quantification can be useful in developing a learning metric is also presented.

19.
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a line of text image obtained efficiently and in an intelligent manner; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999

20.
Hit lists are at the core of retrieval systems. The top ranks are important, especially if user feedback is used to train the system. Analysis of hit lists revealed counter-intuitive instances in the top ranks for good classifiers. In this study, we propose that two functions need to be optimised: (a) in order to reduce a massive set of instances to a likely subset among ten thousand or more classes, separability is required. However, the results need to be intuitive after ranking, reflecting (b) the prototypicality of instances. By optimising these requirements sequentially, the number of distracting images is strongly reduced, followed by nearest-centroid based instance ranking that retains an intuitive (low-edit distance) ranking. We show that in handwritten word-image retrieval, precision improvements of up to 35 percentage points can be achieved, yielding up to 100% top hit precision and 99% top-7 precision in data sets with 84 000 instances, while maintaining high recall performance. The method is conveniently implemented in a massive-scale, continuously trainable retrieval engine, Monk.
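
The ranking stage (b) in this abstract can be pictured with a small sketch of nearest-centroid ordering. It assumes stage (a) has already reduced the collection to a candidate subset for the queried class; the function names, the toy 2-D feature vectors, and the use of Euclidean distance are all illustrative assumptions, not details of Monk.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rank_by_prototypicality(candidates, class_centroid):
    """Sort (name, feature_vector) pairs by distance to the class centroid,
    so the most prototypical instances come first in the hit list."""
    return sorted(candidates, key=lambda item: math.dist(item[1], class_centroid))

# Hypothetical labelled instances of the queried class (toy 2-D features).
train = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
c = centroid(train)

# Hypothetical candidates that an upstream classifier let through.
candidates = [("far", [5.0, 5.0]), ("near", [1.0, 1.5]), ("mid", [3.0, 1.0])]
ranked = rank_by_prototypicality(candidates, c)
print([name for name, _ in ranked])  # ['near', 'mid', 'far']
```

Separating the two stages keeps the classifier free to optimise separability, while the final ordering depends only on closeness to the class prototype.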
