
1.

Unsupervised writer adaptation applied to handwritten text recognition

Ali Nosary Thierry Paquet 《Pattern recognition》2004,37(2):385-388

This paper deals with off-line handwritten text recognition. It presents a text recognition system that exploits an original principle of adaptation to the handwriting being recognized. The adaptation is based on automatically learning, during recognition, the graphical characteristics of the handwriting. This on-line adaptation of the recognition system iterates two steps: a word recognition step that labels the writer's representations (allographs) across the whole text, and a re-evaluation step for the character models. Tests carried out on a sample of 15 writers, all unknown to the system, show the value of the proposed adaptation scheme: recognition rates improve over the iterations at both the letter and the word level.
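The two-step iteration in this abstract (label the text, then re-estimate the character models) can be illustrated with a small self-training loop. This is only a sketch under strong assumptions: each character model is a single mean vector, and the labeling step is plain nearest-mean classification rather than the lexicon-driven word recognition the paper describes. The names `label_samples`, `reestimate`, `adapt`, and the `weight` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Models = Dict[str, List[float]]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_samples(samples: Sequence[Vector], models: Models) -> List[str]:
    """Recognition step (assumed): give each sample the label of the nearest model mean."""
    return [min(models, key=lambda c: squared_distance(s, models[c])) for s in samples]


def reestimate(samples: Sequence[Vector], labels: Sequence[str],
               models: Models, weight: float = 0.5) -> Models:
    """Re-evaluation step (assumed): pull each model toward the mean of its labeled samples."""
    updated: Models = {}
    for char, mean in models.items():
        assigned = [s for s, lab in zip(samples, labels) if lab == char]
        if not assigned:
            updated[char] = list(mean)  # no evidence from this writer, keep the model
            continue
        writer_mean = [sum(col) / len(assigned) for col in zip(*assigned)]
        updated[char] = [(1 - weight) * m + weight * w
                         for m, w in zip(mean, writer_mean)]
    return updated


def adapt(samples: Sequence[Vector], models: Models,
          iterations: int = 5, weight: float = 0.5) -> Tuple[Models, List[str]]:
    """Alternate labeling and re-estimation until the labels stop changing."""
    labels = label_samples(samples, models)
    for _ in range(iterations):
        models = reestimate(samples, labels, models, weight)
        new_labels = label_samples(samples, models)
        if new_labels == labels:
            break
        labels = new_labels
    return models, labels
```

With writer-independent models `{"a": [0.0], "b": [10.0]}` and samples `[[4.0], [4.5], [6.0], [12.0], [13.0]]`, the sample `[6.0]` is first labeled `"b"` and moves to `"a"` once the models have shifted toward this writer's style.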

2.

Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text

Tonghua Su Tianwen Zhang Dejun Guan 《International Journal on Document Analysis and Recognition》2007,10(1):27-38

A Chinese handwriting database named HIT-MW is presented to facilitate offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through an intermediary instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters that are produced under an unconstrained condition without preprinted character boxes. The statistics show that the database has an excellent representation of the real handwriting. Many new applications concerning real handwriting recognition can be supported by the database.

3.

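Identity recognition based on handwriting

Guo Xingyin Dai Zhiqiang 《微处理机 (Microprocessors)》2002,2(3):33-34

Many earlier papers have described techniques for identifying a writer from handwriting, most of which assume that the written text is fixed. In this paper we attempt to remove that assumption with a novel algorithm for automatic, text-independent writer identification. Assuming that the handwriting of different people differs visibly, we adopt a global approach based on texture analysis, in which each person's handwriting is treated as a distinct texture. In principle, any standard texture recognition algorithm can be used (for example, the multi-channel Gabor filter method). In classifying 1,000 test documents from 40 writers, the results were very satisfactory, with a recognition rate of up to 96%.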

4.

A fuzzy-syntactic approach to allograph modeling for cursive script recognition

Parizeau M. Plamondon R. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(7):702-712


5.

Handwritten text separation from annotated machine printed documents using Markov Random Fields

Xujun Peng Srirangaraj Setlur Venu Govindaraju Ramachandrula Sitaram 《International Journal on Document Analysis and Recognition》2013,16(1):1-16

The convenience of search, both on the personal computer hard disk and on the web, is still limited mainly to machine printed text documents and images because of the poor accuracy of handwriting recognizers. The focus of research in this paper is the segmentation of handwritten text and machine printed text from annotated documents, sometimes referred to as the task of “ink separation”, to advance the state of the art in realizing search of hand-annotated documents. We propose a method which contains two main steps: patch level separation and pixel level separation. In the patch level separation step, the entire document is modeled as a Markov Random Field (MRF). Three different classes (machine printed text, handwritten text and overlapped text) are initially identified using G-means based classification followed by an MRF based relabeling procedure. In the second step, an MRF based classification approach is used to separate overlapped text into machine printed text and handwritten text using pixel level features. Experimental results on a set of machine-printed documents which have been annotated by multiple writers in an office/collaborative environment show that our method is robust and provides good text separation performance.

6.

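A stroke-based method for extracting pseudo-dynamic handwriting features

Wang Min Sun Xiangnan Liu Li Zhu Xiaojuan Zeng Baoying 《计算机工程与应用 (Computer Engineering and Applications)》2016,52(18):179-182

Earlier offline handwriting feature extraction methods that take character structure as the object of study cannot obtain stable features when text correlation is low. To address this problem, a pseudo-dynamic handwriting feature extraction method that takes strokes as the object of study is proposed, which removes the dependence on character structure. Drawing on probability and statistics, grid windows are used to extract pseudo-dynamic stroke features such as stroke direction trends and width variation. Handwriting similarity is computed with weighted Euclidean distance, weighted chi-square distance, and weighted Manhattan distance, respectively. In experiments on the HIT-MW and HIT-SW databases, the top-1 and top-10 identification accuracies are 95.9% and 99.5% when text correlation is high, and 91.9% and 99.0% when text correlation is low. The experiments show that the stroke-based pseudo-dynamic feature extraction method still performs well under low text correlation.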
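The three similarity measures named in this abstract can be sketched as follows. The abstract does not give the exact weighting scheme, so the per-dimension weights `w`, the common textbook forms of each distance, and the helper `rank_writers` are assumptions made for illustration only.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector, Vector], float]


def weighted_euclidean(x: Vector, y: Vector, w: Vector) -> float:
    """sqrt(sum_i w_i * (x_i - y_i)^2)"""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def weighted_chi_square(x: Vector, y: Vector, w: Vector) -> float:
    """sum_i w_i * (x_i - y_i)^2 / (x_i + y_i), skipping empty bins."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        denom = xi + yi
        if denom > 0:
            total += wi * (xi - yi) ** 2 / denom
    return total


def weighted_manhattan(x: Vector, y: Vector, w: Vector) -> float:
    """sum_i w_i * |x_i - y_i|"""
    return sum(wi * abs(xi - yi) for xi, yi, wi in zip(x, y, w))


def rank_writers(query: Vector, references: Dict[str, Vector],
                 w: Vector, distance: Distance) -> List[str]:
    """Return reference writers ordered from most to least similar to the query."""
    return sorted(references, key=lambda name: distance(query, references[name], w))
```

The first element of the list returned by `rank_writers` is the top-1 candidate; a top-10 result counts as correct if the true writer appears among the first ten elements.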

7.

Machine printed text and handwriting identification in noisy document images

Zheng Y Li H Doermann D 《IEEE transactions on pattern analysis and machine intelligence》2004,26(3):337-353

In this paper, we address the problem of the identification of text in noisy document images. We focus especially on segmenting and distinguishing between handwriting and machine printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise and we further exploit context to refine the classification. A Markov Random Field-based (MRF) approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

8.

Fuzzy state machines to recognize totally unconstrained handwritten strokes

I.S.I. Abuhaiba S. Datta M.J.J. Holt 《Image and vision computing》1995,13(10):755-769

An automatic off-line character recognition system for totally unconstrained handwritten strokes is presented. A stroke representation is developed and described using five types of feature. Fuzzy state machines are defined to work as recognizers of strokes. An algorithm is presented that obtains, from a stroke representation, a deterministic fuzzy state machine capable of recognizing that stroke and its variants. An algorithm is developed to merge two fuzzy state machines into one machine. The use of fuzzy machines to recognize strokes is clarified through a recognition algorithm. The learning algorithm is a combination of the previous algorithms. A set of 20 stroke classes was used in the learning and recognition stages. The system was trained on 5890 unnormalized strokes written by five writers. The learning stage produced a fuzzy state machine of 2705 states and 8640 arcs. A total of 6865 unnormalized strokes, written freely by five writers other than the writers of the learning stage, was used in testing. The recognition, rejection and error rates were 94.8%, 1.2% and 4.0%, respectively. The system can be developed further to deal with cursive handwriting.

9.

Individuality of alphabet knowledge in online writer identification

Guo Xian Tan Christian Viard-Gaudin Alex C. Kot 《International Journal on Document Analysis and Recognition》2010,13(2):147-157

Allograph prototype approaches for writer identification have been gaining popularity recently due to their simplicity and promising identification rates. Character prototypes that are used as allographs produce a consistent set of templates that models the handwriting styles of writers, thereby allowing high accuracies to be attained. We hypothesize that the alphabet knowledge inherent in such character prototypes can provide additional writer information pertaining to their styles of writing and their identities. This paper utilizes a character prototype approach to establish evidence that knowledge of the alphabet offers additional clues which help in the writer identification process. This paper then introduces an alphabet information coefficient (AIC) to better exploit such alphabet knowledge for writer identification. Our experiments showed an increase in writer identification accuracy from 66.0 to 87.0% on a database of 200 reference writers when alphabet knowledge was used. Experiments related to the reduction in dimensionality of the writer identification system are also reported. Our results show that the discriminative power of the alphabet can be used to reduce the complexity while maintaining the same level of performance for the writer identification system.

10.

Evaluating the Potential Effectiveness of Automatic Document Analysis

James R. Lewis 《International Journal of Speech Technology》2004,7(1):35-43

This paper documents the motivation, method and results of seven experiments conducted to investigate the properties of automatic document analysis (for the purpose of automatic vocabulary expansion of a personalized language model in a speech dictation system). The results indicated that automatic document analysis of corrected text should improve the accuracy of text dictated in the future, as long as the future text is similar to the analyzed text. None of the manipulations had a measurable effect (either good or bad) when the analyzed text was uncorrected dictation or future text that was not similar to analyzed text. These results were the same for both trained and untrained acoustic models.

11.

Handwritten Chinese text editing and recognition system

Shusen Zhou Qingcai Chen Xiaolong Wang 《Multimedia Tools and Applications》2014,71(3):1363-1380

This paper describes a handwritten Chinese text editing and recognition system that can edit handwritten text and recognize it in a client-server mode. First, the client end samples and redisplays the handwritten text by using digital ink techniques, segments handwritten characters, edits them and saves the original handwritten information into a self-defined document. The self-defined document saves the coordinates of all sampled points of the handwritten characters. Second, the server recognizes the handwritten document based on the proposed Gabor feature extraction and affinity propagation clustering (GFAP) method, and returns the recognition results to the client end. Moreover, the server can also collect the labeled handwritten characters and fine-tune the recognizer automatically. Experimental results on the HIT-OR3C database show that our handwriting recognition method improves the recognition performance remarkably.

12.

A method for automatic classification of gender based on text-independent handwriting

Maken Payal Gupta Abhishek 《Multimedia Tools and Applications》2021,80(16):24573-24602

Handwriting recognition is used to predict various demographic traits such as age, gender, and nationality. Among these applications, gender prediction is a particularly popular topic among researchers. The relation between gender and handwriting can be seen in the physical appearance of the handwriting. This work predicts gender from handwriting using the distinguishing differences between the two genders. We use the shape or visual appearance of the handwriting to extract features such as slant (direction), area (number of pixels occupied by text), and perimeter (length of edges). Classification is carried out using a Support Vector Machine (SVM), which uses the kernel trick to turn a nonlinear problem into a linear one, together with logistic regression and KNN; finally, Majority Voting is used to improve the classification rates. Experimental results on a dataset of 282 writers with 2 samples per writer show that the proposed method attains appealing performance on writer detection in a text-independent environment.
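The final Majority Voting step can be illustrated independently of the individual classifiers. The sketch below assumes that each classifier (for example the SVM, logistic regression, and KNN mentioned in the abstract) has already produced one label per sample; the function name `majority_vote` and the example labels are hypothetical, and the abstract does not say how ties are resolved (with three voters and two classes none can occur).

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """Combine predictions by taking, for each sample, the most frequent label.

    per_classifier[i][j] is the label classifier i assigned to sample j.
    """
    combined: List[str] = []
    for votes in zip(*per_classifier):  # the votes cast for one sample
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four handwriting samples.
svm_pred = ["female", "male", "male", "female"]
logreg_pred = ["female", "female", "male", "male"]
knn_pred = ["male", "female", "male", "female"]

final_pred = majority_vote([svm_pred, logreg_pred, knn_pred])
```

Here `final_pred` keeps, for every sample, the label on which at least two of the three classifiers agree.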


13.

An unconstrained handwriting recognition system

E. Kavallieratou N. Fakotakis G. Kokkinakis 《International Journal on Document Analysis and Recognition》2002,4(4):226-242

In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists of seven main modules: skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, slant removal, word segmentation, and character segmentation and recognition, stemming from the implementation of already existing algorithms as well as novel algorithms. This system has been tested on the NIST, IAM-DB, and GRUHD databases and has achieved accuracy that varies from 65.6% to 100% depending on the database and the experiment.

14.

An over-segmentation method for single-touching Chinese handwriting with learning-based filtering

Liang Xu Fei Yin Qiu-Feng Wang Cheng-Lin Liu 《International Journal on Document Analysis and Recognition》2014,17(1):91-104

The segmentation of touching characters is still a challenging task, posing a bottleneck for offline Chinese handwriting recognition. In this paper, we propose an effective over-segmentation method with learning-based filtering using geometric features for single-touching Chinese handwriting. First, we detect candidate cuts by skeleton and contour analysis to guarantee a high recall rate of character separation. A filter is designed by supervised learning and used to prune implausible cuts to improve the precision. Since the segmentation rules and features are independent of the string length, the proposed method can deal with touching strings with more than two characters. The proposed method is evaluated on both the character segmentation task and the text line recognition task. The results on two large databases demonstrate the superiority of the proposed method in dealing with single-touching Chinese handwriting.

15.

Identification of different script lines from multi-script documents

U. Pal B. B. Chaudhuri 《Image and vision computing》2002,20(13-14)


16.

A writer identification system for on-line whiteboard data

Andreas Schlapbach Marcus Liwicki Horst Bunke 《Pattern recognition》2008,41(7):2381-2397

In this paper we address the task of writer identification of on-line handwriting captured from a whiteboard. Different sets of features are extracted from the recorded data and used to train a text- and language-independent on-line writer identification system. The system is based on Gaussian mixture models (GMMs), which provide a powerful yet simple means of representing the distribution of the features extracted from the handwritten text. The training data of all writers are used to train a universal background model (UBM) from which a client-specific model is obtained by adaptation. Different sets of features are described and evaluated in this work. The system is tested using text from 200 different writers. A writer identification rate of 98.56% at the paragraph level and of 88.96% at the text line level is achieved.
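The abstract does not say which adaptation rule derives a client model from the UBM. One widely used choice in GMM-UBM systems is mean-only MAP adaptation, sketched below for one-dimensional features. Treat the whole block as an assumption: the function names, the relevance factor of 16, and the restriction to means and scalar variances are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(data: Sequence[float], weights: Sequence[float],
                    means: Sequence[float], variances: Sequence[float],
                    relevance: float = 16.0) -> List[float]:
    """Mean-only MAP adaptation of a 1-D GMM (the UBM) to one writer's data."""
    k = len(means)
    soft_counts = [0.0] * k  # n_k: sum of responsibilities per component
    soft_sums = [0.0] * k    # sum of responsibility-weighted samples
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # sample is numerically unexplained by every component
        for i in range(k):
            gamma = likes[i] / total
            soft_counts[i] += gamma
            soft_sums[i] += gamma * x
    adapted: List[float] = []
    for i in range(k):
        if soft_counts[i] == 0.0:
            adapted.append(means[i])  # no data for this component
            continue
        alpha = soft_counts[i] / (soft_counts[i] + relevance)
        writer_mean = soft_sums[i] / soft_counts[i]
        adapted.append(alpha * writer_mean + (1.0 - alpha) * means[i])
    return adapted
```

Components that explain much of a writer's data move toward that data, while components with little support stay close to the UBM, which is why a modest amount of text per writer can suffice in this kind of scheme.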

17.

Text-independent writer identification and verification using textural and allographic features

Bulacu M Schomaker L 《IEEE transactions on pattern analysis and machine intelligence》2007,29(4):701-717

The identification of a person on the basis of scanned images of handwriting is a useful biometric modality with application in forensic and historic document analysis and constitutes an exemplary study area within the research field of behavioral biometrics. We developed new and very effective techniques for automatic writer identification and verification that use probability distribution functions (PDFs) extracted from the handwriting images to characterize writer individuality. A defining property of our methods is that they are designed to be independent of the textual content of the handwritten samples. Our methods operate at two levels of analysis: the texture level and the character-shape (allograph) level. At the texture level, we use contour-based joint directional PDFs that encode orientation and curvature information to give an intimate characterization of individual handwriting style. In our analysis at the allograph level, the writer is considered to be characterized by a stochastic pattern generator of ink-trace fragments, or graphemes. The PDF of these simple shapes in a given handwriting sample is characteristic of the writer and is computed using a common shape codebook obtained by grapheme clustering. Combining multiple features (directional, grapheme, and run-length PDFs) yields increased writer identification and verification performance. The proposed methods are applicable to free-style handwriting (both cursive and isolated) and have practical feasibility, under the assumption that a few text lines of handwritten material are available in order to obtain reliable probability estimates.

18.

Prototype extraction and adaptive OCR

Yihong Xu Nagy G. 《IEEE transactions on pattern analysis and machine intelligence》1999,21(12):1280-1296


19.

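Research on a highly compact Chinese handwriting recognition method for continuous overlaid writing

Su Tonghua Dai Hongliang Zhang Jian Ma Peijun Deng Shengchun 《计算机科学 (Computer Science)》2015,42(7):300-304

Continuous handwriting recognition is the core of Chinese handwriting input technology, and entering Chinese text naturally and quickly has long been a goal of pattern recognition and of artificial intelligence more broadly. This paper proposes a recognition method for continuously overlaid handwritten Chinese characters that effectively overcomes the limitation of small screens. The method is based on an integrated segmentation-recognition decoding framework: it first applies an over-segmentation algorithm to the input writing trajectory, then uses a novel perceptron algorithm to determine character boundaries, and finally performs path decoding using multiple sources of contextual information from the character classification model, the geometric model, and the language model. To suit different types of mobile devices, an efficient method for compressing the character classification model is proposed, which reduces the storage and memory footprint of character recognition. The method has been deployed on the Android platform and tested in large-scale experiments. The experimental results confirm its performance and efficiency.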

20.

Reading cursive handwriting by alignment of letter prototypes

Shimon Edelman Tamar Flash Shimon Ullman 《International Journal of Computer Vision》1990,5(3):303-331

We describe a new approach to the visual recognition of cursive handwriting. An effort is made to attain human-like performance by using a method based on pictorial alignment and on a model of the process of handwriting. The alignment approach permits recognition of character instances that appear embedded in connected strings. A system embodying this approach has been implemented and tested on five different word sets. The performance was stable both across words and across writers. The system exhibited a substantial ability to interpret cursive connected strings without recourse to lexical knowledge. SU is partially supported by NSF grant IRI-8900267.