Similar documents
 20 similar documents found
1.
This paper deals with off-line handwritten text recognition. It presents a text recognition system that exploits an original principle of adaptation to the handwriting being recognized. The adaptation principle is based on automatically learning, during recognition, the graphical characteristics of the handwriting. This on-line adaptation of the recognition system relies on iterating two steps: a word recognition step that labels the writer's representations (allographs) across the whole text, and a step that re-evaluates the character models. Tests carried out on a sample of 15 writers, all unknown to the system, demonstrate the value of the proposed adaptation scheme: recognition rates improve over the iterations at both the letter and the word levels.

2.
A Chinese handwriting database named HIT-MW is presented to facilitate offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through intermediaries instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters produced under unconstrained conditions, without preprinted character boxes. The statistics show that the database represents real handwriting well. The database can support many new applications in real handwriting recognition.

3.
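Many earlier papers have described handwriting-based writer identification techniques, most of which assume that the written text is fixed. In this paper, we attempt to remove this assumption with a novel, automatic, text-independent writer identification algorithm. Assuming that the handwriting of different people is visibly distinct, we adopt a holistic approach based on texture analysis, in which each writer's handwriting is treated as a different texture. In principle, any standard texture recognition algorithm can be applied (for example, the multi-channel Gabor filtering technique). In classifying 1000 test documents from 40 writers, the results were very satisfactory, with an identification rate of up to 96%.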

4.
5.
The convenience of search, both on the personal computer hard disk and on the web, is still limited mainly to machine-printed text documents and images because of the poor accuracy of handwriting recognizers. This paper focuses on segmenting handwritten text from machine-printed text in annotated documents, a task sometimes referred to as "ink separation", to advance the state of the art in searching hand-annotated documents. We propose a method with two main steps: patch-level separation and pixel-level separation. In the patch-level separation step, the entire document is modeled as a Markov Random Field (MRF). Three classes (machine-printed text, handwritten text, and overlapped text) are initially identified using G-means-based classification followed by an MRF-based relabeling procedure. In the second step, an MRF-based classification approach separates overlapped text into machine-printed text and handwritten text using pixel-level features. Experimental results on a set of machine-printed documents annotated by multiple writers in an office/collaborative environment show that our method is robust and provides good text separation performance.

6.
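Earlier off-line handwriting feature extraction methods that take character structure as the object of study cannot obtain stable features when text correlation is low. To address this, a pseudo-dynamic handwriting feature extraction method that takes strokes as the object of study is proposed, removing the dependence on character structure. Drawing on probability and statistics, grid windows are used to extract pseudo-dynamic stroke features such as stroke direction trends and width variation. Handwriting similarity is computed with the weighted Euclidean distance, the weighted chi-square distance, and the weighted Manhattan distance, respectively. In experiments on the HIT-MW and HIT-SW databases, the top-1 and top-10 identification accuracies are 95.9% and 99.5% when text correlation is high, and 91.9% and 99.0% when text correlation is low. The experiments show that the stroke-based pseudo-dynamic feature extraction method still performs well under low text correlation.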
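The three similarity measures named in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not give the exact formulas or how the weights are chosen, so the per-dimension weight vector `w`, the chi-square form used here, and all function names are assumptions.

```python
import math

def weighted_euclidean(p, q, w):
    # Square root of the weighted sum of squared differences.
    return math.sqrt(sum(wi * (pi - qi) ** 2 for pi, qi, wi in zip(p, q, w)))

def weighted_chi_square(p, q, w, eps=1e-12):
    # One common chi-square form for histograms; eps avoids division by zero
    # when both bins are empty (an assumption, not taken from the paper).
    return sum(wi * (pi - qi) ** 2 / (pi + qi + eps) for pi, qi, wi in zip(p, q, w))

def weighted_manhattan(p, q, w):
    # Weighted sum of absolute differences.
    return sum(wi * abs(pi - qi) for pi, qi, wi in zip(p, q, w))

if __name__ == "__main__":
    # Two hypothetical feature histograms and uniform weights.
    a, b, w = [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [1.0, 1.0, 1.0]
    print(weighted_euclidean(a, b, w))
    print(weighted_chi_square(a, b, w))
    print(weighted_manhattan(a, b, w))
```

Smaller distances mean more similar handwriting samples; a top-1 or top-10 result would then come from ranking reference writers by distance.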

7.
In this paper, we address the problem of identifying text in noisy document images. We focus especially on segmenting and distinguishing handwriting from machine-printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

8.
An automatic off-line character recognition system for totally unconstrained handwritten strokes is presented. A stroke representation is developed and described using five types of features. Fuzzy state machines are defined to work as recognizers of strokes. An algorithm is presented that obtains, from a stroke representation, a deterministic fuzzy state machine capable of recognizing that stroke and its variants. An algorithm is developed to merge two fuzzy state machines into one machine. The use of fuzzy machines to recognize strokes is clarified through a recognition algorithm. The learning algorithm combines the previous algorithms. A set of 20 stroke classes was used in the learning and recognition stages. The system was trained on 5890 unnormalized strokes written by five writers. The learning stage produced a fuzzy state machine of 2705 states and 8640 arcs. A total of 6865 unnormalized strokes, written freely by five writers other than those of the learning stage, was used in testing. The recognition, rejection, and error rates were 94.8%, 1.2%, and 4.0%, respectively. The system can be further developed to deal with cursive handwriting.

9.
Allograph prototype approaches for writer identification have been gaining popularity recently due to their simplicity and promising identification rates. Character prototypes that are used as allographs produce a consistent set of templates that model the handwriting styles of writers, thereby allowing high accuracies to be attained. We hypothesize that the alphabet knowledge inherent in such character prototypes can provide additional information about writers' styles and identities. This paper uses a character prototype approach to establish evidence that knowledge of the alphabet offers additional clues that help in the writer identification process. It then introduces an alphabet information coefficient (AIC) to better exploit such alphabet knowledge for writer identification. Our experiments showed an increase in writer identification accuracy from 66.0% to 87.0% on a database of 200 reference writers when alphabet knowledge was used. Experiments on reducing the dimensionality of the writer identification system are also reported. Our results show that the discriminative power of the alphabet can be used to reduce complexity while maintaining the same level of performance for the writer identification system.

10.
This paper documents the motivation, method, and results of seven experiments conducted to investigate the properties of automatic document analysis (for the purpose of automatic vocabulary expansion of a personalized language model in a speech dictation system). The results indicated that automatic document analysis of corrected text should improve the accuracy of text dictated in the future, as long as the future text is similar to the analyzed text. None of the manipulations had a measurable effect (either good or bad) when the analyzed text was uncorrected dictation or when the future text was not similar to the analyzed text. These results were the same for both trained and untrained acoustic models.

11.
This paper describes a handwritten Chinese text editing and recognition system that edits handwritten text and recognizes it in a client-server mode. First, the client samples and redisplays the handwritten text using digital ink techniques, segments the handwritten characters, edits them, and saves the original handwriting information into a self-defined document. The self-defined document stores the coordinates of all sampled points of the handwritten characters. Second, the server recognizes the handwritten document using the proposed Gabor feature extraction and affinity propagation clustering (GFAP) method, and returns the recognition results to the client. Moreover, the server can also collect the labeled handwritten characters and fine-tune the recognizer automatically. Experimental results on the HIT-OR3C database show that our handwriting recognition method improves recognition performance remarkably.

12.

Handwriting recognition is used to predict various demographic traits such as age, gender, and nationality. Of these applications, gender prediction is an especially popular topic among researchers. The relation between gender and handwriting can be seen in the physical appearance of the handwriting. This work predicts gender from handwriting using the landmark differences between the two genders. We use the shape or visual appearance of the handwriting to extract features such as slant (direction), area (number of pixels occupied by text), and perimeter (length of edges). Classification is carried out using a Support Vector Machine (SVM), which transforms the nonlinear problem into a linear one through its kernel trick, together with logistic regression and KNN; finally, majority voting is used to improve the classification rates. The experimental results obtained on a dataset of 282 writers with 2 samples per writer show that the proposed method attains promising performance on writer detection in a text-independent setting.
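The final majority-voting step described in this abstract can be illustrated with a small sketch. The abstract does not specify how votes are combined or how ties are broken, so the function name, the label values, and the tie behaviour (the label seen first among the tied ones wins) are assumptions for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine label lists from several classifiers, one vote per classifier.

    predictions: list of equal-length label lists, one list per classifier.
    Returns one label per sample: the most frequent label for that sample.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count; among tied
        # labels, the one encountered first is returned.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

if __name__ == "__main__":
    # Hypothetical outputs of three classifiers on three handwriting samples.
    svm_out = ["F", "M", "M"]
    logreg_out = ["F", "F", "M"]
    knn_out = ["M", "F", "M"]
    print(majority_vote([svm_out, logreg_out, knn_out]))
```

With an odd number of classifiers and two classes, as in this example, ties cannot occur.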

13.
In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists of seven main modules: skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, slant removal, word segmentation, and character segmentation and recognition. The modules implement existing algorithms as well as novel ones. The system has been tested on the NIST, IAM-DB, and GRUHD databases and has achieved accuracy that varies from 65.6% to 100% depending on the database and the experiment.

14.
The segmentation of touching characters is still a challenging task, posing a bottleneck for offline Chinese handwriting recognition. In this paper, we propose an effective over-segmentation method with learning-based filtering using geometric features for single-touching Chinese handwriting. First, we detect candidate cuts by skeleton and contour analysis to guarantee a high recall rate of character separation. A filter is designed by supervised learning and used to prune implausible cuts to improve the precision. Since the segmentation rules and features are independent of the string length, the proposed method can deal with touching strings with more than two characters. The proposed method is evaluated on both the character segmentation task and the text line recognition task. The results on two large databases demonstrate the superiority of the proposed method in dealing with single-touching Chinese handwriting.

15.
16.
In this paper we address the task of writer identification for on-line handwriting captured from a whiteboard. Different sets of features are extracted from the recorded data and used to train a text- and language-independent on-line writer identification system. The system is based on Gaussian mixture models (GMMs), which provide a powerful yet simple means of representing the distribution of the features extracted from the handwritten text. The training data of all writers are used to train a universal background model (UBM), from which a client-specific model is obtained by adaptation. Different sets of features are described and evaluated in this work. The system is tested using text from 200 different writers. A writer identification rate of 98.56% at the paragraph level and 88.96% at the text line level is achieved.
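The abstract says only that a client-specific model is obtained from the UBM "by adaptation". One common way to do this in GMM-UBM systems is MAP adaptation of the component means, sketched below for one-dimensional features. The choice of MAP mean-only adaptation, the relevance factor, and every name here are assumptions and are not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    # Density of a 1-D Gaussian with the given mean and variance.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Return writer-specific means adapted from UBM means (mean-only MAP)."""
    k = len(means)
    soft_counts = [0.0] * k   # n_j: sum of responsibilities per component
    soft_sums = [0.0] * k     # sum of responsibility * x per component
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # sample too far from every component to assign
        for j in range(k):
            resp = likes[j] / total
            soft_counts[j] += resp
            soft_sums[j] += resp * x
    adapted = []
    for j in range(k):
        n_j = soft_counts[j]
        alpha = n_j / (n_j + relevance)            # data-dependent mixing weight
        data_mean = soft_sums[j] / n_j if n_j > 0.0 else means[j]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[j])
    return adapted

if __name__ == "__main__":
    # Hypothetical two-component UBM and 16 samples from one writer.
    print(map_adapt_means([3.0] * 16, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))
```

Components that see little of the writer's data stay close to the UBM, while well-supported components move toward the writer's own statistics.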

17.
The identification of a person on the basis of scanned images of handwriting is a useful biometric modality with application in forensic and historic document analysis and constitutes an exemplary study area within the research field of behavioral biometrics. We developed new and very effective techniques for automatic writer identification and verification that use probability distribution functions (PDFs) extracted from the handwriting images to characterize writer individuality. A defining property of our methods is that they are designed to be independent of the textual content of the handwritten samples. Our methods operate at two levels of analysis: the texture level and the character-shape (allograph) level. At the texture level, we use contour-based joint directional PDFs that encode orientation and curvature information to give an intimate characterization of individual handwriting style. In our analysis at the allograph level, the writer is considered to be characterized by a stochastic pattern generator of ink-trace fragments, or graphemes. The PDF of these simple shapes in a given handwriting sample is characteristic of the writer and is computed using a common shape codebook obtained by grapheme clustering. Combining multiple features (directional, grapheme, and run-length PDFs) yields increased writer identification and verification performance. The proposed methods are applicable to free-style handwriting (both cursive and isolated) and are practically feasible, under the assumption that a few text lines of handwritten material are available in order to obtain reliable probability estimates.
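To make the idea of a texture-level directional PDF concrete, the sketch below builds a simple orientation histogram from a traced contour and normalizes it to a probability distribution. This is a simplified, single-direction stand-in for illustration; it is not the joint directional feature the abstract refers to, and the function name, the bin count, and the contour representation (a list of `(x, y)` points) are assumptions.

```python
import math

def direction_pdf(contour, n_bins=12):
    """Histogram of segment orientations along a contour, normalized to sum to 1."""
    counts = [0] * n_bins
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        if (x0, y0) == (x1, y1):
            continue  # skip repeated points, which have no direction
        # Orientation in [0, pi): a segment and its reverse fall in the same bin.
        angle = math.atan2(y1 - y0, x1 - x0) % math.pi
        # min() guards against rounding that would index one past the last bin.
        b = min(int(angle / math.pi * n_bins), n_bins - 1)
        counts[b] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]

if __name__ == "__main__":
    # Hypothetical contour: two horizontal steps followed by one vertical step.
    print(direction_pdf([(0, 0), (1, 0), (2, 0), (2, 1)], n_bins=3))
```

Two such distributions, one per handwriting sample, could then be compared with a histogram distance to rank candidate writers.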

18.
19.
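Continuous handwriting recognition is the core of Chinese handwriting input technology, and entering Chinese text naturally and quickly has long been a goal of pattern recognition and of artificial intelligence more broadly. A recognition method for continuously overlaid handwritten Chinese characters is proposed that effectively overcomes the limits of small screens. The method is based on an integrated segmentation-recognition decoding framework: an over-segmentation algorithm first processes the input writing trajectory; a novel perceptron algorithm then determines the character boundaries; path decoding is then performed using multiple kinds of contextual information from the character classification model, the geometric model, and the language model. To suit different types of mobile devices, a method for efficiently compressing the character classification model is also proposed, which reduces the storage and memory footprint of character recognition. The method has been deployed on the Android platform and tested in large-scale experiments. The experimental results confirm its performance and efficiency.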

20.
We describe a new approach to the visual recognition of cursive handwriting. An effort is made to attain human-like performance by using a method based on pictorial alignment and on a model of the process of handwriting. The alignment approach permits recognition of character instances that appear embedded in connected strings. A system embodying this approach has been implemented and tested on five different word sets. The performance was stable both across words and across writers. The system exhibited a substantial ability to interpret cursive connected strings without recourse to lexical knowledge. SU is partially supported by NSF grant IRI-8900267.
