首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 765 毫秒
1.
Writer-adaptation is the process of converting a writer-independent handwriting recognition system into a writer-dependent system. It can greatly increasing recognition accuracy, given adequate writer models. The limited amount of data a writer provides during training constrains the models' complexity. We show how appropriate use of writer-independent models is important for the adaptation. Our approach uses writer-independent writing style models (lexemes) to identify the styles present in a particular writer's training data. These models are then updated using the writer's data. Lexemes in the writer's data for which an inadequate number of training examples is available are replaced with the writer-independent models. We demonstrate the feasibility of this approach on both isolated handwritten character recognition and unconstrained word recognition tasks. Our results show an average reduction in error rate of 16.3 percent for lowercase characters as compared against representing each of the writer's character classes with a single model. In addition, an average error rate reduction of 9.2 percent is shown on handwritten words using only a small amount of data for adaptation  相似文献   

2.
3.
Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists mainly of three phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradient (HOG) are used for feature extraction. The final phase employs Support Vector Machine (SVM) for classifying characters. We have applied the proposed method for the recognition of Jordanian city, town, and village names as a case study, in addition to many other words that offers the characters shapes that are not covered with Jordan cites. The set has carefully been selected to include every Arabic character in its all four forms. To this end, we have built our own dataset consisting of more than 43.000 handwritten Arabic words (30000 used in the training stage and 13000 used in the testing stage). Experimental results showed a great success of our recognition method compared to the state of the art techniques, where we could achieve very high recognition rates exceeding 99%.  相似文献   

4.
5.
This paper proposes an automatic text-independent writer identification framework that integrates an industrial handwriting recognition system, which is used to perform an automatic segmentation of an online handwritten document at the character level. Subsequently, a fuzzy c-means approach is adopted to estimate statistical distributions of character prototypes on an alphabet basis. These distributions model the unique handwriting styles of the writers. The proposed system attained an accuracy of 99.2% when retrieved from a database of 120 writers. The only limitation is that a minimum length of text needs to be present in the document in order for sufficient accuracy to be achieved. We have found that this minimum length of text is about 160 characters or approximately equivalent to 3 lines of text. In addition, the discriminative power of different alphabets on the accuracy is also reported.  相似文献   

6.
7.
This paper describes and analyses the performance of a novel feature extraction technique for the recognition of segmented/cursive characters that may be used in the context of a segmentation-based handwritten word recognition system. The modified direction feature (MDF) extraction technique builds upon the direction feature (DF) technique proposed previously that extracts direction information from the structure of character contours. This principal was extended so that the direction information is integrated with a technique for detecting transitions between background and foreground pixels in the character image.In order to improve on the DF extraction technique, a number of modifications were undertaken. With a view to describe the character contour more effectively, a re-design of the direction number determination technique was performed. Also, an additional global feature was introduced to improve the recognition accuracy for those characters that were most frequently confused with patterns of similar appearance. MDF was tested using a neural network-based classifier and compared to the DF and transition feature (TF) extraction techniques. MDF outperformed both DF and TF techniques using a benchmark dataset and compared favourably with the top results in the literature. A recognition accuracy of above 89% is reported on characters from the CEDAR dataset.  相似文献   

8.
9.
10.
11.
由于手写哈萨克字符结构的特殊性,仅提取几种单一的字符特征进行识别时正确率较低,识别效果较差。由此采用改进的PCA方法定位单词基线位置,对每个字符提取包括笔画密度特征、投影特征、轮廓特征等在内的36种特征,使用K-W检验对各特征的分类能力进行比较,并采用线性判别函数进行分类,取得了较高的识别精度。实验结果表明,该系统针对脱机字符识别率达到94%以上。  相似文献   

12.
A novel SVM-based handwritten Tamil character recognition system   总被引:1,自引:0,他引:1  
This paper describes a system for recognizing offline handwritten Tamil characters using support vector machine (SVM). Data samples are collected from different writers on A4 sized documents. They are scanned using a flat bed scanner at a resolution of 300 dpi and stored as gray-scale images. Various preprocessing operations are performed on the digitized image to enhance the quality of the image. Pixel densities are calculated for 64 different zones of the image and these values are used as the features of a character. These features are used to train the SVM. The SVM is tested for the first time to recognize handwritten Tamil characters. The system has achieved a very good recognition accuracy of 82.04% on the handwritten Tamil character database.  相似文献   

13.
In this paper, we propose an off-line recognition method for handwritten Korean characters based on stroke extraction and representation. To recognize handwritten Korean characters, it is required to extract strokes and stroke sequence to describe an input of two-dimensional character as one-dimensional representation. We define 28 primitive strokes to represent characters and introduce 300 stroke separation rules to extract proper strokes from Korean characters. To find a stroke sequence, we use stroke code and stroke relationship between consecutive strokes. The input characters are recognized by using character recognition trees. The proposed method has been tested for the most frequently used 1000 characters by 400 different writers and showed recognition rate of 94.3%.  相似文献   

14.
A handwritten Chinese character off-line recognizer based on contextual vector quantization (CVQ) of every pixel of an unknown character image has been constructed. Each template character is represented by a codebook. When an unknown image is matched against a template character, each pixel of the image is quantized according to the associated codebook by considering not just the feature vector observed at each pixel, but those observed at its neighbors and their quantization as well. Structural information such as stroke counts observed at each pixel are captured to form a cellular feature vector. Supporting a vocabulary of 4616 simplified Chinese characters and alphanumeric and punctuation symbols, the writer-independent recognizer has an average recognition rate of 77.2 percent. Three statistical language models for postprocessing have been studied for their effectiveness in upgrading the recognition rate of the system. Among them, the CVQ-based language model is the most effective one upgrading the recognition rate by 10.4 percent on the average  相似文献   

15.
16.
17.
李宇霞  孙永奇  闫茹  朱卫国 《计算机工程》2021,47(1):255-263,274
光学字符识别技术可有效提高票据应用中票据信息录入的工作效率。针对票据的复杂背景与不规范手写字符降低票据识别准确率的问题,结合卷积神经网络图像识别与语义可靠性,提出一种可靠性优先的路径搜索方法,以降低模糊字符对搜索路径的干扰。利用基于公司名结构特点的前后缀推断策略,有效解决公司名前后缀识别错误问题。采用结巴中文分词与字符位置信息检查识别结果中的错误,并将长短期记忆语言模型与在传统字形相似度基础上引入的汉字部件相似度相结合进行纠错。实验结果表明,通过将纠错策略与该方法相结合可有效提高公司名识别准确率至93.08%。  相似文献   

18.
将粗分类应用于脱机手写汉字识别中,采用这种多层次分类策略,能有效地改善识别的性能,提高识别精度。本文提出了一种利用四角区域结构特征对手写汉字进行粗分类的方法。在对汉字基本笔画进行分析的基础之上,根据手写汉字形变的特点以及识别算法的要求,定义一组新的笔画单元,并将这些笔画单元与汉字特定区域内的结构进行比对,得到一组4位结构特征编码,以此作为脱机手写汉字粗分类的依据。对GB2312一级字库中的部分手写汉字进行采样和识别实验,结果证明改进的四角结构特征用于粗分类的有效性。  相似文献   

19.
20.
离线手写数字识别是光学字符识别的一个重要分支,在银行票据识别、邮政编码识别等领域有着广泛的应用。由于单一分类器在识别率上很难达到要求,人们提出了各种集成分类器识别方案。通过对离线手写数字的特征提取,从特征互补的角度出发,采用了最小距离分类器、树分类器和BP网络分类器进行多分类器互补集成,提出了基于置信度的多分类器互补集成方法。通过实验对比,基于置信度的多分类器互补集成手写数字识别在识别率和识别速度上达到了满意的结果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号