首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
A method of recognition of arabic cursive handwriting   总被引:1,自引:0,他引:1  
In spite of the progress of machine recognition techniques of Latin, Kana, and Chinese characters over the two past decades, the machine recognition of Arabic characters has remained almost untouched. In this correspondence, a structural recognition method of Arabic cursively handwritten words is proposed. In this method, words are first segmented into strokes. Those strokes are then classified using their geometrical and topological properties. Finally, the relative position of the classified strokes are examined, and the strokes are combined in several steps into a string of characters that represents the recognized word. Experimental results on texts handwritten by two persons showed high recognition accuracy.  相似文献   

3.
In this paper, a structural method of recognising Arabic handwritten characters is proposed. The major problem in cursive text recognition is the segmentation into characters or into representative strokes. When we segment the cursive portions of words, we take into account the contextual properties of the Arabic grammar and the junction segments connecting the characters to each other along the writing line. The problem of overlapping characters is resolved with a contour-following algorithm associated with the labelling of the detected contours. In the recognition phase, the characters are gathered into ten families of candidate characters with similar shapes. Then a heterarchical analysis follows that checks the pattern via goal-directed feedback control.  相似文献   

4.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle and isolated). The proposed method can be integrated in any handwritten Arabic words recognition system based on an explicit segmentation process. First, three preprocessing operations are applied on character images: thinning, contour tracing and connected components detection. These operations extract structural characteristics used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification was done using the Fuzzy ARTMAP neural network, which is very fast in training and supports incremental learning. Five Fuzzy ARTMAP neural networks were employed; each one is designed to recognize one subset of characters. The recognition process is achieved in two steps: in the first one, a clustering method affects characters to one of the five character subsets. In the second one, the pseudo-Zernike features are used by the appropriate Fuzzy ARTMAP classifier to identify the character. Training process and tests were performed on a set of character images manually extracted from the IFN/ENIT database. A height recognition rate was reported.  相似文献   

5.
6.
7.
8.
This paper presents a comparative study of two machine learning techniques for recognizing handwritten Arabic words, where hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs) were evaluated. The work proposed is divided into three stages, namely preprocessing, feature extraction and classification. Preprocessing includes baseline estimation and normalization as well as segmentation. In the second stage, features are extracted from each of the normalized words, where a set of new features for handwritten Arabic words is proposed, based on a sliding window approach moving across the mirrored word image. The third stage is for classification and recognition, where machine learning is applied using HMMs and DBNs. In order to validate the techniques, extensive experiments were conducted using the IFN/ENIT database which contains 32,492 Arabic words. Experimental results and quantitative evaluations showed that HMM outperforms DBN in terms of higher recognition rate and lower complexity.  相似文献   

9.
离线手写数字识别是光学字符识别的一个重要分支,在银行票据识别、邮政编码识别等领域有着广泛的应用。由于单一分类器在识别率上很难达到要求,人们提出了各种集成分类器识别方案。通过对离线手写数字的特征提取,从特征互补的角度出发,采用了最小距离分类器、树分类器和BP网络分类器进行多分类器互补集成,提出了基于置信度的多分类器互补集成方法。通过实验对比,基于置信度的多分类器互补集成手写数字识别在识别率和识别速度上达到了满意的结果。  相似文献   

10.
Handwritten digit recognition has long been a challenging problem in the field of optical character recognition and of great importance in industry. This paper develops a new approach for handwritten digit recognition that uses a small number of patterns for training phase. To improve performance of isolated Farsi/Arabic handwritten digit recognition, we use Bag of Visual Words (BoVW) technique to construct images feature vectors. Each visual word is described by Scale Invariant Feature Transform (SIFT) method. For learning feature vectors, Quantum Neural Networks (QNN) classifier is used. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (HODA dataset) show that proposed method can achieve the highest recognition rate compared to other state of the arts methods.  相似文献   

11.
12.
13.
This paper considers the development of a real-time Arabic handwritten character recognition system. The shape of an Arabic character depends on its position in a given word. The system assumes that characters result from a reliable segmentation stage, thus, the position of the character is known a priori. Thus, four different sets of character shapes have been independently considered. Each set is further divided into four subsets depending on the number of strokes in the character. The system has been heavily tested and the average recognition rate has been found to be 99.6% where most of the misrecognized characters were actually written with little care. Thus, the system can be reliably used for the recognition of on-line handwritten characters entered via a graphic tablet.  相似文献   

14.
In this paper, we focus on information extraction from optical character recognition (OCR) output. Since the content from OCR inherently has many errors, we present robust algorithms for information extraction from OCR lattices instead of merely looking them up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate the above challenges, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.  相似文献   

15.
The problem of handwritten digit recognition has long been an open problem in the field of pattern classification and of great importance in industry. The heart of the problem lies within the ability to design an efficient algorithm that can recognize digits written and submitted by users via a tablet, scanner, and other digital devices. From an engineering point of view, it is desirable to achieve a good performance within limited resources. To this end, we have developed a new approach for handwritten digit recognition that uses a small number of patterns for training phase. To improve the overall performance achieved in classification task, the literature suggests combining the decision of multiple classifiers rather than using the output of the best classifier in the ensemble; so, in this new approach, an ensemble of classifiers is used for the recognition of handwritten digit. The classifiers used in proposed system are based on singular value decomposition (SVD) algorithm. The experimental results and the literature show that the SVD algorithm is suitable for solving sparse matrices such as handwritten digit. The decisions obtained by SVD classifiers are combined by a novel proposed combination rule which we named reliable multi-phase particle swarm optimization. We call the method “Reliable” because we have introduced a novel reliability parameter which is applied to tackle the problem of PSO being trapped in local minima. In comparison with previous methods, one of the significant advantages of the proposed method is that it is not sensitive to the size of training set. Unlike other methods, the proposed method uses just 15 % of the dataset as a training set, while other methods usually use (60–75) % of the whole dataset as the training set. To evaluate the proposed method, we tested our algorithm on Farsi/Arabic handwritten digit dataset. What makes the recognition of the handwritten Farsi/Arabic digits more challenging is that some of the digits can be legally written in different shapes. Therefore, 6000 hard samples (600 samples per class) are chosen by K-nearest neighbor algorithm from the HODA dataset which is a standard Farsi/Arabic digit dataset. Experimental results have shown that the proposed method is fast, accurate, and robust against the local minima of PSO. Finally, the proposed method is compared with state of the art methods and some ensemble classifier based on MLP, RBF, and ANFIS with various combination rules.  相似文献   

16.
In the context of Arabic optical characters recognition, Arabic poses more challenges because of its cursive nature. We purpose a system for recognizing a document containing Arabic text, using a pipeline of three neural networks. The first network model predicts the font size of an Arabic word, then the word is normalized to an 18pt font size that will be used to train the next two models. The second model is used to segment a word into characters. The problem of words segmentation in the Arabic language, as in many similar cursive languages, presents a challenge to the OCR systems. This paper presents a multichannel neural network to solve the offline segmentation of machine-printed Arabic documents. The segmented characters are then fed as an input to a convolutional neural network for Arabic characters recognition. The font size prediction model produced a test accuracy of 99.1%. The accuracy of the segmentation model using one font is 98.9%, while four-font model showed 95.5% accuracy. The whole pipeline showed an accuracy of 94.38% on Arabic Transparent font of size 18pt from APTI data set.  相似文献   

17.
18.
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal text paragraph covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines are developed. In addition we are presenting our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier.  相似文献   

19.
在字符识别系统中,字符的有效分割是识别的关键。针对手写汉字字间距及字内距无规则可循,字符间极易发生粘连、交错等现象,提出一种多步分割方法。该方法首先利用Viterbi算法将原字符串切分成互不连通的分割块,使非粘连汉字、交错汉字得到正确分割;对于其中宽度较大存在粘连字符的分割块,从候选分割点入手,用非线性分割路径将粘连部分分开;最后再应用A*算法找到全局最佳分割位置,使过分割的字符得到完整合并。实验结果表明,该方法对于手写汉字的分割是可行、有效的。  相似文献   

20.
The Arabic alphabet is used in around 27 languages, including Arabic, Persian, Kurdish, Urdu, and Jawi. Many researchers have developed systems for recognizing cursive handwritten Arabic words, using both holistic and segmentation-based approaches. This paper introduces a system that achieves high accuracy using efficient segmentation, feature extraction, and recurrent neural network (RNN). We describe a robust rule-based segmentation algorithm that uses special feature points identified in the word skeleton to segment the cursive words into graphemes. We show that careful selection from a wide range of features extracted during and after the segmentation stage produces a feature set that significantly reduces the label error. We demonstrate that using same RNN recognition engine, the segmentation approach with efficient feature extraction gives better results than a holistic approach that extracts features from raw pixels. We evaluated this segmentation approach against an improved version of the holistic system MDLSTM that won the ICDAR 2009 Arabic handwritten word recognition competition. On the IfN/ENIT database of handwritten Arabic words, the segmentation approach reduces the average label error by 18.5 %, the sequence error by 22.3 %, and the execution time by 31 %, relative to MDLSTM. This approach also has the best published accuracies on two IfN/ENIT test sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号