1.

Performance of hidden Markov model and dynamic Bayesian network classifiers on handwritten Arabic word recognition

Jawad H. AlKhateeb, Olivier Pauplin, Jinchang Ren, Jianmin Jiang 《Knowledge》2011,24(5):680-688

This paper presents a comparative study of two machine learning techniques, hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs), for recognizing handwritten Arabic words. The proposed approach has three stages: preprocessing, feature extraction and classification. Preprocessing includes baseline estimation, normalization and segmentation. In the second stage, features are extracted from each normalized word; a set of new features for handwritten Arabic words is proposed, based on a sliding window that moves across the mirrored word image. The third stage performs classification and recognition using HMMs and DBNs. To validate the techniques, extensive experiments were conducted on the IFN/ENIT database, which contains 32,492 Arabic words. Experimental results and quantitative evaluations showed that the HMM classifier outperforms the DBN classifier, with a higher recognition rate and lower complexity.
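
A minimal sketch of sliding-window feature extraction over a mirrored word image follows. The abstract does not list the actual features, so the two per-window features used here (ink density and normalized vertical centroid) and the `width`/`step` parameters are hypothetical; mirroring simply lets the right-to-left Arabic word be scanned left to right.

```python
from typing import List

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def mirror(img: Image) -> Image:
    """Flip horizontally so a right-to-left Arabic word is scanned left to right."""
    return [row[::-1] for row in img]


def sliding_window_features(img: Image, width: int, step: int) -> List[List[float]]:
    """One feature vector per window position.

    Hypothetical features: ink density inside the window, and the mean row index of the
    ink pixels normalized to [0, 1] (0.0 when the window is empty).
    """
    h, w = len(img), len(img[0])
    feats: List[List[float]] = []
    for start in range(0, w - width + 1, step):
        ink = 0
        row_sum = 0
        for r in range(h):
            for c in range(start, start + width):
                if img[r][c]:
                    ink += 1
                    row_sum += r
        density = ink / (h * width)
        centroid = (row_sum / ink) / (h - 1) if ink and h > 1 else 0.0
        feats.append([density, centroid])
    return feats
```

The resulting sequence of vectors is the kind of frame sequence that an HMM or DBN classifier consumes, one observation per window position.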

2.

Offline handwritten Arabic cursive text recognition using Hidden Markov Models and re-ranking

Jawad H AlKhateeb, Jinchang Ren 《Pattern Recognition Letters》2011,32(8):1081-1088

3.

Region growing based segmentation algorithm for typewritten and handwritten text recognition

Khalid Saeed, Majida Albakoor 《Applied Soft Computing》2009,9(2):608-617

This paper presents a new high-accuracy technique for recognizing both typewritten and handwritten English and Arabic texts without thinning. After segmenting the text into lines (horizontal segmentation) and the lines into words, it separates each word into its letters. Separating a text line (row) into words and a word into letters is performed with the region growing technique (implicit segmentation), based on three essential lines in a text row. This saves time because there is no need to skeletonize the word or to physically isolate letters from it, and the input data involves only the basic information: the scanned text. The baseline is detected, the word contour is defined, and the word is implicitly segmented into its letters according to a novel algorithm described in the paper. Each extracted letter, together with its dots, is used as one unit in the recognition system. It is resized into a 9 × 9 matrix by bilinear interpolation after a lowpass filter is applied to reduce aliasing. The elements are then scaled to the interval [0,1], and the resulting array is the input to the designed neural network. For typewritten texts, three Arabic fonts are used: Arial, Arabic Transparent and Simplified Arabic. The results showed an average recognition success rate of 93% for Arabic typewriting. This segmentation approach has also been applied to handwritten text, where words are classified with a relatively high recognition rate for both Arabic and English. The experiments were performed in MATLAB and have shown promising results that can serve as a basis for further analysis of Arabic and other cursive-script text recognition, as well as English handwritten texts. For English handwritten classification, an average success rate of about 80% was achieved, while for Arabic handwritten text the algorithm achieved a success rate of about 90%. The recent results have shown increasing success for both Arabic and English texts.
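
The letter normalization step (low-pass filter, bilinear resize to 9 × 9, scaling to [0,1]) can be sketched in plain Python as below. The 3 × 3 mean filter and the min-max scaling are assumptions standing in for the unspecified filter and scaling, and `letter_to_input` is a hypothetical helper name.

```python
from typing import List

Matrix = List[List[float]]


def box_blur(img: Matrix) -> Matrix:
    """3x3 mean filter (a simple low-pass filter); border pixels average in-range neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def bilinear_resize(img: Matrix, out_h: int, out_w: int) -> Matrix:
    """Bilinear interpolation that maps the corner pixels of input and output onto each other."""
    h, w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            out[i][j] = top * (1 - dy) + bottom * dy
    return out


def scale_01(img: Matrix) -> Matrix:
    """Min-max scale all elements to [0, 1] (assumed scaling; a constant image maps to zeros)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0.0] * len(row) for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def letter_to_input(img: Matrix, size: int = 9) -> List[float]:
    """Blur, resize to size x size, scale to [0, 1], and flatten row-major for the network."""
    resized = scale_01(bilinear_resize(box_blur(img), size, size))
    return [v for row in resized for v in row]
```

Filtering before downsampling is what reduces aliasing: thin strokes are spread over neighbouring pixels, so the 9 × 9 grid does not miss them.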

4.

Arabic handwriting recognition using structural and syntactic pattern attributes

《Pattern Recognition》2013,46(1):141-154

5.

Development of an efficient neural-based segmentation technique for Arabic handwriting recognition

Husam A. Al Hamad, Raed Abu Zitar 《Pattern Recognition》2010,43(8):2773-2798

6.

Word matching using single closed contours for indexing handwritten historical documents

Tomasz Adamek, Noel E. O’Connor, Alan F. Smeaton 《International Journal on Document Analysis and Recognition》2007,9(2-4):153-165

7.

Offline recognition of unconstrained handwritten English words (cited 1 time in total: 0 self-citations, 1 by others)

Liang Jiayu, Liu Changping, Huang Lei 《Journal of Computer Applications (计算机应用)》2004,24(9):41-43

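This paper introduces an offline handwritten English word recognition system based on hidden Markov models (HMMs) that uses fuzzy segmentation. The system consists of four modules: image preprocessing, feature extraction, HMM-based training, and recognition. Image preprocessing includes binarization, smoothing and denoising, skew correction, and reference line extraction. Features are then extracted with a variable-width sliding window: the first two groups are global shape features and pixel distribution features, and Sobel gradient features are also introduced. The HMMs are trained with the embedded Baum-Welch algorithm, which does not require the words to be segmented. Finally, recognition is performed with the Viterbi algorithm. For each word in the lexicon, the word model is built by linearly concatenating letter models.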
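
Building word models by concatenating letter models, followed by Viterbi decoding, can be sketched as follows. This is a simplified illustration under stated assumptions: discrete emissions instead of the system's continuous sliding-window features, one self-loop probability per state, and hypothetical names (`concat_word_model`, `viterbi_log_score`, `recognize`); embedded Baum-Welch training is not implemented.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical left-to-right letter model: a list of states, each given as
# (self-loop probability, discrete emission table {symbol: probability}).
# Leaving a state moves to the next state with probability 1 - self-loop.
LetterHMM = List[Tuple[float, Dict[str, float]]]

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def concat_word_model(word: str, letters: Dict[str, LetterHMM]) -> LetterHMM:
    """Linearly concatenate letter models into one left-to-right word model."""
    states: LetterHMM = []
    for ch in word:
        states.extend(letters[ch])
    return states


def viterbi_log_score(model: LetterHMM, obs: List[str]) -> float:
    """Best-path log-probability; the path starts in the first state and ends in the last."""
    n = len(model)
    delta = [NEG_INF] * n
    delta[0] = safe_log(model[0][1].get(obs[0], 0.0))
    for o in obs[1:]:
        new = [NEG_INF] * n
        for s in range(n):
            stay = delta[s] + safe_log(model[s][0])
            move = delta[s - 1] + safe_log(1 - model[s - 1][0]) if s > 0 else NEG_INF
            new[s] = max(stay, move) + safe_log(model[s][1].get(o, 0.0))
        delta = new
    return delta[-1]


def recognize(obs: List[str], lexicon: List[str], letters: Dict[str, LetterHMM]) -> str:
    """Return the lexicon word whose concatenated model gives the best Viterbi score."""
    scores = {w: viterbi_log_score(concat_word_model(w, letters), obs) for w in lexicon}
    return max(scores, key=scores.get)
```

Because every word model is assembled from shared letter models, the lexicon can grow without training a new model per word, which is the point of the linear concatenation described above.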

8.

Extraction of key letters for cursive script recognition

M. Cheriet, C. Y. Suen 《Pattern Recognition Letters》1993,14(12):1009-1017

9.

Offline handwritten Arabic word recognition based on weighted Bayes

Xu Yamei, He Ji'ai 《Journal of Chinese Information Processing (中文信息学报)》2021,35(2):133-140

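Because handwritten Arabic words are written cursively and include many similar words, this paper proposes a new offline handwritten text recognition algorithm. The algorithm decomposes Arabic words into fixed components and builds a weighted Bayesian inference model that maps component features to word classes. Combining word component segmentation, multi-level hybrid component recognition and component weight coefficient estimation, it computes the posterior probability of each word class to obtain the recognition result. Experiments on the IFN/ENIT database achieved a word recognition rate of 90.03%, confirming that component decomposition is robust to connected strokes, that component recognition improves discrimination between similar words, and that the algorithm requires few training classes and extends easily to large-vocabulary recognition.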
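
One way to read the weighted Bayesian combination is as a weighted sum of per-component log-likelihoods plus a word prior, normalized into a posterior over the lexicon. The sketch below assumes exactly that form; the abstract does not give the paper's formula, and all names and inputs are hypothetical.

```python
import math
from typing import Dict, List


def weighted_bayes_posteriors(
    obs_probs: List[Dict[str, float]],
    weights: List[float],
    lexicon: Dict[str, List[str]],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Posterior over words, assuming
    log score(w) = log P(w) + sum_i weights[i] * log P(component_i of w | observed slot i).

    obs_probs[i] maps component labels to the component recognizer's probability for
    observed slot i; lexicon maps each word to its (hypothetical) component sequence.
    Only words with as many components as observed slots are scored; words with a
    zero-probability component are dropped from the result.
    """
    log_scores: Dict[str, float] = {}
    for word, comps in lexicon.items():
        if len(comps) != len(obs_probs):
            continue
        score = math.log(priors[word])
        for i, comp in enumerate(comps):
            p = obs_probs[i].get(comp, 0.0)
            if p <= 0.0:
                score = float("-inf")
                break
            score += weights[i] * math.log(p)
        log_scores[word] = score
    finite = {w: s for w, s in log_scores.items() if s != float("-inf")}
    if not finite:
        return {}
    best = max(finite.values())
    # Normalize with the log-sum-exp trick to avoid underflow.
    exps = {w: math.exp(s - best) for w, s in finite.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}
```

The weights let a reliable component (for example, one the recognizer rarely confuses) count for more than an ambiguous one when separating similar words.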

10.

A survey on Arabic character segmentation

Yasser M. Alginahi 《International Journal on Document Analysis and Recognition》2013,16(2):105-126

11.

Arabic word descriptor for handwritten word indexing and lexicon reduction

Youssouf Chherawala, Mohamed Cheriet 《Pattern Recognition》2014

12.

A segmentation-free approach to text recognition with application to Arabic text

Badr Al-Badr, Robert M. Haralick 《International Journal on Document Analysis and Recognition》1998,1(3):147-166

13.

Neural Networks Pipeline for Offline Machine Printed Arabic OCR

Mohamed A. Radwan, Mahmoud I. Khalil, Hazem M. Abbas 《Neural Processing Letters》2018,48(2):769-787

In the context of optical character recognition (OCR), Arabic poses more challenges because of its cursive nature. We propose a system for recognizing documents containing Arabic text using a pipeline of three neural networks. The first network predicts the font size of an Arabic word; the word is then normalized to an 18pt font size, which is used to train the next two models. The second model segments a word into characters. Word segmentation in Arabic, as in many similar cursive languages, is a challenge for OCR systems. This paper presents a multichannel neural network to solve the offline segmentation of machine-printed Arabic documents. The segmented characters are then fed into a convolutional neural network for Arabic character recognition. The font size prediction model achieved a test accuracy of 99.1%. The segmentation model achieved 98.9% accuracy with one font and 95.5% with a four-font model. The whole pipeline achieved an accuracy of 94.38% on the Arabic Transparent font at 18pt from the APTI data set.

14.

Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG)

Noor A. Jebril, Hussein R. Al-Zoubi, Qasem Abu Al-Haija 《Pattern Recognition and Image Analysis》2018,28(2):321-345

Optical character recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. Besides preprocessing, the proposed recognition system consists of three main phases. In the first phase, word segmentation is used to extract characters. In the second phase, histograms of oriented gradient (HOG) are used for feature extraction. The final phase uses a support vector machine (SVM) to classify the characters. As a case study, we applied the proposed method to recognizing the names of Jordanian cities, towns and villages, together with many other words that provide character shapes not covered by those names. The set was carefully selected to include every Arabic character in all four of its forms. To this end, we built our own dataset of more than 43,000 handwritten Arabic words (30,000 for training and 13,000 for testing). Experimental results showed that our method compares very favorably with state-of-the-art techniques, achieving recognition rates exceeding 99%.
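
A simplified HOG descriptor is sketched below in plain Python, assuming a grayscale image given as nested lists. It computes magnitude-weighted histograms of unsigned gradient orientation per cell and applies one global L2 normalization; standard HOG additionally normalizes over overlapping blocks, and the SVM classification stage is omitted. The `cell` and `bins` defaults are assumptions.

```python
import math
from typing import List


def hog_features(img: List[List[float]], cell: int = 4, bins: int = 9) -> List[float]:
    """Simplified HOG: per-cell orientation histograms, then a single global L2 normalization."""
    h, w = len(img), len(img[0])
    cells_y, cells_x = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    for r in range(cells_y * cell):
        for c in range(cells_x * cell):
            # Central differences, clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = int(angle / (180.0 / bins)) % bins
            hist[r // cell][c // cell][b] += mag
    vec = [v for row in hist for cell_hist in row for v in cell_hist]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

Each character image yields a fixed-length vector (cells × bins), which is the form an SVM classifier expects as input.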

15.

Structural analysis of Arabic handwriting: segmentation and recognition

Katerin Romeo-Pakker, Abderrahim Ameur, Christian Olivier, Yves Lecourtier 《Machine Vision and Applications》1995,8(4):232-240

In this paper, a structural method of recognising Arabic handwritten characters is proposed. The major problem in cursive text recognition is segmentation into characters or into representative strokes. When segmenting the cursive portions of words, we take into account the contextual properties of Arabic grammar and the junction segments that connect the characters to each other along the writing line. The problem of overlapping characters is resolved with a contour-following algorithm combined with labelling of the detected contours. In the recognition phase, the characters are grouped into ten families of candidate characters with similar shapes. A heterarchical analysis then checks the pattern via goal-directed feedback control.
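
The sketch below detects contour pixels in a binary image and labels each connected contour. It is a simplified stand-in, not the authors' contour-following algorithm: it finds and groups boundary pixels using 8-connectivity but does not trace them in order, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def contour_pixels(img: List[List[int]]) -> Set[Pixel]:
    """Foreground pixels with at least one 4-neighbour that is background or outside the image."""
    h, w = len(img), len(img[0])
    found: Set[Pixel] = set()
    for r in range(h):
        for c in range(w):
            if not img[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not img[rr][cc]:
                    found.add((r, c))
                    break
    return found


def label_contours(img: List[List[int]]) -> Dict[Pixel, int]:
    """Assign one integer label per 8-connected group of contour pixels (BFS flood fill)."""
    pixels = contour_pixels(img)
    labels: Dict[Pixel, int] = {}
    next_label = 0
    for start in sorted(pixels):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in pixels and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels
```

Separate labels give each stroke or character body its own contour, which is the information needed before overlapping shapes can be told apart.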

16.

Confidence- and margin-based MMI/MPE discriminative training for off-line handwriting recognition

Philippe Dreuw, Georg Heigold, Hermann Ney 《International Journal on Document Analysis and Recognition》2011,14(3):273-288

We present a novel confidence- and margin-based discriminative training approach for model adaptation of a hidden Markov model (HMM)-based handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM systems that try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used to train writer-independent handwriting models. For model adaptation during decoding, an unsupervised confidence-based discriminative training on the word and frame level within a two-pass decoding process is proposed. The proposed methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is reduced by 33% relative to an ML-trained baseline system. On the large-vocabulary line recognition task of the IAM English handwriting database, the word error rate is reduced by 25% relative.
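
For reference, the standard MMI criterion is shown below for training samples $(X_r, W_r)$, $r = 1, \dots, R$, with model parameters $\theta$ and a likelihood scaling factor $\kappa$; this notation is introduced here for illustration. The confidence- and margin-based variants described in the abstract modify this criterion, and their exact form is not reproduced here.

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta) = \sum_{r=1}^{R} \log
  \frac{p_\theta(X_r \mid W_r)^{\kappa}\, p(W_r)^{\kappa}}
       {\sum_{W} p_\theta(X_r \mid W)^{\kappa}\, p(W)^{\kappa}}
```

The numerator scores the correct transcription $W_r$, while the denominator sums over all competing hypotheses $W$; maximizing $\mathcal{F}_{\mathrm{MMI}}$ therefore raises the posterior of the correct word relative to its competitors, unlike ML training, which only raises the numerator.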

17.

HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents

Zaher Al Aghbari, Salama Brook 《Expert Systems with Applications》2009,36(8):10942-10951

18.

W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents

Youssouf Chherawala, Mohamed Cheriet 《Pattern Recognition》2012,45(9):3277-3287

This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. The word shape is represented by the weighted topological signature vector (W-TSV), which encodes graph data into a low-dimensional vector space. Three directed acyclic graph (DAG) representations are proposed for Arabic word shapes, based on topological and geometrical features. Lexicon reduction is achieved by a nearest-neighbor search in the W-TSV space. The proposed framework was tested on the IFN/ENIT and Ibn Sina databases, achieving degrees of reduction of 83.5% and 92.9%, respectively, at a reduction accuracy of 90%.
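
Nearest-neighbor lexicon reduction and its two evaluation measures can be sketched as follows, assuming each lexicon word already has a fixed-length shape vector (building the W-TSV from the DAGs is not shown). Euclidean distance, a fixed `k`, and the metric definitions (degree of reduction as the mean fraction of the lexicon removed, accuracy as the fraction of queries whose true word survives) are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def reduce_lexicon(query: Sequence[float], lexicon_vecs: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Keep the k lexicon words whose shape vectors are closest (Euclidean) to the query."""
    ranked = sorted(lexicon_vecs, key=lambda w: math.dist(query, lexicon_vecs[w]))
    return ranked[:k]


def reduction_metrics(results: List[Tuple[str, List[str]]],
                      lexicon_size: int) -> Tuple[float, float]:
    """results holds (true word, reduced lexicon) for each query image.

    degree   = mean fraction of the full lexicon removed by the reduction
    accuracy = fraction of queries whose true word is still in the reduced lexicon
    """
    degree = sum(1 - len(reduced) / lexicon_size for _, reduced in results) / len(results)
    accuracy = sum(truth in reduced for truth, reduced in results) / len(results)
    return degree, accuracy
```

The two measures pull against each other: a smaller `k` increases the degree of reduction but risks dropping the true word, so results are reported as a degree of reduction at a fixed accuracy.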

19.

Improved linear density technique for segmentation in Arabic handwritten text recognition

Husam Ahmed Al Hamad, Laith Abualigah, Mohammad Shehab, Khalil H. A. Al-Shqeerat, Mohammad Otair 《Multimedia Tools and Applications》2022,81(20):28531-28558

20.

Modeling and recognition of cursive words with hidden Markov models

Wongyu Seong-Whan Jin H. 《Pattern Recognition》1995,28(12):1941-1953

In this paper, a new method for modeling and recognizing cursive words with hidden Markov models (HMMs) is presented. In the proposed method, a sequence of thin fixed-width vertical frames is extracted from the image, capturing the local features of the handwriting. By quantizing the feature vectors of each frame, the input word image is represented as a Markov chain of discrete symbols. A handwritten word is regarded as a sequence of characters and optional ligatures; hence, the ligatures are also explicitly modeled. With this view, an interconnection network of character and ligature HMMs is constructed to model words of indefinite length. This model can ideally describe any form of handwritten word, including discretely spaced words, purely cursive words and unconstrained words of mixed styles. Experiments were conducted on a standard database to evaluate the performance of the overall scheme, and the performance of various search strategies based on the forward and backward scores was compared. Experiments with a preclassifier based on global features show that this approach may be useful even for large-vocabulary recognition tasks.
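
Vector quantization of per-frame feature vectors into discrete symbols can be sketched as below: a few k-means iterations build a codebook, and each frame is then mapped to its nearest codeword. Codebook size, initialization and Euclidean distance are assumptions; the paper's actual features and quantizer are not described in the abstract.

```python
import math
from typing import List, Sequence


def nearest(vec: Sequence[float], codebook: List[List[float]]) -> int:
    """Index of the codeword closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vec, codebook[k]))


def train_codebook(vectors: List[Sequence[float]], init: List[Sequence[float]],
                   iterations: int = 10) -> List[List[float]]:
    """Plain k-means (Lloyd iterations) starting from the given initial codewords."""
    codebook = [list(c) for c in init]
    for _ in range(iterations):
        clusters: List[List[Sequence[float]]] = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if no vector was assigned to it
                codebook[k] = [sum(xs) / len(members) for xs in zip(*members)]
    return codebook


def quantize(frames: List[Sequence[float]], codebook: List[List[float]]) -> List[int]:
    """Turn a sequence of frame feature vectors into a sequence of discrete symbols."""
    return [nearest(f, codebook) for f in frames]
```

The symbol sequence returned by `quantize` is what a discrete-observation HMM network of characters and ligatures would score.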