WRITER IDENTIFICATION OF ARABIC TEXT USING STATISTICAL AND STRUCTURAL FEATURES |
| |
Authors: | Sameh M Awaida |
| |
Affiliation: | Qassim University, Qassim, Kingdom of Saudi Arabia |
| |
Abstract: | This article addresses writer identification from handwritten Arabic text. Several types of structural and statistical features were extracted from the handwritten text. A novel approach was used to extract structural features that build on some of the main characteristics of the Arabic language. The extracted features comprise connected component features for Arabic handwritten text, gradient distribution features, windowed gradient distribution features, contour chain code distribution features, and windowed contour chain code distribution features. A nearest neighbor (NN) classifier was used with the Euclidean distance measure. Data reduction algorithms (viz. principal component analysis [PCA], linear discriminant analysis [LDA], multiple discriminant analysis [MDA], multidimensional scaling [MDS], and forward/backward feature selection algorithms) were used. A database of 500 paragraphs handwritten in Arabic by 250 writers was used. The paragraphs were randomly generated from a large corpus. NN provided the best accuracy in text-independent writer identification, with a top-1 result of 88.0%, a top-5 result of 96.0%, and a top-10 result of 98.5% for the first 100 writers. When the work was extended to all 250 writers with the backward feature selection algorithm (using 54 of the 83 features), the system attained a top-1 result of 75.0%, a top-5 result of 91.8%, and a top-10 result of 95.4%. |
| |
Keywords: | Arabic writer identification system; feature combination; feature extraction; handwriting analysis; handwritten text; text-independent writer identification |
|
|
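The abstract reports top-1, top-5, and top-10 results for a nearest neighbor classifier with the Euclidean distance measure. The following is a minimal sketch of how such a ranking could be computed, not the author's implementation. The function names `top_k_writers` and `top_k_accuracy`, the toy 2-D feature vectors, and the rule of ranking each writer by their closest sample are all assumptions made for illustration; the article's 83 features and its exact ranking procedure are not given in the abstract.

```python
import math

def top_k_writers(query, gallery, gallery_writers, k):
    """Return up to k distinct writer ids, ordered by Euclidean distance
    from `query` to each writer's closest gallery sample (assumed rule)."""
    # Sort gallery sample indices by Euclidean distance to the query vector.
    order = sorted(range(len(gallery)),
                   key=lambda i: math.dist(query, gallery[i]))
    ranked = []
    for i in order:
        writer = gallery_writers[i]
        if writer not in ranked:      # keep only a writer's best match
            ranked.append(writer)
        if len(ranked) == k:
            break
    return ranked

def top_k_accuracy(queries, query_writers, gallery, gallery_writers, k):
    """Fraction of queries whose true writer appears among the top k."""
    hits = sum(
        true_writer in top_k_writers(q, gallery, gallery_writers, k)
        for q, true_writer in zip(queries, query_writers)
    )
    return hits / len(queries)

# Toy data (hypothetical): four reference samples from three writers.
gallery = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
gallery_writers = ["a", "a", "b", "c"]
queries = [(4.0, 4.0), (0.4, 0.0), (6.0, 6.0)]
query_writers = ["b", "a", "c"]

top1 = top_k_accuracy(queries, query_writers, gallery, gallery_writers, 1)
top2 = top_k_accuracy(queries, query_writers, gallery, gallery_writers, 2)
```

In this toy setup the third query lies closer to writer "b" than to its true writer "c", so it counts as a miss at top-1 and a hit at top-2. This is the sense in which top-5 and top-10 results are higher than top-1 results.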