首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 781 毫秒
1.
2.
3.
Handwritten digit recognition has long been a challenging problem in the field of optical character recognition and of great importance in industry. This paper develops a new approach for handwritten digit recognition that uses a small number of patterns for training phase. To improve performance of isolated Farsi/Arabic handwritten digit recognition, we use Bag of Visual Words (BoVW) technique to construct images feature vectors. Each visual word is described by Scale Invariant Feature Transform (SIFT) method. For learning feature vectors, Quantum Neural Networks (QNN) classifier is used. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (HODA dataset) show that proposed method can achieve the highest recognition rate compared to other state of the arts methods.  相似文献   

4.
5.
The problem of handwritten digit recognition has long been an open problem in the field of pattern classification and of great importance in industry. The heart of the problem lies within the ability to design an efficient algorithm that can recognize digits written and submitted by users via a tablet, scanner, and other digital devices. From an engineering point of view, it is desirable to achieve a good performance within limited resources. To this end, we have developed a new approach for handwritten digit recognition that uses a small number of patterns for training phase. To improve the overall performance achieved in classification task, the literature suggests combining the decision of multiple classifiers rather than using the output of the best classifier in the ensemble; so, in this new approach, an ensemble of classifiers is used for the recognition of handwritten digit. The classifiers used in proposed system are based on singular value decomposition (SVD) algorithm. The experimental results and the literature show that the SVD algorithm is suitable for solving sparse matrices such as handwritten digit. The decisions obtained by SVD classifiers are combined by a novel proposed combination rule which we named reliable multi-phase particle swarm optimization. We call the method “Reliable” because we have introduced a novel reliability parameter which is applied to tackle the problem of PSO being trapped in local minima. In comparison with previous methods, one of the significant advantages of the proposed method is that it is not sensitive to the size of training set. Unlike other methods, the proposed method uses just 15 % of the dataset as a training set, while other methods usually use (60–75) % of the whole dataset as the training set. To evaluate the proposed method, we tested our algorithm on Farsi/Arabic handwritten digit dataset. What makes the recognition of the handwritten Farsi/Arabic digits more challenging is that some of the digits can be legally written in different shapes. Therefore, 6000 hard samples (600 samples per class) are chosen by K-nearest neighbor algorithm from the HODA dataset which is a standard Farsi/Arabic digit dataset. Experimental results have shown that the proposed method is fast, accurate, and robust against the local minima of PSO. Finally, the proposed method is compared with state of the art methods and some ensemble classifier based on MLP, RBF, and ANFIS with various combination rules.  相似文献   

6.
7.
8.
In the present article, new techniques have been introduced for revealing the individual features of a person??s handwriting pattern from the scanned images of handwritten text lines to facilitate text-independent writer identification. These techniques are aimed at designing a dynamic model which can be formalized according to any handwritten text line. Various combinations of the extracted features are applied to three well known classifiers for evaluating the contribution of features to define the correct identification rate. The K-NN, GMM, and Normal Density Discriminant Function Bayes classifiers are used in the present identification model. The experimental studies are conducted using two datasets obtained from the IAM database. The first dataset has already been proposed and used in the literature, whereas the second dataset is an expanded version of the first dataset and has been constituted for the first time in this study to analyze the performance of the extracted features under conditions such as an increased number of writers to discriminate in the database and a decreased number of text lines per writer. The remarkable identification rates obtained from the three classifiers on both datasets clearly indicate that the proposed feature extraction techniques can be effectively used in writer identification systems.  相似文献   

9.
Word spotting is the answer to the question whether the document contains the user’s query word. One of the main challenges of keyword spotting at the testing stage is that some testing non-classes are not included in training classes. Hence, this paper presents a robust handwritten word-spotting method for handwritten documents using genetic programming (GP). Using this technique, a tree is created as a classifier which separates the target class (keyword) from the other classes (non-keyword). The new components of the proposed classifier include proper chromosome and new classification fitness function. The proposed chromosome was based on the relationship between features and each chromosome (tree) mapped the features to a real number. Then, a margin was obtained from the real number. To evaluate the generality of the proposed method, several experiments have been designed and implemented on three standard datasets (namely IFN/ENIT Arabic for Arabic, IFN/Farsi for Persian, and George Washington for English). The results of experiments carried out on these three datasets show that the proposed method has much higher precision and recall than previous methods  相似文献   

10.
The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be either template matching algorithms or learning based. This paper presents a coherent learning based Arabic handwritten word spotting system which can adapt to the nature of Arabic handwriting, which can have no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then re-constructs and spots words using language models. The proposed system produced promising result for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.  相似文献   

11.
12.
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal text paragraph covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines are developed. In addition we are presenting our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier.  相似文献   

13.
14.
Online reviews regarding purchasing services or products offered are the main source of users’ opinions. To gain fame or profit, generally, spam reviews are written to demote or promote certain targeted products or services. This practice is called review spamming. During the last few years, various techniques have been recommended to solve the problem of spam reviews. Previous spam detection study focuses on English reviews, with a lesser interest in other languages. Spam review detection in Arabic online sources is an innovative topic despite the vast amount of data produced. Thus, this study develops an Automated Spam Review Detection using optimal Stacked Gated Recurrent Unit (SRD-OSGRU) on Arabic Opinion Text. The presented SRD-OSGRU model mainly intends to classify Arabic reviews into two classes: spam and truthful. Initially, the presented SRD-OSGRU model follows different levels of data preprocessing to convert the actual review data into a compatible format. Next, unigram and bigram feature extractors are utilized. The SGRU model is employed in this study to identify and classify Arabic spam reviews. Since the trial-and-error adjustment of hyperparameters is a tedious process, a white shark optimizer (WSO) is utilized, boosting the detection efficiency of the SGRU model. The experimental validation of the SRD-OSGRU model is assessed under two datasets, namely DOSC dataset. An extensive comparison study pointed out the enhanced performance of the SRD-OSGRU model over other recent approaches.  相似文献   

15.
16.
17.
18.
Learning sparse feature representations is a useful instrument for solving an unsupervised learning problem. In this paper, we present three labeled handwritten digit datasets, collectively called n-MNIST by adding noise to the MNIST dataset, and three labeled datasets formed by adding noise to the offline Bangla numeral database. Then we propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets. On the MNIST, n-MNIST and noisy Bangla datasets, our framework shows promising results and outperforms traditional Deep Belief Networks.  相似文献   

19.
20.
Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists mainly of three phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradient (HOG) are used for feature extraction. The final phase employs Support Vector Machine (SVM) for classifying characters. We have applied the proposed method for the recognition of Jordanian city, town, and village names as a case study, in addition to many other words that offers the characters shapes that are not covered with Jordan cites. The set has carefully been selected to include every Arabic character in its all four forms. To this end, we have built our own dataset consisting of more than 43.000 handwritten Arabic words (30000 used in the training stage and 13000 used in the testing stage). Experimental results showed a great success of our recognition method compared to the state of the art techniques, where we could achieve very high recognition rates exceeding 99%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号