Similar literature
20 similar documents found.
1.
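This paper proposes a handwritten Uyghur character recognition algorithm that combines two kinds of features. The algorithm extracts real-valued Gabor energy features and directional line element grid features from handwritten Uyghur character images, combines the 128-dimensional energy features of the real-valued Gabor filter with the 128-dimensional grid features of the directional line elements, and uses a KNN classifier to classify on the two features jointly. Samples from a handwritten Uyghur character database are used for both handwritten Uyghur character recognition and Uyghur handwriting (writer) feature recognition. Experimental results show that, compared with recognition algorithms that use a single feature, the proposed algorithm further improves the recognition rate of handwritten Uyghur characters. The algorithm can also be applied to handwritten Arabic character recognition.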
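As a rough illustration of the feature fusion described in the abstract above, the sketch below concatenates two feature vectors and classifies the joint vector with a plain k-nearest-neighbour vote. The function names `fuse` and `knn_predict`, the toy 2+2-dimensional vectors (stand-ins for the 128+128-dimensional features), and the choice of Euclidean distance are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def fuse(gabor_feat, dir_feat):
    """Concatenate two feature vectors into one joint vector."""
    return list(gabor_feat) + list(dir_feat)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (vector, label) pairs; distance is Euclidean (an assumption).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: each sample is a fused (Gabor-like, direction-like) vector.
train = [
    (fuse([0.0, 0.1], [0.0, 0.0]), "A"),
    (fuse([0.1, 0.0], [0.1, 0.1]), "A"),
    (fuse([1.0, 0.9], [1.0, 1.0]), "B"),
    (fuse([0.9, 1.0], [0.9, 0.8]), "B"),
]
query = fuse([0.95, 0.9], [1.0, 0.9])
prediction = knn_predict(train, query, k=3)
```

In practice the two feature blocks may need to be normalized to comparable ranges before concatenation so that neither dominates the distance.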

2.
3.
4.
5.
This paper describes a handwritten Chinese text editing and recognition system that can edit handwritten text and recognize it in a client-server mode. First, the client end samples and redisplays the handwritten text using digital ink techniques, segments the handwritten characters, edits them, and saves the original handwriting information into a self-defined document. The self-defined document stores the coordinates of all sampled points of the handwritten characters. Second, the server recognizes the handwritten document using the proposed Gabor feature extraction and affinity propagation clustering (GFAP) method and returns the recognition results to the client end. Moreover, the server can also collect the labeled handwritten characters and fine-tune the recognizer automatically. Experimental results on the HIT-OR3C database show that our handwriting recognition method improves the recognition performance remarkably.

6.
7.
Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists mainly of three phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradients (HOG) are used for feature extraction. The final phase employs a Support Vector Machine (SVM) to classify characters. As a case study, we have applied the proposed method to the recognition of Jordanian city, town, and village names, in addition to many other words that offer character shapes not covered by the Jordanian city names. The set has been carefully selected to include every Arabic character in all four of its forms. To this end, we have built our own dataset consisting of more than 43,000 handwritten Arabic words (30,000 used in the training stage and 13,000 used in the testing stage). Experimental results showed that our recognition method compares very favorably with state-of-the-art techniques, achieving recognition rates exceeding 99%.
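To make the feature extraction phase in the abstract above more concrete, here is a heavily simplified sketch of the core HOG idea: a magnitude-weighted histogram of gradient orientations over a single cell. It is not the authors' implementation. The function name `orientation_histogram`, the 9 unsigned orientation bins, the central-difference gradients, and the single L2 normalization are assumptions; a full HOG descriptor also uses multiple cells, vote interpolation, and block normalization, and the SVM stage is not shown.

```python
import math


def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    img is a nested list of grey levels treated as one cell; border pixels are skipped.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalization
    return [v / norm for v in hist]


# A 4x4 cell containing a vertical edge: all gradients point horizontally.
cell = [[0, 0, 1, 1] for _ in range(4)]
hist = orientation_histogram(cell)
```

For the vertical edge in this toy cell, all of the gradient energy falls into the first orientation bin.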

8.
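Handwritten Chinese character recognition is the foundation of handwritten Chinese character input. Current handwriting input methods on smart devices cannot dynamically adjust the recognition model to a user's writing habits in order to improve recognition accuracy. Based on a study of recent deep learning algorithms and training models, a design method is proposed for a personalized handwritten Chinese character input system built on real-time collection of the user's handwritten character samples. The method treats the collected user handwriting as incremental samples and retrains the recognition model produced by server-side training, so that the model adapts better to the user's writing habits and the recognition rate of the input system improves. Finally, on the basis of this method and with a newly designed deep residual network, comparative experiments on handwritten Chinese character recognition were carried out. The results show that retraining with samples collected in real time improves the recognition rate of the model considerably and better meets users' needs for handwritten Chinese character input on smart devices.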

9.
This paper considers the development of a real-time Arabic handwritten character recognition system. The shape of an Arabic character depends on its position in a given word. The system assumes that characters result from a reliable segmentation stage, so the position of each character is known a priori. Accordingly, four different sets of character shapes have been considered independently. Each set is further divided into four subsets depending on the number of strokes in the character. The system has been extensively tested, and the average recognition rate was found to be 99.6%; most of the misrecognized characters were written with little care. The system can therefore be used reliably for the recognition of on-line handwritten characters entered via a graphic tablet.

10.

Automated techniques for Arabic script recognition are at an early stage compared with their counterparts for Latin and Chinese script recognition. A large volume of handwritten Arabic archives is available in libraries, data centers, historical centers, and workplaces. Digitizing these documents helps to (1) preserve and transfer the country’s history electronically, (2) save physical storage space, (3) handle the documents properly, and (4) enhance the retrieval of information through the Internet and other media. Arabic handwritten character recognition (AHCR) systems face several challenges, including the unlimited variations in human handwriting and the lack of large public databases. In the current study, the segmentation and recognition phases are addressed. The text segmentation challenges and a set of solutions for each challenge are presented. A convolutional neural network (CNN), a deep learning approach, is used in the recognition phase. The use of CNNs leads to significant improvements across different machine learning classification algorithms and facilitates automatic feature extraction from images. Fourteen different native CNN architectures are proposed after a set of trial-and-error experiments. They are trained and tested on the HMBD database, which contains 54,115 handwritten Arabic characters. Experiments are performed on the native CNN architectures, and the best reported testing accuracy is 91.96%. A transfer learning (TF) and genetic algorithm (GA) approach named “HMB-AHCR-DLGA” is suggested to optimize the training parameters and hyperparameters in the recognition phase. The pre-trained CNN models (VGG16, VGG19, and MobileNetV2) are used in the latter approach. Five optimization experiments are performed and the best combinations are reported. The highest reported testing accuracy is 92.88%.
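The abstract above mentions a genetic algorithm for tuning training parameters and hyperparameters but gives no details, so the following is only a generic sketch of how such a search can be organized. Everything here is hypothetical: the `SEARCH_SPACE` values, the `toy_fitness` function (a stand-in for training a CNN and measuring validation accuracy), and the selection, crossover, and mutation settings.

```python
import random

# Hypothetical search space; the paper's actual parameters are not given in the abstract.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "dropout": [0.0, 0.25, 0.5],
}


def toy_fitness(individual):
    """Stand-in for validation accuracy; a real run would train and evaluate a CNN here."""
    target = {"learning_rate": 1e-3, "batch_size": 32, "dropout": 0.25}
    return sum(individual[key] == target[key] for key in target)


def evolve(fitness, generations=20, pop_size=8, mutation_rate=0.2, seed=0):
    """Elitist genetic search: keep the best half, refill with mutated crossovers."""
    rng = random.Random(seed)
    keys = list(SEARCH_SPACE)
    pop = [{k: rng.choice(SEARCH_SPACE[k]) for k in keys} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = {k: rng.choice([a[k], b[k]]) for k in keys}
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(SEARCH_SPACE[k])  # random-reset mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve(toy_fitness)
```

Because the best half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.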


11.
12.
A Chinese handwriting database named HIT-MW is presented to facilitate offline Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled with a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through an intermediary instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters produced under unconstrained conditions without preprinted character boxes. The statistics show that the database represents real handwriting very well. Many new applications concerning real handwriting recognition can be supported by the database.

13.
14.
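To address the shortcoming that a single-scale Gabor filter bank is sensitive only to handwritten Chinese characters of one particular stroke thickness, a novel multi-scale local Gabor filter bank is proposed. To evaluate its recognition performance, a handwritten Chinese character recognition system based on Gabor features is presented. Experiments show that the multi-scale global Gabor filter bank clearly improves recognition performance, while the local Gabor filter bank largely preserves recognition performance with a markedly lower feature dimensionality, reducing computation and memory requirements. The novelty of the method lies in the selection of local Gabor filters; on the 863 HCL2000 handwritten Chinese character database, the highest average recognition rate reached 92.32%, demonstrating the effectiveness of the method for handwritten Chinese character recognition.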
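As background for the abstract above, the sketch below builds a small multi-scale Gabor filter bank using the standard real-valued Gabor kernel, g(x, y) = exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) * cos(2 pi x' / lambda + psi), where x' and y' are the coordinates rotated by theta. The kernel size, the wavelengths, the rule sigma = 0.5 * wavelength, and the function names are assumptions for illustration. How the paper selects its "local" subset of filters is not stated in the abstract and is not modeled here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real-valued Gabor kernel as a size x size nested list (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelengths, n_orientations):
    """Build kernels for every (scale, orientation) pair, keyed by (wavelength, index)."""
    bank = {}
    for wl in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            # sigma tied to wavelength is an illustrative assumption.
            bank[(wl, k)] = gabor_kernel(size, wl, theta, sigma=0.5 * wl)
    return bank


bank = gabor_bank(size=7, wavelengths=[4.0, 8.0], n_orientations=4)
```

Using more than one wavelength is what makes the bank multi-scale: shorter wavelengths respond to thin strokes and longer ones to thick strokes.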

15.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle and isolated). The proposed method can be integrated into any handwritten Arabic word recognition system based on an explicit segmentation process. First, three preprocessing operations are applied to character images: thinning, contour tracing and connected components detection. These operations extract structural characteristics used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification is done using the Fuzzy ARTMAP neural network, which is very fast in training and supports incremental learning. Five Fuzzy ARTMAP neural networks are employed; each one is designed to recognize one subset of characters. The recognition process is achieved in two steps: in the first, a clustering method assigns characters to one of the five character subsets. In the second, the pseudo-Zernike features are used by the appropriate Fuzzy ARTMAP classifier to identify the character. Training and tests were performed on a set of character images manually extracted from the IFN/ENIT database. A high recognition rate was reported.

16.
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true due to the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph and line levels. The verified ground truth database contains meta-data describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines are developed. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier.
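The 70%/15%/15% random division of forms mentioned in the abstract above can be sketched as follows. The function name `split_forms`, the integer form identifiers, and the fixed seed are assumptions; the abstract does not describe how the authors performed the split.

```python
import random


def split_forms(form_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle form identifiers and split them into training, testing, and verification sets."""
    ids = list(form_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * fractions[0])
    n_test = round(len(ids) * fractions[1])
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    verification = ids[n_train + n_test:]  # remainder, so no form is dropped
    return train, test, verification


# 1000 forms, matching the size reported in the abstract; the ids are hypothetical.
train, test, verification = split_forms(range(1000))
```

Splitting by form rather than by line or paragraph keeps all text from one writer's form in a single set.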

17.
Segmentation is the most challenging part of Arabic handwriting recognition due to the unique characteristics of Arabic writing that allow the same shape to denote different characters. An Arabic handwriting recognition system cannot be successful without an appropriate segmentation method. In this paper, a very effective and efficient off-line Arabic handwriting recognition approach is proposed. The proposed approach has three stages. Firstly, all characters are simplified to single-pixel-thin images that preserve the fundamental writing characteristics. Secondly, the image pixels are normalized into horizontal and vertical lines only, so that different writing styles are unified and the shapes of characters are standardized. Finally, these orthogonal lines are coded as unique vectors; each vector represents one letter of a word. To evaluate the proposed techniques, we have tested our approach on two different datasets. Our experimental results show that the proposed approach outperforms the state-of-the-art approaches.

18.
Even though much research has been conducted to solve the problem of unconstrained handwriting recognition, an effective solution remains a serious challenge. In this article, we address two issues related to Arabic handwriting recognition. Firstly, we present IESK-arDB, a new multi-purpose off-line Arabic handwritten database. It is publicly available and contains more than 4,000 word images, each accompanied by a binary version, a thinned version, and ground truth information stored in a separate XML file. Additionally, it contains around 6,000 character images segmented from the database. A letter frequency analysis showed that the database exhibits letter frequencies similar to those of large corpora of digital text, which demonstrates the usefulness of the database. Secondly, we propose a multi-phase segmentation approach that starts by detecting and resolving sub-word overlaps, then hypothesizes a large number of segmentation points that are later reduced by a set of heuristic rules. The proposed approach has been successfully tested on IESK-arDB. The results were very promising, indicating the efficiency of the suggested approach.

19.
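Extracting effective features has always been the key problem in handwriting (writer) identification. To address the shortcomings of traditional Gabor filter feature extraction, and making full use of the correlations among Gabor filter coefficients, a feature extraction method that fuses global and local features is proposed. The method first uses the histogram of oriented gradients (HOG) of character strokes to optimize the orientation parameters of the Gabor filters, then uses a Gaussian Markov random field (GMRF) model to describe the different local structural information in the Gabor-filtered images, and finally obtains the overall features of the handwriting image. With authentic samples from the four great masters of regular script (kaishu) and a collection of English manuscripts as experimental data, a minimum weighted Euclidean distance classifier is used to classify the handwriting samples. Five-fold cross-validation yields correct classification rates of 97.6% and 88.3%, respectively, showing that the extracted features represent handwriting well and that the method is an effective handwriting feature extraction method.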
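A minimum weighted Euclidean distance classifier, as named in the abstract above, can be sketched as follows. The weighting used here (each squared difference divided by the per-class variance of that feature) is one common choice and is an assumption; the abstract does not say which weights the authors use. The names `fit`, `weighted_distance`, and `classify`, and the toy two-dimensional features, are likewise hypothetical.

```python
import math
from statistics import mean, pvariance


def fit(samples_by_class, eps=1e-6):
    """Store the per-feature mean and variance of each class."""
    model = {}
    for label, samples in samples_by_class.items():
        columns = list(zip(*samples))
        mu = [mean(col) for col in columns]
        var = [pvariance(col) + eps for col in columns]  # eps avoids division by zero
        model[label] = (mu, var)
    return model


def weighted_distance(x, mu, var):
    """Euclidean distance with each squared difference weighted by 1 / variance."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var)))


def classify(model, x):
    """Assign x to the class whose mean is nearest under the weighted distance."""
    return min(model, key=lambda label: weighted_distance(x, *model[label]))


# Toy two-dimensional feature vectors for two hypothetical writers.
samples = {
    "writer_1": [[0.0, 1.0], [0.2, 1.2], [0.1, 0.8]],
    "writer_2": [[2.0, 3.0], [2.2, 3.1], [1.9, 2.8]],
}
model = fit(samples)
label = classify(model, [0.15, 1.1])
```

Weighting by inverse variance lets stable features count for more than features that vary widely within one writer's samples.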

20.
The task of handwritten Chinese character recognition is one of the most challenging areas of human handwriting classification. The main reason for this is related to the writing system itself, which encompasses thousands of characters, coupled with high levels of diversity in personal writing styles and attributes. Much of the existing work for both online and off-line handwritten Chinese character recognition has focused on methods that employ feature extraction and segmentation steps. The preprocessed data from these steps form the basis for the subsequent classification and recognition phases. This paper proposes an approach for handwritten Chinese character recognition and classification that uses only an image alignment technique and does not require the aforementioned steps. Rather than extracting features from the image, which often means building models from very large training data, the proposed method instead uses the mean image transformations as a basis for model building. The use of an image-only model means that no subjective tuning of the feature extraction is required. In addition, by employing a fuzzy-entropy-based metric, the work also gains an improved ability to model different types of uncertainty. The classifier is a simple distance-based nearest neighbour classification system based on template matching. The approach is applied to a publicly available real-world database of handwritten Chinese characters and demonstrates that it can achieve high classification accuracy and is robust in the presence of noise.
