Similar literature
1.
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research, especially given the lack of such a database. In this paper, we report our comprehensive Arabic offline handwritten text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains metadata describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and segment paragraphs into lines were developed. In addition, we present our experimental results on the database using two classifiers: Hidden Markov Models (HMM) and our novel syntactic classifier.
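
The 70%/15%/15% random division of forms described in this abstract can be illustrated with a minimal sketch. The function name `split_forms`, the integer form IDs, and the fixed seed are assumptions made for illustration; this is not the KHATT authors' actual procedure.

```python
import random


def split_forms(form_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle form IDs and split them into three disjoint subsets.

    Hypothetical helper: the names and the seed are illustrative only.
    """
    ids = list(form_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    verification = ids[n_train + n_test:]  # whatever remains
    return train, test, verification


# 1000 forms, identified here simply by the integers 0..999 (an assumption).
train, test, verification = split_forms(range(1000))
print(len(train), len(test), len(verification))
```

Splitting by form, rather than by line or paragraph, keeps all text from one form in the same subset.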

2.
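The goal of handwriting identification is to distinguish between different writers, and stroke extraction is its foundation. This paper proposes a stroke extraction algorithm for handwritten Chinese characters for use in handwriting identification. The algorithm defines a correspondence between concave and convex points and four basic types of stroke intersection, and detects concave and convex points on the character image contour to determine the stroke intersection regions and their types. Next, each intersection region is segmented by shape according to its intersection type. Finally, thinning is performed by tracking corresponding points on the two sides of the stroke contour. We compare the algorithm with thinning-based and segment-based stroke extraction algorithms. The experimental results show that the algorithm achieves relatively high accuracy and effectiveness, so the proposed shape-segmentation-based stroke extraction method for handwritten Chinese characters is practical and of applied value.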

3.
The style of a person's handwriting is a biometric feature that is used in person authentication. In this paper, we propose a text-independent method for Persian writer identification. In the proposed method, pattern-based features are extracted from the data using Gabor and XGabor filters. The extracted features are represented for each person by a graph called an FRG (feature relation graph). This graph is constructed from the relations between extracted features by employing a fuzzy method. The fuzzy method determines the similarity between features extracted from different handwritten instances of each person. In the identification phase, a graph similarity approach is employed to determine the similarity between the FRG generated from the test data and the FRGs generated from the training data. The experimental results were satisfactory: the proposed method achieved about 100% accuracy on a dataset with 100 writers when enough training data was used. Although this method has been applied to Persian handwriting, we believe it can be extended to other languages, especially in its data representation and classification parts.

4.
In this paper we address the task of writer identification of on-line handwriting captured from a whiteboard. Different sets of features are extracted from the recorded data and used to train a text- and language-independent on-line writer identification system. The system is based on Gaussian mixture models (GMMs), which provide a powerful yet simple means of representing the distribution of the features extracted from the handwritten text. The training data of all writers are used to train a universal background model (UBM), from which a client-specific model is obtained by adaptation. Different sets of features are described and evaluated in this work. The system is tested using text from 200 different writers. A writer identification rate of 98.56% at the paragraph level and of 88.96% at the text line level is achieved.

5.
Many techniques have been reported for handwriting-based writer identification. None of these techniques assume that the written text is in Arabic. In this paper we present a new technique for feature extraction based on hybrid spectral–statistical measures (SSMs) of texture. We show its effectiveness compared with multiple-channel (Gabor) filters and the grey-level co-occurrence matrix (GLCM), which are well-known techniques yielding high performance in writer identification for Roman handwriting. Texture features were extracted for a wide range of frequencies and orientations because of the nature of the spread of Arabic handwriting compared with Roman handwriting, and the most discriminant features were selected with a feature selection model using hybrid support vector machine–genetic algorithm techniques. Four classification techniques were used: linear discriminant classifier (LDC), support vector machine (SVM), weighted Euclidean distance (WED), and the K nearest neighbours (K_NN) classifier. Experiments were performed using Arabic handwriting samples from 20 different people, and very promising results of 90.0% correct identification were achieved.

6.
This paper presents a new high-accuracy technique for recognizing both typewritten and handwritten English and Arabic texts without thinning. After segmenting the text into lines (horizontal segmentation) and the lines into words, it separates each word into its letters. Separating a text line (row) into words and a word into letters is performed using the region growing technique (implicit segmentation) on the basis of three essential lines in a text row. This saves time, as there is no need to skeletonize or to physically isolate letters from the tested word, while the input data involves only the basic information, namely the scanned text. The baseline is detected, the word contour is defined, and the word is implicitly segmented into its letters according to a novel algorithm described in the paper. The extracted letter with its dots is used as one unit in the recognition system. It is resized into a 9 × 9 matrix by bilinear interpolation after a lowpass filter is applied to reduce aliasing. The elements are then scaled to the interval [0,1]. The resulting array is the input to the designed neural network. For typewritten texts, three Arabic fonts are used: Arial, Arabic Transparent, and Simplified Arabic. The results showed an average recognition success rate of 93% for Arabic typewriting. This segmentation approach has also been applied to handwritten text, where words are classified with a relatively high recognition rate for both Arabic and English. The experiments were performed in MATLAB and have shown promising results that can be a good base for further analysis of recognition of Arabic and other cursive-language text as well as English handwritten text. For English handwritten classification, an average success rate of about 80% was achieved, while for Arabic handwritten text the algorithm was successful in about 90% of cases. The recent results have shown increasing success for both Arabic and English texts.
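
As an illustration of the preprocessing step in this abstract (resizing a letter image to a 9 × 9 matrix by bilinear interpolation, then scaling to [0,1]), here is a minimal pure-Python sketch. The function names, the corner-aligned sampling grid, and the min-max scaling rule are assumptions; the abstract does not specify them, its experiments were done in MATLAB, and the lowpass filtering step is omitted here.

```python
def resize_bilinear(img, out_h=9, out_w=9):
    """Resize a 2-D list of grey levels with bilinear interpolation.

    Assumes a corner-aligned grid: output corners map to input corners.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Source row coordinate for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def scale_unit(mat):
    """Min-max scale all elements to [0, 1] (assumed scaling rule)."""
    lo = min(min(r) for r in mat)
    hi = max(max(r) for r in mat)
    if hi == lo:
        return [[0.0 for _ in r] for r in mat]
    return [[(v - lo) / (hi - lo) for v in r] for r in mat]


# Toy 17 x 17 "letter" image: a simple intensity ramp, not real data.
letter = [[float(r * 17 + c) for c in range(17)] for r in range(17)]
features = scale_unit(resize_bilinear(letter))
print(len(features), len(features[0]))
```

The 81 values in `features` would then be flattened into the input vector of the neural network.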

7.
The aim of writer identification is to determine the writer of a piece of handwriting from a set of writers. In this paper, we present an architecture for writer identification in old handwritten music scores. Even though a significant number of music compositions contain handwritten text, the aim of our work is to use only music notation to determine the author. The main contribution is therefore the use of features extracted from graphical alphabets. Our proposal consists of combining the identification results of two different approaches, based on line and textural features. The steps of the ensemble architecture are as follows. First, the music sheet is preprocessed to remove the staff lines. Then, music lines and texture images are generated for computing line features and textural features. Finally, the classification results are combined to identify the writer. The proposed method has been tested on a database of old music scores from the seventeenth to nineteenth centuries, achieving a recognition rate of about 92% with 20 writers.

8.
In the present article, new techniques are introduced for revealing the individual features of a person's handwriting pattern from scanned images of handwritten text lines to facilitate text-independent writer identification. These techniques are aimed at designing a dynamic model that can be formalized according to any handwritten text line. Various combinations of the extracted features are applied to three well-known classifiers to evaluate the contribution of the features to the correct identification rate. The K-NN, GMM, and Normal Density Discriminant Function Bayes classifiers are used in the present identification model. The experimental studies are conducted using two datasets obtained from the IAM database. The first dataset has already been proposed and used in the literature, whereas the second dataset is an expanded version of the first and has been constituted for the first time in this study to analyze the performance of the extracted features under conditions such as an increased number of writers to discriminate in the database and a decreased number of text lines per writer. The remarkable identification rates obtained from the three classifiers on both datasets clearly indicate that the proposed feature extraction techniques can be effectively used in writer identification systems.

9.
This paper proposes an automatic text-independent writer identification framework that integrates an industrial handwriting recognition system, which is used to perform an automatic segmentation of an online handwritten document at the character level. Subsequently, a fuzzy c-means approach is adopted to estimate statistical distributions of character prototypes on an alphabet basis. These distributions model the unique handwriting styles of the writers. The proposed system attained an accuracy of 99.2% when retrieving from a database of 120 writers. The only limitation is that a minimum length of text needs to be present in the document for sufficient accuracy to be achieved. We have found that this minimum length of text is about 160 characters, approximately equivalent to 3 lines of text. In addition, the discriminative power of different alphabets on the accuracy is also reported.

10.
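Research on handwriting identification based on multi-channel decomposition and matching   Total citations: 17, self-citations: 0, citations by others: 17
Handwriting identification is a technique for determining a writer's identity by analyzing the writing style of handwritten characters. Its key step is extracting handwriting features that reflect writing style. These features include stroke position, direction, and the relationships between strokes, and they can be extracted and expressed through multi-channel image decomposition. This paper proposes a multi-channel decomposition method for binary images for handwriting identification: exploiting the directionality of character strokes, it first performs a directional decomposition and then a frequency-band decomposition of each directional sub-image. The sampled signal values after decomposition are used as handwriting features, and feature matching is used for writer recognition, with very good experimental results.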

11.
Writer identification, which identifies individuals based on their handwriting, is a frequent topic in biometric authentication and verification systems. Due to its importance, numerous studies have been conducted in various languages. Researchers have established several learning methods for writer identification, including supervised and unsupervised learning. However, supervised methods require a large amount of annotated data, which is impossible to obtain in most scenarios. On the other hand, unsupervised writer identification methods may be limited by their dependence on feature extraction that cannot provide proper objectives to the architecture and may be misinterpreted. This paper introduces an unsupervised writer identification system that analyzes the data and recognizes the writer based on the inter-feature relations of the data, in order to resolve the uncertainty of the features. A pairwise architecture-based Autoembedder was applied to generate clusterable embeddings for handwritten text images. The trained baseline architecture then generates the embedding of each data image, and the K-means algorithm is used to distinguish the embeddings of individual writers. The proposed model used the IAM dataset for the experiments, as it is inconsistent in the contributions from its authors but easily accessible for writer identification tasks. Traditional evaluation metrics are used to assess the proposed model. Finally, the proposed model is compared with a few unsupervised models, and it outperformed state-of-the-art deep convolutional architectures in recognizing writers from unlabeled data.

12.
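To address the problem of classifying the same character written by different writers, a dynamic clustering algorithm is proposed on the basis of the C-means method and the Mahalanobis distance measure, and the selection of global features of signatures is discussed. Using this clustering algorithm for two-class classification of signatures from different writers gave good results. Experiments show that selecting a set of features that represents a writer's style is the key to successful classification. The five global features selected in the paper work well when applied to the verification of non-imitated signatures.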

13.
14.
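Taking the basic strokes of handwritten Chinese characters as the object of study, the start of the stroke, the end of the stroke, and pen pressure are extracted as features for handwriting identification. The study used 10 writers, each of whom wrote 70 Chinese characters as samples; four basic stroke types were extracted for the identification experiments, which achieved a fairly satisfactory identification rate. This study overcomes the dependence on character structure found in earlier handwriting identification work and is applicable to all Chinese characters.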

15.
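A text-independent handwriting identification method based on sliding-window local contour structure features is proposed. The method uses a probability distribution function to describe the distribution of the various local contour shape structures that occur in the handwriting, and uses the chi-square distance as the final similarity measure between handwriting samples. Experimental results show that the method effectively improves identification accuracy on the HIT-MW Chinese handwriting database, which contains 240 writers.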
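
The chi-square distance between two feature distributions, as used for the similarity measure in this abstract, can be sketched as follows. The specific variant (no factor of 1/2), the toy histograms, and the nearest-writer helper are assumptions for illustration; the abstract does not give the exact formula or feature layout.

```python
def normalize(counts):
    """Turn raw occurrence counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def chi_square_distance(p, q):
    """Chi-square distance: sum of (p_i - q_i)^2 / (p_i + q_i).

    Bins where both distributions are zero contribute nothing.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def nearest_writer(query, references):
    """Return the writer whose reference distribution is closest to the query."""
    return min(references, key=lambda w: chi_square_distance(query, references[w]))


# Toy histograms of local contour shapes (made-up numbers, not real data).
references = {
    "writer_a": normalize([8, 1, 1]),
    "writer_b": normalize([1, 1, 8]),
}
query = normalize([7, 2, 1])
print(nearest_writer(query, references))
```

A smaller distance means more similar handwriting, so identification reduces to a nearest-neighbour search over the enrolled writers.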

16.
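Because of the particular structure of handwritten Kazakh characters, recognition accuracy is low and results are poor when only a few individual character features are extracted. An improved PCA method is therefore used to locate the baseline of a word, and 36 features are extracted for each character, including stroke density, projection, and contour features. The K-W test is used to compare the classification ability of the features, and a linear discriminant function is used for classification, achieving high recognition accuracy. Experimental results show that the system achieves an offline character recognition rate above 94%.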

17.
18.
We propose an effective method for automatic writer recognition from unconstrained handwritten text images. Our method relies on two different aspects of writing: the presence of redundant patterns in the writing and its visual attributes. Analyzing small writing fragments, we seek to extract the patterns that an individual employs frequently when writing. We also exploit two important visual attributes of writing, orientation and curvature, by computing a set of features from writing samples at different levels of observation. Finally, we combine the two facets of handwriting to characterize the writer of a handwritten sample. The proposed methodology, evaluated on two different data sets, exhibits promising results on writer identification and verification.

19.
20.
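Existing offline handwriting identification methods for handwritten Chinese characters either work only for specific characters or require a large number of sample characters. To address this, a handwriting identification method based on stroke curvature features is proposed. First, mathematical morphology is used to preprocess the captured handwriting image, and representative stroke skeletons are extracted in four directions: horizontal, vertical, left-falling, and right-falling. The stroke skeletons are then reconstructed as circles, and the curvatures of the stroke circles in the four directions are extracted as feature values to form a handwriting feature moment. The sample is judged by a vector-angle similarity measure between the feature moment of the handwriting under examination and the feature moments in the database. Experimental results show that the method places no requirements on the content of the sample characters, needs few sample characters, is widely applicable, and is robust.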
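
The vector-angle similarity measure mentioned in this abstract can be illustrated with a short sketch. Treating each handwriting sample as a flat vector of four directional curvature values is an assumption, as are the function names and the toy numbers; the abstract does not describe the exact feature layout or decision rule.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def most_similar(query, database):
    """Return the database key whose feature vector has the smallest angle to the query."""
    return min(database, key=lambda k: angle_between(query, database[k]))


# Hypothetical curvature features for the four stroke directions
# (horizontal, vertical, left-falling, right-falling); made-up values.
database = {
    "writer_a": [0.10, 0.40, 0.25, 0.30],
    "writer_b": [0.45, 0.05, 0.35, 0.10],
}
query = [0.12, 0.38, 0.27, 0.29]
print(most_similar(query, database))
```

Because the angle ignores vector length, two samples with proportionally similar curvature profiles are judged close even if their overall magnitudes differ.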
