1.

A survey on Arabic character segmentation

Yasser M. Alginahi 《International Journal on Document Analysis and Recognition》2013,16(2):105-126

2.

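Xu Weiran, Guo Jun, Pan Xingde 《Chinese Journal of Computers》2003,26(7):802-805

The text to be recognized on a check may be either printed or handwritten. Because printed and handwritten text require different preprocessing methods and recognition algorithms, accurately determining the writing style (handwritten or printed) is one of the key techniques for obtaining high-accuracy recognition results. Based on the minimum-error-rate decision rule of Bayesian decision theory, this paper proposes a style discrimination method based on an evaluator. Using the Bayesian evaluator, the paper also proposes a separability criterion, the evaluator divergence, and gives a method for estimating the evaluator function. In a test on 12,158 real bank checks with no rejection, the method achieved an accuracy of 99.4%.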
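The minimum-error-rate rule mentioned in this abstract can be sketched as follows. This is only an illustration of the general Bayes rule (choose the class with the largest prior times likelihood), not the paper's evaluator method. The single scalar feature, the Gaussian class models, and every number in `CLASS_MODELS` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical class models: label -> (prior, mean, std) of one scalar feature.
# These values are made up for illustration and do not come from the paper.
CLASS_MODELS = {
    "printed": (0.5, 0.2, 0.1),
    "handwritten": (0.5, 0.6, 0.2),
}


def classify(x, models=CLASS_MODELS):
    """Minimum-error-rate rule: pick the class maximizing prior * likelihood."""
    scores = {
        label: prior * gaussian_pdf(x, mean, std)
        for label, (prior, mean, std) in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for value in (0.2, 0.7):
        print(value, classify(value))
```

Maximizing prior times likelihood is equivalent to maximizing the posterior, because the evidence term is the same for every class.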

3.

The optical character recognition of Urdu-like cursive scripts

Saeeda Naz Khizar Hayat Muhammad Imran Razzak Muhammad Waqas Anwar Sajjad A. Madani Samee U. Khan 《Pattern recognition》2014

4.

Recognition of handwritten Lanna Dhamma characters using a set of optimally designed moment features

Papangkorn Inkeaw Phasit Charoenkwan Hui-Ling Huang Sanparith Marukatat Shinn-Ying Ho Jeerayut Chaijaruwanich 《International Journal on Document Analysis and Recognition》2017,20(4):259-274

5.

Single-character font discrimination based on manifold learning (total citations: 1; self-citations: 1; citations by others: 0)

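He Xiuling, Yang Yang, Chen Zengzhao, Yu Ying, Dong Cailin 《Computer Engineering and Applications》2008,44(6):206-209

Script-type recognition and font discrimination have become a new research focus, both in China and abroad, following printed character recognition. Few studies address distinguishing handwritten from printed single characters, yet this task is very common in form processing. For the font discrimination problem, the manifold learning algorithm locally linear embedding (LLE) is introduced, under the assumption that the data lie on a low-dimensional manifold embedded in a high-dimensional space. An LLE generalization method for single-character font discrimination is proposed, together with methods for estimating the neighborhood size and the intrinsic dimension. Experiments on discriminating printed from handwritten Chinese characters and digits show that the method outperforms direct support vector machine (SVM) classification. Moreover, classifying the LLE-reduced data directly with linear discriminant analysis (LDA) achieves accuracy close to, or even higher than, SVM classification after LLE, with faster classification.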

6.

Using topic models for OCR correction

Faisal Farooq Anurag Bhardwaj Venu Govindaraju 《International Journal on Document Analysis and Recognition》2009,12(3):153-164

Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers perform adequately on constrained handwritten documents which typically use a restricted vocabulary (lexicon). But in the case of unconstrained handwritten documents, state-of-the-art word recognition accuracy is still below the acceptable limits. The objective of this research is to improve word recognition accuracy on unconstrained handwritten documents by applying a post-processing or OCR correction technique to the word recognition output. In this paper, we present two different methods for this purpose. First, we describe a lexicon reduction-based method by topic categorization of handwritten documents which is used to generate smaller topic-specific lexicons for improving the recognition accuracy. Second, we describe a method which uses topic-specific language models and a maximum-entropy based topic categorization model to refine the recognition output. We present the relative merits of each of these methods and report results on the publicly available IAM database.

7.

A benchmark image database of isolated Bangla handwritten compound characters

Nibaran Das Kallol Acharya Ram Sarkar Subhadip Basu Mahantapas Kundu Mita Nasipuri 《International Journal on Document Analysis and Recognition》2014,17(4):413-431

8.

Local features enhancement using deep auto-encoder scheme for the recognition of the proposed handwritten Arabic-Maghrebi characters database

Djaghbellou Soumia Attia Abdelouahab Bouziane Abderraouf Akhtar Zahid 《Multimedia Tools and Applications》2022,81(22):31553-31571

9.

Automated evaluation of OCR zoning (total citations: 1; self-citations: 0; citations by others: 1)

Kanai J. Rice S.V. Nartker T.A. Nagy G. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(1):86-90

Many current optical character recognition (OCR) systems attempt to decompose printed pages into a set of zones, each containing a single column of text, before converting the characters into coded form. The authors present a methodology for automatically assessing the accuracy of such decompositions, and demonstrate its use in evaluating six OCR systems.

10.

High accuracy optical character recognition using neural networks with centroid dithering

Avi-Itzhak H.I. Diep T.A. Garland H. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(2):218-224

Optical character recognition (OCR) refers to a process whereby printed documents are transformed into ASCII files for the purpose of compact storage, editing, fast retrieval, and other file manipulations through the use of a computer. The recognition stage of an OCR process is made difficult by added noise, image distortion, and the various character typefaces, sizes, and fonts that a document may have. In this study a neural network approach is introduced to perform high accuracy recognition on multi-size and multi-font characters; a novel centroid-dithering training process with a low noise-sensitivity normalization procedure is used to achieve high accuracy results. The study consists of two parts. The first part focuses on single size and single font characters, and a two-layered neural network is trained to recognize the full set of 94 ASCII character images in 12-pt Courier font. The second part trades accuracy for additional font and size capability, and a larger two-layered neural network is trained to recognize the full set of 94 ASCII character images for all point sizes from 8 to 32 and for 12 commonly used fonts. The performance of these two networks is evaluated based on a database of more than one million character images from the testing data set.

11.

Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG)

Noor A. Jebril Hussein R. Al-Zoubi Qasem Abu Al-Haija 《Pattern Recognition and Image Analysis》2018,28(2):321-345

Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists mainly of three phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradient (HOG) are used for feature extraction. The final phase employs Support Vector Machine (SVM) for classifying characters. As a case study, we applied the proposed method to the recognition of Jordanian city, town, and village names, together with many other words that supply the character shapes not covered by the Jordanian place names. The set was carefully selected to include every Arabic character in all four of its forms. To this end, we built our own dataset of more than 43,000 handwritten Arabic words (30,000 used in the training stage and 13,000 used in the testing stage). Experimental results showed that our recognition method compares favorably with state-of-the-art techniques, achieving recognition rates exceeding 99%.
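The feature-extraction phase described in this abstract can be illustrated with a much-simplified sketch: a single magnitude-weighted histogram of unsigned gradient orientations over one image patch. Full HOG additionally splits the image into cells and normalizes over overlapping blocks, and the SVM stage is not shown. The function name, the 9-bin default, and the toy images are assumptions, not details from the paper.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg).

    `image` is a 2-D list of grey values. Only interior pixels are used, with
    central differences for the gradient. The result is L2-normalized.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


if __name__ == "__main__":
    vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
    horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
    print(orientation_histogram(vertical_edge))
    print(orientation_histogram(horizontal_edge))
```

In a pipeline like the one the abstract describes, vectors of this kind would be the inputs to the classifier. A vertical edge puts all of its energy in the first bin and a horizontal edge in the middle bin.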

12.

Unconstrained handwritten digit recognition using perceptual shape primitives

Kalyan S. Dash Niladri B. Puhan Ganapati Panda 《Pattern Analysis & Applications》2018,21(2):413-436

13.

A method for selecting constrained hand-printed character shapes for machine recognition

Shinghal R Suen CY 《IEEE transactions on pattern analysis and machine intelligence》1982,(1):74-78

Since handwritten characters vary in shape and writing-stroke sequence, it is desirable to develop a standard set of characters that are of high quality, so that not only are they easy to write, but they are also most suitable for machine recognition. A database of more than 100 000 alphanumeric patterns was assembled. It consisted of 174 models of the alphanumeric characters written by both left-handed and right-handed subjects. Based on frequency density and distance measurements, a metric called the dispersion factor was computed to rank the various models. The principle of the metric is discussed, and results are given indicating the high quality models of the alphanumerics.

14.

Development of an efficient neural-based segmentation technique for Arabic handwriting recognition

Husam A. Al Hamad Raed Abu Zitar 《Pattern recognition》2010,43(8):2773-2798

15.

Offline recognition of handwritten Bangla characters: an efficient two-stage approach

U. Bhattacharya M. Shridhar S. K. Parui P. K. Sen B. B. Chaudhuri 《Pattern Analysis & Applications》2012,15(4):445-458

16.

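A study of post-processing methods for handwritten Chinese address recognition (total citations: 1; self-citations: 0; citations by others: 1)

Long Chong, Zhuang Li, Zhu Xiaoyan, Huang Kaizhu, Sun Jun, Hotta Yoshinobu, Naoi Satoshi 《Journal of Chinese Information Processing》2006,20(6):71-76

OCR (optical character recognition), as a convenient and effective character recognition technology, plays an increasingly important role in office automation, information recovery, digital libraries, and other areas. Language models are widely used in OCR post-processing, especially for Chinese text recognition. Addressing post-processing for handwritten Chinese addresses, this paper discusses how the granularity of the language model affects recognition accuracy and analyzes the respective strengths and weaknesses of character-based and word-based language models. It adopts a word-based language model and, on that basis, proposes a weighted word-graph search algorithm. Experiments on a test set of 58,269 handwritten Chinese addresses show that the overall address recognition rate rose from 28.56% to 75.66%, a 65.93% reduction in the error rate, which substantially improves system performance.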
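The two figures in this abstract are consistent if the error-rate reduction is read as a relative reduction: the error rate falls from 100 − 28.56 = 71.44% to 100 − 75.66 = 24.34%, and (71.44 − 24.34) / 71.44 ≈ 65.93%. A small check, in which the function name is an illustrative choice and only the two recognition rates come from the abstract:

```python
def relative_error_reduction(accuracy_before, accuracy_after):
    """Relative drop in error rate, in percent, given two accuracies in percent."""
    error_before = 100.0 - accuracy_before
    error_after = 100.0 - accuracy_after
    return 100.0 * (error_before - error_after) / error_before


if __name__ == "__main__":
    # Recognition rates reported in the abstract: 28.56% before, 75.66% after.
    print(round(relative_error_reduction(28.56, 75.66), 2))
```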

17.

A hierarchical approach to recognition of handwritten Bangla characters

Subhadip Basu Ram Sarkar Mita Nasipuri Dipak Kumar Basu 《Pattern recognition》2009,42(7):1467-1484

18.

Finding words in alphabet soup: Inference on freeform character recognition for historical scripts

Nicholas R. Howe Shaolei Feng R. Manmatha 《Pattern recognition》2009,42(12):3338-3347

This paper develops word recognition methods for historical handwritten cursive and printed documents. It employs a powerful segmentation-free letter detection method based upon joint boosting with histograms of gradients as features. Efficient inference on an ensemble of hidden Markov models can select the most probable sequence of candidate character detections to recognize complete words in ambiguous handwritten text, drawing on character n-gram and physical separation models. Experiments with two corpora of handwritten historic documents show that this approach recognizes known words more accurately than previous efforts, and can also recognize out-of-vocabulary words.

19.

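A comparative study of several feature extraction methods for offline handwritten Nüshu

Wan Chen 《Computer & Digital Engineering》2011,39(1):129-130,162

OCR technology has produced many research results, but almost no one has worked on recognizing Nüshu script. Feature extraction is both a key step and a major difficulty in Nüshu character recognition. By comparing several feature extraction algorithms, this article proposes an improved feature extraction algorithm based on G-DCD and uses it to correctly extract features of handwritten Nüshu characters.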

20.

Robust named entity detection from optical character recognition output

Krishna Subramanian Rohit Prasad Prem Natarajan 《International Journal on Document Analysis and Recognition》2011,14(2):189-200

In this paper, we focus on information extraction from optical character recognition (OCR) output. Since the content from OCR inherently has many errors, we present robust algorithms for information extraction from OCR lattices instead of merely looking them up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate the above challenges, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.
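The contrast between 1-best lookup and lattice search, and the use of a confidence threshold to limit false alarms, can be sketched as follows. This is a deliberate simplification and not the authors' algorithm: the lattice is reduced to a list of slots, each holding alternative words with posterior scores. The data, the function names, and the product-of-posteriors confidence are all hypothetical.

```python
def one_best(slots):
    """Top-scoring word in each slot, i.e. the 1-best OCR output."""
    return [max(alternatives, key=lambda a: a[1])[0] for alternatives in slots]


def find_in_one_best(slots, name):
    """Start positions where `name` (a list of tokens) occurs in the 1-best output."""
    words = one_best(slots)
    n = len(name)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == name]


def find_in_lattice(slots, name, min_conf=0.0):
    """(start, confidence) pairs where `name` matches any alternative per slot.

    Confidence is the product of the matched alternatives' scores. Hits below
    `min_conf` are dropped, a simple stand-in for confidence-based filtering.
    """
    n = len(name)
    hits = []
    for i in range(len(slots) - n + 1):
        conf = 1.0
        for token, alternatives in zip(name, slots[i:i + n]):
            conf *= dict(alternatives).get(token, 0.0)
        if conf > 0.0 and conf >= min_conf:
            hits.append((i, conf))
    return hits


if __name__ == "__main__":
    # Hypothetical recognizer output: each slot lists (word, posterior) pairs.
    slots = [
        [("visit", 0.9), ("vislt", 0.1)],
        [("Jon", 0.6), ("John", 0.4)],
        [("Smith", 0.7), ("Smlth", 0.3)],
    ]
    name = ["John", "Smith"]
    print(find_in_one_best(slots, name))
    print(find_in_lattice(slots, name))
    print(find_in_lattice(slots, name, min_conf=0.5))
```

In this toy input the 1-best output misses the name because "Jon" outscores "John", while the lattice search recovers it with a low confidence. Raising `min_conf` trades that recall back for fewer false alarms.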