Similar literature
1.
The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be either template matching algorithms or learning based. This paper presents a coherent learning-based Arabic handwritten word spotting system that adapts to the nature of Arabic handwriting, in which there may be no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then reconstructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.

2.
3.
A novel SVM-based handwritten Tamil character recognition system   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper describes a system for recognizing offline handwritten Tamil characters using a support vector machine (SVM). Data samples are collected from different writers on A4-sized documents. They are scanned using a flatbed scanner at a resolution of 300 dpi and stored as gray-scale images. Various preprocessing operations are performed on the digitized image to enhance its quality. Pixel densities are calculated for 64 different zones of the image, and these values are used as the features of a character. These features are used to train the SVM. The SVM is tested for the first time to recognize handwritten Tamil characters. The system achieved a recognition accuracy of 82.04% on the handwritten Tamil character database.
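The zoning features in this abstract can be illustrated with a minimal sketch. The abstract states only that pixel densities are computed for 64 zones; the 8x8 grid layout, the prior binarization to 0/1 values, and the function name `zone_densities` are assumptions made here for illustration, not details confirmed by the paper.

```python
def zone_densities(image, grid=8):
    """Return grid*grid foreground-pixel densities for a binary image.

    `image` is a list of rows of 0/1 values (1 = ink). Height and width
    are assumed to be divisible by `grid` (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // grid, width // grid
    features = []
    for zr in range(grid):
        for zc in range(grid):
            # Count ink pixels inside the zone at grid position (zr, zc).
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# Toy 16x16 image with ink only in the top-left 2x2 zone.
toy = [[1 if r < 2 and c < 2 else 0 for c in range(16)] for r in range(16)]
features = zone_densities(toy)  # 64 values, one per zone
```

The resulting 64-element vector is the kind of input that could be passed to an SVM; the classifier itself is omitted here.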

4.
This paper presents an effective automated analysis system for mixed documents consisting of handwritten text and graphic images. In the preprocessing step, an input image is binarized, then graphic regions are separated from text parts using chain codes of connected components. In the character recognition step, we recognize two different sets of handwritten characters: Korean and alphanumeric characters. Considering the structural complexity and variations of Korean characters, we separate them based on partial recognition results of vowels and extract primitive phonemes using a branch and bound algorithm based on dynamic programming (DP) matching. Finally, to validate recognition results, a dictionary and knowledge are employed. Computer simulation with 50 test documents shows that the proposed algorithm analyzes mixed documents effectively.

5.
6.
Handwriting recognition requires tools and techniques that recognize complex character patterns and represent imprecise, common-sense knowledge about the general appearance of characters, words and phrases. Neural networks and fuzzy logic are complementary tools for solving such problems. Neural networks, which are highly nonlinear and highly interconnected for processing imprecise information, can finely approximate complicated decision boundaries. Fuzzy set methods can represent degrees of truth or belonging. Fuzzy logic encodes imprecise knowledge and naturally maintains multiple hypotheses that result from the uncertainty and vagueness inherent in real problems. By combining the complementary strengths of neural and fuzzy approaches into a hybrid system, we can attain an increased recognition capability for solving handwriting recognition problems. This article describes the application of neural and fuzzy methods to three problems: recognition of handwritten words; recognition of numeric fields; and location of handwritten street numbers in address images.

7.
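张显杰 (Zhang Xianjie), 张之明 (Zhang Zhiming). 《计算机应用》 (Journal of Computer Applications), 2022, 42(8): 2394-2400
Handwritten text recognition transcribes handwritten documents into editable digital documents. However, neural-network-based recognition of handwritten English text still faces many challenges because writing styles differ widely, document structures vary greatly, and character segmentation and recognition accuracy is low. To address these problems, a handwritten English text recognition model based on a convolutional neural network (CNN) and a Transformer is proposed. The CNN first extracts features from the input image; the features are then fed to a Transformer encoder to obtain a prediction for each frame of the feature sequence; finally, a connectionist temporal classification (CTC) decoder produces the final prediction. Extensive experiments on the public IAM (Institut für Angewandte Mathematik) handwritten English word dataset show that the model achieves a character error rate (CER) of 3.60% and a word error rate (WER) of 12.70%, which verifies the feasibility of the proposed model.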
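The final CTC decoding step can be sketched as follows. The abstract does not say which decoding strategy is used; this sketch assumes the simplest one, greedy (best-path) decoding, which takes the highest-scoring class per frame, collapses consecutive repeats, and drops blanks. The function name, the toy alphabet, and the use of index 0 for the blank symbol are all illustrative assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    `frame_scores` holds one list of class scores per frame;
    `alphabet[i]` is the character for class i, and class `blank`
    is the CTC blank symbol (assumed to be index 0 here).
    """
    # Pick the highest-scoring class in every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded = []
    previous = None
    for idx in best:
        # Emit a character only when the class changes and is not blank.
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)


# Toy example: '-' stands for the blank; scores are one-hot per frame.
ALPHABET = "-cat"
path = [1, 1, 0, 2, 2, 0, 3]
scores = [[1.0 if i == k else 0.0 for i in range(len(ALPHABET))] for k in path]
word = ctc_greedy_decode(scores, ALPHABET)  # "cat"
```

A blank between two identical classes keeps both characters, which is how CTC represents doubled letters.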

8.
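Handwritten text recognition is mainly used in text input technology and plays a key role in the development of human-computer interaction. Because most online input methods cannot recognize mixed Chinese-English handwriting, an online recognition method for mixed Chinese-English handwritten text is proposed. Strokes are merged according to rules based on horizontal relative position, vertical overlap rate, and area overlap rate, and connected strokes are split, yielding a series of character fragments. Geometric features such as stroke count, aspect ratio, center deviation, and smoothness, together with recognition confidence, are then used to classify each fragment as Chinese or English. Based on this classification, path evaluation with a natural language model and a dynamic programming search are used to merge the candidate Chinese and English fragments separately into the Chinese and English character sequences to be recognized, which are fed into the Chinese and English convolutional neural network recognition models, respectively, to obtain the recognition result. Experimental results show that the recognition accuracy for online mixed Chinese-English handwritten text reaches 93.67%. The method segments online handwritten Chinese text lines and also segments online handwritten Chinese-English text lines containing connected strokes reasonably well.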

9.
10.
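For envelope address recognition, this paper designs a cooperative computing method for segmenting handwritten Chinese character text. Because it takes into account the semantic information of Chinese characters and of the pairing of their left and right components, it achieves a high correct segmentation rate. On 1000 sample envelope texts, the rate is 100% without touching characters and 95% with touching characters.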

11.
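任民宏 (Ren Minhong). 《微计算机信息》 (Microcomputer Information), 2007, 23(15): 221-222
To meet the need for handwritten character recognition in handwriting input methods, a handwritten character recognition system is proposed that is designed using vector direction coding of vector characters together with probability theory. It avoids the preprocessing steps of traditional handwriting-input character recognition, such as smoothing, noise removal, and normalization. In practice, the system extracts few features, recognizes quickly, and is highly accurate.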
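One plausible reading of vector direction coding is sketched below: each segment of a pen stroke is quantized into one of a small number of direction codes. The abstract gives no details of the actual encoding, so the choice of eight directions, the collapsing of repeated codes, and the name `direction_codes` are assumptions made purely for illustration; the probabilistic matching stage is not shown.

```python
import math


def direction_codes(points, n_directions=8):
    """Quantize a stroke (list of (x, y) points) into direction codes.

    Code 0 points along +x; codes increase counter-clockwise in a
    y-up coordinate system. Consecutive duplicate codes are collapsed.
    """
    sector = 2 * math.pi / n_directions
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # skip zero-length segments
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Round the angle to the nearest sector centre.
        code = int((angle + sector / 2) // sector) % n_directions
        if not codes or codes[-1] != code:
            codes.append(code)
    return codes


# An L-shaped stroke: two steps right, then two steps up.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
codes = direction_codes(stroke)  # [0, 2]
```

Because the codes depend only on segment directions, a representation like this is insensitive to stroke position and scale, which is consistent with the abstract's claim that normalization can be avoided.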

12.
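Because handwritten documents are unstructured, they are difficult to edit. The text line is a salient structure in a handwritten document, and extracting it reliably is very important for higher-level document structuring (separating graphics from text, extracting paragraph structure, extracting text) and for document editing. Current approaches to structuring handwritten documents are either online or offline. This work uses an online algorithm to extract text lines and then discusses how text line extraction affects gesture design.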

13.
14.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle and isolated). The proposed method can be integrated in any handwritten Arabic word recognition system based on an explicit segmentation process. First, three preprocessing operations are applied to character images: thinning, contour tracing and connected components detection. These operations extract structural characteristics used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification is done using the Fuzzy ARTMAP neural network, which is very fast in training and supports incremental learning. Five Fuzzy ARTMAP neural networks were employed; each one is designed to recognize one subset of characters. The recognition process is achieved in two steps: in the first, a clustering method assigns characters to one of the five character subsets. In the second, the pseudo-Zernike features are used by the appropriate Fuzzy ARTMAP classifier to identify the character. Training and tests were performed on a set of character images manually extracted from the IFN/ENIT database. A high recognition rate was reported.

15.
This paper presents an attempt to integrate neural computation with a domain knowledge technique to resolve the problem of the wide variety in handwritten Chinese characters. Despite their complexity, Chinese characters can be seen as structured patterns. Therefore, we propose a symbolic representation to describe these structural formations. In particular, we consider the Fuzzy Attributed Production Rule (FAPR) as a possible symbolic representation. On the neural computational side, we study Fukushima's Neocognitron model, which has been successfully demonstrated to recognize handwritten alphanumerics. Despite its power and tolerance capabilities, the supervised training scheme used by Fukushima is impractical for a large character set such as Chinese characters. We thus propose a rule-embedded Neocognitron network which can be readily mapped with structure-knowledge of Chinese characters as represented in FAPRs. In this paper, we demonstrate how 50 Chinese characters are mapped onto the network, and that the system performance in tolerating character structure deviations is satisfactory.

16.
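To address word segmentation in offline handwritten Uyghur text line images, a clustering algorithm that fuses FCM with K-means is proposed. The algorithm classifies distances into two groups: within-word distances and between-word distances. Based on the clustering result, text regions are merged to obtain segmentation points; the text between segmentation points is then labeled by connected component and colored. The experiments used 50 offline handwritten Uyghur text images written by different people, containing 536 lines and 4,002 words, and the correct segmentation rate reached 80.68%. The results show that the method resolves the segmentation difficulty caused by irregular distances between words in handwritten Uyghur, as well as some cases of overlapping words, and that it can process large handwritten text images as a whole.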
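The idea of clustering gaps into within-word and between-word distances can be sketched as follows. Note that this is a simplification: the paper fuses FCM with K-means, whereas the sketch uses plain two-cluster K-means on one-dimensional gap widths. The function name `split_gaps` and the toy gap values are hypothetical.

```python
def split_gaps(gaps, iterations=20):
    """Return indices of gaps classified as between-word gaps.

    Runs 1-D two-cluster K-means on the gap widths (a stand-in for
    the paper's FCM + K-means fusion) and thresholds at the midpoint
    of the two cluster centres.
    """
    low, high = min(gaps), max(gaps)  # initial cluster centres
    for _ in range(iterations):
        small = [g for g in gaps if abs(g - low) <= abs(g - high)]
        large = [g for g in gaps if abs(g - low) > abs(g - high)]
        if not large:
            break  # all gaps are alike; no between-word cluster
        new_low = sum(small) / len(small)
        new_high = sum(large) / len(large)
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    threshold = (low + high) / 2
    return [i for i, g in enumerate(gaps) if g > threshold]


# Horizontal gaps (in pixels) between consecutive connected components.
gaps = [2, 3, 2, 14, 3, 2, 15, 2]
cut_points = split_gaps(gaps)  # [3, 6]
```

Each returned index marks a gap where a word boundary would be placed; the components between two boundaries would then be merged into one word region.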

17.
18.
Existing word embedding learning algorithms employ only the contexts of words, but different text documents use words and their relevant parts of speech very differently. Based on this assumption, in order to obtain appropriate word embeddings and further improve text classification, this paper studies in depth a representation of words combined with their parts of speech. First, using the parts of speech and context of words, more expressive word embeddings can be obtained. Further, to improve the efficiency of look-up tables, we construct a two-dimensional table in the <word, part of speech> format to represent words in text documents. Finally, the two-dimensional table and Bayes' theorem are used for text classification. Experimental results show that our model achieves better results on standard data sets and is more versatile and portable than alternative models.
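The combination of a <word, part of speech> table with Bayes' theorem can be illustrated with a small sketch. The abstract does not specify the classifier, so this assumes a multinomial naive Bayes model with add-one smoothing whose features are (word, POS) pairs; the embedding-learning stage is not shown, and the function names, labels, and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build per-label counts over (word, pos) pairs.

    `docs` is a list of (label, [(word, pos), ...]) tuples.
    """
    pair_counts = defaultdict(Counter)  # label -> Counter of (word, pos)
    label_counts = Counter()
    vocab = set()
    for label, pairs in docs:
        label_counts[label] += 1
        pair_counts[label].update(pairs)
        vocab.update(pairs)
    return pair_counts, label_counts, vocab


def classify(pairs, model):
    """Return the most probable label under naive Bayes (add-one smoothing)."""
    pair_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(pair_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for pair in pairs:
            score += math.log((pair_counts[label][pair] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# The same word "run" with different parts of speech points to different classes.
model = train([
    ("sports", [("run", "VERB"), ("ball", "NOUN")]),
    ("finance", [("run", "NOUN"), ("bank", "NOUN")]),
])
label = classify([("run", "VERB")], model)  # "sports"
```

Keying the table on the pair rather than on the word alone is what lets the two senses of "run" contribute to different classes.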

19.
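This work studies the application of LeNet-5 to recognizing handwritten date characters in scanned documents. Scanning introduces various kinds of noise, especially illumination and color interference, so applying LeNet-5 directly does not give good results. The characters to be recognized are first located and cropped from the full document, and the cropped character images are preprocessed by denoising, grayscale conversion, and binarization. The character images are then segmented into individual characters, and the handwritten date characters are recognized by combining the LeNet-5 network with a model matching method. Recognition results under different parameter combinations are analyzed; tuning the model parameters effectively improves performance on real data and yields an algorithm that recognizes the handwritten date character set well. Experimental results demonstrate the effectiveness of the algorithm, which has been applied in engineering practice.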

20.
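Application of the ART2 neural network to handwritten Chinese character recognition   Total citations: 4 (self-citations: 0, citations by others: 4)
This paper proposes a neural-network-based method for recognizing handwritten Chinese characters that makes full use of the adaptive learning ability of neural networks. The ART2 network classifies through competitive learning and a self-stabilizing mechanism, and can learn without a teacher or supervision in non-stationary, noisy environments. Its learning is self-organized and real-time: it quickly recognizes samples it has already learned and quickly adapts to new objects it has not. Because Gabor filters have excellent directional selectivity, the algorithm uses Gabor features as character features. Gabor features reflect the spatial distribution of a character and can be combined into high-dimensional vectors, which makes them particularly suitable for large-scale pattern recognition tasks such as Chinese character recognition. Experimental results show a recognition accuracy of 94% on test samples, which the authors report as more accurate and more reliable than other methods.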
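The directional behaviour of Gabor filters can be made concrete with a small sketch that builds the real part of a Gabor kernel at several orientations. The abstract gives no filter parameters, so the kernel size, wavelength, sigma, and the four-orientation bank below are illustrative assumptions; the ART2 classifier is not shown.

```python
import math


def gabor_kernel(size, theta, wavelength, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size list of lists.

    `size` is assumed odd so the kernel has a centre pixel. `theta` is
    the orientation in radians, `gamma` the spatial aspect ratio, and
    `psi` the phase offset.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A bank of four orientations: 0, 45, 90 and 135 degrees.
bank = [gabor_kernel(9, k * math.pi / 4, wavelength=4.0, sigma=2.0) for k in range(4)]
```

Convolving a character image with each kernel and sampling the responses on a grid would give one feature vector per orientation; concatenating them yields the kind of high-dimensional Gabor feature vector the abstract refers to.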
