Similar articles
20 similar articles found.
3.
Because neural networks specialize in handling ambiguous data, they are especially well suited to applications such as speech recognition and optical character recognition (OCR). OCR input is usually ambiguous because it is generated by an inconsistent factor: the individual. This article gives an overview of neural networks and describes how they can be integrated with OCR technology to create neural OCR networks that can significantly improve optical character recognition.

12.
An overview of character recognition methodologies (total citations: 3; self-citations: 0; citations by others: 3)
This work presents an overview of the character recognition methodologies that have evolved in this century. It first explains the scanning devices used in character recognition, then highlights the major research works that have had a great impact on the field. From a methodological point of view, it presents the different steps that have been employed in OCR. Finally, it covers the most important industrial character recognisers, along with the character databases used to test the various algorithms.

15.
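Objective: Characters on some instrument and elevator nameplates are closely spaced, so traditional segmentation methods split them inaccurately and the character recognition rate is low. To address this, an adaptive localization, segmentation, reconstruction, and recognition algorithm for touching characters on nameplates is proposed. Methods: First, the nameplate image is preprocessed with median filtering and binarization. Next, mathematical morphology (opening followed by erosion) is applied to the preprocessed image to remove useless information between characters and widen the character spacing. A centroid algorithm then finds the geometric center of each character, and the Sobel edge detection operator is used, starting from that center, to obtain each character's bounding box and establish a region of interest (ROI). The ROIs are then applied to the original nameplate image to segment the characters. In accordance with the relevant national standards on character spacing, a rectangular spacer strip a certain number of pixels wide is appended after each segmented character, the character image is rebuilt, and optical character recognition (OCR) is performed. Results: In character recognition experiments on 993 nameplates, the algorithm achieved a recognition rate of 95.7%. Conclusion: The experimental results show that the proposed algorithm is effective for nameplate character recognition.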
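The erode, locate, segment, and rebuild steps described in this abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes a binary image stored as a list of rows of 0/1 values, uses a single 3x3 erosion in place of the opening-plus-erosion step, and swaps the Sobel-based bounding-box step for connected-component extents. The function names and the `gap` and `margin` parameters are hypothetical.

```python
from collections import deque

def erode(img):
    """3x3 binary erosion: a pixel survives only if its whole neighbourhood is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def components(img):
    """Return 4-connected foreground components as lists of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps

def segment_and_rebuild(img, gap=2, margin=1):
    """Split touching characters and rebuild the row with `gap` blank columns after each."""
    w = len(img[0])
    rois = []
    for pixels in components(erode(img)):
        xs = [x for _, x in pixels]
        centroid_x = sum(xs) / len(xs)  # geometric centre along the x axis
        # Assumption: grow the eroded extent by `margin` to approximate the original width.
        rois.append((centroid_x, max(min(xs) - margin, 0), min(max(xs) + margin, w - 1)))
    rois.sort()  # left-to-right order by centroid
    rebuilt = [[] for _ in img]
    for _, x0, x1 in rois:
        for y, row in enumerate(img):
            # Crop from the ORIGINAL image, then append the spacer strip.
            rebuilt[y].extend(row[x0:x1 + 1] + [0] * gap)
    return [(x0, x1) for _, x0, x1 in rois], rebuilt
```

In this sketch the column ranges play the role of the ROIs: they are found on the eroded image but applied to the original one, so the character strokes themselves are not thinned in the rebuilt image.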

17.

Optical character recognition (OCR) systems help to digitize paper-based historical archives. However, the poor quality of scanned documents and the limitations of text recognition techniques result in different kinds of errors in OCR output. Post-processing is an essential step in improving the output quality of OCR systems by detecting and cleaning these errors. In this paper, we present an automatic model consisting of both error detection and error correction phases for OCR post-processing. We propose a novel approach to OCR post-processing error correction that uses correction pattern edits and an evolutionary algorithm, a technique mainly used for solving optimization problems. Our model adopts a variant of the self-organizing migrating algorithm along with a fitness function based on modifications of important linguistic features. We illustrate how to construct the table of correction pattern edits, which involves all types of edit operations and is learned directly from the training dataset. With efficient settings of the algorithm parameters, our model achieves high-quality candidate generation and error correction. The experimental results show that our proposed approach outperforms various baseline approaches when evaluated on the benchmark dataset of the ICDAR 2017 Post-OCR text correction competition.
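As a rough illustration of a correction pattern table learned from training data, the sketch below counts the edits between aligned (OCR, ground truth) token pairs and applies them to generate correction candidates. It rests on several assumptions: alignment comes from Python's `difflib.SequenceMatcher`, which approximates rather than guarantees a minimal edit script; candidates are filtered by a plain lexicon instead of the paper's fitness function; and simple enumeration stands in for the self-organizing migrating algorithm. The names `learn_patterns` and `candidates` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def learn_patterns(pairs):
    """Count (ocr_fragment -> truth_fragment) edits seen in aligned training pairs."""
    table = Counter()
    for ocr, truth in pairs:
        matcher = SequenceMatcher(None, ocr, truth, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":  # replace, delete, and insert are all recorded
                table[(ocr[i1:i2], truth[j1:j2])] += 1
    return table

def candidates(token, table, lexicon):
    """Apply each learned edit once at every matching position; keep in-lexicon results."""
    scored = Counter()
    for (src, dst), count in table.items():
        start = token.find(src)
        while start != -1:
            cand = token[:start] + dst + token[start + len(src):]
            if cand in lexicon:
                # Score a candidate by the most frequent pattern that produces it.
                scored[cand] = max(scored[cand], count)
            start = token.find(src, start + 1)
    return [cand for cand, _ in scored.most_common()]
```

A table learned from pairs such as `("c1ock", "clock")` and `("1ight", "light")` records the edit `"1" -> "l"` twice, so an unseen token like `"c1ose"` can be mapped to `"close"` when that word is in the lexicon.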


19.
In this paper, we focus on information extraction from optical character recognition (OCR) output. Since OCR output inherently contains many errors, we present robust algorithms that extract information from OCR lattices instead of merely looking it up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate these problems, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.
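The difference between 1-best lookup and lattice search, and the use of a confidence threshold to limit false alarms, can be sketched as follows. This is a simplified illustration rather than the paper's method: it assumes the lattice has been flattened into a sequence of slots, each holding alternative words with posterior-like scores, and it scores a hit as the product of the matched scores. The function names and the `min_conf` parameter are hypothetical.

```python
def one_best(lattice):
    """Pick the highest-scoring alternative in every slot (the 1-best output)."""
    return [max(slot, key=lambda alt: alt[1])[0] for slot in lattice]

def find_entity(lattice, entity, min_conf=0.0):
    """Return (start, confidence) hits where each entity word appears in consecutive slots."""
    hits = []
    n = len(entity)
    for start in range(len(lattice) - n + 1):
        conf = 1.0
        for word, slot in zip(entity, lattice[start:start + n]):
            # A word missing from the slot contributes 0 and rules out the hit.
            conf *= dict(slot).get(word, 0.0)
        if conf > 0 and conf >= min_conf:
            hits.append((start, conf))
    return hits

# Toy lattice: the correct word "New" is only the second choice in its slot.
lattice = [
    [("visit", 0.9), ("vizit", 0.1)],
    [("Now", 0.6), ("New", 0.4)],
    [("York", 0.7), ("Yolk", 0.3)],
]
```

With this toy lattice, the 1-best output reads "visit Now York", so a plain lookup misses "New York", while the lattice search finds it at slot 1. Raising `min_conf` discards low-scoring hits, which trades recall for fewer false alarms.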
