Similar Documents
20 similar documents found (search time: 289 ms)
1.
In this paper, we present a system that automatically translates Arabic text embedded in images into English. The system consists of three components: text detection from images, character recognition, and machine translation. We formulate text detection as a binary classification problem and apply gradient boosting trees (GBT), support vector machines (SVM), and location-based prior knowledge to improve the F1 score of text detection from 78.95% to 87.05%. The detected text images are processed by off-the-shelf optical character recognition (OCR) software. We employ an error correction model to post-process the noisy OCR output, and apply a bigram language model to reduce word segmentation errors. The translation module is tailored with a compact data structure for hand-held devices. The experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in the experiment with the Arabic Transparent font, the BLEU score increases from 18.70 to 33.47 with use of the error correction module.
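The bigram language model mentioned above scores alternative word segmentations of noisy OCR output and keeps the most probable one. A minimal sketch of that idea (a generic add-alpha smoothed bigram scorer, not the paper's implementation; all function names are ours):

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized training sentences."""
    uni, bi = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        uni.update(padded)
        bi.update(zip(padded, padded[1:]))
    return uni, bi

def score(words, uni, bi, vocab_size, alpha=1.0):
    """Log-probability of one segmentation under the smoothed bigram model."""
    padded = ["<s>"] + words + ["</s>"]
    logp = 0.0
    for w1, w2 in zip(padded, padded[1:]):
        logp += math.log((bi[(w1, w2)] + alpha) / (uni[w1] + alpha * vocab_size))
    return logp

def best_segmentation(candidates, uni, bi, vocab_size):
    """Pick the candidate word segmentation with the highest bigram score."""
    return max(candidates, key=lambda words: score(words, uni, bi, vocab_size))
```

In use, the OCR post-processor would generate several candidate segmentations of a character stream and let `best_segmentation` choose among them.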

2.
Traditionally, a corpus is a large structured set of text, electronically stored and processed. Corpora have become very important in the study of languages. They have opened new areas of linguistic research, which were unknown until recently. Corpora are also key to the development of optical character recognition (OCR) applications. Access to a corpus of both language and images is essential during OCR development, particularly while training and testing a recognition application. Excellent corpora have been developed for Latin-based languages, but few relate to the Arabic language. This limits the penetration of both corpus linguistics and OCR in Arabic-speaking countries. This paper describes the construction, and provides a comprehensive study and analysis, of a multi-modal Arabic corpus (MMAC) that is suitable for use in both OCR development and linguistics. MMAC currently contains six million Arabic words and, unlike previous corpora, also includes connected segments or pieces of Arabic words (PAWs), as well as naked pieces of Arabic words (NPAWs) and naked words (NWords), i.e. PAWs and Words without diacritical marks. Multi-modal data is generated from both text, gathered from a wide variety of sources, and images of existing documents. Text-based data is complemented by a set of artificially generated images showing each of the Words, NWords, PAWs and NPAWs involved. Applications are provided to apply natural-looking degradation to the generated images. A ground truth annotation is offered for each such image, while natural images showing small paragraphs and full pages are augmented with representations of the text they depict. A statistical analysis and verification of the dataset has been carried out and is presented. MMAC was also tested using commercial OCR software and is publicly and freely available.

3.
4.
5.
An omnifont open-vocabulary OCR system for English and Arabic
We present an omnifont, unlimited-vocabulary OCR system for English and Arabic. The system is based on hidden Markov models (HMM), an approach that has proven to be very successful in the area of automatic speech recognition. We focus on two aspects of the OCR system. First, we address the issue of how to perform OCR on omnifont and multi-style data, such as plain and italic, without the need to have a separate model for each style. The amount of training data from each style, which is used to train a single model, becomes an important issue in the face of the conditional independence assumption inherent in the use of HMMs. We demonstrate mathematically and empirically how to allocate training data among the different styles to alleviate this problem. Second, we show how to use a word-based HMM system to perform character recognition with unlimited vocabulary. The method includes the use of a trigram language model on character sequences. Using all these techniques, we have achieved character error rates of 1.1 percent on data from the University of Washington English Document Image Database and 3.3 percent on data from the DARPA Arabic OCR Corpus.
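The HMM decoding at the heart of such a system finds the most likely hidden character sequence for a sequence of observed image features. A minimal generic Viterbi decoder (a textbook sketch, not the authors' word-based system; states, transitions and emissions here are toy values):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence."""
    # V[t][s] = (best probability of reaching s at time t, predecessor state)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))
```

In a real OCR decoder the states would be character models and the emission probabilities would come from trained feature densities; a character trigram language model would further constrain the transitions.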

6.

Optical character recognition (OCR) has proved a powerful tool for the digital analysis of printed historical documents. However, its ability to localize and identify individual glyphs is challenged by the tremendous variety in historical type design, the physicality of the printing process, and the state of conservation. We propose to mitigate these problems by a downstream fine-tuning step that corrects for pathological and undesirable extraction results. We implement this idea by using a joint energy-based model which classifies individual glyphs and simultaneously prunes potential out-of-distribution (OOD) samples like rubrications, initials, or ligatures. During model training, we introduce specific margins in the energy spectrum that aid this separation and explore the glyph distribution’s typical set to stabilize the optimization procedure. We observe strong classification at 0.972 AUPRC across 42 lower- and uppercase glyph types on a challenging digital reproduction of Johannes Balbus’ Catholicon, matching the performance of purely discriminative methods. At the same time, we achieve OOD detection rates of 0.989 AUPRC and 0.946 AUPRC for OOD ‘clutter’ and ‘ligatures’ which substantially improves upon recently proposed OOD detection techniques. The proposed approach can be easily integrated into the postprocessing phase of current OCR to aid reproduction and shape analysis research.
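The energy-based out-of-distribution test described above can be illustrated with the standard free-energy score over classifier logits: in-distribution glyphs get low energy, OOD samples (clutter, ligatures) get high energy and are pruned by a margin. A minimal sketch (the margin value is a hypothetical validation-set choice, not from the paper):

```python
import math

def energy_score(logits):
    """Free energy of a logit vector: -logsumexp(logits).
    Lower energy indicates a more in-distribution sample."""
    m = max(logits)  # subtract the max for numerical stability
    return -(m + math.log(sum(math.exp(l - m) for l in logits)))

def is_out_of_distribution(logits, margin):
    """Prune samples whose energy exceeds a margin tuned on validation data."""
    return energy_score(logits) > margin
```

A confident glyph classification (one dominant logit) yields strongly negative energy, while near-uniform logits, typical of clutter, sit close to `-log(num_classes)` and fall above the margin.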


7.
In this paper, we focus on information extraction from optical character recognition (OCR) output. Since the content from OCR inherently has many errors, we present robust algorithms for information extraction from OCR lattices instead of merely looking them up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate the above challenges, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.
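The contrast between 1-best lookup and lattice search, and the confidence filter used to curb false alarms, can be sketched as follows (a simplified sausage-style lattice of per-position alternatives, not the paper's data structure; names and thresholds are ours):

```python
def find_entities(lattice, gazetteer, min_conf=0.2):
    """Search every alternative at each lattice position for known names.

    lattice: list of positions; each position is a list of
    (word, confidence) hypotheses with the 1-best hypothesis first.
    Returns (position, word, confidence) hits above the confidence floor,
    which recovers entities the 1-best path missed while suppressing
    low-confidence false alarms.
    """
    hits = []
    for i, alternatives in enumerate(lattice):
        for word, conf in alternatives:
            if word in gazetteer and conf >= min_conf:
                hits.append((i, word, conf))
    return hits
```

Raising `min_conf` trades recall for precision; restricting the scan to the first element of each position reduces the search to plain 1-best lookup.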

8.
9.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle and isolated). The proposed method can be integrated in any handwritten Arabic word recognition system based on an explicit segmentation process. First, three preprocessing operations are applied to the character images: thinning, contour tracing and connected component detection. These operations extract structural characteristics used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification was done using the Fuzzy ARTMAP neural network, which is very fast in training and supports incremental learning. Five Fuzzy ARTMAP neural networks were employed; each one is designed to recognize one subset of characters. The recognition process is achieved in two steps: in the first, a clustering method assigns characters to one of the five character subsets. In the second, the pseudo-Zernike features are used by the appropriate Fuzzy ARTMAP classifier to identify the character. Training and tests were performed on a set of character images manually extracted from the IFN/ENIT database. A high recognition rate was reported.

10.
This paper provides a thorough evaluation of six important Arabic OCR systems available on the market, namely Abbyy FineReader, Leadtools, Readiris, Sakhr, Tesseract and NovoVerus. We test the OCR systems using randomly selected images from the well-known Arabic Printed Text Image database (250 images from the APTI database) and a set of 8 images from an Arabic book. The APTI database contains 45,313,600 word images, both decomposable and non-decomposable. In the evaluation, we conduct two tests. The first test is based on the usual metrics used in the literature. In the second test, we provide a novel measure for the Arabic language, which can also be used for other non-Latin languages.
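The usual metric in OCR evaluations of this kind is the character error rate, i.e. the Levenshtein edit distance between reference and hypothesis normalized by the reference length. A minimal sketch (standard dynamic programming, not tied to the paper's exact scoring tool):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (r != h))) # substitution
        prev = cur
    return prev[-1]

def character_error_rate(ref, hyp):
    """Edits needed to turn the OCR hypothesis into the reference,
    normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

The same routine applied to word tokens instead of characters gives the word error rate.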

11.

The number of traffic accidents in Brazil has reached alarming levels and is currently one of the leading causes of death in the country. With the number of vehicles on the roads increasing rapidly, these problems will tend to worsen. Consequently, huge investments in resources to increase road safety will be required. The vertical R-19 system for optical character recognition of regulatory traffic signs (maximum speed limits) according to Brazilian Standards developed in this work uses a camera positioned at the front of the vehicle, facing forward. This is so that images of traffic signs can be captured, enabling the use of image processing and analysis techniques for sign detection. This paper proposes the detection and recognition of speed limit signs based on a cascade of boosted classifiers working with Haar-like features. The recognition of the detected sign is achieved with the optimum-path forest classifier (OPF), support vector machines (SVM), multilayer perceptron, k-nearest neighbor (kNN), extreme learning machine, least mean squares, and least squares machine learning techniques. The SVM, OPF and kNN classifiers had average accuracies higher than 99.5%; the OPF classifier with a linear kernel took an average time of 87 µs to recognize a sign, while kNN took 11,721 µs and SVM 12,595 µs. This sign detection approach successfully found and recognized 11,320 road signs from a set of 12,520 images, leading to an overall accuracy of 90.41%. Analyzing the system globally, recognition accuracy was 89.19%, as 11,167 road signs from a database with 12,520 signs were correctly recognized. The processing speed of the embedded system varied between 20 and 30 frames per second. Therefore, based on these results, the proposed system can be considered a promising tool with high commercial potential.


12.
Recognition of a limited set of Chinese characters based on a PCA learning subspace algorithm
A PCA learning subspace method is used to recognize characters directly in gray-scale images. This not only overcomes the main difficulties of traditional feature extraction and recognition based on binarized characters, but also preserves as much character information as possible. Building on the PCA subspace, the algorithm rotates and adjusts the subspace through supervised feedback learning, yielding better classification results. In particular, when the number of character classes is not very large, the subspace training time remains within an acceptable range. Application results show that the PCA learning subspace algorithm achieves good results on the limited character set of Chinese characters found on license plates, and has high practical value.
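In subspace methods of this family, each character class is represented by a low-dimensional subspace, and a sample is assigned to the class whose subspace reconstructs it best. A minimal sketch of that classification rule (assuming per-class orthonormal bases are already learned; the feedback-driven subspace rotation is omitted):

```python
def residual(x, basis):
    """Squared distance from vector x to the span of an orthonormal basis."""
    proj_sq = sum(sum(xi * bi for xi, bi in zip(x, b)) ** 2 for b in basis)
    norm_sq = sum(xi * xi for xi in x)
    return norm_sq - proj_sq

def classify(x, class_bases):
    """Assign x to the class whose subspace leaves the smallest residual.

    class_bases: dict mapping class label -> list of orthonormal basis vectors.
    """
    return min(class_bases, key=lambda c: residual(x, class_bases[c]))
```

The learning-subspace refinement would then rotate each basis toward misclassified training samples of its own class and away from samples of other classes.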

13.
14.
Handwritten digit recognition has long been a challenging problem in the field of optical character recognition and is of great importance in industry. This paper develops a new approach for handwritten digit recognition that uses a small number of patterns for the training phase. To improve the performance of isolated Farsi/Arabic handwritten digit recognition, we use the Bag of Visual Words (BoVW) technique to construct image feature vectors. Each visual word is described by the Scale Invariant Feature Transform (SIFT) method. For learning the feature vectors, a Quantum Neural Network (QNN) classifier is used. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (HODA dataset) show that the proposed method achieves the highest recognition rate compared to other state-of-the-art methods.
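The BoVW step above quantizes each local descriptor against a learned codebook of visual words and represents the image as a normalized histogram of word counts. A minimal sketch of that quantization (the codebook would normally come from k-means over SIFT descriptors; the toy 2-D vectors here are ours):

```python
def nearest(desc, codebook):
    """Index of the closest visual word by squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(desc, codebook[k])))

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word assignments for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    total = sum(hist) or 1  # guard against images with no descriptors
    return [h / total for h in hist]
```

The resulting fixed-length histogram is what the downstream classifier (a QNN in the paper) consumes, regardless of how many descriptors the image produced.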

15.
Advances in computing performance have made deep learning feasible. Object detection, one of the major research directions in computer vision, has begun to incorporate deep learning methods and is now widely applied across industries. Constrained by network complexity and detector design, detection speed and accuracy form a trade-off. The rapid growth of e-commerce has produced large numbers of images containing product parameters, and traditional methods struggle to extract this parameter information from the images effectively. To address this problem, this paper proposes a method that combines a deep learning detection algorithm with traditional OCR technology, greatly improving recognition accuracy while maintaining recognition speed. The problems studied include the detection model, training on domain-specific data, image preprocessing, and text recognition. We first compare existing object detection algorithms and weigh their strengths and weaknesses, then use the YOLO model for the detection task, improving and optimizing it to address its shortcomings and obtain a detection model specialized for product parameters in images; finally, Tesseract performs the text extraction task. With the whole pipeline combined, our system not only achieves good recognition accuracy but is also efficient and robust. We conclude by discussing strengths and limitations and pointing out directions for future work.

16.
17.
The common paradigm employed for object detection is the sliding window (SW) search. This approach generates grid-distributed patches, at all possible positions and sizes, which are evaluated by a binary classifier. The tradeoff between computational burden and detection accuracy is the real critical point of sliding windows; several methods have been proposed to speed up the search, such as adding complementary features. We propose a paradigm that differs from any previous approach since it casts object detection into a statistical-based search using Monte Carlo sampling to estimate the likelihood density function with Gaussian kernels. The estimation relies on a multistage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifiers. The method can be easily plugged into a Bayesian-recursive framework to exploit the temporal coherency of the target objects in videos. Several tests on pedestrian and face detection, both on images and videos, with different types of classifiers (cascades of boosted classifiers, soft cascades, and SVM) and features (covariance matrices, Haar-like features, integral channel features, and histograms of oriented gradients) demonstrate that the proposed method provides higher detection rates and accuracy as well as a lower computational burden w.r.t. sliding window detection.
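The multistage refinement idea, sampling window hypotheses from a proposal distribution and re-centering it on the classifier's feedback, can be sketched in one dimension (a heavily simplified illustration; the paper works over full window position/scale spaces with Gaussian kernel density estimates):

```python
import random

def refine_proposals(score, init_mean, init_std, stages=3, samples=50, seed=0):
    """Progressively refine a Gaussian proposal over window positions.

    score(x) is the classifier response at position x; after each stage
    the proposal is re-centered on the best-scoring samples and narrowed,
    concentrating evaluations where the object is likely to be.
    """
    rng = random.Random(seed)
    mean, std = init_mean, init_std
    for _ in range(stages):
        xs = [rng.gauss(mean, std) for _ in range(samples)]
        top = sorted(xs, key=score, reverse=True)[:samples // 5]
        mean = sum(top) / len(top)  # re-center on classifier feedback
        std *= 0.5                  # narrow the search each stage
    return mean
```

Unlike a sliding window, the total number of classifier evaluations here is `stages * samples`, independent of image size.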

18.
In this paper, we propose a multi-view object detection methodology using a specific extended class of Haar-like filters, which detects objects with high accuracy in unconstrained environments. There are several object detection techniques that work well in restricted environments, where illumination is constant and the view angle of the object is limited. The proposed object detection methodology successfully detects faces, cars, and logo objects at any size and pose with high accuracy in real-world conditions. To cope with angle variation, we propose multiple trained cascades using the proposed filters, which perform even better detection by spanning a different range of orientations in each cascade. We tested the proposed approach on still images using image databases and conducted evaluations on video images from an IP camera placed outdoors. We tested the method for detecting faces, logos, and vehicles in different environments. The experimental results show that the proposed method yields higher classification performance than Viola and Jones's detector, which uses a single feature for each weak classifier. With fewer features, our detector detects any face, object, or vehicle at 15 fps on 4-megapixel images with 95% accuracy on an Intel i7 2.8 GHz machine.
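Haar-like features of the kind these cascades evaluate are rectangle-sum differences computed in constant time from an integral image. A minimal sketch of the standard two-rectangle feature (generic textbook construction, not the paper's extended filter class):

```python
def integral_image(img):
    """Summed-area table; img is a list of rows of pixel intensities.
    ii[y][x] holds the sum of all pixels above and left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y),
    from four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_vertical(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)
```

Because every feature costs only a handful of lookups, a cascade can evaluate thousands of them per window in real time; extended filter sets add rotated or multi-block variants on the same principle.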

19.
In this paper, we introduce a novel color segmentation approach robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and completely automated, i.e. user independent. It proposes an adaptive binarization/quantization without any penalizing information loss. This model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to demonstrate our contribution to the well-known OCR product ABBYY FineReader. We also obtained promising results with our ad detection system on a large set of testing images with complex layouts.
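A common baseline for the adaptive binarization mentioned above is Otsu's method, which picks the gray-level threshold maximizing between-class variance. A minimal sketch (the standard histogram algorithm, shown for contrast; it is not the authors' hierarchical color model):

```python
def otsu_threshold(pixels, levels=256):
    """Gray-level threshold maximizing between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(levels):
        w_bg += hist[t]            # background weight: pixels <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # foreground weight: pixels > t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels at or below the returned threshold are treated as one class (e.g. ink) and the rest as the other; adaptive schemes apply this per region rather than globally.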

20.
This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization, with the strength of a machine learning text verification step applied on background-independent features. Text recognition, applied on the detected text lines, is addressed by a text segmentation step followed by a traditional OCR algorithm within a multi-hypotheses framework relying on multiple segments, language modeling and OCR statistics. Experiments conducted on large databases of real broadcast documents demonstrate the validity of our approach.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号