Found 12 similar documents (search time: 0 ms)
1.
We consider the problem of document binarization as a pre-processing step for optical character recognition (OCR) for the purpose of keyword search of historical printed documents. A number of promising techniques from the literature for binarization, pre-filtering, and post-binarization denoising were implemented along with newly developed methods for binarization: an error diffusion binarization, a multiresolutional version of Otsu's binarization, and denoising by despeckling. The OCR in the ABBYY FineReader 7.1 SDK is used as a black box metric to compare methods. Results for 12 pages from six newspapers of differing quality show that performance varies widely by image, but that the classic Otsu method and Otsu-based methods perform best on average.
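For readers unfamiliar with the classic Otsu method named in this abstract, the following is a minimal sketch of global Otsu thresholding, not the multiresolution variant or any other method developed in the paper. It assumes 8-bit grayscale values supplied as a flat list of integers; the function names `otsu_threshold` and `binarize` are illustrative only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of ints in [0, levels). Pixels <= t form the dark class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break           # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (<= t) or 1 (> t)."""
    return [1 if p > t else 0 for p in pixels]
```

A threshold found this way is applied to the whole page at once, which is why global Otsu can struggle on unevenly lit or stained pages even when it does well on average.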
2.
An adaptive local binarization method for document images based on a novel thresholding method and dynamic windows (cited by: 1; self-citations: 0; citations by others: 1)
Bilal Bataineh, Siti Norul Huda Sheikh Abdullah, Khairuddin Omar 《Pattern Recognition Letters》2011,32(14):1805-1813
Binary image representation is an essential format for document analysis. In general, different available binarization techniques are implemented for different types of binarization problems. The majority of binarization techniques are complex and are compounded from filters and existing operations. However, the few simple thresholding methods available cannot be applied to many binarization problems. In this paper, we propose a local binarization method based on a simple, novel thresholding method with dynamic and flexible windows. The proposed method is tested on selected samples from the DIBCO 2009 benchmark dataset using specialized evaluation techniques for binarization processes. To evaluate the performance of our proposed method, we compared it with the Niblack, Sauvola and NICK methods. The results of the experiments show that the proposed method adapts well to all types of binarization challenges, can deal with a larger number of binarization problems and boosts the overall performance of the binarization.
3.
F.S. Brundick, Ann E.M. Brodeen, Malcolm S. Taylor 《International Journal on Document Analysis and Recognition》2002,4(3):170-176
In this paper we consider a statistical approach to augment a limited database of groundtruth documents for use in evaluation
of optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents
for this purpose which prove to serve effectively and, in some regards, indistinguishably from groundtruth. The proposed method
is validated through a rigorous statistical procedure.
Received: March 30, 2000 / Revised: September 14, 2001
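As background for the resampling idea in this abstract, here is a minimal sketch of the standard moving-blocks bootstrap. It is not the authors' modified procedure, whose details the abstract does not give. The sequence could be, for example, the tokens of a ground-truth document; the name `moving_blocks_bootstrap` and its parameters are illustrative assumptions.

```python
import random


def moving_blocks_bootstrap(series, block_len, rng=None):
    """Build a surrogate sequence of the same length as `series`.

    Overlapping blocks of `block_len` consecutive items are drawn with
    replacement and concatenated, so local ordering inside each block
    is preserved while the global order is resampled.
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")

    starts = range(n - block_len + 1)   # every admissible block start
    surrogate = []
    while len(surrogate) < n:
        s = rng.choice(starts)
        surrogate.extend(series[s:s + block_len])
    return surrogate[:n]                # trim the final partial block
```

Passing a seeded `random.Random` makes the surrogate reproducible, which helps when the same surrogate set must be reused across OCR evaluations.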
4.
The design and analysis of multidimensional All-Partial-Sums (APS) algorithms are considered. We employ the sequence length as the performance measurement criterion for APS algorithms and corresponding thresholding methods, which is more sophisticated than asymptotic time complexity under the straight-line program computation model. With this criterion, we propose the piling algorithm to minimize the sequence length, then we show that this algorithm is an optimal APS algorithm in commutative semigroups in the worst case. The experimental results also show the algorithmic efficiency of the piling algorithm. Furthermore, the theoretical work on APS algorithms will help to construct higher-dimensional thresholding methods.
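To illustrate what an all-partial-sums computation produces in two dimensions, the sketch below builds the familiar summed-area table with ordinary integer addition. It is not the piling algorithm from the paper, and the function names are illustrative assumptions. The `window_sum` helper also uses subtraction, so it goes beyond the commutative-semigroup setting that the abstract analyzes.

```python
def all_partial_sums_2d(grid):
    """Return aps where aps[r][c] is the sum of grid[0..r][0..c]."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    aps = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        row_sum = 0                      # running sum of the current row
        for c in range(cols):
            row_sum += grid[r][c]
            above = aps[r - 1][c] if r > 0 else 0
            aps[r][c] = row_sum + above
    return aps


def window_sum(aps, r0, c0, r1, c1):
    """Sum of the inclusive window rows r0..r1, columns c0..c1."""
    total = aps[r1][c1]
    if r0 > 0:
        total -= aps[r0 - 1][c1]
    if c0 > 0:
        total -= aps[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        total += aps[r0 - 1][c0 - 1]     # added back: subtracted twice
    return total
```

Local thresholding methods typically need the mean of a window around every pixel; with the table in hand, each window sum costs at most four lookups regardless of window size.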
5.
Yevgen Biletskiy, Girish R. Ranganathan 《Computers in Industry》2010,61(8):750-759
In enterprise firms, enormous amounts of electronic documents are generated by business analysts and other business domain application users. Applications that use these documents are often driven by business logic that is hard-coded together with application logic. One approach to the separation of business logic from applications is to create and maintain business and information extraction rules in an external, user-friendly format. The drawback of such an externalization is that the business rules usually do not have machine-interpretable semantics. This situation often leads to misinterpretation of domain analysis documents, which can inhibit the productivity of computer-assisted analytical work and the effectiveness of business solutions. This paper proposes an ontology and rule-based framework for the development of business domain applications, which includes semantic processing of externalized business rules and, to some extent, externalization of application logic. The creation of external information extraction rules by the business analyst is a cumbersome and time-consuming task. In order to overcome this problem, the framework also includes a rule learning system to semi-automate the generation of information extraction rules from source documents with the help of manual annotations. The main idea behind the work presented in this paper is to re-engineer very large enterprise information systems to adapt to Semantic Web computing techniques. The work presented in this paper is inspired by an industrial project.
6.
This paper presents a novel, knowledge-based method for measuring semantic similarity in support of applications aimed at organizing and retrieving relevant textual information. We show how a quantitative context may be established for what is essentially qualitative in nature by effecting a topological transformation of the lexicon into a metric space where distance is well-defined. We illustrate the technique with a simple example and report on promising experimental results with a significant word similarity problem.
7.
8.
《Science China Information Sciences》2012,(3):626-637
The technology for high-precision attitude determination when using low-precision sensors is a key requirement of a modern small satellite. This paper presents a new attitude determination algorithm termed preprocess EKF (PP-EKF), based on preprocessing of the sensor data. It can enhance the overall modeling accuracy by using the quadratic penalty function to correct the dynamic model error and angular velocity error in real time, based on the information fusion of the current and the past measurement information. The measurement model of the EKF is linearized by introducing the q method. The solution error of this process is also corrected to further improve the accuracy of the measurement model and make better use of measurement data from the low-precision sensors, to obtain good attitude determination results. Finally, the simulation results demonstrate the high reliability and other advantages of the proposed algorithm.
9.
10.
11.
This research analyzes gene relationships according to their annotations. We present here a similar genes discovery system (SGDS), based upon a semantic similarity measure of gene ontology (GO) and Entrez gene, to identify groups of similar genes. In order to validate the proposed measure, we analyze the relationships between similarity and expression correlation of pairs of genes. We explore a number of semantic similarity measures and compute the Pearson correlation coefficient. Highly correlated genes exhibit strong similarity in the ontology taxonomies. The results show that our proposed semantic similarity measure outperforms the others and seems better suited for use in GO. We use MAPK homogenous genes group and MAP kinase pathway as benchmarks to tune the parameters in our system for achieving higher accuracy. We applied the SGDS to RON and Lutheran pathways; the results show that it is able to identify a group of similar genes and to predict novel pathways based on a group of candidate genes.
12.
A dynamic, nonlinear, multi-input multi-output application using the Recurrent Dynamic Neuron Network (RDNN) model is presented for a two-by-two distillation column case study. It is shown that the RDNN model, though compact (in terms of number of neurons and parameters to be estimated) performs well in both open- and closed-loop simulations. Open-loop simulations show that the RDNN is able to predict nonlinear output responses. The dual composition control problem is also investigated to demonstrate the model-based applications attainable with the RDNN. Due to the control affine nature of the RDNN structure, and the fact that it has finite vector relative degree, Input-Output Linearization techniques were used within the Internal Model Control framework for controller design. Nonlinear Model Predictive Control applications were also demonstrated using the RDNN. Simulations show that a combination of closed-loop and open-loop identification for the RDNN model results in a model-based controller which achieves robust closed-loop performance.