1.

OCR binarization and image pre-processing for searching historical documents

Maya R. Gupta Nathaniel P. Jacobson Eric K. Garcia 《Pattern recognition》2007,40(2):389-397

We consider the problem of document binarization as a pre-processing step for optical character recognition (OCR) for the purpose of keyword search of historical printed documents. A number of promising techniques from the literature for binarization, pre-filtering, and post-binarization denoising were implemented along with newly developed methods for binarization: an error diffusion binarization, a multiresolutional version of Otsu's binarization, and denoising by despeckling. The OCR in the ABBYY FineReader 7.1 SDK is used as a black box metric to compare methods. Results for 12 pages from six newspapers of differing quality show that performance varies widely by image, but that the classic Otsu method and Otsu-based methods perform best on average.
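For orientation, the classic Otsu method named in this abstract can be sketched as follows. This is a minimal illustration of the textbook global-threshold algorithm, not the authors' implementation or their multiresolutional variant; the function names `otsu_threshold` and `binarize` and the flat-list image representation are assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the dark class, pixels > t the light class.
    `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of intensities in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # light class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, t):
    """Map each pixel to black (0) if it is <= t, else to white (255)."""
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold like this assumes a roughly bimodal histogram, which is one plausible reason performance on degraded pages varies by image.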

2.

An adaptive local binarization method for document images based on a novel thresholding method and dynamic windows (total citations: 1; self-citations: 0; citations by others: 1)

Bilal Bataineh Siti Norul Huda Sheikh Abdullah Khairuddin Omar 《Pattern recognition letters》2011,32(14):1805-1813

Binary image representation is an essential format for document analysis. In general, different binarization techniques are implemented for different types of binarization problems. The majority of binarization techniques are complex and are compounded from filters and existing operations. However, the few simple thresholding methods available cannot be applied to many binarization problems. In this paper, we propose a local binarization method based on a simple, novel thresholding method with dynamic and flexible windows. The proposed method is tested on a set of selected samples, the DIBCO 2009 benchmark dataset, using specialized evaluation techniques for binarization processes. To evaluate the performance of our proposed method, we compared it with the Niblack, Sauvola and NICK methods. The results of the experiments show that the proposed method adapts well to all types of binarization challenges, can deal with a larger number of binarization problems, and boosts the overall performance of the binarization.
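As background for the comparison baselines, Sauvola's local threshold is commonly written as T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the gray levels in a window around the pixel. The sketch below illustrates that baseline only; it is not the method proposed in the paper. The fixed square window, the values k = 0.5 and R = 128, and the function name `sauvola_binarize` are assumptions chosen for the example.

```python
from math import sqrt


def sauvola_binarize(image, window=3, k=0.5, R=128.0):
    """Binarize a grayscale image (list of rows) with Sauvola's local threshold.

    For each pixel, the mean m and standard deviation s of the surrounding
    window (clipped at the image border) give T = m * (1 + k * (s / R - 1)).
    Pixels above T become white (255), the rest black (0).
    """
    h, w = len(image), len(image[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                image[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            n = len(vals)
            mean = sum(vals) / n
            std = sqrt(sum((v - mean) ** 2 for v in vals) / n)
            threshold = mean * (1 + k * (std / R - 1))
            out[y][x] = 255 if image[y][x] > threshold else 0
    return out
```

This brute-force version recomputes window statistics for every pixel; practical implementations usually obtain them from integral images instead.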

3.

A statistical approach to the generation of a database for evaluating OCR software

F.S. Brundick Ann E.M. Brodeen Malcolm S. Taylor 《International Journal on Document Analysis and Recognition》2002,4(3):170-176

In this paper we consider a statistical approach to augment a limited database of groundtruth documents for use in evaluation of optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents for this purpose which prove to serve effectively and, in some regards, indistinguishably from groundtruth. The proposed method is validated through a rigorous statistical procedure. Received: March 30, 2000 / Revised: September 14, 2001
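The abstract does not say how the procedure is modified, so the sketch below shows only the plain moving-blocks bootstrap for a sequence: overlapping blocks of a fixed length are drawn with replacement and concatenated until the surrogate is as long as the original. The function name, the `rng` parameter, and the treatment of a document as a flat sequence are assumptions for illustration.

```python
import random


def moving_blocks_bootstrap(series, block_len, rng=None):
    """Build one surrogate sequence from overlapping blocks of `series`.

    Blocks of `block_len` consecutive items are drawn uniformly at random,
    with replacement, and concatenated; the result is truncated to the
    original length.
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)  # every admissible block start
    surrogate = []
    while len(surrogate) < n:
        s = rng.choice(starts)
        surrogate.extend(series[s:s + block_len])
    return surrogate[:n]
```

Sampling whole blocks rather than single items preserves short-range dependence, for example between neighbouring characters or words, within each block.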

4.

The optimal All-Partial-Sums algorithm in commutative semigroups and its applications for image thresholding segmentation

Xie Xie Jiu-Lun Fan 《Theoretical computer science》2011,412(15):1419-1433

The design and analysis of multidimensional All-Partial-Sums (APS) algorithms are considered. We employ the sequence length as the performance measurement criterion for APS algorithms and corresponding thresholding methods, which is more sophisticated than asymptotic time complexity under the straight-line program computation model. With this criterion, we propose the piling algorithm to minimize the sequence length, and then show that this algorithm is an optimal APS algorithm in commutative semigroups in the worst case. The experimental results also show the algorithmic efficiency of the piling algorithm. Furthermore, the theoretical work on APS algorithms will help to construct higher dimensional thresholding methods.
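To make the problem statement concrete, the sketch below computes all partial sums of a two-dimensional array using only an associative, commutative operation and no subtraction, as a semigroup requires. It is a naive two-pass baseline and not the piling algorithm from the paper, whose details the abstract does not give; the function name and the `op` parameter are assumptions.

```python
def all_partial_sums_2d(grid, op):
    """Return out where out[y][x] combines grid[j][i] for all j <= y, i <= x.

    Only `op` is used (no inverse), so the routine works in any commutative
    semigroup, e.g. addition, max, or min.
    """
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    # Pass 1: running combination along each row.
    for y in range(h):
        for x in range(1, w):
            out[y][x] = op(out[y][x - 1], out[y][x])
    # Pass 2: running combination down each column.
    for y in range(1, h):
        for x in range(w):
            out[y][x] = op(out[y - 1][x], out[y][x])
    return out
```

With addition as `op`, the result is the usual summed-area table used by histogram-based thresholding; with `max`, it gives running maxima over the same rectangles.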

5.

A semantic approach to a framework for business domain software systems

Yevgen Biletskiy Girish R. Ranganathan 《Computers in Industry》2010,61(8):750-759

In enterprise firms, enormous amounts of electronic documents are generated by business analysts and other business domain application users. Applications that use these documents are often driven by business logic that is hard-coded together with application logic. One approach to the separation of business logic from applications is to create and maintain business and information extraction rules in an external, user-friendly format. The drawback of such an externalization is that the business rules usually do not have machine-interpretable semantics. This situation often leads to misinterpretation of domain analysis documents, which can inhibit the productivity of computer-assisted analytical work and the effectiveness of business solutions. This paper proposes an ontology and rule-based framework for the development of business domain applications, which includes semantic processing of externalized business rules and, to some extent, externalization of application logic. The creation of external information extraction rules by the business analyst is a cumbersome and time-consuming task. In order to overcome this problem, the framework also includes a rule learning system to semi-automate the generation of information extraction rules from source documents with the help of manual annotations. The main idea behind the work presented in this paper is to re-engineer very large enterprise information systems to adapt to Semantic Web computing techniques. The work presented in this paper is inspired by an industrial project.

6.

A method for computing lexical semantic distance using linear functionals

Del Jensen Christophe Giraud-Carrier Nathan Davis 《Journal of Web Semantics》2008,6(2):99-108

This paper presents a novel, knowledge-based method for measuring semantic similarity in support of applications aimed at organizing and retrieving relevant textual information. We show how a quantitative context may be established for what is essentially qualitative in nature by effecting a topological transformation of the lexicon into a metric space where distance is well-defined. We illustrate the technique with a simple example and report on promising experimental results with a significant word similarity problem.

7.

A paradigm for semantic picture recognition

Michael L. Baird Michael D. Kelly 《Pattern recognition》1974,6(1):61-74

8.

An algorithm for high precision attitude determination when using low precision sensors

《中国科学:信息科学(英文版)》2012,(3):626-637

The technology for high precision attitude determination when using low precision sensors is a key requirement of a modern small satellite. This paper presents a new attitude determination algorithm, termed preprocess EKF (PP-EKF), based on preprocessing of the sensor data. It can enhance the overall modeling accuracy by using the quadratic penalty function to correct the dynamic model error and angular velocity error in real time, based on the information fusion of the current and the past measurement information. The measurement model of the EKF is linearized by introducing the q method. The solution error of this process is also corrected to further improve the accuracy of the measurement model and make better use of measurement data from the low precision sensors, to obtain good attitude determination results. Finally, the simulation results demonstrate the high reliability and other advantages of the proposed algorithm.

9.

An algorithm for high precision attitude determination when using low precision sensors

CAO Lu CHEN XiaoQian SHENG Tao 《中国科学:信息科学(英文版)》2012,(3):626-637

10.

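A DSP-based high-precision frequency measurement method and software design

XUE Haidong GUO Yingqing DU Yuhuan ZHANG Xiaodong DING Yi 《传感器与微系统》2016,(1):117-120

A frequency measurement method based on a digital signal processor (DSP) is proposed for measuring the rotor blade frequency of a fiber-optic turbine flowmeter. Following a brief analysis of the working principle of the fiber-optic turbine flowmeter, the hardware circuit platform for the measurement system software is designed. The design and DSP implementation of a high-precision data acquisition method using the general-purpose timer of the event manager are described. A highly stable, real-time FIR filtering algorithm and its DSP implementation are analyzed. The DSP implementation of high-precision frequency measurement, using an FFT algorithm improved by interpolation, is discussed. The compare operation of the general-purpose timer is used to generate a pulse-width modulation (PWM) wave, providing TTL-level output.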

11.

Jung-Hsien Chiang Shing-Hua Ho Wen-Hung Wang 《Expert systems with applications》2008,35(3):1115-1121

This research analyzes gene relationships according to their annotations. We present a similar genes discovery system (SGDS), based upon a semantic similarity measure over the Gene Ontology (GO) and Entrez Gene, to identify groups of similar genes. In order to validate the proposed measure, we analyze the relationships between similarity and expression correlation of pairs of genes. We explore a number of semantic similarity measures and compute the Pearson correlation coefficient. Highly correlated genes exhibit strong similarity in the ontology taxonomies. The results show that our proposed semantic similarity measure outperforms the others and seems better suited for use in GO. We use the MAPK homogenous genes group and the MAP kinase pathway as benchmarks to tune the parameters of our system for higher accuracy. We applied the SGDS to the RON and Lutheran pathways; the results show that it is able to identify a group of similar genes and to predict novel pathways based on a group of candidate genes.
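The validation step mentioned in this abstract relies on the Pearson correlation coefficient, which for paired samples is the covariance divided by the product of the standard deviations. The sketch below shows that standard formula only; the function name `pearson` and the idea of pairing per-gene-pair similarity scores with expression correlations are assumptions about how such a check might be set up, not details from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would indicate that gene pairs scored as more similar also tend to be more strongly co-expressed.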

12.

Multivariable nonlinear control applications for a high purity distillation column using a recurrent dynamic neuron model

Andre M. Shaw Francis J. Doyle III 《Journal of Process Control》1997,7(4):255-268

A dynamic, nonlinear, multi-input multi-output application using the Recurrent Dynamic Neuron Network (RDNN) model is presented for a two-by-two distillation column case study. It is shown that the RDNN model, though compact (in terms of number of neurons and parameters to be estimated), performs well in both open- and closed-loop simulations. Open-loop simulations show that the RDNN is able to predict nonlinear output responses. The dual composition control problem is also investigated to demonstrate the model-based applications attainable with the RDNN. Due to the control affine nature of the RDNN structure, and the fact that it has finite vector relative degree, Input-Output Linearization techniques were used within the Internal Model Control framework for controller design. Nonlinear Model Predictive Control applications were also demonstrated using the RDNN. Simulations show that a combination of closed-loop and open-loop identification for the RDNN model results in a model-based controller which achieves robust closed-loop performance.