1.

A probabilistic method for keyword retrieval in handwritten document images

Huaigu Cao, Anurag Bhardwaj, Venu Govindaraju 《Pattern recognition》2009,42(12):3374-3382

2.

Handwritten text separation from annotated machine printed documents using Markov Random Fields

Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram 《International Journal on Document Analysis and Recognition》2013,16(1):1-16

The convenience of search, both on the personal computer hard disk and on the web, is still limited mainly to machine-printed text documents and images because of the poor accuracy of handwriting recognizers. This paper focuses on segmenting handwritten text and machine-printed text in annotated documents, a task sometimes referred to as “ink separation”, to advance the state of the art in searching hand-annotated documents. We propose a method with two main steps: patch-level separation and pixel-level separation. In the patch-level step, the entire document is modeled as a Markov Random Field (MRF). Three classes (machine-printed text, handwritten text, and overlapped text) are initially identified using G-means-based classification followed by an MRF-based relabeling procedure. In the second step, an MRF-based classification approach uses pixel-level features to separate overlapped text into machine-printed text and handwritten text. Experimental results on a set of machine-printed documents annotated by multiple writers in an office/collaborative environment show that our method is robust and provides good text separation performance.
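The MRF-based relabeling idea in the abstract above can be illustrated with a small sketch. The abstract does not say which inference procedure the authors use, so this example swaps in iterated conditional modes (ICM) with a Potts-style smoothness term. The label set, the `unary` cost layout, and the `beta` weight are all assumptions made for illustration.

```python
# Sketch only: ICM relabeling on a grid of patches with a Potts-style
# smoothness term. Labels, costs, and beta are illustrative assumptions,
# not the procedure from the paper.

LABELS = ("printed", "handwritten", "overlapped")


def icm_relabel(unary, beta=1.0, max_iters=10):
    """Return a grid of label indices.

    unary[r][c][k] is the cost of giving patch (r, c) the label LABELS[k],
    for example a negative log-probability from an initial classifier.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(LABELS)
    # Start from the label with the lowest per-patch cost.
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k])
               for c in range(cols)] for r in range(rows)]
    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Labels of the 4-connected neighbours inside the grid.
                neighbours = [labels[nr][nc]
                              for nr, nc in ((r - 1, c), (r + 1, c),
                                             (r, c - 1), (r, c + 1))
                              if 0 <= nr < rows and 0 <= nc < cols]

                def energy(k):
                    # Data cost plus a penalty per disagreeing neighbour.
                    disagree = sum(1 for n in neighbours if n != k)
                    return unary[r][c][k] + beta * disagree

                best = min(range(n_labels), key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels
```

With `beta=0` the result is just the per-patch classification; larger values of `beta` let a patch's neighbours override a weak local decision, which is the effect a relabeling pass is meant to have.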

3.

Machine printed text and handwriting identification in noisy document images (total citations: 1; self-citations: 0; citations by others: 1)

Zheng Y, Li H, Doermann D 《IEEE transactions on pattern analysis and machine intelligence》2004,26(3):337-353

In this paper, we address the problem of identifying text in noisy document images. We focus on segmenting and distinguishing handwriting from machine-printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

4.

Impact of online handwriting recognition performance on text categorization

Sebastián Peña Saldarriaga, Christian Viard-Gaudin, Emmanuel Morin 《International Journal on Document Analysis and Recognition》2010,13(2):159-171

Today, there is an increasing demand for efficient archival and retrieval methods for online handwritten data. For such tasks, text categorization is of particular interest. The textual data available in online documents can be extracted through online handwriting recognition; however, this process produces errors in the resulting text. This work reports experiments on the categorization of online handwritten documents based on their textual contents. We analyze the effect of word recognition errors on categorization performance by comparing the performance of a categorization system on texts obtained through online handwriting recognition with its performance on the same texts available as ground truth. Two well-known categorization algorithms (kNN and SVM) are compared in this work. A subset of the Reuters-21578 corpus consisting of more than 2,000 handwritten documents has been collected for this study. Results show that the loss in classification rate is not significant, and that precision loss is only significant for recall values of 60–80%, depending on the noise levels.

5.

Keyword spotting in doctor's handwriting on medical prescriptions

《Expert systems with applications》2017

6.

Retrieval of online handwriting by synthesis and matching

C.V. Jawahar, A. Balasubramanian, Anoop M. Namboodiri 《Pattern recognition》2009,42(7):1445-1457

Search and retrieval is gaining importance in the ink domain due to the increasing availability of online handwritten data. However, the problem is challenging due to variations in handwriting across writers, digitizers, and writing conditions. In this paper, we propose a retrieval mechanism for online handwriting that can handle different writing styles, specifically for Indian languages. The proposed approach provides a keyboard-based search interface that makes it possible to search handwritten data from any platform, in addition to pen-based and example-based queries. One of the major advantages of this framework is that information retrieval techniques such as relevance ranking, stopword detection, and control of word forms can be extended to work with search and retrieval in the ink domain. The framework also allows cross-lingual document retrieval across Indian languages.

7.

Text line and word segmentation of handwritten documents

G. Louloudis, B. Gatos, I. Pratikakis, C. Halatsis 《Pattern recognition》2009,42(12):3169-3183

In this paper, we present a methodology for segmenting handwritten documents into their distinct entities, namely text lines and words. Text line segmentation is achieved by applying the Hough transform to a subset of the connected components of the document image. A post-processing step corrects possible false alarms, detects text lines that the Hough transform failed to create, and finally separates vertically connected characters efficiently using a novel method based on skeletonization. Word segmentation is addressed as a two-class problem. The distances between adjacent overlapped components in a text line are calculated using a combination of two distance metrics, and each distance is categorized as either an inter-word or an intra-word distance in a Gaussian mixture modeling framework. The performance of the proposed methodology is assessed with a consistent and concrete evaluation methodology that uses suitable performance measures to compare the text line and word segmentation results against the corresponding ground truth annotation. The efficiency of the proposed methodology is demonstrated by experiments on two different datasets: (a) the test set of the ICDAR2007 handwriting segmentation competition and (b) a set of historical handwritten documents.
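The two-class view of word segmentation described above can be sketched with a one-dimensional, two-component Gaussian mixture fitted by EM. This is a simplified illustration under stated assumptions: it uses a single gap width per pair of components instead of the combination of two distance metrics mentioned in the abstract, and the function names and initialisation are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(gaps, iters=50):
    """Fit a two-component 1-D Gaussian mixture to gap widths with EM."""
    lo, hi = min(gaps), max(gaps)
    means = [float(lo), float(hi)]          # initialise at the extremes
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each gap.
        resp = []
        for g in gaps:
            p = [weights[k] * gaussian_pdf(g, means[k], variances[k])
                 for k in range(2)]
            total = sum(p)
            resp.append([x / total for x in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * g for r, g in zip(resp, gaps)) / nk
            var = sum(r[k] * (g - means[k]) ** 2 for r, g in zip(resp, gaps)) / nk
            variances[k] = max(var, 1e-6)   # floor keeps the density finite
            weights[k] = nk / len(gaps)
    return weights, means, variances


def classify_gaps(gaps):
    """Label each gap as 'inter-word' or 'intra-word'."""
    weights, means, variances = fit_two_component_gmm(gaps)
    wide = 0 if means[0] > means[1] else 1  # component with the larger mean
    narrow = 1 - wide
    labels = []
    for g in gaps:
        p_wide = weights[wide] * gaussian_pdf(g, means[wide], variances[wide])
        p_narrow = weights[narrow] * gaussian_pdf(g, means[narrow], variances[narrow])
        labels.append("inter-word" if p_wide > p_narrow else "intra-word")
    return labels
```

Calling `classify_gaps` on the gap widths of one text line returns one label per gap; consecutive components separated only by intra-word gaps would then be merged into a word.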

8.

Learning-based word spotting system for Arabic handwritten documents

Muna Khayyat, Louisa Lam, Ching Y. Suen 《Pattern recognition》2014

The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase in digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be based on either template matching or learning. This paper presents a coherent learning-based Arabic handwritten word spotting system that can adapt to the nature of Arabic handwriting, which may have no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then reconstructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.

9.

Automatic recognition of handwritten medical forms for search engines

Robert Jay Milewski, Venu Govindaraju, Anurag Bhardwaj 《International Journal on Document Analysis and Recognition》2009,11(4):203-218

A new paradigm, which models the relationships between handwriting and topic categories in the context of medical forms, is presented. The ultimate goals are: (1) a robust method that categorizes medical forms into specified categories, and (2) the use of such information for practical applications such as improved recognition of medical handwriting or retrieval of medical forms, as in a search engine. Medical forms have diverse, complex, and large lexicons drawn from English, medical, and pharmacology corpora. Our technique shows that a few recognized characters, returned by handwriting recognition, can be used to construct a linguistic model capable of representing a medical topic category. This allows (1) a reduced lexicon to be constructed, thereby improving handwriting recognition performance, and (2) PCR (Pre-Hospital Care Report) forms to be tagged with a topic category and subsequently searched by information retrieval systems. We present an improvement of over 7% in raw recognition rate and a mean average precision of 0.28 over a set of 1,175 queries on a data set of unconstrained handwritten medical forms filled out in emergency environments. This work was supported by the National Science Foundation.
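For readers unfamiliar with the retrieval metric reported above, the following is a minimal sketch of how mean average precision is commonly computed over a set of queries. The function names and the input layout are assumptions for illustration; the abstract does not describe the authors' evaluation code, and the sketch assumes each ranked list contains no duplicate identifiers.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    Precision is sampled at each rank where a relevant item appears,
    then averaged over the total number of relevant items.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

Relevant items that never appear in the ranking still count in the denominator, so missing them lowers the score.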

10.

TOPIC DETECTION OF UNRESTRICTED TEXTS: APPROACHES AND EVALUATIONS

Yllias Chali 《Applied Artificial Intelligence》2013,27(2):119-135

ABSTRACT

Topic detection and tracking refers to automatic techniques for locating topically related cohesive paragraphs in a stream of text. Most documents are about more than one subject, but many Natural Language Processing (NLP) and Information Retrieval (IR) techniques implicitly assume documents have just one topic. Even when a document has a single topic, it may address multiple subtopics and various aspects of the primary topic. Hence, dividing documents into topically coherent units and discovering their topics might have many uses. We describe new clues that account for the topical grouping of contiguous portions of the text. Those clues are based on general lexical resources, which makes them applicable to unrestricted texts, and they can have many uses, such as helping users find answers to general questions in an information search task, in question-answering systems, or in text summarization. We devise an algorithm for identifying these clues, and we report on the performance of these clues, as well as the improvements suggested by our experiments.

11.

Further explorations in text alignment with handwritten documents

E. Micah Kornfield, R. Manmatha, James Allan 《International Journal on Document Analysis and Recognition》2007,10(1):39-52

12.

Handwritten Chinese text editing and recognition system

Shusen Zhou, Qingcai Chen, Xiaolong Wang 《Multimedia Tools and Applications》2014,71(3):1363-1380

This paper describes a handwritten Chinese text editing and recognition system that can edit handwritten text and recognize it in a client-server mode. First, the client samples and redisplays the handwritten text using digital ink techniques, segments handwritten characters, edits them, and saves the original handwriting information in a self-defined document. The self-defined document stores the coordinates of all sampled points of the handwritten characters. Second, the server recognizes the handwritten document based on the proposed Gabor feature extraction and affinity propagation clustering (GFAP) method and returns the recognition results to the client. Moreover, the server can also collect the labeled handwritten characters and fine-tune the recognizer automatically. Experimental results on the HIT-OR3C database show that our handwriting recognition method improves the recognition performance remarkably.

13.

SVM-based writer retrieval system in handwritten document images

Bouibed Mohamed Lamine, Nemmour Hassiba, Chibani Youcef 《Multimedia Tools and Applications》2022,81(16):22629-22651

14.

HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents

Zaher Al Aghbari, Salama Brook 《Expert systems with applications》2009,36(8):10942-10951

15.

Supporting electronic ink databases

《Information Systems》1999,24(4):303-326

The emergence of the pen as the main interface device for personal digital assistants and pen computers has made handwritten text, and more generally ink, a first-class object. As with any other type of data, the need for retrieval is a prevailing one. Retrieval of handwritten text is more difficult than that of conventional data, since a handwritten word must be identified despite slight variations in its shape. The current way of addressing this is handwriting recognition, which is prone to errors and limits the expressiveness of ink. Alternatively, one can retrieve from the database handwritten words that are similar to a handwritten query word, using techniques borrowed from pattern and speech recognition. In this paper, an indexing technique based on Hidden Markov Models is proposed, and its implementation and performance are reported.

16.

A novel word spotting method based on recurrent neural networks

Frinken V, Fischer A, Manmatha R, Bunke H 《IEEE transactions on pattern analysis and machine intelligence》2012,34(2):211-224

17.

Hermite and Gabor transforms for noise reduction and handwriting classification in ancient manuscripts

Véronique Eglin, Stéphane Bres, Carlos Rivero 《International Journal on Document Analysis and Recognition》2007,9(2-4):101-122

18.

Using topic models for OCR correction

Faisal Farooq, Anurag Bhardwaj, Venu Govindaraju 《International Journal on Document Analysis and Recognition》2009,12(3):153-164

Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers perform adequately on constrained handwritten documents, which typically use a restricted vocabulary (lexicon). For unconstrained handwritten documents, however, state-of-the-art word recognition accuracy is still below acceptable limits. The objective of this research is to improve word recognition accuracy on unconstrained handwritten documents by applying a post-processing or OCR correction technique to the word recognition output. In this paper, we present two different methods for this purpose. First, we describe a lexicon-reduction method that uses topic categorization of handwritten documents to generate smaller topic-specific lexicons, which improve recognition accuracy. Second, we describe a method that uses topic-specific language models and a maximum-entropy-based topic categorization model to refine the recognition output. We present the relative merits of each of these methods and report results on the publicly available IAM database.
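The lexicon-reduction idea above can be illustrated with a toy post-correction step: a recognized word is snapped to its nearest lexicon entry, and restricting the lexicon to one topic changes which entry wins. This is a sketch under assumptions, not the authors' method: the topic is supplied by the caller instead of being inferred by a categorizer, edit distance stands in for a recognizer's matching score, and the function names and the `topic_lexicons` layout are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def correct_word(recognized, lexicon):
    """Return the lexicon entry closest to the recognized string.

    Sorting first makes tie-breaking deterministic (alphabetical).
    """
    return min(sorted(lexicon), key=lambda w: edit_distance(recognized, w))


def correct_with_topic(recognized, topic, topic_lexicons):
    """Correct against the reduced lexicon of a given topic only."""
    return correct_word(recognized, topic_lexicons[topic])
```

A smaller lexicon leaves fewer near neighbours to confuse with the true word, which is the intuition behind using topic-specific lexicons to raise recognition accuracy.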

19.

Voting techniques for expert search (total citations: 4; self-citations: 2; citations by others: 2)

Craig Macdonald, Iadh Ounis 《Knowledge and Information Systems》2008,16(3):259-280

In an expert search task, the users’ need is to identify people who have expertise relevant to a topic of interest. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the users’ query. In this paper, we propose a novel approach for predicting and ranking candidate expertise with respect to a query, called the Voting Model for Expert Search. In the Voting Model, we see the problem of ranking experts as a voting problem. We model the voting problem using 12 different voting techniques inspired by the data fusion field. We investigate the effectiveness of the Voting Model and the associated voting techniques across a range of document weighting models, in the context of the TREC 2005 and TREC 2006 Enterprise tracks. The evaluation results show that the voting paradigm is very effective, without using any query- or collection-specific heuristics. Moreover, we show that improving the quality of the underlying document representation can significantly improve the retrieval performance of the voting techniques on an expert search task. In particular, we demonstrate that applying field-based weighting models improves the ranking of candidates. Finally, we demonstrate that the relative performance of the voting techniques for the proposed approach is stable on a given task regardless of the weighting models used, suggesting that some of the proposed voting techniques will always perform better than others. Extended version of ‘Voting for candidates: adapting data fusion techniques for an expert search task’. C. Macdonald and I. Ounis. In Proceedings of ACM CIKM 2006, Arlington, VA. 2006. doi: 10.1145/1183614.1183671.
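The voting view of expert search described above can be sketched as follows: each retrieved document casts a vote for every candidate associated with it, and a data-fusion rule aggregates the votes. The abstract does not list its 12 techniques, so the three rules shown here (a plain vote count, CombSUM, and CombMNZ) are standard data-fusion rules chosen for illustration, and the function name and input dictionaries are assumptions.

```python
from collections import defaultdict


def rank_experts(doc_scores, doc_candidates, technique="CombSUM"):
    """Rank candidates from a scored document ranking.

    doc_scores: {doc_id: retrieval score for the query}
    doc_candidates: {doc_id: iterable of candidate names tied to the doc}
    """
    votes = defaultdict(int)        # number of retrieved docs per candidate
    score_sum = defaultdict(float)  # summed doc scores per candidate
    for doc_id, score in doc_scores.items():
        for candidate in doc_candidates.get(doc_id, ()):
            votes[candidate] += 1
            score_sum[candidate] += score

    if technique == "Votes":
        agg = {c: float(votes[c]) for c in votes}
    elif technique == "CombSUM":
        agg = dict(score_sum)
    elif technique == "CombMNZ":
        # Sum of scores multiplied by the number of voting documents.
        agg = {c: votes[c] * score_sum[c] for c in votes}
    else:
        raise ValueError(f"unknown technique: {technique}")

    # Highest aggregate first; ties broken alphabetically for determinism.
    return sorted(agg, key=lambda c: (-agg[c], c))
```

The rules can disagree: a candidate tied to one highly scored document wins under CombSUM, while a candidate tied to several weaker documents can win under a vote count or CombMNZ.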

20.

CLOVIS: towards precision-oriented text-based video retrieval through the unification of automatically-extracted concepts and relations of the visual and audio/speech contents

M. Belkhatir 《Journal of Intelligent Information Systems》2010,34(2):135-175
