Similar literature
1.
Information Systems, 1999, 24(4): 303-326
The emergence of the pen as the main interface device for personal digital assistants and pen-computers has made handwritten text, and more generally ink, a first-class object. As for any other type of data, retrieval is a prevailing need. Retrieval of handwritten text is more difficult than that of conventional data because a handwritten word must be identified despite slight variations in its shape. The current way of addressing this is handwriting recognition, which is prone to errors and limits the expressiveness of ink. Alternatively, one can retrieve from the database handwritten words that are similar to a handwritten query word, using techniques borrowed from pattern and speech recognition. In this paper, an indexing technique based on Hidden Markov Models is proposed, and its implementation and performance are reported.

2.
The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be either template matching algorithms or learning based. This paper presents a coherent learning-based Arabic handwritten word spotting system that adapts to the nature of Arabic handwriting, in which there may be no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then reconstructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.

3.
Offline handwritten Amharic word recognition
This paper describes two approaches for Amharic word recognition in unconstrained handwritten text using HMMs. The first approach builds word models from concatenated features of constituent characters; in the second, HMMs of constituent characters are concatenated to form the word model. In both cases, the features used for training and recognition are a set of primitive strokes and their spatial relationships. The recognition system does not require segmentation of characters but does require text line detection and extraction of structural features, which is done using the direction field tensor. The performance of the recognition system is tested on a dataset of unconstrained handwritten documents collected from various sources, and promising results are obtained.

4.
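Objective: Uyghur is an agglutinative language in which different meanings are expressed by attaching affixes to a stem. When an affix is attached, the tail of the stem changes shape to some degree, and sound changes such as weakening, elision, and epenthesis may also occur, causing further changes in form. As a result, current word spotting techniques for document images can retrieve only one specific Uyghur word form; they cannot take a stem as the query and retrieve the corresponding suffixed words. To address this, a mapping-based retrieval technique for suffixed words in printed Uyghur is proposed. Method: First, local features are extracted from Uyghur word images. Next, the features are matched bidirectionally with the Fast Library for Approximate Nearest Neighbors (FLANN) to obtain a feature match set. Finally, the match set is mapped onto the target Uyghur word image by homography and perspective transformation, which turns the match set into spatial relationships; suffixed words are then retrieved by mapping-based matching on these spatial relationships, meeting the need for suffixed-word retrieval in printed Uyghur images. Results: The experimental data comprise 17,648 segmented word images taken from 190 printed Uyghur text images, and suffix retrieval was run on 167 suffixed-word images belonging to 30 of these word images. Different local feature algorithms were compared. The results show that the scale-invariant feature transform (SIFT) outperforms speeded up robust features (SURF) for suffix retrieval, with precision and recall reaching 94.23% and 88.02%, respectively. In printed document images, suffixed words built on a stem can be retrieved efficiently, which meets users' varied retrieval needs and is broadly applicable. Comparative experiments covering weakening, elision, epenthesis, several sound changes occurring together, and changes to the stem tail show that retrieval works best for the morphological changes caused by weakening and by stem-tail changes. Conclusion: The proposed mapping-based method for suffixed-word image retrieval is the first implementation of suffixed-word retrieval for Uyghur; by exploiting the spatial relationships among match sets, it retrieves suffixed Uyghur word images efficiently.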
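The bidirectional matching step described above can be illustrated with a minimal sketch. It assumes descriptors are plain lists of numbers and uses an exhaustive nearest-neighbour search in place of FLANN's approximate search; the function names are hypothetical and this is not the authors' implementation. A pair is kept only when each descriptor is the other's nearest neighbour.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(query, candidates):
    """Index of the candidate closest to the query (first one wins on ties)."""
    return min(range(len(candidates)),
               key=lambda k: squared_distance(query, candidates[k]))


def mutual_matches(desc_a, desc_b):
    """Keep (i, j) only if desc_b[j] is nearest to desc_a[i] and vice versa."""
    forward = [nearest_index(d, desc_b) for d in desc_a]   # A -> B
    backward = [nearest_index(d, desc_a) for d in desc_b]  # B -> A
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]
```

The cross-check discards one-sided matches, which is why a descriptor in the second image that no descriptor in the first image selects never appears in the result.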

5.
6.
7.
Because of the large variations involved in handwritten words, the recognition problem is very difficult. Hidden Markov models (HMMs) have been widely and successfully used in speech processing and recognition. Recently, HMMs have also been used with some success in recognizing handwritten words with presegmented letters. In this paper, a complete scheme for totally unconstrained handwritten word recognition based on a single contextual hidden Markov model type stochastic network is presented. Our scheme includes a morphology- and heuristics-based segmentation algorithm, a training algorithm that can adapt itself to a changing dictionary, and a modified Viterbi algorithm that searches for the (l+1)th globally best path based on the previous l best paths. Detailed experiments are carried out and successful recognition results are reported.
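For orientation, here is a minimal sketch of the standard Viterbi algorithm, which finds only the single best state path. It is not the modified (l+1)th-best search the abstract describes; the dictionary-based model layout and all names are assumptions made for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most likely state sequence."""
    def log(p):
        # Work in the log domain; a zero probability becomes -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path
```

An n-best variant would keep several candidate scores per state instead of one; that extension is left out here to keep the sketch short.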

8.
For a segmentation and dynamic programming-based handwritten word recognition system, outlier rejection at the character level can improve word recognition performance because it reduces the chances that erroneous combinations of segments result in high word confidence values. We studied the multilayer perceptron (MLP) and a variant of the radial basis function network (RBF) with the goal of using them as character-level classifiers with enhanced outlier rejection ability. The RBF variant uses principal component analysis (PCA) on the clusters defined by the nodes in the hidden layer. It was also trained with and without a regularization term aimed at minimizing the variances of the nodes in the hidden layer. Our experiments on handwritten word recognition showed: (1) in the case of MLPs, using more hidden nodes than required for classification and including outliers in the training data can improve outlier rejection performance; (2) in the case of PCA-RBFs, training with the regularization term and no outliers can achieve performance very close to training with outliers. Both results are interesting. Result (1) is of interest because it is well known that minimizing the number of parameters, and therefore keeping the number of hidden units low, should increase the generalization capability. On the other hand, using more hidden units increases the chances of creating closed decision regions, as predicted by the theory in Gori and Scarselli (IEEE Trans. PAMI 20 (11) (1998) 1121). Result (2) is a strong statement in support of the use of regularization terms for the training of RBF-type neural networks in problems such as handwriting recognition, for which outlier rejection is important. Additional tests on combining MLPs and PCA-RBF networks showed the potential to improve word recognition performance by exploiting the complementarity of these two kinds of neural networks.

9.
In this paper, a method for analytic handwritten word recognition based on causal Markov random fields is described. The word models are HMMs in which each state corresponds to a letter modeled by an NSHP-HMM (Markov field). The word models are built dynamically. Training uses the Baum-Welch algorithm, with the parameters reestimated on the generated word models. Segmentation is unnecessary: during training, the system itself determines the best distribution of the information within the letter models. Initial experiments on two real databases of French check amount words give encouraging results, with recognition rates of up to 86% without rejection. Received: March 31, 2000 / Accepted: January 9, 2002

10.
11.
A lexicon-based handwritten word recognition system combining segmentation-free and segmentation-based techniques is described. The segmentation-free technique constructs a continuous density hidden Markov model for each lexicon string. The segmentation-based technique uses dynamic programming to match word images and strings. The combination module uses differences in classifier capabilities to achieve significantly better performance.

12.
Maximal word functions occur in data retrieval applications and have connections with ranking problems, which in turn were first investigated in relation to data compression [21]. By the maximal word function of a language L ⊆ Σ*, we mean the problem of finding, on input x, the lexicographically largest word belonging to L that is smaller than or equal to x. In this paper we present a parallel algorithm for computing maximal word functions for languages recognized by one-way nondeterministic auxiliary pushdown automata (and hence for the class of context-free languages). This paper is a continuation of a stream of research focusing on the problem of identifying properties other than membership which are easily computable for certain classes of languages. For a survey, see [24].
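The definition of the maximal word function can be made concrete with a small sketch. It assumes a finite language given as an explicit collection of strings and uses Python's built-in string ordering as the lexicographic order; it illustrates the definition only and is unrelated to the parallel algorithm for pushdown automata that the paper presents. The function name is hypothetical.

```python
from bisect import bisect_right


def maximal_word(language, x):
    """Largest word w in the (finite) language with w <= x, or None if none exists."""
    words = sorted(set(language))
    # bisect_right returns the number of words that are <= x.
    k = bisect_right(words, x)
    return words[k - 1] if k > 0 else None
```

For the language {"ab", "abc", "b", "ba"} and input "abd", the sketch returns "abc", the largest member that does not exceed the input.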

13.
With the ever-increasing growth of the World Wide Web, there is an urgent need for an efficient information retrieval system that can search and retrieve handwritten documents when presented with user queries. However, unconstrained handwriting recognition remains a challenging task with inadequate performance, which is a major hurdle to providing a robust search experience over handwritten documents. In this paper, we describe our recent research, focusing on information retrieval from noisy text derived from imperfect handwriting recognizers. First, we describe a novel term frequency estimation technique that incorporates word segmentation information into the retrieval framework to improve overall system performance. Second, we outline a taxonomy of techniques for addressing the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and uses the cleaned text for retrieval. The second method uses the uncorrected, raw OCR'ed text but modifies the standard vector space model to handle noisy text. The third method employs robust image features to index the documents instead of using noisy OCR'ed text. We describe these techniques in detail and discuss their performance using standard IR evaluation metrics.
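As background for the second method, the following is a minimal sketch of the standard vector space model with tf-idf weights and cosine similarity, assuming documents are already tokenized into lists of terms. It shows the unmodified baseline only; the paper's noise-aware modification and its term frequency estimator are not reproduced here, and all names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (list of sparse tf-idf vectors, idf table) for tokenized documents."""
    n = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n / df) for term, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()}
               for doc in documents]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, documents):
    """Document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(documents)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scores = [cosine(q, vec) for vec in vectors]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

With recognizer output, the raw counts fed into such a model are themselves unreliable, which is the motivation the abstract gives for estimating term frequencies differently.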

14.
In off-line handwriting recognition, classifiers based on hidden Markov models (HMMs) have become very popular. However, while there exist well-established training algorithms that optimize the transition and output probabilities of a given HMM architecture, the architecture itself, and in particular the number of states, must be chosen "by hand". The number of training iterations and the output distributions also need to be defined by the system designer. In this paper we examine several optimization strategies for an HMM classifier that works with continuous feature values. The proposed optimization strategies are evaluated in the context of a handwritten word recognition task.

15.
In this paper, we study the effect of taking the user into account in a query-by-example handwritten word spotting framework. Several off-the-shelf query fusion and relevance feedback strategies have been tested in the handwritten word spotting context. The gain in precision when the user is included in the loop is assessed using two datasets of historical handwritten documents and two baseline word spotting approaches, both based on the bag-of-visual-words model. Finally, we present two alternative ways of presenting the results to the user that might be more attractive and better suited to the user's needs than the classic ranked list.

16.
Handwritten digit recognition has long been a challenging problem in the field of optical character recognition and is of great importance in industry. This paper develops a new approach to handwritten digit recognition that uses a small number of patterns in the training phase. To improve the performance of isolated Farsi/Arabic handwritten digit recognition, we use the Bag of Visual Words (BoVW) technique to construct image feature vectors. Each visual word is described by the Scale Invariant Feature Transform (SIFT) method. A Quantum Neural Network (QNN) classifier is used to learn the feature vectors. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (the HODA dataset) show that the proposed method achieves the highest recognition rate compared with other state-of-the-art methods.

17.
Search and retrieval is gaining importance in the ink domain due to the increasing availability of online handwritten data. However, the problem is challenging because handwriting varies across writers, digitizers and writing conditions. In this paper, we propose a retrieval mechanism for online handwriting that can handle different writing styles, specifically for Indian languages. The proposed approach provides a keyboard-based search interface that enables searching handwritten data from any platform, in addition to pen-based and example-based queries. One of the major advantages of this framework is that information retrieval techniques such as ranking relevance, detecting stopwords and controlling word forms can be extended to work with search and retrieval in the ink domain. The framework also allows cross-lingual document retrieval across Indian languages.

18.
Existing margin-based discriminant analysis methods, such as nonparametric discriminant analysis, use the K-nearest neighbor (K-NN) technique to characterize the margin. Manifold learning-based methods use the K-NN technique to characterize the local structure. These methods share a common problem: the nearest neighbor parameter K must be chosen in advance, and choosing an optimal K is a theoretically difficult problem. In this paper, we present a new margin characterization method named sparse margin-based discriminant analysis (SMDA), which uses sparse representation. SMDA avoids the difficulty of parameter selection. Sparse representation can be considered a generalization of the K-NN technique: for a test sample, it adaptively selects the training samples that give the most compact representation. We characterize the margin by sparse representation. The proposed method is evaluated on the AR database, the Extended Yale B database, and the CENPARMI handwritten numeral database. Experimental results show the effectiveness of the proposed method; its performance is better than that of some other state-of-the-art feature extraction methods.

19.
20.