
1.

Learning-based word spotting system for Arabic handwritten documents

Muna Khayyat Louisa Lam Ching Y. Suen 《Pattern recognition》2014

The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be either template matching algorithms or learning-based. This paper presents a coherent learning-based Arabic handwritten word spotting system which can adapt to the nature of Arabic handwriting, which can have no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then re-constructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.

2.

A computer-based system to support forensic studies on handwritten documents

Katrin Franke Mario Köppen 《International Journal on Document Analysis and Recognition》2001,3(4):218-231

Computer-based forensic handwriting analysis requires sophisticated methods for the pre-processing of digitized paper documents, in order to provide high-quality digitized handwriting, which represents the original handwritten product as accurately as possible. Due to the requirement of processing a huge amount of different document types, neither a standardized queue of processing stages, fixed parameter sets nor fixed image operations are qualified for such pre-processing methods. Thus, we present an open layered framework that covers adaptation abilities at the parameter, operator, and algorithm levels. Moreover, an embedded module, which uses genetic programming, might generate specific filters for background removal on-the-fly. The framework is understood as an assistance system for forensic handwriting experts and has been in use by the Bundeskriminalamt, the federal police bureau in Germany, for two years. In the following, the layered framework will be presented, fundamental document-independent filters for textured, homogeneous background removal and for foreground removal will be described, as well as aspects of the implementation. Results of the framework-application will also be given. Received July 12, 2000 / Revised October 13, 2000

3.

Attribute CNNs for word spotting in handwritten documents

Sebastian Sudholt Gernot A. Fink 《International Journal on Document Analysis and Recognition》2018,21(3):199-218

Word spotting has become a field of strong research interest in document image analysis over the last years. Recently, AttributeSVMs were proposed which predict a binary attribute representation (Almazán et al. in IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566, 2014). At their time, this influential method defined the state of the art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with convolutional neural networks (CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functions for binary and real-valued word string embeddings. In addition, we propose two different CNN architectures, specifically designed for word spotting. These architectures are able to be trained in an end-to-end fashion. In a number of experiments, we investigate the influence of different word string embeddings and optimization strategies. We show our attribute CNNs to achieve state-of-the-art results for segmentation-based word spotting on a large variety of data sets.

4.

A general approach for multi-oriented text line extraction of handwritten documents

Nazih Ouwayed Abdel Belaïd 《International Journal on Document Analysis and Recognition》2012,15(4):297-314

Multi-orientation occurs frequently in ancient handwritten documents, where the writers try to update a document by adding some annotations in the margins. Due to the margin narrowness, this gives rise to lines in different directions and orientations. Document recognition needs to find the lines wherever they are written, whatever their orientation. This is why we propose in this paper a new approach allowing us to extract the multi-oriented lines in scanned documents. Because of the multi-orientation of lines and their dispersion in the page, we use an image meshing allowing us to progressively and locally determine the lines. Once the meshing is established, the orientation is determined using the Wigner–Ville distribution on the projection histogram profile. This local orientation is then enlarged to limit the orientation in the neighborhood. Afterward, the text lines are extracted locally in each zone based on the follow-up of the orientation lines and the proximity of connected components. Finally, the connected components that overlap and touch in adjacent lines are separated. The morphology analysis of the terminal letters of Arabic words is here considered. The proposed approach was tested on 100 documents, reaching an accuracy of about 98.6%.

5.

A deep HMM model for multiple keywords spotting in handwritten documents

Simon Thomas Clément Chatelain Laurent Heutte Thierry Paquet Yousri Kessentini 《Pattern Analysis & Applications》2015,18(4):1003-1015


6.

A graph-based approach for segmenting touching lines in historical handwritten documents

David Fernández-Mota Josep Lladós Alicia Fornés 《International Journal on Document Analysis and Recognition》2014,17(3):293-312

Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines and different document structures, making line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation solving the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state-of-the-art considering the different types and difficulties of the benchmarking data.
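
As a rough illustration of treating line separation as a path-finding problem, the sketch below runs Dijkstra's algorithm over a small cost grid in which ink pixels are expensive and background pixels are free. This is a simplified stand-in, not the skeleton-based graph the abstract describes; the function name `seam_between_lines`, the grid encoding, and the allowed moves (up, down, right) are all assumptions made for this example.

```python
import heapq

def seam_between_lines(cost):
    """Return the cheapest left-to-right path through a 2-D cost grid.

    cost[r][c] is the non-negative price of stepping on pixel (r, c);
    here ink is assumed to be expensive and background cheap.
    """
    rows, cols = len(cost), len(cost[0])
    # Every pixel in the first column is a possible starting point.
    best = {(r, 0): cost[r][0] for r in range(rows)}
    prev = {}
    heap = [(cost[r][0], r, 0) for r in range(rows)]
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > best[(r, c)]:
            continue  # stale queue entry
        if c == cols - 1:
            # Reached the right edge: walk the predecessor links back.
            path = [(r, c)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for dr, dc in ((-1, 0), (1, 0), (0, 1)):  # up, down, right
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, nr, nc))
    return []

# Hypothetical 4x4 patch: 9 marks ink, 0 marks background.
grid = [
    [9, 9, 9, 9],
    [0, 0, 9, 0],
    [9, 0, 0, 0],
    [9, 9, 9, 9],
]
path = seam_between_lines(grid)
```

On this toy grid the path bends around the ink pixel in the second row, which is the behaviour one would want where a descender from the upper line touches the gap.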

7.

Text line detection in handwritten documents

G. Louloudis B. Gatos I. Pratikakis 《Pattern recognition》2008,41(12):3758-3772

In this paper, we present a new text line detection method for handwritten documents. The proposed technique is based on a strategy that consists of three distinct steps. The first step includes image binarization and enhancement, connected component extraction, partitioning of the connected component domain into three spatial sub-domains and average character height estimation. In the second step, a block-based Hough transform is used for the detection of potential text lines while a third step is used to correct possible splitting, to detect text lines that the previous step did not reveal and, finally, to separate vertically connected characters and assign them to text lines. The performance evaluation of the proposed approach is based on a consistent and concrete evaluation methodology.
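
To make the Hough-transform step more concrete, here is a minimal sketch of plain Hough voting over a set of points, for instance connected-component centroids. It is the textbook transform, not the block-based variant used in the paper; the function name `dominant_line`, the one-degree angle step, and the sample points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def dominant_line(points, theta_step_deg=1, rho_step=1.0):
    """Vote in (theta, rho) space and return the strongest line.

    Each point (x, y) votes for every line x*cos(theta) + y*sin(theta) = rho
    passing through it. Returns (theta in degrees, rho, number of votes).
    """
    votes = Counter()
    for x, y in points:
        for t in range(0, 180, theta_step_deg):
            theta = math.radians(t)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    (t, rho_bin), count = votes.most_common(1)[0]
    return t, rho_bin * rho_step, count

# Hypothetical centroids: four on the horizontal line y = 5, plus one outlier.
centroids = [(0, 5), (10, 5), (20, 5), (30, 5), (7, 40)]
line = dominant_line(centroids)
```

With these points the accumulator peaks at theta = 90 degrees and rho = 5, that is, the horizontal line y = 5, with four votes; the outlier does not contribute to that cell.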

8.

Word segmentation of offline handwritten Uyghur text images


阿依萨代提·阿卜力孜 加合买提·司马义 卡米力·木依丁 艾斯卡尔·艾木都拉 《计算机工程与应用》(Computer Engineering and Applications) 2018,54(9):133-138

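To address word segmentation in offline handwritten Uyghur text line images, a clustering algorithm that fuses FCM with K-means is proposed. The algorithm sorts distances into two classes: intra-word distances and inter-word distances. Based on the clustering result, text regions are merged to obtain segmentation points; the text between segmentation points is then labelled by connected components and colored. The experiments use 50 offline handwritten Uyghur text images written by different people, containing 536 lines and 4,002 words in total, and the correct segmentation rate reaches 80.68%. The results show that the method resolves the segmentation difficulty caused by irregular inter-word distances in handwritten Uyghur, as well as some cases of overlapping words. It also handles large handwritten text images as a whole.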
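
The idea of sorting gap widths into intra-word and inter-word classes can be sketched with ordinary one-dimensional k-means (k = 2). This is only the K-means half in its simplest form, not the FCM/K-means fusion proposed in the paper; the function name `split_gaps` and the sample gap widths are hypothetical.

```python
def split_gaps(gaps, iterations=20):
    """Label each gap width as inter-word (True) or intra-word (False).

    Plain 1-D k-means with two centres, initialised at the smallest and
    largest gap. The final threshold is the midpoint of the two centres.
    """
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(iterations):
        small = [g for g in gaps if abs(g - lo) <= abs(g - hi)]
        large = [g for g in gaps if abs(g - lo) > abs(g - hi)]
        if not large:  # all gaps are identical, nothing to separate
            break
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):  # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return [g > threshold for g in gaps]

# Hypothetical horizontal gaps (in pixels) between neighbouring components.
labels = split_gaps([2, 3, 2, 14, 3, 15, 2])
```

Here the two wide gaps (14 and 15 pixels) are labelled as word boundaries and the narrow ones as gaps inside a word. Real handwriting rarely separates this cleanly, which is presumably why the paper combines two clustering methods.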

9.

A scale space approach for automatically segmenting words from historical handwritten documents (total citations: 1; self-citations: 0; citations by others: 1)

Manmatha R Rothfeder JL 《IEEE transactions on pattern analysis and machine intelligence》2005,27(8):1212-1225


10.

Text box proposals for handwritten word spotting from documents

Ghosh Suman Valveny Ernest 《International Journal on Document Analysis and Recognition》2018,21(1-2):91-108

International Journal on Document Analysis and Recognition (IJDAR) - In this article, we propose a new approach to segmentation-free word spotting that is based on the combination of three...

11.

Text line extraction from multi-skewed handwritten documents

S. Basu M. Kundu D.K. Basu 《Pattern recognition》2007,40(6):1825-1839

A novel text line extraction technique is presented for multi-skewed document images of handwritten English or Bengali text. It assumes that hypothetical water flows, from both left and right sides of the image frame, face obstruction from characters of text lines. The stripes of areas left unwetted on the image frame are finally labelled for extraction of text lines. The success rates of the technique, as observed experimentally, are 90.34% and 91.44% for handwritten Bengali and English document images, respectively. The work may contribute significantly to the development of applications related to optical character recognition of Bengali/English text.

12.

A synthesised word approach to word retrieval in handwritten documents

Y. Liang M.C. Fairhurst R.M. Guest 《Pattern recognition》2012,45(12):4225-4236


13.

Script-independent text line segmentation in freestyle handwritten documents

Li Y Zheng Y Doermann D Jaeger S Li Y 《IEEE transactions on pattern analysis and machine intelligence》2008,30(8):1313-1329


14.

Text line and word segmentation of handwritten documents

G. Louloudis B. Gatos I. Pratikakis C. Halatsis 《Pattern recognition》2009,42(12):3169-3183

In this paper, we present a segmentation methodology of handwritten documents in their distinct entities, namely, text lines and words. Text line segmentation is achieved by applying Hough transform on a subset of the document image connected components. A post-processing step includes the correction of possible false alarms, the detection of text lines that Hough transform failed to create and finally the efficient separation of vertically connected characters using a novel method based on skeletonization. Word segmentation is addressed as a two class problem. The distances between adjacent overlapped components in a text line are calculated using the combination of two distance metrics and each of them is categorized either as an inter- or an intra-word distance in a Gaussian mixture modeling framework. The performance of the proposed methodology is based on a consistent and concrete evaluation methodology that uses suitable performance measures in order to compare the text line segmentation and word segmentation results against the corresponding ground truth annotation. The efficiency of the proposed methodology is demonstrated by experimentation conducted on two different datasets: (a) on the test set of the ICDAR2007 handwriting segmentation competition and (b) on a set of historical handwritten documents.

15.

Further explorations in text alignment with handwritten documents

E. Micah Kornfield R. Manmatha James Allan 《International Journal on Document Analysis and Recognition》2007,10(1):39-52


16.

Word matching using single closed contours for indexing handwritten historical documents

Tomasz Adamek Noel E. O’Connor Alan F. Smeaton 《International Journal on Document Analysis and Recognition》2007,9(2-4):153-165


17.

Automatic writer identification framework for online handwritten documents using character prototypes

Guo Xian Tan Christian Viard-Gaudin Alex C. Kot 《Pattern recognition》2009,42(12):3313-3323

This paper proposes an automatic text-independent writer identification framework that integrates an industrial handwriting recognition system, which is used to perform an automatic segmentation of an online handwritten document at the character level. Subsequently, a fuzzy c-means approach is adopted to estimate statistical distributions of character prototypes on an alphabet basis. These distributions model the unique handwriting styles of the writers. The proposed system attained an accuracy of 99.2% when retrieved from a database of 120 writers. The only limitation is that a minimum length of text needs to be present in the document in order for sufficient accuracy to be achieved. We have found that this minimum length of text is about 160 characters or approximately equivalent to 3 lines of text. In addition, the discriminative power of different alphabets on the accuracy is also reported.
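
For readers unfamiliar with fuzzy c-means, the sketch below shows the standard membership update for a single sample: instead of assigning the sample to one prototype, each prototype receives a soft weight that depends on relative distance. How the paper turns such memberships into per-writer distributions is not stated in the abstract, so this covers only the generic formula; the function name `fcm_memberships` and the example distances are assumptions.

```python
def fcm_memberships(distances, m=2.0):
    """Standard fuzzy c-means memberships of one sample.

    distances[i] is the distance from the sample to prototype i, and
    m > 1 is the fuzzifier. The returned weights sum to 1.
    """
    zeros = sum(1 for d in distances if d == 0)
    if zeros:
        # The sample coincides with one or more prototypes: share the
        # full membership among them.
        return [1.0 / zeros if d == 0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((di / dk) ** exponent for dk in distances)
        for di in distances
    ]

# A hypothetical character sample at distance 1 from prototype A
# and distance 3 from prototype B.
weights = fcm_memberships([1.0, 3.0])
```

With the default fuzzifier m = 2 the sample belongs to the nearer prototype with weight 0.9 and to the farther one with weight 0.1. Larger values of m flatten these weights toward equality.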

18.

A model for the gray-intensity distribution of historical handwritten documents and its application for binarization

Marte A. Ramírez-Ortegón Lilia L. Ramírez-Ramírez Ines Ben Messaoud Volker Märgner Erik Cuevas Raúl Rojas 《International Journal on Document Analysis and Recognition》2014,17(2):139-160

In this article, our goal is to describe mathematically and experimentally the gray-intensity distributions of the fore- and background of handwritten historical documents. We propose a local pixel model to explain the observed asymmetrical gray-intensity histograms of the fore- and background. Our pixel model states that, locally, the gray-intensity histogram is the mixture of gray-intensity distributions of three pixel classes. Following our model, we empirically describe the smoothness of the background for different types of images. We show that our model has potential application in binarization. Assuming that the parameters of the gray-intensity distributions are correctly estimated, we show that thresholding methods based on mixtures of lognormal distributions outperform thresholding methods based on mixtures of normal distributions. Our model is supported with experimental tests that are conducted with extracted images from DIBCO 2009 and H-DIBCO 2010 benchmarks. We also report results for all four DIBCO benchmarks.

19.

HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents

Zaher Al Aghbari Salama Brook 《Expert systems with applications》2009,36(8):10942-10951


20.

Line extraction in handwritten documents via instance segmentation

Islam Adeela Anjum Tayaba Khan Nazar 《International Journal on Document Analysis and Recognition》2023,26(3):335-346

International Journal on Document Analysis and Recognition (IJDAR) - Extraction of text lines from handwritten document images is important for downstream text recognition tasks. It is challenging...