1.
This paper presents a new knowledge-based system for extracting and identifying text-lines from various real-life mixed text/graphics compound document images. The proposed system first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability, since new rules can simply be added to cope with further types of real-life complex document images. Experimental and comparative results demonstrate the effectiveness of the proposed knowledge-based system and its advantages in extracting text-lines with a large variety of illumination levels, sizes, and font styles from various types of mixed and overlapping text/graphics compound document images.
2.
Goh Wee Leng D. P. Mital Tay Sze Yong Tan Kok Kang 《Engineering Applications of Artificial Intelligence》1994,7(6):639-651
To efficiently store the information found in paper documents, text and non-text regions need to be separated. Non-text regions include half-tone photographs and line diagrams. The text regions can be converted (via an optical character reader) to a computer-searchable form, and the non-text regions can be extracted and preserved in compressed form using image-compression algorithms. In this paper, an effective system for automatically segmenting a document image into regions of text and non-text is proposed. The system first performs adaptive thresholding to obtain a binarized image. The binarized image is then smeared using a run-length differential algorithm, and the smeared image is passed through a text characteristic filter to remove erroneous smearing of non-text regions. Next, baseline cumulative blocking is used to rectangularize the smeared regions. Finally, a text block growing algorithm is used to block out a text sentence; recognition of text is carried out on a text-sentence basis.
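The abstract does not detail the run-length differential algorithm; smearing steps of this kind are usually variants of the classic run-length smoothing algorithm (RLSA). A minimal sketch follows, assuming a binary numpy image and an illustrative `max_gap` threshold rather than the authors' parameters:

```python
import numpy as np

def rlsa_horizontal(binary, max_gap):
    """Horizontal run-length smoothing: background runs shorter than
    max_gap between two foreground pixels are filled in, so the words
    of a text line smear together into one connected block."""
    out = binary.copy()
    for row in out:
        fg = np.flatnonzero(row)          # indices of foreground pixels
        gaps = np.diff(fg)                # distances between consecutive ones
        for i, g in enumerate(gaps):
            if 1 < g <= max_gap:          # short background run: fill it
                row[fg[i]:fg[i + 1]] = 1
    return out

# Toy example: two "words" on one scanline merge into a single run.
line = np.zeros((1, 20), dtype=np.uint8)
line[0, 2:6] = 1
line[0, 9:13] = 1
print(rlsa_horizontal(line, max_gap=5))
```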
3.
To meet the real-time requirements of office automation, this paper proposes an improved top-down text/graphics segmentation algorithm. The method uses the distance between text-line baselines to adaptively determine the size of the structuring element, overcoming the drawback that top-down algorithms require prior knowledge of the page. Experiments show that the proposed algorithm segments accurately and runs fast; a sketch of the core idea follows.
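The abstract gives no implementation detail, so the sketch below is only one plausible reading: the baseline-to-baseline distance is estimated from the horizontal projection, and a vertical structuring element of that height drives a morphological closing. The closing operation and the fallback spacing are assumptions.

```python
import numpy as np
from scipy.ndimage import binary_closing, label

def adaptive_close(binary):
    """Close the page with a vertical structuring element whose height is
    derived from the estimated baseline-to-baseline distance, so no prior
    knowledge about the page layout is required."""
    ink_rows = (binary.sum(axis=1) > 0).astype(int)   # rows containing ink
    runs, n = label(ink_rows)                         # one run per text line
    centers = [np.flatnonzero(runs == i + 1).mean() for i in range(n)]
    spacing = int(np.median(np.diff(centers))) if n > 1 else 3
    elem = np.ones((max(spacing, 1), 1), dtype=bool)  # vertical element
    return binary_closing(binary.astype(bool), structure=elem)
```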
4.
Kuo-Chin Fan Yuan-Kai Wang Tsann-Ran Lay 《Pattern recognition》2002,35(11):2593-2611
Marginal noise is a common phenomenon in document analysis that results from the scanning of thick or skewed documents. It usually appears as a large dark region around the margin of document images. Marginal noise may cover meaningful document objects such as text, graphics, and forms, and its overlap with meaningful objects makes segmentation and recognition of document objects difficult. This paper proposes a novel approach to remove marginal noise, consisting of two steps: marginal noise detection and marginal noise deletion. Detection first reduces the original document image to a smaller image and then finds marginal noise regions according to the shape, length, and location of the split blocks. After the noise regions are detected, different removal methods are applied: a local thresholding method for gray-scale document images and a region growing method for binary document images. Experiments with a wide variety of test samples show the feasibility and effectiveness of the proposed approach in removing marginal noise.
5.
Vassilis Papavassiliou Themos Stafylakis Vassilis Katsouros 《Pattern recognition》2010,43(1):369-377
Two novel approaches to extract text lines and words from handwritten documents are presented. The line segmentation algorithm locates the optimal succession of text and gap areas within vertical zones by applying the Viterbi algorithm; a text-line separator drawing technique is then applied, and finally the connected components are assigned to text lines. Word segmentation is based on a gap metric that exploits the objective function of a soft-margin linear SVM separating successive connected components. The algorithms were tested on the benchmark datasets of the ICDAR07 handwriting segmentation contest and outperformed the participating algorithms.
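How a soft-margin linear SVM can serve as a gap metric may be sketched as follows: the signed distance of a gap's feature vector to the separating hyperplane scores how word-boundary-like the gap is. The one-dimensional feature and the toy labels below are assumptions for illustration, not the paper's training setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Gap widths between consecutive connected components, labelled
# 0 = within-word gap, 1 = between-word gap (toy training data).
gaps = np.array([[2], [3], [4], [5], [18], [22], [25], [30]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

svm = LinearSVC(C=1.0).fit(gaps, labels)

# The signed distance to the hyperplane acts as the gap metric:
# strongly positive suggests a word boundary, negative the same word.
for g in [3, 10, 24]:
    print(g, svm.decision_function([[g]])[0])
```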
6.
S. Mandal S. P. Chowdhury A. K. Das Bhabatosh Chanda 《International Journal on Document Analysis and Recognition》2006,8(2-3):172-182
Detecting and identifying tables in document images is crucial to any document image analysis and digital library system. In this paper we report a very simple but powerful approach to detect tables present in document pages. The algorithm relies on the observation that tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in text lines. This deceptively simple observation has led to the design of a simple but powerful table detection system with low computation cost. Moreover, the mathematical foundation of the approach is established, including the formulation of a regular expression for ease of implementation.
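The regular-expression formulation suggests encoding each text line by its gap structure and matching rows with repeated wide inter-field gaps. The toy sketch below is an illustration of that idea only; the symbols, the `wide_gap` threshold, and the (left, right) box format are assumptions.

```python
import re

def encode_line(word_boxes, wide_gap=30):
    """Encode a text line as 'w' (word) and 'G' (wide inter-field gap),
    using the horizontal distance between consecutive word boxes."""
    s = "w"
    for (_, r), (l, _) in zip(word_boxes, word_boxes[1:]):
        s += ("G" if l - r >= wide_gap else "") + "w"
    return s

# A line encoding as "words, wide gap, words, ..." is a candidate table
# row; ordinary prose lines encode as plain 'www...'.
TABLE_ROW = re.compile(r"^w+(Gw+)+$")

print(TABLE_ROW.match(encode_line([(0, 40), (45, 90), (150, 200)])))  # table row
print(TABLE_ROW.match(encode_line([(0, 40), (45, 90), (95, 140)])))   # prose line
```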
7.
Nikolaos Stamatopoulos Basilis Gatos Stavros J. Perantonis 《Pattern recognition》2009,42(12):3158-3168
Image segmentation is a major task of handwritten document image processing. Many of the proposed segmentation techniques are complementary, in the sense that each of them, using a different approach, can solve different difficult problems such as overlapping or touching components and the influence of author or font style. In this paper, a method for combining different segmentation techniques is presented. Our goal is to exploit the segmentation results of complementary techniques, together with specific features of the initial image, to generate improved segmentation results. Experimental results on line segmentation methods for handwritten documents demonstrate the effectiveness of the proposed combination method.
8.
Xiaonan Lu Saurabh Kataria William J. Brouwer James Z. Wang Prasenjit Mitra C. Lee Giles 《International Journal on Document Analysis and Recognition》2009,12(2):65-81
Authors use images to present a wide variety of important information in documents. For example, two-dimensional (2-D) plots display important data in scientific publications. Often, end-users seek to extract this data and convert it into a machine-processible form so that the data can be analyzed automatically or compared with other existing data. Existing document data extraction tools are semi-automatic and require users to provide metadata and interactively extract the data. In this paper, we describe a system that extracts data from documents fully automatically, completely eliminating the need for human intervention. The system uses a supervised learning-based algorithm to classify figures in digital documents into five classes: photographs, 2-D plots, 3-D plots, diagrams, and others. An integrated algorithm is then used to extract numerical data from the data points and lines in the 2-D plot images, along with the axes and their labels and the data symbols in the figure's legend and their associated labels. We demonstrate that the proposed system and its component algorithms are effective via an empirical evaluation. Our data extraction system has the potential to be a vital component in high-volume digital libraries.
9.
10.
Reading text in natural images has again attracted the attention of many researchers in recent years, owing to the increasing availability of cheap image-capturing devices in low-cost products like mobile phones. Since text can be found in almost any environment, the applicability of text-reading systems is extensive. To this end, we present in this paper a robust method to read text in natural images, composed of two main separate stages. First, text is located in the image using a set of simple and fast-to-compute features, based on geometric and gradient properties, that are highly discriminative between character and non-character objects. The second stage recognizes the previously detected text: it uses gradient features to recognize single characters and Dynamic Programming (DP) to correct misspelled words. Experimental results obtained with several challenging datasets show that the proposed system exceeds state-of-the-art performance, both in localization and recognition.
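The abstract does not detail the DP correction step; a standard reading is minimum-edit-distance matching of each recognized word against a lexicon, as in the Wagner-Fischer sketch below (the lexicon and the words are illustrative).

```python
def edit_distance(a, b):
    """Wagner-Fischer dynamic program: minimum number of insertions,
    deletions and substitutions turning string a into string b."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct(word, lexicon):
    """Replace an OCR output word by the closest lexicon entry."""
    return min(lexicon, key=lambda w: edit_distance(word, w))

print(correct("streef", ["street", "stream", "strength"]))  # -> street
```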
11.
Luke A. D. Hutchison William A. Barrett 《International Journal on Document Analysis and Recognition》2006,8(2-3):87-110
Image registration (or alignment) is a useful preprocessing tool for assisting in manual data extraction from handwritten forms, as well as for preparing documents for batch OCR of specific page regions. A new technique is presented for fast registration of lined tabular document images in the presence of a global affine transformation, using the Discrete Fourier-Mellin Transform (DFMT). Each component of the affine transform is handled separately, which dramatically reduces the total parameter space of the problem. This method is robust and deals with all components of the affine transform in a uniform way by working in the frequency domain. The DFMT is extended to handle shear, which can approximate a small amount of perspective distortion. In order to limit registration to foreground pixels only, and to eliminate Fourier edge effects, a novel, locally adaptive foreground-background segmentation algorithm is introduced, based on the median filter, which eliminates the need for Blackman windowing as usually required by DFMT image registration. A novel information-theoretic optimization of the median filter is presented. An original method is demonstrated for automatically obtaining blank document templates from a set of registered document images.
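The translation component of Fourier-Mellin registration reduces to phase correlation, sketched below with numpy (the test images, the epsilon guard, and the sizes are illustrative assumptions; recovering rotation and scale additionally requires the log-polar step).

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the translation between two equally sized images via the
    normalized cross-power spectrum (phase-only correlation)."""
    FA, FB = np.fft.fft2(a), np.fft.fft2(b)
    cross = FA * np.conj(FB)
    cross /= np.abs(cross) + 1e-12           # keep phase information only
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the midpoint correspond to negative shifts
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (5, -3), axis=(0, 1))
print(phase_correlation(shifted, img))       # -> (5, -3)
```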
12.
Automated segmentation of brain MR images
A simple, robust, and efficient image segmentation algorithm for classifying brain tissues from dual-echo Magnetic Resonance (MR) images is presented. The algorithm consists of a sequence of adaptive histogram analysis, morphological operations, and knowledge-based rules to accurately classify regions such as brain matter and cerebrospinal fluid, and to detect any abnormal regions. It can be completely automated and has been tested on over one hundred images from several patient studies. Experimental results are provided.
13.
Reza Farrahi Moghaddam 《Pattern recognition》2010,43(6):2186-2198
In this work, a multi-scale binarization framework is introduced, which can be used along with any adaptive threshold-based binarization method. The framework improves binarization results and restores weak connections and strokes, especially in degraded historical documents, thanks to its localized nature in the spatial domain. It requires several binarizations at different scales, which is addressed by the introduction of fast grid-based models; this enables the exploration of high scales that are usually unreachable by traditional approaches. To expand the set of adaptive methods, an adaptive modification of Otsu's method, called AdOtsu, is introduced. In addition, to restore document images suffering from bleed-through degradation, the framework is combined with recursive adaptive methods. The framework shows promising performance in subjective and objective evaluations performed on available datasets.
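As a rough sketch of the grid-based ingredient, the snippet below applies a plain Otsu threshold independently per grid cell. The cell size is an assumed parameter, and this is a simplification of the paper's multi-scale framework and of AdOtsu, not their implementation.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: the threshold maximizing between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    w0 = np.cumsum(p)                        # class-0 probability
    mu = np.cumsum(p * np.arange(256))       # cumulative mean
    sigma_b = (mu[-1] * w0 - mu) ** 2 / (w0 * (1 - w0) + 1e-12)
    return int(np.argmax(sigma_b))

def grid_binarize(gray, cell=64):
    """Binarize each grid cell with its own Otsu threshold (a crude
    stand-in for one scale of the grid-based framework)."""
    out = np.zeros_like(gray, dtype=bool)
    for y in range(0, gray.shape[0], cell):
        for x in range(0, gray.shape[1], cell):
            block = gray[y:y + cell, x:x + cell]
            out[y:y + cell, x:x + cell] = block <= otsu_threshold(block)
    return out
```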
14.
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which are generally available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.
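LSI-based multilingual clustering is commonly trained on documents that concatenate a text with its translation, so that terms from both languages co-occur in the latent space. The sklearn sketch below illustrates that idea; the tiny corpus, component count, and cluster count are assumptions, not the paper's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Train LSI on dual-language documents so cross-lingual terms co-occur.
train = [
    "document clustering groups texts el agrupamiento agrupa textos",
    "image segmentation splits an image into regions "
    "la segmentacion divide una imagen en regiones",
]
vec = TfidfVectorizer().fit(train)
lsi = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(train))

# Monolingual documents are folded into the shared latent space; clustering
# there groups them by topic rather than by language.
docs = ["clustering groups texts", "agrupamiento de textos",
        "segmentation of image regions", "segmentacion de regiones"]
X = lsi.transform(vec.transform(docs))
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```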
15.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements (the recommended defaults are expressed as fractions of the image width), and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a Pentium II CPU, 300 MHz, MotherBoard Intel440LX. The algorithm is largely independent of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the background be uniform and that the text sit approximately horizontally; for a skew of up to about 10 degrees no skew correction mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington, and its performance has been evaluated by a suitable measure of segmentation accuracy. A detailed analysis of the segmentation accuracy achieved by the algorithm as a function of noise and skew has also been carried out.
16.
Ji-Yeon Lee Jeong-Seon Park Hyeran Byun Jongsub Moon Seong-Whan Lee 《Pattern recognition》2002,35(2):485-503
As sharing documents through the World Wide Web has been constantly increasing, the need for creating hyperdocuments that make them accessible and retrievable via the internet, in formats such as HTML and SGML/XML, has also been rapidly rising. Nevertheless, little work has been done on the conversion of paper documents into hyperdocuments, and most of these studies have concentrated on the direct conversion of single-column document images containing only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table-of-contents page based on logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that, using the proposed methods, the corresponding HTML documents can be generated in the same visual layout as the document images, and a structured table-of-contents page can also be produced, with the hierarchically ordered section titles hyperlinked to the contents.
17.
Yu-Ting Pai Shanq-Jang Ruan 《Pattern recognition》2010,43(9):3177-3187
Document image binarization converts gray-level images into binary images, a feature that has significantly impacted many portable devices in recent years, including PDAs and mobile camera phones. Given the limited memory space and computational power of portable devices, reducing the computational complexity of an embedded system is of priority concern. This work presents an efficient document image binarization algorithm with low computational complexity and high performance. Integrating the advantages of global and local methods, the proposed algorithm divides the document image into several regions; a threshold surface is then constructed based on the diversity and the intensity of each region to derive the binary image. Experimental results demonstrate that the proposed method provides a promising binarization outcome at low computational cost.
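A threshold surface built from per-region statistics can be sketched as follows. Here a Niblack-style mean/std rule stands in for the paper's diversity-and-intensity rule, and the cell size, weight `k`, and nearest-neighbour upsampling are assumptions for illustration.

```python
import numpy as np

def binarize_with_surface(gray, cell=32, k=0.2):
    """Build a threshold surface from per-region statistics and apply it."""
    h, w = gray.shape
    gh, gw = -(-h // cell), -(-w // cell)              # grid size (ceiling)
    t = np.zeros((gh, gw))
    for i in range(gh):
        for j in range(gw):
            block = gray[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            t[i, j] = block.mean() - k * block.std()   # regional threshold
    # nearest-neighbour upsampling of the grid to a full threshold surface
    surface = np.repeat(np.repeat(t, cell, axis=0), cell, axis=1)[:h, :w]
    return gray < surface                              # True = ink pixels
```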
18.
Traditional image compression techniques are mostly based on the assumption of spatial and chromatic homogeneity, and therefore do not achieve the best results when compressing document images. Targeting the characteristics of document images, a document image compression method based on layer segmentation is proposed. The method first segments the document image into layers using a multi-scale two-color clustering algorithm, and then applies the most effective compression technique to each layer according to its characteristics, achieving better compression than traditional methods.
19.
Objective: In document image layout analysis, mainstream deep learning methods overcome the drawbacks of traditional methods and can localize and classify layout regions simultaneously, but most of them require complex preprocessing and have complicated model structures. Moreover, the shortage of document image data prevents general-purpose deep learning models from performing well on layout analysis. To address these problems, a deep learning method based on a multi-feature-fusion convolutional neural network is proposed. Method: First, convolution kernels of different sizes extract features from the input image in parallel, and the resulting feature maps are fused, forming a feature fusion module. Next, the serial-parallel spatial pyramid strategy of DeeplabV3 is adopted, and image-level features are added to further refine the extracted feature maps. Finally, bilinear interpolation restores the image resolution, completing the localization and recognition of document layout objects, namely figures, tables, and formulas. Result: Using mIOU (mean intersection over union) and PA (pixel accuracy) as evaluation metrics, experiments on the ICDAR 2017 POD document layout object detection dataset show that the proposed algorithm reaches 87.26% mIOU and 98.10% PA. Compared with FCN (fully convolutional networks), the proposed algorithm improves mIOU and PA by about 14.66% and 2.22%, respectively, and the proposed feature fusion module alone contributes improvements of 1.45% in mIOU and 0.22% in PA. Conclusion: The proposed algorithm localizes and recognizes multiple document layout objects within a single network framework, requires no complex image preprocessing for training, and has a simple model structure. The experiments show that it achieves good recognition results with relatively little training data, outperforming FCN and DeeplabV3.
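The described fusion of parallel convolutions with different kernel sizes can be sketched in PyTorch as below; the channel counts, kernel sizes, and 1x1 mixing convolution are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Parallel convolutions with different kernel sizes whose outputs are
    concatenated along the channel axis and mixed by a 1x1 convolution,
    a minimal sketch of a multi-feature fusion module."""
    def __init__(self, in_ch=3, branch_ch=16, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernels
        )
        self.mix = nn.Conv2d(branch_ch * len(kernels), branch_ch, 1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.mix(torch.cat(feats, dim=1))    # fuse along channels

x = torch.randn(1, 3, 64, 64)                       # dummy document image batch
print(FeatureFusion()(x).shape)                     # -> torch.Size([1, 16, 64, 64])
```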
20.
The National Cancer Institute has collected a large database of uterine cervix images termed “cervigrams” for cervical cancer screening research. Tissues of interest within the cervigram, in particular the lesions, are of varying sizes and of complex, non-convex shapes; the tissues possess similar color features, and their boundaries are not always clear. The main objective of the current work is to provide a segmentation framework for tissues of interest within the cervix that can cope with these difficulties in an unsupervised manner and with a minimal number of parameters.