Similar documents
 20 similar documents found (search time: 15 ms)
1.
Document layout analysis, or page segmentation, is the task of decomposing document images into regions such as text, images, separators, and tables. It remains a challenging problem due to the variety of document layouts. In this paper, we propose a novel hybrid method comprising three main stages. In the first stage, text and non-text elements are classified using a minimum homogeneity algorithm, which combines connected component analysis with a multilevel homogeneity structure. In the second stage, a new homogeneity structure is combined with adaptive mathematical morphology in the text document to obtain a set of text regions; in the non-text document, further classification of non-text elements yields separator regions, table regions, image regions, and so on. In the final stage, a region refinement and noise detection process refines all regions in both the text and non-text documents to eliminate noise and obtain the geometric layout of each region. The proposed method has been tested on the ICDAR2009 page segmentation competition dataset and several other databases in different languages. The results show that our method achieves higher accuracy than competing methods, demonstrating its effectiveness.
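The connected component extraction underpinning the first stage can be sketched as follows. This is a generic illustration using SciPy's labeling, not the paper's minimum homogeneity algorithm; the per-component size statistics gathered here are the kind of evidence such text/non-text classifiers typically consume:

```python
import numpy as np
from scipy import ndimage

def extract_components(binary):
    """Label connected components in a binary image (1 = foreground)
    and return a list of (height, width, area) per component."""
    labels, n = ndimage.label(binary)
    stats = []
    # find_objects returns one bounding-box slice per label (1..n)
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        area = int((labels[sl] == i).sum())  # pixels of this label only
        stats.append((h, w, area))
    return stats
```

A text/non-text classifier would then threshold or cluster these statistics, since text components tend to have regular, similar sizes while graphics do not.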

2.
3.
Document image binarization is a difficult task, especially for complex document images. Nonuniform background, stains, and variation in the intensity of the printed characters are some examples of challenging document features. In this work, binarization is accomplished by taking advantage of local probabilistic models and of a flexible active contour scheme. More specifically, local linear models are used to estimate both the expected stroke and the background pixel intensities. This information is then used as the main driving force in the propagation of an active contour. In addition, a curvature-based force is used to control the viscosity of the contour and leads to more natural-looking results. The proposed implementation benefits from the level set framework, which has been highly successful in other contexts, such as medical image segmentation and road network extraction from satellite images. The validity of the proposed approach is demonstrated on both recent and historical document images of various types and languages. In addition, this method was submitted to the Document Image Binarization Contest (DIBCO'09), at which it placed 3rd.

4.
5.
A script identification algorithm based on a bank of Gaussian derivative filters is proposed. The texture characteristics of text images are analyzed; compared with the traditional wavelet transform, the proposed algorithm can extract edge and ridge features of characters in more orientations. A support vector machine (SVM) is used to train on and classify the extracted features, realizing script identification. In the experiments, text images in 10 different languages, including Chinese, English, Russian, Japanese, Korean, and Arabic, were used to test the influence of different filter parameters on the algorithm's performance, and the algorithm was compared with three other texture-based script identification algorithms. The results show that the proposed algorithm runs quickly and achieves a good recognition rate.

6.
This paper proposes a method for comparing document images in a multilingual corpus, composed of character segmentation, feature extraction, and similarity measurement. In character segmentation, a top-down strategy is used: projection and a self-adaptive threshold are applied to analyze the layout, and text lines are then segmented by horizontal projection. English, Chinese, and Japanese are recognized by different methods based on the distribution and ratios of text lines, and character segmentation is then performed with language-specific strategies. In feature extraction and similarity measurement, four features are used for coarse matching, and a template is then set up. Based on the templates, a fast template matching method using a coarse-to-fine strategy and bit memory is presented for precise matching. The experimental results demonstrate that our method can handle multilingual document images of different resolutions and font sizes with high precision and speed.

7.
The detection of mathematical expressions is a prerequisite step for the digitisation of scientific documents. Many different multistage approaches have been proposed for the detection of expressions in document images, that is, page segmentation followed by expression detection. However, the detection accuracy of such methods still needs improvement owing to errors in the page segmentation of complex documents. This paper presents an end-to-end framework for mathematical expression detection in scientific document images without requiring the optical character recognition (OCR) or document analysis techniques applied in conventional methods. The novelty of this paper is twofold. First, because document images are usually in binary form, the direct use of these images, which lack texture information, as input for detection networks may lead to incorrect detections. Therefore, we propose applying a distance transform to obtain a discriminating and meaningful representation of mathematical expressions in document images. Second, the transformed images are fed into the faster region-based convolutional neural network (Faster R-CNN), optimized to improve detection accuracy. The proposed framework was tested on two benchmark data sets (Marmot and GTDB). Compared with the original Faster R-CNN, the proposed network improves the detection accuracies for isolated and inline expressions by 5.09% and 3.40%, respectively, on the Marmot data set, whereas those on the GTDB data set are improved by 4.04% and 4.55%. A performance comparison with conventional methods shows the effectiveness of the proposed method.
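The distance-transform representation described above can be illustrated with a minimal sketch; `distance_representation` is a hypothetical helper name, and the real pipeline would feed the resulting map to the detection network:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_representation(binary_page):
    """Convert a binary page (1 = ink) into a texture-bearing map:
    each background pixel holds its Euclidean distance to the nearest
    ink pixel, while ink pixels stay 0."""
    # distance_transform_edt measures, for each nonzero element,
    # the distance to the nearest zero, so invert the page first.
    return distance_transform_edt(binary_page == 0)
```

The gradient this introduces around strokes gives the CNN backbone texture to respond to, which a flat binary image lacks.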

8.
Projection methods have been used in the analysis of bitonal document images for different tasks such as page segmentation and skew correction for more than two decades. However, these algorithms are sensitive to the presence of border noise in document images. Border noise can appear along the page border due to scanning or photocopying. Over the years, several page segmentation algorithms have been proposed in the literature. Some of these algorithms have come into widespread use due to their high accuracy and robustness with respect to border noise. This paper addresses two important questions in this context: 1) Can existing border noise removal algorithms clean up document images to a degree required by projection methods to achieve competitive performance? 2) Can projection methods reach the performance of other state-of-the-art page segmentation algorithms (e.g., Docstrum or Voronoi) for documents where border noise has successfully been removed? We perform extensive experiments on the University of Washington (UW-III) data set with six border noise removal methods. Our results show that although projection methods can achieve the accuracy of other state-of-the-art algorithms on the cleaned document images, existing border noise removal techniques cannot clean up documents captured under a variety of scanning conditions to the degree required to achieve that accuracy.
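A minimal sketch of the projection method the paper evaluates: a horizontal projection profile plus a simple band-splitting step that recovers text-line row ranges. Function names and the `min_ink` threshold are illustrative:

```python
import numpy as np

def horizontal_projection(binary):
    """Row-wise ink counts of a binary page (1 = ink)."""
    return binary.sum(axis=1)

def text_line_bands(profile, min_ink=1):
    """Split the profile into contiguous bands whose rows carry at
    least `min_ink` foreground pixels -> list of (start, stop) rows."""
    rows = profile >= min_ink
    bands, start = [], None
    for i, r in enumerate(rows):
        if r and start is None:
            start = i                      # band opens
        elif not r and start is not None:
            bands.append((start, i))       # band closes
            start = None
    if start is not None:
        bands.append((start, len(rows)))
    return bands
```

The sensitivity the paper studies is visible here: a dark border along the page edge inflates every row's count, so no row drops below `min_ink` and the bands merge.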

9.
Text segmentation using Gabor filters for automatic document processing
There is considerable interest in designing automatic systems that will scan a given paper document and store it on electronic media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text. Thus, the document image has to be segmented to identify the text regions, so that OCR techniques may be applied only to those regions. In this paper, we present a simple method for document image segmentation in which text regions in a given document image are automatically identified. The proposed segmentation method for document images is based on a multichannel filtering approach to texture segmentation. The text in the document is considered a textured region; nontext contents, such as blank spaces, graphics, and pictures, are considered regions with different textures. Thus, the problem of segmenting document images into text and nontext regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. These filters have been used extensively for a variety of texture segmentation tasks; here we apply the same filters to the document image segmentation problem. Our segmentation method does not assume any a priori knowledge about the content or font styles of the document, and is shown to work even for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images, demonstrating the robustness of this technique. This work was supported by the National Science Foundation under NSF grant CDA-88-06599 and by a grant from E. I. du Pont de Nemours & Company.
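The multichannel Gabor filtering idea can be sketched as follows. The kernel is the even-symmetric (cosine) Gabor, and the parameters and function names are illustrative rather than those used in the paper:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(ksize, sigma, theta, wavelength):
    """Even-symmetric Gabor kernel: a cosine grating at orientation
    `theta` modulated by an isotropic Gaussian envelope."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def texture_energy(image, thetas, sigma=2.0, wavelength=4.0, ksize=9):
    """Per-orientation texture energy: mean squared filter response.
    Text regions respond strongly at the stroke orientations."""
    feats = []
    for theta in thetas:
        k = gabor_kernel(ksize, sigma, theta, wavelength)
        resp = convolve2d(image, k, mode='same', boundary='symm')
        feats.append(float((resp**2).mean()))
    return feats
```

A segmenter would compute such energies per local window across a filter bank of orientations and frequencies, then cluster the feature vectors into text and nontext classes.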

10.
11.
Document representation and its application to page decomposition
Transforming a paper document to its electronic version in a form suitable for efficient storage, retrieval, and interpretation continues to be a challenging problem. An efficient representation scheme for document images is necessary to solve this problem. Document representation involves techniques of thresholding, skew detection, geometric layout analysis, and logical layout analysis. The derived representation can then be used in document storage and retrieval. Page segmentation is an important stage in representing document images obtained by scanning journal pages. The performance of a document understanding system greatly depends on the correctness of page segmentation and the labeling of different regions such as text, tables, images, drawings, and rulers. We use the traditional bottom-up approach based on connected component extraction to efficiently implement page segmentation and region identification. A new document model which preserves top-down generation information is proposed, based on which a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis. Our algorithm has high accuracy and takes approximately 1.4 seconds on an SGI Indy workstation for model creation, including orientation estimation, segmentation, and labeling (text, table, image, drawing, and ruler), for a 2550×3300 image of a typical journal page scanned at 300 dpi. This method is applicable to documents from various technical journals and can accommodate moderate amounts of skew and noise.

12.
13.
We investigate different Vickers indentation segmentation methods and concentrate especially on active contours approaches, as these techniques are known to be precise, state-of-the-art segmentation methods. In particular, different kinds of level set-based methods, which are improvements of the traditional active contours, are analyzed. In order to circumvent the initialization problem of active contours, we separate the segmentation process into two stages. For the first stage, we introduce an approach which approximately locates the indentations with high certainty. The results achieved with this method serve as initializations for the precise active contours (second stage). This two-stage approach delivers highly precise results for most real-world indentation images. However, some images are very difficult to segment. To handle even these, our segmentation method is combined with the Shape from Focus approach, incorporating 3D information. Moreover, in order to decrease the overall runtime, a gradual enhancement approach based on unfocused images is introduced. Using three different databases, we compare the proposed methods and show that their segmentation accuracy is highly competitive with other approaches in the literature.

14.
Image segmentation has been widely used in document image analysis for extracting printed characters; in map processing for finding lines, legends, and characters; in topological feature extraction for recovering geographical information; and in quality inspection of materials, where defective parts must be delineated, among many other applications. In image analysis, the efficient segmentation of images into meaningful objects is important for classification and object recognition. This paper presents two novel methods for image segmentation based on Fractional-Order Darwinian Particle Swarm Optimization (FODPSO) and Darwinian Particle Swarm Optimization (DPSO), which determine the n−1 optimal thresholds for n-level thresholding of a given image. The efficiency of the proposed methods is compared with other well-known thresholding segmentation methods. Experimental results show that the proposed methods perform better than the others across a number of different measures.
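The quantity such swarm-based multilevel thresholding methods typically maximize is Otsu's between-class variance. The sketch below computes it for a given threshold set and, for tiny problems only, finds the optimum by brute force in place of the PSO search; names and the exhaustive stand-in are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def between_class_variance(hist, thresholds):
    """Otsu's between-class variance of the gray-level histogram when
    split at the sorted `thresholds` (the DPSO fitness function)."""
    levels = np.arange(len(hist))
    p = hist / hist.sum()
    bounds = [0, *sorted(thresholds), len(hist)]
    total_mean = (p * levels).sum()
    var = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()                      # class probability
        if w > 0:
            mu = (p[lo:hi] * levels[lo:hi]).sum() / w
            var += w * (mu - total_mean) ** 2
    return var

def exhaustive_thresholds(hist, n_thresholds):
    """Brute-force stand-in for the swarm search (feasible only for
    tiny histograms): best thresholds by between-class variance."""
    best = max(combinations(range(1, len(hist)), n_thresholds),
               key=lambda t: between_class_variance(hist, t))
    return list(best)
```

The appeal of PSO variants is that this exhaustive search grows combinatorially with the number of thresholds, while a swarm explores the same fitness landscape at a fraction of the cost.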

15.
16.
Skew estimation and page segmentation are two closely related processing stages in document image analysis. Skew estimation needs proper page segmentation, especially for document images with multiple skews, which are common in scanned images from thick bound publications in 2-up style or postal envelopes with various printed labels. Even if only a single skew is of concern for a document image, the presence of minority regions of different skews, or of undefined skew such as noise, may severely affect the estimation of the dominant skew. Page segmentation, on the other hand, may need to know the exact skew angle of a page in order to work properly. This paper presents a skew estimation method with built-in skew-independent segmentation functionality that is capable of handling document images with multiple regions of different skews. It is based on the convex hulls of the individual components (i.e. the smallest convex polygon that fully contains a component) and of the component groups (i.e. the smallest convex polygon that fully contains all the components in a group) in a document image. The proposed method first extracts the convex hulls of the components, then segments an image into groups of components according to both the spatial distances and the size similarities among the convex hulls. This process not only extracts hints of the alignments of the text groups, but also separates noise and graphical components from textual ones. To verify the proposed algorithms, the full sets of the real and the synthetic samples of the University of Washington English Document Image Database I (UW-I) are used. Quantitative and qualitative comparisons with some existing methods are also provided.

17.
A segmentation and classification algorithm for text and picture regions in document page images
To segment and classify skewed document page images containing irregular picture regions and tables, a new text/graphics segmentation and classification algorithm is proposed. The algorithm first detects and corrects text skew using mathematical morphology and a hierarchical Hough transform. Then, so that pages containing irregular picture regions can be handled, a midpoint-cut step is introduced into the traditional projection-profile cut algorithm, approximating an irregular picture region with a series of rectangles. For the segmented regions, two features, the black-to-white pixel ratio (Rbw) and the cross-correlation between neighboring pixels (Rcc), are used as classification criteria. Experimental results show that the algorithm is fast and reliable. It applies only to binary images.
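Plausible forms of the two classification features can be sketched as follows; the paper's exact definitions of Rbw and Rcc may differ, so treat these as illustrative assumptions:

```python
import numpy as np

def black_white_ratio(block):
    """Rbw (assumed form): black-to-white pixel ratio of a binary
    block (1 = black); text blocks are sparser than picture blocks."""
    black = int(block.sum())
    white = block.size - black
    return black / max(white, 1)

def cross_correlation(block):
    """Rcc (assumed form): fraction of vertically adjacent pixel
    pairs that agree. Large uniform runs inside pictures score high;
    text rows, alternating strokes and gaps, score lower."""
    agree = (block[1:, :] == block[:-1, :])
    return float(agree.mean())
```

A classifier would threshold these two values per region to separate text from pictures and tables.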

18.
Marginal noise is a common phenomenon in document analysis, resulting from the scanning of thick or skewed documents. It usually appears as a large, dark region along the margin of document images. Marginal noise may cover meaningful document objects, such as text, graphics, and forms, and this overlap makes it difficult to segment and recognize those objects. This paper proposes a novel approach to removing marginal noise, consisting of two steps: marginal noise detection and marginal noise deletion. Marginal noise detection reduces the original document image to a smaller image and then finds marginal noise regions according to the shape, length, and location of the split blocks. After the detection of marginal noise regions, different removal methods are applied: a local thresholding method is proposed for gray-scale document images, whereas a region growing method is devised for binary document images. Experiments with a wide variety of test samples demonstrate the feasibility and effectiveness of our approach in removing marginal noise.
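The reduce-then-detect idea can be sketched as follows; the block-reduction factor, fill threshold, and function names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def reduce_image(binary, f=4):
    """Downscale by factor f: each output pixel is 1 if its f*f block
    is mostly ink, mimicking the reduction step before locating
    marginal noise blocks."""
    h, w = (binary.shape[0] // f) * f, (binary.shape[1] // f) * f
    blocks = binary[:h, :w].reshape(h // f, f, w // f, f)
    return (blocks.mean(axis=(1, 3)) > 0.5).astype(int)

def dark_margin_columns(reduced, min_fill=0.8):
    """Columns of the reduced image that are almost entirely dark --
    candidates for scanner-induced marginal noise."""
    fill = reduced.mean(axis=0)
    return np.flatnonzero(fill >= min_fill)
```

Working on the reduced image makes large dark margins easy to find while text, which rarely fills whole blocks, mostly vanishes; removal then proceeds on the full-resolution image in the flagged regions.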

19.
An advanced image and video segmentation system is proposed. The system builds on existing work but extends it to achieve efficiency and robustness, the two major shortcomings of segmentation methods developed so far. Six different schemes containing several approaches tailored for diverse applications constitute the core of the system. The first two focus on very-low-complexity image segmentation addressing real-time applications under specific assumptions. The third scheme is a highly efficient implementation of the powerful nonlinear diffusion model. The other three schemes address the more complex task of physical object segmentation using information about the scene structure or motion; these techniques are based on an extended diffusion model and morphology. The main objective of this work has been to develop a robust and efficient segmentation system for natural video and still images. This goal has been achieved by advancing the state of the art, pushing the frontiers of current methods to meet the challenges of the segmentation task in different situations under reasonable computational cost. Consequently, more efficient methods, and novel strategies for issues on which current approaches fail, are developed. The performance of the presented segmentation schemes has been assessed by processing several video sequences; qualitative and quantitative results of this assessment are also reported.

20.

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号