Similar Documents
20 similar documents found.
1.
This study presents a new method, the multi-plane segmentation approach, for segmenting and extracting textual objects from real-life complex document images. The approach first decomposes the document image into distinct object planes, separating homogeneous objects: textual regions of interest, non-text objects such as graphics and pictures, and background textures. This decomposition consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. A text extraction procedure is then applied to the resulting planes to detect and extract textual objects with different characteristics in their respective planes. Because the approach processes document images regionally and adaptively according to local features, detailed characteristics of the extracted textual objects, particularly small characters with thin strokes and gradational character illumination, are well preserved. It also handles background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture. Experimental results on real-life complex document images demonstrate that the approach effectively extracts textual objects of various illuminations, sizes, and font styles from many types of complex document images.
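
As a minimal illustration of the localized-thresholding idea (not the paper's full multi-plane algorithm), the sketch below binarizes an image block by block with a per-block Otsu threshold, so each region adapts to its own local contrast and illumination; the block size and the single-threshold simplification are assumptions.

```python
import numpy as np

def otsu_threshold(block: np.ndarray) -> int:
    """Classic Otsu threshold for a grayscale block with values in 0..255."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = float(np.dot(np.arange(256), hist))  # unnormalized mean numerator
    best_t, best_var = 0, -1.0
    cum_w = cum_mu = 0.0
    for t in range(256):
        cum_w += hist[t]
        cum_mu += t * hist[t]
        if cum_w == 0 or cum_w == total:
            continue
        w0 = cum_w / total
        mu0 = cum_mu / cum_w
        mu1 = (mu_total - cum_mu) / (total - cum_w)
        var_between = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def localized_threshold(img: np.ndarray, block: int = 64) -> np.ndarray:
    """Binarize block by block so thresholds adapt to local features."""
    out = np.zeros_like(img, dtype=np.uint8)
    for y in range(0, img.shape[0], block):
        for x in range(0, img.shape[1], block):
            patch = img[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = (patch > otsu_threshold(patch)) * 255
    return out
```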

2.
Effective compound image compression requires the image to first be segmented into regions such as text, pictures, and background, to minimize the loss of visual quality of text during compression. In this paper, a new compound image segmentation algorithm based on the multilayer (foreground/mask/background) Mixed Raster Content (MRC) model is proposed. The algorithm first segments a compound image into different classes. Each class is then transformed into the three-layer MRC model according to the properties of that class. Finally, the foreground and background layers are compressed with JPEG 2000, and the mask layer with JBIG2. The proposed morphology-based segmentation algorithm produces a binary segmentation mask that accurately partitions a compound image into layers, such as the background layer and the foreground layer. Experimental results show that it is robust with respect to the font size, style, colour, orientation, and alignment of text on an uneven background. At similar bit rates, our MRC compression with the morphology-based segmentation achieves much higher subjective quality and coding efficiency than state-of-the-art compression algorithms such as JPEG, JPEG 2000, and H.264/AVC-I.
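
To make the three-layer decomposition concrete, here is a hedged sketch (an assumed helper, not the paper's algorithm): a binary text mask splits an RGB image into foreground and background layers, with masked-out background pixels filled smoothly so each layer compresses well under its codec (JBIG2 for the mask and JPEG 2000 for the other two layers in the paper).

```python
import numpy as np

def mrc_decompose(img: np.ndarray, mask: np.ndarray):
    """Split an RGB image into MRC mask/foreground/background layers.

    img  : HxWx3 uint8 image
    mask : HxW boolean array, True where pixels belong to text
    """
    fg = np.zeros_like(img)
    bg = img.copy()
    fg[mask] = img[mask]                      # keep text colors in foreground
    # Fill the text holes in the background with the mean non-text color so
    # the background layer compresses smoothly (a common MRC trick; the
    # actual fill strategy in the paper may differ).
    if (~mask).any():
        bg[mask] = img[~mask].mean(axis=0).astype(img.dtype)
    return mask.astype(np.uint8), fg, bg
```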

3.
Traditional image compression techniques mostly assume spatial and chromatic homogeneity of the image, and therefore do not achieve the best compression results on document images. Targeting the characteristics of document images, this paper proposes a document image compression method based on layer segmentation. The method first segments the document image into layers using a multiscale two-color clustering algorithm, and then applies the most effective compression technique to each layer according to its characteristics, achieving better compression than traditional methods.

4.
5.
Objective: Region-based semantic segmentation methods tend to lose detail, producing coarse and inaccurate segmentation results. To address this, a semantic segmentation method is proposed that combines contextual features with multi-layer feature fusion in a convolutional neural network (CNN). Method: First, selective search generates candidate regions of different scales from the image, yielding region feature masks. A CNN then extracts features for each region, fusing high-level and low-level features in parallel; since feature maps from different layers differ in size, a RefineNet model is used to fuse feature maps of different resolutions. Finally, the region feature masks and the fused feature maps are fed into a free-form region-of-interest pooling layer, and a softmax classification layer produces pixel-level class labels. Results: With contextual features and CNN multi-layer feature fusion as the basic framework, good performance is obtained. The experiments analyze the effects of CNN multi-layer feature fusion, the combination of background information with fused features, and the dropout rate. On the Siftflow dataset, pixel accuracy reaches 82.3% and mean accuracy 63.1%, improvements of 10.6% and 0.6% respectively over current region-based end-to-end semantic segmentation models. Conclusion: The algorithm combines region foreground information with context, fully exploiting regional context; dropout reduces the network's parameter count and avoids overfitting, while the RefineNet model fuses CNN multi-layer features so that multi-level image detail is used effectively in segmentation. This strengthens the model's ability to discriminate small objects within regions and yields good segmentation on images with occlusion and complex backgrounds.
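
The fusion step can be illustrated with a small PyTorch sketch: feature maps from different CNN layers differ in resolution, so the deeper map is upsampled to the shallower map's size before fusion. The 1x1 projection and element-wise addition below stand in for RefineNet's learned refinement blocks, and all shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_features(deep: torch.Tensor, shallow: torch.Tensor,
                  proj: torch.nn.Conv2d) -> torch.Tensor:
    """deep: (N, C1, h, w); shallow: (N, C2, H, W); proj: 1x1 conv C1 -> C2."""
    up = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                       align_corners=False)   # match spatial resolution
    return F.relu(proj(up) + shallow)         # element-wise fusion

# Example: fuse a 14x14 deep map into a 56x56 shallow map.
proj = torch.nn.Conv2d(512, 256, kernel_size=1)
deep = torch.randn(1, 512, 14, 14)
shallow = torch.randn(1, 256, 56, 56)
fused = fuse_features(deep, shallow, proj)    # -> (1, 256, 56, 56)
```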

6.
7.
Handwritten digit recognition has long been a challenging problem in optical character recognition and is of great importance in industry. This paper develops a new approach to handwritten digit recognition that uses a small number of patterns in the training phase. To improve the performance of isolated Farsi/Arabic handwritten digit recognition, the Bag of Visual Words (BoVW) technique is used to construct image feature vectors, with each visual word described by the Scale-Invariant Feature Transform (SIFT). A Quantum Neural Network (QNN) classifier is used to learn the feature vectors. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (the HODA dataset) show that the proposed method achieves the highest recognition rate compared with other state-of-the-art methods.
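
A hedged sketch of the BoVW pipeline described: SIFT descriptors are clustered into a visual vocabulary, and each digit image becomes a normalized histogram of its nearest visual words. SIFT requires an OpenCV build that ships it (e.g. opencv-python >= 4.4); the vocabulary size and the scikit-learn KMeans are assumptions, and the QNN classifier is not reproduced here.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def build_vocabulary(images, k: int = 100) -> KMeans:
    """Cluster SIFT descriptors from grayscale training images into k words."""
    sift = cv2.SIFT_create()
    descs = []
    for img in images:
        _, d = sift.detectAndCompute(img, None)
        if d is not None:
            descs.append(d)
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descs))

def bovw_vector(img, vocab: KMeans) -> np.ndarray:
    """Represent one image as an L1-normalized visual-word histogram."""
    sift = cv2.SIFT_create()
    _, d = sift.detectAndCompute(img, None)
    hist = np.zeros(vocab.n_clusters)
    if d is not None:
        for w in vocab.predict(d.astype(np.float64)):
            hist[w] += 1
    return hist / max(hist.sum(), 1)
```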

8.
9.
When a page of a book is scanned or photocopied, textual noise (extraneous symbols from the neighboring page) and/or non-textual noise (black borders, speckles, ...) appears along the border of the document. Existing document analysis methods handle non-textual noise reasonably well, whereas textual noise remains a major issue for document analysis systems: it may introduce undesired text into optical character recognition (OCR) output that must be removed afterwards. Existing document cleanup methods try to explicitly detect and remove marginal noise. This paper presents a new perspective on document image cleanup: detecting the page frame of the document. The goal of page frame detection is to find the actual page content area while ignoring marginal noise along the page border. We use a geometric matching algorithm to find the optimal page frame of structured documents (journal articles, books, magazines) by exploiting their text alignment property. We evaluate the algorithm on the UW-III database; the error rates are below 4% for each of the performance measures used. Further tests were run on a dataset of magazine pages and on a set of camera-captured document images. To demonstrate the benefits of page frame detection in practical applications, we choose OCR and layout-based document image retrieval as sample applications. Experiments with a commercial OCR system show that removing characters outside the computed page frame reduces the OCR error rate from 4.3% to 1.7% on the UW-III dataset, and using page frame detection in layout-based document image retrieval decreases retrieval error rates by 30%.

10.
This paper presents a new rate-distortion-optimized motion compensation method. For 8x8 sub-blocks, it introduces several motion segmentation patterns that follow actual object contours and motion trajectories more closely, selects among them with a rate-distortion criterion, and uses the chosen patterns for motion estimation and compensation, describing object motion efficiently and striking a good balance between bit rate and distortion. Experimental results show that across the full bit-rate range the new motion compensation method outperforms traditional methods by varying margins, while the increase in computational complexity remains acceptable.
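
The mode decision can be summarized by the standard Lagrangian rate-distortion rule: each candidate segmentation pattern is scored by J = D + λR and the cheapest one wins. In this sketch the mode objects and their estimate() method are hypothetical placeholders, not an API from the paper.

```python
def choose_mode(block, reference, modes, lam: float):
    """Pick the motion-segmentation mode minimizing J = D + lambda * R."""
    best, best_cost = None, float("inf")
    for mode in modes:
        # Hypothetical API: each mode returns its prediction distortion
        # (e.g. SSD against the reference) and the bits needed to code
        # its motion parameters.
        distortion, rate_bits = mode.estimate(block, reference)
        cost = distortion + lam * rate_bits
        if cost < best_cost:
            best_cost, best = cost, mode
    return best, best_cost
```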

11.
The problem of handwritten digit recognition has long been open in the field of pattern classification and is of great importance in industry. At its heart lies the design of an efficient algorithm that can recognize digits written by users and submitted via a tablet, scanner, or other digital device. From an engineering point of view, it is desirable to achieve good performance within limited resources. To this end, we have developed a new approach to handwritten digit recognition that uses a small number of patterns in the training phase. Since the literature suggests that combining the decisions of multiple classifiers outperforms using the output of the single best classifier in an ensemble, the new approach uses an ensemble of classifiers for recognition. The classifiers in the proposed system are based on the singular value decomposition (SVD) algorithm; the experimental results and the literature show that SVD is well suited to sparse data such as handwritten digits. The decisions of the SVD classifiers are combined by a novel combination rule we name reliable multi-phase particle swarm optimization: "reliable" because a novel reliability parameter is introduced to tackle the problem of PSO becoming trapped in local minima. Compared with previous methods, a significant advantage of the proposed method is its insensitivity to the size of the training set: it uses just 15% of the dataset for training, while other methods usually use 60-75% of the whole dataset. To evaluate the method, we tested our algorithm on a Farsi/Arabic handwritten digit dataset. What makes recognition of handwritten Farsi/Arabic digits particularly challenging is that some digits can legitimately be written in different shapes; therefore, 6000 hard samples (600 per class) were chosen by the K-nearest-neighbor algorithm from the HODA dataset, a standard Farsi/Arabic digit dataset. Experimental results show that the proposed method is fast, accurate, and robust against the local minima of PSO. Finally, the method is compared with state-of-the-art methods and with ensemble classifiers based on MLP, RBF, and ANFIS under various combination rules.
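
As a sketch of the SVD base-classifier idea (a common formulation; the paper's exact variant and the PSO combination rule are not reproduced): each class's flattened training images form a matrix, a truncated SVD gives a class subspace, and a test digit is assigned to the class whose subspace reconstructs it with the smallest residual.

```python
import numpy as np

def train_svd_bases(class_matrices: dict, k: int = 10) -> dict:
    """class_matrices: label -> (n_pixels, n_samples) array of training digits."""
    bases = {}
    for label, A in class_matrices.items():
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        bases[label] = U[:, :k]               # top-k left singular vectors
    return bases

def classify(x: np.ndarray, bases: dict):
    """Pick the class whose subspace leaves the smallest residual."""
    def residual(U):
        return np.linalg.norm(x - U @ (U.T @ x))
    return min(bases, key=lambda label: residual(bases[label]))
```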

12.
As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from cluttered backgrounds remains an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features, which typically have large variations, our approach captures structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation-, scale-, and rotation-invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity, based on anisotropic scaling and on registration residual error, and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as a query in document image retrieval, and we further demonstrate our matching techniques in offline signature verification. Extensive experiments on large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.

13.
Objective: Pixel-level annotation of medical images is extremely labor-intensive. Taking optic disc segmentation in fundus images, a typical medical imaging task, as an example, this paper proposes a weakly supervised optic disc segmentation algorithm with a size constraint. Method: The conventional convolutional neural network framework is modified with a new convolutional fusion layer designed around the structural characteristics of the optic disc, which improves segmentation performance. To further raise segmentation accuracy, a size constraint is imposed on the network output and optimized with a new loss function; the proposed loss can be optimized with standard stochastic gradient descent. Results: Experiments on the RIM-ONE optic disc dataset compare the algorithm with classic fully supervised optic disc segmentation methods. Using only image-level labels, the algorithm reaches a mean accuracy (mAcc) of 0.852, a mean precision (mPre) of 0.831, and a mean intersection-over-union (mIoU) of 0.827. Conclusion: The algorithm segments the optic disc accurately without expert pixel-level annotation, achieving pixel-level segmentation accuracy from image-level labels alone and easing the difficulty of pixel-level annotation in medical imaging.
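
A speculative reconstruction of a size-constrained weakly supervised loss in this spirit, assuming a sigmoid probability map and an assumed plausible area range [a_min, a_max]; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def size_constrained_loss(prob_map: torch.Tensor, image_label: int,
                          a_min: float, a_max: float) -> torch.Tensor:
    """prob_map: (H, W) sigmoid output; image_label: 1 if the disc is present."""
    area = prob_map.sum()
    # Image-level supervision: the strongest response should match the label.
    cls = F.binary_cross_entropy(prob_map.max().unsqueeze(0),
                                 torch.tensor([float(image_label)]))
    # Quadratic penalty whenever the predicted area leaves [a_min, a_max].
    size_pen = torch.clamp(a_min - area, min=0) ** 2 \
             + torch.clamp(area - a_max, min=0) ** 2
    return cls + 1e-4 * size_pen              # weighting is an assumption
```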

14.
15.
In this paper, an efficient approach is proposed for segmenting individual characters from scanned documents typed on old typewriters. The approach is primarily intended for machine-typed documents but can be used for machine-printed documents as well. It uses a modified projection-profile technique based on a sliding window to obtain information about the document image structure, followed by histogram processing to determine the spaces between lines, words, and characters in the document image. The decision-making logic used during character segmentation is described and represents an integral aspect of the proposed technique. Besides the character segmentation approach, an ultra-fast architecture for geometrical image transformations, used for image rotation during skew correction, is presented, together with a fast implementation using pointer arithmetic and a highly optimized low-level machine routine. The proposed character segmentation approach is semi-automatic and uses threshold values to control the segmentation process. Results for segmentation accuracy show that the proposed approach outperforms state-of-the-art approaches in most cases. The results on time complexity also show that the new technique is faster than state-of-the-art approaches and can process even very large document images in less than one second, making it suitable for real-time tasks. Finally, the performance of the proposed approach is demonstrated visually on original documents authored by Nikola Tesla.
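
The projection-profile core of such line segmentation is easy to sketch; the paper's sliding-window refinement and decision logic are omitted, and the gap threshold is an assumption.

```python
import numpy as np

def segment_lines(binary: np.ndarray, gap_thresh: int = 2):
    """binary: HxW array, nonzero = ink. Returns (top, bottom) row spans."""
    profile = (binary > 0).sum(axis=1)        # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > gap_thresh and start is None:
            start = y                          # entering a text line
        elif count <= gap_thresh and start is not None:
            lines.append((start, y))           # leaving a text line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines
```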

16.
Imaged document text retrieval without OCR
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects, and image features, namely the vertical traverse density (VTD) and horizontal traverse density (HTD), are extracted. An n-gram-based document vector is constructed for each document from these features, and text similarity between documents is measured by the dot product of the document vectors. Testing on seven corpora of imaged textual documents in English and Chinese, as well as on images from the UW1 (University of Washington 1) database, confirms the validity of the proposed method.
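
A sketch in the spirit of the traverse-density features (the exact coding and n-gram construction of the paper may differ): HTD counts ink/background transitions along rows, VTD along columns, giving a compact code per character image; documents are then compared by the dot product of their n-gram histograms.

```python
import numpy as np
from collections import Counter

def traverse_density(char_img: np.ndarray):
    """char_img: HxW binary array. Returns (HTD, VTD) transition counts."""
    b = (char_img > 0).astype(int)
    htd = int(np.abs(np.diff(b, axis=1)).sum())   # row-wise transitions
    vtd = int(np.abs(np.diff(b, axis=0)).sum())   # column-wise transitions
    return htd, vtd

def doc_vector(char_images, n: int = 2) -> Counter:
    """Histogram of n-grams of per-character feature codes."""
    codes = [traverse_density(c) for c in char_images]
    return Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))

def similarity(v1: Counter, v2: Counter) -> int:
    """Dot product of two sparse n-gram vectors."""
    return sum(v1[g] * v2.get(g, 0) for g in v1)
```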

17.
Segmentation is the most challenging part of Arabic handwriting recognition because of the unique characteristics of Arabic writing, which allow the same shape to denote different characters. An Arabic handwriting recognition system cannot succeed without an appropriate segmentation method. In this paper, a very effective and efficient offline Arabic handwriting recognition approach is proposed, consisting of three stages. First, all characters are simplified to single-pixel-thin images that preserve the fundamental writing characteristics. Second, the image pixels are normalized into horizontal and vertical lines only, so that different writing styles can be unified and the shapes of characters standardized. Finally, these orthogonal lines are coded as unique vectors, each vector representing one letter of a word. To evaluate the proposed techniques, we tested our approach on two different datasets. Our experimental results show that the proposed approach outperforms state-of-the-art approaches.

18.
Image segmentation is widely used in document image analysis for extracting printed characters; in map processing for finding lines, legends, and characters; in topological feature extraction for geographical information; and in quality inspection of materials, where defective parts must be delineated, among many other applications. In image analysis, efficient segmentation of images into meaningful objects is important for classification and object recognition. This paper presents two novel methods for image segmentation, based on Fractional-Order Darwinian Particle Swarm Optimization (FODPSO) and Darwinian Particle Swarm Optimization (DPSO), for determining the n-1 optimal thresholds for n-level thresholding of a given image. The efficiency of the proposed methods is compared with other well-known thresholding segmentation methods. Experimental results show that the proposed methods perform better than other methods on a number of different measures.
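
The underlying optimization problem can be shown with a bare-bones PSO that maximizes Otsu's between-class variance over candidate threshold vectors; the Darwinian and fractional-order extensions of the paper are omitted, and all swarm hyperparameters are assumptions.

```python
import numpy as np

def between_class_variance(hist: np.ndarray, thresholds) -> float:
    """Otsu-style fitness for a threshold vector over a 256-bin histogram."""
    p = hist / hist.sum()
    levels = np.arange(256)
    bounds = [0] + sorted(int(t) for t in thresholds) + [256]
    mu_total = (p * levels).sum()
    fitness = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()
        if w > 0:
            mu = (p[lo:hi] * levels[lo:hi]).sum() / w
            fitness += w * (mu - mu_total) ** 2
    return fitness

def pso_thresholds(hist, n_thresh=2, n_particles=30, iters=100):
    """Plain PSO over threshold vectors; returns the best thresholds found."""
    rng = np.random.default_rng(0)
    pos = rng.uniform(1, 255, (n_particles, n_thresh))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pfit = np.array([between_class_variance(hist, x) for x in pos])
    gbest = pbest[pfit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 1, 255)
        fit = np.array([between_class_variance(hist, x) for x in pos])
        improved = fit > pfit
        pbest[improved], pfit[improved] = pos[improved], fit[improved]
        gbest = pbest[pfit.argmax()].copy()
    return sorted(int(t) for t in gbest)
```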

19.
Text segmentation using Gabor filters for automatic document processing
There is considerable interest in designing automatic systems that scan a given paper document and store it on electronic media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text, so the document image has to be segmented to identify the text regions, allowing OCR techniques to be applied only to those regions. In this paper, we present a simple method for document image segmentation in which text regions in a given document image are automatically identified. The method is based on a multichannel filtering approach to texture segmentation: the text in the document is treated as a textured region, while non-text content such as blank space, graphics, and pictures is treated as regions with different textures, so the problem of segmenting document images into text and non-text regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters, which have been used extensively for a variety of texture segmentation tasks, are used to extract texture features for each of these regions. Our segmentation method assumes no a priori knowledge of the content or font styles of the document, and is shown to work even for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images and demonstrate the robustness of this technique. This work was supported by the National Science Foundation under NSF grant CDA-88-06599 and by a grant from E. I. Du Pont De Nemours & Company.
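
A minimal sketch of the multichannel Gabor filtering step using OpenCV's getGaborKernel; the filter-bank parameters are illustrative, not the paper's. The per-pixel response magnitudes serve as texture features for separating text from non-text regions.

```python
import numpy as np
import cv2

def gabor_features(gray: np.ndarray, ksize: int = 31, sigmas=(4.0,),
                   n_thetas: int = 4, lambdas=(8.0, 16.0)) -> np.ndarray:
    """Return an (H, W, n_filters) stack of Gabor response magnitudes."""
    responses = []
    for sigma in sigmas:
        for i in range(n_thetas):
            theta = i * np.pi / n_thetas       # evenly spaced orientations
            for lam in lambdas:
                # Positional args: ksize, sigma, theta, lambd, gamma, psi.
                kern = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                          lam, 0.5, 0)
                resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
                responses.append(np.abs(resp))
    return np.stack(responses, axis=-1)
```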

20.
In this paper, we propose a metric rectification method to restore a document image from a single camera-captured image. The core idea is to construct an isometric image mesh by exploiting the geometry of the page surface and the camera. Our method models the curved page shape with a general cylindrical surface (GCS). Under a few reasonable assumptions, the printed horizontal text lines are shown to be line-convergent symmetric, a property then used to constrain the estimation of the model parameters under perspective projection. We also introduce a paraperspective projection to approximate the nonlinear perspective projection, yielding a set of closed-form formulas for estimating the GCS directrix and the document aspect ratio. Our method provides a straightforward framework for image metric rectification and is insensitive to camera position, viewing angle, and the shape of the document page. To evaluate the proposed method, we conducted comprehensive experiments on both synthetic and real captured images; the results demonstrate the efficiency of our method. A comparative experiment on the public CBDAR2007 dataset shows that our method outperforms state-of-the-art methods in terms of OCR accuracy and rectification errors.
