Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The Imaging Science Journal, 2013, 61(3): 177-182
Abstract

In composite document images, handwritten and printed text often overlaps with printed lines. The problem becomes critical when the lines are faint or broken at multiple positions. Line removal is therefore an unavoidable pre-processing stage in the development of robust object recognisers. Moreover, restoring characters damaged by line removal remains an open problem. This paper presents a new approach that detects and removes unwanted printed lines at any position in a text image without distorting the characters, so that no restoration stage is needed. The proposed technique is based on connected component analysis. Experiments are conducted on single-line images that were scanned and extracted manually from several documents and forms. The approach is shown to be equally suitable for line removal in printed and handwritten text in any language, again without a restoration stage. Promising results are reported in comparison with state-of-the-art methods.

2.
N. Tripathy, U. Pal. Sadhana, 2006, 31(6): 755-769
Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten text recognition process. In this paper we propose a water reservoir concept-based scheme for segmenting unconstrained Oriya handwritten text into individual characters. First, the text image is segmented into lines, and the lines are then segmented into individual words. For line segmentation, the document is divided into vertical stripes. The width of a stripe is calculated by analysing the heights of the water reservoirs obtained from different components of the document. Stripe-wise horizontal histograms are then computed, and the relationship between the peak and valley points of the histograms is used for line segmentation. Text lines are segmented into words based on vertical projection profiles and structural features of Oriya characters. For character segmentation, the isolated and connected (touching) characters in a word are detected first. The touching characters of the word are then segmented using structural, topological and water reservoir concept-based features. Experiments show that the proposed “touching character” segmentation module has 96.7% accuracy for two-character touching strings.
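
The stripe-wise horizontal histograms mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stripe width is a fixed, hypothetical parameter here (the paper derives it from water reservoir heights), the image is a small binary grid with 1 for ink, and the helper names `stripe_row_histograms` and `text_bands` are invented for this example.

```python
def stripe_row_histograms(image, stripe_width):
    """Split a binary image (1 = ink) into vertical stripes and
    count the ink pixels in every row of each stripe."""
    width = len(image[0])
    histograms = []
    for left in range(0, width, stripe_width):
        histograms.append([sum(row[left:left + stripe_width]) for row in image])
    return histograms


def text_bands(histogram):
    """Return (start_row, end_row) pairs, end exclusive, for each run of
    rows that contain ink; the empty rows between runs are the valleys."""
    bands, start = [], None
    for y, count in enumerate(histogram):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(histogram)))
    return bands


# A toy page whose first text line drifts downwards from left to right.
page = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
for histogram in stripe_row_histograms(page, stripe_width=2):
    print(text_bands(histogram))
# prints [(0, 2), (4, 5)] and then [(1, 3), (4, 5)]
```

Working per stripe lets the first band sit at rows 0-1 in the left stripe and rows 1-2 in the right stripe, which a single page-wide histogram would merge into one block.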

3.
4.
Jiang Jichun, Wang Xiaohong, Xu Qinrong. Packaging Engineering, 2014, 35(19): 114-118
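Objective: To extract handwritten text from images of express waybill stubs using an H-Cb hybrid color model, independent of lighting conditions. Methods: The image was first converted from the RGB color space to both the HSI and YCbCr color spaces. The Cb component of an improved YCbCr color space was then fused with the H component of the HSI color space. Finally, the extracted handwritten text was thresholded and inverted, and the results were compared, in terms of segmentation quality and text recognition rate, with those of a threshold segmentation method based on the Cb component of the YCbCr color space and a handwritten-text clustering algorithm based on the Lab color space. Results: The H-Cb hybrid color model detected handwritten text more accurately and achieved a higher recognition rate, reaching 96% under ideal character segmentation conditions. Conclusion: Handwritten text extraction with the H-Cb hybrid color model is only slightly affected by lighting conditions, yields images with little noise and a high recognition rate, and is simple and feasible; it provides support for detection and judgment techniques for color images.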
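
As a rough illustration of the H component used in this abstract, the sketch below computes the hue of a single RGB pixel with the standard geometric HSI formula. It covers only the RGB-to-H step; the abstract does not say how H and Cb are fused or how the YCbCr space is "improved", so neither is attempted here. The function name `hsi_hue` is hypothetical.

```python
import math


def hsi_hue(r, g, b):
    """Hue in degrees for an RGB pixel, using the geometric HSI formula.
    Returns None for gray pixels (r == g == b), where hue is undefined."""
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return None
    # Clamp to guard against floating-point values just outside [-1, 1].
    ratio = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(ratio))
    return 360.0 - theta if b > g else theta


print(hsi_hue(255, 0, 0))      # pure red: 0.0
print(hsi_hue(0, 0, 255))      # pure blue: about 240
print(hsi_hue(128, 128, 128))  # gray: None
```

Hue depends on the ratios between channels rather than on overall brightness, which is one plausible reason a hue-based component is less sensitive to lighting than a raw channel value.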

5.
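Segmentation of handwritten text on express waybills based on the YCbCr color space   Total citations: 3 (self-citations: 2, citations by others: 1)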
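Objective: To extract handwritten text from express waybill images in the YCbCr color space by combining the Cb component with threshold segmentation. Methods: The image was first converted from the RGB color space to the YCbCr color space, threshold segmentation was then applied to the Cb component image, and finally the extracted handwritten text was denoised with a median filter. The results were compared with those obtained by K-means clustering in the YCbCr color space in terms of segmentation quality, segmentation time and text recognition rate. Results: The handwritten text extracted with the Cb component was clearer, and the method was faster and achieved a higher recognition rate; the average processing time per waybill image was 1.36 s and the recognition rate was 89%. Conclusion: Using the Cb component alone is sufficient to extract handwritten text well, and the algorithm is simple and feasible.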
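
The three steps named in this abstract (RGB to Cb, thresholding, median filtering) can be sketched in plain Python as below. Several things are assumptions rather than details from the paper: the conversion uses the full-range ITU-R BT.601 coefficients familiar from JPEG/JFIF, the handwriting is assumed to be blue ink (high Cb), the threshold of 150 is arbitrary, and the function names are hypothetical.

```python
from statistics import median


def rgb_to_cb(r, g, b):
    """Cb chroma of an 8-bit RGB pixel (full-range BT.601, as in JFIF)."""
    return 128 - 0.168736 * r - 0.331264 * g + 0.5 * b


def cb_mask(pixels, threshold):
    """Binary mask: 1 where Cb exceeds the threshold (assumed blue ink)."""
    return [[1 if rgb_to_cb(*p) > threshold else 0 for p in row] for row in pixels]


def median_filter_3x3(mask):
    """3x3 median filter on the interior pixels; border pixels are copied."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


WHITE, BLUE = (255, 255, 255), (0, 0, 255)
pixels = [[WHITE] * 3 for _ in range(3)]
pixels[1][1] = BLUE                      # one isolated blue speck
mask = cb_mask(pixels, threshold=150)
print(mask)                              # the speck passes the Cb threshold
print(median_filter_3x3(mask))           # the median filter removes it
```

White and black pixels both have Cb of about 128, so a threshold above that value separates bluish ink from paper and from black print in this simplified setting.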

6.
The present paper proposes a novel algorithm for recognition of handwritten digits. The digits are classified into two groups: one group consists of digits with blobs, with or without stems, and the other of digits with stems only. The blobs are identified using a new concept called morphological region filling. This eliminates the problem of finding the size of blobs and their structuring elements. The digits with blobs and stems are identified by a new concept called ‘connected component’. This method completely eliminates the complex process of recognizing horizontal or vertical lines and the property called ‘concavities’. The digits with only stems are recognized by extending stems into blobs using the connected component approach of morphology. The method has been applied to and tested on various handwritten digits from the modified NIST (National Institute of Standards and Technology) handwritten digit database (MNIST), and the success rate is reported. The method is also compared with various existing methods.
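
One simple way to detect blobs (enclosed loops) by region filling is sketched below. It is an illustrative stand-in, not the paper's algorithm: the background is flood-filled from the image border, and any background region that the fill cannot reach is counted as a hole. The grid encoding (1 = ink, 0 = background), 4-connectivity for the background, and the names `flood_fill` and `count_holes` are assumptions.

```python
from collections import deque


def flood_fill(grid, start, seen):
    """Mark every background pixel 4-connected to `start` as seen."""
    height, width = len(grid), len(grid[0])
    queue = deque([start])
    seen.add(start)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and grid[ny][nx] == 0 and (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))


def count_holes(grid):
    """Count background regions that are fully enclosed by ink."""
    height, width = len(grid), len(grid[0])
    seen = set()
    # Step 1: fill the outer background, starting from every border pixel.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and grid[y][x] == 0 and (y, x) not in seen:
                flood_fill(grid, (y, x), seen)
    # Step 2: each remaining unfilled background region is one hole.
    holes = 0
    for y in range(height):
        for x in range(width):
            if grid[y][x] == 0 and (y, x) not in seen:
                holes += 1
                flood_fill(grid, (y, x), seen)
    return holes


zero = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
eight = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
one = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(count_holes(zero), count_holes(eight), count_holes(one))  # 1 2 0
```

A hole count of this kind separates digits with loops (such as 0, 6, 8, 9) from stem-only digits (such as 1 and 7) without choosing a structuring element size.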

7.
8.
P V S Rao. Sadhana, 1994, 19(2): 257-270

9.
In recent years, deep learning models have become indispensable in several fields such as computer vision, automatic object recognition, and natural language processing. Implementing a robust and efficient handwritten text recognition system remains a challenge for the research community, especially for Arabic, for which comparatively few works have been published. In this work, we present a new and efficient system for offline Arabic handwritten text recognition. Our approach combines a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BLSTM) network, followed by a Connectionist Temporal Classification (CTC) layer. Moreover, during the training phase of the model, we introduce a data augmentation algorithm to improve the quality of the data. Our approach can recognize Arabic handwritten text without segmenting the characters, thus avoiding several problems related to segmentation. To train and evaluate our approach, we used two Arabic handwritten text recognition databases, IFN/ENIT and KHATT. The experimental results show that our approach gives better results than other methods in the literature.
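
The CTC layer is what lets such a model output text without character segmentation. A minimal sketch of best-path (greedy) CTC decoding is given below, assuming the network has already produced one score vector per time step. The alphabet, the blank index of 0, the toy scores and the function name `ctc_greedy_decode` are all hypothetical; the abstract does not state which decoding strategy the authors use, and Latin letters stand in for Arabic characters.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: take the top label per frame,
    merge consecutive repeats, then drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded, previous = [], blank
    for label in best_path:
        if label != blank and label != previous:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Index 0 is the CTC blank; the remaining entries are placeholder characters.
alphabet = ["", "a", "b"]
frame_scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, merged)
]
print(ctc_greedy_decode(frame_scores, alphabet))  # aab
```

The blank between the second and fourth frames is what allows a genuinely doubled character to survive the merge step.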

10.
In recent years, the volume of information in digital form has increased tremendously owing to the growing popularity of the World Wide Web. As a result, techniques for extracting useful information from large collections of data, and particularly documents, have become more necessary and more challenging. Text clustering is one such technique; it consists of dividing a set of text documents into clusters (groups), so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., words) of a document in terms of relevance. However, as documents usually contain a large number of words, some of them may be redundant or irrelevant to the topic under consideration. This can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach is based on two metaheuristic algorithms: a genetic algorithm (GA) and a shuffled frog-leaping algorithm (SFLA). The GA performs feature selection, and the SFLA performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset: the “20Newsgroup” dataset from the University of California Irvine Machine Learning Repository. Overall, after multiple experiments were compared and analyzed, it was demonstrated that using the proposed algorithm on the 20Newsgroup dataset greatly facilitated text document clustering, compared with classical K-means clustering. Nevertheless, this improvement requires longer computational time.

11.
In this article, we developed a Bayesian model to characterize text line and text block structures on document images using the text word bounding boxes. We posed the extraction problem as finding the text lines and text blocks that maximize the Bayesian probability of the text lines and text blocks given the text word bounding boxes. In particular, we derived the so-called probabilistic linear displacement model (PLDM) to model the text line structures from text word bounding boxes. We also developed an augmented PLDM model to characterize the text block structures from text line bounding boxes. By systematically gathering statistics from a large population of document images, we are able to validate our models through experiments and determine the proper model parameters. We designed and implemented an iterative algorithm that used these probabilistic models to extract the text lines and text blocks. The quantitative performances of the algorithm in terms of the rates of miss, false, correct, splitting, merging, and spurious detections of the text lines and text blocks are reported. © 1996 John Wiley & Sons, Inc.

12.
13.
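Research on a handwritten text extraction algorithm based on the Lab color space   Total citations: 3 (self-citations: 1, citations by others: 2)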
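Objective: To study the application of color-space clustering to the extraction of colored handwritten text. Methods: The clustering results for handwritten text images were analysed and compared in the Lab, LUV, YCbCr and YIQ color spaces, and the required handwritten text was extracted by combining spatial-domain filter enhancement with edge detection. Results: For the samples studied, the Lab color space gave good extraction of handwritten text, which benefits subsequent text recognition. Conclusion: Color-space clustering effectively avoids the misclassification caused by the loss of color information in grayscale conversion and, while retaining the speed and simplicity of the original threshold segmentation algorithm, segments color images more accurately.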

14.
15.
The paper discusses the segmentation of words into characters, an essential task in the development of character recognition systems, since poorly segmented characters will not be recognized. The segmentation of offline handwritten Arabic text is particularly challenging because of its cursive nature and the variety of writing styles. In this article, we propose a new approach to segmenting handwritten Arabic characters based on an efficient analysis of the vertical projection histogram. Our approach was tested on a set of handwritten Arabic words from the IFN/ENIT database, and promising results were obtained.
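
A vertical projection histogram counts the ink pixels in each column of a word image, and candidate cut points fall where that count is low. The sketch below shows this basic idea only; the abstract does not describe the authors' analysis rules, so the `max_ink` tolerance (meant to allow cuts through thin connecting strokes), the midpoint rule and the function names are assumptions.

```python
def vertical_projection(image):
    """Number of ink pixels (1s) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def cut_columns(profile, max_ink=0):
    """Middle column of every low-ink run that lies between two inked
    regions. Leading and trailing margins produce no cut."""
    cuts, run_start, seen_ink = [], None, False
    for x, count in enumerate(profile):
        if count <= max_ink:
            if run_start is None:
                run_start = x
        else:
            if run_start is not None and seen_ink:
                cuts.append((run_start + x - 1) // 2)
            run_start = None
            seen_ink = True
    return cuts


word = [
    [1, 1, 0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
]
profile = vertical_projection(word)
print(profile)                          # [3, 2, 0, 0, 0, 3, 1, 3]
print(cut_columns(profile))             # [3]
print(cut_columns(profile, max_ink=1))  # [3, 6]
```

With `max_ink=0` only the clear gap is cut; raising it to 1 also cuts through the single-pixel stroke at column 6, which is the situation that cursive scripts create.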

16.
A new approach for protection of parallel transmission lines is presented using a time-frequency transform known as the S-transform that generates the S-matrix during fault conditions. The S-transform is an extension of the wavelet transform and provides excellent time localisation of voltage and current signals during fault conditions. The change in energy is calculated from the S-matrix of the current signal using signal samples for a period of one cycle. The change in energy in any of the phases of the two lines can be used to identify the faulty phase based on some threshold value. Once the faulty phase is identified the differences in magnitude and phase are utilised to identify the faulty line. For similar types of simultaneous faults on both the lines and external faults beyond the protected zone, where phasor comparison does not work, the impedance to the fault point is calculated from the estimated phasors. The computed phasors are then used to trip the circuit breakers in both lines. The proposed method for transmission-line protection includes all 11 types of shunt faults on one line and also simultaneous faults on both lines. The robustness of the proposed algorithm is tested by adding significant noise to the simulated voltage and current waveforms of a parallel transmission line. A laboratory power network simulator is used for testing the efficacy of the algorithm in a more realistic manner.

17.
This paper presents a handwritten document recognition system based on a convolutional neural network. Handwritten document recognition is rapidly attracting the attention of researchers because of its promise as an assistive technology for visually impaired users. The technology is also helpful for automatic data entry systems. For the proposed system, a dataset of English handwritten character images was prepared. The system was trained on a large set of sample data and tested on sample images of user-defined handwritten documents. Multiple experiments in this research yielded good recognition results. The system first performs image pre-processing to prepare the data for training with a convolutional neural network. The input document is then segmented by line, word and character segmentation. The system achieves an accuracy of up to 86% in character segmentation. The segmented characters are then sent to a convolutional neural network for recognition. The recognition and segmentation technique proposed in this paper gives acceptably accurate results on the given dataset. Accuracy reaches up to 93% during convolutional neural network training and decreases slightly to 90.42% on validation.

18.
Changes in regulations and the opening of power markets have led to large amounts of power being transferred across transmission lines, with frequent changes in loading conditions based on market price. Since conventional distance relays may mistake a power swing for a fault, tripping caused by such a malfunction would have serious consequences for power system stability. A frequency domain approach for digital relaying of transmission line faults, mitigating the adverse effects of power swing on conventional distance relaying, is presented. A combined wavelet-neuro-fuzzy approach for fault location is also presented. It differs from conventional algorithms, which are based on deterministic computations on a well-defined model for transmission line protection. The wavelet transform captures the dynamic characteristics of fault signals using wavelet multi-resolution analysis (MRA) coefficients. The fuzzy inference system (FIS) and the adaptive-neuro-fuzzy inference system (ANFIS) are both used to extract important features from wavelet MRA coefficients and thereby to reach conclusions regarding fault location. Computer simulations using MATLAB have been conducted for a 300 km, 400 kV line and results indicate that the proposed localisation algorithm is immune to effects of fault inception, angle and distance. The results contained here validate the superiority of the ANFIS approach over the FIS for fault location.

19.
20.