Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
2.
The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover, in graphical documents, characters in a single touching string sometimes have different orientations, and segmenting such strings is more challenging. In this paper, we present a scheme for segmenting English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a large cavity region in the background. Using convex hull information, we first use this background information to find initial points for segmenting a touching string into possible primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to obtain the optimum segmentation. A dynamic programming algorithm is applied for this purpose, using the total likelihood of characters as the objective function. An SVM classifier is used to find the likelihood of a character. To handle multi-oriented touching strings, the features used in the SVM are invariant to character orientation. Experiments were performed on different databases of real and synthetic touching characters, and the results show that the method is efficient in segmenting touching characters of arbitrary orientations and sizes.
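The merging step in the abstract above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' implementation: `best_merge`, the `score` callback (standing in for a classifier's log-likelihood of a merged group) and the `max_span` limit are all hypothetical names introduced here.

```python
from typing import Callable, List, Tuple


def best_merge(n_primitives: int,
               score: Callable[[int, int], float],
               max_span: int = 3) -> Tuple[float, List[Tuple[int, int]]]:
    """Split primitives 0..n-1 into consecutive groups with maximum total score.

    score(i, j) is a hypothetical log-likelihood that primitives i..j-1,
    merged together, form one character (an SVM could supply it).
    max_span (an assumption) caps how many primitives one character may use.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n_primitives + 1)  # best[j]: best total for first j primitives
    back = [0] * (n_primitives + 1)        # back[j]: start index of the last group
    best[0] = 0.0
    for j in range(1, n_primitives + 1):
        for i in range(max(0, j - max_span), j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j] = candidate
                back[j] = i
    # Walk the back-pointers to recover the chosen groups.
    groups: List[Tuple[int, int]] = []
    j = n_primitives
    while j > 0:
        groups.append((back[j], j))
        j = back[j]
    groups.reverse()
    return best[n_primitives], groups
```

Each returned pair `(i, j)` means primitives `i` to `j - 1` are merged into one character. Summing log-likelihoods is one plausible reading of "total likelihood"; the abstract does not specify the exact form.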

3.
4.
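In continuous handwritten Chinese, some characters have radicals written far apart, and adjacent characters may touch or overlap. To handle this, an evolutionary method is presented that extracts individual characters based on recognition scores. The stroke sequence of a text line is binary-encoded, and an improved genetic algorithm carries out the evolution. The strokes corresponding to a run of consecutive 0s or 1s in a chromosome form a candidate character. The Hanwang handwritten single-character recognizer supplies recognition scores for the candidates, and the optimization objectives are a smaller number of characters and a larger total recognition score. The mutation and crossover probabilities of the genetic algorithm are generated adaptively. Test results show that the method segments continuous handwritten Chinese well.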
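The binary encoding described in the abstract above can be sketched as follows: each bit belongs to one stroke, and every run of equal bits is read as one candidate character. The function names, the `recognize` callback and the weighted-sum fitness with `count_penalty` are assumptions made for illustration; the abstract only states the two objectives (fewer characters, higher total score), not how they are combined.

```python
from itertools import groupby
from typing import Callable, List


def decode_chromosome(bits: List[int]) -> List[List[int]]:
    """Turn a chromosome into groups of stroke indices.

    Each maximal run of identical bits (0s or 1s) is one candidate character.
    """
    groups: List[List[int]] = []
    index = 0
    for _, run in groupby(bits):
        length = len(list(run))
        groups.append(list(range(index, index + length)))
        index += length
    return groups


def fitness(bits: List[int],
            recognize: Callable[[List[int]], float],
            count_penalty: float = 0.5) -> float:
    """Hypothetical fitness: total recognition score minus a per-character penalty.

    recognize(group) stands in for a single-character recognizer's score.
    """
    groups = decode_chromosome(bits)
    return sum(recognize(g) for g in groups) - count_penalty * len(groups)
```

For example, the chromosome `[0, 0, 1, 1, 1, 0, 1]` decodes to four candidate characters made of strokes `[0, 1]`, `[2, 3, 4]`, `[5]` and `[6]`.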

5.
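Segmenting and recognizing lines of touching and broken characters is one of the main difficulties in many practical OCR applications. For printed numeral lines with touching and broken characters, this paper proposes a segmentation and recognition scheme based on the Viterbi algorithm, organized as a hierarchy of two segmentation-recognition passes. In the second pass, non-straight cutting paths are first found within the candidate cut-point regions by a Viterbi search that combines gray-scale image and binary contour information, which yields valid segmentation paths. Then, using the confidence output by the classifier, the Viterbi algorithm merges the candidate image blocks obtained earlier, performing dynamic segmentation and recognition. Experiments on a real financial-bill recognition system show that the proposed method for printed numeral lines copes well with touching and broken characters and improves the recognition rate and robustness of the system.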

6.
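Because some Chinese characters in ancient books and documents tend to touch, a multi-step segmentation method for handwritten Chinese characters in ancient books is proposed. The method keeps the established idea of combining coarse and fine segmentation. First, projection is used for coarse segmentation, dividing the handwritten characters into touching and non-touching classes. Then, for touching strings, the usual serial mode is abandoned: the statistics from coarse segmentation are used directly to set an initial segmentation path and, following the shortest-segmentation-path idea, the path is searched and modified by minimum weight within a local neighborhood of the initial path, yielding the best weighted segmentation path. Experiments show that the method resolves under-segmentation and the segmentation of characters that touch at several points, effectively improves segmentation accuracy, and has low time complexity and high efficiency.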
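The projection-based coarse segmentation mentioned in the abstract above can be sketched like this. It is a minimal illustration, not the paper's code: the image is assumed to be a list of rows of 0/1 values (1 = ink), and the width-ratio rule in `split_by_width` is a hypothetical stand-in for whatever statistics the authors use to separate touching from non-touching pieces.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # [start, end) column range


def projection_segments(image: List[List[int]]) -> List[Segment]:
    """Cut a binary image at columns whose projection (ink count) is zero."""
    if not image:
        return []
    width = len(image[0])
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments: List[Segment] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of inked columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # the run ends at a blank column
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def split_by_width(segments: List[Segment],
                   typical_width: float,
                   ratio: float = 1.5) -> Tuple[List[Segment], List[Segment]]:
    """Assumed rule: segments wider than ratio * typical_width are 'touching'."""
    normal = [s for s in segments if s[1] - s[0] <= ratio * typical_width]
    touching = [s for s in segments if s[1] - s[0] > ratio * typical_width]
    return normal, touching
```

Segments flagged as touching would then go to the fine segmentation stage, where a path search refines the cut.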

7.
Handwritten Chinese character recognition is one of the most challenging areas of human handwriting classification. The main reason is the writing system itself, which encompasses thousands of characters, coupled with high levels of diversity in personal writing styles and attributes. Much of the existing work on both online and off-line handwritten Chinese character recognition has focused on methods that employ feature extraction and segmentation steps. The preprocessed data from these steps form the basis for the subsequent classification and recognition phases. This paper proposes an approach for handwritten Chinese character recognition and classification that uses only an image alignment technique and does not require the aforementioned steps. Rather than extracting features from the image, which often means building models from very large training data, the proposed method uses the mean image transformations as a basis for model building. The use of an image-only model means that no subjective tuning of the feature extraction is required. In addition, by employing a fuzzy-entropy-based metric, the work gains an improved ability to model different types of uncertainty. The classifier is a simple distance-based nearest neighbour classification system based on template matching. The approach is applied to a publicly available real-world database of handwritten Chinese characters and is shown to achieve high classification accuracy and to be robust in the presence of noise.

8.
In this paper, we develop a new method that uses structural features to separate single-touching handwritten numeral strings containing two numerals. A binary image of a single-touching handwritten numeral string is preprocessed with an efficient algorithm for smoothing, linearization and detection of structural points of image contours. The touching region of the string is determined from the distribution of the structural points in the string. A candidate touching point is preselected based on the geometrical information of a special structural point in the touching region. In some cases, the left or right numeral of the string can be recognized, and the recognition information can then be used to correct the position of the candidate touching point. We have tested our method on image samples taken from the U.S. National Institute of Standards and Technology (NIST) database. We used 500 sample images for training and obtained a correct separation rate of 99.1%. For 3287 test samples not used for training, the correct separation rate was 97.2%.

9.
10.
11.
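Based on a comparative study of existing freight-train number segmentation algorithms and related character segmentation algorithms, this paper proposes and implements a new freight-train number segmentation algorithm. Candidate segmentation positions in the number-string image are first determined from upper and lower contour features; then, using character size ratios and the arc features of digits, broken characters are merged and touching characters are segmented again. The method avoids the difficulty that traditional projection-analysis segmentation has with touching characters, and also avoids the influence of noise on connected components. Compared with traditional methods, it is more robust and achieves high accuracy and runtime efficiency, which supports the accuracy and stability of the overall train-number recognition system.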

12.
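In a character recognition system, effective character segmentation is key to recognition. Because the spacing between and within handwritten Chinese characters follows no regular pattern, and characters easily touch or interleave, a multi-step segmentation method is proposed. The method first uses the Viterbi algorithm to cut the original string into mutually disconnected blocks, so that non-touching and interleaved characters are segmented correctly. For wide blocks that contain touching characters, it starts from candidate segmentation points and separates the touching parts with nonlinear segmentation paths. Finally, the A* algorithm is applied to find the globally best segmentation positions, so that over-segmented characters are merged back into complete characters. Experimental results show that the method is feasible and effective for segmenting handwritten Chinese characters.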

13.
With advances in handwriting capturing devices and in the computing power of mobile computers, pen-based Chinese text input is moving from character-based input to sentence-based input. This paper proposes a real-time recognition approach for sentence-based input of Chinese handwriting. The main feature of the approach is a dynamically maintained segmentation-recognition candidate lattice that integrates multiple contexts, including character classification, linguistic context and geometric context. Whenever a new stroke is produced, dynamic text line segmentation and character over-segmentation are performed to locate the position of the stroke in the text lines and update the primitive segment sequence of the page. Candidate characters are then generated and recognized to assign candidate classes, and the linguistic and geometric contexts involving the newly generated candidate characters are computed. The candidate lattice is updated as writing continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the sentence recognition result. Since computing the multiple contexts accounts for most of the computation and is performed during the writing process, the recognition result is obtained immediately after the writing of a sentence is finished. Experiments on CASIA-OLHWDB, a large database of unconstrained online Chinese handwriting, demonstrate the robustness and effectiveness of the proposed approach.

14.
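Handwritten Chinese character recognition is the basis of handwritten Chinese input. Current handwriting input methods on smart devices cannot dynamically adjust the recognition model to a user's writing habits in order to raise the recognition rate. Based on a study of recent deep learning algorithms and training models, a design method is proposed for a personalized handwritten Chinese input system built on real-time collection of the user's handwriting samples. The method takes the collected handwritten characters as incremental samples and retrains the recognition model produced by server-side training, so that the model adapts better to the user's writing habits and the recognition rate of the input system improves. Finally, on the basis of this method and a newly designed deep residual network, comparative experiments on handwritten Chinese character recognition were carried out. The results show that retraining with samples collected in real time raises the recognition rate of the model substantially and better meets users' needs for handwritten Chinese input on smart devices.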

15.
16.
Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines, and different document structures, which makes line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation that solves the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state of the art across the different types and difficulties of the benchmarking data.

17.
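A new method for segmenting touching character images   Total citations: 2, self-citations: 0, citations by others: 2
For the automatic recognition of digits in images sampled from surveillance footage, a new method for segmenting touching character images is proposed. The method judges whether characters touch from the connectivity of the preprocessed binary image, determines candidate cut points in touching character images with the upper-lower contour extremum method, and determines a suitable segmentation route with a bidirectional shortest path. Simulation experiments show that the method effectively solves the segmentation of touching character images.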
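To make the contour-extremum idea in the abstract above concrete, here is a rough sketch that handles only the upper contour. It assumes a binary image given as rows of 0/1 values (1 = ink) and treats columns where the upper contour dips deepest as candidate cut points. The function names and the exact extremum rule are assumptions, and the paper's method also uses the lower contour and a shortest-path step that are not shown.

```python
from typing import List


def upper_contour(image: List[List[int]]) -> List[int]:
    """For each column, the row index of the topmost ink pixel.

    Columns without ink get the image height (below the last row).
    """
    rows, cols = len(image), len(image[0])
    contour = []
    for x in range(cols):
        top = next((y for y in range(rows) if image[y][x]), rows)
        contour.append(top)
    return contour


def valley_columns(contour: List[int]) -> List[int]:
    """Columns where the upper contour reaches a local low point.

    Row indices grow downward, so a valley is a local maximum of the
    contour value; these are the hypothetical candidate cut columns.
    """
    return [x for x in range(1, len(contour) - 1)
            if contour[x] > contour[x - 1] and contour[x] >= contour[x + 1]]
```

In a string of two touching digits, the junction often sits where the upper contour dips between the two glyph tops, which is why such valleys are natural candidates.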

18.
Merged characters are the major cause of recognition errors. We classify the merging relationship between two involved characters into three types: "linear," "nonlinear," and "overlapped." Most segmentation methods handle the first type well; however, their ability to handle the other two types is limited. This weakness results from the linear, usually vertical, cuts that these methods assume for character segmentation. This paper proposes a novel merged character segmentation and recognition method based on forepart prediction, necessity-sufficiency matching and character-adaptive masking. The method uses the information obtained from the forepart of merged characters to predict candidates for the leftmost character, and then applies character-adaptive masking and character recognition to verify the prediction. The arbitrary-shaped cutting path therefore follows the right-hand shape of the leftmost character so as to preserve the shape of the next character. The method handles the first two types well and greatly improves the segmentation accuracy of the overlapped type. The experimental results and the performance comparisons with other methods demonstrate the effectiveness of the proposed method.

19.
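Continuous handwriting recognition is the core of Chinese handwriting input technology, and natural, fast input of Chinese information has long been a goal of pattern recognition and of artificial intelligence more broadly. A recognition method for continuously overlaid handwritten Chinese characters is proposed that effectively overcomes the limits of small screens. The method is based on an integrated segmentation-recognition decoding framework: an over-segmentation algorithm first processes the input writing trajectory; a novel perceptron algorithm then determines the character boundaries; path decoding then uses multiple kinds of context from the character classification model, the geometric model and the language model. To suit different types of mobile terminals, an efficient method for compressing the character classification model is also proposed, which reduces the storage and memory used during character recognition. The method has been deployed on the Android platform and tested in large-scale experiments, which confirm its performance and efficiency.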

20.
Generally speaking, binarizing gray-scale images often loses information that is useful for segmenting touching or overlapping characters. If we analyze the gray-scale images themselves, however, specific topographic features and intensity variations can be observed at the character boundaries. In this paper, we propose a new methodology for character segmentation and recognition that makes the best use of the characteristics of gray-scale images. In the proposed methodology, the character segmentation regions are determined using projection profiles and topographic features extracted from the gray-scale images. A nonlinear character segmentation path in each segmentation region is then found using a multi-stage graph search algorithm. Finally, a recognition-based segmentation method is adopted to confirm the nonlinear segmentation paths and the recognition results. Experiments with various kinds of printed documents indicate that the proposed methodology is very effective for the segmentation and recognition of touching and overlapping characters.
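A multi-stage graph search for a nonlinear cut can be pictured as a row-by-row dynamic program, where each image row is one stage and the path may shift by at most one column between rows. The sketch below is an assumption-laden illustration rather than the paper's algorithm: `min_cost_cut` and the `cost` grid (for example, dark pixels costing more than bright background) are hypothetical.

```python
from typing import List


def min_cost_cut(cost: List[List[float]]) -> List[int]:
    """Return, for each row, the column of the cheapest top-to-bottom path.

    cost[y][x] is an assumed per-pixel cost; between consecutive rows the
    path may move to column x - 1, x or x + 1.
    """
    cols = len(cost[0])
    total = list(cost[0])          # cheapest cost to reach each column so far
    back: List[List[int]] = []     # back[k][x]: previous column for row k + 1
    for row in cost[1:]:
        current, pointers = [], []
        for x in range(cols):
            candidates = [(total[px], px)
                          for px in (x - 1, x, x + 1) if 0 <= px < cols]
            best_cost, best_px = min(candidates)
            current.append(best_cost + row[x])
            pointers.append(best_px)
        total = current
        back.append(pointers)
    # Start from the cheapest end column and follow the pointers upward.
    x = min(range(cols), key=lambda c: total[c])
    path = [x]
    for pointers in reversed(back):
        x = pointers[x]
        path.append(x)
    path.reverse()
    return path
```

Because the path can bend around strokes, it can separate characters that no straight vertical line would split cleanly.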
