1.

付芦静 钱军浩 钟云飞 《计算机工程与应用》 (Computer Engineering and Applications) 2015,51(5):178-182

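Color printed images have rich background colors, and a single Chinese character may consist of several connected components, so text segmentation algorithms based on connected regions cannot extract the text accurately. To address this, a layout segmentation method for color printed images based on Chinese character connected components is proposed. The image is preprocessed with a pyramid-transform inverse halftoning algorithm; image colors are segmented by color sampling and mean shift; text connected components are labeled; Chinese character connected components are rebuilt according to character structure and connected-component properties; and the connection relations among text connected components are analyzed to determine the text direction and segment the text. Experimental results show that the method effectively rebuilds Chinese character connected components and segments text of different fonts, sizes, and colors in color printed images.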

2.

Adaptive document block segmentation and classification (total citations: 3; self-citations: 0; citations by others: 3)

Shih F.Y. Shy-Shyan Chen 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》1996,26(5):797-802

This paper presents an adaptive block segmentation and classification technique for daily-received office documents having complex layout structures such as multiple columns and mixed-mode contents of text, graphics, and pictures. First, an improved two-step block segmentation algorithm is performed based on run-length smoothing for decomposing any document into single-mode blocks. Then, a rule-based block classification is used for classifying each block into the text, horizontal/vertical line, graphics, or picture type. The document features and rules used are independent of character font and size and the scanning resolution. Experimental results show that our algorithms are capable of correctly segmenting and classifying different types of mixed-mode printed documents.
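The run-length smoothing step named in this abstract can be illustrated with a minimal sketch of the classic run-length smoothing algorithm (RLSA). It is not the authors' improved two-step procedure. The function names `smooth_row` and `rlsa`, the 0/1 list-of-lists image representation, and the gap thresholds are assumptions made for illustration only.

```python
def smooth_row(row, max_gap):
    """Fill background runs (0s) of length <= max_gap that lie between two foreground pixels (1s)."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            gap = i - last_fg - 1 if last_fg is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def rlsa(image, h_gap, v_gap):
    """Classic RLSA: smooth rows and columns separately, then AND the two results."""
    horizontal = [smooth_row(row, h_gap) for row in image]
    # Smooth each column by transposing, smoothing as rows, and transposing back.
    columns = [smooth_row(list(col), v_gap) for col in zip(*image)]
    vertical = [list(row) for row in zip(*columns)]
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

With a large horizontal threshold and a smaller vertical one, characters on the same line tend to merge into a single blob, which a later step could then classify as a block. The thresholds depend on scanning resolution and must be chosen per data set.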

3.

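Implementation of a robust multi-font printed English recognition system (total citations: 6; self-citations: 1; citations by others: 5)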

伍振军 丁晓青 《计算机工程与应用》 (Computer Engineering and Applications) 2001,37(20):120-122

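This paper discusses the main problems solved in designing a practical multi-font English recognition system. The system recognizes up to 260 fonts, including italic and bold; its recognition rate on the training set reaches 99%, and its error rate on real text is 56% lower than that of TH-OCR2000. The paper describes in detail the common techniques used for character segmentation of text lines, feature extraction, classifier design, and post-processing, analyzes and compares their characteristics, and proposes some new techniques. The paper offers some guidance for the design of OCR systems.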

4.

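Implementation of a high-performance multi-font printed English recognition system (total citations: 3; self-citations: 0; citations by others: 3)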

陈国平 张明新 付跃文 王劲林 《计算机工程与应用》 (Computer Engineering and Applications) 2006,42(12):183-186

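Improving the recognition rate of low-quality text images is an important direction in current character recognition research. This paper studies in some depth the segmentation of skewed text lines, the segmentation of broken, touching, and overlapping characters, and post-processing, and proposes some new algorithms. The system recognizes up to 260 fonts, including bold and italic, reaches a recognition rate of 98.5% on the training set, and has achieved good results in practical use.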

5.

A segmentation-free approach to recognise printed Sinhala script using linear symmetry

H.L. Premaratne J. Bigun 《Pattern recognition》2004,37(10):2081-2089

6.

Multi-oriented Bangla and Devnagari text recognition

Umapada Pal Partha Pratim Roy Nilamadhaba Tripathy Josep Lladós 《Pattern recognition》2010,43(12):4124-4136

7.

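A character segmentation algorithm based on integrated multi-knowledge decision (total citations: 3; self-citations: 0; citations by others: 3)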

刘刚 丁晓青 彭良瑞 刘长松 《计算机工程与应用》 (Computer Engineering and Applications) 2002,38(17):59-61,72

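In a high-performance printed character recognition system, once single-character recognition is relatively mature, character segmentation becomes the critical step. Character segmentation can be viewed as a decision process over the correct cut positions at character boundaries, and this decision must consider both the local recognition results of the characters and the global context. Based on a study of character segmentation for Chinese, Japanese, and Korean text, this paper proposes a character segmentation algorithm based on the integrated judgment of multiple knowledge sources. The algorithm has been applied successfully in the AsiaOCR project and also handles the mixed English text that is common in East Asian documents. Experimental results show that, compared with earlier algorithms, the new algorithm lowers the segmentation error rate in Chinese, Japanese, and Korean recognition systems by 50% on average.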

8.

Offline Arabic handwriting recognition: a survey (total citations: 1; self-citations: 0; citations by others: 1)

Lorigo LM Govindaraju V 《IEEE transactions on pattern analysis and machine intelligence》2006,28(5):712-724

9.

Segmentation of printed Uyghur text (total citations: 1; self-citations: 0; citations by others: 1)

靳简明 丁晓青 彭良瑞 王华 《中文信息学报》 (Journal of Chinese Information Processing) 2005,19(5):78-85

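The Uyghur script used in Xinjiang, China, borrows the letters of the Arabic script. The way Arabic letters are written makes segmenting and recognizing Uyghur text extremely difficult. Building on connected-component classification, this paper combines horizontal projection with connected-component analysis to segment Uyghur text into lines and words. It then locates each word's baseline, computes the distance between the word contour and the baseline, finds all candidate cut points to over-segment the word, and finally merges the over-segmented characters using rules. Experimental results show a character segmentation accuracy above 99%.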

10.

Layout extraction of mixed mode documents (total citations: 2; self-citations: 0; citations by others: 2)

Frank Hönes Jürgen Lichter 《Machine Vision and Applications》1994,7(4):237-246

Proper processing and efficient representation of the digitized images of printed documents require the separation of the various information types: text, graphics, and image elements. For most applications it is sufficient to separate text and nontext, because text contains the most information. This paper describes the implementation and performance of a robust algorithm for text extraction and segmentation that is completely independent of text orientation and can deal with text in various font styles and sizes. Text objects can be nested in nontext areas, and inverse printing can also be analyzed. It should be mentioned that the classification is based only on rough image features, and individual characters are not recognized. The three main processing steps of the system are the generation of connected components, neighborhood analysis, and generation of text lines and blocks. As output, connected components are classified as text or nontext. Text components are grouped as characters, words, lines, and blocks. Nontext objects are accumulated as a separate nontext block.
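The first of the three processing steps, generation of connected components, can be sketched as a breadth-first flood fill that returns one bounding box per component. This is a generic illustration, not the algorithm of the paper: the function name `connected_components`, the 0/1 list-of-lists image, the choice of 8-connectivity, and the `(top, left, bottom, right)` box format are all assumptions.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected foreground regions."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component and grow it with a breadth-first search.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

A later stage could compare box sizes and distances between neighboring boxes to decide which components are likely text, which is the kind of rough image feature the abstract refers to.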

11.

Multioriented and curved text lines extraction from Indian documents

Pal U. Roy P.P. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(4):1676-1684

There are printed artistic documents where text lines of a single page may not be parallel to each other. These text lines may have different orientations or may be curved. For the optical character recognition (OCR) of these documents, we need to extract such lines properly. In this paper, we propose a novel scheme, mainly based on the concept of water reservoir analogy, to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines. A reservoir is a metaphor to illustrate the cavity region of a character where water can be stored. In the proposed scheme, connected components are first labeled and identified as either isolated or touching. Next, each touching component is classified as either straight type (S-type) or curve type (C-type), depending on the reservoir base-area and envelope points of the component. Based on the type (S-type or C-type) of a component, two candidate points are computed from each touching component. Finally, candidate regions (neighborhoods of the candidate points) of each component are detected, and after analyzing these candidate regions, components are grouped to get individual text lines.

12.

A survey on Arabic character segmentation

Yasser M. Alginahi 《International Journal on Document Analysis and Recognition》2013,16(2):105-126

13.

Efficient skew detection of printed document images based on novel combination of enhanced profiles

A. Papandreou B. Gatos S. J. Perantonis I. Gerardis 《International Journal on Document Analysis and Recognition》2014,17(4):433-454

14.

Extraction of type style-based meta-information from imaged documents

B.B. Chaudhuri U. Garain 《International Journal on Document Analysis and Recognition》2001,3(3):138-149

Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these styles of any font. Important words in a document are sometimes printed in larger size as well. A smart approach for the determination of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized text. Another advantage to identifying word type styles and font size has been discussed in the context of extracting: (i) different logical labels; and (ii) important terms from the document. Experimental results on the performance of the approach on a large number of good quality, as well as degraded, document images are presented. Received July 12, 2000 / Revised October 1, 2000

15.

Texture for script identification (total citations: 2; self-citations: 0; citations by others: 2)

Busch A Boles WW Sridharan S 《IEEE transactions on pattern analysis and machine intelligence》2005,27(11):1720-1732

16.

Segmentation of multi-font printed Uyghur text

哈力木拉 《中文信息学报》 (Journal of Chinese Information Processing) 1997,11(3):36-41

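In many character recognition systems, character segmentation is part of the preprocessing stage; its purpose is to separate letter images from the text image so that each segmented letter can then be recognized. For scripts whose letters are joined, character segmentation is especially important, because its accuracy directly affects character recognition. Uyghur has this pronounced joined-letter property. This paper discusses a method based on extracting projection features that achieves line, word, and character segmentation of multi-font Uyghur text.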
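Projection-based segmentation of the kind this abstract names can be sketched as follows: sum the foreground pixels along each row (or column) and cut wherever the profile drops to empty. This is a generic illustration rather than the paper's method, which must also handle joined letters. The names `horizontal_profile`, `vertical_profile`, and `split_runs`, the 0/1 list-of-lists image, and the `min_ink` threshold are assumptions.

```python
def horizontal_profile(image):
    """Number of foreground pixels in each row; used for line segmentation."""
    return [sum(row) for row in image]


def vertical_profile(image):
    """Number of foreground pixels in each column; used within a single text line."""
    return [sum(col) for col in zip(*image)]


def split_runs(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, of consecutive entries with >= min_ink pixels."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                    # a run of inked rows/columns begins
        elif value < min_ink and start is not None:
            runs.append((start, i))      # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

Applying `split_runs` to the horizontal profile yields candidate text lines; applying it to the vertical profile of one line yields candidate words or isolated characters. For joined scripts such as Uyghur, gaps alone are not enough, so a real system would need additional cut-point criteria.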

17.

Integrating knowledge sources in Devanagari text recognition system

Bansal V. Sinha R.M.K. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2000,30(4):500-505

The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

18.

Analysis and recognition of highly degraded printed characters

Anna Tonazzini Stefano Vezzosi Luigi Bedini 《International Journal on Document Analysis and Recognition》2003,6(4):236-247

This paper proposes an integrated system for the processing and analysis of highly degraded printed documents for the purpose of recognizing text characters. As a case study, ancient printed texts are considered. The system is comprised of various blocks operating sequentially. Starting with a single page of the document, the background noise is reduced by wavelet-based decomposition and filtering, the text lines are detected, extracted, and segmented by a simple and fast adaptive thresholding into blobs corresponding to characters, and the various blobs are analyzed by a feedforward multilayer neural network trained with a back-propagation algorithm. For each character, the probability associated with the recognition is then used as a discriminating parameter that determines the automatic activation of a feedback process, leading the system back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition cannot be relied on and makes use of blind deconvolution and MRF-based segmentation techniques whose high complexity is greatly reduced when applied to a few subimages of small size. The experimental results highlight that the proposed system performs a very precise segmentation of the characters and then a highly effective recognition of even strongly degraded texts.

19.

An optical character recognition system for printed Telugu text

C. Vasantha Lakshmi C. Patvardhan 《Pattern Analysis & Applications》2004,7(2):190-204

20.

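Design of a Mongolian operating system under the internationalization standard framework (total citations: 2; self-citations: 0; citations by others: 2)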

芮建武 吴健 孙玉芳 《计算机研究与发展》 (Journal of Computer Research and Development) 2006,43(4):716-721

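A Mongolian operating system is relatively complex to implement for two reasons: (1) traditional Mongolian is written vertically from top to bottom, with columns arranged from left to right; (2) Mongolian characters take presentation glyphs that vary in quite complex ways across text contexts. Based on the internationalization architecture of the operating system, the paper describes the difficulties and technical solutions in implementing a traditional Mongolian operating system from several aspects: the Mongolian character set, the contextual shaping of Mongolian characters, the vertical display of Mongolian text, and the distinctive Mongolian graphical user interface. It briefly introduces an implementation based on the QtKDE desktop system and finally raises the problems that still need to be solved in implementing a Mongolian operating system.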