Found 20 similar documents (search time: 31 ms)
1.
Since computers became tools for data sharing and information exchange, uniform computer fonts have stripped text of the diversity and discreteness of handwriting. Text is a crucial factor in the spread of culture and civilization. After digitization, many electronic books have lost the characteristic fonts of the original ancient books, along with their cultural background and historical significance. One example is the carved typeface of Tibetan culture, with its diversity and discreteness. To address this problem, a method for digitizing the engraved fonts in ancient Tibetan books is proposed. First, the projection method and the connected-domain method are used to segment the ancient book image. Second, the GIST feature algorithm is used for image text recognition. Third, the SIFT feature algorithm is used to classify image font styles, yielding the different styles of carved fonts in the ancient books. A font diversity expression algorithm is proposed to reproduce the diversity and discreteness of carved fonts in ancient books. The research aims at the inheritance and protection of engraved fonts, which is significant for cultural research and heritage.
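The projection method mentioned in this abstract can be illustrated with a minimal sketch. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background; the function names and the toy `page` are hypothetical and are not taken from the paper.

```python
def horizontal_projection(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def vertical_projection(image):
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def split_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ends at a blank row/column
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Toy binary page (assumed data): two text lines separated by a blank row.
page = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
]
line_spans = split_runs(horizontal_projection(page))

# Segment the first line into character candidates by column projection.
first_line = page[line_spans[0][0]:line_spans[0][1]]
char_spans = split_runs(vertical_projection(first_line))
```

Row projection finds text lines and column projection then splits each line. Glyphs that touch or overlap would need the connected-domain step that the abstract also names, which this sketch does not cover.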
2.
3.
4.
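The orientation of a Gabor filter strongly affects font recognition results. Because font texture differs from natural texture, existing Gabor filter orientation parameters are not suited to extracting effective features from font texture. Given the variability of font texture, this paper proposes using a genetic algorithm to optimize the filter orientation parameters by learning from font textures, so that the parameters adapt to the characteristics of font texture and raise the recognition rate. Tests on 899 font texture blocks from 4 commonly used fonts show that the genetic algorithm can find orientation parameters suited to font recognition, and that the new parameters reduce recognition time and improve the font recognition rate.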
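To show where the orientation parameter enters, here is a minimal sketch of the real part of a standard 2-D Gabor kernel. The default values for `sigma`, `wavelength`, `gamma`, and `psi` are illustrative assumptions, not the paper's settings, and the genetic search over `theta` is not implemented here.

```python
import math


def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel of odd width `size`, oriented at `theta` radians.

    All default parameter values are assumptions chosen for illustration.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta: this is the orientation parameter.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small bank at four conventional orientations; an optimizer such as a
# genetic algorithm would search over these angles instead of fixing them.
bank = [gabor_kernel(5, k * math.pi / 4) for k in range(4)]
```

In a search like the one the abstract describes, each candidate set of angles would be scored by the recognition rate it yields on training texture blocks; that fitness function depends on the classifier and data and is left out.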
5.
Font recognition is useful for improving the accuracy and speed of optical text recognition systems and for restoring documents' original formats. This paper addresses a need for Arabic font recognition research by introducing an Arabic font recognition database consisting of 40 fonts, 10 sizes (ranging from 8 to 24 points) and 4 styles (viz. normal, bold, italic, and bold-italic). The database is split into three sets (viz. training, validation, and testing). The database is freely available to researchers. Moreover, we introduce a baseline font recognition system for benchmarking purposes, and report identification rates on our KAFD database and the Arabic Printed Text Image (APTI) database with 20 and 10 fonts, respectively. The best recognition rates are achieved using log-Gabor filters.
6.
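To reveal the intrinsic relationship between Chinese typefaces and the emotional imagery they evoke in audiences, this work takes a cognitive computing perspective and explores a gray-box association model of "design features, structural indicators, imagery" for predicting multiple imagery dimensions of Chinese typefaces. First, following the principles of cognitive computing, typeface structure rules are abstracted as knowledge and described quantitatively with production rules. Cognitive computing formulas are proposed for four typeface structure indicators (weight, center of gravity, character face, and counter), turning unordered shape information into structured, ordered information. Then, based on the nonlinear coupling that characterizes the cognition of Chinese typeface imagery, a method is developed that uses multi-output least-squares support vector regression (MLS-SVR) to predict multiple imagery dimensions of Chinese typefaces. The method is applied to predict 3 imagery dimensions of Chinese typefaces, and the experimental results show good predictive performance and accuracy. The model can serve as the fitness function of an intelligent typeface design system and offers a useful reference for developing intelligent typeface design.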
7.
As collections of archived digital documents continue to grow, the maintenance of an archive and the quality of reproduction from the archived format become important long-term considerations. In particular, Adobe's portable document format (PDF) is now an important "final form" standard for archiving and distributing electronic versions of technical documents. It is important that all embedded images in the PDF, and any fonts used for text rendering, should at the very minimum be easily readable on screen. Unfortunately, because PDF is based on PostScript technology, it allows the embedding of bitmap fonts in Adobe Type 3 format as well as higher-quality outline fonts in TrueType or Adobe Type 1 formats. Bitmap fonts do not generally perform well when they are scaled and rendered on low-resolution devices such as workstation screens. The work described here investigates how a plug-in to Adobe Acrobat enables bitmap fonts to be substituted by corresponding outline fonts using a checksum matching technique against a canonical set of bitmap fonts, as originally distributed. The target documents for our initial investigations are those PDF files produced by LaTeX systems when set up in a default (bitmap font) configuration. For all bitmap fonts where recognition exceeds a certain confidence threshold, replacement fonts in Adobe Type 1 (outline) format can be substituted, with consequent improvements in file size, screen display quality and rendering speed. The accuracy of font recognition is discussed together with the prospects of extending these methods to bitmap-font PDF files from sources other than LaTeX. Copyright © 2003 John Wiley & Sons, Ltd.
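The idea of checksum matching against a canonical glyph set with a confidence threshold could be sketched as follows. The abstract does not specify the checksum or the confidence measure, so the MD5 digest, the vote-fraction threshold, the function names, and the toy glyph bytes are all assumptions.

```python
import hashlib


def glyph_checksum(bitmap_bytes):
    """Checksum of one glyph bitmap (MD5 is an assumed choice)."""
    return hashlib.md5(bitmap_bytes).hexdigest()


def build_index(canonical_fonts):
    """Map glyph checksum -> font name for a {font_name: [glyph_bytes, ...]} dict."""
    index = {}
    for name, glyphs in canonical_fonts.items():
        for glyph in glyphs:
            index[glyph_checksum(glyph)] = name
    return index


def identify_font(glyphs, index, threshold=0.8):
    """Return the best-matching font name, or None if confidence is too low.

    Confidence here is the fraction of glyphs whose checksum matches the
    winning font (an assumed definition, for illustration only).
    """
    votes = {}
    for glyph in glyphs:
        name = index.get(glyph_checksum(glyph))
        if name is not None:
            votes[name] = votes.get(name, 0) + 1
    if not votes:
        return None
    best = max(votes, key=votes.get)
    return best if votes[best] / len(glyphs) >= threshold else None


# Toy canonical set: the font names and bitmap bytes are placeholders.
canonical = {"cmr10": [b"\x00\x7e", b"\x18\x24"], "cmbx10": [b"\xff\x81"]}
index = build_index(canonical)
match = identify_font([b"\x00\x7e", b"\x18\x24"], index)
```

An exact checksum only matches glyphs that are byte-identical to the canonical set, which fits bitmap fonts embedded unchanged from their original distribution.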
8.
9.
10.
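Chinese characters come in a rich variety of typefaces, and different typefaces differ markedly in character structure. Current OCR technology focuses on recognizing characters and pays little attention to recognizing fonts. A text-dependent single-character font recognition method is proposed. It uses text-dependent prior information and font structure features, and measures font similarity with a vector space model. Experiments on 66 commonly used Simplified Chinese fonts achieve a good average recognition rate.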
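A vector space model for font similarity can be sketched with cosine similarity. The feature vectors, the character key, and the font names below are hypothetical placeholders; the abstract does not give the paper's actual features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_font(query, templates):
    """Return the font whose template vector is most similar to the query."""
    return max(templates, key=lambda name: cosine_similarity(query, templates[name]))


# Text-dependent prior: the character identity is known, so only templates
# of that character are compared. All vectors below are made-up examples.
templates_by_char = {
    "永": {"SimSun": [1.0, 0.0, 0.2], "KaiTi": [0.1, 1.0, 0.5]},
}
predicted = nearest_font([0.9, 0.1, 0.3], templates_by_char["永"])
```

Restricting the comparison to templates of the same character is what makes the approach text-dependent: structural differences between fonts are compared on identical glyph shapes.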
11.
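Research on Chinese character font recognition based on the Gabor transform   Cited by: 2 (self-citations: 0, citations by others: 2)
Based on an analysis of the characteristics of Chinese character fonts, this paper describes a method for Chinese font recognition that uses Gabor filters to extract global features through texture analysis. Experimental results show that the method is feasible and effective.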
12.
13.
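Research on Chinese character font recognition based on texture features   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper describes the importance of font recognition and the problems that remain to be solved, and proposes a font recognition method that uses Gabor filters to extract texture features from the page layout. It focuses on filter design, texture feature extraction, and the font recognition process. The method is content-independent and requires no analysis of fine local features, so it can cope with the poor print quality and frequent distortion of real page samples. Applied to the recognition of common fonts, it achieves good results.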
14.
Avi-Itzhak H.I., Diep T.A., Garland H. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(2): 218-224
Optical character recognition (OCR) refers to a process whereby printed documents are transformed into ASCII files for the purpose of compact storage, editing, fast retrieval, and other file manipulations through the use of a computer. The recognition stage of an OCR process is made difficult by added noise, image distortion, and the various character typefaces, sizes, and fonts that a document may have. In this study a neural network approach is introduced to perform high-accuracy recognition on multi-size and multi-font characters; a novel centroid-dithering training process with a low noise-sensitivity normalization procedure is used to achieve high-accuracy results. The study consists of two parts. The first part focuses on single-size and single-font characters, and a two-layered neural network is trained to recognize the full set of 94 ASCII character images in 12-pt Courier font. The second part trades accuracy for additional font and size capability, and a larger two-layered neural network is trained to recognize the full set of 94 ASCII character images for all point sizes from 8 to 32 and for 12 commonly used fonts. The performance of these two networks is evaluated based on a database of more than one million character images from the testing data set.
15.
Ding X, Chen L, Wu T. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(2): 195-204
A novel algorithm for font recognition on a single unknown Chinese character, independent of the identity of the character, is proposed in this paper. We employ a wavelet transform on the character image and extract wavelet features from the transformed image. After a Box-Cox transformation and LDA (linear discriminant analysis) process, the discriminating features for font recognition are extracted and classified through a MQDF (Modified quadric distance function) classifier with only one prototype for each font class. Our experiments show that our algorithm can achieve a recognition rate of 90.28 percent on a single unknown character and 99.01 percent if five characters are used for font recognition. Compared with existing methods, all of which are based on a text block, our method can provide a higher recognition rate and is more flexible and robust, since it is based on a single unknown character. Additionally, our method demonstrates that it is possible to extract subtle yet discriminative signals embedded in a much larger noisy background.
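The Box-Cox step in this pipeline is a standard power transform, which can be written as a short sketch. The exponent value used in the example is an arbitrary assumption; the abstract does not state the value the authors chose, and the wavelet, LDA, and MQDF stages are not shown.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)           # limiting case as lam -> 0
    return (x ** lam - 1) / lam


# Apply the transform element-wise to a made-up positive feature vector.
features = [1.0, 4.0, 9.0]
transformed = [box_cox(v, 0.5) for v in features]
```

The transform is commonly applied to make skewed feature distributions closer to Gaussian before a discriminant analysis that assumes normality.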
16.
17.
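VxWorks 5.5 displays text using bitmap (dot-matrix) font libraries. Such libraries are simple in design and widely used, but each one covers only a single size of a single typeface, so this traditional display method falls short when the typeface to be used is not known in advance. Combining TrueType fonts with the FreeType font engine makes it possible to display multiple typefaces at arbitrary sizes. The paper introduces the basic principles of TrueType and FreeType, and describes how to combine WindML, FreeType, and TrueType under VxWorks 5.5 to display vector fonts.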
18.
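For the classification task that arises in preserving and passing on traditional Chinese embroidery craft, traditional embroidery classification methods are time-consuming, have low accuracy, and require a large workforce with specialist knowledge. An embroidery image classification and recognition method based on an improved DenseNet is designed. An embroidery image classification dataset is constructed. Texture features are extracted using local binary patterns (LBP), Canny edge extraction, and Gabor filtering, and the different feature maps are merged with the original image into four- to six-channel image datasets that are fed to the network for ablation experiments, which widens the dataset. To stabilize training and speed up loss convergence, an SPP (spatial pyramid pooling) structure is introduced to optimize the model. To improve classification accuracy, the Leaky ReLU activation function is used in place of ReLU. Experimental results show that the method based on the improved DenseNet can solve the problems of traditional embroidery image classification methods. Compared with the baseline model, the improved model raises accuracy by 8.1%, reaching 97.39%.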
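Two of the building blocks named in this abstract, the local binary pattern and the Leaky ReLU activation, are simple enough to sketch directly. The neighbor ordering, the `>=` comparison, and the slope of 0.01 are common conventions assumed here, not details taken from the paper.

```python
def lbp_code(patch):
    """Basic 8-neighbor LBP code for the center pixel of a 3x3 patch.

    Neighbors are read clockwise from the top-left corner (an assumed
    ordering); a neighbor contributes a 1 bit if it is >= the center.
    """
    center = patch[1][1]
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else negative_slope * x


# Made-up 3x3 grayscale patch.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch)
```

Computing `lbp_code` at every interior pixel gives a texture map that could be stacked with the original image as an extra channel, in the spirit of the multi-channel input the abstract describes.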
19.
Yanhong Li, Lopresti D., Nagy G., Tomkins A. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(2): 99-107
Considers the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. The authors introduce a rigorous and more pragmatic definition of when a model is accurate: they say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. The authors describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.
20.
Texture for script identification   Cited by: 2 (self-citations: 0, citations by others: 2)
Busch A, Boles WW, Sridharan S. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(11): 1720-1732