Similar Literature
20 similar documents found
1.
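Research on Recognition Technology for Printed Tibetan Text   Total citations: 2 (self-citations: 0, by others: 2)
Because of the special structure of Tibetan characters, traditional character recognition methods achieve low accuracy and poor results on them. Based on an in-depth analysis of the features of printed Tibetan text, the paper proposes a series of methods that improve recognition accuracy under interference, including a locally adaptive binarization algorithm, connected-component-based segmentation, and grid-based fuzzy stroke feature extraction. Experimental results show that these methods improve both the recognition accuracy and the noise robustness of a printed Tibetan text recognition system.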
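The abstract does not specify which locally adaptive binarization algorithm is used. As an illustration only, the sketch below applies a common local-mean threshold: a pixel is marked as ink when it is darker than the mean of its surrounding window by more than a fixed offset. The `window` and `offset` defaults are assumptions, and an integral image keeps each window sum constant-time.

```python
def adaptive_binarize(img, window=15, offset=10):
    """Local-mean thresholding (illustrative, not the paper's exact method).

    img: list of rows of grayscale values 0..255 (0 = black).
    Returns a same-sized grid with 1 for ink and 0 for background.
    """
    h, w = len(img), len(img[0])
    # Integral image with an extra leading row/column of zeros.
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            # Ink = noticeably darker than the local neighbourhood.
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out
```

Unlike a single global threshold, a local threshold can keep strokes on both a bright and a dim background, which is the kind of interference the abstract targets.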

2.
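Normalization of Tibetan Character Units (Zidings) in Online Handwritten Tibetan Recognition   Total citations: 2 (self-citations: 0, by others: 2)
Through an in-depth study of several normalization algorithms, online handwritten Tibetan zidings (stacked character units) are normalized according to their characteristics, revealing the intrinsic nature of the different algorithms. The various kinds of noise in online handwritten Tibetan recognition are also analyzed and removed with appropriate methods. Together, these steps yield some improvement in the recognition rate of the online handwritten Tibetan recognition system.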
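The abstract does not say which normalization algorithms were compared. A minimal sketch of one common choice, aspect-ratio-preserving linear normalization, is shown below: all pen points are translated and scaled so that the character's bounding box fits a fixed square, with the shorter side centred. The 64-unit target size and the stroke representation (a list of strokes, each a list of `(x, y)` points) are assumptions.

```python
def normalize_strokes(strokes, size=64):
    """Linear size normalization of an online handwritten character.

    strokes: list of strokes, each a list of (x, y) pen points.
    Returns new strokes scaled into a size x size box, aspect ratio kept.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    w, h = max(xs) - min_x, max(ys) - min_y
    # The longer side is scaled to `size`; guard against a single-point character.
    scale = size / max(w, h, 1e-9)
    # Centre the shorter side inside the box.
    off_x = (size - w * scale) / 2
    off_y = (size - h * scale) / 2
    return [
        [((x - min_x) * scale + off_x, (y - min_y) * scale + off_y) for x, y in stroke]
        for stroke in strokes
    ]
```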

3.
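The mobile Internet era has arrived and mobile devices have become important tools for communication, yet the transmission of ethnic-minority cultural information has been limited by the lack of input methods; promoting and developing minority cultures must start with the most basic tool, the text input method. To address this, the paper studies the characteristics of Tibetan, the composition of Tibetan characters, and Tibetan character encoding schemes in China and abroad, and on that basis analyzes the features, components, and working principles of the Android input method framework (IMF). Finally, following the Android IMF, a Tibetan input method was developed with various development tools and methods and tested on Android devices. The paper describes the design ideas and implementation process of the Tibetan input method on Android, together with its design principles and workflow.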

4.
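No writing system is perfect when it is first created; it must be refined through use and gradually standardized and improved, and Tibetan script went through the same development. The word for "Tibetan script" means "the script of the Tibetan people". Tibetan has a long history as the written means of communication of the Tibetan people. This paper focuses on how frequently Tibetan characters occur in primary and secondary school textbooks, so that educators can get a basic picture of the level of mastery of Tibetan characters expected at each stage.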
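Frequency counts of this kind start from segmented text. In Unicode Tibetan, syllables are separated by the tsheg (U+0F0B) and clauses end with a shad (U+0F0D), so a minimal syllable-frequency count can split on those marks. This is an illustrative sketch that assumes Unicode input and counts syllables; it does not reproduce the paper's procedure or its exact notion of a "character".

```python
import re
from collections import Counter

# Tsheg (U+0F0B), non-breaking tsheg (U+0F0C), shad (U+0F0D), nyis shad (U+0F0E)
# and whitespace are all treated as syllable boundaries in this simplified splitter.
SYLLABLE_SEPARATORS = re.compile(r"[\u0F0B\u0F0C\u0F0D\u0F0E\s]+")

def syllable_frequencies(text: str) -> Counter:
    """Count how often each Tibetan syllable occurs in `text`."""
    return Counter(s for s in SYLLABLE_SEPARATORS.split(text) if s)
```

Running this per textbook grade would give a per-stage frequency table of the kind the abstract describes.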

5.
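This paper studies text extraction from images. The image is first read and preprocessed; features are then extracted from the text using grid features and directional features. To improve the reliability of the recognition system, a multi-classifier ensemble is adopted, in which several complementary classifiers are combined to improve on the performance of any single classifier.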
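The abstract does not say how the outputs of the complementary classifiers are combined. A simple, commonly used rule is majority voting, sketched below; the tie-breaking rule (prefer the earliest classifier in the list) and the representation of classifiers as plain callables are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label predicted by the
    earliest classifier in the list (assumed to be the most reliable)."""
    if not predictions:
        raise ValueError("no predictions to vote on")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def ensemble_predict(classifiers: Sequence[Callable], features) -> Hashable:
    """Run every classifier on the same feature vector and combine by voting."""
    return majority_vote([clf(features) for clf in classifiers])
```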

6.
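Recently, the world's first recognition system that supports documents in multiple Chinese ethnic-minority scripts on a unified platform passed expert appraisal at Tsinghua University. The system is the first to achieve computer recognition of Mongolian, Tibetan, Uyghur, Kazakh, Korean, and Kyrgyz documents (including mixed Chinese and English text) on a single platform, and its main technical indicators reached an internationally leading level.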

7.
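Attribute analysis of Tibetan characters is foundational work in Tibetan information processing and an important reference for both Tibetan information processing research and Tibetan language teaching. Tibetan is a special kind of alphabetic script in which each character is built from 1 to 7 basic components joined horizontally and vertically. The attributes of a Tibetan character therefore include its constituent components and their positions, as well as its usage frequency, structure, and length. By analyzing the structure of Tibetan characters, this paper builds vector models of Tibetan characters and Tibetan character strings (VMTT and VMTS) and a sparse-domain model of Tibetan character strings (SLM), and studies the component features of Tibetan characters on these models.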

8.
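Uchen (乌金体) Tibetan text in natural scenes is highly condensed, high-level semantic information: it has considerable research and practical value and can support research on understanding Tibetan scene text. Little research has so far addressed the detection and recognition of Uchen Tibetan in natural scenes. Using a manually collected dataset of natural-scene Uchen Tibetan images, this paper compares the detection performance of common text detection algorithms on Uchen Tibetan and the recognition accuracy of the sequence-based recognizer CRNN with different feature-extraction networks, and analyzes special error cases among 314 real natural-scene Uchen Tibetan images. Experiments show that the differentiable binarization network DBNet, used in the detection stage, performs best on the test set, with a precision of 0.89, recall of 0.59, and F1 score of 0.71. In the recognition stage, CRNN reaches its highest test-set accuracy, 0.4365, with MobileNetV3 Large as the feature-extraction network.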
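The reported detection scores are mutually consistent: F1 is the harmonic mean of precision and recall, and 2 × 0.89 × 0.59 / (0.89 + 0.59) ≈ 0.71. A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.89, 0.59), 2))  # 0.71
```

Because precision is much higher than recall, most of the shortfall comes from missed text regions rather than false detections.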

9.
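Tibetan character frequency statistics are foundational work in Tibetan information processing: quantitative counts and qualitative analysis of the frequency and coverage of components, syllables, structures, and characters provide basic data for the field. Tibetan is an alphabetic script whose characters are formed by combining components horizontally and vertically, so frequency statistics must analyze not only the frequency attributes of whole characters but also the frequency and position attributes of their components. A frequency statistics system therefore has to decompose each Tibetan character into its components. While developing such a system, this paper proposes a component decomposition algorithm for Tibetan characters that combines a library of composite components with Tibetan grammar rules. Tests show that the algorithm is simple to implement and effectively determines the position of each basic component; it has been deployed in the project's Tibetan character frequency statistics system.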
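The paper's algorithm relies on a composite-component library and Tibetan grammar rules, which the abstract does not give. The sketch below shows only a much simpler, Unicode-based approximation, assumed here for illustration: in Unicode, a vertical stack is stored as a base consonant followed by subjoined consonants (U+0F90 to U+0FBC) and vowel signs, so a syllable can be cut into horizontal units and the first unit carrying a subjoined letter or vowel sign taken as the root stack. For བསྒྲུབས it yields the prefix བ, the root stack སྒྲུ, and the trailing letters བ and ས. Syllables without such a marker fall back to treating the first letter as the root, which is wrong for cases like མདའ; resolving those is what grammar rules are needed for.

```python
def is_base_letter(ch: str) -> bool:
    """Tibetan base consonants, U+0F40..U+0F6C."""
    return "\u0F40" <= ch <= "\u0F6C"

def is_subjoined(ch: str) -> bool:
    """Subjoined (stacked) consonants, U+0F90..U+0FBC."""
    return "\u0F90" <= ch <= "\u0FBC"

def decompose_syllable(syllable: str) -> dict:
    """Split one Unicode syllable into prefix, root stack and trailing letters.

    Simplified heuristic, not the paper's algorithm: every base letter starts a
    new horizontal unit; subjoined letters and vowel signs attach to the unit
    before them.
    """
    if not syllable:
        raise ValueError("empty syllable")
    units = []
    for ch in syllable:
        if is_base_letter(ch) or not units:
            units.append([ch])
        else:
            units[-1].append(ch)
    # Root stack = first unit with something attached; else the first unit.
    root = next((i for i, u in enumerate(units) if len(u) > 1), 0)
    return {
        "prefix": "".join("".join(u) for u in units[:root]),
        "root_stack": "".join(units[root]),
        "subjoined": [c for c in units[root] if is_subjoined(c)],
        "suffixes": ["".join(u) for u in units[root + 1:]],
    }
```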

10.
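This paper discusses how fonts, word processors, and software development tools in MS Windows support Tibetan information processing based on international standards. The successful development of Tibetan OpenType fonts and Tibetan support in Unicode word processors solved the input, output, storage, and display of Tibetan in MS Windows, and the release of MS Visual C++ 8.0 solved the programmatic processing of Tibetan text, making standards-based Tibetan information processing feasible on MS Windows.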

11.
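Zhang Bo, Yang Wei, Geng Fang, Ma Xiaoyuan, Han Cece. Sensor World (《传感器世界》), 2021, 27(2): 17-22, 10
Character localization and recognition technology is widely used in transportation. A character recognition system covers capturing and preprocessing the character image, cropping the characters out of the image, and finally comparing and identifying them. The image is preprocessed with grayscale conversion, mean filtering, and similar methods. For character localization, an edge detection algorithm locates the characters in the original image. Before segmentation, the characters are binarized and skew-corrected. Characters are segmented by region growing, which groups pixels with similar properties, combined with projection. Finally, a template library is built and an improved template-matching algorithm identifies the characters. Preprocessing, character extraction, and recognition of vehicle license plates serve as an example demonstrating the effectiveness of the techniques.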
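The projection step and the final matching step can be illustrated with a minimal sketch. It assumes an already binarized, skew-corrected plate image given as a list of rows (1 = character pixel) and uses plain pixel-agreement template matching, not the paper's improved algorithm or its region-growing step.

```python
def segment_by_projection(binary, min_width=1):
    """Split a binary image into character column ranges via vertical projection.

    Returns (start, end) pairs, end exclusive, for runs of columns that contain ink.
    """
    h, w = len(binary), len(binary[0])
    profile = [sum(binary[y][x] for y in range(h)) for x in range(w)]
    segments, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                      # entering a character
        elif v == 0 and start is not None:
            if x - start >= min_width:     # leaving it; drop specks narrower than min_width
                segments.append((start, x))
            start = None
    if start is not None and w - start >= min_width:
        segments.append((start, w))        # character touching the right edge
    return segments

def match_template(glyph, templates):
    """Return the label whose equal-sized binary template agrees with `glyph`
    on the most pixels (plain template matching)."""
    def agreement(a, b):
        return sum(p == q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return max(templates, key=lambda label: agreement(glyph, templates[label]))
```

In practice each segment would be cropped and resized to the template size before matching; that resizing is omitted here.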

12.
The aim of our work is to present a new method based on structural characteristics and a fuzzy classifier for off-line recognition of handwritten Arabic characters in all their forms (beginning, end, middle, and isolated). The proposed method can be integrated into any handwritten Arabic word recognition system based on an explicit segmentation process. First, three preprocessing operations are applied to the character images: thinning, contour tracing, and connected-component detection. These operations extract structural characteristics that are used to divide the set of characters into five subsets. Next, features are extracted using invariant pseudo-Zernike moments. Classification is done with the Fuzzy ARTMAP neural network, which trains very quickly and supports incremental learning. Five Fuzzy ARTMAP networks are employed, each designed to recognize one subset of characters. Recognition takes two steps: first, a clustering method assigns each character to one of the five subsets; second, the pseudo-Zernike features are passed to the corresponding Fuzzy ARTMAP classifier to identify the character. Training and tests were performed on a set of character images manually extracted from the IFN/ENIT database, and a high recognition rate was reported.

13.
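Multi-Font Printed Tibetan Character Recognition   Total citations: 5 (self-citations: 1, by others: 5)
Tibetan character recognition is an important part of Chinese multilingual information processing systems, yet research on it, in China and abroad, has been almost nonexistent. This paper proposes a statistical pattern recognition method for multi-font printed Tibetan characters: directional line element features are extracted from the character contours and compressed with linear discriminant analysis (LDA) into a compact feature vector. A two-level classification strategy based on confidence analysis is adopted: a Euclidean distance classifier with deviation (EDD) performs efficient coarse classification, and a modified quadratic discriminant function (MQDF) performs fine classification. With classifier parameters chosen experimentally, the method reaches a recognition rate of 99.79% on a test set of 177,600 characters (300 samples per character class), demonstrating its effectiveness.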
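For reference, the standard MQDF of Kimura et al. scores class i as g_i(x) = Σ_{j≤k} [φ_ijᵀ(x − μ_i)]² / λ_ij + (‖x − μ_i‖² − Σ_{j≤k} [φ_ijᵀ(x − μ_i)]²) / δ_i + Σ_{j≤k} log λ_ij + (d − k) log δ_i, where μ_i is the class mean, λ_ij and φ_ij are the k largest eigenvalues and unit eigenvectors of the class covariance, δ_i replaces the remaining d − k eigenvalues, and the smallest score wins. The sketch below implements that formula inside a two-stage scheme in which a plain Euclidean nearest-mean shortlist stands in for the paper's EDD coarse classifier (an assumption: EDD's deviation term is not reproduced). Class parameters are assumed to be precomputed.

```python
import math
from typing import Dict, Sequence, Tuple

# Per-class parameters: (mean, k leading eigenvectors, k leading eigenvalues, delta)
ClassParams = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[float], float]

def mqdf_score(x: Sequence[float], mean, eigvecs, eigvals, delta: float) -> float:
    """MQDF discriminant g(x); smaller means a better match."""
    d, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sq_norm = sum(v * v for v in diff)
    # Squared projections of (x - mean) onto the k principal axes.
    proj = [sum(p * v for p, v in zip(phi, diff)) ** 2 for phi in eigvecs]
    g = sum(pj / lam for pj, lam in zip(proj, eigvals))
    g += (sq_norm - sum(proj)) / delta  # residual subspace with constant variance delta
    g += sum(math.log(lam) for lam in eigvals) + (d - k) * math.log(delta)
    return g

def two_stage_classify(x, classes: Dict[str, ClassParams], n_candidates: int = 10) -> str:
    """Coarse: keep the n nearest class means (Euclidean). Fine: lowest MQDF score."""
    def sq_dist(mean):
        return sum((a - b) ** 2 for a, b in zip(x, mean))
    shortlist = sorted(classes, key=lambda c: sq_dist(classes[c][0]))[:n_candidates]
    return min(shortlist, key=lambda c: mqdf_score(x, *classes[c]))
```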

14.
Decorated characters are widely used in various documents. A practical optical character reader must handle not only common fonts but also complex decorative fonts. However, because decorated characters have complicated appearances, most general character recognition systems perform poorly on them. This paper proposes an algorithm that extracts a character's essential structure from a decorated character, for use in the preprocessing stage of character recognition. The algorithm consists of three procedures: global structure extraction, interpolation of the structure, and smoothing. Using multiscale images, topographical features such as ridges and ravines are detected for structure extraction: ridges are used to extract the global structure and ravines are used for interpolation. Experimental results show that character structures can be clearly extracted from very complex decorated characters.

15.
A novel SVM-based handwritten Tamil character recognition system   Total citations: 1 (self-citations: 0, by others: 1)
This paper describes a system for recognizing offline handwritten Tamil characters using a support vector machine (SVM). Data samples were collected from different writers on A4-sized documents, scanned with a flatbed scanner at 300 dpi, and stored as grayscale images. Various preprocessing operations are applied to the digitized image to enhance its quality. Pixel densities are calculated for 64 zones of the image and used as the features of a character, and these features are used to train the SVM. This is the first time an SVM has been tested on handwritten Tamil character recognition. The system achieved a recognition accuracy of 82.04% on the handwritten Tamil character database.
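The 64-zone pixel-density feature can be sketched as follows, assuming the preprocessed character has been binarized (1 = ink) and is split into an 8 × 8 grid; each feature is the fraction of ink pixels in its zone. The resulting 64-value vectors would then serve as the SVM's training inputs.

```python
def zone_densities(img, grid=8):
    """Split a binary character image into grid x grid zones and return the
    fraction of ink pixels in each zone, row-major (64 values for grid=8)."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        # Integer division spreads any leftover rows/columns across zones.
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            area = (y1 - y0) * (x1 - x0)
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(ink / area if area else 0.0)
    return feats
```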

16.
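Character embeddings of Tibetan medical texts are important for Tibetan medical named entity recognition, but high-quality Tibetan language models are currently lacking. Taking the structure of Tibetan into account, this paper trains a syllable-based Tibetan BERT model on general Tibetan news text and builds a BERT-BiLSTM-CRF model on top of it. The model first uses the Tibetan BERT model to learn character embeddings of Tibetan medical text, strengthening their ability to represent Tibetan characters and their context; a BiLSTM layer then further extracts dependencies between characters, and a CRF layer finally enforces the validity of the label sequence. Experiments show that initializing the character embeddings with the Tibetan BERT model improves Tibetan medical entity recognition, reaching an F1 score of 96.18%.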
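The CRF layer outputs one tag per syllable. Assuming a BIO tagging scheme (the abstract names neither the scheme nor the entity types), the predicted tags can be turned back into entity spans as sketched below. The default joiner re-inserts the tsheg (U+0F0B) between syllables; the tag names in the test are hypothetical.

```python
def bio_to_spans(tokens, tags, joiner="\u0F0B"):
    """Collect (entity_type, text) spans from BIO tags such as B-X, I-X, O.

    An I- tag that does not continue an open span of the same type starts a
    new span (a lenient, commonly used repair).
    """
    spans = []
    cur_type, cur = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            if cur_type is not None:
                spans.append((cur_type, joiner.join(cur)))
            cur_type, cur = tag[2:], [tok]
        elif tag.startswith("I-"):
            cur.append(tok)              # continue the open span
        else:                            # "O" closes any open span
            if cur_type is not None:
                spans.append((cur_type, joiner.join(cur)))
            cur_type, cur = None, []
    if cur_type is not None:
        spans.append((cur_type, joiner.join(cur)))
    return spans
```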

17.
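The standard Tibetan Coded Character Set for Information Interchange: Basic Set (《信息交换用藏文编码字符集 基本集》, the "Basic Set") laid the foundation for research on Tibetan information processing technology and is very important. As research deepened, however, it became clear that the Basic Set does not reflect the basic features of Tibetan components, which makes Tibetan-related work harder, and that its use suffers from defects such as encoding ambiguity. To address these problems, the paper proposes adding codes for three superscript letters to the BMP so that Tibetan encoding correctly reflects component features, and proposes "defining how Tibetan codes are to be used" to eliminate the ambiguities in applying the Basic Set and to clarify the properties of several characters.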

18.
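Zhao Dongcai. Microprocessors (《微处理机》), 2012, 33(5): 35-38, 43
Woodblock-printed Tibetan scriptures suffer from severe touching, broken, and occluded characters, which makes recognition very difficult. Building on text recognition methods such as character segmentation and feature extraction, this paper adds a training method based on a BP (backpropagation) neural network. Training on a large number of characters from woodblock-printed Tibetan scriptures corrected the data and brought the recognition results to convergence. Experimental results show that the method helps improve recognition accuracy on woodblock-printed Tibetan scriptures.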

19.
Results of recognition experiments are presented in which different preprocessing techniques were coupled to the front end of a recognition system. Normalization of character dimensions turns out to be the most efficient preprocessing technique. Various other geometric representations were examined, as well as local operations. The results are discussed with respect to computational effort. Classification experiments were carried out on a sample of 560 characters per class and confirmed with a sample of 3000 characters per class.

20.
Off-line handwritten oriental character recognition is difficult because of the large number of categories and the variety of strokes. Oriental characters are made up of components known as radicals, which are often written in distorted proportions and sizes. All these factors make recognition hard, and the problem cannot be solved by a direct classification approach such as a neural network classifier with a preprocessing module. This paper proposes several novel preprocessing approaches and a synergy of classifiers to achieve good performance. The proposed classification approach comprises rough and coarse classification modules which, when combined appropriately, produce a high-performance system capable of highly accurate off-line oriental character recognition. The recognition accuracy of the system is 97%, rising to 99% when the top 5 candidates are considered.
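Figures such as the top-5 score reported here come from a top-k accuracy measure: a sample counts as correct if its true label is among the k highest-ranked candidates. A minimal sketch, assuming each prediction is a list of labels sorted from most to least likely:

```python
def top_k_accuracy(ranked_predictions, labels, k=5):
    """Fraction of samples whose true label appears in the first k candidates."""
    if not labels:
        raise ValueError("no samples")
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, labels))
    return hits / len(labels)
```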


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号