Similar Articles
1.
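Recognizing and understanding images of mathematical formulas is an important part of document image processing, yet no existing method meets the needs of general applications. A robust method for understanding mathematical formula structure is proposed. It analyzes formula structure using the recognition results of formula images together with grammatical rules and syntactic rules, gives a complete classification of formula types, automatically checks and corrects errors in the recognition results, and can automatically determine the precedence and evaluation order of formula symbols. The method applies both to the recognition and format conversion of formula images and to formula retrieval and assisted editing. Experiments on 1,000 real formula images demonstrate its effectiveness and stability.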

2.
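Structural Analysis Algorithm for Online Handwritten Mathematical Formulas   Total citations: 1, self-citations: 0, citations by others: 1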
Hong Liurong, Journal of Computer Applications, 2010, 30(9): 2545-2548
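As a natural and fast way to enter mathematical formulas, online handwritten formula input has broad application prospects. With the goal of recognizing the structures of general mathematical formulas, an algorithm for recognizing the structure of online handwritten formulas is proposed. First, the fractal and dominance relations of formula structures are defined and the hard constraint rules are extended; in addition, based on the characteristics of handwritten formulas, a new method for computing edge weights in the minimum spanning tree (MST) algorithm is proposed. On this basis, the MST algorithm and statistical methods are applied to analyze formula structure. Compared with other classical algorithms, the proposed algorithm recognizes a wider range of structures and achieves somewhat higher recognition accuracy.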
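
The abstract does not give the new weight formula, so the following is only a minimal Python sketch of MST-based structure analysis under stated assumptions: each symbol is reduced to a bounding box `(x_min, y_min, x_max, y_max)` in image coordinates, and the hypothetical `edge_weight` charges vertical offsets more than horizontal ones. Prim's algorithm then links every symbol to the tree through its cheapest edge, and the resulting edges serve as candidate layout relations (for example, base to superscript).

```python
import math
from typing import List, Tuple

# Hypothetical symbol representation: bounding box (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]

def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

def edge_weight(a: Box, b: Box, vertical_penalty: float = 2.0) -> float:
    """Illustrative weight (an assumption, not the paper's formula): vertical
    offsets cost more, since most formula symbols run left to right."""
    (ax, ay), (bx, by) = center(a), center(b)
    return abs(ax - bx) + vertical_penalty * abs(ay - by)

def minimum_spanning_tree(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete graph of symbols; returns MST edges."""
    n = len(boxes)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each symbol
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the cheapest symbol not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = edge_weight(boxes[u], boxes[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges
```

With made-up boxes for `x^2 + 1` (`x`, `2`, `+`, `1` as indices 0 to 3), the tree chains `x`, `+`, `1` along the main line and attaches the raised `2` to `x`.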

3.
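Mathematical formulas are widespread in all kinds of documents, so the automatic location, recognition, analysis, and understanding of formulas is a problem that document image processing must address. However, formulas differ greatly from ordinary text, which makes recognizing and analyzing them much harder than recognizing paragraphs of text. This paper reviews the history of research on mathematical formula image processing, proposes a model for formula processing, summarizes and compares the main methods for formula location, formula recognition, formula analysis, and performance evaluation, and outlines future research directions.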

4.
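A Baseline Based Multi-candidate Mathematical Expression Recognition (BBMMER) method is proposed. Recognition of modern printed mathematical formulas is an important part of pattern recognition, and structural analysis is the bottleneck in the development of formula recognition technology. The proposed method uses baselines to locate nested formula structures, analyzes the structural relations between formula symbols with multiple candidates, and represents the recognition results in LaTeX. It achieved good structure analysis accuracy on a test set consisting of a large number of formula images.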
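
As a rough illustration of baseline-driven analysis that emits LaTeX (not BBMMER itself, which also keeps multiple candidates and handles nesting), the sketch below assumes recognized symbols arrive as `(label, x_min, y_min, x_max, y_max)` tuples with y growing downward. A symbol whose vertical center falls inside the current base symbol's band stays on the baseline and becomes the new base; one clearly above or below becomes a superscript or subscript. The 0.25 band ratio and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical recognizer output: (label, x_min, y_min, x_max, y_max), y grows downward.
Sym = Tuple[str, float, float, float, float]

def relation(base: Sym, s: Sym, ratio: float = 0.25) -> str:
    """Classify s as 'sup', 'sub' or 'inline' against the base's vertical extent."""
    top, bottom = base[2], base[4]
    band = ratio * (bottom - top)
    cy = (s[2] + s[4]) / 2
    if cy < top + band:
        return "sup"
    if cy > bottom - band:
        return "sub"
    return "inline"

def to_latex(symbols: List[Sym]) -> str:
    tokens: List[Tuple[str, str]] = []   # (relation, text)
    base: Optional[Sym] = None
    for s in sorted(symbols, key=lambda t: t[1]):   # left to right
        rel = "inline" if base is None else relation(base, s)
        if rel == "inline":
            base = s                                 # new symbol on the main baseline
            tokens.append((rel, s[0]))
        elif tokens and tokens[-1][0] == rel:
            tokens[-1] = (rel, tokens[-1][1] + s[0])  # merge runs such as x^{12}
        else:
            tokens.append((rel, s[0]))
    parts = []
    for rel, text in tokens:
        if rel == "sup":
            parts.append("^{" + text + "}")
        elif rel == "sub":
            parts.append("_{" + text + "}")
        else:
            parts.append(text)
    return "".join(parts)
```

Only one level of scripts is handled here; nested scripts, fractions, and radicals would need recursive baselines, which is the part the abstract attributes to its baseline-location step.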

5.
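Mathematical formula recognition is an important part of OCR technology, yet research on it remains scarce. After briefly reviewing the development of formula recognition, this paper focuses on structural analysis, the key stage of formula recognition, and proposes a method that analyzes formula structure based on baselines and operator scopes combined with syntactic analysis. Experiments show that the method adapts well to a variety of formula structures.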

6.
Li Fenhua, Tian Xuedong, Microcomputer Development, 2004, 14(12): 13-15, 88
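Mathematical formula recognition is an important part of OCR technology, yet research on it remains scarce. After briefly reviewing the development of formula recognition, this paper focuses on structural analysis, the key stage of formula recognition, and proposes a structural analysis method that combines "top-down" and "bottom-up" strategies. Experiments show that the method adapts well to a variety of formula structures.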

7.
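Research on Baseline Structure Analysis and Recognition Algorithms for Mathematical Formulas   Total citations: 1, self-citations: 0, citations by others: 1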
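The formula recognition problem is divided into two parts: character segmentation and structural analysis. The paper studies the whole formula recognition process systematically and successfully recognizes general mathematical formulas using an adaptive character segmentation method and a baseline structure analysis algorithm, with a relatively high recognition rate. The experimental results indicate that this baseline-structure-based recognition method can handle most printed formulas and is an effective approach.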
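
The abstract does not describe its adaptive segmentation, so the sketch below shows only the connected-component pass that character segmentation commonly starts from (an assumption, not the paper's method). It labels 8-connected ink pixels in a binary image (1 = ink) and returns one bounding box `(x_min, y_min, x_max, y_max)` per component, sorted left to right.

```python
from collections import deque
from typing import List, Tuple

def connected_components(img: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """8-connected components of ink pixels as (x_min, y_min, x_max, y_max)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # breadth-first flood fill from an unvisited ink pixel
            seen[y][x] = True
            queue = deque([(y, x)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return sorted(boxes)
```

Multi-part symbols such as `=` come out as separate components, which is one reason segmentation in formula recognition has to merge components as well as split them.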

8.
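Mathematical formula recognition is an important part of OCR technology, yet research on it remains scarce. After briefly reviewing the development of formula recognition, this paper focuses on structural analysis, the key stage of formula recognition, and proposes a structural analysis method that combines "top-down" and "bottom-up" strategies. Experiments show that the method adapts well to a variety of formula structures.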

9.
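Superscript and subscript relations are special structures that occur frequently in mathematical formulas, are difficult to handle, and are easily confused with other relations. A fuzzy-theory-based method for discriminating superscript and subscript relations in formulas is proposed. Fuzzy theory is used to partition the spatial regions around formula symbols, and fuzzy recognition is then applied to decide superscript and subscript relations. Experimental results show that the method markedly improves the recognition rate of spatial relations between symbols; in particular, it handles spatial relations in handwritten formulas well, reaching an accuracy of 96.4%.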
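
A minimal sketch of fuzzy superscript/subscript discrimination, assuming each symbol is reduced to its vertical extent `(y_top, y_bottom)` with y growing downward. The candidate's vertical offset from the base's center, normalized by the base height, is scored against three trapezoidal membership functions, and the relation with the highest membership wins. The breakpoints (0.1, 0.15, 0.35, 0.45) are illustrative assumptions; the abstract does not publish its membership functions.

```python
import math
from typing import Dict, Tuple

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.
    Pass -math.inf / math.inf for open-ended shoulders."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def classify_script(base: Tuple[float, float],
                    cand: Tuple[float, float]) -> Tuple[str, Dict[str, float]]:
    """Return the most plausible relation of cand to base, plus all memberships."""
    base_cy = (base[0] + base[1]) / 2
    cand_cy = (cand[0] + cand[1]) / 2
    offset = (base_cy - cand_cy) / (base[1] - base[0])   # > 0 when cand sits higher
    memberships = {
        "superscript": trapezoid(offset, 0.15, 0.45, math.inf, math.inf),
        "inline": trapezoid(offset, -0.35, -0.1, 0.1, 0.35),
        "subscript": trapezoid(offset, -math.inf, -math.inf, -0.45, -0.15),
    }
    return max(memberships, key=memberships.get), memberships
```

The graded memberships are what distinguish this from a hard threshold: a slightly raised handwritten symbol receives partial scores for both "superscript" and "inline", and those scores could be combined with other evidence before a final decision.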

10.
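Online handwritten formula recognition must cope with uncertainty in the written characters, the complexity of formula structures, and writing styles that vary from person to person. In particular, when accidental writing errors occur or a formula contains complex structures, existing algorithms that rely solely on the machine achieve low recognition accuracy. To address this, a human-in-the-loop handwritten formula recognition method is proposed. It introduces human participation mainly in the structural analysis stage: users correct ambiguous strokes within a structure and add supplementary structural strokes, which refines and delimits the information about structural strokes and the strokes inside each structure. To evaluate its effectiveness, the method was compared with a baseline recognition method that uses no user-participation information, in terms of structure recognition rate and expression recognition rate. The results show that the method effectively engages users in the handwriting recognition process, and that on the handwritten formula data collected in the experiment, user participation raises the structure and expression recognition rates by 9.26% and 13.99%, respectively.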

11.
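Based on the characteristics of spatial relations between characters or symbols in mathematical formulas, and to overcome the shortcomings of the region-based and centroid-based methods currently used to determine these relations, a method based on character convex hulls and fuzzy recognition is proposed. First, the characters and symbols in a formula are classified; for each class, the character convex hull is used to determine directly-above and directly-below relations, and fuzzy recognition is then applied to identify the common relations. Experimental results show that the method markedly improves the recognition rate of spatial relations between characters, reaching an accuracy of 93.5%.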
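
The abstract does not state its per-class rules, so this is only a sketch of the convex-hull step under assumptions: each symbol is given as a list of ink-pixel coordinates `(x, y)` with y growing downward, the hull is computed with Andrew's monotone chain, and the hypothetical `directly_above` test requires the candidate hull to lie entirely above the base hull with its vertex centroid inside the base hull's horizontal extent.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn (y-up)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def directly_above(cand: List[Point], base: List[Point]) -> bool:
    """Hypothetical rule: cand's hull is wholly above base's hull and its
    vertex centroid lies within base's horizontal extent."""
    hc, hb = convex_hull(cand), convex_hull(base)
    if max(y for _, y in hc) >= min(y for _, y in hb):
        return False
    cx = sum(x for x, _ in hc) / len(hc)
    return min(x for x, _ in hb) <= cx <= max(x for x, _ in hb)
```

A numerator over a fraction bar passes this test, while a symbol off to the side of the bar or below it does not. Replacing a symbol's pixels with a handful of hull vertices also keeps later geometric tests cheap.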

12.

Given the ubiquity of handwriting and mathematical content in human transactions, machine recognition of handwritten mathematical text and symbols has become a domain of great practical scope and significance. Recognition of mathematical expressions (MEs) remains a challenging and emerging research domain, with mathematical symbol recognition (MSR) as a requisite step in the overall recognition process. Wide variation in writing styles and the dissimilarities among the large range of symbols and recurring characters make recognition strenuous even for optical character recognition (OCR). The past decade has seen the emergence of new recognition techniques and growing interest from researchers in this evolving domain. Given the current state of research on recognizing handwritten math symbols, a systematic literature review seems timely. This article seeks to provide a complete, systematic analysis of the recognition techniques, models, datasets, sub-stages, accuracy metrics, and accuracy figures reported in the literature, in extracted form. The review covers empirical studies up to 2021. The analysis finds the Support Vector Machine (SVM) to be the dominant recognition technique and the symbol recognition rate to be the most frequently used accuracy measure; further findings on segmentation, feature extraction, and the datasets involved are presented in detail. Statistics on papers related to mathematical symbols are shown, and open problems are identified for more advanced research. The study focuses on the key points of earlier research, present work, and the future direction of MSR.


13.
Y.H. Huh, H.L. Beus, Pattern Recognition, 1982, 15(6): 445-453
The Korean alphabet is a set of phonetic symbols that are combined to form characters, somewhat in the Chinese style. The phonetic quality of the symbols naturally limits the useful combinations, and character-formation constraints limit them further. An on-line computer recognition system takes advantage of this. Korean characters are entered on a graphic tablet using the standard stroke sequences taught in schools. Recognition is perfect for carefully drawn characters and nearly so for characters written at an unhurried rate, provided the system is tuned to the writer's style.

14.
In this paper, we propose an approach for understanding Mathematical Expressions (MEs) in a printed document. The system is divided into three main components: (i) detection of MEs in a document; (ii) recognition of the symbols present in each ME; and (iii) arrangement of the recognised symbols. The MEs printed in separate lines are detected without any character recognition whereas the embedded expressions (mixed with normal text) are detected by recognising the mathematical symbols in text. Some structural features of the MEs are used for both cases. The mathematical symbols are grouped into two classes for convenience. At first, the frequently occurring symbols are recognised by a stroke-feature analysis technique. Recognition of less frequent symbols involves a hybrid of feature-based and template-based technique. The bounding-box coordinates and the size information of the symbols help to determine the spatial relationships among the symbols. A set of predefined rules is used to form the meaningful symbol groups so that a logical arrangement of the mathematical expression can be obtained. Experiments conducted using this approach on a large number of documents show high accuracy.

15.
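A printed mathematical formula recognition system is presented, consisting of two parts: formula character recognition and structural analysis. For character recognition, special processing methods suited to formula characters are adopted. For structural analysis, a method that combines "top-down" and "bottom-up" strategies according to the structural layout of formulas is used, which makes the recognized formulas reusable. Experiments show that the method achieves good recognition results.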

16.
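To address the recognition and evaluation of handwritten mathematical formulas, a character training method based on convolutional neural networks (CNNs) is proposed. Computer vision techniques are used to preprocess the formula images, a CNN converts the two-dimensional pixel matrices into the corresponding character symbols, and the recognized formula is evaluated by converting it to a postfix expression. The character model is trained with a Softmax output, and the recognition and evaluation results for several types of formulas are collected and analyzed. Experimental results show that training on the characters effectively improves accuracy, and the method can serve as a reference for recognizing and evaluating complex handwritten formulas.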
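
The abstract evaluates recognized formulas through a postfix expression. A minimal sketch of that step, assuming the recognizer's output is a flat string of digits, `+ - * /`, parentheses, and possibly `×`/`÷` (the symbol mapping is an assumption), converts infix to postfix with the shunting-yard algorithm and evaluates the postfix with a stack. Unary minus is not supported, and mismatched parentheses raise an error.

```python
import operator
from typing import List

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
SYMBOL_MAP = {"×": "*", "÷": "/"}   # assumed glyphs a recognizer might emit

def tokenize(s: str) -> List[str]:
    """Split e.g. '12+3*(4-1)' into numbers, operators and parentheses."""
    tokens, num = [], ""
    for ch in s.replace(" ", ""):
        ch = SYMBOL_MAP.get(ch, ch)
        if ch.isdigit() or ch == ".":
            num += ch
            continue
        if num:
            tokens.append(num)
            num = ""
        tokens.append(ch)
    if num:
        tokens.append(num)
    return tokens

def to_postfix(tokens: List[str]) -> List[str]:
    """Shunting-yard for left-associative binary operators."""
    out, stack = [], []
    for t in tokens:
        if t in PRECEDENCE:
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[t]:
                out.append(stack.pop())
            stack.append(t)
        elif t == "(":
            stack.append(t)
        elif t == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()   # discard the matching "("
        else:
            out.append(t)
    while stack:
        out.append(stack.pop())
    return out

def eval_postfix(postfix: List[str]) -> float:
    stack: List[float] = []
    for t in postfix:
        if t in OPS:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            stack.append(OPS[t](a, b))
        else:
            stack.append(float(t))
    return stack[0]
```

For example, `to_postfix(tokenize("12+3*(4-1)"))` gives `12 3 4 1 - * +`, which evaluates to `21.0`. Postfix form is convenient here because precedence and parentheses are resolved once, and evaluation becomes a single left-to-right pass.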

17.
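As an important method of character formation, phono-semantic compounding (形声) produced the largest branch of the Chinese character family. When these characters were first created, the semantic component (形符) indicated meaning and the phonetic component (声符) indicated pronunciation. Over time, the degree to which phonetic components indicate pronunciation has gradually changed, making it harder for readers to pronounce characters correctly. Using cluster analysis on the 3,500 commonly used characters of Mandarin (Putonghua), and combining linguistic theory with computational techniques, this paper establishes detailed grading criteria according to whether the pronunciation indicated by the phonetic component is the same as, similar to, or different from that of the character, and produces a list of phono-semantic characters and percentage statistics for each level. It thus presents the phonetic-indicating degree of phonetic components in modern Chinese characters systematically, intuitively, and comprehensively, with the aim of providing reference and evidence for formulating modern Chinese character standards and for Chinese language teaching.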

18.
An expert system for general symbol recognition   Total citations: 3, self-citations: 0, citations by others: 3
An expert system for the analysis and recognition of general symbols is introduced. The system uses structural pattern recognition, modeling each symbol as a set of straight lines referred to as segments. The system rotates, scales, and thins the symbol, then extracts its strokes, and each stroke is converted into segments (straight lines). The system is shown to map similar styles of a symbol to the same representation. With stored models for each symbol (an average of 97 models per symbol), the rejection rate was 16.1% and the recognition rate was 83.9%, of which 95% was recognized correctly. The system was tested on 5,726 handwritten characters from the Center of Excellence for Document Analysis and Recognition (CEDAR) database. It can learn new symbols simply by adding their models to its knowledge base.
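
The abstract does not say how strokes are converted into straight-line segments. One standard way, shown here as a swapped-in technique rather than the system's own method, is Ramer-Douglas-Peucker polyline simplification: keep the endpoints, recurse on the point farthest from the chord whenever it deviates by more than a tolerance `epsilon` (an assumed parameter), and read off consecutive kept points as segments. Strokes are assumed to be ordered lists of `(x, y)` points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / length

def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an ordered stroke."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                       # the whole run is close to one line
    left = rdp(points[: idx + 1], epsilon)  # split at the farthest point
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right                # drop the duplicated split point

def to_segments(stroke: List[Point], epsilon: float = 1.0) -> List[Tuple[Point, Point]]:
    pts = rdp(stroke, epsilon)
    return list(zip(pts, pts[1:]))
```

A slightly wobbly L-shaped stroke reduces to two segments, which is the kind of style-insensitive representation the abstract describes.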

19.
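In printed formula recognition, failure to segment touching symbols accurately is one of the main causes of recognition errors. To address this, a method for segmenting touching symbols based on contour features is proposed. Segmentation paths are formed from the contour features and the width-to-height ratio, and the touching characters are then split along them. Experiments show that the method noticeably improves the recognition rate.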
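
The exact contour features are not given, so this is a sketch under assumptions: the input is a binary image (1 = ink) cropped to one connected component; a width-to-height ratio below the hypothetical `min_aspect` means the blob is treated as a single symbol; otherwise, away from the edges, the cut column is the one where the upper contour (first ink row) and lower contour (last ink row) are closest, which is typically the thin bridge where two symbols touch.

```python
from typing import List, Optional

def find_cut_column(img: List[List[int]], min_aspect: float = 1.2,
                    margin: float = 0.25) -> Optional[int]:
    """Return the column to cut a touching-symbol blob at, or None if it
    looks like a single symbol. Thresholds are illustrative assumptions."""
    h, w = len(img), len(img[0])
    if w / h < min_aspect:          # too narrow to be two touching symbols
        return None
    lo = int(w * margin)            # skip columns near the left/right edges
    hi = w - lo
    best_col, best_span = None, None
    for x in range(lo, hi):
        ink_rows = [y for y in range(h) if img[y][x]]
        # vertical distance between the upper and lower contour in this column
        span = ink_rows[-1] - ink_rows[0] + 1 if ink_rows else 0
        if best_span is None or span < best_span:
            best_col, best_span = x, span
    return best_col
```

For two `o`-like blobs joined by a one-pixel bridge, the search lands on the bridge column. A fuller implementation would trace a non-vertical segmentation path along the contours rather than cut straight down.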

20.
A generic system for form dropout   Total citations: 21, self-citations: 0, citations by others: 21
Recent advances in intelligent character recognition are enabling us to address many challenging problems in document image analysis. One of them is intelligent form analysis. This paper describes a generic system for form dropout when the filled-in characters or symbols are either touching or crossing the form frames. We propose a method to separate these characters from form frames whose locations are unknown. Since some of the character strokes are either touching or crossing the form frames, we need to address the following three issues: 1) localization of form frames; 2) separation of characters and form frames; and 3) reconstruction of broken strokes introduced during separation. The form frame is automatically located by finding long straight lines based on the block adjacency graph. Form frame separation and character reconstruction are implemented by means of this graph. The proposed system includes form structure learning and form dropout. First, a form-structure-based template is automatically generated from a blank form; the template records the form frames, preprinted data areas, and skew angle. With this form template, our system can then extract both handwritten and machine-typed filled-in data. Experimental results on three different types of forms show the performance of our system. Further, the proposed method is robust to noise and skew introduced during scanning.
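
The paper locates frames with a block adjacency graph; the sketch below substitutes a much simpler run-length scan to illustrate the idea on a binary image (1 = ink). It assumes frame lines are horizontal and one pixel thick and that `min_length` (a hypothetical parameter) separates frame lines from character strokes. When erasing a frame pixel, it keeps the pixel if ink lies directly above and below, a crude stand-in for the paper's stroke reconstruction.

```python
from typing import List, Tuple

def long_horizontal_runs(img: List[List[int]], min_length: int) -> List[Tuple[int, int, int]]:
    """(row, start_col, end_col) for every horizontal ink run of at least min_length."""
    runs = []
    for y, row in enumerate(img):
        x, w = 0, len(row)
        while x < w:
            if row[x]:
                start = x
                while x < w and row[x]:
                    x += 1
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs

def drop_frame_lines(img: List[List[int]], min_length: int) -> List[List[int]]:
    """Erase long horizontal lines, keeping pixels where a stroke crosses them."""
    out = [row[:] for row in img]
    h = len(img)
    for y, x0, x1 in long_horizontal_runs(img, min_length):
        for x in range(x0, x1 + 1):
            above = y > 0 and img[y - 1][x]
            below = y < h - 1 and img[y + 1][x]
            if not (above and below):   # not part of a crossing stroke
                out[y][x] = 0
    return out
```

A vertical stroke crossing a frame line survives intact while the rest of the line is dropped; strokes that only touch the line from one side lose the contact pixel, which is where the paper's graph-based reconstruction would do better.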


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417