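Determining Formula Symbol Relations Based on Multi-Feature Fuzzy Pattern Recognition   Total citations: 1, self-citations: 0, citations by others: 1
Structural analysis is a crucial step in recognizing mathematical expressions, and determining the relations between symbols is the key to structural analysis. However, the uncertainty of the relations between symbols makes the operational meaning of an expression ambiguous, which has become a major difficulty for structural analysis in mathematical formula recognition. Using a large body of statistical data, the authors extract the more distinctive features, introduce a multi-feature fuzzy pattern recognition method, and build membership functions to determine the relations between symbols in printed mathematical formulas. Experimental results show that the method applies to a wide range of cases, achieves high accuracy, and is robust.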
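The abstract does not give the features or the membership functions, so the following is only a minimal sketch of how a multi-feature fuzzy decision on a symbol pair might look. The two features (vertical offset and height ratio), the Gaussian membership parameters, the weights, and all names such as `Box` and `classify` are hypothetical choices for illustration, not the statistics or the method from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of a symbol; y grows downward as in image coordinates."""
    x: float
    y: float
    w: float
    h: float

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def gaussian(value: float, mean: float, sigma: float) -> float:
    """Gaussian membership function with values in (0, 1]."""
    return math.exp(-((value - mean) ** 2) / (2 * sigma ** 2))


# Hypothetical (mean, sigma) prototypes per relation and feature.
PROTOTYPES = {
    "horizontal": {"offset": (0.0, 0.15), "ratio": (1.0, 0.25)},
    "superscript": {"offset": (0.45, 0.2), "ratio": (0.6, 0.2)},
    "subscript": {"offset": (-0.45, 0.2), "ratio": (0.6, 0.2)},
}
# Hypothetical feature weights (they sum to 1).
WEIGHTS = {"offset": 0.7, "ratio": 0.3}


def features(parent: Box, child: Box) -> dict:
    """Two illustrative features, both normalized by the parent height."""
    return {
        "offset": (parent.cy - child.cy) / parent.h,  # > 0: child sits higher
        "ratio": child.h / parent.h,
    }


def memberships(parent: Box, child: Box) -> dict:
    """Weighted sum of per-feature memberships for each candidate relation."""
    f = features(parent, child)
    return {
        relation: sum(WEIGHTS[name] * gaussian(f[name], *proto[name]) for name in f)
        for relation, proto in PROTOTYPES.items()
    }


def classify(parent: Box, child: Box) -> str:
    """Pick the relation with the largest combined membership."""
    m = memberships(parent, child)
    return max(m, key=m.get)
```

For example, with a base symbol `Box(0, 0, 10, 20)`, a smaller box raised above the center line, such as `Box(11, -2, 6, 12)`, gets its largest membership for `"superscript"`. Keeping all memberships rather than only the winner leaves room for a later stage to resolve ambiguous pairs.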

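Mathematical expression recognition is generally divided into character recognition and structural analysis. Most existing methods first recognize the characters and then feed the result into structural analysis. In this step-by-step process, character recognition errors are inherited by the structural analysis stage and ultimately cause recognition errors. Moreover, most existing methods for structural analysis assume that all symbols have already been recognized. To address these problems, a method for real-time recognition of online handwritten mathematical expressions is proposed. The method combines character recognition with structural analysis and dynamically builds a structure tree of the expression in order to recognize it. While building the tree, the method locates the affected region, which avoids re-recognizing the unaffected regions and so improves the efficiency of re-recognition. It also removes the limitation that existing real-time methods cannot handle input entered out of order. Experimental results show that the proposed method gives fairly satisfactory recognition results.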

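Offline mathematical symbol recognition is a prerequisite for offline mathematical expression recognition. Existing offline symbol recognition methods only recognize the symbols themselves; they do nothing for the other stages of offline expression recognition and can even constrain it. To address this, an offline symbol recognition method based on an improved YOLOv5s is proposed. First, because symbol images are small, a generative adversarial network (GAN) is used for data augmentation. Second, from the perspective of symbol classes, a spatial attention mechanism is introduced into the YOLOv5s model, using global max pooling and global average pooling to enlarge the distinguishing features between classes. Finally, from the perspective of the symbols themselves, a bidirectional long short-term memory network (BiLSTM) processes the symbol feature matrix so that symbol features carry contextual information. Experimental results show that the improved YOLOv5s performs well on offline symbol recognition, with a recognition rate of 92.47%, and comparison with other methods demonstrates its effectiveness and robustness. It also effectively avoids error accumulation in offline mathematical expression recognition and provides a sound basis for the structural analysis of expressions.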

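Locating Mathematical Expressions in Chinese Scientific and Technical Documents   Total citations: 1, self-citations: 0, citations by others: 1
Locating mathematical expressions is a prerequisite for recognizing printed mathematical expressions. For Chinese scientific and technical documents, new methods are proposed for locating isolated expressions and embedded expressions. An adaptive neuro-fuzzy inference system (ANFIS) classifies line features to extract the isolated expressions. Fuzzy clustering and dynamic programming are used to extract Chinese characters, Chinese punctuation, and English characters from the document in turn, and heuristic rules then merge the remaining mathematical symbols to extract the embedded expressions. Experiments show that the proposed localization methods achieve high accuracy.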

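Mathematical expressions have many kinds of symbols, complex and variable structure, and rich syntax and semantics. In view of these characteristics, a relevance ranking algorithm for retrieval results is proposed. It exploits the strengths of hesitant fuzzy sets in handling patterns with multiple features and multiple membership degrees to compute the similarity between mathematical expressions, and ranks the retrieval results by that similarity. By summarizing the symbol, structure, syntax, and semantic features of mathematical expressions, a similarity function is built that measures, from multiple perspectives, how similar the user's query expression is to each expression in the result set of a mathematical expression retrieval system. Experimental results show that the algorithm produces ordered output of the retrieval results and helps improve the performance of mathematical expression retrieval systems.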

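Automatic Generation of Mathematical Expressions of Chinese Characters   Total citations: 10, self-citations: 0, citations by others: 10
The mathematical expression of a Chinese character is a completely new way of representing Chinese characters. Based on an in-depth analysis of the features of Chinese character components, this work explores the automatic generation of such expressions using image processing techniques. About 500 basic components were selected, and for each component the number of connected regions, the genus, the numbers of endpoints, turning points, connection points, and crossing points, together with the NMI, HNMI, and VNMI values, were extracted as basic features. Components are segmented and recognized by splitting and merging the connected regions of a character. Finally, the mathematical expression of the character is obtained by recognizing the character's structure. In the experiments, the accuracy of automatic expression generation was 92%. This work should further promote the processing and dissemination of Chinese information in fields such as typesetting and printing, advertising and packaging design, network transmission, and Chinese mobile communication.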

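Mathematical expressions are an indispensable part of modern computer science. If mathematics teaching software does not check whether a mathematical expression is valid, its running efficiency and user experience suffer severely. To address this, a recursive method for checking the validity of mathematical expressions is proposed. The expression is first normalized; it is then traversed to perform bracket matching, operator precedence handling, and recognition of basic elementary functions; finally, constants are recognized. Problems encountered along the way are discussed and solutions are given. The method has been applied to function plotting in mathematical software, distance education, and other application areas. When the user enters an incorrect expression, it promptly reports the position of the error, with good results. Experiments show that validity checking of mathematical expressions significantly improves the efficiency of teaching software and function plotting and improves the user experience.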
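The recursive validity check described above can be illustrated with a small recursive-descent validator. This is a sketch under stated assumptions, not the paper's algorithm: the grammar, the function list, the constants `pi` and `e`, the single variable `x`, and names such as `validate` are all hypothetical. It covers bracket matching, operator precedence levels, elementary functions, and constants, and returns the index of the first error so a caller could point the user at it.

```python
import re

# Hypothetical vocabulary; a real system would define its own.
FUNCTIONS = {"sin", "cos", "tan", "ln", "log", "exp", "sqrt", "abs"}
CONSTANTS = {"pi", "e"}
VARIABLES = {"x"}
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[-+*/^()]")


class ExprError(Exception):
    def __init__(self, pos, msg):
        super().__init__(f"{msg} at position {pos}")
        self.pos = pos


def tokenize(text):
    """Split text into (token, position) pairs, skipping whitespace."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = TOKEN.match(text, i)
        if not m:
            raise ExprError(i, "unexpected character")
        tokens.append((m.group(), i))
        i = m.end()
    tokens.append(("", len(text)))  # end-of-input marker
    return tokens


class Validator:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.k = 0

    def peek(self):
        return self.tokens[self.k][0]

    def pos(self):
        return self.tokens[self.k][1]

    def expect(self, tok):
        if self.peek() != tok:
            raise ExprError(self.pos(), f"expected '{tok}'")
        self.k += 1

    def expr(self):  # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            self.k += 1
            self.term()

    def term(self):  # term := unary (('*' | '/') unary)*
        self.unary()
        while self.peek() in ("*", "/"):
            self.k += 1
            self.unary()

    def unary(self):  # unary := ('+' | '-') unary | power
        if self.peek() in ("+", "-"):
            self.k += 1
            self.unary()
        else:
            self.power()

    def power(self):  # power := atom ('^' unary)?
        self.atom()
        if self.peek() == "^":
            self.k += 1
            self.unary()

    def atom(self):  # number | constant | variable | f(expr) | (expr)
        tok = self.peek()
        if tok == "(":
            self.k += 1
            self.expr()
            self.expect(")")
        elif tok in FUNCTIONS:
            self.k += 1
            self.expect("(")
            self.expr()
            self.expect(")")
        elif tok in CONSTANTS or tok in VARIABLES or tok[:1].isdigit():
            self.k += 1
        else:
            raise ExprError(self.pos(), "expected an operand")


def validate(text):
    """Return None if text is a valid expression, else the first error index."""
    try:
        v = Validator(text)
        v.expr()
        if v.peek() != "":
            raise ExprError(v.pos(), "unexpected token")
        return None
    except ExprError as err:
        return err.pos
```

Each precedence level gets its own method, so the nesting of the calls mirrors the nesting of the expression. The normalization step mentioned in the abstract is reduced here to skipping whitespace.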

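A Symbol Recognition Method for Engineering Drawings Combining Statistical and Structural Approaches   Total citations: 1, self-citations: 0, citations by others: 1
A symbol recognition method that combines statistical recognition with structural recognition is proposed for recognizing the various symbols in engineering drawings, and it achieves good recognition results.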

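The complex and varied structure of mathematical expressions makes them difficult to retrieve. A method for indexing and retrieving mathematical expressions is therefore proposed. In the indexing stage, the characteristics of LaTeX mathematical expressions are analyzed and summarized, a feature representation oriented to the two-dimensional structure of expressions is defined, and the inter-relevant successive tree index model is applied to build the expression index, which addresses the growth in levels that arises when expressions are represented as trees. In the matching stage, matching algorithms are designed for query modes including exact matching, compatible matching, sub-expression matching, and fuzzy matching. Indexing and matching were carried out on 51,076 mathematical expressions in a browser/server setting. Experimental results show that the proposed method speeds up queries, reduces index storage, adapts to the structural characteristics of mathematical expressions, and achieves good retrieval results.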

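By analyzing the in-phase and quadrature signals, mathematical models are established for three decision schemes applicable to differential demodulation of fractionally spaced sampled π/4-DQPSK: bit majority decision, symbol majority decision, and differential accumulated-sum decision. Expressions for the symbol error rate under each scheme are derived and verified by Monte Carlo simulation. The error rate comparison shows that the differential accumulated-sum decision performs best: at an error rate of 10^-3 with 1/8-symbol-interval sampling, it gains 8.3 dB over conventional differential demodulation.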

In this paper, we propose an approach for understanding Mathematical Expressions (MEs) in a printed document. The system is divided into three main components: (i) detection of MEs in a document; (ii) recognition of the symbols present in each ME; and (iii) arrangement of the recognised symbols. The MEs printed in separate lines are detected without any character recognition whereas the embedded expressions (mixed with normal text) are detected by recognising the mathematical symbols in text. Some structural features of the MEs are used for both cases. The mathematical symbols are grouped into two classes for convenience. At first, the frequently occurring symbols are recognised by a stroke-feature analysis technique. Recognition of less frequent symbols involves a hybrid of feature-based and template-based technique. The bounding-box coordinates and the size information of the symbols help to determine the spatial relationships among the symbols. A set of predefined rules is used to form the meaningful symbol groups so that a logical arrangement of the mathematical expression can be obtained. Experiments conducted using this approach on a large number of documents show high accuracy.

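A Baseline Based Multi-candidate Mathematical Expression Recognition (BBMMER) method is proposed. Recognition of modern printed mathematical formulas is an important part of pattern recognition, and structural analysis is the bottleneck in the development of formula recognition technology. The proposed method uses baselines to locate the nested structures of a formula and multi-candidate analysis to determine the structural relations between its symbols, and it represents the recognition result in LaTeX format. It achieves good formula analysis accuracy on a test set consisting of a large number of formula images.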

An expert system for general symbol recognition   Total citations: 3, self-citations: 0, citations by others: 3
An expert system for analysis and recognition of general symbols is introduced. The system uses the structural pattern recognition technique for modeling symbols by a set of straight lines referred to as segments. The system rotates, scales and thins the symbol, then extracts the symbol strokes. Each stroke is transferred into segments (straight lines). The system is shown to be able to map similar styles of the symbol to the same representation. When the system had some stored models for each symbol (an average of 97 models/symbol), the rejection rate was 16.1% and the recognition rate was 83.9% of which 95% was recognized correctly. The system is tested by 5726 handwritten characters from the Center of Excellence for Document Analysis and Recognition (CEDAR) database. The system is capable of learning new symbols by simply adding their models to the system knowledge base.

Analytical expressions for the probability of errors in recognition of multivariate Student distributions have been derived for the first time. The upper and lower bounds of the risk function have been found. The characteristics of the expressions are investigated, and numerical experiments are performed. Babushkina Elena V. Born 1968. Graduated from Perm State University in 1990. Perm State University, the Chair of Probability Theory and Mathematical Statistics, senior instructor. Scientific interests: statistical classification and estimation, cluster analysis. Number of publications is 28. A member of the Russian Association for Pattern Recognition. Abusev Rakip A. Born 1939. Graduated from Perm State University in 1967. Received the doctoral degree in 1993. Perm State University, head of the Chair of Probability Theory and Mathematical Statistics. Scientific interests: multivariate statistical analysis, pattern recognition, dimensionality reduction. Number of publications is 190. A member of the Russian Association for Pattern Recognition, a member of the International Bernoulli Society. A laureate of year in science of Perm State University. Died January 17, 2005.

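Research on Algorithms for Identifying Chinese Character Internal Codes in a Multilingual Environment   Total citations: 9, self-citations: 4, citations by others: 9
The transition of Chinese character internal codes to ISO/IEC 10646 is an inevitable trend toward a unified character encoding for computers, but for some time to come multiple internal codes will continue to coexist, so automatic identification of the internal code is key to supporting that coexistence. This paper discusses how to identify Chinese character internal codes automatically in a multilingual environment where multiple internal codes coexist, and presents several identification algorithms, including algorithms based on code distribution, punctuation features, character frequency features, and semantic features. On this basis, the different algorithms are analyzed and evaluated. In tests on the target samples, the best identification rate of these algorithms exceeds 99.9%.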
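As a rough illustration of combining punctuation and character-frequency cues, the sketch below tries a few candidate encodings with Python's built-in codecs and scores each successful decoding by the share of very common characters and punctuation marks. The candidate list, the tiny `COMMON` set, and the function names are hypothetical, and strict decoding stands in for the code-distribution check; none of this reproduces the paper's algorithms or its reported accuracy.

```python
# Candidate internal codes to try, in tie-breaking order (an assumption).
CANDIDATES = ("utf-8", "gb2312", "big5")

# A deliberately tiny, hypothetical set of frequent characters and
# punctuation marks; a real system would use corpus statistics.
COMMON = set("的一是不了在人有我中，。、；：")


def score(data: bytes, encoding: str):
    """Return the share of common characters, or None if decoding fails."""
    try:
        text = data.decode(encoding)  # strict mode rejects invalid byte sequences
    except UnicodeDecodeError:
        return None
    if not text:
        return 0.0
    return sum(ch in COMMON for ch in text) / len(text)


def detect(data: bytes):
    """Pick the candidate encoding whose decoding looks most like Chinese text."""
    scored = {}
    for encoding in CANDIDATES:
        s = score(data, encoding)
        if s is not None:
            scored[encoding] = s
    if not scored:
        return None
    return max(scored, key=scored.get)
```

For instance, `detect("我是中国人。".encode("gb2312"))` selects `"gb2312"`, because the UTF-8 decoding fails outright and any other decoding of those bytes contains none of the common characters. Short or punctuation-free inputs give this kind of scoring little to work with, which is presumably why the paper also considers semantic features.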

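Offline handwritten character recognition is currently a hot and difficult research topic, and it is key to the task of entering the large volume of existing documents into computers. It plays an important role in fields such as system control, artificial intelligence, biomedical engineering, remote sensing data analysis, and military target recognition, and is widely applied in the national economy, national defense, social development, and public security. This paper surveys research progress on handwritten character recognition in China and abroad.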


Given the ubiquity of handwriting and mathematical content in human transactions, machine recognition of handwritten mathematical text and symbols has become a domain of great practical scope and significance. Recognition of mathematical expressions (MEs) remains a challenging and emerging research domain, with mathematical symbol recognition (MSR) as a requisite step in the recognition process. The many variations in writing styles, and the dissimilarities among the wide range of symbols and recurring characters, make the recognition task strenuous even for Optical Character Recognition. The past decade has witnessed the emergence of recognition techniques and growing interest from researchers in this evolving domain. In light of the current state of research on recognizing handwritten math symbols, a systematic review of the literature seems timely. This article seeks to provide a complete systematic analysis of the recognition techniques, models, datasets, sub-stages, accuracy metrics, and accuracy details described in the literature. The systematic literature review covers pragmatic studies up to the year 2021. The analysis shows Support Vector Machine (SVM) to be the dominant recognition technique and symbol recognition rate to be the most frequently used accuracy measure; further results on segmentation, feature extraction, and the datasets involved are also presented. Statistics on papers related to mathematical symbols are shown, and open problems are identified for more advanced research. The study focuses on the key points of earlier research, present work, and the future direction of MSR.


Mathematical expression recognition: a survey   Total citations: 15, self-citations: 0, citations by others: 15
Abstract. Automatic recognition of mathematical expressions is one of the key vehicles in the drive towards transcribing documents in scientific and engineering disciplines into electronic form. This problem typically consists of two major stages, namely, symbol recognition and structural analysis. In this survey paper, we will review most of the existing work with respect to each of the two major stages of the recognition process. In particular, we try to put emphasis on the similarities and differences between systems. Moreover, some important issues in mathematical expression recognition will be addressed in depth. All these together serve to provide a clear overall picture of how this research area has been developed to date. Received February 22, 2000 / Revised June 12, 2000

In this paper, we present a new document classification method based on physical layout features and graph b-coloring modeling. To reduce the computing time and increase the performance of our automatic reading system, we propose to pre-classify business documents by introducing an Automatic Recognition of Documents stage as a pre-analysis phase. This phase guides the other phases involved in recognizing the document contents. Once the document type is identified, the reading system uses the corresponding information source to improve the recognition of the logical layout, the selection and parameterization of the OCR, and the final sorting decision. The graph coloring model is introduced for both layout analysis and document classification. The proposed method is reliable, robust to various constraints, and guarantees a real-time answer for the sorting of business documents.

