Similar Literature
20 similar documents found (search time: 154 ms)
1.
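By analyzing the ruling-line features and structural features of tables, this paper proposes an algorithm for recognizing table text images based on projection features and structural features. The method extracts ruling-line features through projection computation and structural features through the hit-or-miss transform, and sets classification decision thresholds according to the relative importance of the extracted features. Experimental results show that the method distinguishes table text images from non-table text images accurately and efficiently, and is highly practical.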
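The projection step can be illustrated with a minimal sketch. It assumes a binary image stored as nested lists (1 = black pixel) and a hypothetical coverage ratio of 0.8; neither the function names nor the ratio come from the abstract, and the hit-or-miss stage is not shown.

```python
from typing import List, Tuple

def projection_profiles(img: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (row_sums, col_sums): black-pixel counts per row and per column."""
    row_sums = [sum(row) for row in img]
    col_sums = [sum(col) for col in zip(*img)]
    return row_sums, col_sums

def line_positions(profile: List[int], length: int, ratio: float = 0.8) -> List[int]:
    """Indices whose black-pixel count covers at least `ratio` of the full length.

    `ratio` is an illustrative assumption, not a value from the paper.
    """
    return [i for i, count in enumerate(profile) if count >= ratio * length]

# A tiny 5x6 image: rows 0 and 4 and columns 0 and 5 form a frame,
# and the single pixel at (2, 2) stands in for text.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
rows, cols = projection_profiles(image)
h_lines = line_positions(rows, len(image[0]))  # candidate horizontal ruling lines
v_lines = line_positions(cols, len(image))     # candidate vertical ruling lines
```

Peaks in the two profiles mark candidate ruling lines; a real classifier would combine them with structural features before thresholding.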

2.
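Table analysis is the process of recognizing the basic structure and shape of a table, and it determines whether text can later be extracted correctly from the table cells. Taking the characteristics of tables into account, the paper obtains table ruling lines by combining table-line detection with table-line processing. During detection, a main table-line length is defined to speed up scanning; during processing, the removal of stray lines, the adjustment of table lines, and the final recovery of the table structure are examined systematically. Extensive experiments show that the proposed method is feasible.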

3.
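To handle table ruling lines that are skewed, cracked, or broken, as well as characters that touch the lines, the paper studies ruling-line detection in depth. Ruling lines are obtained by combining detection with processing. For detection, a method based on unidirectional quasi-connectivity is proposed, which effectively copes with skewed and cracked lines and with touching characters; for processing, the detected lines are connected and filtered, which effectively resolves broken ruling lines. Extensive experiments show that the method achieves good detection results.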

4.
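A Preprocessing Algorithm for Table Recognition Based on Character-Line Separation   Total citations: 1, self-citations: 1, citations by others: 0
Table text images contain many objects other than ruling lines, and these interfere with correct extraction of the table frame structure. The paper proposes a preprocessing algorithm based on character-line separation. Without extracting the ruling lines first, the algorithm separates characters from lines using image blocking and connected-component analysis. Experimental results show that it filters out most text pixels and accurately highlights the ruling-line information in table text images, which achieves the goal of preprocessing and provides an effective preparatory step for subsequent table feature extraction and recognition.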

5.
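A Table Ruling-Line Detection Algorithm Based on Directional Single-Connected Chains   Total citations: 12, self-citations: 0, citations by others: 12
Ruling-line detection is the basis of table recognition. Existing ruling-line detection algorithms are either slow or not robust, and they do not make full use of the constraints among ruling lines. The paper proposes a bottom-up ruling-line detection algorithm based on a newly defined image structure primitive, the "directional single-connected chain". In this algorithm, a directional single-connected chain is a sequence of black-pixel runs that serves as a well-suited vector primitive. Chains are merged under constraints derived from the ruling lines, which effectively removes false lines, completes broken lines, and improves robustness, so that ruling lines can be extracted accurately and quickly. By filtering out noise chains and accelerating the merging of chains, the algorithm runs 3 to 10 times faster, which meets practical requirements. Experiments show that the algorithm offers speed ...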
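The idea of building chains out of black-pixel runs can be sketched as follows. This is a simplified illustration, not the paper's definition: it extracts vertical runs column by column and greedily links runs in adjacent columns when exactly one unused overlapping candidate exists. The names `vertical_runs` and `chain_runs` are hypothetical.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (column, top_row, bottom_row), rows inclusive

def vertical_runs(img: List[List[int]]) -> List[Run]:
    """Collect maximal vertical runs of black pixels (1) in every column."""
    runs: List[Run] = []
    height, width = len(img), len(img[0])
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                runs.append((x, start, y - 1))
            else:
                y += 1
    return runs

def chain_runs(runs: List[Run]) -> List[List[Run]]:
    """Link runs left to right while the next column offers exactly one
    unused, vertically overlapping run (a simplified single-connectivity test)."""
    by_col: Dict[int, List[Run]] = {}
    for r in runs:
        by_col.setdefault(r[0], []).append(r)
    used = set()
    chains: List[List[Run]] = []
    for r in sorted(runs):
        if r in used:
            continue
        chain, cur = [r], r
        used.add(r)
        while True:
            candidates = [c for c in by_col.get(cur[0] + 1, [])
                          if c not in used and c[1] <= cur[2] and c[2] >= cur[1]]
            if len(candidates) != 1:
                break
            cur = candidates[0]
            chain.append(cur)
            used.add(cur)
        chains.append(chain)
    return chains

image = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],  # a horizontal line one pixel thick
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],  # an isolated noise pixel
]
chains = chain_runs(vertical_runs(image))
chain_lengths = [len(c) for c in chains]
```

Short chains such as the single noise run could then be filtered before merging, in the spirit of the noise-filtering step the abstract mentions.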

6.
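This paper introduces a new automatic table processing system based on mathematical morphology. The system requires no learning phase: by applying morphological operations to the table with structuring elements that match the shape of the ruling lines, it extracts the ruling lines quickly and accurately and then obtains the character information of any field for further recognition and processing. It can also repair, to a good degree, nonlinear deformation introduced by scanning or printing.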
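A common way to realize "structuring elements that match the ruling-line shape" is a morphological opening with a long, thin element. The sketch below assumes a 1 x k horizontal element anchored at its left end and a binary image as nested lists; the abstract does not specify the actual elements used by the system.

```python
from typing import List

Image = List[List[int]]

def erode_h(img: Image, k: int) -> Image:
    """Keep a pixel only if it starts k consecutive black pixels in its row."""
    width = len(img[0])
    return [[1 if x + k <= width and all(row[x:x + k]) else 0
             for x in range(width)]
            for row in img]

def dilate_h(img: Image, k: int) -> Image:
    """Spread each black pixel over the k positions to its right."""
    width = len(img[0])
    out = [[0] * width for _ in img]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                for j in range(k):
                    if x + j < width:
                        out[y][x + j] = 1
    return out

def open_h(img: Image, k: int) -> Image:
    """Opening: erosion then dilation; only horizontal runs of length >= k survive."""
    return dilate_h(erode_h(img, k), k)

image = [
    [1, 1, 1, 1, 1, 1],  # a ruling line
    [0, 1, 1, 0, 0, 1],  # short strokes, as produced by characters
    [0, 0, 0, 0, 0, 0],
]
lines_only = open_h(image, 4)
```

Applying the same operation with a vertical element would isolate vertical ruling lines.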

7.
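Overlap between characters and ruling lines is a frequent problem in table processing and seriously interferes with character recognition. This paper proposes a ruling-line removal algorithm based on line-width information: the line-width threshold method. A smaller threshold is used to remove ruling lines inside characters and a larger one between characters, which gives the method good noise immunity. For the special case of digits overlapping ruling lines, the paper proposes and compares two methods that use prior knowledge: a heuristic prior-knowledge method and a recognition-feedback method. Recognition experiments on value-added tax invoices show that the algorithm brings the recognition rate of digits that overlap ruling lines to a level comparable to that of digits that do not.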

8.
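Table Creation Based on Pen Interaction   Total citations: 2, self-citations: 0, citations by others: 2
Pen interaction is used to support both formal table creation and sketch-based table creation, meeting the needs of different users and applications. The paper proposes an inference-based adaptive method for separating characters from lines: context analysis and global analysis first yield the character features and ruling-line features of each stroke, the two features are then evaluated jointly to separate characters from lines, and separation errors are corrected through interaction. The method converts sketched tables into formal tables and makes table creation more flexible.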

9.
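刘云锴, 彭程, 边赟. 《计算机应用》 (Journal of Computer Applications), 2021, 41(z1): 250-254
Traditional table structure recognition algorithms require heavy image preprocessing, have low recognition rates on complex table structures, and take too long on high-resolution, highly complex tables. To address these problems, the paper proposes a fast recognition algorithm that first detects ruling lines in the table image with a line segment detector, then applies a double-threshold line judgment rule to merge and refine multiple segments that belong to the same line, and finally aligns segments that are missing or too long at the intersections of horizontal and vertical lines against the overall table frame. Experimental results show that the algorithm recognizes both simple and complex tables accurately at high resolution, meets recognition requirements for simple and complex tables at low resolution, and tolerates a certain skew angle. It therefore reduces image preprocessing, shortens detection time, and can even recognize loosely defined table structures accurately, which advances general-purpose recognition of table structures in images.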
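The segment-merging step can be sketched as follows. The abstract does not define its double-threshold rule, so this sketch assumes one plausible reading: two nearly horizontal segments belong to the same line when their vertical offset is at most `max_offset` and the horizontal gap between them is at most `max_gap`. Both parameter names and values are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (y, x_start, x_end) for a near-horizontal segment

def merge_horizontal(segments: List[Segment],
                     max_offset: int = 2,
                     max_gap: int = 5) -> List[Segment]:
    """Merge segments that pass both thresholds (assumed rule, see lead-in).

    Segments are processed left to right, so a new segment can only extend
    an already merged line at its right end.
    """
    merged: List[Segment] = []
    for y, x0, x1 in sorted(segments, key=lambda s: s[1]):
        for i, (my, mx0, mx1) in enumerate(merged):
            if abs(y - my) <= max_offset and x0 - mx1 <= max_gap:
                merged[i] = (my, mx0, max(mx1, x1))
                break
        else:
            merged.append((y, x0, x1))
    return merged

# Three fragments of one ruling line near y = 10, plus a separate line at y = 50.
fragments = [(10, 0, 40), (11, 43, 90), (10, 92, 120), (50, 0, 120)]
lines = merge_horizontal(fragments)
```

The same routine applied to transposed coordinates would handle vertical segments; alignment at intersections is a separate step not shown here.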

10.
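A Ruling-Line Detection and Removal Algorithm for Table-Type Bills   Total citations: 1, self-citations: 0, citations by others: 1
Character strokes that touch or overlap table lines are common in table-type bills and seriously degrade the performance of subsequent automatic bill recognition. Most existing methods work on binary images and do not make full use of the ruling-line features available in grayscale images. Based on the ruling-line features of bill images, the paper proposes a ruling-line detection and removal algorithm for the preprocessing of table-type bills. It first exploits the characteristics of the grayscale bill image to detect the ruling lines accurately, then describes the overlapped ruling-line regions with a connected-chain structure, then identifies and marks the overlaps, and finally uses the marks to keep the character strokes while removing the ruling lines. Tests on real bank check images demonstrate the effectiveness and robustness of the algorithm.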

11.
A table is a well-organized and summarized knowledge expression for a domain. Therefore, it is of great importance to extract information from tables. However, many tables in Web pages are used not to transfer information but to decorate pages. One of the most critical tasks in Web table mining is thus to discriminate meaningful tables from decorative ones. The main obstacle of this task comes from the difficulty of generating relevant features for discrimination. This paper proposes a novel discrimination method using a composite kernel which combines parse tree kernels and a linear kernel. Because a Web table is represented as a parse tree by an HTML parser, it is natural to represent the structural information of a table as a parse tree. In this paper, two types of parse trees are used to represent structural information within and around a table. These two trees define the structure kernel that handles the structural information of tables. The contents of a Web table are manipulated by a linear kernel with content features. Support vector machines with the composite kernel distinguish meaningful tables from decorative ones with high accuracy. A series of experiments show that the proposed method achieves state-of-the-art performance.

12.
This paper describes a method for structuring graphics support which simplifies the programming of interactive applications, while improving their portability by providing a device-independent interface. The primary objective is to ease the task of the application programmer in developing applications which use interaction and graphics. Common graphics functions which are used by several applications are generalized and defined in terms of transition tables. User actions become inputs, system wait points become states, and system responses are pointed to by table outputs. A suitable modification is applied to each of the generalized graphics functions to tailor it to be used by a particular application.
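The transition-table idea can be illustrated with a minimal sketch. The states, inputs, and response names below (a rubber-band line drawing interaction) are invented for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple

# (state, user input) -> (next state, system response); all names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("idle", "press"): ("dragging", "start_rubber_band"),
    ("dragging", "move"): ("dragging", "update_rubber_band"),
    ("dragging", "release"): ("idle", "commit_line"),
}

def run(events: List[str], state: str = "idle") -> Tuple[str, List[str]]:
    """Feed user actions through the table; unknown pairs leave the state unchanged."""
    responses: List[str] = []
    for event in events:
        state, response = TRANSITIONS.get((state, event), (state, "ignore"))
        responses.append(response)
    return state, responses

final_state, log = run(["press", "move", "move", "release", "move"])
```

Tailoring the function to a particular application then amounts to editing table entries rather than rewriting control flow.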

13.
In documents, tables are important structured objects that present statistical and relational information. In this paper, we present a robust system which is capable of detecting tables from free style online ink notes and extracting their structure so that they can be further edited in multiple ways. First, the primitive structure of tables, i.e., candidates for ruling lines and table bounding boxes, are detected among drawing strokes. Second, the logical structure of tables is determined by normalizing the table skeletons, identifying the skeleton structure, and extracting the cell contents. The detection process is similar to a decision tree so that invalid candidates can be ruled out quickly. Experimental results suggest that our system is robust and accurate in dealing with tables having complex structure or drawn under complex situations.

14.
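Implementing a fuzzy control strategy on a single-chip microcontroller is a common and important approach: a fuzzy control look-up table is computed offline from the membership function tables of the error and the rate of change of the error, and the microcontroller then only performs the table look-up. Because this approach is so widely used, optimizing membership function tables in discrete form is of considerable importance. The paper proposes a new method for doing so: a genetic algorithm optimizes the linguistic hedge operator H of the fuzzy sets, which in turn optimizes the discrete membership function table. The optimized membership functions reflect the true characteristics of the controlled plant more objectively, and the fuzzy controller is optimized as a result. A concrete example is simulated to verify that the method is correct and effective.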

15.
In online automotive applications it is common to use look-up tables, or maps, to describe nonlinearities in component models that are to be valid over large operating ranges. If the component characteristics change with aging or wear, these look-up tables must be updated online. For 2-D look-up tables, the existing methods in the literature only adapt the observable parameters in the look-up table, which means that parameters in operation points that have not been visited for a long time may be far from their true values. In this work, correlations between different operating points are used to also update non-observable parameters of the look-up table. The method is applied to Open Circuit Voltage (OCV) curves for aged battery cells. From laboratory experimental data it is demonstrated that the proposed method can significantly reduce the average deviation from an aged OCV-curve compared to keeping the OCV-curve from the beginning of the cell’s life, both for observable and non-observable parameters.

16.
Conditional tables have been identified long ago as a way to capture unknown or incomplete information. However, queries over conditional tables have never been allowed to involve column functions such as aggregates. In this paper, the theory of conditional tables is extended in this direction, and it is shown that a strong representation system exists which has the closure property that the result of an aggregate query over a conditional table can be again represented by a conditional table. It turns out, however, that the number of tuples in a conditional table representing the result of an aggregate query may grow exponentially in the number of variables in the table. This phenomenon is analyzed in detail, and tight upper and lower bounds concerning the number of tuples contained in the result of an aggregate query are given. Finally, representation techniques are sketched that approximate aggregation results in tables of reasonable size.
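The exponential growth can be seen in a toy model. The sketch below assumes a heavily simplified conditional table in which each row carries one independent Boolean variable and is present exactly when that variable is true; real conditional tables allow far richer conditions. It enumerates every valuation and groups them by the resulting SUM.

```python
from itertools import product
from typing import Dict, List, Tuple

def sum_worlds(rows: List[Tuple[int, str]]) -> Dict[int, List[Dict[str, bool]]]:
    """Map each possible SUM to the valuations (possible worlds) that produce it.

    rows: (value, variable) pairs; a row exists in a world iff its variable is True.
    """
    variables = sorted({var for _, var in rows})
    results: Dict[int, List[Dict[str, bool]]] = {}
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        total = sum(value for value, var in rows if world[var])
        results.setdefault(total, []).append(world)
    return results

# Values 1, 2, 4 make every subset sum distinct: 2**3 = 8 result tuples.
table = [(1, "x"), (2, "y"), (4, "z")]
outcomes = sum_worlds(table)
distinct_sums = sorted(outcomes)
```

Each distinct sum would need its own tuple (with its own condition) in a conditional table representing the aggregate, which is why the result size can double with every added variable.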

17.
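A General Method for Database Data Housekeeping   Total citations: 7, self-citations: 1, citations by others: 7
刘天时, 赵嵩正. 《计算机工程》 (Computer Engineering), 2004, 30(20): 70-71, 74
The volume of data in a database system grows over time and degrades system performance. To address this, the paper proposes a general data housekeeping method. Based on a trigger mechanism and on conversion of database data to text data, the method migrates data that satisfies given conditions to another database with the same structure. This preserves the runtime performance of the original information system without losing historical data, and the housekeeping run doubles as a data backup. In practice, the method simplifies the developer's choice of what to migrate: for multiple tables linked by delete relationships, only the parent table needs to be considered, and the housekeeping of all its child tables is completed automatically along with the parent.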
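The parent-driven archival idea can be sketched with SQLite from Python. This is an assumption-laden illustration: the table names are hypothetical, archive tables in the same in-memory database stand in for the separate database with identical structure, and the text-conversion step described in the abstract is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
-- Archive tables mirror the live structure (stand-in for a second database).
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE order_items_archive (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);

-- Deleting a parent row archives it together with its child rows.
CREATE TRIGGER archive_order BEFORE DELETE ON orders
BEGIN
    INSERT INTO orders_archive SELECT * FROM orders WHERE id = OLD.id;
    INSERT INTO order_items_archive SELECT * FROM order_items WHERE order_id = OLD.id;
    DELETE FROM order_items WHERE order_id = OLD.id;
END;

INSERT INTO orders VALUES (1, '2001-05-01'), (2, '2004-09-30');
INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# Housekeeping: the developer states a condition on the parent table only.
conn.execute("DELETE FROM orders WHERE created < '2003-01-01'")
conn.commit()

live_orders = conn.execute("SELECT id FROM orders").fetchall()
archived_orders = conn.execute("SELECT id FROM orders_archive").fetchall()
archived_items = conn.execute("SELECT id FROM order_items_archive ORDER BY id").fetchall()
live_items = conn.execute("SELECT id FROM order_items").fetchall()
```

Because the trigger fires per deleted parent row, the child tables never appear in the housekeeping statement itself.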

18.
This paper plans an end-to-end method for extracting information from tables embedded in documents; input format is ASCII, to which any richer format can be converted, preserving all textual and much of the layout information. We start by defining table. Then we describe the steps involved in extracting information from tables and analyse table-related research to place the contribution of different authors, find the paths research is following, and identify issues that are still unsolved. We then analyse current approaches to evaluating table processing algorithms and propose two new metrics for the task of segmenting cells/columns/rows. We proceed to design our own end-to-end method, where there is a higher interaction between different steps; we indicate how back loops in the usual order of the steps can reduce the possibility of errors and contribute to solving previously unsolved problems. Finally, we explore how the actual interpretation of the table not only allows inferring the accuracy of the overall extraction process but also contributes to actually improving its quality. In order to do so, we believe interpretation has to consider context-specific knowledge; we explore how the addition of this knowledge can be made in a plug-in/out manner, such that the overall method will maintain its operability in different contexts. The opinions expressed in this article are the responsibility of the authors and do not necessarily reflect those of Banco de Portugal.

19.
The transformation of generalisation hierarchies into relational format cannot be done in a natural way. The usual methods of implementing such a structure are either to compress the hierarchy into one table or to represent it by means of a separate table for each class and subclass in the hierarchy. However, both techniques are accompanied by specific problems which become unacceptable for hierarchies that are structurally unstable, have many subclasses and/or use specialising attributes applying to several of the subclasses. This paper proposes an alternative technique in which generalisation structures are mapped into collections of tables and meta tables, the latter containing data about the hierarchy. The entity relationship approach is chosen as a frame of reference.

20.
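A large amount of relational information exists in the many kinds of lists on the Web, but it is hard to find with current search engines. This paper proposes a method based on semantics and data features for identifying and extracting relational information from Web lists. A model describing the desired relational information is built first; lists on the Web are then located and scored for how likely they are to contain that information, and when the score is high enough, the desired relational information is extracted from them.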
