1.
The Topological Structure of Scale-Space Images (cited 5 times in total; self-citations: 0; citations by others: 5)
2.
3.
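Form documents are used widely in everyday life, for example in censuses, bank bills, and reports of many kinds, so automatic computer processing of such documents has real practical value. A form document information processing system consists mainly of raw document image acquisition, document structure extraction, and recognition of the filled-in information. After analyzing the strengths and weaknesses of existing automatic form entry systems in China and abroad, the authors capture the raw image signal of the form with a contact image sensor (CIS), obtaining a high-quality image signal in hardware, and recognize the filled-in information with optical character recognition (OCR). The resulting system places few demands on the paper or on how the form is filled in, and achieves high recognition accuracy.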
4.
Vered Silber-Varod Amir Winer Nitza Geri 《Journal of Computer Information Systems》2017,57(2):106-111
Automatic Speech Recognition (ASR) may increase access to spoken information captured in videos. ASR is needed especially for online academic video lectures, which are gradually replacing class lectures and traditional textbooks. This conceptual article examines how technological barriers to ASR in under-resourced languages impair accessibility to video content and demonstrates this with the empirical findings of Hebrew ASR evaluations. We compare ASR with Optical Character Recognition (OCR) as means of facilitating access to textual and speech content and show their current performance in under-resourced languages. We target ASR of under-resourced languages as the main barrier to searching academic video lectures. We further show that information retrieval technologies, such as smart video players that combine both ASR and OCR capacities, must come to the fore once ASR technologies have matured. We therefore suggest that the current state of information retrieval from video lectures in under-resourced languages is equivalent to a knowledge dam.
5.
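A scanning, recognition, and English-Chinese translation system (an electronic reading pen) built around a DSP was developed. On a small handheld device it carries out the full pipeline from scanned image input through offline OCR to translation and display, working entirely independently of a computer. The paper mainly discusses the composition of the hardware and software systems and describes selected core software algorithms in detail. The experimental system runs stably, achieves an average recognition rate above 95%, and provides a translation dictionary with a combined Chinese and English vocabulary of 150,000 words.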
6.
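Many OCR (optical character recognition) classifiers have been designed for a single character set (or language). Meanwhile, with globalization, multilingual documents are increasingly common, so a multilingual document processing system is needed. The paper proposes a general solution: two OCR technologies, one system, and language identification. To make the work concrete, a system for processing mixed Chinese-English articles was implemented. It addresses three key problems: system flow control, separation of Chinese and English regions, and segmentation of English characters. Compared with previous systems, this one adds a Chinese-English region separation module and applies a new method based on equal spacing in that module. To validate the system, a second system was implemented by combining earlier methods. Experiments show that the proposed system clearly outperforms the other: recognition rates on magazine samples and book samples rose from 98.48% and 98.68% to 99.13% and 99.25%, respectively.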
7.
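Many natural scene images contain rich text that plays an important role in scene understanding. With the rapid development of mobile Internet technology, many new applications need this text information, such as signboard recognition and autonomous driving. The analysis and processing of natural scene text has therefore become a research hotspot in computer vision; the task mainly comprises text detection and text recognition. Traditional detection and recognition methods rely on hand-crafted features and rules, and their models are complex to design, inefficient, and generalize poorly. With the development of deep learning, scene text detection, scene text recognition, and end-to-end scene text detection and recognition have all made breakthrough progress, with marked gains in performance and efficiency. This paper introduces the research background of the field; organizes, classifies, and summarizes deep-learning-based methods for scene text detection, recognition, and end-to-end detection and recognition; and explains the basic ideas, strengths, and weaknesses of each class of method. For the methods in each category, it further discusses and analyzes the algorithmic pipelines, applicable scenarios, and technical development paths of the main models. It also lists several mainstream public datasets and compares the performance of the methods on representative datasets. Finally, it summarizes the limitations of current algorithms for scene text detection, recognition, and end-to-end detection and recognition on data from different scenarios, along with future challenges and trends.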
8.
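A Gaussian mixture model of neighboring character regions is built to distinguish characters from non-characters, and on this basis a method for extracting multilingual text from images is proposed. First, the input image is binarized and a morphological closing is applied so that each character becomes a single connected component in the binary image. Next, adjacency relations between connected components are formed from the Voronoi regions of the component centroids. Finally, within a Bayesian framework, pseudo-probabilities are computed from the Gaussian mixture model of neighboring character regions and used as the criterion for labeling each connected component as character or non-character. In experiments on extracting complex Chinese and English text, the method achieved precision above 97% and recall above 80%, confirming its effectiveness.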
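The preprocessing step in the abstract above (a morphological closing so that each character becomes one connected component) can be illustrated with a minimal pure-Python sketch. The 3x3 structuring element, the 8-connectivity, the function names, and the toy `glyph` image are all assumptions made for illustration; the abstract does not specify them, and the Voronoi adjacency and Gaussian mixture stages are not shown.

```python
from collections import deque

def _window(r, c, rows, cols):
    """Yield the in-bounds cells of the 3x3 window centred on (r, c), centre included."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    rows, cols = len(img), len(img[0])
    return [[int(any(img[rr][cc] for rr, cc in _window(r, c, rows, cols)))
             for c in range(cols)] for r in range(rows)]

def erode(img):
    """A pixel stays 1 only if every in-bounds pixel in its 3x3 window is 1."""
    rows, cols = len(img), len(img[0])
    return [[int(all(img[rr][cc] for rr, cc in _window(r, c, rows, cols)))
             for c in range(cols)] for r in range(rows)]

def close(img):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img))

def count_components(img):
    """Count 8-connected components of 1-pixels with a breadth-first search."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in _window(cr, cc, rows, cols):
                        if img[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return count

# Hypothetical toy image: two strokes of one glyph separated by a one-pixel gap.
glyph = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
```

On this toy image, `count_components(glyph)` sees two components, while `count_components(close(glyph))` sees one, because the closing fills the gap between the strokes.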
9.
Yolanda Villacampa-Esteve M. A. Castro-Lopez Josep Luis Uso-Domenech Patricia Sastre 《Cybernetics and Systems》2013,44(3):189-201
In this paper the authors develop a dialectical logic of complex system notions within a mathematical linguistic theory of models. On the set of notions defined in a system, an order relation and the Boolean algebra of the notions are considered. The study yields a tool that serves as the metatheoretical base of this theory. Studying and modelling complex systems allows them to be analyzed in the context of mathematical linguistics (Villacampa and Usó-Domènech 1999). Mathematical modelling produces text-models, which are studied through a text theory (Villacampa et al. 1999). These theories imply a problem that must be studied in terms of classical logic: extension/comprehension. These opposite categories form an entity within the same text/model and are studied by the dialectical logic. The bases of dialectical logic are necessary in the study of systems, since dialectics formulates how the phenomena of the reality of the system should be studied as a means of examining any object or system, allowing the perception of its essence, or real nature. The development and changes of the system must be considered, and the system must be defined without antagonism between dialectical and formal logic.
10.
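Form frame line detection is the basis of form recognition. Existing form frame line detection algorithms are either slow or not robust, and they do not make full use of the constraints between frame lines. The paper proposes a bottom-up form frame line detection algorithm based on a newly defined image structure primitive, the "directional single-connected chain". In this algorithm, a directional single-connected chain is a sequence of black pixel runs that serves as a well-suited vector primitive. Merging single-connected chains under certain frame line constraints effectively removes false frame lines and completes broken ones, improving robustness, so that frame lines can be extracted accurately and quickly. By filtering out noise chains and accelerating chain merging, the algorithm runs 3 to 10 times faster, which meets practical requirements. Experiments show that the algorithm offers speed ...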
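The abstract above describes its primitive as a sequence of black pixel runs. As a minimal sketch of that building block only, the following extracts the maximal black runs from one image row. The function name and the (start, end) representation are assumptions for illustration; how the paper links runs into chains and merges chains is not specified in the abstract and is not shown here.

```python
def black_runs(row):
    """Return inclusive (start, end) column pairs for each maximal run of 1-pixels."""
    runs = []
    start = None
    for col, pixel in enumerate(row):
        if pixel and start is None:
            start = col                      # a new run begins here
        elif not pixel and start is not None:
            runs.append((start, col - 1))    # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))   # a run that reaches the row's end
    return runs
```

For example, `black_runs([0, 1, 1, 0, 1])` yields one run covering columns 1 to 2 and one covering column 4.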
11.
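Using the position and color information of edge points, a progressively relaxed clustering method segments the image into pixel subsets. Initial text regions are extracted from the distribution characteristics of text region edges, and their boundaries are then expanded to obtain complete text regions. A text region binarization method is also given that reduces the number of binary images needed when the text color polarity is unknown, improving the computational efficiency of subsequent steps such as character segmentation. Experimental results show that the method is effective for text region extraction, with a complete-extraction rate of 99%.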
12.
Bor-Shenn Jeng Tung-Ming Shieh Char-Shin Miou Chun-Jen Lee Bing-Shan Chien Yu-Hen Hu Gan-How Chang 《Image and vision computing》1995,13(10):745-754
In this paper, we present a new automated Chinese printed document entry system. This system features automated text/graph segmentation, and multi-font, multi-size printed Chinese character recognition. Experimental results show that 95.8–99.4% of the top 10 printed characters can be correctly recognized, with the speed of 0.16 seconds/character.
13.
A Text Segmentation Algorithm Using a Stroke Filter (cited 1 time in total; self-citations: 0; citations by others: 1)
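To segment text accurately against complex backgrounds, a new text segmentation method using a stroke filter is proposed. First, a stroke filter is designed and used to determine the color polarity of the text, yielding a binary image from an initial segmentation. Text segmentation based on region growing is then performed. Finally, an OCR (optical character recognition) module is used to improve the overall segmentation performance. A comparison with other algorithms shows that the proposed algorithm is more effective.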
14.
15.
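To locate and extract text regions in news video frames, an effective caption localization and extraction method is proposed. Caption regions in news video frames are located using gray-level differences and a variant gray-level histogram; the segmented text regions are then binarized with an improved two-dimensional maximum entropy thresholding method to obtain recognizable text images. Finally, text localization and OCR recognition results are compared across algorithms. Experiments show that, compared with the traditional projection method and the maximum entropy method, the proposed method effectively improves both the recall of text localization and the OCR recognition rate.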
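The abstract above uses an improved two-dimensional maximum entropy threshold, whose details it does not give. As a simpler stand-in, the sketch below shows the classic one-dimensional maximum entropy criterion (the baseline the abstract compares against): pick the gray level that maximizes the sum of the entropies of the two classes it separates. The function names and the toy histogram are assumptions for illustration, not the authors' method.

```python
import math

def _entropy(counts, mass):
    """Shannon entropy (natural log) of the counts normalized by their total mass."""
    return -sum((c / mass) * math.log(c / mass) for c in counts if c > 0)

def max_entropy_threshold(hist):
    """Return the level t maximizing entropy(levels <= t) + entropy(levels > t).

    hist is a list of pixel counts per gray level. Returns None when no
    threshold splits the pixels into two non-empty classes.
    """
    total = sum(hist)
    best_t, best_score = None, float("-inf")
    below = 0
    for t in range(len(hist) - 1):
        below += hist[t]
        above = total - below
        if below == 0 or above == 0:
            continue  # one class would be empty, so its entropy is undefined
        score = _entropy(hist[:t + 1], below) + _entropy(hist[t + 1:], above)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical bimodal histogram: dark text levels near 10, bright background near 200.
hist = [0] * 256
for level in (10, 11, 200, 201):
    hist[level] = 25
threshold = max_entropy_threshold(hist)
```

Pixels with gray level at or below `threshold` would then be assigned to one class and the rest to the other; which class is text depends on the caption's polarity.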
16.
In this paper, a robust, connected-component-based character locating method is presented. It is an important part of an optical character recognition (OCR) system. Color clustering is used to separate the color image into homogeneous color layers. Next, every connected component in each color layer is analyzed using a black adjacency graph (BAG), and the component's bounding box is computed. Then, for coarse detection of characters, an aligning-and-merging-analysis (AMA) scheme is proposed to locate all the potential characters using the information about the bounding boxes of connected components in all color layers. Finally, to eliminate false characters, a four-step identification of characters is used. The experimental results show that the method is effective.
17.
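When book pages are photographed or scanned, the resulting page images show varying degrees of warping. This harms not only their appearance but also downstream processing such as OCR (Optical Character Recognition). To address this, an improved model-based algorithm for correcting warped pages is proposed. The input image is first rotated upright in a preprocessing step, and the gray page background is removed by a binarization method whose threshold is determined from image gradient information. A simple straight-line structured light is then used to extract point sets for the text lines on the page, and a cylindrical model built from the curve through the center points of these sets is used to correct the page. Experiments show that the method adapts to more types of page warping, corrects pages and removes backgrounds effectively and efficiently, and significantly improves the OCR recognition rate; the system is also structurally simple and easy to implement.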
18.
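Bronze inscriptions unearthed by archaeologists are extremely valuable textual material, and understanding their meanings and the evolution of their glyph forms accurately and quickly matters greatly for archaeology, history, and linguistics. Identifying bronze inscriptions requires studying the form, sound, and meaning of each character together, and the first and most important step is to analyze the character's shape features. This paper proposes a neural network model based on two-stage feature mapping to extract the shape features of each character, then compares them against known scholarship on ancient characters, such as 《古文字类编》 and 《说文解字》, to produce a recognition result. Qualitative and quantitative experiments show that the proposed method reaches high recognition accuracy. In particular, Top-10 accuracy (the correct class is among the first 10 predicted classes) reaches 94.2%, which greatly narrows the space archaeologists must search and speculate over, and improves the efficiency and accuracy of bronze inscription recognition.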
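The Top-10 figure quoted above is a top-k accuracy: a sample counts as correct when its true class is among the k highest-scoring predictions. A minimal sketch of that metric follows; the function name, the score layout (one list of per-class scores per sample), and the toy data are assumptions for illustration and are not taken from the paper.

```python
def top_k_accuracy(scores, labels, k=10):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical scores for 3 samples over 3 classes, with their true class indices.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
```

With this toy data, raising k from 1 to 2 to 3 lets one, two, and then all three samples count as hits, which is why Top-10 accuracy is always at least as high as Top-1 accuracy.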
19.
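Objective: Visually rich document information extraction aims to extract key text information from input document images in structured form to solve practical business problems; financial bills are one common data type. Solving such problems usually requires techniques from several fields, including optical character recognition (OCR) and information extraction. However, few relevant datasets are publicly available, and each contains relatively few images, which has become an important factor limiting progress in this field. To address this, the authors collected, annotated, and publicly released SCID (scanned Chinese invoice dataset), a dataset of real scanned Chinese bills covering 6 common types of financial bills, with 40,716 images in total. Method: The dataset provides two kinds of labels, for the OCR task and for information extraction. For this dataset, a baseline based on LayoutLM v2 (layout language model v2) is proposed that performs end-to-end inference from input image to final result. The CSIG (China Society of Image and Graphics) 2022 Invoice Recognition and Analysis Challenge, hosted on the basis of this dataset, attracted many researchers, who proposed strong solutions. Result: In the baseline experiments, three settings were evaluated: inference with an OCR engine, a fine-tuned OCR model, and OCR ground truth. Their F1 scores were 0.7687, 0.8570, and 0.9857, respectively, which demonstrates the effectiveness of the LayoutLM v2 model on the one hand and the difficulty of OCR in this scenario on the other. Conclusion: The SCID scanned bill dataset exhibits several challenges of real OCR application scenarios and can provide important data support for research, development, and deployment of visually rich document information extraction. The dataset can be downloaded from https://davar-lab.github.io/dataset/scid.html.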
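The F1 scores reported above combine precision and recall into a single number. The sketch below computes F1 over sets of extracted (field, value) pairs, assuming exact-match scoring; the abstract does not state the exact matching rule used in the benchmark, and the function name, field names, and values are hypothetical.

```python
def f1_score(predicted, gold):
    """F1 over exact-match items: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or empty ground truth
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical extraction result for one bill: the seller name was misread.
predicted = {("date", "2022-01-01"), ("total", "100.00"), ("seller", "A Co.")}
gold = {("date", "2022-01-01"), ("total", "100.00"), ("seller", "B Co.")}
```

Under exact matching, a single misrecognized character makes the whole field count as wrong, which is one way upstream OCR errors can lower the extraction score.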
20.
Eduardo Torres Schumann Klaus U. Schulz 《International Journal on Document Analysis and Recognition》2006,8(1):1-99999
A significant portion of currently available documents exist in the form of images, for instance, as scanned documents. Electronic documents produced by scanning and OCR software contain recognition errors. This paper uses an automatic approach to examine the selection and the effectiveness of searching techniques for possible erroneous terms for query expansion. The proposed method consists of two basic steps. In the first step, confused characters in erroneous words are located and editing operations are applied to create a collection of erroneous error-grams in the basic unit of the model. The second step uses query terms and error-grams to generate additional query terms, identify appropriate matching terms, and determine the degree of relevance of retrieved document images to the user's query, based on a vector space IR model. The proposed approach has been trained on 979 document images to construct about 2,822 error-grams and tested on 100 scanned Web pages, 200 advertisements and manuals, and 700 degraded images. The performance of our method is evaluated experimentally by determining retrieval effectiveness with respect to recall and precision. The results obtained show its effectiveness and indicate an improvement over standard methods such as vectorial systems without expanded query and 3-gram overlapping.
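The first step described in the abstract above, applying editing operations at confused characters to generate erroneous variants of a query term, can be sketched as follows. The confusion pairs, the function name, and the single-substitution policy are hypothetical choices for illustration; the paper learns its error-grams from training images, and the vector space retrieval step is not shown.

```python
def expand_query_term(term, confusions):
    """Return the term plus variants made by applying one confusion at one position."""
    variants = {term}
    for correct, misread in confusions:
        start = term.find(correct)
        while start != -1:
            # Replace this single occurrence of the correct substring.
            variants.add(term[:start] + misread + term[start + len(correct):])
            start = term.find(correct, start + 1)
    return variants

# Hypothetical OCR confusions: (what was printed, what the OCR tends to output).
CONFUSIONS = [("rn", "m"), ("l", "1"), ("cl", "d")]
```

For example, `expand_query_term("modern", CONFUSIONS)` adds the variant "modem", so a search for "modern" would also match documents where OCR merged "rn" into "m".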
Youssef Fataicha received his B.Sc. degree from Université de Rennes 1, Rennes, France, in 1982. In 1984 he obtained his M.Sc. in computer science from Université de Rennes 1, France. Between 1984 and 1986 he was a lecturer at the Université de Rennes 1, France. He then served as an engineer, from 1987 to 2000, at the Office de l'eau potable et de l'électricité in Morocco. Since 2001 he has been a Ph.D. student at the École de Technologie Supérieure de l'Université du Québec in Montreal, Québec, Canada. His research interests include pattern recognition, information retrieval, and image analysis.
Mohamed Cheriet received his B.Eng. in computer science from Université des Sciences et de Technologie d'Alger (Bab Ezouar, Algiers) in 1984 and his M.Sc. and Ph.D., also in computer science, from the University of Pierre et Marie Curie (Paris VI) in 1985 and 1988, respectively. Dr. Cheriet was appointed assistant professor in 1992, associate professor in 1995, and full professor in 1998 in the Department of Automation Engineering, École de Technologie Supérieure of the University of Québec, Montreal. Currently he is the director of LIVIA, the Laboratory for Imagery, Vision and Artificial Intelligence at ETS, and an active member of CENPARMI, the Centre for Pattern Recognition and Machine Intelligence. Professor Cheriet's research focuses on mathematical modeling for signal and image processing (scale-space, PDEs, and variational methods), pattern recognition, character recognition, text processing, document analysis and recognition, and perception. He has published more than 100 technical papers in these fields. He was the co-chair of the 11th and the 13th Vision Interface Conferences held respectively in Vancouver in 1998 and in Montreal in 2000. He was also the general co-chair of the 8th International Workshop on Frontiers on Handwriting Recognition held in Niagara-on-the-Lake in 2002. He has served as associate editor of the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) since 2000. Dr. Cheriet is a senior member of IEEE.
Jian Yun Nie is a professor in the computer science department (DIRO), Université de Montreal, Québec, Canada. His research focuses on problems related to information retrieval, including multilingual and multimedia information retrieval, as well as natural language processing.
Ching Y. Suen received his M.Sc. (Eng.) from the University of Hong Kong and Ph.D. from the University of British Columbia, Canada. In 1972 he joined the Department of Computer Science of Concordia University, where he became professor in 1979 and served as chairman from 1980 to 1984 and as associate dean for research of the Faculty of Engineering and Computer Science from 1993 to 1997. He has guided/hosted 65 visiting scientists and professors and supervised 60 doctoral and master's graduates. Currently he holds the distinguished Concordia Research Chair in Artificial Intelligence and Pattern Recognition and is the Director of CENPARMI, the Centre for Pattern Recognition and Machine Intelligence. Professor Suen is the author/editor of 11 books and more than 400 papers on subjects ranging from computer vision and handwriting recognition to expert systems and computational linguistics. A Google search on “Ching Y. Suen” will show some of his publications. He is the founder of the International Journal of Computer Processing of Oriental Languages and served as its first editor-in-chief for 10 years. Presently he is an associate editor of several journals related to pattern recognition. A fellow of the IEEE, IAPR, and the Academy of Sciences of the Royal Society of Canada, he has served several professional societies as president, vice-president, or governor. He is also the founder and chair of several conference series including ICDAR, IWFHR, and VI. He has been the general chair of numerous international conferences, including the International Conference on Computer Processing of Chinese and Oriental Languages in August 1988 held in Toronto, International Conference on Document Analysis and Recognition held in Montreal in August 1995, and the International Conference on Pattern Recognition held in Québec City in August 2002. Dr. Suen has given 150 seminars at major computer companies and various government and academic institutions around the world. He has been the principal investigator of 25 industrial/government research contracts and is a grant holder and recipient of prestigious awards, including the ITAC/NSERC award from the Information Technology Association of Canada and the Natural Sciences and Engineering Research Council of Canada in 1992 and the Concordia “Research Fellow” award in 1998.