1.

Liu Feng, Wang Bin, Li Xiangfeng, Hu Fuqiao (刘峰, 汪斌, 李向丰, 胡福乔) 《计算机应用与软件》 (Computer Applications and Software) 2005,22(2):76-78,94

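Text is an important feature in many computer vision applications. The widespread use of text in complex images has made image text analysis a new research direction. Image text analysis is closely related to conventional document image analysis, but the particular characteristics of image text give it richer content than ordinary document image analysis. We divide image text analysis into three components for discussion: image text localization, image text preprocessing, and image text recognition. Finally, the paper also discusses applications of image text analysis.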

2.

Text information extraction in images and video: a survey

Keechul Jung, Kwang In Kim 《Pattern recognition》2004,37(5):977-997

Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex background, make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image & video indexing can be found, the problem of text information extraction is not well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and point out promising directions for future research.

3.

Integrating natural language understanding with document structure analysis

Suzanne Liebowitz Taylor, Deborah A. Dahl, Mark Lipshutz, Carl Weir, Lewis M. Norton, Roslyn Weidner Nilson, Marcia C. Linebarger 《Artificial Intelligence Review》1994,8(2-3):255-276

Document understanding, the interpretation of a document from its image form, is a technology area which benefits greatly from the integration of natural language processing with image processing. We have developed a prototype of an Intelligent Document Understanding System (IDUS) which employs several technologies in a cooperative fashion: image processing, optical character recognition, document structure analysis, and text understanding. This paper discusses those areas of research during development of IDUS where we have found the most benefit from the integration of natural language processing and image processing: document structure analysis, optical character recognition (OCR) correction, and text analysis. We also discuss two applications which are supported by IDUS: text retrieval and automatic generation of hypertext links.

4.

Image and video retrieval techniques (cited 10 times in total; self-citations: 0; citations by others: 10)

Fan Lingtao, Chen Jian (樊凌涛, 陈健) 《计算机工程与应用》 (Computer Engineering and Applications) 2001,37(9):71-77

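With the development of network technology, multimedia data will become the main content of network services, so the management of multimedia data has become a research focus in recent years. Because media information differs in how it is represented, the retrieval methods of traditional relational databases are no longer suitable for images and video; retrieval based on the content itself must be used instead. This article gives a comprehensive, level-by-level summary of content-based image and video retrieval techniques, covering techniques based on basic features (color, texture, shape, and spatial relationships), video scene segmentation and key-frame extraction, and retrieval based on audio and text. It also describes the advantages and disadvantages of each method, the current state of the field, and directions for development.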

5.

Multimedia Clip Generation From Documents for Browsing on Mobile Devices

Erol B., Berkner K., Joshi S. 《Multimedia, IEEE Transactions on》2008,10(5):711-723

Small displays on mobile handheld devices, such as personal digital assistants (PDAs) and cellular phones, are the bottlenecks for usability of most content browsing applications. Generally, conventional content such as documents and Web pages need to be modified for effective presentation on mobile devices. This paper proposes a novel visualization for documents, called multimedia thumbnails, which consists of text and image content converted into playable multimedia clips. A multimedia thumbnail utilizes visual and audio channels of small portable devices as well as both spatial and time dimensions to communicate text and image information of a single document. The proposed algorithm for generating multimedia thumbnails includes 1) a semantic document analysis step, where salient content from a source document is extracted; 2) an optimization step, where a subset of this extracted content is selected based on time, display, and application constraints; and 3) a composition step, where the selected visual and audible document content is combined into a multimedia thumbnail. Scalability of MMNails that allows generation of multimedia clips of various lengths is also described. A user study is presented that evaluates the effectiveness of the proposed multimedia thumbnail visualization.

6.

Application of a contact image sensor to form document information processing

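Liu Jiansheng, Wang Tongqing, Wang Guixin, Ju Yan, Peng Jian (刘建胜, 汪同庆, 王贵新, 居琰, 彭健) 《传感器与微系统》 (Transducer and Microsystem Technologies) 2002,21(5):51-54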

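Form documents are widely used in daily life, for example in censuses, bank notes, and reports of all kinds, so automatic computer processing of such documents is of great practical significance. A form document information processing system consists mainly of raw document image acquisition, document structure extraction, and recognition of the filled-in information. After analyzing the strengths and weaknesses of automatic form data-entry systems in China and abroad, the authors use a contact image sensor (CIS) to capture the raw image signal of the form document, obtaining a high-quality image signal in hardware. Optical character recognition (OCR) is used to recognize the filled-in information. The system places low demands on the paper and on how the form is filled in, and achieves high recognition accuracy.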

7.

A survey of text detection in video images

Zhou Dong'ao, Lin Jiayu (周东傲, 林嘉宇) 《计算机工程与科学》 (Computer Engineering & Science) 2015,37(4):760-764

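Automatically extracting text from video images is important for monitoring video content, adding video tags, and building video image retrieval systems. Text detection is the front end of a text information extraction system and its most critical step. In recent years there have been important new developments in text detection for video images. This survey summarizes, compares, and analyzes region-based and texture-based text detection methods and outlines the main recent advances in text detection. In addition, to highlight the importance of hybrid methods, these are summarized separately. Finally, the survey summarizes the difficulties of text detection in video images and discusses future trends.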

8.

Text line extraction from multi-skewed handwritten documents

S. Basu, M. Kundu, D.K. Basu 《Pattern recognition》2007,40(6):1825-1839

A novel text line extraction technique is presented for multi-skewed document images of handwritten English or Bengali text. It assumes that hypothetical water flows, from both left and right sides of the image frame, face obstruction from characters of text lines. The stripes of areas left unwetted on the image frame are finally labelled for extraction of text lines. The success rates of the technique, as observed experimentally, are 90.34% and 91.44% for handwritten Bengali and English document images, respectively. The work may contribute significantly to the development of applications related to optical character recognition of Bengali/English text.

9.

Research on video-based intelligent transportation

Li Hongmin (李洪敏) 《计算机光盘软件与应用》 (Computer CD Software and Applications) 2012,(2):154-155

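Society is developing rapidly; as urban traffic volume and the number of vehicles keep growing, video-based traffic monitoring systems have developed quickly. Such systems combine video image processing with pattern recognition, which is relatively difficult to achieve with traditional video. Intelligent transportation automatically analyzes the visual content contained in the video data and extracts features from it, so that people can use computer video technology directly to search for the relevant information. Image processing, computer vision, pattern recognition, and related techniques are applied to process the video images. The main significance of video-based traffic research is to keep timely and accurate track of the monitored intersections and of traffic and public-order conditions, giving command staff fast, intuitive guidance so that they can make accurate judgments and respond promptly.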

10.

Research on MODI-based document image processing

Gu Lijing, Zhao Ji (顾李晶, 赵霁) 《自动化技术与应用》 (Techniques of Automation and Applications) 2013,(11):45-47,66

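Recognizing the text in document images helps people manage and use information. MODI, a free text recognition component built into Microsoft Office, lets developers process document images conveniently and at low cost. By studying the characteristics of the OCR module of the MODI component and its secondary development, and by comparing it with other commercial OCR software, this paper verifies that MODI offers high reliability and practical value for document image processing.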

11.

A new algorithm for extracting text from color images (cited 3 times in total; self-citations: 0; citations by others: 3)

Liu Wenping, Fu Xiaoling, Zhao Huiqun, Li Xiaoli (刘文萍, 付晓玲, 赵会群, 李晓丽) 《计算机工程与应用》 (Computer Engineering and Applications) 2005,41(21):79-82

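Text plays an important role in describing image content, so text extraction and recognition are key techniques for content-based video retrieval. The paper proposes a fast and effective algorithm for extracting text from color image backgrounds. Because text strings have relatively high contrast, the color image is first converted into a binary edge image with an improved Sobel operator; the edge image is then smeared; text is then extracted from color images of varying complexity on the basis of features of the candidate text regions; finally, the extracted text is fed to an optical character recognition (OCR) engine, and the recognition results demonstrate the effectiveness of the method.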
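
The edge-then-smear stage described in this abstract can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' algorithm: it uses the plain 3x3 Sobel operator on a grayscale image (the paper uses an improved Sobel operator on color images, whose details the abstract does not give), it interprets "smearing" as horizontal run-length smoothing, and the function names `sobel_edges`, `smear_rows`, and `candidate_runs` as well as all thresholds are hypothetical.

```python
def sobel_edges(gray, threshold):
    """Binary edge map (1 = edge) from a grayscale image given as a list of rows.

    Assumption: plain 3x3 Sobel with |gx| + |gy| as the gradient magnitude.
    Border pixels are left as 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def smear_rows(binary, max_gap):
    """Horizontal smearing: fill background gaps of at most max_gap pixels
    that lie between two foreground pixels on the same row."""
    out = []
    for row in binary:
        new_row = row[:]
        last = None  # column of the most recent foreground pixel
        for x, value in enumerate(row):
            if value:
                if last is not None and 0 < x - last - 1 <= max_gap:
                    for k in range(last + 1, x):
                        new_row[k] = 1
                last = x
        out.append(new_row)
    return out


def candidate_runs(binary, min_width):
    """Return (row, x_start, x_end) for every horizontal foreground run that is
    at least min_width pixels wide; a crude stand-in for candidate text regions."""
    runs = []
    for y, row in enumerate(binary):
        start = None
        for x, value in enumerate(row + [0]):  # sentinel closes a trailing run
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_width:
                    runs.append((y, start, x - 1))
                start = None
    return runs


if __name__ == "__main__":
    # Synthetic image: a bright vertical stripe on a dark background.
    image = [[0, 0, 255, 255, 0, 0] for _ in range(5)]
    smeared = smear_rows(sobel_edges(image, threshold=500), max_gap=2)
    print(candidate_runs(smeared, min_width=3))
```

A real system would still need the region-feature filtering and the OCR step mentioned in the abstract; the sketch stops at wide horizontal runs of smeared edge pixels.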

12.

Extraction of Binary Character/Graphics Images from Grayscale Document Images

《CVGIP: Graphical Models and Image Processing》1993,55(3):203-217

The extraction of binary character/graphics images from gray-scale document images with background pictures, shadows, highlight, smear, and smudge is a common critical image processing operation, particularly for document image analysis, optical character recognition, check image processing, image transmission, and videoconferencing. After a brief review of previous work with emphasis on five published extraction techniques, viz., a global thresholding technique, YDH technique, a nonlinear adaptive technique, an integrated function technique, and a local contrast technique, this paper presents two new extraction techniques: a logical level technique and a mask-based subtraction technique. With experiments on images of a typical check and a poor-quality text document, this paper systematically evaluates and analyses both new and published techniques with respect to six aspects, viz., speed, memory requirement, stroke width restriction, parameter number, parameter setting, and human subjective evaluation of result images. Experiments and evaluations have shown that one new technique is superior to the rest, suggesting its suitability for high-speed low-cost applications.

13.

Error tolerant document structure analysis

Bertin Klein, Peter Fankhauser 《International Journal on Digital Libraries》1998,1(4):344-357

Successful applications of digital libraries require structured access to sources of information. This paper presents an approach to extract the logical structure of text documents. The extracted structure is explicated by means of SGML (Standard Generalized Markup Language). Consequently, the extraction is achieved on the basis of grammars that extend SGML with recognition rules. From these grammars parsing automata are generated. These automata are used to partition a flat text document into its elements, to discard formatting information, and to insert SGML markups. Complex document structures and fallback rules needed for error tolerant parsing make such automata highly ambiguous. A novel parsing strategy has been developed that ranks and prunes ambiguous parsing paths.

14.

Document cleanup using page frame detection

Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas M. Breuel 《International Journal on Document Analysis and Recognition》2008,11(2):81-96

When a page of a book is scanned or photocopied, textual noise (extraneous symbols from the neighboring page) and/or non-textual noise (black borders, speckles, ...) appear along the border of the document. Existing document analysis methods can handle non-textual noise reasonably well, whereas textual noise still presents a major issue for document analysis systems. Textual noise may result in undesired text in optical character recognition (OCR) output that needs to be removed afterwards. Existing document cleanup methods try to explicitly detect and remove marginal noise. This paper presents a new perspective for document image cleanup by detecting the page frame of the document. The goal of page frame detection is to find the actual page contents area, ignoring marginal noise along the page border. We use a geometric matching algorithm to find the optimal page frame of structured documents (journal articles, books, magazines) by exploiting their text alignment property. We evaluate the algorithm on the UW-III database. The results show that the error rates are below 4% for each of the performance measures used. Further tests were run on a dataset of magazine pages and on a set of camera captured document images. To demonstrate the benefits of using page frame detection in practical applications, we choose OCR and layout-based document image retrieval as sample applications. Experiments using a commercial OCR system show that by removing characters outside the computed page frame, the OCR error rate is reduced from 4.3 to 1.7% on the UW-III dataset. The use of page frame detection in layout-based document image retrieval application decreases the retrieval error rates by 30%.
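
The post-processing step mentioned in this abstract, removing OCR characters that fall outside the computed page frame, might look roughly like the sketch below. It assumes the page frame has already been found (the geometric matching itself is not shown) and that the OCR engine reports a bounding box per character. The types `Box` and `OcrChar`, the function `filter_to_frame`, and the centre-point containment test are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


class OcrChar(NamedTuple):
    """One recognized character with its bounding box (hypothetical type)."""
    text: str
    box: Box


def centre_inside(frame: Box, box: Box) -> bool:
    """True if the centre of `box` lies within `frame` (an assumed criterion)."""
    cx = (box.left + box.right) / 2
    cy = (box.top + box.bottom) / 2
    return frame.left <= cx <= frame.right and frame.top <= cy <= frame.bottom


def filter_to_frame(chars: List[OcrChar], frame: Box) -> List[OcrChar]:
    """Keep only the characters whose centre falls inside the page frame."""
    return [c for c in chars if centre_inside(frame, c.box)]


if __name__ == "__main__":
    frame = Box(100, 100, 500, 700)
    chars = [
        OcrChar("a", Box(120, 120, 130, 140)),  # inside the frame
        OcrChar("x", Box(10, 300, 20, 320)),    # marginal noise from a neighboring page
    ]
    print("".join(c.text for c in filter_to_frame(chars, frame)))
```

Testing the centre point rather than requiring full containment keeps characters that slightly overlap the frame border; a stricter or looser rule may suit other data better.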

15.

Design and implementation of augmented-reality-based mobile teaching software for English viewing, listening, and speaking

Huang Min, Lan Hong (黄敏, 兰红) 《计算机与现代化》 (Computer and Modernization) 2018,(3):122

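Most laptops today no longer come with an optical drive, yet English textbooks still supply their video material on discs, which impairs learning. The authors therefore use Unity3D with the Vuforia SDK to design and implement VBook, an augmented-reality mobile teaching application for English viewing, listening, and speaking. The system first builds a database of target images stored in the cloud and names each video file after its target image; it then uses Unity3D to design and render the scene, designs a virtual video playback button for the ImageTarget object, and uses scripts to access the target image database and the corresponding videos; finally, it generates a mobile application that is convenient for users. Users only need to point the camera at an illustration in the book to see virtual content overlaid on the real scene and to play the English teaching video on the mobile device. Applying augmented reality to English video teaching gives users a novel way to learn and an interactive experience that blends the virtual and the real.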

16.

Automatic text location in images and video frames

Anil K. Jain, Bin Yu 《Pattern recognition》1998,31(12):2055-2076

Textual data is very important in a number of applications such as image database indexing and document understanding. The goal of automatic text location without character recognition capabilities is to extract image regions that contain only text. These regions can then be either fed to an optical character recognition module or highlighted for a user. Text location is a very difficult problem because the characters in text can vary in font, size, spacing, alignment, orientation, color and texture. Further, characters are often embedded in a complex background in the image. We propose a new text location algorithm that is suitable in a number of applications, including conversion of newspaper advertisements from paper documents to their electronic versions, World Wide Web search, color image indexing and video indexing. In many of these applications, it is not necessary to extract all the text, so we emphasize extracting important text with large size and high contrast. Our algorithm is very fast and has been shown to be successful in extracting important text in a large number of test images.

17.

Video OCR: indexing digital news libraries by recognition of superimposed captions (cited 4 times in total; self-citations: 0; citations by others: 4)

Toshio Sato, Takeo Kanade, Ellen K. Hughes, Michael A. Smith, Shin'ichi Satoh 《Multimedia Systems》1999,7(5):385-395

The automatic extraction and recognition of news captions and annotations can be of great help in locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address problems, describe the method by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multi-frame integration and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.

18.

A scalable algorithm for extraction and clustering of event-related pictures

Massimiliano Ruocco, Heri Ramampiaro 《Multimedia Tools and Applications》2014,70(1):55-88

The event detection problem, which is closely related to clustering, has gained a lot of attention within event detection for textual documents. However, although image clustering is a problem that has been treated extensively in both Content-Based Image Retrieval (CBIR) and Text-Based Image Retrieval (TBIR) systems, event detection within image management is a relatively new area. Having this in mind, we propose a novel approach for event extraction and clustering of images, taking into account textual annotations, time and geographical positions. Our goal is to develop a clustering method based on the fact that an image may belong to an event cluster. Here, we stress the necessity of having an event clustering and cluster extraction algorithm that is both scalable and suitable for online applications. To achieve this, we extend a well-known clustering algorithm called Suffix Tree Clustering (STC), originally developed to cluster text documents using document snippets. The idea is that we consider an image along with its annotation as a document. Further, we extend it to also include time and geographical position so that we can capture the contextual information from each image during the clustering process. This has appeared to be particularly useful on images gathered from online photo-sharing applications such as Flickr. Hence, our STC-based approach is aimed at dealing with the challenges induced by capturing contextual information from Flickr images and extracting related events. We evaluate our algorithm using different annotated datasets mainly gathered from Flickr. As part of this evaluation we investigate the effects of using different parameters, such as time and space granularities, and compare these effects. In addition, we evaluate the performance of our algorithm with respect to mining events from image collections. Our experimental results clearly demonstrate the effectiveness of our STC-based algorithm in extracting and clustering events.

19.

Logo and seal based administrative document image retrieval: A survey

《Computer Science Review》2016

With the advance of technology, business offices and organizations together with their clients create a massive amount of administrative documents every day. Administrative documents commonly contain some salient entities such as logos, stamps or seals as the means of their authentication and proprietorship. These salient entities provide quite discriminative information, which can effectively be used for different tasks of document image retrieval, classification and recognition in document-based applications. Thus, proper detection/recognition of these entities in document images increases the performance of such applications in terms of document retrieval, classification, and recognition. To present the state-of-the-art research on the retrieval of administrative document images, this paper surveys administrative document image retrieval in relation to seals and logos. All the available datasets, feature extraction and classification techniques for logo and seal detection/recognition are discussed systematically. The shortcomings of the present technologies on logo and seal based document processing are also highlighted. Avenues for future work are further given for the benefit of readers. To the best of the authors' knowledge, there is no survey on administrative document image retrieval and hence the authors hope that this work will be helpful to the researchers of the document analysis community.

20.

Integrating computer vision with web‐based knowledge for medical diagnostic assistance

Aviv Segev 《Expert Systems》2010,27(4):247-258

Abstract: The analysis of medical documents necessitates context recognition for diverse purposes such as classification, performance analysis and decision making. Traditional methods of context recognition have focused on the textual part of documents. Images, however, provide a rich source of information that can support the context recognition process. A method is proposed for integrating computer vision in context recognition using the web as a knowledge base. The method is implemented on medical case studies to determine the main symptoms or achieve possible diagnoses. In experiments, the method for integrating computer vision in context recognition achieves better results than term frequency and inverse document frequency, and better results than context recognition alone. The proposed method can serve as a basis for an image and text based decision support system to assist the physician in reviewing medical records.