Similar literature
1.
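To enable keyword-based retrieval of Uyghur document images, a coarse-to-fine hierarchical matching method is proposed. An improved projection segmentation method cuts the preprocessed document images into a library of word images, and template matching provides a coarse match for the keyword. On top of the coarse match, histogram of oriented gradients (HOG) feature vectors are extracted from the word images, and a support vector machine (SVM) classifier is trained on these vectors to perform keyword image retrieval. In experiments on a database of 108 document images, the average retrieval precision was 91.14% and the average recall was 79.31%, showing that the method can effectively retrieve Uyghur document images by keyword.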
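The segmentation step can be illustrated with a plain vertical projection profile. This is only a sketch under stated assumptions: the abstract does not describe its "improved" projection method, so the function below is the textbook variant, and the name `split_by_projection`, the `min_gap` parameter, and the list-of-rows image format are all hypothetical.

```python
def split_by_projection(image, min_gap=2):
    """Split a binarized text line into word spans.

    image: list of rows, each a list of 0/1 pixels (1 = ink). Hypothetical format.
    min_gap: number of consecutive empty columns that separates two words.
    Returns a list of (start, end) column spans, with end exclusive.
    """
    width = len(image[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[c] for row in image) for c in range(width)]

    spans = []
    start = None  # first column of the word currently being scanned
    gap = 0       # length of the current run of empty columns
    for c, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended where the run of empty columns began.
                spans.append((start, c - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a word that reaches the right edge (minus trailing blanks).
        spans.append((start, width - gap))
    return spans
```

Gaps shorter than `min_gap` are treated as spacing inside a word, which matters for scripts such as Uyghur where a word may contain several disconnected components.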

2.
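A document image retrieval algorithm based on layout structure distance is presented, which uses layout features as the features of a document image for retrieval. The gradient and maximum gradient difference (MGD) of the document image are computed first; the MGD values are then used as a window to merge text regions, so that the document image is represented as a set of text-line segments. A matching method for retrieval is also given and evaluated experimentally. The results show that the retrieval method achieves high precision and is robust to skew and scaling.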

3.
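To address the uncertainty of cluster centers and the computational complexity in bag-of-words (Bag-of-Word, BOW) retrieval based on local image features, two retrieval methods are proposed: one measures similarity with different kinds of distance, and the other ranks by the number of matched points. First, the SURF features of document images are improved, which effectively reduces the cost of feature extraction. Second, FAST+SURF features are matched by FLANN bidirectional matching and by KD-Tree+BBF matching, and the robustness of the features is verified under different transformations. Finally, both retrieval methods are applied to a collected database of various Uyghur document images. The experimental results show that the distance-based similarity measure has lower complexity than retrieval based on the number of matches, and that both strategies meet the need for fast and accurate lookup.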

4.
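A texture image retrieval method is proposed that fuses fuzzy semantic concepts with fine-grained visual features. First, a fast and effective coarse search over the whole image library is performed using linguistic expressions and fuzzy semantic concepts, yielding semantic retrieval results with "soft" boundaries. A fine search based on visual features is then run within those semantic results rather than over the whole library. The method combines the strengths of content-based and semantics-based image retrieval: users can quickly browse and search the library by semantic concept, and can also match precisely against the visual features of a query example image. The coarse-to-fine two-stage strategy also clearly improves retrieval performance. Experiments on the Brodatz texture library show that, with a suitably adjusted semantic retrieval boundary, the method clearly outperforms image retrieval based on visual features alone.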

5.
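Research on graphical annotation and retrieval of image semantics   Cited by: 1 (self-citations: 0, by others: 1)
Semantic image retrieval aims to serve the user's point of view better by finding images consistent with the user's understanding. To address problems in current semantic image retrieval, an object-based model for annotating image semantic content and a retrieval framework are proposed. A segmentation algorithm first obtains the semantic object regions in an image; then, building on the semantic description schemes of the MPEG-7 standard, a graph structure is used to annotate the image's semantic content. During retrieval, the user expresses the query as a graph-structured description; path information of different lengths is extracted from this description graph to form a query document, which is matched against the semantic annotation documents of the images in the library. Experiments show that the proposed method effectively supports semantics-based image annotation and retrieval and achieves higher recall and precision than full-text retrieval.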

6.
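Trademark image retrieval based on shape and spatial structure   Cited by: 6 (self-citations: 0, by others: 6)
This paper proposes a multi-level trademark image retrieval algorithm based on the shape and spatial structure of unit sub-images. Trademark images are first coarsely retrieved by the similarity of unit sub-image features; the spatial structure of the resulting images is then analyzed by positional string matching. Experimental results show that the proposed method is effective.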

7.
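Image retrieval combining manifold ranking and region matching   Cited by: 1 (self-citations: 0, by others: 1)
An image retrieval method based on manifold ranking (Manifold Ranking) of the data and matching of segmented regions is presented. Building on Manifold Ranking, a Region Matching Graph (RMG) method is proposed: region matching weights between images are computed and used for a second round of similarity matching, which improves matching accuracy. Retrieval simulations on the Corel image database show that the method effectively improves retrieval accuracy.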

8.
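A keyword-based retrieval method for Chinese document images   Cited by: 1 (self-citations: 0, by others: 1)
This paper proposes a keyword-based retrieval method for Chinese document images that searches for keywords directly from the image features of Chinese characters, without OCR (Optical Character Recognition). The document image is first segmented into individual character images; stroke features are then extracted from each character image, and similarity between feature sets is measured with the WMHD (Weighted Modified Hausdorff Distance). The method is unaffected by font size and has some tolerance to font differences; experiments show good retrieval performance.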
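As a rough illustration of the distance used in this entry, the sketch below computes the unweighted Modified Hausdorff Distance between two point sets. The abstract does not say how the weights in WMHD are chosen, so they are omitted here; the function names and the representation of stroke features as lists of (x, y) points are assumptions made for the example.

```python
import math


def directed_mhd(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Unweighted Modified Hausdorff Distance between two non-empty point sets.

    a, b: lists of (x, y) feature points, a hypothetical stand-in for the
    stroke features mentioned in the abstract.
    """
    return max(directed_mhd(a, b), directed_mhd(b, a))


# Identical shapes are at distance zero; a single displaced point gives
# its Euclidean offset.
same = modified_hausdorff([(0, 0), (1, 0)], [(0, 0), (1, 0)])
shifted = modified_hausdorff([(0, 0)], [(3, 4)])
```

Because each directed term averages over nearest-neighbour distances instead of taking the worst case, a few stray pixels shift the score only slightly, which is the usual reason for preferring this variant over the classical Hausdorff distance.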

9.
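Projection-based skew correction for document images   Cited by: 5 (self-citations: 0, by others: 5)
For the skew correction of document images, a new projection-based method for detecting the skew angle is proposed. An efficient pixel traversal algorithm first projects the document image at different angles; the projection data are then accumulated, and the skew angle is determined by comparing the accumulated sums across angles. Because only a very small part of the document image needs to be projected, the computation is greatly reduced. Building on this property, a "coarse"-to-"fine" projection strategy is proposed that substantially increases detection speed while preserving detection accuracy. Experimental results show that the method is very effective and achieves high detection accuracy.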
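The coarse-to-fine search over angles might look like the sketch below. It is not the paper's algorithm: the abstract does not give its accumulation criterion or its pixel traversal scheme, so this example assumes a common stand-in (the sum of squared projection-bin counts, which peaks when text lines collapse into few bins) and a hypothetical input format of (x, y) ink coordinates.

```python
import math
from collections import Counter


def projection_score(points, angle_deg):
    """Sharpness of the projection profile at one candidate angle.

    Each ink point is projected onto the axis perpendicular to the candidate
    text direction. Assumed criterion: sum of squared bin counts.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    bins = Counter(round(y * cos_a - x * sin_a) for x, y in points)
    return sum(c * c for c in bins.values())


def estimate_skew(points, limit=10.0, coarse=1.0, fine=0.1):
    """Two-pass search: a coarse sweep, then a fine sweep around its winner."""
    def sweep(center, half_width, step):
        n = int(round(half_width / step))
        angles = [center + i * step for i in range(-n, n + 1)]
        return max(angles, key=lambda ang: projection_score(points, ang))

    rough = sweep(0.0, limit, coarse)   # e.g. -10..10 degrees in 1-degree steps
    return sweep(rough, coarse, fine)   # +/- 1 degree in 0.1-degree steps


# Synthetic page: five text lines, 20 pixels apart, skewed by 3 degrees.
tan3 = math.tan(math.radians(3.0))
page = [(x, 20 * k + x * tan3) for k in range(5) for x in range(200)]
estimate = estimate_skew(page)
```

With these defaults the two passes evaluate 21 + 21 angles, whereas a single sweep at 0.1-degree resolution over the same range would evaluate 201; that saving is the point of the coarse-to-fine strategy.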

10.
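Shen Xuedong (沈学东). 《计算机应用与软件》 (Computer Applications and Software), 2007, 24(11): 156-158, 221
In content-based image retrieval, color is widely used as an important kind of visual information. A histogram summarizes the statistics of each component of an image's color space and is simple, clear, and robust to interference. A qualitative histogram matching algorithm based on scale-space theory is proposed. The algorithm extracts qualitative features of the histogram in scale space and then computes a histogram matching measure from a specified matching formula. Histograms are matched level by level from coarse to fine, gradually narrowing down to the best-matching image; in practice the method gave satisfactory results.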

11.
Imaged document text retrieval without OCR   Cited by: 6 (self-citations: 0, by others: 6)
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the vertical traverse density (VTD) and horizontal traverse density (HTD), are extracted. An n-gram-based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese as well as images from the UW1 (University of Washington 1) database confirms the validity of the proposed method.
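The vector construction and dot-product comparison can be sketched as follows. The sketch assumes that each character object has already been reduced to a discrete code derived from its VTD and HTD features (that quantization step is not described in the abstract), and it normalizes vectors to unit length, which is also an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(codes, n=3):
    """Unit-length n-gram count vector for a sequence of character codes.

    codes: sequence of hashable symbols, one per segmented character object
    (assumed to come from quantized VTD/HTD features).
    """
    grams = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    norm = math.sqrt(sum(v * v for v in grams.values()))
    return {g: v / norm for g, v in grams.items()} if norm else {}


def similarity(u, v):
    """Dot product of two sparse document vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


doc_a = ngram_vector("abcabc")
doc_b = ngram_vector("xyzxyz")
self_score = similarity(doc_a, doc_a)
cross_score = similarity(doc_a, doc_b)
```

Since the vectors are normalized, the score is the cosine of the angle between them: close to 1 for documents with the same n-gram distribution and 0 for documents that share no n-gram.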

12.
Word searching in documents with non-structural layout, such as graphical documents, is a difficult task because text words appear in arbitrary orientations alongside graphical symbols. This paper presents an efficient indexing and retrieval approach for word searching in such documents. The proposed indexing scheme stores spatial information about the text characters of a document in a character spatial feature table (CSFT). The spatial feature of a text component is derived from information about its neighboring components. Multi-scaled and multi-oriented components are labeled as characters using support vector machines. For searching, the query string is split into the possible combinations of character pairs, and each character pair is used to look up the positions of the corresponding text in the document with the help of the CSFT. The retrieved text components are then joined into sequences by matching their spatial information, and a string matching algorithm compares the query word with the character pair sequences in the documents. Experimental results are presented on two datasets of graphical documents: a maps dataset and a seal/logo image dataset. The results show that the method searches efficiently for query words in unconstrained document layouts of arbitrary orientation.

13.
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume applications. Integration into production lines is under way. Received March 30, 2000 / Revised June 26, 2001

14.
15.
This paper considers the use of text signatures, fixed-length bit string representations of document content, in an experimental information retrieval system: such signatures may be generated from the list of keywords characterising a document or a query. A file of documents may be searched in a bit-serial parallel computer, such as the ICL Distributed Array Processor, using a two-level retrieval strategy in which a comparison of a query signature with the file of document signatures provides a simple and efficient means of identifying those few documents that need to undergo a computationally demanding, character matching search. Text retrieval experiments using three large collections of documents and queries demonstrate the efficiency of the suggested approach.
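The two-level strategy can be sketched on an ordinary sequential machine as below. The signature length, the number of bits set per keyword, and the use of SHA-256 to pick bit positions are all assumptions made for the example; the abstract does not specify how its signatures are generated, and nothing here models the bit-serial parallel hardware.

```python
import hashlib

SIG_BITS = 64       # assumed signature length
BITS_PER_WORD = 3   # assumed number of bits each keyword sets


def word_bits(word):
    """Bit mask for one keyword, derived from a stable hash."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    mask = 0
    for i in range(BITS_PER_WORD):
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        mask |= 1 << pos
    return mask


def signature(words):
    """Superimpose the keyword masks into one fixed-length signature."""
    sig = 0
    for w in words:
        sig |= word_bits(w)
    return sig


def search(docs, query_words):
    """Level 1: cheap signature screen. Level 2: exact keyword check."""
    q = signature(query_words)
    wanted = {w.lower() for w in query_words}
    # In a real system the document signatures would be precomputed and stored.
    candidates = [d for d in docs if (signature(d.split()) & q) == q]
    return [d for d in candidates if wanted <= {w.lower() for w in d.split()}]


docs = ["parallel text retrieval", "image segmentation methods"]
hits = search(docs, ["text", "retrieval"])
```

The screen can let through false positives, because different keywords may set the same bits, but it never rejects a true match; that is why the expensive exact comparison only has to run on the few surviving candidates.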

16.
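The application of LeNet-5 to recognizing handwritten date characters in scanned documents is studied. Scanning introduces various kinds of noise, in particular illumination and color interference, so applying LeNet-5 directly does not give good results. The characters to be recognized are first located and cropped from the full document, and the cropped character images are preprocessed by denoising, grayscale conversion, and binarization. The character images are then segmented into individual characters, which are recognized by combining the LeNet-5 network with a model matching method. Recognition performance under different parameter combinations is analyzed; tuning the model parameters effectively improves performance on real data, yielding an algorithm that recognizes the handwritten date character set well. Experimental results demonstrate the effectiveness of the algorithm, which has been applied in an engineering project.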

17.
Information retrieval in document image databases   Cited by: 2 (self-citations: 0, by others: 2)
With the rising popularity and importance of document images as an information source, information retrieval in document image databases has become a growing and challenging problem. In this paper, we propose an approach with the capability of matching partial word images to address two issues in document image retrieval: word spotting and similarity measurement between documents. First, each word image is represented by a primitive string. Then, an inexact string matching technique is utilized to measure the similarity between the two primitive strings generated from two word images. Based on the similarity, we can estimate how a word image is relevant to the other and, thereby, decide whether one is a portion of the other. To deal with various character fonts, we use a primitive string which is tolerant to serif and font differences to represent a word image. Using this technique of inexact string matching, our method is able to successfully handle the problem of heavily touching characters. Experimental results on a variety of document image databases confirm the feasibility, validity, and efficiency of our proposed approach in document image retrieval.
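One way to decide whether one primitive string is approximately a portion of another is approximate substring matching. The sketch below uses the standard dynamic program for that problem (edit distance with a free starting position) as a stand-in; the abstract does not name its inexact matching technique, and the function name and the use of plain characters as primitives are assumptions.

```python
def best_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    pattern, text: primitive strings (here ordinary Python strings, an
    assumed stand-in for the paper's primitive codes).
    """
    # Row 0 is all zeros, so a match may begin at any position in text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, 1):
            cur[j] = min(
                prev[j] + 1,             # skip a pattern primitive
                cur[j - 1] + 1,          # skip a text primitive
                prev[j - 1] + (p != t),  # match or substitute
            )
        prev = cur
    # Taking the minimum over the last row lets the match end anywhere.
    return min(prev)


exact = best_substring_distance("abc", "xxabcxx")
one_off = best_substring_distance("abd", "xxabcxx")
```

A small distance relative to the pattern length suggests that the shorter word image is contained in the longer one, which is the kind of partial match the abstract relies on for touching characters.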

18.
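Intelligent retrieval of e-government documents based on the Semantic Web   Cited by: 7 (self-citations: 0, by others: 7)
Yang Fang (杨芳), Yang Zhenshan (杨振山). 《计算机应用》 (Journal of Computer Applications), 2005, 25(10): 2434-2435
Based on the characteristics of e-government documents, feature values of the document collection and of the retrieval request are computed with an e-government thesaurus. Similarity computation between the document collection and the retrieval request is discussed as the way to find documents that match the request. The retrieval of e-government document metadata is studied in terms of the semantic organization of that metadata. The metadata of the retrieved documents are organized semantically, so that intelligent retrieval is achieved on the basis of semantic inference.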

19.
Document Similarity Using a Phrase Indexing Graph Model   Cited by: 3 (self-citations: 1, by others: 2)
Document clustering techniques mostly rely on single term analysis of text, such as the vector space model. To better capture the structure of documents, the underlying data model should be able to represent the phrases in the document as well as single terms. We present a novel data model, the Document Index Graph, which indexes Web documents based on phrases rather than on single terms only. The semistructured Web documents help in identifying potential phrases that when matched with other documents indicate strong similarity between the documents. The Document Index Graph captures this information, and finding significant matching phrases between documents becomes easy and efficient with such a model. The model is flexible in that it could revert to a compact representation of the vector space model if we choose not to index phrases. However, using phrase indexing yields more accurate document similarity calculations. The similarity between documents is based on both single term weights and matching phrase weights. The combined similarities are used with standard document clustering techniques to test their effect on the clustering quality. Experimental results show that our phrase-based similarity, combined with single-term similarity measures, gives a more accurate measure of document similarity and thus significantly enhances Web document clustering quality.

20.
This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a word shape coding scheme is proposed, which converts each word image into a word shape code by using a few shape features. The text contents of imaged documents are thus captured by a document vector constructed with the converted word shape code and word frequency information. Similarities between different document images are then gauged based on the constructed document vectors. We divide the retrieval process into two stages. Based on the observation that documents of the same language share a large number of high-frequency language-specific stop words, the first stage retrieves documents with the same underlying language as that of the query document. The second stage then re-ranks the documents retrieved in the first stage based on the topic similarity. Experiments show that document images of different languages and topics can be retrieved properly by using the proposed word shape coding scheme.
