20 similar documents found.
1.
Alajlan N, Kamel MS, Freeman GH 《IEEE Transactions on Pattern Analysis and Machine Intelligence》2008,30(6):1003-1013
In this paper, a geometry-based image retrieval system is developed for multi-object images. We model both shape and topology of image objects using a structured representation called curvature tree (CT). The hierarchy of the CT reflects the inclusion relationships between the image objects. To facilitate shape-based matching, triangle-area representation (TAR) of each object is stored at the corresponding node in the CT. The similarity between two multi-object images is measured based on the maximum similarity subtree isomorphism (MSSI) between their CTs. For this purpose, we adopt a recursive algorithm to solve the MSSI problem and a very effective dynamic programming algorithm to measure the similarity between the attributed nodes. Our matching scheme agrees with many recent findings in psychology about the human perception of multi-object images. Experiments on a database of 13500 real and synthesized medical images and the MPEG-7 CE-1 database of 1400 shape images have shown the effectiveness of the proposed method.
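For readers curious how a triangle-area representation might be computed, a minimal Python sketch follows; the equal-arc-length contour sampling, the single triangle-side parameter ts, and the normalization are assumptions of this illustration rather than the authors' exact formulation.

```python
import numpy as np

def triangle_area_representation(contour, ts):
    """Signed triangle areas at each contour point for triangle side `ts`.

    contour: (N, 2) array of boundary points sampled at equal arc length
    (an assumption of this sketch). The sign encodes convexity/concavity.
    """
    n = len(contour)
    areas = np.empty(n)
    for i in range(n):
        p1 = contour[(i - ts) % n]   # point `ts` steps behind (closed contour)
        p2 = contour[i]              # current point
        p3 = contour[(i + ts) % n]   # point `ts` steps ahead
        # Signed area of the triangle (p1, p2, p3).
        areas[i] = 0.5 * (p1[0] * (p2[1] - p3[1])
                          + p2[0] * (p3[1] - p1[1])
                          + p3[0] * (p1[1] - p2[1]))
    return areas / (np.abs(areas).max() + 1e-12)  # simple scale normalization
```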
2.
Michael Edberg Hansen, Jens Michael Carstensen 《Pattern Recognition》2004,37(11):2155-2164
Many image classification problems can fruitfully be thought of as image retrieval in a “high similarity image database” (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce a method for HSID retrieval using a similarity measure based on a linear combination of Jeffreys-Matusita distances between distributions of local (pixelwise) features estimated from a set of automatically and consistently defined image regions. The weight coefficients are estimated based on optimal retrieval performance. Experimental results on the difficult task of visually identifying clones of fungal colonies grown in a petri dish and categorization of pelts show a high retrieval accuracy of the method when combined with standardized sample preparation and image acquisition.
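The distance underlying this similarity measure can be sketched as follows, here under the simplifying assumption of univariate Gaussian feature distributions; the paper's estimation of the distributions and of the weight coefficients is not reproduced.

```python
import numpy as np

def jeffreys_matusita(mu1, var1, mu2, var2):
    """JM distance between two univariate Gaussians N(mu1, var1) and N(mu2, var2)."""
    # Bhattacharyya distance for Gaussians.
    b = (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
         + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2))))
    # The JM distance is bounded in [0, sqrt(2)].
    return np.sqrt(2.0 * (1.0 - np.exp(-b)))

def combined_distance(jm_distances, weights):
    """Linear combination of per-region JM distances with learned weights."""
    return float(np.dot(weights, jm_distances))
```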
3.
Recent technological advances have made it possible to process and store large amounts of image data. Perhaps the most impressive example is the accumulation of image data in scientific applications such as medical or satellite imagery. However, in order to realize their full potential, tools for efficient extraction of information and for intelligent searches in image databases need to be developed. This paper describes a new approach to image data retrieval which allows queries to be composed of local intensity patterns. The intensity pattern is converted into a feature representation of reduced dimensionality which can be used for searching similar-looking patterns in the database. This representation is obtained by filtering the pattern with a bank of scale and orientation selective filters modeled using Gabor functions. Experimental results are presented which illustrate that the proposed representation preserves the perceptual similarities, and provides a powerful tool for content-based image retrieval.
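A rough sketch of this kind of Gabor feature vector, using scikit-image; the filter frequencies, orientation count, and mean/standard-deviation statistics are common choices assumed here, not necessarily those of the paper.

```python
import numpy as np
from skimage.filters import gabor

def gabor_features(image, frequencies=(0.1, 0.2, 0.3, 0.4), n_orientations=6):
    """Mean/std of Gabor filter magnitudes over a bank of scales and orientations."""
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real, imag = gabor(image, frequency=f, theta=theta)
            magnitude = np.hypot(real, imag)
            feats.extend([magnitude.mean(), magnitude.std()])
    return np.asarray(feats)  # reduced-dimensionality texture signature
```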
4.
5.
Virtual images for similarity retrieval in image databases
Petraglia G, Sebillo M, Tucci M, Tortora G 《IEEE Transactions on Knowledge and Data Engineering》2001,13(6):951-967
We introduce the virtual image, an iconic index suited for pictorial information access in a pictorial database, and a similarity retrieval approach based on virtual images to perform content-based retrieval. A virtual image represents the spatial information contained in a real image in explicit form by means of a set of spatial relations. This is useful to efficiently compute the similarity between a query and an image in the database. We also show that virtual images support real-world applications that require translation, reflection, and/or rotation invariance of image representation.
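A minimal sketch of matching explicit spatial relations; the relation vocabulary and the fraction-of-satisfied-relations score below are illustrative assumptions, not the paper's exact virtual-image similarity.

```python
def _centre(box):
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0

def virtual_image(objects):
    """Build a set of explicit spatial-relation triples from labelled boxes.

    objects: dict mapping an object label to its bounding box (x0, y0, x1, y1).
    The two relations used here are illustrative only.
    """
    relations = set()
    labels = sorted(objects)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            ax, ay = _centre(objects[a])
            bx, by = _centre(objects[b])
            relations.add((a, "left_of" if ax < bx else "right_of", b))
            relations.add((a, "above" if ay < by else "below", b))
    return relations

def similarity(query_relations, image_relations):
    """Fraction of the query's relations satisfied by the database image."""
    if not query_relations:
        return 1.0
    return len(query_relations & image_relations) / len(query_relations)
```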
6.
Clustering of related or similar objects has long been regarded as a potentially useful way of helping users navigate an information space such as a document collection. Many clustering algorithms and techniques have been developed and implemented, but as document collections have grown these techniques have not scaled to large collections because of their computational overhead. To address this problem, the proposed system concentrates on an interactive text clustering methodology: probability-based, topic-oriented, and semi-supervised document clustering. Because web pages and other documents now contain both text and large numbers of images, the system also applies content-based image retrieval (CBIR) for image clustering to reinforce the document clustering approach. It proposes two kinds of indexing keys, major colour sets (MCS) and distribution block signatures (DBS), to prune away images irrelevant to a given query image; major colour sets capture colour information while distribution block signatures capture spatial information. After successively applying these filters to a large database, only a small set of high-potential candidates similar to the query image remains. The system then uses a quad modelling method (QM) to set the initial weights of the two-dimensional cells in the query image according to each major colour, and retrieves similar images through a similarity association function based on these weights. The system's efficiency is evaluated by implementing and testing the clustering results with the DBSCAN and K-means clustering algorithms. Experiments show that the proposed document clustering algorithm achieves an average efficiency of 94.4% across various document categories.
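The colour-based pruning step might look roughly like the following; the coarse RGB quantisation and the coverage threshold are assumptions of the sketch, not the paper's exact MCS definition.

```python
import numpy as np

def major_colour_set(image, bins=4, coverage=0.05):
    """Set of quantised colours that each cover at least `coverage` of the pixels.

    image: HxWx3 uint8 array. The palette size and threshold are illustrative.
    """
    quantised = (image // (256 // bins)).reshape(-1, 3)   # coarse RGB palette
    colours, counts = np.unique(quantised, axis=0, return_counts=True)
    keep = counts / counts.sum() >= coverage
    return {tuple(c) for c in colours[keep]}

def passes_colour_filter(query_mcs, candidate_mcs):
    """Prune a candidate unless it contains every major colour of the query."""
    return query_mcs <= candidate_mcs
```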
7.
To enable keyword-based retrieval of Uyghur document images, a keyword retrieval method based on coarse-to-fine hierarchical matching is proposed. An improved projection-based segmentation method splits the preprocessed document images into a word-image library, and template matching performs coarse matching of the keywords. Building on the coarse matches, histogram of oriented gradients (HOG) feature vectors are extracted from the word images, and a support vector machine (SVM) classifier is trained on these features to realize keyword image retrieval. Experiments on a database of 108 document images show an average retrieval precision of 91.14% and an average recall of 79.31%, demonstrating that the method can effectively perform keyword-based retrieval of Uyghur document images.
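A hedged sketch of the fine-matching stage, using scikit-image HOG features and a scikit-learn SVM; the image size, HOG parameters, and binary keyword/non-keyword framing are assumptions of this illustration.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def word_image_to_hog(word_image, shape=(48, 128)):
    """HOG descriptor of a grayscale word image, resized to a fixed shape."""
    return hog(resize(word_image, shape),
               orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_keyword_classifier(word_images, labels):
    """labels: 1 for images of the keyword, 0 for other coarse-match words."""
    X = np.array([word_image_to_hog(img) for img in word_images])
    clf = SVC(kernel="linear", probability=True)
    clf.fit(X, labels)
    return clf

def rank_candidates(clf, candidate_images):
    """Score coarse-match candidates by the SVM's keyword probability."""
    X = np.array([word_image_to_hog(img) for img in candidate_images])
    return clf.predict_proba(X)[:, 1]
```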
8.
9.
WALRUS: a similarity retrieval algorithm for image databases
Natsev A, Rastogi R, Shim K 《IEEE Transactions on Knowledge and Data Engineering》2004,16(3):301-316
Approaches for content-based image querying typically extract a single signature from each image based on color, texture, or shape features. The images returned as the query result are then the ones whose signatures are closest to the signature of the query image. While efficient for simple images, such methods do not work well for complex scenes since they fail to retrieve images that match the query only partially, that is, only certain regions of the image match. This inefficiency leads to the discarding of images that may be semantically very similar to the query image since they may contain the same objects. The problem becomes even more apparent when we consider scaled or translated versions of the similar objects. We propose WALRUS (wavelet-based retrieval of user-specified scenes), a novel similarity retrieval algorithm that is robust to scaling and translation of objects within an image. WALRUS employs a novel similarity model in which each image is first decomposed into its regions and the similarity measure between a pair of images is then defined to be the fraction of the area of the two images covered by matching regions from the images. In order to extract regions for an image, WALRUS considers sliding windows of varying sizes and then clusters them based on the proximity of their signatures. An efficient dynamic programming algorithm is used to compute wavelet-based signatures for the sliding windows. Experimental results on real-life data sets corroborate the effectiveness of WALRUS's similarity model.
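The region-based similarity model can be sketched as follows, with region extraction abstracted away; the greedy pairing and the signature-distance threshold are assumptions of this illustration, not WALRUS's actual matching algorithm.

```python
import numpy as np

def region_similarity(regions_a, regions_b, threshold=0.25):
    """Fraction of total area covered by matching regions of two images.

    Each region is a dict: {"area": float, "signature": 1-D wavelet signature}.
    """
    matched_area = 0.0
    unused_b = list(range(len(regions_b)))
    for ra in regions_a:
        # Greedily pair each region of A with the closest unused region of B.
        best_j, best_d = None, threshold
        for j in unused_b:
            d = np.linalg.norm(ra["signature"] - regions_b[j]["signature"])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matched_area += ra["area"] + regions_b[best_j]["area"]
            unused_b.remove(best_j)
    total_area = sum(r["area"] for r in regions_a) + sum(r["area"] for r in regions_b)
    return matched_area / total_area if total_area else 0.0
```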
10.
11.
A similarity measure for silhouettes of 2D objects is presented, and its properties are analyzed with respect to retrieval of similar objects in image databases. To reduce the influence of digitization noise as well as segmentation errors, the shapes are simplified by a new process of digital curve evolution. To compute our similarity measure, we first establish the best possible correspondence of visual parts (without explicitly computing the visual parts). Then the similarity between corresponding parts is computed and summed. Experimental results show that our shape matching procedure gives an intuitive shape correspondence and is stable with respect to noise distortions.
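One step of curve-evolution simplification can be sketched as follows; the relevance measure (turn angle weighted by normalized segment lengths) follows the form commonly associated with this line of work and is shown as an illustration rather than the paper's exact definition.

```python
import numpy as np

def relevance(prev_pt, pt, next_pt, total_length):
    """Contribution of vertex `pt`: turn angle weighted by its two arc lengths."""
    l1 = np.linalg.norm(pt - prev_pt) / total_length
    l2 = np.linalg.norm(next_pt - pt) / total_length
    v1, v2 = pt - prev_pt, next_pt - pt
    cos_angle = np.clip(np.dot(v1, v2)
                        / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12), -1, 1)
    beta = np.arccos(cos_angle)          # turn angle at the vertex
    return beta * l1 * l2 / (l1 + l2 + 1e-12)

def evolve(polygon, n_vertices):
    """Repeatedly delete the least relevant vertex until `n_vertices` remain."""
    pts = [np.asarray(p, dtype=float) for p in polygon]
    while len(pts) > n_vertices:
        total = sum(np.linalg.norm(pts[i] - pts[i - 1]) for i in range(len(pts)))
        scores = [relevance(pts[i - 1], pts[i], pts[(i + 1) % len(pts)], total)
                  for i in range(len(pts))]
        del pts[int(np.argmin(scores))]
    return pts
```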
12.
Justine Lebrun, Philippe-Henri Gosselin, Sylvie Philipp-Foliguet 《Image and Vision Computing》2011,29(11):716-729
In the framework of online object retrieval with learning, we address the problem of graph matching using kernel functions. An image is represented by a graph of regions where the edges represent the spatial relationships. Kernels on graphs are built from kernels on walks in the graph. This paper firstly proposes new kernels on graphs and on walks, which are very efficient for graphs of regions. Secondly, we propose fast solutions for exact or approximate computation of these kernels. Thirdly, we show results for the retrieval of images containing a specific object with the help of very few examples and counter-examples in the framework of an active retrieval scheme.
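A small sketch of the general idea of building a graph kernel from kernels on walks; the fixed walk length and the product-of-node-kernels form are generic choices, not the specific kernels proposed in the paper.

```python
from itertools import product

def walks(adjacency, length):
    """All walks of `length` edges in a graph given as {node: [neighbours]}."""
    paths = [[v] for v in adjacency]
    for _ in range(length):
        paths = [p + [w] for p in paths for w in adjacency[p[-1]]]
    return paths

def walk_kernel(g1, g2, node_kernel, length=2):
    """Sum, over pairs of equal-length walks, of the product of node kernels.

    g1, g2: {"adj": adjacency dict, "labels": node -> region descriptor}.
    node_kernel: kernel comparing two region descriptors (illustrative).
    """
    total = 0.0
    for w1, w2 in product(walks(g1["adj"], length), walks(g2["adj"], length)):
        k = 1.0
        for a, b in zip(w1, w2):
            k *= node_kernel(g1["labels"][a], g2["labels"][b])
        total += k
    return total
```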
13.
With rapidly decreasing storage costs, temporal document databases are now a viable solution in many contexts. However, storing an ever-growing database can still be too costly, and as a consequence it is desirable to be able to physically delete old versions of data. Traditionally, this has been performed by an operation called vacuuming, where the oldest versions are physically deleted or migrated from secondary storage to less costly tertiary storage. In temporal document databases on the other hand, it is often more appropriate to remove intermediate versions instead of removing the oldest versions. We call this operation granularity reduction. In this paper we describe the concept of granularity reduction, and present six strategies for selecting the document versions to eliminate. Three of the strategies have been implemented in the V2 temporal document database system, and in this context we discuss the cost of applying the strategies.
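A minimal sketch of one possible granularity-reduction policy (keeping only every k-th intermediate version within an old time window); this is an illustrative strategy in the spirit of the paper, not necessarily one of its six.

```python
from datetime import datetime, timedelta

def reduce_granularity(versions, older_than_days=365, keep_every=10):
    """Thin out intermediate versions older than a cutoff.

    versions: list of (timestamp, version_id) sorted oldest first.
    The first, the last, and every `keep_every`-th old version are retained;
    the parameters are illustrative, not the paper's.
    """
    cutoff = datetime.now() - timedelta(days=older_than_days)
    kept, dropped, old_index = [], [], 0
    for i, (ts, vid) in enumerate(versions):
        is_endpoint = i == 0 or i == len(versions) - 1
        if ts >= cutoff or is_endpoint or old_index % keep_every == 0:
            kept.append((ts, vid))
        else:
            dropped.append((ts, vid))
        if ts < cutoff:
            old_index += 1
    return kept, dropped
```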
14.
The major emphasis is on analytical techniques for predicting the performance of various collection fusion scenarios. Knowledge of analytical models of information retrieval system performance, both with single processors and with multiple processors, increases our understanding of the parameters (e.g., number of documents, ranking algorithms, stemming algorithms, stop word lists, etc.) affecting system behavior. While there is a growing literature on the implementation of distributed information retrieval systems and digital libraries, little research has focused on analytic models of performance. We analytically describe the performance for single and multiple processors, both when different processors have the same parameter values and when they have different values. The use of different ranking algorithms and parameter values at different sites is examined.
15.
16.
Huaigu Cao, Venu Govindaraju, Anurag Bhardwaj 《International Journal on Document Analysis and Recognition》2011,14(2):145-157
With the ever-increasing growth of the World Wide Web, there is an urgent need for an efficient information retrieval system that can search and retrieve handwritten documents when presented with user queries. However, unconstrained handwriting recognition remains a challenging task with inadequate performance, thus proving to be a major hurdle in providing a robust search experience in handwritten documents. In this paper, we describe our recent research with a focus on information retrieval from noisy text derived from imperfect handwriting recognizers. First, we describe a novel term frequency estimation technique incorporating the word segmentation information inside the retrieval framework to improve the overall system performance. Second, we outline a taxonomy of different techniques used for addressing the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and uses the cleaned text for retrieval. The second method uses the uncorrected or raw OCR'ed text but modifies the standard vector space model for handling noisy text issues. The third method employs robust image features to index the documents instead of using noisy OCR'ed text. We describe these techniques in detail and also discuss their performance measures using standard IR evaluation metrics.
17.
Text compression for dynamic document databases
Moffat A, Zobel J, Sharman N 《IEEE Transactions on Knowledge and Data Engineering》1997,9(2):302-313
For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained on dynamic collections. The authors show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and over 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.
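The word-based semi-static model at issue can be illustrated very roughly with a plain word-level Huffman coder; canonical codes, escape handling for unseen words, and the memory-limited decoding discussed in the paper are all omitted from this sketch.

```python
import heapq
from collections import Counter

def word_huffman_codes(documents):
    """Semi-static model: count word frequencies over the whole collection,
    then assign each word a Huffman code used for both encoding and decoding."""
    freqs = Counter(w for doc in documents for w in doc.split())
    heap = [(f, i, {w: ""}) for i, (w, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)   # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in c1.items()}
        merged.update({w: "1" + code for w, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2] if heap else {}

def encode(document, codes):
    """Concatenate the bit codes of the document's words (as a '0'/'1' string)."""
    return "".join(codes[w] for w in document.split())
```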
18.
Ranking functions are an important component of information retrieval systems. Recently there has been a surge of research in the field of “learning to rank”, which aims at using labeled training data and machine learning algorithms to construct reliable ranking functions. Machine learning methods such as neural networks, support vector machines, and least squares have been successfully applied to ranking problems, and some are already being deployed in commercial search engines. Despite these successes, most algorithms to date construct ranking functions in a supervised learning setting, which assumes that relevance labels are provided by human annotators prior to training the ranking function. Such methods may perform poorly when human relevance judgments are not available for a wide range of queries. In this paper, we examine whether additional unlabeled data, which is easy to obtain, can be used to improve supervised algorithms. In particular, we investigate the transductive setting, where the unlabeled data is equivalent to the test data. We propose a simple yet flexible transductive meta-algorithm: the key idea is to adapt the training procedure to each test list after observing the documents that need to be ranked. We investigate two instantiations of this general framework: the Feature Generation approach is based on discovering more salient features from the unlabeled test data and training a ranker on this test-dependent feature set, while the Importance Weighting approach is based on ideas in the domain adaptation literature and works by re-weighting the training data to match the statistics of each test list. We demonstrate that both approaches improve over supervised algorithms on the TREC and OHSUMED tasks from the LETOR dataset.
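As a rough illustration of the importance-weighting idea, one standard way to obtain such weights is density-ratio estimation via a domain classifier; the paper's own estimator may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(train_features, test_features):
    """Weight training documents to match the feature statistics of a test list.

    A probabilistic classifier is trained to separate training from test
    feature vectors; the density ratio p_test(x)/p_train(x) is then estimated
    from its predicted probabilities (a generic recipe, assumed here).
    """
    X = np.vstack([train_features, test_features])
    y = np.concatenate([np.zeros(len(train_features)), np.ones(len(test_features))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_test = clf.predict_proba(train_features)[:, 1]
    weights = p_test / np.clip(1.0 - p_test, 1e-6, None)
    return weights / weights.mean()   # normalized per-document training weights
```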
19.
This paper presents the main current theoretical issues in Information Retrieval. The principles of conceptual modelling, as they have emerged in the database area, are presented and their application to document modelling in order to enhance document retrieval is discussed. Finally, the main features of the MULTOS project are presented and critically reviewed, confronting them with the requirements which have been identified during the general discussion on document conceptual modelling for information retrieval.
20.
String techniques for detecting duplicates in document databases
Detecting duplicates in document image databases is a problem of growing importance. The task is made difficult by the various degradations suffered by printed documents, and by conflicting notions of what it means to be a "duplicate". To address these issues, this paper introduces a framework for clarifying and formalizing the duplicate detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution adapted from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data derived from real-world noise sources. Also described are several heuristics that have the potential to speed up the computation by several orders of magnitude.
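A minimal sketch of approximate-string-matching-based duplicate scoring over the OCR text of two document images; the normalization and threshold are assumptions, not one of the paper's four models.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_duplicate(ocr_text_a, ocr_text_b, threshold=0.2):
    """Treat two noisy OCR transcripts as duplicates when their normalized
    edit distance falls below `threshold` (an illustrative cutoff)."""
    longest = max(len(ocr_text_a), len(ocr_text_b)) or 1
    return edit_distance(ocr_text_a, ocr_text_b) / longest < threshold
```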