1.

Multiresolution similarity search in image databases

Martin Heczko Alexander Hinneburg Daniel Keim Markus Wawryniuk 《Multimedia Systems》2004,10(1):28-40

Typically, searching image collections is based on features of the images. In most cases, the features are based on the color histograms of the images. Similarity search based on color histograms is very efficient, but the quality of the search results is often rather poor. One reason is that histogram-based systems support only a specific form of global similarity, treating the whole histogram as one vector. But a histogram contains more information than the distribution of colors. This paper makes two contributions: (1) a new generalized similarity search method based on a wavelet transformation of the color histograms and (2) a new effectiveness measure for image similarity search. Our generalized similarity search method has been developed to allow the user to search for images with similarities at arbitrary detail levels of the color histogram. We show that our new approach is more general and more effective than previous approaches while retaining competitive performance.
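As a rough illustration of matching at different detail levels, the sketch below builds the approximation levels of a one-dimensional Haar transform (pairwise averages only) of a color histogram and compares two histograms at a chosen level. It is an assumption-laden sketch, not the authors' method: the function names are hypothetical, the histogram length is assumed to be a power of two, detail coefficients are ignored, and Euclidean distance is used at every level.

```python
import math
from typing import List, Sequence


def haar_levels(hist: Sequence[float]) -> List[List[float]]:
    """Successive Haar approximation levels of a histogram (hypothetical helper).

    levels[0] is the original histogram; each following level holds the
    pairwise averages of the previous one, so levels[-1] has a single value.
    Assumes len(hist) is a power of two.
    """
    n = len(hist)
    if n == 0 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")
    levels = [[float(x) for x in hist]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev), 2)])
    return levels


def distance_at_level(h1: Sequence[float], h2: Sequence[float], level: int) -> float:
    """Euclidean distance between two histograms at one resolution level."""
    a = haar_levels(h1)[level]
    b = haar_levels(h2)[level]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, the histograms [4, 0, 0, 4] and [0, 4, 4, 0] are far apart at level 0 (distance 8.0) but identical at level 1, where both reduce to [2.0, 2.0].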

2.

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space (total citations: 3; self-citations: 3; citations by others: 0)

Ming Zhang Reda Alhajj 《Knowledge and Information Systems》2010,22(1):1-26

Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is a key operation in many applications, such as multimedia databases, image retrieval and object recognition. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the properties of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); both greatly affect search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The analysis and the reported comparative test results demonstrate that NAQ-tree significantly improves search efficiency.
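The reference-point partitioning mentioned above rests on the triangle inequality: for any metric d, |d(q, p) - d(o, p)| <= d(q, o). A minimal single-pivot range filter, assuming Euclidean distance and a hypothetical PivotIndex class (this is not the NAQ-tree, VP-tree, or M-tree), could look like this:

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Sequence[float], Sequence[float]], float]


class PivotIndex:
    """Hypothetical single-pivot index for metric range queries."""

    def __init__(self, data: List[Point], pivot: Point, dist: Metric = math.dist):
        self.data = data
        self.pivot = pivot
        self.dist = dist
        # Distances to the pivot are computed once, at build time.
        self.pivot_dist = [dist(o, pivot) for o in data]

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return (objects within distance r of q, number of exact distances computed)."""
        dq = self.dist(q, self.pivot)
        hits: List[Point] = []
        computed = 0
        for o, dop in zip(self.data, self.pivot_dist):
            if abs(dq - dop) > r:
                continue  # triangle inequality: d(q, o) >= |dq - dop| > r
            computed += 1
            if self.dist(q, o) <= r:
                hits.append(o)
        return hits, computed
```

With pivot (0, 0), a range query at (0.5, 0) with radius 1 over the points (0, 0), (1, 0), (10, 0), (20, 0) returns the first two and computes only two exact distances; the other two are discarded using their stored pivot distances alone.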

3.

A multistep approach for shape similarity search in image databases

Ankerst M. Kriegel H.-P. Seidl T. 《Knowledge and Data Engineering, IEEE Transactions on》1998,10(6):996-1004

Shape similarity searching is a crucial task in image databases, particularly in the presence of errors induced by segmentation or by scanning images. The resulting slight displacements or rotations have not been considered so far in the literature. We present a new similarity model that flexibly addresses this problem. By specifying neighborhood influence weights, the user may adapt the similarity distance functions to his or her own requirements or preferences. Technically, the new similarity model is based on quadratic forms, for which we present a multi-step query processing architecture, particularly for high dimensions as they occur in image databases. Our algorithm to reduce the dimensionality of quadratic form-based similarity queries results in a lower-bounding distance function that is proven to provide optimal filter selectivity. Experiments on our test database of 10,000 images demonstrate the applicability and the performance of our approach, even in dimensions as high as 1,024.
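The quadratic-form distance and multi-step query processing can be sketched as follows. The k-NN loop refines candidates in order of a cheap lower bound and stops once no remaining bound can beat the current k-th exact distance. This is a generic filter-and-refine sketch, not the paper's dimensionality-reduction filter; the function names are hypothetical, and the example lower bound sqrt(lambda_min) * ||q - o|| assumes the smallest eigenvalue lambda_min of A is known, which is valid because (x - y)^T A (x - y) >= lambda_min * ||x - y||^2 for a symmetric positive-definite A.

```python
import heapq
import math


def quadratic_form_distance(x, y, A):
    """sqrt((x - y)^T A (x - y)) for a symmetric positive-definite matrix A."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    s = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(s, 0.0))  # guard against tiny negative rounding errors


def multistep_knn(query, data, k, exact, lower_bound):
    """Filter-and-refine k-NN search (hypothetical helper).

    Candidates are refined (exact distance computed) in ascending order of a
    cheap lower bound; the loop stops as soon as the next lower bound exceeds
    the current k-th exact distance. The answer is exact provided that
    lower_bound(q, o) <= exact(q, o) holds for every object o.
    """
    order = sorted((lower_bound(query, o), i) for i, o in enumerate(data))
    best = []  # max-heap via negated distances: (-distance, index)
    refined = 0
    for lb, i in order:
        if len(best) == k and lb > -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        refined += 1
        d = exact(query, data[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best), refined
```

For A = [[1, 0.5], [0.5, 1]] (eigenvalues 1.5 and 0.5), querying the origin for k = 2 over (0, 0), (1, 0), (0, 3), (5, 5) returns the first two points after only two exact distance computations.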

4.

CM-tree: A dynamic clustered index for similarity search in metric databases

《Data & Knowledge Engineering》2008,64(3):919-946

Repositories of unstructured data types, such as free text, images, audio and video, have recently been emerging in various fields. A general searching approach for such data types is similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic, paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic insertions and deletions, of both single objects and in bulk. Unlike other methods, it is specifically designed to achieve a structure of tight, low-overlap clusters through its primary construction algorithms (rather than post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, a clustering-based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To support these methods, the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs.

5.

CM-tree: A dynamic clustered index for similarity search in metric databases

Lior Israel 《Data & Knowledge Engineering》2007,63(3):919-946

6.

Indexing high-dimensional data for main-memory similarity search (total citations: 1; self-citations: 0; citations by others: 1)

Xiaohui Yu Junfeng Dong 《Information Systems》2010

As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR⁺-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR⁺-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR⁺-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR⁺-tree consistently outperforms other index structures.

7.

WALRUS: a similarity retrieval algorithm for image databases (total citations: 2; self-citations: 0; citations by others: 2)

Natsev A. Rajeev Rastogi Shim K. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(3):301-316

Approaches to content-based image querying typically extract a single signature from each image based on color, texture, or shape features. The images returned as the query result are then those whose signatures are closest to the signature of the query image. While efficient for simple images, such methods do not work well for complex scenes, since they fail to retrieve images that match the query only partially, that is, where only certain regions of the image match. This shortcoming leads to discarding images that may be semantically very similar to the query image because they contain the same objects. The problem becomes even more apparent when we consider scaled or translated versions of similar objects. We propose WALRUS (wavelet-based retrieval of user-specified scenes), a novel similarity retrieval algorithm that is robust to scaling and translation of objects within an image. WALRUS employs a novel similarity model in which each image is first decomposed into its regions, and the similarity measure between a pair of images is then defined as the fraction of the area of the two images covered by matching regions from the images. To extract regions for an image, WALRUS considers sliding windows of varying sizes and then clusters them based on the proximity of their signatures. An efficient dynamic programming algorithm is used to compute wavelet-based signatures for the sliding windows. Experimental results on real-life data sets corroborate the effectiveness of WALRUS's similarity model.
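The area-based similarity measure can be illustrated with a simplified sketch. It assumes regions have already been extracted and given signatures (the sliding-window clustering and wavelet signatures are omitted), represents each region as a set of pixel coordinates, and treats any region pair whose signatures lie within a hypothetical threshold eps as matching; WALRUS itself may pair regions differently.

```python
import math
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]                      # a pixel (row, column)
Region = Tuple[Sequence[float], Set[Cell]]  # (signature, pixels the region covers)


def area_similarity(q_regions: List[Region], t_regions: List[Region],
                    q_area: int, t_area: int, eps: float) -> float:
    """Fraction of the two images' combined area covered by matching regions.

    Covered pixels are collected in sets, so overlapping matched regions
    are not counted twice.
    """
    q_cov: Set[Cell] = set()
    t_cov: Set[Cell] = set()
    for q_sig, q_pix in q_regions:
        for t_sig, t_pix in t_regions:
            if math.dist(q_sig, t_sig) <= eps:  # signatures close enough: regions match
                q_cov |= q_pix
                t_cov |= t_pix
    return (len(q_cov) + len(t_cov)) / (q_area + t_area)
```

For two 2x2 images in which exactly one 2-pixel region of each matches, the similarity is (2 + 2) / (4 + 4) = 0.5.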

8.

Indexing high-dimensional data for efficient in-memory similarity search (total citations: 3; self-citations: 0; citations by others: 3)

Bin Cui Beng Chin Ooi Jianwen Su Tan K.-L. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(3):339-353

In main-memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data (e.g., more than 32 dimensions). Consequently, existing techniques for low-dimensional data that minimize cache misses are no longer effective. We present a novel index structure, called the Δ-tree, to speed up high-dimensional queries in main-memory environments. The Δ-tree is a multilevel structure in which each level represents the data space at a different dimensionality: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently, as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the features of the data set and hence reduce the search space. We also propose an extension, called the Δ⁺-tree, that globally clusters the data space and then partitions clusters into small regions. The Δ⁺-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the Δ⁺-tree is superior in most cases.

9.

The TV-tree: An index structure for high-dimensional data (total citations: 20; self-citations: 0; citations by others: 20)

King-Ip Lin H. V. Jagadish Christos Faloutsos 《The VLDB Journal: The International Journal on Very Large Data Bases》1994,3(4):517-542

We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, using additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such varying-length feature vectors. Finally, we report simulation results comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, which saves up to 80% in disk accesses.
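The idea of using only a few features first and adding more only when needed can be illustrated with an early-terminating range check. This sketch assumes features are already ordered so that the leading ones are the most discriminating; the function name and the stages parameter are hypothetical, and the code is not the TV-tree's node structure or algorithms.

```python
from typing import List, Sequence, Tuple


def within_radius(q: Sequence[float], o: Sequence[float], r: float,
                  stages: List[int]) -> Tuple[bool, int]:
    """Decide whether ||q - o|| <= r, adding dimensions stage by stage.

    `stages` lists increasing prefix lengths to try before the full vector.
    The squared distance over a prefix never exceeds the full squared
    distance, so exceeding r*r on a prefix safely rejects o early.
    Returns (is_within_radius, number_of_dimensions_examined).
    """
    r2 = r * r
    partial = 0.0
    start = 0
    for end in list(stages) + [len(q)]:
        partial += sum((a - b) ** 2 for a, b in zip(q[start:end], o[start:end]))
        if partial > r2:
            return False, end  # rejected using only the first `end` dimensions
        start = end
    return True, len(q)
```

For q = (0, 0, 0, 0), o = (3, 0, 0, 0), r = 1 and a first stage of one dimension, the object is rejected after examining a single dimension.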

10.

Virtual images for similarity retrieval in image databases (total citations: 1; self-citations: 0; citations by others: 1)

Petraglia G. Sebillo M. Tucci M. Tortora G. 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(6):951-967

We introduce the virtual image, an iconic index suited for pictorial information access in a pictorial database, and a similarity retrieval approach based on virtual images to perform content-based retrieval. A virtual image represents the spatial information contained in a real image in explicit form by means of a set of spatial relations. This makes it possible to efficiently compute the similarity between a query and an image in the database. We also show that virtual images support real-world applications that require translation, reflection, and/or rotation invariance of image representation.

11.

EmbAssi: embedding assignment costs for similarity search in large graph databases

Bause Franka Schubert Erich Kriege Nils M. 《Data mining and knowledge discovery》2022,36(5):1728-1755

The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is NP-hard and challenging in...

12.

Building a web-scale image similarity search system

Michal Batko Fabrizio Falchi Claudio Lucchese David Novak Raffaele Perego Fausto Rabitti Jan Sedmidubsky Pavel Zezula 《Multimedia Tools and Applications》2010,47(3):599-629

13.

CSVD: clustering and singular value decomposition for approximate similarity search in high-dimensional spaces (total citations: 1; self-citations: 0; citations by others: 1)

Castelli V. Thomasian A. Chung-Sheng Li 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(3):671-685

Nearest-neighbor search in high-dimensional spaces is critical for many applications, such as content-based retrieval from multimedia databases, similarity search of patterns in data mining, and nearest-neighbor classification. Unfortunately, even with the aid of the commonly used indexing schemes, the performance of nearest-neighbor (NN) queries deteriorates rapidly with the number of dimensions. We propose a method, called Clustering with Singular Value Decomposition (CSVD), which supports efficient approximate processing of NN queries while maintaining good precision-recall characteristics. CSVD groups homogeneous points into clusters and separately reduces the dimensionality of each cluster using SVD. Cluster selection for NN queries relies on a branch-and-bound algorithm, and within-cluster searches can be performed with traditional or in-memory indexing methods. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than plain SVD for the same normalized mean squared error (NMSE), which translates into higher efficiency in processing approximate NN queries.

14.

Dimensionality reduction for similarity search with the Euclidean distance in high-dimensional applications

Seungdo Jeong Sang-Wook Kim Byung-Uk Choi 《Multimedia Tools and Applications》2009,42(2):251-271

In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. To search these vectors efficiently, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, a phenomenon known as the curse of dimensionality. To address it, dimensionality reduction methods have been proposed. They map feature vectors in a high-dimensional space into vectors in a low-dimensional space before the data are indexed. This paper proposes a novel method for dimensionality reduction based on a function that approximates the Euclidean distance using the norm and angle components of a vector. First, we identify the causes of, and discuss basic solutions to, errors in angle approximation when approximating the Euclidean distance. Then, we propose a new method for dimensionality reduction that extracts a set of subvectors from a feature vector and maintains only the norm and the approximated angle for every subvector. The selection of a good reference vector is crucial for accurate approximation of the angle component. We present criteria for a good reference vector and propose a method for choosing one. We also define a novel distance function using the norm and angle components, and formally prove that it consistently lower-bounds the Euclidean distance. This implies that retrieval using this function incurs no false dismissals. Finally, the superiority of the proposed approach is verified via extensive experiments with synthetic and real-life data sets.
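Why norm-based summaries cannot cause false dismissals can be seen from the law of cosines: ||x_i - y_i||^2 = ||x_i||^2 + ||y_i||^2 - 2 ||x_i|| ||y_i|| cos(theta_i), which is at least (||x_i|| - ||y_i||)^2 because cos(theta_i) <= 1. The sketch below keeps only per-subvector norms and drops the angle term entirely, so it is a weaker, simplified variant of the bound described above, not the paper's distance function; the function names and the fixed subvector size are assumptions.

```python
import math
from typing import List, Sequence


def subvector_norms(v: Sequence[float], size: int) -> List[float]:
    """Split v into contiguous subvectors of length `size`; keep only their norms."""
    return [math.sqrt(sum(x * x for x in v[i:i + size])) for i in range(0, len(v), size)]


def norm_lower_bound(nx: Sequence[float], ny: Sequence[float]) -> float:
    """Lower bound on ||x - y|| computed from subvector norms only.

    For each subvector pair, | ||x_i|| - ||y_i|| | <= ||x_i - y_i|| (reverse
    triangle inequality); summing the squares over all i gives
    sum (||x_i|| - ||y_i||)^2 <= ||x - y||^2, so filtering with this value
    never causes a false dismissal.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(nx, ny)))
```

For x = (3, 4, 0, 0) and y = (4, 3, 0, 1) with subvectors of length 2, the stored norms are (5, 0) and (5, 1), giving a lower bound of 1.0 against a true distance of about 1.73.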

15.

CVA file: an index structure for high-dimensional datasets

Jiyuan An Hanxiong Chen Kazutaka Furuse Nobuo Ohbo 《Knowledge and Information Systems》2005,7(3):337-357

Similarity search is important in information-retrieval applications where objects are usually represented as high-dimensional vectors. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. The proposed technique reduces the dimensions whose coordinates are less than a critical value with respect to each data vector. This flexible, datawise dimensionality reduction helps improve indexing mechanisms for high-dimensional datasets with skewed distributions in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), a revised version of the VA file, is developed. Using a CVA file further reduces the size of index files, while the tightness of the index bounds is maximally preserved. The effectiveness is confirmed on synthetic and real data.
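For context, the VA file that the CVA file revises stores a few bits per coordinate and filters candidates with a lower bound computed from each approximation cell. The sketch below shows that standard VA-file style bound only, not the CVA file's datawise dimensionality reduction; the [0, 1] normalization, the uniform cell grid, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def va_approximate(v: Sequence[float], bits: int) -> List[int]:
    """Map each coordinate in [0, 1] to one of 2**bits equal-width cells."""
    cells = 1 << bits
    return [min(int(x * cells), cells - 1) for x in v]  # clamp x == 1.0 into the last cell


def va_lower_bound(q: Sequence[float], approx: Sequence[int], bits: int) -> float:
    """Smallest distance from q to any point inside the approximation's cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for x, c in zip(q, approx):
        lo, hi = c * width, (c + 1) * width
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
        # otherwise q lies within the cell's extent in this dimension and adds 0
    return math.sqrt(total)
```

With 2 bits per coordinate, the point (0.1, 0.9) is approximated by cells (0, 3); for the query (0.6, 0.9) the bound is 0.35, below the true distance of 0.5, so the point can be discarded whenever the current search radius is under 0.35.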

16.

Graph similarity search on large uncertain graph databases (total citations: 1; self-citations: 0; citations by others: 1)

Ye Yuan Guoren Wang Lei Chen Haixun Wang 《The VLDB Journal: The International Journal on Very Large Data Bases》2015,24(2):271-296

17.

MKL-tree: an index structure for high-dimensional vector spaces

Annalisa Franco Alessandra Lumini Dario Maio 《Multimedia Systems》2007,12(6):533-550

In this work, a novel hierarchical data structure for high-dimensional data indexing is proposed. The MKL-tree is based on dimensionality reduction performed by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the discriminating power of the index. The mathematical foundations for representing nodes and leaves, and for the techniques aimed at managing the structure, are detailed. Moreover, the algorithms for bulk loading the MKL-tree (i.e., for creating the tree given a large number of objects simultaneously), for updating and splitting nodes after the insertion of new objects, and for performing similarity searches are described. Results are reported comparing the MKL-tree with other well-known access methods in terms of I/O and CPU costs and the precision of the results when executing similarity queries.

18.

Linking identical neighborly partitions for efficient high-dimensional similarity search in unstructured peer-to-peer systems

Bin Cui Linhao Xu Jiakui Zhao 《Distributed and Parallel Databases》2009,26(2-3):207-229

Peer-to-peer (P2P) computing has recently attracted a great deal of research attention. In a P2P system, a large number of nodes can potentially be pooled together to share their resources, information, and services. However, existing unstructured P2P systems lack support for content-based search over data objects, which are generally represented by high-dimensional feature vectors. In this paper, we propose an efficient and effective indexing mechanism, named Linking Identical Neighborly Partitions (LINP), to facilitate high-dimensional similarity queries in unstructured P2P systems; it combines a space partitioning technique with a routing index technique. With the aid of LINP, each peer can not only process similarity queries efficiently over its local data, but also route queries to promising peers that may contain the desired data. In the proposed scheme, each peer summarizes its local data using the space partitioning technique and exchanges the summarized index with its neighboring peers to construct routing indices. Furthermore, to improve system performance under peer updates, we propose an extension of LINP, named LINP⁺, in which each peer can reconfigure its neighboring peers to keep relevant peers nearby. The performance of the proposed scheme is evaluated on both synthetic and real-life high-dimensional datasets, and the experimental results show its superiority.

19.

An object-oriented fuzzy data model for similarity detection in image databases

Majumdar A.K. Bhattacharya I. Saha A.K. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(5):1186-1189

We introduce a fuzzy set theoretic approach for dealing with uncertainty in images in the context of spatial and topological relations existing among the objects in the image. We propose an object-oriented graph theoretic model for representing an image and this model allows us to assess the similarity between images using the concept of (fuzzy) graph matching. Sufficient flexibility has been provided in the similarity algorithm so that different features of an image may be independently focused upon.

20.

CKDB-Tree: An effective high-dimensional dynamic index structure (total citations: 1; self-citations: 0; citations by others: 1)

Sun Jinguang Wang Shu'e 《Computer Engineering and Applications》2009,45(30):157-160

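A new index structure for high-dimensional data spaces, the CKDB-Tree (Compact KDB-Tree), is proposed. It adopts a new split strategy that introduces the concepts of an insertion safe point and a deletion safe point when a node is split, so that both future data and data already indexed are taken into account. The definition of the CKDB-Tree and the characteristics of its node structure are given, together with the corresponding insertion, search, and deletion algorithms. The storage performance of the index structure is analyzed quantitatively and theoretically. Finally, experiments show that the CKDB-Tree is an effective dynamic index structure for high-dimensional spaces.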