首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Content-based image retrieval by hierarchical linear subspace method   总被引:1,自引:0,他引:1  
We describe a hierarchical linear subspace method to query large on-line image databases using image similarity as the basis of the queries. The method is based on the generic multimedia indexing (GEMINI) approach which is used in the IBM query through the image content search system. Our approach is demonstrated on image indexing, in which the subspaces correspond to different resolutions of the images. During content-based image retrieval, the search starts in the subspace with the lowest resolution of the images. In this subspace, the set of all possible similar images is determined. In the next subspace, additional metric information corresponding to a higher resolution is used to reduce this set. This procedure is repeated until the similar images can be determined. For evaluation we used three image databases and two different subspace sequences.  相似文献   

2.
Metric indexing is the state of the art in general distance-based retrieval. Relying on the triangular inequality, metric indexes achieve significant online speed-up beyond a linear scan. Recently, the idea of Ptolemaic indexing was introduced, which substitutes Ptolemy's inequality for the triangular one, potentially yielding higher efficiency for the distances where it applies. In this paper we have adapted several metric indexes to support Ptolemaic indexing, thus establishing a class of Ptolemaic access methods (PtoAM). In particular, we include Ptolemaic Pivot tables, Ptolemaic PM-Trees and the Ptolemaic M-Index. We also show that the most important and promising family of distances suitable for Ptolemaic indexing is the signature quadratic form distance, an adaptive similarity measure which can cope with flexible content representations of multimedia data, among other things. While this distance has shown remarkable qualities regarding the search effectiveness, its high computational complexity underscores the need for efficient search methods. We show that these distances are Ptolemaic metrics and present a study where we apply Ptolemaic indexing methods on real-world image databases, resolving exact queries nearly four times as fast as the state-of-the-art metric solution, and up to three orders of magnitude times as fast as sequential scan.  相似文献   

3.
Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. While similarity search is an important tool to support the manipulation of probabilistic data, it raises new challenges to traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. On the other hand, several more complicated distance operators have proven their values for better distinguishing ability in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover’s Distance (EMD). EMD is the most successful distance metric for probability distribution comparison but is an expensive operator as it has cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries on probabilistic data. Our solution utilizes primal-dual theory from linear programming and employs a group of B + trees for effective candidate pruning. We also apply our filtering technique to the processing of continuous similarity queries, especially with applications to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management.  相似文献   

4.
利用高维数据空间合理划分,提出一种简单有效的KNN检索算法-LBD。通过聚类将数据划分成多个子集空间,对每个聚类子集内的高维向量,利用距离和位码定义简化表示形式。KNN搜索时,首先利用距离信息确定候选范围,然后利用某些维上的位码不相同信息进一步缩小搜索范围,提高剪枝效率。位码字符串比较时,按照维度贡献优先顺序,大大加快非候选点过滤。LBD利用特殊的B+树组织,降低I/O和距离计算代价。采用模拟数据和真实数据,实验验证了LBD具有更高的检索效率。  相似文献   

5.
The Signature Quadratic Form Distance on feature signatures represents a flexible distance-based similarity model for effective content-based multimedia retrieval. Although metric indexing approaches are able to speed up query processing by two orders of magnitude, their applicability to large-scale multimedia databases containing billions of images is still a challenging issue. In this paper, we propose a parallel approach that balances the utilization of CPU and many-core GPUs for efficient similarity search with the Signature Quadratic Form Distance. In particular, we show how to process multiple distance computations and other parts of the search procedure in parallel, achieving maximal performance of the combined CPU/GPU system. The experimental evaluation demonstrates that our approach implemented on a common workstation with 2?GPU cards outperforms traditional parallel implementation on a high-end 48-core NUMA server in terms of efficiency almost by an order of magnitude. If we consider also the price of the high-end server that is ten times higher than that of the GPU workstation then, based on price/performance ratio, the GPU-based similarity search beats the CPU-based solution by almost two orders of magnitude. Although proposed for the SQFD, our approach of fast GPU-based similarity search is applicable for any distance function that is efficiently parallelizable in the SIMT execution model.  相似文献   

6.
We introduce a method that enables scalable similarity search for learned metrics. Given pairwise similarity and dissimilarity constraints between some examples, we learn a Mahalanobis distance function that captures the examples' underlying relationships well. To allow sublinear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions. We demonstrate the approach applied to a variety of image data sets, as well as a systems data set. The learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.  相似文献   

7.
Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries. We introduce a generic scheme for such data mining algorithms and we investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed-up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to an implementation based on an index or using a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach  相似文献   

8.
As databases increasingly integrate non-textual multimedia information it is becoming necessary to support efficient similarity searching in addition to range searching. Range and nearest-neighbor (similarity) queries are the most important class of queries for multimedia and multi-dimensional databases. Due to the large sizes of the datasets involved, I/O is a critical factor limiting performance. The use of parallel I/O through declustering of the data is a promising approach to improve performance. Consequently several research efforts have addressed the problem of declustering multidimensional data for optimizing range and partial match queries. Very limited work has been done for similarity queries, and the problem of declustering for combined range and similarity queries has not been addressed in the literature. Consider a dataset of images where the following metadata for each image is also stored: date on which the picture was taken, longitudeand latitude of the site of the picture. An example of a combined query is: Given a target image, find the 5 most similar images taken within 3 months of the target image and located within 2 degrees of longitude and latitude of the target image. In order to answer this query, it is necessary to conduct a range search on the date, longitude and latitude values and a similarity search on the image content.In this paper, we develop new declustering schemes that provide good declustering for similarity searching. In addition, we show that the new schemes have very good performance for range queries as well as combination queries. The new schemes are based upon the Cyclic declustering schemes which were developed for range and partial match queries. The Cyclic schemes not only provide superior performance to earlier schemes, but are also very robust and consistent with respect to query types and variations in system parameters.  相似文献   

9.
This paper evaluates the use of standard database indexes and query processing as a way to do metric indexing in the LAESA approach. By utilizing B-trees and R-trees as pivot-based indexes, we may use well-known optimization techniques from the database field within metric indexing and search. The novelty of this paper is that we use a cost-based approach to dynamically evaluate which and how many pivots to use in the evaluation of each query. By a series of measurements using our database prototype we are able to evaluate the performance of this approach. Compared to using all available pivots for filtering, the optimized approach gives half the response times for main memory data, but much more varied results for disk resident data. However, by use of the cost model we are able to dynamically determine when to bypass the indexes and simply perform a sequential scan of the base data. The conclusion of this evaluation is that it is beneficial to create many pivots, but to use only the most selective ones during evaluation of each query. R-trees give better performance than B-trees when utilizing all pivots, but when being able to dynamically select the best pivots, B-trees often provide better performance.  相似文献   

10.
The content-based cross-media retrieval is a new type of multimedia retrieval in which the media types of query examples and the returned results can be different. In order to learn the semantic correlations among multimedia objects of different modalities, the heterogeneous multimedia objects are analyzed in the form of multimedia document (MMD), which is a set of multimedia objects that are of different media types but carry the same semantics. We first construct an MMD semi-semantic graph (MMDSSG) by jointly analyzing the heterogeneous multimedia data. After that, cross-media indexing space (CMIS) is constructed. For each query, the optimal dimension of CMIS is automatically determined and the cross-media retrieval is performed on a per-query basis. By doing this, the most appropriate retrieval approach for each query is selected, i.e. different search methods are used for different queries. The query dependent search methods make cross-media retrieval performance not only accurate but also stable. We also propose different learning methods of relevance feedback (RF) to improve the performance. Experiment is encouraging and validates the proposed methods.  相似文献   

11.
Large-scale similarity search engines are complex systems devised to process unstructured data like images and videos. These systems are deployed on clusters of distributed processors communicated through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relays on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm, which takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to the previous methods in the literature, the proposed approach reduces the number of distance evaluations up to 60%.  相似文献   

12.
Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based indexing. The former is constructed based on the locations of data objects: using some distance heuristics on object locations. The latter is built on a precomputed solution space. Thus, NN queries can be reduced to and processed as simple point queries in this solution space. Both approaches exhibit some disadvantages, especially when employed for wireless data broadcast in mobile computing environments. In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast modes of mobile computing. The grid-partition index is constructed based on the Voronoi diagram, i.e., the solution space of NN queries. However, it has two distinctive characteristics. First, it divides the solution space into grid cells such that a query point can be efficiently mapped into a grid cell around which the nearest object is located. This significantly reduces the search space. Second, the grid-partition index stores the objects that are potential NNs of any query falling within the cell. The storage of objects, instead of the Voronoi cells, makes the grid-partition index a hybrid of the solution-based and object-based approaches. As a result, it achieves a much more compact representation than the pure solution-based approach and avoids backtracked traversals required in the typical object-based approach, thus realizing the advantages of both approaches. We develop an incremental construction algorithm to address the issue of object update. In addition, we present a cost model to approximate the search cost of different grid partitioning schemes. The performances of the grid-partition index and existing indexes are evaluated using both synthetic and real data. The results show that, overall, the grid-partition index significantly outperforms object-based indexes and solution-based indexes. Furthermore, we extend the grid-partition index to support continuous-nearest-neighbor search. Both algorithms and experimental results are presented. Edited by R. Guting  相似文献   

13.
An all-k-nearest-neighbor (AkNN) query finds k nearest neighbors for each query object. This problem arises naturally in many areas, such as GIS (geographic information system), multimedia retrieval, and recommender systems. To support various data types and flexible distance metrics involved in real applications, we study AkNN retrieval in metric spaces, namely, metric AkNN (MAkNN) search. Consider that the underlying indexes on the query set and the object set may not exist, which is natural in many scenarios. For example, the query set and the object set could be the results of other queries, and thus, the underlying indexes cannot be built in advance. To support MAkNN search on datasets without any underlying index, we propose an efficient disk-based algorithm, termed as Partition-Based MAkNN Algorithm (PMA), which follows a partition-search framework and employs a series of pruning rules for accelerating the search. In addition, we extend our techniques to tackle an interesting variant of MAkNN queries, i.e., metric self-AkNN (MSAkNN) search, where the query set is identical to the object set. Extensive experiments using both real and synthetic datasets demonstrate the effectiveness of our pruning rules and the efficiency of the proposed algorithms, compared with state-of-the-art MAkNN and MSAkNN algorithms.  相似文献   

14.
Adaptable similarity queries based on quadratic form distance functions are widely popular in data mining application domains including multimedia, CAD, molecular biology or medical image databases. Recently it has been recognized that quantization of feature vectors can substantially improve query processing for Euclidean distance functions, as demonstrated by the scan-based VA-file and the index structure IQ-tree. In this paper, we address the problem that determining quadratic form distances between quantized vectors is difficult and computationally expensive. Our solution provides a variety of new approximation techniques for quantized vectors which are combined by an extended multistep query processing architecture. In our analysis section, we show that the filter steps complement each other. Consequently, it is useful to apply our filters in combination. We show the superiority of our approach over other architectures and over competitive query processing methods. In our experimental evaluation, the sequential scan is outperformed by a factor of 2.3. Compared to the X-tree on 64 dimensional color histogram data, we measured an improvement factor of 5.7.  相似文献   

15.
子序列匹配是时间序列挖掘的经典课题,旨在发现大型数据集中的相似数据序列.很多文献关注固定时间段的序列的查询.但对于多种不同时间段的查询的问题仍然未解决好.基于时间段的查询含义是有时间窗口限制的查询.为了满足多时间段上的查询,简单地为每个时间段的子序列构建索引既耗时又耗存储空间.从目前的文献来看,已有的索引无法满足具有不...  相似文献   

16.
17.
Skyline index for time series data   总被引:4,自引:0,他引:4  
We have developed a new indexing strategy that helps overcome the curse of dimensionality for time series data. Our proposed approach, called skyline index, adopts new skyline bounding regions (SBR) to approximate and represent a group of time series data according to their collective shape. Skyline bounding regions allow us to define a distance function that tightly lower bounds the distance between a query and a group of time series data. In an extensive performance study, we investigate the impact of different distance functions by various dimensionality reduction and indexing techniques on the performance of similarity search, including index pages accessed, data objects fetched, and overall query processing time. In addition, we show that, for k-nearest neighbor queries, the proposed skyline index approach can be coupled with the state of the art dimensionality reduction techniques such as adaptive piecewise constant approximation (APCA) and improve its performance by up to a factor of 3.  相似文献   

18.
Metric search is concerned with the efficient evaluation of queries in metric spaces. In general, a large space of objects is arranged in such a way that, when a further object is presented as a query, those objects most similar to the query can be efficiently found. Most mechanisms rely upon the triangle inequality property of the metric governing the space. The triangle inequality property is equivalent to a finite embedding property, which states that any three points of the space can be isometrically embedded in two-dimensional Euclidean space. In this paper, we examine a class of semimetric space which is finitely four-embeddable in three-dimensional Euclidean space. In mathematics this property has been extensively studied and is generally known as the four-point property. All spaces with the four-point property are metric spaces, but they also have some stronger geometric guarantees. We coin the term supermetric1space as, in terms of metric search, they are significantly more tractable. Supermetric spaces include all those governed by Euclidean, Cosine,2 Jensen–Shannon and Triangular distances, and are thus commonly used within many domains. In previous work we have given a generic mathematical basis for the supermetric property and shown how it can improve indexing performance for a given exact search structure. Here we present a full investigation into its use within a variety of different hyperplane partition indexing structures, and go on to show some more of its flexibility by examining a search structure whose partition and exclusion conditions are tailored, at each node, to suit the individual reference points and data set present there. Among the results given, we show a new best performance for exact search using a well-known benchmark.  相似文献   

19.
Nowadays, the road network has gained more and more attention in the research area of databases. Existing works mainly focus on standalone queries, such as k-nearest neighbor queries over a single type of objects (e.g., facility like restaurant or hotel). In this paper, we propose a k-multi-preference (kMP) query over road networks, involving complex query predicates and multiple facilities. In particular, given a query graph, a kMP query retrieves of the top-k groups of vertices (of k facility types) satisfying the label constraints and their aggregate distances are the smallest. A naïve solution to this problem is to enumerate all combinations of vertices with k possible facility types and then select the one with the minimum sum distance. This method, however, incurs rather high computation cost due to exponential possible combinations. In addition, the existing solutions to other standalone queries are for a single type of facilities and cannot be directly used to answer kMP queries. Therefore, in this paper, we propose an efficient approach to process a kMP query, which utilizes an index with bounded space and reduces the computation cost of the shortest path queries. We also design effective pruning techniques to filter out false alarms. Through our extensive experiments, we demonstrate the efficiency and effectiveness of our proposed solutions.  相似文献   

20.
A real-time matching system for large fingerprint databases   总被引:11,自引:0,他引:11  
With the current rapid growth in multimedia technology, there is an imminent need for efficient techniques to search and query large image databases. Because of their unique and peculiar needs, image databases cannot be treated in a similar fashion to other types of digital libraries. The contextual dependencies present in images, and the complex nature of two-dimensional image data make the representation issues more difficult for image databases. An invariant representation of an image is still an open research issue. For these reasons, it is difficult to find a universal content-based retrieval technique. Current approaches based on shape, texture, and color for indexing image databases have met with limited success. Further, these techniques have not been adequately tested in the presence of noise and distortions. A given application domain offers stronger constraints for improving the retrieval performance. Fingerprint databases are characterized by their large size as well as noisy and distorted query images. Distortions are very common in fingerprint images due to elasticity of the skin. In this paper, a method of indexing large fingerprint image databases is presented. The approach integrates a number of domain-specific high-level features such as pattern class and ridge density at higher levels of the search. At the lowest level, it incorporates elastic structural feature-based matching for indexing the database. With a multilevel indexing approach, we have been able to reduce the search space. The search engine has also been implemented on Splash 2-a field programmable gate array (FPGA)-based array processor to obtain near-ASIC level speed of matching. Our approach has been tested on a locally collected test data and on NIST-9, a large fingerprint database available in the public domain  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号