Similar literature
 20 similar documents found (search time: 406 ms)
1.
Nearest neighbor (NN) search is emerging as an important search paradigm in a variety of applications in which objects are represented as vectors of d numeric features. However, despite decades of effort, current solutions for finding exact kNNs are far from satisfactory for large d, with the exception of filtering approaches such as the VA-file. The filtering approach represents vectors as compact approximations; by scanning these smaller approximations first, it visits only a small fraction of the real vectors. In this paper, we introduce the local polar coordinate file (LPC-file), which uses the filtering approach for nearest-neighbor searches in high-dimensional image databases. The basic idea is to partition the vector space into rectangular cells and then to approximate vectors by polar coordinates on the partitioned local cells. The LPC information significantly enhances the discriminatory power of the approximation. To demonstrate the effectiveness of the LPC-file, we conducted extensive experiments and compared its performance with the VA-file and the sequential scan on synthetic and real data sets. The experimental results demonstrate that the LPC-file outperforms both the VA-file and the sequential scan in total elapsed time and in the number of disk accesses, and that the LPC-file is robust under both "good" distributions (such as random) and "bad" distributions (such as skewed and clustered).
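
The filtering idea described in this abstract can be sketched as follows. This is a minimal illustration of a generic VA-file-style filter-and-refine search, not the LPC-file itself: the polar-coordinate refinement is omitted, the data are assumed to lie in the unit cube, and the names `approximate`, `lower_bound`, and `nearest_neighbor` are hypothetical.

```python
import math

def cell_index(value, bits):
    """Map a coordinate in [0, 1) to one of 2**bits equal-width cells."""
    cells = 1 << bits
    return min(int(value * cells), cells - 1)

def approximate(vector, bits):
    """Compact approximation: one cell index per dimension (assumed layout)."""
    return tuple(cell_index(v, bits) for v in vector)

def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c * width, (c + 1) * width
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
    return math.sqrt(total)

def nearest_neighbor(query, vectors, bits=2):
    """Exact 1-NN: scan approximations, then visit real vectors in lower-bound order."""
    approxs = [approximate(v, bits) for v in vectors]
    order = sorted(range(len(vectors)),
                   key=lambda i: lower_bound(query, approxs[i], bits))
    best_i, best_d, visited = None, math.inf, 0
    for i in order:
        if lower_bound(query, approxs[i], bits) >= best_d:
            break  # no remaining cell can contain a strictly closer vector
        visited += 1  # counts how many real vectors were actually read
        d = math.dist(query, vectors[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, visited
```

The `visited` counter stands in for the number of real vectors fetched from disk; a tighter approximation, which is what the LPC information is said to provide, would lower it further.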

2.
In this paper, a new approach called ‘instance variant nearest neighbor’ approximates the regression surface of a function using the concept of k nearest neighbors. Instead of using a fixed k for the entire dataset, we assume that each data instance has an optimal k whose neighbors best approximate the original function by fitting the local region. This approach can be beneficial for noisy datasets in which local regions have characteristics that differ from the major data clusters. We formulate the problem of finding such k neighbors for each data instance as a combinatorial optimization problem, which is solved by particle swarm optimization. The particle swarm optimization is extended with a rounding scheme that rounds continuous-valued candidate solutions up or down to integers, each representing a number of neighbors k. We apply our new approach to five real-world regression datasets and compare its prediction performance with other function approximation algorithms, including the standard k nearest neighbor, multi-layer perceptron, and support vector regression. We observed that the instance variant nearest neighbor outperforms these algorithms on several datasets. In addition, our new approach provides consistent outputs on five datasets where other algorithms perform poorly.

3.
The paper proposes a novel symmetrical encoding-based index structure, called the EDD-tree (for encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into a search in a single-dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that this method outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.

4.
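To address the bottleneck in nearest neighbor retrieval over large-scale high-dimensional data, this paper proposes a retrieval method based on vector quantization, the intra-cluster product quantization tree. First, the method uses a multi-layer tree structure combining vector quantization and product quantization to represent large-scale high-dimensional data sets efficiently, lowering the empty-bucket rate of the index table compared with existing methods. Second, it proposes a greedy-queue-based method for selecting neighboring clusters, which reduces computational complexity and speeds up nearest neighbor retrieval. Finally, it proposes a plane quantization method for approximating the distance between candidate vectors and the query vector; compared with point quantization and line quantization, it has a smaller quantization error and improves the accuracy of nearest neighbor queries. On large-scale high-dimensional data sets described by SIFT and GIST descriptors, the proposed algorithm improves first-recall accuracy by 57.7% over the product quantization tree and lowers the empty-bucket rate of the index table by more than 50%; compared with locally optimized product quantization, it reaches a recall of 97% while needing only 1/9 of the query time. The experimental results show that the proposed nearest neighbor method based on intra-cluster product quantization improves nearest neighbor retrieval performance and provides theoretical support for nearest neighbor retrieval over large-scale high-dimensional data sets.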

5.
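Objective: Large-scale image retrieval is one of the active research topics in computer vision. A basic approach is to extract features from every image in the database, define a similarity measure over the features, and perform nearest neighbor retrieval. The key is to design a nearest neighbor retrieval algorithm that meets both storage and efficiency requirements. To improve the accuracy of the approximate representation of image visual features and to reduce their storage requirements, a multi-index additive quantization method is proposed. Method: Linear search has high complexity, and real-time retrieval requires the image descriptors to be held in memory, so it cannot meet the needs of large-scale retrieval systems. Given the advantages of non-linear retrieval, this paper investigates multi-index structures and quantization coding for non-exhaustive search. The multi-index structure partitions the original data space into several subspaces and assigns the data items of each subspace to different inverted lists; additive quantization, a compression coding method, then encodes the residual data items in the inverted lists, further reducing the quantization loss with respect to the original space. Nearest neighbor retrieval uses a non-exhaustive strategy that searches only a few inverted lists, which greatly reduces retrieval time. The original data need not be stored during retrieval; only the codeword indexes of each data item in the additive quantization codebooks are stored, which greatly reduces memory consumption. Results: To verify the effectiveness of the algorithm, it was tested on three data sets: SIFT, GIST, and MNIST. Recall improves by 4% to 15% over algorithms from recent years, mean precision improves by about 12%, and retrieval time matches that of the fastest algorithm. Conclusion: The proposed multi-index additive quantization coding algorithm effectively improves the approximation accuracy and storage requirements of image visual features, and raises retrieval precision and recall on large-scale data sets. The algorithm targets nearest neighbor retrieval over features and is applicable to nearest neighbor retrieval over massive image collections and other multimedia data.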

6.
Currently, cloud computing systems use simple key-value data processing, which cannot support similarity search effectively due to the lack of efficient index structures, and as dimensionality increases, the existing tree-like index structures can suffer from the "curse of dimensionality". In this paper, a novel VF-CAN indexing scheme is proposed. VF-CAN integrates a content addressable network (CAN) based routing protocol and an improved vector approximation file (VA-file) index. There are two index levels in this scheme: a global index and a local index. The local index, VAK-file, is built for the data in each storage node. VAK-file is the k-means clustering result of the VA-file approximation vectors according to their degree of proximity. Each cluster forms a separate local index file, and each file stores the approximate vectors contained in the cluster. The vector of each cluster center is stored in the cluster center information file of the corresponding storage node. In the global index, storage nodes are organized into a CAN overlay network, and to reduce the cost of calculation, only the clustering information of the local index is published to the entire overlay network through the CAN interface. The experimental results show that VF-CAN reduces the index storage space and improves query performance effectively.

7.
8.
We propose a new dynamic index structure called the GC-tree (or the grid cell tree) for efficient similarity search in image databases. The GC-tree is based on a special subspace partitioning strategy which is optimized for a clustered high-dimensional image dataset. The basic ideas are threefold: 1) we adaptively partition the data space based on a density function that identifies dense and sparse regions in a data space; 2) we concentrate the partition on the dense regions, and the objects in the sparse regions of a certain partition level are treated as if they lie within a single region; and 3) we dynamically construct an index structure that corresponds to the space partition hierarchy. The resultant index structure adapts well to the strongly clustered distribution of high-dimensional image datasets. To demonstrate the practical effectiveness of the GC-tree, we experimentally compared the GC-tree with the IQ-tree, LPC-file, VA-file, and linear scan. The results of our experiments show that the GC-tree outperforms all other methods.

9.
Similarity search in multimedia databases requires efficient support for nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to a computation of the Voronoi cell of each data point. In a second step, we store conservative approximations of the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, nearest neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates its high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in other index structures such as the X-tree.

10.
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the ordered VA-file (OVA-file), based on the VA-file. The OVA-file is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k nearest neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-file, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named ordered VA-LOW (OVA-LOW) based on the proposed OVA-file. OVA-LOW first chooses possible OVA-slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-slices to work out approximate kNN. The number of possible OVA-slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted, and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

11.
High-dimensional index structures are a means to accelerate database query processing in high-dimensional data, such as multimedia feature vectors. Of particular interest in many application scenarios is ranking data items with respect to a certain distance function and thus identifying the nearest neighbor(s) of a query item.

In this paper, we propose a novel ranking algorithm that (1) operates on arbitrary high-dimensional filter indexes, such as the VA-file, the VA+-file, the LPC-file, or the AV-method; (2) exhibits a nearly balanced I/O load when retrieving subsequent items; (3) strictly obeys a predefined main memory threshold; and (4) terminates successfully even when memory restrictions are very tight.


12.
Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth-first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters which are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search process by avoiding the processing of some of the clusters as well as individual objects when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist) is described. The MaxNearestDist upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (i.e., k=1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased, thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements.
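
The traditional MinDist pruning that this abstract takes as its starting point can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: the hierarchy has a single level of spherical clusters, MaxNearestDist is not implemented, and `min_dist` and `knn` are hypothetical names.

```python
import heapq
import math

def min_dist(query, center, radius):
    """MinDist: lower bound on the distance from query to any point in the sphere."""
    return max(0.0, math.dist(query, center) - radius)

def knn(query, clusters, k):
    """clusters: list of (center, radius, points). Returns (neighbors, clusters examined)."""
    # Best-first order: clusters are processed by increasing MinDist.
    queue = [(min_dist(query, c, r), i) for i, (c, r, _) in enumerate(clusters)]
    heapq.heapify(queue)
    best = []  # max-heap via negated distances; holds the current k candidates
    examined = 0
    while queue:
        bound, i = heapq.heappop(queue)
        if len(best) == k and bound >= -best[0][0]:
            break  # every remaining cluster is at least as far as the k-th candidate
        examined += 1
        for p in clusters[i][2]:
            d = math.dist(query, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), examined
```

The priority queue here holds every cluster up front; the storage saving the abstract attributes to MaxNearestDist concerns exactly how many such entries must be retained.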

13.
Imbalanced data sets are a common occurrence in important machine learning problems. Research on improving learning under imbalanced conditions has largely focused on classification problems (i.e., problems with a categorical dependent variable). However, imbalanced data also occur in function approximation, and far less attention has been paid to this case. We present a novel stratification approach for imbalanced function approximation problems. Our solution extends the SMOTE oversampling preprocessing technique to continuous-valued dependent variables by identifying regions of the feature space with a low density of examples and high variance in the dependent variable. Synthetic examples are then generated between nearest neighbors in these regions. In an empirical validation, our approach reduces the normalized mean-squared prediction error in 18 out of 21 benchmark data sets, and compares favorably with state-of-the-art approaches.
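
Generating synthetic examples between nearest neighbors can be sketched as follows. This is an illustration only: it assumes the sparse, high-variance region has already been identified and passed in as `examples`, and it assumes the target of a synthetic example is interpolated in the same way as its features, which the abstract does not state. The function name `synthesize` and its parameters are hypothetical.

```python
import math
import random

def synthesize(examples, k=2, n_new=3, seed=0):
    """examples: list of (features, target) pairs from one selected region.

    Returns n_new synthetic (features, target) pairs, each placed on the
    segment between a sampled example and one of its k nearest neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x, y = rng.choice(examples)
        # k nearest neighbors of x among the other examples
        others = [e for e in examples if e[0] != x]
        neighbors = sorted(others, key=lambda e: math.dist(x, e[0]))[:k]
        nx, ny = rng.choice(neighbors)
        t = rng.random()  # interpolation position in [0, 1)
        new_x = tuple(a + t * (b - a) for a, b in zip(x, nx))
        new_y = y + t * (ny - y)  # assumption: target interpolated like the features
        synthetic.append((new_x, new_y))
    return synthetic
```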

14.
A technique for creating and searching a tree of patterns using relative distances is presented. The search is conducted to find patterns which are nearest neighbors of a given test pattern. The structure of the tree is such that the search time is proportional to the distance between the test pattern and its nearest neighbor, which suggests the anomalous possibility that a larger tree, which can be expected on average to contain closer neighbors, can be searched faster than a smaller tree. The technique has been used to recognize OCR digit samples derived from NIST data at an accuracy rate of 97% using a tree of 7,000 patterns.

15.
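Many indexes have been designed to process nearest neighbor queries and range queries efficiently in high-dimensional spaces. It has been shown that, when dimensionality is high, processing these two kinds of queries with a high-dimensional index is almost never faster than a linear scan. A two-level index is proposed that adaptively identifies clusters in the data set. When the data set is clustered, the index processes nearest neighbor and range queries faster than existing index structures; for other data sets, its performance on these queries is roughly on par with a linear scan. The upper level of the index organizes a number of reference points into a binary tree, and the lower level is a series of dynamic hash tables. Data points are hashed into the corresponding hash buckets according to their relative distances to the reference points. During query processing, the distances from the query point to the reference points are used to prune the search. Experiments show that the proposed index structure performs well.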

16.
Many data centers have archived a tremendous amount of data and begun to publish them on the Web. Due to limited resources and a large number of service requests, data centers usually do not directly support high-cost queries. On the other hand, users are often overwhelmed by the huge data volume and cannot afford to download the whole data sets and search them locally. To support high-dimensional nearest neighbor searches in this environment, the paper develops a multi-level approximation scheme. The coarsest-level approximations are stored locally and searched first. The result is then refined gradually via accesses to remote data centers. Data centers need only deliver data items or their precomputed finer-level approximations by their identifiers. The searching process is usually long in this environment, since it involves remote sites. This paper describes an online search process: the system periodically reports a data item and a positive integer M. The reported item is guaranteed to be one of the M nearest neighbors of the query item. The paper proposes two algorithms to minimize M in each period. Experiments show that one of them performs similarly to a theoretical a posteriori algorithm and significantly outperforms the online extensions of two state-of-the-art nearest neighbor search methods.
Received 25 July 2000 / Revised 25 July 2001 / Accepted in revised form 16 October 2001
Correspondence and offprint requests to: Xiaoyang Sean Wang, Department of Information and Software Engineering, George Mason University, Fairfax, VA 22030, USA. Email: xywang@gmu.eduau

17.
18.
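Huang Weihui (黄维辉), Xiong Ao (熊翱). Software (《软件》), 2013(11): 77-79
Processing multidimensional data has become a key factor affecting progress in many fields, and similarity queries over multidimensional data in particular are used in many of them. When data dimensionality is high, the performance of most index structures degrades, a phenomenon known as the "curse of dimensionality". To address it, this paper proposes the RAKDB-Tree, an index structure for processing multidimensional data efficiently. The index first partitions the data space into subspaces and then indexes the subspaces with an improved KDB-Tree. The RAKDB-Tree's query, insertion, and deletion algorithms keep the index structure in a near-optimal state. Experimental results show that the RAKDB-Tree handles the problems caused by increasing data dimensionality well.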

19.
Many standard image processing operations can be implemented using quadtrees as a simple tree traversal where, at each terminal node, a computation is performed involving some of that node's neighbors. Most of this work has involved the use of bottom-up neighbor-finding techniques which search for a nearest common ancestor. Recently, top-down techniques have been proposed which make use of a neighbor vector as the tree is traversed. A simplified version of the top-down method for a quadtree in the context of a general-purpose tree traversal algorithm is presented. It differs, in part, from prior work in its ability to compute diagonally adjacent neighbors rather than just horizontally and vertically adjacent neighbors. It builds a neighbor vector for each node using a minimal amount of information. Analysis of the algorithm shows that its execution time is directly proportional to the number of nodes in the tree. However, it does require some extra storage. Use of the algorithm leads to lower execution time bounds for some common quadtree image processing operations such as connected component labeling.

20.
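Dynamic reverse nearest neighbor query techniques for moving objects
Li Song (李松), Hao Zhongxiao (郝忠孝). Computer Engineering (《计算机工程》), 2008, 34(10): 40-42
To process the dynamic reverse nearest neighbors of moving objects, the spatio-temporal dynamic reverse nearest neighbor query problem is formally defined. Using concepts such as the spatio-temporal distance function and bounding regions, theorems and algorithms are given for computing the dynamic reverse nearest neighbors of moving objects, and global and local query methods are proposed for the dynamic nearest neighbors of a moving query point. Dynamic detection circles and the spatio-temporal distance function are used to decide dynamic reverse nearest neighbor queries, which reduces the amount of computation by 40% to 60%. A new spatio-temporal index structure, the TPRDNN tree, is constructed, and query algorithms that operate on it are given.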
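
The reverse nearest neighbor relation itself can be illustrated with a brute-force sketch: an object is a reverse nearest neighbor of the query if the query is closer to it than any other object is. This is only a definition-level illustration, assuming linear motion and evaluating one time instant at a time; it uses none of the paper's bounding regions, detection circles, or TPRDNN tree, and all function names are hypothetical.

```python
import math

def position(start, velocity, t):
    """Position at time t under linear motion (an assumption of this sketch)."""
    return tuple(s + v * t for s, v in zip(start, velocity))

def reverse_nearest_neighbors(query, objects):
    """Indices of objects that are strictly closer to the query than to any other object."""
    result = []
    for i, p in enumerate(objects):
        d_query = math.dist(p, query)
        d_others = min(math.dist(p, o) for j, o in enumerate(objects) if j != i)
        if d_query < d_others:
            result.append(i)
    return result

def rnn_at(t, query_motion, motions):
    """Reverse nearest neighbors of a moving query at time t.

    query_motion and each entry of motions are (start, velocity) pairs.
    """
    query = position(*query_motion, t)
    objects = [position(s, v, t) for s, v in motions]
    return reverse_nearest_neighbors(query, objects)
```

Because the answer set changes as objects move, a brute-force evaluation like this would have to be repeated at every time instant, which is the cost that index-based pruning aims to avoid.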

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP No. 09084417 (京ICP备09084417号)